Welcome to the Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction

Conference: International Conference on Computer Vision (ICCV 2025)

Authors: Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang

Using Gaussian-splatting-based self-supervised dynamic-static decomposition, DeGauss reconstructs state-of-the-art distractor-free static scenes from occluded inputs such as casually captured images and challenging egocentric videos, while simultaneously yielding a high-quality, efficient dynamic scene representation.

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions

Conference: International Conference on Computer Vision (ICCV 2025), Highlight

Authors: Marko Mihajlovic, Siwei Zhang, Gen Li, Kaifeng Zhao, Lea Müller, Siyu Tang

VolumetricSMPL is a lightweight extension that adds volumetric capabilities to SMPL(-X) models for efficient 3D interactions and collision detection.
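To illustrate the general idea of volumetric collision detection with a signed distance field, here is a minimal sketch, not the VolumetricSMPL API itself: it uses an analytic sphere SDF as a stand-in for the learned body volume, and penalizes query points whose signed distance is negative (i.e., penetrating the body).

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside.

    Stand-in for a learned volumetric body model; a neural SDF would
    be queried the same way.
    """
    return np.linalg.norm(points - center, axis=-1) - radius

def collision_penalty(points, sdf, **sdf_kwargs):
    """Sum of squared penetration depths over all query points."""
    d = sdf(points, **sdf_kwargs)
    # Only points inside the body (d < 0) contribute.
    return np.sum(np.clip(-d, 0.0, None) ** 2)

# Two query points: one penetrating the unit sphere, one outside it.
pts = np.array([[0.5, 0.0, 0.0],   # sdf = -0.5 (penetrating)
                [2.0, 0.0, 0.0]])  # sdf = +1.0 (collision-free)
p = collision_penalty(pts, sphere_sdf, center=np.zeros(3), radius=1.0)
print(p)  # 0.25
```

Because the penalty is differentiable in the query points, a term like this can be minimized during optimization to resolve interpenetrations.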

EgoM2P: Egocentric Multimodal Multitask Pretraining

Conference: International Conference on Computer Vision (ICCV 2025)

Authors: Gen Li, Yutong Chen*, Yiqian Wu*, Kaifeng Zhao*, Marc Pollefeys, Siyu Tang (*equal contribution)

EgoM2P is a large-scale egocentric multimodal and multitask model, pretrained on eight extensive egocentric datasets. It incorporates four modalities (RGB and depth video, gaze dynamics, and camera trajectories) to handle challenging tasks such as monocular egocentric depth estimation, camera tracking, gaze estimation, and conditional egocentric video synthesis.

Spline Deformation Field

Conference: SIGGRAPH 2025 Conference Track

Authors: Mingyang Song, Yang Zhang, Marko Mihajlovic, Siyu Tang, Markus Gross, Tunc Aydin

We combine splines, a classical tool from applied mathematics, with implicit Coordinate Neural Networks to model deformation fields, achieving strong performance across multiple datasets. The explicit regularization from spline interpolation enhances spatial coherency in challenging scenarios. We further introduce a metric based on Moran's I to quantitatively evaluate spatial coherence.
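The coherence metric mentioned above builds on Moran's I, a standard spatial autocorrelation statistic. Below is a minimal NumPy sketch of the textbook statistic, not the group's implementation: given observations and a spatial weight matrix, values near +1 indicate smooth, spatially coherent fields, while values near -1 indicate neighbors that systematically disagree.

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I spatial autocorrelation statistic.

    values:  (N,) array of observations.
    weights: (N, N) spatial weight matrix, w[i, j] > 0 when i and j
             are neighbors; the diagonal should be zero.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()               # deviations from the mean
    num = n * (z @ w @ z)          # N * sum_ij w_ij * z_i * z_j
    den = w.sum() * (z @ z)        # W * sum_i z_i^2
    return num / den

# Alternating values on a 4-node chain: every neighbor pair
# disagrees, so the statistic is maximally negative.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, -1.0, 1.0, -1.0])
print(morans_i(x, w))  # -1.0
```

How the statistic is adapted to score deformation-field coherence (choice of neighborhood graph, per-channel aggregation, etc.) is specific to the paper and not reproduced here.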

GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering

Conference: SIGGRAPH 2025 Conference Track

Authors: Zinuo You, Stamatios Georgoulis, Anpei Chen, Siyu Tang, Dengxin Dai

GaVS reformulates the video stabilization task with feed-forward 3DGS reconstruction, ensuring robustness to diverse motions, full-frame rendering, and high geometric consistency.

Text-based Animatable 3D Avatars with Morphable Model Alignment

Conference: SIGGRAPH 2025 Conference Track

Authors: Yiqian Wu, Malte Prinzler, Xiaogang Jin, Siyu Tang

AnimPortrait3D is a novel method for text-based, realistic, animatable 3DGS avatar generation with morphable model alignment.

RISE-SDF: a Relightable Information-Shared Signed Distance Field for Glossy Object Inverse Rendering

Conference: The 12th International Conference on 3D Vision (3DV 2025)

Authors: Deheng Zhang*, Jingyu Wang*, Shaofei Wang, Marko Mihajlovic, Sergey Prokudin, Hendrik P.A. Lensch, Siyu Tang (*equal contribution)

We present RISE-SDF, a method for reconstructing the geometry and material of glossy objects while achieving high-quality relighting.

Latest News

Here’s what we've been up to recently.