Welcome to the Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control

Conference: International Conference on Computer Vision (ICCV 2025), highlight

Authors: Yan Wu, Korrawe Karunratanakul, Zhengyi Luo, Siyu Tang

UniPhys is a diffusion-based unified planner and text-driven controller for physics-based character control. It generalizes across diverse tasks using a single model—from short-term reactive control tasks to long-term planning tasks, without requiring task-specific training.


DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction

Conference: International Conference on Computer Vision (ICCV 2025)

Authors: Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang

Using Gaussian-splatting-based self-supervised dynamic-static decomposition, DeGauss reconstructs state-of-the-art distractor-free static scenes from occluded inputs such as casually captured images and challenging egocentric videos, while simultaneously yielding a high-quality and efficient dynamic scene representation.

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions

Conference: International Conference on Computer Vision (ICCV 2025), highlight

Authors: Marko Mihajlovic, Siwei Zhang, Gen Li, Kaifeng Zhao, Lea Müller, Siyu Tang

VolumetricSMPL is a lightweight extension that adds volumetric capabilities to SMPL(-X) models for efficient 3D interactions and collision detection.
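The core idea behind volumetric collision handling can be illustrated with a signed distance field (SDF): points with negative signed distance lie inside the body and are counted as penetrations. The sketch below is a hedged toy example, not the VolumetricSMPL API; `body_sdf` stands in for a learned body SDF and is implemented here as a simple sphere so the code is self-contained.

```python
import numpy as np

def body_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0):
    """Toy signed distance field: a sphere stands in for a body model.
    Negative values mean the query point is inside the body."""
    return np.linalg.norm(points - np.asarray(center), axis=-1) - radius

def collision_penalty(scene_points, sdf=body_sdf):
    """Sum of squared penetration depths for points inside the body.
    Zero when no scene point penetrates."""
    d = sdf(np.asarray(scene_points, dtype=float))
    return float(np.sum(np.clip(-d, 0.0, None) ** 2))
```

A penalty of this form can be minimized during optimization to push interacting geometry out of the body; the actual model replaces the sphere with a neural volumetric representation of SMPL(-X).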

EgoM2P: Egocentric Multimodal Multitask Pretraining

Conference: International Conference on Computer Vision (ICCV 2025)

Authors: Gen Li, Yutong Chen*, Yiqian Wu*, Kaifeng Zhao*, Marc Pollefeys, Siyu Tang (*equal contribution)

EgoM2P is a large-scale egocentric multimodal and multitask model, pretrained on eight extensive egocentric datasets. It incorporates four modalities (RGB and depth video, gaze dynamics, and camera trajectories) to handle challenging tasks such as monocular egocentric depth estimation, camera tracking, gaze estimation, and conditional egocentric video synthesis.

Spline Deformation Field

Conference: SIGGRAPH 2025 Conference Track

Authors: Mingyang Song, Yang Zhang, Marko Mihajlovic, Siyu Tang, Markus Gross, Tunc Aydin

We combine splines, a classical tool from applied mathematics, with implicit Coordinate Neural Networks to model deformation fields, achieving strong performance across multiple datasets. The explicit regularization from spline interpolation enhances spatial coherency in challenging scenarios. We further introduce a metric based on Moran's I to quantitatively evaluate spatial coherence.
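Moran's I, mentioned above as the basis of the coherence metric, is a standard measure of spatial autocorrelation: values near +1 indicate smooth, spatially coherent fields, values near -1 indicate alternating ones. A minimal sketch of the classical statistic (not the paper's exact metric, whose neighborhood weighting is an assumption here):

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for a flat array of values and a pairwise
    neighbourhood weight matrix (weights[i, j] > 0 for neighbours).
    I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = x - mean(x)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return float(x.size * (z @ w @ z) / (w.sum() * (z @ z)))

# Chain of 4 samples with adjacent-neighbour weights.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
```

On this chain, a smooth ramp like `[1, 2, 3, 4]` yields a positive I, while an alternating signal like `[1, -1, 1, -1]` yields a negative one.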

GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering

Conference: SIGGRAPH 2025 Conference Track

Authors: Zinuo You, Stamatios Georgoulis, Anpei Chen, Siyu Tang, Dengxin Dai

GaVS reformulates video stabilization as feed-forward 3DGS reconstruction, ensuring robustness to diverse motions, full-frame rendering, and high geometric consistency.

Latest News

Here’s what we've been up to recently.