Welcome to the Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

GGPT: Geometry Grounded Point Transformer

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2026)

Authors: Yutong Chen, Yiming Wang, Xucong Zhang, Sergey Prokudin, Siyu Tang

GGPT uses reliable geometric guidance to augment various feed-forward methods for 3D reconstruction.

Masked Modeling for Human Motion Recovery Under Occlusions

Conference: International Conference on 3D Vision (3DV 2026)

Authors: Zhiyin Qian, Siwei Zhang, Bharat Lal Bhatnagar, Federica Bogo, Siyu Tang

Given a monocular video captured from a static camera, MoRo robustly reconstructs accurate and physically plausible human motion, even under challenging occlusion scenarios.

Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction

Conference: SIGGRAPH Asia 2025 Conference Track

Authors: Yiming Wang, Shaofei Wang, Marko Mihajlovic, Siyu Tang

Neural Texture Splatting is an expressive extension of 3D Gaussian Splatting that introduces a local neural RGBA field for each primitive.

Learning Efficient Fuse-and-Refine for Feed-Forward 3D Gaussian Splatting

Conference: NeurIPS 2025

Authors: Yiming Wang, Lucy Chai, Xuan Luo, Michael Niemeyer, Manuel Lagunas, Stephen Lombardi, Siyu Tang, Tiancheng Sun

SplatVoxel is a hybrid Splat-Voxel representation that fuses and refines Gaussian Splatting, improving static scene reconstruction and enabling history-aware streaming reconstruction in a zero-shot manner.

DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting

Conference: ICCV 2025 Findings Workshop

Authors: Zeren Jiang, Shaofei Wang, Siyu Tang

DNF-Avatar is a novel framework that distills knowledge from an implicit model into an explicit one for real-time rendering and relighting.

UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control

Conference: International Conference on Computer Vision (ICCV 2025), Highlight

Authors: Yan Wu, Korrawe Karunratanakul, Zhengyi Luo, Siyu Tang

UniPhys is a diffusion-based unified planner and text-driven controller for physics-based character control. It generalizes across diverse tasks with a single model, from short-term reactive control to long-term planning, without requiring task-specific training.
DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction

Conference: International Conference on Computer Vision (ICCV 2025)

Authors: Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang

Through Gaussian-splatting-based self-supervised dynamic-static decomposition, DeGauss reconstructs state-of-the-art distractor-free static scenes from occluded inputs such as casually captured images and challenging egocentric videos, while simultaneously yielding a high-quality and efficient dynamic scene representation.

Latest News

Here’s what we've been up to recently.