Welcome to the
Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

Conference: SIGGRAPH 2024 Journal Track

Authors: Yiqian Wu, Hao Xu, Xiangjun Tang, Xien Chen, Siyu Tang, Zhebin Zhang, Chen Li, Xiaogang Jin

Portrait3D is a neural rendering-based framework that uses a novel joint geometry-appearance prior to achieve high-quality text-to-3D-portrait generation.

Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Yan Zhang, Sergey Prokudin, Marko Mihajlovic, Qianli Ma, Siyu Tang

DOMA is an implicit motion field modeled by a spatiotemporal SIREN network. The learned motion field can predict how novel points move within the same field.
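To illustrate the core idea (this is a hedged sketch, not the authors' implementation), a spatiotemporal SIREN-style field maps a point and a time stamp to a displacement using sine activations; the layer widths, initialization, and the assumption of a 4D input (x, y, z, t) with a 3D displacement output are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def siren_layer(x, w, b, omega=30.0):
    # Sine activation with frequency scaling omega: the defining
    # ingredient of a SIREN network.
    return np.sin(omega * (x @ w + b))

# Tiny spatiotemporal field: input (x, y, z, t) -> 3D displacement.
# Weight scales loosely follow SIREN-style initialization; all sizes
# here are illustrative, not the paper's architecture.
w1 = rng.normal(0.0, 1.0 / 4, size=(4, 64))
w2 = rng.normal(0.0, np.sqrt(6.0 / 64) / 30.0, size=(64, 64))
w3 = rng.normal(0.0, np.sqrt(6.0 / 64) / 30.0, size=(64, 3))
b1, b2, b3 = np.zeros(64), np.zeros(64), np.zeros(3)

def motion_field(points_t):
    h = siren_layer(points_t, w1, b1)
    h = siren_layer(h, w2, b2)
    return h @ w3 + b3  # predicted displacement per (point, time)

# Query the field at 5 novel points, all at time t = 0.5.
pts = rng.normal(size=(5, 3))
t = np.full((5, 1), 0.5)
disp = motion_field(np.concatenate([pts, t], axis=1))
print(disp.shape)  # (5, 3)
```

Because the field is continuous in space and time, any point can be queried, which is what lets the learned field advect novel points.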

DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan, Thabo Beeler, Supasorn Suwajanakorn, Siyu Tang

Diffusion Noise Optimization (DNO) leverages existing human motion diffusion models as universal motion priors. We demonstrate its capability in motion editing tasks, where DNO preserves the content of the original motion and accommodates a diverse range of editing modes, including changing the trajectory, pose, and joint locations, and avoiding newly added obstacles.
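The underlying mechanism, optimizing the diffusion noise rather than the output, can be sketched with a toy stand-in (a hedged illustration, not the paper's method): here the frozen "denoiser" is just a fixed linear map, whereas real DNO backpropagates an editing loss through a pretrained motion diffusion model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a frozen denoiser: a fixed linear map from noise
# to a motion vector. (Hypothetical; DNO uses a pretrained motion
# diffusion model here.)
D = rng.normal(size=(8, 8)) * 0.3

def denoise(z):
    return D @ z

target = rng.normal(size=8)   # a desired edited motion (illustrative)
z = rng.normal(size=8)        # initial diffusion noise: the variable we optimize

err_start = np.linalg.norm(denoise(z) - target)
for _ in range(500):
    # Gradient of 0.5 * ||denoise(z) - target||^2 with respect to z;
    # the denoiser weights D stay frozen throughout.
    z -= 0.1 * (D.T @ (denoise(z) - target))
err_end = np.linalg.norm(denoise(z) - target)
```

Only the noise moves during optimization, so the output always stays on the (toy) model's output manifold; that is the sense in which the frozen model acts as a prior.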

RoHM: Robust Human Motion Reconstruction via Diffusion

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024), oral presentation

Authors: Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu, Alexander Winkler, Petr Kadlecek, Siyu Tang, Federica Bogo

Conditioned on noisy and occluded input data, RoHM reconstructs complete, plausible motions in consistent global coordinates.

EgoGen: An Egocentric Synthetic Data Generator

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024), oral presentation

Authors: Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang

EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.

Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Xiyi Chen, Marko Mihajlovic, Shaofei Wang, Sergey Prokudin, Siyu Tang

We introduce a morphable diffusion model that enables consistent, controllable novel view synthesis of humans from a single image. Given a single input image and a morphable mesh with a desired facial expression, our method directly generates 3D-consistent and photo-realistic images from novel viewpoints, which can then be used to reconstruct a coarse 3D model with off-the-shelf neural surface reconstruction methods such as NeuS2.

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang

Given a monocular video, 3DGS-Avatar learns clothed human avatars that model pose-dependent appearance and generalize to out-of-distribution poses, with short training times and interactive rendering frame rates.

Latest News

Here’s what we've been up to recently.