Welcome to the Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

An in-depth look at our work.

RoHM: Robust Human Motion Reconstruction via Diffusion

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu, Alexander Winkler, Petr Kadlecek, Siyu Tang, Federica Bogo

Conditioned on noisy and occluded input data, RoHM reconstructs complete, plausible motions in consistent global coordinates.

EgoGen: An Egocentric Synthetic Data Generator

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang

EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.

Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Xiyi Chen, Marko Mihajlovic, Shaofei Wang, Sergey Prokudin, Siyu Tang

We introduce a morphable diffusion model that enables consistent, controllable novel view synthesis of humans from a single image. Given a single input image and a morphable mesh with a desired facial expression, our method directly generates 3D-consistent, photo-realistic images from novel viewpoints, which can then be used to reconstruct a coarse 3D model with off-the-shelf neural surface reconstruction methods such as NeuS2.

3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang

Given a monocular video, 3DGS-Avatar learns clothed human avatars that model pose-dependent appearance and generalize to out-of-distribution poses, with short training times and interactive rendering frame rates.

ResFields: Residual Neural Fields for Spatiotemporal Signals

Conference: International Conference on Learning Representations (ICLR 2024), spotlight presentation

Authors: Marko Mihajlovic, Sergey Prokudin, Marc Pollefeys, Siyu Tang

ResField layers incorporate time-dependent weights into MLPs to effectively represent complex temporal signals.
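The core idea can be sketched in a few lines: the effective weight matrix of a layer at time t is a shared base matrix plus a time-dependent residual. The class name, the rank-R factorization of the residual, and all parameter names below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

class ResFieldLinear:
    """Hypothetical sketch of a linear layer with time-dependent residual weights."""

    def __init__(self, in_dim, out_dim, num_frames, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Static weights shared across all time steps.
        self.base = rng.standard_normal((out_dim, in_dim)) * 0.1
        # Low-rank residual (an assumed factorization): per-frame
        # coefficients combined with a shared set of weight matrices.
        self.coeffs = rng.standard_normal((num_frames, rank)) * 0.01
        self.basis = rng.standard_normal((rank, out_dim, in_dim)) * 0.01

    def weight_at(self, t):
        # W(t) = W_base + sum_r coeffs[t, r] * basis[r]
        return self.base + np.tensordot(self.coeffs[t], self.basis, axes=1)

    def __call__(self, x, t):
        # Apply the time-conditioned weights to the input.
        return self.weight_at(t) @ x

layer = ResFieldLinear(in_dim=3, out_dim=8, num_frames=10, rank=4)
x = np.ones(3)
y0 = layer(x, t=0)  # output under the weights of frame 0
y5 = layer(x, t=5)  # same input, different frame: different weights
```

Because only the small residual varies over time, the layer can fit a temporal signal without duplicating a full MLP per frame.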

Latest News

Here’s what we've been up to recently.