Welcome to the Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance the algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

Human3D: 3D Segmentation of Humans in Point Clouds with Synthetic Data

Conference: International Conference on Computer Vision (ICCV 2023)

Authors: Ayça Takmaz*, Jonas Schult*, Irem Kaftan, Mertcan Akçay, Bastian Leibe, Robert Sumner, Francis Engelmann, Siyu Tang

We propose Human3D 🧑‍🤝‍🧑, the first multi-human body-part segmentation model that operates directly on 3D scenes. In an extensive analysis, we validate the benefits of training on synthetic data across multiple baselines and tasks.

DIMOS: Synthesizing Diverse Human Motions in 3D Indoor Scenes

Conference: International Conference on Computer Vision (ICCV 2023)

Authors: Kaifeng Zhao, Yan Zhang, Shaofei Wang, Thabo Beeler, Siyu Tang

Interacting with the environment is a core ability of virtual humans, and it remains a challenging problem. We propose a method capable of generating a sequence of natural interaction events in real cluttered scenes.

GMD: Controllable Human Motion Synthesis via Guided Diffusion Models

Conference: International Conference on Computer Vision (ICCV 2023)

Authors: Korrawe Karunratanakul, Konpat Preechakul, Supasorn Suwajanakorn, Siyu Tang

The Guided Motion Diffusion (GMD) model can synthesize realistic human motion according to a text prompt, a reference trajectory, and key locations, while also avoiding stubbing your toe on giant X-mark circles that someone dropped on the floor. No need to retrain the diffusion model for each of these tasks!

EgoHMR: Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views

Conference: International Conference on Computer Vision (ICCV 2023), oral presentation

Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Sadegh Aliakbarian, Darren Cosker, Siyu Tang

We propose a novel scene-conditioned probabilistic method to recover the human mesh in a 3D environment from an egocentric-view image, in which the body is typically truncated.

Dynamic Point Fields: Towards Efficient and Scalable Dynamic Surface Representations

Conference: International Conference on Computer Vision (ICCV 2023), oral presentation

Authors: Sergey Prokudin, Qianli Ma, Maxime Raafat, Julien Valentin, Siyu Tang

We propose to model dynamic surfaces with a point-based model, where the motion of a point over time is represented by an implicit deformation field. Working directly with points (rather than SDFs) allows us to easily incorporate various well-known deformation constraints, e.g. as-isometric-as-possible. We showcase the usefulness of this approach for creating animatable avatars in complex clothing.

Dictionary Fields: Learning a Neural Basis Decomposition

Journal: SIGGRAPH 2023 (Journal Track)

Authors: Anpei Chen, Zexiang Xu, Xinyue Wei, Siyu Tang, Hao Su, Andreas Geiger

We present Dictionary Fields, a novel neural representation which decomposes a signal into a product of factors, each represented by a classical or neural field representation, operating on transformed input coordinates.

Interactive Object Segmentation in 3D Point Clouds

Conference: International Conference on Robotics and Automation (ICRA 2023), Best Paper Nominee

Authors: Theodora Kontogianni, Ekin Celikkan, Siyu Tang, Konrad Schindler

We present interactive object segmentation directly in 3D point clouds. Users provide feedback to a deep learning model in the form of positive and negative clicks to segment a 3D object of interest.

Latest News

Here's what we've been up to recently.