Welcome to the Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance the algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

Dictionary Fields: Learning a Neural Basis Decomposition

Journal: SIGGRAPH 2023 Journal Track

Authors: Anpei Chen, Zexiang Xu, Xinyue Wei, Siyu Tang, Hao Su, Andreas Geiger

We present Dictionary Fields, a novel neural representation which decomposes a signal into a product of factors, each represented by a classical or neural field representation, operating on transformed input coordinates.

Interactive Object Segmentation in 3D Point Clouds

Conference: International Conference on Robotics and Automation (ICRA 2023) Best Paper Nominee

Authors: Theodora Kontogianni, Ekin Celikkan, Siyu Tang, Konrad Schindler

We present interactive object segmentation directly in 3D point clouds. Users provide feedback to a deep learning model in the form of positive and negative clicks to segment a 3D object of interest.

HARP: Personalized Hand Reconstruction from a Monocular RGB Video

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2023)

Authors: Korrawe Karunratanakul, Sergey Prokudin, Otmar Hilliges, Siyu Tang

We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar exhibiting a high-fidelity appearance and geometry.

Mask3D: Mask Transformer for 3D Instance Segmentation

Conference: International Conference on Robotics and Automation (ICRA 2023)

Authors: Jonas Schult, Francis Engelmann, Alexander Hermans, Or Litany, Siyu Tang, and Bastian Leibe

Mask3D predicts accurate 3D semantic instances, achieving state-of-the-art results on ScanNet, ScanNet200, S3DIS, and STPLS3D.

Neural Point-based Shape Modeling of Humans in Challenging Clothing

Conference: International Conference on 3D Vision (3DV 2022)

Authors: Qianli Ma, Jinlong Yang, Michael J. Black, and Siyu Tang

SkiRT further unleashes the power of point-based digital human representations: it models the dynamic shapes of 3D clothed humans, including those wearing challenging outfits such as skirts and dresses.

EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices

Conference: European Conference on Computer Vision (ECCV 2022)

Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo, and Siyu Tang

A large-scale dataset of accurate 3D human body shape, pose and motion of humans interacting in 3D scenes, with multi-modal streams from third-person and egocentric views, captured by Azure Kinects and a HoloLens2.

Latest News

Here’s what we've been up to recently.