VLG | Computer Vision and Learning Group

Basic Information

I work on human-centered AI, in particular on perceiving and synthesizing human behavior in 3D scenes. This area lies at the intersection of computer vision, machine learning, computer graphics, robotics, and cognitive science. The goal is to capture high-quality human motions and infer behavioral intentions from various modalities, learn generative models of human behavior, and synthesize that behavior in novel environments. The core scientific challenge is to understand why and how our bodies move. Downstream applications include digital scene population, synthetic data creation, computer-aided design, games/VFX, smart homes, healthcare, and beyond. I actively explore and implement solutions to real-world problems in order to make our lives better.

I am currently a postdoctoral researcher at the Computer Vision and Learning Group (VLG), ETH Zurich, working with Prof. Siyu Tang. Before that, I was a research intern at the Perceiving Systems Department, MPI Tuebingen, working with Prof. Michael J. Black. I received my PhD from Ulm University magna cum laude, supervised by Prof. Heiko Neumann.

Publications

DIMOS: Synthesizing Diverse Human Motions in 3D Indoor Scenes

Conference: International Conference on Computer Vision (ICCV 2023)

Authors: Kaifeng Zhao, Yan Zhang, Shaofei Wang, Thabo Beeler, Siyu Tang
Interaction with environments is one core ability of virtual humans and remains a challenging problem. We propose a method capable of generating a sequence of natural interaction events in real cluttered scenes.

Project PDF BibTeX

EgoHMR: Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views

Conference: International Conference on Computer Vision (ICCV 2023) oral presentation

Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Sadegh Aliakbarian, Darren Cosker, Siyu Tang
We propose a novel scene-conditioned probabilistic method to recover the human mesh from an egocentric view image (typically with the body truncated) in the 3D environment.

Project PDF BibTeX

EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices

Conference: European Conference on Computer Vision (ECCV 2022)

Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo and Siyu Tang
A large-scale dataset of accurate 3D human body shape, pose and motion of humans interacting in 3D scenes, with multi-modal streams from third-person and egocentric views, captured by Azure Kinects and a HoloLens2.

Project Code Dataset Challenge BibTeX

COINS: Compositional Human-Scene Interaction Synthesis with Semantic Control

Conference: European Conference on Computer Vision (ECCV 2022)

Authors: Kaifeng Zhao, Shaofei Wang, Yan Zhang, Thabo Beeler, Siyu Tang
Synthesizing natural interactions between virtual humans and their 3D environments is critical for numerous applications, such as computer games and AR/VR experiences. We propose COINS, for COmpositional INteraction Synthesis with Semantic Control.

Project PDF Code BibTeX

SAGA: Stochastic Whole-Body Grasping with Contact

Conference: European Conference on Computer Vision (ECCV 2022)

Authors: Yan Wu*, Jiahao Wang*, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu and Siyu Tang
(* denotes equal contribution)
Our goal is to synthesize whole-body grasping motion. Given a 3D object, we aim to generate diverse and natural whole-body human motions that approach and grasp the object.

Project PDF Code BibTeX

The Wanderings of Odysseus in 3D Scenes

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2022)

Authors: Yan Zhang and Siyu Tang
We propose GAMMA, an automatic and scalable solution to populate a 3D scene with diverse digital humans. The digital humans have 1) varied body shapes, 2) realistic and perpetual motions to reach goals, and 3) plausible body-ground contact.

Project PDF Code BibTeX

Learning Motion Priors for 4D Human Body Capture in 3D Scenes

Conference: International Conference on Computer Vision (ICCV 2021) oral presentation

Authors: Siwei Zhang, Yan Zhang, Federica Bogo, Marc Pollefeys and Siyu Tang
LEMO learns motion priors from a large-scale mocap dataset and proposes a multi-stage optimization pipeline to enable 3D motion reconstruction in complex 3D scenes.

Project PDF Code BibTeX

We are More than Our Joints: Predicting how 3D Bodies Move

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2021)

Authors: Yan Zhang, Michael J. Black and Siyu Tang
"We are more than our joints", or MOJO for short, is a solution to stochastic motion prediction of expressive 3D bodies. Given a short motion from the past, MOJO generates diverse plausible motions in the near future.

Project PDF Code BibTeX

LEAP: Learning Articulated Occupancy of People

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2021)

Authors: Marko Mihajlovic, Yan Zhang, Michael J. Black and Siyu Tang
LEAP is a neural network architecture for representing volumetric animatable human bodies. It follows traditional human body modeling techniques and leverages a statistical human prior to generalize to unseen humans.

Project PDF Code BibTeX

Grasping Field: Learning Implicit Representations for Human Grasps

Conference: International Virtual Conference on 3D Vision (3DV) 2020 oral presentation & best paper

Authors: Korrawe Karunratanakul, Jinlong Yang, Yan Zhang, Michael Black, Krikamol Muandet, Siyu Tang
Capturing and synthesizing hand-object interaction is essential for understanding human behaviours, and is key to a number of applications including VR/AR, robotics and human-computer interaction.

PDF Code BibTeX

PLACE: Proximity Learning of Articulation and Contact in 3D Environments

Conference: International Virtual Conference on 3D Vision (3DV) 2020

Authors: Siwei Zhang, Yan Zhang, Qianli Ma, Michael J. Black, Siyu Tang
Automated synthesis of realistic humans posed naturally in a 3D scene is essential for many applications. In this paper we propose explicit representations for the 3D scene and the person-scene contact relation in a coherent manner.

Project PDF Code BibTeX

Perpetual Motion: Generating Unbounded Human Motion

Arxiv: arXiv preprint arXiv:2007.13886

Authors: Yan Zhang, Michael J. Black, Siyu Tang
In this work, our goal is to generate significantly longer, or “perpetual”, motion: given a short motion sequence or even a static body pose, the goal is to generate non-deterministic ever-changing human motions in the future.

PDF

Generating 3D People in Scenes without People

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2020) oral presentation

Authors: Yan Zhang, Mohamed Hassan, Heiko Neumann, Michael J. Black, Siyu Tang
We present a fully-automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene.

PDF Code BibTeX
