Dr. Yan Zhang

CNB G 103.1

Basic Information

I am dedicated to human-centered AI, in particular human behavior perception and synthesis in 3D scenes. This area lies at the intersection of computer vision, machine learning, computer graphics, robotics, and cognitive science. The goal is to capture high-quality human motions and infer behavioral intentions from various modalities, learn generative models of human behavior, and synthesize behavior in novel environments. The core scientific challenge is to understand why and how our bodies move. Downstream applications include digital scene population, synthetic data creation, computer-aided design, games/VFX, smart homes, healthcare, and beyond. I am actively exploring and implementing solutions to real-world problems, in order to make our lives better.

I am currently a postdoctoral researcher at the Computer Vision and Learning Group (VLG), ETH Zurich, working with Prof. Siyu Tang. Before that, I was a research intern at the Perceiving Systems Department, MPI Tuebingen, working with Prof. Michael J. Black. I received my PhD degree from Ulm University with magna cum laude, supervised by Prof. Heiko Neumann.

Publications

Authors: Yan Zhang and Siyu Tang

We propose GAMMA, an automatic and scalable solution to populate 3D scenes with diverse digital humans. The digital humans have (1) varied body shapes, (2) realistic and perpetual motions to reach goals, and (3) plausible body-ground contact.

Authors: Miao Liu, Dexin Yang, Yan Zhang, Zhaopeng Cui, James M. Rehg and Siyu Tang

We seek to reconstruct 4D second-person human body meshes that are grounded in the 3D scene captured from an egocentric view. Our method exploits 2D observations from the entire video sequence and the 3D scene context to optimize human body models over time, leading to more accurate human motion capture and more realistic human-scene interaction.

Authors: Qianli Ma, Jinlong Yang, Siyu Tang and Michael J. Black

We introduce POP — a point-based, unified model for multiple subjects and outfits that can turn a single, static 3D scan into an animatable avatar with natural pose-dependent clothing deformations.

Authors: Siwei Zhang, Yan Zhang, Federica Bogo, Marc Pollefeys and Siyu Tang

LEMO learns motion priors from a large-scale mocap dataset and proposes a multi-stage optimization pipeline to enable 3D motion reconstruction in complex 3D scenes.

Authors: Yan Zhang, Michael J. Black and Siyu Tang

"We are more than our joints", or MOJO for short, is a solution to stochastic motion prediction of expressive 3D bodies. Given a short motion from the past, MOJO generates diverse plausible motions in the near future.

Authors: Marko Mihajlovic, Yan Zhang, Michael J. Black and Siyu Tang

LEAP is a neural network architecture for representing volumetric animatable human bodies. It follows traditional human body modeling techniques and leverages a statistical human prior to generalize to unseen humans.

Authors: Korrawe Karunratanakul, Jinlong Yang, Yan Zhang, Michael J. Black, Krikamol Muandet and Siyu Tang

Capturing and synthesizing hand-object interaction is essential for understanding human behaviors, and is key to a number of applications including VR/AR, robotics and human-computer interaction.

Authors: Siwei Zhang, Yan Zhang, Qianli Ma, Michael J. Black and Siyu Tang

Automated synthesis of realistic humans posed naturally in a 3D scene is essential for many applications. In this paper we propose explicit representations for the 3D scene and the person-scene contact relation in a coherent manner.

Authors: Yan Zhang, Michael J. Black and Siyu Tang

In this work, our goal is to generate significantly longer, or "perpetual", motion: given a short motion sequence or even a static body pose, the goal is to generate non-deterministic, ever-changing human motions in the future.

Authors: Yan Zhang, Mohamed Hassan, Heiko Neumann, Michael J. Black and Siyu Tang

We present a fully automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene.