VLG | Computer Vision and Learning Group

A key step towards understanding human behavior is predicting 3D human motion. Our group has made several contributions in this direction. First, we focused on learning robust and efficient marker-based representations of 3D human bodies in motion, and proposed new generative motion models and optimization algorithms to synthesize realistic human motion sequences. Second, we developed an efficient and fully automated system to generate long-term, even infinite, motion for various human shapes. Specifically, given a 3D scene, e.g., digital architecture, our model can generate a massive number of virtual humans who possess diverse body shapes, move perpetually, and have plausible body-scene contact, in an automatic, efficient, scalable, and controllable manner. Beyond top-tier scientific publications, our motion synthesis method is the key component of an exhibition on inhabiting a virtual city, hosted by the Guggenheim Museum Bilbao.

Authors:

Prof. Dr. Siyu Tang
Assistant Professor of Computer Science, CNB G 104

Dr. Yan Zhang
Meshcapade

Siwei Zhang
PostDoc CAB G 89

Korrawe Karunratanakul
PostDoc CAB G 89

Dr. Qianli Ma
Nvidia

Kaifeng Zhao
PhD student CAB G 85.1

Publications

COINS: Compositional Human-Scene Interaction Synthesis with Semantic Control

Conference: European Conference on Computer Vision (ECCV 2022)

Authors: Kaifeng Zhao, Shaofei Wang, Yan Zhang, Thabo Beeler, Siyu Tang
Synthesizing natural interactions between virtual humans and their 3D environments is critical for numerous applications, such as computer games and AR/VR experiences. We propose COINS, for COmpositional INteraction Synthesis with Semantic Control.

Project PDF Code BibTeX

SAGA: Stochastic Whole-Body Grasping with Contact

Conference: European Conference on Computer Vision (ECCV 2022)

Authors: Yan Wu*, Jiahao Wang*, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu and Siyu Tang
(* denotes equal contribution)
Our goal is to synthesize whole-body grasping motion. Given a 3D object, we aim to generate diverse and natural whole-body human motions that approach and grasp the object.

Project PDF Code BibTeX

The Wanderings of Odysseus in 3D Scenes

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2022)

Authors: Yan Zhang and Siyu Tang
We propose GAMMA, an automatic and scalable solution, to populate the 3D scene with diverse digital humans. The digital humans have 1) varied body shapes, 2) realistic and perpetual motions to reach goals, and 3) plausible body-ground contact.

Project PDF Code BibTeX

HALO: A Skeleton-Driven Neural Occupancy Representation for Articulated Hands

Conference: International Virtual Conference on 3D Vision (3DV 2021) oral presentation

Authors: Korrawe Karunratanakul, Adrian Spurr, Zicong Fan, Otmar Hilliges, Siyu Tang
We present HALO, a neural occupancy representation for articulated hands that produces implicit hand surfaces from input skeletons in a differentiable manner.

Project PDF Code BibTeX

We are More than Our Joints: Predicting how 3D Bodies Move

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2021)

Authors: Yan Zhang, Michael J. Black and Siyu Tang
"We are more than our joints", or MOJO for short, is a solution to stochastic motion prediction of expressive 3D bodies. Given a short motion from the past, MOJO generates diverse plausible motions in the near future.

Project PDF Code BibTeX

Grasping Field: Learning Implicit Representations for Human Grasps

Conference: International Virtual Conference on 3D Vision (3DV) 2020 oral presentation & best paper

Authors: Korrawe Karunratanakul, Jinlong Yang, Yan Zhang, Michael Black, Krikamol Muandet, Siyu Tang
Capturing and synthesizing hand-object interaction is essential for understanding human behaviours, and is key to a number of applications including VR/AR, robotics and human-computer interaction.

PDF Code BibTeX

PLACE: Proximity Learning of Articulation and Contact in 3D Environments

Conference: International Virtual Conference on 3D Vision (3DV) 2020

Authors: Siwei Zhang, Yan Zhang, Qianli Ma, Michael J. Black, Siyu Tang
Automated synthesis of realistic humans posed naturally in a 3D scene is essential for many applications. In this paper we propose explicit representations for the 3D scene and the person-scene contact relation in a coherent manner.

Project PDF Code BibTeX

Generating 3D People in Scenes without People

Conference: Computer Vision and Pattern Recognition (CVPR) 2020 oral presentation

Authors: Yan Zhang, Mohamed Hassan, Heiko Neumann, Michael J. Black, Siyu Tang
We present a fully-automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene.

PDF Code BibTeX
