Siwei Zhang

PhD student

Basic Information

I am a PhD student in the Computer Vision and Learning Group (VLG) at ETH Zürich, supervised by Professor Siyu Tang. Prior to this, I obtained my Master's degree in Electrical Engineering and Information Technology from ETH Zürich (2020) and my Bachelor's degree in Automation from Tsinghua University (2017).

My research focuses on human-scene interaction learning, human motion modelling and egocentric human understanding, particularly in 3D scenes.


  • Microsoft Swiss Joint Research Center 2022  

    Egocentric Interaction Capture for Mixed Reality.

  • Microsoft Swiss Joint Research Center 2021  

    Learning Motion Priors for 4D Human Body Capture in 3D Scenes.



Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo and Siyu Tang

EgoBody is a large-scale egocentric dataset for human 3D motion and social interactions in 3D scenes. We employ Microsoft HoloLens2 headsets to record rich egocentric data streams (including RGB, depth, eye gaze, head and hand tracking). To obtain accurate 3D ground truth, we calibrate the headset with a multi-Kinect rig and fit expressive SMPL-X body meshes to multi-view RGB-D frames, reconstructing 3D human poses and shapes relative to the scene.

Authors: Yan Wu, Jiahao Wang, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu and Siyu Tang

Our goal is to synthesize whole-body grasping motion. Given a 3D object, we aim to generate diverse and natural whole-body human motions that approach and grasp the object.

Authors: Siwei Zhang, Yan Zhang, Federica Bogo, Marc Pollefeys and Siyu Tang

LEMO learns motion priors from a large-scale mocap dataset and proposes a multi-stage optimization pipeline to enable 3D motion reconstruction in complex 3D scenes.

Authors: Siwei Zhang, Yan Zhang, Qianli Ma, Michael J. Black, Siyu Tang

Automated synthesis of realistic humans posed naturally in a 3D scene is essential for many applications. In this paper we propose explicit representations for the 3D scene and the person-scene contact relation in a coherent manner.

Authors: Siwei Zhang, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool

To reduce human labelling effort on multi-task labels, we introduce a new problem of facial emotion recognition with noisy multi-task annotations.

Authors: Yan Wu, Aoming Liu, Zhiwu Huang, Siwei Zhang, Luc Van Gool

This paper aims to extend the problem of Neural Architecture Search (NAS) from Single-Path and Multi-Path Search to automated Mixed-Path Search.

Authors: Yunxuan Zhang, Siwei Zhang, Yue He, Cheng Li, Chen Change Loy, Ziwei Liu

To enable realistic shape (e.g. pose and expression) transfer, we propose a novel one-shot face reenactment learning system.