Siwei Zhang

PhD student

Basic Information

I am a PhD student at the Computer Vision and Learning Group (VLG), ETH Zürich, supervised by Professor Siyu Tang. Prior to this, I obtained my Master's degree in Electrical Engineering and Information Technology from ETH Zürich (2020) and my Bachelor's degree in Automation from Tsinghua University (2017).

My research focuses on human-scene interaction learning, human motion modelling and egocentric human understanding, particularly in 3D scenes.


  • 2nd International Ego4D Workshop @ ECCV 2022  

    EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices.

  • Microsoft Swiss Joint Research Center 2022  

    Egocentric Interaction Capture for Mixed Reality.

  • Microsoft Swiss Joint Research Center 2021  

    Learning Motion Priors for 4D Human Body Capture in 3D Scenes.



Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo and Siyu Tang

A large-scale dataset of accurate 3D human body shape, pose and motion of humans interacting in 3D scenes, with multi-modal streams from third-person and egocentric views, captured by Azure Kinects and a HoloLens2.

Authors: Yan Wu*, Jiahao Wang*, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu and Siyu Tang
(* denotes equal contribution)

Our goal is to synthesize whole-body grasping motion. Given a 3D object, we aim to generate diverse and natural whole-body human motions that approach and grasp the object.

Authors: Siwei Zhang, Yan Zhang, Federica Bogo, Marc Pollefeys and Siyu Tang

LEMO learns motion priors from a large-scale mocap dataset and proposes a multi-stage optimization pipeline to enable 3D motion reconstruction in complex 3D scenes.

Authors: Siwei Zhang, Zhiwu Huang, Danda Pani Paudel and Luc Van Gool

To reduce human labelling effort on multi-task labels, we introduce a new problem of facial emotion recognition with noisy multi-task annotations.

Authors: Yan Wu*, Aoming Liu*, Zhiwu Huang, Siwei Zhang and Luc Van Gool

We model the NAS problem as a sparse supernet using a new continuous architecture representation with a mixture of sparsity constraints.

Authors: Siwei Zhang, Yan Zhang, Qianli Ma, Michael J. Black and Siyu Tang

Automated synthesis of realistic humans posed naturally in a 3D scene is essential for many applications. In this paper, we propose explicit representations for the 3D scene and the person-scene contact relation in a coherent manner.

Authors: Yunxuan Zhang, Siwei Zhang, Yue He, Cheng Li, Chen Change Loy and Ziwei Liu

We propose a novel one-shot face reenactment learning system that disentangles and composes appearance and shape information for effective modeling.