With the continuous advancement of mobile processing and the increasing availability of extended reality (XR) headsets, XR will soon become available to the masses. We envision a future in which humans, objects, and scenes from the physical and digital worlds are merged. In this embodied and immersive future, the ability to understand humans is essential for XR headsets. However, current work in the computer vision literature mainly focuses on third-person views. Egocentric perception is a fundamentally new and challenging arena, where data and models learned in the third-person setting do not generalize to the unprecedented scale and diversity of the egocentric setting. Our group has focused on bridging this gap. To understand human-human interactions from egocentric views, we have proposed novel datasets (EgoBody) and methods (4DCapture) to reconstruct human motion. Furthermore, together with Microsoft and the ETH AI Center, we will host the first egocentric human pose and shape estimation challenge in conjunction with ECCV 2022. By sharing our data and benchmark with the research community, we see our work as a major milestone towards unlocking 3D human body estimation in the egocentric setting.

Publications


Authors: Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo and Siyu Tang

A large-scale dataset of accurate 3D human body shape, pose and motion of humans interacting in 3D scenes, with multi-modal streams from third-person and egocentric views, captured by Azure Kinect cameras and a HoloLens 2.
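To make the multi-modal nature of such a capture concrete, the sketch below models one synchronized frame as a plain data record. The field names, image resolutions and parameter sizes are our own placeholders for illustration, not the actual EgoBody release format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SyncedFrame:
    """One synchronized capture instant (fields and shapes are illustrative only)."""
    kinect_rgb: list          # third-person color images, one np.ndarray per Kinect
    kinect_depth: list        # corresponding depth maps
    holo_rgb: np.ndarray      # egocentric color image from the headset
    holo_head_pose: np.ndarray  # 4x4 wearer head pose in world coordinates
    body_pose: np.ndarray     # parametric body pose of the interaction partner
    body_shape: np.ndarray    # parametric body shape coefficients
    timestamp: float          # shared clock, in seconds

frame = SyncedFrame(
    kinect_rgb=[np.zeros((1080, 1920, 3), np.uint8)],
    kinect_depth=[np.zeros((576, 640), np.uint16)],
    holo_rgb=np.zeros((720, 1280, 3), np.uint8),
    holo_head_pose=np.eye(4),
    body_pose=np.zeros(63),
    body_shape=np.zeros(10),
    timestamp=0.0,
)
```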

Authors: Miao Liu, Dexin Yang, Yan Zhang, Zhaopeng Cui, James M. Rehg and Siyu Tang

We seek to reconstruct 4D second-person human body meshes that are grounded in the 3D scene captured from an egocentric view. Our method exploits 2D observations from the entire video sequence together with the 3D scene context to optimize human body models over time, leading to more accurate human motion capture and more realistic human-scene interaction.
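As a rough illustration of sequence-level fitting of this kind, the following sketch optimizes per-frame body pose parameters against 2D keypoints while adding a temporal smoothness prior and a simple floor-penetration penalty. The body model, projection, synthetic data and energy weights are toy stand-ins, not the actual 4DCapture formulation.

```python
import torch

T, J = 30, 24                                     # hypothetical sequence length, joint count
poses = torch.zeros(T, J, 3, requires_grad=True)  # per-frame pose parameters to optimize

def joints_3d(poses):
    # stand-in for a differentiable body model mapping pose parameters to 3D joints
    return poses.cumsum(dim=1)

def project(j3d):
    # stand-in pinhole projection onto the egocentric image plane
    return j3d[..., :2] / (j3d[..., 2:3].abs() + 1.0)

kp2d = torch.rand(T, J, 2)   # detected 2D keypoints for the whole sequence (placeholder data)
floor_height = 0.0           # simplistic scene constraint: keep joints above the floor

opt = torch.optim.Adam([poses], lr=0.05)
for _ in range(200):
    j3d = joints_3d(poses)
    data_term = ((project(j3d) - kp2d) ** 2).mean()            # 2D reprojection error
    smooth_term = ((poses[1:] - poses[:-1]) ** 2).mean()       # temporal smoothness prior
    scene_term = torch.relu(floor_height - j3d[..., 1]).mean() # penalize floor penetration
    loss = data_term + 0.1 * smooth_term + 0.1 * scene_term
    opt.zero_grad()
    loss.backward()
    opt.step()
```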

Authors: Miao Liu, Siyu Tang, Yin Li, and James M. Rehg

We address the challenging task of anticipating human-object interaction in first-person videos. We adopt intentional hand movement as a future representation and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action.
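A minimal sketch of such joint prediction, assuming pre-extracted clip features: one shared trunk feeds three branches that output future hand trajectories, an interaction-hotspot map, and the next action class. The layer sizes, feature dimensions and head designs are illustrative, not the network from the paper.

```python
import torch
import torch.nn as nn

class JointAnticipation(nn.Module):
    """Toy multi-task model: shared features drive hand-motion, hotspot and action heads."""
    def __init__(self, in_dim=2048, feat_dim=512, horizon=10, num_actions=50):
        super().__init__()
        self.horizon = horizon
        self.trunk = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.hand_head = nn.Linear(feat_dim, horizon * 2 * 2)   # (x, y) per hand per future step
        self.hotspot_head = nn.Linear(feat_dim, 28 * 28)        # coarse spatial hotspot map
        self.action_head = nn.Linear(feat_dim, num_actions)     # future action logits

    def forward(self, clip_feat):
        f = self.trunk(clip_feat)
        hands = self.hand_head(f).view(-1, self.horizon, 2, 2)
        hotspots = self.hotspot_head(f).softmax(dim=-1).view(-1, 28, 28)
        action_logits = self.action_head(f)
        return hands, hotspots, action_logits

model = JointAnticipation()
hands, hotspots, action = model(torch.randn(4, 2048))  # a batch of 4 clip features
```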