VLG | Computer Vision and Learning Group

Authors: Alexey Nekrasov, Jonas Schult, Or Litany, Bastian Leibe, Francis Engelmann

Abstract

Mix3D is a data augmentation technique for segmenting large-scale 3D scenes. Since scene context helps reasoning about object semantics, current works focus on models with large capacity and receptive fields that can fully capture the global context of an input 3D scene. However, strong contextual priors can have detrimental effects, such as mistaking a pedestrian crossing the street for a car. In this work, we focus on the importance of balancing global scene context and local geometry, with the goal of generalizing beyond the contextual priors in the training set. In particular, we propose a "mixing" technique which creates new training samples by combining two augmented scenes. By doing so, object instances are implicitly placed into novel out-of-context environments, which makes it harder for models to rely on scene context alone and encourages them to infer semantics from local structure as well.
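The mixing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`center`, `rotate_z`, `augment`, `mix_scenes`), the choice of centering plus a random rotation about the vertical axis as the per-scene augmentation, and the toy scenes are all assumptions made for illustration. The sketch only shows the core idea of taking two augmented scenes and merging their points and labels into one training sample.

```python
import math
import random


def center(points):
    """Translate points so that their centroid sits at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def rotate_z(points, angle):
    """Rotate points about the vertical (z) axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def augment(points, rng):
    """Hypothetical per-scene augmentation: center, then rotate randomly."""
    return rotate_z(center(points), rng.uniform(0.0, 2.0 * math.pi))


def mix_scenes(points_a, labels_a, points_b, labels_b, rng=None):
    """Combine two augmented scenes into a single training sample.

    Each scene is augmented independently, then the point sets and their
    per-point labels are concatenated, so the two scenes overlap in space.
    """
    rng = rng or random.Random()
    mixed_points = augment(points_a, rng) + augment(points_b, rng)
    mixed_labels = list(labels_a) + list(labels_b)
    return mixed_points, mixed_labels


# Toy example: an indoor fragment mixed with an outdoor fragment.
room = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 1.0)]
street = [(10.0, 5.0, 0.0), (12.0, 5.0, 0.0)]
points, labels = mix_scenes(
    room, ["floor", "floor", "chair"],
    street, ["road", "car"],
    rng=random.Random(0),
)
```

Because both scenes are centered before being merged, their points overlap in space, so in this toy sample the chair ends up next to road and car points rather than in its usual indoor surroundings. Each point keeps its original label, so the supervision signal is unchanged while the context around each object is new.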

Authors:

Dr. Francis Engelmann
PostDoc at Stanford University

Mix3D: Out-of-Context Data Augmentation for 3D Scenes

Conference: International Conference on 3D Vision (3DV 2021), oral presentation
