Abstract
Labeling large-scale datasets with very accurate object segmentations is an elaborate task that requires a high degree of quality control and a budget of tens or hundreds of thousands of dollars. Thus, developing solutions that can automatically perform the labeling given only weak supervision is key to reducing this cost. In this paper, we show how to exploit 3D information to automatically generate very accurate object segmentations given annotated 3D bounding boxes. We formulate the problem as one of inference in a binary Markov random field which exploits appearance models, stereo and/or noisy point clouds, a repository of 3D CAD models, as well as topological constraints. We demonstrate the effectiveness of our approach in the context of autonomous driving, and show that we can segment cars with an accuracy of 86% intersection-over-union, performing as well as highly recommended MTurkers!