Abstract
We present an approach to interpret the major surfaces, objects, and support relations of an indoor scene from an RGBD image. Most existing work ignores physical interactions or is applied only to tidy rooms and hallways. Our goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships. One of our main interests is to better understand how 3D cues can best inform a structured 3D interpretation. We also contribute a novel integer programming formulation to infer physical support relations. We offer a new dataset of 1449 RGBD images, capturing 464 diverse indoor scenes, with detailed annotations. Our experiments demonstrate our ability to infer support relations in complex scenes and verify that our 3D scene cues and inferred support lead to better object segmentation.