Abstract
Objects’ spatial layout estimation and clutter identification are two important tasks for understanding indoor scenes. We propose to solve both problems in a joint framework using RGBD images of indoor scenes. In contrast to recent approaches, which focus on only one of these two problems, we perform ‘fine-grained structure categorization’ by predicting all the major objects and simultaneously labeling the cluttered regions. We propose a conditional random field model that incorporates a rich set of local appearance and geometric features, as well as interactions between scene elements. To estimate the model parameters from a large annotated RGBD dataset, we take a structural learning approach with a 3D localisation loss. For inference, we use a mixed integer linear programming formulation. We demonstrate that our approach detects cuboids and estimates cluttered regions across many different object and scene categories in the presence of occlusion, illumination and appearance variations.
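The abstract names MAP inference in a conditional random field, cast as a mixed integer linear program. Below is a minimal sketch of that idea under stated assumptions, not the authors' model. Each candidate cuboid hypothesis gets a binary selection variable x_i with a unary score u_i. A pairwise penalty w_ij is paid when two conflicting hypotheses are both selected. The pairwise product x_i * x_j is the standard MILP linearization target: an auxiliary z_ij with z_ij <= x_i, z_ij <= x_j, z_ij >= x_i + x_j - 1. The names `map_inference`, `unary` and `pairwise`, and the example scores, are hypothetical. In place of an MILP solver, the sketch uses brute-force enumeration, which only works for a handful of hypotheses.

```python
from itertools import product


def map_inference(unary, pairwise):
    """Brute-force MAP over binary selection variables (hypothetical sketch).

    unary:    list of per-hypothesis scores u_i.
    pairwise: dict {(i, j): w_ij}, penalty paid when hypotheses i and j
              are both selected.
    Returns (best_score, best_labels), maximizing
        sum_i u_i * x_i - sum_{(i,j)} w_ij * z_ij.
    """
    n = len(unary)
    best_score, best_labels = float("-inf"), None
    # Enumerate every binary labeling; an MILP solver would replace this loop.
    for labels in product((0, 1), repeat=n):
        score = sum(u * x for u, x in zip(unary, labels))
        for (i, j), w in pairwise.items():
            # For binary x, z = x_i * x_j is the only value satisfying the
            # linear constraints z <= x_i, z <= x_j, z >= x_i + x_j - 1.
            z = labels[i] * labels[j]
            score -= w * z
        if score > best_score:
            best_score, best_labels = score, labels
    return best_score, best_labels


if __name__ == "__main__":
    # Two strongly overlapping hypotheses (0, 1) and one weak hypothesis (2).
    print(map_inference([2.0, 1.5, -0.5], {(0, 1): 3.0}))
```

With these made-up scores, selecting both hypothesis 0 and hypothesis 1 costs more in the pairwise penalty than it gains. The hypothesis with the higher unary score is therefore kept alone, giving the labeling (1, 0, 0) with score 2.0.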