Abstract
We consider the problem of estimating the spatial layout of an indoor scene from a monocular RGB image, modeled as the projection of a 3D cuboid. Existing solutions to this problem often rely strongly on hand-engineered features and vanishing point detection, which are prone to failure in the presence of clutter. In this paper, we present a method that uses a fully convolutional neural network (FCNN) in conjunction with a novel optimization framework for generating layout estimates. We demonstrate that our method is robust in the presence of clutter and handles a wide range of highly challenging scenes. We evaluate our method on two standard benchmarks and show that it achieves state-of-the-art results, outperforming previous methods by a wide margin.