Abstract. In this paper, we study the problem of recovering 3D planar
surfaces from a single image of a man-made environment. We show that it
is possible to directly train a deep neural network to achieve this goal.
A novel plane structure-induced loss is proposed to train the network to
simultaneously predict a plane segmentation map and the parameters of
the 3D planes. Further, to avoid the tedious manual labeling process, we
show how to leverage existing large-scale RGB-D datasets to train our
network without explicit 3D plane annotations, and how to take advantage of the semantic labels that come with these datasets for accurate planar
and non-planar classification. Experimental results demonstrate that our
method significantly outperforms existing methods, both qualitatively
and quantitatively. The recovered planes could potentially benefit many
important visual tasks such as vision-based navigation and human-robot
interaction.