Abstract
We present Im2Pano3D, a convolutional neural network that generates a dense prediction of 3D structure and a probability distribution of semantic labels for a full 360° panoramic view of an indoor scene when given only a partial observation (≤ 50%) in the form of an RGB-D image.
To make this possible, Im2Pano3D leverages strong contextual priors learned from large-scale synthetic and real-world indoor scenes. To ease the prediction of 3D structure,
we propose to parameterize 3D surfaces with their plane equations and train the model to predict these parameters directly (see the sketch after the abstract). To provide meaningful training supervision, we use
multiple loss functions that consider both pixel-level accuracy and global context consistency. Experiments demonstrate that Im2Pano3D is able to predict the semantics and 3D structure of the unobserved scene with more than 56% pixel accuracy and less than 0.52m average distance error, significantly outperforming alternative approaches.
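
As a rough illustration of the plane-equation parameterization (the notation here is ours, assuming a per-pixel prediction of a unit surface normal $\mathbf{n}$ and a plane distance $d$ to the camera center; the exact formulation appears in the method section), the visible surface at each pixel can be described by

\[
\mathbf{n}\cdot\mathbf{p} = d, \qquad \mathbf{p} = t\,\mathbf{r}, \qquad t = \frac{d}{\mathbf{n}\cdot\mathbf{r}},
\]

where $\mathbf{r}$ is the pixel's unit viewing ray. Predicting $(\mathbf{n}, d)$ therefore determines the 3D point $\mathbf{p}$ by ray-plane intersection, a representation that is naturally suited to the planar structures (walls, floors, ceilings) that dominate indoor scenes, rather than requiring the network to regress raw depth at every pixel.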