Abstract
The generation of 3D data by deep neural networks has been attracting increasing attention in the research community. Most existing works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations, and also suffer from a number of other issues. In this paper
we address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. This problem raises a unique and interesting issue: the ground-truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the ground truth, we design an architecture, loss function, and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image.
In experiments, our system not only outperforms state-of-the-art methods on single-image 3D reconstruction benchmarks, but also shows strong performance on 3D shape completion and a promising ability to make multiple plausible predictions.
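Because a point cloud is an unordered set, comparing a predicted cloud against the ground truth requires a permutation-invariant loss rather than a per-element error. As a hedged illustration (the abstract does not name a specific loss; Chamfer distance is one common choice for point-set comparison), a minimal NumPy sketch:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).

    Invariant to the ordering of points within each set: each point is
    matched to its nearest neighbor in the other set.
    """
    # Pairwise squared Euclidean distances via broadcasting, shape (N, M).
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbor distance in both directions.
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

For example, shuffling the rows of either input leaves the loss unchanged, which is the property that makes such a loss suitable for training a network whose output is an unordered coordinate list.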