Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks
Abstract
We study the problem of learning generative models of 3D shapes. Voxels and 3D parts have been widely used as the underlying representations for building complex 3D shapes; however, voxel-based representations suffer from high memory requirements, and parts-based models require a large collection of cached or richly parametrized parts. We take an alternative approach: learning a generative model over multi-view depth maps or their corresponding silhouettes, and using a deterministic rendering function to produce 3D shapes from these images. A multi-view representation of shapes enables the generation of 3D models with fine detail, as 2D depth maps and silhouettes can be modeled at much higher resolution than 3D voxels. Moreover, our approach naturally enables recovery of the underlying 3D representation from depth maps of one or a few viewpoints. Experiments show that our framework can generate 3D shapes with both variation and fine detail. We also demonstrate that our model generalizes out of sample to real-world tasks involving occluded objects.
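
To make the "deterministic rendering function" concrete, the sketch below shows one simple way such a function can work: each depth map marks the free space its camera observes, and intersecting the carvings from all views yields a voxel occupancy grid. This is a minimal illustration, not the paper's implementation: it assumes axis-aligned orthographic views and a regular voxel grid, and the names carve_from_depth, fuse_depth_maps, and render_depth are hypothetical.

```python
import numpy as np

def carve_from_depth(occ, depth, axis, sign):
    """Carve away the free space observed by one orthographic view.

    occ   : (N, N, N) boolean occupancy grid, modified in place
    depth : (N, N) depth map; depth[u, v] is the distance from the
            image plane to the first occupied voxel along the ray,
            with the sentinel value N meaning the ray hits nothing
    axis  : grid axis the camera looks along (0, 1, or 2)
    sign  : +1 if the camera looks in the +axis direction, -1 otherwise
    """
    N = occ.shape[0]
    # Distance of every voxel from the image plane along the view axis.
    d = np.arange(N) if sign > 0 else np.arange(N)[::-1]
    shape = [1, 1, 1]
    shape[axis] = N
    d = d.reshape(shape)
    # A voxel is free space if it lies strictly in front of the
    # surface recorded in the depth map for its pixel.
    free = d < np.expand_dims(depth, axis)
    occ &= ~free
    return occ

def fuse_depth_maps(depth_maps):
    """Intersect the carvings of all views into one occupancy grid.

    depth_maps: dict mapping (axis, sign) -> (N, N) depth map.
    """
    N = next(iter(depth_maps.values())).shape[0]
    occ = np.ones((N, N, N), dtype=bool)  # start fully occupied
    for (axis, sign), depth in depth_maps.items():
        carve_from_depth(occ, depth, axis, sign)
    return occ

def render_depth(occ, axis, sign):
    """Render an orthographic depth map of a voxel grid (inverse step)."""
    o = occ if sign > 0 else np.flip(occ, axis)
    hit = o.any(axis)        # silhouette: which rays hit the shape
    d = o.argmax(axis)       # index of the first occupied voxel
    return np.where(hit, d, occ.shape[axis])

# Tiny round trip: carve a cube from six views (two per axis).
N = 32
gt = np.zeros((N, N, N), dtype=bool)
gt[8:24, 8:24, 8:24] = True
maps = {(a, s): render_depth(gt, a, s) for a in range(3) for s in (+1, -1)}
rec = fuse_depth_maps(maps)
assert (rec == gt).all()  # a convex, axis-aligned shape is recovered exactly
```

The round trip is exact here only because the shape is convex and aligned with the six views; recovering concavities and finer detail is what motivates fusing depth maps from more viewpoints, as the abstract describes.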