Pixels, voxels, and views: A study of shape representations for single view 3D object shape prediction
Abstract
The goal of this paper is to compare surface-based and volumetric 3D object shape representations, as well as viewer-centered and object-centered reference frames, for single-view 3D shape prediction. We propose a new algorithm for predicting depth maps from multiple viewpoints, with a single depth or RGB image as input. By modifying the network and the way models are evaluated, we can directly compare the merits of voxels vs. surfaces, and of viewer-centered vs. object-centered frames, for familiar vs. unfamiliar objects, as predicted from RGB or depth images. Among our findings, we show that surface-based methods outperform voxel representations for objects from novel classes and produce higher-resolution outputs. We also find that viewer-centered coordinates are advantageous for novel objects, while object-centered representations are better for more familiar objects. Interestingly, the coordinate frame significantly affects the learned shape representation: object-centered training places more importance on implicitly recognizing the object category, whereas viewer-centered training produces shape representations that depend less on category recognition.
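To make the distinction between the two reference frames concrete, here is a minimal Python sketch under stated assumptions; it is not the paper's implementation. In an object-centered frame the prediction target is the same canonical shape regardless of the input viewpoint, while in a viewer-centered frame the target is that shape rotated into the camera's coordinates, so it changes with the view. The helper names (`view_rotation`, `training_target`) and the azimuth/elevation rotation convention are hypothetical choices made for illustration.

```python
import math


def matmul(A, B):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(R):
    """Transpose a 3x3 matrix; for a rotation this is its inverse."""
    return [list(row) for row in zip(*R)]


def view_rotation(azimuth_deg, elevation_deg):
    """Hypothetical helper: rotation from object-centered to viewer-centered
    coordinates. Assumed convention: rotate about the y (up) axis by
    -azimuth, then about the x axis by elevation."""
    a = math.radians(azimuth_deg)
    e = math.radians(elevation_deg)
    # Rotation about the y axis by -azimuth.
    ry = [[math.cos(a), 0.0, -math.sin(a)],
          [0.0, 1.0, 0.0],
          [math.sin(a), 0.0, math.cos(a)]]
    # Rotation about the x axis by elevation.
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(e), -math.sin(e)],
          [0.0, math.sin(e), math.cos(e)]]
    return matmul(rx, ry)


def apply(R, points):
    """Rotate each 3D point (x, y, z) by the 3x3 matrix R."""
    return [tuple(sum(R[i][k] * p[k] for k in range(3)) for i in range(3))
            for p in points]


def training_target(points, azimuth_deg, elevation_deg, frame):
    """Hypothetical helper: the shape a network would be asked to predict
    for an input image rendered from the given viewpoint."""
    if frame == "object":
        # Canonical shape: identical for every input view.
        return list(points)
    if frame == "viewer":
        # Shape expressed in camera coordinates: depends on the view.
        return apply(view_rotation(azimuth_deg, elevation_deg), points)
    raise ValueError("frame must be 'object' or 'viewer'")
```

With `frame="object"` two different viewpoints yield identical targets, so the network must implicitly infer the object's canonical pose (and, in practice, its category) from the image; with `frame="viewer"` the target follows the camera, which is one way to see why the two frames can lead to different learned representations.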