Abstract
In this paper, we develop novel, efficient 2D encodings for 3D geometry, which enable reconstructing full 3D
shapes from a single image at high resolution. The key idea
is to pose 3D shape reconstruction as a 2D prediction problem. To that end, we first develop a simple baseline network
that predicts entire voxel tubes at each pixel of a reference
view. By leveraging well-proven architectures for 2D pixel-prediction tasks, we attain state-of-the-art results, clearly
outperforming purely voxel-based approaches. We scale
this baseline to higher resolutions by proposing a memory-efficient shape encoding, which recursively decomposes a
3D shape into nested shape layers, similar to the pieces of
a Matryoshka doll. This allows reconstructing highly detailed shapes with complex topology, as demonstrated in extensive experiments; we clearly outperform previous octree-based approaches despite having a much simpler architecture using standard network components. Our Matryoshka
networks further enable reconstructing shapes from IDs or
shape similarity, as well as shape sampling.
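
To make the voxel-tube idea concrete, the following minimal sketch shows how a 3D occupancy grid can be viewed as a 2D image whose per-pixel channels hold the column of voxels along the viewing direction. It illustrates the data layout only and is not the network described above; the function names, the (D, H, W) axis order, the grid size, and the 0.5 threshold are all assumptions made for this example.

```python
import numpy as np


def voxels_to_tubes(voxels):
    """View a (D, H, W) occupancy grid as an (H, W) image with D channels.

    Each pixel (y, x) of the reference view then carries one "voxel tube":
    the D occupancy values along the (assumed) viewing axis.
    """
    return np.transpose(voxels, (1, 2, 0))


def tubes_to_voxels(tubes, threshold=0.5):
    """Invert the layout and binarize per-pixel tube predictions."""
    return np.transpose(tubes, (2, 0, 1)) > threshold


# Hypothetical example: a random 16 x 32 x 32 binary shape.
rng = np.random.default_rng(0)
voxels = rng.random((16, 32, 32)) > 0.5

# A 2D prediction network would output a tensor with this layout.
tubes = voxels_to_tubes(voxels).astype(np.float32)

# Thresholding the per-pixel channels recovers the full 3D grid.
recovered = tubes_to_voxels(tubes)
```

In this layout, the depth resolution becomes the channel count of an ordinary 2D output, which is why standard 2D pixel-prediction architectures can be reused without 3D convolutions.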