Abstract. We present a novel semantic 3D reconstruction framework
that embeds variational regularization into a neural network. Our network performs a fixed number of unrolled multi-scale optimization iterations with shared interaction weights. In contrast to existing variational methods for semantic 3D reconstruction, our model is end-to-end
trainable and captures more complex dependencies between the semantic labels and the 3D geometry. Compared to previous learning-based
approaches to 3D reconstruction, we integrate powerful long-range dependencies using variational coarse-to-fine optimization. As a result, our
network architecture requires only a moderate number of parameters
while retaining enough expressiveness to enable learning from
very little data. Experiments on real and synthetic datasets demonstrate
that our network achieves higher accuracy than a purely variational approach while requiring two orders of magnitude
fewer iterations to converge. Moreover, our approach handles ten times
more semantic class labels using the same computational resources.
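The core idea of sharing interaction weights across a fixed number of unrolled iterations can be illustrated with a minimal sketch. The code below is an assumption for illustration only, not the paper's actual architecture: the name `interaction_w` and the plain gradient-style update are hypothetical stand-ins for the learned multi-scale variational updates. The point it demonstrates is that the parameter count is independent of the iteration budget, since every iteration reuses the same weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def unrolled_updates(x, interaction_w, step=0.05, iters=50):
    """Run a fixed number of unrolled update steps with shared weights.

    x             : (n,) current estimate (e.g. per-voxel label scores)
    interaction_w : (n, n) interaction weights reused at EVERY iteration,
                    so the parameter count does not grow with `iters`
    """
    for _ in range(iters):
        # one update step: couple the variables through the shared weights
        x = x - step * (interaction_w @ x)
    return x

n = 8
# a symmetric positive semi-definite interaction matrix keeps the
# iteration stable for a small enough step size
a = rng.standard_normal((n, n))
w = a.T @ a / n
x0 = rng.standard_normal(n)
x_out = unrolled_updates(x0, w)
# repeated application of (I - step * W) contracts the estimate
print(np.linalg.norm(x_out) < np.linalg.norm(x0))
```

In a trained network, the shared weights would be learned by backpropagating a loss through all unrolled iterations; sharing them is what keeps the parameter count moderate even for deep unrollings.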