Abstract. The rise of virtual and augmented reality fuels an increased
need for content suited to these new technologies, including 3D content obtained from real scenes. In this paper, we consider the problem
of 3D shape reconstruction from multi-view RGB images. We investigate
the ability of learning-based strategies to benefit the reconstruction of arbitrary shapes, improving both precision and robustness.
We especially target real-life performance capture, which contains complex
surface details that are difficult to recover with existing approaches. A
key step in the multi-view reconstruction pipeline lies in the search for
matching features between viewpoints in order to infer depth information. We propose to cast the matching on a 3D receptive field along
viewing lines and to learn a multi-view photoconsistency measure for
that purpose. The intuition is that deep networks have the ability to
learn local photometric configurations in a broad way, even with respect
to different orientations along various viewing lines of the same surface
point. Our results demonstrate this ability, showing that a CNN, trained
on a standard static dataset, can help recover surface details on dynamic
scenes that are not perceived by traditional 2D feature-based methods.
Our evaluation also shows that our solution performs on par with state-of-the-art reconstruction pipelines on standard evaluation datasets, while
yielding significantly better results and generalization on realistic performance capture data.