Abstract
We present DeepMVS, a deep convolutional neural network (ConvNet) for multi-view stereo reconstruction. Taking an arbitrary number of posed images as input, we first
produce a set of plane-sweep volumes and then use the proposed
DeepMVS network to predict high-quality disparity maps.
The key contributions that enable these results are (1) supervised pretraining on a photorealistic synthetic dataset,
(2) an effective method for aggregating information across a
set of unordered images, and (3) the integration of multi-layer feature activations from the pretrained VGG-19 network. We
validate the efficacy of DeepMVS using the ETH3D Benchmark. Our results show that DeepMVS compares favorably against state-of-the-art conventional MVS algorithms
and other ConvNet-based methods, particularly for near-textureless regions and thin structures.
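
To illustrate what aggregating information across an unordered set of images requires, the following minimal sketch pools per-view feature vectors with an element-wise maximum, so the result does not depend on the order or the number of views. The abstract does not name the aggregation operator; max pooling is assumed here only as one common order-invariant choice. The function name aggregate_max and the toy feature values are hypothetical.

```python
from itertools import permutations


def aggregate_max(per_view_features):
    """Element-wise max over a non-empty list of equal-length feature vectors.

    Assumption: max pooling stands in for the aggregation step; the abstract
    does not specify which operator is used.
    """
    if not per_view_features:
        raise ValueError("need at least one view")
    length = len(per_view_features[0])
    if any(len(f) != length for f in per_view_features):
        raise ValueError("all feature vectors must have the same length")
    # zip(*...) groups the i-th entry of every view; max() pools across views.
    return [max(values) for values in zip(*per_view_features)]


# Hypothetical per-view features, e.g. matching scores at three disparity levels.
views = [
    [0.1, 0.9, 0.3],
    [0.4, 0.2, 0.8],
    [0.5, 0.1, 0.2],
]
pooled = aggregate_max(views)

# The pooled result is the same for every ordering of the input views.
order_invariant = all(
    aggregate_max(list(p)) == pooled for p in permutations(views)
)
```

Because the maximum is commutative and associative, the same function accepts any number of views, which matches the requirement of handling an arbitrary number of input images.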