Abstract
We present a multi-view reconstruction method that combines conventional multi-view stereo (MVS) with appearance-based normal prediction to obtain dense and accurate 3D surface models. Reliable surface normals reconstructed from multi-view correspondence serve as training data for a convolutional neural network (CNN), which predicts continuous normal vectors from raw image patches. By training on known points in the same image, the prediction is specifically tailored to the materials and lighting conditions of the particular scene, as well as to the precise camera viewpoint. It is therefore much easier to learn than generic single-view normal estimation. The estimated normal maps, together with the known depth values from MVS, are integrated into dense depth maps, which in turn are fused into a 3D model. Experiments on the DTU dataset show that our method delivers 3D reconstructions with the same accuracy as MVS, but with significantly higher completeness.
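To illustrate the integration step described above, the following is a minimal sketch (not the paper's actual formulation) of fusing a per-pixel normal map with sparse known MVS depths into a dense depth map. It assumes an orthographic camera, so a normal (nx, ny, nz) implies depth gradients dz/dx = -nx/nz and dz/dy = -ny/nz, and solves a least-squares system that matches those gradients while anchoring the known depths; the function name and parameters are hypothetical.

```python
import numpy as np

def integrate_depth(normals, known_depth, known_mask, lam=1.0):
    """Integrate a normal map into a dense depth map, anchored at
    sparse known depths (e.g. from MVS).

    normals:     (H, W, 3) unit normals
    known_depth: (H, W) depth values, used only where known_mask is True
    known_mask:  (H, W) bool mask of reliable MVS depths
    lam:         weight of the sparse depth anchors (assumed parameter)
    """
    H, W = known_mask.shape
    # Orthographic assumption: depth gradients implied by the normals.
    p = -normals[..., 0] / normals[..., 2]   # dz/dx
    q = -normals[..., 1] / normals[..., 2]   # dz/dy
    N = H * W
    idx = np.arange(N).reshape(H, W)
    rows, rhs = [], []
    # Gradient equations in x: z[y, x+1] - z[y, x] = average of p.
    for y in range(H):
        for x in range(W - 1):
            r = np.zeros(N); r[idx[y, x + 1]] = 1.0; r[idx[y, x]] = -1.0
            rows.append(r); rhs.append(0.5 * (p[y, x] + p[y, x + 1]))
    # Gradient equations in y: z[y+1, x] - z[y, x] = average of q.
    for y in range(H - 1):
        for x in range(W):
            r = np.zeros(N); r[idx[y + 1, x]] = 1.0; r[idx[y, x]] = -1.0
            rows.append(r); rhs.append(0.5 * (q[y, x] + q[y + 1, x]))
    # Anchor equations at pixels with known MVS depth.
    for y in range(H):
        for x in range(W):
            if known_mask[y, x]:
                r = np.zeros(N); r[idx[y, x]] = lam
                rows.append(r); rhs.append(lam * known_depth[y, x])
    A = np.stack(rows)
    z, *_ = np.linalg.lstsq(A, np.array(rhs), rcond=None)
    return z.reshape(H, W)
```

A dense least-squares solve is used here only for clarity on tiny grids; a practical implementation would assemble the same equations as a sparse linear system.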