Abstract. Most multi-view 3D reconstruction algorithms, especially when shape-from-shading cues are used, assume that object appearance is predominantly diffuse. To alleviate this restriction, we introduce S2Dnet, a generative adversarial
network for translating multiple views of objects with specular reflection into
diffuse ones, so that multi-view reconstruction methods can be applied more effectively. Our network extends unsupervised image-to-image translation to multi-view “specular-to-diffuse” translation. To preserve object appearance across multiple views, we introduce a Multi-View Coherence loss (MVC) that evaluates
the similarity and faithfulness of local patches after the view transformation. In
addition, we carefully design and generate a large synthetic training data set using physically-based rendering. During testing, our network takes only the raw
glossy images as input, without extra information such as segmentation masks
or lighting estimation. Results demonstrate that multi-view reconstruction can be
significantly improved using the images filtered by our network.