Abstract. Intrinsic image decomposition—decomposing a natural image into a
set of images corresponding to different physical causes—is one of the key and
fundamental problems of computer vision. Previous intrinsic decomposition approaches either address the problem in a fully supervised manner, or require multiple images of the same scene as input. These approaches are less desirable in
practice, as ground truth intrinsic images are extremely difficult to acquire, and
the requirement of multiple images poses a severe limitation on applicable scenarios. In
this paper, we propose to combine the best of both worlds. We present a two-stream
convolutional neural network framework that is capable of learning the decomposition effectively in the absence of any ground truth intrinsic images, and can
be easily extended to a (semi-)supervised setup. At inference time, our model can
be reduced to a single-stream module that performs intrinsic decomposition on a single input image. We demonstrate the effectiveness of our framework
through an extensive experimental study on both synthetic and real-world datasets,
showing superior performance over previous approaches in both single-image and
multi-image settings. Notably, our approach outperforms previous state-of-the-art
single-image methods while using only 50% of the ground truth supervision.