Abstract
While invaluable for many computer vision applications,
decomposing a natural image into intrinsic reflectance and
shading layers represents a challenging, underdetermined
inverse problem. Rather than relying strictly on conventional optimization or filtering solutions with strong prior
assumptions, deep-learning-based approaches have also
been proposed to compute intrinsic image decompositions
when granted access to sufficient labeled training data.
The downside is that current data sources are quite limited and, broadly speaking, fall into one of two categories:
either densely and fully labeled images from synthetic or otherwise narrow settings, or weakly labeled data from relatively diverse natural scenes. In contrast to many previous learning-based approaches, which are often tailored to the structure of a particular dataset (and may not work well on others), we adopt
core network structures that universally reflect loose prior
knowledge regarding the intrinsic image formation process
and can be largely shared across datasets. We then apply
flexibly supervised loss layers that are customized for each
source of ground-truth labels. The resulting deep architecture achieves state-of-the-art results on all of the major
intrinsic image benchmarks and runs considerably faster
than most competing methods at test time.
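The underdetermination mentioned above can be made concrete with the standard multiplicative intrinsic image model, in which each pixel of the image is the product of reflectance and shading (I = R * S). The sketch below assumes grayscale pixels stored as plain Python lists; the helper names compose and rescale are hypothetical and are not taken from the paper. It only shows why the inverse problem is ambiguous: scaling reflectance by any c > 0 and shading by 1/c yields the same image, so extra priors or labels are needed to pick one decomposition.

```python
# Minimal sketch (assumption: standard per-pixel model I = R * S,
# grayscale values as plain lists). Helper names are hypothetical.

def compose(reflectance, shading):
    """Recompose an image as the per-pixel product of reflectance and shading."""
    if len(reflectance) != len(shading):
        raise ValueError("reflectance and shading must have the same length")
    return [r * s for r, s in zip(reflectance, shading)]


def rescale(reflectance, shading, c):
    """Scale reflectance by c and shading by 1/c; their product is unchanged."""
    if c <= 0:
        raise ValueError("scale factor must be positive")
    return [r * c for r in reflectance], [s / c for s in shading]


# One candidate decomposition of a three-pixel image.
R = [0.2, 0.5, 0.8]
S = [1.0, 0.6, 0.25]
I = compose(R, S)

# A different decomposition that explains the same image equally well.
R2, S2 = rescale(R, S, 2.0)
I2 = compose(R2, S2)
```

Because I and I2 coincide while (R, S) and (R2, S2) differ, the image alone cannot determine the layers; this is the ambiguity that dense or weak ground-truth labels help resolve.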