Abstract
We focus on the non-Lambertian object-level intrinsic
problem of recovering diffuse albedo, shading, and specular highlights from a single image of an object. Based on
existing 3D models in the ShapeNet database, a large-scale
object intrinsics database is rendered with HDR environment maps. Millions of synthetic images of objects and their
corresponding albedo, shading, and specular ground-truth
images are used to train an encoder-decoder CNN, which
can decompose an image into the product of albedo and
shading components along with an additive specular component. Our CNN delivers accurate and sharp results in this
classical inverse problem of computer vision. Evaluated on
our realistic synthetic dataset, our method consistently
outperforms the state of the art by a large margin.
We train and test our CNN across different object categories. Perhaps surprisingly, especially from the CNN classification perspective, our intrinsics CNN generalizes very
well across categories. Our analysis shows that feature
learning at the encoder stage is more crucial for developing a universal representation across categories. We apply our model to real images and videos from the Internet, and
observe robust and realistic intrinsics results. High-quality non-Lambertian intrinsics could open up many interesting applications such as realistic product search based on material
properties and image-based albedo / specular editing.
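
The decomposition described above, an image formed as the product of albedo and shading plus an additive specular term, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `compose` and `reconstruction_error`, the array shapes, and the choice of single-channel shading and specular maps are all assumptions made for illustration only.

```python
import numpy as np


def compose(albedo, shading, specular):
    """Recombine intrinsic components: image = albedo * shading + specular.

    Hypothetical helper; shapes are assumed to broadcast (H, W, 3) with (H, W, 1).
    """
    return albedo * shading + specular


def reconstruction_error(image, albedo, shading, specular):
    """Mean squared error between an image and its recomposed intrinsics."""
    return float(np.mean((image - compose(albedo, shading, specular)) ** 2))


# Toy data standing in for rendered ground truth (assumed shapes and ranges).
rng = np.random.default_rng(0)
h, w = 4, 4
albedo = rng.uniform(0.0, 1.0, size=(h, w, 3))    # per-pixel diffuse color
shading = rng.uniform(0.0, 1.0, size=(h, w, 1))   # assumed grayscale shading
specular = rng.uniform(0.0, 0.2, size=(h, w, 1))  # assumed grayscale highlights

image = compose(albedo, shading, specular)
```

A predicted decomposition could be checked by passing the network outputs to `reconstruction_error` together with the input image; a value near zero would indicate that the three components are consistent with the assumed formation model.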