Abstract
We introduce InverseFaceNet, a deep convolutional inverse
rendering framework for faces that jointly estimates facial
pose, shape, expression, reflectance and illumination from
a single input image. Estimating all parameters from just a
single image makes advanced edits to that image, such as
appearance editing and relighting, feasible in real time.
Most previous learning-based
face reconstruction approaches do not jointly recover all
dimensions, or are severely limited in terms of visual quality.
In contrast, we propose to recover high-quality facial pose,
shape, expression, reflectance and illumination using a deep
neural network that is trained using a large, synthetically
created training corpus. Our approach builds on a novel loss
function that measures model-space similarity directly in
parameter space and significantly improves reconstruction
accuracy. We further propose a self-supervised bootstrapping
process in the network training loop, which iteratively
updates the synthetic training corpus to better reflect the
distribution of real-world imagery. We demonstrate that
this strategy outperforms networks trained purely on
synthetic data. Finally, we show high-quality reconstructions and
compare our approach to several state-of-the-art methods.