Abstract
We present a data-driven inference method that can synthesize a photorealistic texture map of a complete 3D face
model given a partial 2D view of a person in the wild. After
an initial estimation of shape and low-frequency albedo, we
compute a high-frequency partial texture map, without the
shading component, of the visible face area. To extract the
fine appearance details from this incomplete input, we introduce a multi-scale detail analysis technique based on mid-layer feature correlations extracted from a deep convolutional neural network. We demonstrate that fitting a convex
combination of feature correlations from a high-resolution
face database can yield a semantically plausible facial detail description of the entire face. A complete and photorealistic texture map can then be synthesized by iteratively
optimizing for the reconstructed feature correlations. Using
these high-resolution textures and a commercial rendering
framework, we can produce high-fidelity 3D renderings that
are visually comparable to those obtained with state-of-the-art multi-view face capture systems. We demonstrate successful face reconstructions from a wide range of low-resolution input images, including those of historical figures. In
addition to extensive evaluations, we validate the realism of
our results using a crowdsourced user study.
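
To illustrate the convex-combination fitting step, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes that feature correlations can be represented as Gram matrices of flattened feature maps and that the convex weights are fit by projected gradient descent onto the probability simplex. The function names (gram_matrix, project_to_simplex, fit_convex_weights) and the choice of solver are hypothetical. The sketch also omits the visibility mask that a partial input would require.

```python
import numpy as np

def gram_matrix(features):
    """Correlations between channels of a (channels, height, width) feature map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index satisfying the condition
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_convex_weights(target, database, steps=500):
    """Fit simplex weights w minimizing ||sum_k w_k G_k - target||^2.

    Assumption: plain projected gradient descent; other solvers would also work.
    """
    A = np.stack([g.ravel() for g in database], axis=1)  # shape (d, K)
    b = target.ravel()
    K = A.shape[1]
    w = np.full(K, 1.0 / K)                              # start from uniform weights
    lr = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)       # step = 1 / Lipschitz constant
    for _ in range(steps):
        grad = A.T @ (A @ w - b)
        w = project_to_simplex(w - lr * grad)
    return w
```

In this sketch, the weights returned by fit_convex_weights are non-negative and sum to one, so the blended correlation matrix stays within the convex hull of the database entries.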