Abstract
Reconstructing the detailed geometric structure of a face
from a given image is key to many computer vision and
graphics applications, such as motion capture and reenactment. The reconstruction task is challenging as human
faces vary extensively when considering expressions, poses,
textures, and intrinsic geometries. While many approaches
tackle this complexity by using additional data to reconstruct the face of a single subject, extracting a facial surface
from a single image remains a difficult problem. As a result, single-image methods can usually provide only
a rough estimate of the facial geometry. In contrast, we propose to leverage the power of convolutional neural networks
to produce a highly detailed face reconstruction from a single image. For this purpose, we introduce an end-to-end
CNN framework which derives the shape in a coarse-to-fine
fashion. The proposed architecture is composed of two main
blocks: a network that recovers the coarse facial geometry
(CoarseNet), followed by a CNN that refines the facial features of that geometry (FineNet). The proposed networks
are connected by a novel layer which renders a depth image
from a 3D mesh. Unlike object recognition and detection
problems, there are no suitable datasets for training CNNs
to perform face geometry reconstruction. Therefore, our
training regime begins with a supervised phase, based on
synthetic images, followed by an unsupervised phase that
uses only unconstrained facial images. The accuracy and
robustness of the proposed model are demonstrated by both
qualitative and quantitative evaluation tests.
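
As a rough illustration of how the two blocks might be connected, the sketch below renders a depth image from mesh vertices and hands it to a refinement step. It is a simplification under stated assumptions, not the layer proposed here: it splats vertices orthographically into a z-buffer instead of rasterizing triangles, it is not differentiable, and the names render_depth, coarse_to_fine, coarse_net, and fine_net are hypothetical placeholders.

```python
import numpy as np


def render_depth(vertices, size=8):
    """Orthographic z-buffer over mesh vertices (point splatting, no triangles).

    vertices: (N, 3) array with x, y in [0, 1) and z as depth (smaller is nearer).
    Returns a (size, size) depth image; pixels hit by no vertex are np.inf.
    """
    vertices = np.asarray(vertices, dtype=float)
    depth = np.full((size, size), np.inf)
    # Map normalized x, y coordinates to integer pixel columns and rows.
    cols = np.clip((vertices[:, 0] * size).astype(int), 0, size - 1)
    rows = np.clip((vertices[:, 1] * size).astype(int), 0, size - 1)
    # Keep the nearest depth when several vertices land on the same pixel.
    np.minimum.at(depth, (rows, cols), vertices[:, 2])
    return depth


def coarse_to_fine(image, coarse_net, fine_net, size=8):
    """Chain a coarse stage and a fine stage through the depth renderer.

    coarse_net and fine_net are assumed callables standing in for the
    two networks; they are placeholders, not the trained models.
    """
    vertices = coarse_net(image)                  # coarse geometry as (N, 3) vertices
    coarse_depth = render_depth(vertices, size)   # mesh -> depth image
    return fine_net(image, coarse_depth)          # refinement on the depth image
```

In this toy version the renderer only bridges the two representations (mesh in, depth image out); an end-to-end trainable pipeline would additionally need gradients to flow through that step.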