Abstract
As a classic statistical model of 3D facial shape and texture, the 3D Morphable Model (3DMM) is widely used in facial
analysis, e.g., model fitting and image synthesis. A conventional
3DMM is learned from a set of well-controlled 2D face images with associated 3D face scans, and is represented by two
sets of PCA basis functions. Due to the type and amount
of training data, as well as the linear bases, the representation power of the 3DMM can be limited. To address these
problems, this paper proposes an innovative framework to
learn a nonlinear 3DMM model from a large set of unconstrained face images, without collecting 3D face scans.
Specifically, given a face image as input, a network encoder
estimates the projection, shape, and texture parameters. Two
decoders serve as the nonlinear 3DMM to map from the
shape and texture parameters to the 3D shape and texture,
respectively. With the projection parameter, 3D shape, and
texture, a novel analytically-differentiable rendering layer
is designed to reconstruct the original input face. The entire network is end-to-end trainable with only weak supervision. We demonstrate the superior representation power
of our nonlinear 3DMM over its linear counterpart, and its
contribution to face alignment and 3D reconstruction.
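The encoder/decoder data flow described above can be sketched as follows. This is a toy illustration only: the layer sizes, random linear maps, and function names are assumptions for demonstration, not the paper's actual CNN architecture or rendering layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's): image size, projection /
# shape / texture parameter sizes, and mesh resolution.
IMG_DIM, M_DIM, S_DIM, T_DIM, N_VERTS = 64 * 64, 8, 40, 40, 500

# Toy "networks": single random linear maps standing in for the CNN
# encoder and the two decoders that act as the nonlinear 3DMM.
W_enc = rng.standard_normal((IMG_DIM, M_DIM + S_DIM + T_DIM)) * 0.01
W_shape = rng.standard_normal((S_DIM, N_VERTS * 3)) * 0.01  # f_S -> 3D shape
W_tex = rng.standard_normal((T_DIM, N_VERTS * 3)) * 0.01    # f_T -> per-vertex RGB

def encode(img):
    """Encoder: image -> projection m, shape code f_S, texture code f_T."""
    z = img @ W_enc
    return z[:M_DIM], z[M_DIM:M_DIM + S_DIM], z[M_DIM + S_DIM:]

def decode(f_s, f_t):
    """Decoders: codes -> 3D vertex positions and per-vertex texture.
    The tanh makes the code-to-output mapping nonlinear, unlike PCA bases."""
    shape = np.tanh(f_s @ W_shape).reshape(N_VERTS, 3)
    tex = np.tanh(f_t @ W_tex).reshape(N_VERTS, 3)
    return shape, tex

img = rng.standard_normal(IMG_DIM)
m, f_s, f_t = encode(img)
shape, tex = decode(f_s, f_t)
print(m.shape, shape.shape, tex.shape)
```

In the actual framework, a differentiable rendering layer would project `shape` with `m`, sample `tex`, and compare the rendered image to `img`, giving the weak (reconstruction-based) supervision signal that trains all components end to end.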