Abstract
We present a method for synthesizing a frontal, neutral-expression image of a person's face given an input face photograph. This is achieved by learning to generate facial landmarks and textures from features extracted from a facial-recognition network. Unlike previous generative approaches, our encoding feature vector is largely invariant to lighting, pose, and facial expression. Exploiting this invariance, we train our decoder network using only frontal, neutral-expression photographs. Since these photographs are well aligned, we can decompose them into a sparse set of landmark points and aligned texture maps. The decoder then predicts landmarks and textures independently and combines them using a differentiable image warping operation. The resulting images can be used for a number of applications, such as analyzing facial attributes, adjusting exposure and white balance, or creating a 3-D avatar.