Abstract
Facial motion retargeting is an important problem in both computer graphics and vision: it involves capturing the performance of a human face and transferring it to another 3D character. Learning 3D morphable model (3DMM) parameters from 2D face images with convolutional neural networks is common in tasks such as 2D face alignment and 3D face reconstruction. However, existing methods either require an additional face detection step before retargeting or use a cascade of separate networks that perform detection and retargeting in sequence. In this paper, we present a single end-to-end network that jointly predicts bounding box locations and 3DMM parameters for multiple faces. First, we design a novel multitask learning framework that learns a disentangled representation of 3DMM parameters for a single face. Then, we leverage the trained single-face model to generate ground-truth 3DMM parameters for multiple faces, which we use to train a second network that performs joint face detection and motion retargeting on images containing multiple faces. Experimental results show that our joint detection and retargeting network achieves high face detection accuracy, is robust to extreme expressions and poses, and is faster than state-of-the-art methods.