Abstract. Recent advances in Generative Adversarial Networks (GANs)
have shown impressive results for the task of facial expression synthesis. The
most successful architecture is StarGAN, which conditions the GAN's generation process on images of a specific domain, namely a set of images
of persons sharing the same expression. While effective, this approach
can only generate a discrete number of expressions, determined by the
content of the dataset. To address this limitation, in this paper, we introduce a novel GAN conditioning scheme based on Action Units (AU)
annotations, which describe, in a continuous manifold, the anatomical
facial movements defining a human expression. Our approach allows
controlling the magnitude of activation of each AU and combining several of them. Additionally, we propose a fully unsupervised strategy to
train the model, requiring only images annotated with their activated
AUs, and exploit attention mechanisms that make our network robust
to changing backgrounds and lighting conditions. An extensive evaluation shows that our approach goes beyond competing conditional generators, both in its capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements, and in its capacity to deal with images in the wild.
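To make the conditioning scheme concrete, the following is a minimal sketch, not the paper's architecture or released code: it assumes a PyTorch-style implementation, and all names and layer sizes (`TinyGenerator`, `AU_DIM`, channel widths) are illustrative. It shows a generator driven by a continuous AU activation vector, plus one common attention-based composition in which untouched pixels are copied from the input image.

```python
# Minimal, illustrative sketch of continuous AU conditioning with an
# attention mask (assumed PyTorch-style; not the authors' implementation).
import torch
import torch.nn as nn

AU_DIM = 17  # hypothetical number of annotated Action Units


class TinyGenerator(nn.Module):
    """Maps an input face and a continuous AU vector to a color map C and a
    single-channel attention map A; the output keeps original pixels where A
    is high (background, static regions) and synthesized pixels elsewhere."""

    def __init__(self, au_dim=AU_DIM):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3 + au_dim, 64, 7, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_color = nn.Conv2d(64, 3, 7, padding=3)      # color map C
        self.to_attention = nn.Conv2d(64, 1, 7, padding=3)  # attention map A

    def forward(self, img, au):
        # Broadcast the AU vector over the spatial grid and concatenate it
        # with the image channels, so every pixel is conditioned on the
        # target expression encoded as continuous AU magnitudes.
        b, _, h, w = img.shape
        au_map = au.view(b, -1, 1, 1).expand(b, au.size(1), h, w)
        feat = self.body(torch.cat([img, au_map], dim=1))
        color = torch.tanh(self.to_color(feat))
        attn = torch.sigmoid(self.to_attention(feat))
        # Attention-based composition: regions the expression does not move
        # come straight from the input, which is one way such a network can
        # stay robust to backgrounds it never has to reproduce.
        return attn * img + (1.0 - attn) * color


# Usage: ramp a single AU from 0 to 1 to control its activation magnitude.
gen = TinyGenerator()
face = torch.rand(1, 3, 128, 128)
for magnitude in (0.0, 0.5, 1.0):
    target = torch.zeros(1, AU_DIM)
    target[0, 4] = magnitude  # activate one AU at a continuous level
    out = gen(face, target)
    print(magnitude, out.shape)
```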