Abstract. Facial expressions are combinations of basic components called Action Units (AUs). Recognizing AUs is key for general facial expression analysis. Recent efforts in automatic AU recognition have been dedicated to learning combinations of local features and to exploiting correlations between AUs. We propose a deep neural architecture that tackles both problems: it combines learned local and global features in its initial stages, and in its later stages it replicates a message-passing algorithm between classes, akin to inference in a graphical model. We show that training the model end-to-end with increased supervision improves the state of the art by 5.3% on the BP4D dataset and by 8.2% on the DISFA dataset.
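To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch of (1) fusing learned local and global features and (2) message passing between AU classes via a learned interaction matrix. This is an illustrative reading of the abstract, not the authors' implementation; all module names, dimensions, and the specific update rule are assumptions.

```python
# Illustrative sketch (not the paper's architecture): local/global feature
# fusion followed by message passing between AU class scores. All names,
# dimensions, and iteration counts below are assumptions for illustration.
import torch
import torch.nn as nn

NUM_AUS = 12    # e.g. the 12 AUs annotated in BP4D (assumption)
FEAT_DIM = 256  # fused feature dimensionality (assumption)
MP_STEPS = 3    # number of message-passing iterations (assumption)

class LocalGlobalFusion(nn.Module):
    """Extracts per-patch (local) and whole-face (global) features, then fuses them."""
    def __init__(self):
        super().__init__()
        self.local_conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))   # 4x4 grid of local descriptors
        self.global_conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))   # single global descriptor
        self.fuse = nn.Linear(32 * 16 + 32, FEAT_DIM)

    def forward(self, x):
        local_feats = self.local_conv(x).flatten(1)    # (B, 32*16)
        global_feats = self.global_conv(x).flatten(1)  # (B, 32)
        return torch.relu(self.fuse(torch.cat([local_feats, global_feats], dim=1)))

class AUMessagePassing(nn.Module):
    """Iteratively refines per-AU scores with a learned AU-to-AU interaction matrix,
    loosely mimicking inference in a graphical model over AU labels."""
    def __init__(self):
        super().__init__()
        self.unary = nn.Linear(FEAT_DIM, NUM_AUS)  # initial per-AU evidence
        self.pairwise = nn.Parameter(torch.zeros(NUM_AUS, NUM_AUS))  # learned AU correlations

    def forward(self, feats):
        scores = self.unary(feats)  # (B, NUM_AUS)
        for _ in range(MP_STEPS):
            # Each AU receives messages from all others, weighted by learned correlations.
            messages = torch.sigmoid(scores) @ self.pairwise
            scores = scores + messages
        # Intermediate scores could also be supervised, one reading of the
        # abstract's "increased supervision".
        return scores  # per-AU logits

model = nn.Sequential(LocalGlobalFusion(), AUMessagePassing())
logits = model(torch.randn(2, 3, 128, 128))  # dummy batch of face crops
print(logits.shape)                          # torch.Size([2, 12])
```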