Abstract. Annotation errors and bias are inevitable among different
facial expression datasets due to the subjectiveness of annotating facial expressions. Owing to the inconsistent annotations, the performance of
existing facial expression recognition (FER) methods cannot keep improving when the training set is enlarged by merging multiple datasets.
To address the inconsistency, we propose an Inconsistent Pseudo Annotations to Latent Truth (IPA2LT) framework to train a FER model from
multiple inconsistently labeled datasets and large scale unlabeled data.
In IPA2LT, we assign each sample more than one label, using human annotations or model predictions. Then, we propose an end-to-end LTNet
with a scheme of discovering the latent truth from the inconsistent pseudo
labels and the input face images. To our knowledge, IPA2LT is the
first work to address the problem of training with inconsistently labeled FER
datasets. Experiments on synthetic data validate the effectiveness of the
proposed method in learning from inconsistent labels. We also conduct
extensive experiments in FER and show that our method outperforms
other state-of-the-art and alternative methods under a rigorous evaluation
protocol involving 7 FER datasets.
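
The step of assigning each sample more than one label can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `assign_pseudo_annotations`, the dictionary layout, and the toy predictors are hypothetical, and the sketch assumes one predictor per labeled dataset, with a sample keeping its human annotation for its own dataset and receiving model predictions for the others.

```python
from typing import Callable, Dict, Optional, Sequence

# A predictor maps a face image (here just a feature vector) to a class index.
# In practice this would be a FER model trained on one labeled dataset (assumption).
Predictor = Callable[[Sequence[float]], int]


def assign_pseudo_annotations(
    sample: Sequence[float],
    human_label: Optional[int],
    source: Optional[str],
    predictors: Dict[str, Predictor],
) -> Dict[str, int]:
    """Return one label per annotator (hypothetical helper).

    The annotator matching the sample's source dataset contributes the
    human annotation; every other annotator contributes a model prediction.
    Unlabeled samples (human_label is None) get predictions only.
    """
    labels: Dict[str, int] = {}
    for name, predict in predictors.items():
        if name == source and human_label is not None:
            labels[name] = human_label      # keep the human annotation
        else:
            labels[name] = predict(sample)  # fill in a model prediction
    return labels


# Toy predictors standing in for models trained on datasets "A" and "B".
toy_predictors: Dict[str, Predictor] = {
    "A": lambda x: 0,
    "B": lambda x: 1,
}

labeled = assign_pseudo_annotations([0.1, 0.2], 3, "A", toy_predictors)
unlabeled = assign_pseudo_annotations([0.3, 0.4], None, None, toy_predictors)
print(labeled)    # labels for a sample from dataset "A"
print(unlabeled)  # labels for an unlabeled sample
```

The resulting label sets may disagree with each other; estimating a single latent truth from them is the role of the LTNet described above and is not covered by this sketch.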