E2-Capsnet takes a facial image as input and extracts rich feature maps with the enhancement module. The feature maps are then fed to the capsule layers for encoding, and three fully connected layers decode the capsule outputs. Finally, the facial expression recognition result is obtained via the squashing function. Our E2-Capsnet is trained end-to-end.
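As a point of reference, the squashing function mentioned above is the standard capsule non-linearity from the capsule-network literature: it preserves a capsule vector's direction while compressing its length into (0, 1), so the length can be read as a class probability. The sketch below is a minimal NumPy illustration of that step, not the authors' implementation; the shapes (7 expression classes, 16-dimensional capsules) are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squashing non-linearity: short vectors shrink toward 0,
    # long vectors approach unit length; direction is preserved.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

# Each class capsule's output length is read as the probability
# that the corresponding expression is present.
np.random.seed(0)
caps = np.random.randn(7, 16)       # 7 expression classes, 16-D capsules (illustrative)
v = squash(caps)
probs = np.linalg.norm(v, axis=-1)  # lengths lie in (0, 1)
pred = int(np.argmax(probs))        # predicted expression class
```

The recognition result is then simply the class whose capsule has the greatest output length.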
Attention map
The classification results of VGG16, Capsnet, RCCnet, and the proposed method on RAF-DB are visualized.
Comparisons with other methods
Our E2-Capsnet can achieve more discriminative and effective representations than the other methods.