DeepCU: Integrating both Common and Unique Latent Information for
Multimodal Sentiment Analysis
Abstract
Multimodal sentiment analysis combines information from visual, textual, and acoustic representations for sentiment prediction. Recent multimodal fusion schemes combine multiple modalities as a tensor and obtain either the common information, by utilizing neural networks, or the unique information, by modeling a low-rank representation of the tensor. However, both types of information are essential, as they capture the inter-modal and intra-modal relationships in the data. In this research, we first propose a novel deep architecture to extract the common information from the multi-mode representations. Furthermore, we propose unique networks to obtain the modality-specific information that enhances the generalization performance of our multimodal system. Finally, we integrate these two aspects of information via a fusion layer and propose a novel multimodal data fusion architecture, which we call DeepCU (Deep network with both Common and Unique latent information). The proposed DeepCU consolidates the two networks for joint utilization and discovery of all-important latent information. Comprehensive experiments on multiple real-world datasets demonstrate the effectiveness of utilizing both the common and unique information discovered by DeepCU. The source code of the proposed DeepCU is available at https://github.com/sverma88/DeepCU-IJCAI19
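The abstract describes a three-part design: a common network over the fused multi-mode representation, per-modality unique networks, and a fusion layer that integrates the two. As a rough illustration only, the PyTorch sketch below shows one way such a structure could be wired together; the outer-product fusion, all layer sizes, and the module names are assumptions for exposition, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class DeepCUSketch(nn.Module):
    """Hedged sketch of a DeepCU-style model (not the authors' code).

    Common network: fuses the three modalities (here via an outer-product
    tensor, flattened and passed through an MLP) to model inter-modal
    interactions. Unique networks: one small MLP per modality capturing
    modality-specific (intra-modal) information. A fusion layer combines
    both aspects for sentiment prediction.
    """

    def __init__(self, d_v=32, d_t=64, d_a=16, d_hid=128):
        super().__init__()
        # Common network over the flattened fusion tensor (hypothetical sizes)
        self.common = nn.Sequential(
            nn.Linear(d_v * d_t * d_a, d_hid), nn.ReLU(),
            nn.Linear(d_hid, d_hid), nn.ReLU(),
        )
        # One unique network per modality
        self.unique_v = nn.Sequential(nn.Linear(d_v, d_hid), nn.ReLU())
        self.unique_t = nn.Sequential(nn.Linear(d_t, d_hid), nn.ReLU())
        self.unique_a = nn.Sequential(nn.Linear(d_a, d_hid), nn.ReLU())
        # Fusion layer integrating common and unique information
        # (a single regression-style sentiment score per sample)
        self.fusion = nn.Linear(4 * d_hid, 1)

    def forward(self, v, t, a):
        # Outer product across modalities -> 3-way fusion tensor per sample
        tensor = torch.einsum('bi,bj,bk->bijk', v, t, a)
        c = self.common(tensor.flatten(start_dim=1))
        u = torch.cat(
            [self.unique_v(v), self.unique_t(t), self.unique_a(a)], dim=1
        )
        return self.fusion(torch.cat([c, u], dim=1))

# Usage with random features standing in for real modality embeddings
model = DeepCUSketch()
v, t, a = torch.randn(8, 32), torch.randn(8, 64), torch.randn(8, 16)
print(model(v, t, a).shape)  # torch.Size([8, 1])
```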