Abstract. Multi-modal learning refers to the process of learning an accurate model of the joint representation of different modalities. Despite its promise for multi-modal learning, the co-regularization method relies on the consistency principle under a sufficiency assumption, which usually does not hold for real-world multi-modal data. Indeed, owing to modal insufficiency in real-world applications, there are divergences among heterogeneous modalities, which poses a critical challenge for multi-modal learning. To this end, in this paper we propose a novel Comprehensive Multi-Modal Learning (CMML) framework, which strikes a balance between the consistency and divergence of modalities by accounting for insufficiency in one unified framework. Specifically, we utilize an instance-level attention mechanism to weight the sufficiency of each instance on different modalities. Moreover, novel diversity regularization and robust consistency metrics are designed for discovering insufficient modalities. Our empirical studies show the superior performance of CMML on real-world data in terms of various criteria.
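To make the idea of instance-level attention over modalities concrete, the following is a minimal sketch, not the paper's actual CMML implementation: each instance's features from each modality are scored, the scores are normalized with a softmax across modalities, and the per-instance weights are used to fuse the modalities. The function name `instance_level_attention` and the parameters `W` and `v` are hypothetical illustrations.

```python
import numpy as np


def instance_level_attention(modality_feats, W, v):
    """Fuse modalities with per-instance attention weights (illustrative sketch).

    modality_feats: list of M arrays, each of shape (n_instances, d)
    W: (d, h) projection matrix; v: (h,) scoring vector -- hypothetical parameters.
    Returns fused features of shape (n, d) and attention weights of shape (n, M).
    """
    # Score modality m for instance i: s_im = v . tanh(x_im @ W)
    scores = np.stack([np.tanh(x @ W) @ v for x in modality_feats], axis=1)  # (n, M)
    # Softmax across modalities yields per-instance "sufficiency" weights
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)  # rows sum to 1
    # Weighted sum of modality features gives the fused representation
    stacked = np.stack(modality_feats, axis=1)          # (n, M, d)
    fused = (weights[..., None] * stacked).sum(axis=1)  # (n, d)
    return fused, weights
```

In a learned model, `W` and `v` would be trained jointly with the downstream objective so that an instance whose features are insufficient in one modality receives a lower weight for that modality.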