Abstract
Multimodal classification arises in many computer vision tasks such as object classification and image retrieval.
The idea is to utilize multiple sources (modalities) measuring the same instance to improve the overall performance
compared to using a single source (modality). The varying characteristics exhibited by multiple modalities make it
necessary to simultaneously learn the corresponding metrics. In this paper, we propose a multiple metrics learning algorithm for multimodal data. The metric of each modality
is a product of two matrices: one matrix is modality specific, and the other is constrained to be shared by all modalities.
The learned metrics can improve multimodal classification
accuracy, and experimental results on four datasets show
that the proposed algorithm outperforms existing learning
algorithms based on multiple metrics as well as other approaches tested on these datasets. Specifically, we report
95.0% object instance recognition accuracy and 89.2% object
category recognition accuracy on the multi-view RGB-D
dataset, and 52.3% scene category recognition accuracy on
the SUN RGB-D dataset.
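
The factored structure described above (a modality-specific matrix combined with a matrix shared by all modalities) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the modality names, the dimensions, the order of the product, the use of the product as a linear map inside a squared Euclidean distance, and the summation across modalities. The matrices are random, so the sketch shows only the shape of the parameterization, not the learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions for two modalities (e.g. RGB and depth).
dims = {"rgb": 6, "depth": 4}
common_dim = 3  # assumed size of the space the shared matrix acts on
out_dim = 2     # assumed output dimension of the shared matrix

# One modality-specific matrix per modality, plus one matrix shared by all.
# They are random placeholders here; in practice they would be learned.
specific = {m: rng.standard_normal((common_dim, d)) for m, d in dims.items()}
shared = rng.standard_normal((out_dim, common_dim))


def transform(modality):
    """Product of the shared matrix and the modality-specific matrix."""
    return shared @ specific[modality]


def squared_distance(modality, x, y):
    """Squared Euclidean distance between x and y after the linear map."""
    diff = transform(modality) @ (x - y)
    return float(diff @ diff)


def multimodal_distance(a, b):
    """Sum of per-modality distances (an assumed fusion rule)."""
    return sum(squared_distance(m, a[m], b[m]) for m in dims)
```

Under these assumptions, two instances are each represented as a dictionary mapping a modality name to a feature vector, and `multimodal_distance` compares them using a different linear map per modality while reusing the same shared factor.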