Abstract
In this paper we investigate the problem of exploiting multiple sources of information for object recognition tasks when additional modalities that are not present in the labeled training set become available at inference time. This scenario is common in many robotics sensing applications and contrasts with the assumption made by existing approaches, which require at least some labeled examples for each modality. To leverage the previously unseen features, we use the unlabeled data to learn a mapping from the existing modalities to the new ones. This allows us to predict the missing data for the labeled examples and exploit all modalities using multiple kernel learning. We demonstrate the effectiveness of our approach on several multi-modal tasks, including object recognition from multi-resolution imagery, from grayscale and color images, and from images and text. Our approach outperforms multiple kernel learning on the original modalities, as well as nearest-neighbor and bootstrapping schemes.
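The pipeline the abstract describes can be illustrated with a minimal sketch: learn a regressor from the old modality to the new one on unlabeled data, hallucinate the missing modality for the labeled set, then classify with a combination of per-modality kernels. All data, shapes, and the choice of ridge regression here are illustrative assumptions; a uniform kernel average stands in for the learned multiple-kernel-learning weights used in the paper.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical data: X_l is the original modality for labeled examples
# (with labels y); X_u and Z_u are unlabeled examples observed in both
# the original and the new modality.
rng = np.random.default_rng(0)
X_l, y = rng.normal(size=(50, 16)), rng.integers(0, 2, 50)
X_u = rng.normal(size=(200, 16))
Z_u = X_u @ rng.normal(size=(16, 8))  # stand-in for the new modality

# Step 1: learn a mapping from the original modality to the new one
# on the unlabeled data (ridge regression as one simple choice).
mapping = Ridge(alpha=1.0).fit(X_u, Z_u)

# Step 2: predict the missing modality for the labeled examples.
Z_l = mapping.predict(X_l)

# Step 3: combine kernels over both modalities. The paper learns the
# kernel weights via multiple kernel learning; a uniform average is a
# simplified stand-in here.
K = 0.5 * rbf_kernel(X_l) + 0.5 * rbf_kernel(Z_l)
clf = SVC(kernel="precomputed").fit(K, y)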