Abstract
Due to the availability of large amounts of multimedia data,
cross-modal matching is gaining increasing importance.
Hashing-based techniques provide an attractive solution to this problem when the data size is large. Different scenarios of cross-modal matching are possible: for example, data from the different modalities may be associated with a single label or with multiple labels, and may or may not have one-to-one correspondence. Most of the existing
approaches have been developed for the case where there
is one-to-one correspondence between the data of the two
modalities. In this paper, we propose a simple yet effective generalized hashing framework that works for all the different scenarios while preserving the semantic distance between the data points. The approach first learns the optimal hash codes for the two modalities simultaneously, so as to preserve the semantic similarity between the data points, and then learns the hash functions that map from the
features to the hash codes. Extensive experiments on a single-label dataset (Wiki) and multi-label datasets (NUS-WIDE, Pascal, and LabelMe) under all the different scenarios, along with comparisons with the state-of-the-art, show the effectiveness of the proposed approach.