Abstract
Thanks to the success of deep learning, cross-modal retrieval has recently made significant progress. However, a crucial bottleneck remains: how to bridge the modality gap to further improve retrieval accuracy. In this paper, we propose a self-supervised adversarial hashing (SSAH) approach, which is among the early attempts to incorporate adversarial learning into cross-modal hashing in a self-supervised fashion. The primary contribution of this work is that two adversarial networks are leveraged to maximize the semantic correlation and consistency of the representations across different modalities. In addition, we harness a self-supervised semantic network to discover high-level semantic information in the form of multi-label annotations. Such information guides the feature learning process and preserves the modality relationships in both the common semantic space and the Hamming space. Extensive experiments carried out on three benchmark datasets validate that the proposed SSAH surpasses state-of-the-art methods.