Abstract
Cross-media retrieval is a research hotspot in the multimedia area, which aims to perform retrieval across different media types such as image and text. The performance of existing methods usually relies on labeled data for model training. However, cross-media data is very labor-consuming to collect and label, so transferring valuable knowledge from existing data to new data is a key problem for practical application.
application. For achieving the goal, this paper proposes
deep cross-media knowledge transfer (DCKT) approach,
which transfers knowledge from a large-scale cross-media
dataset to promote the model training on another smallscale cross-media dataset. The main contributions of DCKT
(1) A two-level transfer architecture is proposed to jointly minimize the media-level and correlation-level domain discrepancies, which allows two important and complementary aspects of knowledge to be transferred: intra-media semantic knowledge and inter-media correlation knowledge. This enriches the training information and boosts retrieval accuracy.
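As a rough illustration (not the paper's exact formulation), the sketch below shows how a task loss can be combined with media-level and correlation-level discrepancy terms in one objective; the MMD-style discrepancy, the loss weights, and all tensor names are assumptions made here for concreteness.

```python
# Illustrative sketch only, not the paper's exact formulation: a joint
# objective with media-level and correlation-level discrepancy terms.
# The MMD-style discrepancy, loss weights, and argument names are assumptions.
import torch

def mmd(x, y):
    # Linear-kernel MMD approximation: squared distance between the
    # mean feature vectors of two batches.
    return (x.mean(dim=0) - y.mean(dim=0)).pow(2).sum()

def two_level_transfer_loss(task_loss,
                            src_img, tgt_img,        # image features, source/target
                            src_txt, tgt_txt,        # text features, source/target
                            src_common, tgt_common,  # common-space (correlation) features
                            lambda_media=1.0, lambda_corr=1.0):
    # Media-level discrepancy: align each media type across domains.
    media_level = mmd(src_img, tgt_img) + mmd(src_txt, tgt_txt)
    # Correlation-level discrepancy: align the shared representations
    # that encode image-text correlation across domains.
    corr_level = mmd(src_common, tgt_common)
    return task_loss + lambda_media * media_level + lambda_corr * corr_level
```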
(2) A progressive transfer mechanism is proposed to iteratively select training samples with ascending transfer difficulty, via a metric of cross-media domain consistency with adaptive feedback. This drives the transfer process to gradually reduce the vast cross-media domain discrepancy, thus enhancing the robustness of model training.
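The progressive mechanism can be pictured as a curriculum-style selection loop over the target data, sketched below; the consistency score and the round-based selection schedule are placeholder assumptions rather than the paper's actual metric.

```python
# Illustrative sketch only: progressive (curriculum-style) sample selection
# with ascending transfer difficulty. The consistency score and the round
# schedule are placeholder assumptions, not the paper's actual metric.
def progressive_transfer(target_samples, consistency_score, train_step,
                         num_rounds=5):
    """target_samples: list of target-domain image-text training pairs.
    consistency_score: callable giving higher scores to samples whose
    cross-media correlation agrees better with the source domain under
    the current model (adaptive feedback).
    train_step: callable that updates the model on one sample."""
    for r in range(1, num_rounds + 1):
        # Re-rank every round so the difficulty estimate tracks the
        # current model; easiest (most consistent) samples come first.
        ranked = sorted(target_samples, key=consistency_score, reverse=True)
        selected = ranked[: max(1, len(ranked) * r // num_rounds)]
        for sample in selected:
            train_step(sample)
```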
To verify the effectiveness of DCKT, we take the large-scale dataset XMediaNet as the source domain and three widely-used datasets as target domains for cross-media retrieval. Experimental results show that DCKT achieves promising improvements in retrieval accuracy.