资源论文Multi-Level Visual-Semantic Alignments with Relation-Wise Dual Attention Network for Image and Text Matching

Multi-Level Visual-Semantic Alignments with Relation-Wise Dual Attention Network for Image and Text Matching

2019-10-08 | |  41 |   31 |   0

Abstract Image-text matching is central to visual-semantic cross-modal retrieval and has been attracting extensive attention recently. Previous studies have been devoted to fifinding the latent correspondence between image regions and words, e.g., connecting key words to specifific regions of salient objects. However, existing methods are usually committed to handle concrete objects, rather than abstract ones, e.g., a description of some action, which in fact are also ubiquitous in description texts of realworld. The main challenge in dealing with abstract objects is that there is no explicit connections between them, unlike their concrete counterparts. One therefore has to alternatively fifind the implicit and intrinsic connections between them. In this paper, we propose a relation-wise dual attention network (RDAN) for image-text matching. Specififi- cally, we maintain an over-complete set that contains pairs of regions and words. Then built upon this set, we encode the local correlations and the global dependencies between regions and words by training a visual-semantic network. Then a dual pathway attention network is presented to infer the visual-semantic alignments and image-text similarity. Extensive experiments validate the effificacy of our method, by achieving the state-of-the-art performance on several public benchmark datasets

上一篇:Multi-agent Attentional Activity Recognition

下一篇:Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...