Densely Connected Attention Flow for Visual Question Answering

2019-10-08

Abstract: Learning effective interactions between multimodal features is at the heart of visual question answering (VQA). A common defect of existing VQA approaches is that they consider only a very limited number of interactions, which may not be enough to model the latent, complex image-question relations needed to answer questions accurately. Therefore, in this paper, we propose a novel DCAF (Densely Connected Attention Flow) framework for modeling dense interactions. It densely connects all pairwise layers of the network via Attention Connectors, capturing fine-grained interplay between image and question across all hierarchical levels. The proposed Attention Connector efficiently connects the multimodal features at any two layers with symmetric co-attention and produces interaction-aware attention features. Experimental results on three publicly available datasets show that the proposed method achieves state-of-the-art performance.
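To make the idea of connecting every pair of layers through symmetric co-attention more concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the paper's formulation: the abstract does not specify the attention equations, so the scaled dot-product affinity, the mean pooling, and the names `co_attention` and `dense_attention_flow` are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def co_attention(img, qst):
    """Symmetric co-attention between one image layer and one question layer.

    img: (n_regions, d) image features; qst: (n_words, d) question features.
    Assumption: a scaled dot-product affinity shared by both directions.
    """
    affinity = img @ qst.T / np.sqrt(img.shape[1])      # (n_regions, n_words)
    # Each image region attends over the question words.
    img_attended = softmax(affinity, axis=1) @ qst      # (n_regions, d)
    # Each question word attends over the image regions.
    qst_attended = softmax(affinity, axis=0).T @ img    # (n_words, d)
    return img_attended, qst_attended


def dense_attention_flow(img_layers, qst_layers):
    """Apply co-attention to every (image layer, question layer) pair.

    Assumption: each attended feature map is mean-pooled to a vector and all
    pooled vectors are concatenated into one interaction-aware feature.
    """
    pooled = []
    for img in img_layers:
        for qst in qst_layers:
            img_att, qst_att = co_attention(img, qst)
            pooled.append(img_att.mean(axis=0))
            pooled.append(qst_att.mean(axis=0))
    return np.concatenate(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 2 image layers of 5 regions, 3 question layers of 4 words.
    image_layers = [rng.normal(size=(5, 8)) for _ in range(2)]
    question_layers = [rng.normal(size=(4, 8)) for _ in range(3)]
    features = dense_attention_flow(image_layers, question_layers)
    print(features.shape)
```

With 2 image layers and 3 question layers there are 6 layer pairs, and each pair contributes two pooled vectors, so the number of connections grows with the product of the layer counts. That growth is the sense in which the connections are dense.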
