Feature Enhancement in Attention for Visual Question Answering


2019-11-06
Abstract: The attention mechanism has become an indispensable part of Visual Question Answering (VQA) models because of its ability to selectively focus on image regions and/or question words. However, the attention mechanism in almost all VQA models takes as input image visual features and question textual features, which stem from different sources and between which there is an essential semantic gap. To further improve the accuracy of the region–question correlation in attention, we focus on the region representation and propose feature enhancement, which comprises three aspects: (1) we leverage a region semantic representation that is more consistent with the question representation; (2) we enrich the region representation with features from multiple hierarchies; and (3) we refine the semantic representation to carry richer information. With these three incremental feature enhancement mechanisms, we improve the region representation and achieve better attention and VQA performance. We conduct extensive experiments on the large-scale VQA v2.0 benchmark dataset, achieve competitive results without additional training data, and demonstrate the effectiveness of the proposed feature-enhanced attention through visualizations.
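To make the idea concrete, the following is a minimal sketch (not the authors' exact model) of question-guided attention over enhanced region features: each region's visual feature is concatenated with a semantic feature (e.g. a label embedding), both modalities are projected into a shared space, and softmax-normalized relevance scores pool the regions. All dimensions and weight matrices here are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(region_visual, region_semantic, question, w_r, w_q):
    """Question-guided attention over feature-enhanced regions.

    region_visual:   (K, Dv) visual features for K image regions
    region_semantic: (K, Ds) semantic features (e.g. label embeddings)
    question:        (Dq,)   question encoding
    w_r:             (Dv+Ds, H) projection for the enhanced region features
    w_q:             (Dq, H)    projection for the question
    """
    # Feature enhancement: concatenate visual and semantic region features.
    regions = np.concatenate([region_visual, region_semantic], axis=1)  # (K, Dv+Ds)
    # Relevance of each region to the question in the shared H-dim space.
    scores = (regions @ w_r) @ (question @ w_q)   # (K,)
    alpha = softmax(scores)                       # attention weights, sum to 1
    return alpha, alpha @ regions                 # weights and attended feature
```

In practice the semantic branch would come from region labels or captions encoded with the same embedding space as the question, which is what narrows the visual-textual gap the abstract describes.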

