
Where To Look: Focus Regions for Visual Question Answering


Abstract

We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method maps textual queries and visual features from various regions into a shared space where they are compared for relevance with an inner product. Our method exhibits significant improvements in answering questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions. Our model is tested on the recently released VQA [1] dataset, which features free-form human-annotated questions and answers.
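The core mechanism in the abstract, projecting the question and the per-region visual features into a shared space and scoring regions by inner product, can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact architecture: the layer sizes, the softmax-weighted pooling, and all names (FocusRegionScorer, text_proj, vis_proj) are assumptions introduced here for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocusRegionScorer(nn.Module):
    # Hypothetical module illustrating inner-product region relevance.
    def __init__(self, text_dim=300, vis_dim=2048, shared_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)  # query -> shared space
        self.vis_proj = nn.Linear(vis_dim, shared_dim)    # regions -> shared space

    def forward(self, question_emb, region_feats):
        # question_emb: (batch, text_dim); region_feats: (batch, n_regions, vis_dim)
        q = self.text_proj(question_emb)                  # (batch, shared_dim)
        r = self.vis_proj(region_feats)                   # (batch, n_regions, shared_dim)
        scores = torch.bmm(r, q.unsqueeze(2)).squeeze(2)  # inner product per region
        weights = F.softmax(scores, dim=1)                # relevance over regions
        attended = (weights.unsqueeze(2) * r).sum(dim=1)  # weighted region summary
        return weights, attended

# Usage: score 36 candidate regions for a batch of 4 questions.
model = FocusRegionScorer()
q = torch.randn(4, 300)
regions = torch.randn(4, 36, 2048)
weights, attended = model(q, regions)
print(weights.shape, attended.shape)  # torch.Size([4, 36]) torch.Size([4, 512])
```

The softmax over region scores makes the relevance weights comparable across regions, so questions like "what color" can concentrate mass on a single location while questions like "what room" can spread it over several informative regions.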
