Counting component for VQA

This is the official PyTorch implementation of our ICLR 2018 paper Learning to Count Objects in Natural Images for Visual Question Answering. In this paper, we introduce a counting component that allows VQA models to count objects from an attention map, achieving state-of-the-art results on the number category of VQA v2.
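To give a rough intuition for why counting from an attention map needs special care, here is a minimal sketch in plain Python. It is not the component in counting.py; the attention logits and helper names are made up for illustration. It shows that softmax-normalized attention weights always sum to 1 regardless of how many objects are attended to, whereas independently sigmoided weights keep a rough count.

```python
import math


def softmax(scores):
    """Normalize scores so that the resulting weights sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Squash a single score into (0, 1) independently of the others."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical attention logits over 4 object proposals (illustrative values).
one_object = [8.0, -8.0, -8.0, -8.0]   # one relevant proposal
two_objects = [8.0, 8.0, -8.0, -8.0]   # two relevant proposals

# Softmax attention: the total mass is 1 in both cases, so the count is lost.
softmax_sum_one = sum(softmax(one_object))
softmax_sum_two = sum(softmax(two_objects))

# Sigmoid attention: the total mass roughly tracks the number of objects.
sigmoid_sum_one = sum(sigmoid(s) for s in one_object)
sigmoid_sum_two = sum(sigmoid(s) for s in two_objects)

print(softmax_sum_one, softmax_sum_two)
print(sigmoid_sum_one, sigmoid_sum_two)
```

Note that simply summing sigmoided weights is only a naive baseline: it would, for example, count the same object twice when two overlapping proposals both receive high attention.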

The core module is fully contained in counting.py. If you want to use the counting component, that is the only file that you need.

See the READMEs in the vqa-v2 directory (for VQA v2) and the toy directory (for our toy dataset) for more specific information on how to train and evaluate on these datasets.

Single-model results on VQA v2 test-std split

At the time of writing, our accuracy on number questions is state-of-the-art for both single and ensemble models. The accuracy on the overall category is, as far as we know, the second best among single models (see MFH), though our approach is complementary to theirs.

Yes/No	Number	Other	All
83.56	51.39	59.11	68.41

UPDATE: With this year's VQA Challenge, our number results are no longer state-of-the-art. However, Bilinear Attention Networks use this counting component with their improved attention model and reach 54.04% on the number category, which is the new state of the art for that category. This supports our claim that a better attention model should lead to further improvements in counting through our counting module.

BibTeX entry

@InProceedings{zhang2018vqacount,
  author    = {Yan Zhang and Jonathon Hare and Adam Pr{\"u}gel-Bennett},
  title     = {Learning to Count Objects in Natural Images for Visual Question Answering},
  booktitle = {International Conference on Learning Representations},
  year      = {2018},
  eprint    = {1802.05766},
  url       = {https://openreview.net/forum?id=B12Js_yRb},
}
