Multimodal Compact Bilinear (MCB) pooling, the state-of-the-art model for visual question answering at the time of publication (2016), is described in the following paper:
<br/>@article{fukui16mcb,<br/> title={Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding},<br/> author={Fukui, Akira and Park, Dong Huk and Yang, Daylen and Rohrbach, Anna and Darrell, Trevor and Rohrbach, Marcus},<br/> journal={arXiv:1606.01847},<br/> year={2016},<br/>}<br/>
[[arXiv](https://arxiv.org/abs/1606.01847)] [[GitHub repo](https://github.com/akirafukui/vqa-mcb/)]
No link
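The operation named in the paper's title can be illustrated with a minimal NumPy sketch. This is not the authors' code (their implementation is in the linked GitHub repository). It assumes the standard formulation of compact bilinear pooling: each input vector is projected to `d` dimensions with a Count Sketch, and the two sketches are combined by circular convolution, computed as an elementwise product in the frequency domain. The function names `count_sketch` and `mcb`, the toy sizes, and the random stand-in features are all hypothetical.

```python
import numpy as np


def count_sketch(v, h, s, d):
    """Project v (length n) to length d: y[h[i]] += s[i] * v[i]."""
    y = np.zeros(d)
    np.add.at(y, h, s * v)  # unbuffered add, so colliding indices accumulate
    return y


def mcb(x, q, h1, s1, h2, s2, d):
    """Compact bilinear pooling of x and q (hypothetical helper).

    The circular convolution of the two sketches equals a Count Sketch of
    the outer product x q^T, using index (h1[i] + h2[j]) mod d and sign
    s1[i] * s2[j], without ever materializing the outer product.
    """
    fx = np.fft.rfft(count_sketch(x, h1, s1, d))
    fq = np.fft.rfft(count_sketch(q, h2, s2, d))
    return np.fft.irfft(fx * fq, n=d)  # back to a length-d real vector


# Toy usage with hypothetical sizes and random stand-in features.
rng = np.random.default_rng(0)
n1, n2, d = 8, 6, 16
x = rng.standard_normal(n1)  # stand-in for image features
q = rng.standard_normal(n2)  # stand-in for question features
h1 = rng.integers(0, d, n1)          # fixed random hash indices
s1 = rng.choice([-1.0, 1.0], n1)     # fixed random signs
h2 = rng.integers(0, d, n2)
s2 = rng.choice([-1.0, 1.0], n2)
z = mcb(x, q, h1, s1, h2, s2, d)
```

The FFT route costs O(n + d log d) per pair instead of the O(n1 * n2) needed to form the full outer product, which is what makes bilinear pooling practical for high-dimensional image and question features.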