Yin and Yang: Balancing and Answering Binary Visual Questions

2019-12-26

Abstract

Abstract scenes play two roles: (1) they allow us to focus on the high-level semantics of the VQA task as opposed to the low-level recognition problems, and, perhaps more importantly, (2) they provide us with the modality to balance the dataset such that language priors are controlled and the role of vision is essential. In particular, we collect fine-grained pairs of scenes for every question, such that the answer to the exact same question is “yes” for one scene and “no” for the other. Indeed, language priors alone do not perform better than chance on our balanced dataset. Moreover, our proposed approach matches the performance of a state-of-the-art VQA approach on the unbalanced dataset, and outperforms it on the balanced dataset.
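The balancing argument can be illustrated with a toy sketch. The code below is not the paper's method or data: the questions, scene names, and the `prior_accuracy` helper are all hypothetical, and the "language prior" is simply a predictor that answers each question with its most frequent answer while ignoring the scene. It suggests why pairing every question with one "yes" scene and one "no" scene would leave such a predictor at chance.

```python
from collections import Counter, defaultdict


def prior_accuracy(dataset):
    """Accuracy of a scene-blind predictor on (question, scene, answer) triples.

    The predictor answers each question with that question's most common
    answer in the dataset, so it models a pure language prior (toy setup).
    """
    counts = defaultdict(Counter)
    for question, _scene, answer in dataset:
        counts[question][answer] += 1

    correct = 0
    for question, _scene, answer in dataset:
        guess = counts[question].most_common(1)[0][0]
        correct += guess == answer
    return correct / len(dataset)


# Hypothetical unbalanced data: the question alone mostly gives away the answer.
unbalanced = [
    ("Is the dog sleeping?", "scene_a", "yes"),
    ("Is the dog sleeping?", "scene_b", "yes"),
    ("Is the dog sleeping?", "scene_c", "yes"),
    ("Is the dog sleeping?", "scene_d", "no"),
]

# Hypothetical balanced data: each question has one "yes" scene and one "no" scene.
balanced = [
    ("Is the dog sleeping?", "scene_a", "yes"),
    ("Is the dog sleeping?", "scene_a_complement", "no"),
    ("Is the boy holding a ball?", "scene_b", "yes"),
    ("Is the boy holding a ball?", "scene_b_complement", "no"),
]

print(prior_accuracy(unbalanced))  # 0.75
print(prior_accuracy(balanced))    # 0.5
```

In this toy setting the scene-blind predictor drops from 0.75 to 0.5 once the pairs are balanced, so any accuracy above chance has to come from looking at the scene.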
