Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering

2020-02-18

Abstract

Accurately answering a question about a given image requires combining observations with general knowledge. While this is effortless for humans, reasoning with general knowledge remains an algorithmic challenge. To advance research in this direction, a novel ‘fact-based’ visual question answering (FVQA) task was recently introduced, along with a large set of curated facts, each of which links two entities (i.e., two possible answers) via a relation. Given a question-image pair, deep network techniques have been employed to successively reduce the large set of facts until one of the two entities of the final remaining fact is predicted as the answer. We observe that such a successive process, which considers one fact at a time to form a local decision, is sub-optimal. Instead, we develop an entity graph and use a graph convolutional network to ‘reason’ about the correct answer by jointly considering all entities. We show on the challenging FVQA dataset that this leads to an improvement in accuracy of around 7% compared to the state of the art.
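
To make the idea of jointly scoring all entities concrete, the following is a minimal sketch of a two-layer graph convolution over a small entity graph. It is not the authors' architecture: the propagation rule (symmetric normalization with self-loops), the function names, the feature vectors, and the weight matrices are all assumptions chosen for illustration, and in practice the node features would encode the image, the question, and the entity itself, with weights learned from data.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so every entity keeps its own features when aggregating.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(norm_adj, features, weights, relu=True):
    """One graph convolution: aggregate neighbor features, then project."""
    out = matmul(matmul(norm_adj, features), weights)
    if relu:
        out = [[max(0.0, v) for v in row] for row in out]
    return out

def predict_answer(adj, features, w1, w2):
    """Score every entity jointly and return (index of best entity, scores).

    w2 is assumed (hypothetically) to project each node to a single score.
    """
    norm_adj = normalize_adjacency(adj)
    hidden = gcn_layer(norm_adj, features, w1)
    scores = [row[0] for row in gcn_layer(norm_adj, hidden, w2, relu=False)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy example: three entities, where entity 1 is linked by facts to 0 and 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-d node features
w1 = [[1.0, 0.0], [0.0, 1.0]]                    # made-up layer-1 weights
w2 = [[1.0], [-1.0]]                             # made-up scoring weights
best, scores = predict_answer(adj, features, w1, w2)
print(best, scores)
```

The point of the sketch is the contrast drawn in the abstract: each entity's score depends on the features of its neighbors in the graph, so the answer is chosen by comparing all entities at once, not by deciding on one fact at a time.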
