visual7w


2020-02-06


We have seen great progress in basic perceptual tasks such as object recognition and detection. However, AI models still fail to match humans in high-level vision tasks due to the lack of capacities for deeper reasoning. Recently the new task of visual question answering (QA) has been proposed to evaluate a model's capacity for deep image understanding. Previous works have established a loose, global association between QA sentences and images. However, many questions and answers, in practice, relate to local regions in the images. We establish a semantic link between textual descriptions and image regions by object-level grounding. It enables a new type of QA with visual answers, in addition to textual answers used in previous work. We study the visual QA tasks in a grounded setting with a large collection of 7W multiple-choice QA pairs. Furthermore, we evaluate human performance and several baseline models on the QA tasks. Finally, we propose a novel LSTM model with spatial attention to tackle the 7W QA tasks.


