Abstract
In this paper, we address the task of natural language object retrieval: localizing a target object within a given image based on a natural language query describing the object. Natural language object retrieval differs from text-based image retrieval in that it involves spatial information about objects within the scene as well as global scene context. To address these challenges, we propose a novel Spatial Context Recurrent ConvNet (SCRC) model as a scoring function on candidate boxes for object retrieval, integrating spatial configurations and global scene-level contextual information into the network. Our model processes query text, local image descriptors, spatial configurations, and global context features through a recurrent network, outputs the probability of the query text conditioned on each candidate box as the box's score, and can transfer visual-linguistic knowledge from the image captioning domain to our task. Experimental results demonstrate that our method effectively utilizes both local and global information, significantly outperforming previous baseline methods on different datasets and scenarios, and can exploit large-scale vision and language datasets for knowledge transfer.
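The scoring idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the feature dimensions, the 8-dimensional spatial layout, the random weights, and all names below are assumptions chosen for illustration. It only shows the data flow, in which a recurrent model assigns each candidate box the log-probability of the query conditioned on that box's local, spatial, and global features, and the highest-scoring box is returned.

```python
import numpy as np

def spatial_config(box, img_w, img_h):
    """Assumed 8-dim spatial feature for a box (x1, y1, x2, y2):
    normalized corners, center, and size."""
    x1, y1, x2, y2 = box
    return np.array([x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h,
                     (x1 + x2) / (2 * img_w), (y1 + y2) / (2 * img_h),
                     (x2 - x1) / img_w, (y2 - y1) / img_h])

class ToyRecurrentScorer:
    """Toy stand-in for an SCRC-style scoring function: a vanilla RNN
    language model whose initial state is conditioned on box features.
    Weights are random, so the scores themselves are meaningless."""

    def __init__(self, vocab, feat_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.vocab = {w: i for i, w in enumerate(vocab)}
        v = len(vocab)
        self.emb = rng.normal(0, 0.1, (v, hidden))        # word embeddings
        self.W_h = rng.normal(0, 0.1, (hidden, hidden))   # recurrent weights
        self.W_x = rng.normal(0, 0.1, (hidden, hidden))   # input weights
        self.W_f = rng.normal(0, 0.1, (hidden, feat_dim)) # box-feature conditioning
        self.W_o = rng.normal(0, 0.1, (v, hidden))        # output projection

    def log_prob(self, query_tokens, feat):
        """log P(query | box features): sum of per-token log-probs."""
        h = np.tanh(self.W_f @ feat)  # initialize hidden state from box features
        lp = 0.0
        for w in query_tokens:
            logits = self.W_o @ h
            p = np.exp(logits - logits.max())
            p /= p.sum()
            lp += np.log(p[self.vocab[w]])
            h = np.tanh(self.W_h @ h + self.W_x @ self.emb[self.vocab[w]])
        return lp

# Score two candidate boxes for the query "red car"; the box with the
# highest query log-probability is the retrieval result.
vocab = ["red", "car", "dog", "sky"]
img_w, img_h = 640, 480
global_feat = np.full(4, 0.5)                        # placeholder global scene descriptor
boxes = [(10, 20, 200, 220), (300, 50, 630, 470)]
local_feats = [np.full(4, 0.2), np.full(4, 0.8)]     # placeholder local descriptors

scorer = ToyRecurrentScorer(vocab, feat_dim=4 + 4 + 8)
scores = []
for box, lf in zip(boxes, local_feats):
    feat = np.concatenate([lf, global_feat, spatial_config(box, img_w, img_h)])
    scores.append(scorer.log_prob(["red", "car"], feat))
best = boxes[int(np.argmax(scores))]
```

In the actual model the box features would come from a ConvNet and the scorer would be trained; the point here is only that retrieval reduces to ranking candidate boxes by the conditional probability of the query.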