资源论文Beyond instance-level image retrieval: Leveraging captions to learn a global visual representation for semantic retrieval

Beyond instance-level image retrieval: Leveraging captions to learn a global visual representation for semantic retrieval

2019-12-10 | |  101 |   90 |   0

Abstract

Querying with an example image is a simple and intuitive interface to retrieve information from a visual database. Most of the research in image retrieval has focused on the task of instance-level image retrieval, where the goal is to retrieve images that contain the same object instance as the query image. In this work we move beyond instance-level retrieval and consider the task of semantic image retrieval in complex scenes, where the goal is to retrieve images that share the same semantics as the query image. We show that, despite its subjective nature, the task of semantically ranking visual scenes is consistently implemented across a pool of human annotators. We also show that a similarity based on human-annotated region-level captions is highly correlated with the human ranking and constitutes a good computable surrogate. Following this observation, we learn a visual embedding of the images where the similarity in the visual space is correlated with their semantic similarity surrogate. We further extend our model to learn a joint embedding of visual and textual cues that allows one to query the database using a text modififier in addition to the query image, adapting the results to the modififier. Finally, our model can ground the ranking decisions by showing regions that contributed the most to the similarity between pairs of images, providing a visual explanation of the similarity

上一篇:Annotating Object Instances with a Polygon-RNN

下一篇:Boundary-aware Instance Segmentation

用户评价
全部评价

热门资源

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Bounding the Inef...

    Social networks on the Internet have seen an en...

  • Shape-based Autom...

    We present an algorithm for automatic detection...

  • Joint Pose and Ex...

    Facial expression recognition (FER) is a challe...