资源论文extracting visual knowledge from the web with multimodal learning

extracting visual knowledge from the web with multimodal learning

2019-11-01 | |  65 |   43 |   0
Abstract We consider the problem of automatically extracting visual objects from web images. Despite the extraordinary advancement in deep learning, visual object detection remains a challenging task. To overcome the deficiency of pure visual techniques, we propose to make use of meta text surrounding images on the Web for enhanced detection accuracy. In this paper we present a multimodal learning algorithm to integrate text information into visual knowledge extraction. To demonstrate the effectiveness of our approach, we developed a system that takes raw webpages and a small set of training images from ImageNet as inputs, and automatically extracts visual knowledge (e.g. object bounding boxes) from tens of millions of images crawled from the Web. Experimental results based on 46 object categories show that the extraction precision is improved significantly from 73% (with state-ofthe-art deep learning programs) to 81%, which is equivalent to a 31% reduction in error rates.

上一篇:explicit knowledge based reasoning for visual question answering

下一篇:identifying human mobility via trajectory embeddings

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...