Zero-shot Event Detection using Multi-modal Fusion of Weakly Supervised Concepts

2019-12-17

Abstract

Current state-of-the-art systems for visual content analysis require large training sets for each class of interest, and performance degrades rapidly with fewer examples. In this paper, we present a general framework for the zero-shot learning problem of performing high-level event detection with no training exemplars, using only textual descriptions. This task goes beyond the traditional zero-shot framework of adapting a given set of classes with training data to unseen classes. We leverage video and image collections with free-form text descriptions from widely available web sources to learn a large bank of concepts, in addition to using several off-the-shelf concept detectors, speech, and video text for representing videos. We utilize natural language processing technologies to generate event description features. The extracted features are then projected to a common high-dimensional space using text expansion, and similarity is computed in this space. We present extensive experimental results on the large TRECVID MED [26] corpus to demonstrate our approach. Our results show that the proposed concept detection methods significantly outperform current attribute classifiers such as Classemes [34], ObjectBank [21], and SUN attributes [28]. Further, we find that fusion, both within as well as between modalities, is crucial for optimal performance.
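
The projection-and-similarity step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the sparse term-weight dictionaries, the hand-written `RELATED` table standing in for text expansion, cosine similarity as the similarity measure, and plain averaging as the fusion rule. The function names `expand`, `cosine`, and `fused_score` are hypothetical.

```python
import math

# Hypothetical relatedness table standing in for text expansion.
# In practice this would come from some lexical or embedding resource.
RELATED = {
    "dog": {"puppy": 0.8},
    "grooming": {"brushing": 0.7},
}

def expand(weights, related):
    """Toy text expansion: spread each term's weight onto related terms."""
    out = dict(weights)
    for term, w in weights.items():
        for other, sim in related.get(term, {}).items():
            out[other] = out.get(other, 0.0) + w * sim
    return out

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def fused_score(query, video_by_modality, related):
    """Average per-modality similarities (an assumed, simple late fusion)."""
    q = expand(query, related)
    scores = [cosine(q, expand(feats, related))
              for feats in video_by_modality.values()]
    return sum(scores) / len(scores) if scores else 0.0

# Event description features and two videos, each with two modalities.
query = {"dog": 1.0, "grooming": 1.0}
video_a = {"visual": {"puppy": 0.9, "brushing": 0.4}, "speech": {"dog": 0.5}}
video_b = {"visual": {"car": 0.9}, "speech": {"engine": 0.6}}

ranked = sorted(
    [("video_a", fused_score(query, video_a, RELATED)),
     ("video_b", fused_score(query, video_b, RELATED))],
    key=lambda pair: pair[1],
    reverse=True,
)
```

In this toy setup the query never mentions "puppy" or "brushing", yet `video_a` still matches because expansion maps query and video terms into the same vocabulary space before similarity is computed. No training examples of the event are involved, which is the zero-shot property the abstract refers to.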
