
Harnessing Object and Scene Semantics for Large-Scale Video Understanding

2019-12-26

Abstract

Large-scale action recognition and video categorization are important problems in computer vision. To address these problems, we propose a novel object- and scene-based semantic fusion network and representation. Our semantic fusion network combines three streams of information using a three-layer neural network: (i) frame-based low-level CNN features, (ii) object features from a state-of-the-art large-scale CNN object detector trained to recognize 20K classes, and (iii) scene features from a state-of-the-art CNN scene detector trained to recognize 205 scenes. The trained network achieves improvements in supervised activity and video categorization on two complex large-scale datasets, ActivityNet and FCVID, respectively. Further, by examining and backpropagating information through the fusion network, semantic relationships (correlations) between video classes and objects/scenes can be discovered. These video class-object/video class-scene relationships can in turn be used as a semantic representation for the video classes themselves. We illustrate the effectiveness of this semantic representation through experiments on zero-shot action/video classification and clustering.
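The fusion architecture described above can be sketched in a few lines: the three streams are concatenated and passed through a three-layer network that outputs class probabilities. This is a minimal NumPy illustration, not the paper's implementation; the object-stream dimension (20K classes) and scene-stream dimension (205 scenes) come from the abstract, while the frame-feature dimension, hidden sizes, number of output classes, and random weights are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stream dimensions: 20K object classes and 205 scene classes per the
# abstract; the frame-CNN dimension is an illustrative assumption.
D_FRAME, D_OBJ, D_SCENE = 4096, 20000, 205
H1, H2, N_CLASSES = 128, 128, 200  # hidden sizes and class count are assumptions

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Randomly initialized weights for the three-layer fusion network
# (in the paper these would be learned by supervised training).
W1 = rng.normal(0.0, 0.01, (D_FRAME + D_OBJ + D_SCENE, H1))
W2 = rng.normal(0.0, 0.01, (H1, H2))
W3 = rng.normal(0.0, 0.01, (H2, N_CLASSES))

def fuse(frame_feat, obj_feat, scene_feat):
    """Concatenate the three streams and score video classes."""
    x = np.concatenate([frame_feat, obj_feat, scene_feat])
    h1 = relu(x @ W1)
    h2 = relu(h1 @ W2)
    return softmax(h2 @ W3)

# Random stand-in features for one video.
probs = fuse(rng.normal(size=D_FRAME),
             rng.normal(size=D_OBJ),
             rng.normal(size=D_SCENE))
print(probs.shape)
```

Examining how a class score changes with each input unit (the backpropagation step the abstract mentions) is what exposes the class-object and class-scene correlations used as the semantic representation.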
