
Harnessing Object and Scene Semantics for Large-Scale Video Understanding

2019-12-26

Abstract

Large-scale action recognition and video categorization are important problems in computer vision. To address these problems, we propose a novel object- and scene-based semantic fusion network and representation. Our semantic fusion network combines three streams of information using a three-layer neural network: (i) frame-based low-level CNN features, (ii) object features from a state-of-the-art large-scale CNN object detector trained to recognize 20K classes, and (iii) scene features from a state-of-the-art CNN scene detector trained to recognize 205 scenes. The trained network achieves improvements in supervised activity and video categorization on two complex large-scale datasets, ActivityNet and FCVID, respectively. Further, by examining and backpropagating information through the fusion network, semantic relationships (correlations) between video classes and objects/scenes can be discovered. These video class-object/video class-scene relationships can in turn be used as a semantic representation for the video classes themselves. We illustrate the effectiveness of this semantic representation through experiments on zero-shot action/video classification and clustering.
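The fusion architecture described above can be sketched in a few lines: the three streams are concatenated and passed through a three-layer network that outputs class probabilities. This is a minimal NumPy illustration, not the paper's implementation; the object-stream dimension (20K classes) and scene-stream dimension (205 scenes) come from the abstract, while the frame-feature dimension, hidden sizes, number of output classes, and random weights are stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stream dimensions: 20K object classes and 205 scene classes per the
# abstract; the frame-CNN dimension is an illustrative assumption.
D_FRAME, D_OBJ, D_SCENE = 4096, 20000, 205
H1, H2, N_CLASSES = 128, 128, 200  # hidden sizes and class count are assumptions

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Randomly initialized weights for the three-layer fusion network
# (in the paper these would be learned by supervised training).
W1 = rng.normal(0.0, 0.01, (D_FRAME + D_OBJ + D_SCENE, H1))
W2 = rng.normal(0.0, 0.01, (H1, H2))
W3 = rng.normal(0.0, 0.01, (H2, N_CLASSES))

def fuse(frame_feat, obj_feat, scene_feat):
    """Concatenate the three streams and score video classes."""
    x = np.concatenate([frame_feat, obj_feat, scene_feat])
    h1 = relu(x @ W1)
    h2 = relu(h1 @ W2)
    return softmax(h2 @ W3)

# Random stand-in features for one video.
probs = fuse(rng.normal(size=D_FRAME),
             rng.normal(size=D_OBJ),
             rng.normal(size=D_SCENE))
print(probs.shape)
```

Examining how a class score changes with each input unit (the backpropagation step the abstract mentions) is what exposes the class-object and class-scene correlations used as the semantic representation.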
