Scene Aligned Pooling for Complex Video Recognition

2020-04-02

Abstract

Real-world videos often contain dynamic backgrounds and evolving human activities, especially web videos generated by users in unconstrained scenarios. This paper proposes a new visual representation, namely scene aligned pooling, for the task of event recognition in complex videos. Based on the observation that a video clip is often composed of shots of different scenes, the key idea of scene aligned pooling is to decompose any video features into concurrent scene components, and to construct classification models adaptive to different scenes. Experiments on two large-scale real-world datasets, the TRECVID Multimedia Event Detection 2011 and the Human Motion Recognition Databases (HMDB), show that our new visual representation consistently improves various kinds of visual features, such as low-level color and texture features, middle-level histograms of local descriptors such as SIFT or space-time interest points, and high-level semantic model features, by a significant margin. For example, we improve the state-of-the-art accuracy on the HMDB dataset by 20%.
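
To make the key idea concrete, the following is a minimal sketch of pooling frame features per scene, not the paper's actual formulation. It assumes hard assignment of each frame to its nearest scene centroid and average pooling within each scene; the function names `squared_distance` and `scene_aligned_pool`, the centroid input, and the zero-fill for empty scenes are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scene_aligned_pool(
    frame_features: Sequence[Sequence[float]],
    scene_centroids: Sequence[Sequence[float]],
) -> List[float]:
    """Average-pool frame features separately for each scene.

    Assumption (not from the abstract): every frame is hard-assigned
    to the nearest scene centroid. The result concatenates one pooled
    vector per scene, so its length is num_scenes * feature_dim.
    """
    num_scenes = len(scene_centroids)
    dim = len(scene_centroids[0])
    sums = [[0.0] * dim for _ in range(num_scenes)]
    counts = [0] * num_scenes

    for feat in frame_features:
        # Index of the closest scene centroid for this frame.
        k = min(
            range(num_scenes),
            key=lambda i: squared_distance(feat, scene_centroids[i]),
        )
        counts[k] += 1
        for j in range(dim):
            sums[k][j] += feat[j]

    pooled: List[float] = []
    for k in range(num_scenes):
        if counts[k]:
            pooled.extend(s / counts[k] for s in sums[k])
        else:
            # Scene absent from this video: fill its slot with zeros.
            pooled.extend([0.0] * dim)
    return pooled


# Three 2-D frame features, two hypothetical scene centroids.
video = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
centroids = [[0.0, 1.0], [10.0, 10.0]]
representation = scene_aligned_pool(video, centroids)
```

Because each scene occupies a fixed slot in the concatenated vector, a downstream classifier can weight the same feature differently depending on the scene it came from, which is one simple way to read "classification models adaptive to different scenes".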


