Abstract
Real-world videos often contain dynamic backgrounds and evolving human activities, especially web videos generated by users in unconstrained scenarios. This paper proposes a new visual representation, namely scene aligned pooling, for the task of event recognition in complex videos. Based on the observation that a video clip is often composed of shots of different scenes, the key idea of scene aligned pooling is to decompose any video features into concurrent scene components, and to construct classification models adaptive to different scenes. Experiments on two large-scale real-world datasets, the TRECVID Multimedia Event Detection 2011 dataset and the Human Motion Recognition Database (HMDB), show that our new visual representation consistently improves various kinds of visual features by a significant margin, including low-level color and texture features, mid-level histograms of local descriptors such as SIFT and space-time interest points, and high-level semantic model features. For example, we improve the state-of-the-art accuracy on the HMDB dataset by 20%.