Representing Videos using Mid-level Discriminative Patches

2019-12-10

Abstract

How should a video be represented? We propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification, where they demonstrate state-of-the-art performance on UCF50 and Olympics datasets.

