End-to-end Learning of Action Detection from Frame Glimpses in Videos

2019-12-26

Abstract

In this work we introduce a fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions. Our intuition is that the process of detecting actions is naturally one of observation and refinement: observing moments in video, and refining hypotheses about when an action is occurring. Based on this insight, we formulate our model as a recurrent neural network-based agent that interacts with a video over time. The agent observes video frames and decides both where to look next and when to emit a prediction. Since backpropagation is not adequate in this non-differentiable setting, we use REINFORCE to learn the agent's decision policy. Our model achieves state-of-the-art results on the THUMOS'14 and ActivityNet datasets while observing only a fraction (2% or less) of the video frames.
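The core training idea in the abstract — using REINFORCE because the agent's glimpse and prediction decisions are non-differentiable — can be sketched with a generic policy-gradient update. The following is a minimal illustration, not the authors' implementation: it trains a linear-softmax policy over a small discrete action set (standing in for candidate frame offsets of the next glimpse), with a toy stand-in reward in place of the paper's detection reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def toy_reward(action):
    # Hypothetical stand-in reward: action 2 is "correct"
    # (e.g. the glimpse lands inside the action instance).
    return 1.0 if action == 2 else 0.0

def reinforce_step(theta, state, lr=0.1):
    """One REINFORCE update: theta += lr * R * grad(log pi(a|s))."""
    logits = theta @ state                 # (n_actions,)
    probs = softmax(logits)
    action = rng.choice(len(probs), p=probs)
    reward = toy_reward(action)
    # For a linear-softmax policy, grad of log pi(a|s) w.r.t. theta
    # is (onehot(a) - probs) outer-product with the state features.
    onehot = np.zeros_like(probs)
    onehot[action] = 1.0
    theta = theta + lr * reward * np.outer(onehot - probs, state)
    return theta

theta = np.zeros((4, 3))                   # 4 actions, 3-dim state features
state = np.array([1.0, 0.5, -0.5])
for _ in range(500):
    theta = reinforce_step(theta, state)

probs = softmax(theta @ state)
# After training, probability mass concentrates on the rewarded action.
```

In the paper's setting, the state would be the recurrent network's hidden state summarizing glimpses so far, and the reward would arrive at episode end from the quality of the emitted temporal detections; the update rule itself is the same REINFORCE estimator.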
