
End-to-end Learning of Action Detection from Frame Glimpses in Videos

2019-12-26

Abstract

In this work we introduce a fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions. Our intuition is that the process of detecting actions is naturally one of observation and refinement: observing moments in video, and refining hypotheses about when an action is occurring. Based on this insight, we formulate our model as a recurrent neural network-based agent that interacts with a video over time. The agent observes video frames and decides both where to look next and when to emit a prediction. Since backpropagation is not adequate in this non-differentiable setting, we use REINFORCE to learn the agent's decision policy. Our model achieves state-of-the-art results on the THUMOS'14 and ActivityNet datasets while observing only a fraction (2% or less) of the video frames.
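The abstract's key point is that the agent's discrete choices (which frame to jump to, when to emit a detection) cannot be trained by ordinary backpropagation, so the policy gradient is estimated with the REINFORCE score-function estimator: scale the gradient of the log-probability of each sampled decision by the (baseline-subtracted) reward. Below is a minimal, self-contained sketch of that update rule on a toy categorical decision, not the paper's actual model; all names and the toy reward are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3               # e.g. toy stand-ins for "jump ahead" / "jump back" / "emit"
theta = np.zeros(n_actions) # logits of a categorical decision policy

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy reward: action 2 is best, mimicking "emit the prediction at the right moment".
true_reward = np.array([0.0, 0.5, 1.0])

baseline = 0.0
for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(n_actions, p=probs)            # sample a non-differentiable decision
    r = true_reward[a] + rng.normal(scale=0.1)    # noisy scalar reward
    # Gradient of log pi(a) w.r.t. the logits of a categorical policy:
    # d/d theta_j log p_a = 1[j == a] - p_j
    grad_logp = -probs
    grad_logp[a] += 1.0
    theta += 0.1 * (r - baseline) * grad_logp     # REINFORCE update
    baseline = 0.9 * baseline + 0.1 * r           # moving-average baseline reduces variance
```

After training, the policy concentrates its probability mass on the highest-reward action; the moving-average baseline is the standard variance-reduction trick that makes the estimator practical.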

