Abstract
In this work we introduce a fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions. Our intuition is that the process of detecting actions is naturally one of observation and refinement: observing moments in video, and refining hypotheses about when an action is occurring. Based on this insight, we formulate our model as a recurrent neural network-based agent that interacts with a video over time. The agent observes video frames and decides both where to look next and when to emit a prediction. Since backpropagation is not adequate in this non-differentiable setting, we use REINFORCE to learn the agent's decision policy. Our model achieves state-of-the-art results on the THUMOS'14 and ActivityNet datasets while observing only a fraction (2% or less) of the video frames.
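To make the REINFORCE learning rule mentioned above concrete, the following is a minimal, self-contained sketch of the policy-gradient update on a toy two-armed bandit. This is an illustrative assumption of ours, not the paper's model: the actual agent is a recurrent network observing video frames, whereas here the "policy" is just a softmax over two logits and the "decision" is which arm to pull.

```python
import numpy as np

# Toy REINFORCE sketch (our own assumption for illustration; the paper's
# agent instead decides where to look next and when to emit a prediction).
rng = np.random.default_rng(0)
theta = np.zeros(2)          # logits of a softmax policy over two actions
true_rewards = [0.2, 0.8]    # action 1 is better in expectation (toy setup)
alpha = 0.1                  # learning rate

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)                   # sample an action from the policy
    r = float(rng.random() < true_rewards[a])  # Bernoulli reward
    # REINFORCE: grad of log pi(a) w.r.t. theta is one_hot(a) - p
    # for a softmax policy; scale by the observed reward.
    grad_log_pi = -p
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi         # gradient ascent on E[r]

print(softmax(theta))  # the policy shifts probability mass toward action 1
```

The key point REINFORCE addresses, both here and in the paper, is that sampling discrete actions is non-differentiable, so the gradient of the expected reward is estimated from sampled actions rather than computed by ordinary backpropagation through the decision.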