Watching a Small Portion could be as Good as Watching All: Towards Ef?cient Video Classi?cation

资源分类

2019-11-06 |

85 |

39 |

Abstract We aim to signi?cantly reduce the computational cost for classi?cation of temporally untrimmed videos while retaining similar accuracy. Existing video classi?cation methods sample frames with a prede?ned frequency over entire video. Differently, we propose an end-to-end deep reinforcement approach which enables an agent to classify videos by watching a very small portion of frames like what we do. We make two main contributions. First, information is not equally distributed in video frames along time. An agent needs to watch more carefully when a clip is informative and skip the frames if they are redundant or irrelevant. The proposed approach enables the agent to adapt sampling rate to video content and skip most of the frames without the loss of information. Second, in order to have a con?dent decision, the number of frames that should be watched by an agent varies greatly from one video to another. We incorporate an adaptive stop network to measure con?dence score and generate timely trigger to stop the agent watching videos, which improves ef?ciency without loss of accuracy. Our approach reduces the computational cost signi?cantly for the large-scale YouTube-8M dataset, while the accuracy remains the same.

上一篇：Scanpath Prediction for Visual Attention using IOR-ROI LSTM

下一篇：StackDRL: Stacked Deep Reinforcement Learning for Fine-grained Visual Categorization

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
The Variational S...

Unlike traditional images which do not offer in...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...
Rating-Boosted La...

The performance of a recommendation system reli...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com