Video Captioning via Hierarchical Reinforcement Learning

2019-10-22

Abstract: Video captioning is the task of automatically generating a textual description of the actions in a video. Although previous work (e.g., sequence-to-sequence models) has shown promising results in abstracting a coarse description of a short video, it is still very challenging to caption a video containing multiple fine-grained actions with a detailed description. This paper aims to address the challenge by proposing a novel hierarchical reinforcement learning framework for video captioning, where a high-level Manager module learns to design sub-goals and a low-level Worker module recognizes the primitive actions to fulfill the sub-goal. With this compositional framework to reinforce video captioning at different levels, our approach significantly outperforms all the baseline methods on a newly introduced large-scale dataset for fine-grained video captioning. Furthermore, our non-ensemble model has already achieved the state-of-the-art results on the widely used MSR-VTT dataset.
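
The Manager/Worker split described in the abstract can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the paper's implementation: `ToyManager`, `ToyWorker`, `SEGMENT_END`, and the lookup-table "policies" are hypothetical stand-ins for learned networks, and no reinforcement learning or video features are involved. It shows only how a high-level module could hand out sub-goals while a low-level module emits words until each sub-goal is fulfilled.

```python
from typing import Dict, List, Optional

SEGMENT_END = "<seg_end>"  # hypothetical marker: the Worker has fulfilled its sub-goal


class ToyManager:
    """Hypothetical high-level policy: hands out one sub-goal per caption segment."""

    def __init__(self, plan: List[str]) -> None:
        self.plan = list(plan)
        self.index = 0

    def next_goal(self) -> Optional[str]:
        # Return the next sub-goal, or None once the plan is exhausted.
        if self.index >= len(self.plan):
            return None
        goal = self.plan[self.index]
        self.index += 1
        return goal


class ToyWorker:
    """Hypothetical low-level policy: emits words conditioned on the current sub-goal."""

    def __init__(self, phrases: Dict[str, List[str]]) -> None:
        self.phrases = phrases

    def act(self, goal: str, t: int) -> str:
        # Emit the t-th word for this sub-goal, then signal that the segment is done.
        words = self.phrases.get(goal, [])
        return words[t] if t < len(words) else SEGMENT_END


def generate_caption(manager: ToyManager, worker: ToyWorker, max_words: int = 20) -> List[str]:
    caption: List[str] = []
    goal = manager.next_goal()
    t = 0  # position inside the current segment
    while goal is not None and len(caption) < max_words:
        word = worker.act(goal, t)
        if word == SEGMENT_END:
            goal = manager.next_goal()  # sub-goal fulfilled: ask the Manager for a new one
            t = 0
        else:
            caption.append(word)
            t += 1
    return caption


manager = ToyManager(["open_door", "sit_down"])
worker = ToyWorker({
    "open_door": ["a", "person", "opens", "a", "door"],
    "sit_down": ["and", "sits", "down"],
})
caption = generate_caption(manager, worker)
print(" ".join(caption))
```

In this sketch the Manager acts at a coarser time scale than the Worker: it is consulted only when a segment ends, while the Worker acts at every word. In a learned system both modules would be trained policies rather than lookup tables.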

