Video Captioning via Hierarchical Reinforcement Learning


2019-10-22

Abstract: Video captioning is the task of automatically generating a textual description of the actions in a video. Although previous work (e.g., sequence-to-sequence models) has shown promising results in abstracting a coarse description of a short video, it is still very challenging to caption a video containing multiple fine-grained actions with a detailed description. This paper aims to address the challenge by proposing a novel hierarchical reinforcement learning framework for video captioning, where a high-level Manager module learns to design sub-goals and a low-level Worker module recognizes the primitive actions to fulfill the sub-goal. With this compositional framework to reinforce video captioning at different levels, our approach significantly outperforms all the baseline methods on a newly introduced large-scale dataset for fine-grained video captioning. Furthermore, our non-ensemble model has already achieved state-of-the-art results on the widely used MSR-VTT dataset.
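
The Manager/Worker split described in the abstract can be illustrated with a toy control loop. This is only a minimal sketch of the two-level decoding idea, not the paper's model: the lookup tables below stand in for learned policies, there is no video input and no reinforcement learning update, and every name (`MANAGER_PLAN`, `WORKER_WORDS`, `manager_next_goal`, `worker_emit`, `generate_caption`) is hypothetical. The abstract does not say how a sub-goal is judged complete, so the sketch simply assumes the Worker signals completion when it runs out of words for the current sub-goal.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for learned policies (assumptions, not from the paper).
# The "Manager" plan is an ordered list of sub-goals.
MANAGER_PLAN: List[str] = ["pick_up", "pour", "stir"]

# The "Worker" maps each sub-goal to the words that realize it.
WORKER_WORDS: Dict[str, List[str]] = {
    "pick_up": ["a", "person", "picks", "up", "a", "cup"],
    "pour": ["pours", "water", "into", "it"],
    "stir": ["and", "stirs", "it"],
}


def manager_next_goal(goals_done: int) -> Optional[str]:
    """High level: return the next sub-goal, or None when the plan is finished."""
    if goals_done < len(MANAGER_PLAN):
        return MANAGER_PLAN[goals_done]
    return None


def worker_emit(goal: str, position: int) -> Optional[str]:
    """Low level: return the next word for the sub-goal, or None once it is fulfilled."""
    words = WORKER_WORDS[goal]
    if position < len(words):
        return words[position]
    return None


def generate_caption() -> str:
    """Alternate between the two levels: one sub-goal, then many words, then the next sub-goal."""
    caption: List[str] = []
    goals_done = 0
    while True:
        goal = manager_next_goal(goals_done)
        if goal is None:
            break  # the Manager has no more sub-goals
        position = 0
        while True:
            word = worker_emit(goal, position)
            if word is None:
                break  # the Worker signals that the sub-goal is fulfilled
            caption.append(word)
            position += 1
        goals_done += 1
    return " ".join(caption)


if __name__ == "__main__":
    print(generate_caption())
```

The point of the sketch is the difference in time scale: the Manager acts once per caption segment, while the Worker acts once per word within that segment.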



