Weakly Supervised Dense Video Captioning

2019-12-06

Abstract

This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences. The proposed method is trained without explicit annotation of fine-grained sentence to video region-sequence correspondence, but is only based on weak video-level sentence annotations. It differs from existing video captioning systems in three technical aspects. First, we propose lexical fully convolutional neural networks (Lexical-FCN) with weakly supervised multi-instance multi-label learning to weakly link video regions with lexical labels. Second, we introduce a novel submodular maximization scheme to generate multiple informative and diverse region-sequences based on the Lexical-FCN outputs. A winner-takes-all scheme is adopted to weakly associate sentences to region-sequences in the training phase. Third, a sequence-to-sequence learning based language model is trained with the weakly supervised information obtained through the association process. We show that the proposed method can not only produce informative and diverse dense captions, but also outperform state-of-the-art single video captioning methods by a large margin.
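
The abstract does not state the objective used for region-sequence selection, so the following is only a generic sketch of greedy submodular maximization, not the paper's formulation. It assumes a hypothetical objective that adds a per-candidate informativeness score to a facility-location coverage term over a non-negative similarity matrix; the names `objective`, `greedy_select`, `scores`, `sim`, and `diversity_weight` are all illustrative.

```python
def objective(selected, scores, sim, diversity_weight=1.0):
    """Assumed objective: informativeness (modular) + facility-location coverage.

    With non-negative scores and similarities this is monotone submodular.
    """
    if not selected:
        return 0.0
    informativeness = sum(scores[j] for j in selected)
    # Each candidate i is "covered" by its most similar selected item.
    coverage = sum(max(row[j] for j in selected) for row in sim)
    return informativeness + diversity_weight * coverage


def greedy_select(scores, sim, k, diversity_weight=1.0):
    """Pick up to k candidates, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = set(range(len(scores)))
    while remaining and len(selected) < k:
        base = objective(selected, scores, sim, diversity_weight)
        best = max(
            sorted(remaining),  # sorted for deterministic tie-breaking
            key=lambda j: objective(selected + [j], scores, sim, diversity_weight) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): candidates 0 and 1 are near-duplicates.
scores = [0.9, 0.85, 0.5, 0.1]
sim = [
    [1.0, 0.95, 0.1, 0.1],
    [0.95, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
picked = greedy_select(scores, sim, k=2)
print(picked)  # -> [0, 2]
```

In this toy case the second pick is candidate 2 rather than the higher-scoring candidate 1, because candidate 1 adds almost no coverage once candidate 0 is selected. That is the informative-and-diverse trade-off the abstract describes, here under an assumed objective.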


