资源论文StyleNet: Generating Attractive Visual Captions with Styles

StyleNet: Generating Attractive Visual Captions with Styles

2019-12-04 | |  48 |   32 |   0
Abstract We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles. To this end, we devise a novel model component, named factored LSTM, which automatically distills the style factors in the monolingual text corpus. Then at runtime, we can explicitly control the style in the caption generation process so as to produce attractive visual captions with the desired style. Our approach achieves this goal by leveraging two sets of data: 1) factual image/video-caption paired data, and 2) stylized monolingual text data (e.g., romantic and humorous sentences). We show experimentally that StyleNet outperforms existing approaches for generating visual captions with different styles, measured in both automatic and human evaluation metrics on the newly collected FlickrStyle10K image caption dataset, which contains 10K Flickr images with corresponding humorous and romantic captions

上一篇:Straight to Shapes: Real-time Detection of Encoded Shapes

下一篇:Subspace Clustering via Variance Regularized Ridge Regression

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...