Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning

2019-11-07
Abstract: Image captioning aims to generate textual descriptions for images. Most previous work generates a single-sentence description for each image. However, a picture is worth a thousand words, and a single sentence can hardly give a complete view of an image, even for humans. In this paper, we propose a novel Topic-Oriented Multi-Sentence (TOMS) captioning model, which generates multiple topic-oriented sentences to describe an image. Different from object instances or visual attributes, topics mined by latent Dirichlet allocation reflect hidden thematic structures in the reference sentences of an image. In our model, each topic is integrated into a caption generator with a Fusion Gate Unit (FGU) to guide the generation of a sentence towards a certain topic perspective. With multiple sentences from different topics, TOMS provides a complete description of an image. Experimental results on both sentence and paragraph datasets demonstrate the effectiveness of TOMS in terms of topical consistency and descriptive completeness.
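The abstract does not spell out how the FGU works internally, so the following is only a minimal sketch, assuming a generic elementwise gating mechanism in PyTorch. It shows one plausible way a topic vector (e.g., a document-topic distribution mined with latent Dirichlet allocation, such as scikit-learn's LatentDirichletAllocation applied to the reference sentences) could be fused with a caption decoder's hidden state at each step. All class names, layer choices, and dimensions here are hypothetical, not the paper's actual design.

```python
# Hypothetical sketch of a Fusion Gate Unit (FGU); not the paper's exact model.
import torch
import torch.nn as nn

class FusionGateUnit(nn.Module):
    """Gates how much topic information flows into each decoding step."""
    def __init__(self, hidden_dim: int, topic_dim: int):
        super().__init__()
        # Gate computed from the concatenated hidden state and topic vector.
        self.gate = nn.Linear(hidden_dim + topic_dim, hidden_dim)
        # Project the topic vector into the decoder's hidden space.
        self.topic_proj = nn.Linear(topic_dim, hidden_dim)

    def forward(self, h: torch.Tensor, topic: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per dimension, how much topic signal to mix in.
        g = torch.sigmoid(self.gate(torch.cat([h, topic], dim=-1)))
        t = torch.tanh(self.topic_proj(topic))
        # Fused state, fed to the word predictor instead of the raw h.
        return g * t + (1.0 - g) * h

# Usage: fuse a batch of decoder hidden states with one topic vector each.
fgu = FusionGateUnit(hidden_dim=512, topic_dim=100)  # dimensions assumed
h = torch.randn(4, 512)      # decoder hidden states
topic = torch.randn(4, 100)  # topic vectors, e.g. from LDA
fused = fgu(h, topic)        # shape: (4, 512)
```

Under this reading, running the same generator once per mined topic, each time conditioning on a different topic vector, would yield the multiple topic-oriented sentences the abstract describes.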
