资源论文GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints

GroupCap: Group-based Image Captioning with Structured Relevance and Diversity Constraints

2019-10-17 | |  67 |   45 |   0

Abstract Most image captioning models focus on one-line (single image) captioning, where the correlations like relevance and diversity among group images (e.g., within the same album or event) are simply neglected, resulting in less accurate and diverse captions. Recent works mainly consider imposing the diversity during the online inference only, which neglect the correlation among visual structures in of- flfline training. In this paper, we propose a novel group-based image captioning scheme (termed GroupCap), which jointly models the structured relevance and diversity among group images towards an optimal collaborative captioning. In particular, we fifirst propose a visual tree parser (VP-Tree) to construct the structured semantic correlations within individual images. Then, the relevance and diversity among images are well modeled by exploiting the correlations among their tree structures. Finally, such correlations are modeled as constraints and sent into the LSTM-based captioning generator. We adopt an end-to-end formulation to train the visual tree parser, the structured relevance and diversity constraints, as well as the LSTM based captioning model jointly. To facilitate quantitative evaluation, we further release two group captioning datasets derived from the MSCOCO benchmark, serving as the fifirst of their kind. Quantitative results show that the proposed GroupCap model outperforms the state-of-the-art and alternative approaches

上一篇:Grounding Referring Expressions in Images by Variational Context

下一篇:Hallucinated-IQA: No-Reference Image Quality Assessment via Adversarial Learning

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...