Abstract
In this work we propose a new automatic image annotation model, dubbed diverse and distinct image annotation (D²IA).
The generative model D²IA is inspired by the
ensemble of human annotations, which create semantically
relevant, yet distinct and diverse tags. In D²IA, we generate a relevant and distinct tag subset, in which the tags are
relevant to the image contents and semantically distinct from
each other, using sequential sampling from a determinantal
point process (DPP) model. Multiple such tag subsets that
cover diverse semantic aspects or diverse semantic levels
of the image contents are generated by randomly perturbing the DPP sampling process. We leverage a generative
adversarial network (GAN) model to train D²IA. Extensive
experiments including quantitative and qualitative comparisons, as well as human subject studies, on two benchmark
datasets demonstrate that the proposed model can produce
more diverse and distinct tags than state-of-the-art methods.
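
To make the idea of sequentially sampling a relevant and distinct tag subset from a DPP more concrete, the following is a minimal sketch and not the model described above. It assumes a kernel of the common quality-similarity form L = diag(q) S diag(q), where q holds per-tag relevance scores and S is the cosine similarity of tag embeddings; the function names `build_kernel` and `sample_tag_subset`, the embeddings, and the scores are all hypothetical. Each step draws the next tag with probability proportional to the determinant ratio det(L_{Y ∪ {i}}) / det(L_Y), so tags that are near-duplicates of already chosen tags are unlikely to be added. This step-wise scheme is a simplification and is not an exact k-DPP sampler.

```python
import numpy as np


def build_kernel(quality, features):
    """Assumed kernel form: L = diag(q) * S * diag(q), S = cosine similarity."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = unit @ unit.T
    return quality[:, None] * similarity * quality[None, :]


def sample_tag_subset(L, k, rng):
    """Draw up to k tag indices one at a time.

    At each step, candidate i is drawn with probability proportional to
    det(L[Y + i]) / det(L[Y]), where Y is the set chosen so far.
    """
    n = L.shape[0]
    chosen = []
    for _ in range(k):
        base = np.linalg.det(L[np.ix_(chosen, chosen)]) if chosen else 1.0
        gains = np.zeros(n)
        for i in range(n):
            if i in chosen:
                continue
            idx = chosen + [i]
            gain = np.linalg.det(L[np.ix_(idx, idx)]) / base
            # Treat numerically negligible gains as zero (redundant tag).
            gains[i] = gain if gain > 1e-12 else 0.0
        total = gains.sum()
        if total <= 0.0:
            break  # every remaining tag is redundant with the chosen ones
        chosen.append(int(rng.choice(n, p=gains / total)))
    return chosen


# Hypothetical example: tags 0 and 1 share an embedding, tag 2 is distinct.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
quality = np.array([1.0, 1.0, 1.0])
L = build_kernel(quality, features)
subset = sample_tag_subset(L, k=2, rng=np.random.default_rng(0))
```

In this toy setting the sampled subset never contains both duplicate tags. Re-running the sampler with different random seeds yields different subsets, which is one simple stand-in for the random perturbation of the sampling process mentioned above.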