Abstract
A huge volume of user-generated content is
produced daily on social media. To facilitate automatic language understanding, we
study keyphrase prediction, distilling salient
information from massive numbers of posts. While most
existing methods extract words from source
posts to form keyphrases, we propose a
sequence-to-sequence (seq2seq) based neural
keyphrase generation framework, enabling absent keyphrases to be created. Moreover, our
model, being topic-aware, allows joint modeling of corpus-level latent topic representations, which helps alleviate the data sparsity
that is widely exhibited in social media language.
Experiments on three datasets collected from
English and Chinese social media platforms
show that our model significantly outperforms
both extraction and generation models that do
not exploit latent topics. Further discussions
show that our model learns meaningful topics,
which explains its superiority in social media
keyphrase generation.