Abstract
We present an unsupervised method to generate Word2Sense word embeddings that are interpretable: each dimension of the embedding space corresponds to a fine-grained sense, and the non-negative value of the embedding along the j-th dimension represents the relevance of the j-th sense to the word. The underlying LDA-based generative model can be extended to refine the representation of a polysemous word in a short context, allowing us to use the embeddings in contextual tasks. On computational NLP tasks, Word2Sense embeddings compare well with other word embeddings generated by unsupervised methods. Across tasks such as word similarity, entailment, sense induction, and contextual interpretation, Word2Sense is competitive with the state-of-the-art method for each task. Word2Sense embeddings are at least as sparse and fast to compute as prior art.
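To make the representation concrete, here is a minimal sketch of how such an embedding is read. The three-sense inventory and relevance values are invented for illustration; in the paper, senses and their weights are learned from a corpus by the LDA-based model.

```python
import numpy as np

# Hypothetical sense inventory, for illustration only; Word2Sense
# learns its fine-grained senses from data via the LDA-based model.
senses = ["finance/banking", "river/geography", "seating/furniture"]

# A Word2Sense-style embedding is non-negative and sparse: the value
# at coordinate j is the relevance of sense j to the word (made-up numbers).
bank = np.array([0.62, 0.35, 0.0])

# Interpret the word by reading off its nonzero senses, most relevant first.
for j in np.argsort(bank)[::-1]:
    if bank[j] > 0:
        print(f"{senses[j]}: {bank[j]:.2f}")
```

Under this reading, interpretability comes directly from the coordinate system: sparsity means only a few senses are active, and non-negativity lets each active coordinate be understood as a relevance weight rather than an arbitrary signed feature.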