Abstract
Recently, a number of unsupervised approaches have been proposed for learning vectors that capture the relationship between two words. Inspired by word embedding models, these approaches rely on co-occurrence statistics obtained from sentences in which the two target words appear. However, the number of such sentences is often quite small, and most of the words that occur in them are not relevant for characterizing the considered relationship. As a result, standard co-occurrence statistics typically lead to noisy relation vectors. To address this issue, we propose a latent variable model that aims to explicitly determine which words from the given sentences best characterize the relationship between the two target words. Relation vectors then correspond to the parameters of a simple unigram language model estimated from these words.
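To make the idea concrete, the latent-variable selection of relevant context words can be sketched as a two-component mixture: each word in the retrieved sentences is generated either by a relation-specific unigram model or by a corpus-wide background model, and EM assigns high responsibility to words the background explains poorly. This is a minimal illustrative sketch, not the paper's exact model; the mixing weight `lam`, the fixed background distribution, and the function name are assumptions made for illustration.

```python
from collections import Counter

def estimate_relation_unigram(sentences, background, lam=0.5, iters=20):
    """Sketch of a two-component mixture over context words.

    Each token is generated either by a relation-specific unigram
    model (with probability lam) or by a corpus-wide background
    model (with probability 1 - lam). EM reweights the counts so
    that words the background explains poorly dominate the relation
    model. Hypothetical illustration, not the paper's actual model.
    """
    tokens = [w for s in sentences for w in s]
    vocab = set(tokens)
    counts = Counter(tokens)
    total = sum(counts.values())
    # initialise the relation model with raw relative frequencies
    rel = {w: counts[w] / total for w in vocab}
    for _ in range(iters):
        # E-step: responsibility that each word type came from the relation model
        resp = {}
        for w in vocab:
            p_rel = lam * rel[w]
            p_bg = (1 - lam) * background.get(w, 1e-9)
            resp[w] = p_rel / (p_rel + p_bg)
        # M-step: re-estimate the relation unigram from responsibility-weighted counts
        weighted = {w: counts[w] * resp[w] for w in vocab}
        z = sum(weighted.values())
        rel = {w: weighted[w] / z for w in vocab}
    return rel
```

For example, for the target pair (Paris, France), frequent function words like "the" are absorbed by the background model, so the estimated relation vector concentrates its mass on discriminative words such as "capital".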