Abstract
A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in
two languages and using the resulting cross-lingual embeddings to induce word translation
pairs through nearest neighbor or related retrieval methods. In this paper, we propose an
alternative approach to this problem that builds
on the recent work on unsupervised machine
translation. This way, instead of directly inducing a bilingual lexicon from cross-lingual
embeddings, we use them to build a phrase-table, combine it with a language model, and
use the resulting machine translation system
to generate a synthetic parallel corpus, from
which we extract the bilingual lexicon using statistical word alignment techniques. As
such, our method can work with any word embedding and cross-lingual mapping technique,
and it does not require any additional resource
besides the monolingual corpus used to train
the embeddings. When evaluated on the exact
same cross-lingual embeddings, our proposed
method obtains an average improvement of 6
accuracy points over nearest neighbor and 4
points over CSLS retrieval, establishing a new
state of the art on the standard MUSE dataset.