Abstract
State-of-the-art methods for unsupervised bilingual word embeddings (BWE) train a mapping function that projects pre-trained monolingual word embeddings into a shared bilingual space. Despite their remarkable results, mapping approaches are well known to be limited by the dissimilarity between the original monolingual embedding spaces. In this work, we propose a new approach that jointly trains unsupervised BWE on synthetic parallel data generated through unsupervised machine translation. We demonstrate that existing algorithms for jointly training BWE are very robust to noisy training data, and show that jointly trained unsupervised BWE significantly outperform mapped unsupervised BWE in several cross-lingual NLP tasks.