Abstract
Recent research in cross-lingual word embeddings has almost exclusively focused on offline
methods, which independently train word
embeddings in different languages and map
them to a shared space through linear transformations. While several authors have questioned the underlying isomorphism assumption, which states that word embeddings in different languages have approximately the same
structure, it is not clear whether this is an inherent limitation of mapping approaches or
a more general issue when learning cross-lingual embeddings. To answer this
question, we experiment with parallel corpora,
which allows us to compare offline mapping
to an extension of skip-gram that jointly learns
both embedding spaces. We observe that, under these ideal conditions, joint learning yields
more isomorphic embeddings, is less sensitive to hubness, and obtains stronger results
in bilingual lexicon induction. We thus conclude that current mapping methods do have
strong limitations, calling for further research
on jointly learning cross-lingual embeddings with
a weaker cross-lingual signal.