Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization
Abstract
Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language’s average vector is zero. Iterative Normalization consistently improves the word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).