Abstract
In this paper, we propose a multilingual unsupervised neural machine translation (NMT) scheme that jointly trains multiple languages with a shared encoder and multiple decoders. Our approach is based on denoising autoencoding of each language and back-translation between English and multiple non-English languages. This results in a universal encoder that can encode any language participating in training into an interlingual representation, together with language-specific decoders.
decoders. Our experiments using only monolingual corpora show that multilingual unsupervised model performs better than the separately trained bilingual models achieving improvement of up to 1.48 BLEU points on
WMT test sets. We also observe that even if
we do not train the network for all possible
translation directions, the network is still able
to translate in a many-to-many fashion leveraging encoder’s ability to generate interlingual
representation.
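Concretely, the joint training described above can be sketched as a sum of denoising-autoencoding and back-translation terms; the notation below ($\mathcal{J}$, $\mathcal{L}_{\mathrm{dae}}$, $\mathcal{L}_{\mathrm{bt}}$, and the language set $L$) is illustrative and not taken from the paper:

\[
\mathcal{J} \;=\; \sum_{\ell \in L} \mathcal{L}_{\mathrm{dae}}(\ell)
\;+\; \sum_{\ell \in L \setminus \{\mathrm{en}\}}
\Big[ \mathcal{L}_{\mathrm{bt}}(\mathrm{en} \to \ell)
\;+\; \mathcal{L}_{\mathrm{bt}}(\ell \to \mathrm{en}) \Big],
\]

where each $\mathcal{L}_{\mathrm{dae}}(\ell)$ trains the shared encoder and the decoder of language $\ell$ to reconstruct a corrupted monolingual sentence, and each $\mathcal{L}_{\mathrm{bt}}$ term trains a translation direction on synthetic parallel pairs generated by the current model, so only monolingual corpora are required.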