Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation
Abstract
Given a rough, word-by-word gloss of a source
language sentence, target language natives can
uncover the latent, fully fluent rendering of the
translation. In this work we explore this intuition by breaking translation into a two-step
process: generating a rough gloss by means of
a dictionary and then ‘translating’ the resulting
pseudo-translation, or ‘Translationese’, into a
fully fluent translation. We build our Translationese decoder once from a mish-mash of
parallel data that has the target language in
common and can then build dictionaries on demand using unsupervised techniques, resulting
in rapidly generated unsupervised neural MT
systems for many source languages. We apply this process to 14 test languages, obtaining translation results on
high-resource languages that are better than or comparable to those of previously published unsupervised MT studies, and obtaining good-quality results for low-resource languages that have never been used in an unsupervised MT scenario.
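
As a rough illustration of the first step described above, the following minimal sketch produces a word-by-word gloss from a dictionary. The dictionary entries, the example sentence, and the names `TOY_DICTIONARY` and `gloss` are hypothetical and chosen only for illustration; the actual dictionaries are built with unsupervised techniques, and the second step (the Translationese decoder) is a trained model that is not shown here.

```python
from typing import Dict, List

# Hypothetical toy Spanish-to-English dictionary, for illustration only.
# In the setting described above, dictionaries are induced on demand
# with unsupervised techniques rather than written by hand.
TOY_DICTIONARY: Dict[str, str] = {
    "el": "the",
    "gato": "cat",
    "negro": "black",
    "duerme": "sleeps",
}


def gloss(sentence: str, dictionary: Dict[str, str]) -> List[str]:
    """Step 1: replace each source token with its dictionary entry.

    Tokens missing from the dictionary are copied through unchanged
    (an assumption of this sketch, not a claim about the paper).
    """
    return [dictionary.get(token, token) for token in sentence.lower().split()]


# The gloss keeps source word order, so it is rough 'Translationese'.
translationese = " ".join(gloss("El gato negro duerme", TOY_DICTIONARY))
print(translationese)  # the cat black sleeps
```

The gloss preserves the source word order ("cat black" rather than "black cat"); turning such output into fluent target-language text is the job of the second step.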