Abstract
While machine translation has traditionally relied on large amounts of parallel corpora, a recent line of research has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only. In this paper, we identify and address several deficiencies of existing unsupervised SMT approaches
by exploiting subword information, developing a theoretically well-founded unsupervised tuning method, and incorporating a joint refinement procedure. Moreover, we use our improved SMT system to initialize a dual NMT model, which is further fine-tuned through on-the-fly back-translation. Together, we obtain
large improvements over the previous state of the art in unsupervised machine translation. For instance, we get 22.5 BLEU points on the WMT 2014 English-to-German task, 5.5 points more than the previous best unsupervised system, and 0.5 points more than the (supervised) shared task winner back in 2014.