From Words to Sentences: A Progressive Learning Approach for
Zero-resource Machine Translation with Visual Pivots
Abstract
Neural machine translation models suffer
from the lack of large-scale parallel corpora. In
contrast, humans can learn to translate between languages even without parallel texts by grounding their
languages in the external world. To mimic this human learning behavior, we employ images as pivots
to enable zero-resource translation learning. However, a picture is worth a thousand words, which makes
multi-lingual sentences pivoted by the same image
noisy as mutual translations and thus hinders the
learning of the translation model. In this work, we propose a progressive learning approach for image-pivoted zero-resource machine translation. Since
words are less diverse when grounded in the image,
we first learn word-level translation with image pivots, and then progress to learn the sentence-level
translation by utilizing the learned word translation
to suppress noise in image-pivoted multi-lingual
sentences. Experimental results on two widely used
image-pivot translation datasets, IAPR-TC12 and
Multi30k, show that the proposed approach significantly
outperforms other state-of-the-art methods.