Abstract
Soft-attention based Neural Machine Translation (NMT) models have achieved promising results on several translation tasks. These models attend to all the words in the source sequence for each target token, which makes them ineffective for long-sequence translation. In this work, we propose a hard-attention based NMT model that selects a subset of source tokens for each target token to handle long sequences effectively. Due to the discrete nature of the hard-attention mechanism, we design a reinforcement learning algorithm coupled with a reward shaping strategy to train it efficiently. Experimental results show that the proposed model performs better on long sequences and thereby achieves significant BLEU score improvements on English-German (EN-DE) and English-French (EN-FR) translation tasks compared to the soft-attention based NMT model.
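To make the hard-attention idea concrete, the following is a minimal sketch, not the authors' implementation: it samples a single source position per target step for simplicity (whereas the paper selects a subset of source tokens), and trains the discrete choice with a REINFORCE-style policy gradient. The tensor shapes, dot-product scorer, and placeholder reward are all illustrative assumptions.

```python
# Hypothetical sketch of hard attention with a policy-gradient surrogate loss.
import torch
import torch.nn.functional as F

def hard_attention_step(query, source_states):
    """query: (hidden,); source_states: (src_len, hidden). Illustrative shapes."""
    scores = source_states @ query             # (src_len,) attention logits
    probs = F.softmax(scores, dim=-1)          # attention distribution
    dist = torch.distributions.Categorical(probs)
    idx = dist.sample()                        # hard choice: one source position
    context = source_states[idx]               # selected state, not a soft average
    return context, dist.log_prob(idx)         # log-prob feeds the RL loss

torch.manual_seed(0)
src = torch.randn(7, 16)                       # 7 source tokens, hidden size 16
q = torch.randn(16, requires_grad=True)        # stand-in decoder query
context, logp = hard_attention_step(q, src)

# REINFORCE-style update: a reward (e.g., a shaped per-step translation-quality
# signal, as the paper's reward shaping suggests) weights the negative
# log-probability of the sampled attention choice.
reward = 1.0                                   # placeholder shaped reward
loss = -reward * logp                          # policy-gradient surrogate loss
loss.backward()                                # gradients reach the attention scorer
```

Because sampling is non-differentiable, the gradient flows only through the log-probability of the chosen position; reward shaping supplies denser feedback than a single sequence-level score, which is why the paper pairs it with the RL training.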