Abstract
While most neural machine translation (NMT)
systems are still trained using maximum likelihood estimation, recent work has demonstrated that optimizing systems to directly improve evaluation metrics such as BLEU can
substantially improve final translation accuracy. However, training with BLEU has some
limitations: it does not assign partial credit, it
has a limited range of output values, and it
can penalize semantically correct hypotheses
if they differ lexically from the reference. In
this paper, we introduce an alternative reward
function for optimizing NMT systems that is
based on recent work in semantic similarity.
We evaluate on four disparate languages translated to English, and find that training with our
proposed metric results in better translations as
evaluated by BLEU, semantic similarity, and
human evaluation, and also that the optimization procedure converges faster. Analysis suggests that this is because the proposed metric
is more conducive to optimization, assigning
partial credit and providing more diversity in
scores than BLEU.
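
To make the partial-credit limitation concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes whitespace tokenization, a single reference, and unsmoothed sentence-level BLEU with n-grams up to length 4. The function `unigram_f1` is a hypothetical stand-in for any graded reward; it is not the semantic similarity metric proposed here, and the example sentences are invented for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference.

    Geometric mean of clipped n-gram precisions times the brevity penalty.
    If any n-gram order has no match, the score is exactly 0.
    """
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        precisions.append(matched / total)
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


def unigram_f1(hyp, ref):
    """Hypothetical graded reward (assumption): F1 over overlapping tokens."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Invented example sentences (assumption, not data from the paper).
ref = "the cat sat on the mat".split()
paraphrase = "a cat was sitting on the mat".split()
unrelated = "stocks fell sharply today".split()

for name, hyp in [("paraphrase", paraphrase), ("unrelated", unrelated)]:
    print(name, sentence_bleu(hyp, ref), unigram_f1(hyp, ref))
```

Under these assumptions, the paraphrase shares no 4-gram with the reference, so unsmoothed BLEU scores it the same as the unrelated sentence, while the graded stand-in separates the two. A reward that cannot distinguish near-misses from failures gives the optimizer little signal, which is the intuition behind the analysis summarized above.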