Abstract
We investigate adaptive ensemble weighting
for Neural Machine Translation, addressing
the case of improving performance on a new
and potentially unknown domain without sacrificing performance on the original domain.
We adapt sequentially across two Spanish-English and three English-German tasks, comparing unregularized fine-tuning, L2 and Elastic Weight Consolidation. We then report a
novel scheme for adaptive NMT ensemble decoding by extending Bayesian Interpolation
with source information, and show strong improvements across test domains without access
to the domain label.