资源论文Generalized Data Augmentation for Low-Resource Translation

Generalized Data Augmentation for Low-Resource Translation

2019-09-19 | |  79 |   50 |   0 0 0
Abstract Translation to or from low-resource languages (LRLs) poses challenges for machine translation in terms of both adequacy and fluency. Data augmentation utilizing large amounts of monolingual data is regarded as an effective way to alleviate these problems. In this paper, we propose a general framework for data augmentation in low-resource machine translation that not only uses target-side monolingual data, but also pivots through a related highresource language (HRL). Specifically, we experiment with a two-step pivoting method to convert high-resource data to the LRL, making use of available resources to better approximate the true data distribution of the LRL. First, we inject LRL words into HRL sentences through an induced bilingual dictionary. Second, we further edit these modified sentences using a modified unsupervised machine translation framework. Extensive experiments on four low-resource datasets show that under extreme low-resource settings, our data augmentation techniques improve translation quality by up to 1.5 to 8 BLEU points compared to supervised back-translation baselines.

上一篇:Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies

下一篇:Hierarchical Transfer Learning for Multi-label Text Classification

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...