DATA-DEPENDENT GAUSSIAN PRIOR OBJECTIVE FOR LANGUAGE GENERATION

2019-12-31

Abstract

For typical sequence prediction problems such as language generation, maximum likelihood estimation (MLE) has been commonly adopted because it encourages the predicted sequence most consistent with the ground-truth sequence to have the highest probability of occurring. However, MLE focuses on a once-for-all matching between the predicted sequence and the gold standard and consequently treats all incorrect predictions as equally incorrect. We call this drawback negative diversity ignorance in this paper. Treating all incorrect predictions as equal unfairly downplays the nuance of these sequences’ detailed token-wise structure. To counteract this, we augment the MLE loss with an extra KL divergence term derived from comparing a data-dependent Gaussian prior with the detailed training prediction. The proposed data-dependent Gaussian prior objective (D2GPo) is defined over a prior topological order of tokens and is poles apart from the data-independent Gaussian prior (L2 regularization) commonly adopted for smoothing the training of MLE. Experimental results show that the proposed method makes effective use of more detailed prior information in the data and significantly improves performance on typical language generation tasks, including supervised and unsupervised machine translation, text summarization, storytelling, and image captioning.
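The abstract does not spell out the objective, so the following is only a rough sketch of the idea it describes: an MLE term plus a weighted KL term against a Gaussian-shaped prior over tokens. Everything named here is an assumption made for illustration and not the authors' implementation: the `ranks` input (a hypothetical ordering in which the gold token has rank 0, standing in for the "prior topological order"), the width `sigma`, the weight `lam`, and the direction of the KL term.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_prior(ranks, sigma=1.0):
    """Hypothetical prior: a Gaussian density over each token's rank,
    normalized to sum to 1. Rank 0 is assumed to be the gold token."""
    density = [math.exp(-(r ** 2) / (2 * sigma ** 2)) for r in ranks]
    total = sum(density)
    return [d / total for d in density]

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) for two discrete distributions of equal length."""
    return sum(qi * math.log((qi + eps) / (pi + eps))
               for qi, pi in zip(q, p) if qi > 0)

def mle_loss(logits, gold_index):
    """Negative log-likelihood of the gold token."""
    return -math.log(softmax(logits)[gold_index])

def augmented_loss(logits, gold_index, ranks, lam=0.1, sigma=1.0):
    """MLE loss plus lam * KL(prior || prediction); the KL direction
    and the weighting scheme are assumptions of this sketch."""
    p = softmax(logits)
    prior = gaussian_prior(ranks, sigma)
    return mle_loss(logits, gold_index) + lam * kl_divergence(prior, p)

# Two predictions that give the gold token (index 0) the same probability
# but spread the remaining mass differently over the wrong tokens.
ranks = [0, 1, 2, 3]
near = [2.0, 1.0, 0.0, -1.0]   # leftover mass favors tokens ranked close to gold
far = [2.0, -1.0, 0.0, 1.0]    # leftover mass favors the most distant token

print(mle_loss(near, 0), mle_loss(far, 0))
print(augmented_loss(near, 0, ranks), augmented_loss(far, 0, ranks))
```

In this toy setup plain MLE cannot tell the two predictions apart, while the augmented loss penalizes the one that puts its leftover probability on distant tokens, which is the kind of distinction among incorrect predictions that the abstract calls negative diversity ignorance.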
