Compressive Transformers for Long-Range Sequence Modelling

2020-01-02

Abstract

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results on the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.
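To make the core idea concrete, here is a minimal NumPy sketch of a memory buffer that, instead of discarding old activations, compresses them into a secondary memory that attention can still read from. It is not the authors' implementation; the class and function names (`CompressiveMemory`, `compress`) are illustrative, and mean-pooling is used as one simple choice of compression function.

```python
import numpy as np

def compress(evicted, rate):
    """Mean-pool groups of `rate` consecutive memory vectors into one
    compressed vector each (one simple choice of compression function)."""
    n, d = evicted.shape
    usable = (n // rate) * rate  # drop any trailing remainder for simplicity
    if usable == 0:
        return np.zeros((0, d))
    return evicted[:usable].reshape(-1, rate, d).mean(axis=1)

class CompressiveMemory:
    """Hypothetical minimal memory: a FIFO of recent activations plus a
    longer-term memory built by compressing evicted entries."""
    def __init__(self, mem_len, comp_len, rate, d_model):
        self.mem = np.zeros((0, d_model))   # recent, uncompressed memories
        self.comp = np.zeros((0, d_model))  # older, compressed memories
        self.mem_len, self.comp_len, self.rate = mem_len, comp_len, rate

    def update(self, hidden):
        """Append new hidden states (seq_len, d_model); compress, rather
        than discard, whatever overflows the primary memory."""
        if self.mem.shape[0] + hidden.shape[0] > self.mem_len:
            n_evict = hidden.shape[0]
            evicted, self.mem = self.mem[:n_evict], self.mem[n_evict:]
            self.comp = np.concatenate(
                [self.comp, compress(evicted, self.rate)], axis=0
            )[-self.comp_len:]  # keep only the most recent compressed memories
        self.mem = np.concatenate([self.mem, hidden], axis=0)

    def attention_context(self):
        """Keys/values for attention: compressed memories, then recent ones."""
        return np.concatenate([self.comp, self.mem], axis=0)

# Usage: feed segments of activations and inspect the attention context.
mem = CompressiveMemory(mem_len=512, comp_len=512, rate=3, d_model=16)
for _ in range(10):
    mem.update(np.random.randn(128, 16))
print(mem.attention_context().shape)
```

The design point this sketch illustrates is that the attention context grows much more slowly than the raw sequence: old memories are kept at a reduced resolution set by the compression rate, rather than being dropped outright.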
