EFFICIENT TRANSFORMER FOR MOBILE APPLICATIONS

2020-01-02

Abstract

The Transformer has become ubiquitous in natural language processing (e.g., machine translation, question answering); however, it requires an enormous amount of computation to achieve high performance, which makes it unsuitable for real-world mobile applications, since mobile phones are tightly constrained by hardware resources and battery life. In this paper, we investigate the mobile setting (under 500M Mult-Adds) for NLP tasks to facilitate deployment on edge devices. We present Long-Short Range Attention (LSRA), in which some heads specialize in local context modeling (by convolution) while the others capture long-distance relationships (by attention). Based on this primitive, we design the Mobile Transformer (MBT), which is tailored for mobile NLP applications. MBT demonstrates consistent improvement over the Transformer on three well-established language tasks: IWSLT 2014 German-English, WMT 2014 English-German, and WMT 2014 English-French. It outperforms the Transformer by 0.9 BLEU under 500M Mult-Adds and by 1.2 BLEU under 100M Mult-Adds on WMT’14 English-German. On WMT’14 English-French, MBT reduces the computation of the Transformer by 2.5× with negligible BLEU degradation. Without the costly architecture search that requires more than 250 GPU years, MBT achieves 0.5 higher BLEU than the AutoML-based Evolved Transformer under the mobile setting.
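The two-branch idea behind LSRA can be illustrated with a minimal, dependency-free sketch. The abstract does not specify how the input is divided between branches, so this example assumes an even split along the channel dimension, omits all learned projections, and shares a single convolution kernel across channels. The function names (`attention_branch`, `conv_branch`, `lsra`) are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_branch(x):
    """Global branch: scaled dot-product self-attention with Q = K = V = x.

    Learned projections are omitted (an assumption of this sketch).
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token to every token in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Weighted average of all tokens, computed per channel.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def conv_branch(x, kernel):
    """Local branch: 1-D convolution over time with zero padding.

    One odd-length kernel is shared by all channels (hypothetical simplification).
    """
    n, d, half = len(x), len(x[0]), len(kernel) // 2
    out = []
    for t in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            s = t + k - half  # source position covered by this kernel tap
            if 0 <= s < n:
                for j in range(d):
                    row[j] += w * x[s][j]
        out.append(row)
    return out


def lsra(x, kernel):
    """Split channels in half, run both branches, and concatenate the results."""
    split = len(x[0]) // 2
    global_part = attention_branch([tok[:split] for tok in x])
    local_part = conv_branch([tok[split:] for tok in x], kernel)
    return [g + l for g, l in zip(global_part, local_part)]


# Three tokens with four channels each: two go to attention, two to convolution.
tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 1.0]]
output = lsra(tokens, kernel=[0.25, 0.5, 0.25])
```

In this sketch the attention half of each output token mixes information from every position, while the convolution half only sees a three-token window, which is one way to read the "long-distance" versus "local" specialization described above.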
