资源论文MONOTONIC MULTIHEAD ATTENTION

MONOTONIC MULTIHEAD ATTENTION

2020-01-02 | |  49 |   41 |   0

Abstract

Simultaneous machine translation models start generating a target sequence before they have encoded or read the source sequence. Recent approaches for this task either apply a fixed policy on a state-of-the art Transformer model, or a learnable monotonic attention on a weaker recurrent neural network-based structure. In this paper, we propose a new attention mechanism, Monotonic Multihead Attention (MMA), which extends the monotonic attention mechanism to multihead attention. We also introduce two novel and interpretable approaches for latency control that are specifically designed for multiple attention heads. We apply MMA to the simultaneous machine translation task and demonstrate better latency-quality tradeoffs compared to MILk, the previous state-of-the-art approach. We analyze how the latency controls affect the attention span and we study the relationship between the speed of a head and the layer it belongs to. Finally, we motivate the introduction of our model by analyzing the effect of the number of decoder layers and heads on quality and latency.

上一篇:HYPERMODELS FOR EXPLORATION

下一篇:EXPLORATORY NOT EXPLANATORY:C OUNTERFACTUAL ANALYSIS OF SALIENCY MAPSFOR DEEP RL

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...