
ON IDENTIFIABILITY IN TRANSFORMERS

2020-01-02

Abstract

In this work we delve deep into the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that attention weights are not unique and propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we are the first to show that input tokens retain their identity to a large degree across the model. We also study how identity information propagates and find that it is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to further investigate Transformer models.
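The non-uniqueness of attention weights can be illustrated with a short sketch. The attention output is A·V, so any component of a row of A that lies in the left null space of the value matrix V cannot affect the output; removing that component yields an "effective" attention matrix that produces the same output. The following is a minimal NumPy sketch of this idea, assuming a single attention head with sequence length longer than the head dimension; the function name, variable names, and toy shapes are illustrative and not taken from the paper's released code.

```python
import numpy as np

def effective_attention(A, V, tol=1e-10):
    """Remove the component of each attention row that lies in the
    left null space of the value matrix V; that component has no
    effect on the attention output A @ V."""
    # SVD of V: columns of U with (near-)zero singular values span
    # the left null space of V.
    U, s, _ = np.linalg.svd(V, full_matrices=True)
    rank = int(np.sum(s > tol))
    N = U[:, rank:]            # orthonormal basis of the left null space
    # Project each row of A off the null space: A_eff = A - A N N^T
    return A - A @ N @ N.T

# Toy example: sequence length 8, head dimension 4, so the left null
# space of V is non-trivial and the attention weights are not unique.
rng = np.random.default_rng(0)
A = rng.random((8, 8))
A /= A.sum(axis=1, keepdims=True)   # row-stochastic attention weights
V = rng.standard_normal((8, 4))
A_eff = effective_attention(A, V)
# The outputs coincide even though the weight matrices differ.
assert np.allclose(A @ V, A_eff @ V)
```

Because A and A_eff differ while producing identical outputs, the raw attention distribution alone cannot be read as a unique explanation, which is the motivation for using effective attention as a complementary interpretation tool.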

