
Context and related work


Our object of study is the BERT model introduced in [6]. To set context and terminology, we briefly describe the model's architecture. The input to BERT is based on a sequence of tokens (words or pieces of words). The output is a sequence of vectors, one for each input token. We will often refer to these vectors as context embeddings because they include information about a token's context. BERT's internals consist of two parts. First, an initial embedding for each token is created by combining a pre-trained wordpiece embedding with position and segment information. Next, this initial sequence of embeddings is run through multiple transformer layers, producing a new sequence of context embeddings at each step. (BERT comes in two versions, a 12-layer BERT-base model and a 24-layer BERT-large model.) Implicit in each transformer layer is a set of attention matrices, one for each attention head, each of which contains a scalar value for each ordered pair (token_i, token_j).
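As a minimal sketch of the quantities just described, the snippet below (assuming the Hugging Face `transformers` library and the illustrative `bert-base-uncased` checkpoint, neither of which is specified in this paper) extracts the per-layer context embeddings and the per-head attention matrices, whose entries are scalars for each ordered pair of tokens.

```python
# Sketch: inspect BERT's context embeddings and attention matrices.
# Assumes the Hugging Face `transformers` library; model choice is illustrative.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained(
    "bert-base-uncased",
    output_hidden_states=True,   # embeddings after every transformer layer
    output_attentions=True,      # attention matrices for every head in every layer
)
model.eval()

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq_len, hidden_dim].
# Index 0 is the initial wordpiece + position + segment embedding; the remaining 12
# (for BERT-base) are the context embeddings produced by each transformer layer.
hidden_states = outputs.hidden_states

# attentions: tuple of num_layers tensors, each [batch, num_heads, seq_len, seq_len].
# Entry [0, h, i, j] is the scalar attention value for the ordered pair (token_i, token_j)
# in head h of that layer.
attentions = outputs.attentions

print(len(hidden_states), hidden_states[0].shape)  # e.g. 13, torch.Size([1, 9, 768])
print(len(attentions), attentions[0].shape)        # e.g. 12, torch.Size([1, 12, 9, 9])
```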
