
PyTorch Pretrained Bert Annotation

This BERT annotation repo is for my personal study.

  • The original README of PyTorch Pretrained Bert is here.

  • A very nice PPT that helps with understanding.

  • Synthetic Self-Training PPT.

Arch

The module trees of BertModel and BertForMaskedLM (the 12-layer, 768-hidden, 30522-vocab bert-base-uncased configuration). A short loading sketch follows each tree.

BertModel Arch

  • BertEmbeddings
    • word_embeddings: Embedding(30522, 768)
    • position_embeddings: Embedding(512, 768)
    • token_type_embeddings: Embedding(2, 768)
    • LayerNorm: BertLayerNorm()
    • dropout: Dropout(p=0.1)
  • BertEncoder
    • BertLayer (×12, identical)
      • BertAttention
        • BertSelfAttention
          • query: Linear(in_features=768, out_features=768, bias=True)
          • key: Linear(in_features=768, out_features=768, bias=True)
          • value: Linear(in_features=768, out_features=768, bias=True)
          • dropout: Dropout(p=0.1)
        • BertSelfOutput
          • dense: Linear(in_features=768, out_features=768, bias=True)
          • LayerNorm: BertLayerNorm()
          • dropout: Dropout(p=0.1)
      • BertIntermediate
        • dense: Linear(in_features=768, out_features=3072, bias=True)
        • activation: gelu
      • BertOutput
        • dense: Linear(in_features=3072, out_features=768, bias=True)
        • LayerNorm: BertLayerNorm()
        • dropout: Dropout(p=0.1)
  • BertPooler
    • dense: Linear(in_features=768, out_features=768, bias=True)
    • activation: Tanh()
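The tree above is just the printed module hierarchy. A minimal sketch to reproduce it and run a forward pass, assuming the pytorch_pretrained_bert package is installed and can download the bert-base-uncased weights (the example sentence pair is made up for illustration):

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

# Load pretrained weights; printing the module gives the tree above.
model = BertModel.from_pretrained('bert-base-uncased')
print(model)

# Run a sentence pair through the encoder. With the default
# output_all_encoded_layers=True, the forward pass returns the hidden
# states of all 12 BertLayers plus the BertPooler output for [CLS].
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokens = tokenizer.tokenize("[CLS] hello world [SEP] how are you [SEP]")
ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
segments = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1]])  # sentence A vs. B
model.eval()
with torch.no_grad():
    encoded_layers, pooled_output = model(ids, segments)
print(len(encoded_layers))       # 12, one hidden state per BertLayer
print(encoded_layers[-1].shape)  # torch.Size([1, 8, 768])
print(pooled_output.shape)       # torch.Size([1, 768])
```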

BertForMaskedLM Arch

  • BertModel (the same module tree as above: BertEmbeddings → BertEncoder with 12 BertLayers → BertPooler)
  • BertOnlyMLMHead
    • BertLMPredictionHead
      • transform: BertPredictionHeadTransform
        • dense: Linear(in_features=768, out_features=768, bias=True)
        • LayerNorm: BertLayerNorm()
      • decoder: Linear(in_features=768, out_features=30522, bias=False), weights tied to word_embeddings
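A masked-token prediction sketch through BertOnlyMLMHead, adapted from the upstream repo's quick-start (same assumption as above that the bert-base-uncased weights can be downloaded):

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text = "[CLS] who was jim henson ? [SEP] jim henson was a puppeteer [SEP]"
tokens = tokenizer.tokenize(text)
tokens[8] = '[MASK]'  # mask the second "henson"
ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
# First 7 tokens ([CLS] ... [SEP]) are sentence A, the rest sentence B.
segments = torch.tensor([[0] * 7 + [1] * (len(tokens) - 7)])

model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
with torch.no_grad():
    # BertLMPredictionHead maps each 768-d hidden state back onto the
    # 30522-token vocabulary: shape (batch, seq_len, 30522).
    predictions = model(ids, segments)

predicted_id = torch.argmax(predictions[0, 8]).item()
# 'henson', per the upstream quick-start
print(tokenizer.convert_ids_to_tokens([predicted_id])[0])
```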

