PyTorch Pretrained Bert Annotation
This BERT annotation repo is for my personal study. The raw README of PyTorch Pretrained Bert is here. A very nice PPT to help with understanding: the Synthetic Self-Training PPT.
Arch
The architectures of BertModel and BertForMaskedLM, as printed from the model objects.
BertModel Arch

BertEmbeddings
  word_embeddings: Embedding(30522, 768)
  position_embeddings: Embedding(512, 768)
  token_type_embeddings: Embedding(2, 768)
  LayerNorm: BertLayerNorm()
  dropout: Dropout(p=0.1)
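A minimal sketch (mine, not the repo's code) of how BertEmbeddings combines the three embedding tables: word, position, and token-type embeddings are summed, then normalized and dropped out. `nn.LayerNorm` stands in for BertLayerNorm here.

```python
import torch
import torch.nn as nn

class BertEmbeddingsSketch(nn.Module):
    """Sum word + position + segment embeddings, then LayerNorm and dropout."""
    def __init__(self, vocab_size=30522, hidden=768, max_pos=512, type_vocab=2):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden)
        self.position_embeddings = nn.Embedding(max_pos, hidden)
        self.token_type_embeddings = nn.Embedding(type_vocab, hidden)
        self.LayerNorm = nn.LayerNorm(hidden, eps=1e-12)  # stand-in for BertLayerNorm
        self.dropout = nn.Dropout(p=0.1)

    def forward(self, input_ids, token_type_ids=None):
        # Position ids are just 0..seq_len-1, broadcast over the batch.
        seq_len = input_ids.size(1)
        position_ids = torch.arange(seq_len, device=input_ids.device)
        position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(input_ids)
        emb = (self.word_embeddings(input_ids)
               + self.position_embeddings(position_ids)
               + self.token_type_embeddings(token_type_ids))
        return self.dropout(self.LayerNorm(emb))

emb = BertEmbeddingsSketch()
out = emb(torch.randint(0, 30522, (2, 16)))
print(out.shape)  # torch.Size([2, 16, 768])
```

Note that all three tables map into the same 768-dim hidden space, which is what makes the elementwise sum possible.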
BertEncoder
  BertAttention
    BertSelfAttention
      query: Linear(in_features=768, out_features=768, bias=True)
      key: Linear(in_features=768, out_features=768, bias=True)
      value: Linear(in_features=768, out_features=768, bias=True)
      dropout: Dropout(p=0.1)
    BertSelfOutput
      dense: Linear(in_features=768, out_features=768, bias=True)
      LayerNorm: BertLayerNorm()
      dropout: Dropout(p=0.1)
  BertIntermediate
    dense: Linear(in_features=768, out_features=3072, bias=True)
    activation: gelu
  BertOutput
    dense: Linear(in_features=3072, out_features=768, bias=True)
    LayerNorm: BertLayerNorm()
    dropout: Dropout(p=0.1)
BertPooler
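The data flow through one encoder layer above can be sketched as follows (my simplification, not the repo's code): multi-head self-attention from the query/key/value projections, then the 768→3072→768 feed-forward, each sub-block followed by a residual add and LayerNorm. `nn.LayerNorm` and `F.gelu` stand in for BertLayerNorm and the repo's gelu.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BertLayerSketch(nn.Module):
    """One encoder layer: self-attention + feed-forward, each with residual + norm."""
    def __init__(self, hidden=768, heads=12, intermediate=3072):
        super().__init__()
        self.heads, self.head_dim = heads, hidden // heads
        self.query = nn.Linear(hidden, hidden)         # BertSelfAttention.query
        self.key = nn.Linear(hidden, hidden)           # BertSelfAttention.key
        self.value = nn.Linear(hidden, hidden)         # BertSelfAttention.value
        self.attn_out = nn.Linear(hidden, hidden)      # BertSelfOutput.dense
        self.attn_norm = nn.LayerNorm(hidden)          # stand-in for BertLayerNorm
        self.inter = nn.Linear(hidden, intermediate)   # BertIntermediate.dense
        self.out = nn.Linear(intermediate, hidden)     # BertOutput.dense
        self.out_norm = nn.LayerNorm(hidden)
        self.dropout = nn.Dropout(0.1)

    def split(self, x):
        # (B, T, hidden) -> (B, heads, T, head_dim)
        B, T, _ = x.shape
        return x.view(B, T, self.heads, self.head_dim).transpose(1, 2)

    def forward(self, x):
        q, k, v = self.split(self.query(x)), self.split(self.key(x)), self.split(self.value(x))
        scores = q @ k.transpose(-1, -2) / self.head_dim ** 0.5
        probs = self.dropout(scores.softmax(dim=-1))
        ctx = (probs @ v).transpose(1, 2).reshape(x.shape)
        # BertSelfOutput: project, residual, norm.
        x = self.attn_norm(x + self.dropout(self.attn_out(ctx)))
        # BertIntermediate + BertOutput: expand to 3072, gelu, project back.
        ff = self.out(F.gelu(self.inter(x)))
        return self.out_norm(x + self.dropout(ff))

layer = BertLayerSketch()
y = layer(torch.randn(2, 16, 768))
print(y.shape)  # torch.Size([2, 16, 768])
```

The residual connections are why every dense block ends back at 768: each sub-block's output must be addable to its input.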
BertForMaskedLM Arch

BertModel
  BertEmbeddings
    word_embeddings: Embedding(30522, 768)
    position_embeddings: Embedding(512, 768)
    token_type_embeddings: Embedding(2, 768)
    LayerNorm: BertLayerNorm()
    dropout: Dropout(p=0.1)
  BertEncoder
    BertLayer (12 layers)
      BertAttention
        BertSelfAttention
          query: Linear(in_features=768, out_features=768, bias=True)
          key: Linear(in_features=768, out_features=768, bias=True)
          value: Linear(in_features=768, out_features=768, bias=True)
          dropout: Dropout(p=0.1)
        BertSelfOutput
          dense: Linear(in_features=768, out_features=768, bias=True)
          LayerNorm: BertLayerNorm()
          dropout: Dropout(p=0.1)
      BertIntermediate
        dense: Linear(in_features=768, out_features=3072, bias=True)
        activation: gelu
      BertOutput
        dense: Linear(in_features=3072, out_features=768, bias=True)
        LayerNorm: BertLayerNorm()
        dropout: Dropout(p=0.1)
  BertPooler
    dense: Linear(in_features=768, out_features=768, bias=True)
    activation: Tanh()
BertOnlyMLMHead
  BertLMPredictionHead
    transform: BertPredictionHeadTransform
      dense: Linear(in_features=768, out_features=768, bias=True)
      LayerNorm: BertLayerNorm()
    decoder: Linear(in_features=768, out_features=30522, bias=False)
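A minimal sketch (mine, not the repo's code) of the BertLMPredictionHead above: hidden states pass through the transform (dense, activation, LayerNorm), then the bias-free decoder projects back to the 30522-word vocabulary. In BERT the decoder's weight is tied to word_embeddings.weight, which is why decoder has no bias of its own and why its shape mirrors the embedding table. `nn.LayerNorm` and `F.gelu` are stand-ins for BertLayerNorm and the transform's activation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLMHeadSketch(nn.Module):
    """BertLMPredictionHead sketch: transform, then project to vocab logits."""
    def __init__(self, word_embeddings: nn.Embedding):
        super().__init__()
        vocab, hidden = word_embeddings.weight.shape
        self.dense = nn.Linear(hidden, hidden)               # transform.dense
        self.LayerNorm = nn.LayerNorm(hidden)                # stand-in for BertLayerNorm
        self.decoder = nn.Linear(hidden, vocab, bias=False)  # 768 -> 30522
        self.decoder.weight = word_embeddings.weight         # weight tying

    def forward(self, hidden_states):
        h = self.LayerNorm(F.gelu(self.dense(hidden_states)))
        return self.decoder(h)  # (B, T, vocab) logits over the vocabulary

emb = nn.Embedding(30522, 768)
head = MLMHeadSketch(emb)
logits = head(torch.randn(2, 16, 768))
print(logits.shape)  # torch.Size([2, 16, 30522])
```

Weight tying reuses the input embedding matrix as the output projection, saving 30522 × 768 parameters and coupling the input and output representations of each word.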