better_use_pytorch_bert_pretrained
This repository contains pre-trained models and vocab for pytorch-pretrained-BERT.
Keeping them locally lets you manage the BERT models without pulling the model and vocab from https://s3.amazonaws.com/models.huggingface.co/bert each time you run your code.
Before downloading, you can edit line 10 of download_pytorch-pretrained-BERT_model_and_vocab.sh to set the download path. Then run:
sh download_pytorch-pretrained-BERT_model_and_vocab.sh
This repo was tested on Python 2.7 and 3.5+ (examples are tested only on Python 3.5+) and PyTorch 0.4.1/1.0.0.
pytorch-pretrained-bert can be installed with pip as follows:
pip install pytorch-pretrained-bert
Alternatively, you can use git to clone the pytorch-pretrained-BERT repository:
git clone https://github.com/huggingface/pytorch-pretrained-BERT.git
This allows you to modify the code.
Here is a quick-start example using the BertTokenizer, BertModel and BertForMaskedLM classes with Google AI's pre-trained BERT base uncased model. See the documentation in the pytorch-pretrained-BERT repository for details on these classes.
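To control where BERT is loaded from, set pretrained_model_name_or_path to your local model directory, for example '/157Dataset/data-chen.yirong/pytorch_bert_pretrained_model/bert-base-cased/'. The assertions in the example were written for the uncased model, so they may not hold with a cased checkpoint.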
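Before loading, it can help to confirm that the local directory is complete. The sketch below is a hypothetical helper, not part of pytorch-pretrained-bert. The file names it checks (vocab.txt, bert_config.json, pytorch_model.bin) are an assumption about the layout the download script produces and may differ between library versions, and the path in the usage line is a placeholder.

```python
import os

# Assumed file names; adjust them to match what your download actually contains.
EXPECTED_FILES = ('vocab.txt', 'bert_config.json', 'pytorch_model.bin')


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in model_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(model_dir, name))]


# Placeholder path: replace it with your own pretrained_model_name_or_path.
missing = missing_model_files('./bert-base-uncased/')
if missing:
    print('Missing files:', ', '.join(missing))
```

An empty result means every assumed file was found; anything else lists what still needs to be downloaded.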
First, let's prepare a tokenized input with BertTokenizer:
    import torch
    from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM

    # OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
    import logging
    logging.basicConfig(level=logging.INFO)

    # Load pre-trained model tokenizer (vocabulary)
    pretrained_model_name_or_path = '/157Dataset/data-chen.yirong/pytorch_bert_pretrained_model/bert-base-cased/'
    tokenizer = BertTokenizer.from_pretrained(pretrained_model_name_or_path)

    # Tokenized input
    text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
    tokenized_text = tokenizer.tokenize(text)

    # Mask a token that we will try to predict back with `BertForMaskedLM`
    masked_index = 8
    tokenized_text[masked_index] = '[MASK]'
    assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

    # Convert token to vocabulary indices
    indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

    # Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
    segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

    # Convert inputs to PyTorch tensors
    tokens_tensor = torch.tensor([indexed_tokens])
    segments_tensors = torch.tensor([segments_ids])
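The segment indices in the example above are written out by hand. As a minimal sketch, they could instead be derived from the token list: every token up to and including the first [SEP] belongs to sentence A (0), and everything after it to sentence B (1). The helper segment_ids_from_tokens is hypothetical and not part of the library.

```python
def segment_ids_from_tokens(tokens, sep_token='[SEP]'):
    """Assign 0 to tokens up to and including the first separator, 1 afterwards."""
    segment_ids = []
    current = 0
    for token in tokens:
        segment_ids.append(current)
        # Switch to sentence B only after the first separator has been emitted.
        if token == sep_token and current == 0:
            current = 1
    return segment_ids


# Token list copied from the assertion in the example above.
tokens = ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]',
          'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']
derived_segments_ids = segment_ids_from_tokens(tokens)
```

For this token list the result matches the hand-written segments_ids, which avoids miscounting when the input text changes.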
Let's see how to use BertModel to get hidden states:
    # Load pre-trained model (weights)
    model = BertModel.from_pretrained(pretrained_model_name_or_path)
    model.eval()

    # If you have a GPU, put everything on cuda
    tokens_tensor = tokens_tensor.to('cuda')
    segments_tensors = segments_tensors.to('cuda')
    model.to('cuda')

    # Predict hidden states features for each layer
    with torch.no_grad():
        encoded_layers, _ = model(tokens_tensor, segments_tensors)
    # We have a hidden states for each of the 12 layers in model bert-base-uncased
    assert len(encoded_layers) == 12
And here is how to use BertForMaskedLM:
    # Load pre-trained model (weights)
    model = BertForMaskedLM.from_pretrained(pretrained_model_name_or_path)
    model.eval()

    # If you have a GPU, put everything on cuda
    tokens_tensor = tokens_tensor.to('cuda')
    segments_tensors = segments_tensors.to('cuda')
    model.to('cuda')

    # Predict all tokens
    with torch.no_grad():
        predictions = model(tokens_tensor, segments_tensors)

    # confirm we were able to predict 'henson'
    predicted_index = torch.argmax(predictions[0, masked_index]).item()
    predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
    assert predicted_token == 'henson'
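The last step above picks the highest-scoring vocabulary entry at the masked position and maps that index back to a token. The pure-Python sketch below illustrates the same idea on made-up numbers. The vocabulary, the scores and the argmax helper are all hypothetical stand-ins, not real BERT output or library functions.

```python
# Toy stand-ins (hypothetical values, not real model output).
toy_vocab = ['[PAD]', 'jim', 'henson', 'puppet']

# One row per input position; each row holds one score per vocabulary entry.
toy_scores = [
    [0.1, 2.0, 0.3, 0.0],  # position 0
    [0.0, 0.2, 3.5, 0.4],  # position 1 (treated as the masked position)
]
toy_masked_index = 1


def argmax(values):
    """Return the index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Index of the best-scoring vocabulary entry at the masked position.
toy_predicted_index = argmax(toy_scores[toy_masked_index])
# Map the index back to its token, as convert_ids_to_tokens does for real ids.
toy_predicted_token = toy_vocab[toy_predicted_index]
```

In the real example, torch.argmax plays the role of argmax and tokenizer.convert_ids_to_tokens plays the role of the list lookup.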