bert-multi-gpu
Feel free to fine-tune large BERT models with large batch sizes easily. Multi-GPU and FP16 are supported.
TensorFlow
tensorflow >= 1.11.0  # CPU version of TensorFlow.
tensorflow-gpu >= 1.11.0  # GPU version of TensorFlow. (Upgrade to 1.14.0 if you hit "ImportError: No module named 'tensorflow.python.distribute.cross_device_ops'".)
NVIDIA Collective Communications Library (NCCL)
CPU/GPU/TPU Support
Multi-GPU Support: tf.distribute.MirroredStrategy is used to achieve multi-GPU support for this project. It mirrors variables across multiple devices and machines for synchronous training. The maximum batch_size for each GPU is almost the same as in single-GPU bert, so the global batch_size depends on how many GPUs there are (a minimal usage sketch follows the example below).
Assume: num_train_examples = 32000

Situation 1 (multi-gpu): train_batch_size = 8, num_gpu_cores = 4, num_train_epochs = 1
global_batch_size = train_batch_size * num_gpu_cores = 32
iteration_steps = num_train_examples * num_train_epochs / train_batch_size = 4000

Situation 2 (single-gpu): train_batch_size = 32, num_gpu_cores = 1, num_train_epochs = 4
global_batch_size = train_batch_size * num_gpu_cores = 32
iteration_steps = num_train_examples * num_train_epochs / train_batch_size = 4000

The result after training is equivalent between situations 1 and 2 when synchronous gradient updates are applied.
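As a minimal sketch (not this repo's exact code; model_fn, train_input_fn, and the model_dir path are placeholders), MirroredStrategy can be attached to a TF 1.x Estimator through RunConfig:

import tensorflow as tf

# Mirrors all variables to every visible GPU; gradients are all-reduced
# (e.g. via NCCL) and applied synchronously on each replica.
strategy = tf.distribute.MirroredStrategy()

run_config = tf.estimator.RunConfig(
    model_dir="/tmp/bert_output",   # placeholder path
    train_distribute=strategy)      # each training step runs on all replicas

# estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
# estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)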
FP16 Support: FP16 allows you to use a larger batch_size, and training speed increases by roughly 70~100% on Volta GPUs, but it may be slower on Pascal GPUs (REF1, REF2).
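How FP16 is wired in varies; one common TF 1.x pattern, shown as an illustrative sketch below (not necessarily this project's exact implementation), keeps float32 master weights and casts them to float16 for compute via a custom variable getter:

import tensorflow as tf

def float32_variable_storage_getter(getter, name, shape=None, dtype=None,
                                    initializer=None, regularizer=None,
                                    trainable=True, *args, **kwargs):
    """Creates variables in float32, then casts them to the requested dtype."""
    storage_dtype = tf.float32 if dtype in (tf.float16, tf.float32) else dtype
    variable = getter(name, shape, dtype=storage_dtype,
                      initializer=initializer, regularizer=regularizer,
                      trainable=trainable, *args, **kwargs)
    if dtype == tf.float16:
        # The cast happens in the graph, so the optimizer still updates
        # the underlying float32 master weights.
        variable = tf.cast(variable, tf.float16)
    return variable

# Usage (illustrative):
# with tf.variable_scope("bert", custom_getter=float32_variable_storage_getter):
#     model = build_model(features, compute_dtype=tf.float16)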
SavedModel Export
Some optional parameters are listed below:

task_name: The name of the task to fine-tune. You can define your own task by implementing a DataProcessor class.
do_lower_case: Whether to lower-case the input text. Should be true for uncased models and false for cased models. Default value is true.
do_train: Whether to fine-tune the classifier. Default value is false.
do_eval: Whether to evaluate the classifier. Default value is false.
do_predict: Whether to run prediction with the classifier recovered from a checkpoint. Default value is false.
save_for_serving: Whether to output a SavedModel for TensorFlow Serving. Default value is false.
data_dir: Your original input data directory.
vocab_file, bert_config_file, init_checkpoint: Files in the BERT model directory.
max_seq_length: The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded. Default value is 128.
train_batch_size: Batch size for each GPU. For example, if train_batch_size is 16 and num_gpu_cores is 4, your GLOBAL batch size is 16 * 4 = 64.
learning_rate: Initial learning rate for the Adam optimizer.
num_train_epochs: Number of training epochs.
use_gpu: Whether to use GPU.
num_gpu_cores: Total number of GPU cores to use; only used if use_gpu is true.
use_fp16: Whether to use FP16.
output_dir: Checkpoints and SavedModel (.pb) files will be saved in this directory.
python run_custom_classifier.py \
  --task_name=QQP \
  --do_lower_case=true \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --save_for_serving=true \
  --data_dir=/cfs/data/glue/QQP \
  --vocab_file=/cfs/models/bert-large-uncased/vocab.txt \
  --bert_config_file=/cfs/models/bert-large-uncased/bert_config.json \
  --init_checkpoint=/cfs/models/bert-large-uncased/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --use_gpu=true \
  --num_gpu_cores=4 \
  --use_fp16=false \
  --output_dir=/cfs/outputs/bert-large-uncased-qqp
A shell script is also available (see run_custom_classifier.sh). Optional parameters can be passed flexibly on the command line, and CUDA_VISIBLE_DEVICES can be set and exported as an environment variable when multiple GPUs are used.
# refer to the variable acronyms
bash run_custom_classifier.sh -h
# output
current params setting:
  -s max_seq_length, default val is: 128
  -g num_gpu_cores, default val is: 4
  -b train_batch_size, default val is: 32
  -l learning_rate, default val is: 2e-5
  -e num_train_epochs, default val is: 3.0
  -c CUDA_VISIBLE_DEVICES, default val is: 0,1,2,3
# example to pass params
bash run_custom_classifier.sh -s 512 -b 8 -l 3e-5 -e 1 -g 2 -c 2,3
Use case: in some situations one example can be assigned to several groups at once; e.g., one movie can be tagged as romantic, commercial, and boring under different aspects. Multi-label classification should therefore be applied rather than multi-class classification, because the labels are not exclusive (e.g., [1, 1, 0]).
One additional parameter, num_labels, is required; the other parameters are similar to the basic classifier's (a sketch of a typical multi-label head follows the example command below).
python run_custom_classifier_mlabel.py \
  --num_labels=10 \
  --task_name=Mlabel \
  --do_lower_case=true \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --save_for_serving=true \
  --data_dir=/cfs/data/Mlabel \
  --vocab_file=/cfs/models/bert-large-uncased/vocab.txt \
  --bert_config_file=/cfs/models/bert-large-uncased/bert_config.json \
  --init_checkpoint=/cfs/models/bert-large-uncased/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --use_gpu=true \
  --num_gpu_cores=4 \
  --use_fp16=false \
  --output_dir=/cfs/outputs/bert-large-uncased-mlabel
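For intuition, a multi-label head usually replaces the single softmax with an independent sigmoid per class. The sketch below is illustrative only (function and variable names are placeholders, not this repo's exact code):

import tensorflow as tf

def multi_label_loss(logits, label_ids, num_labels):
    # label_ids are multi-hot, e.g. [1, 1, 0, ...]; each class gets an
    # independent sigmoid instead of one softmax over exclusive classes.
    labels = tf.cast(label_ids, tf.float32)
    per_label_loss = tf.nn.sigmoid_cross_entropy_with_logits(
        labels=labels, logits=logits)            # [batch, num_labels]
    loss = tf.reduce_mean(tf.reduce_sum(per_label_loss, axis=-1))
    probabilities = tf.nn.sigmoid(logits)        # threshold (e.g. 0.5) at predict time
    return loss, probabilities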
Some optional parameters are listed below:

task_name: The name of the task to fine-tune. You can define your own task by implementing a DataProcessor class.
do_lower_case: Whether to lower-case the input text. Should be true for uncased models and false for cased models. Default value is true.
do_train: Whether to fine-tune the model. Default value is false.
do_eval: Whether to evaluate the model. Default value is false.
do_predict: Whether to run prediction with the model recovered from a checkpoint. Default value is false.
save_for_serving: Whether to output a SavedModel for TensorFlow Serving. Default value is false.
data_dir: Your original input data directory.
vocab_file, bert_config_file, init_checkpoint: Files in the BERT model directory.
max_seq_length: The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded. Default value is 128.
train_batch_size: Batch size for each GPU. For example, if train_batch_size is 16 and num_gpu_cores is 4, your GLOBAL batch size is 16 * 4 = 64.
learning_rate: Initial learning rate for the Adam optimizer.
num_train_epochs: Number of training epochs.
use_gpu: Whether to use GPU.
num_gpu_cores: Total number of GPU cores to use; only used if use_gpu is true.
use_fp16: Whether to use FP16.
output_dir: Checkpoints and SavedModel (.pb) files will be saved in this directory.
python run_seq_labeling.py \
  --task_name=PUNCT \
  --do_lower_case=true \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --save_for_serving=true \
  --data_dir=/cfs/data/PUNCT \
  --vocab_file=/cfs/models/bert-large-uncased/vocab.txt \
  --bert_config_file=/cfs/models/bert-large-uncased/bert_config.json \
  --init_checkpoint=/cfs/models/bert-large-uncased/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=5e-5 \
  --num_train_epochs=10.0 \
  --use_gpu=true \
  --num_gpu_cores=4 \
  --use_fp16=false \
  --output_dir=/cfs/outputs/bert-large-uncased-punct
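For background, sequence labeling attaches a classifier to every token position rather than to the pooled [CLS] vector. The following is a hedged sketch of such a head (illustrative names, not necessarily this script's exact implementation):

import tensorflow as tf

def token_classification_loss(sequence_output, label_ids, input_mask, num_labels):
    # sequence_output: [batch, seq_len, hidden] from BERT's final layer.
    logits = tf.layers.dense(sequence_output, num_labels)  # [batch, seq_len, num_labels]
    log_probs = tf.nn.log_softmax(logits, axis=-1)
    one_hot = tf.one_hot(label_ids, depth=num_labels, dtype=tf.float32)
    per_token_loss = -tf.reduce_sum(one_hot * log_probs, axis=-1)  # [batch, seq_len]
    mask = tf.cast(input_mask, tf.float32)                 # ignore padding positions
    loss = tf.reduce_sum(per_token_loss * mask) / (tf.reduce_sum(mask) + 1e-5)
    return loss, logits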
You can define your own task data processor by implementing the DataProcessor class. Then, add your CustomProcessor to processors. Finally, you can pass --task_name=your_task_name to the Python script.
# Create custom task data processor in run_custom_classifier.py
class CustomProcessor(DataProcessor):
    """Processor for the Custom data set."""

    def get_train_examples(self, data_dir):
        """See base class."""
        return self._create_examples(read_custom_train_lines(data_dir), 'train')

    def get_dev_examples(self, data_dir):
        """See base class."""
        return self._create_examples(read_custom_dev_lines(data_dir), 'dev')

    def get_test_examples(self, data_dir):
        """See base class."""
        return self._create_examples(read_custom_test_lines(data_dir), 'test')

    def get_labels(self):
        """See base class."""
        return your_label_list  # ["label-1", "label-2", "label-3", ..., "label-k"]

    def _create_examples(self, lines, set_type):
        """Creates examples for the training/evaluation/testing sets."""
        examples = []
        for (i, line) in enumerate(lines):
            # text_b can be None
            (guid, text_a, text_b, label) = parse_your_data_line(line)
            examples.append(
                InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
        return examples


# Add CustomProcessor to processors in run_custom_classifier.py
def main(_):
    # ...
    # Register the 'custom' processor name to processors, and you can pass
    # --task_name=custom to this script.
    processors = {
        "cola": ColaProcessor,
        "mnli": MnliProcessor,
        "mrpc": MrpcProcessor,
        "xnli": XnliProcessor,
        "qqp": QqpProcessor,
        "custom": CustomProcessor,
    }
    # ...
If --save_for_serving=true is passed to run_custom_classifier.py or run_seq_labeling.py, the script will export a SavedModel file to output_dir. Now you are good to go.
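Under the hood, an Estimator SavedModel export goes through a serving input receiver. The sketch below is a plausible shape for it, with feature names matching the exported signature shown in the next step (illustrative only, not this repo's exact code):

import tensorflow as tf

def serving_input_receiver_fn():
    # Placeholders named to match the exported signature (input_ids:0, ...).
    features = {
        "input_ids": tf.placeholder(tf.int32, [None, 128], name="input_ids"),
        "input_mask": tf.placeholder(tf.int32, [None, 128], name="input_mask"),
        "segment_ids": tf.placeholder(tf.int32, [None, 128], name="segment_ids"),
        "label_ids": tf.placeholder(tf.int32, [None], name="label_ids"),
    }
    return tf.estimator.export.ServingInputReceiver(features, features)

# estimator.export_savedmodel(output_dir, serving_input_receiver_fn)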
Install the SavedModel CLI by installing a pre-built TensorFlow binary (usually already installed on your system at bin/saved_model_cli) or by building TensorFlow from source.
Check your SavedModel file:
saved_model_cli show --dir <bert_savedmodel_output_path>/<timestamp> --all

# For example:
saved_model_cli show --dir tf_serving/bert_base_uncased_multi_gpu_qqp/1557722227/ --all

# Output:
# signature_def['serving_default']:
#   The given SavedModel SignatureDef contains the following input(s):
#     inputs['input_ids'] tensor_info:
#       dtype: DT_INT32
#       shape: (-1, 128)
#       name: input_ids:0
#     inputs['input_mask'] tensor_info:
#       dtype: DT_INT32
#       shape: (-1, 128)
#       name: input_mask:0
#     inputs['label_ids'] tensor_info:
#       dtype: DT_INT32
#       shape: (-1)
#       name: label_ids:0
#     inputs['segment_ids'] tensor_info:
#       dtype: DT_INT32
#       shape: (-1, 128)
#       name: segment_ids:0
#   The given SavedModel SignatureDef contains the following output(s):
#     outputs['probabilities'] tensor_info:
#       dtype: DT_FLOAT
#       shape: (-1, 2)
#       name: loss/Softmax:0
#   Method name is: tensorflow/serving/predict
Install Bazel and compile tensorflow_model_server.
cd /your/path/to/tensorflow/serving
bazel build -c opt //tensorflow_serving/model_servers:tensorflow_model_server
Start TensorFlow Serving to listen on a port for the HTTP/REST API or the gRPC API; tensorflow_model_server will initialize the models in <bert_savedmodel_output_path>.
# HTTP/REST API
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --rest_api_port=<rest_api_port> --model_name=<model_name> --model_base_path=<bert_savedmodel_output_path>

# For example:
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --rest_api_port=9000 --model_name=bert_base_uncased_qqp --model_base_path=/root/tf_serving/bert_base_uncased_multi_gpu_qqp --enable_batching=true

# Output:
# 2019-05-14 23:26:38.135575: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: bert_base_uncased_qqp version: 1557722227}
# 2019-05-14 23:26:38.158674: I tensorflow_serving/model_servers/server.cc:324] Running gRPC ModelServer at 0.0.0.0:8500 ...
# 2019-05-14 23:26:38.179164: I tensorflow_serving/model_servers/server.cc:344] Exporting HTTP/REST API at:localhost:9000 ...
Make a request to test your latest serving model.
curl -H "Content-type: application/json" -X POST -d '{"instances": [{"input_ids": [101,2054,2064,2028,2079,2044,16914,5910,1029,102,2054,2079,1045,2079,2044,2026,16914,5910,1029,102,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], "input_mask": [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], "segment_ids": [0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], "label_ids":[0]}]}' "http://localhost:9000/v1/models/bert_base_uncased_qqp:predict"# Output:# {"predictions": [[0.608512461, 0.391487628]]}
License
Apache License. See the license file in the repository for more details.