bert-multi-gpu

Fine-tune large BERT models with large batch sizes easily. Multi-GPU and FP16 training are supported.

Dependencies

Features

  • CPU/GPU/TPU Support

  • Multi-GPU Support: tf.distribute.MirroredStrategy is used to achieve multi-GPU support for this project. It mirrors variables across multiple devices and machines and updates them synchronously. The maximum batch_size for each GPU is almost the same as for the original bert, so the global batch_size depends on how many GPUs are used.

    • Assume: num_train_examples = 32000

    • Situation 1 (multi-gpu): train_batch_size = 8, num_gpu_cores = 4, num_train_epochs = 1

      • global_batch_size = train_batch_size * num_gpu_cores = 32

      • iteration_steps = num_train_examples * num_train_epochs / train_batch_size = 4000

    • Situation 2 (single-gpu): train_batch_size = 32, num_gpu_cores = 1, num_train_epochs = 4

      • global_batch_size = train_batch_size * num_gpu_cores = 32

      • iteration_steps = num_train_examples * num_train_epochs / train_batch_size = 4000

    • The result after training is equivalent between situations 1 and 2 when synchronous updates on gradients are applied (see the MirroredStrategy sketch after this feature list).

  • FP16 Support: FP16 allows you to use a larger batch_size, and training speed increases by roughly 70~100% on Volta GPUs, but may be slower on Pascal GPUs (REF1, REF2). A sketch of the usual mixed-precision pattern also follows the feature list.

  • SavedModel Export
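
Below is a minimal sketch of how tf.distribute.MirroredStrategy is typically wired into a TF 1.x Estimator, to make the batch arithmetic above concrete. It is illustrative rather than this project's exact code; model_fn and the model_dir path are assumptions.

# Sketch: synchronous data parallelism with MirroredStrategy (TF 1.x Estimator).
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # one replica per visible GPU
print('replicas in sync:', strategy.num_replicas_in_sync)

run_config = tf.estimator.RunConfig(
    train_distribute=strategy,                # mirror variables, all-reduce gradients
    model_dir='/tmp/bert_output')             # assumed output path

# estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
# Each replica consumes train_batch_size examples per step, so the global
# batch is train_batch_size * strategy.num_replicas_in_sync.

For FP16, a common implementation (a sketch of the standard mixed-precision pattern, not necessarily the exact code used here) stores trainable variables in float32 and casts them to float16 for compute:

import tensorflow as tf

def float32_variable_storage_getter(getter, name, shape=None, dtype=None,
                                    initializer=None, regularizer=None,
                                    trainable=True, *args, **kwargs):
    """Keep float32 master weights; hand out float16 casts for compute."""
    storage_dtype = tf.float32 if trainable else dtype
    variable = getter(name, shape, dtype=storage_dtype,
                      initializer=initializer, regularizer=regularizer,
                      trainable=trainable, *args, **kwargs)
    if trainable and dtype != tf.float32:
        variable = tf.cast(variable, dtype)   # fp16 view for forward/backward pass
    return variable

# Usage (assumed scope name):
# with tf.variable_scope('bert', custom_getter=float32_variable_storage_getter):
#     ...build the model with compute dtype tf.float16...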

Usage

Run Classifier

Some optional parameters are listed below:

  • task_name: The name of the task to fine-tune on; you can define your own task by implementing the DataProcessor class.

  • do_lower_case: Whether to lower case the input text. Should be True for uncased models and False for cased models. Default value is true.

  • do_train: Fine tune classifier or not. Default value is false.

  • do_eval: Evaluate classifier or not. Default value is false.

  • do_predict: Predict by classifier recovered from checkpoint or not. Default value is false.

  • save_for_serving: Output SavedModel for tensorflow serving. Default value is false.

  • data_dir: Your original input data directory.

  • vocab_file / bert_config_file / init_checkpoint: Files in the BERT model directory.

  • max_seq_length: The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded. Default value is 128.

  • train_batch_size: Batch size for each GPU. For example, if train_batch_size is 16, and num_gpu_cores is 4, your GLOBAL batch size is 16 * 4 = 64.

  • learning_rate: Learning rate for Adam optimizer initialization.

  • num_train_epochs: Train epoch number.

  • use_gpu: Use GPU or not.

  • num_gpu_cores: Total number of GPU cores to use, only used if use_gpu is True.

  • use_fp16: Use FP16 or not.

  • output_dir: Checkpoints and SavedModel (.pb) files will be saved in this directory.

python run_custom_classifier.py \
  --task_name=QQP \
  --do_lower_case=true \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --save_for_serving=true \
  --data_dir=/cfs/data/glue/QQP \
  --vocab_file=/cfs/models/bert-large-uncased/vocab.txt \
  --bert_config_file=/cfs/models/bert-large-uncased/bert_config.json \
  --init_checkpoint=/cfs/models/bert-large-uncased/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --use_gpu=true \
  --num_gpu_cores=4 \
  --use_fp16=false \
  --output_dir=/cfs/outputs/bert-large-uncased-qqp

A shell script is also available (see run_custom_classifier.sh):

  • Optional parameters can be passed flexibly through the command line.

  • CUDA_VISIBLE_DEVICES can be set and exported as an environment variable when multiple GPUs are used.

# refer to the variable acronyms
bash run_custom_classifier.sh -h

# output
current params setting:
-s max_seq_length,        default val is: 128
-g num_gpu_cores,         default val is: 4
-b train_batch_size,      default val is: 32
-l learning_rate,         default val is: 2e-5
-e num_train_epochs,      default val is: 3.0
-c CUDA_VISIBLE_DEVICES,  default val is: 0,1,2,3

# example to pass params
bash run_custom_classifier.sh -s 512 -b 8 -l 3e-5 -e 1 -g 2 -c 2,3

Run Multi-label Classification

Use case: in some situations, one example can be assigned to several groups; e.g. one movie could be tagged as romantic, commercial, and boring along different aspects. As a result, multi-label classification should be applied rather than multi-class classification, since the labels are not mutually exclusive (e.g. [1, 1, 0]); see the sigmoid sketch after the command below.

One additional parameter, num_labels, is required; the other parameters are similar to the basic classifier's.

python run_custom_classifier_mlabel.py \
  --num_labels=10 \
  --task_name=Mlabel \
  --do_lower_case=true \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --save_for_serving=true \
  --data_dir=/cfs/data/Mlabel \
  --vocab_file=/cfs/models/bert-large-uncased/vocab.txt \
  --bert_config_file=/cfs/models/bert-large-uncased/bert_config.json \
  --init_checkpoint=/cfs/models/bert-large-uncased/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --use_gpu=true \
  --num_gpu_cores=4 \
  --use_fp16=false \
  --output_dir=/cfs/outputs/bert-large-uncased-mlabel
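
Since the labels are not mutually exclusive, the multi-label head replaces the softmax with an independent sigmoid per label. Here is a minimal sketch of that idea (an assumption about the approach, not run_custom_classifier_mlabel.py's exact code):

import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])   # one example, num_labels = 3
labels = tf.constant([[1.0, 1.0, 0.0]])    # e.g. [romantic, commercial, boring]

# One independent binary cross-entropy per label.
per_label_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(per_label_loss)

# Per-label probabilities; threshold each one (e.g. > 0.5) independently.
probabilities = tf.sigmoid(logits)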

Run Sequence Labeling

Some optional parameters are listed below:

  • task_name: The name of the task to fine-tune on; you can define your own task by implementing the DataProcessor class.

  • do_lower_case: Whether to lower case the input text. Should be True for uncased models and False for cased models. Default value is true.

  • do_train: Fine tune model or not. Default value is false.

  • do_eval: Evaluate model or not. Default value is false.

  • do_predict: Predict by model recovered from checkpoint or not. Default value is false.

  • save_for_serving: Output SavedModel for tensorflow serving. Default value is false.

  • data_dir: Your original input data directory.

  • vocab_file / bert_config_file / init_checkpoint: Files in the BERT model directory.

  • max_seq_length: The maximum total input sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded. Default value is 128.

  • train_batch_size: Batch size for each GPU. For example, if train_batch_size is 16, and num_gpu_cores is 4, your GLOBAL batch size is 16 * 4 = 64.

  • learning_rate: Learning rate for Adam optimizer initialization.

  • num_train_epochs: Train epoch number.

  • use_gpu: Use GPU or not.

  • num_gpu_cores: Total number of GPU cores to use, only used if use_gpu is True.

  • use_fp16: Use FP16 or not.

  • output_dir: Checkpoints and SavedModel (.pb) files will be saved in this directory.

python run_seq_labeling.py \
  --task_name=PUNCT \
  --do_lower_case=true \
  --do_train=true \
  --do_eval=true \
  --do_predict=true \
  --save_for_serving=true \
  --data_dir=/cfs/data/PUNCT \
  --vocab_file=/cfs/models/bert-large-uncased/vocab.txt \
  --bert_config_file=/cfs/models/bert-large-uncased/bert_config.json \
  --init_checkpoint=/cfs/models/bert-large-uncased/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=5e-5 \
  --num_train_epochs=10.0 \
  --use_gpu=true \
  --num_gpu_cores=4 \
  --use_fp16=false \
  --output_dir=/cfs/outputs/bert-large-uncased-punct
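
The key difference from the classifiers above is that sequence labeling predicts one label per WordPiece token rather than one per example. A minimal sketch of a masked per-token cross-entropy (an assumption about the approach; shapes and names are illustrative):

import tensorflow as tf

# Toy shapes: 2 examples, 8 tokens, 5 tag classes.
batch, max_seq_length, num_labels = 2, 8, 5
logits = tf.random.normal([batch, max_seq_length, num_labels])
label_ids = tf.zeros([batch, max_seq_length], dtype=tf.int32)
input_mask = tf.ones([batch, max_seq_length], dtype=tf.int32)

log_probs = tf.nn.log_softmax(logits, axis=-1)
one_hot = tf.one_hot(label_ids, depth=num_labels, dtype=tf.float32)
per_token_loss = -tf.reduce_sum(one_hot * log_probs, axis=-1)   # (batch, seq)

# Padding tokens must not contribute to the loss.
mask = tf.cast(input_mask, tf.float32)
loss = tf.reduce_sum(per_token_loss * mask) / tf.reduce_sum(mask)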

What's More

Add custom task

You can define your own task data processor by implementing the DataProcessor class.

Then, add your CustomProcessor to processors.

Finally, pass --task_name=your_task_name to the Python script.

# Create custom task data processor in run_custom_classifier.py
class CustomProcessor(DataProcessor):
    """Processor for the Custom data set."""

    def get_train_examples(self, data_dir):
        """See base class."""
        return self._create_examples(read_custom_train_lines(data_dir), 'train')

    def get_dev_examples(self, data_dir):
        """See base class."""
        return self._create_examples(read_custom_dev_lines(data_dir), 'dev')

    def get_test_examples(self, data_dir):
        """See base class."""
        return self._create_examples(read_custom_test_lines(data_dir), 'test')

    def get_labels(self):
        """See base class."""
        return your_label_list  # ["label-1", "label-2", "label-3", ..., "label-k"]

    def _create_examples(self, lines, set_type):
        """Creates examples for the training/evaluation/testing sets."""
        examples = []
        for (i, line) in enumerate(lines):
            # text_b can be None
            (guid, text_a, text_b, label) = parse_your_data_line(line)
            examples.append(
                InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
        return examples


# Add CustomProcessor to processors in run_custom_classifier.py
def main(_):
    # ...
    # Register 'custom' processor name to processors, and you can pass
    # --task_name=custom to this script
    processors = {
        "cola": ColaProcessor,
        "mnli": MnliProcessor,
        "mrpc": MrpcProcessor,
        "xnli": XnliProcessor,
        "qqp": QqpProcessor,
        "custom": CustomProcessor,
    }
    # ...
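
parse_your_data_line above is a placeholder you must supply yourself. A hypothetical version for a tab-separated layout (guid, text_a, text_b, label is an assumed column order, not one the project prescribes):

def parse_your_data_line(line):
    """Hypothetical parser for a line formatted as 'guid<TAB>text_a<TAB>text_b<TAB>label'."""
    guid, text_a, text_b, label = line.rstrip('\n').split('\t')
    return guid, text_a, (text_b or None), label   # empty text_b becomes None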

TensorFlow Serving

If --save_for_serving=true is passed to run_custom_classifier.py or run_seq_labeling.py, the script will export a SavedModel to output_dir. Now you are good to go.

  • Install the SavedModel CLI by installing a pre-built TensorFlow binary (it is usually already installed on your system at bin/saved_model_cli) or by building TensorFlow from source.

  • Check your SavedModel file:

    saved_model_cli show --dir <bert_savedmodel_output_path>/<timestamp> --all

    # For example:
    saved_model_cli show --dir tf_serving/bert_base_uncased_multi_gpu_qqp/1557722227/ --all

    # Output:
    # signature_def['serving_default']:
    #   The given SavedModel SignatureDef contains the following input(s):
    #     inputs['input_ids'] tensor_info:
    #         dtype: DT_INT32
    #         shape: (-1, 128)
    #         name: input_ids:0
    #     inputs['input_mask'] tensor_info:
    #         dtype: DT_INT32
    #         shape: (-1, 128)
    #         name: input_mask:0
    #     inputs['label_ids'] tensor_info:
    #         dtype: DT_INT32
    #         shape: (-1)
    #         name: label_ids:0
    #     inputs['segment_ids'] tensor_info:
    #         dtype: DT_INT32
    #         shape: (-1, 128)
    #         name: segment_ids:0
    #   The given SavedModel SignatureDef contains the following output(s):
    #     outputs['probabilities'] tensor_info:
    #         dtype: DT_FLOAT
    #         shape: (-1, 2)
    #         name: loss/Softmax:0
    #   Method name is: tensorflow/serving/predict
  • Install Bazel and compile tensorflow_model_server.

    cd /your/path/to/tensorflow/serving
    bazel build -c opt //tensorflow_serving/model_servers:tensorflow_model_server
  • Start TensorFlow Serving to listen on a port for the HTTP/REST API or gRPC API. tensorflow_model_server will initialize the models in <bert_savedmodel_output_path>.

    # HTTP/REST API
    bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --rest_api_port=<rest_api_port> --model_name=<model_name> --model_base_path=<bert_savedmodel_output_path>

    # For example:
    bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --rest_api_port=9000 --model_name=bert_base_uncased_qqp --model_base_path=/root/tf_serving/bert_base_uncased_multi_gpu_qqp --enable_batching=true

    # Output:
    # 2019-05-14 23:26:38.135575: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: bert_base_uncased_qqp version: 1557722227}
    # 2019-05-14 23:26:38.158674: I tensorflow_serving/model_servers/server.cc:324] Running gRPC ModelServer at 0.0.0.0:8500 ...
    # 2019-05-14 23:26:38.179164: I tensorflow_serving/model_servers/server.cc:344] Exporting HTTP/REST API at:localhost:9000 ...
  • Make a request to test your latest serving model.

    curl -H "Content-type: application/json" -X POST -d '{"instances": [{"input_ids": [101,2054,2064,2028,2079,2044,16914,5910,1029,102,2054,2079,1045,2079,2044,2026,16914,5910,1029,102,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], "input_mask": [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], "segment_ids": [0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], "label_ids":[0]}]}'  "http://localhost:9000/v1/models/bert_base_uncased_qqp:predict"# Output:# {"predictions": [[0.608512461, 0.391487628]]}
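
The same request can be sent from Python; the sketch below assumes the server started above is listening on localhost:9000, and uses toy token ids padded to max_seq_length = 128:

import json
import urllib.request

seq_len = 128
instance = {
    'input_ids':   [101, 2054, 2064, 102] + [0] * (seq_len - 4),  # toy WordPiece ids
    'input_mask':  [1, 1, 1, 1] + [0] * (seq_len - 4),
    'segment_ids': [0] * seq_len,
    'label_ids':   [0],
}

req = urllib.request.Request(
    'http://localhost:9000/v1/models/bert_base_uncased_qqp:predict',
    data=json.dumps({'instances': [instance]}).encode('utf-8'),
    headers={'Content-Type': 'application/json'})

with urllib.request.urlopen(req) as resp:
    print(json.load(resp))   # e.g. {'predictions': [[0.6085..., 0.3914...]]}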

Stargazers over time

[stargazers-over-time chart]

License

Apache License. See the LICENSE file in the repository for more details.

