
allennlp-distributed-training

This repo holds a few example AllenNLP experiments modified to run with DistributedDataParallel support. The training_config directory contains two versions of the same set of experiments. The configs in the distributed_data_parallel directory differ mainly in their dataset readers: these readers are replicas of the original AllenNLP readers, with a minor modification to support distributed sampling, as sketched below.
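
As a rough illustration of the kind of change involved (a hypothetical sketch, not the repo's actual reader code; the helper name shard_lines is made up here), a distributed-aware reader can partition its input so that each worker process only yields its own share of the examples, using the process rank and world size from torch.distributed:

import torch.distributed as dist

def shard_lines(file_path):
    """Yield only the lines of `file_path` that belong to the current worker."""
    # If torch.distributed has not been initialized (single-process training),
    # fall back to rank 0 in a world of size 1, i.e. yield every line.
    rank = dist.get_rank() if dist.is_initialized() else 0
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    with open(file_path) as f:
        for idx, line in enumerate(f):
            # Round-robin split: worker `rank` keeps every `world_size`-th example,
            # so the processes read disjoint subsets of the dataset.
            if idx % world_size == rank:
                yield line

A reader's _read method can then iterate over shard_lines(file_path) instead of the raw file, so each DistributedDataParallel worker trains on a different slice of the data.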

To run the distributed experiments, install AllenNLP:

conda create -n allennlp_distributed python=3.7
conda activate allennlp_distributed
git clone https://github.com/allenai/allennlp
cd allennlp
pip install .

And run:

allennlp train training_config/distributed_data_parallel/esim.jsonnet --include-package distributed-training -s output/

To run without the distributed setup, do the usual AllenNLP installation and use the experiments in training_config/data_parallel/.
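
For example, a non-distributed run of the same ESIM experiment might look like this (assuming the config file keeps the same name in that directory; the --include-package flag should only be needed if the config references this repo's custom readers):

allennlp train training_config/data_parallel/esim.jsonnet -s output/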

Speed Comparison: Time taken to train one epoch (averaged over 3 epochs)

GPU - 2080 Ti

NOTE: The time reported does not correspond to the training_duration metric. This is the time taken by the Trainer._train_epoch method.

| Experiment | Single GPU | 2x Data Parallel | 2x Distributed | 4x Data Parallel | 4x Distributed |
|---|---|---|---|---|---|
| esim.jsonnet (400K SNLI samples) | 4m 15s | NA | NA | 4m 30s | 2m 13s |
| bidaf.jsonnet | 5m 44s | NA | NA | 4m 10s | 2m 5s |

