
gpt2-ml



中文说明 (Chinese) | English

  •  Simplified GPT-2 training scripts (based on Grover, with TPU support)

  •  Ported BERT tokenizer, compatible with multilingual corpora (see the tokenization sketch after this list)

  •  1.5B-parameter pretrained Chinese GPT-2 model (~15 GB corpus, 100k steps)

  •  Batteries-included Colab demo

  •  1.5B-parameter pretrained Chinese GPT-2 model (~50 GB corpus, 1M steps)
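The BERT-derived WordPiece tokenizer mentioned above splits Chinese into per-character tokens and English into subword pieces, which is what lets a single vocabulary cover multilingual corpora. A minimal sketch of that behavior, using the Hugging Face transformers multilingual BERT vocabulary purely for illustration (the repository ships its own tokenizer ported from google-research/bert and does not depend on transformers):

# Illustration of BERT-style WordPiece tokenization on mixed-language text.
# Uses Hugging Face's multilingual vocab as a stand-in; gpt2-ml bundles its
# own tokenizer ported from google-research/bert.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

text = "GPT-2 可以生成中文文本。"  # mixed English/Chinese input
tokens = tokenizer.tokenize(text)

print(tokens)                                   # Chinese: one token per character
print(tokenizer.convert_tokens_to_ids(tokens))  # vocabulary ids fed to the model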

Pretrained Model

1.5B-parameter pretrained Chinese GPT-2 model [Google Drive]

SHA256: 4a6e5124df8db7ac2bdd902e6191b807a6983a7f5d09fb10ce011f9a073b183e
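To confirm the download is intact, compare the file's SHA256 digest with the value above. A minimal verification sketch in Python (the file name model.ckpt.zip is a placeholder; use whatever name the Google Drive download actually has):

# Verify the downloaded checkpoint against the published SHA256 digest.
import hashlib

EXPECTED = "4a6e5124df8db7ac2bdd902e6191b807a6983a7f5d09fb10ce011f9a073b183e"

def sha256_of(path, chunk_size=1 << 20):
    # Hash in 1 MB chunks so a large checkpoint file does not fill memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# "model.ckpt.zip" is a hypothetical name for the Google Drive download.
assert sha256_of("model.ckpt.zip") == EXPECTED, "checksum mismatch"
print("SHA256 OK")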

Corpus from THUCNews and nlp_chinese_corpus

Trained for 100k steps on a Cloud TPU v3-256 Pod


Google Colab

With just two clicks (not counting the Colab auth process), the 1.5B pretrained Chinese model demo is ready to go:

[Colab Notebook]


Train
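The training scripts are simplified from Grover and run on TPUs (see the feature list above); this page does not reproduce the repository's actual entry points or flags. As a rough sketch of what attaching a TensorFlow job to a Cloud TPU involves, assuming TensorFlow 2.x APIs (the TPU name and the toy model below are hypothetical, and gpt2-ml itself builds its model from Grover's codebase):

# Rough illustration of wiring a TensorFlow job to a Cloud TPU.
# NOT the repository's actual training script; shown with TF 2.x APIs
# purely to sketch what TPU support involves.
import tensorflow as tf

# "my-tpu" is a hypothetical Cloud TPU name from your own setup.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Variables created in this scope are replicated across TPU cores.
    # gpt2-ml builds a 1.5B-parameter GPT-2 here instead of this toy layer.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")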

Disclaimer

The contents of this repository are for academic research purposes only, and we do not provide any conclusive remarks.

Citation

@misc{GPT2-ML,
  author = {Zhibo Zhang},
  title = {GPT2-ML: GPT-2 for Multiple Languages},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/imcaspar/gpt2-ml}},
}

Reference

https://github.com/google-research/bert

https://github.com/rowanz/grover

Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC)

