Follow the instructions in the converscope repo to prepare the inbox.pb data file. We do not supply the data for privacy reasons.
Run python3 converscope/dump_gpt2.py to dump the train and test text files.
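The dump step writes the corpus out as plain-text train and test files. As a rough sketch of what such a split looks like, here is a minimal holdout split; the 90/10 ratio and the helper names are illustrative assumptions, not the actual logic of dump_gpt2.py:

```python
# Illustrative train/test split over a list of text lines.
# The 90/10 ratio is an assumption, not taken from dump_gpt2.py.
def split_corpus(lines, train_frac=0.9):
    """Split lines into train and test portions by a fixed fraction."""
    cut = int(len(lines) * train_frac)
    return lines[:cut], lines[cut:]

def write_split(lines, train_path="train.txt", test_path="test.txt"):
    """Write the two portions to plain-text files, one line per entry."""
    train, test = split_corpus(lines)
    with open(train_path, "w") as f:
        f.write("\n".join(train))
    with open(test_path, "w") as f:
        f.write("\n".join(test))
```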
Finetuning GPT-2
We experimented with several existing implementations of GPT-2 training.
gpt-2-simple
The gpt-2-simple repo features a Colab notebook that enables free GPU training on Colaboratory, so you do not have to pay for a GPU on Google Cloud.
transformers
We used the transformers repo to train GPT-2 models and evaluate perplexity. The following commands assume you are in the transformers/examples directory.
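For reference, the perplexity reported by the evaluation is the exponential of the mean per-token negative log-likelihood. A minimal sketch of that computation (the loss values in the example are made up):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Example: a uniform per-token loss of ln(2) corresponds to perplexity 2,
# i.e. the model is as uncertain as a fair coin flip at each token.
losses = [math.log(2.0)] * 4
```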