
Creative writing with GPT-2

Quickly get started with a notebook on Google Colab.

One of 2019's most important machine learning stories was the progress made by applying transfer learning to massive language models.

I have been experimenting with retraining GPT-2 on authors we like, and using the model as a writing partner. The process has been enlightening, and points towards a future where human and machine can write creatively together.

You can see examples of text generation from some of the fine-tuned models here.

This library wraps around the excellent Hugging Face Transformers library. Two of its scripts have been copied into this repo - run_generation and run_lm_finetuning - both of which can be found here.

How to write creatively with GPT-2

GPT-2 is not ready to write text on its own - but with a bit of human supervision you can use the text it generates to write something interesting!

GPT-2 was originally trained on 40 GB of web text scraped from outbound Reddit links (the WebText dataset). This library can be used to generate text with the base GPT-2 model, and to fine-tune that model on text of your choosing.
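If you just want to see what the base model produces before fine-tuning anything, the Transformers pipeline API is enough. The snippet below is a minimal sketch, not part of this repo - the prompt and sampling settings are only examples:

# minimal sketch using Hugging Face Transformers directly - not part of this repo
from transformers import pipeline

# downloads the smallest GPT-2 checkpoint on first run
generator = pipeline("text-generation", model="gpt2")

samples = generator(
    "The old man walked down to the sea and",  # example prompt
    max_length=100,
    num_return_sequences=3,
    do_sample=True,
)
for s in samples:
    print(s["generated_text"], "\n---")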

The library has a number of datasets in creative-writing-with-gpt2/data. A dataset is defined as a text file called clean.txt - for example asimov/clean.txt (a sketch for building your own dataset follows the tree below).

$ tree -L 1 creative-writing-with-gpt2/data
creative-writing-with-gpt2/data
├── alan-watts
├── asimov
├── bible
├── harry
├── hemingway
├── mahabarta
├── meditations
├── plato
└── tolkien
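To add your own dataset, the only requirement is a clean.txt file in a new directory under data/. A hypothetical helper (not part of the repo) that concatenates raw .txt files into that format might look like:

# hypothetical helper - not part of the repo - for building a clean.txt dataset
from pathlib import Path

def build_dataset(raw_dir: str, out_file: str = "clean.txt") -> None:
    """Concatenate all .txt files in raw_dir into a single clean.txt."""
    raw = Path(raw_dir)
    texts = []
    for path in sorted(raw.glob("*.txt")):
        # strip leading/trailing whitespace from each source file
        texts.append(path.read_text(encoding="utf-8").strip())
    (raw / out_file).write_text("\n\n".join(texts), encoding="utf-8")

build_dataset("data/my-author")  # example directory name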

A number of already fine-tuned models are listed in creative-writing-with-gpt2/models.py - you can download them to your machine by running python models.py.

Run on Colab

The recommended way to interact with this repo is through this Google Colab notebook - the free GPU is useful for fine-tuning.

Run locally

git clone https://github.com/ADGEfficiency/creative-writing-with-gpt2
cd creative-writing-with-gpt2
pip install -r requirements.txt
python models.py

To run text generation with a fine-tuned model (either downloaded by running python models.py or trained yourself):

python run_generation.py \
  --model_type=gpt2 \
  --model_name_or_path="./models/tolkien" \
  --length=200

To fine-tune the base GPT-2 model on one of the datasets:

python run_lm_finetuning.py \
  --output_dir="./models/harry" \
  --model_type=gpt2 \
  --model_name_or_path=gpt2 \
  --do_train \
  --train_data_file="./data/harry/clean.txt" \
  --num_train_epochs=4 \
  --overwrite_output_dir \
  --save_steps 10000

To run text generation with the base GPT-2 model:

python run_generation.py \
  --model_type=gpt2 \
  --model_name_or_path="models/gpt2" \
  --length=200
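As an alternative to run_generation.py, a fine-tuned checkpoint directory can also be loaded with Transformers directly. This is a minimal sketch, assuming a checkpoint was saved to ./models/tolkien; the prompt and sampling parameters are only illustrative:

# minimal sketch, not part of the repo: load a fine-tuned checkpoint
# (e.g. one produced by run_lm_finetuning.py) and sample from it directly
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_dir = "./models/tolkien"  # assumes this checkpoint exists locally
tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
model = GPT2LMHeadModel.from_pretrained(model_dir)

prompt = "The road goes ever on"
inputs = tokenizer(prompt, return_tensors="pt")

# sampling parameters are illustrative, not the script's exact defaults
outputs = model.generate(
    **inputs,
    max_length=200,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))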

Further reading

Allen Institute for Artificial Intelligence GPT-2 Explorer

huggingface/transformers

The Illustrated GPT-2 - Visualizing Transformer Language Models

The State of Transfer Learning in NLP

