
reddit-gpt-2-cloud-run

2020-03-03


Code for running a Reddit title generator API using gpt-2-cloud-run. You can play with the API here.

The Reddit data was retrieved from BigQuery using the query in query.sql, which selects the top 2,000 posts from each of the top 2,500 subreddits between January 2017 and February 2019 (with miscellaneous quality filters).
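As a rough illustration of the "top N posts per top M subreddits" selection, here is a minimal Python sketch. It is not a translation of query.sql: the tuple layout, the function name top_posts_per_subreddit, and the choice to rank subreddits by total post score are all assumptions made for the example, and the quality filters are left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (subreddit, title, score).
Post = Tuple[str, str, int]


def top_posts_per_subreddit(
    posts: List[Post], n_subreddits: int = 2500, n_posts: int = 2000
) -> List[Post]:
    """Keep the n_posts highest-scoring posts from each of the n_subreddits top subreddits."""
    by_subreddit: Dict[str, List[Post]] = defaultdict(list)
    for post in posts:
        by_subreddit[post[0]].append(post)

    # Assumption: a subreddit's rank is the sum of its post scores.
    ranked = sorted(
        by_subreddit,
        key=lambda name: sum(p[2] for p in by_subreddit[name]),
        reverse=True,
    )[:n_subreddits]

    selected: List[Post] = []
    for name in ranked:
        best = sorted(by_subreddit[name], key=lambda p: p[2], reverse=True)
        selected.extend(best[:n_posts])
    return selected


# Tiny made-up sample, only to show the shape of the input.
sample = [
    ("a", "t1", 10),
    ("a", "t2", 5),
    ("a", "t3", 1),
    ("b", "t4", 100),
    ("c", "t5", 2),
]
print(top_posts_per_subreddit(sample, n_subreddits=2, n_posts=2))
```

In BigQuery itself, this kind of per-group limit is usually written with a window function rather than in application code.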

The resulting CSV was encoded using gpt-2-keyword-generation (on a 32 vCPU cloud machine, since it's a lot of data!). The encoded text was then pre-encoded for training with gpt-2-simple's encode_dataset() function (otherwise it would take half an hour to start training!), and GPT-2 117M was finetuned on the pre-encoded dataset using gpt-2-simple.
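The pre-encoding and finetuning steps might look roughly like the sketch below. It assumes gpt-2-simple is installed and that the keyword-encoded text already exists; the file names, the step count, and the wrapper function pre_encode_and_finetune are hypothetical, and the actual settings used for this project are not stated here. Newer gpt-2-simple releases refer to the smallest model as 124M, so the model name may need adjusting.

```python
MODEL_NAME = "117M"  # newer gpt-2-simple releases name this model "124M"


def pre_encode_and_finetune(
    encoded_txt: str = "reddit_encoded.txt",  # hypothetical output of the keyword-encoding step
    npz_path: str = "reddit_encoded.npz",     # hypothetical pre-encoded dataset path
    steps: int = 1000,                        # hypothetical step count
) -> None:
    """Pre-encode a text dataset once, then finetune GPT-2 on the result."""
    # Imported lazily so this module loads even without gpt-2-simple/TensorFlow.
    import gpt_2_simple as gpt2

    # The tokenizer files ship with the model, so download it first.
    gpt2.download_gpt2(model_name=MODEL_NAME)

    # Tokenize the text once and store it compressed, so later
    # training runs skip the slow encoding pass at startup.
    gpt2.encode_dataset(encoded_txt, out_path=npz_path, model_name=MODEL_NAME)

    # Finetune on the pre-encoded dataset.
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, dataset=npz_path, model_name=MODEL_NAME, steps=steps)


if __name__ == "__main__":
    pre_encode_and_finetune()
```

Splitting encoding from training means the expensive tokenization happens once, which matters when a finetuning run is restarted several times on the same data.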

Maintainer/Creator

Max Woolf (@minimaxir)

Max's open-source projects are supported by his Patreon. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

MIT

Disclaimer

This repo has no affiliation or relationship with OpenAI.

