A set of scripts for fine-tuning GPT-2 and BERT models on Reddit data to generate realistic replies.
Jupyter notebooks are also available on Google Colab here
See my blog post for a walkthrough of running the scripts
Processing training data
I use pandas read_gbq to read from Google BigQuery; get_reddit_from_gbq.py automates the download. prep_data.py then cleans and transforms the data into a format usable by the GPT-2 and BERT fine-tuning scripts. I manually upload the results from prep_data.py to Google Drive for use in the Google Colab notebooks.
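As a rough sketch of the kind of cleaning a script like prep_data.py might perform (the column name 'body', the placeholder strings, and the length threshold below are my own assumptions for illustration, not the script's actual logic):

```python
import pandas as pd

def clean_comments(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pass over comments pulled from BigQuery."""
    # Drop rows with no comment text at all.
    df = df.dropna(subset=['body'])
    # Reddit replaces deleted/removed comment text with these placeholders.
    df = df[~df['body'].isin(['[deleted]', '[removed]'])]
    # Normalize whitespace and keep comments long enough to be useful
    # training examples (threshold is an illustrative choice).
    df = df.assign(body=df['body'].str.strip())
    df = df[df['body'].str.len() >= 10]
    return df.reset_index(drop=True)
```

For example, a frame containing a deleted comment, a null row, and a very short comment would be reduced to just the usable rows.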
import praw

# Authenticate against the Reddit API; replace the placeholder strings
# with your own app credentials.
reddit = praw.Reddit(client_id='client_id',
                     client_secret='client_secret',
                     password='reddit_password',
                     username='reddit_username',
                     user_agent='reddit user agent name')
...
subreddit = reddit.subreddit(subreddit_name)
for h in subreddit.rising(limit=5):
    # Expand "load more comments" stubs so iteration sees real comments.
    h.comments.replace_more(limit=0)
    for c in h.comments:
        ...  # process each top-level comment c here
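The comments collected above eventually have to be flattened into plain text for GPT-2 fine-tuning. A minimal sketch of one way to format parent/reply pairs into training examples (the pair structure and the use of GPT-2's end-of-text token as a separator are my own convention here, not necessarily what the scripts produce):

```python
# GPT-2's standard document separator; it keeps examples from bleeding
# into each other during fine-tuning.
END_TOKEN = '<|endoftext|>'

def format_pair(parent: str, reply: str) -> str:
    """Join one (parent, reply) pair into a single training example."""
    return f'{parent.strip()}\n{reply.strip()}\n{END_TOKEN}\n'

# Illustrative pairs; in practice these would come from the cleaned data.
pairs = [('What editor do you use?', 'Mostly vim, with a few plugins.')]
training_text = ''.join(format_pair(p, r) for p, r in pairs)
```

The resulting text file can then be fed directly to a GPT-2 fine-tuning script that splits on the separator token.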