Context
The dataset is a csv file compiled using a python scrapper developed using Reddit's PRAW API. The raw data is a list of 3-tuples of [username,subreddit,utc timestamp]. Each row represents a single comment made by the user, representing about 5 days worth of Reddit data. Note that the actual comment text is not included, only the user, subreddit and comment timestamp of the users comment. The goal of the dataset is to provide a lens in discovering user patterns from reddit meta-data alone. The original use case was to compile a dataset suitable for training a neural network in developing a subreddit recommender system. That final system can be found here
A very unpolished EDA for the dataset can be found here. Note the published dataset is only half of the one used in the EDA and recommender system, to meet kaggle's 500MB size limitation.
Content
user - The username of the person submitting the comment
subreddit - The title of the subreddit the user made the comment in
utc_stamp - the utc timestamp of when the user made the comment
Acknowledgements
The dataset was compiled as part of a school project. The final project report, with my collaborators, can be found here