
Reddit User Interaction Records [Kaggle Competition]

2020-01-15

Context

The dataset is a CSV file compiled with a Python scraper built on PRAW, a Python wrapper for Reddit's API. The raw data is a list of 3-tuples of [username, subreddit, UTC timestamp]. Each row represents a single comment made by a user, and the data covers about 5 days' worth of Reddit activity. Note that the comment text itself is not included: only the user, the subreddit, and the timestamp of each comment. The goal of the dataset is to provide a lens for discovering user patterns from Reddit metadata alone. The original use case was to compile a dataset suitable for training a neural network for a subreddit recommender system. That final system can be found here.
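Because each row is one (user, subreddit, timestamp) comment event, a common first step toward a recommender is to regroup the rows into a time-ordered subreddit history per user. The sketch below is a minimal illustration only. It assumes the rows are already loaded as tuples with numeric Unix timestamps. The helper name `build_user_histories` and the sample rows are hypothetical, and this is not the author's actual preprocessing.

```python
from collections import defaultdict
from typing import Iterable


def build_user_histories(rows: Iterable[tuple[str, str, float]]) -> dict[str, list[str]]:
    """Group (user, subreddit, utc_stamp) rows into per-user subreddit sequences.

    Hypothetical helper: assumes utc_stamp is numeric (e.g. a Unix epoch).
    """
    by_user: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for user, subreddit, utc_stamp in rows:
        by_user[user].append((float(utc_stamp), subreddit))

    histories: dict[str, list[str]] = {}
    for user, events in by_user.items():
        # Sort chronologically; equal timestamps fall back to subreddit name.
        events.sort()
        histories[user] = [subreddit for _, subreddit in events]
    return histories


# Made-up sample rows, not taken from the dataset.
rows = [
    ("alice", "python", 1_500_000_200),
    ("bob", "gaming", 1_500_000_050),
    ("alice", "datascience", 1_500_000_100),
]
print(build_user_histories(rows))
```

Keeping repeated visits to the same subreddit in each sequence preserves how often a user returns to it. A sequence model could use that signal, or you could deduplicate it away for a simple set-based recommender.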

A very rough exploratory data analysis (EDA) of the dataset can be found here. Note that the published dataset is only half the size of the one used in the EDA and the recommender system, to stay within Kaggle's 500 MB size limit.

Content

user - The username of the person who submitted the comment
subreddit - The title of the subreddit in which the user made the comment
utc_stamp - The UTC timestamp of when the user made the comment
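A minimal sketch of reading rows with these three columns using only the standard library. It makes two assumptions that this page does not confirm. First, the CSV has a `user,subreddit,utc_stamp` header row. Second, `utc_stamp` is a Unix epoch in seconds. The helper `read_comments` and the inline sample are hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def read_comments(f):
    """Yield (user, subreddit, datetime) tuples from a file-like CSV object.

    Assumes a header row of user,subreddit,utc_stamp and that utc_stamp
    is a Unix epoch in seconds (both are assumptions).
    """
    for row in csv.DictReader(f):
        ts = datetime.fromtimestamp(float(row["utc_stamp"]), tz=timezone.utc)
        yield row["user"], row["subreddit"], ts


# Inline stand-in for the real file; the rows are made up.
sample = io.StringIO(
    "user,subreddit,utc_stamp\n"
    "alice,python,1500000000\n"
    "alice,python,1500000600\n"
    "bob,gaming,1500001200\n"
)
comments = list(read_comments(sample))

# Count comments per (user, subreddit) pair: a simple interaction matrix in sparse form.
per_user = Counter((user, sub) for user, sub, _ in comments)
print(comments[0][2].isoformat())
print(per_user[("alice", "python")])
```

For the actual download, you would pass an open file instead of `sample`, e.g. `with open("reddit_data.csv", newline="") as f: comments = list(read_comments(f))`. The filename here is hypothetical. If the real `utc_stamp` values turn out to be formatted date strings rather than epochs, you would need to swap the `float` conversion for the appropriate parser.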

Acknowledgements

The dataset was compiled as part of a school project. The final project report, written with my collaborators, can be found here.



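Popular Resources

  • GRAZ image classification data

  • MIT Cars car image data

  • Homicide report data

  • Cat and dog image classification...

    Kaggle competition data for distinguishing between two classes of objects, cats and dogs, ...

  • Bosch production line reduction...

    Data from equipment used during the manufacture of products on a real Bosch production line...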