Predicting User Gender from Twitter Posts: Competition Data [Kaggle Competition]


2019-12-24


Description:

This data set was used to train a CrowdFlower AI gender predictor. You can read all about the project here. Contributors were asked to simply view a Twitter profile and judge whether the user was a male, a female, or a brand (non-individual). The dataset contains 20,000 rows, each with a user name, a random tweet, account profile and image, location, and even link and sidebar color.

Inspiration

Here are a few questions you might try to answer with this dataset:

How well do words in tweets and profiles predict user gender?
What are the words that strongly predict male or female gender?
How well do stylistic factors (like link color and sidebar color) predict user gender?
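
One way to start on the second question is to score each word by how much more often it appears in tweets labeled male than in tweets labeled female. The sketch below is only an illustration under stated assumptions: the four rows are made-up toy data (not taken from the dataset), the helper names `tokenize` and `word_log_odds` are hypothetical, and add-one smoothing is just one reasonable choice for handling words seen in only one class.

```python
import math
import re
from collections import Counter

# Hypothetical toy rows. The real data has `text` and `gender` columns,
# but these four examples are invented for illustration only.
ROWS = [
    {"text": "Watching the game with my bros tonight", "gender": "male"},
    {"text": "Game night with the guys", "gender": "male"},
    {"text": "Love my new dress and shoes", "gender": "female"},
    {"text": "Brunch with my girls, love it", "gender": "female"},
]


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_log_odds(rows, label_a="male", label_b="female"):
    """Return {word: log(P(word | label_a) / P(word | label_b))}.

    Probabilities use add-one (Laplace) smoothing over the joint vocabulary.
    Rows with any other label (for example "brand") are ignored.
    """
    counts = {label_a: Counter(), label_b: Counter()}
    for row in rows:
        if row["gender"] in counts:
            counts[row["gender"]].update(tokenize(row["text"]))

    vocab = set(counts[label_a]) | set(counts[label_b])
    total_a = sum(counts[label_a].values())
    total_b = sum(counts[label_b].values())
    vocab_size = len(vocab)

    scores = {}
    for word in vocab:
        p_a = (counts[label_a][word] + 1) / (total_a + vocab_size)
        p_b = (counts[label_b][word] + 1) / (total_b + vocab_size)
        scores[word] = math.log(p_a / p_b)
    return scores


scores = word_log_odds(ROWS)
ranked = sorted(scores, key=scores.get)  # most female-leaning first
print("female-leaning:", ranked[:3])
print("male-leaning:", ranked[-3:])
```

A positive score means the word leans male in the sample and a negative score means it leans female. On the full dataset it would probably make sense to drop rows with low `gender:confidence` and to ignore very rare words before ranking.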

Acknowledgments

Data was provided by the Data for Everyone library on CrowdFlower.

Our Data for Everyone library is a collection of our favorite open data jobs that have come through our platform. They're available free of charge for the community, forever.

The Data

The dataset contains the following fields:

_unit_id: a unique id for the user
_golden: whether the user was included in the gold standard for the model; TRUE or FALSE
_unit_state: state of the observation; one of finalized (for contributor-judged) or golden (for gold standard observations)
_trusted_judgments: number of trusted judgments (int); always 3 for non-golden, and what may be a unique id for gold standard observations
_last_judgment_at: date and time of last contributor judgment; blank for gold standard observations
gender: one of male, female, or brand (for non-human profiles)
gender:confidence: a float representing confidence in the provided gender
profile_yn: "no" here seems to mean that the profile was meant to be part of the dataset but was not available when contributors went to judge it
profile_yn:confidence: confidence in the existence/non-existence of the profile
created: date and time when the profile was created
description: the user's profile description
fav_number: number of tweets the user has favorited
gender_gold: if the profile is golden, what is the gender?
link_color: the link color on the profile, as a hex value
name: the user's name
profile_yn_gold: whether the profile y/n value is golden
profileimage: a link to the profile image
retweet_count: number of times the user has retweeted (or possibly, been retweeted)
sidebar_color: color of the profile sidebar, as a hex value
text: text of a random one of the user's tweets
tweet_coord: if the user has location turned on, the coordinates as a string with the format "[latitude, longitude]"
tweet_count: number of tweets that the user has posted
tweet_created: when the random tweet (in the text column) was created
tweet_id: the tweet id of the random tweet
tweet_location: location of the tweet; seems to not be particularly normalized
user_timezone: the timezone of the user
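
Several of the fields above arrive as strings that need parsing before analysis, such as the hex colors and the "[latitude, longitude]" coordinates. The following is a minimal sketch of one way to do that with Python's standard `csv` module. The two sample rows are invented, only a subset of the columns is shown, and the helper names `parse_coord`, `parse_hex_color`, and `load_rows` are hypothetical. Treating any color that is not exactly six hex digits as missing is an assumption of this sketch, not something the field descriptions specify.

```python
import csv
import io

# Hypothetical two-row sample using a subset of the documented columns.
SAMPLE_CSV = '''_unit_id,gender,gender:confidence,link_color,tweet_coord
1,male,1.0,08C2C2,"[40.7128, -74.006]"
2,brand,0.6625,0084B4,
'''


def parse_coord(raw):
    """Turn "[latitude, longitude]" into a (lat, lon) tuple, or None if blank."""
    raw = raw.strip()
    if not raw:
        return None
    lat, lon = raw.strip("[]").split(",")
    return float(lat), float(lon)


def parse_hex_color(raw):
    """Turn a 6-digit hex value into an (r, g, b) tuple, or None if malformed."""
    value = raw.strip().lstrip("#")
    if len(value) != 6:
        return None
    try:
        return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))
    except ValueError:
        return None


def load_rows(handle, min_confidence=0.0):
    """Read rows from a CSV file object, keeping those at or above min_confidence."""
    rows = []
    for row in csv.DictReader(handle):
        # A blank confidence is treated as 0.0 (an assumption of this sketch).
        confidence = float(row["gender:confidence"] or 0.0)
        if confidence < min_confidence:
            continue
        rows.append({
            "unit_id": row["_unit_id"],
            "gender": row["gender"],
            "confidence": confidence,
            "link_rgb": parse_hex_color(row["link_color"]),
            "coord": parse_coord(row["tweet_coord"]),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV), min_confidence=0.9)
for row in rows:
    print(row)
```

To read a real file instead of the in-memory sample, an open file object can be passed to `load_rows` in place of the `io.StringIO` wrapper. The `min_confidence` threshold of 0.9 here is arbitrary and only shows how low-confidence labels could be filtered out.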
