Context

This is real real-time bidding data that is used to predict if an advertiser should bid for a marketing slot e.g. a banner on a webpage. Explanatory variables are things like browser, operation system or time of the day the user is online, marketplace his identifiers were traded on earlier, etc. The column 'convert' is 1, when the person clicked on the ad, and 0 if this is not the case.

Content

Unfortunately, the data had to be anonymized, so you basically can't do a lot of feature engineering. I just applied PCA and kept 0.99 of the linear explanatory power. However, I think it's still really interesting data to just test your general algorithms on imbalanced data. ;)

Inspiration

Since it's heavily imbalanced data, it doesn't make sense to train for accuracy, but rather try to get obtain a good AUC, F1Score, MCC or recall rate, by cross-validating your data. It's interesting to compare different models (logistic regression, decision trees, svms, ...) over these metrics and see the impact that your split in train:test data has on the data.

It might be good strategy to follow these Tactics to combat imbalanced classes.

上一篇：美国连环凶案数据（1980-2014）【Kaggle竞赛】

下一篇：Reddit 用户交互记录【Kaggle竞赛】

用户评价

全部评价

还没有评论，说两句吧！

热门资源

GRAZ 图像分类数据

GRAZ 图像分类数据
MIT Cars 汽车图像...

MIT Cars 汽车图像数据
凶杀案报告数据

凶杀案报告数据
猫和狗图像分类数...

Kaggle 上的竞赛数据，用以区分猫和狗两类对象，...
Bosch 流水线降低...

数据来自产品在Bosch真实生产线上制造过程中的设备...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com