资源数据集广告实时竞价数据【Kaggle竞赛】

广告实时竞价数据【Kaggle竞赛】

2020-01-15 | |  127 |   0 |   0

Description

Context

This is real real-time bidding data that is used to predict if an advertiser should bid for a marketing slot e.g. a banner on a webpage. Explanatory variables are things like browser, operation system or time of the day the user is online, marketplace his identifiers were traded on earlier, etc. The column 'convert' is 1, when the person clicked on the ad, and 0 if this is not the case.

Content

Unfortunately, the data had to be anonymized, so you basically can't do a lot of feature engineering. I just applied PCA and kept 0.99 of the linear explanatory power. However, I think it's still really interesting data to just test your general algorithms on imbalanced data. ;)

Inspiration

Since it's heavily imbalanced data, it doesn't make sense to train for accuracy, but rather try to get obtain a good AUC, F1Score, MCC or recall rate, by cross-validating your data. It's interesting to compare different models (logistic regression, decision trees, svms, ...) over these metrics and see the impact that your split in train:test data has on the data.

It might be good strategy to follow these Tactics to combat imbalanced classes.


上一篇:美国连环凶案数据(1980-2014)【Kaggle竞赛】

下一篇:Reddit 用户交互记录【Kaggle竞赛】

用户评价
全部评价

热门资源

  • GRAZ 图像分类数据

    GRAZ 图像分类数据

  • MIT Cars 汽车图像...

    MIT Cars 汽车图像数据

  • 凶杀案报告数据

    凶杀案报告数据

  • 猫和狗图像分类数...

    Kaggle 上的竞赛数据,用以区分猫和狗两类对象,...

  • Bosch 流水线降低...

    数据来自产品在Bosch真实生产线上制造过程中的设备...