Loan Default Prediction Competition Data [Kaggle Competition]

2019-12-25

Description:


This competition asks you to determine whether a loan will default, as well as the loss incurred if it does default. Traditional finance-based approaches to this problem distinguish between good and bad counterparties in a binary way. Here, by contrast, we seek to anticipate and incorporate both the default and the severity of the resulting losses. In doing so, we are building a bridge between traditional banking, where the aim is to reduce the consumption of economic capital, and an asset-management perspective, where we optimize with respect to the risk borne by the financial investor.
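Both quantities the description mentions, default and severity, can be recovered from a single recorded loss. The minimal Python sketch below assumes that a loss of 0 means the loan did not default and that any positive loss means it did. The function name `split_targets` and the toy values are hypothetical.

```python
def split_targets(losses):
    """Split observed losses into a default flag and a list of severities.

    Assumption (hypothetical): a loss of 0 means no default and any
    positive loss means the loan defaulted.
    """
    defaulted = [loss > 0 for loss in losses]
    # Severity is only defined for the loans that defaulted.
    severities = [loss for loss in losses if loss > 0]
    return defaulted, severities


# Hypothetical toy losses on the competition's 0-100 scale.
flags, sev = split_targets([0, 0, 60, 0, 5])
print(flags)  # [False, False, True, False, True]
print(sev)    # [60, 5]
```

With this split, one model can estimate whether a loan defaults and a second can estimate how large the loss is given a default. This mirrors the two-part framing above, but it is only one possible design.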

This competition is sponsored by researchers at Imperial College London.

Evaluation:


This competition is evaluated on the mean absolute error (MAE):


$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$


where

  • $n$ is the number of rows

  • $\hat{y}_i$ is the predicted loss

  • $y_i$ is the actual loss
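The metric can be written directly in Python. The function name `mean_absolute_error` and the toy values in this sketch are illustrative and are not taken from the competition. The all-zero prediction corresponds to the zeros shown in the sample submission.

```python
def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum over i of |y_i - y_hat_i|, for n rows."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one row is required")
    n = len(actual)
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / n


# Hypothetical toy data: three loans with no loss and one loss of 60.
actual = [0, 0, 0, 60]
print(mean_absolute_error(actual, [0, 0, 0, 0]))   # 15.0
print(mean_absolute_error(actual, [0, 0, 10, 50]))  # 5.0
```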

Submission File

For each row in the test set, you should predict the loss. The file should have a header row and look like the following:

id,loss
118679,0
118680,0
118681,0
...
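The sketch below writes a file in the `id,loss` format using Python's standard `csv` module. The function name `write_submission` is hypothetical. The ids come from the sample above, and the predictions are placeholders.

```python
import csv
import io


def write_submission(ids, losses, stream):
    """Write predictions as `id,loss` rows preceded by a header row."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["id", "loss"])
    for row_id, loss in zip(ids, losses):
        writer.writerow([row_id, loss])


# Write to an in-memory buffer so the output can be inspected.
buffer = io.StringIO()
write_submission([118679, 118680, 118681], [0, 0, 0], buffer)
print(buffer.getvalue())
```

To produce a real file, pass in a handle opened with `open("submission.csv", "w", newline="")`. The filename is only an example.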

Data Description:


This data corresponds to a set of financial transactions associated with individuals. The data has been standardized, de-trended, and anonymized. You are provided with over two hundred thousand observations and nearly 800 features. Each observation is independent of the previous one.

For each observation, it was recorded whether a default was triggered. In case of a default, the loss was measured. This quantity lies between 0 and 100. It has been normalised so that the notional of each transaction at inception is 100. For example, a loss of 60 means that only 40 is reimbursed. If the loan did not default, the loss was 0. You are asked to predict the losses for each observation in the test set.
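The normalisation amounts to a simple piece of arithmetic: loss = 100 minus the amount reimbursed. The sketch below assumes that the reimbursed amount is expressed on the same 0 to 100 scale. It also clips predictions into that range, because every true loss lies there, and clipping therefore cannot increase the absolute error. Both function names are hypothetical.

```python
NOTIONAL = 100.0  # Notional of each transaction at inception, per the description.


def loss_from_reimbursed(reimbursed):
    """Loss is the part of the notional that was not paid back."""
    if not 0.0 <= reimbursed <= NOTIONAL:
        raise ValueError("reimbursed amount must lie between 0 and 100")
    return NOTIONAL - reimbursed


def clip_loss(prediction):
    """Clip a predicted loss into [0, 100]; because every true loss lies in
    that range, clipping can only reduce (never increase) the absolute error."""
    return min(max(prediction, 0.0), NOTIONAL)


print(loss_from_reimbursed(40))  # 60.0, the example from the description
print(clip_loss(-3.2))           # 0.0
print(clip_loss(104))            # 100.0
```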

Missing feature values have been kept as is, so that competing teams can make use of all the available data and, if desired, implement their own strategy for filling the gaps. Note that some variables may be categorical (e.g. f776 and f777).
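One possible gap-filling strategy is sketched below using only the standard library. This is an assumption and was not prescribed by the sponsor. Numeric gaps are filled with the training median, and a 0/1 flag records where each gap was. Categorical columns such as f776 are kept as strings, with an explicit "missing" category. Missing values are represented as `None`, the column name `f1` stands in for any numeric feature, and the function names are hypothetical.

```python
from statistics import median


def fit_medians(rows, numeric_cols):
    """Compute each numeric column's median from its observed (non-None) values."""
    medians = {}
    for col in numeric_cols:
        observed = [row[col] for row in rows if row.get(col) is not None]
        # Fall back to 0.0 when a column has no observed values at all.
        medians[col] = median(observed) if observed else 0.0
    return medians


def fill_missing(rows, medians, categorical_cols):
    """Return copies of rows with numeric gaps set to the stored median
    (plus a 0/1 `<col>_missing` flag) and categorical gaps set to "missing"."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for col, med in medians.items():
            is_missing = row.get(col) is None
            new_row[col + "_missing"] = int(is_missing)
            if is_missing:
                new_row[col] = med
        for col in categorical_cols:
            value = row.get(col)
            # Keep categories as strings so codes are not treated as magnitudes.
            new_row[col] = "missing" if value is None else str(value)
        filled.append(new_row)
    return filled


# Hypothetical toy rows.
train_rows = [
    {"f1": 1.0, "f776": 0},
    {"f1": None, "f776": 1},
    {"f1": 3.0, "f776": None},
]
medians = fit_medians(train_rows, ["f1"])
train_filled = fill_missing(train_rows, medians, ["f776"])
print(train_filled[1])  # {'f1': 2.0, 'f776': '1', 'f1_missing': 1}
```

The medians are fitted once and passed in separately. This lets the same training medians be reused on the test rows, so both sets are filled consistently. f777 can be added to `categorical_cols` in the same way.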

The competition sponsor has worked to remove time-dimensionality from the data. However, the observations in the training set are still listed in chronological order, from oldest to newest. In the test set they are in random order.
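Because the training rows remain in time order, one option is to validate on the newest slice of the training set instead of a random sample. This is a suggestion, not a sponsor recommendation. It checks whether a model fitted on older observations still holds up on later ones. The function name and the default fraction in this sketch are hypothetical.

```python
def chronological_split(rows, holdout_fraction=0.2):
    """Split time-ordered rows into (train, validation) without shuffling.

    The newest `holdout_fraction` of rows becomes the validation set.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    # round() guards against float artefacts such as 10 * 0.3 = 3.0000000000000004.
    n_holdout = round(len(rows) * holdout_fraction)
    cut = len(rows) - n_holdout
    return rows[:cut], rows[cut:]


train, valid = chronological_split(list(range(10)), holdout_fraction=0.3)
print(train)  # [0, 1, 2, 3, 4, 5, 6]
print(valid)  # [7, 8, 9]
```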




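Popular Resources

  • GRAZ image classification data

  • MIT Cars car image data

  • Homicide report data

  • Cat and dog image classification ...: Kaggle competition data for telling apart two classes of objects, cats and dogs, ...

  • Bosch production line, reducing ...: Data from equipment during the manufacture of products on Bosch's actual production line...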