Winton Stock Return Prediction Competition Data [Kaggle Competition]


2019-12-26

Description:

Do you laugh (and then get down to work) in the face of terabytes of noisy, non-stationary data? Winton Capital is looking for data scientists who excel at finding the hidden signal in the proverbial haystack, and who are excited by creating novel statistical modelling and data mining techniques. 

In this recruiting competition, Winton challenges you to take on the very difficult task of predicting the future (stock returns). Given historical stock performance and a host of masked features, can you predict intraday and end-of-day returns without being deceived by all the noise?

Research scientists at Winton have crafted this competition to be challenging and fun for the community while providing a taste of the types of problems they work on every day. They're excited to connect with Kagglers who bring a unique background and creative approach to the competition.

Winton is offering cash prizes to winning teams as a reward for their work, but the intent of the competition is not commercial. The intellectual property you create remains your own and will be evaluated in the context of suitability for employment. 

Evaluation:

Submissions are evaluated using the Weighted Mean Absolute Error (WMAE). Each predicted return is compared with the actual return, and the score is computed as


$$\mathrm{WMAE} = \frac{1}{n} \sum_{i=1}^{n} w_i \left| y_i - \hat{y}_i \right|,$$

where $w_i$ is the weight associated with return $i$ (Weight_Intraday for intraday returns, Weight_Daily for daily returns), $y_i$ is the predicted return, $\hat{y}_i$ is the actual return, and $n$ is the number of predictions.
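The formula translates directly into code. The following is a minimal Python sketch; the function name `wmae` is our own and not part of the competition materials. Note that the weighted sum is divided by the number of predictions n, not by the sum of the weights.

```python
def wmae(predicted, actual, weights):
    """Weighted mean absolute error: (1/n) * sum_i w_i * |y_i - y_hat_i|.

    predicted, actual and weights are equal-length sequences of floats.
    The sum is divided by the number of predictions n, not by sum(weights).
    """
    if not (len(predicted) == len(actual) == len(weights)):
        raise ValueError("predicted, actual and weights must have equal length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    total = sum(w * abs(p - a) for p, a, w in zip(predicted, actual, weights))
    return total / len(predicted)


# Two predictions of 0 against actual returns 0.01 and -0.02, weighted 2 and 1:
# (2 * 0.01 + 1 * 0.02) / 2 = 0.02
print(wmae([0.0, 0.0], [0.01, -0.02], [2.0, 1.0]))
```

Because the denominator is n, rows with large weights dominate the score; a model that is accurate only on low-weight rows will still score poorly.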

The weights for the training set are given in the training data. The weights for the test set are unknown.

Submission File

The submission file should contain two columns: Id and Predicted. For each 5-day window, you need to predict 62 returns. For example, for the first time window, you will predict 1_1, 1_2, ..., 1_62: Ids 1_1 to 1_60 are the predictions for Ret_121 through Ret_180, 1_61 is the prediction for Ret_PlusOne, and 1_62 is the prediction for Ret_PlusTwo.

The file should contain a header and have the following format:

Id,Predicted
1_1,0
1_2,0
1_3,0
1_4,0
...
1_60,0
1_61,0
1_62,0
2_1,0
2_2,0
etc.
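The mapping from target columns to submission Ids can be written out explicitly. The Python sketch below is illustrative only: the helper names are hypothetical, predictions are assumed to arrive as a dict keyed by target column name, and window Ids are assumed to be the numbers used as the Id prefix (1, 2, ... in the example above). Any target without a prediction defaults to 0, like the sample file.

```python
import csv
import io

# Submission suffixes _1 .. _62 in order: Ret_121 .. Ret_180, then Ret_PlusOne, Ret_PlusTwo.
TARGET_COLUMNS = [f"Ret_{t}" for t in range(121, 181)] + ["Ret_PlusOne", "Ret_PlusTwo"]


def submission_rows(window_id, predictions):
    """Yield (Id, Predicted) pairs for one 5-day window.

    predictions: dict mapping target column name -> predicted return;
    missing targets fall back to 0.0.
    """
    for suffix, column in enumerate(TARGET_COLUMNS, start=1):
        yield f"{window_id}_{suffix}", predictions.get(column, 0.0)


def write_submission(fh, windows):
    """Write the header and 62 rows per window; windows yields (window_id, predictions)."""
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(["Id", "Predicted"])
    for window_id, predictions in windows:
        writer.writerows(submission_rows(window_id, predictions))


# Two windows: window 1 all zeros, window 2 with one non-zero prediction.
buffer = io.StringIO()
write_submission(buffer, [(1, {}), (2, {"Ret_PlusTwo": 0.001})])
print(buffer.getvalue().splitlines()[:3])
```

To produce an actual file, open it with `open("submission.csv", "w", newline="")` and pass the handle to `write_submission` instead of the in-memory buffer.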

Data Description:

Updated 2015-12-21: Winton have added new data into the test set. If you downloaded the test set before 2015-12-21, please re-download it and submit predictions on the new version instead.

In this competition the challenge is to predict the return of a stock, given the history of the past few days. 

We provide 5-day windows of time, days D-2, D-1, D, D+1, and D+2. You are given returns in days D-2, D-1, and part of day D, and you are asked to predict the returns in the rest of day D, and in days D+1 and D+2.

Day D also has intraday return data: returns at different points during the day. We provide 180 minutes of data, from t=1 to t=180. The training set contains the full 180 minutes; the test set provides only the first 120 minutes.
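Since Ret_2 is the return between t=1 and t=2 (see Data fields below), the 180 minutes yield 179 one-minute returns: 119 given (Ret_2 to Ret_120) and 60 to predict (Ret_121 to Ret_180). The Python sketch below builds these column lists and reads one row's given path. It assumes rows come from `csv.DictReader` and that missing values appear as empty fields; neither is stated in the competition description, and the helper names are hypothetical.

```python
import csv
import io

# Ret_t is the return between minute t-1 and minute t, so the first column is Ret_2.
GIVEN = [f"Ret_{t}" for t in range(2, 121)]     # Ret_2 .. Ret_120: in train and test
TARGET = [f"Ret_{t}" for t in range(121, 181)]  # Ret_121 .. Ret_180: in train only


def parse_return(field):
    """Convert one CSV field to float; an empty field (assumed to mean missing) becomes None."""
    field = field.strip()
    return float(field) if field else None


def given_path(row):
    """Return the 119 given minute returns of one row as floats (or None if missing)."""
    return [parse_return(row[column]) for column in GIVEN]


# Tiny in-memory example with one row; Ret_5 is left empty.
header = ",".join(GIVEN)
values = ",".join("" if c == "Ret_5" else "0.001" for c in GIVEN)
row = next(csv.DictReader(io.StringIO(header + "\n" + values + "\n")))
path = given_path(row)
print(len(GIVEN), len(TARGET), len(path), path[:4])
```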

For each 5-day window, we also provide 25 features, Feature_1 to Feature_25. These may or may not be useful in your prediction.

Each row in the dataset corresponds to an arbitrary stock over an arbitrary 5-day time window.


How these returns are calculated is defined by Winton and will not be revealed in this competition. The data set is designed to be representative of real data and so presents a number of challenges.

File descriptions

  • train.csv - the training set, including the columns of:

    • Feature_1 - Feature_25

    • Ret_MinusTwo, Ret_MinusOne

    • Ret_2 - Ret_120

    • Ret_121 - Ret_180: target variables

    • Ret_PlusOne, Ret_PlusTwo: target variables

    • Weight_Intraday, Weight_Daily

  • test.csv - the test set, including the columns of:

    • Feature_1 - Feature_25

    • Ret_MinusTwo, Ret_MinusOne

    • Ret_2 - Ret_120

  • sample_submission.csv - a sample submission file in the correct format
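Before modelling, it can help to check that a downloaded file contains the columns listed above. This Python sketch encodes the listing as column groups and compares it with a CSV header; the group names and `check_header` are illustrative, not part of the competition materials. Columns that are not in the listing (for example an identifier column, if a file has one) are reported rather than rejected.

```python
import csv
import io

FEATURES = [f"Feature_{i}" for i in range(1, 26)]          # Feature_1 .. Feature_25
PAST_DAILY = ["Ret_MinusTwo", "Ret_MinusOne"]
GIVEN_INTRADAY = [f"Ret_{t}" for t in range(2, 121)]       # Ret_2 .. Ret_120
TARGET_INTRADAY = [f"Ret_{t}" for t in range(121, 181)]    # Ret_121 .. Ret_180
TARGET_DAILY = ["Ret_PlusOne", "Ret_PlusTwo"]
WEIGHTS = ["Weight_Intraday", "Weight_Daily"]

EXPECTED = {
    "test": FEATURES + PAST_DAILY + GIVEN_INTRADAY,
    "train": FEATURES + PAST_DAILY + GIVEN_INTRADAY
    + TARGET_INTRADAY + TARGET_DAILY + WEIGHTS,
}


def check_header(fh, split):
    """Read the header of an open CSV file and compare it with EXPECTED[split].

    Returns (missing, extra): expected columns absent from the file, and
    file columns not in the listing.
    """
    header = next(csv.reader(fh))
    expected = EXPECTED[split]
    present = set(header)
    missing = [c for c in expected if c not in present]
    known = set(expected)
    extra = [c for c in header if c not in known]
    return missing, extra


# Example: a test-style header with one additional column and Ret_120 removed.
columns = ["Id"] + [c for c in EXPECTED["test"] if c != "Ret_120"]
print(check_header(io.StringIO(",".join(columns) + "\n"), "test"))
```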

Data fields

  • Feature_1 to Feature_25: masked features that may or may not be useful for prediction

  • Ret_MinusTwo: this is the return from the close of trading on day D-2 to the close of trading on day D-1 (i.e. 1 day)

  • Ret_MinusOne: this is the return from the close of trading on day D-1 to the point at which the intraday returns start on day D (approximately 1/2 day)

  • Ret_2 to Ret_120: these are returns over approximately one minute on day D. Ret_2 is the return between t=1 and t=2. 

  • Ret_121 to Ret_180: intraday returns over approximately one minute on day D. These are the target variables you need to predict as {id}_{1-60}. 

  • Ret_PlusOne: this is the return from the time Ret_180 is measured on day D to the close of trading on day D+1 (approximately 1 day). This is a target variable you need to predict as {id}_61.

  • Ret_PlusTwo: this is the return from the close of trading on day D+1 to the close of trading on day D+2 (i.e. 1 day). This is a target variable you need to predict as {id}_62.

  • Weight_Intraday: weight used to evaluate intraday return predictions (Ret_121 to Ret_180).

  • Weight_Daily: weight used to evaluate daily return predictions (Ret_PlusOne and Ret_PlusTwo).
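The two weight columns determine which $w_i$ applies to each of the 62 predictions for a window: Weight_Intraday for the 60 intraday targets and Weight_Daily for the two daily targets. Below is a minimal Python sketch of the resulting training-set score. It assumes each row has already been parsed into a dict of floats with no missing target values (the description does not say whether targets can be missing), the function names are hypothetical, and the synthetic row is invented for illustration. It scores the all-zero prediction used in the sample submission as a baseline.

```python
INTRADAY_TARGETS = [f"Ret_{t}" for t in range(121, 181)]  # weighted by Weight_Intraday
DAILY_TARGETS = ["Ret_PlusOne", "Ret_PlusTwo"]            # weighted by Weight_Daily


def weighted_errors(row, predict):
    """Yield w_i * |y_i - y_hat_i| for the 62 targets of one training row."""
    for column in INTRADAY_TARGETS:
        yield row["Weight_Intraday"] * abs(predict(row, column) - row[column])
    for column in DAILY_TARGETS:
        yield row["Weight_Daily"] * abs(predict(row, column) - row[column])


def training_wmae(rows, predict):
    """Average the weighted absolute errors over every prediction (n = 62 per row)."""
    errors = [e for row in rows for e in weighted_errors(row, predict)]
    return sum(errors) / len(errors)


def zero_baseline(row, column):
    """Predict a return of 0 for every target, as in sample_submission.csv."""
    return 0.0


# One synthetic row (values invented for illustration only).
row = {column: 0.001 for column in INTRADAY_TARGETS}
row.update({"Ret_PlusOne": 0.01, "Ret_PlusTwo": -0.02,
            "Weight_Intraday": 2.0, "Weight_Daily": 1.0})
print(training_wmae([row], zero_baseline))
```

Passing a different `predict` function lets the same loop compare candidate models against this baseline on the training weights; the weights for the test set are unknown, so this is only a proxy for the leaderboard score.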



Previous: Homesite Insurance Pricing Competition Data [Kaggle Competition]

Next: Bike-Share Ride Data from Chattanooga, USA


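Popular Resources

  • GRAZ Image Classification Data

  • MIT Cars Car Image Data

  • Homicide Report Data

  • Cat and Dog Image Classification...

    Competition data on Kaggle for distinguishing between two classes of objects, cats and dogs,...

  • Bosch Production Line Reducing...

    The data come from equipment used in manufacturing products on a real Bosch production line...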