资源数据集购物车商品关联竞赛数据【Kaggle竞赛】

购物车商品关联竞赛数据【Kaggle竞赛】

2019-12-26 | |  192 |   0 |   0

Description:

Whether you shop from meticulously planned grocery lists or let whimsy guide your grazing, our unique food rituals define who we are. Instacart, a grocery ordering and delivery app, aims to make it easy to fill your refrigerator and pantry with your personal favorites and staples when you need them. After selecting products through the Instacart app, personal shoppers review your order and do the in-store shopping and delivery for you.

Instacart’s data science team plays a big part in providing this delightful shopping experience. Currently they use transactional data to develop models that predict which products a user will buy again, try for the first time, or add to their cart next during a session. Recently, Instacart open sourced this data - see their blog post on 3 Million Instacart Orders, Open Sourced.

In this competition, Instacart is challenging the Kaggle community to use this anonymized data on customer orders over time to predict which previously purchased products will be in a user’s next order. They’re not only looking for the best model, Instacart’s also looking for machine learning engineers to grow their team.

Evaluation:

Submissions will be evaluated based on their mean F1 score.

Submission File

For each orderid in the test set, you should predict a space-delimited list of productids for that order. If you wish to predict an empty order, you should submit an explicit 'None' value. You may combine 'None' with product_ids. The spelling of 'None' is case sensitive in the scoring metric. The file should have a header and look like the following:

order_id,products  
17,1 2  
34,None  
137,1 2 3  
etc.

Data Description:

The dataset for this competition is a relational set of files describing customers' orders over time. The goal of the competition is to predict which products will be in a user's next order. The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, we provide between 4 and 100 of their orders, with the sequence of products purchased in each order. We also provide the week and hour of day the order was placed, and a relative measure of time between orders. For more information, see the blog post accompanying its public release.

File descriptions

Each entity (customer, product, order, aisle, etc.) has an associated unique id. Most of the files and variable names should be self-explanatory.

aisles.csv

 aisle_id,aisle  
 1,prepared soups salads  
 2,specialty cheeses  
 3,energy granola bars  
 ...

departments.csv

 department_id,department  
 1,frozen  
 2,other  
 3,bakery  
 ...

order_products__*.csv

These files specify which products were purchased in each order. order_products__prior.csv contains previous order contents for all customers. 'reordered' indicates that the customer has a previous order that contains the product. Note that some orders will have no reordered items. You may predict an explicit 'None' value for orders with no reordered items. See the evaluation page for full details.

 order_id,product_id,add_to_cart_order,reordered  
 1,49302,1,1  
 1,11109,2,1  
 1,10246,3,0  
 ...

orders.csv

This file tells to which set (prior, train, test) an order belongs. You are predicting reordered items only for the test set orders. 'order_dow' is the day of week.

 order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order  
 2539329,1,prior,1,2,08,  
 2398795,1,prior,2,3,07,15.0  
 473747,1,prior,3,3,12,21.0  
 ...

products.csv

 product_id,product_name,aisle_id,department_id
 1,Chocolate Sandwich Cookies,61,19  
 2,All-Seasons Salt,104,13  
 3,Robust Golden Unsweetened Oolong Tea,94,7  
 ...

sample_submission.csv

order_id,products
17,39276  
34,39276  
137,39276  
...



上一篇:运用卫星数据跟踪亚马逊热带雨林中的人类轨迹竞赛【Kaggle竞赛】

下一篇:芝加哥街边咖啡厅季节中的人行道咖啡厅许可数据

用户评价
全部评价

热门资源

  • GRAZ 图像分类数据

    GRAZ 图像分类数据

  • MIT Cars 汽车图像...

    MIT Cars 汽车图像数据

  • 凶杀案报告数据

    凶杀案报告数据

  • 猫和狗图像分类数...

    Kaggle 上的竞赛数据,用以区分猫和狗两类对象,...

  • Bosch 流水线降低...

    数据来自产品在Bosch真实生产线上制造过程中的设备...