
usedCarsDataset

2019-09-10

Over 370,000 used-car ads scraped with Scrapy from eBay Kleinanzeigen. The content of the data is in German, so readers who do not speak German will have to translate it first. The file autos.csv contains the following fields:

  • dateCrawled : when this ad was first crawled; all field values are taken from this date

  • name : "name" of the car

  • seller : private or dealer

  • offerType

  • price : the price on the ad to sell the car

  • abtest

  • vehicleType

  • yearOfRegistration : the year in which the car was first registered

  • gearbox

  • powerPS : power of the car in PS (metric horsepower)

  • model

  • kilometer : how many kilometers the car has driven

  • monthOfRegistration : the month in which the car was first registered

  • fuelType

  • brand

  • notRepairedDamage : whether the car has damage that has not been repaired yet

  • dateCreated : the date on which the ad on eBay was created

  • nrOfPictures : number of pictures in the ad (unfortunately this field is 0 everywhere and is thus useless, because of a bug in the crawler)

  • postalCode

  • lastSeenOnline : when the crawler last saw this ad online
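
Since every value in a CSV file arrives as text, the numeric fields above need converting before they can be compared or averaged. The following is a minimal sketch of that step using only the Python standard library. The sample line and its values are made up for illustration and contain only a subset of the fields; the helper name `parse_rows` is hypothetical, and the real autos.csv may need a specific encoding when opened.

```python
import csv
import io

# Made-up sample with a subset of the listed fields (not real dataset rows).
SAMPLE = (
    "name,price,yearOfRegistration,powerPS,kilometer\n"
    "example_car,480,1993,75,150000\n"
)

# Fields from the list above that are assumed to hold whole numbers.
INT_FIELDS = ("price", "yearOfRegistration", "powerPS", "kilometer")


def parse_rows(handle):
    """Yield one dict per CSV row, with the numeric fields converted to int."""
    for record in csv.DictReader(handle):
        for field in INT_FIELDS:
            record[field] = int(record[field])
        yield record


rows = list(parse_rows(io.StringIO(SAMPLE)))
print(rows[0]["price"], rows[0]["kilometer"])
```

For the real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle for autos.csv.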

The fields lastSeen (listed above as lastSeenOnline) and dateCreated can be used to estimate a lower bound on how long a car stays online before it is sold.
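
The estimate is only a lower bound because the crawler records the last time it saw the ad, and the ad may have stayed online longer. A minimal sketch of the calculation follows. The timestamp layout `YYYY-MM-DD HH:MM:SS` is an assumption about how the two fields are stored, the function name `days_online` is hypothetical, and the example values are made up.

```python
from datetime import datetime

# Assumed timestamp layout; adjust it if the file stores dates differently.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def days_online(date_created: str, last_seen: str) -> float:
    """Return the minimum time the ad was online, in days."""
    created = datetime.strptime(date_created, TIMESTAMP_FORMAT)
    seen = datetime.strptime(last_seen, TIMESTAMP_FORMAT)
    return (seen - created).total_seconds() / 86400


# Made-up example values, not taken from the dataset.
example = days_online("2016-03-24 00:00:00", "2016-04-07 12:00:00")
print(example)  # 14.5
```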


The second file is produced in MySQL from the first one through the query:

select
    count(*) as count,
    kilometer,
    yearOfRegistration,
    20*round(powerPS/20) as powerPS,  -- power rounded to the nearest 20 PS
    min(price) as minprice,
    max(price) as maxPrice,
    avg(price) as avgPreis,
    sqrt(variance(price)) as sdPreis  -- standard deviation of the price
from items
where
    yearOfRegistration > 1990 and yearOfRegistration < 2016
    and price > 100 and price < 100000
    and powerPS < 600 and powerPS > 0
group by yearOfRegistration, round(powerPS/20), kilometer
having count > 10
into outfile '/tmp/cnt_km_year_powerPS_minPrice_maxPrice_avgPrice_sdPrice.csv'
fields terminated by ',' lines terminated by '\n';
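
For readers without a MySQL instance, the same aggregation can be sketched in Python with the standard library. This is an illustration under stated assumptions, not the method used to produce the second file: it assumes the rows are already dicts with integer values, it uses the population standard deviation because MySQL's variance() is the population variance, and it rounds halves upward with integer arithmetic to mirror round() on positive values. The function name `aggregate` and the demo rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate(rows, min_count=10):
    """Group filtered rows by (year, 20-PS power bucket, kilometer) and summarise prices."""
    groups = defaultdict(list)
    for r in rows:
        year, ps, price = r["yearOfRegistration"], r["powerPS"], r["price"]
        # Same filters as the WHERE clause of the query.
        if 1990 < year < 2016 and 100 < price < 100000 and 0 < ps < 600:
            ps_bucket = 20 * ((ps + 10) // 20)  # nearest multiple of 20, halves up
            groups[(year, ps_bucket, r["kilometer"])].append(price)

    result = []
    for (year, ps_bucket, km), prices in sorted(groups.items()):
        if len(prices) > min_count:  # HAVING count > 10
            result.append({
                "count": len(prices),
                "kilometer": km,
                "yearOfRegistration": year,
                "powerPS": ps_bucket,
                "minprice": min(prices),
                "maxPrice": max(prices),
                "avgPreis": mean(prices),
                "sdPreis": pstdev(prices),  # population standard deviation
            })
    return result


# Made-up demo rows: 11 rows in one group, 10 in another, 1 filtered out.
demo_rows = [
    {"yearOfRegistration": 2005, "powerPS": 105, "kilometer": 150000, "price": 1000 + 100 * i}
    for i in range(11)
] + [
    {"yearOfRegistration": 2010, "powerPS": 60, "kilometer": 50000, "price": 5000}
    for _ in range(10)
] + [
    {"yearOfRegistration": 1985, "powerPS": 105, "kilometer": 150000, "price": 900}
]

summary = aggregate(demo_rows)
print(summary)
```

In the demo only the first group is kept: the second has exactly 10 rows, which does not satisfy `count > 10`, and the last row fails the registration-year filter.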

