Chicago Taxi Trip Records (2013 to Present)
Chicago residents and visitors took more than 27 million taxi rides in 2015, traveling 83 million miles and spending more than $400 million.
The City of Chicago assures the quality and safety of those rides through its Department of Business Affairs & Consumer Protection (BACP). We have long published the taxi drivers and vehicles licensed by BACP on the Chicago Data Portal.
As part of its mission, BACP is also authorized to collect information on the taxi rides themselves. It does so through periodic reporting by two major payment processors believed to cover most taxis in Chicago. Based on these reports, we are now able to provide a dataset of over 100 million Chicago taxi rides, dating back to 2013.
Each row in the dataset describes a distinct taxi trip and shows:
Which taxi provided the trip
What times the trip started and ended
Length of the trip in both time and distance
Starting and ending Community Area — plus Census Tract for many trips
Fare amount and other components of the trip cost
Type of payment — such as cash or credit card. (As an important note, cash tips are not included in the data because they do not go through the payment systems.)
Taxi company
Although taxi rides take place on the public streets and can be freely observed, we realized from the start that there could be privacy issues in publishing them. We took this issue very seriously and implemented a number of measures to preserve privacy while not unduly hampering use of the data. These measures include the following.
Taxi trips are not reported in real time. Each trip appears long after the completion of the ride. Therefore, the dataset cannot be used to track trips in motion or even just-completed trips. By the nature of how the data are collected, reported, and processed, a minimum of a few days will pass between completion of a ride and its appearance in the dataset. More typically, the delay will be anywhere from a week to a month.
Each licensed Chicago taxi has a license number, indicated by the Illinois license plate number, a painted number on the body of the taxi, and the medallion on the taxi’s hood. The Taxi ID in this dataset is not that license number. It is created specifically for this dataset, with no external meaning, to allow users to determine rides provided by the same taxi but not which taxi.
We anticipate that analysis of taxi trips by time will be a major use of this dataset and, we hope, will add significant value for understanding the taxi industry and travel in Chicago. However, there is minimal value and some potential privacy cost in making it possible to find a specific trip that someone knows departed at 10:13 am. To balance these issues, we have rounded all start and end times to the nearest 15 minutes.
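As an illustration of the rounding described above, here is a minimal Python sketch. It is not the City's actual processing code: the function name `round_to_quarter_hour` is hypothetical, and the choices to round exact midpoints up and to ignore fractions of a second are assumptions, since the text only says "nearest 15 minutes."

```python
from datetime import datetime, timedelta

def round_to_quarter_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest 15-minute mark.

    Assumption: exact midpoints (e.g. 10:07:30) round up, and
    microseconds are ignored.
    """
    interval = 15 * 60  # 15 minutes, in seconds
    seconds_since_midnight = ts.hour * 3600 + ts.minute * 60 + ts.second
    # Add half an interval, then floor to a multiple of the interval.
    rounded = ((seconds_since_midnight + interval // 2) // interval) * interval
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # timedelta handles the roll-over into the next day (e.g. 23:55 -> 00:00).
    return midnight + timedelta(seconds=rounded)

# A trip known to have departed at 10:13 am is reported in the 10:15 slot.
print(round_to_quarter_hour(datetime(2015, 6, 1, 10, 13)))
```

Under this scheme, every trip that started between 10:07:30 and 10:22:29 carries the same start time, which is what makes it hard to single out one known departure.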
From where and to where people travel is, of course, the most basic information about taxi trips and expected to be the topic of much analysis. However, as with exact time, exact location down to the street address could affect privacy. Therefore, we provide location only at the Census Tract and Community Area levels.
Since some Census Tracts have relatively infrequent taxi trips, we show the Census Tracts of a trip only if both the starting and ending Census Tracts had at least three trips in the relevant 15-minute time period. Because of this rule, about a quarter of the Census Tracts that would otherwise be shown are blank. (Others are blank because of missing data or because they fall outside Chicago.)
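The suppression rule above can be sketched as follows. This is one possible reading, not the City's implementation. It assumes that pickups are counted per pickup tract within the rounded start period, that dropoffs are counted per dropoff tract within the rounded end period, and that a trip with a missing tract has both tracts blanked. The field names (`start_time`, `end_time`, `pickup_tract`, `dropoff_tract`) and the function name are hypothetical.

```python
from collections import Counter

MIN_TRIPS = 3  # threshold stated in the text

def suppress_sparse_tracts(trips):
    """Return copies of the trips with Census Tracts blanked where sparse.

    Assumes start_time and end_time are already rounded to 15 minutes.
    """
    pickup_counts = Counter(
        (t["start_time"], t["pickup_tract"])
        for t in trips if t["pickup_tract"] is not None
    )
    dropoff_counts = Counter(
        (t["end_time"], t["dropoff_tract"])
        for t in trips if t["dropoff_tract"] is not None
    )
    result = []
    for t in trips:
        row = dict(t)  # copy so the input is not modified
        keep = (
            pickup_counts[(t["start_time"], t["pickup_tract"])] >= MIN_TRIPS
            and dropoff_counts[(t["end_time"], t["dropoff_tract"])] >= MIN_TRIPS
        )
        if not keep:
            # Both tracts are blanked together; Community Areas stay as they are.
            row["pickup_tract"] = None
            row["dropoff_tract"] = None
        result.append(row)
    return result
```

Blanking both tracts whenever either side falls below the threshold follows the "only if both" wording: a busy pickup tract does not reveal a rarely visited destination.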
Our usual approaches to open data are “you see what we see” and “open data can be messy data.” We generally lean against cleaning up data before publication to the Data Portal. However, with over 100 million rows of data, collected from a variety of hardware and software under real-world conditions, we found that some values not only were plainly implausible or impossible but also affected such things as averages and data visualizations to unreasonable degrees. Therefore, we have applied the following corrections to the data.
Trip times less than zero or greater than 86,400 seconds are removed.
Trip lengths less than zero or greater than 3,500 miles are removed.
If any component of the trip cost is less than $0 or greater than $10,000, all components of the trip cost are removed.
A time of 86,400 seconds is one day. Even making generous assumptions about breaks for rest, a trip longer than that pushes against City of Chicago limits on maximum working time for taxi drivers.
A distance of 3,500 miles happens to be about the farthest from Chicago one can drive and still end within the United States.
In the case of trip costs, $10,000 is likely very high, but it happens to be a break point in the scaling of charts on our dashboard. The next break point down, $1,000, is not necessarily implausible for rare trips.
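The three correction rules above can be sketched in Python as follows. This is illustrative, not the City's code. It assumes that "removed" means the affected values are blanked while the row itself is kept. The field names (`trip_seconds`, `trip_miles`, and the entries in `COST_FIELDS`) and the function name are hypothetical.

```python
MAX_SECONDS = 86_400   # one day: 24 * 60 * 60 seconds
MAX_MILES = 3_500
MAX_COST = 10_000      # dollars
COST_FIELDS = ("fare", "tips", "tolls", "extras", "trip_total")  # assumed names

def clean_trip(trip):
    """Return a copy of the trip with implausible values blanked (None)."""
    row = dict(trip)

    seconds = row.get("trip_seconds")
    if seconds is not None and not (0 <= seconds <= MAX_SECONDS):
        row["trip_seconds"] = None

    miles = row.get("trip_miles")
    if miles is not None and not (0 <= miles <= MAX_MILES):
        row["trip_miles"] = None

    # One bad cost component blanks every cost component of the trip.
    costs = [row.get(f) for f in COST_FIELDS]
    if any(c is not None and not (0 <= c <= MAX_COST) for c in costs):
        for f in COST_FIELDS:
            if f in row:
                row[f] = None
    return row

print(clean_trip({"trip_seconds": 90_000, "trip_miles": 2.5,
                  "fare": 12.0, "tips": -1.0}))
```

Values exactly at a limit (86,400 seconds, 3,500 miles, $10,000) are kept, since the rules remove only values greater than the limit.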
Naturally, many of the extreme values that remain are likely also wrong, but we prefer to leave it to the user to filter further, based on his or her judgement and needs for a particular use of the data.
Finally, we determined that we likely have duplicate trips and have removed a small share of records as duplicates (currently about 0.45%). Again, being conservative about removing data, we call records duplicates only if they have identical values for:
Taxi ID
Trip Start and End Timestamps
Trip Seconds
Trip Miles
Pickup and Dropoff Census Tracts (after blanking out for privacy, if necessary)
Pickup and Dropoff Community Areas
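A minimal sketch of de-duplication on the fields listed above, keeping the first record seen for each combination. The snake_case field names and the function name are hypothetical, and keeping the first occurrence (rather than, say, the last) is an assumption.

```python
DEDUP_KEYS = (
    "taxi_id",
    "trip_start_timestamp", "trip_end_timestamp",
    "trip_seconds", "trip_miles",
    "pickup_census_tract", "dropoff_census_tract",  # after privacy blanking
    "pickup_community_area", "dropoff_community_area",
)

def drop_duplicates(trips):
    """Keep the first record for each distinct combination of DEDUP_KEYS."""
    seen = set()
    unique = []
    for trip in trips:
        key = tuple(trip.get(k) for k in DEDUP_KEYS)
        if key not in seen:
            seen.add(key)
            unique.append(trip)
    return unique
```

Fields outside the key, such as fare or payment type, do not take part in the comparison, so two records that match on every listed field count as duplicates even if their costs differ.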
We hope residents of Chicago, researchers, and others find the taxi trip data useful. We do realize that at 100 million rows and growing, the dataset, particularly in the classic rows-and-columns view, may not always load as quickly as other datasets. As always, and especially with a dataset of this size and complexity, we welcome questions, comments, suggestions for improvements, and any other feedback at dataportal@cityofchicago.org or @ChicagoCDO.