
Worldwide Air Crash Data Since 1908

2019-11-26

Questions

  1. How many planes crashed each year? How many people were on board? How many survived? How many died?

  2. Highest number of crashes by operator and by type of aircraft.

  3. The ‘Summary’ field has the details about the crashes. Find the reasons for the crashes and categorize them into clusters, e.g. fire, shot down, weather (for blank entries in the data, the category can be UNKNOWN). You are free to define clusters of your choice, but there should be no more than 7.

  4. Find the number of crashed aircraft and the number of deaths for each category from the previous step.

  5. Find any interesting trends/behaviors that you encounter when you analyze the dataset.


My solution

The following bar charts answer point 1 of the assignment, in particular:

  • planes crashed per year

  • people aboard crashed planes per year

  • people killed in crashes per year

  • people who survived crashes per year

[Image: bar charts of yearly crashes, people aboard, deaths, and survivors]
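
The yearly totals behind charts like these can be computed with a simple group-by on the crash year. The following is a minimal sketch, not the author's code: the field names (`date`, `aboard`, `fatalities`), the MM/DD/YYYY date format, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Made-up sample rows; field names and date format are assumptions.
crashes = [
    {"date": "01/15/1950", "aboard": 10, "fatalities": 4},
    {"date": "06/30/1950", "aboard": 20, "fatalities": 8},
    {"date": "02/11/1951", "aboard": 5, "fatalities": 5},
    {"date": "03/02/1951", "aboard": None, "fatalities": 3},  # missing count
]


def yearly_totals(rows):
    """Return {year: {"crashes", "aboard", "dead", "survived"}}."""
    totals = defaultdict(
        lambda: {"crashes": 0, "aboard": 0, "dead": 0, "survived": 0}
    )
    for row in rows:
        year = int(row["date"][-4:])  # assumes the year is the last 4 chars
        aboard, dead = row["aboard"], row["fatalities"]
        t = totals[year]
        t["crashes"] += 1
        t["aboard"] += aboard or 0  # missing counts add nothing
        t["dead"] += dead or 0
        # Survivors are only defined when both counts are present.
        if aboard is not None and dead is not None:
            t["survived"] += aboard - dead
    return dict(totals)


totals = yearly_totals(crashes)
for year in sorted(totals):
    print(year, totals[year])
```

Rows with a missing count still count as a crash but are left out of the survivor total, which avoids negative survivor numbers.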

The following answers address point 2 of the assignment:

  • Highest number of crashes by operator: Aeroflot with 179 crashes

  • By Type of aircraft: Douglas DC-3 with 334 crashes
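
Answers of this kind come from counting rows per operator and per aircraft type. A minimal sketch follows, assuming each crash is a dict with hypothetical `operator` and `type` keys; the sample values are placeholders, not records from the dataset.

```python
from collections import Counter

# Placeholder rows; the keys "operator" and "type" are assumed names.
rows = [
    {"operator": "Operator A", "type": "Type X"},
    {"operator": "Operator A", "type": "Type Y"},
    {"operator": "Operator B", "type": "Type X"},
    {"operator": "", "type": "Type X"},  # blank operator is skipped
]


def top_by(rows, field):
    """Return (value, count) for the most frequent non-blank value, or None."""
    counts = Counter(row[field] for row in rows if row.get(field))
    if not counts:
        return None
    return counts.most_common(1)[0]


print(top_by(rows, "operator"))
print(top_by(rows, "type"))
```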

I identified 7 clusters by applying k-means clustering to a matrix derived from a text corpus, which was built using text analysis preprocessing (conversion to plain text, punctuation removal, lowercasing, etc.). The following list summarizes the number of crashes and deaths for each cluster.

  • Cluster 1: 258 crashes, 6368 deaths

  • Cluster 2: 500 crashes, 9408 deaths

  • Cluster 3: 211 crashes, 3513 deaths

  • Cluster 4: 1014 crashes, 14790 deaths

  • Cluster 5: 2749 crashes, 58826 deaths

  • Cluster 6: 195 crashes, 4439 deaths

  • Cluster 7: 341 crashes, 8135 deaths
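
The pipeline described above (clean the text, build a term matrix, run k-means, then total crashes and deaths per cluster) can be sketched in plain Python. This is an illustration under stated assumptions, not the author's implementation: the summaries and death counts are made up, the stopword list is an added assumption, and the centroids are seeded with the first `k` rows for reproducibility, whereas k-means is usually seeded randomly.

```python
import string
from collections import Counter

# Made-up (summary, deaths) pairs for illustration only.
records = [
    ("Crashed in heavy fog while attempting to land.", 12),
    ("Engine failure shortly after takeoff.", 4),
    ("Crashed into a mountain in fog.", 30),
    ("Engine fire and failure during takeoff.", 7),
]
# Assumed stopword list; the source does not say which words were dropped.
STOPWORDS = {"a", "after", "and", "during", "in", "into", "to", "while"}


def tokenize(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in cleaned.split() if w not in STOPWORDS]


def term_matrix(texts):
    """Build a document-term count matrix and its sorted vocabulary."""
    docs = [Counter(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*docs))
    return [[doc[w] for w in vocab] for doc in docs], vocab


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; assumes k <= len(points), seeds with first k rows."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize(labels, deaths):
    """Return {cluster: (crashes, deaths)}."""
    summary = {}
    for lab, d in zip(labels, deaths):
        crashes, total = summary.get(lab, (0, 0))
        summary[lab] = (crashes + 1, total + d)
    return summary


matrix, vocab = term_matrix([text for text, _ in records])
labels = kmeans(matrix, k=2)
summary = summarize(labels, [deaths for _, deaths in records])
print(labels, summary)
```

On this toy input the two fog-related summaries land in one cluster and the two engine-related ones in the other.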

The following picture shows clusters using the first 2 principal components: 

[Image: clusters plotted on the first 2 principal components]
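
Projecting rows onto the first two principal components can be sketched in plain Python. This is a minimal illustration, not the author's code: it uses power iteration with deflation on made-up 3-D vectors, and every function name is hypothetical. Power iteration can fail when the start vector happens to be orthogonal to the dominant direction, so a linear algebra library would be the usual choice in practice.

```python
import math

# Made-up 3-D points: most variance along x, then y, very little along z.
points = [
    [2.0, 0.0, 0.0], [-2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0], [0.0, -1.0, 0.0],
    [0.0, 0.0, 0.1], [0.0, 0.0, -0.1],
]


def center(rows):
    """Subtract the column means from every row."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_component(cov, iterations=500):
    """Dominant unit eigenvector and eigenvalue via power iteration."""
    raw = [1.0 / (i + 1) for i in range(len(cov))]  # non-uniform start vector
    norm = math.sqrt(sum(x * x for x in raw))
    v = [x / norm for x in raw]
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


def first_two_components(rows):
    centered = center(rows)
    cov = covariance(centered)
    v1, l1 = top_component(cov)
    d = len(cov)
    # Deflation: subtract the first component's variance, then repeat.
    deflated = [
        [cov[i][j] - l1 * v1[i] * v1[j] for j in range(d)] for i in range(d)
    ]
    v2, _ = top_component(deflated)
    return centered, v1, v2


def project(centered, v1, v2):
    """2-D coordinates of each row on the two components."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return [(dot(row, v1), dot(row, v2)) for row in centered]


centered, v1, v2 = first_two_components(points)
coords = project(centered, v1, v2)
print(coords)
```

Each pair in `coords` is what a 2-D scatter plot would draw, with the point colored by its cluster label.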

For each cluster, I summarize the most used words and try to identify the cause of the crashes.

Cluster 1 (258): aircraft, crashed, plane, shortly, taking
Not much information about this cluster can be deduced using text analysis

Cluster 2 (500): aircraft, airport, altitude, crashed, crew, due, engine, failed, failure, fire, flight, landing, lost, pilot, plane, runway, takeoff, taking
Engine failure on the runway after landing or takeoff

Cluster 3 (211): aircraft, crashed, fog
Crash caused by fog

Cluster 4 (1014): aircraft, airport, attempting, cargo, crashed, fire, land, landing, miles, pilot, plane, route, runway, struck, takeoff
Struck a cargo during landing or takeoff

Cluster 5 (2749): accident, aircraft, airport, altitude, approach, attempting, cargo, conditions, control, crashed, crew, due, engine, failed, failure, feet, fire, flight, flying, fog, ground, killed, land, landing, lost, low, miles, mountain, pilot, plane, poor, route, runway, short, shortly, struck, takeoff, taking, weather
Struck a cargo due to engine failure or bad weather conditions, mainly fog

Cluster 6 (195): aircraft, crashed, engine, failure, fire, flight, left, pilot, plane, runway
Engine failure on the runway

Cluster 7 (341): accident, aircraft, altitude, cargo, control, crashed, crew, due, engine, failure, flight, landing, loss, lost, pilot, plane, takeoff
Engine failure during landing or takeoff
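
Word lists like the ones above can be obtained by counting, within each cluster, how many summaries contain each word. The sketch below is only one possible way to do it: the summaries, labels, and stopword list are made up, and the function name `top_words` is hypothetical.

```python
import string
from collections import Counter, defaultdict

# Made-up summaries with made-up cluster labels.
summaries = [
    "Crashed in fog.",
    "Crashed into a hill in dense fog.",
    "Engine failure on takeoff.",
    "Engine failure after takeoff.",
]
cluster_labels = [0, 0, 1, 1]
# Assumed stopword list; the source does not say which words were dropped.
STOPWORDS = {"a", "after", "in", "into", "on"}


def top_words(texts, labels, n=3):
    """Return {cluster: the n words found in the most summaries}."""
    per_cluster = defaultdict(Counter)
    for text, label in zip(texts, labels):
        cleaned = text.lower().translate(
            str.maketrans("", "", string.punctuation)
        )
        words = {w for w in cleaned.split() if w not in STOPWORDS}
        per_cluster[label].update(words)  # each word counts once per summary
    result = {}
    for label, counts in per_cluster.items():
        # Sort by descending count, then alphabetically to break ties.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[label] = [word for word, _ in ranked[:n]]
    return result


print(top_words(summaries, cluster_labels, n=2))
```

Ties are broken alphabetically so the output is stable from run to run.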

