Data Mining Project: Understanding and Predicting Airline Delays for ATL Airport
Tressy Thomas and Nidhi Davis
Airline traffic is increasing year over year. More and more travelers in the US choose air transport because it is the fastest mode of travel and saves time. Flight departure and arrival delays are critical factors affecting airlines’ operational efficiency and customer satisfaction. With reliable data available, it is possible to gain insight into the patterns and characteristics of airline delays. In this project, data mining techniques are used to discover relationships in the data that affect flight delays.
The main objectives of this research are:
- Conduct exploratory data analysis to identify the contribution of temporal factors, such as time of day, day of the week, and week of the month, to flight delays.
- Identify the most severely affected origin-destination pairs and airlines.
- Predict whether a future flight will be delayed.
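As an illustration of the first objective, the sketch below derives the three temporal factors from a scheduled departure timestamp and computes the share of delayed flights for each value. It is only a minimal sketch under stated assumptions: the records are invented toy values, the 15-minute cutoff for calling a flight "delayed" is an assumed threshold, "week of the month" is assumed to mean days 1-7 as week 1, days 8-14 as week 2, and so on, and the names temporal_features and delay_rate_by are hypothetical helpers rather than part of the project.

```python
from collections import defaultdict
from datetime import datetime

# Toy records invented for illustration only:
# (scheduled departure, arrival delay in minutes; negative means early).
FLIGHTS = [
    ("2019-01-07 06:15", -3),
    ("2019-01-07 17:40", 42),
    ("2019-01-11 18:05", 27),
    ("2019-01-19 09:30", 5),
    ("2019-01-25 17:10", 16),
    ("2019-01-28 07:45", 0),
]

DELAY_THRESHOLD_MIN = 15  # assumed cutoff for labeling a flight as delayed
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def temporal_features(timestamp: str) -> dict:
    """Derive hour of day, day of week, and week of month from a timestamp."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return {
        "hour": dt.hour,
        "day_of_week": DAY_NAMES[dt.weekday()],
        # Assumed definition: days 1-7 are week 1, days 8-14 are week 2, ...
        "week_of_month": (dt.day - 1) // 7 + 1,
    }


def delay_rate_by(flights, feature: str) -> dict:
    """Return the share of delayed flights for each value of one feature."""
    counts = defaultdict(lambda: [0, 0])  # value -> [delayed, total]
    for timestamp, arr_delay in flights:
        key = temporal_features(timestamp)[feature]
        counts[key][0] += int(arr_delay >= DELAY_THRESHOLD_MIN)
        counts[key][1] += 1
    return {key: delayed / total for key, (delayed, total) in counts.items()}


for name in ("hour", "day_of_week", "week_of_month"):
    print(name, delay_rate_by(FLIGHTS, name))
```

With real data, the same grouping could be applied to origin-destination pairs and airlines to address the second objective.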
Methodology
For this project, data on nonstop domestic flights at Atlanta International Airport (ATL) are collected from the Bureau of Transportation Statistics: https://www.transtats.bts.gov/Tables.asp?DB_ID=120.
The data preparation step will handle problems in the data such as missing values, invalid values, and outliers. Appropriate data transformation techniques will be applied where necessary. We intend to use statistical tests such as ANOVA and the chi-square test, along with visualization techniques, to understand the association between the features of interest and flight delay. For the flight delay prediction problem, we plan to use modeling techniques such as decision trees, logistic regression, support vector machines, and k-nearest neighbors (kNN) with appropriate features. Model selection will be based on the accuracy of the models. Supplementary data or methods may be used as requirements arise.
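To make one of the statistical tests named above concrete, the following sketch computes a Pearson chi-square test of independence between a categorical feature and delay status. It is a hedged illustration, not the project's analysis: the contingency table holds invented counts, the three time-of-day bands are an assumed grouping, and chi_square_independence is a hypothetical helper. The p-value formula used here is exact only for 2 degrees of freedom, which is what a 3 x 2 table gives; a statistics library would be needed for the general case.

```python
import math

# Invented counts for illustration only.
# Rows: morning, afternoon, evening; columns: delayed, on time.
TABLE = [
    [20, 180],
    [45, 155],
    [55, 145],
]


def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


stat, dof = chi_square_independence(TABLE)

# For exactly 2 degrees of freedom the chi-square survival function
# has the closed form exp(-x / 2); this shortcut is not valid otherwise.
assert dof == 2
p_value = math.exp(-stat / 2)

print(f"chi-square = {stat:.4f}, dof = {dof}, p = {p_value:.2e}")
if p_value < 0.05:
    print("Delay status appears to depend on time of day in this toy table.")
```

A small p-value suggests that the feature and delay status are associated, which would make the feature a candidate input for the prediction models.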