WORK IN PROGRESS - Update expected in November 2024
Datasets featuring global, high-level flights per aircraft, extracted from ADS-B position reports.
Published quarterly, starting from 2024. Covers all flights that fall within the reception coverage of the ADSBlol initiative.
- This project uses ADS-B data from the ADSBlol initiative. Consider supporting their great project; better coverage would also improve the flights extract in this repository.
- This project uses validation data from vradarserver/Andrew Whewell to check extracted routes against additional route data (based on aircraft callsign). Consider supporting their great initiative; again, that would also improve the flights extract in this repository.
See the Releases section of this repository for a parquet/tar/zip file with the flights per aircraft, per quarter.
The parquet file type was chosen to keep the flights data manageable in size and in processing/loading time. Each quarter contains approx. 10 million flights and 500,000 aircraft, which would total approx. 3 GB in CSV format, whereas the parquet file stays below 1 GB. Loading a parquet file is straightforward with Python:
import pandas as pd  # also needs a parquet engine such as pyarrow
df = pd.read_parquet('2024_Q1.parquet')
To inspect the parquet dataset without Python, you can use a tool such as ParquetViewer, which provides a graphical user interface and can be installed on Windows as an .exe.
The data is published per quarter. The 4 quarters of each year overlap slightly so that no flight is cut in half at a quarter boundary.
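Because of this overlap, a flight near a quarter boundary can appear in two files. A minimal sketch of combining quarters without double counting is shown here. The column names `icao24`, `departure_time` and `origin` are hypothetical (check the actual parquet schema), and the sketch assumes that a flight in the overlap carries identical key values in both files, which this README does not confirm.

```python
import pandas as pd

def combine_quarters(frames, key_columns):
    """Concatenate quarterly extracts and drop flights repeated in the overlap.

    key_columns are the columns assumed to identify one flight uniquely.
    """
    combined = pd.concat(frames, ignore_index=True)
    deduplicated = combined.drop_duplicates(subset=key_columns, keep="first")
    return deduplicated.reset_index(drop=True)

# Tiny in-memory stand-ins for two quarterly files (hypothetical columns and values)
q1 = pd.DataFrame({
    "icao24": ["abc123", "abc123"],
    "departure_time": pd.to_datetime(["2024-03-30 10:00", "2024-03-31 23:30"]),
    "origin": ["EHAM", "EGLL"],
})
q2 = pd.DataFrame({
    "icao24": ["abc123", "def456"],
    "departure_time": pd.to_datetime(["2024-03-31 23:30", "2024-04-01 06:00"]),
    "origin": ["EGLL", "LFPG"],
})

flights = combine_quarters([q1, q2], key_columns=["icao24", "departure_time"])
print(len(flights))
```

With real files, the two data frames would come from `pd.read_parquet` calls on consecutive quarters.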
Because ADS-B reception coverage of the ADSBlol initiative is limited on some continents, some aircraft tracks start after the airport of origin or end before the airport of destination. For those cases, the flights data has been enhanced by looking up the flight callsign and matching it against the open-source callsign vs route dataset of vradarserver/Andrew Whewell. Note that the departure/arrival times have not been updated accordingly in the basic dataset provided here.
ADS-B transmissions simply broadcast the position the aircraft reports, so wrong location data resulting from GPS spoofing can also be transmitted. Here too, the added column with the callsign vs route lookup makes it possible to filter out flights where the aircraft emitted wrong position data.
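As an illustration of such a filter, the sketch below flags flights whose track-derived route disagrees with the callsign lookup. All column names (`track_origin`, `track_destination`, `lookup_origin`, `lookup_destination`) and the sample callsigns are hypothetical and may not match the published schema. A mismatch can also stem from an incomplete track rather than spoofing, so flagged rows are candidates for review, not confirmed errors.

```python
import pandas as pd

def flag_route_mismatch(df):
    """Add a boolean column marking flights where track and lookup routes differ.

    Flights without a lookup result are not flagged, since there is nothing
    to compare against.
    """
    out = df.copy()
    has_lookup = out["lookup_origin"].notna() & out["lookup_destination"].notna()
    agrees = (
        (out["track_origin"] == out["lookup_origin"])
        & (out["track_destination"] == out["lookup_destination"])
    )
    out["route_mismatch"] = has_lookup & ~agrees
    return out

# Hypothetical sample: one consistent flight, one mismatch, one without lookup data
sample = pd.DataFrame({
    "callsign": ["AAA1", "AAA2", "AAA3"],
    "track_origin": ["EHAM", "EHAM", "EHAM"],
    "track_destination": ["EGLL", "LLBG", "EDDF"],
    "lookup_origin": ["EHAM", "EHAM", None],
    "lookup_destination": ["EGLL", "LFPG", None],
})

flagged = flag_route_mismatch(sample)
suspect = flagged[flagged["route_mismatch"]]
print(suspect["callsign"].tolist())
```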
If you require specific data or analyses, you can contact me; details are in my GitHub profile. More detailed datasets available upon request include updated departure/arrival times for those cases where the aircraft track was not complete (1), on/off-block instead of RWY times (2), and the RWY used per flight (3), among other data. Furthermore, advanced passenger estimates per flight (4), taking into consideration the type of flight, time of day, etc., can be provided upon request.
Status Q2 2024 Number of receivers/antennas of ADSBlol initiative (image above)
Aircraft coverage of the ADSBlol initiative, shown at ~13:00 UTC so that all continents have reasonable operations and no major market is at midnight (image above)
Because ADSBlol coverage improves regularly, validation is a never-finished task, especially for airports and flights all over the world.
At present, each quarter of extracted flights features approx. 10 million flights and 500,000 aircraft. In more detail, flights for AMS/EHAM airport have been analysed and the outcomes are documented below.
Validation case study AMS/EHAM reference day 2024-06-14
- Number of (commercial!) flights extracted from ADS-B data vs number of flights in the real AMS schedule --> close match, within a 5% error margin
- Airline representation --> close match, within a 5% error margin
- Destination/origin of a flight is accurate 73% of the time based purely on ADS-B track data, improved to 95+% by using the callsign vs route lookup
Please use in line with the license defined in this repository. No guarantee, liability, or warranty. All open-source. Free to use.
In case of data inaccuracies, please use the 'issues' tab of this repository.
Departure and arrival times in the dataset are RWY times: lift-off time for departures and touchdown time for arrivals.
... SDU example --> aerodrome reference point vs coordinates of track are at RWY, not ARP ... to account for this when assigning an airport ... AMS example... if touchdown @ RWY 18R, hence search for airports within 9 km
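The radius search mentioned in these notes can be sketched as follows. This is a sketch under stated assumptions, not the repository's actual implementation: the airport table, its rough illustrative coordinates, the sample touchdown point and the function names are all hypothetical. Only the 9 km radius comes from the AMS note above.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_airport(lat, lon, airports, max_km=9.0):
    """Return (code, distance_km) of the closest reference point within max_km, else None."""
    best = None
    for code, (airport_lat, airport_lon) in airports.items():
        distance = haversine_km(lat, lon, airport_lat, airport_lon)
        if distance <= max_km and (best is None or distance < best[1]):
            best = (code, distance)
    return best

# Rough, illustrative aerodrome reference points (not taken from this dataset)
airports = {
    "EHAM": (52.3086, 4.7639),
    "EHRD": (51.9569, 4.4372),
}

# Hypothetical touchdown point several km north-west of the EHAM reference point
match = nearest_airport(52.36, 4.71, airports)
print(match)
```

Because the touchdown point lies on a runway rather than at the reference point, a radius that is too small would leave the flight without an airport, which is the reason for searching within a distance rather than requiring an exact match.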
In case of go-arounds/touch-and-go/balked landings, only the final touchdown is counted as touchdown time of the flight (with commercial flights in mind).
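A minimal sketch of this rule, assuming each track is available as time-ordered `(timestamp, on_ground)` samples. Both that representation and the function names are hypothetical; this README does not describe how touchdowns are actually detected.

```python
from datetime import datetime

def touchdown_times(samples):
    """Return timestamps where the track changes from airborne to on-ground."""
    times = []
    previous_on_ground = None
    for timestamp, on_ground in samples:
        if previous_on_ground is False and on_ground:
            times.append(timestamp)
        previous_on_ground = on_ground
    return times

def final_touchdown(samples):
    """Return the last airborne-to-ground transition, or None if there is none."""
    times = touchdown_times(samples)
    return times[-1] if times else None

# Hypothetical touch-and-go: a brief ground contact at 10:00, final landing at 10:15
track = [
    (datetime(2024, 6, 14, 9, 55), False),
    (datetime(2024, 6, 14, 10, 0), True),
    (datetime(2024, 6, 14, 10, 1), False),
    (datetime(2024, 6, 14, 10, 15), True),
    (datetime(2024, 6, 14, 10, 16), True),
]

arrival_time = final_touchdown(track)
print(arrival_time)
```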