Calculating ground time in Question 3, section 16.3.4 #605

Open
jhanarato opened this issue Aug 26, 2021 · 0 comments

The question text:

Compare air_time with the duration between the departure and arrival. Explain your findings. (Hint: consider the location of the airport.)

I spent 10 days on this to figure out the following:

  1. Departure times of 2400 are actually midnight the next day.

  2. There's no date given for arrival times. We have to assume that when the
    arrival time is prior to departure it means the flight arrives the next day.

  3. Differences in timezone between arrival and departure need to be taken into
    account. Some destinations have no entry in the airports table. As it
    happens they are all in the same timezone (UTC-4), so I set tz to -4 for them.

  4. 6 airports (HNL, PHX, BQN, PSE, SJU and STT) are in locations that do not
    observe DST. However, all destinations have been adjusted for DST. In my
    analysis I added the hour to the taxi_time variable, though perhaps adding
    it to the arrival time would be better.

  5. Three data points still come out negative, all just on the cusp of the end of daylight saving time. I bumped them up by one hour.
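
Points 1 and 2 can be sketched in isolation. This is a minimal base R sketch, not the code used in the script below: `hhmm_to_datetime` is a hypothetical helper name, and the dates are made up for illustration. Adding the elapsed seconds to midnight means 2400 rolls over to the next day, including across a month boundary.

```r
# Hypothetical helper: turn an HHMM integer (e.g. 517 or 2400) into a
# timestamp by adding the elapsed seconds to midnight of the given date.
# Everything is labelled UTC here; timezones are handled as a separate step.
hhmm_to_datetime <- function(year, month, day, hhmm) {
  midnight <- ISOdatetime(year, month, day, 0, 0, 0, tz = "UTC")
  midnight + (hhmm %/% 100) * 3600 + (hhmm %% 100) * 60
}

# Point 1: 2400 on 31 January becomes 00:00 on 1 February.
rolled <- hhmm_to_datetime(2013, 1, 31, 2400)

# Point 2: an arrival clock time before the departure means next-day arrival.
dep <- hhmm_to_datetime(2013, 1, 31, 2330)
arr <- hhmm_to_datetime(2013, 1, 31, 45)
arr <- arr + 86400 * (arr < dep)   # vectorised: adds a day only where needed

elapsed <- as.numeric(difftime(arr, dep, units = "mins"))
```

Comparing full timestamps rather than hours alone also catches a flight that lands in the same clock hour it departed but on the following day.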

The end result is this mutate script:

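# Packages: dplyr verbs, lubridate date helpers, and the flights/airports data.
library(tidyverse)
library(lubridate)
library(nycflights13)

# Start and end of daylight saving time in 2013.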
dst_start <- ymd_hm("2013-03-10 03:00")
dst_end <-   ymd_hm("2013-11-03 02:00")

# None of these airports observe daylight savings.
no_dst_airports <- c("HNL", "PHX", "BQN", "PSE", "SJU", "STT")

# These are not in the airports table.
na_airports <- c("BQN", "PSE", "SJU", "STT")

arr_dep <- flights %>%
  filter(!is.na(dep_time), !is.na(arr_time), !is.na(air_time)) %>%
  left_join(airports, c("dest" = "faa")) %>%
  mutate(
    # Get the departure datetime. 
    # A dep_time of 2400 indicates that it leaves at midnight the following day.
    dep_time_zero = if_else(dep_time == 2400, 0L, dep_time),
    dep_day =       if_else(dep_time == 2400, day + 1L, day),
    
    dep_hour = dep_time_zero %/% 100,
    dep_min  = dep_time_zero %%  100,
    
    dep_dt = make_datetime(year, month, dep_day, dep_hour, dep_min),
    
    # Get the arrival datetime.
    arr_hour = arr_time %/% 100,
    arr_min  = arr_time %%  100,
    
    # The flight crosses midnight when it lands at an earlier clock hour than
    # it departed. Assumes no flight lasts 24 hours or more.
    arr_day = if_else(arr_hour < dep_hour, as.integer(dep_day + 1), dep_day),
    arr_dt = make_datetime(year, month, arr_day, arr_hour, arr_min),
    
    # Find the flight duration in minutes. 
    duration = as.numeric(difftime(arr_dt, dep_dt, units = "mins")),
    
    # Add missing tz.
    tz = if_else(dest %in% na_airports, -4, tz),
    
    # Adjust for timezone relative to New York. 
    duration_tz = duration - ((tz + 5) * 60),
    
    # Destination does not observe daylight savings.
    no_dst = dest %in% no_dst_airports,
    
    # The flight arrives during daylight savings.
    arr_during_dst = arr_dt > dst_start & arr_dt < dst_end,
    
    # Adjustment where arrival time is wrongly recorded as daylight savings
    dst_adjustment = if_else(no_dst & arr_during_dst, 60, 0),
    
    # The part of the duration not spent in the air seems to be time on the ground.
    taxi_time_unadjusted = duration_tz - air_time,
    taxi_time = duration_tz - air_time + dst_adjustment,
    
    # There are three remaining flights where taxi_time is negative.
    # They're within a couple of hours of the DST so bump them up an hour.
    taxi_time = if_else(taxi_time < 0, taxi_time + 60, taxi_time)
  ) %>%
  select(origin, dest, 
         dep_dt, arr_dt, 
         air_time, duration, duration_tz, taxi_time, taxi_time_unadjusted,
         no_dst, arr_during_dst,
         distance, lat, lon, tz)
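
The timezone and DST steps in the script can be checked by hand for a single flight. The sketch below uses a made-up flight (not a row from the data) and assumes the tz value for PHX is its standard-time offset of -7. It repeats the script's arithmetic, then cross-checks it with base R named time zones, which resolve the offsets and DST themselves.

```r
# Made-up flight: departs New York 09:00, lands in Phoenix 11:30 local time,
# on 1 July 2013 (during daylight saving time in New York).
clock_diff <- (11 * 60 + 30) - (9 * 60)        # 150 minutes by the clock

# Same arithmetic as the script: tz is the destination's standard UTC offset.
tz <- -7                                       # assumed value for PHX
duration_tz <- clock_diff - ((tz + 5) * 60)    # 270 minutes
manual <- duration_tz + 60                     # PHX has no DST: add the hour

# Cross-check: let R work out the offsets from named time zones.
dep <- as.POSIXct("2013-07-01 09:00", tz = "America/New_York")
arr <- as.POSIXct("2013-07-01 11:30", tz = "America/Phoenix")
elapsed <- as.numeric(difftime(arr, dep, units = "mins"))

manual == elapsed
```

If the two values agree, as they should here, the manual hour is doing the same job as the zone database. Adding it to the duration rather than to taxi_time gives the same final result, since taxi_time is just the duration minus air_time.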

Once all this is done, the difference between the adjusted duration (arr_time - dep_time) and air_time is always positive and averages about half an hour. A reasonable guess is that this difference is the time spent on the ground, which I've referred to as taxi_time.
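
One way to check that claim is to count the negative values and look at the minimum and mean. This is only a sketch: `check_ground_time` is a hypothetical helper, and the numbers passed to it are made up to show the call, not taken from the flights data.

```r
# Hypothetical helper: summarise a vector of ground times in minutes.
check_ground_time <- function(taxi_time) {
  c(
    n_negative = sum(taxi_time < 0, na.rm = TRUE),  # should be 0
    min        = min(taxi_time, na.rm = TRUE),
    mean       = mean(taxi_time, na.rm = TRUE)
  )
}

# Made-up values purely for illustration.
toy <- c(22, 35, 41, 28, NA)
res <- check_ground_time(toy)
res
```

With the result of the script above, the call would be `check_ground_time(arr_dep$taxi_time)`.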
