update readme/vignettes
polettif committed Jun 23, 2023
1 parent fb45fe0 commit d43a7cc
Showing 4 changed files with 35 additions and 28 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -49,7 +49,7 @@ devtools::install_github("r-transit/tidytransit")

- [`gtfsio`](https://github.com/r-transit/gtfsio) R package to read and write gtfs feeds, tidytransit builds on gtfsio
- [`gtfstools`](https://github.com/ipeaGIT/gtfstools) Tools for editing and analysing transit feeds
- [`gtfsrouter`](https://github.com/ATFutures/gtfs-router) Package for public transport routing
- [`gtfsrouter`](https://github.com/UrbanAnalyst/gtfsrouter) Package for public transport routing
- [`gtfs2gps`](https://github.com/ipeaGIT/gtfs2gps) Converting public transport data from GTFS format to GPS-like records


15 changes: 8 additions & 7 deletions vignettes/introduction.Rmd
@@ -44,26 +44,27 @@ devtools::install_github("r-transit/tidytransit")

# The General Transit Feed Specification

The [summary page for the GTFS standard](https://developers.google.com/transit/gtfs/#overview-of-a-gtfs-feed)
is a good resource for more information on the standard.
The [summary page for the GTFS standard](https://gtfs.org/schedule/)
is a good resource for more information on the standard.

GTFS feeds contain many linked tables about published transit schedules about service
GTFS feeds contain many linked tables about published transit schedules, service
schedules, trips, stops, and routes. Below is a diagram of these relationships and tables:

![gtfs-relationship-diagram](figures/GTFS_class_diagram.svg.png)
Source: Wikimedia, user -stk.

# Read a GTFS Feed

GTFS data come packaged as a zip file of tables in text form. The first thing tidytransit
does is consolidate the reading of all those tables into a single R object, which contains
GTFS data come packaged as a zip file of comma separated tables in text form. The first thing
tidytransit does is consolidate the reading of all those tables into a single R object, which contains
a list of the tables in each feed.
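
The idea of collecting all tables of a feed into one list can be sketched in base R. This is only an illustration under stated assumptions, not how `read_gtfs` is implemented: `read_feed_tables` is a hypothetical helper, the toy feed below is invented, and all columns are read as character without any validation.

```r
# Hypothetical helper (not part of tidytransit): read every .txt table of a
# GTFS feed into a named list. `path` is a zip file or an extracted directory.
read_feed_tables <- function(path) {
  if (grepl("\\.zip$", path)) {
    exdir <- tempfile("gtfs_")
    utils::unzip(path, exdir = exdir)  # extract the comma separated tables
    path <- exdir
  }
  files <- list.files(path, pattern = "\\.txt$", full.names = TRUE)
  tables <- lapply(files, read.csv,
                   stringsAsFactors = FALSE, colClasses = "character")
  names(tables) <- sub("\\.txt$", "", basename(files))
  tables
}

# Toy feed with two invented tables, written to a temporary directory
feed_dir <- file.path(tempdir(), "toy_feed")
dir.create(feed_dir, showWarnings = FALSE)
write.csv(data.frame(stop_id = c("127N", "127S"),
                     stop_name = "Times Sq - 42 St"),
          file.path(feed_dir, "stops.txt"), row.names = FALSE)
write.csv(data.frame(route_id = "1", route_short_name = "1"),
          file.path(feed_dir, "routes.txt"), row.names = FALSE)

feed <- read_feed_tables(feed_dir)
names(feed)
```

Each element of `feed` is a plain data frame named after its file, which mirrors the "list of tables" structure described above.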

Below we use the tidytransit `read_gtfs` function in order to read a feed from the NYC MTA
into R.
into R.

We use a feed included in the package in the example below. But note that you can read
directly from the New York City Metropolitan Transit Authority, as shown in the commented code below.
directly from the New York City Metropolitan Transit Authority, as shown in the commented
code below.

You can also read from any other URL. This is useful because there are many sources for
GTFS data, and often the best source is transit service providers themselves. See the next
18 changes: 10 additions & 8 deletions vignettes/servicepatterns.Rmd
@@ -19,13 +19,13 @@ knitr::opts_chunk$set(echo = TRUE)

## Overview

Each trip in a GTFS feed is referenced to a service_id (in trips.txt). The [GTFS reference](https://developers.google.com/transit/gtfs/reference/#calendartxt)
Each trip in a GTFS feed is referenced to a service_id (in trips.txt). The [GTFS reference](https://gtfs.org/schedule/reference/#calendartxt)
specifies that a "service_id contains an ID that uniquely identifies a set of dates when
service is available for one or more routes". A service could run on every weekday or only
on Saturdays for example. Other possible services run only on holidays during a year,
independent of weekdays. However, feeds are not required to indicate anything with
service_ids and some feeds even use a unique service_id for each trip and day. In
this vignette we'll look at a general way to gather information on when trips run by
this vignette, we'll look at a general way to gather information on when trips run by
using "service patterns".

Service patterns can be used to find a typical day for further analysis like routing or
@@ -51,8 +51,8 @@ head(gtfs$.$dates_services)
```

To understand service patterns better we need information on weekdays and holidays. With
a calendar table we know the weekday and possible holidays for each date. We'll use a minimal example
with two holidays.
a calendar table we know the weekday and possible holidays for each date. We'll use a
minimal example with two holidays.

```{r}
holidays = tribble(~date, ~holiday,
@@ -75,7 +75,7 @@ head(calendar)
To analyse on which dates trips run and to group similar services we use service patterns.
Such a pattern simply lists all dates a trip runs on. For example, a trip with a pattern
like _2019-03-07, 2019-03-14, 2019-03-21, 2019-03-28_ runs every Thursday in March 2019.
To handle these patterns we create a `servicepattern_id` using a hash function. Ideally
To handle these patterns, we create a `servicepattern_id` using a hash function. Ideally
there are the same number of servicepattern_ids and service_ids. However, in real life feeds
this is rarely the case. In addition, the usability of service patterns depends largely on
the feed and its complexity.
@@ -95,15 +95,17 @@ In addition, `gtfs$.$dates_servicepatterns` has been created which connects date
patterns (like `dates_services`). We can compare the number of service patterns to the number of services.

```{r}
head(gtfs$.$dates_servicepatterns)
# service ids used
n_services <- length(unique(gtfs$trips$service_id)) # 70
n_services <- length(unique(gtfs$trips$service_id)) # 70
# unique date patterns
n_servicepatterns <- length(unique(gtfs$.$servicepatterns$servicepattern_id)) # 7
```

The feed uses 70 service_ids but there are actually only 7 different date patterns. Other feeds
might not have such low numbers, for example the [Swiss GTFS feed](https://opentransportdata.swiss/dataset/timetable-2019-gtfs)
The feed uses 70 service_ids but there are actually only 7 different date patterns. Other
feeds might not have such low numbers, for example the [Swiss GTFS feed](https://opentransportdata.swiss/dataset/timetable-2019-gtfs)
uses around 15'600 service_ids which all identify unique date patterns.
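
To make the difference between service_ids and date patterns concrete, here is a minimal base R sketch on invented data. It is an assumption-laden illustration, not tidytransit's implementation: instead of a hash function it simply concatenates the sorted dates of each service into a key, and the `p_` ids are hypothetical.

```r
# Toy stand-in for a dates/services table: one row per (date, service_id)
dates_services <- data.frame(
  date = as.Date(c("2019-03-07", "2019-03-14",
                   "2019-03-07", "2019-03-14",
                   "2019-03-09")),
  service_id = c("A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# One key per service_id, built from its sorted dates.
# Services running on identical date sets end up with identical keys.
pattern_key <- vapply(
  split(format(dates_services$date), dates_services$service_id),
  function(d) paste(sort(unique(d)), collapse = "_"),
  character(1)
)

# Number the distinct keys to get a compact (hypothetical) pattern id
servicepatterns <- data.frame(
  service_id = names(pattern_key),
  servicepattern_id = paste0("p_", match(pattern_key, unique(pattern_key))),
  stringsAsFactors = FALSE
)

n_services <- length(unique(servicepatterns$service_id))
n_servicepatterns <- length(unique(servicepatterns$servicepattern_id))
```

In this toy table, services `A` and `B` run on the same two dates and therefore share a pattern, while `C` gets its own.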

## Analyse Data
28 changes: 16 additions & 12 deletions vignettes/timetable.Rmd
@@ -68,7 +68,8 @@ if(!exists("trip_headsign", where = gtfs$trips)) {
trip_headsigns <- gtfs$stop_times %>%
group_by(trip_id) %>%
summarise(stop_id = stop_id[which.max(stop_sequence)]) %>%
left_join(gtfs$stops, by="stop_id") %>% select(trip_id, trip_headsign.computed = stop_name)
left_join(gtfs$stops, by="stop_id") %>%
select(trip_id, trip_headsign.computed = stop_name)
# assign the headsign to the gtfs object
gtfs$trips <- left_join(gtfs$trips, trip_headsigns, by = "trip_id")
@@ -77,7 +78,7 @@ if(!exists("trip_headsign", where = gtfs$trips)) {

## Create A Departure Time Table

To create a departure timetable we first need to find the ids of all stops in the stops
To create a departure timetable, we first need to find the ids of all stops in the stops
table with the same name, as `stop_name` might cover different platforms and thus have
multiple stop_ids in the stops table.

@@ -87,6 +88,9 @@ stop_ids <- gtfs$stops %>%
select(stop_id)
```

Note that multiple unrelated stops can have the same `stop_name`; see `cluster_stops()`
for examples of how to find these cases.

## Trips departing from stop

To the selected stop_ids for Times Square, we can join trip columns: `route_id`, `service_id`,
@@ -122,8 +126,8 @@ departures <- departures %>%
by = "route_id")
```

Now we have a data frame that tells us about the origin, destination, and time at which each
train depart from Times Square for every possible schedule of service.
Now we have a data frame that tells us about the origin, destination, and time at which
each train departs from Times Square for every possible schedule of service.

```{r}
departures %>%
@@ -136,14 +140,14 @@ departures %>%
```

However, we don't know days on which these trips run. Using the service_id column on our
calculated departures, and tidytransit's calculated `dates_services` data frame, we can filter
trips to a given date of interest.
calculated departures and tidytransit's calculated `dates_services` data frame, we can
filter trips to a given date of interest.

```{r}
head(gtfs$.$dates_services)
```

Please see the `servicepatterns` vignette for further examples on how to use this table.
Please see the `servicepatterns` vignette for further examples on how to use this table.
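
The date filter described above can be sketched in base R on invented data. The tables, ids and times here are assumptions for illustration only; the real `departures` and `dates_services` tables have more columns.

```r
# Invented departures, each tied to a service_id
departures <- data.frame(
  trip_id = c("t1", "t2", "t3"),
  service_id = c("weekday", "saturday", "weekday"),
  departure_time = c("07:05:00", "07:10:00", "07:20:00"),
  stringsAsFactors = FALSE
)

# Invented dates/services lookup: which service_id runs on which date
dates_services <- data.frame(
  date = as.Date(c("2018-08-23", "2018-08-25")),
  service_id = c("weekday", "saturday"),
  stringsAsFactors = FALSE
)

# Step 1: find the service_ids that run on the date of interest
day <- as.Date("2018-08-23")
services_on_day <- dates_services$service_id[dates_services$date == day]

# Step 2: keep only departures that belong to one of those services
departures_on_day <- departures[departures$service_id %in% services_on_day, ]
departures_on_day$trip_id
```

Looking up the service_ids first keeps the departures table unchanged in shape; a join on `service_id` followed by a filter on `date` would be an equivalent approach.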

## Extract a single day

@@ -191,8 +195,8 @@ ggplot(departures_180823) + theme_bw() +
labs(title = "Departures from Times Square on 08/23/18")
```

Now we plot departures for all stop_ids with the same name, we can separate for different
stop_id. The following plot shows all departures for stop_ids 127N and 127S from 7 to 8 AM.
Now we plot departures for all stop_ids with the same name, so we can separate for different
stop_ids. The following plot shows all departures for stop_ids 127N and 127S from 7 to 8 AM.

```{r fig.width=6, fig.height=6}
departures_180823_sub_7to8 <- departures_180823 %>%
@@ -210,6 +214,6 @@ ggplot(departures_180823_sub_7to8) + theme_bw() +
```

Of course this plot idea can be expanded further. You could also differentiate each route by
direction (using headsign, origin or next/previous stops). Another approach is to calculate
frequencies and show different levels of service during the day, all depending on the goal
of your analysis.
direction (using direction_id, headsign, origin or next/previous stops). Another approach is
to calculate frequencies and show different levels of service during the day, all depending
on the goal of your analysis.
