update readme/vignettes

r-transit · Jun 23, 2023 · d43a7cc
1 parent fb45fe0
commit d43a7cc
Showing 4 changed files with 35 additions and 28 deletions.
diff --git a/README.md b/README.md
@@ -49,7 +49,7 @@ devtools::install_github("r-transit/tidytransit")
 
 -   [`gtfsio`](https://github.com/r-transit/gtfsio) R package to read and write gtfs feeds, tidytransit builds on gtfsio
 -   [`gtfstools`](https://github.com/ipeaGIT/gtfstools) Tools for editing and analysing transit feeds
--   [`gtfsrouter`](https://github.com/ATFutures/gtfs-router) Package for public transport routing 
+-   [`gtfsrouter`](https://github.com/UrbanAnalyst/gtfsrouter) Package for public transport routing 
 -   [`gtfs2gps`](https://github.com/ipeaGIT/gtfs2gps) Converting public transport data from GTFS format to GPS-like records
 
 

diff --git a/vignettes/introduction.Rmd b/vignettes/introduction.Rmd
@@ -44,26 +44,27 @@ devtools::install_github("r-transit/tidytransit")
 
 # The General Transit Feed Specification
 
-The [summary page for the GTFS standard](https://developers.google.com/transit/gtfs/#overview-of-a-gtfs-feed) 
-is a good resource for more information on the standard.  
+The [summary page for the GTFS standard](https://gtfs.org/schedule/) 
+is a good resource for more information on the standard.
 
-GTFS feeds contain many linked tables about published transit schedules about service 
+GTFS feeds contain many linked tables about published transit schedules, service 
 schedules, trips, stops, and routes. Below is a diagram of these relationships and tables:
 
 ![gtfs-relationship-diagram](figures/GTFS_class_diagram.svg.png)
 Source: Wikimedia, user -stk. 
 
 # Read a GTFS Feed
 
-GTFS data come packaged as a zip file of tables in text form. The first thing tidytransit 
-does is consolidate the reading of all those tables into a single R object, which contains 
+GTFS data come packaged as a zip file of comma separated tables in text form. The first thing 
+tidytransit does is consolidate the reading of all those tables into a single R object, which contains 
 a list of the tables in each feed. 
 
 Below we use the tidytransit `read_gtfs` function in order to read a feed from the NYC MTA 
-into R. 
+into R.
 
 We use a feed included in the package in the example below. But note that you can read 
-directly from the New York City Metropolitan Transit Authority, as shown in the commented code below. 
+directly from the New York City Metropolitan Transit Authority, as shown in the commented 
+code below. 
 
 You can also read from any other URL. This is useful because there are many sources for 
 GTFS data, and often the best source is transit service providers themselves. See the next 

diff --git a/vignettes/servicepatterns.Rmd b/vignettes/servicepatterns.Rmd
@@ -19,13 +19,13 @@ knitr::opts_chunk$set(echo = TRUE)
 
 ## Overview
 
-Each trip in a GTFS feed is referenced to a service_id (in trips.txt). The [GTFS reference](https://developers.google.com/transit/gtfs/reference/#calendartxt) 
+Each trip in a GTFS feed is referenced to a service_id (in trips.txt). The [GTFS reference](https://gtfs.org/schedule/reference/#calendartxt) 
 specifies that a "service_id contains an ID that uniquely identifies a set of dates when 
 service is available for one or more routes". A service could run on every weekday or only 
 on Saturdays for example. Other possible services run only on holidays during a year, 
 independent of weekdays. However, feeds are not required to indicate anything with 
 service_ids and some feeds even use a unique service_id for each trip and day. In 
-this vignette we'll look at a general way to gather information on when trips run by 
+this vignette, we'll look at a general way to gather information on when trips run by 
 using "service patterns".
 
 Service patterns can be used to find a typical day for further analysis like routing or
@@ -51,8 +51,8 @@ head(gtfs$.$dates_services)
 ```
 
 To understand service patterns better we need information on weekdays and holidays. With 
-a calendar table we know the weekday and possible holidays for each date. We'll use a minimal example 
-with two holidays.
+a calendar table we know the weekday and possible holidays for each date. We'll use a 
+minimal example with two holidays.
 
 ```{r}
 holidays = tribble(~date, ~holiday,
@@ -75,7 +75,7 @@ head(calendar)
 To analyse on which dates trips run and to group similar services we use service patterns. 
 Such a pattern simply lists all dates a trip runs on. For example, a trip with a pattern 
 like _2019-03-07, 2019-03-14, 2019-03-21, 2019-03-28_ runs every Thursday in March 2019. 
-To handle these patterns we create a `servicepattern_id` using a hash function. Ideally 
+To handle these patterns, we create a `servicepattern_id` using a hash function. Ideally 
 there are the same number of servicepattern_ids and service_ids. However, in real life feeds 
 this is rarely the case. In addition, the usability of service patterns depends largely on 
 the feed and its complexity.
@@ -95,15 +95,17 @@ In addition, `gtfs$.$dates_servicepatterns` has been created which connects date
 patterns (like `dates_services`). We can compare the number of service patterns to the number of services.
 
 ```{r}
+head(gtfs$.$dates_servicepatterns)
+
 # service ids used
-n_services <-  length(unique(gtfs$trips$service_id)) # 70
+n_services <- length(unique(gtfs$trips$service_id)) # 70
 
 # unique date patterns 
 n_servicepatterns <- length(unique(gtfs$.$servicepatterns$servicepattern_id)) # 7
 ```
 
-The feed uses 70 service_ids but there are actually only 7 different date patterns. Other feeds 
-might not have such low numbers, for example the [Swiss GTFS feed](https://opentransportdata.swiss/dataset/timetable-2019-gtfs) 
+The feed uses 70 service_ids but there are actually only 7 different date patterns. Other 
+feeds might not have such low numbers, for example the [Swiss GTFS feed](https://opentransportdata.swiss/dataset/timetable-2019-gtfs) 
 uses around 15'600 service_ids which all identify unique date patterns.
 
 ## Analyse Data

diff --git a/vignettes/timetable.Rmd b/vignettes/timetable.Rmd
@@ -68,7 +68,8 @@ if(!exists("trip_headsign", where = gtfs$trips)) {
   trip_headsigns <- gtfs$stop_times %>% 
     group_by(trip_id) %>% 
     summarise(stop_id = stop_id[which.max(stop_sequence)]) %>% 
-    left_join(gtfs$stops, by="stop_id") %>% select(trip_id, trip_headsign.computed = stop_name)
+    left_join(gtfs$stops, by="stop_id") %>%
+    select(trip_id, trip_headsign.computed = stop_name)
 
   # assign the headsign to the gtfs object 
   gtfs$trips <- left_join(gtfs$trips, trip_headsigns, by = "trip_id")
@@ -77,7 +78,7 @@ if(!exists("trip_headsign", where = gtfs$trips)) {
 
 ## Create A Departure Time Table
 
-To create a departure timetable we first need to find the ids of all stops in the stops 
+To create a departure timetable, we first need to find the ids of all stops in the stops 
 table with the same name, as `stop_name` might cover different platforms and thus have 
 multiple stop_ids in the stops table. 
 
@@ -87,6 +88,9 @@ stop_ids <- gtfs$stops %>%
   select(stop_id)
 ```
 
+Note that multiple unrelated stops can have the same `stop_name`; see `cluster_stops()` 
+for examples of how to find these cases.
+
 ## Trips departing from stop 
 
 To the selected stop_ids for Times Square, we can join trip columns: `route_id`, `service_id`, 
@@ -122,8 +126,8 @@ departures <- departures %>%
             by = "route_id")
 ```
 
-Now we have a data frame that tells us about the origin, destination, and time at which each 
-train depart from Times Square for every possible schedule of service. 
+Now we have a data frame that tells us about the origin, destination, and time at which 
+each train departs from Times Square for every possible schedule of service. 
 
 ```{r}
 departures %>% 
@@ -136,14 +140,14 @@ departures %>%
 ```
 
 However, we don't know days on which these trips run. Using the service_id column on our 
-calculated departures, and tidytransit's calculated `dates_services` data frame, we can filter 
-trips to a given date of interest. 
+calculated departures and tidytransit's calculated `dates_services` data frame, we can 
+filter trips to a given date of interest.
 
 ```{r}
 head(gtfs$.$dates_services)
 ```
 
-Please see the `servicepatterns` vignette for further examples on how to use this table.  
+Please see the `servicepatterns` vignette for further examples on how to use this table.
 
 ## Extract a single day 
 
@@ -191,8 +195,8 @@ ggplot(departures_180823) + theme_bw() +
   labs(title = "Departures from Times Square on 08/23/18")
 ```
 
-Now we plot departures for all stop_ids with the same name, we can separate for different
-stop_id. The following plot shows all departures for stop_ids 127N and 127S from 7 to 8 AM.
+Now we plot departures for all stop_ids with the same name, so we can separate for different
+stop_ids. The following plot shows all departures for stop_ids 127N and 127S from 7 to 8 AM.
 
 ```{r fig.width=6, fig.height=6}
 departures_180823_sub_7to8 <- departures_180823 %>% 
@@ -210,6 +214,6 @@ ggplot(departures_180823_sub_7to8) + theme_bw() +
 ```
 
 Of course this plot idea can be expanded further. You could also differentiate each route by
-direction (using headsign, origin or next/previous stops). Another approach is to calculate 
-frequencies and show different levels of service during the day, all depending on the goal 
-of your analysis.
+direction (using direction_id, headsign, origin or next/previous stops). Another approach is 
+to calculate frequencies and show different levels of service during the day, all depending 
+on the goal of your analysis.