-
Notifications
You must be signed in to change notification settings - Fork 1
/
distances.qmd
330 lines (240 loc) · 10.2 KB
/
distances.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
---
number-depth: 2
format:
pdf:
prefer-html: true
---
# Euclidean and routing distances
We will show how to estimate euclidean distances (*as crown flights*) using `sf` package, and the distances using a road network using `r5r` package (demonstrative).
## Euclidean distances
Taking the survey respondents' location, we will estimate the distance to the university (IST) using the `sf` package.
### Import survey data frame convert to sf
We will use a survey dataset with 200 observations, with the following variables: ID, Affiliation, Age, Sex, Transport Mode to IST, and latitude and longitude coordinates.
```{r}
#| message: false
library(dplyr)
SURVEY = read.csv("data/SURVEY.txt", sep = "\t") # tab delimiter
names(SURVEY)
```
As we have the coordinates, we can convert this data frame to a spatial feature, as explained in the [Introduction to spatial data](spatial-data.qmd#create-spatial-data-from-coordinates) section.
```{r}
#| message: false
library(sf)
SURVEYgeo = st_as_sf(SURVEY, coords = c("lon", "lat"), crs = 4326) # convert to as sf data
```
### Create new point at the university
Using coordinates from Instituto Superior Técnico, we can directly create a simple feature and assign its crs.
```{r}
UNIVERSITY = data.frame(place = "IST",
lon = -9.1397404,
lat = 38.7370168) |> # first a dataframe
st_as_sf(coords = c("lon", "lat"), # then a spacial feature
crs = 4326)
```
Visualize in a map:
```{r}
#| fig-format: png
#| message: false
library(mapview)
mapview(SURVEYgeo, zcol = "MODE") + mapview(UNIVERSITY, col.region = "red", cex = 12)
```
### Straight lines
First we will create lines connecting the survey locations to the university, using the `st_nearest_points()` function.
This function finds returns the nearest points between two geometries, and creates a line between them.
This can be useful to find the nearest train station to each point, for instance.
As we only have 1 point at UNIVERSITY layer, we will have the same number of lines as number of surveys = `r nrow(SURVEY)`.
```{r}
#| fig-format: png
SURVEYeuclidean = st_nearest_points(SURVEYgeo, UNIVERSITY, pairwise = TRUE) |>
st_as_sf() # this creates lines
mapview(SURVEYeuclidean)
```
Note that if we have more than one point in the second layer, the `pairwise = TRUE` will create a line for each combination of points.
Set to `FALSE` if, for instance, you have the same number of points in both layers and want to create a line between the corresponding points.
### Distance
Now we can estimate the distance using the `st_length()` function.
```{r}
#| eval: false
# compute the line length and add directly in the first survey layer
SURVEYgeo = SURVEYgeo |>
mutate(distance = st_length(SURVEYeuclidean))
# remove the units - can be useful
SURVEYgeo$distance = units::drop_units(SURVEYgeo$distance)
```
We could also estimate the distance using the `st_distance()` function **directly**, although we would not get and sf with lines.
```{r}
SURVEYgeo = SURVEYgeo |>
mutate(distance = st_distance(SURVEYgeo, UNIVERSITY)[,1] |> # in meters
units::drop_units()) |> # remove units
mutate(distance = round(distance)) # round to integer
summary(SURVEYgeo$distance)
```
`SURVEYgeo` is still a points' sf.
## Routing Engines
There are different types of routing engines, regarding the type of network they use, the type of transportation they consider, and the type of data they need.
We can have:
- Uni-modal vs. Multi-modal
- One mode per trip vs. One trip with multiple legs that can be made with different modes
- Multi-modal routing may require GTFS data (realistic Public Transit)
- Output level of the results
- Routes (1 journey = 1 route)
- Legs
- Segments
- Routing profiles
- Type of user
- fastest / shortest path
- avoid barriers / tolls, etc
![Routing options in [OpenRouteService](https://maps.openrouteservice.org/)](images/clipboard-1370155720.png){fig-align="center"}
- Local vs. Remote (service request - usually web API)
- Speed vs. Quota limits / price
- Hard vs. Easy set up
- Hardware limitations in local routing
- Global coverage in remote routing, with frequent updates
Examples: [OSRM](https://project-osrm.org/), [Dodgr](https://urbananalyst.github.io/dodgr/), [r5r](https://ipeagit.github.io/r5r/), [Googleway](https://symbolixau.github.io/googleway/reference/access_result.html), [CycleStreets](https://m.cyclestreets.net/journey), [HERE](https://munterfi.github.io/hereR/).
### Routing distances with `r5r`
We use the `r5r` package to estimate the distance using a road network [@Pereira2021r5r].
::: {.callout-note appearance="simple"}
To properly the setup r5r model for the area you are working on, you need to download the **road network** data from OpenStreetMap and, if needed, add a **GTFS** and **DEM** file, as it will be explained in the [next section](r5r.qmd).
:::
```{r}
#| message: false
#| warning: false
#| include: false
options(java.parameters = "-Xmx4G") # allocate memory
url = "https://github.com/U-Shift/EITcourse/releases/download/2024.09/network.dat"
download.file(url, destfile = "data/network.dat", mode = "wb")
library(r5r)
r5r_network = setup_r5("data/", overwrite = FALSE) # load existing network
```
We will use only respondents with a distance to the university less than 2 km.
```{r}
#| warning: false
SURVEYsample = SURVEYgeo |> filter(distance <= 2000)
nrow(SURVEYsample)
```
We need an id (unique identifier) for each survey location, to be used in the routing functions of `r5r`.
```{r}
# create id columns for both datasets
SURVEYsample = SURVEYsample |>
mutate(id = c(1:nrow(SURVEYsample))) # from 1 to the number of rows
UNIVERSITY = UNIVERSITY |>
mutate(id = 1) # only one row
```
#### Distances by car
Estimate the routes with time and distance by car, from survey locations to University.
```{r}
#| warning: false
SURVEYcar = detailed_itineraries(
r5r_core = r5r_network,
origins = SURVEYsample,
destinations = UNIVERSITY,
mode = "CAR",
all_to_all = TRUE # if false, only 1-1 would be calculated
)
names(SURVEYcar)
```
The [`detailed_itineraries()`](https://ipeagit.github.io/r5r/reference/detailed_itineraries.html) function is super detailed!
::: {.callout-note appearance="simple"}
If we want to know only time and distance, and **not the route** itself, we can use the [`travel_time_matrix()`](https://ipeagit.github.io/r5r/reference/travel_time_matrix.html).
:::
#### Distances by foot
Repeat the same for `WALK`[^distances-1].
[^distances-1]: For bike you would use `BICYCLE`.
```{r}
#| warning: false
#| code-fold: true
SURVEYwalk = detailed_itineraries(
r5r_core = r5r_network,
origins = SURVEYsample,
destinations = UNIVERSITY,
mode = "WALK",
all_to_all = TRUE # if false, only 1-1 would be calculated
)
```
#### Distances by PT
For Public Transit (`TRANSIT`) you may specify the egress mode, the departure time, and the maximum number of transfers.
```{r}
#| warning: false
#| code-fold: true
#| eval: false
SURVEYtransit = detailed_itineraries(
r5r_core = r5r_network,
origins = SURVEYsample,
destinations = UNIVERSITY,
mode = "TRANSIT",
mode_egress = "WALK",
max_rides = 1, # The maximum PT rides allowed in the same trip
departure_datetime = as.POSIXct("20-09-2023 08:00:00",
format = "%d-%m-%Y %H:%M:%S"),
all_to_all = TRUE # if false, only 1-1 would be calculated
)
```
## Compare distances
We can now compare the euclidean and routing distances that we estimated for the survey locations under 2 km.
```{r}
summary(SURVEYsample$distance) # Euclidean
summary(SURVEYwalk$distance) # Walk
summary(SURVEYcar$distance) # Car
```
> What can you understand from this results?
```{r}
#| echo: false
distances = data.frame(euclidean = SURVEYsample$distance,
walk = SURVEYwalk$distance,
car = SURVEYcar$distance)
distances = distances |> arrange(euclidean)
# Define the number of observations
n <- nrow(distances)
# Create an empty plot with the appropriate y-axis limits
plot(1:n, distances$euclidean, type = "n",
ylim = range(distances),
xlab = "Observation (sorted by Euclidean distance)",
ylab = "Distance [m]",
main = "Distances by Euclidean, Walk, and Car")
# Add lines for each type of distance
lines(1:n, distances$euclidean, lwd = 2)
lines(1:n, distances$walk, col = "red", lwd = 2)
lines(1:n, distances$car, col = "blue", lwd = 2)
# Optional: Add points for better visibility
points(1:n, distances$euclidean, pch = 16)
points(1:n, distances$walk, col = "red", pch = 16)
points(1:n, distances$car, col = "blue", pch = 16)
# Add a legend to distinguish the lines
legend("topleft", legend = c("Euclidean", "Walk", "Car"),
col = c("black", "red", "blue"),
lwd = 2, lty = c(1, 2, 3), pch = 16)
```
### Circuity
Compare 1 single route.
```{r}
#| fig-format: png
#| code-fold: true
mapview(SURVEYeuclidean[165,], color = "black") + # 1556 meters
mapview(SURVEYwalk[78,], color = "red") + # 1989 meters
mapview(SURVEYcar[78,], color = "blue") # 2565 meters
```
With this we can see the **circuity** of the routes, a measure of route / transportation efficiency, which is the ratio between the routing distance and the euclidean distance.
The cicuity for car (`r round(2565/1556,2)`) is usually higher than for walking (`r round(1989/1556,2)`) or biking, for shorter distances.
## Visualize routes
Visualize with transparency of 30%, to get a clue when they overlay.
```{r}
#| fig-format: png
mapview(SURVEYwalk, alpha = 0.3)
mapview(SURVEYcar, alpha = 0.3, color = "red")
```
We can also use the [`overline()`](https://docs.ropensci.org/stplanr/reference/overline.html) function from `stplanr` package to break up the routes when they *overline*, and add them up.
```{r}
#| message: false
#| fig-format: png
# we create a value that we can later sum
# it can be the number of trips represented by this route
SURVEYwalk$trips = 1 # in this case is only one respondent per route
SURVEYwalk_overline = stplanr::overline(
SURVEYwalk,
attrib = "trips",
fun = sum
)
mapview(SURVEYwalk_overline, zcol = "trips", lwd = 3)
```
With this we can visually inform on how many people travel along a route, from the survey dataset[^distances-2].
[^distances-2]: Assuming all travel by the shortest path.