Add version argument #301

Open: wants to merge 14 commits into master
1 change: 1 addition & 0 deletions NEWS.md
@@ -4,6 +4,7 @@

* Bump minimum R version from 3.5.0 to 3.6.0 since that's a requirement for one of our indirect dependencies (i.e. [evaluate](https://cran.r-project.org/package=evaluate)).
* Adjusted the SQL syntax used inside `oe_get_network` so that the queries are compatible with GDAL 3.10 ([#298](https://github.com/ropensci/osmextract/issues/291)).
* Added a `version` argument to `oe_match` to simplify the download of historical extracts from the Geofabrik provider ([#295](https://github.com/ropensci/osmextract/issues/295)).
* The output of `oe_get_network` does not drop elements tagged as `access = 'no'` as long as the `foot`/`bicycle`/`motor_vehicle` (according to the chosen mode of transport) key is equal to `yes`, `permissive`, or `designated` ([#289](https://github.com/ropensci/osmextract/issues/289)).

### MINOR CHANGES
3 changes: 3 additions & 0 deletions R/get-network.R
@@ -79,6 +79,9 @@
#' modifications to the current filters or propose new values for alternative
#' modes of transport.
#'
#' Starting from version 0.5.2, the `version` argument (see [oe_get()]) can be
#' used to download historical OSM extracts from the Geofabrik provider.
#'
#' @seealso [oe_get()]
#'
#' @examples
8 changes: 8 additions & 0 deletions R/get.R
@@ -44,6 +44,12 @@
#' say that smaller administrative units correspond to bigger levels. If
#' `NULL`, the default, the `oe_*` functions will select the highest available
#' level. See Details and Examples in [oe_match()].
#' @param version The version of the OSM extract to download. The default is
Review comment (Member):
Great solution, and 👍 to backwards compatibility. Quick question: will this still default to not downloading new data if there are already (possibly out-of-date) files matching the region in the download directory?

#' "latest". Other possible values are typically specified using the format
#' YYMMDD (e.g. "200101"). The complete list of all available historic files
#' for a given extract can be browsed from the Geofabrik website (e.g.
#' <https://download.geofabrik.de/europe/italy.html> and then click on 'raw
#' directory index').
#' @param download_directory Directory to store the file containing OSM data.
#' @param force_download Should the `.osm.pbf` file be updated even if it has
#' already been downloaded? `FALSE` by default. This parameter is used to
@@ -216,6 +222,7 @@ oe_get = function(
match_by = "name",
max_string_dist = 1,
level = NULL,
version = "latest",
download_directory = oe_download_directory(),
force_download = FALSE,
max_file_size = 5e+8,
@@ -246,6 +253,7 @@ oe_get = function(
match_by = match_by,
max_string_dist = max_string_dist,
level = level,
version = version,
quiet = quiet
)

35 changes: 26 additions & 9 deletions R/match.R
@@ -152,11 +152,13 @@ oe_match.sfc = function(
place,
provider = "geofabrik",
level = NULL,
version = "latest",
quiet = FALSE,
...
) {
# Load the data associated with the chosen provider.
provider_data = load_provider_data(provider)
version <- check_version(version, provider)

# Check if place has no CRS (i.e. NA_crs_, see ?st_crs) and, in that case, set
# 4326 + raise a warning message.
@@ -216,7 +218,6 @@ oe_match.sfc = function(
# If, again, there are multiple matches with the same "level", we will select
# only the area closest to the input place.
if (nrow(matched_zones) > 1L) {

nearest_id_centroid = sf::st_nearest_feature(
place,
sf::st_centroid(sf::st_geometry(matched_zones))
@@ -231,13 +232,19 @@ oe_match.sfc = function(
.subclass = "oe_match_sfcInputMatchedWith"
)

url <- matched_zones[["pbf"]]
url <- adjust_version_in_url(version, url)
file_size <- matched_zones[["pbf_file_size"]]
if (version != "latest") {
file_size <- NA # The file size is not available for older versions
}

# Return a list with the URL and the file_size of the matched place
result = list(
url = matched_zones[["pbf"]],
file_size = matched_zones[["pbf_file_size"]]
url = url,
file_size = file_size
)
result

}

#' @inheritParams oe_get
@@ -277,6 +284,7 @@ oe_match.character = function(
quiet = FALSE,
match_by = "name",
max_string_dist = 1,
version = "latest",
...
) {
# For the moment we support only length-one character vectors
@@ -290,6 +298,7 @@ oe_match.character = function(
)
)
}
version <- check_version(version, provider)

# See https://github.com/ropensci/osmextract/pull/125
if (place == "ITS Leeds") {
@@ -339,7 +348,6 @@ oe_match.character = function(
# If the approximate string distance between the best match is greater than
# the max_string_dist threshold, then:
if (isTRUE(high_distance)) {

# 1. Raise a message
oe_message(
"No exact match found for place = ", place,
@@ -389,7 +397,8 @@ oe_match.character = function(
provider = other_provider,
match_by = match_by,
quiet = TRUE,
max_string_dist = max_string_dist
max_string_dist = max_string_dist,
version = version
)
)
}
@@ -410,7 +419,8 @@ oe_match.character = function(
oe_match(
place = sf::st_geometry(place_online),
provider = provider,
quiet = quiet
quiet = quiet,
version = version
)
)
}
@@ -434,9 +444,16 @@ oe_match.character = function(
.subclass = "oe_match_characterinputmatchedWith"
)

url <- best_matched_place[["pbf"]]
url <- adjust_version_in_url(version, url)
file_size <- best_matched_place[["pbf_file_size"]]
if (version != "latest") {
file_size <- NA # The file size is not available for older versions
}

result = list(
url = best_matched_place[["pbf"]],
file_size = best_matched_place[["pbf_file_size"]]
url = url,
file_size = file_size
)
result
}
21 changes: 21 additions & 0 deletions R/utils.R
@@ -23,6 +23,27 @@ check_layer_provider = function(layer, provider) {
invisible(0)
}

check_version <- function(version, provider) {
# Currently, the only provider that includes historic data for the OSM
# extracts is geofabrik.
if (version != "latest" && provider != "geofabrik") {
warning(
"version != 'latest' is only supported for 'geofabrik' provider. ",
"Overriding it to 'latest'.",
call. = FALSE
)
return("latest")
}
version
}
adjust_version_in_url <- function(version, url) {
if (version == "latest") {
return(url)
}
gsub("latest(?=\\.osm\\.pbf$)", version, url, perl = TRUE)
}


# Starting from sf 1.0.2, sf::st_read raises a warning message when both layer
# and query arguments are set, while it raises a warning in sf < 1.0.2 when
# there are multiple layers and the layer argument is not set. See also
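
A minimal, self-contained sketch of how the two new helpers behave, with their logic reproduced from the hunk above so it runs without `osmextract` installed. The Geofabrik URL follows the usual `<region>-latest.osm.pbf` naming; the `example.org` URL is a hypothetical input, used only to show that the lookahead `(?=\\.osm\\.pbf$)` leaves a `latest` path segment untouched.

```r
# Logic reproduced from check_version() / adjust_version_in_url() in R/utils.R
# (illustrative copies, not the package's exported API).
check_version <- function(version, provider) {
  if (version != "latest" && provider != "geofabrik") {
    warning(
      "version != 'latest' is only supported for 'geofabrik' provider. ",
      "Overriding it to 'latest'.",
      call. = FALSE
    )
    return("latest")
  }
  version
}

adjust_version_in_url <- function(version, url) {
  if (version == "latest") {
    return(url)
  }
  # Replace only the "latest" token that immediately precedes the final
  # ".osm.pbf" extension (perl = TRUE enables the lookahead).
  gsub("latest(?=\\.osm\\.pbf$)", version, url, perl = TRUE)
}

italy_url <- "https://download.geofabrik.de/europe/italy-latest.osm.pbf"

# "latest" leaves the URL unchanged.
adjust_version_in_url("latest", italy_url)

# A YYMMDD version swaps in the historical file name:
# "https://download.geofabrik.de/europe/italy-200101.osm.pbf"
adjust_version_in_url("200101", italy_url)

# Hypothetical URL: the "latest" directory is not followed by ".osm.pbf" at
# the end of the string, so only the file name is rewritten.
other_url <- adjust_version_in_url("200101", "https://example.org/latest/x-latest.osm.pbf")

# Non-Geofabrik providers fall back to "latest" with a warning, which is
# reported as a message and then muffled here.
v <- withCallingHandlers(
  check_version("200101", "bbbike"),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
v
```

As written, `check_version()` does not validate the `"YYMMDD"` format for the Geofabrik provider, so a malformed value is passed through and would likely produce a URL pointing at a file that does not exist.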

8 changes: 8 additions & 0 deletions man/oe_get.Rd
3 changes: 3 additions & 0 deletions man/oe_get_network.Rd
17 changes: 16 additions & 1 deletion man/oe_match.Rd
(Generated files; diffs not rendered.)

19 changes: 19 additions & 0 deletions tests/testthat/test-match.R
@@ -214,3 +214,22 @@ test_that("oe_match_pattern: test spatial combine", {
MI_PA = sf::st_sfc(milan, palermo, crs = 4326)
expect_identical(oe_match_pattern(MI_PA)$geofabrik, c("Europe", "Italy"))
})

test_that("oe-match: detecting version works", {
latest_match <- oe_match("Italy", quiet = TRUE)
expect_true(grepl("latest", latest_match$url))

version2020_match <- oe_match("Italy", quiet = TRUE, version = "200101")
expect_true(grepl("200101", version2020_match$url))
})

test_that("oe-match: warning with version and provider", {
expect_warning(
oe_match("Leeds", provider = "bbbike", version = "2", quiet = TRUE),
regexp = "version != 'latest' is only supported for 'geofabrik' provider."
)
expect_warning(
oe_match("Lombardia", version = "ABC", quiet = TRUE),
regexp = "version != 'latest' is only supported for 'geofabrik' provider."
)
})

28 changes: 28 additions & 0 deletions vignettes/osmextract.Rmd
@@ -268,6 +268,34 @@ Finally, to reduce unnecessary computational resources and save bandwidth/electr
(its_details = oe_match("ITS Leeds"))
```

### Matching historical OSM extracts

Starting from `osmextract` v0.5.2, the `version` argument can be used to match historical OSM extracts stored by the Geofabrik provider. The default value is `"latest"`, which corresponds to the most recent OSM extract. Other values can be specified using the `"YYMMDD"` format (e.g. `"200101"`). The available extracts for each zone can be browsed on the Geofabrik [website](https://download.geofabrik.de/).

For example:

```{r}
oe_match("Italy", quiet = TRUE)$url
oe_match("Italy", version = "200101", quiet = TRUE)$url # OSM data up to January 1st 2020
oe_match(c(9.1916, 45.4650), version = "210101", quiet = TRUE)$url
```
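
Since version strings follow the `"YYMMDD"` pattern, they can be derived from a `Date` with base R's `format()`. The helper `version_from_date()` below is hypothetical and not part of `osmextract`; it is only a sketch of the conversion.

```r
# Hypothetical helper (not exported by osmextract): convert a date into the
# "YYMMDD" string expected by the `version` argument.
version_from_date <- function(date) {
  format(as.Date(date), "%y%m%d")
}

version_from_date("2020-01-01")           # "200101"
version_from_date(as.Date("2021-01-01"))  # "210101"
```

The result could then be passed as, e.g., `version = version_from_date("2020-01-01")`. Geofabrik only keeps extracts for specific dates, so it is worth checking the raw directory index of the chosen zone before relying on an arbitrary date.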

Unfortunately, Geofabrik is currently the only provider that stores historical OSM extracts. Therefore, `version != "latest"` is ignored (with a warning message) if you select a different provider

```{r}
oe_match("Leeds", provider = "bbbike", version = "200101")
```

or if the input `place` is not matched by the Geofabrik provider.

```{r}
oe_match("Leeds", version = "200101")
```

<!-- TODO: Get in contact with Geofabrik to check the status of historical OSM extract. -->

Beware that the default value, i.e. `"latest"`, selects an _always evolving_ OSM extract. On the other hand, historical OSM extracts are static and they may be preferable for reproducibility purposes.

## `oe_download()`: Download OSM extracts

The `oe_download()` function is used to download `.pbf` files representing OSM extracts.