
Add arc_read() function #108

Closed
elipousson opened this issue Nov 29, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@elipousson
Contributor

I would find it really convenient to have a single function, e.g. arc_read(), that I could use to read in data from an ArcGIS service. Here is a reprex showing my existing workflow and an alternate workflow with this function:

library(arcgislayers)
library(rlang) # for %||%

url <- "https://geodata.baltimorecity.gov/egis/rest/services/CitiMap/DOT_Layers/MapServer/5"

# The way I usually use arcgislayers
data <- url |>
  arc_open() |>
  arc_select()

# The additional convenience function for reading layers
arc_read <- function(url,
                     fields = NULL,
                     where = NULL,
                     crs = NULL,
                     geometry = TRUE,
                     filter_geom = NULL,
                     ...,
                     .name_repair = NULL,
                     token = Sys.getenv("ARCGIS_TOKEN")) {
  arc_obj <- arc_open(url = url, token = token)

  arc_obj <- arc_select(
    x = arc_obj,
    fields = fields,
    where = where,
    crs = crs %||% sf::st_crs(arc_obj),
    geometry = geometry,
    filter_geom = filter_geom,
    token = token,
    ...
  )

  # apply .name_repair when it is a function; character repair strategies
  # (e.g. "unique") could be handled with vctrs::vec_as_names()
  if (!is.null(.name_repair) && is.function(.name_repair)) {
    arc_obj <- rlang::set_names(arc_obj, .name_repair(names(arc_obj)))
  }

  arc_obj
}

url <- "https://geodata.baltimorecity.gov/egis/rest/services/CitiMap/DOT_Layers/MapServer/5"

data <- arc_read(url)

Created on 2023-11-29 with reprex v2.0.2

This reprex also shows the potential of adding a .name_repair parameter. If you'd be OK with picking up a {vctrs} dependency you could use vctrs::vec_as_names() or re-implement comparable flexibility.
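For example, that repair step could be delegated to vctrs::vec_as_names() with something like the sketch below (repair_names() is a hypothetical helper, and res stands in for the sf object returned by arc_select()):

# Hypothetical helper illustrating the suggested {vctrs}-based name repair;
# `.name_repair` accepts a repair strategy such as "unique" or "universal"
repair_names <- function(res, .name_repair = "unique") {
  rlang::set_names(
    res,
    vctrs::vec_as_names(names(res), repair = .name_repair)
  )
}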

@elipousson elipousson added the enhancement New feature or request label Nov 29, 2023
@JosiahParry
Collaborator

vctrs is already an indirect import via httr2, so that's not such a big deal. I'm still not sold on this, though. When you read from a database, you open a connection, then you query it and bring the results into memory. That's the workflow we're replicating here.

For example arrow::open_dataset() does not give us the ability to query and read in the same function.

I don't think arc_open() |> arc_select() is too much of a hassle at the moment.

If arc_read() is to be provided, I think it would need to be in line with other reading functions, meaning it would read everything into memory. But I would like to explicitly discourage that.

@elipousson
Contributor Author

I don't think arc_open() |> arc_select() is too much hassle but I think a single function would be useful—especially when so many vector layers are fairly small and read into memory quickly without any issues.

I'd also note that most of the common read functions don't actually require you to read everything into memory:

  • sf::read_sf() supplies a wkt_filter parameter that is similar to filter_geom (and you can use the query parameter to select specific columns or attributes or filter rows)
  • readr::read_delim() has col_select (comparable to fields) and skip
  • googlesheets4::read_sheet() also has range and skip
  • arrow::read_parquet() works with URLs as well as any Arrow input stream and also has col_select

arrow::open_dataset() is designed for multi-file datasets which is why the limitation makes sense there. You could have arc_read() kick back an error if the feature count from the initial metadata exceeds some default threshold.
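That guard might look something like the sketch below. This is illustrative only: count_features() is a hypothetical accessor standing in for however the layer metadata exposes the feature count, not part of the arcgislayers API.

# Hedged sketch of the proposed threshold check
check_n_max <- function(layer, n_max = 10000) {
  n <- count_features(layer) # hypothetical accessor
  if (n > n_max) {
    rlang::abort(
      sprintf("Layer has %i features, which exceeds `n_max = %i`.", n, n_max)
    )
  }
  invisible(layer)
}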

@JosiahParry
Collaborator

Let's go ahead and make a PR. Why not? Be sure to use rlang::names_inform_repair() if names are changed. Let's set n_max = 10000 as a safe maximum (I've seen some insanely detailed polylines recently where 300 take longer to read than 5,000 polygons). After using arc_open(), it's probably best to use obj_is_layer() in case we somehow read in a map or feature server.

@elipousson
Contributor Author

Nice. Thanks! Would you be interested in me including a col_select argument as an alias for fields? It's internally inconsistent, but maybe nice for familiarity based on those other read functions.

elipousson added a commit to elipousson/arcgislayers that referenced this issue Dec 3, 2023
@JosiahParry
Collaborator

col_select would be fine for me. I'm not sure how you would replicate the tidyselect functionality, though. I'm content with supporting integer positions and explicit field names in a character vector. For .name_repair, let's use rlang::check_installed() to avoid an explicit dependency on vctrs and add it to Suggests. We can also support a col_names argument, with the caveat that the first row will not be read into the dataset.

I think the signature should be:

arc_read <- function(
  url,
  col_names = NULL,
  col_select = NULL,
  n_max = Inf, # or a sensible default, e.g. 10000
  name_repair = "unique"
) {
  # ...
}
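One possible body for that signature, only a sketch under the assumptions in this thread: col_select is forwarded to arc_select()'s fields argument, arc_select() is assumed to accept n_max, and name repair is delegated to {vctrs} behind rlang::check_installed().

# Hypothetical implementation sketch, not the actual arcgislayers code
arc_read <- function(url,
                     col_names = NULL,
                     col_select = NULL,
                     n_max = 10000,
                     name_repair = "unique",
                     ...) {
  layer <- arc_open(url)

  # col_select is an alias for arc_select()'s fields argument
  res <- arc_select(layer, fields = col_select, n_max = n_max, ...)

  # replace the default field names if col_names is supplied
  if (!is.null(col_names)) {
    res <- rlang::set_names(res, col_names)
  }

  # delegate name repair to vctrs without a hard dependency
  rlang::check_installed("vctrs")
  rlang::set_names(res, vctrs::vec_as_names(names(res), repair = name_repair))
}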

@elipousson
Contributor Author

That is almost exactly how I have it started. I stayed up late getting it working (getting col_names to parity with the standard tidyverse functionality was a bit tricky), but the PR may not be ready until next weekend. I think it could also easily support arc_raster() (that is how I wrote it so far).

@JosiahParry
Collaborator

FWIW, I don't think it needs to be exactly the same as readr, but close enough is fine. We're reading JSON and not CSV files here, so good is better than perfect.

@elipousson elipousson mentioned this issue Dec 13, 2023
3 tasks
JosiahParry added a commit that referenced this issue Dec 27, 2023
@JosiahParry
Collaborator

Closed by #118
