In-session caching #231

Draft
wants to merge 9 commits into
base: main
4 changes: 3 additions & 1 deletion .github/workflows/cmd-check.yaml
@@ -93,7 +93,9 @@ jobs:
shell: Rscript {0}

- name: Check
run: rcmdcheck::rcmdcheck(args = ${{ matrix.config.args }}, error_on = 'warning', check_dir = 'check')
run: |
cat(paste0('options(bcdata.cache_path = "', file.path(Sys.getenv("GITHUB_WORKSPACE"), "bcdata_cache"), '")\n'), file = "~/.Rprofile", append = TRUE)
rcmdcheck::rcmdcheck(args = ${{ matrix.config.args }}, error_on = 'warning', check_dir = 'check')
shell: Rscript {0}

- name: Upload check results
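
A note on the Check step above: the first R line builds an `options()` call as text and appends it to the R profile, so that the check process picks up `bcdata.cache_path` at startup. The following is a minimal sketch of that mechanism, assuming a temporary file in place of `~/.Rprofile` and a fallback directory when `GITHUB_WORKSPACE` is unset. Both are stand-ins for illustration and are not part of this PR.

```r
# Stand-in for the runner's workspace; falls back to tempdir() locally.
workspace <- Sys.getenv("GITHUB_WORKSPACE", unset = tempdir())
# Forward slashes keep the path safe inside an R string literal.
workspace <- normalizePath(workspace, winslash = "/", mustWork = FALSE)

# Stand-in for ~/.Rprofile so the sketch does not touch the real profile.
profile <- tempfile(fileext = ".Rprofile")

# Build the options() call as text and append it to the profile file.
cache_path <- file.path(workspace, "bcdata_cache")
profile_line <- paste0('options(bcdata.cache_path = "', cache_path, '")\n')
cat(profile_line, file = profile, append = TRUE)

# R would run the profile at startup; here it is sourced by hand.
source(profile)
resolved <- getOption("bcdata.cache_path")
```

The `winslash = "/"` normalisation is an addition in this sketch. On a Windows runner, a raw path containing backslashes would be read as escape sequences once it is embedded in a string literal, so it may be worth checking how the workflow behaves there.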
4 changes: 3 additions & 1 deletion DESCRIPTION
@@ -49,7 +49,9 @@ Imports:
sf (>= 0.7),
tidyselect (>= 0.2.5),
utils,
xml2
xml2,
memoise (>= 1.1.0),
rappdirs (>= 0.3.1)
Suggests:
covr,
ggplot2,
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -45,7 +45,10 @@ export(TOUCHES)
export(WITHIN)
export(as_tibble)
export(bcdc_browse)
export(bcdc_cache_path)
export(bcdc_cache_timeout)
export(bcdc_describe_feature)
export(bcdc_forget)
export(bcdc_get_data)
export(bcdc_get_record)
export(bcdc_list)
@@ -65,6 +68,7 @@ exportClasses(wfsConnection)
exportMethods(dbQuoteIdentifier)
exportMethods(dbQuoteString)
import(DBI)
import(memoise)
import(methods)
importFrom(cli,cat_bullet)
importFrom(cli,cat_line)
61 changes: 44 additions & 17 deletions R/bcdc_options.R
@@ -10,24 +10,49 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.

#' Retrieve options used in bcdata, their value if set and the default value.
#' Retrieve options used in bcdata, their value if set and
#' the default value.
#'
#' This function retrieves bcdata specific options that can be set. These options can be set
#' using `option({name of the option} = {value of the option})`. The default options are purposefully
#' set conservatively to hopefully ensure successful requests. Resetting these options may result in
#' failed calls to the data catalogue. Options in R are reset every time R is re-started. See examples for
#' addition ways to restore your initial state.
#' This function retrieves bcdata specific options that
#' can be set. These options can be set using
#' `options({name of the option} = {value of the option})`.
#' The default options are purposefully set conservatively
#' to hopefully ensure successful requests. Resetting
#' these options may result in failed calls to the data
#' catalogue. Options in R are reset every time R is
#' re-started. See examples for additional ways to restore
#' your initial state.
#'
#' `bcdata.max_geom_pred_size` is the maximum size of an object used for a geometric operation. Objects
#' that are bigger than this value will have a bounding box drawn and apply the geometric operation
#' on that simpler polygon. Users can try to increase the maximum geometric predicate size and see
#' if the bcdata catalogue accepts their request.
#' `bcdata.max_geom_pred_size` is the maximum size of an
#' object used for a geometric operation. Objects that are
#' bigger than this value will have a bounding box drawn
#' and apply the geometric operation on that simpler
#' polygon. Users can try to increase the maximum
#' geometric predicate size and see if the bcdata
#' catalogue accepts their request.
#'
#' `bcdata.chunk_limit` is an option useful when dealing with very large data sets. When requesting large objects
#' from the catalogue, the request is broken up into smaller chunks which are then recombined after they've
#' been downloaded. bcdata does this all for you but using this option you can set the size of the chunk
#' requested. On faster internet connections, a bigger chunk limit could be useful while on slower connections,
#' it is advisable to lower the chunk limit. Chunks must be less than 10000.
#' `bcdata.chunk_limit` is an option useful when dealing
#' with very large data sets. When requesting large
#' objects from the catalogue, the request is broken up
#' into smaller chunks which are then recombined after
#' they've been downloaded. bcdata does this all for you
#' but using this option you can set the size of the chunk
#' requested. On faster internet connections, a bigger
#' chunk limit could be useful while on slower
#' connections, it is advisable to lower the chunk limit.
#' Chunks must be less than 10000.
#'
#' `bcdata.cache_path` is the location on your computer
#' where results from web requests are cached. The default
#' is set by [rappdirs::user_cache_dir()] via
#' [bcdc_cache_path()]. This option can only be set before
#' the package is loaded (e.g., by setting it in your
#' .Rprofile file).
#'
#' `bcdata.cache_timeout` is the time, in seconds, that
#' the cache is maintained. Default is 3600 seconds (one
#' hour). This option can only be set before the package
#' is loaded (e.g., by setting it in your .Rprofile file).
#'
#' @examples
#' \donttest{
@@ -64,8 +89,10 @@ bcdc_options <- function() {

dplyr::tribble(
~ option, ~ value, ~default,
"bcdata.max_geom_pred_size", null_to_na(getOption("bcdata.max_geom_pred_size")), 5E5,
"bcdata.chunk_limit",null_to_na(getOption("bcdata.chunk_limit")), 1000
"bcdata.max_geom_pred_size", null_to_na(getOption("bcdata.max_geom_pred_size")), as.character(5E5),
"bcdata.chunk_limit", null_to_na(getOption("bcdata.chunk_limit")), as.character(1000),
"bcdata.cache_path", null_to_na(getOption("bcdata.cache_path")), rappdirs::user_cache_dir("bcdata"),
"bcdata.cache_timeout", null_to_na(getOption("bcdata.cache_timeout")), as.character(3600)
)
}
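
The two cache options documented in this file follow the usual `getOption(name, default)` pattern: an unset option falls back to the default, and a value set earlier (for example in `.Rprofile`) takes precedence. Here is a small sketch, assuming a local helper `cache_timeout()` that mirrors `bcdc_cache_timeout()` so it runs without bcdata installed. The helper name is hypothetical.

```r
# Local stand-in mirroring the getter pattern used in this PR.
cache_timeout <- function() getOption("bcdata.cache_timeout", 3600)

# Start from an unset option, remembering any previous value.
old <- options(bcdata.cache_timeout = NULL)
default_value <- cache_timeout()   # falls back to the default

# A line like this would normally live in .Rprofile.
options(bcdata.cache_timeout = 600)
custom_value <- cache_timeout()    # the user-set value wins

# Restore whatever was set before the sketch ran.
options(old)
```

Per the documentation above, changing the option after bcdata is loaded is not expected to affect the cache, so the `options()` call belongs in a startup file rather than in an interactive session.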

112 changes: 85 additions & 27 deletions R/utils-classes.R
@@ -325,32 +325,7 @@ mutate.bcdc_promise <- function(.data, ...){
mutate({dots}) "), call. = FALSE)
}


#' Force collection of Web Service request from B.C. Data Catalogue
#'
#' After tuning a query, `collect()` is used to actually bring the data into memory.
#' This will retrieve an sf object into R. The `as_tibble()` function can be used
#' interchangeably with `collect` which matches `dbplyr` behaviour.
#'
#' @param x object of class bcdc_promise
#' @inheritParams collect
#' @rdname collect-methods
#' @export
#'
#' @examples
#' \donttest{
#' try(
#' bcdc_query_geodata("bc-airports") %>%
#' collect()
#' )
#'
#' try(
#' bcdc_query_geodata("bc-airports") %>%
#' as_tibble()
#' )
#' }
#'
collect.bcdc_promise <- function(x, ...){
collect_bcdc_promise_ <- function(x, ...){
check_chunk_limit()

x$query_list$CQL_FILTER <- finalize_cql(x$query_list$CQL_FILTER)
@@ -406,11 +381,94 @@ collect.bcdc_promise <- function(x, ...){

txt <- cc$parse("UTF-8")

as.bcdc_sf(bcdc_read_sf(txt), query_list = query_list, url = url,
ret <- as.bcdc_sf(bcdc_read_sf(txt), query_list = query_list, url = url,
full_url = full_url)

if (getOption("bcdata.cache_verbose", FALSE)) {
message("caching for ", bcdc_cache_timeout(),
" seconds at ", bcdc_cache_path())
}

ret

}

#' Retrieve Default Cache Timeout
#'
#' Retrieves the length of time that a cache of [collect()]ed
#' web resources is kept. Default is 1 hour (3600 seconds).
#'
#' @export
bcdc_cache_timeout <- function() {
getOption("bcdata.cache_timeout", 3600)
}

#' Retrieve Default Cache Path
#'
#' Retrieves the default path used to cache the result of web requests. Uses
#' the \code{rappdirs} package to locate the cache folder
#' defined by each operating system.
#'
#' @export
bcdc_cache_path <- function() {
getOption("bcdata.cache_path", rappdirs::user_cache_dir("bcdata"))
}

#' Force collection of Web Service request from B.C. Data
#' Catalogue
#'
#' After tuning a query, `collect()` is used to actually
#' bring the data into memory. This will retrieve an sf
#' object into R. The `as_tibble()` function can be used
#' interchangeably with `collect` which matches `dbplyr`
#' behaviour.
#'
#' The result of `collect()`-ing a query will be cached to
#' avoid repeatedly requesting the same data from the
#' server. The duration of the caching can be customized
#' by setting the option `bcdata.cache_timeout` to a
#' different value (in seconds). The default is one hour
#' (3600 seconds).
#'
#' The cache can be cleared by running [bcdc_forget()].
#' Note that this clears the cache for every `collect()`
#' call made within the time frame set by the
#' `bcdata.cache_timeout` option.
#'
#' @param x object of class bcdc_promise
#' @import memoise
#' @inheritParams collect
#' @rdname collect-methods
#' @export
#'
#' @examples
#' \donttest{
#' try(
#' bcdc_query_geodata("bc-airports") %>%
#' collect()
#' )
#'
#' try(
#' bcdc_query_geodata("bc-airports") %>%
#' as_tibble()
#' )
#' }
#'
collect.bcdc_promise <- memoise(
collect_bcdc_promise_,
~ timeout(bcdc_cache_timeout()), # default: 3600 seconds (1 hour)
cache = cache_filesystem(bcdc_cache_path())
)

#' Forget (clear) the cache of objects returned by
#' [collect()]
#'
#' @return `TRUE` if the cache existed previously and was
#' successfully cleared, otherwise `FALSE`.
#' @export
bcdc_forget <- function() {
memoise::forget(collect.bcdc_promise)
}

#' @inheritParams collect.bcdc_promise
#' @rdname collect-methods
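
For reviewers less familiar with memoise, the wrapping used for `collect.bcdc_promise` can be sketched in isolation. The example below assumes the memoise package is installed and uses a toy function `slow_fetch()` and a temporary cache directory. Both are hypothetical stand-ins for `collect_bcdc_promise_()` and `bcdc_cache_path()`.

```r
library(memoise)

# Toy stand-in for an expensive web request; counts how often it really runs.
calls <- 0L
slow_fetch <- function(x) {
  calls <<- calls + 1L
  x * 2
}

# Temporary directory in place of the package's cache path.
cache_dir <- file.path(tempdir(), "bcdata_cache_demo")

cached_fetch <- memoise(
  slow_fetch,
  ~ memoise::timeout(3600),              # invalidate roughly hourly
  cache = cache_filesystem(cache_dir)    # persist results on disk
)

first <- cached_fetch(21)    # runs slow_fetch()
second <- cached_fetch(21)   # served from the on-disk cache
calls_after_second <- calls

cleared <- forget(cached_fetch)  # same call that bcdc_forget() wraps
third <- cached_fetch(21)        # runs slow_fetch() again
calls_after_forget <- calls
```

Because the body of the wrapped function only runs on a cache miss, side effects inside it (such as the `bcdata.cache_verbose` message in this PR) would not be repeated when a result is served from the cache.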
13 changes: 13 additions & 0 deletions man/bcdc_cache_path.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions man/bcdc_cache_timeout.Rd


17 changes: 17 additions & 0 deletions man/bcdc_forget.Rd


55 changes: 40 additions & 15 deletions man/bcdc_options.Rd


21 changes: 18 additions & 3 deletions man/collect-methods.Rd
