diff --git a/NEWS.md b/NEWS.md index 0a560c0e..44901316 100644 --- a/NEWS.md +++ b/NEWS.md @@ -7,11 +7,6 @@ the long cubble has class: `temporal_cubble_df` and `cubble_df`. * refactor the matching function * better integration with sf and tsibble -# cubble 0.2.2 - -* `as_cubble()` method for `sftime` objects -* new function `add_geometry_column()` to facilitate class cast with `sftime` objects (#15) - # cubble 0.2.1 * add pkg logo diff --git a/vignettes/cb1class.Rmd b/vignettes/cb1class.Rmd index 65367175..a8148927 100644 --- a/vignettes/cb1class.Rmd +++ b/vignettes/cb1class.Rmd @@ -3,7 +3,7 @@ title: "1. The cubble class" output: rmarkdown::html_vignette bibliography: '`r system.file("reference.bib", package = "cubble")`' vignette: > - %\VignetteIndexEntry{1. class} + %\VignetteIndexEntry{class} %\VignetteEngine{knitr::rmarkdown} %\usepackage[utf8]{inputenc} --- @@ -27,7 +27,7 @@ library(tsibble) library(patchwork) ``` -Spatio-temporal data comes various spatial and temporal characteristics and requires different data structures to wrangle: climate weather stations are recorded at fixed point location but with potential temporal data quality issue (missingness on the day); GPS data tracks unique point locations at different timestamps; satellite imageries captures snapshots of landscape at selected time. The type of spatio-temporal data cubble tackles are those collected at unique fixed locations while allowed for irregularity in the temporal dimension, like the weather station data. In the four layouts presented by the spacetime paper [@spacetime], cubble handles full space-time and sparse space-time layouts. +The term spatio-temporal data can cover various spatial and temporal characteristics, and different data may require different data structures for wrangling and analysis: climate weather stations record data at fixed point locations but may suffer from temporal data quality issues such as missing data for certain days. 
GPS data tracks unique point locations at different timestamps and can be organised as trajectories; satellite imagery captures snapshots of landscapes at selected times and is commonly structured as rasters. The spatio-temporal data that cubble addresses are those collected at unique fixed locations, allowing for irregularity in the temporal dimension, such as the weather station data. This corresponds to the full space-time and sparse space-time layouts in the spacetime paper [@spacetime]: ```{r echo = FALSE} p1 <- tibble( @@ -64,12 +64,12 @@ p1 | p2 # The cubble object -The cubble class is an S3 class, built on tibble, to pivot spatio-temporal data into a nested/spatial form and a long/temporal form. It has two subclasses: +The cubble class is an S3 class built on tibble that allows the spatio-temporal data to be wrangled in two forms: a nested/spatial form and a long/temporal form. It consists of two subclasses: - - a nested/ spatial cubble has class `c("spatial_cubble_df", "cubble_df")` - - a long/ temporal cubble has class `c("temporal_cubble_df", "cubble_df")` + - a nested/ spatial cubble is represented by the class `c("spatial_cubble_df", "cubble_df")` + - a long/ temporal cubble is represented by the class `c("temporal_cubble_df", "cubble_df")` -A nested cubble arranges spatial variables in columns and nests temporal variables in a specialised `ts` column: +In a nested cubble, spatial variables are organised as columns and temporal variables are nested within a specialised `ts` column: ```{r echo = FALSE} cb_nested <- climate_mel @@ -80,9 +80,9 @@ cb_nested class(cb_nested) ``` -This toy dataset is a subset of a larger data `climate_aus` from Global Historical Climatology Network Daily (GHCND). The three airport stations in Melbourne are recorded with station metadata: station ID, longitude, latitude, elevation, station name, world meteorology organisation ID. 
The temporal variables are precipitation, maximum and minimum temperature, which can be read from the cubble header. +This toy dataset is a subset of a larger data `climate_aus` sourced from the Global Historical Climatology Network Daily (GHCND). It records three airport stations located in Melbourne, Australia and includes spatial variables such as station ID, longitude, latitude, elevation, station name, World Meteorology Organisation ID. The dataset contains temporal variables including precipitation, maximum and minimum temperature, which can be read from the cubble header. -A long cubble expands the temporal variables into the long form and stores the spatial variables as a data attribute: +In a long cubble, the temporal variables are expanded into the long form, while the spatial variables are stored as a data attribute: ```{r echo = FALSE} cb_long <- climate_mel %>% face_temporal() @@ -97,31 +97,31 @@ The cubble header now shows the recorded temporal period (2020-01-01 to 2020-01- # The cubble attributes -A cubble object inherits the attributes from tibble (and its subclasses): `class`, `row.names`, and `names`, in addition to three specialised attributes: +A cubble object inherits the attributes from tibble (and its subclasses): `class`, `row.names`, and `names`. Additionally, it has three specialised attributes: - `key`: the spatial identifier - `index`: the temporal identifier - `coords`: a pair of ordered coordinates associated with the location -Readers known the `key` and `index` attributes from the `tsibble` package would already be familiar the two arguments. In cubble, the `key` attribute identifies the row in the nested cubble, and together with the `index` argument, identifies the row in the long cubble. 
Currently, cubble only supports one variable as the key and the accepted temporal class for index includes the base R class `Date`, `POSIXlt`, `POSIXct` and tsibble's `tsibble::yearmonth()`, `tsibble::yearweek()`, and `tsibble::yearquarter()` class. +Readers familiar with the `key` and `index` attributes from the `tsibble` package will already know the two arguments. In cubble, the `key` attribute identifies the row in the nested cubble, and when combined with the `index` argument, it identifies the row in the long cubble. Currently, cubble only supports one variable as the key, and the accepted temporal classes for the index include the base R classes `Date`, `POSIXlt`, `POSIXct` as well as tsibble's `tsibble::yearmonth()`, `tsibble::yearweek()`, and `tsibble::yearquarter()` classes. -The `coords` attribute takes an ordered pair of coordinate. It can be a unprojected pair of longitude and latitude, or a projected easting and northing values. Under the hood, the `sf` package is used to calculate the bounding box, shown in the header of a nested cubble, and other spatial operations. +The `coords` attribute represents an ordered pair of coordinates. It can be either an unprojected pair of longitude and latitude or a projected easting and northing value. The `sf` package is used under the hood to calculate the bounding box, displayed in the header of a nested cubble, and perform other spatial operations. -The long cubble has a special attribute `spatial` to store the spatial variables: all the variables in the nested cubble, except for the `ts` column. Below we print the attributes for `cb_nested` and `cb_long`, shown previously: +The long cubble has a special attribute called `spatial` to store the spatial variables, which includes all the variables from the nested cubble except for the `ts` column. 
Below we print the attributes information for the previously shown `cb_nested` and `cb_long` objects: ```{r} attributes(cb_nested) attributes(cb_long) ``` -The shortcut function are available to extract components in the attributes: +The following shortcut functions are available to extract components from the attributes: - `key_vars()`: the name of the key attribute as a string , i.e. `"id"`, - `key_data()`: the tibble object stored in the key attribute, - - `key()`: the name of the key attribute as a symbol, in a list, i.e. `[[1]] id`, + - `key()`: the name of the key attribute as a symbol in a list, i.e. `[[1]] id`, - `index()`: the index attribute as a symbol, i.e. `date`, - `index_var()`: the index attribute as a string, i.e. `"date"`, - - `coords()`: a character vector of length two for the coordinate pairs, i.e. `"long" "lat"`, and + - `coords()`: a character vector of length two representing the coordinate pairs, i.e. `"long" "lat"`, and - `spatial()`: the tibble object for the spatial variables. # Reference diff --git a/vignettes/cb2create.Rmd b/vignettes/cb2create.Rmd index 50f5131c..b65ab77d 100644 --- a/vignettes/cb2create.Rmd +++ b/vignettes/cb2create.Rmd @@ -18,7 +18,7 @@ knitr::opts_chunk$set( library(cubble) ``` -This article shows you how to create a cubble from data in the wild. You should have already seen examples of constructing a cubble from a tibble in the README page and here are more examples that construct a cubble from: +This article demonstrates how to create a cubble from various types of data. From the README page, you have already seen examples of constructing a cubble from a tibble. Here we provide additional examples of creating a cubble from: - separate spatial and temporal tables - a `tibble` object @@ -28,16 +28,16 @@ This article shows you how to create a cubble from data in the wild. You should # Create from separate spatial and temporal tables -Spatio-temporal data may arrives in separate tables for analysts. 
For example, in climate data, analysts may initially receive station data containing geographic location information, variables recorded and their recording periods. They can then query the temporal variables using the stations of interest to obtain the corresponding temporal data. Alternatively, analyses may begin as purely spatial or temporal, and analysts may obtain additional temporal or spatial data to expand the result to spatio-temporal. +In many cases, spatio-temporal data arrive in separate tables for analysis. For example, in climate data, analysts may initially receive station data containing geographic location information, recorded variables and their recording periods. They can then query the temporal variables using the stations of interest to obtain the relevant temporal data. Alternatively, analyses may begin as purely spatial or temporal, and analysts may obtain additional temporal or spatial data to expand the result to spatio-temporal. -The function `make_cubble()` composes a cubble object from a spatial table (`spatial`) and a temporal table (`temporal`) with the three attributes `key`, `index`, and `coords` introduced in [1. The cubble class](class.html). The following code creates the nested `cubble`: +The function `make_cubble()` composes a cubble object from a spatial table (`spatial`) and a temporal table (`temporal`), along with the three attributes `key`, `index`, and `coords` introduced in [1. The cubble class](cb1class.html). The following code creates the nested `cubble`: ```{r} make_cubble(spatial = stations, temporal = meteo, key = id, index = date, coords = c(long, lat)) ``` -The `key` and `index` argument comes from the same arguments from the tsibble objects and `coords` derives from the geometry column in an sf object. Hence, the corresponding argument can be safely omitted, if the spatial data is an sf object, i.e. `stations_sf`, or the temporal data is a tsibble object, i.e. `meteo_ts`. 
The sf and tsibble class from the input will be carried over to the cubble object: +The `coords` argument can be safely omitted if the spatial data is an sf object (e.g. `stations_sf`). Similarly, if the temporal object is a tsibble (e.g. `meteo_ts`), you don't need to specify the `key` and `index` arguments. The class attributes from sf and tsibble will be carried over to the nested and long cubble: ```{r echo = TRUE} (res <- make_cubble(spatial = stations_sf, temporal = meteo_ts)) @@ -45,19 +45,21 @@ class(res) class(res$ts[[1]]) ``` +The vignette [3. Compatibility with tsibble and sf](cb3tsibblesf.html) will introduce more on cubble's compatibility with tsibble and sf. + # Coerce from foreign objects -## a `tibble` object +## The `tibble` objects -The dataset `climate_flat` joins the spatial data, `stations`, with the temporal data, `meteo`, to a single tibble object and it can be coerced into a cubble using: +The dataset `climate_flat` combines the spatial data, `stations`, with the temporal data, `meteo`, into a single tibble object. It can be coerced into a cubble using: ```{r} climate_flat %>% as_cubble(key = id, index = date, coords = c(long, lat)) ``` -## NetCDF data +## The NetCDF data -In `R`, packages for wrangling NetCDF data include a high-level R interface: `ncdf4`, a low-level interface that calls a C-interface: `RNetCDF`, and a tidyverse implementation: `tidync`. The code below casts a NetCDF object in the ncdf4 class into a cubble object: +In `R`, there are several packages available for wrangling NetCDF data, including `ncdf4`, `RNetCDF`, and `tidync`. 
The code below converts a NetCDF object of class ncdf4 into a cubble object: ```{r echo = TRUE} path <- system.file("ncdf/era5-pressure.nc", package = "cubble") @@ -65,17 +67,14 @@ raw <- ncdf4::nc_open(path) as_cubble(raw) ``` -Sometimes, one may want to read in a subset of the NetCDF data and the argument `vars`, `long_range` and `lat_range` can be used to subset on the variable and the grid resolution: +Sometimes, analysts may choose to read only a subset of the NetCDF data. In such cases, the `vars`, `long_range` and `lat_range` arguments can be used to subset the data based on the variable and the grid resolution: ```{r echo = TRUE, eval = FALSE} as_cubble(raw, vars = "q", long_range = seq(-180, 180, 1), lat_range = seq(-90, 90, 1)) ``` -We would recommend reducing to about 300 $\times$ 300 grid points for three daily variables in one year. A 300 by 300 spatial grid can be a bounding box of [100, -80, 180, 0] at 0.25 degree resolution or a global bounding box [-180, -90, 180, -90] at 1 degree resolution. - - -## An `stars` object +## The `stars` objects ```{r} tif <- system.file("tif/L7_ETMs.tif", package = "stars") @@ -83,9 +82,9 @@ x <- stars::read_stars(tif) x %>% as_cubble() ``` -When the `dimensions` object is too complex for `cubble` to handle, the package will emit an message. +When the `dimensions` object is too complex for the `cubble` package to handle, a warning message will be generated. -## An `sftime` object +## The `sftime` objects ```{r} dt <- climate_flat %>% diff --git a/vignettes/cb3tsibblesf.Rmd b/vignettes/cb3tsibblesf.Rmd index 6e410e64..c8ad2254 100644 --- a/vignettes/cb3tsibblesf.Rmd +++ b/vignettes/cb3tsibblesf.Rmd @@ -22,9 +22,9 @@ library(sf) ``` -Analysts often have their own preferred spatial or temporal data structure, which they may wish to keep for spatio-temporal analysis. 
For example, the `tbl_ts` class from the tsibble package [@tsibble] is commonly used in time series forecasting and similarly, the sf class [@sf] is often used in spatial data science. In cubble, analysts can combine these two structures together by allowing the spatial component to also be an sf object and the temporal component to also be a tsibble object. +Analysts often have a preferred spatial or temporal data structure that they wish to keep for spatio-temporal analysis. For example, the `tbl_ts` class from the tsibble package [@tsibble] is commonly used in time series forecasting and the sf class [@sf] is frequently used in spatial data science. In cubble, analysts have the flexibility to combine these two structures together by allowing the spatial component to also be an sf object and the temporal component to also be a tsibble object. -# A temporal component with tsibble +# Using a tsibble for the temporal component The `key` and `index` arguments in a cubble object corresponds to the tsibble counterparts and they can be safely omitted, if the temporal component is a tsibble object, i.e. `meteo_ts` in the example below. The tsibble class from the input will be carried over to the cubble object: diff --git a/vignettes/cb4glyph.Rmd b/vignettes/cb4glyph.Rmd index 1fdab3d3..aefb8369 100644 --- a/vignettes/cb4glyph.Rmd +++ b/vignettes/cb4glyph.Rmd @@ -24,37 +24,20 @@ library(cubble) library(ggplot2) ``` -Sometimes, we wish to communicate spatial and temporal information collectively through visualisation. This can be done through making faceted maps across time, creating map animation, or in interactive graphics, constructing linking between maps and time series plot. +Sometimes, we wish to communicate spatial and temporal information collectively through visualisation. 
This can be achieved through several graphical displays: one can make faceted maps across time, create map animations, or construct interactive graphics that link maps and time series plots. While interactive graphics will be the main focus of vignette [6. Interactive graphics](cb6interactive.html), this vignette will introduce a specific type of spatio-temporal plot called glyph maps. -This vignette will introduce a type of spatio-temporal plot, glyph map, which displays spatial and temporal information in a single plot using linear algebra. +# Understanding glyph maps -# What is a glyph map? - -Glyph maps are initially proposed in @Wickham2012-yr and the idea is to transform the temporal coordinates into the spatial coordinates so that time series plot can be displayed on the map. The following diagram illustrates how the transformation works: +The concept of glyph maps was initially proposed in @Wickham2012-yr. The underlying idea is to transform the temporal coordinates into spatial coordinates so that time series plots can be displayed on the map. The diagram below illustrates how the coordinate transformation works: ```{r echo = FALSE} knitr::include_graphics("cluster-diagram/glyph-steps.png") ``` -Subplot (1) and (2) show the location of a weather station and its associated maximum temperature in 2020. In (3), the same time series is transformed into the spatial coordinates with a defined `height` and `width` using linear algebra (Equation 1 in @Wickham2012-yr). The transformed time series can then be placed on the map in (4). A polar transformation (Equation 2 in @Wickham2012-yr) is also available with wrap the time series plot into a circle and it is useful to visualise seasonality. - -The package `GGally` initially implement the glyph map. It uses `glyphs()` to calculate the axis transformation and then uses `geom_polygon()` to draw the map. 
+Subplot (1) shows the spatial location of a weather station and subplot (2) displays its associated maximum temperature as a time series in 2020. In subplot (3), the temporal coordinates are transformed into the spatial coordinates using linear algebra with a defined `height` and `width` (Equation 1 in @Wickham2012-yr), while the time series glyph remains unchanged. The transformed time series can then be plotted as a layer on the map in (4). -``` -gly <- glyphs(data, x_major = ..., x_minor = ..., y_major = ..., y_minor = ..., ...) - -# `gx`, `gy`, and `gid` are created within `glyphs()` -ggplot(gly, aes(gx, gy, group = gid)) + - geom_path() -``` - -Four variables are required to construct a glyph map: `x_major`, `y_major`, `x_minor`, and `y_minor`. The major axes are the coordinates used to create the map and in the illustration above `x_major = long, y_major = lat`. The minor axes are the x/y variable used to construct the time series plot(`x_minor = date, y_minor = tmax` in above). - - -# Glyph map in cubble - -The `cubble` package implements the glyph map in a `geom_glyph()`, which perform the linear algebra internally as data transformation before the plot rendering. This allows you to use the conventionally `aes()` syntax within `geom_glyph()` to specify the four major/minor axes: +The package `GGally` initially implemented the glyph map. It uses `glyphs()` to calculate the axis transformation and then uses `geom_polygon()` to draw the map. In cubble, a ggproto implementation `geom_glyph()` performs the linear algebra internally as a data transformation. The `geom_glyph()` requires four aesthetics: `x_major`, `y_major`, `x_minor`, and `y_minor`. 
The major axes are the outer spatial coordinates and the minor axes are the inner temporal coordinates: ``` data |> @@ -62,7 +45,7 @@ data |> geom_glyph(aes(x_major = ..., x_minor = ..., y_major = ..., y_minor = ...)) ``` -Reference line and box can be added by separate geoms (`geom_glyph_box()`, `geom_glyph_line()`) with the same aesthetics (`x_major, x_minor, y_major, y_minor`) and to avoid repetition, you may want specify them collectively in `ggplot()`: +A reference line and box can be added by separate geoms (`geom_glyph_box()`, `geom_glyph_line()`) with the same aesthetics (`x_major, x_minor, y_major, y_minor`). To avoid repetition, you may want to specify the aesthetics collectively inside `ggplot()`: ``` data |> @@ -72,7 +55,7 @@ data |> geom_glyph() ``` -If you want add additional layer to the plot, i.e. an undelying map, that does not use the four glyph map aesthetics, the argument `inherit.aes = FALSE` is handy: +If you want to add an underlying map that does not use the four glyph map aesthetics, the argument `inherit.aes = FALSE` is handy: ``` data |> @@ -112,8 +95,6 @@ df %>% labs(x = "Longitude", y = "Latitude") ``` - - ```{r eval = FALSE, echo = FALSE} # script for diagram library(tidyverse) diff --git a/vignettes/cb5match.Rmd b/vignettes/cb5match.Rmd index 5aa66395..08ecfef7 100644 --- a/vignettes/cb5match.Rmd +++ b/vignettes/cb5match.Rmd @@ -21,18 +21,18 @@ library(ggplot2) library(patchwork) ``` -One common type of task with spatio-temporal data is to match nearby sites. For example, we may want to verify the location of an old list of stations with current stations, or we may want to match the data from different data sources. This vignette introduces the spatial and temporal matching in cubble with an example on matching river level data with precipitation in Victoria, Australia. +One common task when working with spatio-temporal data is to match nearby sites. 
For example, we may want to verify the location of an old list of stations with current stations, or we may want to match the data from different data sources. In this vignette, we will introduce the spatial and temporal matching in cubble using an example on matching river level data with precipitation in Victoria, Australia. -In cubble, data can be matched spatially or temporarily with `match_spatial()` and `match_temporal()`. The function `match_spatial()` calculates the spatial distance between observations in two cubbles. Different distances are available with projected or unprojected coordinate reference system. Analysts can subset the number of matched group to output with argument `spatial_n_group` (by default 4 groups) and the number of match per group with argument `spatial_n_group` (by default 1, that is, one-to-one matching). The syntax to use `match_spatial()` is +In cubble, spatial and temporal matching are performed using the functions `match_spatial()` and `match_temporal()`. The `match_spatial()` function calculates the spatial distance between observations in two cubble objects. Various distance measures are available (check `sf::st_distance`). Analysts can specify the number of matched groups to output using the `spatial_n_group` argument (defaults to 4 groups) and the number of matches per group using the `spatial_n_group` argument (defaults to 1, one-to-one matching). The syntax to use `match_spatial()` is: ```` match_spatial(, , ...) ```` -The function `match_temporal()` calculates the time series similarity between spatially matched groups. Two identifiers needs to be specified on the variable separates the each matched group (`match_id`) and the variable separates the two sources (`data_id`). The argument `temporal_by` uses the `by` syntax from dplyr `*_join` to specify the temporal matching variable. +The function `match_temporal()` calculates the similarity between time series within spatially matched groups. 
Two identifiers are required: one for separating each matched group (`match_id`) and one for separating the two data sources (`data_id`). The argument `temporal_by` uses the `by` syntax from dplyr's `*_join` to specify the temporal variables to match. - The similarity score between two time series in the spatially matched group is calculated by a matching function, which analysts can customise. The matching function should take two time series in a list and output a single numerical score, which allows for interfacing with existing time series distance calculation implementation. By default, cubble implements a simple peak matching algorithm (`match_peak`) that counts the number of peaks in two time series that fall within a specified temporal window. The syntax to use `match_temporal()` is +The similarity score between two time series is calculated using a matching function, which can be customised by the analysts. The matching function takes two time series as a list and returns a single numerical score. This allows for flexibility in using existing time series distance calculation implementation. By default, cubble implements a simple peak matching algorithm (`match_peak`) that counts the number of peaks in two time series that fall within a specified temporal window. The syntax to use `match_temporal()` is ```` match_temporal( @@ -44,7 +44,7 @@ match_temporal( # Spatial matching -Bureau of Meteorology collects [water data](http://www.bom.gov.au/metadata/catalogue/19115/ANZCW0503900528?template=full) from river gauges and this includes variables: electrical conductivity, turbidity, water course discharge, water course level, and water temperature. In particular, water level will interactive with precipitation from the climate data since rainfall will raise the water level in the river. Here is the location of available weather station and water gauges in Victoria: +Now let's consider an example of matching water data from river gauges with precipitation. 
The [water level data](http://www.bom.gov.au/metadata/catalogue/19115/ANZCW0503900528?template=full), collected by the Bureau of Meteorology, can be compared with the precipitation since rainfall can directly affect the water level in rivers. Here is the location of available weather stations and water gauges in Victoria, Australia: ```{r echo = FALSE} river <- cubble::river %>% mutate(type = "river") %>% rename(id = station) @@ -66,13 +66,13 @@ ggplot() + scale_color_brewer(palette = "Dark2") ``` -Both `climate_vic` and `river` are cubble objects and we can get a summary of the 10 closest pairs: +Both `climate_vic` and `river` are cubble objects, and we can obtain a summary of the 10 closest pairs between them: ```{r} (res_sp <- match_spatial(climate_vic, river, spatial_n_group = 10)) ``` -The result can also be returned as cubble objects with argument `return_cubble = TRUE`. The output is a list where each element is a paired cubble object and you may consider combining all the results into a single cubble with `bind_rows()`. Care needs to be taken on in the case when a site is close to two stations since by construction, cubble require unique rows in the nested form. From the summary table above, river station `226027` is matched to more than one weather station: `ASN00085072` (group 3) and `ASN00085298` (group 5). (Similarly river station `230200` is matched in group 7 and 8). One can either deselect one pair before binding the results, or take the list and work with the `purrr::map` syntax: +The result can also be returned as cubble objects by setting the argument `return_cubble = TRUE`. The output is a list where each element is a paired cubble object. To combine all the results into a single cubble, you can use `bind_rows()`. In the case when a site in the second cubble (the `river` data here) is matched to two stations in the first cubble (`climate_vic` here), the binding may not be successful since cubble requires unique rows in the nested form. 
In the summary table above, the river station `226027` is matched to more than one weather station: `ASN00085072` (group 3) and `ASN00085298` (group 5). Similarly, the river station `230200` is matched in groups 7 and 8. In such cases, you can either deselect one pair before combining, or work with the list output with the `purrr::map` syntax: ```{r} res_sp <- match_spatial(climate_vic, river, spatial_n_group = 10, return_cubble = TRUE) @@ -84,7 +84,7 @@ res_sp[[1]] # Temporal matching -For temporal matching, the variable water level (`Water_course_level`) from the river data will be matched to precipitation (`prcp`) in the weather station data. The variable identifying each matched group is `group` and the variable identifying the two datasets is `type`: +For temporal matching, we match the variable `Water_course_level` from the river data to `prcp` in the weather station data. The variables `group` and `type` identify the matched group and the two datasets, respectively: ```{r} (res_tm <- res_sp %>% @@ -93,7 +93,7 @@ For temporal matching, the variable water level (`Water_course_level`) from the temporal_by = c("prcp" = "Water_course_level"))) ``` -Similarly, the cubble output can be returned with the argument `return_cubble = TRUE`. Here we select the four pairs with the highest number of matching peaks: +Similarly, the cubble output can be returned using the argument `return_cubble = TRUE`. Here we select the four pairs with the highest number of matching peaks: ```{r} res_tm <- res_sp %>% diff --git a/vignettes/cb6interactive.Rmd b/vignettes/cb6interactive.Rmd index b47430f3..0fdce128 100644 --- a/vignettes/cb6interactive.Rmd +++ b/vignettes/cb6interactive.Rmd @@ -24,7 +24,7 @@ library(crosstalk) library(plotly) ``` -Interactive graphics can be useful because they make it possible to look at the data in multiple of ways on-the-fly. This is especially important for spatio-temporal data, where we would like to interactively connect spatial and temporal displays. 
This vignette will show you how to make an interactive graphic with a cubble object. We will be using `crosstalk::bscols()` to create a linked interactive plot of an Australia map, made with leaflet, and a ggplot-turned plotly time series plot: +Interactive graphics can be useful when working with spatio-temporal data since they allow for exploring the data from multiple perspectives. In this vignette, we will demonstrate how to create an interactive graphic with cubble objects. We will be using `crosstalk::bscols()` to create a linked interactive plot of an Australia map, created with leaflet, and a ggplot-turned plotly time series plot: ```{r echo = FALSE} knitr::include_graphics("cluster-diagram/interactive.png") @@ -33,14 +33,13 @@ knitr::include_graphics("cluster-diagram/interactive.png") This vignette assumes you have gone through [Get started](cubble.html) and are familiar with basic data wrangling in cubble with `face_temporal()` and `face_spatial()`. -# Variation of the diurnal temperature range in Australia +# Variation of diurnal temperature range in Australia -Australia occupies a and different temperature patterns can be observed. Given the maximum and minimum temperature in the climate data in `climate_subset`, we can compute the average maximum and minimum temperature by month at each location. The difference between the maximum and minimum temperature, the diurnal temperature range, has different variations across the year and its variance will be used as the color for our plot. The codes below compute these variables: +Australia has diverse climate conditions with different temperature patterns across its regions. We can compute the average maximum and minimum temperature by month at 30 locations sampled from the dataset `climate_aus`. The diurnal temperature range, the difference between the maximum and minimum temperature, has different variations throughout the year. 
We will use its variance to color the plot. The code below calculates these variables: ```{r} set.seed(123) -climate_smaller <- climate_aus |> head(n = 30) %>% - filter(!id %in% c("ASN00003030", "ASN00004028", "ASN00006044")) +climate_smaller <- climate_aus |> head(n = 30) (clean <- climate_smaller |> face_temporal() |> group_by(month = lubridate::month(date, label = TRUE, abbr = TRUE)) |> @@ -58,7 +57,7 @@ climate_smaller <- climate_aus |> head(n = 30) %>% # Linking with crosstalk -Crosstalk accepts linking between multiple data objects in the same group. Here we create two SharedData objects (one using the nested form and another using the long form), with `id` as the key and give them the same group name (`group = "cubble"`): +We create two SharedData objects in crosstalk: one using the nested cubble and another using the long cubble. We use `id` as the key and give them the same group name (`group = "cubble"`): ```{r} nested <- clean %>% SharedData$new(~id, group = "cubble") @@ -71,7 +70,7 @@ long <- clean |> # Create maps with leaflet -A basic leaflet map of stations can be created with an underlying map tile (`addTiles()`) and points to represent stations (`addCircleMarkers()`): +To create a basic leaflet map showing station locations, we can use `addTiles()` to create an underlying map and `addCircleMarkers()` to add points representing the stations: ``` leaflet(nested, width = 300, height = 300) |> @@ -79,7 +78,7 @@ leaflet(nested, width = 300, height = 300) |> addCircleMarkers() ``` -Applying color to the stations requires mapping the variable in the data to the color palette. Here the numerical variable `temp_diff_var` is mapped onto a sequential color palette, Rocket, with some fine-tuning using `colorNumeric()`. 
A popup of station names can be added with the `popup` argument in `addCircleMarkers()` and a `~` is needed when specifying variable name in leaflet syntax: +To apply colors to the stations, we need to map a variable in the data to a color palette. In this example, we map the numerical variable `temp_diff_var` to a sequential color palette, Rocket, with some color fine-tuning using `colorNumeric()`. We also add a popup to display the station names using the `popup` argument in `addCircleMarkers()`: ```{r} domain <- clean$temp_diff_var @@ -95,9 +94,9 @@ map <- leaflet(nested, width = 300, height = 300) |> ``` -# Time series plot with plotly +# Creating a time series plot with plotly -The time series plot can show the temperature band of each station, allowing for visualising the diurnal temperature range by month. We use `geom_ribbon()` to create a temperature band that shows both the maximum and minimum temperature and add `geom_points()` to allow selection on the plot: +The time series plot allows us to visualize the temperature band of each station, providing insights into the diurnal temperature range by month. We can use `geom_ribbon()` to create a temperature band that displays both the maximum and minimum temperature: ```{r} ts_static <- long %>% @@ -121,7 +120,7 @@ ts_static <- long %>% ) ``` -The static ggplot object can be turned into a plotly object with `ggplotly()` and `plotly::highlight()` enable the selection with box or lasso (`on = "plotly_selected"`): +The static ggplot object can be converted into a plotly object using `ggplotly()`, and `plotly::highlight()` enables box or lasso selection (`on = "plotly_selected"`): ```{r} ts_interactive <- ggplotly(ts_static, width = 600, height = 300) %>% @@ -131,7 +130,7 @@ ts_interactive <- ggplotly(ts_static, width = 600, height = 300) %>% # Assemble into a linked plot -`crosstalk::bscols()` can be thought of as the `patchwork` for interactive graphics, which arranges multiple interactive graphics in columns. 
+`crosstalk::bscols()` combines multiple interactive graphics in columns: ```{r eval = FALSE} bscols(map, ts_interactive, widths = c(4, 6)) @@ -145,13 +144,13 @@ knitr::include_graphics("cluster-diagram/interactive-full.png") # Making selection to see the linking -The selection built in the linked plot goes in both directions. In the screenshot below, a lasso selection is made on the time series and this links to cygnet bay on the northwest coastline of Australia. The area has a larger temperature range in July than in the summer period (December - February). +The selection in the linked plot works in both directions. In the screenshot below, a lasso selection is made on the time series, linking to Cygnet Bay on the northwest coastline of Australia. In July, this area shows a larger temperature range than in the summer period (December - February). ```{r echo = FALSE, out.width="150%"} knitr::include_graphics("cluster-diagram/selection1.png") ``` -Selection on the leaflet map is made through the selection tool below the zoom-in/out bottom on the map. Two selections are made on northern Australia and inland Queensland. Northern Australia has a narrow temperature range constantly 20 degrees throughout the year, while the temperature range in inland Queensland is much larger and there is a clear difference between the summer and winter periods. +Selection on the leaflet map can be made using the selection tool below the zoom-in/out button on the map. In the screenshot, two selections are made: one in northern Australia and the other in inland Queensland. Northern Australia has a narrow temperature range, constantly 20 degrees throughout the year, while inland Queensland has a much larger temperature range with a noticeable difference between the summer and winter periods. ```{r echo = FALSE, out.width="150%"} knitr::include_graphics("cluster-diagram/selection2.png")