proofread
huizezhang-sherry committed Jun 16, 2023
1 parent fa6c80f commit 70cc662
Showing 7 changed files with 60 additions and 86 deletions.
5 changes: 0 additions & 5 deletions NEWS.md
@@ -7,11 +7,6 @@ the long cubble has class: `temporal_cubble_df` and `cubble_df`.
* refactor the matching function
* better integration with sf and tsibble

# cubble 0.2.2

* `as_cubble()` method for `sftime` objects
* new function `add_geometry_column()` to facilitate class cast with `sftime` objects (#15)

# cubble 0.2.1

* add pkg logo
30 changes: 15 additions & 15 deletions vignettes/cb1class.Rmd
@@ -3,7 +3,7 @@ title: "1. The cubble class"
output: rmarkdown::html_vignette
bibliography: '`r system.file("reference.bib", package = "cubble")`'
vignette: >
%\VignetteIndexEntry{1. class}
%\VignetteIndexEntry{class}
%\VignetteEngine{knitr::rmarkdown}
%\usepackage[utf8]{inputenc}
---
@@ -27,7 +27,7 @@ library(tsibble)
library(patchwork)
```

Spatio-temporal data comes various spatial and temporal characteristics and requires different data structures to wrangle: climate weather stations are recorded at fixed point location but with potential temporal data quality issue (missingness on the day); GPS data tracks unique point locations at different timestamps; satellite imageries captures snapshots of landscape at selected time. The type of spatio-temporal data cubble tackles are those collected at unique fixed locations while allowed for irregularity in the temporal dimension, like the weather station data. In the four layouts presented by the spacetime paper [@spacetime], cubble handles full space-time and sparse space-time layouts.
Spatio-temporal data come with various spatial and temporal characteristics, and different data may require different data structures for wrangling and analysis: climate weather stations record data at fixed point locations but may suffer from temporal data quality issues such as missing data for certain days; GPS data track unique point locations at different timestamps and can be organised as trajectories; satellite imagery captures snapshots of landscapes at selected times and is commonly structured as rasters. The spatio-temporal data that cubble addresses are those collected at unique fixed locations while allowing for irregularity in the temporal dimension, such as the weather station data. This corresponds to the full space-time and sparse space-time layouts in the spacetime paper [@spacetime]:

```{r echo = FALSE}
p1 <- tibble(
@@ -64,12 +64,12 @@ p1 | p2

# The cubble object

The cubble class is an S3 class, built on tibble, to pivot spatio-temporal data into a nested/spatial form and a long/temporal form. It has two subclasses:
The cubble class is an S3 class built on tibble that allows spatio-temporal data to be wrangled in two forms: a nested/spatial form and a long/temporal form. It consists of two subclasses:

- a nested/ spatial cubble has class `c("spatial_cubble_df", "cubble_df")`
- a long/ temporal cubble has class `c("temporal_cubble_df", "cubble_df")`
- a nested/spatial cubble is represented by the class `c("spatial_cubble_df", "cubble_df")`
- a long/temporal cubble is represented by the class `c("temporal_cubble_df", "cubble_df")`

A nested cubble arranges spatial variables in columns and nests temporal variables in a specialised `ts` column:
In a nested cubble, spatial variables are organised as columns and temporal variables are nested within a specialised `ts` column:

```{r echo = FALSE}
cb_nested <- climate_mel
@@ -80,9 +80,9 @@ cb_nested
class(cb_nested)
```

This toy dataset is a subset of a larger data `climate_aus` from Global Historical Climatology Network Daily (GHCND). The three airport stations in Melbourne are recorded with station metadata: station ID, longitude, latitude, elevation, station name, world meteorology organisation ID. The temporal variables are precipitation, maximum and minimum temperature, which can be read from the cubble header.
This toy dataset is a subset of a larger dataset, `climate_aus`, sourced from the Global Historical Climatology Network Daily (GHCND). It records three airport stations located in Melbourne, Australia, and includes spatial variables such as station ID, longitude, latitude, elevation, station name, and World Meteorological Organization ID. The dataset contains temporal variables including precipitation, maximum temperature, and minimum temperature, which can be read from the cubble header.

A long cubble expands the temporal variables into the long form and stores the spatial variables as a data attribute:
In a long cubble, the temporal variables are expanded into the long form, while the spatial variables are stored as a data attribute:

```{r echo = FALSE}
cb_long <- climate_mel %>% face_temporal()
@@ -97,31 +97,31 @@ The cubble header now shows the recorded temporal period (2020-01-01 to 2020-01-

# The cubble attributes

A cubble object inherits the attributes from tibble (and its subclasses): `class`, `row.names`, and `names`, in addition to three specialised attributes:
A cubble object inherits the attributes from tibble (and its subclasses): `class`, `row.names`, and `names`. Additionally, it has three specialised attributes:

- `key`: the spatial identifier
- `index`: the temporal identifier
- `coords`: a pair of ordered coordinates associated with the location

Readers known the `key` and `index` attributes from the `tsibble` package would already be familiar the two arguments. In cubble, the `key` attribute identifies the row in the nested cubble, and together with the `index` argument, identifies the row in the long cubble. Currently, cubble only supports one variable as the key and the accepted temporal class for index includes the base R class `Date`, `POSIXlt`, `POSIXct` and tsibble's `tsibble::yearmonth()`, `tsibble::yearweek()`, and `tsibble::yearquarter()` class.
Readers familiar with the `key` and `index` attributes from the `tsibble` package will already know the two arguments. In cubble, the `key` attribute identifies the row in the nested cubble, and when combined with the `index` argument, it identifies the row in the long cubble. Currently, cubble only supports one variable as the key, and the accepted temporal classes for the index include the base R classes `Date`, `POSIXlt`, `POSIXct` as well as tsibble's `tsibble::yearmonth()`, `tsibble::yearweek()`, and `tsibble::yearquarter()` classes.
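
As a small illustration of the accepted index classes listed above, the sketch below constructs one value of each class. It assumes the `tsibble` package is installed; the variable names are made up for this example and do not appear in the vignette.

```r
library(tsibble)

# Base R temporal classes that can serve as the index
day_index     <- as.Date("2020-01-01")
instant_index <- as.POSIXct("2020-01-01 09:00:00", tz = "UTC")

# tsibble temporal classes, built here from a Date for simplicity
month_index   <- yearmonth(day_index)
week_index    <- yearweek(day_index)
quarter_index <- yearquarter(day_index)

# Collect the leading class of each candidate index
index_classes <- vapply(
  list(day_index, instant_index, month_index, week_index, quarter_index),
  function(x) class(x)[1],
  character(1)
)
index_classes
```

A column of any of these classes could then be named in the `index` argument when creating a cubble.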

The `coords` attribute takes an ordered pair of coordinate. It can be a unprojected pair of longitude and latitude, or a projected easting and northing values. Under the hood, the `sf` package is used to calculate the bounding box, shown in the header of a nested cubble, and other spatial operations.
The `coords` attribute represents an ordered pair of coordinates. It can be either an unprojected pair of longitude and latitude or a projected pair of easting and northing values. The `sf` package is used under the hood to calculate the bounding box, displayed in the header of a nested cubble, and to perform other spatial operations.
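
To make the bounding-box idea concrete, here is a minimal sketch of how `sf` computes a bounding box from an ordered longitude/latitude pair. The table `stations_demo` and its values are hypothetical, and the calls that cubble makes internally are not shown in this vignette, so this only illustrates the `sf` side.

```r
library(sf)

# Hypothetical station table: x (longitude) comes first, then y (latitude)
stations_demo <- data.frame(
  id   = c("a", "b", "c"),
  long = c(144.83, 144.91, 145.10),
  lat  = c(-37.67, -37.73, -37.98)
)

# Build point geometries from the ordered coordinate pair (unprojected, WGS 84)
pts <- st_as_sf(stations_demo, coords = c("long", "lat"), crs = 4326)

# The bounding box holds xmin, ymin, xmax, and ymax of the points
bb <- st_bbox(pts)
bb
```

Swapping the order of `coords` would swap the x and y axes, which is why the pair is described as ordered.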

The long cubble has a special attribute `spatial` to store the spatial variables: all the variables in the nested cubble, except for the `ts` column. Below we print the attributes for `cb_nested` and `cb_long`, shown previously:
The long cubble has a special attribute called `spatial` to store the spatial variables, which include all the variables from the nested cubble except for the `ts` column. Below we print the attributes of the previously shown `cb_nested` and `cb_long` objects:

```{r}
attributes(cb_nested)
attributes(cb_long)
```

The shortcut function are available to extract components in the attributes:
The following shortcut functions are available to extract components from the attributes:

- `key_vars()`: the name of the key attribute as a string, i.e. `"id"`,
- `key_data()`: the tibble object stored in the key attribute,
- `key()`: the name of the key attribute as a symbol, in a list, i.e. `[[1]] id`,
- `key()`: the name of the key attribute as a symbol in a list, i.e. `[[1]] id`,
- `index()`: the index attribute as a symbol, i.e. `date`,
- `index_var()`: the index attribute as a string, i.e. `"date"`,
- `coords()`: a character vector of length two for the coordinate pairs, i.e. `"long" "lat"`, and
- `coords()`: a character vector of length two representing the coordinate pairs, i.e. `"long" "lat"`, and
- `spatial()`: the tibble object for the spatial variables.
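
The shortcut functions listed above might be used as in the sketch below. It assumes that `index_var()`, `coords()`, and `spatial()` are exported by cubble and that `key_vars()` is the tsibble generic with a cubble method; the calls are namespaced to avoid masking if both packages are attached. The object names on the left-hand side are chosen for this example.

```r
library(cubble)

# The nested and long forms of the toy dataset used in this vignette
cb_nested <- climate_mel
cb_long   <- face_temporal(cb_nested)

# Names of the key and index, returned as strings
key_name   <- tsibble::key_vars(cb_nested)
index_name <- cubble::index_var(cb_long)

# Names of the ordered coordinate pair
coord_names <- cubble::coords(cb_nested)

# Spatial variables carried by the long cubble, one row per station
spatial_tbl <- cubble::spatial(cb_long)

key_name
index_name
coord_names
spatial_tbl
```

According to the list above, the first three calls should give `"id"`, `"date"`, and `"long" "lat"` for this dataset.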

# Reference
29 changes: 14 additions & 15 deletions vignettes/cb2create.Rmd
@@ -18,7 +18,7 @@ knitr::opts_chunk$set(
library(cubble)
```

This article shows you how to create a cubble from data in the wild. You should have already seen examples of constructing a cubble from a tibble in the README page and here are more examples that construct a cubble from:
This article demonstrates how to create a cubble from various types of data. From the README page, you have already seen examples of constructing a cubble from a tibble. Here we provide additional examples of creating a cubble from:

- separate spatial and temporal tables
- a `tibble` object
@@ -28,64 +28,63 @@ This article shows you how to create a cubble from data in the wild. You should 

# Create from separate spatial and temporal tables

Spatio-temporal data may arrives in separate tables for analysts. For example, in climate data, analysts may initially receive station data containing geographic location information, variables recorded and their recording periods. They can then query the temporal variables using the stations of interest to obtain the corresponding temporal data. Alternatively, analyses may begin as purely spatial or temporal, and analysts may obtain additional temporal or spatial data to expand the result to spatio-temporal.
In many cases, spatio-temporal data arrive in separate tables for analysis. For example, in climate data, analysts may initially receive station data containing geographic location information, recorded variables and their recording periods. They can then query the temporal variables using the stations of interest to obtain the relevant temporal data. Alternatively, analyses may begin as purely spatial or temporal, and analysts may obtain additional temporal or spatial data to expand the result to spatio-temporal.

The function `make_cubble()` composes a cubble object from a spatial table (`spatial`) and a temporal table (`temporal`) with the three attributes `key`, `index`, and `coords` introduced in [1. The cubble class](class.html). The following code creates the nested `cubble`:
The function `make_cubble()` composes a cubble object from a spatial table (`spatial`) and a temporal table (`temporal`), along with the three attributes `key`, `index`, and `coords` introduced in [1. The cubble class](cb1class.html). The following code creates the nested `cubble`:

```{r}
make_cubble(spatial = stations, temporal = meteo,
key = id, index = date, coords = c(long, lat))
```

The `key` and `index` argument comes from the same arguments from the tsibble objects and `coords` derives from the geometry column in an sf object. Hence, the corresponding argument can be safely omitted, if the spatial data is an sf object, i.e. `stations_sf`, or the temporal data is a tsibble object, i.e. `meteo_ts`. The sf and tsibble class from the input will be carried over to the cubble object:
The `coords` argument can be safely omitted if the spatial data is an sf object (e.g. `stations_sf`). Similarly, if the temporal object is a tsibble (e.g. `meteo_ts`), you don't need to specify the `key` and `index` arguments. The class attributes from sf and tsibble will be carried over to the nested and long cubble:

```{r echo = TRUE}
(res <- make_cubble(spatial = stations_sf, temporal = meteo_ts))
class(res)
class(res$ts[[1]])
```

The vignette [3. Compatibility with tsibble and sf](cb3tsibblesf.html) gives more detail on cubble's compatibility with tsibble and sf.

# Coerce from foreign objects

## a `tibble` object
## The `tibble` objects

The dataset `climate_flat` joins the spatial data, `stations`, with the temporal data, `meteo`, to a single tibble object and it can be coerced into a cubble using:
The dataset `climate_flat` combines the spatial data, `stations`, with the temporal data, `meteo`, into a single tibble object. It can be coerced into a cubble using:

```{r}
climate_flat %>% as_cubble(key = id, index = date, coords = c(long, lat))
```

## NetCDF data
## The NetCDF data

In `R`, packages for wrangling NetCDF data include a high-level R interface: `ncdf4`, a low-level interface that calls a C-interface: `RNetCDF`, and a tidyverse implementation: `tidync`. The code below casts a NetCDF object in the ncdf4 class into a cubble object:
In `R`, there are several packages available for wrangling NetCDF data, including `ncdf4`, `RNetCDF`, and `tidync`. The code below converts a NetCDF object of class ncdf4 into a cubble object:

```{r echo = TRUE}
path <- system.file("ncdf/era5-pressure.nc", package = "cubble")
raw <- ncdf4::nc_open(path)
as_cubble(raw)
```

Sometimes, one may want to read in a subset of the NetCDF data and the argument `vars`, `long_range` and `lat_range` can be used to subset on the variable and the grid resolution:
Sometimes, analysts may choose to read only a subset of the NetCDF data. In such cases, the `vars`, `long_range` and `lat_range` arguments can be used to subset the data based on the variable and the grid resolution:

```{r echo = TRUE, eval = FALSE}
as_cubble(raw, vars = "q",
long_range = seq(-180, 180, 1), lat_range = seq(-90, 90, 1))
```

We would recommend reducing to about 300 $\times$ 300 grid points for three daily variables in one year. A 300 by 300 spatial grid can be a bounding box of [100, -80, 180, 0] at 0.25 degree resolution or a global bounding box [-180, -90, 180, -90] at 1 degree resolution.


## An `stars` object
## The `stars` objects

```{r}
tif <- system.file("tif/L7_ETMs.tif", package = "stars")
x <- stars::read_stars(tif)
x %>% as_cubble()
```

When the `dimensions` object is too complex for `cubble` to handle, the package will emit an message.
When the `dimensions` object is too complex for the `cubble` package to handle, a warning message will be generated.

## An `sftime` object
## The `sftime` objects

```{r}
dt <- climate_flat %>%
4 changes: 2 additions & 2 deletions vignettes/cb3tsibblesf.Rmd
@@ -22,9 +22,9 @@ library(sf)
```


Analysts often have their own preferred spatial or temporal data structure, which they may wish to keep for spatio-temporal analysis. For example, the `tbl_ts` class from the tsibble package [@tsibble] is commonly used in time series forecasting and similarly, the sf class [@sf] is often used in spatial data science. In cubble, analysts can combine these two structures together by allowing the spatial component to also be an sf object and the temporal component to also be a tsibble object.
Analysts often have a preferred spatial or temporal data structure that they wish to keep for spatio-temporal analysis. For example, the `tbl_ts` class from the tsibble package [@tsibble] is commonly used in time series forecasting and the sf class [@sf] is frequently used in spatial data science. In cubble, analysts can combine these two structures: the spatial component can also be an sf object and the temporal component can also be a tsibble object.

# A temporal component with tsibble
# Using a tsibble for the temporal component

The `key` and `index` arguments in a cubble object correspond to their tsibble counterparts and can be safely omitted if the temporal component is a tsibble object, e.g. `meteo_ts` in the example below. The tsibble class from the input will be carried over to the cubble object:
