We welcome all contributors to the sits package! Please submit questions, bug reports, and requests in the issues tracker. If you plan to contribute code, go ahead! Fork the repo and submit a pull request. A few notes:
- This package is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
- If you have large changes, please open an issue first to discuss.
- We will include contributors as authors in the DESCRIPTION file (with their permission) for contributions that go beyond small typos in code or documentation.
- This package generally uses the rOpenSci packaging guidelines for style and structure.
- Documentation is generated by roxygen2. Please write documentation in code files and let it auto-generate documentation files.
- For more substantial contributions, consider adding a new section to one of the chapters of the SITS book (https://e-sensing.github.io/sitsbook/), which has been written in R markdown and whose source is available in the sitsbook repository.
- We aim for testing that has high coverage and is robust. Include tests with any major contribution to code.
- We particularly welcome additions in two areas: new STAC-based image repositories and new raster machine learning/deep learning algorithms. Please see more details below.

New functions that build on the `sits` API should follow the general principles below.

- The target audience for `sits` is the community of remote sensing experts with an Earth Sciences background who want to use state-of-the-art data analysis methods with minimal investment in programming skills. The design of the `sits` API follows the typical workflow for land classification using satellite image time series and thus provides a clear and direct set of functions that are easy to learn and master.
- For this reason, we welcome contributors who provide useful additions to the existing API, such as new ML/DL classification algorithms. If you propose a new API function, please raise an issue stating your rationale before making a pull request.
- Most functions in `sits` use the S3 programming model, with a strong emphasis on generic methods which are specialized depending on the input data type. See, for example, the implementation of the `sits_bands()` function.
- Please do not include contributed code using the S4 programming model. Doing so would break the structure and the logic of existing code. Convert your code from S4 to S3.
- Use generic functions as much as possible, as they improve modularity and maintenance. If your code has decision points using `if-else` clauses, such as `if A, do X; else do Y`, consider using generic functions.
- Functions that use the `torch` package use the R6 model to be compatible with that package. See, for example, the code in `sits_tempcnn.R` and `api_torch.R`. Converting `pyTorch` code to R and including it in `sits` is straightforward. Please see the Technical Annex of the sits on-line book.

The `sits` code relies on the packages of the `tidyverse` to work with tables and lists. We use `dplyr` and `tidyr` for data selection and wrangling, `purrr` and `slider` for loops on lists and tables, and `lubridate` to handle dates and times.

- The `sits` package is built on top of three data types: time series tibbles, data cubes and models. Most `sits` functions have one or more of these types as inputs and one of them as return values.
- The time series tibble contains data and metadata. The first six columns contain the metadata: spatial and temporal information, the label assigned to the sample, and the data cube from which the data has been extracted. The `time_series` column contains the time series data for each spatiotemporal location. All time series tibbles are objects of class `sits`.
- The `cube` data type is designed to store metadata about image files. In principle, images which are part of a data cube share the same geographical region, have the same bands, and have been regularized to fit into a pre-defined temporal interval. Data cubes in `sits` are organized by tiles. A tile is an element of a satellite's mission reference system, for example MGRS for Sentinel-2 and WRS2 for Landsat. A `cube` is a tibble where each row contains information about data covering one tile. Each row of the cube tibble contains a column named `file_info`; this column contains a list that stores a tibble.
- The `cube` data type is specialised in `raster_cube` (ARD images), `vector_cube` (ARD cube with segmentation vectors), `probs_cube` (probabilities produced by classification algorithms on raster data), `probs_vector_cube` (probabilities generated by vector classification of segments), `uncertainty_cube` (cubes with uncertainty information), and `class_cube` (labelled maps). See the code in `sits_plot.R` as an example of specialisation of `plot` to handle different classes of raster data.
- All ML/DL models in `sits` which are the result of `sits_train` belong to the `ml_model` class. In addition, models are assigned a second class, which is unique to ML models (e.g., `rfor_model`, `svm_model`) and generic for all DL `torch`-based models (`torch_model`). The class information is used for plotting models and for establishing if a model can run on GPUs.
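
To make the layout of these data types more concrete, here is a rough base R sketch of a one-row time series table and a one-row cube table. It uses plain data frames rather than tibbles so that it runs without extra packages. The column names, band name, tile identifier and file path are assumptions chosen for illustration; check the package documentation for the actual names.

```r
# One time series: observation dates plus one column per band (band name assumed)
ts <- data.frame(
    Index = as.Date(c("2020-01-01", "2020-01-17", "2020-02-02")),
    NDVI = c(0.61, 0.68, 0.72)
)

# Six metadata columns (names assumed), one row per sample
samples <- data.frame(
    longitude = -55.2,
    latitude = -11.5,
    start_date = as.Date("2020-01-01"),
    end_date = as.Date("2020-02-02"),
    label = "Forest",
    cube = "hypothetical_cube"
)
# The seventh column is a list column holding one table per sample
samples$time_series <- list(ts)
class(samples) <- c("sits", class(samples))

# A cube has one row per tile; `file_info` is a list column holding a table
file_info <- data.frame(
    band = "NDVI",
    date = as.Date("2020-01-01"),
    path = "hypothetical/path/to/image.tif"
)
cube <- data.frame(tile = "hypothetical_tile")
cube$file_info <- list(file_info)
class(cube) <- c("raster_cube", class(cube))

inherits(samples, "sits")
names(samples)
```

The class vectors are what generic functions dispatch on, which is why every specialised cube type carries its own class name.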

- The internal `sits` code has no literal values, which are all stored in the YAML configuration files `./inst/extdata/config.yml` and `./inst/extdata/config_internals.yml`. The first file contains configuration parameters that are relevant to users, related to visualisation and plotting; the second contains parameters that are relevant only for developers. These values are accessible using the `.conf` function. For example, the value of the default size for leaflet objects (64 MB) is accessed using the command `.conf("view", "leaflet_megabytes")`.
- Error messages are also stored outside of the code in the YAML configuration file `./inst/extdata/config_messages.yml`. These values are accessible using the `.conf` function. For example, the error associated with an invalid NA value for an input parameter is accessible using the function call `.conf("messages", ".check_na_parameter")`.
- Color handling in `sits` is described in the Technical Annex section "How colors work in sits". The legends and colors available by default are described in the YAML file `./inst/extdata/config_colors.yml`.
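
As an illustration of the idea behind keyed configuration access, the sketch below walks a nested list that stands in for the parsed YAML files. It is not the implementation of `.conf`: the helper `conf_lookup()`, the list structure and the message text are hypothetical; only the two key paths and the 64 MB value come from the description above.

```r
# Stand-in for the parsed YAML configuration (structure assumed)
config <- list(
    view = list(leaflet_megabytes = 64),
    messages = list(.check_na_parameter = "hypothetical message: NA value not allowed")
)

# Hypothetical helper: follows the given keys through the nested list
conf_lookup <- function(...) {
    keys <- c(...)
    value <- config
    for (key in keys) {
        # Stop early if a key is missing instead of silently returning NULL
        if (!is.list(value) || !key %in% names(value)) {
            stop("missing configuration key: ", paste(keys, collapse = " > "))
        }
        value <- value[[key]]
    }
    value
}

conf_lookup("view", "leaflet_megabytes")
conf_lookup("messages", ".check_na_parameter")
```

Keeping values and messages in configuration files means that a contribution should add new entries there rather than hard-code literals in functions.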

- If you want to include a STAC-based catalogue not yet supported by `sits`, we encourage you to look at existing implementations of catalogues such as Microsoft Planetary Computer (MPC), Digital Earth Africa (DEA) and AWS.
- STAC-based catalogues in `sits` are associated with YAML description files, which are available in the directory `./inst/extdata/sources`. For example, the YAML file `config_source_mpc.yml` describes the contents of the MPC collections supported by `sits`. Please first provide a YAML file which lists the detailed contents of the new catalogue you wish to include. Follow the examples provided.
- After writing the YAML file, you need to consider how to access and query the new catalogue. The entry point for access to all catalogues is the `sits_cube.stac_cube()` function, which in turn calls a sequence of functions which are described in the generic interface `api_source.R`. Most calls of this API are handled by the functions of `api_source_stac.R`, which provides an interface to the `rstac` package and handles STAC queries.
- Each STAC catalogue is different. The STAC specification allows providers to implement their data descriptions with specific information. For this reason, the generic API described in `api_source.R` needs to be specialized for each provider. Whenever a provider needs specific implementations of parts of the STAC protocol, we include them in separate files. For example, `api_source_mpc.R` implements specific quirks of the MPC platform. Similarly, specific support for CDSE (Copernicus Data Space Environment) is available in `api_source_cdse.R`.
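
The split between a shared STAC layer and provider-specific files can be pictured with S3 inheritance. The sketch below is only an analogy under stated assumptions: the generic `prepare_item()`, the class names `stac_source` and `mpc_source`, and the token suffix are all hypothetical and do not reproduce the functions in `api_source.R`.

```r
# Hypothetical generic: prepares a catalogue item before its assets are read
prepare_item <- function(source, item) {
    UseMethod("prepare_item", source)
}

# Behaviour shared by every STAC provider
prepare_item.stac_source <- function(source, item) {
    item$checked <- TRUE
    item
}

# Provider-specific method: runs the shared step, then applies its own quirk
prepare_item.mpc_source <- function(source, item) {
    item <- NextMethod()
    item$href <- paste0(item$href, "?token=hypothetical")
    item
}

item <- list(href = "https://example.org/image.tif")
mpc <- structure(list(), class = c("mpc_source", "stac_source"))
other <- structure(list(), class = c("hypothetical_source", "stac_source"))

prepare_item(mpc, item)$href
prepare_item(other, item)$href
```

A provider without quirks needs no method of its own and falls through to the shared behaviour, so only the differences have to be written.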

- In general terms, ML/DL algorithms in `sits` are encapsulated as closures which are the output of the `sits_train()` function. In line with the established practices in R, each closure contains a function that classifies input values, as well as information on the samples used to train the model.
- Please read the Technical Annex to the `sits` book. It describes how to include a new ML method, in this case the `lightGBM` algorithm. Follow those guidelines to include a new ML algorithm.
- If you aim to include a `torch`-based deep learning method, in addition to understanding the concepts presented in the Technical Annex, please study carefully the implementation of `sits_tempcnn()` and `sits_lighttae()`.
- Bear in mind that your only task is to provide a new function that is compatible with the requirements of ML/DL methods in `sits`. Once the function has been correctly implemented, you will be able to use it in connection with the rest of `sits`.
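
The closure pattern described above can be sketched in a few lines of base R. The example is a toy nearest-centroid classifier on a single feature, not a `sits` method: `train_centroid()` and the class name `centroid_model` are hypothetical, and the real requirements for a new method are those given in the Technical Annex.

```r
# Hypothetical training function: nearest-centroid classifier on one feature
train_centroid <- function(values, labels) {
    # Mean feature value per label, as a named numeric vector
    centroids <- vapply(split(values, labels), mean, numeric(1))

    # The closure keeps `centroids`, `values` and `labels` in its environment
    predict_fun <- function(new_values) {
        # Distance from each new value (rows) to each centroid (columns)
        dist <- abs(outer(new_values, centroids, "-"))
        names(centroids)[apply(dist, 1, which.min)]
    }

    # Tag the closure with a specific class and the general `ml_model` class
    structure(predict_fun,
              class = c("centroid_model", "ml_model", class(predict_fun)))
}

model <- train_centroid(
    values = c(0.1, 0.2, 0.8, 0.9),
    labels = c("Pasture", "Pasture", "Forest", "Forest")
)

model(c(0.15, 0.85))
inherits(model, "ml_model")
# The training samples remain reachable through the closure's environment
environment(model)$labels
```

Because the returned object is still a function, the rest of a workflow can call it directly, while the class attribute lets generic functions such as `plot` treat model types differently.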

- The roadmap for `sits` is included as part of the issues tracker. Issues created by the developers are assigned to milestones. Each milestone corresponds to an expected new version of `sits` to be released on CRAN.