STAC Common Metadata

This document outlines commonly used fields in STAC. They are often used in STAC Item properties, but can also be used in other places, e.g. an Item Asset or Collection Asset.

Various examples are available in the folder examples. JSON Schemas can be found in the folder json-schema.

Implementation of any of the fields is not required, unless explicitly required by a specification using the field. For example, datetime is required in STAC Items.

Basics

Descriptive fields to give a basic overview of a STAC entity (e.g. Catalog, Collection, Item, Asset).

| Field Name  | Type     | Description |
| ----------- | -------- | ----------- |
| title       | string   | A human readable title describing the STAC entity. |
| description | string   | Detailed multi-line description to fully explain the STAC entity. CommonMark 0.29 syntax MAY be used for rich text representation. |
| keywords    | [string] | List of keywords describing the STAC entity. |
| roles       | [string] | The semantic roles of the entity, e.g. for assets, links, providers, bands, etc. |

Date and Time

Fields to provide additional temporal information such as ranges with a start and an end datetime stamp.

| Field Name | Type         | Description |
| ---------- | ------------ | ----------- |
| datetime   | string\|null | See the Item Specification Fields for more information. |
| created    | string       | Creation date and time of the corresponding STAC entity or Asset (see below), in UTC. |
| updated    | string       | Date and time the corresponding STAC entity or Asset (see below) was updated last, in UTC. |

All timestamps MUST be formatted according to RFC 3339, section 5.6.
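
As a minimal sketch of producing such a timestamp, the following Python snippet normalizes a timezone-aware datetime to UTC and formats it as an RFC 3339 date-time string. The helper name `to_rfc3339_utc` is hypothetical, and fractional seconds are dropped for simplicity (RFC 3339 allows them).

```python
from datetime import datetime, timezone


def to_rfc3339_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 string in UTC."""
    if dt.tzinfo is None:
        # A naive datetime has no defined offset, so refuse to guess one.
        raise ValueError("naive datetime: attach a timezone first")
    # Convert to UTC and use the 'Z' suffix rather than '+00:00'.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


created = to_rfc3339_utc(datetime(2020, 12, 11, 22, 38, 32, tzinfo=timezone.utc))
print(created)  # 2020-12-11T22:38:32Z
```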

created and updated have different meanings depending on where they are used. If those fields are available in a Collection, in a Catalog (both top-level), or in an Item (in the properties), the fields refer to the metadata (e.g., when the STAC metadata was created). If those fields are used in Assets or Links, they refer to the actual data linked to (e.g., when the asset was created).

NOTE: There are more date and time related fields available in the Timestamps extension.

Date and Time Range

While a STAC entity (e.g. an Item) can have a nominal datetime describing the capture, these properties allow a STAC entity to have a range of capture dates and times. An example of this is the MODIS 16 day vegetation index product.

Important: Using one of the fields REQUIRES inclusion of the other field as well, so that users can search STAC records by the provided times. If you use start_datetime you need to add end_datetime, and vice versa. Both fields are also REQUIRED if the datetime field is set to null. The datetime property in a STAC Item and these fields are not mutually exclusive.

| Field Name     | Type   | Description |
| -------------- | ------ | ----------- |
| start_datetime | string | The first or start date and time for the resource, in UTC. It is formatted as date-time according to RFC 3339, section 5.6. |
| end_datetime   | string | The last or end date and time for the resource, in UTC. It is formatted as date-time according to RFC 3339, section 5.6. |

start_datetime and end_datetime constitute inclusive bounds, meaning that the range covers the entire time interval between the two timestamps and the timestamps themselves.
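
The pairing rule and the inclusive bounds can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the input is assumed to be the properties dictionary of an Item, and the timestamps in the example are made up.

```python
from datetime import datetime


def parse_rfc3339(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing 'Z' only from Python 3.11 on,
    # so it is replaced with an explicit UTC offset here.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_datetime_fields(properties: dict) -> list[str]:
    """Return a list of rule violations for the date and time range fields."""
    problems = []
    has_start = "start_datetime" in properties
    has_end = "end_datetime" in properties
    if has_start != has_end:
        problems.append("start_datetime and end_datetime must be used together")
    datetime_is_null = "datetime" in properties and properties["datetime"] is None
    if datetime_is_null and not (has_start and has_end):
        problems.append("datetime is null, so both range fields are required")
    return problems


def covers(properties: dict, moment: str) -> bool:
    """Check whether a timestamp falls inside the inclusive range."""
    start = parse_rfc3339(properties["start_datetime"])
    end = parse_rfc3339(properties["end_datetime"])
    return start <= parse_rfc3339(moment) <= end


# Hypothetical 16-day composite with a null nominal datetime.
props = {
    "datetime": None,
    "start_datetime": "2020-01-01T00:00:00Z",
    "end_datetime": "2020-01-16T23:59:59Z",
}
print(check_datetime_fields(props))            # []
print(covers(props, "2020-01-16T23:59:59Z"))   # True, the end bound is inclusive
```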

Licensing

Information about the license(s) of the data, which is not necessarily the same license that applies to the metadata. Licensing information should be defined at the Collection level if possible.

| Field Name | Type   | Description |
| ---------- | ------ | ----------- |
| license    | string | License(s) of the data as SPDX License identifier, SPDX License expression, or other (see below). |

license: License(s) of the data that the STAC entity provides.

The license(s) can be provided as:

  1. SPDX License identifier
  2. SPDX License expression
  3. String with the value other if the license is not on the SPDX license list. The strings various and proprietary are deprecated.

If the license is not an SPDX license identifier, links to the license texts SHOULD be added. The links MUST use the license link relation type. If there is no public license URL available, it is RECOMMENDED to supplement the STAC Item with the license text in a separate file and link to this file. If no link to a license is included and the license field is set to other (or one of the deprecated values), the data is private, and consumers have not been granted any explicit right to use it.
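
A consumer could apply these rules roughly as in the following Python sketch. The helper name `describe_license` and its return labels are hypothetical, and the sketch does not validate SPDX identifiers or expressions: any value other than other, various, or proprietary is simply assumed to be one.

```python
# Values that indicate a license outside the SPDX license list.
# "various" and "proprietary" are the deprecated values named above.
NON_SPDX_VALUES = {"other", "various", "proprietary"}


def describe_license(license_value: str, links: list) -> str:
    """Classify licensing information of a STAC entity (illustrative labels)."""
    license_links = [link for link in links if link.get("rel") == "license"]
    if license_value in NON_SPDX_VALUES:
        if not license_links:
            # No license link: no explicit right to use the data was granted.
            return "private"
        return "see-links"
    # Assumption: everything else is an SPDX identifier or expression.
    return "spdx"


print(describe_license("CC-BY-4.0", []))  # spdx
print(describe_license("other", []))      # private
print(describe_license(
    "other",
    [{"rel": "license", "href": "https://example.com/license.txt"}],
))                                        # see-links
```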

License relation

| Type    | Description |
| ------- | ----------- |
| license | The license URL(s) for the resource SHOULD be specified if the license field is not an SPDX license identifier. |

Provider

Information about the organizations capturing, producing, processing, hosting or publishing this data. Provider information should be defined at the Collection level if possible.

| Field Name | Type              | Description |
| ---------- | ----------------- | ----------- |
| providers  | [Provider Object] | A list of providers, which may include all organizations capturing or processing the data or the hosting provider. Providers should be listed in chronological order with the most recent provider being the last element of the list. |

Provider Object

The object provides information about a provider. A provider is any of the organizations that captures or processes the content of the assets and therefore influences the data offered by the STAC implementation. It may also include information about the final storage provider hosting the data.

| Field Name  | Type     | Description |
| ----------- | -------- | ----------- |
| name        | string   | REQUIRED. The name of the organization or the individual. |
| description | string   | Multi-line description to add further provider information such as processing details for processors and producers, hosting details for hosts or basic contact information. CommonMark 0.29 syntax MAY be used for rich text representation. |
| roles       | [string] | Roles of the provider. Any of licensor, producer, processor or host. |
| url         | string   | Homepage on which the provider describes the dataset and publishes contact information. |

roles

The provider's role(s) can be one or more of the following elements:

  • licensor: The organization that is licensing the dataset under the license specified in the Collection's license field.
  • producer: The producer of the data is the provider that initially captured and processed the source data, e.g. ESA for Sentinel-2 data.
  • processor: A processor is any provider who processed data to a derived product.
  • host: The host is the actual provider offering the data on their storage. There should be no more than one host, specified as the last element of the provider list.
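
The following Python sketch shows a providers list and a check of the rules described above. The function name `check_providers` is hypothetical, "Example Cloud" is a made-up host, and the host-related rules are recommendations ("should") rather than hard requirements, so the function only reports findings.

```python
ALLOWED_ROLES = {"licensor", "producer", "processor", "host"}


def check_providers(providers: list) -> list:
    """Return human-readable findings for a list of Provider Objects."""
    findings = []
    for index, provider in enumerate(providers):
        if not provider.get("name"):
            findings.append(f"providers[{index}]: name is required")
        unknown = set(provider.get("roles", [])) - ALLOWED_ROLES
        if unknown:
            findings.append(f"providers[{index}]: unknown roles {sorted(unknown)}")
    # Positions of all providers that act as host.
    hosts = [i for i, p in enumerate(providers) if "host" in p.get("roles", [])]
    if len(hosts) > 1:
        findings.append("there should be no more than one host")
    elif hosts and hosts[0] != len(providers) - 1:
        findings.append("the host should be the last element of the list")
    return findings


providers = [
    {"name": "ESA", "roles": ["producer", "licensor"]},
    {"name": "Example Cloud", "roles": ["host"]},  # hypothetical host
]
print(check_providers(providers))  # []
```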

Instrument

Adds metadata specifying a platform and instrument used in a data collection mission. These fields will often be combined with domain-specific extensions that describe the actual data, such as the eo or sar extensions.

| Field Name    | Type     | Description |
| ------------- | -------- | ----------- |
| platform      | string   | Unique name of the specific platform to which the instrument is attached. |
| instruments   | [string] | Name of instrument or sensor used (e.g., MODIS, ASTER, OLI, Canon F-1). |
| constellation | string   | Name of the constellation to which the platform belongs. |
| mission       | string   | Name of the mission for which data is collected. |
| gsd           | number   | Ground Sample Distance at the sensor, in meters (m), must be greater than 0. |

platform

The unique name of the specific platform the instrument is attached to. For satellites this would be the name of the satellite, whereas for drones this would be a unique name for the drone. Examples include landsat-8 (Landsat-8), sentinel-2a and sentinel-2b (Sentinel-2), terra and aqua (part of NASA EOS, carrying the MODIS instruments), mycorp-uav-034 (hypothetical drone name), and worldview02 (Maxar/DigitalGlobe WorldView-2).

instruments

An array of all the sensors used in the creation of the data. For example, data from the Landsat-8 platform is collected with the OLI sensor as well as the TIRS sensor, but the data is distributed together, so it would be specified as ['oli', 'tirs']. Other instrument examples include msi (Sentinel-2), aster (Terra), modis (Terra and Aqua), c-sar (Sentinel-1), and asar (Envisat).

constellation

The name of a logical collection of one or more platforms that have similar payloads and have their orbits arranged in a way to increase the temporal resolution of acquisitions of data with similar geometric and radiometric characteristics. This field allows users to search for related data sets without the need to specify which specific platform the data came from, for example, from either of the Sentinel-2 satellites. Examples include landsat-8 (Landsat-8, a constellation consisting of a single platform), sentinel-2 (Sentinel-2), rapideye (operated by Planet Labs), and modis (NASA EOS satellites Aqua and Terra). In the case of modis, this is technically referring to a pair of sensors on two different satellites, whose data is combined into a series of related products. Additionally, the Aqua satellite is technically part of the A-Train constellation and Terra is not part of a constellation, but these are combined to form the logical collection referred to as MODIS.

mission

The name of the mission or campaign for collecting data. This could be a discrete set of data collections over a period of time (such as collecting drone imagery), or could be a set of related tasks from a satellite data collection.

gsd

The nominal Ground Sample Distance for the data, as measured in meters on the ground. There are many definitions of GSD. The value of this field should be related to the spatial resolution at the sensor, rather than the pixel size of images after orthorectification, pansharpening, or scaling. The GSD of a sensor can vary depending on geometry (off-nadir / grazing angle) and wavelength, so it is at the discretion of the implementer to decide which value most accurately represents the GSD. For example, Landsat-8 optical and short-wave IR bands are all 30 meters, but the panchromatic band is 15 meters. The gsd should be 30 meters in this case because that is the nominal spatial resolution at the sensor. The Planet PlanetScope Ortho Tile Product has a gsd of 3.7 (or 4 if rounding), even though the pixel size of the images is 3.125. Similarly, one might choose for WorldView-2 the Multispectral 20° off-nadir value of 2.07 and for WorldView-3 the Multispectral 20° off-nadir value of 1.38.
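
To illustrate how these fields fit together, the Python sketch below builds a few property dictionaries from the examples mentioned in this section and shows a search by constellation as well as a check of the gsd constraint. The helper names are hypothetical and the dictionaries contain only the instrument-related fields.

```python
# Instrument-related properties, using example values from this section.
properties_list = [
    {"platform": "sentinel-2a", "constellation": "sentinel-2", "instruments": ["msi"]},
    {"platform": "sentinel-2b", "constellation": "sentinel-2", "instruments": ["msi"]},
    {
        "platform": "landsat-8",
        "constellation": "landsat-8",
        "instruments": ["oli", "tirs"],
        "gsd": 30,  # nominal resolution at the sensor, in meters
    },
]


def platforms_in_constellation(entries: list, constellation: str) -> list:
    """Find platforms without having to name each one individually."""
    return [e["platform"] for e in entries if e.get("constellation") == constellation]


def gsd_is_valid(properties: dict) -> bool:
    """gsd is optional, but if present it must be a number greater than 0."""
    if "gsd" not in properties:
        return True
    gsd = properties["gsd"]
    is_number = isinstance(gsd, (int, float)) and not isinstance(gsd, bool)
    return is_number and gsd > 0


print(platforms_in_constellation(properties_list, "sentinel-2"))
# ['sentinel-2a', 'sentinel-2b']
print(all(gsd_is_valid(p) for p in properties_list))  # True
```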

Bands

| Field Name | Type          | Description |
| ---------- | ------------- | ----------- |
| bands      | [Band Object] | An array of available bands where each object is a Band Object. |

The bands array is used to describe the available bands in a STAC entity or Asset. This field describes the general construct of a band or layer, which doesn't necessarily need to be a spectral band. By adding fields from extensions, you can indicate more specifically what a band represents (for example, a spectral band described with fields from the eo extension).

Please refer to the Bands best practices for more details.

Note

This property is the successor of the eo:bands and raster:bands fields, which have been present in previous versions of these extensions. The behavior is very similar and they can be migrated easily. Usually, you can simply merge each object on a by-index basis. Nevertheless, you should consider deduplicating properties with the same values across all bands to the asset level (see the best practices). For some fields, however, you need to add the extension prefix of the eo or raster extension to the property name. See the Band migration best practice for details.
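
The by-index merge can be sketched in Python as shown below. This is not the authoritative migration procedure: the function names are hypothetical, and the sketch assumes that only the fields defined in this document (name, description, nodata, data_type, statistics, unit) stay unprefixed while every other field receives the prefix of the extension it came from. Consult the Band migration best practice for the actual mapping.

```python
from itertools import zip_longest

# Assumption: fields defined in common metadata need no extension prefix.
COMMON_BAND_FIELDS = {"name", "description", "nodata", "data_type", "statistics", "unit"}


def with_prefix(band: dict, prefix: str) -> dict:
    """Prefix all fields that are neither common fields nor already prefixed."""
    result = {}
    for key, value in band.items():
        if key in COMMON_BAND_FIELDS or ":" in key:
            result[key] = value
        else:
            result[f"{prefix}:{key}"] = value
    return result


def merge_bands(eo_bands: list, raster_bands: list) -> list:
    """Merge eo:bands and raster:bands into a single bands array by index."""
    merged = []
    for eo_band, raster_band in zip_longest(eo_bands, raster_bands, fillvalue={}):
        band = with_prefix(raster_band, "raster")
        band.update(with_prefix(eo_band, "eo"))
        merged.append(band)
    return merged


# Illustrative input values for a single band.
eo_bands = [{"name": "B01", "common_name": "coastal", "center_wavelength": 0.443}]
raster_bands = [{"data_type": "uint16", "nodata": 0, "spatial_resolution": 60}]
print(merge_bands(eo_bands, raster_bands))
```

Deduplicating values that are identical across all bands to the asset level would be a separate step after this merge.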

Band Object

The only property defined specifically for the Band Object is name, which serves as a unique identifier. You can add additional fields from the common metadata such as a description or the value-related properties.

| Field Name  | Type   | Description |
| ----------- | ------ | ----------- |
| name        | string | The name of the band (e.g., "B01", "B8", "band2", "red"), which should be unique across all bands defined in the list of bands. This is typically the name the data provider uses for the band. |
| description | string | Description to fully explain the band. CommonMark 0.29 syntax MAY be used for rich text representation. |

A Band Object must contain at least one property, which is not necessarily one of the properties defined here and can be a property from an extension or common metadata.
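
These two constraints (at least one property per Band Object, unique names) could be checked along the following lines. The function name `check_bands` is hypothetical and the band values are made up for illustration.

```python
def check_bands(bands: list) -> list:
    """Report empty Band Objects and duplicate band names."""
    findings = []
    seen_names = set()
    for index, band in enumerate(bands):
        if len(band) == 0:
            findings.append(f"bands[{index}]: must contain at least one property")
        name = band.get("name")
        if name is not None:
            if name in seen_names:
                findings.append(f"bands[{index}]: duplicate name {name!r}")
            seen_names.add(name)
    return findings


bands = [
    {"name": "B01", "description": "Coastal aerosol"},
    {"name": "B8"},
    {"data_type": "uint16"},  # valid: name itself is not required
]
print(check_bands(bands))  # []
```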

Data Values

Adds metadata about the data values or measurement values contained in the entity that is described by the object these fields get added to (e.g., an asset or a band). These fields will often be combined with extensions that group data values into a "unit" or "chunk", e.g., a band or layer in a file (raster and eo extensions), a column in a table (table extension), or dimensions in a datacube (datacube extension).

| Field Name | Type              | Description |
| ---------- | ----------------- | ----------- |
| nodata     | number\|string    | Value used to identify no-data, see below. |
| data_type  | string            | The data type of the values, see below. |
| statistics | Statistics Object | Statistics of all the values. |
| unit       | string            | Unit of measurement of the value, see below. |

No-data

The no-data value must be provided either as:

  • a number
  • a string:
    • nan - NaN (not a number) as defined in IEEE-754
    • inf - Positive Infinity
    • -inf - Negative Infinity
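
Because JSON has no literals for NaN or infinity, a client has to map the three special strings back to floating-point values. The following Python sketch shows one way to do that; the helper names `parse_nodata` and `is_nodata` are hypothetical.

```python
import math

# The three special strings allowed for nodata and their float counterparts.
SPECIAL_NODATA = {"nan": math.nan, "inf": math.inf, "-inf": -math.inf}


def parse_nodata(value):
    """Convert a nodata field value into a Python number."""
    if isinstance(value, str):
        if value not in SPECIAL_NODATA:
            raise ValueError(f"unsupported nodata string: {value!r}")
        return SPECIAL_NODATA[value]
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("nodata must be a number or one of 'nan', 'inf', '-inf'")
    return value


def is_nodata(sample, nodata) -> bool:
    """Compare a sample against nodata; NaN never equals itself, so test it explicitly."""
    if math.isnan(nodata):
        return math.isnan(sample)
    return sample == nodata


print(parse_nodata("-inf"))                                # -inf
print(is_nodata(float("nan"), parse_nodata("nan")))        # True
print(is_nodata(42, parse_nodata(0)))                      # False
```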

Units

It is STRONGLY RECOMMENDED to provide units in one of the following two formats:

  • UCUM: The unit code that is compliant to the UCUM specification.
  • UDUNITS-2: The unit symbol if available, otherwise the singular unit name.

Statistics Object

Statistics usually specify the range of values by providing the minimum and maximum values, but can optionally be accompanied by additional statistical values. Some additional statistical measures are listed below, but the object can also be extended with other statistical measures that are not listed below. For example, it could list additional coverages such as vegetation cover, land cover, etc. If statistics are provided in the Item Properties (example), it is recommended to list the statistical measures with a JSON Schema in the Collection Summaries to better describe them (example). Please note that some statistical measures such as cloud cover have explicit fields in other extensions such as the EO extension. It is recommended to use the fields standardized in extensions in favor of providing them in the Statistics Object.

| Field Name    | Type    | Description |
| ------------- | ------- | ----------- |
| minimum       | number  | Minimum value of the values in the band. If not present, the minimum value of the given data type or negative infinity can be assumed. |
| maximum       | number  | Maximum value of the values in the band. If not present, the maximum value of the given data type or positive infinity can be assumed. |
| mean          | number  | Mean value of all the values in the band. |
| stddev        | number  | Standard deviation value of the values in the band. |
| count         | integer | Total number of all data values (>= 0). |
| valid_percent | number  | Percentage of valid (not nodata) values (0-100). |
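
As a rough sketch, a Statistics Object could be derived from a flat list of values as follows. Several choices here are assumptions rather than requirements of this document: count includes nodata values, the other statistics are computed over valid values only, and stddev is the population standard deviation. The function name `compute_statistics` is hypothetical.

```python
import math
import statistics


def compute_statistics(values: list, nodata=None) -> dict:
    """Build a Statistics Object from a flat list of numeric values."""

    def is_valid(value) -> bool:
        if nodata is None:
            return True
        if isinstance(nodata, float) and math.isnan(nodata):
            return not math.isnan(value)
        return value != nodata

    valid = [v for v in values if is_valid(v)]
    stats = {"count": len(values)}  # assumption: nodata values are counted too
    if values:
        stats["valid_percent"] = 100.0 * len(valid) / len(values)
    if valid:
        stats["minimum"] = min(valid)
        stats["maximum"] = max(valid)
        stats["mean"] = statistics.fmean(valid)
        stats["stddev"] = statistics.pstdev(valid)  # assumption: population stddev
    return stats


stats = compute_statistics([0, 2, 4, 6, 8], nodata=0)
print(stats["minimum"], stats["maximum"], stats["mean"])  # 2 8 5.0
print(stats["count"], stats["valid_percent"])             # 5 80.0
```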

Data Types

The data type gives information about the values. This can be used to indicate the (maximum) range of numerical values expected. For example, uint8 indicates that the numbers are in a range between 0 and 255 and can never be smaller or larger. This can help to pick the optimal numerical data type when reading the files to keep memory consumption low. Nevertheless, it doesn't necessarily mean that the expected values fill the whole range. For example, there can be use cases for uint8 that just use the numbers 0 to 10. Through the Statistics Object it is possible to specify an exact value range so that visualizations can be optimized. The allowed values for data_type are:

  • int8: 8-bit integer
  • int16: 16-bit integer
  • int32: 32-bit integer
  • int64: 64-bit integer
  • uint8: unsigned 8-bit integer (common for 8-bit RGB PNGs)
  • uint16: unsigned 16-bit integer
  • uint32: unsigned 32-bit integer
  • uint64: unsigned 64-bit integer
  • float16: 16-bit float
  • float32: 32-bit float
  • float64: 64-bit float
  • cint16: 16-bit complex integer
  • cint32: 32-bit complex integer
  • cfloat32: 32-bit complex float
  • cfloat64: 64-bit complex float
  • other: Other data type than the ones listed above (e.g. boolean, string, higher precision numbers)
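
For the plain integer types, the maximum value range follows directly from the bit width and signedness. The Python sketch below derives it from the data_type string, assuming two's complement representation for signed integers; the helper names are hypothetical, and floating-point, complex, and other types are deliberately not covered.

```python
INTEGER_TYPES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
}


def integer_range(data_type: str) -> tuple:
    """Return (minimum, maximum) for one of the plain integer data types."""
    if data_type not in INTEGER_TYPES:
        raise ValueError(f"not a plain integer data type: {data_type!r}")
    if data_type.startswith("uint"):
        bits = int(data_type[4:])
        return 0, 2**bits - 1
    bits = int(data_type[3:])
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(data_type: str, minimum, maximum) -> bool:
    """Check whether a value range (e.g. from statistics) fits into a data type."""
    low, high = integer_range(data_type)
    return low <= minimum and maximum <= high


print(integer_range("uint8"))   # (0, 255)
print(integer_range("int16"))   # (-32768, 32767)
print(fits("uint8", 0, 10))     # True
```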