Intertidal Classification of Europe: Categorising Reflectance of Emerged Areas of Marine vegetation using Sentinel-2 (ICE CREAMS)
Here we present the neural network model produced during the BiCOME project to identify intertidal habitats using Sentinel-2. BiCOME (Coastal ecosystems) was one of three European Space Agency funded studies forming the ESA 'Biodiversity+ Precursors', alongside EO4DIVERSITY (Terrestrial) and BIOMONDO (Freshwater).
This repository contains the scripts to train and apply the ICE CREAMS model to Sentinel-2 imagery. The training data for the model can be downloaded from DOI: 10.6084/m9.figshare.26069293; Sentinel-2 imagery can then be downloaded from any source that provides the .SAFE format. The model assumes cloud-free, low-tide (fully emerged) intertidal scenes, and has currently been validated for European sites.
High classification accuracy at Sentinel-2's spectral resolution has previously been shown for Class-level intertidal habitats (Davies et al., 2023), so data were labelled at the Class level for vegetated habitats alongside other non-vegetated habitat types. Pixels were labelled into 9 classes: Bare Sand, Bare Mud, Ulvophyceae (green macroalgae), Magnoliopsida (seagrass), Microphytobenthos (unicellular photosynthetic eukaryotes and cyanobacteria forming biofilms at the sediment surface during low tide), Mixed-Rocks with associated Phaeophyceae (brown macroalgae), Rhodophyceae (red macroalgae), Xanthophyceae (yellow-green macroalgae) and Water. Because intertidal habitats are heterogeneous in both space and time, labelled data need to align spatially and temporally with available Sentinel-2 imagery, so training data were collated through a combination of methods. For classes whose spatial extent varies strongly over time, labels were derived from drone imagery; for classes with spatially stable extents, additional labels were collected, alongside drone acquisitions, through visual inspection of Sentinel-2 imagery.
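For illustration, the nine habitat classes could be encoded as integer labels along these lines (the mapping below is hypothetical; the repository's actual encoding may differ):

```python
# Hypothetical integer encoding of the nine ICE CREAMS habitat classes;
# the actual label order used by the repository may differ.
CLASSES = [
    "Bare Sand",
    "Bare Mud",
    "Ulvophyceae",        # green macroalgae
    "Magnoliopsida",      # seagrass
    "Microphytobenthos",  # sediment-surface biofilms at low tide
    "Mixed-Rocks with Phaeophyceae",  # brown macroalgae on hard substrate
    "Rhodophyceae",       # red macroalgae
    "Xanthophyceae",      # yellow-green macroalgae
    "Water",
]
CLASS_TO_ID = {name: i for i, name in enumerate(CLASSES)}
```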
To adequately cover the expected spectral variability of intertidal habitat classes across the North-East Atlantic coast, drone imagery was acquired at multiple sites in Western Europe (Auray Estuary, Morbihan Gulf, Bourgneuf Bay and Ria de Aveiro Coastal Lagoon). Imagery was collected at two flight altitudes (12 and 120 m), giving pixel sizes of 8 and 80 mm respectively and allowing habitats to be classified at very high spatial resolution. In total these drone images covered over 4 km$^2$ of intertidal habitat.
To improve the balance between classes, pixels of some classes, such as bare muds and sands, sediments containing high abundances of microphytobenthos, and hard substrates covered by vegetation, were added to the training dataset. These pixels were selected through visual inspection of spectral signatures, true-colour RGB and false-colour imagery derived from Sentinel-2, accessed and visualised through the Copernicus data portal.
All labelled data were aggregated (majority class) to the 10 m resolution of Sentinel-2, then all Level-2A Sentinel-2 A/B images that coincided spatially and temporally (+/- 15 days) with these labelled data were downloaded from the Copernicus data portal. Level-2A data have already been atmospherically corrected using the Sen2Cor processing algorithm, and are distributed as bottom-of-atmosphere (BOA) reflectance. RGB true-colour composites were inspected manually to retain only cloud-free, low-tide Sentinel-2 images.
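The majority-class aggregation to the 10 m Sentinel-2 grid can be sketched in plain numpy as below, assuming the high-resolution label raster tiles evenly into blocks (a simplified illustration, not the project's actual resampling code):

```python
import numpy as np

def majority_aggregate(labels: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate an integer label raster by taking the majority (modal)
    class within each factor x factor block."""
    h, w = labels.shape
    assert h % factor == 0 and w % factor == 0, "raster must tile evenly"
    # Group pixels into (h//factor, w//factor) blocks of factor*factor labels
    blocks = labels.reshape(h // factor, factor, w // factor, factor)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(h // factor, w // factor, -1)
    out = np.empty(blocks.shape[:2], dtype=labels.dtype)
    for i in range(blocks.shape[0]):
        for j in range(blocks.shape[1]):
            # Modal class label within the block
            out[i, j] = np.bincount(blocks[i, j]).argmax()
    return out

# Example: a 4x4 label raster aggregated by a factor of 2
demo = np.array([[0, 0, 1, 1],
                 [0, 2, 1, 2],
                 [3, 3, 0, 0],
                 [3, 4, 0, 5]])
print(majority_aggregate(demo, 2))  # [[0 1] [3 0]]
```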
All 12 bands of Sentinel-2 were resampled to 10 m resolution, and standardised following a Min-Max Standardisation. Furthermore, the Normalised Difference Vegetation Index (NDVI) and Normalised Difference Water Index (NDWI) were calculated for each pixel from the BOA Sentinel-2 reflectance values:

$$\mathrm{NDVI} = \frac{R_{NIR} - R_{Red}}{R_{NIR} + R_{Red}}, \qquad \mathrm{NDWI} = \frac{R_{Green} - R_{NIR}}{R_{Green} + R_{NIR}}$$

with $R_{NIR}$, $R_{Red}$ and $R_{Green}$ the BOA reflectances in Sentinel-2 bands B8, B4 and B3, respectively.
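A minimal numpy sketch of the Min-Max standardisation and the two indices (band roles follow Sentinel-2 conventions, B8 = NIR, B4 = red, B3 = green; this is illustrative, not the repository's actual code):

```python
import numpy as np

def min_max(x: np.ndarray) -> np.ndarray:
    """Min-Max standardise an array to the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

def ndvi(b8: np.ndarray, b4: np.ndarray) -> np.ndarray:
    """NDVI from BOA reflectance: (NIR - Red) / (NIR + Red)."""
    return (b8 - b4) / (b8 + b4)

def ndwi(b3: np.ndarray, b8: np.ndarray) -> np.ndarray:
    """NDWI from BOA reflectance: (Green - NIR) / (Green + NIR)."""
    return (b3 - b8) / (b3 + b8)

# Toy BOA reflectance values for two pixels (vegetated, then mixed)
b3 = np.array([0.05, 0.10])   # green
b4 = np.array([0.04, 0.08])   # red
b8 = np.array([0.30, 0.12])   # NIR
print(ndvi(b8, b4))  # NDVI per pixel; high over dense vegetation
print(ndwi(b3, b8))  # NDWI per pixel; high over water
```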
Labelled pixels, each consisting of 26 features, were used to train a deep-learning neural network tabular learner from the FastAI framework in Python 3. The model consisted of 2 hidden layers with 26,761 trainable parameters and was fine-tuned across 20 epochs to minimise cross-entropy loss using the ADAptive Moment estimation (ADAM) optimiser. The final within-sample error rate was 0.0365. The ICE CREAMS model assigns each pixel the class with the greatest predicted probability.
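The inference step (assigning each pixel the greatest-probability class) can be illustrated with a plain-numpy forward pass through a two-hidden-layer network. This is a schematic stand-in for the trained FastAI tabular learner: the layer widths, activations and random weights below are illustrative assumptions, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_classes = 26, 9
# Two hidden layers; widths here are illustrative, not the trained model's
sizes = [n_features, 64, 32, n_classes]
weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict(x: np.ndarray):
    """Forward pass: ReLU hidden layers, softmax output, argmax class."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)   # ReLU activation
    probs = softmax(x @ weights[-1] + biases[-1])
    return probs.argmax(axis=-1), probs

pixels = rng.random((5, n_features))  # five fake standardised feature vectors
classes, probs = predict(pixels)      # per-pixel class and class probabilities
```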
To ensure validation of the ICE CREAMS model was independent of model building, several methods were employed to generate validation data. Field campaigns were carried out by taking geo-located photo quadrats within the Tagus Estuary and Ria de Aveiro Coastal Lagoon (Portugal; GBIF record: Davies et al., 2023), and Bourgneuf Bay and Ria d'Etel (France). Further validation data were collected through Red Green Blue (RGB) drone imagery taken within two estuaries in the UK (Tamar and Kingsbridge) and a bay in Spain (Cádiz).
As with the training data, labelled validation data were aggregated (majority class) to the 10 m resolution of Sentinel-2, then all Level-2A Sentinel-2 A/B images that coincided spatially and temporally (+/- 15 days) with these labelled data were downloaded from the Copernicus data portal. The ICE CREAMS model was applied to these Sentinel-2 images that aligned spatially and temporally with the validation data, and the model predictions were compared to the validation data labels. Global model accuracy ($OA$) was calculated as

$$OA = \frac{\sum_{i=1}^{k} n_{ii}}{N}$$

where $n_{ii}$ is the number of validation pixels of class $i$ correctly predicted as class $i$ (the diagonal of the confusion matrix), $k$ is the number of classes and $N$ is the total number of validation pixels.
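The global (overall) accuracy of the comparison above reduces to the fraction of validation pixels whose predicted class matches the label, i.e. the trace of the confusion matrix divided by its total. A short sketch (not the project's actual evaluation code):

```python
import numpy as np

def overall_accuracy(y_true: np.ndarray, y_pred: np.ndarray,
                     n_classes: int) -> float:
    """Overall accuracy = sum of confusion-matrix diagonal / total pixels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: true class, columns: predicted class
    return cm.trace() / cm.sum()

# Toy example with three classes: 4 of 6 pixels correctly classified
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])
print(overall_accuracy(y_true, y_pred, 3))  # 4/6 ~ 0.667
```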