From 6bfc1ffb8417ef839345569c8889442dfba96511 Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Thu, 12 Sep 2024 11:10:49 -0700 Subject: [PATCH 01/10] Adds/updates 00_Introduction_Setup --- .../00_Getting_NASA_EarthData_Credentials.md | 22 ++++++ .../01_Using_the_2i2c_Hub.md | 77 +++++++++++++++++++ .../02_Introduction_to_Open_Science.md} | 25 ++++-- book/00_Introduction_Setup/make_netrc.py | 22 ++++++ book/00_Introduction_Setup/test_netrc.py | 36 +++++++++ .../slides}/Open_Science_Intro_Slides.md | 0 6 files changed, 175 insertions(+), 7 deletions(-) create mode 100644 book/00_Introduction_Setup/00_Getting_NASA_EarthData_Credentials.md create mode 100644 book/00_Introduction_Setup/01_Using_the_2i2c_Hub.md rename book/{01_Open_Science/Open_Science_Intro.md => 00_Introduction_Setup/02_Introduction_to_Open_Science.md} (88%) create mode 100644 book/00_Introduction_Setup/make_netrc.py create mode 100644 book/00_Introduction_Setup/test_netrc.py rename book/{01_Open_Science => 09_Scratch/slides}/Open_Science_Intro_Slides.md (100%) diff --git a/book/00_Introduction_Setup/00_Getting_NASA_EarthData_Credentials.md b/book/00_Introduction_Setup/00_Getting_NASA_EarthData_Credentials.md new file mode 100644 index 0000000..5e7a95c --- /dev/null +++ b/book/00_Introduction_Setup/00_Getting_NASA_EarthData_Credentials.md @@ -0,0 +1,22 @@ +# Getting NASA Earthdata Credentials + + +This notebook lays out the process to obtain for NASA Earthdata credentials. + +**You can complete this step before the day of the actual tutorial.** + + +## Brief Introduction + + +The [NASA Earth Science Data Systems (ESDS) program oversees the lifecycle of Earth science data from all its Earth observation missions, from acquisition to processing and distribution. + +For the purposes of this guide, the NASA Earthdata website is the entry point that allows full, free and open access to NASA's Earth science data collections, in order to accelerate scientific progress for the benefit of society. To access the data through this portal, users must first define their access credentials. To create an EarthData account, follow these steps: + ++ Go to the Earth Nasa website: [`https://www.earthdata.nasa.gov/`](https://www.earthdata.nasa.gov/). Then, select the menu options "*Use Data*" and then "*Register*". Finally, navigate to [`https://urs.earthdata.nasa.gov/`](https://urs.earthdata.nasa.gov/). + +![earthdata_login](../assets/earthdata_login.png) + ++ Select the "*Register for a profile*" option, there choose a username and password. You will need these later, so choose ones that you remember well. You will also need to complete your profile to complete the registration, where you will be asked for information such as email, country, and affiliation. Finally, choose "*Register for Earthdata Login*". + +![earthdata_profile](../assets/earthdata_profile2.png) diff --git a/book/00_Introduction_Setup/01_Using_the_2i2c_Hub.md b/book/00_Introduction_Setup/01_Using_the_2i2c_Hub.md new file mode 100644 index 0000000..3d50b08 --- /dev/null +++ b/book/00_Introduction_Setup/01_Using_the_2i2c_Hub.md @@ -0,0 +1,77 @@ +# Using the 2i2c Hub + + +This notebook lays out instructions to log into the cloud infrastructure ([JupyterHub](https://jupyter.org/hub)) provided by [2i2c](https://2i2c.org) for this tutorial. + +**You won't be able to complete this step until the actual day of the tutorial (you'll get the password then).** + + +## 1. Logging into the 2i2c Hub + + +To login to the JupyterHub provided by 2i2c, follow these steps: + +1. 
**Navigate to the 2i2c Hub:** Point your web browser to [this link](https://climaterisk.opensci.2i2c.cloud). + +2. **Log in with your Credentials:** + + + **Username:** Feel free to choose any username you like. We suggest using your GitHub username to avoid conflicts. + + **Password:** *You'll receive the password on the day of the tutorial*. + +![2i2c_login](../assets/2i2c_login.png) + +3. **Logging In:** + +The login process might take a few minutes, especially if a new virtual workspace needs to be created just for you. + +![start_server2](../assets/start_server_2i2c.png) + +* **What to Expect:** + +By default, logging into [`https://climaterisk.opensci.2i2c.cloud`](https://climaterisk.opensci.2i2c.cloud) automatically clones a repository to work in. If the login is successful, you will see the following screen and are ready to start working. + +![work_environment_jupyter_lab](../assets/work_environment_jupyter_lab.png) + +**Notes:** Any files you work on will persist between sessions as long as you use the same username when logging in. + +## 2. Configuring the Cloud Environment to Access NASA EarthData from Python + +To access NASA's EarthData products from Python programs or Jupyter notebooks, it is necessary to save your NASA EarthData credentials in a special file called `.netrc` in your home directory. + ++ You can create this file by executing the script `make_netrc.py` in a terminal: + ```bash + $ python make_netrc.py + ``` + You can also choose to execute this script within this Jupyter notebook by executing the Python cell below (using the `%run` magic). + + Some caveats: + + The script won't execute if `~/.netrc` exists already. You can delete that file or rename it if you want to preserve the credentials within. + + The script prompts for your NASA EarthData username & password, so watch for the prompt if you execute it from a Jupyter notebook. + +```bash +%run make_netrc.py +``` + ++ Alternatively, you can create a file called `.netrc` in your home folder (i.e., `~/.netrc`) with content as follows: + ``` + machine urs.earthdata.nasa.gov login USERNAME password PASSWORD + ``` + Of course, you would replace `USERNAME` and `PASSWORD` in your `.netrc` file with your actual NASA EarthData account details. Once the `.netrc` file is saved with your correct credentials, it's good practice to restrict access to the file: + ```bash + $ chmod 600 ~/.netrc + ``` + + +## 3. Verifying Access to NASA EarthData Products + +To make sure everything is working properly, execute the script `test_netrc.py`: +```bash +$ python test_netrc.py +``` +Again, you can execute this directly from this notebook using the Python cell below: + +```bash +%run test_netrc.py +``` + +If that worked smoothly, you're done! You now have everything you need to explore NASA's Earth observation data through the EarthData portal! 
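If the test script reports a problem instead, a quick way to confirm that the credentials file is in place is the small sketch below. It uses only the Python standard library and is *not* one of the scripts provided above.

```python
# Optional sanity check: confirm ~/.netrc exists and is readable only by you.
from pathlib import Path
import stat

netrc_path = Path("~/.netrc").expanduser()
if netrc_path.exists():
    mode = stat.S_IMODE(netrc_path.stat().st_mode)
    print(f"Found {netrc_path}; permissions: {oct(mode)} (0o600 recommended)")
else:
    print(f"No {netrc_path} found; run make_netrc.py or create the file by hand.")
```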
diff --git a/book/01_Open_Science/Open_Science_Intro.md b/book/00_Introduction_Setup/02_Introduction_to_Open_Science.md similarity index 88% rename from book/01_Open_Science/Open_Science_Intro.md rename to book/00_Introduction_Setup/02_Introduction_to_Open_Science.md index a9b7fb8..07bb238 100644 --- a/book/01_Open_Science/Open_Science_Intro.md +++ b/book/00_Introduction_Setup/02_Introduction_to_Open_Science.md @@ -7,37 +7,44 @@ jupyter: format_version: '1.3' jupytext_version: 1.16.2 kernelspec: - display_name: base + display_name: Python 3 (ipykernel) language: python name: python3 --- # About this tutorial + This tutorial is part of a project which focuses on leveraging the vast amount of Earth science data available through the NASA Earthdata Cloud to better understand and forecast environmental risks such as wildfire, drought, and floods. At its core, this project embodies the principles of open science, aiming to make data, methods, and findings accessible to all. We aim to equip learners with the skills to analyze, visualize, and report on data related to these critical environmental risks through open science-based workflows and the use of cloud-based data computing. ## What is Open Science + "Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity." - ![](../assets/image165.png) + ### Availability of Open Science Resources: + - Many existing open science resources, over 100 Petabytes of openly available NASA data. - Tools and practices for collaboration and code development. + ### Outputs and Project Openness: + - Choice between openness from project inception or at publication. - Making data, code, and results open. + ### Importance of Sharing and Impact: + - Enhances the discoverability and accessibility of scientific processes and outputs. - Open methods enhance reproducibility. - Transparency and verifiability enhance accuracy. @@ -48,43 +55,50 @@ We aim to equip learners with the skills to analyze, visualize, and report on da ![](../assets/image377.jpg) - ## Why now + - The internet offers numerous platforms for public hosting and free access to research and data. These platforms, coupled with advancements in computational power, empower individuals to engage in sophisticated data analysis. This connectivity facilitates the integration of participants, stakeholders, and outcomes of open science initiatives online. - Science and science communication confront growing resistance from the public due to concerns about result reproducibility and the proliferation of misinformation. Open science practices address these challenges by leveraging community feedback to validate results more rigorously and by making findings readily accessible to the public, countering misinformation. - Scientific rigor and accuracy are bolstered when researchers validate their peers' findings. However, the lack of access to original data and code in scientific articles delays this process. + - ## Where to start: Open Research Products + Scientific knowledge, or research products, take the form of: ![](../assets/image5.png) + ### What is data? + Scientifically or technically relevant information that can be stored digitally and accessed electronically such as: - Information produced by missions and experiments, including calibrations, coefficients, and documentation. 
- Information needed to validate scientific conclusions of peer-reviewed publications. - Metadata. + ### What is code? + - General Purpose Software – Software produced for widespread use, not specialized scientific purposes. This encompasses both commercial software and open-source software. - Operational and Infrastructure Software – Software used by data centers and large information technology facilities to provide data services. - Libraries – No creative process is truly complete until it manifests a tangible reality. Whether your idea is an action or a physical creation, bringing it to life will likely involve the hard work of iteration, testing, and refinement. - Modeling and Simulation Software – Software that either implements solutions to mathematical equations given input data and boundary conditions, or infers models from data. - Analysis Software – Software developed to manipulate measurements or model results to visualize or gain understanding. - Single-use Software – Software written for use in unique instances, such as making a plot for a paper, or manipulating data in a specific way. + ### What are results? + Results capture the different research outputs of the scientific process. Publications are the most common type of results, but this can include a number of other types of products: - Peer-reviewed publications @@ -99,7 +113,4 @@ Products are created throughout the scientific process that are needed to enable ![](../assets/image7.jpeg) - - - diff --git a/book/00_Introduction_Setup/make_netrc.py b/book/00_Introduction_Setup/make_netrc.py new file mode 100644 index 0000000..fa19178 --- /dev/null +++ b/book/00_Introduction_Setup/make_netrc.py @@ -0,0 +1,22 @@ +# This script generates .netrc file, prompting user for EarthData credentials. +# Delete or rename ~/.netrc before executing this script (it won't overwrite a +# pre-existing ~/.netrc file). + +from getpass import getpass +from pathlib import Path +import sys + +NETRC_PATH = Path('~/.netrc').expanduser() +TEMPLATE = " ".join(["machine", "urs.earthdata.nasa.gov", "login", + "{USERNAME}", "password", "{PASSWORD}\n"]) + +if NETRC_PATH.exists(): + print("Warning: ~/.netrc exists already (this script won't overwrite).") + print(" Delete ~/.netrc first or back up to avoid losing credentials.") + sys.exit(1) + +username = input("NASA EarthData login: ") +password = getpass(prompt="NASA EarthData password: ") + +NETRC_PATH.write_text(TEMPLATE.format(USERNAME=username, PASSWORD=password)) +NETRC_PATH.chmod(0o600) diff --git a/book/00_Introduction_Setup/test_netrc.py b/book/00_Introduction_Setup/test_netrc.py new file mode 100644 index 0000000..0acd34e --- /dev/null +++ b/book/00_Introduction_Setup/test_netrc.py @@ -0,0 +1,36 @@ +# Minimal test to verify NASA Earthdata credentials for downloading data products. +# Requires a .netrc file in home directory containing valid credentials. 
+ +import osgeo.gdal, rasterio +from pystac_client import Client +from warnings import filterwarnings +filterwarnings("ignore") # suppress PySTAC warnings + +# Mandatory GDAL setup for accessing cloud data +osgeo.gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE','~/.gdal_cookies.txt') +osgeo.gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/.gdal_cookies.txt') +osgeo.gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN','EMPTY_DIR') +osgeo.gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF, TIFF') + +# Define AOI (Area-Of-Interest) & time-window +livingston_tx, delta = (-95.09, 30.69), 0.1 +AOI = tuple(coord + sgn*delta for sgn in (-1,+1) for coord in livingston_tx) +start, stop = '2024-04-30', '2024-05-31' +WINDOW = f'{start}/{stop}' + +# Prepare PySTAC client +STAC_URL, COLLECTIONS = 'https://cmr.earthdata.nasa.gov/stac', ["OPERA_L3_DSWX-HLS_V1"] +catalog = Client.open(f'{STAC_URL}/POCLOUD/') + +print("Testing PySTAC search...") +opts = dict(bbox=AOI, collections=COLLECTIONS, datetime=WINDOW) +results = list(catalog.search(**opts).items_as_dicts()) +test_uri = results[0]['assets']['0_B01_WTR']['href'] + +try: + print(f"Search successful, accessing test data...") + with rasterio.open(test_uri) as ds: _ = ds.profile + print("Success! Your credentials file is correctly configured!") +except: + print(f"Could not access NASA EarthData asset: {test_uri}") + print("Ensure that a .netrc file containing valid NASA Earthdata credentials exists in the user home directory.") diff --git a/book/01_Open_Science/Open_Science_Intro_Slides.md b/book/09_Scratch/slides/Open_Science_Intro_Slides.md similarity index 100% rename from book/01_Open_Science/Open_Science_Intro_Slides.md rename to book/09_Scratch/slides/Open_Science_Intro_Slides.md From db2dffe6d991a60ba03a684de86991b68a476f77 Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Fri, 13 Sep 2024 06:40:53 -0700 Subject: [PATCH 02/10] Adds/updates 01_Geospatial_Background --- .gitignore | 4 +- .../01_Coordinate_systems.md | 48 +++++++++++++ .../02_Geospatial_data.md | 59 ++++++++++++++++ .../03_Geospatial_data_formats.md | 68 +++++++++++++++++++ 4 files changed, 177 insertions(+), 2 deletions(-) create mode 100644 book/01_Geospatial_Background/01_Coordinate_systems.md create mode 100644 book/01_Geospatial_Background/02_Geospatial_data.md create mode 100644 book/01_Geospatial_Background/03_Geospatial_data_formats.md diff --git a/.gitignore b/.gitignore index 1813055..9cbff58 100644 --- a/.gitignore +++ b/.gitignore @@ -15,8 +15,8 @@ venv.bak/ .netrc # Jupyter Notebook -.ipynb_checkpoints -book/*.ipynb +**/.ipynb_checkpoints +**/*.ipynb # Jupyter Book book/_build diff --git a/book/01_Geospatial_Background/01_Coordinate_systems.md b/book/01_Geospatial_Background/01_Coordinate_systems.md new file mode 100644 index 0000000..baaf95d --- /dev/null +++ b/book/01_Geospatial_Background/01_Coordinate_systems.md @@ -0,0 +1,48 @@ +# Coordinate Reference Systems + + +The following summary was made using: ++ the [QGIS documentation](https://docs.qgis.org/3.34/en/docs/gentle_gis_introduction/coordinate_reference_systems.html) ++ the book [*Sistemas de Información Geográfica*](https://volaya.github.io/libro-sig/) by Victor Olaya + +With the help of coordinate reference systems (CRS), every location on the earth can be specified by a set of two numbers, called *coordinates*. There are two commonly-used categories of CRS: *geographic* coordinate reference systems and *projected* coordinate reference systems (also called *rectangular* coordinate reference systems). 
+ + +## Geographic Coordinate Systems + + +The geographic coordinate system is a spherical coordinate system by which a point is located with two angular values: + +- Lines of **longitude** run perpendicular to the equator and converge at the poles. The reference line for longitude (the prime meridian) runs from the North pole to the South pole through Greenwich, England. Subsequent lines of longitude are measured from zero to 180 degrees East or West of the prime meridian. Note that values West of the prime meridian are assigned negative values for use in digital mapping applications. At the equator, and only at the equator, the distance represented by one line of longitude is equal to the distance represented by one degree of latitude. As you move towards the poles, the distance between lines of longitude becomes progressively less, until, at the exact location of the pole, all 360° of longitude are represented by a single point that you could put your finger on (you probably would want to wear gloves though). + +- Lines of **latitude** run parallel to the equator and divide the earth into 180 equally spaced sections from North to South (or South to North). The reference line for latitude is the equator and each hemisphere is divided into ninety sections, each representing one degree of latitude. In the northern hemisphere, degrees of latitude are measured from zero at the equator to ninety at the north pole. In the southern hemisphere, degrees of latitude are measured from zero at the equator to ninety degrees at the south pole. To simplify the digitization of maps, degrees of latitude in the southern hemisphere are often assigned negative values (0 to -90°). Wherever you are on the earth’s surface, the distance between the lines of latitude is the same (60 nautical miles). + +![geographic_crs](../assets/geographic_crs.png) + +

Geographic coordinate system with lines of latitude parallel to the equator and lines of longitude with the prime meridian through Greenwich. Source: QGIS Documentation. +
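To make the convergence of lines of longitude concrete, here is a quick back-of-the-envelope calculation. This is only a sketch: it assumes a spherical Earth of radius roughly 6371 km.

```python
# Rough ground distance spanned by one degree of longitude at different
# latitudes, assuming a spherical Earth of radius ~6371 km (an approximation).
import math

EARTH_RADIUS_KM = 6371.0

for lat in (0, 30, 60, 89):
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(lat)) / 360
    print(f"At {lat:2d} degrees latitude, 1 degree of longitude spans roughly {km_per_degree:6.1f} km")
```

At the equator this gives roughly 111 km per degree of longitude, shrinking to about 2 km near the poles.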

+ + +Using the geographic coordinate system, we have a grid of lines dividing the earth into squares that cover approximately 12363.365 square kilometers at the equator — a good start, but not very useful for determining the location of anything within that square. To be truly useful, a map grid must be divided into small enough sections so that they can be used to describe (with an acceptable level of accuracy) the location of a point on the map. To accomplish this, degrees are divided into minutes (') and seconds ("). There are sixty minutes in a degree, and sixty seconds in a minute (3600 seconds in a degree). So, at the equator, one second of latitude or longitude = 30.87624 meters. + + +## Projected coordinate reference systems + + +A [*projected coordinate reference system*](https://en.wikipedia.org/wiki/Projected_coordinate_system) uses a [map projection](https://en.wikipedia.org/wiki/Map_projection) to identify pairs of spatial coordinates $(X,Y)$ with points on the surface of the Earth. These coordinate values are typically referred to as "easting" and "northing" (referring to distances east and north respectively to the origin in some locally flattened $XY$-plane). Unlike angular longitude-latitude pairs, the spatial easting-northing coordinates in a projected coordinate system have units of length (e.g., metres). + +Projected coordinate reference systems in the southern hemisphere (south of the equator) normally assign the origin on the equator at a specific longitude. This means that the $Y$-values increase southwards and the $X$-values increase westwards. In the northern hemisphere (north of the equator) the origin is also the equator at a specific Longitude. However, now the Y-values increase northwards and the X-values increase to the East. + + +### Universal Transverse Mercator (UTM) coordinates + + +The [*Universal Transverse Mercator (UTM)*](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system) coordinate reference system has its origin on the equator at a specific Longitude. Now the $Y$-values increase southwards and the X-values increase to the West. The UTM CRS is a global map projection. This means it is generally used all over the world. To avoid excessive distortion, the world is divided into 60 equal zones that are all 6 degrees wide in longitude from East to West. The UTM zones are numbered 1 to 60, starting at the antimeridian (zone 1 at 180 degrees West longitude) and progressing East back to the antemeridian (zone 60 at 180 degrees East longitude). + +![utm_zones](../assets/utm_zones.png) + +

The Universal Transverse Mercator zones. Source: QGIS Documentation. +
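As a quick illustration of projected coordinates (a sketch only, assuming the `pyproj` package is available in your environment), we can convert a longitude-latitude pair into UTM easting and northing for the appropriate zone:

```python
# Convert a longitude/latitude pair (EPSG:4326) into UTM easting/northing.
# Assumes pyproj is installed; Tokyo (139.69 E, 35.69 N) falls in UTM zone 54N.
from pyproj import Transformer

lon, lat = 139.69, 35.69             # Tokyo, in degrees
zone = int((lon + 180) // 6) + 1     # UTM zone number (54 here)
utm_epsg = 32600 + zone              # EPSG code for WGS 84 / UTM northern-hemisphere zones

to_utm = Transformer.from_crs("EPSG:4326", f"EPSG:{utm_epsg}", always_xy=True)
easting, northing = to_utm.transform(lon, lat)
print(f"UTM zone {zone}N: easting={easting:.1f} m, northing={northing:.1f} m")
```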

+ + +The position of a coordinate in UTM south of the equator must be indicated with the zone number and with its northing ($Y$) value and easting ($X$) value in meters. The northing value is the distance of the position from the equator in meters. The easting value is the distance from the central meridian (longitude) of the used UTM zone. Furthermore, in UTM zones that are south of the equator, a so-called false northing value of 10,000,000 m to the northing ($Y$) value to avoid negative values. Similarly, in some UTM zones, a false easting value of 500,000 m is added to the easting ($X$) value. diff --git a/book/01_Geospatial_Background/02_Geospatial_data.md b/book/01_Geospatial_Background/02_Geospatial_data.md new file mode 100644 index 0000000..94427b5 --- /dev/null +++ b/book/01_Geospatial_Background/02_Geospatial_data.md @@ -0,0 +1,59 @@ +# Geospatial data + + +## Vector data + + +*Vector data* is split into three types: point, line, and polygon data.^1^ + +![points-lines-polygons-vector-data-types](../assets/points-lines-polygons-vector-data-types.png) + +

This image shows the three vector types: points, lines and polygons. Source: National Ecological Observatory Network. +
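The three vector types can also be constructed directly in Python. Below is a minimal sketch using the `shapely` package (assumed to be installed, as it is a common dependency of geospatial Python stacks); the coordinates are arbitrary illustrative values.

```python
# Minimal examples of the three vector geometry types using shapely
# (assumes shapely is installed).
from shapely.geometry import Point, LineString, Polygon

point = Point(-121.5, 39.5)                           # a single location
line = LineString([(-121.5, 39.5), (-121.4, 39.6)])   # an ordered sequence of vertices
polygon = Polygon([(-121.6, 39.4), (-121.4, 39.4), (-121.4, 39.6), (-121.6, 39.6)])

print(point.geom_type, line.geom_type, polygon.geom_type)
print(f"line length (degrees): {line.length:.3f}")      # lines have length but no area
print(f"polygon area (square degrees): {polygon.area:.3f}")  # polygons have area and perimeter
```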

+ + +### Point data + + +Point data is most commonly used to represent nonadjacent features and discrete data points. Points have zero dimensions, therefore you can measure neither length or area with this dataset.^1^ + +![points](../assets/points.png) + +

This image shows an example of a spatial entity (left) and its point vector representation (right). Source: "Sistemas de Información Geográfica" by Victor Olaya.

+ + +### Line Data + + +Line data is used to represent linear features. Common examples would be rivers, trails, and streets. Line features only have one dimension and therefore can only be used to measure length. Line features have a starting and ending point. Common examples would be road centerlines and hydrology.^1^ + +![lines](../assets/lines.png) +

This image shows an example of a spatial entity (left) and its line vector representation (right). Source: "Sistemas de Información Geográfica" by Victor Olaya.

+ + +### Polygon data + + +Polygons are used to represent areas such as the boundary of a city (on a large-scale map), lake, or forest. Polygon features are two-dimensional and therefore can be used to measure the area and perimeter of a geographic feature.^1^ +![polygon](../assets/polygon.png) + +

This image shows an example of a spatial entity (left) and its polygon vector representation (right). Source: "Sistemas de Información Geográfica" by Victor Olaya.

+ + +## Raster data + + +Raster data represents geographic data as a matrix of cells containing an attribute value. While the area of different polygon shapes in a data set can differ, each cell in a raster data set is the same cell. The size of the area in the real world that each cell represents is known as the spatial resolution.^1^ + +![raster_neon](../assets/raster_neon.png) + +

This image shows an example of raster data. Source: National Ecological Observatory Network (NEON). +
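A raster is ultimately just an array of cell values plus a rule that places those cells on the ground. The sketch below illustrates this with NumPy and rasterio's affine helper (assuming both packages are installed; the origin and 30 m cell size are arbitrary illustrative values).

```python
# A tiny 4x4 raster: an array of values plus an affine transform that fixes
# the top-left corner and the cell size (the spatial resolution).
# Assumes numpy and rasterio are installed.
import numpy as np
from rasterio.transform import from_origin

values = np.arange(16, dtype=np.uint8).reshape(4, 4)                      # the cell values
transform = from_origin(west=500000, north=4700000, xsize=30, ysize=30)   # 30 m x 30 m cells

print(values)
print(transform)
print("Top-left corner of the upper-left cell:", transform * (0, 0))
```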

+ + +## References + +1. https://www.geographyrealm.com/geodatabases-explored-vector-and-raster-data/ diff --git a/book/01_Geospatial_Background/03_Geospatial_data_formats.md b/book/01_Geospatial_Background/03_Geospatial_data_formats.md new file mode 100644 index 0000000..db5cb49 --- /dev/null +++ b/book/01_Geospatial_Background/03_Geospatial_data_formats.md @@ -0,0 +1,68 @@ +# Geospatial data formats + + +There are numerous standard file formats used for sharing scientific data (e.g., [*HDF*](https://en.wikipedia.org/wiki/Hierarchical_Data_Format), [*Parquet*](https://parquet.apache.org/), [*CSV*](https://en.wikipedia.org/wiki/Comma-separated_values), etc.). Moreover, there are [dozens of file formats](https://www.spatialpost.com/list-common-gis-file-format/) available for [*Geographic Information Systems (GIS)*](https://en.wikipedia.org/wiki/Geographic_information_system) e.g., DRG, [*NetCDF*](https://docs.unidata.ucar.edu/nug/current/), USGS DEM, DXF, DWG, SVG, and so on. + +For this tutorial, we focus on only three specific formats. + + +## GeoTIFF(for raster data) + + +[GeoTIFF](https://en.wikipedia.org/wiki/GeoTIFF) is a public domain metadata standard designed to work within [*TIFF (Tagged Image File Format)*)](https://en.wikipedia.org/wiki/TIFF) files. The GeoTIFF format enables georeferencing information to be embedded as geospatial metadata within image files. GIS applications use GeoTIFFs for aerial photography, satellite imagery, and digitized maps. The GeoTIFF data format is described in detail in the [OGC GeoTIFF Standard](https://www.ogc.org/standard/geotiff/) document. + +A GeoTIFF file extension contains geographic metadata that describes the actual location in space that each pixel in an image represents. In creating a GeoTIFF file, spatial information is included in the `.tif` file as embedded tags, which can include raster image metadata such as: +* horizontal and vertical datums +* spatial extent, i.e. the area that the dataset covers +* the coordinate reference system (CRS) used to store the data +* spatial resolution, measured in the number of independent pixel values per unit length +* the number of layers in the .tif file +* ellipsoids and geoids (i.e., estimated models of the Earth’s shape) +* mathematical rules for map projection to transform data for a three-dimensional space into a two-dimensional display. + + +## Shapefiles (for vector data) + + +The [shapefile](https://en.wikipedia.org/wiki/Shapefile) standard is a digital format for distributing geospatial vector data and its associated attributes. The standard—first developed by [ESRI](https://en.wikipedia.org/wiki/Esri) roughly 30 years ago—is supported by most modern GIS software tools. The name "shapefile" is a bit misleading because a "shapefile" typically consists of a bundle of several files (some mandatory, some optional) stored in a common directory with a common filename prefix. + +From [Wikipedia](https://en.wikipedia.org/wiki/Shapefile): + +> The shapefile format stores the geometry as primitive geometric shapes like points, lines, and polygons. These shapes, together with data attributes that are linked to each shape, create the representation of the geographic data. The term "shapefile" is quite common, but the format consists of a collection of files with a common filename prefix, stored in the same directory. The three mandatory files have filename extensions .shp, .shx, and .dbf. 
The actual shapefile relates specifically to the .shp file, but alone is incomplete for distribution as the other supporting files are required. Legacy GIS software may expect that the filename prefix be limited to eight characters to conform to the DOS 8.3 filename convention, though modern software applications accept files with longer names.

Shapefiles use the [*Well-Known Binary (WKB)*](https://libgeos.org/specifications/wkb/) format for encoding geometries. This is a compact tabular format, i.e., the row and column position of each value is significant. Some minor limitations of this format include the restriction of attribute field names to 10 characters or fewer and poor Unicode support; as a result, text is often abbreviated and encoded in ASCII.^3^

You can refer to the [ESRI Shapefile Technical Whitepaper](https://www.esri.com/content/dam/esrisites/sitecore-archive/Files/Pdfs/library/whitepapers/pdfs/shapefile.pdf) to find out more about shapefiles.


#### Mandatory files


- Main File (`.shp`): shape format, i.e., the spatial vector data (points, lines, and polygons) representing feature geometry.
- Index File (`.shx`): shape index positions (to enable retrieval of feature geometry).
- dBASE File (`.dbf`): standard database file storing the attributes (columnar attributes for each shape in dBase IV format, typically readable by, e.g., Microsoft Access or Excel).

Records correspond in sequence in each of these files, i.e., attributes in record $k$ of the `.dbf` file describe the feature in record $k$ of the associated `.shp` file.


#### Optional files


- Projection File (`.prj`): description of the relevant coordinate reference system using a [*well-known text (WKT or WKT-CRS)* representation](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems).
- Extensible Markup Language File (`.xml`): [geospatial metadata](https://en.wikipedia.org/wiki/Geospatial_metadata) in [XML](https://en.wikipedia.org/wiki/XML) format.
- Code Page File (`.cpg`): plain text file describing the encoding applied to create the shapefile. If your shapefile doesn't have a `.cpg` file, then it uses the system's default encoding.
- Spatial Index Files (`.sbn` and `.sbx`): encoded index files to speed up loading times.
- etc. (there are numerous other optional files; see the [whitepaper](https://www.esri.com/content/dam/esrisites/sitecore-archive/Files/Pdfs/library/whitepapers/pdfs/shapefile.pdf)).


## GeoJSON (for vector data)


[GeoJSON](https://geojson.org/) is a subset of [JSON (JavaScript object notation)](https://www.json.org) developed in the early 2000s to represent simple geographical features together with non-spatial attributes. The core idea is to provide a specification for encoding geospatial data that can be decoded by any JSON decoder.

![json_example](../assets/json_example.PNG)

This image shows an example of a GeoJSON file. Source: https://gist.githubusercontent.com/wavded.
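Because GeoJSON is just JSON, a feature can be built and serialized with nothing more than the Python standard library. The sketch below uses arbitrary illustrative coordinates and property values.

```python
# A minimal GeoJSON Feature, built as a plain Python dictionary and serialized
# with the standard-library json module (no GIS software required).
import json

feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-95.09, 30.69]},  # [longitude, latitude]
    "properties": {"name": "Example point"},
}

print(json.dumps(feature, indent=2))
```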

+ +The GIS developers responsible for GeoJSON designed it with the intention that any web developer should be able to write a custom GeoJSON parser, allowing for many possible ways to use geospatial data beyond standard GIS software. The technical details of the GeoJSON format are described in the standards document [RFC7946](https://datatracker.ietf.org/doc/html/rfc7946). From 6ba7eb3a293da076ec50c5b7fb7003ebfa4f0258 Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Fri, 13 Sep 2024 06:41:43 -0700 Subject: [PATCH 03/10] Adds/updates 02_Software_Tools --- .../01_Data_Manipulation_Tools.md | 440 ++++++++++++++++++ .../02_Data_Visualization_Tools.md | 364 +++++++++++++++ 2 files changed, 804 insertions(+) create mode 100644 book/02_Software_Tools/01_Data_Manipulation_Tools.md create mode 100644 book/02_Software_Tools/02_Data_Visualization_Tools.md diff --git a/book/02_Software_Tools/01_Data_Manipulation_Tools.md b/book/02_Software_Tools/01_Data_Manipulation_Tools.md new file mode 100644 index 0000000..4cbe369 --- /dev/null +++ b/book/02_Software_Tools/01_Data_Manipulation_Tools.md @@ -0,0 +1,440 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.2 + kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +# Data Manipulation Tools + + +In this tutorial, we'll make use of a number of Python libraries to work with geospatial data. There are numerous ways to work with data and so choosing tooling can be difficult. The principal library we'll be using is [*Xarray*](https://docs.xarray.dev/en/stable/index.html) for its `DataArray` and `Dataset` data structures and associated utilities as well as [NumPy](https://numpy.org) and [Pandas](https://pandas.pydata.org) for manipulating homogeneous numerical arrays and tabular data respectively. We'll also make use of [Rasterio](https://rasterio.readthedocs.io/en/stable) as a tool for reading data or meta-data from GeoTIFF files; judicious use of Rasterio can make a big difference when working with remote files in the cloud. + + +```python jupyter={"source_hidden": true} +from warnings import filterwarnings +filterwarnings('ignore') + +from pathlib import Path +import xarray as xr +import numpy as np +import pandas as pd +import rioxarray as rio +import rasterio +``` + +## [rioxarray](https://corteva.github.io/rioxarray/html/index.html) + + ++ `rioxarray` is a package to *extend* Xarray ++ Primary use within this tutorial: + + `rioxarray.open_rasterio` enables loading GeoTIFF files directly into Xarray `DataArray` structures + + `xarray.DataArray.rio` extension provides useful utilities (e.g., for specifying CRS information) + +To get used to working with GeoTIFF files, we'll use a specific example in this notebook. We'll explain more about what kind of data is contained within the file later; for now, we want to get used to the tools we'll use to load such data throughout the tutorial. + + +### Loading files into a DataArray + + +Observe first that `open_rasterio` works on local file paths and remote URLs. ++ Predictably, local access is faster than remote access. 
+ + +```python jupyter={"source_hidden": true} +%%time +LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-ANOM-MAX.tif' +da = rio.open_rasterio(LOCAL_PATH) +``` + +```python jupyter={"source_hidden": true} +%%time +REMOTE_URL ='https://opera-provisional-products.s3.us-west-2.amazonaws.com/DIST/DIST_HLS/WG/DIST-ALERT/McKinney_Wildfire/OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1/OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-ANOM-MAX.tif' +da_remote = rio.open_rasterio(REMOTE_URL) +``` + + +This next operation compares elements of an Xarray `DataArray` elementwise (the use of the `.all` method is similar to what we would do to compare NumPy arrays). This is generally not an advisable way to compare arrays, series, dataframes, or other large data structures that contain many elements. However, in this particular instance, because the two data structures have been read from the same file stored in two different locations, element-by-element comparison in memory confirms that the data loaded from two different sources is identical in every bit. + + +```python +(da_remote == da).all() # Verify that the data is identical from both sources +``` + +## [Xarray](https://docs.xarray.dev/en/stable/index.html) + + +Let's examine the data structure loaded above from the file `LOCAL_PATH`. + + +### Examining the rich DataArray repr + + +Observe, in this notebook, the `repr` for an Xarray `DataArray` can be interactively examined. ++ The output cell contains a rich Jupyter notebook `repr` for the `DataArray` class. ++ The triangles next to the "Coordinates", "Indexes", and "Attributes" headings can be clicked with a mouse to reveal an expanded view. + + +```python jupyter={"source_hidden": true} +print(f'{type(da)=}\n') +da +``` + +### Examining DataArray attributes programmatically + + +Of course, while this graphical view is handy, it is also possible to access various `DataArray` attributes programmatically. This permits us to write progam logic to manipulate `DataArray`s conditionally as needed. For instance: + + +```python jupyter={"source_hidden": true} +print(da.coords) +``` + + +The dimensions `da.dims` are the strings/labels associated with the `DataArray` axes. + + +```python jupyter={"source_hidden": true} +da.dims +``` + + +We can extract the coordinates as one-dimensional (homogeneous) NumPy arrays using the `coords` and the `.values` attributes. + + +```python jupyter={"source_hidden": true} +print(da.coords['x'].values) +``` + + +`data.attrs` is a dictionary containing other meta-data parsed from the GeoTIFF tags (the "Attributes" in the graphical view). Again, this is why `rioxarray` is useful; it is possible to write code that loads data from various fileformats into Xarray `DataArray`s, but this package wraps a lot of the messy code that would, e.g., populate `da.attrs`. + + +```python jupyter={"source_hidden": true} +da.attrs +``` + +### Using the DataArray rio accessor + + +As mentioned, `rioxarray` extends the class `xarray.DataArray` with an *accessor* called `rio`. The `rio` accessor effectively adds a namespace with a variety of attributes. WE can use a Python list comprehension to display those that do not start with an underscore (the so-called "private" and "dunder" methods/attributes). 
+ + +```python jupyter={"source_hidden": true} +# Use a list comprehension to display relevant attributes/methods +[name for name in dir(da.rio) if not name.startswith('_')] +``` + + +The attribute `da.rio.crs` is important for our purposes; it provides access to the coordinate reference system associated with this raster dataset. + + +```python jupyter={"source_hidden": true} +print(type(da.rio.crs)) +print(da.rio.crs) +``` + + +The `.rio.crs` attribute itself is a data structure of class `CRS`. The Python `repr` for this class returns a string like `EPSG:32610`; this number refers to the [EPSG Geodetic Parameter Dataset](https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset). + +From [Wikipedia](https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset): + +> EPSG Geodetic Parameter Dataset (also EPSG registry) is a public registry of [geodetic datums](https://en.wikipedia.org/wiki/Geodetic_datum), [spatial reference systems](https://en.wikipedia.org/wiki/Spatial_reference_system), [Earth ellipsoids](https://en.wikipedia.org/wiki/Earth_ellipsoid), coordinate transformations and related [units of measurement](https://en.wikipedia.org/wiki/Unit_of_measurement), originated by a member of the [European Petroleum Survey Group](https://en.wikipedia.org/wiki/European_Petroleum_Survey_Group) (EPSG) in 1985. Each entity is assigned an EPSG code between 1024 and 32767, along with a standard machine-readable [well-known text (WKT)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems) representation. The dataset is maintained by the [IOGP](https://en.wikipedia.org/wiki/International_Association_of_Oil_%26_Gas_Producers) Geomatics Committee. + + +### Manipulating data in a DataArray + + +Given that this data is stored using a particular [Universal Transverse Mercator (UTM)](https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system) CRS, let's relabel the coordinates to reflect this; that is, the coordinate labelled `x` would conventionally be called `easting` and the coordinate called `y` would be called `northing`. + + +```python jupyter={"source_hidden": true} +da = da.rename({'x':'easting', 'y':'northing', 'band':'band'}) +``` + +```python +print(da.coords) +``` + + +Xarray `DataArray`s permits slicing using coordinate values or their corresponding integer positions using the `sel` and `isel` accessors respectively. This is similar to using `.loc` and `.iloc` in Pandas to extract entries from a Pandas `Series` or `DataFrame`. + + +```python jupyter={"source_hidden": true} +da.isel(easting=slice(0,2)) +``` + +```python jupyter={"source_hidden": true} +da.sel(easting=[499995, 500025]) +``` + + +If we take a 2D slice from this 3D `DataArray`, we can plot it using the `.plot` accessor (more on this later). + + +```python jupyter={"source_hidden": true} +da.isel(band=0).plot(); +``` + + +The plot produced is rather dark (reflecting that most of the entries are zero according to the legend). Notice that the axes are labelled automatically using the `coords` we renamed before. + + +### Extracting DataArray data to NumPy, Pandas + + +Finally, recall that a `DataArray` is a wrapper around a NumPy array. That NumPy array can be retrieved using the `.values` attribute. + + +```python jupyter={"source_hidden": true} +array = da.values +print(f'{type(array)=}') +print(f'{array.shape=}') +print(f'{array.dtype=}') +print(f'{array.nbytes=}') +``` + + +This raster data is stored as 8-bit unsigned integer data, so one byte for each pixel. 
A single unsigned 8-bit integer can represent integer values between 0 and 255. In an array with a bit more than thirteen million elements, that means there are many repeated values. We can see by putting the pixel values into a Pandas `Series` and using the `.value_counts` method. + + +```python jupyter={"source_hidden": true} +s_flat = pd.Series(array.flatten()).value_counts() +s_flat.sort_index() +``` + + +Most of the entries in this raster array are zero. The numerical values vary between 0 and 100 with the exception of some 1,700 pixels with the value 255. This will make more sense when we discuss the DIST data product specification. + + +## [rasterio](https://rasterio.readthedocs.io/en/stable) + + +Having reviewed some features of Xarray (and of its extension `rioxarray`), let's briefly look at `rasterio` as a means of exploring GeoTIFF files. + +From the [Rasterio documentation](https://rasterio.readthedocs.io/en/stable): + +> Before Rasterio there was one Python option for accessing the many different kind of raster data files used in the GIS field: the Python bindings distributed with the [Geospatial Data Abstraction Library, GDAL](http://gdal.org/). These bindings extend Python, but provide little abstraction for GDAL’s C API. This means that Python programs using them tend to read and run like C programs. For example, GDAL’s Python bindings require users to watch out for dangling C pointers, potential crashers of programs. This is bad: among other considerations we’ve chosen Python instead of C to avoid problems with pointers. +> +>What would it be like to have a geospatial data abstraction in the Python standard library? One that used modern Python language features and idioms? One that freed users from concern about dangling pointers and other C programming pitfalls? Rasterio’s goal is to be this kind of raster data library – expressing GDAL’s data model using fewer non-idiomatic extension classes and more idiomatic Python types and protocols, while performing as fast as GDAL’s Python bindings. +> +>High performance, lower cognitive load, cleaner and more transparent code. This is what Rasterio is about. + + +### Opening files with rasterio.open + +```python jupyter={"source_hidden": true} +# Show rasterio.open works using context manager +import rasterio +from pathlib import Path +LOCAL_PATH = Path('..') / 'assets' / \ + 'OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-ANOM-MAX.tif' +print(LOCAL_PATH) +``` + + +Given a data source (e.g., a GeoTIFF file in the current context), we can open a `DatasetReader` object associated with using `rasterio.open`. Technically, we have to remember to close the object afterward. That is, our code would look like this: + +```python +ds = rasterio.open(LOCAL_PATH) +.. +# do some computation +... +ds.close() +``` + +As with file-handling in Python, we can use a *context manager* (i.e., a `with` clause) instead. +```python +with rasterio.open(LOCAL_PATH) as ds: + ... + # do some computation + ... + +# more code outside the scope of the with block. +``` +The dataset will be closed automatically outside the `with` block. 
+ + +```python jupyter={"source_hidden": true} +with rasterio.open(LOCAL_PATH) as ds: + print(f'{type(ds)=}') + assert not ds.closed + +# outside the scope of the with block +assert ds.closed +``` + + +The principal advantage of using `rasterio.open` rather than `rioxarray.open_rasterio` to open a file is that the latter approach opens the file and immediately loads its contents into a `DataDarray` in memory. + +By contrast, using `rasterio.open` merely opens the file in place and its contents can be examined *without* immediately loading its contents into memory. This makes a lot of difference when working with remote data; transferring the entire contents across a network takes time. If we examine the meta-data—which is typically much smaller and can be transferred quickly—we may find, e.g., that the contents of the file do not warrant moving arrays of data across the network. + + +### Examining DatasetReader attributes + + +When a file is opened using `rasterio.open`, the object instantiated is from the `DatasetReader` class. This class has a number of attributes and methods of interest to us: + + | | | | + |--|--|--| + |`profile`|`height`|`width` | + |`shape` |`count`|`nodata`| + |`crs`|`transform`|`bounds`| + |`xy`|`index`|`read` | + +First, given a `DatasetReader` `ds` associated with a data source, examining `ds.profile` returns some diagnostic information. + + +```python jupyter={"source_hidden": true} +with rasterio.open(LOCAL_PATH) as ds: + print(f'{ds.profile=}') +``` + + +The attributes `ds.height`, `ds.width`, `ds.shape`, `ds.count`, `ds.nodata`, and `ds.transform` are all included in the output from `ds.profile` but are also accessible individually. + + +```python jupyter={"source_hidden": true} +with rasterio.open(LOCAL_PATH) as ds: + print(f'{ds.height=}') + print(f'{ds.width=}') + print(f'{ds.shape=}') + print(f'{ds.count=}') + print(f'{ds.nodata=}') + print(f'{ds.crs=}') + print(f'{ds.transform=}') +``` + +### Reading data into memory + + +The method `ds.read` loads an array from the data file into memory. Notice this can be done on local or remote files. + + +```python jupyter={"source_hidden": true} +%%time +with rasterio.open(LOCAL_PATH) as ds: + array = ds.read() + print(f'{array.shape=}') +``` + +```python jupyter={"source_hidden": true} +%%time +with rasterio.open(REMOTE_URL) as ds: + array = ds.read() + print(f'{array.shape=}') +``` + +```python +print(f'{type(array)=}') +``` + + +The array loaded into memory with `ds.read` is a NumPy array. This can be wrapped by an Xarray `DataArray` if we provide additional code to specify the coordinate labels and so on. + + +### Mapping coordinates + + +Earlier, we loaded data from a local file into a `DataArray` called `da` using `rioxarray.open_rasterio`. + + +```python jupyter={"source_hidden": true} +da = rio.open_rasterio(LOCAL_PATH) +da +``` + + +Doing so simplified the loading raster data from a GeoTIFF file into an Xarray `DataArray` while filling in the metadata for us. In particular, the coordinates associated with the pixels were stored into `da.coords` (the default coordinate axes are `band`, `x`, and `y` for this three-dimensional array). + +If we ignore the extra `band` dimension, the pixels of the raster data are associated with pixel coordinates (integers) and spatial coordinates (real values, typically a point at the centre of each pixel). 
+ +![](http://ioam.github.io/topographica/_images/matrix_coords.png) +![](http://ioam.github.io/topographica/_images/sheet_coords_-0.2_0.4.png) +(from `http://ioam.github.io/topographica`) + +The accessors `da.isel` and `da.sel` allow us to extract slices from the data array using pixel coordinates or spatial coordinates respectively. + + + +If we use `rasterio.open` to open a file, the `DatasetReader` attribute `transform` provides the details of how to convert between pixel and spatial coordinates. We will use this capability in some of the case studies later. + + +```python jupyter={"source_hidden": true} +with rasterio.open(LOCAL_PATH) as ds: + print(f'{ds.transform=}') + print(f'{np.abs(ds.transform[0])=}') + print(f'{np.abs(ds.transform[4])=}') +``` + + +The attribute `ds.transform` is an object describing an [*affine transformation*](https://en.wikipedia.org/wiki/Affine_transformation) (represented above as a $2\times3$ matrix). Notice that the absolute values of the diagonal entries of the matrix `ds.transform` give the spatial dimensions of pixels ($30\mathrm{m}\times30\mathrm{m}$ in this case). + +We can also use this object to convert pixel coordinates to the corresponding spatial coordinates. + + +```python jupyter={"source_hidden": true} +with rasterio.open(LOCAL_PATH) as ds: + print(f'{ds.transform * (0,0)=}') # top-left pixel + print(f'{ds.transform * (0,3660)=}') # bottom-left pixel + print(f'{ds.transform * (3660,0)=}') # top-right pixel + print(f'{ds.transform * (3660,3660)=}') # bottom-right pixel +``` + + +The attribute `ds.bounds` displays the bounds of the spatial region (left, bottom, right, top). + + +```python jupyter={"source_hidden": true} +with rasterio.open(LOCAL_PATH) as ds: + print(f'coordinate bounds: {ds.bounds=}') +``` + + +The method `ds.xy` also converts integer index coordinates to continuous coordinates. Notice that `ds.xy` maps integers to the centre of pixels. The loops below print out the first top left corner of the coordinates in pixel coordinates and then the cooresponding spatial coordinates. + + +```python jupyter={"source_hidden": true} +with rasterio.open(LOCAL_PATH) as ds: + for k in range(3): + for l in range(4): + print(f'({k:2d},{l:2d})','\t', end='') + print() + print() + for k in range(3): + for l in range(4): + e,n = ds.xy(k,l) + print(f'({e},{n})','\t', end='') + print() + print() +``` + + +`ds.index` does the reverse: given spatial coordinates `(x,y)`, it returns the integer indices of the pixel that contains that point. + + +```python jupyter={"source_hidden": true} +with rasterio.open(LOCAL_PATH) as ds: + print(ds.index(500000, 4700015)) +``` + + +These conversions can be tricky, particularly because pixel coordinates map to the centres of the pixels and also because the second `y` spatial coordinate *decreases* as the second pixel coordinate *increases*. Keeping track of tedious details like this is partly why loading from `rioxarray` is useful, i.e., this is done for us. But it is worth knowing that we can reconstruct this mapping if needed from meta-data in the GeoTIFF file (we use this fact later). 
+ + +```python jupyter={"source_hidden": true} +with rasterio.open(LOCAL_PATH) as ds: + print(ds.bounds) + print(ds.transform * (0.5,0.5)) # Maps to centre of top left pixel + print(ds.xy(0,0)) # Same as above + print(ds.transform * (0,0)) # Maps to top left corner of top left pixel + print(ds.xy(-0.5,-0.5)) # Same as above + print(ds.transform[0], ds.transform[4]) +``` diff --git a/book/02_Software_Tools/02_Data_Visualization_Tools.md b/book/02_Software_Tools/02_Data_Visualization_Tools.md new file mode 100644 index 0000000..5ea64a7 --- /dev/null +++ b/book/02_Software_Tools/02_Data_Visualization_Tools.md @@ -0,0 +1,364 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.2 + kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +# Data Visualization + + +The primary tools we'll use for plotting come from the [Holoviz](https://holoviz.org/) family of Python libraries, principally [GeoViews](https://geoviews.org/) and [hvPlot](https://hvplot.holoviz.org/). These are largely built on top of [HoloViews](https://holoviews.org/) and support multiple backends for rendering plots (notably [Bokeh](http://bokeh.pydata.org/) for interactive visualization and [Matplotlib](http://matplotlib.org/) for static, publication-quality plots). + + +## [GeoViews](https://geoviews.org/) + + +From the [GeoViews documentation](https://geoviews.org/index.html): + +> GeoViews is a [Python](http://python.org/) library that makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research. +> +> GeoViews is built on the [HoloViews](http://holoviews.org/) library for building flexible visualizations of multidimensional data. GeoViews adds a family of geographic plot types based on the [Cartopy](http://scitools.org.uk/cartopy) library, plotted using either the [Matplotlib](http://matplotlib.org/) or [Bokeh](http://bokeh.pydata.org/) packages. With GeoViews, you can now work easily and naturally with large, multidimensional geographic datasets, instantly visualizing any subset or combination of them, while always being able to access the raw data underlying any plot. + + +```python jupyter={"source_hidden": true} +import warnings +warnings.filterwarnings('ignore') +from pathlib import Path + +import geoviews as gv +gv.extension('bokeh') +from geoviews import opts +``` + +### Displaying a basemap + + +- Principal utility is `gv.tile_sources` +- Use the method `opts` to specify optional settings +- Bokeh menu at right enables interactive exploration + + +```python jupyter={"source_hidden": true} +basemap = gv.tile_sources.OSM.opts(width=600, height=400) +basemap +``` + +### Plotting points + + +To get started, let's define a regular Python tuple for the longitude-latitude coordinates of Tokyo, Japan. + + +```python jupyter={"source_hidden": true} +tokyo_lonlat = (139.692222, 35.689722) +print(tokyo_lonlat) +``` + + ++ `geoviews.Points` accepts a list of tuples (each of the form `(x, y)`) to plot. ++ Use the OpenStreetMap tiles from `gv.tile_sources.OSM` as a basemap. ++ Overlay using the Holoviews operator `*` ++ Define the options using `geoviews.opts` ++ ? find a way to initialize the zoom level sensibly? 
+ + +```python jupyter={"source_hidden": true} +tokyo_point = gv.Points([tokyo_lonlat]) +point_opts = opts.Points( + size=48, + alpha=0.5, + color='red' + ) +``` + +```python jupyter={"source_hidden": true} +# Use Holoviews * operator to overlay plot on basemap +# Note: zoom out to see basemap (starts zoomed "all the way in") +(basemap * tokyo_point).opts(point_opts) +``` + +```python jupyter={"source_hidden": true} +# to avoid starting zoomed all the way in, this zooms "all the way out" +(basemap * tokyo_point).opts(point_opts, opts.Overlay(global_extent=True)) +``` + +### Plotting rectangles + + ++ Standard way to represent rectangle with corners + $$(x_{\mathrm{min}},y_{\mathrm{min}}), (x_{\mathrm{min}},y_{\mathrm{max}}), (x_{\mathrm{max}},y_{\mathrm{min}}), (x_{\mathrm{max}},y_{\mathrm{max}})$$ + (assuming $x_{\mathrm{max}}>x_{\mathrm{min}}$ & $y_{\mathrm{max}}>y_{\mathrm{min}}$) is as a single 4-tuple + $$(x_{\mathrm{min}},y_{\mathrm{min}},x_{\mathrm{max}},y_{\mathrm{max}}),$$ + i.e., the lower,left corner coordinates followed by the upper, right corner coordinates. + + +```python jupyter={"source_hidden": true} +# simple utility to make a rectangle of "half-width" dx & "half-height" dy & centred pt +def bounds(pt,dx,dy): + '''Returns rectangle represented as tuple (x_lo, y_lo, x_hi, y_hi) + given inputs pt=(x, y), half-width & half-height dx & dy respectively, + where x_lo = x-dx, x_hi=x+dx, y_lo = y-dy, y_hi = y+dy. + ''' + return tuple(coord+sgn*delta for sgn in (-1,+1) for coord,delta in zip(pt, (dx,dy))) +``` + +```python jupyter={"source_hidden": true} +# Verify that the function bounds works as intended +marrakesh_lonlat = (-7.93, 31.67) +dlon, dlat = 0.25, 0.25 +marrakesh_rect = bounds(marrakesh_lonlat, dlon, dlat) +print(marrakesh_rect) +``` + + ++ `geoviews.Rectangles` accepts a list of bounding boxes (each described by a tuple of the form `(x_min, y_min, x_max, y_max)`) to plot. + + +```python jupyter={"source_hidden": true} +rect_opts = opts.Rectangles( + line_width=0, + alpha=0.25, + color='red' + ) +``` + +```python jupyter={"source_hidden": true} +rectangle = gv.Rectangles([marrakesh_rect]) +(basemap * rectangle).opts( rect_opts ) +``` + + +We'll use the approach above to visualize *Areas of Interest (AOIs)* when constructing search queries for NASA EarthData products. In particular, the convention of representing a bounding box by (left, lower, right, upper) ordinates is also used in the [PySTAC](https://pystac.readthedocs.io/en/stable/) API. + +--- + + +## [hvPlot](https://hvplot.holoviz.org/) + + ++ [hvPlot](https://hvplot.holoviz.org/) is designed to extend the `.plot` API from Pandas DataFrames. ++ It works for Pandas DataFrames and Xarray DataArrays/Datasets. + + +### Plotting from a DataFrame with hvplot.pandas + + +The code below loads a Pandas DataFrame of temperature data. + + +```python jupyter={"source_hidden": true} +import pandas as pd, numpy as np +from pathlib import Path +LOCAL_PATH = Path().cwd() / '..' / 'assets' / 'temperature.csv' +``` + +```python jupyter={"source_hidden": true} +df = pd.read_csv(LOCAL_PATH, index_col=0, parse_dates=[0]) +df.head() +``` + +#### Reviewing the Pandas DataFrame.plot API + + +Let's extract a subset of columns from this DataFrame and generate a plot. + + +```python jupyter={"source_hidden": true} +west_coast = df[['Vancouver', 'Portland', 'San Francisco', 'Seattle', 'Los Angeles']] +west_coast.head() +``` + + +The Pandas DataFrame `.plot` API provides access to a number of plotting methods. 
Here, we'll use `.plot.line`, but a range of other options are available (e.g., `.plot.area`, `.plot.bar`, `.plot.hist`, `.plot.scatter`, etc.). This API has been repeated in several libraries due to its convenience. + + +```python jupyter={"source_hidden": true} +west_coast.plot.line(); # This produces a static Matplotlib plot +``` + +#### Using the hvPlot DataFrame.hvplot API + + +By importing `hvplot.pandas`, a similar interactive plot can be generated. The API for `.hvplot` mimics that for `.plot`. For instance, we can generate the line plot above using `.hvplot.line`. In this case, the default plotting backend is Bokeh, so the plot is *interactive*. + + +```python jupyter={"source_hidden": true} +import hvplot.pandas +west_coast.hvplot.line() # This produces an interactive Bokeh plot +``` + + +It is possible to modify the plot to make it cleaner. + + +```python jupyter={"source_hidden": true} +west_coast.hvplot.line(width=600, height=300, grid=True) +``` + + +The `hvplot` API also works when chained together with other DataFrame method calls. For instance, we can resample the temperature data and compute a mean to smooth it out. + + +```python jupyter={"source_hidden": true} +smoothed = west_coast.resample('2d').mean() +smoothed.hvplot.line(width=600, height=300, grid=True) +``` + +### Plotting from a DataArray with hvplot.xarray + + +The Pandas `.plot` API was also extended to Xarray, i.e., for Xarray `DataArray`. + + +```python jupyter={"source_hidden": true} +import xarray as xr +import hvplot.xarray +import rioxarray as rio +``` + + +To start, load a local GeoTIFF file using `rioxarray` into an Zarray `DataArray` structure. + + +```python jupyter={"source_hidden": true} +LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-ANOM-MAX.tif' +``` + +```python jupyter={"source_hidden": true} +data = rio.open_rasterio(LOCAL_PATH) +data +``` + + +We do some minor reformatting to the `DataArray`. + + +```python jupyter={"source_hidden": true} +data = data.squeeze() # to reduce 3D array with singleton dimension to 2D array +data = data.rename({'x':'easting', 'y':'northing'}) +data +``` + +#### Reviewing the Xarray DataArray.plot API + + +The `DataArray.plot` API by default uses Matplotlib's `pcolormesh` to display a 2D array stored within a `DataArray`. This takes a little time to render for this moderately high-resolution image. + + +```python jupyter={"source_hidden": true} +data.plot(); # by default, uses pcolormesh +``` + +#### Using the hvPlot DataArray.hvplot API + + +Again, the `DataArray.hvplot` API mimics the `DataArray.plot` API; by default, it uses a subclass derived from `holoviews.element.raster.Image`. + + +```python jupyter={"source_hidden": true} +plot = data.hvplot() # by default uses Image class +print(f'{type(plot)=}') +plot +``` + + +The result above is an interactive plot, rendered using Bokeh.It's a bit slow, but we can add some options to speed up rendering. It also requires cleaning up; for instance, the image is not square, the colormap does not highlight useful features, the axes are transposed, and so on. + + +#### Building up options incrementally to improve plots + + +Let's add options to improve the image. To do this, we'll initialize a Python dictionary `image_opts` to use within the call to the `image` method. We'll use the `dict.update` method to add key-value pairs to the dictionary incrementally, each time improving the output image. 
+ + +```python +image_opts = dict(rasterize=True, dynamic=True) +``` + + +To start, let's make the call to `hvplot.image` explicit & specify the sequence of axes. & apply the options from the dictionary `image_opts`. We'll use the dict-unpacking operation `**image_opts` each time we invoke `data.hvplot.image`. + + +```python jupyter={"source_hidden": true} +plot = data.hvplot.image(x='easting', y='northing', **image_opts) +plot +``` + + +Next, let's fix the aspect ratio and image dimensions. + + +```python jupyter={"source_hidden": true} +image_opts.update(frame_width=500, frame_height=500, aspect='equal') + +plot = data.hvplot.image(x='easting', y='northing', **image_opts) +plot +``` + + +The image colormap is a bit unpleasant; let's change it and use transparency (`alpha`). + + +```python jupyter={"source_hidden": true} +image_opts.update( cmap='hot_r', clim=(0,100), alpha=0.8 ) +plot = data.hvplot.image(x='easting', y='northing', **image_opts) +plot +``` + + +Before adding a basemap, we need to account for the coordinate system. This is stored in the GeoTIFF file and, when read using `rioxarray.open_rasterio`, it is available by using the attribute `data.rio.crs`. + + +```python jupyter={"source_hidden": true} +crs = data.rio.crs +crs +``` + + +We can use the CRS recovered above as an optional argument to `hvplot.image`. Notice the coordinates have changed on the axes, but the labels are wrong. We can fix that shortly. + + +```python jupyter={"source_hidden": true} +image_opts.update(crs=crs) + +plot = data.hvplot.image(x='easting', y='northing', **image_opts) +plot +``` + +Let's now fix the labels. We'll use the Holoviews/GeoViews `opts` system to specify these options. + +```python jupyter={"source_hidden": true} +label_opts = dict(title='VEG_ANOM_MAX', xlabel='Longitude (degrees)', ylabel='Latitude (degrees)') +plot = data.hvplot.image(x='easting', y='northing', **image_opts).opts(**label_opts) +plot +``` + + +Let's overlay the image on a basemap so we can see the terrain underneath. + + +```python jupyter={"source_hidden": true} +base = gv.tile_sources.ESRI +base * plot +``` + + +Finally, the white pixels are distracting; let's filter them out using the `DataArray` method `where`. + + +```python jupyter={"source_hidden": true} +plot = data.where(data>0).hvplot.image(x='easting', y='northing', **image_opts).opts(**label_opts) +plot * base +``` + + +In this notebook, we applied some common strategies to generate plots. We'll use these extensively in the rest of the tutorial. 
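One last convenience worth noting: an interactive plot like the one above can be exported to a standalone HTML file so it can be shared outside a Jupyter session. This is a minimal sketch (not part of the workflow above) assuming the `plot` and `base` objects from the preceding cells; the output file name is arbitrary, and `holoviews.save` would work equally well.

```python
import hvplot
# Write the overlaid image + basemap to a self-contained HTML file;
# the saved plot can be panned/zoomed in any web browser without running Python.
hvplot.save(plot * base, 'veg_anom_max_plot.html')
```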
+ From 9df869eb577f708ebc8f50cdb2a434c47618a3f7 Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Fri, 13 Sep 2024 06:48:28 -0700 Subject: [PATCH 04/10] Cleaning up outdated files --- .../01_Coordinate_systems.md | 40 ---------- .../02_Geospatial_data | 46 ----------- .../remote-sensing.md | 3 - .../01_Geospatial_data_formats.md | 79 ------------------- .../geographic_data_formats.md | 76 ------------------ .../2_Selecting_an_AOI.md | 0 6 files changed, 244 deletions(-) delete mode 100644 book/02_Geospatial_fundamentals/01_Coordinate_systems.md delete mode 100644 book/02_Geospatial_fundamentals/02_Geospatial_data delete mode 100644 book/02_Geospatial_fundamentals/remote-sensing.md delete mode 100644 book/03_Geospatial_data_files/01_Geospatial_data_formats.md delete mode 100644 book/03_Geospatial_data_files/geographic_data_formats.md rename book/{02_Geospatial_fundamentals => 09_Scratch}/2_Selecting_an_AOI.md (100%) diff --git a/book/02_Geospatial_fundamentals/01_Coordinate_systems.md b/book/02_Geospatial_fundamentals/01_Coordinate_systems.md deleted file mode 100644 index 066f788..0000000 --- a/book/02_Geospatial_fundamentals/01_Coordinate_systems.md +++ /dev/null @@ -1,40 +0,0 @@ -The following summary was made based on the [QGIS documentation](https://docs.qgis.org/3.34/en/docs/gentle_gis_introduction/coordinate_reference_systems.html) and [Victor Olaya´s book](https://volaya.github.io/libro-sig/). - -# Coordinate reference system - -With the help of coordinate reference systems (CRS) every place on the earth can be specified by a set of three numbers, called coordinates. In general, CRS can be divided into geographic coordinate reference systems and projected coordinate reference systems (also called Cartesian or rectangular coordinate reference systems). - -## Geographic Coordinate Systems -The geographic coordinate system is a spherical coordinate system by which a point is located with two angular values: - -- Lines of **latitude** run parallel to the equator and divide the earth into 180 equally spaced sections from North to South (or South to North). The reference line for latitude is the equator and each hemisphere is divided into ninety sections, each representing one degree of latitude. In the northern hemisphere, degrees of latitude are measured from zero at the equator to ninety at the north pole. In the southern hemisphere, degrees of latitude are measured from zero at the equator to ninety degrees at the south pole. To simplify the digitization of maps, degrees of latitude in the southern hemisphere are often assigned negative values (0 to -90°). Wherever you are on the earth’s surface, the distance between the lines of latitude is the same (60 nautical miles). - -- Lines of **longitude** run perpendicular to the equator and converge at the poles. The reference line for longitude (the prime meridian) runs from the North pole to the South pole through Greenwich, England. Subsequent lines of longitude are measured from zero to 180 degrees East or West of the prime meridian. Note that values West of the prime meridian are assigned negative values for use in digital mapping applications. At the equator, and only at the equator, the distance represented by one line of longitude is equal to the distance represented by one degree of latitude. 
As you move towards the poles, the distance between lines of longitude becomes progressively less, until, at the exact location of the pole, all 360° of longitude are represented by a single point that you could put your finger on (you probably would want to wear gloves though). - - -![geographic_crs](../assets/geographic_crs.png) - -

Geographic coordinate system with lines of latitude parallel to the equator and lines of longitude with the prime meridian through Greenwich. Source: QGIS Documentation. -

- - -Using the geographic coordinate system, we have a grid of lines dividing the earth into squares that cover approximately 12363.365 square kilometers at the equator — a good start, but not very useful for determining the location of anything within that square. To be truly useful, a map grid must be divided into small enough sections so that they can be used to describe (with an acceptable level of accuracy) the location of a point on the map. To accomplish this, degrees are divided into minutes (') and seconds ("). There are sixty minutes in a degree, and sixty seconds in a minute (3600 seconds in a degree). So, at the equator, one second of latitude or longitude = 30.87624 meters. - -## Projected coordinate reference systems -A two-dimensional coordinate reference system is commonly defined by two axes. At right angles to each other, they form a so called XY-plane. The horizontal axis is normally labelled X, and the vertical axis is normally labelled Y. In a three-dimensional coordinate reference system, another axis, normally labelled Z, is added. It is also at right angles to the X and Y axes. The Z axis provides the third dimension of space. Every point that is expressed in spherical coordinates can be expressed as an X Y Z coordinate. - -A projected coordinate reference system in the southern hemisphere (south of the equator) normally has its origin on the equator at a specific Longitude. This means that the Y-values increase southwards and the X-values increase to the West. In the northern hemisphere (north of the equator) the origin is also the equator at a specific Longitude. However, now the Y-values increase northwards and the X-values increase to the East. - -### UTM - -The Universal Transverse Mercator (UTM) coordinate reference system has its origin on the equator at a specific Longitude. Now the Y-values increase southwards and the X-values increase to the West. The UTM CRS is a global map projection. This means it is generally used all over the world. To avoid too much distortion, the world is divided into 60 equal zones that are all 6 degrees wide in longitude from East to West. The UTM zones are numbered 1 to 60, starting at the antimeridian (zone 1 at 180 degrees West longitude) and progressing East back to the antemeridian (zone 60 at 180 degrees East longitude). - -![utm_zones](../assets/utm_zones.png) - -

The Universal Transverse Mercator zones. Source: QGIS Documentation. -

- - -The position of a coordinate in UTM south of the equator must be indicated with the zone number and with its northing (Y) value and easting (X) value in meters. The northing value is the distance of the position from the equator in meters. The easting value is the distance from the central meridian (longitude) of the used UTM zone. Furthermore, because we are south of the equator and negative values are not allowed in the UTM coordinate reference system, we have to add a so-called false northing value of 10,000,000 m to the northing (Y) value and a false easting value of 500,000 m to the easting (X) value. - -For more details, you can refer to the [QGIS Documentation](https://docs.qgis.org/3.34/en/docs/gentle_gis_introduction/coordinate_reference_systems.html). diff --git a/book/02_Geospatial_fundamentals/02_Geospatial_data b/book/02_Geospatial_fundamentals/02_Geospatial_data deleted file mode 100644 index 3e5ee8d..0000000 --- a/book/02_Geospatial_fundamentals/02_Geospatial_data +++ /dev/null @@ -1,46 +0,0 @@ -# Geospatial data - -## Vector data -Vector data is split into three types: point, line, and polygon data.^1^ - -![points-lines-polygons-vector-data-types](../assets/points-lines-polygons-vector-data-types.png) - -

This image shows the three vector types: points, lines and polygons. Source: National Ecological Observatory Network. -

- -**Point data** -Point data is most commonly used to represent nonadjacent features and discrete data points. Points have zero dimensions, therefore you can measure neither length or area with this dataset.^1^ - -![points](../assets/points.png) - -

This image shows an example of a spatial entity (left) and its point vector representation (right). Source: " Sistemas de Información Geográfica" by Victor Olaya. -

- -**Line Data** -Line data is used to represent linear features. Common examples would be rivers, trails, and streets. Line features only have one dimension and therefore can only be used to measure length. Line features have a starting and ending point. Common examples would be road centerlines and hydrology.^1^ - -![lines](../assets/lines.png) -

This image shows an example of a spatial entity (left) and its line vector representation (right). Source: " Sistemas de Información Geográfica" by Victor Olaya. -

- - -**Polygon data** -Polygons are used to represent areas such as the boundary of a city (on a large-scale map), lake, or forest. Polygon features are two-dimensional and therefore can be used to measure the area and perimeter of a geographic feature.^1^ -![polygon](../assets/polygon.png) - -

This image shows an example of a spatial entity (left) and its point vector representation (right). Source: " Sistemas de Información Geográfica" by Victor Olaya. -

- - -## Raster data -Raster data represents geographic data as a matrix of cells containing an attribute value. While the area of different polygon shapes in a data set can differ, each cell in a raster data set is the same cell. The size of the area in the real world that each cell represents is known as the spatial resolution.^1^ - -![raster_neon](../assets/raster_neon.png) - -

This image shows an example of raster data. Source: National Ecological Observatory Network (NEON). -

- - -## References - -1. https://www.geographyrealm.com/geodatabases-explored-vector-and-raster-data/ diff --git a/book/02_Geospatial_fundamentals/remote-sensing.md b/book/02_Geospatial_fundamentals/remote-sensing.md deleted file mode 100644 index 0f7539f..0000000 --- a/book/02_Geospatial_fundamentals/remote-sensing.md +++ /dev/null @@ -1,3 +0,0 @@ -# Remote Sensing - -![](../assets/climate_risks_remote_sensing_filled.png) diff --git a/book/03_Geospatial_data_files/01_Geospatial_data_formats.md b/book/03_Geospatial_data_files/01_Geospatial_data_formats.md deleted file mode 100644 index 8f7559a..0000000 --- a/book/03_Geospatial_data_files/01_Geospatial_data_formats.md +++ /dev/null @@ -1,79 +0,0 @@ -# Geospatial data formats - -## Vector data formats - -In this tutorial, we will be using the following vector data formats: - -### 1. SHP - -Shapefile is the most widely known format for distributing geospatial data. It is a standard first developed by ESRI almost 30 years ago, which is considered ancient in software development.1 - -The shapefiles consist of several files some of them are mandatory and others optional. The mandatory file extensions needed for a shapefile are .shp, .shx, and .dbf. But the optional files are .prj, .xml, .sbn, and .sbx. 2 This makes operating Shapefiles slightly clunky and confusing. However, Shapefile has been around for so long that any GIS software supports handling it.1 - -In the following, we show a list and a brief explanation of all the files that make up a shapefile, including shp, shx, dbf, prj, xml,sbn, sbx, and cpg based on 2. - -**Mandatory files** -- Main File (.shp): shp is a mandatory Esri file that gives features their geometry. Every shapefile has its own .shp file that represents spatial vector data. For example, it could be points, lines, and polygons in a map. - -- Index File (.shx): shx is a mandatory Esri and AutoCAD shape index positions. This type of file is used to search forward and backward. - -- dBASE File (.dbf): dbf is a standard database file used to store attribute data and object IDs. A .dbf file is mandatory for shape files. You can open DBF files in Microsoft Access or Excel. - -**Optional files** -- Projection File (.prj): prj is an optional file that contains the metadata associated with the shapefiles coordinate and projection system. If this file does not exist, you will get the error “unknown coordinate system”. If you want to fix this error, you have to use the “define projection” tool which generates .prj files. - -- Extensible Markup Language File (.xml): xml file types contain the metadata associated with the shapefile. If you delete this file, you essentially delete your metadata. You can open and edit this optional file type (.xml) in any text editor. - -- Spatial Index File (.sbn): sbn files are optional spatial index files that optimize spatial queries. This file type is saved together with a .sbx file. These two files make up a shape index to speed up spatial queries. - -- Spatial Index File (.sbx): sbx files are similar to .sbn files which speed up loading times. It works with .sbn files to optimize spatial queries. We tested .sbn and .sbx extensions and found that there were faster load times when these files existed. It was 6 seconds faster (27.3 sec versus 33.3 sec) compared with/without .sbn and .sbx files. - -- Code Page File (.cpg): cpg files are optional plain text files that describe the encoding applied to create the shapefile. 
If your shapefile doesn’t have a .cpg file, then it has the system´s default encoding. - -Internally, Shapefile uses Well-known binary (WKB) for encoding the geometries. 4 This is a compact format that is based on tabular thinking, i.e. the row and column number of a value is significant. A minor nuisance is the limitation of the attribute field names to 10 characters and poor Unicode support, so some abbreviations and forcing to ASCII may have to be used.^3^ - -For more details about the shapefiles you can refer to the [ESRI Shapefile Technical Description](https://www.esri.com/content/dam/esrisites/sitecore-archive/Files/Pdfs/library/whitepapers/pdfs/shapefile.pdf) - -### 2. GeoJSON - -GeoJSON is a subset of [JSON (JavaScript object notation)](https://www.json.org). It was developed 10 years ago by a group of enthusiastic GIS developers. The core idea is to provide a specification for encoding geospatial data while remaining decodable by any JSON decoder.3 - -![json_example](../assets/json_example.PNG) - -

This image shows an example of a GeoJson file. Source: https://gist.githubusercontent.com/wavded. -

- - -Being a subset of the immensely popular JSON, the parsing support is on a different level than with Shapefile. In addition to support from most GIS software, any web developer will be able to write a custom GeoJSON parser, opening new possibilities for integrating the data.3 - -Being designed as one blob of data instead of a small "text database" means it is simpler to handle but is essentially designed to be loaded to memory in full at once.^3^ - -For more information about GeoJSON data formats, you can refer to [https://geojson.org/](https://geojson.org/). - -## Raster data formats - - -In this tutorial, we will be using the following raster data format: - -### 1. GeoTiffs - -[GeoTIFF](https://en.wikipedia.org/wiki/GeoTIFF) is a public domain metadata standard that enables georeferencing information to be embedded within a [TIFF](https://en.wikipedia.org/wiki/TIFF) file. The GeoTIFF format embeds geospatial metadata into image files such as aerial photography, satellite imagery, and digitized maps so that they can be used in GIS applications.3 - -A GeoTIFF file extension contains geographic metadata that describes the actual location in space that each pixel in an image represents. In creating a GeoTIFF file, spatial information is included in the .tif file as embedded tags, which can include raster image metadata such as: -* horizontal and vertical datums -* spatial extent, i.e. the area that the dataset covers -* the coordinate reference system (CRS) used to store the data -* spatial resolution, measured in the number of independent pixel values per unit length -* the number of layers in the .tif file -* ellipsoids and geoids - estimated models of the Earth’s shape -* mathematical rules for map projection to transform data for a three-dimensional space into a two-dimensional display. - -Geodata is drawn from vector formats on a map, and the geodata is converted to the specified output projection of the map if the projection in the source file differs. Some of the vector and raster formats typically supported by a GeoTIFF online viewer include asc, gml, gpx, json, kml, kmz, mid, mif, osm, tif, tab, map, id, dat, gdbtable, and gdbtablx.3 - -For more details about GeoTiff data format, you can refer to [OGC GeoTIFF Standard](https://www.ogc.org/standard/geotiff/). - -## References -1. https://feed.terramonitor.com/shapefile-vs-geopackage-vs-geojson/ -2. https://gisgeography.com/arcgis-shapefile-files-types-extensions/ -3. https://www.heavy.ai/technical-glossary/geotiff -4. https://libgeos.org/specifications/wkb/ diff --git a/book/03_Geospatial_data_files/geographic_data_formats.md b/book/03_Geospatial_data_files/geographic_data_formats.md deleted file mode 100644 index a664a10..0000000 --- a/book/03_Geospatial_data_files/geographic_data_formats.md +++ /dev/null @@ -1,76 +0,0 @@ -# Geographic data format - -## Vector data -Vector data is split into three types: point, line, and polygon data.^1^ - -**Point data** -Point data is most commonly used to represent nonadjacent features and to represent discrete data points. Points have zero dimensions, therefore you can measure neither length or area with this dataset.^1^ - -**Line Data** -Line data is used to represent linear features. Common examples would be rivers, trails, and streets. Line features only have one dimension and therefore can only be used to measure length. Line features have a starting and ending point. 
Common examples would be road centerlines and hydrology.^1^ - -**Polygon data** -Polygons are used to represent areas such as the boundary of a city (on a large scale map), lake, or forest. Polygon features are two dimensional and therefore can be used to measure the area and perimeter of a geographic feature.^1^ - -![points-lines-polygons-vector-data-types](../assets/points-lines-polygons-vector-data-types.png) - -

This image shows the three vector types: points, lines and polygons. Source: National Ecological Observatory Network. -

- - - -In this tutorial we will be using the following vector data formats: - -### 1. SHP - -Shapefile is the most widely known format for distributing geospatial data. It is a standard first developed by ESRI almost 30 years ago, which is considered ancient in software development.^3^ - -The Shapefile in fact consists of several files: in addition to one file with the actual geometry data, another file for defining the coordinate reference system is needed, as well as a file for defining the attributes and a file to index the geometries. This makes operating Shapefiles slightly clunky and confusing. However, Shapefile has been around for so long that any GIS software supports handling it.^3^ - -Internally, Shapefile uses Well-known binary (WKB) for encoding the geometries. This is a compact format which is based on tabular thinking, i.e. the row and column number of a value is significant. A minor nuisance is the limitation of the attribute field names to 10 characters and poor Unicode support, so some abbreviations and forcing to ASCII may have to be used.^3^ - -For more details about the shapefiles you can refer to the [ESRI Shapefile Technical Description](https://www.esri.com/content/dam/esrisites/sitecore-archive/Files/Pdfs/library/whitepapers/pdfs/shapefile.pdf) - -### 2. GeoJSON - -GeoJSON is a subset of JSON (JavaScript object notation). It was developed 10 years ago by a group of enthusiastic GIS developers. The core idea is to provide a specification for encoding geospatial data while remaining decodable by any JSON decoder.^3^ - -Being a subset of the immensely popular JSON, the parsing support is on a different level than with Shapefile. In addition to support from most GIS software, any web developer will be able to write a custom GeoJSON parser, opening new possibilites for integrating the data.^3^ - -Being designed as one blob of data instead of a small "text database" means it is simpler to handle but is essentially designed to be loaded to memory in full at once.^3^ - -## Raster data -Raster data represents geographic data as a matrix of cells that each contains an attribute value. While the area of different polygon shapes in a data set can differ, each cell in a raster data set is the same cell. The size of the area in the real world that each cell represents is known as the spatial resolution.^1^ - -In this tutorial we will be using the following raster data format: - -### 1. GeoTiffs - -GeoTIFF is a public domain metadata standard that enables georeferencing information to be embedded within an image file. The GeoTIFF format embeds geospatial metadata into image files such as aerial photography, satellite imagery, and digitized maps so that they can be used in GIS applications.^2^ - -A GeoTIFF file extension contains geographic metadata that describes the actual location in space that each pixel in an image represents. In creating a GeoTIFF file, spatial information is included in the .tif file as embedded tags, which can include raster image metadata such as: -* horizontal and vertical datums -* spatial extent, i.e. the area that the dataset covers -* the coordinate reference system (CRS) used to store the data -* spatial resolution, measured in the number of independent pixel values per unit length -* the number of layers in the .tif file -* ellipsoids and geoids - estimated models of the Earth’s shape -* mathematical rules for map projection to transform data for a three-dimensional space into a two-dimensional display. 
- -Geodata is drawn from vector formats on a map, and the geodata is converted to the specified output projection of the map if the projection in the source file differs. Some of the vector and raster formats typically supported by a GeoTIFF online viewer include: asc, gml, gpx, json, kml, kmz, mid, mif, osm, tif, tab, map, id, dat, gdbtable, and gdbtablx.^2^ - - -## A Note on the Ordering of Coordinates in Python GIS Libraries - -In Python, 2D raster data are represented as matrices. Typically, the x-dimension of the array corresponds to `easting` or `longitude`, while the y-dimension corresponds to `northing` or `latitude`. However, the convention for listing the dimensions of matrices is to list the number of rows first and columns second. This means a matrix with dimensions (n, m) has `n` rows (latitude bins) and `m` columns (longitude bins). This convention is reflected when querying the shape of a `numpy` array using the `shape` attribute. - -On the other hand, vector shapes, such as Shapely `Points`, follow the (longitude, latitude) notation. For example, the coordinates for Livingston, TX, are specified as `livingston_tx = Point(-95.09, 30.69)`. Similarly, `Polygon` bounds are specified in the order `(longitude_0, latitude_0, longitude_1, latitude_1)`. - -It is important to keep these coordinate orderings in mind when working with geospatial data, as they can easily become a source of confusion or errors. - - -## References - -1. https://www.geographyrealm.com/geodatabases-explored-vector-and-raster-data/ -3. https://www.heavy.ai/technical-glossary/geotiff -4. https://feed.terramonitor.com/shapefile-vs-geopackage-vs-geojson/ diff --git a/book/02_Geospatial_fundamentals/2_Selecting_an_AOI.md b/book/09_Scratch/2_Selecting_an_AOI.md similarity index 100% rename from book/02_Geospatial_fundamentals/2_Selecting_an_AOI.md rename to book/09_Scratch/2_Selecting_an_AOI.md From 16829b11a754d8c16c52ce263835291a14a22592 Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Mon, 16 Sep 2024 06:28:59 -0700 Subject: [PATCH 05/10] Adds/updates 03_Using_NASA_EarthData --- .../01_Using_OPERA_DIST_Products.md | 255 ++++++++++++++ .../02_Using_OPERA_DSWx_Products.md | 325 ++++++++++++++++++ 03_Using_NASA_EarthData/03_Using_PySTAC.md | 240 +++++++++++++ 3 files changed, 820 insertions(+) create mode 100644 03_Using_NASA_EarthData/01_Using_OPERA_DIST_Products.md create mode 100644 03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md create mode 100644 03_Using_NASA_EarthData/03_Using_PySTAC.md diff --git a/03_Using_NASA_EarthData/01_Using_OPERA_DIST_Products.md b/03_Using_NASA_EarthData/01_Using_OPERA_DIST_Products.md new file mode 100644 index 0000000..32e4943 --- /dev/null +++ b/03_Using_NASA_EarthData/01_Using_OPERA_DIST_Products.md @@ -0,0 +1,255 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.2 + kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +# Using OPERA DIST Products + + +## The OPERA project + + + +From the [OPERA (Observational Products for End-Users from Remote Sensing Analysis)](https://www.jpl.nasa.gov/go/opera) project: + +>Started in April 2021, the Observational Products for End-Users from Remote Sensing Analysis (OPERA) project at the Jet Propulsion Laboratory collects data from satellite radar and optical instruments to generate six product suites: +> +> + a near-global Surface Water Extent product suite +> + a near-global Surface Disturbance 
product suite
> + a near-global Radiometric Terrain Corrected product
> + a North America Coregistered Single Look complex product suite
> + a North America Displacement product suite
> + a North America Vertical Land Motion product suite

That is, OPERA is a NASA initiative that takes, e.g., optical or radar remote-sensing data gathered from satellites and produces a variety of pre-processed data sets for public use. OPERA products are not raw satellite images; they are the result of algorithmic classification to determine, e.g., which land regions contain water or where vegetation has been displaced. The raw satellite images are collected from measurements made by the instruments onboard the Sentinel-1 A/B, Sentinel-2 A/B, and Landsat-8/9 satellite missions (hence the term *HLS* for "*Harmonized Landsat-Sentinel*" in numerous product descriptions).


---


## The OPERA Land Surface Disturbance (DIST) product


One of these OPERA data products is the *Land Surface Disturbance (DIST)* product (more fully described in the [OPERA DIST HLS product specification](https://lpdaac.usgs.gov/documents/1766/OPERA_DIST_HLS_Product_Specification_V1.pdf)). The DIST product maps per-pixel vegetation disturbance (specifically, vegetation cover loss) from Harmonized Landsat-8 and Sentinel-2 A/B (HLS) scenes. One application of this data is to quantify damage due to wildfires in forests. Vegetation disturbance is mapped when there is an indicated decrease in vegetation cover within an HLS pixel. The DIST_ALERT product is released at regular intervals (the same cadence as HLS imagery, roughly every 12 days over a given tile/region); the DIST_ANN product summarizes disturbance measurements over a year.

The DIST products are derived from surface reflectance (SR) data acquired by the Operational Land Imager (OLI) aboard the Landsat-8 remote-sensing satellite and the Multi-Spectral Instrument (MSI) aboard the Sentinel-2 A/B remote-sensing satellites. The HLS products used to generate DIST data are distributed over tiles in projected map coordinates aligned with the [Military Grid Reference System (MGRS)](https://en.wikipedia.org/wiki/Military_Grid_Reference_System). Each tile covers 109.8 km × 109.8 km, divided into 3660 rows and 3660 columns at 30 meter pixel spacing, with tiles overlapping their neighbors by 4900 meters in each direction (the details are fully described in the [product specification](https://lpdaac.usgs.gov/documents/1766/OPERA_DIST_HLS_Product_Specification_V1.pdf)).

The OPERA DIST products are distributed as [Cloud Optimized GeoTIFFs](https://www.cogeo.org/); in practice, this means that different bands are stored in distinct TIFF files. The TIFF specification does permit storage of multidimensional arrays in a single file; storing distinct *bands* in distinct TIFF files allows files to be downloaded independently.


---


### Band 1: Maximum Vegetation Loss Anomaly Value (VEG_ANOM_MAX)



Let's examine a local file with an example of DIST-ALERT data. The file contains the first band: the *maximum vegetation loss anomaly*. For each pixel, this is a value between 0% and 100% representing the percentage difference between current observed vegetation cover and a historical reference value. That is, a value of 100 corresponds to complete loss of vegetation within a pixel and a value of 0 corresponds to no loss of vegetation. The pixel values are stored as 8-bit unsigned integers (UInt8) because the pixel values need only range between 0 and 100.
A pixel value of 255 indicates missing data, namely that the HLS data was unable to distill a maximum vegetation anomaly value for that pixel. Of course, using 8-bit unsigned integer data is a lot more efficient for storage and for transmitting data across a network (as compared to, e.g., 32- or 64-bit floating-point data). + + + +Let's begin by importing the required libraries. Notice we're also pulling in some classes from the Bokeh library to make interactive plots a little nicer. + + +```python editable=true jupyter={"source_hidden": true} slideshow={"slide_type": ""} +# Notebook dependencies +import warnings +warnings.filterwarnings('ignore') +from pathlib import Path +import rioxarray as rio +import hvplot.xarray +from bokeh.models import FixedTicker +import geoviews as gv +gv.extension('bokeh') +``` + + +We'll read the data from a local file and load it into an Xarray `DataArray` using `rioxarray.open_rasterio`. We'll do some relabelling to label the coordinates appropriately and we'll extract the CRS (coordinate reference system). + + +```python jupyter={"source_hidden": true} +LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-ANOM-MAX.tif' +data = rio.open_rasterio(LOCAL_PATH) +crs = data.rio.crs +data = data.rename({'x':'easting', 'y':'northing', 'band':'band'}).squeeze() +``` + +```python jupyter={"source_hidden": true} +data +``` + +```python jupyter={"source_hidden": true} +crs +``` + + +Before generating a plot, let's create a basemap using [ESRI](https://en.wikipedia.org/wiki/Esri) tiles. + + +```python jupyter={"source_hidden": true} +# Creates basemap +base = gv.tile_sources.ESRI.opts(width=1000, height=1000, padding=0.1) +``` + + +We'll also use dictionaries to capture the bulk of the plotting options we'll use in conjunction with `.hvplot.image` later on. + + +```python jupyter={"source_hidden": true} +image_opts = dict( + x='easting', + y='northing', + rasterize=True, + dynamic=True, + frame_width=500, + frame_height=500, + aspect='equal', + cmap='hot_r', + clim=(0, 100), + alpha=0.8 + ) +layout_opts = dict( + xlabel='Longitude', + ylabel='Latitude' + ) +``` + + +Finally, we'll use the `DataArray.where` method to filter out missing pixels and the pixels that saw no change in vegetation, and we'll modify the options in `image_opts` and `layout_opts` to values particular to this dataset. + + +```python jupyter={"source_hidden": true} +veg_anom_max = data.where((data>0) & (data!=255)) +image_opts.update(crs=data.rio.crs) +layout_opts.update(title=f"VEG_ANOM_MAX") +``` + + +This allows us to generate a meaningful plot. + + +```python jupyter={"source_hidden": true} +veg_anom_max.hvplot.image(**image_opts).opts(**layout_opts) * base +``` + + +In the resulting plot, the white and yellow pixels correspond to regions in which some deforestation has occurred, but not much. By contrast, the dark and black pixels correspond to regions that have lost almost all vegetation. + + +--- + + +### Band 2: Date of Initial Vegetation Disturbance (VEG_DIST_DATE) + + +The DIST-ALERT products contain several bands (as summarized in the product specification). The second band we'll examine is the *date of initial vegetation disturbance* within the last year. This is stored as a 16-bit integer (Int16). + +Let's use a local data file again, namely `OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-DIST-DATE.tif`. 
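As an aside, the long granule name itself encodes quite a bit of metadata (product family, MGRS tile, acquisition and production timestamps, sensor, and layer). As a rough illustration only, and assuming nothing beyond the underscore-delimited layout visible in the name above (the authoritative description is the product specification), we could pull out the tile ID and acquisition time like this:

```python
from datetime import datetime

granule = 'OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-DIST-DATE'
fields = granule.split('_')
tile_id = fields[3]                                        # MGRS tile, e.g. 'T10TEM'
acquired = datetime.strptime(fields[4], '%Y%m%dT%H%M%SZ')  # acquisition timestamp (UTC)
print(tile_id, acquired.isoformat())
```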
The product specification describes the file-naming conventions used; notable here is the *acquisition date & time* `20220815T185931`, i.e., almost 7pm (UTC) on August 15th, 2022.

We'll load and relabel the `DataArray` as before.


```python jupyter={"source_hidden": true}
LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-DIST-DATE.tif'
data = rio.open_rasterio(LOCAL_PATH)
data = data.rename({'x':'easting', 'y':'northing', 'band':'band'}).squeeze()
```


In this particular band, the value 0 indicates no disturbance in the last year and -1 is a sentinel value indicating missing data. Any positive value is the number of days since Dec. 31, 2020 on which the first disturbance was measured in that pixel. We'll filter out the non-positive values and preserve these meaningful values using `DataArray.where`.


```python jupyter={"source_hidden": true}
veg_dist_date = data.where(data>0)
```


Let's examine the range of numerical values in `veg_dist_date` using `DataArray.min` and `DataArray.max`. Both of these methods will ignore pixels containing `nan` ("Not-a-Number") when computing the minimum and maximum.


```python jupyter={"source_hidden": true}
d_min, d_max = int(veg_dist_date.min().item()), int(veg_dist_date.max().item())
print(f'{d_min=}\t{d_max=}')
```


In this instance, the meaningful data lies between 247 and 592. Remember, this is the number of days elapsed since Dec. 31, 2020 when the first disturbance was observed in the last year. Since this data was acquired on Aug. 15, 2022, the only possible values would be between 227 and 592 days. So we need to recalibrate the colormap in the plot.


```python jupyter={"source_hidden": true}
image_opts.update(
    clim=(d_min,d_max),
    crs=data.rio.crs
    )
layout_opts.update(title=f"VEG_DIST_DATE")
```

```python
veg_dist_date.hvplot.image(**image_opts).opts(**layout_opts) * base
```


With this colormap, the lightest pixels showed some signs of deforestation close to a year ago. By contrast, the black pixels first showed deforestation close to the time of data acquisition. This band, then, is useful for tracking the progress of wildfires as they sweep across forests.


---


### Band 3: Vegetation Disturbance Status (VEG_DIST_STATUS)



Finally, let's take a look at a third band from the DIST-ALERT data product family, namely the *vegetation disturbance status*. These pixel values are stored as 8-bit unsigned integers; there are only 6 distinct values stored:
* **0:** No disturbance
* **1:** Provisional (**first detection**) disturbance with vegetation cover change <50%
* **2:** Confirmed (**recurrent detection**) disturbance with vegetation cover change < 50%
* **3:** Provisional disturbance with vegetation cover change ≥ 50%
* **4:** Confirmed disturbance with vegetation cover change ≥ 50%
* **255**: Missing data

A pixel value is flagged *provisionally* changed when the vegetation cover loss (disturbance) is first observed by a satellite; if the change is noticed again in subsequent HLS acquisitions over that pixel, then the pixel is flagged as *confirmed*.



We can use a local file as an example of this particular layer/band of the DIST-ALERT data. The code is the same as above, but observe that the data is now filtered to retain the meaningful pixel values for this layer (i.e., `data>0` and `data<5`) and that
+ + +```python jupyter={"source_hidden": true} +LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DIST-ALERT-HLS_T10TEM_20220815T185931Z_20220817T153514Z_S2A_30_v0.1_VEG-DIST-STATUS.tif' +data = rio.open_rasterio(LOCAL_PATH) +data = data.rename({'x':'easting', 'y':'northing', 'band':'band'}).squeeze() +``` + +```python jupyter={"source_hidden": true} +veg_dist_status = data.where((data>0)&(data<5)) +image_opts.update(crs=data.rio.crs) +``` + +```python jupyter={"source_hidden": true} +layout_opts.update( + # title=f"VEG_DIST_STATUS", + clim=(0,4), + colorbar_opts={'ticker': FixedTicker(ticks=[0, 1, 2, 3, 4])} + ) +``` + +```python +veg_dist_status.hvplot.image(**image_opts).opts(**layout_opts) * base +``` + + +This continuous colormap does not highlight the features in this plot particularly well. A better choice would be a *categorical* colormap. We'll see how to achieve this in the next notebook (with the OPERA DSWx data products). + + +--- diff --git a/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md b/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md new file mode 100644 index 0000000..2d808ee --- /dev/null +++ b/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md @@ -0,0 +1,325 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.2 + kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +# Using the OPERA DSWx Product + + +## The OPERA project + + + +From the [OPERA (Observational Products for End-Users from Remote Sensing Analysis)](https://www.jpl.nasa.gov/go/opera) project: + +>Started in April 2021, the Observational Products for End-Users from Remote Sensing Analysis (OPERA) project at the Jet Propulsion Laboratory collects data from satellite radar and optical instruments to generate six product suites: +> +> + a near-global Surface Water Extent product suite +> + a near-global Surface Disturbance product suite +> + a near-global Radiometric Terrain Corrected product +> + a North America Coregistered Single Look complex product suite +> + a North America Displacement product suite +> + a North America Vertical Land Motion product suite + +That is, OPERA is a NASA initiative that takes, e.g., optical or radar remote-sensing data gathered from satellites and produces a variety of pre-processed data sets for public use. OPERA products are not raw satellite images; they are the result of algorithmic classification to determine, e.g., which land regions contain water or where vegetation has been displaced. The raw satellite images are collected from measurements made by the instruments onboard the Sentinel-1 A/B, Sentinel-2 A/B, and Landsat-8/9 satellite missions (hence the term *HLS* for "*Harmonized Landsat-Sentinel*" in numerous product descriptions). + + +--- + + +## The OPERA Dynamic Surface Water Extent (DSWx) product + + +We've already looked at the DIST family of OPERA data products. In this notebook, we'll examine another OPERA data product: the *Dynamic Surface Water Extent (DSWx)* product (more fully described in the [OPERA DSWx HLS product specification](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf)). 
This data summarizes the extent of inland water (i.e., water on land masses as opposed to the open ocean) and can be used, for example, to track flooding events over time.

As with the DIST products, the DSWx data products are generated from HLS surface reflectance (SR) measurements; specifically, these are made by the Operational Land Imager (OLI) aboard the Landsat 8 satellite, the Operational Land Imager 2 (OLI-2) aboard the Landsat 9 satellite, and the MultiSpectral Instrument (MSI) aboard the Sentinel-2A/B satellites. The derived data products are distributed over tiles in projected map coordinates aligned with the [Military Grid Reference System (MGRS)](https://en.wikipedia.org/wiki/Military_Grid_Reference_System). Each tile covers 109.8 km × 109.8 km, divided into 3660 rows and 3660 columns at 30 meter pixel spacing, with tiles overlapping their neighbors by 4900 meters in each direction (the details are fully described in the [product specification](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf)).

Again, the OPERA DSWx products are distributed as [Cloud Optimized GeoTIFFs](https://www.cogeo.org/); in practice, this means that different bands are stored in distinct TIFF files, enabling independent downloads of distinct bands/layers.


---


### Band 1: Water classification (WTR)



The first band we'll examine is the *water classification (WTR)* layer. This is unsigned 8-bit integer raster data (UInt8) meant to represent whether a pixel contains inland water (e.g., part of a reservoir, a lake, a river, etc., but not water associated with the open ocean). The values in this raster layer are computed from raw images acquired by the satellite, with each pixel assigned one of the following 7 integer values:

+ **0**: Not Water – an area with valid reflectance data that is not open water (class 1), partial surface water (class 2), snow/ice (class 252), cloud/cloud shadow (class 253), or ocean masked (class 254). Masking can result in “not water” (class 0) where land cover masking is applied.
+ **1**: Open Water – an area that is entirely water and unobstructed to the sensor (i.e., not obscured by vegetation, terrain, or buildings).
+ **2**: Partial Surface Water – an area that is at least 50% and less than 100% open water (e.g., inundated sinkholes, floating vegetation, and pixels bisected by coastlines). This may be referred to as "subpixel inundation" when referring to a pixel's area.
+ **252**: Snow/Ice.
+ **253**: Cloud or Cloud Shadow – an area obscured by or adjacent to cloud or cloud shadow.
+ **254**: Ocean Masked – an area identified as ocean using a shoreline database with an added margin.
+ **255**: Fill value (missing data).



Let's begin by importing the required libraries and loading a suitable file into an Xarray `DataArray`.
+ + +```python jupyter={"source_hidden": true} +# Notebook dependencies +import warnings +warnings.filterwarnings('ignore') +from pathlib import Path +import rioxarray as rio +import hvplot.xarray +from bokeh.models import FixedTicker +import geoviews as gv +gv.extension('bokeh') +``` + +```python jupyter={"source_hidden": true} +LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DSWx-HLS_T11SQA_20230828T181921Z_20230831T000636Z_S2A_30_v1.0_B01_WTR.tif' + +data = rio.open_rasterio(LOCAL_PATH) +data = data.rename({'x':'easting', 'y':'northing', 'band':'band'}).squeeze() +``` + + +As before, we define a basemap using tiles from ESRI and we set up dictionaries `image_opts` and `layout_opts` for common options we'll use when invoking `.hvplot.image`. + + +```python editable=true jupyter={"source_hidden": true} slideshow={"slide_type": ""} +# Creates basemap +base = gv.tile_sources.EsriImagery.opts(width=1000, height=1000, padding=0.1) +# Initialize image options dictionary +image_opts = dict( + x='easting', + y='northing', + rasterize=True, + dynamic=True, + frame_width=500, + frame_height=500, + aspect='equal', + cmap='hot_r', + clim=(0, 10), + alpha=0.8 + ) +# Initialize layout options dictionary +layout_opts = dict( + xlabel='Longitude', + ylabel='Latitude' + ) +``` + + +As this data is categorical, a continuous colormap is not all that helpful. We'll choose color keys using the dictionary `color_key` with codes used frequently for this kind of data. For all the images plotted here, we'll use variants of the code in the cell below to update `layout_opts` so that plots generated for various layers/bands from the DSWx data products have suitable legends. + + +```python editable=true jupyter={"source_hidden": true} slideshow={"slide_type": ""} +# Defines colormap for visualization +levels = [0, 0.9, 1.9, 2.9, 7.9, 8.9, 10] +color_key = { + "Not Water": "#ffffff", + "Open Water": "#0000ff", + "Partial Surface Water": "#00ff00", + "Reserved": "#000000", + "Snow/Ice": "#00ffff", + "Clouds/Cloud Shadow": "#7f7f7f" +} + +ticks = [0.5, 1.5, 2.5, 5.5, 8.5, 9.5] +ticker = FixedTicker(ticks=ticks) +labels = dict(zip(ticks, color_key)) + +layout_opts.update( + title='B01_WTR', + color_levels=levels, + cmap=tuple(color_key.values()), + colorbar_opts={'ticker':ticker,'major_label_overrides':labels} + ) +``` + +```python jupyter={"source_hidden": true} +b01_wtr = data.where((data!=255) & (data!=0)) +image_opts.update(crs=data.rio.crs) +``` + +```python jupyter={"source_hidden": true} +b01_wtr.hvplot.image(**image_opts).opts(**layout_opts) * base +``` + + +The plot shows the open water region (Lake Mead) fairly clearly as well as partial surface water in parts of the surrounding region. Zooming in reveals areas of cloud cover, but not much. + + +--- + + +### Band 2: Binary water (BWTR) + + +The second band of the DSWx product is derived from the first, namely *binary water*. It is similar to the WTR layer with the "open water" and "partial surface water" pixels merged together into a single category (and hence is also stored using unsigned 8-bit integers). + ++ **0**: Not Water – an area with valid reflectance data that is not open water (class 1), partial surface water (class 2), snow/ice (class 252), cloud/cloud shadow (class 253), or ocean masked (class 254). Masking can result in “not water” (class 0) where land cover masking is applied. ++ **1**: Water – an area classified as "open water" or "partial surface water" in the WTR layer. ++ **252**: Snow/Ice. 
+ **253**: Cloud or Cloud Shadow – an area obscured by or adjacent to cloud or cloud shadow.
+ **254**: Ocean Masked – an area identified as ocean using a shoreline database with an added margin.
+ **255**: Fill value (missing data).



Let's load local data from a sample file to see what the binary water layer looks like.


```python jupyter={"source_hidden": true}
LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DSWx-HLS_T11SQA_20230828T181921Z_20230831T000636Z_S2A_30_v1.0_B02_BWTR.tif'
data = rio.open_rasterio(LOCAL_PATH)
data = data.rename({'x':'easting', 'y':'northing', 'band':'band'}).squeeze()
```


The `levels`, `color_key`, and `ticks` need to be redefined to reflect the different categories in this data set.


```python jupyter={"source_hidden": true}
# Defines colormap for visualization
levels = [0, 0.9, 1.9, 7.9, 8.9, 10]
color_key = {
    "Not Water": "#ffffff",
    "Water": "#0000ff",
    "Reserved": "#000000",
    "Snow/Ice": "#00ffff",
    "Clouds/Cloud Shadow": "#7f7f7f"
}

ticks = [0.5, 1.5, 5.5, 8.5, 9.5]
ticker = FixedTicker(ticks=ticks)
labels = dict(zip(ticks, color_key))
layout_opts.update(
    title='B02_BWTR',
    color_levels=levels,
    cmap=tuple(color_key.values()),
    colorbar_opts={'ticker':ticker,'major_label_overrides':labels}
    )
```

```python jupyter={"source_hidden": true}
bwtr = data.where((data!=255) & (data!=0))
image_opts.update(clim=(0,10), crs=data.rio.crs)
```

```python jupyter={"source_hidden": true}
bwtr.hvplot.image(**image_opts).opts(**layout_opts) * base
```

---


### Band 5: Interpretation of diagnostic layer into water classes (WTR-1)


There are ten bands or layers associated with the DSWx data product. We won't examine all of them here, but they can be retrieved when needed. For instance, band 3 is the *Confidence (CONF)* layer that provides, for each pixel, quantitative values describing the degree of confidence in the categories given in band 1 (the Water classification layer). Band 4 is a *Diagnostic (DIAG)* layer that encodes, for each pixel, which of five tests were positive in deriving the CONF layer. Details are given in the product specification linked above.

We'll examine a local file providing an example of band 5, the *Interpretation of diagnostic layer into water classes (WTR-1)*. This layer, encoded with unsigned 8-bit integers, classifies the DIAG layer results into open water, partial surface water, and not-water classes. This layer is further refined using masks to yield bands 6 (WTR-2) and 1 (WTR). The relevant pixel values are as follows:

+ **0**: Not Water – an area with valid reflectance data that is not open water (class 1) or partial surface water (class 2).
+ **1**: Open Water – an area that is entirely water and unobstructed to the sensor (i.e., not obscured by vegetation, terrain, or buildings).
+ **2**: Partial Surface Water – an area that is at least 50% and less than 100% open water. This may be referred to as “subpixel inundation” when referring to a pixel’s area. Examples include wetlands, water bodies with floating vegetation, and pixels bisected by coastlines.
+ **254**: Ocean Masked – an area identified as ocean using a shoreline database with an added margin.
+ **255**: Fill value (no data).
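Notice that for each band we rebuild essentially the same colorbar boilerplate: class boundaries (`levels`), a label-to-color mapping (`color_key`), tick positions (`ticks`), a Bokeh `FixedTicker`, and tick-label overrides. If you find yourself repeating this, one option is to bundle the boilerplate in a small helper; the sketch below (the helper name `categorical_layout` is not part of the original notebooks) returns a dictionary that can be merged into `layout_opts`.

```python
from bokeh.models import FixedTicker  # already imported earlier in this notebook

def categorical_layout(title, levels, color_key, ticks):
    '''Bundle the options for a categorical colorbar: class boundaries (levels),
    a label->color mapping (color_key), and the tick positions for the labels.'''
    ticker = FixedTicker(ticks=ticks)
    labels = dict(zip(ticks, color_key))
    return dict(title=title,
                color_levels=levels,
                cmap=tuple(color_key.values()),
                colorbar_opts={'ticker': ticker, 'major_label_overrides': labels})

# For example: layout_opts.update(categorical_layout('B05_WTR-1', levels, color_key, ticks))
```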
+ + +```python jupyter={"source_hidden": true} +LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DSWx-HLS_T11SQA_20230828T181921Z_20230831T000636Z_S2A_30_v1.0_B05_WTR-1.tif' + +data = rio.open_rasterio(LOCAL_PATH) +data = data.rename({'x':'easting', 'y':'northing', 'band':'band'}).squeeze() +``` + +```python jupyter={"source_hidden": true} +# Defines colormap for visualization +levels = [0, 0.6, 1.3, 2] +color_key = { + "Not Water": "#ffffff", + "Open Water": "#0000ff", + "Partial Surface Water": "#00ff00", +} + +ticks = [0.25, 0.9, 1.6] +ticker = FixedTicker(ticks=ticks) +labels = dict(zip(ticks, color_key)) + +layout_opts.update( + title='B05_WTR-1', + color_levels=levels, + cmap=tuple(color_key.values()), + colorbar_opts={'ticker':ticker,'major_label_overrides':labels} + ) +``` + +```python jupyter={"source_hidden": true} +wtr1 = data.where((data!=255) & (data!=0)) +image_opts.update(clim=(0,2), crs=data.rio.crs) +``` + +```python jupyter={"source_hidden": true} +wtr1.hvplot.image(**image_opts).opts(**layout_opts) * base +``` + +--- + + +### Band 6: Interpreted layer refined using land cover and terrain shadow testing (WTR-2) + + +The sixth band — *Interpreted layer refined using land cover and terrain shadow testing (WTR-2)* — is derived from the fifth using additional tests. The details are in the product specification. The code provided here shows how to visualize it. It has the possible pixel values as band 5, but pixels may differ in their classification due to additional tests that can eliminate false-positive water detections using land cover and terrain shadow information. + + +```python jupyter={"source_hidden": true} +LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DSWx-HLS_T11SQA_20230828T181921Z_20230831T000636Z_S2A_30_v1.0_B06_WTR-2.tif' + +data = rio.open_rasterio(LOCAL_PATH) +data = data.rename({'x':'easting', 'y':'northing', 'band':'band'}).squeeze() +``` + +```python jupyter={"source_hidden": true} +# Defines colormap for visualization +levels = [0, 0.6, 1.3, 2] +color_key = { + "Not Water": "#ffffff", + "Open Water": "#0000ff", + "Partial Surface Water": "#00ff00", +} + +ticks = [0.25, 0.9, 1.6] +ticker = FixedTicker(ticks=ticks) +labels = dict(zip(ticks, color_key)) + +layout_opts.update( + title='B06_WTR-2', + color_levels=levels, + cmap=tuple(color_key.values()), + colorbar_opts={'ticker':ticker,'major_label_overrides':labels} + ) +``` + +```python jupyter={"source_hidden": true} +wtr2 = data.where((data!=255) & (data!=0)) +image_opts.update(clim=(0,2), crs=data.rio.crs) +``` + +```python editable=true jupyter={"source_hidden": true} slideshow={"slide_type": ""} +wtr2.hvplot.image(**image_opts).opts(**layout_opts) * base +``` + + +This notebook provides an overview of how to visualize data extracted from OPERA DSWx data products that are stored locally. We're now ready to automate the search for such products in the cloud using the PySTAC API. + + + +---- + diff --git a/03_Using_NASA_EarthData/03_Using_PySTAC.md b/03_Using_NASA_EarthData/03_Using_PySTAC.md new file mode 100644 index 0000000..8e5462d --- /dev/null +++ b/03_Using_NASA_EarthData/03_Using_PySTAC.md @@ -0,0 +1,240 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.2 + kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +# Using the PySTAC API + + +There is an abundance of data searchable through NASA's [Earthdata Search](https://search.earthdata.nasa.gov). 
The preceding link connects to a GUI for searching [SpatioTemporal Asset Catalogs (STAC)](https://stacspec.org/) by specifying an *Area of Interest (AOI)* and a *time-window* or *range of dates*.
+
+
+
+For the sake of reproducibility, we want to be able to search asset catalogs programmatically. This is where the [PySTAC](https://pystac.readthedocs.io/en/stable/) library comes in.
+
+---
+
+
+## Defining AOI & range of dates
+
+
+Let's start by considering a particular example. [Heavy rains severely impacted Argentina in March 2024](https://www.reuters.com/world/americas/argentina-downpour-drenches-crop-fields-flash-floods-buenos-aires-2024-03-12/). The event caused flash floods, reduced crop yields, and inflicted significant damage to property and human life, with the Buenos Aires metropolitan area hit particularly hard. In this notebook, we'll set up a DataFrame to process results retrieved when searching relevant OPERA DSWx-HLS data catalogs.
+
+Let's start with relevant imports.
+
+
+```python jupyter={"source_hidden": true}
+from warnings import filterwarnings
+filterwarnings('ignore')
+# data wrangling imports
+import numpy as np
+import pandas as pd
+import xarray as xr
+# STAC imports to retrieve cloud data
+from pprint import pprint
+from pystac_client import Client
+```
+
+```python jupyter={"source_hidden": true}
+# Imports for plotting
+import hvplot.pandas
+import geoviews as gv
+from geoviews import opts
+gv.extension('bokeh')
+```
+
+
+Next, let's define search parameters so we can retrieve data pertinent to that flooding event. This involves specifying an *area of interest (AOI)* and a *range of dates*.
++ The AOI is specified as a rectangle of longitude-latitude coordinates in a single 4-tuple of the form
+  $$({\mathtt{longitude}}_{\mathrm{min}},{\mathtt{latitude}}_{\mathrm{min}},{\mathtt{longitude}}_{\mathrm{max}},{\mathtt{latitude}}_{\mathrm{max}}),$$
+  i.e., the lower-left corner coordinates followed by the upper-right corner coordinates.
++ The range of dates is specified as a string of the form
+  $$ {\mathtt{date}_{\mathrm{start}}}/{\mathtt{date}_{\mathrm{end}}}, $$
+  where dates are specified in standard ISO 8601 format `YYYY-MM-DD`.
+
+
+```python jupyter={"source_hidden": true}
+# Define data search parameters
+
+# Define AOI as left, bottom, right and top lat/lon extent
+aoi = (-59.63818, -35.02927, -58.15723, -33.77271)
+# We will search data for the month of March 2024
+date_range = '2024-03-01/2024-03-31'
+```
+
+
+Make a quick visual check that the tuple `aoi` actually describes the geographic region around Buenos Aires.
+
+
+```python jupyter={"source_hidden": true}
+basemap = gv.tile_sources.OSM
+rect = gv.Rectangles(aoi).opts(opts.Rectangles(alpha=0.25, color='cyan'))
+
+rect*basemap
+```
+
+## Executing a search with the PySTAC API
+
+```python jupyter={"source_hidden": true}
+# We open a client instance to search for data, and retrieve relevant data records
+STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'
+
+# Setup PySTAC client
+catalog = Client.open(f'{STAC_URL}/POCLOUD/')
+collections = ["OPERA_L3_DSWX-HLS_V1"]
+
+# We would like to search data using the search parameters defined above.
+opts = {
+    'bbox' : aoi,
+    'collections': collections,
+    'datetime' : date_range,
+}
+
+search = catalog.search(**opts)
+```
+
+```python jupyter={"source_hidden": true}
+print(f'{type(search)=}')
+```
+
+```python jupyter={"source_hidden": true}
+results = list(search.items_as_dicts())
+print(f"Number of tiles found intersecting given bounding box: {len(results)}")
+```
+
+
+The object `results` retrieved from the search is a list of Python dictionaries (as suggested by the method name `items_as_dicts`). Let's parse the first entry of `results`.
+
+
+```python jupyter={"source_hidden": true}
+result = results[0]
+print(f'{type(result)=}')
+print(result.keys())
+```
+
+
+Each element of `results` is a dictionary that contains other nested dictionaries. The Python utility `pprint.pprint` helps us examine the structure of the search results.
+
+
+```python jupyter={"source_hidden": true}
+pprint(result, compact=True, width=10, sort_dicts=False)
+```
+
+
+The particular values we want to pick out from `result` are:
++ `result['properties']['datetime']` : timestamp associated with a particular granule; and
++ `result['assets']['0_B01_WTR']['href']` : URI associated with a particular granule (pointing to a GeoTIFF file).
+
+```
+{...
+ 'properties': {'eo:cloud_cover': 95,
+   'datetime': '2024-03-01T13:44:11.879Z',
+   'start_datetime': '2024-03-01T13:44:11.879Z',
+   'end_datetime': '2024-03-01T13:44:11.879Z'},
+ 'assets': {'0_B01_WTR': {'href': 'https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T21HUC_20240301T134411Z_20240305T232837Z_L8_30_v1.0_B01_WTR.tif',
+   'title': 'Download '
+            'OPERA_L3_DSWx-HLS_T21HUC_20240301T134411Z_20240305T232837Z_L8_30_v1.0_B01_WTR.tif'},
+  '0_B02_BWTR': ...
+  }
+
+```
+
+
+```python jupyter={"source_hidden": true}
+# Look at specific values extracted from the 'properties' & 'assets' keys.
+print(result['properties']['datetime'])
+print(result['assets']['0_B01_WTR']['href'])
+```
+
+## Summarizing search results in a DataFrame
+
+
+Let's extract these particular fields into a Pandas DataFrame for convenience.
+
+
+```python jupyter={"source_hidden": true}
+times = pd.DatetimeIndex([result['properties']['datetime'] for result in results])
+hrefs = {'hrefs': [result['assets']['0_B01_WTR']['href'] for result in results]}
+```
+
+```python jupyter={"source_hidden": true}
+# Construct Pandas DataFrame to summarize granules from search results
+granules = pd.DataFrame(index=times, data=hrefs)
+granules.index.name = 'times'
+```
+
+```python jupyter={"source_hidden": true}
+granules
+```
+
+
+Examining the index reveals that the timestamps of the granules returned are not unique, i.e., several granules correspond to distinct data products derived from a single acquisition by a satellite.
+
+
+```python jupyter={"source_hidden": true}
+len(granules.index.unique()) / len(granules) # Notice the timestamps are not all unique, i.e., some are repeated
+```
+
+
+The `hrefs` (i.e., the URIs or URLs pointed to in a given row in `granules`) are unique, telling us that the granules refer to distinct data products or bands derived from each data acquisition even if the timestamps match.
+
+
+```python jupyter={"source_hidden": true}
+len(granules.hrefs.unique()) / len(granules) # Make sure all the hrefs are unique
+```
+
+
+Let's get a sense of how many granules are available for each day of the month. Note that we don't yet know how many of these tiles contain cloud cover obscuring the features of interest.
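+
+As an aside, recall that each search result carries an `eo:cloud_cover` property (visible in the dictionary printed above). The short sketch below is one way to summarize that property and screen out very cloudy granules; it assumes `results`, `times`, and `granules` are defined as above, and the 80% threshold and the names `cloud_cover` & `clear_granules` are arbitrary choices made here for illustration.
+
+
+```python jupyter={"source_hidden": true}
+# Optional sketch: summarize the cloud cover reported for each granule (illustrative only).
+cloud_cover = pd.Series([result['properties'].get('eo:cloud_cover') for result in results],
+                        index=times, name='eo:cloud_cover', dtype='float64')
+print(cloud_cover.describe())
+
+# Keep only granules reporting at most 80% cloud cover (an arbitrary threshold).
+clear_granules = granules[cloud_cover.values <= 80]
+print(f"{len(clear_granules)} of {len(granules)} granules report <= 80% cloud cover")
+```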
+
+The next few lines do some Pandas manipulations of the DataFrame `granules` to yield a line plot showing which dates are associated with the most granules.
+
+
+```python jupyter={"source_hidden": true}
+granules_by_day = granules.resample('1d')  # Grouping by day, i.e., "resampling"
+```
+
+```python jupyter={"source_hidden": true}
+granule_counts = granules_by_day.count()  # Aggregating counts
+```
+
+```python jupyter={"source_hidden": true}
+# Ignore the days with no associated granules
+granule_counts = granule_counts[granule_counts.hrefs > 0]
+```
+
+```python jupyter={"source_hidden": true}
+# Relabel the index & column of the DataFrame
+granule_counts.index.name = 'Day of Month'
+granule_counts.rename({'hrefs':'Granule count'}, inplace=True, axis=1)
+```
+
+```python jupyter={"source_hidden": true}
+count_title = '# of DSWx-HLS granules available / day'
+granule_counts.hvplot.line(title=count_title, grid=True, frame_height=300, frame_width=600)
+```
+
+
+The floods primarily occurred between March 11th and 13th. Unfortunately, there are few granules associated with those particular days. We can, in principle, use the URIs stored in this DataFrame to set up analysis of the data associated with this event; we'll do so in other examples with better data available.
+
+
+---
+
+
+We could go further and download data from the URIs provided, but we won't in this example. This notebook primarily serves as an example of how to use the PySTAC API.
+
+In subsequent notebooks, we'll use this general workflow:
+
+1. Set up a search query by identifying a particular AOI and range of dates.
+2. Identify a suitable asset catalog and execute the search using `pystac.Client`.
+3. Convert the search results into a Pandas DataFrame containing the principal fields of interest.
+4. Use the resulting DataFrame to access relevant remote data for analysis and/or visualization (a compact sketch of this workflow follows below).
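+
+To recap, here is a minimal sketch that wraps the four steps above into a single reusable helper. It is illustrative only: the function name `search_to_dataframe`, its parameters, and the default asset key `'0_B01_WTR'` are choices made here for convenience, not part of the PySTAC API.
+
+
+```python jupyter={"source_hidden": true}
+# Illustrative sketch only: the helper name and its defaults are our own choices.
+def search_to_dataframe(aoi, date_range, collections, asset_key='0_B01_WTR',
+                        stac_url='https://cmr.earthdata.nasa.gov/stac/POCLOUD/'):
+    '''Search a STAC catalog for granules intersecting `aoi` within `date_range`
+    and summarize their timestamps & asset URIs in a Pandas DataFrame.'''
+    catalog = Client.open(stac_url)
+    search = catalog.search(bbox=aoi, collections=collections, datetime=date_range)
+    results = list(search.items_as_dicts())
+    times = pd.DatetimeIndex([r['properties']['datetime'] for r in results])
+    hrefs = [r['assets'][asset_key]['href'] for r in results]
+    df = pd.DataFrame(index=times, data={'hrefs': hrefs})
+    df.index.name = 'times'
+    return df
+
+# Example usage (commented out to avoid re-running the search):
+# granules = search_to_dataframe(aoi, date_range, ["OPERA_L3_DSWX-HLS_V1"])
+```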
+ From bc8b6351e50909d69ad945784701edc450124f4e Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Mon, 16 Sep 2024 09:32:36 -0700 Subject: [PATCH 06/10] Clean-up --- .../01_Using_OPERA_DIST_Products.md | 0 .../02_Using_OPERA_DSWx_Products.md | 0 .../03_Using_PySTAC.md | 0 .../0_Configuracion_inicial.md | 64 ----------------- book/04_NASA_Earthdata/0_Initial_Setup.md | 70 ------------------- 5 files changed, 134 deletions(-) rename {03_Using_NASA_EarthData => book/03_Using_NASA_EarthData}/01_Using_OPERA_DIST_Products.md (100%) rename {03_Using_NASA_EarthData => book/03_Using_NASA_EarthData}/02_Using_OPERA_DSWx_Products.md (100%) rename {03_Using_NASA_EarthData => book/03_Using_NASA_EarthData}/03_Using_PySTAC.md (100%) delete mode 100644 book/04_NASA_Earthdata/0_Configuracion_inicial.md delete mode 100644 book/04_NASA_Earthdata/0_Initial_Setup.md diff --git a/03_Using_NASA_EarthData/01_Using_OPERA_DIST_Products.md b/book/03_Using_NASA_EarthData/01_Using_OPERA_DIST_Products.md similarity index 100% rename from 03_Using_NASA_EarthData/01_Using_OPERA_DIST_Products.md rename to book/03_Using_NASA_EarthData/01_Using_OPERA_DIST_Products.md diff --git a/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md b/book/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md similarity index 100% rename from 03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md rename to book/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md diff --git a/03_Using_NASA_EarthData/03_Using_PySTAC.md b/book/03_Using_NASA_EarthData/03_Using_PySTAC.md similarity index 100% rename from 03_Using_NASA_EarthData/03_Using_PySTAC.md rename to book/03_Using_NASA_EarthData/03_Using_PySTAC.md diff --git a/book/04_NASA_Earthdata/0_Configuracion_inicial.md b/book/04_NASA_Earthdata/0_Configuracion_inicial.md deleted file mode 100644 index 55e4cec..0000000 --- a/book/04_NASA_Earthdata/0_Configuracion_inicial.md +++ /dev/null @@ -1,64 +0,0 @@ -# Configuración Inicial - -## 1-¿Cómo utilizar el hub de 2i2c? - -Para acceder al 2i2c Hub seguí estos sencillos pasos: -* Accede al [Hub de 2i2c](https://showcase.2i2c.cloud/hub/login) - - - -![2i2c_login](../assets/2i2c_login.png) - - - -* Introducí tus credenciales: Ingresa tu nombre de usuario y contraseña (Nota: Para esto deberías haber enviado tu correo asociado a tu cuenta de Github para que se te habilite el acceso con dicha cuenta). - -* Si el acceso es correcto, verás la pantalla que se muestra a continuación. Por último elegí la opción Start para ingresar al ambiente de Jupyter lab en la nube. - - -![start_server](../assets/start_server.png) - - - -* Listo, ya estás listo para empezar a trabajar. - - - - -![ambiente_trabajo_jupyter_lab](../assets/work_environment_jupyter_lab.png) - - - -## 2- ¿Cómo utilizar el Earthdata de la NASA? - - -### Breve introducción - -El programa **Earth Science Data Systems (ESDS)**, **Programa de Sistemas de Datos de Ciencias de la Tierra** de la NASA, supervisa el ciclo de vida de los datos científicos de la Tierra de todas sus misiones de observación de la Tierra, desde su adquisición hasta su procesamiento y distribución. - -A los efectos de esta guía, el sitio web Earthdata de la NASA es el punto de entrada que permite acceder de manera completa, gratuita y abierta a las colecciones de datos de ciencias de la Tierra de la NASA, con el fin de acelerar el avance científico en beneficio de la sociedad. Para acceder a los datos a través de este portal, los usuarios deben definir primero sus credenciales de acceso. 
- - Para crear una cuenta en EarthData, seguí los pasos que se indica: - - * Ingresa al sitio de Earth Nasa: https://www.earthdata.nasa.gov/. Luego selecciona la opción "Use Data" y a continuación "Register". Por último, ingresa a https://urs.earthdata.nasa.gov/. - - -![earthdata_login](../assets/earthdata_login.png) - - - * Selecciona la opción "Register for a profile", allí elige un *nombre de usuario* y *contraseña*. Como sugerencia, elige aquellos que recuerdes bien, ya que los necesitarás más adelante. También deberás cargar tu perfil para compeltar el registro, en el mismo se te pedirán datos como correo, país, afiliación, entre otros. Al final, elige "Register for Earthdata Login". - - -![earthdata_profile](../assets/earthdata_profile2.png) - - - - -## 3- Configuración de datos para acceder desde Jupyter notebooks - -Ahora viene la parte técnica: para acceder a los datos desde programas de Python y Jupyter notebooks, es necesario guardar las credenciales (de EarthData) en un archivo especial. En este repositorio encontrarás un archivo llamado `.netrc` con un ejemplo (puedes pensar en él como una plantilla). Abre ese archivo y edita la siguiente línea: - -`machine urs.earthdata.nasa.gov login {tu_nombre_de_usuario} password {tu_contraseña}` - -Luego, reemplaza `{tu_nombre_de_usuario`} y `{tu_contraseña}` con los datos de tu cuenta. Guarda el archivo y ¡listo! Ya tienes todo lo necesario para acceder a los datos de observación de la Tierra a través del portal de EarthData. ️ -Para asegurarte de que todo funciona correctamente, abre la notebook titulada 1`_primeros_pasos.ipynb `y sigue las indicaciones. ¡Con esto ya podrás explorar el mundo de los datos de la NASA! diff --git a/book/04_NASA_Earthdata/0_Initial_Setup.md b/book/04_NASA_Earthdata/0_Initial_Setup.md deleted file mode 100644 index 4177c01..0000000 --- a/book/04_NASA_Earthdata/0_Initial_Setup.md +++ /dev/null @@ -1,70 +0,0 @@ -# Initial Setup - -## 1- Accesing to the 2i2c Hub -To login to the 2i2c Hub, follow these simple steps: - -* **Head to the Hub:** Visit this link to access the 2i2c Hub: https://climaterisk.opensci.2i2c.cloud/. - -* **Log in with your Credentials:** - -**Username:** Feel free to choose any username you like. Te recommend your GitHub username helps avoid conflicts with others. - -**Password:** You'll receive the password the day before the tutorial. - - -![2i2c_login](../assets/2i2c_login.png) - - -* **Logging In:** - -The login process might take a few minutes, especially if a new virtual workspace needs to be created just for you. - - -![start_server2](../assets/start_server_2i2c.png) - - -* **What to Expect:** - -By default, logging into https://climaterisk.opensci.2i2c.cloud will automatically clone https://github.com/ScienceCore/scipy-2024-climaterisk and change to that directly. If the login is successful, you will see the following screen. - - -![work_environment_jupyter_lab](../assets/work_environment_jupyter_lab.png) - -Finally, if you see the previous JupyterLab screen, you are ready to start working. - -**Notes:** Any files you work on will be saved between sessions as long as you use the same username. - - - -## 2. Using NASA's Earthdata - -### Brief Introduction - -The NASA Earth Science Data Systems (ESDS) program oversees the lifecycle of Earth science data from all its Earth observation missions, from acquisition to processing and distribution. 
- -For the purposes of this guide, the NASA Earthdata website is the entry point that allows full, free and open access to NASA's Earth science data collections, in order to accelerate scientific progress for the benefit of society. To access the data through this portal, users must first define their access credentials. To create an EarthData account, follow these steps: - -Go to the Earth Nasa website: https://www.earthdata.nasa.gov/. Then, select the option "Use Data" and then "Register". Finally, go to https://urs.earthdata.nasa.gov/. - -![earthdata_login](../assets/earthdata_login.png) - -Select the "Register for a profile" option, there choose a username and password. As a suggestion, choose ones that you remember well, as you will need them later. You will also need to complete your profile to complete the registration, where you will be asked for information such as email, country, affiliation, among others. Finally, choose "Register for Earthdata Login". - -![earthdata_profile](../assets/earthdata_profile2.png) - -## 3. Data Configuration for Access from Jupyter Notebooks - - -Now comes the technical part: to access data from Python programs and Jupyter notebooks, it is necessary to save the credentials (from EarthData) in a special file. In this repository you will find a file called .netrc with an example (you can think of it as a template). Open that file and edit the following line: -``` - -machine urs.earthdata.nasa.gov login {your_username} password {your_password} -``` - -Then, replace `{your_username}` and `{your_password} `with your account details. Save the file and you're done! You now have everything you need to access Earth observation data through the EarthData portal. ️ - -To make sure everything is working properly, open the notebook titled `1_getting_started.ipynb` and follow the instructions. With this, you will be able to explore the world of NASA data! - - - - From ead2d88a99761ec89debba1efa33b92c1fb9266b Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Mon, 16 Sep 2024 09:34:11 -0700 Subject: [PATCH 07/10] Updates 02_Data_Visualization_Tools.md with rectangle/point plotting --- .../02_Data_Visualization_Tools.md | 56 ++++++++++++------- 1 file changed, 37 insertions(+), 19 deletions(-) diff --git a/book/02_Software_Tools/02_Data_Visualization_Tools.md b/book/02_Software_Tools/02_Data_Visualization_Tools.md index 5ea64a7..411b560 100644 --- a/book/02_Software_Tools/02_Data_Visualization_Tools.md +++ b/book/02_Software_Tools/02_Data_Visualization_Tools.md @@ -41,14 +41,12 @@ from geoviews import opts ### Displaying a basemap -- Principal utility is `gv.tile_sources` -- Use the method `opts` to specify optional settings -- Bokeh menu at right enables interactive exploration +A *basemap* or *tile layer* is useful when displaying vector or raster data because it allows us to overlay the relevant geospatial data on a familar gepgraphical map as a background. The principal utility is we'll use is `gv.tile_sources`. We can use the method `opts` to specify additional confirguration settings. Below, we use the *Open Street Map (OSM)* Web Map Tile Service to create the object `basemap`. When we display the repr for this object in the notebook cell, the Bokeh menu at right enables interactive exploration. 
```python jupyter={"source_hidden": true} basemap = gv.tile_sources.OSM.opts(width=600, height=400) -basemap +basemap # When displayed, this basemap can be zoomed & panned using the menu at the right ``` ### Plotting points @@ -63,11 +61,7 @@ print(tokyo_lonlat) ``` -+ `geoviews.Points` accepts a list of tuples (each of the form `(x, y)`) to plot. -+ Use the OpenStreetMap tiles from `gv.tile_sources.OSM` as a basemap. -+ Overlay using the Holoviews operator `*` -+ Define the options using `geoviews.opts` -+ ? find a way to initialize the zoom level sensibly? +The class `geoviews.Points` accepts a list of tuples (each of the form `(x, y)`) & constructs a `Points` object that can be displayed. We can overlay the point created in the OpenStreetMap tiles from `basemap` using the `*` operator in Holoviews. We can also use `geoviews.opts` to set various display preferences for these points. ```python jupyter={"source_hidden": true} @@ -77,6 +71,7 @@ point_opts = opts.Points( alpha=0.5, color='red' ) +print(type(tokyo_point)) ``` ```python jupyter={"source_hidden": true} @@ -98,31 +93,38 @@ point_opts = opts.Points( (assuming $x_{\mathrm{max}}>x_{\mathrm{min}}$ & $y_{\mathrm{max}}>y_{\mathrm{min}}$) is as a single 4-tuple $$(x_{\mathrm{min}},y_{\mathrm{min}},x_{\mathrm{max}},y_{\mathrm{max}}),$$ i.e., the lower,left corner coordinates followed by the upper, right corner coordinates. + + Let's create a simple function to generate a rectangle of a given width & height given the centre coordinate. ```python jupyter={"source_hidden": true} -# simple utility to make a rectangle of "half-width" dx & "half-height" dy & centred pt -def bounds(pt,dx,dy): +# simple utility to make a rectangle of width dx & height dy & centre pt +def make_rect(pt,dx,dy): '''Returns rectangle represented as tuple (x_lo, y_lo, x_hi, y_hi) - given inputs pt=(x, y), half-width & half-height dx & dy respectively, - where x_lo = x-dx, x_hi=x+dx, y_lo = y-dy, y_hi = y+dy. + given inputs pt=(x, y), width & height dx & dy respectively, + where x_lo = x-dx/2, x_hi=x+dx/2, y_lo = y-dy/2, y_hi = y+dy/2. ''' - return tuple(coord+sgn*delta for sgn in (-1,+1) for coord,delta in zip(pt, (dx,dy))) + return tuple(coord+sgn*delta for sgn in (-1,+1) for coord,delta in zip(pt, (dx/2,dy/2))) ``` + +We can test the preceding function using the longitude-latitude coordinates of Marrakesh, Morocco. + + ```python jupyter={"source_hidden": true} # Verify that the function bounds works as intended marrakesh_lonlat = (-7.93, 31.67) -dlon, dlat = 0.25, 0.25 -marrakesh_rect = bounds(marrakesh_lonlat, dlon, dlat) +dlon, dlat = 0.5, 0.25 +marrakesh_rect = make_rect(marrakesh_lonlat, dlon, dlat) print(marrakesh_rect) ``` -+ `geoviews.Rectangles` accepts a list of bounding boxes (each described by a tuple of the form `(x_min, y_min, x_max, y_max)`) to plot. +The utility `geoviews.Rectangles` accepts a list of bounding boxes (each described by a tuple of the form `(x_min, y_min, x_max, y_max)`) to plot. We can again use `geoviews.opts` to tailor the rectangle as needed. ```python jupyter={"source_hidden": true} +rectangle = gv.Rectangles([marrakesh_rect]) rect_opts = opts.Rectangles( line_width=0, alpha=0.25, @@ -130,9 +132,25 @@ rect_opts = opts.Rectangles( ) ``` + +We can plot a point for Marrakesh as before using `geoviews.Points` (customized using `geoviews.opts`). 
+ + ```python jupyter={"source_hidden": true} -rectangle = gv.Rectangles([marrakesh_rect]) -(basemap * rectangle).opts( rect_opts ) +marrakesh_point = gv.Points([marrakesh_lonlat]) +point_opts = opts.Points( + size=48, + alpha=0.25, + color='blue' + ) +``` + + +Finally, we can overlay all these features on the basemap with the options applied. + + +```python jupyter={"source_hidden": true} +(basemap * rectangle * marrakesh_point).opts( rect_opts, point_opts ) ``` From 9589c2e5fdcd69a4cea8bde437c53609adfacc78 Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Mon, 16 Sep 2024 23:18:14 -0700 Subject: [PATCH 08/10] Cleans up 02_Data_Visualization_Tools.md, 03_Using_PySTAC.md --- .../02_Data_Visualization_Tools.md | 14 +++--- .../03_Using_PySTAC.md | 44 ++++++++++--------- 2 files changed, 30 insertions(+), 28 deletions(-) diff --git a/book/02_Software_Tools/02_Data_Visualization_Tools.md b/book/02_Software_Tools/02_Data_Visualization_Tools.md index 411b560..60f61b9 100644 --- a/book/02_Software_Tools/02_Data_Visualization_Tools.md +++ b/book/02_Software_Tools/02_Data_Visualization_Tools.md @@ -98,9 +98,9 @@ print(type(tokyo_point)) ```python jupyter={"source_hidden": true} -# simple utility to make a rectangle of width dx & height dy & centre pt -def make_rect(pt,dx,dy): - '''Returns rectangle represented as tuple (x_lo, y_lo, x_hi, y_hi) +# simple utility to make a rectangle centered at pt of width dx & height dy +def make_bbox(pt,dx,dy): + '''Returns bounding-box represented as tuple (x_lo, y_lo, x_hi, y_hi) given inputs pt=(x, y), width & height dx & dy respectively, where x_lo = x-dx/2, x_hi=x+dx/2, y_lo = y-dy/2, y_hi = y+dy/2. ''' @@ -115,8 +115,8 @@ We can test the preceding function using the longitude-latitude coordinates of M # Verify that the function bounds works as intended marrakesh_lonlat = (-7.93, 31.67) dlon, dlat = 0.5, 0.25 -marrakesh_rect = make_rect(marrakesh_lonlat, dlon, dlat) -print(marrakesh_rect) +marrakesh_bbox = make_bbox(marrakesh_lonlat, dlon, dlat) +print(marrakesh_bbox) ``` @@ -124,10 +124,10 @@ The utility `geoviews.Rectangles` accepts a list of bounding boxes (each describ ```python jupyter={"source_hidden": true} -rectangle = gv.Rectangles([marrakesh_rect]) +rectangle = gv.Rectangles([marrakesh_bbox]) rect_opts = opts.Rectangles( line_width=0, - alpha=0.25, + alpha=0.1, color='red' ) ``` diff --git a/book/03_Using_NASA_EarthData/03_Using_PySTAC.md b/book/03_Using_NASA_EarthData/03_Using_PySTAC.md index 8e5462d..9dd53b2 100644 --- a/book/03_Using_NASA_EarthData/03_Using_PySTAC.md +++ b/book/03_Using_NASA_EarthData/03_Using_PySTAC.md @@ -68,7 +68,12 @@ Next, let's define search parameters so we can retrieve data pertinent to that f # Define AOI as left, bottom, ri/ght and top lat/lon extent aoi = (-59.63818, -35.02927, -58.15723, -33.77271) # We will search data for the month of March 2024 -date_range = '2024-03-01/2024-03-31' +date_range = '2024-03-08/2024-03-15' +# Define a dictionary with appropriate keys: 'bbox' and 'datetime' +search_params = { + 'bbox' : aoi, + 'datetime' : date_range, + } ``` @@ -77,7 +82,7 @@ Make a quick visual check that the tuple `aoi` actually describes the geographic ```python jupyter={"source_hidden": true} basemap = gv.tile_sources.OSM -rect = gv.Rectangles(aoi).opts(opts.Rectangles(alpha=0.25, color='cyan')) +rect = gv.Rectangles([aoi]).opts(opts.Rectangles(alpha=0.25, color='cyan')) rect*basemap ``` @@ -85,29 +90,26 @@ rect*basemap ## Executing a search with the PySTAC API ```python jupyter={"source_hidden": true} -# We open 
a client instance to search for data, and retrieve relevant data records +# Define the base URL for the STAC to search STAC_URL = 'https://cmr.earthdata.nasa.gov/stac' - -# Setup PySTAC client -catalog = Client.open(f'{STAC_URL}/POCLOUD/') -collections = ["OPERA_L3_DSWX-HLS_V1"] - -# We would like to search data using the search parameters defined above. -opts = { - 'bbox' : aoi, - 'collections': collections, - 'datetime' : date_range, -} - -search = catalog.search(**opts) +# Update the dictionary opts with list of collections to search +collections = ["OPERA_L3_DSWX-HLS_V1_1.0"] +search_params.update(collections=collections) +print(search_params) ``` -```python jupyter={"source_hidden": true} -print(f'{type(search)=}') -``` + +Having defined the search parameters, we can instantiate a `Client` and search the spatio-temporal asset catalog. + ```python jupyter={"source_hidden": true} -results = list(search.items_as_dicts()) +# We open a client instance to search for data, and retrieve relevant data records +catalog = Client.open(f'{STAC_URL}/POCLOUD/') +search_results = catalog.search(**search_params) + +print(f'{type(search_results)=}') + +results = list(search_results.items_as_dicts()) print(f"Number of tiles found intersecting given bounding box: {len(results)}") ``` @@ -180,7 +182,7 @@ granules Examining the index reveals that the timestamps of the granules returned are not unique, i.e., granules correspond to distinct data products deriveded during a single aerial acquisition by a satellite. -```python jupyter={"source_hidden": true} +```python len(granules.index.unique()) / len(granules) # Notice the timestamps are not all unique, i.e., some are repeated ``` From 35bc787b87c7b3dbf65093558b2cc455c78a9f59 Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Mon, 16 Sep 2024 23:53:06 -0700 Subject: [PATCH 09/10] Removes extraneous directories/materials --- .../Assessment_Activities.md | 0 .../Learner_Personas.md | 0 .../Module_Objectives.md | 0 .../Outcomes.md | 0 .../Requirements.md | 0 .../disenio_leccion.md | 0 .../02_Data_Visualization_Tools.md | 2 +- .../Retrieving_Disturbance_Data.md | 329 ---------- book/07_Wildfire_analysis/Wildfire.md | 3 - .../3_Retrieving_FloodData.md | 431 ------------- book/08_Flood_analysis/flood.md | 4 - book/09_Scratch/2_Selecting_an_AOI.md | 224 ------- book/09_Scratch/4_Analyzing_Datasets.md | 2 - ...5_Manipulating_and_Visualizing_Datasets.md | 2 - book/09_Scratch/SLIDES-NASA-TOPS-flood-EN.md | 251 -------- book/09_Scratch/SLIDES-NASA-TOPS-flood-ES.md | 251 -------- book/09_Scratch/drought.md | 3 - book/09_Scratch/notebooks/2_ES_Flood.md | 605 ------------------ book/09_Scratch/proposal.md | 86 --- book/09_Scratch/references.bib | 156 ----- .../slides/Open_Science_Intro_Slides.md | 72 --- 21 files changed, 1 insertion(+), 2420 deletions(-) rename {book/09_Scratch/about_the_tutorial => about_the_tutorial}/Assessment_Activities.md (100%) rename {book/09_Scratch/about_the_tutorial => about_the_tutorial}/Learner_Personas.md (100%) rename {book/09_Scratch/about_the_tutorial => about_the_tutorial}/Module_Objectives.md (100%) rename {book/09_Scratch/about_the_tutorial => about_the_tutorial}/Outcomes.md (100%) rename {book/09_Scratch/about_the_tutorial => about_the_tutorial}/Requirements.md (100%) rename {book/09_Scratch/about_the_tutorial => about_the_tutorial}/disenio_leccion.md (100%) delete mode 100644 book/07_Wildfire_analysis/Retrieving_Disturbance_Data.md delete mode 100644 book/07_Wildfire_analysis/Wildfire.md delete mode 100644 
book/08_Flood_analysis/3_Retrieving_FloodData.md delete mode 100644 book/08_Flood_analysis/flood.md delete mode 100644 book/09_Scratch/2_Selecting_an_AOI.md delete mode 100644 book/09_Scratch/4_Analyzing_Datasets.md delete mode 100644 book/09_Scratch/5_Manipulating_and_Visualizing_Datasets.md delete mode 100644 book/09_Scratch/SLIDES-NASA-TOPS-flood-EN.md delete mode 100644 book/09_Scratch/SLIDES-NASA-TOPS-flood-ES.md delete mode 100644 book/09_Scratch/drought.md delete mode 100644 book/09_Scratch/notebooks/2_ES_Flood.md delete mode 100644 book/09_Scratch/proposal.md delete mode 100644 book/09_Scratch/references.bib delete mode 100644 book/09_Scratch/slides/Open_Science_Intro_Slides.md diff --git a/book/09_Scratch/about_the_tutorial/Assessment_Activities.md b/about_the_tutorial/Assessment_Activities.md similarity index 100% rename from book/09_Scratch/about_the_tutorial/Assessment_Activities.md rename to about_the_tutorial/Assessment_Activities.md diff --git a/book/09_Scratch/about_the_tutorial/Learner_Personas.md b/about_the_tutorial/Learner_Personas.md similarity index 100% rename from book/09_Scratch/about_the_tutorial/Learner_Personas.md rename to about_the_tutorial/Learner_Personas.md diff --git a/book/09_Scratch/about_the_tutorial/Module_Objectives.md b/about_the_tutorial/Module_Objectives.md similarity index 100% rename from book/09_Scratch/about_the_tutorial/Module_Objectives.md rename to about_the_tutorial/Module_Objectives.md diff --git a/book/09_Scratch/about_the_tutorial/Outcomes.md b/about_the_tutorial/Outcomes.md similarity index 100% rename from book/09_Scratch/about_the_tutorial/Outcomes.md rename to about_the_tutorial/Outcomes.md diff --git a/book/09_Scratch/about_the_tutorial/Requirements.md b/about_the_tutorial/Requirements.md similarity index 100% rename from book/09_Scratch/about_the_tutorial/Requirements.md rename to about_the_tutorial/Requirements.md diff --git a/book/09_Scratch/about_the_tutorial/disenio_leccion.md b/about_the_tutorial/disenio_leccion.md similarity index 100% rename from book/09_Scratch/about_the_tutorial/disenio_leccion.md rename to about_the_tutorial/disenio_leccion.md diff --git a/book/02_Software_Tools/02_Data_Visualization_Tools.md b/book/02_Software_Tools/02_Data_Visualization_Tools.md index 60f61b9..96a1812 100644 --- a/book/02_Software_Tools/02_Data_Visualization_Tools.md +++ b/book/02_Software_Tools/02_Data_Visualization_Tools.md @@ -41,7 +41,7 @@ from geoviews import opts ### Displaying a basemap -A *basemap* or *tile layer* is useful when displaying vector or raster data because it allows us to overlay the relevant geospatial data on a familar gepgraphical map as a background. The principal utility is we'll use is `gv.tile_sources`. We can use the method `opts` to specify additional confirguration settings. Below, we use the *Open Street Map (OSM)* Web Map Tile Service to create the object `basemap`. When we display the repr for this object in the notebook cell, the Bokeh menu at right enables interactive exploration. +A *basemap* or *tile layer* is useful when displaying vector or raster data because it allows us to overlay the relevant geospatial data on a familar geographical map as a background. The principal utility is we'll use is `gv.tile_sources`. We can use the method `opts` to specify additional confirguration settings. Below, we use the *Open Street Map (OSM)* Web Map Tile Service to create the object `basemap`. 
When we display the repr for this object in the notebook cell, the Bokeh menu at right enables interactive exploration. ```python jupyter={"source_hidden": true} diff --git a/book/07_Wildfire_analysis/Retrieving_Disturbance_Data.md b/book/07_Wildfire_analysis/Retrieving_Disturbance_Data.md deleted file mode 100644 index 066a717..0000000 --- a/book/07_Wildfire_analysis/Retrieving_Disturbance_Data.md +++ /dev/null @@ -1,329 +0,0 @@ ---- -jupyter: - jupytext: - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.16.1 - kernelspec: - display_name: Python 3 (ipykernel) - language: python - name: python3 ---- - -# Retrieving Disturbance Data - -The [OPERA DIST-HLS data product](https://lpdaac.usgs.gov/documents/1766/OPERA_DIST_HLS_Product_Specification_V1.pdf) can be used to study the impacts and evolution of wildfires at a large scale. In this notebook, we will retrieve data associated with the [2023 Greece wildfires](https://en.wikipedia.org/wiki/2023_Greece_wildfires) to understand its evolution and extent. We will also generate a time series visualization of the event. - -In particular, we will be examining the area around the city of [Alexandroupolis](https://en.wikipedia.org/wiki/Alexandroupolis) which was severely impacted by the wildfires, resulting in loss of lives, property, and forested areas. - -```python -# Plotting imports -import matplotlib.pyplot as plt -from matplotlib.colors import ListedColormap -from rasterio.plot import show -from mpl_toolkits.axes_grid1.anchored_artists import AnchoredSizeBar - -# GIS imports -from shapely.geometry import Point -from osgeo import gdal -from rasterio.merge import merge -import rasterio -import contextily as cx -import folium - -# data wrangling imports -import pandas as pd -import numpy as np - -# misc imports -from datetime import datetime, timedelta -from collections import defaultdict - -# STAC imports to retrieve cloud data -from pystac_client import Client - -# GDAL setup for accessing cloud data -gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE','~/cookies.txt') -gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/cookies.txt') -gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN','EMPTY_DIR') -gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF, TIFF') -``` - -```python -# Define data search parameters - -# Define AOI as left, bottom, right and top lat/lon extent -dadia_forest = Point(26.18, 41.08).buffer(0.1) - -# We will search data for the month of March 2024 -start_date = datetime(year=2023, month=8, day=1) -stop_date = datetime(year=2023, month=9, day=30) -``` - -```python -# We open a client instance to search for data, and retrieve relevant data records -STAC_URL = 'https://cmr.earthdata.nasa.gov/stac' - -# Setup PySTAC client -# LPCLOUD refers to the LP DAAC cloud environment that hosts earth observation data -catalog = Client.open(f'{STAC_URL}/LPCLOUD/') - -collections = ["OPERA_L3_DIST-ALERT-HLS_V1"] - -# We would like to search data for August-September 2023 -date_range = f'{start_date.strftime("%Y-%m-%d")}/{stop_date.strftime("%Y-%m-%d")}' - -opts = { - 'bbox' : dadia_forest.bounds, - 'collections': collections, - 'datetime' : date_range, -} - -search = catalog.search(**opts) -``` - -NOTE: The OPERA DIST data product is hosted on [LP DAAC](https://lpdaac.usgs.gov/news/lp-daac-releases-opera-land-surface-disturbance-alert-version-1-data-product/), and this is specified when setting up the PySTAC client to search their catalog of data products in the above code cell. 
- -```python -results = list(search.items_as_dicts()) -print(f"Number of tiles found intersecting given AOI: {len(results)}") -``` - -Let's load the search results into a pandas dataframe - -```python -def search_to_df(results, layer_name = 'VEG-DIST-STATUS'): - - times = pd.DatetimeIndex([result['properties']['datetime'] for result in results]) # parse of timestamp for each result - data = {'hrefs': [value['href'] for result in results for key, value in result['assets'].items() if layer_name in key], # parse out links only to DIST-STATUS data layer - 'tile_id': [value['href'].split('/')[-1].split('_')[3] for result in results for key, value in result['assets'].items() if layer_name in key]} - - # Construct pandas dataframe to summarize granules from search results - granules = pd.DataFrame(index=times, data=data) - granules.index.name = 'times' - - return granules -``` - -```python -granules = search_to_df(results) -granules.head() - -``` - -```python -# Let's refine the dataframe a bit more so that we group together granules by -# date of acquisition - we don't mind if they were acquired at different times -# of the same day - -refined_granules = defaultdict(list) - -for i, row in granules.iterrows(): - refined_granules[i.strftime('%Y-%m-%d')].append(row.hrefs) - -refined_granules = pd.DataFrame(index=refined_granules.keys(), data = {'hrefs':refined_granules.values()}) -``` - -```python -# The wildfire near Alexandroupolis started on August 21st and rapidly spread, particularly affecting the nearby Dadia Forest -# For demonstration purposes, let's look at three dates to study the extent of the fire - -# August 1st, August 25th, and September 19th -# We will plot the OPERA-DIST-ALERT data product, highlighting only those pixels corresponding to confirmed vegetation damage, -# and in particular only those pixels where at least 50% of the area was affected (layer value 6) - -dates_of_interest = [datetime(year=2023, month=8, day=1), datetime(year=2023, month=8, day=26), datetime(year=2023, month=9, day=18)] -hrefs_of_interest = [x.hrefs for i, x in refined_granules.iterrows() if datetime.strptime(i, '%Y-%m-%d') in dates_of_interest] -``` - -**Relevant layer Values for DIST-ALERT product:** - -* **0:** No disturbance
-* **1:** First detection of disturbance with vegetation cover change <50%
-* **2:** Provisional detection of disturbance with vegetation cover change <50%
-* **3:** Confirmed detection of disturbance with vegetation cover change <50%
-* **4:** First detection of disturbance with vegetation cover change >50%
-* **5:** Provisional detection of disturbance with vegetation cover change >50%
-* **6:** Confirmed detection of disturbance with vegetation cover change >50%
- -```python -# Define color map to generate plot (Red, Green, Blue, Alpha) -colors = [(1, 1, 1, 0)] * 256 # Initial set all values to white, with zero opacity -colors[6] = (1, 0, 0, 1) # Set class 6 to Red with 100% opacity - -# Create a ListedColormap -cmap = ListedColormap(colors) -``` - -```python -fig, ax = plt.subplots(1, 3, figsize = (30, 10)) -crs = None - -for i, (date, hrefs) in enumerate(zip(dates_of_interest, hrefs_of_interest)): - - # Read the crs to be used to generate basemaps - if crs is None: - with rasterio.open(hrefs[0]) as ds: - crs = ds.crs - - if len(hrefs) == 1: - with rasterio.open(hrefs[0]) as ds: - raster = ds.read() - transform = ds.transform - else: - raster, transform = merge(hrefs) - - show(raster, ax=ax[i], transform=transform, interpolation='none') - cx.add_basemap(ax[i], crs=crs, zoom=9, source=cx.providers.OpenStreetMap.Mapnik) - show(raster, ax=ax[i], transform=transform, interpolation='none', cmap=cmap) - - scalebar = AnchoredSizeBar(ax[i].transData, - 10000 , '10 km', 'lower right', - color='black', - frameon=False, - pad = 0.25, - sep=5, - fontproperties = {'weight':'semibold', 'size':12}, - size_vertical=300) - - ax[i].add_artist(scalebar) - ax[i].ticklabel_format(axis='both', style='scientific',scilimits=(0,0),useOffset=False,useMathText=True) - ax[i].set_xlabel('UTM easting (meters)') - ax[i].set_ylabel('UTM northing (meters)') - ax[i].set_title(f"Disturbance extent on: {date.strftime('%Y-%m-%d')}") -``` - -Next, let's calculate the area extent of damage over time - -```python -damage_area = [] -conversion_factor = (30*1e-3)**2 # to convert pixel count to area in km^2; each pixel is 30x30 meters - -# this will take a few minutes to run, since we are retrieving data for multiple days -for index, row in refined_granules.iterrows(): - raster, transform = merge(row.hrefs) - damage_area.append(np.sum(raster==6)*conversion_factor) - -refined_granules['damage_area'] = damage_area - -``` - -```python -fig, ax = plt.subplots(1, 1, figsize=(20, 10)) -ax.plot([datetime.strptime(i, '%Y-%m-%d') for i in refined_granules.index], refined_granules['damage_area'], color='red') -ax.grid() -plt.ylabel('Area damaged by wildfire (km$^2$)', size=15) -plt.xlabel('Date', size=15) -plt.xticks([datetime(year=2023, month=8, day=1) + timedelta(days=6*i) for i in range(11)], size=14) -plt.title('2023 Dadia forest wildfire detected extent', size=14) -``` - -### Great Green Wall, Sahel Region, Africa - -```python -ndiaye_senegal = Point(-16.09, 16.50) - -# We will search data through the product record -start_date = datetime(year=2022, month=1, day=1) -stop_date = datetime.now() -``` - -```python -# Plotting search location in folium as a sanity check -m = folium.Map(location=(ndiaye_senegal.y, ndiaye_senegal.x), control_scale = True, zoom_start=9) -radius = 5000 -folium.Circle( - location=[ndiaye_senegal.y, ndiaye_senegal.x], - radius=radius, - color="red", - stroke=False, - fill=True, - fill_opacity=0.6, - opacity=1, - popup="{} pixels".format(radius), - tooltip="50 px radius", - # -).add_to(m) - -m -``` - -```python -# We open a client instance to search for data, and retrieve relevant data records -STAC_URL = 'https://cmr.earthdata.nasa.gov/stac' - -# Setup PySTAC client -# LPCLOUD refers to the LP DAAC cloud environment that hosts earth observation data -catalog = Client.open(f'{STAC_URL}/LPCLOUD/') - -collections = ["OPERA_L3_DIST-ANN-HLS_V1"] - -# We would like to search data for August-September 2023 -date_range = 
f'{start_date.strftime("%Y-%m-%d")}/{stop_date.strftime("%Y-%m-%d")}' - -opts = { - 'bbox' : ndiaye_senegal.bounds, - 'collections': collections, - 'datetime' : date_range, -} - -search = catalog.search(**opts) -results = list(search.items_as_dicts()) -print(f"Number of tiles found intersecting given AOI: {len(results)}") -``` - -```python -def urls_to_dataset(granule_dataframe): - '''method that takes in a list of OPERA tile URLs and returns an xarray dataset with dimensions - latitude, longitude and time''' - - dataset_list = [] - - for i, row in granule_dataframe.iterrows(): - with rasterio.open(row.hrefs) as ds: - # extract CRS string - crs = str(ds.crs).split(':')[-1] - - # extract the image spatial extent (xmin, ymin, xmax, ymax) - xmin, ymin, xmax, ymax = ds.bounds - - # the x and y resolution of the image is available in image metadata - x_res = np.abs(ds.transform[0]) - y_res = np.abs(ds.transform[4]) - - # read the data - img = ds.read() - - # Ensure img has three dimensions (bands, y, x) - if img.ndim == 2: - img = np.expand_dims(img, axis=0) - - - - lon = np.arange(xmin, xmax, x_res) - lat = np.arange(ymax, ymin, -y_res) - - lon_grid, lat_grid = np.meshgrid(lon, lat) - - da = xr.DataArray( - data=img, - dims=["band", "y", "x"], - coords=dict( - lon=(["y", "x"], lon_grid), - lat=(["y", "x"], lat_grid), - time=i, - band=np.arange(img.shape[0]) - ), - attrs=dict( - description="OPERA DIST ANN", - units=None, - ), - ) - da.rio.write_crs(crs, inplace=True) - - dataset_list.append(da) - return xr.concat(dataset_list, dim='time').squeeze() - -dataset= urls_to_dataset(granules) -``` diff --git a/book/07_Wildfire_analysis/Wildfire.md b/book/07_Wildfire_analysis/Wildfire.md deleted file mode 100644 index 6cd0988..0000000 --- a/book/07_Wildfire_analysis/Wildfire.md +++ /dev/null @@ -1,3 +0,0 @@ -# Wildfire - -![](../assets/climate_risks_fire_color.png) diff --git a/book/08_Flood_analysis/3_Retrieving_FloodData.md b/book/08_Flood_analysis/3_Retrieving_FloodData.md deleted file mode 100644 index 8cf6223..0000000 --- a/book/08_Flood_analysis/3_Retrieving_FloodData.md +++ /dev/null @@ -1,431 +0,0 @@ ---- -jupyter: - jupytext: - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.16.2 - kernelspec: - display_name: Python 3 (ipykernel) - language: python - name: python3 ---- - -## Retrieving OPERA DSWx-HLS data for a flood event - -Heavy rains severely impacted Southeast Texas in May 2024 [[1]](https://www.texastribune.org/2024/05/03/texas-floods-weather-harris-county/), resulting in flooding and causing significant damage to property and human life [[2]](https://www.texastribune.org/series/east-texas-floods-2024/). In this notebook, we will retrieve [OPERA DSWx-HLS](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf) data associated to understand the extent of flooding and damage, and visualize data from before, during, and after the event. 
- -```python -import rasterio -import rioxarray -import folium - -import hvplot.xarray # noqa -import xarray as xr -import xyzservices.providers as xyz - -from shapely.geometry import Point -from osgeo import gdal - -from holoviews.plotting.util import process_cmap - -import pandas as pd - -from warnings import filterwarnings -filterwarnings("ignore") # suppress PySTAC warnings - -# STAC imports to retrieve cloud data -from pystac_client import Client - -from datetime import datetime -import numpy as np - -# GDAL setup for accessing cloud data -gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE','~/cookies.txt') -gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/cookies.txt') -gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN','EMPTY_DIR') -gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF, TIFF') - -``` - -```python -# Let's set up the parameters of our search query - -# Flooding in Texas, 2024; -livingston_tx = Point(-95.09, 30.69) - -# We will search data around the flooding event at the start of May -start_date = datetime(year=2024, month=4, day=30) -stop_date = datetime(year=2024, month=5, day=31) -date_range = f'{start_date.strftime("%Y-%m-%d")}/{stop_date.strftime("%Y-%m-%d")}' - -# We open a client instance to search for data, and retrieve relevant data records -STAC_URL = 'https://cmr.earthdata.nasa.gov/stac' - -# Setup PySTAC client -# POCLOUD refers to the PO DAAC cloud environment that hosts earth observation data -catalog = Client.open(f'{STAC_URL}/POCLOUD/') - -# Setup PySTAC client -provider_cat = Client.open(STAC_URL) -catalog = Client.open(f'{STAC_URL}/POCLOUD/') -collections = ["OPERA_L3_DSWX-HLS_V1"] - -opts = { - 'bbox' : livingston_tx.buffer(0.01).bounds, - 'collections': collections, - 'datetime' : date_range, -} -``` - - -```python -livingston_tx = Point(-95.09, 30.69) -m = folium.Map(location=(livingston_tx.y, livingston_tx.x), control_scale = True, zoom_start=8) -radius = 15000 -folium.Circle( - location=[livingston_tx.y, livingston_tx.x], - radius=radius, - color="red", - stroke=False, - fill=True, - fill_opacity=0.6, - opacity=1, - popup="{} pixels".format(radius), - tooltip="Livingston, TX", - # -).add_to(m) - -m -``` - -```python -# Execute the search -search = catalog.search(**opts) -results = list(search.items_as_dicts()) -print(f"Number of tiles found intersecting given AOI: {len(results)}") -``` - -```python -def search_to_df(results, layer_name='0_B01_WTR'): - ''' - Given search results returned from a NASA earthdata query, load and return relevant details and band links as a dataframe - ''' - - times = pd.DatetimeIndex([result['properties']['datetime'] for result in results]) # parse of timestamp for each result - data = {'hrefs': [value['href'] for result in results for key, value in result['assets'].items() if layer_name in key], - 'tile_id': [value['href'].split('/')[-1].split('_')[3] for result in results for key, value in result['assets'].items() if layer_name in key]} - - - # Construct pandas dataframe to summarize granules from search results - granules = pd.DataFrame(index=times, data=data) - granules.index.name = 'times' - - return granules -``` - -```python -granules = search_to_df(results) -granules.head() -``` - -```python -# We now filter the dataframe to restrict our results to a single tile_id -granules = granules[granules.tile_id == 'T15RTQ'] -granules.sort_index(inplace=True) -``` - -```python -granules -``` - -```python -type(results[0]) - -``` - -```python -def filter_search_by_cc(results, cloud_threshold=10): - ''' - Given a list of 
results returned by the NASA Earthdata STAC API for OPERA DSWx data, - filter them by cloud cover - ''' - - filtered_results = [] - - for result in results: - try: - cloud_cover = result['properties']['eo:cloud_cover'] - except: - href = result['assets']['0_B01_WTR']['href'] - with rasterio.open(href) as ds: - img = ds.read(1).flatten() - cloud_cover = 100*(np.sum(np.isin(img, 253))/img.size) - - if cloud_cover <= cloud_threshold: - filtered_results.append(result) - - return filtered_results -``` - -```python -def urls_to_dataset(granule_dataframe): - '''method that takes in a list of OPERA tile URLs and returns an xarray dataset with dimensions - latitude, longitude and time''' - - dataset_list = [] - - for i, row in granule_dataframe.iterrows(): - with rasterio.open(row.hrefs) as ds: - # extract CRS string - crs = str(ds.crs).split(':')[-1] - - # extract the image spatial extent (xmin, ymin, xmax, ymax) - xmin, ymin, xmax, ymax = ds.bounds - - # the x and y resolution of the image is available in image metadata - x_res = np.abs(ds.transform[0]) - y_res = np.abs(ds.transform[4]) - - # read the data - img = ds.read() - - # Ensure img has three dimensions (bands, y, x) - if img.ndim == 2: - img = np.expand_dims(img, axis=0) - - - - lon = np.arange(xmin, xmax, x_res) - lat = np.arange(ymax, ymin, -y_res) - - lon_grid, lat_grid = np.meshgrid(lon, lat) - - da = xr.DataArray( - data=img, - dims=["band", "y", "x"], - coords=dict( - lon=(["y", "x"], lon_grid), - lat=(["y", "x"], lat_grid), - time=i, - band=np.arange(img.shape[0]) - ), - attrs=dict( - description="OPERA DSWx B01", - units=None, - ), - ) - da.rio.write_crs(crs, inplace=True) - - dataset_list.append(da) - return xr.concat(dataset_list, dim='time').squeeze() - -dataset= urls_to_dataset(granules) -``` - -```python -COLORS = [(150, 150, 150, 0)]*256 -COLORS[0] = (0, 255, 0, 1) -COLORS[1] = (0, 0, 255, 1) # OSW -COLORS[2] = (0, 0, 255, 1) # PSW -COLORS[252] = (0, 0, 255, 1) # ICE -COLORS[253] = (150, 150, 150, 1) -``` - -```python -img = dataset.hvplot.quadmesh(title = 'DSWx data for May 2024 Texas floods', - x='lon', y='lat', - project=True, rasterize=True, - cmap=COLORS, - colorbar=False, - widget_location='bottom', - tiles = xyz.OpenStreetMap.Mapnik, - xlabel='Longitude (degrees)',ylabel='Latitude (degrees)', - fontscale=1.25, frame_width=1000, frame_height=1000) - -img -``` - -### Vaigai Reservoir - -```python -vaigai_reservoir = Point(77.568, 10.054) -``` - -```python -# Plotting search location in folium as a sanity check -m = folium.Map(location=(vaigai_reservoir.y, vaigai_reservoir.x), control_scale = True, zoom_start=9) -radius = 5000 -folium.Circle( - location=[vaigai_reservoir.y, vaigai_reservoir.x], - radius=radius, - color="red", - stroke=False, - fill=True, - fill_opacity=0.6, - opacity=1, - popup="{} pixels".format(radius), - tooltip="50 px radius", - # -).add_to(m) - -m -``` - -```python -# We will search data around the flooding event in December -start_date = datetime(year=2023, month=4, day=1) -stop_date = datetime(year=2024, month=4, day=1) -date_range = f'{start_date.strftime("%Y-%m-%d")}/{stop_date.strftime("%Y-%m-%d")}' - -# We open a client instance to search for data, and retrieve relevant data records -STAC_URL = 'https://cmr.earthdata.nasa.gov/stac' - -# Setup PySTAC client -# POCLOUD refers to the PO DAAC cloud environment that hosts earth observation data -catalog = Client.open(f'{STAC_URL}/POCLOUD/') - -# Setup PySTAC client -provider_cat = Client.open(STAC_URL) -catalog = 
Client.open(f'{STAC_URL}/POCLOUD/') -collections = ["OPERA_L3_DSWX-HLS_V1"] - -# Setup search options -opts = { - 'bbox' : vaigai_reservoir.buffer(0.01).bounds, - 'collections': collections, - 'datetime' : date_range, -} - -# Execute the search -search = catalog.search(**opts) -results = list(search.items_as_dicts()) -print(f"Number of tiles found intersecting given AOI: {len(results)}") -``` - -```python -# let's filter our results so that only scenes with less than 10% cloud cover are returned -results = filter_search_by_cc(results) - -print("Number of results containing less than 10% cloud cover: ", len(results)) -``` - -```python -# Load results into dataframe -granules = search_to_df(results) -``` - -```python -# This may take a while depending on the number of results we are loading -dataset= urls_to_dataset(granules) -``` - -```python -img = dataset.hvplot.quadmesh(title = 'Vaigai Reservoir, India - water extent over a year', - x='lon', y='lat', - project=True, rasterize=True, - cmap=COLORS, - colorbar=False, - widget_location='bottom', - tiles = xyz.OpenStreetMap.Mapnik, - xlabel='Longitude (degrees)',ylabel='Latitude (degrees)', - fontscale=1.25, frame_width=1000, frame_height=1000) - -img -``` - -### Lake Mead - -```python -lake_mead = Point(-114.348, 36.423) -``` - -```python -# Plotting search location in folium as a sanity check -m = folium.Map(location=(lake_mead.y, lake_mead.x), control_scale = True, zoom_start=9) -radius = 5000 -folium.Circle( - location=[lake_mead.y, lake_mead.x], - radius=radius, - color="red", - stroke=False, - fill=True, - fill_opacity=0.6, - opacity=1, - popup="{} pixels".format(radius), - tooltip="50 px radius", - # -).add_to(m) - -m -``` - -```python -# We will search data for a year -start_date = datetime(year=2023, month=3, day=1) -stop_date = datetime(year=2024, month=6, day=15) -date_range = f'{start_date.strftime("%Y-%m-%d")}/{stop_date.strftime("%Y-%m-%d")}' - -# We open a client instance to search for data, and retrieve relevant data records -STAC_URL = 'https://cmr.earthdata.nasa.gov/stac' - -# Setup PySTAC client -# POCLOUD refers to the PO DAAC cloud environment that hosts earth observation data -catalog = Client.open(f'{STAC_URL}/POCLOUD/') - -# Setup PySTAC client -provider_cat = Client.open(STAC_URL) -catalog = Client.open(f'{STAC_URL}/POCLOUD/') -collections = ["OPERA_L3_DSWX-HLS_V1"] - -# Setup search options -opts = { - 'bbox' : lake_mead.buffer(0.01).bounds, - 'collections': collections, - 'datetime' : date_range, -} - -# Execute the search -search = catalog.search(**opts) -results = list(search.items_as_dicts()) -print(f"Number of tiles found intersecting given AOI: {len(results)}") -``` - -```python -# let's filter our results so that only scenes with less than 10% cloud cover are returned -results = filter_search_by_cc(results) - -print("Number of results containing less than 10% cloud cover: ", len(results)) -``` - -That's a fairly large number of tiles! Loading the data into an xarray dataset will take a few minutes. 
- -```python -# Load results into dataframe -granules = search_to_df(results) -``` - -```python -# Let's filter by tile id so that we can study changes over the same spatial extent -granules = granules[granules.tile_id=='T11SQA'] -``` - -```python -# Similarly, loading a year's worth of data will take a few minutes -dataset= urls_to_dataset(granules) -``` - -```python -img = dataset.hvplot.quadmesh(title = 'Lake Mead, NV USA - water extent over a year', - x='lon', y='lat', - project=True, rasterize=True, - cmap=COLORS, - colorbar=False, - widget_location='bottom', - tiles = xyz.OpenStreetMap.Mapnik, - xlabel='Longitude (degrees)',ylabel='Latitude (degrees)', - fontscale=1.25, frame_width=1000, frame_height=1000) - -img -``` diff --git a/book/08_Flood_analysis/flood.md b/book/08_Flood_analysis/flood.md deleted file mode 100644 index d2fa60c..0000000 --- a/book/08_Flood_analysis/flood.md +++ /dev/null @@ -1,4 +0,0 @@ -# Flood - - -![](../assets/climate_risks_flood_color.png) diff --git a/book/09_Scratch/2_Selecting_an_AOI.md b/book/09_Scratch/2_Selecting_an_AOI.md deleted file mode 100644 index cf5dda5..0000000 --- a/book/09_Scratch/2_Selecting_an_AOI.md +++ /dev/null @@ -1,224 +0,0 @@ ---- -jupyter: - jupytext: - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.16.2 - kernelspec: - display_name: nasa_topst - language: python - name: python3 ---- - -# Selecting an AOI - -Selecting and modifying Areas of Interest (AOIs) is an important part of geospatial data analysis workflows. The Python ecosystem of libraries provide a number ways to do this, some of which will be explored and demonstrated in this notebook. In particular, we will demonstrate the following: -1. How to specify AOIs in different ways -2. How to use `geopandas` to load shapely geometries, visualize them, and perform operations such as `intersection` -3. Querying data providers using the AOIs defined, and understanding how query results can change based on AOI -4. Perform windowing operations when reading raster data using `rasterio` - -This can be used to effectively query cloud services for datasets over specific regions, - -```python -# library to handle filepath operations -from pathlib import Path - -# library for handling geospatial data -import rasterio -from rasterio.plot import show -import geopandas as gpd -from shapely.geometry import Polygon, Point -from pystac_client import Client - -# libraries to help with visualization -import matplotlib.pyplot as plt -from matplotlib.colors import ListedColormap -from matplotlib import colors -import folium - -# handle numbers -import numpy as np - - -# We set the following rasterio environment variables to read data from the cloud -rio_env = rasterio.Env( - GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR', - CPL_VSIL_CURL_ALLOWED_EXTENSIONS="TIF, TIFF", - GDAL_HTTP_COOKIEFILE=Path('~/cookies.txt').expanduser(), - GDAL_HTTP_COOKIEJAR=Path('~/cookies.txt').expanduser() - ) -rio_env.__enter__() -``` - -AOIs are `vector` data formats, because they refer to specific `points` or `polygons` that refer to the location of interest in a given co-ordinate reference system (CRS). For example, the city center of of [Tokyo, Japan](https://en.wikipedia.org/wiki/Tokyo) can be specified by the latitude and longitude pair (35.689722, 139.692222) in the [WGS84 CRS](https://en.wikipedia.org/wiki/World_Geodetic_System). 
In Python, we use the popular `shapely` library to define a vector shapes, as shown below: - -```python -tokyo_point = Point(35.689722, 139.692222) -``` - -This code will generative an interactive plot - feel free to pan/zoom around! - -```python -m = folium.Map(location=(tokyo_point.x, tokyo_point.y), control_scale = True, zoom_start=8) -radius = 50 -folium.CircleMarker( - location=[tokyo_point.x, tokyo_point.y], - radius=radius, - color="cornflowerblue", - stroke=False, - fill=True, - fill_opacity=0.6, - opacity=1, - popup="{} pixels".format(radius), - tooltip="50 px radius", - # -).add_to(m) - -m -``` - -AOIs can also take the form of bounds. Typically they are specified by four values - the minimum and maximum extent each in the `x` and `y` directions. For rasterio, these are specified in the format `(x_min, y_min, x_max, y_max)`, with values specified in the local CRS. We specify values in the local CRS. - -```python -marrakesh_polygon = Polygon.from_bounds(-8.18, 31.42, -7.68, 31.92) - -# We will load the polygon into a geopandas dataframe for ease of plotting -gdf = gpd.GeoDataFrame({'geometry':[marrakesh_polygon]}, crs='epsg:4326') -``` - -```python -m = folium.Map(location=[31.62, -7.88], zoom_start=10) -for _, row in gdf.iterrows(): - sim_geo = gpd.GeoSeries(row["geometry"]).simplify(tolerance=0.001) - geo_j = sim_geo.to_json() - geo_j = folium.GeoJson(data=geo_j, style_function=lambda x: {"fillColor": "orange"}) - # folium.GeoJson(data=.to_json(), style_function=lambda x: {"fillColor": "orange"}) - geo_j.add_to(m) - -m -``` - -Geopandas dataframes require a `geometry` column containing `shapely` shapes (`Points`, `Polygons`, etc.) and also require a `CRS` to be specified to work and render correctly. In this example, we specify `EPSG:4326` as our CRS, which corresponds to WGS84 system, which refers to locations on the globe using `(latitude, longitude)` pair values. - -Let's add another polygon to the above example, and also see how to calculate their intersection: - -```python -marrakesh_polygon = Polygon.from_bounds(-8.18, 31.42, -7.68, 31.92) # Original polygon -marrakesh_polygon_2 = Polygon.from_bounds(-8.38, 31.22, -7.68, 31.52) # Arbitrary second overlapping polygon -intersection_polygon = marrakesh_polygon.intersection(marrakesh_polygon_2) # Calculate the intersection of polygons - -# We will load the polygon into a geopandas dataframe for ease of plotting -gdf = gpd.GeoDataFrame({'name':['Original Polygon', 'New Polygon', 'Intersection Area'], # Add some text that will appear when you hover over the polygono - 'color':['blue', 'orange', 'red'], # Unique colors for each shape - 'geometry':[marrakesh_polygon, marrakesh_polygon_2, intersection_polygon]}, # column of geometries - crs='epsg:4326') # CRS for the dataframe, which must be common to all the shapes - -m = folium.Map(location=[31.62, -7.88], zoom_start=10) -for _, row in gdf.iterrows(): - sim_geo = gpd.GeoSeries(row["geometry"]).simplify(tolerance=0.001) - geo_j = sim_geo.to_json() - geo_j = folium.GeoJson(data=geo_j, fillColor=row['color'], tooltip=row["name"]) - geo_j.add_to(m) - -m -``` - -Let us now query a DAAC for data over a new region. We will be going over the details of the query in the next chapter, but will simply see an example of data querying here. 
First, let's look at the region in a folium map - -```python -lake_mead_polygon = Polygon.from_bounds(-114.52, 36.11,-114.04, 36.48) - -# We will load the polygon into a geopandas dataframe for ease of plotting -gdf = gpd.GeoDataFrame({'geometry':[lake_mead_polygon]}, crs='epsg:4326') - -m = folium.Map(location=[36.11, -114.5], zoom_start=10) -for _, row in gdf.iterrows(): - sim_geo = gpd.GeoSeries(row["geometry"]).simplify(tolerance=0.001) - geo_j = sim_geo.to_json() - geo_j = folium.GeoJson(data=geo_j, style_function=lambda x: {"fillColor": "orange"}) - geo_j.add_to(m) - -m -``` - -```python -# URL of CMR service -STAC_URL = 'https://cmr.earthdata.nasa.gov/stac' - -# Setup PySTAC client -provider_cat = Client.open(STAC_URL) -catalog = Client.open(f'{STAC_URL}/POCLOUD/') -collections = ["OPERA_L3_DSWX-HLS_V1"] - -# We would like to search data for April 2023 -date_range = "2023-04-01/2023-04-30" - -opts = { - 'bbox' : lake_mead_polygon.bounds, - 'collections': collections, - 'datetime' : date_range, -} - -search = catalog.search(**opts) -``` - -```python -print("Number of tiles found intersecting given polygon: ", len(list(search.items()))) -``` - -How many search results did you get? What happens if you modify the date range in the previous cell and re-run the search? Note: if you make the time window too large, it will take a while for results to return. - -Lastly, let's visualize some of the returned data. Here's a sample returned search result - you can click on the keys and see the data contained in them: - -```python -sample_result = list(search.items())[0] -sample_result -``` - -```python -data_url = sample_result.assets['0_B01_WTR'].href -``` - -```python -with rasterio.open(data_url) as ds: - img = ds.read(1) - cmap = ds.colormap(1) - profile = ds.profile -cmap = ListedColormap([np.array(cmap[key]) / 255 for key in range(256)]) -``` - -```python -fig, ax = plt.subplots(1, 1, figsize=(10, 10)) -im = show(img, ax=ax, transform=profile['transform'], cmap=cmap, interpolation='none') - -ax.set_xlabel("Eastings (meters)") -ax.set_ylabel("Northings (meters)") -ax.ticklabel_format(axis='both', style='scientific',scilimits=(0,0),useOffset=False,useMathText=True) - -bounds = [0, 1, 2, 3, - 251, 252, 253, - ] - -im = im.get_images()[0] - -cbar=fig.colorbar(im, - ax=ax, - shrink=0.5, - pad=0.05, - boundaries=bounds, - cmap=cmap, - ticks=[0.5, 1.5, 2.5, 251.5, 252.5]) - -cbar.ax.tick_params(labelsize=8) -norm = colors.BoundaryNorm(bounds, cmap.N) -cbar.set_ticklabels(['Not Water', - 'Open Water', - 'Partial Surface Water', - 'HLS Snow/Ice', - 'HLS Cloud/Cloud Shadow', - ], - fontsize=7) -``` diff --git a/book/09_Scratch/4_Analyzing_Datasets.md b/book/09_Scratch/4_Analyzing_Datasets.md deleted file mode 100644 index b48644d..0000000 --- a/book/09_Scratch/4_Analyzing_Datasets.md +++ /dev/null @@ -1,2 +0,0 @@ -# Time-series analysis of datasets -Now that we are able to identify and retrieve datasets of interest, we can perform a simple time series analysis of our dataset - for example, we will try to answer the question, "How does the amount of water present in our scene change over time?" 
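As a rough illustration of how that question could be approached, the sketch below assumes the granules of interest have already been loaded into an xarray `DataArray` named `dataset` with dimensions `(time, lat, lon)`, where a pixel value of 1 marks open water and each DSWx pixel spans 30 m x 30 m; the variable and dimension names are assumptions for the sketch, not fixed by this notebook.

```python
# Rough sketch under the assumptions stated above.
water_pixels = (dataset == 1).sum(dim=['lat', 'lon'])  # number of water pixels per time step
water_area_km2 = water_pixels * (30 * 30) / 1e6        # convert pixel counts to square kilometers
water_area_km2.plot.line(x='time')                     # water extent as a time series
```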
\ No newline at end of file diff --git a/book/09_Scratch/5_Manipulating_and_Visualizing_Datasets.md b/book/09_Scratch/5_Manipulating_and_Visualizing_Datasets.md deleted file mode 100644 index c13e5bd..0000000 --- a/book/09_Scratch/5_Manipulating_and_Visualizing_Datasets.md +++ /dev/null @@ -1,2 +0,0 @@ -# Manipulating and Visualizing datasets -AOIs may span multiple rasters, and in such situations it is useful to `mosaic` them together to obtain a single continuous image. We also show how to visualize raster images by using `rasterio`'s in-built plotting routines, as well as using additional python libraries such as `matplotlib` to generate scalebars. \ No newline at end of file diff --git a/book/09_Scratch/SLIDES-NASA-TOPS-flood-EN.md b/book/09_Scratch/SLIDES-NASA-TOPS-flood-EN.md deleted file mode 100644 index 3bc1080..0000000 --- a/book/09_Scratch/SLIDES-NASA-TOPS-flood-EN.md +++ /dev/null @@ -1,251 +0,0 @@ ---- -jupyter: - jupytext: - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.16.2 ---- - - -# Analyzing Flood Risk Reproducibly with NASA Earthdata Cloud - -
- [Image: ScienceCore: Climate Risk]
- - - - - - -## Objective - -Utilize NASA's open products called Opera Dynamic Surface Water eXtent (DSWx) - Harmonized Landsat Sentinel-2 (HLS) to map the extent of flooding resulting from the September 2022 monsoon event in Pakistan. - - - - -In 2022, Pakistan's monsoon rains reached record levels, causing devastating floods and landslides that affected all four provinces of the country and around 14% of its population. In this example, you will see how to use NASA's open products Opera DSWx-HLS to map the extent of flooding caused by the monsoon in September 2022 in Pakistan. - - - - -## Roadmap - -- Opera DSWx-HLS Products -- Set up the working environment -- Define the area of interest -- Data search and retrieval -- Data analysis -- Processing and visualization - - - -First, we will define what Opera DSWx-HLS products are and what kind of information you can obtain from them. -Then, you will learn to set up your working environment, define the area of interest you want to gather information about, perform data search and retrieval, analyze and visualize it. - - - - -## The Expected Outcome - --- Insert resulting image -- - - - -This is the final result you will achieve after completing the workshop activities. - - - -## Before We Start - -- To participate in this class, you must accept the coexistence guidelines detailed [here](). -- If you speak, then mute your microphone to avoid interruptions from background noise. We might do it for you. -- To say something, request the floor or use the chat. -- Can we record the chat? Can we take pictures? - - - - -Coexistence guidelines: -- If you're in this course, you've accepted the coexistence guidelines of our community, which broadly means we'll behave in a polite and friendly manner to make this an open, safe, and friendly environment, ensuring the participation of all individuals in our virtual activities and spaces. -- If any of you see or feel that you're not comfortable enough, you can write to us privately. -- In case those of us who don't make you feel comfortable are --teachers-- you can indicate it by sending an email to --add reference email-- -How to participate: -- We're going to ask you to mute/turn off your microphones while you're not speaking so that the ambient sound from each of us doesn't bother us. -- You can request the floor by raising your hand or in the chat, and --teachers-- will be attentive so that you can participate at the right time. -About recording: -- The course will be recorded, if you don't want to appear in the recording, we ask you to turn off the camera. -- If any of you want to share what we're doing on social media, please, before taking a photo or screenshot with the faces of each of the people present, ask for permission because some people may not feel comfortable sharing their image on the internet. There are no problems in sharing images of the slides or --the teacher's face--. - - - - - -## Opera DSWx-HLS Dataset - -- Contains observations of surface water extent at specific locations and times (from February 2019 to September 2022). -- Distributed over projected map coordinates as mosaics. -- Each mosaic covers an area of 109.8 x 109.8 km. -- Each mosaic includes 10 GeoTIFF (layers). - - - - -This dataset contains observations of surface water extent at specific locations and times spanning from February 2019 to September 2022. The input dataset for generating each product is the Harmonized Landsat-8 and Sentinel-2A/B (HLS) product version 2.0. 
HLS products provide surface reflectance (SR) data from the Operational Land Imager (OLI) aboard the Landsat 8 satellite and the MultiSpectral Instrument (MSI) aboard the Sentinel-2A/B satellite. - -Surface water extent products are distributed over projected map coordinates. Each UTM mosaic covers an area of 109.8 km × 109.8 km. This area is divided into 3,660 rows and 3,660 columns with a pixel spacing of 30 m. - -Each product is distributed as a set of 10 GeoTIFF files (layers) including water classification, associated confidence, land cover classification, terrain shadow layer, cloud/cloud-shadow classification, Digital Elevation Model (DEM), and Diagnostic layer in PNG format. - - - - - -## Opera DSWx-HLS Dataset - -1. B02_BWTR (Water binary layer): - - 1 (white) = presence of water. - - 0 (black) = absence of water. - -2. B03_CONF (Confidence layer): - - % confidence in its water predictions. - - - -In this workshop, we will use two layers: -1. **B02_BWTR (Water binary layer):** -This layer provides us with a simple image of flooded areas. Where there is water, the layer is valued at 1 (white), and where there is no water, it takes a value of 0 (black). It's like a binary map of floods, ideal for getting a quick overview of the disaster's extent. -2. **B03_CONF (Confidence layer):** -This layer indicates how confident the DSWx-HLS system is in its water predictions. Where the layer shows high values (near 100%), we can be very sure that there is water. In areas with lower values, confidence decreases, meaning that what appears to be water could be something else, such as shadows or clouds. - -To help you better visualize how this works, think of a satellite image of the flood-affected area. Areas with water appear dark blue, while dry areas appear brown or green. -The water binary layer (B02_BWTR), overlaid on the image, would shade all blue areas white, creating a simple map of water yes/no. -In contrast, the confidence layer (B03_CONF) would function as a transparency overlaid on the image, with solid white areas where confidence is high and increasing transparency towards black where confidence is low. This allows you to see where the DSWx-HLS system is most confident that its water predictions are correct. -By combining these layers, scientists and humanitarian workers can get a clear picture of the extent of floods and prioritize rescue and recovery efforts. - - - - -## Set Up the Working Environment - -TO BE COMPLETED. - - - -THIS NEEDS TO BE DEFINED BASED ON NOTEBOOK MODIFICATIONS. - - - -## Live Coding: Let's go to notebook XXXXX. - - - -## Selection of the Area of Interest (AOI) - -- Initialize user-defined parameters. -- Perform a specific data search on NASA. -- Search for images within the DSWx-HLS collection that match the AOI. - - - -Next, you will learn: - -1. How to initialize user-defined parameters: - -* Define the search area: Draw a rectangle on the map to indicate the area where you want to search for data. -* Set the search period: Mark the start and end dates to narrow down the results to a specific time range. -* Display parameters: Print on the screen the details of the search area and the chosen dates so you can verify them. - -2. Perform a specific data search on NASA: - -* Connect to the database: Link to NASA's CMR-STAC API to access its files. -* Specify the collection: Indicate that you want to search for data from the "OPERA_L3_DSWX-HLS_PROVISIONAL_V0" collection. 
-* Perform the search: Filter the results according to the search area, dates, and a maximum limit of 1000 results. - -3. Search for images (from the DSWx-HLS collection) that match the area of interest: - -* Measure overlap: Calculate how much each image overlaps with the area you are interested in. -* Show percentages: Print these percentages on the screen so you can see the coverage. -* Filter images: Select only those with an overlap greater than a set limit. - - - - - -## Live Coding: Let's go to notebook XXXXX. - - - -## Activity 1: - -Modify the XXX parameters to define a new area of interest. - - - -## Data Search and Retrieval - -- Transform filtered data into a list. -- Display details of the first result: - - Count the results. - - Show overlap. - - Indicate cloudiness. - - - -In the following section, you'll learn: - -1. How to transform filtered results into a list to work with them more easily. -2. How to display details of the first result to see what information it contains. - - Count the results: how many files were found after applying the filters. - - Show overlap: how much each file overlaps with the area you're looking for, so you know how well they cover the area. - - Indicate cloudiness: amount of clouds in each file before filtering, so you can consider if cloud coverage is an important factor for you. - - - - - - -## Live Coding: Let's go to notebook XXXX. - - - -## Activity 2: - -TO BE DEFINED - - - -## Data Analysis - -TO BE DEFINED - - - -## Live Coding: Let's go to notebook XXXX. - - - -## Activity 3: - -TO BE DEFINED - - - -## Processing and Visualization - -TO BE DEFINED - - - -## Live Coding: Let's go to the notebook XXXX. - - - -## Activity 4: - -TO BE DEFINED - diff --git a/book/09_Scratch/SLIDES-NASA-TOPS-flood-ES.md b/book/09_Scratch/SLIDES-NASA-TOPS-flood-ES.md deleted file mode 100644 index 7d9d881..0000000 --- a/book/09_Scratch/SLIDES-NASA-TOPS-flood-ES.md +++ /dev/null @@ -1,251 +0,0 @@ ---- -jupyter: - jupytext: - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.16.2 ---- - - -# Analizando de manera reproducible el riesgo de inundaciones con NASA Earthdata cloud. - -
- [Imagen: ScienceCore: Climate Risk]
- - - - - - -## Objetivo - -Utilizar los productos abiertos de la NASA llamados Opera Dynamic Surface Water eXtent (DSWx) - Landsat Sentinel-2 armonizado (HLS) para mapear la extensión de la inundación como resultado del evento monzónico de septiembre de 2022 en Pakistán. - - - - -En 2022, las lluvias monzónicas de Pakistán alcanzaron niveles récord, provocando devastadoras inundaciones y deslizamientos de tierra que afectaron a las cuatro provincias del país y a alrededor del 14% de su población. En este ejemplo, podrás ver cómo utilizar los productos abiertos de NASA Opera DSWx-HLS para mapear la extensión de las inundaciones causadas por el monzón ocurrido en septiembre de 2022 en Pakistán. - - - - -## Hoja de Ruta - -- Productos Opera DSWx-HLS -- Configurar el ambiente de trabajo -- Definir el área de interés -- Búsqueda y obtención de datos -- Análisis de datos -- Procesamiento y visualización - - - -Primero, definiremos qué son los productos Opera DSWx-HLS y qué tipo de información puedes obtener de ellos. -Luego, aprenderás a configurarás tu ambiente de trabajo, definir el área de interés sobre la que quieres recolectar información, realizar la búsqueda y recolección de datos, analizaros y visualizarlos. - - - - -## Qué te llevarás - --- Insertar imagen resultante -- - - - -Este es el resultado final al que llegarás, luego de completar las actividades del taller. - - - -## Antes de empezar - -- Para participar de esta clase, debes aceptar las pautas de convivencia detalladas [aquí](). -- Si hablas, luego silencia tu micrófono para evitar interrupciones por ruidos de fondo. Puede que lo hagamos por ti. -- Para decir algo, pide la palabra o usa el chat. -- ¿Podemos grabar el chat? ¿Podemos “sacar fotos”? - - - - -Pautas de convencia: -- Si están en este curso aceptaron las pautas de convivencia de nuestra comunidad el cuál implica, a grandes rasgos, que nos vamos a comportar de forma educada y amable para que este sea un ambiente abierto, seguro y amigable y garantizar la participación de todas las personas en nuestras actividades y espacios virtuales. -- Si alguno de ustedes ve o siente que no está lo suficientemente cómodo o cómoda, nos puede escribir a nosotros por mensajes privados. -- En caso de que quienes no los hagamos sentir cómodo seamos --docentes-- lo pueden indicar enviando un mail a --agregar mail de referencia -- -Cómo participar: -- Vamos a pedirles que se silencien/apaguen los micrófonos mientras no están hablando para que no nos moleste el sonido ambiente de cada uno de nosotros. -- Pueden pedir la palabra levantando la mano o en el chat y --docentes-- vamos a estar atentos para que puedan participar en el momento adecuado. -Acerca de la grabación: -- El curso va a grabarse, si no desean aparecer en la grabación les pedimos que apaguen la camara. -- Si alguno de ustedes quiere contar lo que estamos haciendo en redes sociales, por favor, antes de sacar una foto o captura de pantalla con las caras de cada una de las personas que están presentes, pidamos permiso porque puede haber gente que no se sienta cómoda compartiendo su imagen en internet. No hay inconvenientes en que compartan imágenes de las diapositivas o --la cara del docente--. - - - - - -## Dataset Opera DSWx-HLS - -- Contiene observaciones de la extensión superficial de agua en ubicaciones y momentos específicos (de febrero de 2019 hasta septiembre de 2022). -- Se distribuyen sobre coordenadas de mapa proyectadas como mosaicos. -- Cada mosaico cubre un área de 109.8 x 109.8 km. 
-- Cada mosaico incluye 10 GeoTIFF (capas). - - - - -Este conjunto de datos contiene observaciones de la extensión superficial de agua en ubicaciones y momentos específicos que abarcan desde febrero de 2019 hasta septiembre de 2022. El conjunto de datos de entrada para generar cada producto es el producto Harmonized Landsat-8 y Sentinel-2A/B (HLS) versión 2.0. Los productos HLS proporcionan datos de reflectancia de superficie (SR) del Operador de Imágenes Terrestres (OLI) a bordo del satélite Landsat 8 y del Instrumento Multiespectral (MSI) a bordo de los satélites Sentinel-2A/B. - -Los productos de extensión superficial de agua se distribuyen sobre coordenadas de mapa proyectadas. Cada mosaico UTM cubre un área de 109,8 km × 109,8 km. Esta área se divide en 3,660 filas y 3,660 columnas con un espaciado de píxeles de 30 m. - -Cada producto se distribuye como un conjunto de 10 archivos GeoTIFF (capas) que incluyen clasificación de agua, confianza asociada, clasificación de cobertura terrestre, capa de sombra del terreno, clasificación de nubes/sombras de nubes, Modelo Digital de Elevación (DEM) y capa diagnóstica en formato PNG. - - - - - -## Dataset Opera DSWx-HLS - -1. B02_BWTR (Capa binaria de agua): - - 1 (blanco) = presencia de agua. - - 0 (negro) = ausencia de agua. - -2. B03_CONF (Capa de confianza): - - % de confianza en sus predicciones de agua. - - - -En este taller, utilizaremos dos capas: -1. **B02_BWTR (Capa binaria de agua):** -Esta capa nos brinda una imagen simple de las áreas inundadas. Donde hay agua, la capa vale 1 (blanco) y donde no hay agua, toma valor 0 (negro). Es como un mapa binario de las inundaciones, ideal para obtener una visión general rápida de la extensión del desastre. -2. **B03_CONF (Capa de confianza):** -Esta capa nos indica qué tan seguro está el sistema DSWx-HLS de sus predicciones de agua. Donde la capa muestra valores altos (cerca de 100%), podemos estar muy seguros de que hay agua. En áreas con valores más bajos, la confianza disminuye, lo que significa que lo que parece agua podría ser otra cosa, como sombras o nubes. - -Para ayudarte a visualizar mejor cómo funciona esto piensa en una imagen satelital de la zona afectada por las inundaciones. Las áreas con agua se ven azul oscuro, mientras que las áreas secas se ven de color marrón o verde. -La capa binaria de agua (B02_BWTR), superpuesta sobre la imagen, sombrearía de blanco todas las áreas azules, creando un mapa simple de agua sí/no. -En cambio, la capa de confianza (B03_CONF) funcionaría como una transparencia superpuesta sobre la imagen, con áreas blancas sólidas donde la confianza es alta y transparencia creciente hacia el negro donde la confianza es baja. Esto te permite ver dónde el sistema DSWx-HLS está más seguro de que sus predicciones de agua son correctas. -Al combinar estas capas, los científicos y los trabajadores humanitarios pueden obtener una imagen clara de la extensión de las inundaciones y priorizar los esfuerzos de rescate y recuperación. - - - - -## Configurar el ambiente de trabajo - -A COMPLETAR. - - - -ESTO HAY QUE DEFINIRLO A PARTIR DE LA MODIFICACIÓN DE LAS NOTEBOOKS. - - - -## Live Coding: Vamos a la notebook XXXXX. - - - -## Selección del área de interés (AOI) - -- Inicializar parámetros definidos por el usuario. -- Relizar una búsqueda de datos específicos en la NASA. -- Buscar imágenes dentro de colección DSWx-HLS que coincidan con el AOI. - - - -A continuación, aprenderás a: - -1. 
Inicializar parámetros definidos por el usuario: - -* Define la zona de búsqueda: Dibuja un rectángulo en el mapa para indicar el área donde quieres buscar datos. -* Establece el periodo de búsqueda: Marca la fecha de inicio y de fin para acotar los resultados a un rango de tiempo específico. -* Muestra los parámetros: Imprime en la pantalla los detalles de la zona de búsqueda y las fechas elegidas para que puedas verificarlos. - -2. Realizar la búsqueda de datos específicos en la NASA: - -* Se conecta a la base de datos: Enlaza con la API CMR-STAC de la NASA para poder acceder a sus archivos. -* Especifica la colección: Indica que quiere buscar datos de la colección "OPERA_L3_DSWX-HLS_PROVISIONAL_V0". -* Realiza la búsqueda: Filtra los resultados según la zona de búsqueda, las fechas y un límite máximo de 1000 resultados. - -3. Buscar imágenes (de la colección DSWx-HLS) que coincidan con el área de interés: - -* Mide la superposición: Calcula cuánto se solapa cada imagen con el área que te interesa. -* Muestra los porcentajes: Imprime en la pantalla estos porcentajes para que puedas ver la cobertura. -* Filtra las imágenes: Selecciona solo aquellas que tengan una superposición mayor a un límite establecido. - - - - - -## Live Coding: Vamos a la notebook XXXXX. - - - -## Actividad 1: - -Modifica los parámetros XXX para definir una nueva área de interés. - - - -## Búsqueda y obtención de datos - -- Transformar los datos filtrados en una lista. -- Mostrar los detalles del primer resultado: - - Contar los resultados. - - Mostrar superposición. - - Indicar nubosidad. - - - -En la siguiente sección aprenderás: - -1. Cómo transformar los resultados filtrados en una lista para poder trabajar con ellos más fácilmente. -2. Cómo mostrar los detalles del primer resultado para ver cómo es la información que contiene. - - Contar los resultado: cuántos archivos se encontraron después de aplicar los filtros. - - Mostrar la superposición: cuánto se solapa cada archivo con la zona que buscas, para que sepas qué tan bien cubren el área. - - Indicar la nubosidad: cantidad de nubes que había en cada archivo antes de filtrarlo, para que puedas considerar si la cobertura de nubes es un factor importante para ti. - - - - - - -## Live Coding: Vamos a la notebook XXXX. - - - -## Actividad 2: - -A DEFINIR - - - -## Análisis de datos - -A DEFINIR - - - -## Live Coding: Vamos a la notebook XXXX. - - - -## Actividad 3: - -A DEFINIR - - - -## Procesamiento y visualización - -A DEFINIR - - - -## Live Coding: Vamos a la notebook XXXX. - - - -## Actividad 4: - -A DEFINIR - diff --git a/book/09_Scratch/drought.md b/book/09_Scratch/drought.md deleted file mode 100644 index 71a4f4c..0000000 --- a/book/09_Scratch/drought.md +++ /dev/null @@ -1,3 +0,0 @@ -# Drought - -![](../assets/climate_risks_drought_color.png) diff --git a/book/09_Scratch/notebooks/2_ES_Flood.md b/book/09_Scratch/notebooks/2_ES_Flood.md deleted file mode 100644 index 0973c93..0000000 --- a/book/09_Scratch/notebooks/2_ES_Flood.md +++ /dev/null @@ -1,605 +0,0 @@ ---- -jupyter: - jupytext: - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.16.2 - kernelspec: - display_name: opera_app_dev - language: python - name: python3 ---- - - -# **Introducción a la generación de mapas de inundaciones utilizando datos de teledetección** -**Guía para principiantes:** Este tutorial te enseñará cómo consultar y trabajar con los datos provisionales de OPERA DSWx-HLS desde la nube . 
Para conocer más sobre los datos utilizados en esta notebook, podes acceder a [OPERA_L3_DSWX-HLS_PROVISIONAL_V0](https://dx.doi.org/10.5067/OPDSW-PL3V0) ) - - - - - - - - - -**Cómo empezar con los mapas de agua utilizando el dataset de OPERA DSWx-HLS:** -Esta guía te muestrará cómo explorar los cambios en el agua en todo el mundo utilizando herramientas basadas en la nube. Usaremos OPERA DSWx-HLS, un conjunto de datos, obtenidos mediante teledetección, que rastrea la extensión de agua desde febrero de 2019 hasta septiembre de 2022 (ACTUALIZAR RANGO DE FECHA). - -**1. Conectándote a la información desde la nube:** - -Accederás a imágenes optimizadas de la Tierra (llamadas COG) directamente desde la nube, sin descargas pesadas. -Usarás un catálogo espacial súper práctico, denominado Catálogo de Activos Espaciales y Temporales de CMR (CMR-STAC), para encontrar las imágenes que necesitas, como si buscaras un libro en una biblioteca. - -**2. Explorando los datos de los productos OPERA DSWx-HLS:** - -Trabajarás con imágenes provisionales de la extensión de agua en la superficie terrestre (OPERA_L3_DSWX-HLS_PROVISIONAL_V0), recopiladas entre febrero de 2019 y septiembre de 2022, ¡una buena cantidad de información! (ACTUALIZAR RANGO DE FECHA). Dichas imágenes combinan lo mejor de dos satélites: Landsat 8 y Sentinel-2A/B, para una visión más completa. -Además, tendrás acceso a 10 capas de información por imagen, incluyendo clasificación de agua, confianza en los datos, cobertura del suelo, sombras del terreno, nubes, ¡y más! - -**3. Visualizando los datos a tu gusto:** - -Aprenderás a visualizar estas imágenes de la forma que más te convenga para analizarlas. - - -**Inundaciones en Pakistán con DSWx-HLS: Un ejemplo práctico** - -En 2022, las lluvias monzónicas de Pakistán alcanzaron niveles récord, provocando devastadoras inundaciones y deslizamientos de tierra que afectaron a las cuatro provincias del país y a alrededor del 14% de su población [CDP]. En este ejemplo, te mostramos cómo el sistema DSWx-HLS puede usarse para mapear la extensión de las inundaciones causadas por el monzón en septiembre de 2022. - -Capas del conjunto de datos científico (SDS): -DSWx-HLS nos brinda capas de información que nos permiten visualizar y analizar la situación con mayor detalle. -1. **B02_BWTR (Capa binaria de agua):** -Esta capa nos brinda una imagen simple de las áreas inundadas. Donde hay agua, la capa vale 1 (blanco) y donde no hay agua, toma valor 0 (negro). Es como un mapa binario de las inundaciones, ideal para obtener una visión general rápida de la extensión del desastre. -2. **B03_CONF (Capa de confianza):** -Esta capa nos indica qué tan seguro está el sistema DSWx-HLS de sus predicciones de agua. Donde la capa muestra valores altos (cerca de 100%), podemos estar muy seguros de que hay agua. En áreas con valores más bajos, la confianza disminuye, lo que significa que lo que parece agua podría ser otra cosa, como sombras o nubes. - -Para ayudarte a visualizar mejor cómo funciona esto piensa en una imagen satelital de la zona afectada por las inundaciones. Las áreas con agua se ven azul oscuro, mientras que las áreas secas se ven de color marrón o verde. -La capa binaria de agua (B02_BWTR), superpuesta sobre la imagen, sombrearía de blanco todas las áreas azules, creando un mapa simple de agua sí/no. 
-En cambio, la capa de confianza (B03_CONF) funcionaría como una transparencia superpuesta sobre la imagen, con áreas blancas sólidas donde la confianza es alta y transparencia creciente hacia el negro donde la confianza es baja. Esto te permite ver dónde el sistema DSWx-HLS está más seguro de que sus predicciones de agua son correctas. -Al combinar estas capas, los científicos y los trabajadores humanitarios pueden obtener una imagen clara de la extensión de las inundaciones y priorizar los esfuerzos de rescate y recuperación. - -**Recuerda:** -DSWx-HLS es una herramienta poderosa, pero los datos no son perfectos. Siempre es importante tener en cuenta la capa de confianza al interpretar los resultados. -Este es solo un ejemplo de cómo se puede usar DSWx-HLS para mapear las inundaciones. Hay muchas otras aplicaciones potenciales para este sistema. - -**Qué necesitas:** -- Una computadora con acceso a Internet -(Opcional) -- Conocimientos básicos de mapas e imágenes satelitales - -**Bono:** Para obtener más detalles técnicos, consulta el documento de especificación del [producto OPERA](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf). - - - - - -# **¡Antes de empezar!** - -Para estar preparado paa el tutorial y aprovechar al máximo, por favor revisa la sección` 1_Primeros pasos.md.` - - - - - - -## **Parte 1: Setear el Ambiente de Trabajo** - - - - - -### **1.1 Importar Paquetes** - - - -El código Python %load_ext autoreload y %autoreload 2 habilitan la recarga automática de módulos en un cuaderno de Jupyter. Esto significa que si modifica un módulo que ha importado en su cuaderno, los cambios se reflejarán automáticamente en su cuaderno sin tener que reiniciarlo. - - - -En la siguiente sección se importa una variedad de bibliotecas y herramientas que permiten: -* Obtener datos geoespaciales de diferentes fuentes. -* Procesar y analizar estos datos. -* Crear visualizaciones estáticas e interactivas para explorar y comunicar los resultados. - - - - -```python id="4-SnwjZJ0CSF" outputId="884b26b5-08a3-45d3-9570-1a27e6a1dd2c" -import os -from netrc import netrc -from subprocess import Popen -from platform import system -from getpass import getpass - -from pystac_client import Client -from pystac_client import ItemSearch - -import json - -import matplotlib.pyplot as plt -from matplotlib import cm -from datetime import datetime -from tqdm import tqdm - -from shapely.geometry import box -from shapely.geometry import shape -from shapely.ops import transform - -import numpy as np -import pandas as pd -import geopandas as gpd -from skimage import io - -from osgeo import gdal -from rioxarray.merge import merge_arrays - -import pyproj -from pyproj import Proj - -import folium -from folium import plugins -import geoviews as gv -import hvplot.xarray -import holoviews as hv -hv.extension('bokeh') -gv.extension('bokeh', 'matplotlib') - -import sys -sys.path.append('../../') -from src.dswx_utils import intersection_percent, colorize, getbasemaps, transform_data_for_folium - -import warnings -warnings.filterwarnings('ignore') -``` - - - -### **1.2 Setear el Ambiente de Trabajo** - - - -El código siguiente proporciona instrucciones para establecer el entorno de trabajo. Primero, se obtiene la ruta del directorio de trabajo actual y luego se define como directorio de trabajo. 
- - -```python id="zS09y7Ff0CSH" -inDir = os.getcwd() -``` - - - - -### **1.3 Generar token de autenticación** - - - - -Este código te ayuda a acceder a datos de la NASA de forma segura: - -* Revisa si ya guardaste tus datos de acceso: Si encuentra tu usuario y contraseña guardados, los usa automáticamente. -* Si es necesario, pide tu usuario y contraseña: Si no encuentra tus datos, te pide ingresarlos para guardarlos de forma segura para la próxima vez. -* Genera un token de autenticación: Este token permite que el código acceda a los datos de la NASA sin que tengas que ingresar tu usuario y contraseña cada vez. - - -```python id="UXe44Ncb0CSI" -# Generates authentication token -# Asks for your Earthdata username and password for first time, if netrc does not exists in your home directory. - -urs = 'urs.earthdata.nasa.gov' # Earthdata URL endpoint for authentication -prompts = ['Enter NASA Earthdata Login Username: ', - 'Enter NASA Earthdata Login Password: '] - -# Determine the OS (Windows machines usually use an '_netrc' file) -netrc_name = "_netrc" if system()=="Windows" else ".netrc" - -# Determine if netrc file exists, and if so, if it includes NASA Earthdata Login Credentials -try: - netrcDir = os.path.expanduser(f"~/{netrc_name}") - netrc(netrcDir).authenticators(urs)[0] - -``` - - -El siguiente código: -* Prepara el acceso a los datos en la nube: Guarda la información necesaria para conectarse a los datos de PODAAC en un archivo llamado "cookies.txt". -* Evita errores al buscar archivos: Le indica a la herramienta GDAL que no pierda tiempo buscando archivos en carpetas vacías. -* Se enfoca en los archivos que necesitas: Le dice a GDAL que solo trabaje con archivos que tengan la extensión TIF o TIFF. - - - - -```python id="LkOCFmpm0CSJ" -# GDAL configurations used to successfully access PODAAC Cloud Assets via vsicurl -gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE','~/cookies.txt') -gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/cookies.txt') -gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN','EMPTY_DIR') -gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF, TIFF') -``` - - -# **Parte 2: Selección del Área de interés (AOI)** - - - - -## **2. API CMR-STAC: Búsqueda de datos basada en consultas espaciales** - - - - - -### **2.1 Inicializar parámetros definidos por el usuario ** - - - -* Define la zona de búsqueda: Dibuja un rectángulo en el mapa para indicar el área donde quieres buscar datos. -* Establece el periodo de búsqueda: Marca la fecha de inicio y de fin para acotar los resultados a un rango de tiempo específico. -* Muestra los parámetros: Imprime en la pantalla los detalles de la zona de búsqueda y las fechas elegidas para que puedas verificarlos. - - -```python id="bDyra8KD0CSK" outputId="2ca06a37-54b3-4ade-cc6b-761efe056c5f" -# USER-DEFINED PARAMETERS -aoi = box(67.4, 26.2, 68.0, 27.5) -start_date = datetime(2022, 1, 1) # in 2022-01-01 00:00:00 format -stop_date = f"{datetime.today().strftime('%Y-%m-%d')} 23:59:59" # in 2022-01-01 00:00:00 format -overlap_threshold = 10 # in percent -#cloud_cover_threshold = 20 # in percent - -print(f"Search between {start_date} and {stop_date}") -print(f"With AOI: {aoi.__geo_interface__}") -``` - - -Este código busca datos específicos en la NASA: - -* Se conecta a la base de datos: Enlaza con la API CMR-STAC de la NASA para poder acceder a sus archivos. -* Especifica la colección: Indica que quiere buscar datos de la colección "OPERA_L3_DSWX-HLS_PROVISIONAL_V0". 
-* Realiza la búsqueda: Filtra los resultados según la zona de búsqueda, las fechas y un límite máximo de 1000 resultados. - - -```python id="ixhkdWNE0CSK" -# Search data through CMR-STAC API -stac = 'https://cmr.earthdata.nasa.gov/cloudstac/' # CMR-STAC API Endpoint -api = Client.open(f'{stac}/POCLOUD/') -collections = ['OPERA_L3_DSWX-HLS_PROVISIONAL_V0'] - -search_params = {"collections": collections, - "intersects": aoi.__geo_interface__, - "datetime": [start_date, stop_date], - "max_items": 1000} -search_dswx = api.search(**search_params) -``` - - - -### **2.2 Búsqueda de imágenes (de la colección DSWx-HLS) que coincidan con el área de interés** - - - -El siguiente código: - -* Mide la superposición: Calcula cuánto se solapa cada imagen con el área que te interesa. -* Muestra los porcentajes: Imprime en la pantalla estos porcentajes para que puedas ver la cobertura. -* Filtra las imágenes: Selecciona solo aquellas que tengan una superposición mayor a un límite establecido. - - -```python id="H9s93OUm0CSL" outputId="8c38f171-0034-42d8-a5c8-638d00827034" -# Filter datasets based on spatial overlap -intersects_geometry = aoi.__geo_interface__ - -#Check percent overlap values -print("Percent overlap before filtering: ") -print([f"{intersection_percent(i, intersects_geometry):.2f}" for i in search_dswx.items()]) - -# Apply spatial overlap and cloud cover filter # utilizamos la variable overloap definida anteriormente -dswx_filtered = ( - i for i in search_dswx.items() if (intersection_percent(i, intersects_geometry) > overlap_threshold) -) -``` - - -# **Parte 3: Búsqueda y obtención de datos** - - - - -El siguiente código: - -1. Transforma los resultados filtrados en una lista para poder trabajar con ellos más fácilmente. -2. Muestra los detalles del primer resultado para ver cómo es la información que contiene. - - -```python id="3BL5APaz0CSL" outputId="cdd13930-819a-4f1a-a087-bf182dc42c0f" -# Inspect the items inside the filtered query -dswx_data = list(dswx_filtered) -# Inspect one data -dswx_data[0].to_dict() -``` - - -**A continuación, se muestra un resumen de los resultados de la búsqueda:** - -* Cuenta los resultados: Te dice cuántos archivos encontró después de aplicar los filtros. -* Muestra la superposición: Te indica cuánto se solapa cada archivo con la zona que buscas, para que sepas qué tan bien cubren el área. -* Indica la nubosidad: Te informa la cantidad de nubes que había en cada archivo antes de filtrarlo, para que puedas considerar si la cobertura de nubes es un factor importante para ti. - - - - -```python id="gGPdYpKc0CSM" -## Print search information -# Tota granules -print(f"Total granules after search filter: {len(dswx_data)}") - -#Check percent overlap values -print("Percent-overlap: ") -print([f"{intersection_percent(i, intersects_geometry):.2f}" for i in dswx_data]) - -# Check percent cloud cover values -print("\nPercent cloud cover before filtering: ") -print([f"{i.properties['eo:cloud_cover']}" for i in search_dswx.items()]) -``` - - -# **Actividad práctica A**: Exploración y obtención del área de interés. - - - -# **Parte 4: Análisis de los datos obtenidos** - -Análisis de Series de tiempo con los datos de la zonade interés. 
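A modo de orientación, el siguiente esbozo muestra un primer paso posible: contar cuántos mosaicos filtrados hay por fecha de adquisición, usando la lista `dswx_data` obtenida en la Parte 3. El código es solo un esbozo ilustrativo y no forma parte del flujo original de la notebook.

```python
# Esbozo ilustrativo: mosaicos disponibles por fecha, como base para una serie de tiempo.
# Asume que `dswx_data` es la lista de resultados filtrados de la Parte 3.
import pandas as pd

fechas = [item.datetime.strftime('%Y-%m-%d') for item in dswx_data]  # fecha de cada mosaico
conteo_por_fecha = pd.Series(fechas).value_counts().sort_index()     # cantidad de mosaicos por fecha
print(conteo_por_fecha)
```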
- - - -# **Actividad práctica B: Series de tiempo aplicada al área de interés.** - - - -# **Parte 5: Procesamiento y visualización de los datos** - - - -**Crea un mapa para que puedas ver cómo encajan los datos con tu zona de interés:** - -* Dibuja los límites de los archivos: Traza los bordes de cada archivo encontrado en color azul para que puedas ver su forma y ubicación. -* Coloca un mapa de fondo: Agrega un mapa base de la zona para que tengas una referencia visual del terreno. -* Marca tu área de interés: Dibuja un rectángulo amarillo alrededor de la zona que especificaste en la búsqueda para que puedas ver cómo se superponen los archivos con ella. -* Muestra el mapa: Te presenta el mapa resultante para que puedas analizar la cobertura de los datos y la coincidencia con tu zona de interés. - - -```python id="8hDY5sJD0CSM" -# Visualize the DSWx tile boundary and the user-defined bbox -geom_df = [] -for d,_ in enumerate(dswx_data): - geom_df.append(shape(dswx_data[d].geometry)) - -geom_granules = gpd.GeoDataFrame({'geometry':geom_df}) -granules_poly = gv.Polygons(geom_granules, label='DSWx tile boundary').opts(line_color='blue', color=None, show_legend=True) - -# Use geoviews to combine a basemap with the shapely polygon of our Region of Interest (ROI) -base = gv.tile_sources.EsriImagery.opts(width=1000, height=1000) - -# Get the user-specified aoi -geom_aoi = shape(intersects_geometry) -aoi_poly = gv.Polygons(geom_aoi, label='User-specified bbox').opts(line_color='yellow', color=None, show_legend=True) - -# Plot using geoviews wrapper -granules_poly*base*aoi_poly -``` - - - - -### **5.1 Presentar los resultados de la búsqueda en una tabla ** - - - -Crea una tabla para que puedas revisar los resultados de forma organizada: - -* Recorre los resultados: Lee cada uno de los archivos encontrados y extrae la información más relevante. -* Organiza los datos: Coloca la información en columnas para que sea fácil de leer y comparar: -ID del archivo -Sensor que lo capturó -Fecha de la captura -Coordenadas del archivo -Límites del área que cubre -Porcentaje de superposición con tu área de interés -Cobertura de nubes en el archivo -Enlaces para descargar las bandas del archivo -* Muestra la tabla: Te presenta la tabla completa para que puedas analizar los resultados y seleccionar los archivos que mejor se adapten a tus necesidades. - - -```python id="uYhO2Ycs0CSM" -# Create table of search results -dswx_data_df = [] -for item in dswx_data: - item.to_dict() - fn = item.id.split('_') - ID = fn[3] - sensor = fn[6] - dat = item.datetime.strftime('%Y-%m-%d') - spatial_overlap = intersection_percent(item, intersects_geometry) - geom = item.geometry - bbox = item.bbox - - # Take all the band href information - band_links = [item.assets[links].href for links in item.assets.keys()] - dswx_data_df.append([ID,sensor,dat,geom,bbox,spatial_overlap,cloud_cover,band_links]) - -dswx_data_df = pd.DataFrame(dswx_data_df, columns = ['TileID', 'Sensor', 'Date', 'Coords', 'bbox', 'SpatialOverlap', 'CloudCover', 'BandLinks']) -dswx_data_df -``` - - - -## **5.2. Carga y visualización de la extensión de la inundación** - - - -Antes de avanzar, repasemos los pasos necesarios para visualizar la extensión de inundación: - -1. **Obtención de los datos de inundación:** - - En este ejemplo se utilizan imágenes satelitales del sitio de la NASA. -En cuanto al formato de datos, existen diferentes tipos como ráster, shapefile, KML, o GeoJSON. En nuestro caso, trabajamos con ráster. - -2. 
**Carga de datos:** - - Utilizamos las bibliotecas como GDAL o rasterio en Python para cargar los datos de inundación. - -3. **Visualización la extensión de inundación:** - - Simbología: Es necesario, asignar un color y transparencia apropiados a las áreas inundadas para diferenciarlas del terreno seco. Puedes usar una escala de colores para representar la profundidad de la inundación. - - Superposición: Superpone la extensión de inundación sobre un mapa base, como una imagen aérea o un mapa topográfico, para proporcionar contexto geográfico. -También será de utilidad añadir leyendas, escalas, rótulos y otros elementos gráficos para mejorar la claridad y la interpretación del mapa. - - Ejemplo de visualización: - - [Imagen de un mapa que muestra la extensión de inundación en una ciudad. Las áreas inundadas están representadas en azul claro, con mayor transparencia para las zonas de menor profundidad. El mapa base es una imagen aérea y se incluyen una leyenda, una escala y rótulos.][Agregar imagen] - - - - -### **5.2.1 Cargar B02-BWTR (Capa binaria de agua) y B03-CONF (Capa de confianza)** - - - -Para cargar estas capas, necesitarás un programa compatible con el formato de los datos y las herramientas específicas de carga. Utilizamos GDAL y Rasterio para el procesamiento de imágenes comunes que te permitirán cargar estas capas. - - - -```python id="dCH6j5wp0CSN" -# Take one of the flooded dataset and check what files are included. -dswx_data_df.iloc[43].BandLinks -``` - - - -### **5.2.2 "Fusionar mosaicos".** - - - -"Fusionar mosaicos". Esta frase se refiere a la acción de combinar dos o más mosaicos de datos para crear un mosaico más grande. - -En el contexto de imágenes satelitales, los mosaicos son imágenes individuales que se superponen para cubrir un área más grande. La fusión de mosaicos se utiliza a menudo para crear imágenes de mayor resolución espacial o temporal que las que se pueden obtener a partir de un solo mosaico. - -Para fusionar mosaicos, se necesitan dos o más mosaicos que tengan el mismo formato y resolución. Los mosaicos se pueden fusionar utilizando un software de SIG o una biblioteca de procesamiento de imágenes. - - - -El código está preparando imágenes para mostrarlas en un mapa interactivo.Imagina que tienes un rompecabezas que quieres armar en un mapa. Entonces, el código hace lo siguiente: - -* Busca las piezas correctas: Encuentra las imágenes del 30 de septiembre que muestran dónde hay agua (capa B02-BWTR) y qué tan confiable es esa información (capa B03-CONF). - -* Prepara las piezas: Adapta las imágenes para que encajen en el mapa (como si estuvieras recortando los bordes del rompecabezas para que calcen bien). -Las coloca en la posición correcta sobre el mapa (como ir poniendo las piezas en su lugar). - -* Une las piezas: Combina las imágenes separadas para crear una vista completa, como si unieras las piezas del rompecabezas. - -* Revisa una pieza: Mira de cerca una de las imágenes para asegurarse de que todo esté bien (como revisar una pieza del rompecabezas antes de seguir armando). 
- - -```python id="V4mhzgDZ0CSN" -# Get B02-BWTR layer for tiles acquired on 2022-09-30, project to folium's projection and merge tiles -T42RUR_B02, T42RUR_B02_cm = transform_data_for_folium(dswx_data_df.iloc[42].BandLinks[1]) -T42RUQ_B02, T42RUQ_B02_cm = transform_data_for_folium(dswx_data_df.iloc[43].BandLinks[1]) -merged_B02 = merge_arrays([T42RUR_B02, T42RUQ_B02]) - -# Get B03-CONF layer for tiles acquired on 2022-09-30, project to folium's projection and merge tiles -T42RUR_B03, T42RUR_B03_cm = transform_data_for_folium(dswx_data_df.iloc[42].BandLinks[2]) -T42RUQ_B03, T42RUQ_B03_cm = transform_data_for_folium(dswx_data_df.iloc[43].BandLinks[2]) -merged_B03 = merge_arrays([T42RUR_B03, T42RUQ_B03]) - -# Check one of the DataArrays -merged_B02 -``` - - - - -### **5.3 Visualiza las imágenes en un mapa interactivo** - - - - El código está prepara los colores para que las imágenes sean claras y fáciles de interpretar. Podes imagianr que tenes un libro para colorear con áreas numeradas, y cada número corresponde a un color específico. El código está haciendo lo siguiente: - - - Toma los dibujos en blanco y negro: Busca las imágenes que ya preparó (las que muestran el agua y la confianza). - - Encuentra la paleta de colores adecuada: Tiene la lista de colores que se deben usar para cada área de las imágenes (como la lista de colores del libro para colorear). - - Pinta cuidadosamente cada área: Asigna los colores correctos a cada zona de las imágenes, siguiendo la paleta elegida. - - Guarda los dibujos coloreados: Almacena las imágenes ya coloreadas para usarlas en el mapa. - - - -```python id="RLcoFzy10CSO" -# Colorize the map using predefined colors from DSWx for Folium display -colored_B02,cmap_B02 = colorize(merged_B02[0], cmap=T42RUR_B02_cm) -colored_B03,cmap_B03 = colorize(merged_B03[0], cmap=T42RUR_B03_cm) -``` - - - -Imagina que estás creando un mapa interactivo con capas superpuestas, el código está haciendo lo siguiente: - -* Extiende el lienzo: Despliega un mapa base sobre la mesa, como un mapamundi, -para tener un fondo sobre el que trabajar. - -* Agrega mapas extra: Coloca varios mapas transparentes encima del mapa base, como si fueran hojas de acetato. Así, podrás elegir la vista que más te guste. - -* Dibuja las áreas inundadas: En una de las hojas transparentes, pinta con colores llamativos las zonas donde hay agua, para que se vean claramente. - -* Marca la confianza en la información: En otra hoja transparente, dibuja con colores más tenues la confianza que se tiene en los datos de las áreas inundadas, como si fuera una pista extra para los más aventureros. - -* Añade herramientas útiles: Coloca sobre la mesa una lupa para ampliar zonas, un botón para ver el mapa en pantalla completa y una mini versión del mapa en una esquina, como si fuera una brújula. - -* Muestra las coordenadas: Activa un indicador que te dice en qué lugar del mapa estás apuntando en cada momento, como si fuera una guía de coordenadas. 
- - -```python id="AP5r2kHN0CSQ" -# Initialize Folium basemap -xmid =(merged_B02.x.values.min()+merged_B02.x.values.max())/2 ; ymid = (merged_B02.y.values.min()+merged_B02.y.values.max())/2 -m = folium.Map(location=[ymid, xmid], zoom_start=9, tiles='CartoDB positron', show=True) - -# Add custom basemaps -basemaps = getbasemaps() -for basemap in basemaps: - basemaps[basemap].add_to(m) - -# Overlay B02 and B03 layers -folium.raster_layers.ImageOverlay(colored_B02, - opacity=0.6, - bounds=[[merged_B02.y.values.min(),merged_B02.x.values.min()],[merged_B02.y.values.max(),merged_B02.x.values.max()]], - name='Flooded Area', - show=True).add_to(m) - -folium.raster_layers.ImageOverlay(colored_B03, - opacity=0.8, - bounds=[[merged_B03.y.values.min(),merged_B03.x.values.min()],[merged_B03.y.values.max(),merged_B03.x.values.max()]], - name='Confidence Layer', - show=False).add_to(m) - -#layer Control -m.add_child(folium.LayerControl()) - -# Add fullscreen button -plugins.Fullscreen().add_to(m) - -#Add inset minimap image -minimap = plugins.MiniMap(width=300, height=300) -m.add_child(minimap) - -#Mouse Position -fmtr = "function(num) {return L.Util.formatNum(num, 3) + ' º ';};" -plugins.MousePosition(position='bottomright', separator=' | ', prefix="Lat/Lon:", - lat_formatter=fmtr, lng_formatter=fmtr).add_to(m) - -#Display -m -``` - - -# **Actividad práctica C: Visualización de la extensión de agua en el área seleccionada** - - -```python id="hdF6iHO5Ljzz" - -``` diff --git a/book/09_Scratch/proposal.md b/book/09_Scratch/proposal.md deleted file mode 100644 index d4e9c9d..0000000 --- a/book/09_Scratch/proposal.md +++ /dev/null @@ -1,86 +0,0 @@ -# Reproducibly Analyzing Wildfire, Drought, and Flood Risk with NASA Earthdata Cloud - -:::{seealso} -This text is excerpted from the original project proposal: - -Munroe, James, Palopoli, Nicolas, & Acion, Laura. (2023). Reproducibly Analyzing Wildfire, Drought, and Flood Risk with NASA Earthdata Cloud. Zenodo. https://doi.org/10.5281/zenodo.8212073 -::: - - -## Project summary -As the climate changes, prediction and management of the risk of wildfire, drought, and floods has become increasingly challenging. It is no longer sufficient to assume that what has been normal and historic for the last century will occur with the same frequency into the future. These natural risks are intrinsically linked to the changing distributions of surface water, precipitation, vegetation, and land use in both time and space. At the same time, there are now hundreds of petabytes of relevant Earth science data available through the NASA Earthdata Cloud that can be used to understand and forecast these water-dependent environmental risks. With the volume of Earth science data growing dramatically year over year, it is important for scientists to understand how open science and cloud-based data intensive computing can be used to reproducibly analyze and assess the changing risk profile of wildfire, drought, and floods. - -In this proposed TOPS ScienceCore module, learners will learn to identify, extract, analyze, visualize, and report on data available through NASA Earthdata Cloud for three different scenarios: wildfire, drought, and flood risk. The module will build upon TOPS OpenCore and reinforce principles of reproducibility and open science-based workflows. Computationally, the scenarios will estimate changes in the hydrological water mass balance for defined regions primarily using remote sensing data. 
We will demonstrate best practices in “data-proximate computing” by considering examples that involve computing climatologies and other statistics from long-time series using numerical methods that scale well with the data being available on the cloud. This module will leverage scientific Python libraries such as Xarray and Dask to perform the computations. The focus of this module will be on data processing and visualization and doing so in a reproducible and transparent way. - -After completing this module, learners will be able to adapt and remix the scenarios for their own open science objectives regarding environmental risks such as wildfire, drought, and flood. These risks are common worldwide yet need to be each analyzed in a regional context. The module will provide concrete examples that showcase how open science can be done. - -The module will be written as an extension to the OpenCore framework and all course materials will be open, available in English and Spanish, and accessible in the vision, hearing, mobility, and attention dimensions. The ScienceCore module will be released as one or more Jupyter notebooks on GitHub with supporting material for delivering the course using the cloud either for in-person or for virtual cohorts. - - -## Scientific/Technical Management - -### Introduction -With a changing climate, wildfires, droughts, and floods continue to be significant risks across the United States and the rest of the world {cite:p}`seneviratne_weather_2021,hicke_north_2022`. Events that used to occur only once a century are now occurring every few years. Historical norms for frequency of extreme climate leading to episodic disaster are not sufficient to infer the frequency into the future. - -Flooding is the most significant environmental disaster affecting more than two billion people a year. Floods cause significant damage to infrastructure, displace people, and lead to disease. The frequency of flooding has increased in recent years due to changes in rainfall and land use. - -The data from the [National Interagency Fire Center](https://www.nifc.gov/) shows significant growth in the size of wildfires in the US over the last 25 years. In some years, over 10 million acres in the US have been burned by wildfires. Canada has also experienced significant recent wildfires resulting in loss of property, livestock, and industry. - -Floods and wildfires are intrinsically linked to underlying water conditions: too much or too little water. Droughts, although on a longer timescale, are also caused by too little water being retained in the environment. Droughts, wildfires, and floods can all occur over the course of a few short months in the same region. The key factor is the abundance, or absence, of precipitation (rain and snowfall) either in short-term events or long-term shifts of how that water is retained by the land. - -Floods, wildfires, and droughts are at risk to increase in frequency and intensity due to fundamental shifts in extreme precipitation caused by climate change. Fundamentally, these three environmental risks are about water no longer having the distribution that they have had in the past. We can recognize that our world is changing and causing risk. How can we mitigate that risk? - -[NASA Earthdata Cloud](https://www.earthdata.nasa.gov/eosdis/cloud-evolution) has moved petabytes of data into the cloud and that data is ideally suited to answering questions about climate change risk. 
However, practitioners don’t yet have the skills and training to take advantage of these resources. It is no longer sufficient to search for a dataset and then download it locally. The sheer size of the available data makes this not just impractical but, in many cases, impossible. But how do scientists attempting risk assessment complete their work when it is so difficult to download the needed data to a computer where their analysis code can run locally? The answer is to instead perform data-proximate computing: pushing the analysis code to a computer that is very close (in terms of both cost and network latency) to where the data is hosted. For data hosted in the commercial cloud (such as NASA Earthdata), this means running data analysis on computers in that same cloud environment.
-
-This ScienceCore module, building on top of the OpenCore modules, will teach learners how to access the NASA Earthdata Cloud and produce dashboard-based visualizations and analyses of water-sensing data. These data can then be compared with model outputs to forecast the new ‘normals’ for wildfire, drought, and flood risk across the world.
-This risk assessment is highly localized and needs to be repeated for every country, state, city, and village. Demonstrating how to leverage NASA Earthdata Cloud data in an open, reproducible way will help thousands of scientists and analysts produce the reports they need for their own communities.
-
-There is currently a barrier for scientists who want to use the NASA Earthdata Cloud: many do not yet have the skills and expertise to analyze data at scale in the cloud. This ScienceCore module will teach those skills.
-
-### Objectives and Expected Significance
-Teaching a large number of users to reproducibly analyze Earth data will accelerate our ability to mitigate and adapt to climate change. One ‘science objective’, then, is determining whether this ScienceCore module actually helps with those adaptation efforts.
-
-This new ScienceCore module will address the need to apply Earth-sensing data to climate risk assessment. More generally, it will serve as a template for future ScienceCore modules in this domain. As part of this work, we will not only develop the ScienceCore module but also measure its effectiveness in meeting its learning objectives.
-
-Open science is important for climate change because it helps to ensure that the research and data related to climate change are accessible to everyone. This means that anyone, regardless of their expertise or background, can review and verify the research, which helps to build trust in the findings. In addition, open science promotes collaboration and sharing of ideas among researchers, which can lead to more rapid progress in understanding and addressing climate change. By making research open and transparent, we can ensure that the best science is being used to inform decisions about how to address climate change.
-
-### Impact of proposed work on the state of the art
-Climate risk needs to be reevaluated at the national, state, and municipal levels. Companies and non-governmental organizations also need to assess climate risk. NASA data contained in the Earthdata Cloud is highly relevant to answering questions about climate risk. This ScienceCore module will provide a template for accessing the data and then analyzing it reproducibly, in a consistent and open manner that exemplifies the best practices identified in the TOPS OpenCore content.
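
To make the notion of data-proximate computing concrete, the following is a minimal sketch of the access pattern such a module would teach. It assumes a hypothetical cloud-hosted Zarr store containing a `precipitation` variable; the bucket URL and variable name are placeholders rather than real NASA Earthdata assets, and `s3fs` is assumed to be installed alongside Xarray and Dask.

```python
import fsspec
import xarray as xr

# Hypothetical cloud-hosted data cube; substitute a real NASA Earthdata asset.
store = fsspec.get_mapper("s3://example-bucket/hydrology-cube.zarr", anon=True)

# Open lazily: variables become chunked Dask arrays, so nothing is downloaded in bulk here.
ds = xr.open_zarr(store)

# Describe a long-time-series climatology; Dask only records the computation graph.
monthly_climatology = ds["precipitation"].groupby("time.month").mean("time")

# Execute the reduction on compute that sits next to the data
# (e.g., a JupyterHub running in the same cloud region), returning only the small aggregate.
result = monthly_climatology.compute()
```

Because only the small aggregated result leaves the cloud, the same pattern scales from a single region of interest to the full archive.
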
-
-The intended audience for this ScienceCore module is people tasked with producing climate risk assessments. There is potential to take the tooling and data that we collectively now have available from these global models and downscale that information on anticipated climate impacts to every country, region, state, city, town, and village. Policymakers and planners at all levels are beginning to tackle the question of taking these large, global-scale data products and figuring out what they mean for their locality. The community is thinking about how the physical climate variables (e.g., temperature and precipitation) affect risks such as flooding, wildfires, droughts, sea level rise, and food sustainability, among many others.
-
-JupyterHub infrastructure will provide a platform to deliver this module. This infrastructure can be deployed by any team with intermediate DevOps and Kubernetes knowledge. Using open infrastructure allows the content to be run by any organization.
-
-### Relevance of proposed work to announcement
-This ScienceCore module is focused on helping people access and analyze data from NASA, including data that is stored in the cloud. This includes using open-source tools and libraries to analyze and visualize the data, and to create and share reproducible research workflows. The project also includes modules that build on existing OpenCore concepts and that cover important topics in different scientific disciplines.
-
-These modules can be accessed through the TOPS Open edX platform or as Jupyter Books on the TOPS GitHub. The goal of ScienceCore is to make it easy for people to work with NASA data, and to collaborate and share their research with others. The project is designed to be accessible, open, collaborative, multilingual, and interactive. All of the final products created through ScienceCore will be openly licensed and shared on the TOPS GitHub, and proposals for funded projects must include plans for collaboration and participation in annual coordination meetings.
-
-### Technical approach and methodology
-To create this ScienceCore module, we will bring together **pedagogical experts**, with training in open and inclusive content design, alongside **scientific content specialists** to design course material that follows best practices in open science, data analysis, and domain methodologies.
-
-This module will contain three fully worked examples of reproducible analysis of an environmental risk. During course delivery, an instructor may choose to go through only one of these worked examples and leave the others for reference. Each example will be independent.
-
-The largest risk is that the technology for accessing cloud-enabled data is rapidly changing. It is possible that the specific code examples developed in this ScienceCore module will be obsolete or otherwise out of date in the near term. However, the science content, and especially the open science content, has much longer persistence.
-
-Any of the ‘code exercises’ will be written so that they can be replaced in the future without having to fundamentally change the narrative of the module. The science objectives (data discovery, data reduction, visualization, comparison to model output, use in machine learning) will remain even if the libraries we use continue to undergo rapid development.
-
-Another problem is how to scope this module so that participants have the right background for it to be useful.
In a relatively short half-day course, learners will already need sufficient background knowledge in data-driven computing, programming, and visualization, as well as in risk assessment, remote sensing data, and statistical analysis, to be able to apply this module directly. This ScienceCore module will therefore contain links and references to ‘additional material’ that can help learners prepare in advance as needed. So that the module is taught to those for whom it was designed, it will provide six learner personas {cite}`wilson_teaching_2020`, each including the person’s general background, what they already know, what they want to achieve, and any special needs they have.
-
-As well, the module will include pre-assessment tools for gauging learners' knowledge of climate science and risk assessment, their proficiency with Jupyter notebooks and Python programming, and their years of experience in these areas, to account for those piloting the ScienceCore module who may not yet have developed the necessary skills.
-
-The module will be designed using backward design
-{cite:p}`wiggins_understanding_2005,biggs_teaching_2011,fink_creating_2013`,
-incorporating guidelines like those in {cite}`metadocencia_team_hoja_2022`,
- {cite}`via_course_2020`, and {cite:p}`noauthor_collaborative_nodate`. Backward design of the module entails:
-1. Creating learner personas to determine and clearly communicate what audience the module is designed for.
-1. Writing an initial draft of topics that will and will not be included.
-1. Creating one summative assessment covering the contents learners will have to master to obtain this module's badge.
-1. Creating all the formative assessments so that learners practice the contents throughout the module. Including about one formative assessment per teaching unit, with each unit comprising 5 to 9 new concepts and at most 15 minutes of content, ensures the module uses an active teaching style and allows both trainers and learners to assess their progress and adapt to the requirements of the occasion.
-1. Ordering formative assessments considering complexity, dependencies, and how a topic motivates learners.
-1. Writing the content that will guide learners between formative assessments.
-1. Generating a succinct module description for promoting it and reaching interested learners.
-
-
-```{bibliography}
-```
diff --git a/book/09_Scratch/references.bib b/book/09_Scratch/references.bib
deleted file mode 100644
index a316d33..0000000
--- a/book/09_Scratch/references.bib
+++ /dev/null
@@ -1,156 +0,0 @@
-
-@techreport{metadocencia_team_hoja_2022,
- title = {Hoja de rota modelo para desarrollo de {Cursos}},
- copyright = {Creative Commons Attribution 4.0 International, Open Access},
- url = {https://zenodo.org/record/7390559},
- doi = {10.5281/ZENODO.7390559},
- abstract = {En este documento se ofrecen pautas, sugerencias y pasos para el diseño de cursos.},
- urldate = {2022-12-08},
- author = {MetaDocencia Team},
- institution = {},
- month = aug,
- year = {2022},
- note = {Publisher: Zenodo},
- keywords = {Ciencia Abierta, clases virtuales, Comunidad, Cursos, docencia, enseñanza, Latin America, pedagogía, Teaching},
-}
-
-@techreport{via_course_2020,
- title = {Course design: {Considerations} for trainers – a {Professional} {Guide}},
- shorttitle = {Course design},
- url = {https://f1000research.com/documents/9-1377},
- doi = {10.7490/F1000RESEARCH.1118395.1},
- urldate = {2022-12-08},
- author = {Via, Allegra and Palagi, Patricia M. and Lindvall, Jessica M.
and Tractenberg, Rochelle E. and Attwood, Teresa K. and Foundation, The GOBLET}, - year = {2020}, - note = {Publisher: F1000 Research Limited}, - institution = {}, - -} - -@book{wiggins_understanding_2005, - address = {Alexandria, VA}, - edition = {Expanded 2nd ed}, - title = {Understanding by design}, - isbn = {978-1-4166-0035-0}, - publisher = {Association for Supervision and Curriculum Development}, - author = {Wiggins, Grant P. and McTighe, Jay}, - year = {2005}, - keywords = {Comprehension, Curriculum planning, Curriculum-based assessment, Learning, United States}, -} - -@book{biggs_teaching_2011, - address = {Maidenhead, England New York, NY}, - edition = {4th edition}, - series = {{SRHE} and {Open} {University} {Press} imprint}, - title = {Teaching for quality learning at university: what the student does}, - isbn = {978-0-335-24275-7}, - shorttitle = {Teaching for quality learning at university}, - language = {eng}, - publisher = {McGraw-Hill, Society for Research into Higher Education \& Open University Press}, - author = {Biggs, John B. and Tang, Catherine So-kum}, - collaborator = {Society for Research into Higher Education}, - year = {2011}, - file = {Table of Contents PDF:C\:\\Users\\jmunr\\Zotero\\storage\\QDQERYVW\\Biggs and Tang - 2011 - Teaching for quality learning at university what .pdf:application/pdf}, -} - -@book{fink_creating_2013, - address = {San Francisco}, - edition = {Revised and updated edition}, - series = {Jossey-{Bass} {Higher} and {Adult} {Education} {Series}}, - title = {Creating significant learning experiences: an integrated approach to designing college courses}, - isbn = {978-1-118-12425-3 978-1-118-41901-4 978-1-118-41632-7}, - shorttitle = {Creating significant learning experiences}, - abstract = {"In this thoroughly updated edition of L. Dee Fink's bestselling classic, he discusses new research on how people learn, active learning, and the effectiveness of his popular model adds more examples from online teaching; and further focuses on the impact of student engagement on student learning. The book explores the changes in higher education nationally and internationally since the publication of the previous edition, includes additional procedures for integrating one's course, and adds strategies for dealing with student resistance to innovative teaching. This edition continues to provide conceptual and procedural tools that are invaluable for all teachers when designing instruction. It shows how to use a taxonomy of significant learning and systematically combine the best research-based practices for learning-centered teaching with a teaching strategy in a way that results in powerful learning experiences for students. Acquiring a deeper understanding of the design process will empower teachers to creatively design courses that will result in significant learning for students"--}, - publisher = {Jossey-Bass}, - author = {Fink, L. 
Dee}, - year = {2013}, - keywords = {United States, College teaching, Curricula, EDUCATION / Teaching Methods \& Materials / General, Education, Higher}, -} - -@misc{noauthor_collaborative_nodate, - title = {Collaborative {Lesson} {Development} {Training}: {Summary} and {Setup}}, - url = {https://carpentries.github.io/lesson-development-training/}, - urldate = {2022-12-08}, - file = {Collaborative Lesson Development Training\: Summary and Setup:C\:\\Users\\jmunr\\Zotero\\storage\\7QVQ6ZJH\\lesson-development-training.html:text/html}, -} - -@misc{earth_science_data_systems_earthdata_2019, - type = {Program}, - title = {Earthdata {Cloud} {Evolution}}, - url = {http://www.earthdata.nasa.gov/eosdis/cloud-evolution}, - abstract = {Feature article about efforts to move EOSDIS data and services into the commercial cloud, the reasons behind this effort, and the benefits to data users.}, - language = {en}, - urldate = {2022-12-08}, - journal = {Earthdata}, - author = {Earth Science Data Systems, NASA}, - month = may, - year = {2019}, - note = {Publisher: Earth Science Data Systems, NASA}, - file = {Snapshot:C\:\\Users\\jmunr\\Zotero\\storage\\3A2IVYZ8\\cloud-evolution.html:text/html}, -} - -@incollection{hicke_north_2022, - address = {Cambridge, UK and New York, USA}, - title = {North {America}}, - isbn = {978-1-00-932584-4}, - booktitle = {Climate {Change} 2022: {Impacts}, {Adaptation} and {Vulnerability}. {Contribution} of {Working} {Group} {II} to the {Sixth} {Assessment} {Report} of the {Intergovernmental} {Panel} on {Climate} {Change}}, - publisher = {Cambridge University Press}, - author = {Hicke, J.A. and Lucatello, S. and L.D., Mortsch and Dawson, J. and Aguilar, M. Domínguez and Enquist, C.A.F. and Gilmore, E.A. and Gutzler, D.S. and Harper, S. and Holsman, K. and Jewett, E.B. and Kohler, T.A. and Miller, KA.}, - editor = {Pörtner, H. O. and Roberts, D. C. and Tignor, M. and Poloczanska, E. S. and Mintenbeck, K. and Alegría, A. and Craig, M. and Langsdorf, S. and Löschke, S. and Möller, V. and Okem, A. and Rama, B.}, - year = {2022}, - doi = {10.1017/9781009325844.016.1929}, - note = {Type: Book Section}, - pages = {1929--2042}, - file = {IPCC_AR6_WGII_Chapter14.pdf:C\:\\Users\\jmunr\\Zotero\\storage\\6XXKL7L2\\IPCC_AR6_WGII_Chapter14.pdf:application/pdf}, -} - -@incollection{doblas-reyes_linking_2021, - address = {Cambridge, United Kingdom and New York, NY, USA}, - title = {Linking {Global} to {Regional} {Climate} {Change}}, - booktitle = {Climate {Change} 2021: {The} {Physical} {Science} {Basis}. {Contribution} of {Working} {Group} {I} to the {Sixth} {Assessment} {Report} of the {Intergovernmental} {Panel} on {Climate} {Change}}, - publisher = {Cambridge University Press}, - author = {Doblas-Reyes, F.J. and Sörensson, A.A. and Almazroui, M. and Dosio, A. and Gutowski, W.J. and Haarsma, R. and Hamdi, R. and Hewitson, B. and Kwon, W.-T. and Lamptey, B.L. and Maraun, D. and Stephenson, T.S. and Takayabu, I. and Terray, L. and Turner, A. and Zuo, Z.}, - editor = {Masson-Delmotte, V. and Zhai, P. and Pirani, A. and Connors, S.L. and Péan, C. and Berger, S. and Caud, N. and Chen, Y. and Goldfarb, L. and Gomis, M.I. and Huang, M. and Leitzell, K. and Lonnoy, E. and Matthews, J.B.R. and Maycock, T.K. and Waterfield, T. and Yelekçi, O. and Yu, R. 
and Zhou, B.}, - year = {2021}, - doi = {10.1017/9781009157896.012}, - note = {Type: Book Section}, - pages = {1363--1512}, -} - -@incollection{douville_water_2021, - address = {Cambridge, United Kingdom and New York, NY, USA}, - title = {Water {Cycle} {Changes}}, - booktitle = {Climate {Change} 2021: {The} {Physical} {Science} {Basis}. {Contribution} of {Working} {Group} {I} to the {Sixth} {Assessment} {Report} of the {Intergovernmental} {Panel} on {Climate} {Change}}, - publisher = {Cambridge University Press}, - author = {Douville, H. and Raghavan, K. and Renwick, J. and Allan, R.P. and Arias, P.A. and Barlow, M. and Cerezo-Mota, R. and Cherchi, A. and Gan, T.Y. and Gergis, J. and Jiang, D. and Khan, A. and Pokam Mba, W. and Rosenfeld, D. and Tierney, J. and Zolina, O.}, - editor = {Masson-Delmotte, V. and Zhai, P. and Pirani, A. and Connors, S.L. and Péan, C. and Berger, S. and Caud, N. and Chen, Y. and Goldfarb, L. and Gomis, M.I. and Huang, M. and Leitzell, K. and Lonnoy, E. and Matthews, J.B.R. and Maycock, T.K. and Waterfield, T. and Yelekçi, O. and Yu, R. and Zhou, B.}, - year = {2021}, - doi = {10.1017/9781009157896.010}, - note = {Type: Book Section}, - pages = {1055--1210}, -} - -@incollection{seneviratne_weather_2021, - address = {Cambridge, United Kingdom and New York, NY, USA}, - title = {Weather and {Climate} {Extreme} {Events} in a {Changing} {Climate}}, - booktitle = {Climate {Change} 2021: {The} {Physical} {Science} {Basis}. {Contribution} of {Working} {Group} {I} to the {Sixth} {Assessment} {Report} of the {Intergovernmental} {Panel} on {Climate} {Change}}, - publisher = {Cambridge University Press}, - author = {Seneviratne, S.I. and Zhang, X. and Adnan, M. and Badi, W. and Dereczynski, C. and Di Luca, A. and Ghosh, S. and Iskandar, I. and Kossin, J. and Lewis, S. and Otto, F. and Pinto, I. and Satoh, M. and Vicente-Serrano, S.M. and Wehner, M. and Zhou, B.}, - editor = {Masson-Delmotte, V. and Zhai, P. and Pirani, A. and Connors, S.L. and Péan, C. and Berger, S. and Caud, N. and Chen, Y. and Goldfarb, L. and Gomis, M.I. and Huang, M. and Leitzell, K. and Lonnoy, E. and Matthews, J.B.R. and Maycock, T.K. and Waterfield, T. and Yelekçi, O. and Yu, R. and Zhou, B.}, - year = {2021}, - doi = {10.1017/9781009157896.013}, - note = {Type: Book Section}, - pages = {1513--1766}, -} - -@book{wilson_teaching_2020, - address = {Boca Raton}, - title = {Teaching tech together: how to make lessons that work and build a teaching community around them}, - isbn = {978-0-429-33070-4 978-1-00-072801-9}, - shorttitle = {Teaching tech together}, - abstract = {"Hundreds of grassroots groups have sprung up around the world to teach programming, web design, robotics, and other skills outside traditional classrooms. These groups exist so that people don't have to learn these things on their own, but ironically, their founders and instructors are often teaching themselves how to teach. There's a better way. This book presents evidence-based practices that will help you create and deliver lessons that work and build a teaching community around them. Topics include the differences between different kinds of learners, diagnosing and correcting misunderstandings, teaching as a performance art, what motivates and demotivates adult learners, how to be a good ally, fostering a healthy community, getting the word out, and building alliances with like-minded groups. 
The book includes over a hundred exercises that can be done individually or in groups, over 350 references, and a glossary to help you navigate educational jargon"--}, - publisher = {CRC Press}, - author = {Wilson, Greg}, - year = {2020}, - keywords = {Computer programming, Design Study and teaching, Robotics, Study and teaching, Web sites}, -} diff --git a/book/09_Scratch/slides/Open_Science_Intro_Slides.md b/book/09_Scratch/slides/Open_Science_Intro_Slides.md deleted file mode 100644 index dca7cc0..0000000 --- a/book/09_Scratch/slides/Open_Science_Intro_Slides.md +++ /dev/null @@ -1,72 +0,0 @@ ---- -jupyter: - jupytext: - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.16.2 - kernelspec: - display_name: base - language: python - name: python3 ---- - -## About this tutorial -* Use data available through the NASA Earthdata Cloud to understand and forecast environmental risks such as wildfire, drought, and floods. -* Analyze, visualize, and report on data through open science-based workflows and the use of cloud-based data computing. - - -## What is Open Science? -***The principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity*** - - - - - -### What open science resources are available? - -- Over 100 Petabytes of openly available NASA Earthdata. -- Online platforms for data sharing and collaboration. -- Publicly accessible code repositories and development tools. - - -### What are the benefits of Open Science? - -- Enhances the discoverability and accessibility of scientific processes and outputs. -- Open methods enhance reproducibility. -- Transparency and verifiability enhance accuracy. -- Scrutiny of analytic decisions promotes trust. -- Accessible data and collective efforts accelerate discoveries. -- Open science fosters inclusion, diversity, equity, and accessibility (IDEA). -- And much more.. - - - - - - - - -## Where to start? Open Research Products - -Scientific knowledge, or research products, take the form of: - -#### Data - -Scientifically relevant information in digital format. -- Examples: mission data, calibration details, metadata. - -#### Code - -Software used in scientific computing. -- Types: general purpose, libraries, modeling, analysis, single-use. - -#### Results - -Various research outputs showcasing scientific findings. -- Examples: publications, notebooks, presentations, blog posts. 
- - - - From ddef25278069a4a2c611ce14f3e8fae97a06414d Mon Sep 17 00:00:00 2001 From: Dhavide Aruliah Date: Tue, 17 Sep 2024 00:02:27 -0700 Subject: [PATCH 10/10] Updates _toc.yml to reflect reorganization --- book/_toc.yml | 36 +++++++++++++++--------------------- 1 file changed, 15 insertions(+), 21 deletions(-) diff --git a/book/_toc.yml b/book/_toc.yml index 03318e0..0f0802a 100644 --- a/book/_toc.yml +++ b/book/_toc.yml @@ -4,31 +4,25 @@ format: jb-book root: intro parts: - - caption: Open Science + - caption: Introduction & Setup chapters: - - file: 01_Open_Science/Open_Science_Intro + - file: 00_Introduction_Setup/00_Getting_NASA_EarthData_Credentials.md + - file: 00_Introduction_Setup/01_Using_the_2i2c_Hub.md + - file: 00_Introduction_Setup/02_Introduction_to_Open_Science.md - - caption: Geospatial fundamentals + - caption: Geospatial Background chapters: - - file: 02_Geospatial_fundamentals/2_Selecting_an_AOI - - file: 02_Geospatial_fundamentals/remote-sensing + - file: 01_Geospatial_Background/01_Coordinate_systems.md + - file: 01_Geospatial_Background/02_Geospatial_data.md + - file: 01_Geospatial_Background/03_Geospatial_data_formats.md - - caption: Geospatial data files + - caption: Software Tools chapters: - - file: 03_Geospatial_data_files/geographic_data_formats + - file: 02_Software_Tools/01_Data_Manipulation_Tools.md + - file: 02_Software_Tools/02_Data_Visualization_Tools.md - - caption: NASA EarthData + - caption: Using NASA EarthData chapters: - - file: 04_NASA_Earthdata/0_Initial_Setup - - file: 04_NASA_Earthdata/0_Configuracion_inicial - - - caption: Wildfire Analysis - chapters: - - file: 07_Wildfire_analysis/Retrieving_Disturbance_Data - - file: 07_Wildfire_analysis/Wildfire - - - caption: Flood Analysis - chapters: - - file: 08_Flood_analysis/3_Retrieving_FloodData - - file: 08_Flood_analysis/flood - \ No newline at end of file + - file: 03_Using_NASA_EarthData/01_Using_OPERA_DIST_Products.md + - file: 03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md + - file: 03_Using_NASA_EarthData/03_Using_PySTAC.md \ No newline at end of file
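
As a quick sanity check on this reorganization, a short script along the following lines could confirm that every `file:` entry in the updated `_toc.yml` resolves to an existing source file before rebuilding the book. The `book/` layout and filenames are taken from the patch above; the script itself is only a sketch and assumes PyYAML is available and that entries may omit the `.md`/`.ipynb` extension.

```python
from pathlib import Path

import yaml  # PyYAML, assumed to be available in the book-building environment

book_dir = Path("book")
toc = yaml.safe_load((book_dir / "_toc.yml").read_text())

missing = []
for part in toc.get("parts", []):
    for chapter in part.get("chapters", []):
        target = chapter.get("file")
        if target is None:
            continue
        path = book_dir / target
        # Jupyter Book allows TOC entries with or without a file extension.
        candidates = [path, path.with_suffix(".md"), path.with_suffix(".ipynb")]
        if not any(p.exists() for p in candidates):
            missing.append(target)

print("Missing TOC entries:", missing or "none")
```
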