
Merge pull request ScienceCore#86 from jnywong/update-environment
Update environment
jnywong authored Jul 9, 2024
2 parents 6574496 + e1ea8a9 commit 1ca98d0
Showing 10 changed files with 713 additions and 24 deletions.
92 changes: 92 additions & 0 deletions assessment/Assessment-form.md
@@ -0,0 +1,92 @@

# Assessment form for the SciPy tutorial: preliminary questions

### What is the primary difference between geographic and projected coordinate reference systems?

- Geographic coordinate reference systems use latitude and longitude while projected coordinate reference systems use XY coordinates.
- Geographic coordinate reference systems are three-dimensional while projected coordinate reference systems are two-dimensional.
- Projected coordinate reference systems can only be used in the southern hemisphere.
- Geographic coordinate reference systems use the prime meridian as the origin while projected coordinate reference systems use the equator.

### What is the reference line for latitude in the geographic coordinate system?

- The prime meridian
- The equator
- The North Pole
- The South Pole

### Which statement is true about lines of longitude?

- They run parallel to the equator.
- They are assigned positive values in the southern hemisphere.
- They converge at the poles.
- The distance between lines of longitude is the same at all latitudes.

### What is the origin point for the Universal Transverse Mercator (UTM) coordinate reference system?

- The North Pole
- The South Pole
- The equator at a specific longitude
- Greenwich, England

### Which file is mandatory for a shapefile to represent spatial vector data?

- .prj
- .xml
- .shp
- .cpg

### What type of data does GeoJSON format encode?

- Raster data
- Vector data
- Both raster and vector data
- Metadata for geospatial images

---

### NASA EarthData

**Have you accessed NASA EarthData Cloud before this tutorial?**

- Yes
- No

**How difficult do you find the process of accessing data from NASA EarthData Cloud?**

- Very easy: I could access and retrieve data without any issues.
- Somewhat easy: I could access data but I encountered minor difficulties.
- Challenging: I encountered significant difficulties or was unable to access the data.

**Have you used NASA EarthData products in your research or projects before this tutorial?**

- Yes
- No

**Do you feel confident in extracting and processing data from NASA EarthData Cloud after this tutorial?**

- Very confident: I can extract and process data independently and efficiently.
- Somewhat confident: I can extract and process data but may need occasional assistance.
- Not confident: I still feel unsure about extracting and processing data on my own.

---

### Tutorial

**Please indicate your level of agreement with the following statements regarding the delivery of the tutorial:**

| | Strongly agree | Agree | Neutral | Disagree | Strongly disagree |
|----------------------------------------------------|----------------|-------|---------|----------|-------------------|
| The materials provided for the tutorial were adequate. | | | | | |
| The activities were clear and easy to follow. | | | | | |
| The examples were effective in helping me understand the concepts of cloud-based data analysis. | | | | | |
| The theoretical concepts were sufficiently explained. | | | | | |
| There was enough hands-on practice. | | | | | |
| There was a good balance between theory and hands-on activities. | | | | | |
| The time allocated to each task was sufficient. | | | | | |

**Please write down any additional suggestions or observations you may have.**

_________________________________________________________________________________________________

Link to the form: http://tiny.cc/ClimateriskScipy2024
9 changes: 9 additions & 0 deletions book/03_Geospatial_data_files/geographic_data_formats.md
@@ -60,6 +60,15 @@ A GeoTIFF file extension contains geographic metadata that describes the actual
Geodata is drawn from vector formats on a map, and the geodata is converted to the specified output projection of the map if the projection in the source file differs. Some of the vector and raster formats typically supported by a GeoTIFF online viewer include: asc, gml, gpx, json, kml, kmz, mid, mif, osm, tif, tab, map, id, dat, gdbtable, and gdbtablx.^2^
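Because the georeferencing lives inside the file itself, it can be inspected programmatically. Below is a minimal sketch using `rasterio` (the filename `example.tif` is a placeholder, not a file from this tutorial):

```python
import rasterio

# Open a GeoTIFF and inspect its embedded geographic metadata
with rasterio.open("example.tif") as ds:
    print(ds.crs)        # coordinate reference system of the raster
    print(ds.bounds)     # spatial extent: (left, bottom, right, top)
    print(ds.transform)  # affine transform mapping pixel indices to map coordinates
    print(ds.count, ds.width, ds.height)  # bands, columns, rows
```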


## A Note on the Ordering of Coordinates in Python GIS Libraries

In Python, 2D raster data are represented as matrices. Typically, the x-dimension of the array corresponds to `easting` or `longitude`, while the y-dimension corresponds to `northing` or `latitude`. However, the convention for listing the dimensions of matrices is to list the number of rows first and columns second. This means a matrix with dimensions (n, m) has `n` rows (latitude bins) and `m` columns (longitude bins). This convention is reflected when querying the shape of a `numpy` array using the `shape` attribute.

On the other hand, vector shapes, such as Shapely `Points`, follow the (longitude, latitude) notation. For example, the coordinates for Livingston, TX, are specified as `livingston_tx = Point(-95.09, 30.69)`. Similarly, `Polygon` bounds are specified in the order `(longitude_0, latitude_0, longitude_1, latitude_1)`.

It is important to keep these coordinate orderings in mind when working with geospatial data, as they can easily become a source of confusion or errors.
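A short sketch restating these two conventions side by side (using the Livingston, TX point from above):

```python
import numpy as np
from shapely.geometry import Point

# A raster covering 3 latitude bins (rows) and 4 longitude bins (columns):
raster = np.zeros((3, 4))
print(raster.shape)  # (3, 4) -> (rows, columns), i.e. (latitude, longitude)

# Shapely geometries use the opposite ordering: (longitude, latitude)
livingston_tx = Point(-95.09, 30.69)
print(livingston_tx.x, livingston_tx.y)  # -95.09 (longitude), 30.69 (latitude)
print(livingston_tx.bounds)  # bounds follow (longitude_0, latitude_0, longitude_1, latitude_1)
```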


## References

1. https://www.geographyrealm.com/geodatabases-explored-vector-and-raster-data/
37 changes: 23 additions & 14 deletions book/04_NASA_Earthdata/0_Initial_Setup.md
@@ -1,31 +1,40 @@
# Initial Setup

## 1. Accessing the 2i2c Hub

To log in to the 2i2c Hub, follow these simple steps:

* **Head to the Hub:** Visit this link to access the 2i2c Hub: https://climaterisk.opensci.2i2c.cloud/.

* **Log in with your Credentials:**

**Username:** Feel free to choose any username you like. We recommend your GitHub username, which helps avoid conflicts with others.

**Password:** You'll receive the password the day before the tutorial.

![2i2c_login](../assets/2i2c_login.png)

* **Logging In:**

The login process might take a few minutes, especially if a new virtual workspace needs to be created just for you.

![start_server2](../assets/start_server_2i2c.png)

* **What to Expect:**

By default, logging into https://climaterisk.opensci.2i2c.cloud will automatically clone https://github.com/ScienceCore/scipy-2024-climaterisk and change to that directory. If the login is successful, you will see the following screen.

![work_environment_jupyter_lab](../assets/work_environment_jupyter_lab.png)

Finally, if you see the JupyterLab screen above, you are ready to start working.

**Note:** Any files you work on will be saved between sessions as long as you use the same username.

## 2. Using NASA's Earthdata

Expand Down
130 changes: 122 additions & 8 deletions book/07_Wildfire_analysis/Retrieving_Disturbance_Data.md
@@ -5,7 +5,7 @@ jupyter:
extension: .md
format_name: markdown
format_version: '1.3'
jupytext_version: 1.16.1
kernelspec:
display_name: Python 3 (ipykernel)
language: python
@@ -31,6 +31,7 @@ from osgeo import gdal
from rasterio.merge import merge
import rasterio
import contextily as cx
import folium

# data wrangling imports
import pandas as pd
@@ -93,19 +94,23 @@ print(f"Number of tiles found intersecting given AOI: {len(results)}")
Let's load the search results into a pandas dataframe

```python
def search_to_df(results, layer_name='VEG-DIST-STATUS'):

    times = pd.DatetimeIndex([result['properties']['datetime'] for result in results])  # parse the timestamp of each result
    data = {'hrefs': [value['href'] for result in results for key, value in result['assets'].items() if layer_name in key],  # parse out links only to the DIST-STATUS data layer
            'tile_id': [value['href'].split('/')[-1].split('_')[3] for result in results for key, value in result['assets'].items() if layer_name in key]}

    # Construct pandas dataframe to summarize granules from search results
    granules = pd.DataFrame(index=times, data=data)
    granules.index.name = 'times'

    return granules
```

```python
granules = search_to_df(results)
granules.head()
```

```python
@@ -213,3 +218,112 @@ plt.xlabel('Date', size=15)
plt.xticks([datetime(year=2023, month=8, day=1) + timedelta(days=6*i) for i in range(11)], size=14)
plt.title('2023 Dadia forest wildfire detected extent', size=14)
```

### Great Green Wall, Sahel Region, Africa

```python
ndiaye_senegal = Point(-16.09, 16.50)

# We will search data through the product record
start_date = datetime(year=2022, month=1, day=1)
stop_date = datetime.now()
```

```python
# Plot the search location in folium as a sanity check
m = folium.Map(location=(ndiaye_senegal.y, ndiaye_senegal.x), control_scale=True, zoom_start=9)
radius = 5000  # meters
folium.Circle(
    location=[ndiaye_senegal.y, ndiaye_senegal.x],
    radius=radius,
    color="red",
    stroke=False,
    fill=True,
    fill_opacity=0.6,
    opacity=1,
    popup=f"{radius} m radius",
    tooltip="5 km radius",
).add_to(m)

m
```

```python
# We open a client instance to search for data, and retrieve relevant data records
STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'

# Setup PySTAC client
# LPCLOUD refers to the LP DAAC cloud environment that hosts earth observation data
catalog = Client.open(f'{STAC_URL}/LPCLOUD/')

collections = ["OPERA_L3_DIST-ANN-HLS_V1"]

# Search the full date range defined above (January 2022 to the present)
date_range = f'{start_date.strftime("%Y-%m-%d")}/{stop_date.strftime("%Y-%m-%d")}'

opts = {
'bbox' : ndiaye_senegal.bounds,
'collections': collections,
'datetime' : date_range,
}

search = catalog.search(**opts)
results = list(search.items_as_dicts())
print(f"Number of tiles found intersecting given AOI: {len(results)}")
```

```python
def urls_to_dataset(granule_dataframe):
    '''Takes a dataframe of OPERA tile URLs (as produced by search_to_df) and returns
    an xarray dataset with dimensions latitude, longitude and time'''

    dataset_list = []

    for i, row in granule_dataframe.iterrows():
        with rasterio.open(row.hrefs) as ds:
            # extract the EPSG code from the CRS string
            crs = str(ds.crs).split(':')[-1]

            # extract the image spatial extent (xmin, ymin, xmax, ymax)
            xmin, ymin, xmax, ymax = ds.bounds

            # the x and y resolution of the image is available in image metadata
            x_res = np.abs(ds.transform[0])
            y_res = np.abs(ds.transform[4])

            # read the data
            img = ds.read()

            # Ensure img has three dimensions (bands, y, x)
            if img.ndim == 2:
                img = np.expand_dims(img, axis=0)

            lon = np.arange(xmin, xmax, x_res)
            lat = np.arange(ymax, ymin, -y_res)

            lon_grid, lat_grid = np.meshgrid(lon, lat)

            da = xr.DataArray(
                data=img,
                dims=["band", "y", "x"],
                coords=dict(
                    lon=(["y", "x"], lon_grid),
                    lat=(["y", "x"], lat_grid),
                    time=i,
                    band=np.arange(img.shape[0])
                ),
                attrs=dict(
                    description="OPERA DIST ANN",
                    units=None,
                ),
            )
            # the .rio accessor used here is registered by the rioxarray package
            da.rio.write_crs(crs, inplace=True)

            dataset_list.append(da)
    return xr.concat(dataset_list, dim='time').squeeze()

granules = search_to_df(results)  # rebuild the granule dataframe from the Senegal search results
dataset = urls_to_dataset(granules)
```
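As a quick sanity check on the assembled dataset, a brief inspection like the following can help (a sketch; variable names follow the cell above, and the exact dimensions depend on the `squeeze`):

```python
print(dataset.dims)         # expected: ('time', 'y', 'x') once the singleton band dimension is squeezed out
print(dataset.sizes)        # number of time steps and pixels along each axis
print(dataset.time.values)  # acquisition timestamps parsed from the granule metadata
```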

