This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

Merge pull request ScienceCore#89 from ScienceCore/post_scipy_updates
Post scipy updates
jnywong authored Sep 17, 2024
2 parents 09c3a11 + ddef252 commit 9d4c037
Showing 40 changed files with 1,942 additions and 2,758 deletions.
4 changes: 2 additions & 2 deletions .gitignore
@@ -15,8 +15,8 @@ venv.bak/
.netrc

# Jupyter Notebook
.ipynb_checkpoints
book/*.ipynb
**/.ipynb_checkpoints
**/*.ipynb

# Jupyter Book
book/_build
@@ -0,0 +1,22 @@
# Getting NASA Earthdata Credentials


This notebook lays out the process for obtaining NASA Earthdata credentials.

**You can complete this step before the day of the actual tutorial.**


## Brief Introduction


The NASA Earth Science Data Systems (ESDS) program oversees the lifecycle of Earth science data from all its Earth observation missions, from acquisition to processing and distribution.

For the purposes of this guide, the NASA Earthdata website is the entry point that allows full, free and open access to NASA's Earth science data collections, in order to accelerate scientific progress for the benefit of society. To access the data through this portal, users must first create their access credentials. To create an Earthdata account, follow these steps:

+ Go to the NASA Earthdata website: [`https://www.earthdata.nasa.gov/`](https://www.earthdata.nasa.gov/). Then, select the menu options "*Use Data*" and then "*Register*". Finally, navigate to [`https://urs.earthdata.nasa.gov/`](https://urs.earthdata.nasa.gov/).

![earthdata_login](../assets/earthdata_login.png)

+ Select the "*Register for a profile*" option, then choose a username and password. You will need these later, so choose ones that you will remember. To finish registering, you will also need to complete your profile, which asks for information such as email, country, and affiliation. Finally, choose "*Register for Earthdata Login*".

![earthdata_profile](../assets/earthdata_profile2.png)
77 changes: 77 additions & 0 deletions book/00_Introduction_Setup/01_Using_the_2i2c_Hub.md
@@ -0,0 +1,77 @@
# Using the 2i2c Hub


This notebook lays out instructions to log into the cloud infrastructure ([JupyterHub](https://jupyter.org/hub)) provided by [2i2c](https://2i2c.org) for this tutorial.

**You won't be able to complete this step until the actual day of the tutorial (you'll get the password then).**


## 1. Logging into the 2i2c Hub


To log in to the JupyterHub provided by 2i2c, follow these steps:

1. **Navigate to the 2i2c Hub:** Point your web browser to [this link](https://climaterisk.opensci.2i2c.cloud).

2. **Log in with your Credentials:**

+ **Username:** Feel free to choose any username you like. We suggest using your GitHub username to avoid conflicts.
+ **Password:** *You'll receive the password on the day of the tutorial*.

![2i2c_login](../assets/2i2c_login.png)

3. **Logging In:**

The login process might take a few minutes, especially if a new virtual workspace needs to be created just for you.

![start_server2](../assets/start_server_2i2c.png)

* **What to Expect:**

By default, logging into [`https://climaterisk.opensci.2i2c.cloud`](https://climaterisk.opensci.2i2c.cloud) automatically clones a repository to work in. If the login is successful, you will see the following screen and are ready to start working.

![work_environment_jupyter_lab](../assets/work_environment_jupyter_lab.png)

**Note:** Any files you work on will persist between sessions as long as you use the same username when logging in.

## 2. Configuring the Cloud Environment to Access NASA EarthData from Python

To access NASA's EarthData products from Python programs or Jupyter notebooks, it is necessary to save your NASA EarthData credentials in a special file called `.netrc` in your home directory.

+ You can create this file by executing the script `make_netrc.py` in a terminal:
```bash
$ python make_netrc.py
```
You can also run this script from within this Jupyter notebook by executing the Python cell below (using the `%run` magic).

Some caveats:
+ The script won't execute if `~/.netrc` exists already. You can delete that file or rename it if you want to preserve the credentials within.
+ The script prompts for your NASA EarthData username & password, so watch for the prompt if you execute it from a Jupyter notebook.
```python
%run make_netrc.py
```
+ Alternatively, you can create a file called `.netrc` in your home folder (i.e., `~/.netrc`) with content as follows:
```
machine urs.earthdata.nasa.gov login USERNAME password PASSWORD
```
Of course, you would replace `USERNAME` and `PASSWORD` in your `.netrc` file with your actual NASA EarthData account details. Once the `.netrc` file is saved with your correct credentials, it's good practice to restrict access to the file:
```bash
$ chmod 600 ~/.netrc
```
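
As a quick sanity check before going further, the standard-library `netrc` module can parse the file and confirm that an entry for `urs.earthdata.nasa.gov` is present. The sketch below is not part of the tutorial's scripts: the helper names `read_earthdata_login` and `is_private` are made up for illustration, and the permission check assumes a POSIX system such as the 2i2c Hub.

```python
import netrc
import stat
from pathlib import Path

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def read_earthdata_login(netrc_path, host=EARTHDATA_HOST):
    """Return the login name stored for `host`, or None if there is no entry."""
    # netrc.netrc() raises netrc.NetrcParseError if the file is malformed.
    entry = netrc.netrc(str(netrc_path)).authenticators(host)
    return entry[0] if entry else None


def is_private(netrc_path):
    """Return True if neither group nor others have any permission on the file."""
    mode = Path(netrc_path).stat().st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, `read_earthdata_login(Path("~/.netrc").expanduser())` should return the username you entered, and `is_private(...)` should return `True` once `chmod 600` has been applied. Neither helper contacts NASA's servers, so a successful result only shows that the file is well formed, not that the password is valid.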


## 3. Verifying Access to NASA EarthData Products
<!-- #region -->
To make sure everything is working properly, execute the script `test_netrc.py`:
```bash
$ python test_netrc.py
```
Again, you can execute this directly from this notebook using the Python cell below:
<!-- #endregion -->
```python
%run test_netrc.py
```

If that worked smoothly, you're done! You now have everything you need to explore NASA's Earth observation data through the EarthData portal!
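
For readers curious about what the check does before it touches any data: `test_netrc.py` builds a small bounding box around a point near Livingston, TX and a date window, then passes both to a STAC search. A minimal sketch of those two inputs follows; the function names `bounding_box` and `time_window` are illustrative only and do not appear in the script.

```python
def bounding_box(center, delta):
    """Return (min_lon, min_lat, max_lon, max_lat) around a (lon, lat) point."""
    lon, lat = center
    return (lon - delta, lat - delta, lon + delta, lat + delta)


def time_window(start, stop):
    """Return a 'start/stop' interval string from two ISO dates."""
    return f"{start}/{stop}"


# Values taken from test_netrc.py
AOI = bounding_box((-95.09, 30.69), 0.1)
WINDOW = time_window("2024-04-30", "2024-05-31")
```

The ordering west, south, east, north matches what a STAC `bbox` search parameter expects, and the `start/stop` string is the usual form for a closed `datetime` interval.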
@@ -7,37 +7,44 @@ jupyter:
format_version: '1.3'
jupytext_version: 1.16.2
kernelspec:
display_name: base
display_name: Python 3 (ipykernel)
language: python
name: python3
---

# About this tutorial


This tutorial is part of a project which focuses on leveraging the vast amount of Earth science data available through the NASA Earthdata Cloud to better understand and forecast environmental risks such as wildfire, drought, and floods. At its core, this project embodies the principles of open science, aiming to make data, methods, and findings accessible to all.
We aim to equip learners with the skills to analyze, visualize, and report on data related to these critical environmental risks through open science-based workflows and the use of cloud-based data computing.


## What is Open Science

<!-- #region jupyter={"source_hidden": true} -->
"Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity."

<!-- #region -->

![](../assets/image165.png)
<!-- #endregion -->

### Availability of Open Science Resources:

<!-- #region jupyter={"source_hidden": true} -->
- Many open science resources already exist, including over 100 petabytes of openly available NASA data.
- Tools and practices for collaboration and code development.
<!-- #endregion -->

### Outputs and Project Openness:

<!-- #region jupyter={"source_hidden": true} -->
- Choice between openness from project inception or at publication.
- Making data, code, and results open.
<!-- #endregion -->

### Importance of Sharing and Impact:

<!-- #region jupyter={"source_hidden": true} -->
- Enhances the discoverability and accessibility of scientific processes and outputs.
- Open methods enhance reproducibility.
- Transparency and verifiability enhance accuracy.
@@ -48,43 +55,50 @@ We aim to equip learners with the skills to analyze, visualize, and report on da


![](../assets/image377.jpg)

<!-- #endregion -->

## Why now

<!-- #region jupyter={"source_hidden": true} -->
- The internet offers numerous platforms for public hosting and free access to research and data. These platforms, coupled with advancements in computational power, empower individuals to engage in sophisticated data analysis. This connectivity facilitates the integration of participants, stakeholders, and outcomes of open science initiatives online.

- Science and science communication confront growing resistance from the public due to concerns about result reproducibility and the proliferation of misinformation. Open science practices address these challenges by leveraging community feedback to validate results more rigorously and by making findings readily accessible to the public, countering misinformation.

- Scientific rigor and accuracy are bolstered when researchers validate their peers' findings. However, the lack of access to original data and code in scientific articles delays this process.
<!-- #endregion -->

<!-- #region -->
## Where to start: Open Research Products

<!-- #region jupyter={"source_hidden": true} -->
Scientific knowledge, or research products, take the form of:

![](../assets/image5.png)
<!-- #endregion -->

### What is data?

<!-- #region jupyter={"source_hidden": true} -->
Scientifically or technically relevant information that can be stored digitally and accessed electronically, such as:

- Information produced by missions and experiments, including calibrations, coefficients, and documentation.
- Information needed to validate scientific conclusions of peer-reviewed publications.
- Metadata.
<!-- #endregion -->

### What is code?

<!-- #region jupyter={"source_hidden": true} -->
- General Purpose Software – Software produced for widespread use, not specialized scientific purposes. This encompasses both commercial software and open-source software.
- Operational and Infrastructure Software – Software used by data centers and large information technology facilities to provide data services.
- Libraries – Generic tools that implement well-known algorithms, provide statistical analysis or visualization, and similar functions, and that are incorporated into other categories of software.
- Modeling and Simulation Software – Software that either implements solutions to mathematical equations given input data and boundary conditions, or infers models from data.
- Analysis Software – Software developed to manipulate measurements or model results to visualize or gain understanding.
- Single-use Software – Software written for use in unique instances, such as making a plot for a paper, or manipulating data in a specific way.
<!-- #endregion -->

### What are results?

<!-- #region jupyter={"source_hidden": true} -->
Results capture the different research outputs of the scientific process. Publications are the most common type of result, but results can also include a number of other types of products:

- Peer-reviewed publications
@@ -99,7 +113,4 @@ Products are created throughout the scientific process that are needed to enable


![](../assets/image7.jpeg)



<!-- #endregion -->
22 changes: 22 additions & 0 deletions book/00_Introduction_Setup/make_netrc.py
@@ -0,0 +1,22 @@
# This script generates a .netrc file, prompting the user for EarthData credentials.
# Delete or rename ~/.netrc before executing this script (it won't overwrite a
# pre-existing ~/.netrc file).

from getpass import getpass
from pathlib import Path
import sys

NETRC_PATH = Path('~/.netrc').expanduser()
TEMPLATE = " ".join(["machine", "urs.earthdata.nasa.gov", "login",
                     "{USERNAME}", "password", "{PASSWORD}\n"])

if NETRC_PATH.exists():
    print("Warning: ~/.netrc exists already (this script won't overwrite).")
    print(" Delete ~/.netrc first or back up to avoid losing credentials.")
    sys.exit(1)

username = input("NASA EarthData login: ")
password = getpass(prompt="NASA EarthData password: ")

NETRC_PATH.write_text(TEMPLATE.format(USERNAME=username, PASSWORD=password))
NETRC_PATH.chmod(0o600)
36 changes: 36 additions & 0 deletions book/00_Introduction_Setup/test_netrc.py
@@ -0,0 +1,36 @@
# Minimal test to verify NASA Earthdata credentials for downloading data products.
# Requires a .netrc file in home directory containing valid credentials.

from pathlib import Path

import osgeo.gdal, rasterio
from pystac_client import Client
from warnings import filterwarnings
filterwarnings("ignore") # suppress PySTAC warnings

# Mandatory GDAL setup for accessing cloud data
# (expand '~' in Python; GDAL is assumed not to expand it on its own)
GDAL_COOKIES = str(Path('~/.gdal_cookies.txt').expanduser())
osgeo.gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE', GDAL_COOKIES)
osgeo.gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', GDAL_COOKIES)
osgeo.gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN','EMPTY_DIR')
osgeo.gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF, TIFF')

# Define AOI (Area-Of-Interest) & time-window
livingston_tx, delta = (-95.09, 30.69), 0.1
AOI = tuple(coord + sgn*delta for sgn in (-1,+1) for coord in livingston_tx)
start, stop = '2024-04-30', '2024-05-31'
WINDOW = f'{start}/{stop}'

# Prepare PySTAC client
STAC_URL, COLLECTIONS = 'https://cmr.earthdata.nasa.gov/stac', ["OPERA_L3_DSWX-HLS_V1"]
catalog = Client.open(f'{STAC_URL}/POCLOUD/')

print("Testing PySTAC search...")
opts = dict(bbox=AOI, collections=COLLECTIONS, datetime=WINDOW)
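results = list(catalog.search(**opts).items_as_dicts())
if not results:
    # Without at least one item there is no asset to test against
    raise SystemExit("PySTAC search returned no items; cannot test data access.")
test_uri = results[0]['assets']['0_B01_WTR']['href']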

try:
    print("Search successful, accessing test data...")
    # Reading the profile forces an authenticated request for the remote file
    with rasterio.open(test_uri) as ds:
        _ = ds.profile
    print("Success! Your credentials file is correctly configured!")
except Exception:
    print(f"Could not access NASA EarthData asset: {test_uri}")
    print("Ensure that a .netrc file containing valid NASA Earthdata credentials exists in the user home directory.")