add storage location descriptions
amsnyder committed Sep 15, 2023
1 parent 4cde913 commit 01f5f89
Showing 59 changed files with 12,206 additions and 1,592 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: c1a466e13780e5467658071063133287
config: 5124740340b2386eadeb67159d9e4978
tags: 645f666f9bcd5a90fca523b33c5a78b7
13 changes: 9 additions & 4 deletions _sources/dataset_access/README.md
@@ -1,12 +1,17 @@
# CONUS404 Access

This section contains notebooks that demonstrate how to access and perform basic data manipulation for the CONUS404 dataset. These methods are likely applicable to many of the other key HyTEST datasets that can be opened with xarray. If you need help setting up a computing environment where you can run these notebooks, you should review the [Computing Environments](../environment_set_up/README.md) section of the documentation.
This section contains notebooks that demonstrate how to access and perform basic data manipulation for the [CONUS404 dataset](https://doi.org/10.5066/P9PHPK4F).

We currently have four demonstrations:
In the CONUS404 intake sub-catalog (see [here](../dataset_catalog/README.md) for an explainer of our intake data catalog), you will see entries for four CONUS404 datasets: conus404-hourly, conus404-daily, conus404-monthly, and conus404-daily-diagnostic. Each of these datasets is duplicated in three different storage locations (as the [intake catalog section](../dataset_catalog/README.md) also describes). The conus404-hourly data is a subset of the wrfout model output and conus404-daily-diagnostic is a subset of the wrfxtrm model output, both of which are described in the official [CONUS404 data release](https://doi.org/10.5066/P9PHPK4F). The conus404-daily and conus404-monthly datasets are simply resampled from the conus404-hourly data.

We currently have five notebooks that demonstrate how to work with these datasets in a Python workflow:
- [Explore CONUS404 Dataset](./conus404_explore.ipynb): opens the CONUS404 dataset, loads and plots the entire spatial domain of a specified variable at a specific time step, and loads and plots a time series of a variable at a specified coordinate pair.
- [CONUS404 Temporal Aggregation](./conus404_temporal_aggregation.ipynb): calculates a daily average of the CONUS404 hourly data (a minimal sketch of this operation follows this list).
- [CONUS404 Spatial Aggregation](./conus404_spatial_aggregation.ipynb): calculates the area-weighted mean of a specified CONUS404 variable for all HUC12s in the Delaware River Basin.
- [CONUS404 Spatial Aggregation](./conus404_spatial_aggregation.ipynb): calculates the area-weighted mean of the CONUS404 data for all HUC12s in the Delaware River Basin.
- [CONUS404 Point Selection](./conus404_point_selection.ipynb): samples the CONUS404 data at a selection of gage locations using their lat/lon point coordinates.
- [CONUS404 Regridding (Curvilinear => Rectilinear)](./conus404_regrid.ipynb): regrids a subset of the CONUS404 dataset from a curvilinear grid to a rectilinear grid and saves the output to a netCDF file. The package used in this demo is not compatible with Windows. We hope to improve upon this methodology and will likely update the package/technique used in the future.
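
For instance, the core of the temporal aggregation notebook is a single xarray resampling operation. Below is a minimal sketch, assuming `ds` is the conus404-hourly dataset already opened through the intake catalog and that it contains a 2-m temperature variable named `T2` (the variable name is an assumption for illustration):

```python
# Average the hourly records within each calendar day.
# `ds` is assumed to be the conus404-hourly xarray.Dataset opened via intake.
daily_mean = ds['T2'].resample(time='1D').mean()
```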

These methods are likely applicable to many of the other key HyTEST datasets that can be opened with xarray.

*Note: If you need help setting up a computing environment where you can run these notebooks, you should review the [Computing Environments](../environment_set_up/README.md) section of the documentation.*
25 changes: 24 additions & 1 deletion _sources/dataset_catalog/README.md
@@ -1,10 +1,33 @@
# HyTEST Data Catalog (Intake)
This section describes how to use HyTEST's [intake catalog](https://intake.readthedocs.io/en/latest/catalog.html). Intake catalogs help reduce or remove the burden of handling different file formats and storage locations, making it easier to read data into your workflow. They also allow data providers to update the filepath/storage location of a dataset without breaking the workflows that were built on top of the intake catalog.

Our catalog facilitates this access for HyTEST's key data offerings and is used to read the data into the notebooks contained in this repository. While intake catalogs are Python-centric, they are stored as a YAML file, which should also be easy to parse from other programming languages, even if no equivalent package exists in that language. Example usage of this catalog is shown below.
Our catalog facilitates this access for HyTEST's key data offerings and is used to read the data into the notebooks contained in this repository. While intake catalogs are Python-centric, they are stored as a YAML file, which should also be easy to parse from other programming languages, even if no equivalent package exists in that language.

Please note that this catalog is a temporary solution for reading data into our workflows. By the end of 2023, we hope to replace this catalog with a [STAC](https://stacspec.org/en). We plan to update all notebooks to read from our STAC at that time as well.

## Storage Locations
Before getting into the details of how to use the intake catalog, it will be helpful to have some background on the various data storage systems HyTEST uses. Many of the datasets in our intake catalog have been duplicated in multiple storage locations, so you will need a basic understanding of these systems to navigate the data catalog. For datasets that are duplicated in multiple locations, the data on all storage systems is identical; however, the details and costs associated with accessing it may differ. Datasets that are duplicated in multiple locations have identical names up until the last hyphenated part of the name, which indicates the storage location; for example, `conus404-hourly-cloud`, `conus404-hourly-osn`, and `conus404-hourly-onprem` are all identical datasets stored in different places. We currently store data in three locations: AWS S3 buckets, Open Storage Network (OSN) pods, and USGS on-premises supercomputer storage (Caldera). Each of these locations is described in more detail below.
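
As a quick illustration, listing the catalog entries shows these storage-location suffixes directly (a minimal sketch; the names in the final comment illustrate the convention rather than a guaranteed listing):

```python
import intake

url = 'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml'
cat = intake.open_catalog(url)

# Entries for the same dataset differ only in the storage-location suffix.
print(sorted(name for name in cat if name.startswith('conus404-hourly')))
# e.g. ['conus404-hourly-cloud', 'conus404-hourly-onprem', 'conus404-hourly-osn']
```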

### AWS S3
This location provides object storage through an Amazon Web Services (AWS) Simple Storage Service (S3) bucket. This data is free to access for workflows running in the AWS us-west-2 region. However, if you would like to pull the data out of the AWS cloud (to your local computer, a supercomputer, or another cloud provider) or into another [AWS cloud region](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html), you will incur fees, because the bucket storing the data is a **“requester pays”** bucket. The costs associated with reading the data into other computing environments or AWS regions are documented [here](https://aws.amazon.com/s3/pricing/) (on the “Requests and Data Retrievals” tab). If you do need to read this data into a computing environment outside the AWS us-west-2 region, you will need an [AWS account](https://aws.amazon.com/account/) set up. You will need credentials from this account to read in the data, and your account will be billed. Please refer to the [AWS Credentials](../environment_set_up/Help_AWS_Credentials.ipynb) section of this book for more details on handling AWS credentials.

**Datasets in the intake catalog that are stored in an S3 bucket have a name ending in "-cloud".**
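
If you do read a "-cloud" dataset from outside the us-west-2 region, the requester-pays flag must be set when the S3 filesystem is created. Below is a minimal sketch using fsspec/s3fs with a placeholder Zarr path (the real path comes from the intake catalog, which normally handles these details for you):

```python
import fsspec
import xarray as xr

# Requester pays: your configured AWS credentials are used, and your account is billed.
fs = fsspec.filesystem('s3', requester_pays=True)

# Placeholder path -- take the actual store location from the intake catalog.
ds = xr.open_zarr(fs.get_mapper('s3://<bucket>/<path-to>/conus404-hourly.zarr'))
```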

### Open Storage Network (OSN) Pod
This location provides object storage through the Woods Hole Oceanographic Institution's [Open Storage Network (OSN)](https://www.openstoragenetwork.org/) storage pod. The OSN pod that HyTEST uses is housed at the Massachusetts Green High Performance Computing Center on a **high-speed (100+ GbE) network**. This copy of the data is **free** to access from any computing environment and **does not require any credentials** to access.

The OSN pod storage can be accessed through an API that is compatible with the basic data access model of the S3 API. The only major difference is that the user needs to specify the appropriate endpoint URL for the OSN pod when making the request. However, *a user accessing data on the OSN pod through HyTEST's intake catalog will not have to worry about these details*, as the intake package handles them for you. If you would like to access the data on the OSN pod through a mechanism other than intake, you may want to review the [Data/Cloud Storage](../essential_reading/DataSources/Data_S3.md) section of this book.

**Datasets in the intake catalog that are stored on the OSN pod have a name ending in "-osn".**
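
Outside of intake, the "-osn" copies can be read with the same S3 tooling by pointing it at the pod's endpoint. Here is a minimal sketch, assuming anonymous access, with the endpoint and path shown as placeholders (the intake catalog holds the real values):

```python
import fsspec
import xarray as xr

# Anonymous access; only the endpoint URL differs from ordinary S3 usage.
fs = fsspec.filesystem(
    's3',
    anon=True,
    client_kwargs={'endpoint_url': 'https://<osn-pod-endpoint>'},
)

# Placeholder path -- take the actual store location from the intake catalog.
ds = xr.open_zarr(fs.get_mapper('s3://<bucket>/<path-to>/conus404-hourly.zarr'))
```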

### USGS On-premises Supercomputer Storage (Caldera)
The last storage location is the USGS on-premises disk storage attached to the USGS supercomputers (also called Caldera). This location is **only accessible to USGS employees or collaborators who have been granted access to USGS supercomputers**. This is the preferred storage to use if you are working on the USGS supercomputers; in fact, you will *not* be able to read data from this location into any computing environment other than the USGS supercomputers. More information about this storage system can be found in the [HPC User Docs](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/caldera.html) (which are also only accessible through the internal USGS network).

**Datasets in the intake catalog that are stored on Caldera have a name ending in "-onprem".**
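
On the supercomputers themselves, the "-onprem" copies are ordinary paths on the attached filesystem, so no object-store client is needed. A minimal sketch with a placeholder path:

```python
import xarray as xr

# Placeholder path -- the intake catalog supplies the real on-premises location.
ds = xr.open_zarr('/caldera/<path-to>/conus404-hourly.zarr')
```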

## Example Intake Catalog Usage
Now that you have an understanding of the different storage systems HyTEST uses, you can navigate the HyTEST intake catalog and make a selection that is appropriate for your computing environment. Below is a demonstration of how to use HyTEST's intake catalog to select and open a dataset in your Python workflow.

```python
import intake
url = 'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml'
cat = intake.open_catalog(url)  # open the catalog from the hosted YAML file
list(cat)                       # list the names of the available dataset entries
```
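
From there, a specific entry can be opened lazily. The sketch below assumes the `conus404-daily-osn` entry (a name following the convention described above) and uses the `to_dask()` method provided by the intake-xarray driver:

```python
# Lazily open one catalog entry as an xarray.Dataset backed by dask arrays.
ds = cat['conus404-daily-osn'].to_dask()
print(ds)
```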
134 changes: 134 additions & 0 deletions _static/_sphinx_javascript_frameworks_compat.js
@@ -0,0 +1,134 @@
/*
* _sphinx_javascript_frameworks_compat.js
* ~~~~~~~~~~
*
 * Compatibility shim for jQuery and underscore.js.
*
* WILL BE REMOVED IN Sphinx 6.0
* xref RemovedInSphinx60Warning
*
*/

/**
* select a different prefix for underscore
*/
$u = _.noConflict();


/**
* small helper function to urldecode strings
*
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL
*/
jQuery.urldecode = function(x) {
if (!x) {
return x
}
return decodeURIComponent(x.replace(/\+/g, ' '));
};

/**
* small helper function to urlencode strings
*/
jQuery.urlencode = encodeURIComponent;

/**
* This function returns the parsed url parameters of the
* current request. Multiple values per key are supported,
* it will always return arrays of strings for the value parts.
*/
jQuery.getQueryParameters = function(s) {
if (typeof s === 'undefined')
s = document.location.search;
var parts = s.substr(s.indexOf('?') + 1).split('&');
var result = {};
for (var i = 0; i < parts.length; i++) {
var tmp = parts[i].split('=', 2);
var key = jQuery.urldecode(tmp[0]);
var value = jQuery.urldecode(tmp[1]);
if (key in result)
result[key].push(value);
else
result[key] = [value];
}
return result;
};

/**
* highlight a given string on a jquery object by wrapping it in
* span elements with the given class name.
*/
jQuery.fn.highlightText = function(text, className) {
function highlight(node, addItems) {
if (node.nodeType === 3) {
var val = node.nodeValue;
var pos = val.toLowerCase().indexOf(text);
if (pos >= 0 &&
!jQuery(node.parentNode).hasClass(className) &&
!jQuery(node.parentNode).hasClass("nohighlight")) {
var span;
var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg");
if (isInSVG) {
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
} else {
span = document.createElement("span");
span.className = className;
}
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
node.parentNode.insertBefore(span, node.parentNode.insertBefore(
document.createTextNode(val.substr(pos + text.length)),
node.nextSibling));
node.nodeValue = val.substr(0, pos);
if (isInSVG) {
var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
var bbox = node.parentElement.getBBox();
rect.x.baseVal.value = bbox.x;
rect.y.baseVal.value = bbox.y;
rect.width.baseVal.value = bbox.width;
rect.height.baseVal.value = bbox.height;
rect.setAttribute('class', className);
addItems.push({
"parent": node.parentNode,
"target": rect});
}
}
}
else if (!jQuery(node).is("button, select, textarea")) {
jQuery.each(node.childNodes, function() {
highlight(this, addItems);
});
}
}
var addItems = [];
var result = this.each(function() {
highlight(this, addItems);
});
for (var i = 0; i < addItems.length; ++i) {
jQuery(addItems[i].parent).before(addItems[i].target);
}
return result;
};

/*
* backward compatibility for jQuery.browser
* This will be supported until firefox bug is fixed.
*/
if (!jQuery.browser) {
jQuery.uaMatch = function(ua) {
ua = ua.toLowerCase();

var match = /(chrome)[ \/]([\w.]+)/.exec(ua) ||
/(webkit)[ \/]([\w.]+)/.exec(ua) ||
/(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) ||
/(msie) ([\w.]+)/.exec(ua) ||
ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) ||
[];

return {
browser: match[ 1 ] || "",
version: match[ 2 ] || "0"
};
};
jQuery.browser = {};
jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true;
}
54 changes: 39 additions & 15 deletions _static/basic.css
@@ -222,7 +222,7 @@ table.modindextable td {
/* -- general body styles --------------------------------------------------- */

div.body {
min-width: 450px;
min-width: 360px;
max-width: 800px;
}

@@ -237,16 +237,6 @@ a.headerlink {
visibility: hidden;
}

a.brackets:before,
span.brackets > a:before{
content: "[";
}

a.brackets:after,
span.brackets > a:after {
content: "]";
}

h1:hover > a.headerlink,
h2:hover > a.headerlink,
h3:hover > a.headerlink,
@@ -334,12 +324,16 @@ aside.sidebar {
p.sidebar-title {
font-weight: bold;
}
nav.contents,
aside.topic,

div.admonition, div.topic, blockquote {
clear: left;
}

/* -- topics ---------------------------------------------------------------- */
nav.contents,
aside.topic,

div.topic {
border: 1px solid #ccc;
@@ -379,13 +373,19 @@ div.body p.centered {

div.sidebar > :last-child,
aside.sidebar > :last-child,
nav.contents > :last-child,
aside.topic > :last-child,

div.topic > :last-child,
div.admonition > :last-child {
margin-bottom: 0;
}

div.sidebar::after,
aside.sidebar::after,
nav.contents::after,
aside.topic::after,

div.topic::after,
div.admonition::after,
blockquote::after {
@@ -428,10 +428,6 @@ table.docutils td, table.docutils th {
border-bottom: 1px solid #aaa;
}

table.footnote td, table.footnote th {
border: 0 !important;
}

th {
text-align: left;
padding-right: 5px;
@@ -615,6 +611,7 @@ ul.simple p {
margin-bottom: 0;
}

/* Docutils 0.17 and older (footnotes & citations) */
dl.footnote > dt,
dl.citation > dt {
float: left;
@@ -632,6 +629,33 @@ dl.citation > dd:after {
clear: both;
}

/* Docutils 0.18+ (footnotes & citations) */
aside.footnote > span,
div.citation > span {
float: left;
}
aside.footnote > span:last-of-type,
div.citation > span:last-of-type {
padding-right: 0.5em;
}
aside.footnote > p {
margin-left: 2em;
}
div.citation > p {
margin-left: 4em;
}
aside.footnote > p:last-of-type,
div.citation > p:last-of-type {
margin-bottom: 0em;
}
aside.footnote > p:last-of-type:after,
div.citation > p:last-of-type:after {
content: "";
clear: both;
}

/* Footnotes & citations ends */

dl.field-list {
display: grid;
grid-template-columns: fit-content(30%) auto;
