add storage location descriptions
amsnyder committed Sep 15, 2023
1 parent 4cde913 commit 01f5f89
Showing 59 changed files with 12,206 additions and 1,592 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: c1a466e13780e5467658071063133287
config: 5124740340b2386eadeb67159d9e4978
tags: 645f666f9bcd5a90fca523b33c5a78b7
13 changes: 9 additions & 4 deletions _sources/dataset_access/README.md
@@ -1,12 +1,17 @@
# CONUS404 Access

This section contains notebooks that demonstrate how to access and perform basic data manipulation for the CONUS404 dataset. These methods are likely applicable to many of the other key HyTEST datasets that can be opened with xarray. If you need help setting up a computing environment where you can run these notebooks, you should review the [Computing Environments](../environment_set_up/README.md) section of the documentation.
This section contains notebooks that demonstrate how to access and perform basic data manipulation for the [CONUS404 dataset](https://doi.org/10.5066/P9PHPK4F).

We currently have four demonstrations:
In the CONUS404 intake sub-catalog (see [here](../dataset_catalog/README.md) for an explainer of our intake data catalog), you will see entries for four CONUS404 datasets: conus404-hourly, conus404-daily, conus404-monthly, and conus404-daily-diagnostic. Each of these datasets is duplicated in three different storage locations (as the [intake catalog section](../dataset_catalog/README.md) also describes). The conus404-hourly data is a subset of the wrfout model output and conus404-daily-diagnostic is a subset of the wrfxtrm model output, both of which are described in the official [CONUS404 data release](https://doi.org/10.5066/P9PHPK4F). The conus404-daily and conus404-monthly datasets are simply resampled from the conus404-hourly data.

We currently have five notebooks that demonstrate how to work with these datasets in a Python workflow:
- [Explore CONUS404 Dataset](./conus404_explore.ipynb): opens the CONUS404 dataset, loads and plots the entire spatial domain of a specified variable at a specific time step, and loads and plots a time series of a variable at a specified coordinate pair.
- [CONUS404 Temporal Aggregation](./conus404_temporal_aggregation.ipynb): calculates a daily average of the CONUS404 hourly data (a minimal sketch of this operation follows this list).
- [CONUS404 Spatial Aggregation](./conus404_spatial_aggregation.ipynb): calculates the area-weighted mean of a specified CONUS404 variable for all HUC12s in the Delaware River Basin.
- [CONUS404 Spatial Aggregation](./conus404_spatial_aggregation.ipynb): calculates the area-weighted mean of the CONUS404 data for all HUC12s in the Delaware River Basin.
- [CONUS404 Point Selection](./conus404_point_selection.ipynb): samples the CONUS404 data at a selection of gage locations using their lat/lon point coordinates.
- [CONUS404 Regridding (Curvilinear => Rectilinear)](./conus404_regrid.ipynb): regrids a subset of the CONUS404 dataset from a curvilinear grid to a rectilinear grid and saves the output to a netCDF file. The package used in this demo is not compatible with Windows. We hope to improve upon this methodology and will likely update the package/technique used in the future.
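
For instance, the core of the temporal aggregation notebook is a single xarray resampling operation. Below is a minimal sketch, assuming `ds` is the conus404-hourly dataset already opened through the intake catalog and that it contains a 2-m temperature variable named `T2` (the variable name is an assumption for illustration):

```python
# Average the hourly records within each calendar day.
# `ds` is assumed to be the conus404-hourly xarray.Dataset opened via intake.
daily_mean = ds['T2'].resample(time='1D').mean()
```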

These methods are likely applicable to many of the other key HyTEST datasets that can be opened with xarray.

*Note: If you need help setting up a computing environment where you can run these notebooks, you should review the [Computing Environments](../environment_set_up/README.md) section of the documentation.*
25 changes: 24 additions & 1 deletion _sources/dataset_catalog/README.md
@@ -1,10 +1,33 @@
# HyTEST Data Catalog (Intake)
This section describes how to use HyTEST's [intake catalog](https://intake.readthedocs.io/en/latest/catalog.html). Intake catalogs help reduce or remove the burden of handling different file formats and storage locations, making it easier to read data into your workflow. They also allow data providers to update the filepath/storage location of a dataset without breaking the workflows that were built on top of the intake catalog.

Our catalog facilitates this access for HyTEST's key data offerings and is used to read the data into the notebooks contained in this repository. While intake catalogs are Python-centric, they are stored as a YAML file, which should also be easy to parse from other programming languages, even if no equivalent package exists in that language. Example usage of this catalog is shown below.
Our catalog facilitates this access for HyTEST's key data offerings and is used to read the data into the notebooks contained in this repository. While intake catalogs are Python-centric, they are stored as a YAML file, which should also be easy to parse from other programming languages, even if no equivalent package exists in that language.

Please note that this catalog is a temporary solution for reading data into our workflows. By the end of 2023, we hope to replace this catalog with a [STAC](https://stacspec.org/en). We plan to update all notebooks to read from our STAC at that time as well.

## Storage Locations
Before getting into the details of how to use the intake catalog, it will be helpful to have some background on the various data storage systems HyTEST uses. Many of the datasets in our intake catalog have been duplicated in multiple storage locations, so you will need a basic understanding of these systems to navigate the data catalog. For datasets that are duplicated in multiple locations, the data on all storage systems is identical; however, the details and costs associated with accessing it may differ. Datasets that are duplicated in multiple locations have identical names up until the last hyphenated part of the name, which indicates the storage location; for example, `conus404-hourly-cloud`, `conus404-hourly-osn`, and `conus404-hourly-onprem` are all identical datasets stored in different places. We currently store data in three locations: AWS S3 buckets, Open Storage Network (OSN) pods, and USGS on-premises supercomputer storage (Caldera). Each of these locations is described in more detail below.
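
As a quick illustration, listing the catalog entries shows these storage-location suffixes directly (a minimal sketch; the names in the final comment illustrate the convention rather than a guaranteed listing):

```python
import intake

url = 'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml'
cat = intake.open_catalog(url)

# Entries for the same dataset differ only in the storage-location suffix.
print(sorted(name for name in cat if name.startswith('conus404-hourly')))
# e.g. ['conus404-hourly-cloud', 'conus404-hourly-onprem', 'conus404-hourly-osn']
```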

### AWS S3
This location provides object storage through an Amazon Web Services (AWS) Simple Storage Service (S3) bucket. This data is free to access for workflows running in the AWS us-west-2 region. However, if you would like to pull the data out of the AWS cloud (to your local computer, a supercomputer, or another cloud provider) or into another [AWS cloud region](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html), you will incur fees, because the bucket storing the data is a **“requester pays”** bucket. The costs associated with reading the data into other computing environments or AWS regions are documented [here](https://aws.amazon.com/s3/pricing/) (on the “Requests and Data Retrievals” tab). If you do need to read this data into a computing environment outside the AWS us-west-2 region, you will need an [AWS account](https://aws.amazon.com/account/) set up. You will need credentials from this account to read in the data, and your account will be billed. Please refer to the [AWS Credentials](../environment_set_up/Help_AWS_Credentials.ipynb) section of this book for more details on handling AWS credentials.

**Datasets in the intake catalog that are stored in an S3 bucket have a name ending in "-cloud".**
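
If you do read a "-cloud" dataset from outside the us-west-2 region, the requester-pays flag must be set when the S3 filesystem is created. Below is a minimal sketch using fsspec/s3fs with a placeholder Zarr path (the real path comes from the intake catalog, which normally handles these details for you):

```python
import fsspec
import xarray as xr

# Requester pays: your configured AWS credentials are used, and your account is billed.
fs = fsspec.filesystem('s3', requester_pays=True)

# Placeholder path -- take the actual store location from the intake catalog.
ds = xr.open_zarr(fs.get_mapper('s3://<bucket>/<path-to>/conus404-hourly.zarr'))
```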

### Open Storage Network (OSN) Pod
This location provides object storage through the Woods Hole Oceanographic Institution's [Open Storage Network (OSN)](https://www.openstoragenetwork.org/) storage pod. The OSN pod that HyTEST uses is housed at the Massachusetts Green High Performance Computing Center on a **high-speed (100+ GbE) network**. This copy of the data is **free** to access from any computing environment and **does not require any credentials** to access.

The OSN pod storage can be accessed through an API that is compatible with the basic data access model of the S3 API. The only major difference is that the user needs to specify the appropriate endpoint URL for the OSN pod when making the request. However, *a user accessing data on the OSN pod through HyTEST's intake catalog will not have to worry about these details*, as the intake package handles them for you. If you would like to access the data on the OSN pod through a mechanism other than intake, you may want to review the [Data/Cloud Storage](../essential_reading/DataSources/Data_S3.md) section of this book.

**Datasets in the intake catalog that are stored on the OSN pod have a name ending in "-osn".**
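
Outside of intake, the "-osn" copies can be read with the same S3 tooling by pointing it at the pod's endpoint. Here is a minimal sketch, assuming anonymous access, with the endpoint and path shown as placeholders (the intake catalog holds the real values):

```python
import fsspec
import xarray as xr

# Anonymous access; only the endpoint URL differs from ordinary S3 usage.
fs = fsspec.filesystem(
    's3',
    anon=True,
    client_kwargs={'endpoint_url': 'https://<osn-pod-endpoint>'},
)

# Placeholder path -- take the actual store location from the intake catalog.
ds = xr.open_zarr(fs.get_mapper('s3://<bucket>/<path-to>/conus404-hourly.zarr'))
```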

### USGS On-premises Supercomputer Storage (Caldera)
The last storage location is the USGS on-premises disk storage attached to the USGS supercomputers (also called Caldera). This location is **only accessible to USGS employees or collaborators who have been granted access to USGS supercomputers**. This is the preferred storage to use if you are working on the USGS supercomputers; in fact, you will *not* be able to read data from this location into any computing environment other than the USGS supercomputers. More information about this storage system can be found in the [HPC User Docs](https://hpcportal.cr.usgs.gov/hpc-user-docs/supercomputers/caldera.html) (which are also only accessible through the internal USGS network).

**Datasets in the intake catalog that are stored on Caldera have a name ending in "-onprem".**
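
On the supercomputers themselves, the "-onprem" copies are ordinary paths on the attached filesystem, so no object-store client is needed. A minimal sketch with a placeholder path:

```python
import xarray as xr

# Placeholder path -- the intake catalog supplies the real on-premises location.
ds = xr.open_zarr('/caldera/<path-to>/conus404-hourly.zarr')
```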

## Example Intake Catalog Usage
Now that you have an understanding of the different storage systems HyTEST uses, you can navigate the HyTEST intake catalog and make a selection that is appropriate for your computing environment. Below is a demonstration of how to use HyTEST's intake catalog to select and open a dataset in your Python workflow.

```python
import intake
url = 'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml'
cat = intake.open_catalog(url)  # open the catalog from the hosted YAML file
list(cat)                       # list the names of the available dataset entries
```
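
From there, a specific entry can be opened lazily. The sketch below assumes the `conus404-daily-osn` entry (a name following the convention described above) and uses the `to_dask()` method provided by the intake-xarray driver:

```python
# Lazily open one catalog entry as an xarray.Dataset backed by dask arrays.
ds = cat['conus404-daily-osn'].to_dask()
print(ds)
```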
134 changes: 134 additions & 0 deletions _static/_sphinx_javascript_frameworks_compat.js
@@ -0,0 +1,134 @@
/*
* _sphinx_javascript_frameworks_compat.js
* ~~~~~~~~~~
*
 * Compatibility shim for jQuery and underscore.js.
*
* WILL BE REMOVED IN Sphinx 6.0
* xref RemovedInSphinx60Warning
*
*/

/**
* select a different prefix for underscore
*/
$u = _.noConflict();


/**
* small helper function to urldecode strings
*
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURIComponent#Decoding_query_parameters_from_a_URL
*/
jQuery.urldecode = function(x) {
if (!x) {
return x
}
return decodeURIComponent(x.replace(/\+/g, ' '));
};

/**
* small helper function to urlencode strings
*/
jQuery.urlencode = encodeURIComponent;

/**
* This function returns the parsed url parameters of the
* current request. Multiple values per key are supported,
* it will always return arrays of strings for the value parts.
*/
jQuery.getQueryParameters = function(s) {
if (typeof s === 'undefined')
s = document.location.search;
var parts = s.substr(s.indexOf('?') + 1).split('&');
var result = {};
for (var i = 0; i < parts.length; i++) {
var tmp = parts[i].split('=', 2);
var key = jQuery.urldecode(tmp[0]);
var value = jQuery.urldecode(tmp[1]);
if (key in result)
result[key].push(value);
else
result[key] = [value];
}
return result;
};

/**
* highlight a given string on a jquery object by wrapping it in
* span elements with the given class name.
*/
jQuery.fn.highlightText = function(text, className) {
function highlight(node, addItems) {
if (node.nodeType === 3) {
var val = node.nodeValue;
var pos = val.toLowerCase().indexOf(text);
if (pos >= 0 &&
!jQuery(node.parentNode).hasClass(className) &&
!jQuery(node.parentNode).hasClass("nohighlight")) {
var span;
var isInSVG = jQuery(node).closest("body, svg, foreignObject").is("svg");
if (isInSVG) {
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
} else {
span = document.createElement("span");
span.className = className;
}
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
node.parentNode.insertBefore(span, node.parentNode.insertBefore(
document.createTextNode(val.substr(pos + text.length)),
node.nextSibling));
node.nodeValue = val.substr(0, pos);
if (isInSVG) {
var rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
var bbox = node.parentElement.getBBox();
rect.x.baseVal.value = bbox.x;
rect.y.baseVal.value = bbox.y;
rect.width.baseVal.value = bbox.width;
rect.height.baseVal.value = bbox.height;
rect.setAttribute('class', className);
addItems.push({
"parent": node.parentNode,
"target": rect});
}
}
}
else if (!jQuery(node).is("button, select, textarea")) {
jQuery.each(node.childNodes, function() {
highlight(this, addItems);
});
}
}
var addItems = [];
var result = this.each(function() {
highlight(this, addItems);
});
for (var i = 0; i < addItems.length; ++i) {
jQuery(addItems[i].parent).before(addItems[i].target);
}
return result;
};

/*
* backward compatibility for jQuery.browser
* This will be supported until firefox bug is fixed.
*/
if (!jQuery.browser) {
jQuery.uaMatch = function(ua) {
ua = ua.toLowerCase();

var match = /(chrome)[ \/]([\w.]+)/.exec(ua) ||
/(webkit)[ \/]([\w.]+)/.exec(ua) ||
/(opera)(?:.*version|)[ \/]([\w.]+)/.exec(ua) ||
/(msie) ([\w.]+)/.exec(ua) ||
ua.indexOf("compatible") < 0 && /(mozilla)(?:.*? rv:([\w.]+)|)/.exec(ua) ||
[];

return {
browser: match[ 1 ] || "",
version: match[ 2 ] || "0"
};
};
jQuery.browser = {};
jQuery.browser[jQuery.uaMatch(navigator.userAgent).browser] = true;
}
54 changes: 39 additions & 15 deletions _static/basic.css
@@ -222,7 +222,7 @@ table.modindextable td {
/* -- general body styles --------------------------------------------------- */

div.body {
min-width: 450px;
min-width: 360px;
max-width: 800px;
}

@@ -237,16 +237,6 @@ a.headerlink {
visibility: hidden;
}

a.brackets:before,
span.brackets > a:before{
content: "[";
}

a.brackets:after,
span.brackets > a:after {
content: "]";
}

h1:hover > a.headerlink,
h2:hover > a.headerlink,
h3:hover > a.headerlink,
@@ -334,12 +324,16 @@ aside.sidebar {
p.sidebar-title {
font-weight: bold;
}
nav.contents,
aside.topic,

div.admonition, div.topic, blockquote {
clear: left;
}

/* -- topics ---------------------------------------------------------------- */
nav.contents,
aside.topic,

div.topic {
border: 1px solid #ccc;
@@ -379,13 +373,19 @@ div.body p.centered {

div.sidebar > :last-child,
aside.sidebar > :last-child,
nav.contents > :last-child,
aside.topic > :last-child,

div.topic > :last-child,
div.admonition > :last-child {
margin-bottom: 0;
}

div.sidebar::after,
aside.sidebar::after,
nav.contents::after,
aside.topic::after,

div.topic::after,
div.admonition::after,
blockquote::after {
@@ -428,10 +428,6 @@ table.docutils td, table.docutils th {
border-bottom: 1px solid #aaa;
}

table.footnote td, table.footnote th {
border: 0 !important;
}

th {
text-align: left;
padding-right: 5px;
@@ -615,6 +611,7 @@ ul.simple p {
margin-bottom: 0;
}

/* Docutils 0.17 and older (footnotes & citations) */
dl.footnote > dt,
dl.citation > dt {
float: left;
@@ -632,6 +629,33 @@ dl.citation > dd:after {
clear: both;
}

/* Docutils 0.18+ (footnotes & citations) */
aside.footnote > span,
div.citation > span {
float: left;
}
aside.footnote > span:last-of-type,
div.citation > span:last-of-type {
padding-right: 0.5em;
}
aside.footnote > p {
margin-left: 2em;
}
div.citation > p {
margin-left: 4em;
}
aside.footnote > p:last-of-type,
div.citation > p:last-of-type {
margin-bottom: 0em;
}
aside.footnote > p:last-of-type:after,
div.citation > p:last-of-type:after {
content: "";
clear: both;
}

/* Footnotes & citations ends */

dl.field-list {
display: grid;
grid-template-columns: fit-content(30%) auto;
