From 704085e3f41d94ac97f54c6b2b57247d35365563 Mon Sep 17 00:00:00 2001
From:
Date: Fri, 23 Aug 2024 15:33:54 -0500
Subject: [PATCH] add conda env set up for ondemand
---
_sources/environment_set_up/OpenOnDemand.md | 17 +-
.../CONUS404_ACCESS-checkpoint.html | 591 ++++++++++++++++++
environment_set_up/OpenOnDemand.html | 32 +-
searchindex.js | 2 +-
4 files changed, 628 insertions(+), 14 deletions(-)
create mode 100644 dataset_access/.ipynb_checkpoints/CONUS404_ACCESS-checkpoint.html
diff --git a/_sources/environment_set_up/OpenOnDemand.md b/_sources/environment_set_up/OpenOnDemand.md
index 0f133979..9a5079b5 100644
--- a/_sources/environment_set_up/OpenOnDemand.md
+++ b/_sources/environment_set_up/OpenOnDemand.md
@@ -3,8 +3,11 @@
This is a custom service provided by the USGS ARC team. It is the easiest to use (no configuration needed on your part), and provides reasonable compute resources via the `tallgrass` and `hovenweep` hosts:
* To log in to OnDemand, select the appropriate login link from the `OnDemand` section of `https://hpcportal.cr.usgs.gov/`. Note that you must be on the VPN to access this host. Denali/Tallgrass share one disk for data storage and Hovenweep has a different disk. If you have data stored on the HPCs, you will want to choose whichever resource is attached to where your data is stored. If you are accessing data from a different, publicly accessible storage location, you can choose either option.
-* From the OnDemand landing page, choose `Interactive Apps`. If you are using `Hovenweep`, select the `Jupyter` option from this dropdown. If you are using `Tallgrass`, you can either select `Jupyter` or you can launch the `HyTEST Jupyter` server app, which will include a conda environment pre-configured with the packages you need to run the workflows in this JupyterBook. If you do not use our pre-configured environment (if you selected `Jupyter`), you will need to build your own. You can learn more about how to set up your own conda environment [here](https://hpcportal.cr.usgs.gov/hpc-user-docs/guides/software/environments/python/Python_Environment_Setup_with_Conda.html) in the HPC user docs.
+* From the OnDemand landing page, choose `Interactive Apps`. If you are using `Hovenweep`, select the `Jupyter` option from this dropdown. If you are using `Tallgrass`, you can either select `Jupyter` or you can launch the `HyTEST Jupyter` server app, which will include a conda environment pre-configured with the packages you need to run the workflows in this JupyterBook. If you do not use our pre-configured environment (if you selected `Jupyter`), you will need to build your own once you connect to the HPC. This process is described below in [Conda Environment Set Up](#conda-environment-set-up).
* Fill in the form to customize the allocation in which the Jupyter Server will execute.
+ * For light-duty work (e.g., tutorials), a `Viz` node is likely adequate in your allocation request. If you
+will be doing heavier processing, you may want to request a compute node. None of the HyTEST tutorials
+utilize GPU code; a GPU-enabled node is not necessary.
* You may want to consider adding the git and/or aws modules if you plan to use them during your session. You will just need to type `module load git` and/or `module load aws` in the `Module loads` section.
* If you expect to run code in parallel on multiple compute nodes, you have two options. (1) You can use the form to request the number of cores you need and then run a [Dask Local Cluster](./Start_Dask_Cluster_Denali.ipynb) on those cores, or (2) you can request the standard 2 cores, and then use a [Dask SLURMCluster](./Start_Dask_Cluster_Tallgrass.ipynb) to submit new jobs to the SLURM scheduler, giving you access to additional compute nodes.
* Click Submit
@@ -12,9 +15,13 @@ This is a custom service provided by the USGS ARC team. It is the easiest to use
The Jupyter Server will run in an allocation on `tallgrass` or `hovenweep`. This server will have access to your home
directory/folder on that host, which is where your notebooks will reside.
+
+## Conda Environment Set Up
+If you need to set up your own conda environment on the HPCs, please refer to the [HPC User Docs](https://hpcportal.cr.usgs.gov/hpc-user-docs/guides/software/environments/python/Python_Environment_Setup_with_Conda.html). You can stop right before they create the sample Python environment with `conda create --name py310 python=3.10` and instead create whatever conda environment you need for your work. Once you have created your conda environment, you will need to take a few additional steps to make it visible as a kernel in your Jupyter Notebook:
+* activate your environment with `conda activate your_environment_name`
+* make sure `ipykernel` is installed in your conda environment
+* run `python -m ipykernel install --user --name your_environment_name --display-name "your_environment_name"`
-For light duty work (i.e. tutorials), a `Viz` node is likely adequate in your allocation request. If you
-will be doing heavier processing, you may want to request a compute node. None of the HyTEST tutorials
-utilize GPU code; a GPU-enabled node is not necessary.
+Now, you will be able to see the environment you just built as an available kernel from your Jupyter notebook.
-##### Note: The code to build this app is in [this repository](https://code.chs.usgs.gov/sas/arc/arc-software/ood/bc_jupyter_hytest); however, this repo is only visible on the internal USGS network.
\ No newline at end of file
+##### Note: The code to build the HyTEST Jupyter app is in [this repository](https://code.chs.usgs.gov/sas/arc/arc-software/ood/bc_jupyter_hytest); however, this repo is only visible on the internal USGS network.
\ No newline at end of file
diff --git a/dataset_access/.ipynb_checkpoints/CONUS404_ACCESS-checkpoint.html b/dataset_access/.ipynb_checkpoints/CONUS404_ACCESS-checkpoint.html
new file mode 100644
index 00000000..0dd629bd
--- /dev/null
+++ b/dataset_access/.ipynb_checkpoints/CONUS404_ACCESS-checkpoint.html
@@ -0,0 +1,591 @@
+ CONUS404 and CONUS404 Bias-Adjusted Data Access — HyTEST Project
This section contains notebooks that demonstrate how to access and perform basic data manipulation for the CONUS404 dataset. The examples can also be applied to the CONUS404 bias-adjusted dataset.
+
In the CONUS404 intake sub-catalog (see here for an explainer of our intake data catalog), you will see entries for four CONUS404 datasets: conus404-hourly, conus404-daily, conus404-monthly, and conus404-daily-diagnostic data, as well as two CONUS404 bias-adjusted datasets: conus404-hourly-ba, conus404-daily-ba. Each of these datasets is duplicated in up to three different storage locations (as the intake catalog section also describes).
The conus404-hourly data is a subset of the wrfout model output. For instantaneous variables, the data value at each time step represents the instantaneous value at the timestep. For accumulated variables, the data value represents the accumulated value up to the timestep (see the integration_length attribute attached to each accumulated variable for more details on the accumulation period).
+
The conus404-daily-diagnostic data is a subset from the wrfxtrm model output. These data represent the results of the past 24 hours, with the timestamp corresponding to the end time of the 24 hour period. Because the CONUS404 started at 1979-10-01_00:00:00, the first timestep (1979-10-01_00:00:00) for each variable is all zeros.
We also have conus404-daily and conus404-monthly files, which are just resampled from the conus404-hourly data. To create the conus404-daily zarr, instantaneous variables are aggregated from 00:00:00 UTC to 11:00:00 UTC, while accumulated variables are aggregated from 01:00:00 UTC to 12:00:00 UTC of the next day.
+
Please note that the values in the ACLWDNB, ACLWUPB, ACSWDNB, ACSWDNT, and ACSWUPB variables available in the zarr store differ from the original model output. These variables have been re-calculated to reflect the accumulated value since the model start, as directed in the WRF manual. An attribute has been added to each of these variables in the zarr store to denote the accumulation period for the variable.
+
We recommend that you regularly check our CONUS404 changelog to see any updates that have been made to the zarr stores. We do not anticipate regular changes to the dataset, but we may need to fix an occasional bug or update the dataset with additional years of data.
The conus404-hourly-ba data contains bias-adjusted temperature and precipitation data from the CONUS404 dataset, which is described in the official CONUS404 bias adjusted data release. The conus404-daily-ba files are resampled from the conus404-hourly-ba data.
We currently have four notebooks to help demonstrate how to work with these datasets in a Python workflow:
+
+
Explore CONUS404 Dataset: opens the CONUS404 dataset, loads and plots the entire spatial
+domain of a specified variable at a specific time step, and loads and plots a time series of a variable at a specified coordinate pair.
CONUS404 Spatial Aggregation: calculates the area-weighted mean of the CONUS404 data for all HUC12s in the Delaware River Basin.
+
CONUS404 Point Selection: samples the CONUS404 data at a selection of gage locations using their lat/lon point coordinates.
+
CONUS404 Regridding (Curvilinear => Rectilinear): regrids a subset of the CONUS404 dataset from a curvilinear grid to a rectilinear grid and saves the output to a netcdf file. The package used in this demo is not compatible with Windows. We hope to improve upon this methodology, and will likely update the package/technique used in the future.
+
+
These methods are likely applicable to many of the other key HyTEST datasets that can be opened with xarray.
+
Note: If you need help setting up a computing environment where you can run these notebooks, you should review the Computing Environments section of the documentation.
+
\ No newline at end of file
diff --git a/environment_set_up/OpenOnDemand.html b/environment_set_up/OpenOnDemand.html
index b409be8e..fcd7bff8 100644
--- a/environment_set_up/OpenOnDemand.html
+++ b/environment_set_up/OpenOnDemand.html
@@ -418,7 +418,10 @@
To log in to OnDemand, select the appropriate login link from the OnDemand section of https://hpcportal.cr.usgs.gov/. Note that you must be on the VPN to access this host. Denali/Tallgrass share one disk for data storage and Hovenweep has a different disk. If you have data stored on the HPCs, you will want to choose whichever resource is attached to where your data is stored. If you are accessing data from a different, publicly accessible storage location, you can choose either option.
-
From the OnDemand landing page, choose Interactive Apps. If you are using Hovenweep, select the Jupyter option from this dropdown. If you are using Tallgrass, you can either select Jupyter or you can launch the HyTEST Jupyter server app, which will include a conda environment pre-configured with the packages you need to run the workflows in this JupyterBook. If you do not use our pre-configured environment (if you selected Jupyter), you will need to build your own. You can learn more about how to set up your own conda environment here in the HPC user docs.
+
From the OnDemand landing page, choose Interactive Apps. If you are using Hovenweep, select the Jupyter option from this dropdown. If you are using Tallgrass, you can either select Jupyter or you can launch the HyTEST Jupyter server app, which will include a conda environment pre-configured with the packages you need to run the workflows in this JupyterBook. If you do not use our pre-configured environment (if you selected Jupyter), you will need to build your own once you connect to the HPC. This process is described below in Conda Environment Set Up.
Fill in the form to customize the allocation in which the Jupyter Server will execute.
+
For light-duty work (e.g., tutorials), a Viz node is likely adequate in your allocation request. If you
+will be doing heavier processing, you may want to request a compute node. None of the HyTEST tutorials
+utilize GPU code; a GPU-enabled node is not necessary.
You may want to consider adding the git and/or aws modules if you plan to use them during your session. You will just need to type module load git and/or module load aws in the Module loads section.
If you expect to run code in parallel on multiple compute nodes, you have two options. (1) You can use the form to request the number of cores you need and then run a Dask Local Cluster on those cores, or (2) you can request the standard 2 cores, and then use a Dask SLURMCluster to submit new jobs to the SLURM scheduler, giving you access to additional compute nodes.
If you need to set up your own conda environment on the HPCs, please refer to the HPC User Docs. You can stop right before they create the sample Python environment with conda create --name py310 python=3.10 and instead create whatever conda environment you need for your work. Once you have created your conda environment, you will need to take a few additional steps to make it visible as a kernel in your Jupyter Notebook:
+
+
activate your environment with conda activate your_environment_name
+
make sure ipykernel is installed in your conda environment
+
run python -m ipykernel install --user --name your_environment_name --display-name "your_environment_name"
+
+
Now, you will be able to see the environment you just built as an available kernel from your Jupyter notebook.
+
+
Note: The code to build the HyTEST Jupyter app is in this repository; however, this repo is only visible on the internal USGS network.
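
The kernel registration steps in the Conda Environment Set Up section can be sketched as a small shell script. This is a minimal sketch, not the project's own tooling: `ENV_NAME`, `DRY_RUN`, `run`, and `register_kernel` are hypothetical names introduced here for illustration. It uses `conda install` as one way to make sure `ipykernel` is present, and by default it only prints the commands so they can be reviewed before anything runs.

```shell
#!/usr/bin/env bash
# Sketch only: ENV_NAME and DRY_RUN are hypothetical, illustrative variables.
ENV_NAME="${ENV_NAME:-your_environment_name}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, anything else = execute them

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Activate the environment, make sure ipykernel is installed,
# then register the environment as a Jupyter kernel for the current user.
register_kernel() {
  run conda activate "$1"
  run conda install --yes ipykernel
  run python -m ipykernel install --user --name "$1" --display-name "$1"
}

register_kernel "$ENV_NAME"
```

To execute the commands rather than print them, set `DRY_RUN=0` and `ENV_NAME` to the name of your environment. In a non-interactive script, `conda activate` may first require conda's shell integration to be initialized (for example with `conda init`), so running the three printed commands by hand in a terminal session is often simpler.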