Merge pull request #52 from CliMA/ln/sf-test
Add a perfect model example using SurfaceFluxes.jl
LenkaNovak authored Mar 12, 2024
2 parents 96b1fb7 + 67680ba commit caeca40
Showing 33 changed files with 3,919 additions and 223 deletions.
7 changes: 5 additions & 2 deletions .gitignore
@@ -6,7 +6,9 @@ test/test_init/iteration*
*.hdf5
*.h5
*.nc
*.jld2
!experiments/*/*.hdf5
!experiments/*/*.jld2

# System-specific files and directories generated by the BinaryProvider and BinDeps packages
# They contain absolute paths specific to the host computer, and so should not be committed
Expand All @@ -23,9 +25,10 @@ docs/site/
# Julia package dependencies
/Manifest.toml
.dev/Manifest.toml
plot/Manifest.toml
!plot/Manifest.toml
test/Manifest.toml
docs/Manifest.toml

# misc
.DS_Store

3 changes: 1 addition & 2 deletions Project.toml
@@ -5,9 +5,9 @@ version = "0.1.0"

[deps]
CalibrateEmulateSample = "95e48a1f-0bec-4818-9538-3db4340308e3"
ClimaAtmos = "b2c96348-7fb7-4fe0-8da9-78d88439e717"
ClimaComms = "3a4d1b5c-c61d-41fd-a00a-5873ba7a1b0d"
ClimaCore = "d414da3d-4745-48bb-8d80-42e94e092884"
ClimaParams = "5c42b081-d73a-476f-9059-fd94b934656c"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
EnsembleKalmanProcesses = "aa8a2aa5-91d8-4396-bcef-d4f2ec43552d"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
Expand All @@ -19,7 +19,6 @@ YAML = "ddb6d928-2868-570f-bddf-ab3f9cf99eb6"

[compat]
CalibrateEmulateSample = "0.5"
ClimaAtmos = "0.20.0"
ClimaComms = "0.5"
ClimaCore = "0.11, 0.12"
Distributions = "0.25"
6 changes: 4 additions & 2 deletions docs/src/api.md
@@ -1,9 +1,11 @@
# API

## Model Interface
```@docs
CalibrateAtmos.get_config
CalibrateAtmos.run_forward_model
CalibrateAtmos.get_forward_model
CalibrateAtmos.observation_map
```

## EnsembleKalmanProcesses Interface
48 changes: 24 additions & 24 deletions docs/src/experiment_setup_guide.md
@@ -1,15 +1,15 @@
# Experiment Set Up Guide for ClimaAtmos

Please read this entire guide before setting up an experiment.
This guide assumes familiarity with ClimaAtmos and a basic understanding of Ensemble Kalman Inversion (EKI). Because moderate-resolution ClimaAtmos runs need MPI and/or a GPU, this example demonstrates how to run the pipeline for a perfect-model calibration (i.e., generating the "true" data with the same model that is being calibrated, targeting a known set of parameters) using bash scripts for the Caltech HPC cluster together with the EKI calibration machinery.

## Summary

The overall pipeline is as follows:

For a perfect model scenario, observations are generated by running the model and then processing the diagnostic output through the constructed observation map.

For the example experiment, `sphere_held_suarez_rhoe_equilmoist`, this is done by running
`sbatch experiments/sphere_held_suarez_rhoe_equilmoist/generate_observations.sbatch`. This script runs the model, passes the output through the observation map, and saves the result.

Once the observations have been processed and saved, the actual calibration pipeline can be run via
Expand All @@ -26,7 +26,7 @@ This consists of three `sbatch` scripts:
To calibrate parameters, you need:
- Atmos model configuration
- Steady-state restart file
- EKP configuration
- Prior parameter distributions
- Truth and noise data
- Observation map script with a function `observation_map(iteration)`
Expand All @@ -39,13 +39,13 @@ First, create a folder for your experiment with a descriptive experiment id in t
This is the ClimaAtmos configuration which will be used to run all simulations in the calibration. The only changes between model runs are the parameters selected for calibration.
This file is a typical Atmos YAML file with a few restrictions. It must:

- be named `model_config.yml`.
- include an `output_dir` to store model output.
- include the path to the restart file. This is detailed in the next section.
Here is an example of the arguments you will need:
```
output_dir: output/experiment_name
restart_file: experiments/experiment_name/restart_file.hdf5
restart_file: experiments/experiment_name/restart_file.hdf5
```

If your configuration requires a parameter TOML file, add this file to the experiment folder and set the `toml` entry in the configuration to the relative file path from the base directory of `CalibrateAtmos.jl`.
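For instance, a hypothetical configuration entry (the file name here is illustrative, not taken from the repository) would look like:
```
toml: experiments/experiment_name/parameters.toml
```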
Expand All @@ -58,19 +58,19 @@ The restart file is a model snapshot which will be used to spin-off all ensemble

Once you have settled on a configuration, follow these steps to generate a restart file:

1. Run ClimaAtmos with your configuration to determine the time it takes for the simulation to reach an equilibrium state.
2. Generate a restart file by setting `dt_save_restart` in the model configuration to the equilibrium timestamp.
3. Transfer the file to your experiment directory and enter the relative path into your atmosphere configuration file:
```
restart_file: experiments/experiment_name/restart_file.hdf5
```

!!! note
A restarted simulation starts from the saved date of the restart file. A restart file saved at 200 days and a `t_end` of 201 days will only run for 1 simulated day:
```
restart_file: experiments/experiment_name/day200.hdf5
t_end: 201 days
```

## Prior Distribution File
First, create your TOML file in your experiment folder.
Expand All @@ -85,7 +85,7 @@ constraint = "[bounded(0,5)]"
Note that the prior distribution here is in unconstrained space; the `constraint` list constrains the distribution in parameter space.

!!! note "Why two parameter spaces?"
The calibration tools are effective when working with unconstrained parameters (`u`), whereas physical models typically require (partially-)bounded parameters (`φ`).
To satisfy both conditions, the `ParameterDistribution` object contains maps between these two spaces. The drawback is that the prior must be defined in the unconstrained space.

An easy way to generate prior distributions in constrained parameter space is with the [constrained_gaussian](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/API/ParameterDistributions/#EnsembleKalmanProcesses.ParameterDistributions.constrained_gaussian) constructor from `EnsembleKalmanProcesses.ParameterDistributions`. Here is an example:
Expand All @@ -99,9 +99,9 @@ constrained_gaussian("name", physical_mean, physical_std, lower_bound, upper_bou
```
This constructor will provide a Normal distribution which, when transformed, will approximate the target mean and standard deviation provided.
```
ParameterDistribution with 1 entries:
'name' with Constraint{BoundedBelow}[Bounds: (-50, ∞)] over distribution
Parameterized(Distributions.Normal{Float64}(μ=5.1393230339433345, σ=0.2256676316186263))
```
Copy the mean and standard deviation values into the constructor for the prior distribution in your TOML file like so:
```toml
Expand All @@ -113,7 +113,7 @@ constraint = "bounded_below(-50)"
description = " this prior has approximate (mean,std) = (125,40) and is bounded below by -50"
```

You can also ensure that the bounds match the `constraint` entry in your TOML file.
Since `constraint` is a TOML string that is parsed into a vector in Julia, make sure the square brackets of the constraint list are enclosed in quotes.
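As a sanity check, the mapping between the two spaces can be reproduced in plain Julia. The sketch below assumes only the standard bounded-below transform `φ = lower + exp(u)` (so the transformed variable is a shifted lognormal) and recovers the approximate physical mean and standard deviation quoted above:

```julia
# u ~ Normal(μ, σ) in unconstrained space; bounded_below(-50) maps it to
# physical space via φ = -50 + exp(u) (assumed standard transform).
μ, σ = 5.1393230339433345, 0.2256676316186263
lower = -50.0

# Lognormal moments of exp(u):
m = exp(μ + σ^2 / 2)            # mean of exp(u)
s = m * sqrt(exp(σ^2) - 1)      # std of exp(u)

phys_mean = lower + m           # ≈ 125
phys_std = s                    # ≈ 40
println((phys_mean, phys_std))
```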

Constraint constructors:
Expand All @@ -127,25 +127,25 @@ Constraint constructors:
## Observation Map
The observation map processes model output diagnostics into the exact observables used to fit the model to observations. In a perfect-model setting, it is also used to generate the observations.

This component is inflexible for now. Your observation map file must be a script named `observation_map.jl` which exports a function `observation_map(::Val{:<experiment_id>}, iteration)`. This function is specific to each experiment, so it is dispatched on the `experiment_id`.
These requirements arise from the update step, which runs:
```julia
include("experiments/$experiment_id/observation_map.jl")
observation_map(Val(Symbol(experiment_id)), iteration)
```
This function must load in the model diagnostics for each ensemble member in the iteration and construct an array `arr = Array{Float64}(undef, dims..., ensemble_size)` such that
`arr[:, i]` returns the i-th ensemble member's observation map output. Note that this floating-point precision is required for the EKI update step.

In the update step of EKI, the array will be saved in a JLD2 file named `observation_map.jld2` in the iteration folder of the output directory.

As an example, the `sphere_held_suarez_rhoe_equilmoist` experiment uses the following sequence:

`observation_map(::Val{:sphere_held_suarez_rhoe_equilmoist}, iteration)` constructs the array to store the ensemble's observations. Then, for each ensemble member `m`:
- it loads in the model diagnostic output, in this case 60-day air temperature averages.
- it calls `process_member_data(m)` and stores the result in the output array.
Pseudocode for `observation_map`:
```julia
function observation_map(::Val{:sphere_held_suarez_rhoe_equilmoist}, iteration)
# Get Configuration
experiment_id = "sphere_held_suarez_rhoe_equilmoist"
config = load_config(experiment_id)
@@ -185,7 +185,7 @@ function process_member_data(ta; output_variance = false)
end
```

If you are running a perfect-model experiment and generating truth data from ClimaAtmos itself, you may find it useful to create a kernel function to compute the observation map for each ensemble member. You can use this to run the default simulation's diagnostic output through the observation map and save the truth data and noise.

## Generating Truth Data
The truth data must be an array of observations with the same dimensions as an individual ensemble member's entry in the observation map (`arr[:, i]` from above).
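A minimal sketch of this step in plain Julia (all names and values here are illustrative, not from the repository): the truth vector is the default-parameter model output perturbed by noise drawn from the stated covariance.

```julia
using Random, LinearAlgebra
Random.seed!(1234)

n_obs = 10                                                   # length of arr[:, i] (assumed)
model_output = collect(range(250.0, 300.0; length = n_obs))  # stand-in for default-run diagnostics
noise_cov = Diagonal(fill(0.1^2, n_obs))                     # assumed observational noise covariance

truth = model_output .+ sqrt.(diag(noise_cov)) .* randn(n_obs)
# In the actual pipeline, `truth` and `noise_cov` would then be saved (e.g. with
# JLD2) to the files referenced by the EKP configuration.
```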
@@ -215,7 +215,7 @@ The EKP configuration file must contain the following:
- `n_iterations`, the number of iterations to run
- `ensemble_size`, the ensemble size
- `prior_path`, the path to the TOML file with the prior parameter distributions
- `parameter_names`, a list of the long names of the parameters being calibrated
- `truth_data`, the truth data
- `truth_noise`, the covariance of the truth data
- `output_dir`, the folder where you want calibration data and logs to be output. This must be the same as the `output_dir` in the model configuration file.
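Putting these entries together, a minimal `ekp_config.yml` might look like the following (paths and values are illustrative, not taken from the repository):
```
n_iterations: 3
ensemble_size: 10
prior_path: experiments/experiment_name/prior.toml
parameter_names: ["equator_pole_temperature_gradient_wet"]
truth_data: experiments/experiment_name/obs_mean.jld2
truth_noise: experiments/experiment_name/obs_noise_cov.jld2
output_dir: output/experiment_name
```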
19 changes: 18 additions & 1 deletion docs/src/quickstart.md
@@ -1,13 +1,30 @@
# Getting Started

Before you run, make sure your system meets the dependencies of [CalibrateEmulateSample.jl](https://clima.github.io/CalibrateEmulateSample.jl/dev/installation_instructions/). TODO: remove if workaround

## HPC Cluster
A good way to get started is to run the initial experiment, `sphere_held_suarez_rhoe_equilmoist`.
It is a perfect-model calibration, serving as a test case for the initial pipeline.

This experiment runs the Held-Suarez configuration, estimating the parameter `equator_pole_temperature_gradient_wet`.
By default, it runs 10 ensemble members for 3 iterations.

To run this experiment:
1. Log onto the Caltech HPC
2. Clone CalibrateAtmos.jl and `cd` into the repository.
3. Run: `bash experiments/pipeline.sh sphere_held_suarez_rhoe_equilmoist 8`. This will run the `sphere_held_suarez_rhoe_equilmoist` experiment with 8 tasks per ensemble member.

## Local Machine
To run an experiment on your local machine, you can use the `experiments/pipeline.jl` script. This is recommended for more lightweight experiments, such as the `surface_fluxes_perfect_model` experiment, which uses the [SurfaceFluxes.jl](https://github.com/CliMA/SurfaceFluxes.jl) package to generate a physical model that calculates the Monin-Obukhov turbulent surface fluxes based on idealized atmospheric and surface conditions. Since this is a "perfect model" example, the same model is used to generate synthetic observations using its default parameters and a small amount of noise. These synthetic observations are treated as the ground truth, which is used to assess the performance of model ensembles whose parameters are drawn from the prior parameter distributions. To run this experiment interactively, use the following command from the terminal:

```bash
julia -i experiments/pipeline.jl surface_fluxes_perfect_model
```

This pipeline mirrors that of the bash scripts, and the same example can be run on the HPC cluster if needed:

```bash
bash experiments/pipeline.sh surface_fluxes_perfect_model 8
```

The experiments (such as `surface_fluxes_perfect_model`) can be equally defined within the component model repos (in this case, `SurfaceFluxes.jl`), so that the internals of `CalibrateAtmos.jl` do not explicitly depend on component models.
6 changes: 3 additions & 3 deletions experiments/initialize.sbatch
@@ -7,11 +7,11 @@
experiment_id=$1

echo "Initializing calibration for experiment $experiment_id"
julia --color=no --project=experiments/$experiment_id -e 'using Pkg; Pkg.instantiate(;verbose=true)'

julia --color=no --project=experiments/$experiment_id -e '
import CalibrateAtmos
CalibrateAtmos.initialize("'$experiment_id'")
'

echo "Calibration initialized."
11 changes: 6 additions & 5 deletions experiments/model_run.sbatch
@@ -13,10 +13,11 @@ member=$(printf "member_%03d" "$SLURM_ARRAY_TASK_ID")
output=output/$experiment_id/$format_i/$member/model_log.out

# Run the forward model
srun --output=$output julia --color=no --project=experiments/$experiment_id -e "
import CalibrateAtmos
atmos_config = CalibrateAtmos.get_atmos_config($SLURM_ARRAY_TASK_ID, $iteration, \"$experiment_id\")
CalibrateAtmos.run_forward_model(atmos_config)
include(\"experiments/$experiment_id/model_interface.jl\")
physical_model = CalibrateAtmos.get_forward_model(Val(:$experiment_id))
config = CalibrateAtmos.get_config(physical_model, $SLURM_ARRAY_TASK_ID, $iteration, \"$experiment_id\")
CalibrateAtmos.run_forward_model(physical_model, config)
"
43 changes: 43 additions & 0 deletions experiments/pipeline.jl
@@ -0,0 +1,43 @@
# experiment-specific Project.toml instantiation, function extensions and config load

using Pkg
experiment_id = ARGS[1]
Pkg.activate("experiments/$experiment_id")
Pkg.instantiate()
using CalibrateAtmos
import YAML
import JLD2
pkg_dir = pkgdir(CalibrateAtmos)
experiment_path = "$pkg_dir/experiments/$experiment_id"
include("$experiment_path/model_interface.jl")
include("$experiment_path/generate_truth.jl")
ekp_config =
YAML.load_file(joinpath("experiments", experiment_id, "ekp_config.yml"))

# initialize the calibration
CalibrateAtmos.initialize(experiment_id)

# run experiment with CalibrateAtmos for N_iter iterations
N_iter = ekp_config["n_iterations"]
N_mem = ekp_config["ensemble_size"]
for i in 0:(N_iter - 1)
# run G model to produce output from N_mem ensemble members
physical_model =
CalibrateAtmos.get_forward_model(Val(Symbol(experiment_id)))

for m in 1:N_mem # TODO: parallelize with threads!
# model run for each ensemble member
model_config =
CalibrateAtmos.get_config(physical_model, m, i, experiment_id)
CalibrateAtmos.run_forward_model(physical_model, model_config)
@info "Finished model run for member $m at iteration $i"
end

# update EKP with the ensemble output and update calibrated parameters
G_ensemble = CalibrateAtmos.observation_map(Val(Symbol(experiment_id)), i)
output_dir = ekp_config["output_dir"]
iter_path = CalibrateAtmos.path_to_iteration(output_dir, i)
JLD2.save_object(joinpath(iter_path, "observation_map.jld2"), G_ensemble)
CalibrateAtmos.update_ensemble(experiment_id, i)

end

include("$pkg_dir/plot/convergence_$experiment_id.jl")