Merge pull request #52 from CliMA/ln/sf-test
Add a perfect model example using SurfaceFluxes.jl
LenkaNovak authored Mar 12, 2024
2 parents 96b1fb7 + 67680ba commit caeca40
Showing 33 changed files with 3,919 additions and 223 deletions.
7 changes: 5 additions & 2 deletions .gitignore
@@ -6,7 +6,9 @@ test/test_init/iteration*
*.hdf5
*.h5
*.nc
*.jld2
!experiments/*/*.hdf5
!experiments/*/*.jld2

# System-specific files and directories generated by the BinaryProvider and BinDeps packages
# They contain absolute paths specific to the host computer, and so should not be committed
Expand All @@ -23,9 +25,10 @@ docs/site/
# Julia package dependencies
/Manifest.toml
.dev/Manifest.toml
plot/Manifest.toml
!plot/Manifest.toml
test/Manifest.toml
docs/Manifest.toml

# misc
.DS_Store

3 changes: 1 addition & 2 deletions Project.toml
@@ -5,9 +5,9 @@ version = "0.1.0"

[deps]
CalibrateEmulateSample = "95e48a1f-0bec-4818-9538-3db4340308e3"
ClimaAtmos = "b2c96348-7fb7-4fe0-8da9-78d88439e717"
ClimaComms = "3a4d1b5c-c61d-41fd-a00a-5873ba7a1b0d"
ClimaCore = "d414da3d-4745-48bb-8d80-42e94e092884"
ClimaParams = "5c42b081-d73a-476f-9059-fd94b934656c"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
EnsembleKalmanProcesses = "aa8a2aa5-91d8-4396-bcef-d4f2ec43552d"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
Expand All @@ -19,7 +19,6 @@ YAML = "ddb6d928-2868-570f-bddf-ab3f9cf99eb6"

[compat]
CalibrateEmulateSample = "0.5"
ClimaAtmos = "0.20.0"
ClimaComms = "0.5"
ClimaCore = "0.11, 0.12"
Distributions = "0.25"
6 changes: 4 additions & 2 deletions docs/src/api.md
@@ -1,9 +1,11 @@
# API

## Model Interface
```@docs
CalibrateAtmos.get_config
CalibrateAtmos.run_forward_model
CalibrateAtmos.get_forward_model
CalibrateAtmos.observation_map
```

## EnsembleKalmanProcesses Interface
48 changes: 24 additions & 24 deletions docs/src/experiment_setup_guide.md
@@ -1,15 +1,15 @@
# Experiment Set Up Guide for ClimaAtmos

Please read this entire guide before setting up an experiment.
This guide assumes familiarity with ClimaAtmos and a basic understanding of Ensemble Kalman Inversion (EKI). Because moderate-resolution ClimaAtmos runs need MPI and/or a GPU, this example demonstrates how to run the pipeline for a perfect-model calibration (i.e., generating the "true" data with the same model that is being calibrated, targeting a known set of parameters) using bash scripts for the Caltech HPC cluster together with the EKI calibration machinery.

## Summary

The overall pipeline is as follows:

For a perfect model scenario, observations are generated by running the model and then processing the diagnostic output through the constructed observation map.

For the example experiment, `sphere_held_suarez_rhoe_equilmoist`, this is done by running
`sbatch experiments/sphere_held_suarez_rhoe_equilmoist/generate_observations.sbatch`. This script runs the model, passes the output through the observation map, and saves the result.

Once the observations have been processed and saved, the actual calibration pipeline can be run via
Expand All @@ -26,7 +26,7 @@ This consists of three `sbatch` scripts:
To calibrate parameters, you need:
- Atmos model configuration
- Steady-state restart file
- EKP configuration
- Prior parameter distributions
- Truth and noise data
- Observation map script with a function `observation_map(iteration)`
Expand All @@ -39,13 +39,13 @@ First, create a folder for your experiment with a descriptive experiment id in t
This is the ClimaAtmos configuration which will be used to run all simulations in the calibration. The only changes between model runs are the parameters selected for calibration.
This file is a typical Atmos YAML file with a few restrictions. It must:

- be named `model_config.yml`.
- include an `output_dir` to store model output.
- include the path to the restart file. This is detailed in the next section.
Here is an example of the arguments you will need:
```
output_dir: output/experiment_name
restart_file: experiments/experiment_name/restart_file.hdf5
restart_file: experiments/experiment_name/restart_file.hdf5
```

If your configuration requires a parameter TOML file, add this file to the experiment folder and set the `toml` entry in the configuration to the relative file path from the base directory of `CalibrateAtmos.jl`.
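For instance, a hypothetical configuration entry (the file name here is illustrative, not taken from the repository) would look like:
```
toml: experiments/experiment_name/parameters.toml
```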
Expand All @@ -58,19 +58,19 @@ The restart file is a model snapshot which will be used to spin-off all ensemble

Once you have settled on a configuration, follow these steps to generate a restart file:

1. Run ClimaAtmos with your configuration to determine the time it takes for the simulation to reach an equilibrium state.
2. Generate a restart file by setting `dt_save_restart` in the model configuration to the equilibrium timestamp.
3. Transfer the file to your experiment directory and enter the relative path into your atmosphere configuration file:
```
restart_file: experiments/experiment_name/restart_file.hdf5
```

!!! note
A restarted simulation starts from the saved date of the restart file. A restart file saved at 200 days and a `t_end` of 201 days will only run for 1 simulated day:
```
restart_file: experiments/experiment_name/day200.hdf5
t_end: 201 days
```

## Prior Distribution File
First, create your TOML file in your experiment folder.
Expand All @@ -85,7 +85,7 @@ constraint = "[bounded(0,5)]"
Note that the prior distribution here is in unconstrained space; the `constraint` list constrains the distribution in parameter space.

!!! note "Why two parameter spaces?"
The calibration tools are effective when working with unconstrained parameters (`u`), whereas physical models typically require (partially-)bounded parameters (`φ`).
To satisfy both conditions, the `ParameterDistribution` object contains maps between these two spaces. The drawback is that the prior must be defined in the unconstrained space.

An easy way to generate prior distributions in constrained parameter space is with the [constrained_gaussian](https://clima.github.io/EnsembleKalmanProcesses.jl/dev/API/ParameterDistributions/#EnsembleKalmanProcesses.ParameterDistributions.constrained_gaussian) constructor from `EnsembleKalmanProcesses.ParameterDistributions`. Here is an example:
Expand All @@ -99,9 +99,9 @@ constrained_gaussian("name", physical_mean, physical_std, lower_bound, upper_bou
```
This constructor will provide a Normal distribution which, when transformed, will approximate the target mean and standard deviation provided.
```
ParameterDistribution with 1 entries:
'name' with Constraint{BoundedBelow}[Bounds: (-50, ∞)] over distribution
Parameterized(Distributions.Normal{Float64}(μ=5.1393230339433345, σ=0.2256676316186263))
```
Copy the mean and standard deviation values into the constructor for the prior distribution in your TOML file like so:
```toml
Expand All @@ -113,7 +113,7 @@ constraint = "bounded_below(-50)"
description = " this prior has approximate (mean,std) = (125,40) and is bounded below by -50"
```

You can also ensure that the bounds match the `constraint` entry in your TOML file.
Since `constraint` is a TOML string that is parsed into a vector in Julia, make sure the square brackets of the constraint list are enclosed in quotes.
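As a sanity check, the mapping between the two spaces can be reproduced in plain Julia. The sketch below assumes only the standard bounded-below transform `φ = lower + exp(u)` (so the transformed variable is a shifted lognormal) and recovers the approximate physical mean and standard deviation quoted above:

```julia
# u ~ Normal(μ, σ) in unconstrained space; bounded_below(-50) maps it to
# physical space via φ = -50 + exp(u) (assumed standard transform).
μ, σ = 5.1393230339433345, 0.2256676316186263
lower = -50.0

# Lognormal moments of exp(u):
m = exp(μ + σ^2 / 2)            # mean of exp(u)
s = m * sqrt(exp(σ^2) - 1)      # std of exp(u)

phys_mean = lower + m           # ≈ 125
phys_std = s                    # ≈ 40
println((phys_mean, phys_std))
```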

Constraint constructors:
Expand All @@ -127,25 +127,25 @@ Constraint constructors:
## Observation Map
The observation map processes model output diagnostics into the exact observables used to fit the model to observations. In a perfect-model setting, it is also used to generate the observations.

This component is inflexible for now. Your observation map file must be a script named `observation_map.jl` which exports a function `observation_map(::Val{:<experiment_id>}, iteration)`. This function is specific to each experiment, so it is dispatched on the `experiment_id`.
These requirements arise from the update step, which runs:
```julia
include("experiments/$experiment_id/observation_map.jl")
observation_map(Val(Symbol(experiment_id)), iteration)
```
This function must load in the model diagnostics for each ensemble member in the iteration and construct an array `arr = Array{Float64}(undef, dims..., ensemble_size)` such that
`arr[:, i]` returns the i-th ensemble member's observation map output. Note that this floating-point precision is required for the EKI update step.

In the update step of EKI, the array will be saved in a JLD2 file named `observation_map.jld2` in the iteration folder of the output directory.

As an example, the `sphere_held_suarez_rhoe_equilmoist` experiment uses the following sequence:

`observation_map(::Val{:sphere_held_suarez_rhoe_equilmoist}, iteration)` constructs the array to store the ensemble's observations. Then, for each ensemble member `m`:
- it loads in the model diagnostic output, in this case 60-day air temperature averages.
- it calls `process_member_data(m)` and stores the result in the output array.
Pseudocode for `observation_map`:
```julia
function observation_map(::Val{:sphere_held_suarez_rhoe_equilmoist}, iteration)
# Get Configuration
experiment_id = "sphere_held_suarez_rhoe_equilmoist"
config = load_config(experiment_id)
@@ -185,7 +185,7 @@ function process_member_data(ta; output_variance = false)
end
```

If you are running a perfect-model experiment and generating truth data from ClimaAtmos itself, you may find it useful to create a kernel function to compute the observation map for each ensemble member. You can use this to run the default simulation's diagnostic output through the observation map and save the truth data and noise.

## Generating Truth Data
The truth data must be an array of observations with the same dimensions as an individual ensemble member's entry in the observation map (`arr[:, i]` from above).
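A minimal sketch of this step in plain Julia (all names and values here are illustrative, not from the repository): the truth vector is the default-parameter model output perturbed by noise drawn from the stated covariance.

```julia
using Random, LinearAlgebra
Random.seed!(1234)

n_obs = 10                                                   # length of arr[:, i] (assumed)
model_output = collect(range(250.0, 300.0; length = n_obs))  # stand-in for default-run diagnostics
noise_cov = Diagonal(fill(0.1^2, n_obs))                     # assumed observational noise covariance

truth = model_output .+ sqrt.(diag(noise_cov)) .* randn(n_obs)
# In the actual pipeline, `truth` and `noise_cov` would then be saved (e.g. with
# JLD2) to the files referenced by the EKP configuration.
```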
@@ -215,7 +215,7 @@ The EKP configuration file must contain the following:
- `n_iterations`, the number of iterations to run
- `ensemble_size`, the ensemble size
- `prior_path`, the path to the TOML file with the prior parameter distributions
- `parameter_names`, a list of the long names of the parameters being calibrated
- `truth_data`, the truth data
- `truth_noise`, the covariance of the truth data
- `output_dir`, the folder where you want calibration data and logs to be output. This must be the same as the `output_dir` in the model configuration file.
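Putting these entries together, a minimal `ekp_config.yml` might look like the following (paths and values are illustrative, not taken from the repository):
```
n_iterations: 3
ensemble_size: 10
prior_path: experiments/experiment_name/prior.toml
parameter_names: ["equator_pole_temperature_gradient_wet"]
truth_data: experiments/experiment_name/obs_mean.jld2
truth_noise: experiments/experiment_name/obs_noise_cov.jld2
output_dir: output/experiment_name
```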
19 changes: 18 additions & 1 deletion docs/src/quickstart.md
@@ -1,13 +1,30 @@
# Getting Started

Before you run, make sure your system meets the dependencies of [CalibrateEmulateSample.jl](https://clima.github.io/CalibrateEmulateSample.jl/dev/installation_instructions/). TODO: remove if workaround

## HPC Cluster
A good way to get started is to run the initial experiment, `sphere_held_suarez_rhoe_equilmoist`.
It is a perfect-model calibration, serving as a test case for the initial pipeline.

This experiment runs the Held-Suarez configuration, estimating the parameter `equator_pole_temperature_gradient_wet`.
By default, it runs 10 ensemble members for 3 iterations.

To run this experiment:
1. Log onto the Caltech HPC
2. Clone CalibrateAtmos.jl and `cd` into the repository.
3. Run: `bash experiments/pipeline.sh sphere_held_suarez_rhoe_equilmoist 8`. This will run the `sphere_held_suarez_rhoe_equilmoist` experiment with 8 tasks per ensemble member.

## Local Machine
To run an experiment on your local machine, you can use the `experiments/pipeline.jl` script. This is recommended for more lightweight experiments, such as the `surface_fluxes_perfect_model` experiment, which uses the [SurfaceFluxes.jl](https://github.com/CliMA/SurfaceFluxes.jl) package to generate a physical model that calculates the Monin-Obukhov turbulent surface fluxes based on idealized atmospheric and surface conditions. Since this is a "perfect model" example, the same model is used to generate synthetic observations using its default parameters and a small amount of noise. These synthetic observations are treated as the ground truth, which is used to assess the performance of model ensembles whose parameters are drawn from the prior parameter distributions. To run this experiment interactively, use the following command from the terminal:

```bash
julia -i experiments/pipeline.jl surface_fluxes_perfect_model
```

This pipeline mirrors that of the bash scripts, and the same example can be run on the HPC cluster if needed:

```bash
bash experiments/pipeline.sh surface_fluxes_perfect_model 8
```

The experiments (such as `surface_fluxes_perfect_model`) can be equally defined within the component model repos (in this case, `SurfaceFluxes.jl`), so that the internals of `CalibrateAtmos.jl` do not explicitly depend on component models.
6 changes: 3 additions & 3 deletions experiments/initialize.sbatch
@@ -7,11 +7,11 @@
experiment_id=$1

echo "Initializing calibration for experiment $experiment_id"
julia --color=no --project=experiments/$experiment_id -e 'using Pkg; Pkg.instantiate(;verbose=true)'

julia --color=no --project=experiments/$experiment_id -e '
import CalibrateAtmos
CalibrateAtmos.initialize("'$experiment_id'")
'

echo "Calibration initialized."
11 changes: 6 additions & 5 deletions experiments/model_run.sbatch
@@ -13,10 +13,11 @@ member=$(printf "member_%03d" "$SLURM_ARRAY_TASK_ID")
output=output/$experiment_id/$format_i/$member/model_log.out

# Run the forward model
srun --output=$output julia --color=no --project=experiments/$experiment_id -e "
import CalibrateAtmos
atmos_config = CalibrateAtmos.get_atmos_config($SLURM_ARRAY_TASK_ID, $iteration, \"$experiment_id\")
CalibrateAtmos.run_forward_model(atmos_config)
include(\"experiments/$experiment_id/model_interface.jl\")
physical_model = CalibrateAtmos.get_forward_model(Val(:$experiment_id))
config = CalibrateAtmos.get_config(physical_model, $SLURM_ARRAY_TASK_ID, $iteration, \"$experiment_id\")
CalibrateAtmos.run_forward_model(physical_model, config)
"
43 changes: 43 additions & 0 deletions experiments/pipeline.jl
@@ -0,0 +1,43 @@
# experiment-specific Project.toml instantiation, function extensions and config load

using Pkg
experiment_id = ARGS[1]
Pkg.activate("experiments/$experiment_id")
Pkg.instantiate()
using CalibrateAtmos
import YAML
import JLD2
pkg_dir = pkgdir(CalibrateAtmos)
experiment_path = "$pkg_dir/experiments/$experiment_id"
include("$experiment_path/model_interface.jl")
include("$experiment_path/generate_truth.jl")
ekp_config =
YAML.load_file(joinpath("experiments", experiment_id, "ekp_config.yml"))

# initialize the calibration
CalibrateAtmos.initialize(experiment_id)

# run experiment with CalibrateAtmos for N_iter iterations
N_iter = ekp_config["n_iterations"]
N_mem = ekp_config["ensemble_size"]
for i in 0:(N_iter - 1)
# run G model to produce output from N_mem ensemble members
physical_model =
CalibrateAtmos.get_forward_model(Val(Symbol(experiment_id)))

for m in 1:N_mem # TODO: parallelize with threads!
# model run for each ensemble member
model_config =
CalibrateAtmos.get_config(physical_model, m, i, experiment_id)
CalibrateAtmos.run_forward_model(physical_model, model_config)
@info "Finished model run for member $m at iteration $i"
end

# update EKP with the ensemble output and update calibrated parameters
G_ensemble = CalibrateAtmos.observation_map(Val(Symbol(experiment_id)), i)
output_dir = ekp_config["output_dir"]
iter_path = CalibrateAtmos.path_to_iteration(output_dir, i)
JLD2.save_object(joinpath(iter_path, "observation_map.jld2"), G_ensemble)
CalibrateAtmos.update_ensemble(experiment_id, i)

end

include("$pkg_dir/plot/convergence_$experiment_id.jl")