Add documentation about postprocessing pipeline
ph-kev committed Dec 3, 2024
1 parent f90270d commit 502d29f
Showing 2 changed files with 178 additions and 0 deletions.
126 changes: 126 additions & 0 deletions docs/src/leaderboard/leaderboard.md
@@ -0,0 +1,126 @@
# Leaderboard

## Long run

### Add a new variable to compare against observations
All of the code for computing errors against observations lives in the `leaderboard`
folder, which contains two files: `data_sources.jl` and `leaderboard.jl`. Variables of
interest are loaded and preprocessed in `data_sources.jl`, while errors are computed and
plotted in `leaderboard.jl`. To add a new variable, you ideally only need to modify
`data_sources.jl`.

### Computation
As of now, the leaderboard produces bias plots annotated with the global bias and global
root mean squared error (RMSE). These quantities are computed for each month. The first
year of the simulation is excluded because it is treated as spinup time. The simulation
starts in 2012, so only the year 2013 is compared against observational data. See the
plots below for what this looks like.

![bias_with_custom_mask_plot](./leaderboard/images/global_rmse_and_bias_graphs.png)
![gpp_bias_plot](./leaderboard/images/gpp_bias_plot.png)
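
To make the two reported quantities concrete, here is a minimal sketch of an area-weighted
global bias and RMSE on a regular longitude-latitude grid. It is an illustration only: the
leaderboard relies on ClimaAnalysis for these computations, and the function name
`global_bias_and_rmse`, the use of plain arrays instead of `OutputVar`s, and the
cosine-of-latitude weighting are assumptions made for this example.

```julia
# Illustrative only: plain arrays stand in for ClimaAnalysis `OutputVar`s.

"""
Area-weighted global bias and RMSE of `sim` against `obs` on a regular
longitude-latitude grid. Both arrays are (nlon, nlat); `lats` is in degrees.
"""
function global_bias_and_rmse(sim, obs, lats)
    # One weight per latitude, repeated across all longitudes (outer product).
    weights = ones(size(sim, 1)) * cosd.(lats)'
    diff = sim .- obs
    bias = sum(weights .* diff) / sum(weights)
    rmse = sqrt(sum(weights .* diff .^ 2) / sum(weights))
    return bias, rmse
end

lats = [-60.0, 0.0, 60.0]
obs = zeros(4, 3)
sim = obs .+ 2.0  # the simulation is offset from the observations by 2 everywhere
bias, rmse = global_bias_and_rmse(sim, obs, lats)
```

With a uniform offset, both the bias and the RMSE equal the offset, which is a quick sanity
check for any weighting scheme.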

### Add a new variable to the bias plots
To add a new variable, you need to modify four functions: `get_sim_var_dict`,
`get_obs_var_dict`, `get_mask_dict`, and `get_compare_vars_biases_plot_extrema`. Each
function returns a dictionary to which you must add an entry for the new variable. The
dictionaries are `sim_var_dict`, `obs_var_dict`, `mask_dict`, and
`compare_vars_biases_plot_extrema`, respectively.
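
Because a variable must appear in all four dictionaries, it can be handy to check for a
forgotten entry. The sketch below shows one way to do that. The helper `missing_entries`
is hypothetical and not part of the leaderboard code, and the dictionary values are
placeholders rather than real loaders.

```julia
# Hypothetical stand-ins for the dictionaries returned by the four functions
# in `data_sources.jl`; the values are placeholders, not real loaders.
sim_var_dict = Dict("et" => () -> "simulation et", "gpp" => () -> "simulation gpp")
obs_var_dict = Dict("et" => (start_date) -> "observed et")
mask_dict = Dict("et" => (sim_var, obs_var) -> nothing)
compare_vars_biases_plot_extrema = Dict("et" => (-0.00001, 0.00001))

"""
Return the names of the dictionaries that have no entry for `short_name`.
`dicts` is a collection of `name => dictionary` pairs.
"""
function missing_entries(short_name, dicts)
    return [name for (name, dict) in dicts if !haskey(dict, short_name)]
end

leaderboard_dicts = [
    "sim_var_dict" => sim_var_dict,
    "obs_var_dict" => obs_var_dict,
    "mask_dict" => mask_dict,
    "compare_vars_biases_plot_extrema" => compare_vars_biases_plot_extrema,
]

missing_entries("et", leaderboard_dicts)   # empty: "et" is in every dictionary
missing_entries("gpp", leaderboard_dicts)  # names the three dictionaries lacking "gpp"
```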

First, add a key-value pair to the dictionary `sim_var_dict` whose key is the short name of
the variable and whose value is a function that returns an
[`OutputVar`](https://clima.github.io/ClimaAnalysis.jl/dev/var/). Any preprocessing, such
as unit conversion and shifting the dates, is done inside this function.

```julia
sim_var_dict["et"] =
    () -> begin
        # Load in variable
        sim_var = get(
            ClimaAnalysis.SimDir(diagnostics_folder_path),
            short_name = "et",
        )
        # Shift to the first day and subtract one month as preprocessing
        sim_var =
            ClimaAnalysis.shift_to_start_of_previous_month(sim_var)
        return sim_var
    end
```

Then, add a key-value pair to the dictionary `obs_var_dict` whose key is the same short name
as before and whose value is a function that takes in a start date and returns an
`OutputVar`. Any preprocessing is done inside this function.

```julia
obs_var_dict["et"] =
    (start_date) -> begin
        # We use ClimaArtifacts to use a dataset from ILAMB
        obs_var = ClimaAnalysis.OutputVar(
            ClimaLand.Artifacts.ilamb_dataset_path(;
                context = "evspsbl_MODIS_et_0.5x0.5.nc",
            ),
            "et",
            # start_date is used to align the dates in the observational data
            # with the simulation data
            new_start_date = start_date,
            # Shift dates to the first day of the month before aligning the dates
            shift_by = Dates.firstdayofmonth,
        )
        # More preprocessing to match the units with the simulation data
        ClimaAnalysis.units(obs_var) == "kg/m2/s" &&
            (obs_var = ClimaAnalysis.set_units(obs_var, "kg m^-2 s^-1"))
        # ClimaAnalysis cannot handle `missing` values, but does support handling NaNs
        obs_var = ClimaAnalysis.replace(obs_var, missing => NaN)
        return obs_var
    end
```

!!! tip "Preprocessing"
    Observational and simulation data should be preprocessed for dates and units. For
    simulation data, each monthly average is stamped with the first day following the
    month. For instance, the monthly average for January 2010 is stored on the date
    2/1/2010, and preprocessing shifts this date to 1/1/2010. When preprocessing data, we
    follow the convention that the first day of a month carries the monthly average for
    that month. For observational data, you should check which convention is being
    followed and preprocess the dates if necessary.

For `obs_var_dict`, the anonymous function must take in a start date. The start date is
used in `leaderboard.jl` to adjust the seconds in the `OutputVar` so that they are
measured from the same start date as the simulation data.

Units should be the same between the simulation and observational data.
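
The two date adjustments described above can be sketched with the `Dates` standard library
alone. This is an illustration under stated assumptions, not the ClimaAnalysis
implementation: `to_start_of_previous_month` and `realign_seconds` are hypothetical names,
and times are assumed to be stored as seconds elapsed since a start date.

```julia
using Dates

# A monthly average stamped on the first day *after* the month is moved to the
# first day of the month it describes.
to_start_of_previous_month(date) = Dates.firstdayofmonth(date) - Dates.Month(1)

"""
Re-express `seconds` counted from `old_start` as seconds counted from `new_start`.
"""
function realign_seconds(seconds, old_start, new_start)
    # Subtracting two `DateTime`s gives milliseconds; convert to seconds.
    offset = Dates.value(old_start - new_start) / 1000
    return seconds .+ offset
end

# The January 2010 average, stored on 2010-02-01, moves to 2010-01-01.
shifted = to_start_of_previous_month(DateTime(2010, 2, 1))

# One day after 2012-01-01 is 32 days after 2011-12-01.
realigned = realign_seconds([86400.0], DateTime(2012, 1, 1), DateTime(2011, 12, 1))
```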

Next, add a key-value pair to the dictionary `mask_dict` whose key is the same short name
as before and whose value is a function that takes in an `OutputVar` representing
simulation data and an `OutputVar` representing observational data. This function returns
a masking function, or `nothing` if no masking is needed. The masking function is used to
correctly normalize the global bias and global RMSE. See the example below, where a mask
is made from the observational data.

```julia
mask_dict["et"] =
    (sim_var, obs_var) -> begin
        return ClimaAnalysis.make_lonlat_mask(
            # We do this to get an `OutputVar` with only two dimensions:
            # longitude and latitude
            ClimaAnalysis.slice(
                obs_var,
                time = ClimaAnalysis.times(obs_var) |> first,
            );
            # Any values that are NaN should be 0.0
            set_to_val = isnan,
            true_val = 0.0,
        )
    end
```
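
To see why the mask matters for normalization, consider the following sketch with plain
arrays. It assumes that masked cells are set to zero and that the mean is divided by the
unmasked area only. The names `nan_mask` and `masked_mean` are hypothetical and do not
belong to ClimaAnalysis.

```julia
# Illustrative only: plain arrays stand in for `OutputVar`s on a (lon, lat) grid.

# Mirror the example above: 0.0 where the observations are NaN, 1.0 elsewhere.
nan_mask(obs) = ifelse.(isnan.(obs), 0.0, 1.0)

"""
Mean of `field` over the unmasked area. `weights` are per-cell area weights.
"""
function masked_mean(field, mask, weights)
    # Zero out masked cells explicitly, since NaN * 0.0 is still NaN.
    kept = ifelse.(mask .== 0.0, 0.0, field)
    return sum(weights .* kept) / sum(weights .* mask)
end

obs = [1.0 NaN; 1.0 1.0]
sim = [3.0 5.0; 3.0 3.0]
weights = ones(2, 2)
mask = nan_mask(obs)

bias = masked_mean(sim .- obs, mask, weights)
```

Here three cells contribute a difference of 2.0 each, so the masked mean is 2.0. Dividing
by the full area of four cells instead would give 1.5 and understate the bias.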

Finally, add a key-value pair to the dictionary `compare_vars_biases_plot_extrema` whose
key is the same short name as before and whose value is a tuple of two floats giving the
lower and upper limits of the range shown in the bias plots.

```julia
compare_vars_biases_plot_extrema = Dict(
    "et" => (-0.00001, 0.00001),
    "gpp" => (-8.0, 8.0),
    "lwu" => (-40.0, 40.0),
)
```
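
If you are unsure which range to use for a new variable, one option is to start from a
symmetric range derived from the biases themselves and then round it by hand. The helper
below is a hypothetical sketch of that idea and is not part of the leaderboard code.

```julia
"""
Suggest a symmetric `(-m, m)` range, where `m` is the largest finite absolute
value in `bias_values`. Hypothetical helper, not part of the leaderboard code.
"""
function symmetric_extrema(bias_values)
    # Ignore NaN and Inf entries, which would otherwise dominate the range.
    finite_vals = filter(isfinite, vec(bias_values))
    isempty(finite_vals) && error("no finite bias values to derive a range from")
    m = maximum(abs, finite_vals)
    return (-m, m)
end

suggested = symmetric_extrema([-3.5 2.0; NaN 7.25])
```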
52 changes: 52 additions & 0 deletions experiments/long_runs/test_leaderboard.jl
@@ -0,0 +1,52 @@
# # Global run of land model

# The code sets up and runs the soil/canopy model for 6 hours on a spherical domain,
# using ERA5 data. In this simulation, we have
# turned lateral flow off because horizontal boundary conditions and the
# land/sea mask are not yet supported by ClimaCore.

# Simulation Setup
# Number of spatial elements: 101 in horizontal, 15 in vertical
# Soil depth: 50 m
# Simulation duration: 365 d
# Timestep: 450 s
# Timestepper: ARS111
# Fixed number of iterations: 3
# Jacobian update: every new Newton iteration
# Atmos forcing update: every 3 hours
import SciMLBase
import ClimaComms
ClimaComms.@import_required_backends
import ClimaTimeSteppers as CTS
using ClimaCore
using ClimaUtilities.ClimaArtifacts

import ClimaDiagnostics
import ClimaAnalysis
import ClimaAnalysis.Visualize as viz
import ClimaUtilities

import ClimaUtilities.TimeVaryingInputs:
    TimeVaryingInput, LinearInterpolation, PeriodicCalendar
import ClimaUtilities.ClimaArtifacts: @clima_artifact
import ClimaParams as CP

using ClimaLand
using ClimaLand.Soil
using ClimaLand.Canopy
import ClimaLand
import ClimaLand.Parameters as LP

using Statistics
using CairoMakie
import GeoMakie
using Dates
import NCDatasets

using Poppler_jll: pdfunite

# Make bias plots
include("leaderboard/leaderboard.jl")
diagnostics_folder_path = "/home/kphan2/worktree/ClimaLand.jl/leaderboard/land_longrun_gpu/output_0000"
leaderboard_base_path = "/home/kphan2/worktree/ClimaLand.jl/leaderboard/land_longrun_gpu"
compute_leaderboard(leaderboard_base_path, diagnostics_folder_path)
