Commit
Merge pull request #890 from CliMA/kp/leaderboard
Add leaderboard component to ClimaLand's long runs
Showing 8 changed files with 543 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
# Leaderboard | ||
|
||
## Long run | ||
|
||
### Add a new variable to compare against observations | ||
The infrastructure to compute errors against observations is in the `leaderboard` folder. | ||
This folder contains two files: `data_sources.jl`, responsible for loading and preprocessing | ||
variables of interest, and `leaderboard.jl`, which computes errors and draws plots. To add a | ||
new variable to the comparison, modify `data_sources.jl`. | ||
|
||
### Computation | ||
As of now, the leaderboard produces bias plots with the global bias and global root mean | ||
squared error (RMSE). These quantities are computed for each month, excluding the first year | ||
of the simulation, which is treated as spin-up time. The simulation starts in 2008, so only | ||
the year 2009 is compared against observational data. | ||
|
||
### Add a new variable to the bias plots | ||
To add a new variable, you need to modify four functions: | ||
`get_sim_var_dict`, `get_obs_var_dict`, `get_mask_dict`, and | ||
`get_compare_vars_biases_plot_extrema`. Each function builds and returns a dictionary that | ||
needs a new entry for the variable. The dictionaries are `sim_var_dict`, | ||
`obs_var_dict`, `mask_dict`, and `compare_vars_biases_plot_extrema`. | ||
|
||
To add a variable for the leaderboard, add a key-value pair to the dictionary `sim_var_dict` | ||
whose key is the short name of the variable and whose value is a function that returns an | ||
[`OutputVar`](https://clima.github.io/ClimaAnalysis.jl/dev/var/). Any preprocessing, such as | ||
unit conversion and shifting the dates, is done in this function. | ||
|
||
```julia | ||
sim_var_dict["et"] = | ||
() -> begin | ||
# Load in variable | ||
sim_var = get( | ||
ClimaAnalysis.SimDir(diagnostics_folder_path), | ||
short_name = "et", | ||
) | ||
# Shift to the first day and subtract one month as preprocessing | ||
sim_var = | ||
ClimaAnalysis.shift_to_start_of_previous_month(sim_var) | ||
return sim_var | ||
end | ||
``` | ||
|
||
Then, add a key-value pair to the dictionary `obs_var_dict` whose key is the same short name | ||
as before and whose value is a function that takes in a start date and returns an `OutputVar`. | ||
Any preprocessing is done in this function. | ||
|
||
```julia | ||
obs_var_dict["et"] = | ||
(start_date) -> begin | ||
# We use ClimaArtifacts to use a dataset from ILAMB | ||
obs_var = ClimaAnalysis.OutputVar( | ||
ClimaLand.Artifacts.ilamb_dataset_path(; | ||
context = "evspsbl_MODIS_et_0.5x0.5.nc", | ||
), | ||
"et", | ||
# start_date is used to align the dates in the observational data | ||
# with the simulation data | ||
new_start_date = start_date, | ||
# Shift dates to the first day of the month before aligning the dates | ||
shift_by = Dates.firstdayofmonth, | ||
) | ||
# More preprocessing to match the units with the simulation data | ||
ClimaAnalysis.units(obs_var) == "kg/m2/s" && | ||
(obs_var = ClimaAnalysis.set_units(obs_var, "kg m^-2 s^-1")) | ||
# ClimaAnalysis cannot handle `missing` values, but does support handling NaNs | ||
obs_var = ClimaAnalysis.replace(obs_var, missing => NaN) | ||
return obs_var | ||
end | ||
``` | ||
|
||
!!! tip "Preprocessing" | ||
Observational and simulation data should be preprocessed for dates and units. When | ||
using ClimaDiagnostics to report monthly averages from a simulation, each monthly average | ||
is output on the first day following the month over which the average was computed. For | ||
instance, the monthly average corresponding to January 2010 is on the date 1 Feb 2010. | ||
Preprocessing shifts this date to 1 Jan 2010. When preprocessing data, we | ||
follow the convention that the first day of a month carries the monthly average for that | ||
month. For observational data, you should check the convention being followed and | ||
preprocess the dates if necessary. | ||
|
||
For `obs_var_dict`, the anonymous function must take in a start date. The start date is | ||
used in `leaderboard.jl` to adjust the times (in seconds) in the `OutputVar` so that they | ||
are measured from the same start date as the simulation data. | ||
|
||
Units should be the same between the simulation and observational data. | ||
|
||
Next, add a key-value pair to the dictionary `mask_dict` whose key is the same short name | ||
as before and whose value is a function that takes in an `OutputVar` representing simulation | ||
data and an `OutputVar` representing observational data and returns a masking function, or | ||
`nothing` if no masking function is needed. The masking function is used to correctly | ||
normalize the global bias and global RMSE. See the example below where a mask is made using | ||
the observational data. | ||
|
||
```julia | ||
mask_dict["et"] = | ||
(sim_var, obs_var) -> begin | ||
return ClimaAnalysis.make_lonlat_mask( | ||
# We do this to get a `OutputVar` with only two dimensions: | ||
# longitude and latitude | ||
ClimaAnalysis.slice( | ||
obs_var, | ||
time = ClimaAnalysis.times(obs_var) |> first, | ||
); | ||
# Any values that are NaN should be 0.0 | ||
set_to_val = isnan, | ||
true_val = 0.0 | ||
) | ||
end | ||
``` | ||
|
||
Finally, add a key-value pair to the dictionary `compare_vars_biases_plot_extrema` whose | ||
key is the same short name as before and whose value is a tuple of floats that determines | ||
the range of the bias plots. | ||
|
||
```julia | ||
compare_vars_biases_plot_extrema = Dict( | ||
"et" => (-0.00001, 0.00001), | ||
"gpp" => (-8.0, 8.0), | ||
"lwu" => (-40.0, 40.0), | ||
) | ||
``` |
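
The documentation above says that the masking function is used to normalize the global bias and global RMSE correctly. The sketch below illustrates that idea in plain Julia, without ClimaAnalysis. It is not the implementation used by `leaderboard.jl`: the helper name `global_bias_and_rmse`, the `(lon, lat)` array layout, and the cosine-of-latitude area weights are all assumptions made for illustration. Cells where the observation is `NaN` get zero weight, so they contribute to neither the error sums nor the normalization.

```julia
# Hypothetical helper (not part of ClimaAnalysis or ClimaLand).
# `sim` and `obs` are matrices with dimensions (longitude, latitude);
# `lats` holds the latitudes in degrees for the second dimension.
function global_bias_and_rmse(sim, obs, lats)
    size(sim) == size(obs) ||
        throw(DimensionMismatch("sim and obs must share a grid"))
    size(sim, 2) == length(lats) ||
        throw(DimensionMismatch("second dimension must be latitude"))

    # Mask: 1.0 where the observation is valid, 0.0 where it is NaN
    mask = map(o -> isnan(o) ? 0.0 : 1.0, obs)

    # Assumed area weights, proportional to cos(latitude), zeroed by the mask
    weights = mask .* reshape(cosd.(lats), 1, :)

    # Set the difference to 0.0 at masked cells so NaN does not propagate
    # (0.0 * NaN is still NaN)
    diff = map((s, o) -> isnan(o) ? 0.0 : s - o, sim, obs)

    # Normalize by the weight of the valid cells only
    total_weight = sum(weights)
    bias = sum(weights .* diff) / total_weight
    rmse = sqrt(sum(weights .* diff .^ 2) / total_weight)
    return bias, rmse
end

# Toy data: 2 longitudes x 3 latitudes, with one missing observation
lats = [-60.0, 0.0, 60.0]
obs = [1.0 2.0 NaN; 1.0 2.0 3.0]
sim = [2.0 1.0 10.0; 2.0 1.0 4.0]

bias, rmse = global_bias_and_rmse(sim, obs, lats)
println("global bias = ", bias, ", global RMSE = ", rmse)
```

Without the mask, the large simulated value at the missing cell would either turn both results into `NaN` or, if the missing value were replaced by zero, inflate the errors and the normalization.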
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,209 @@ | ||
import ClimaAnalysis | ||
|
||
""" | ||
get_sim_var_dict(diagnostics_folder_path) | ||
Return a dictionary mapping short names to functions that return an `OutputVar` | ||
containing preprocessed simulation data. This is used by the function | ||
`compute_leaderboard`. | ||
To add a variable for the leaderboard, add a key-value pair to the dictionary | ||
`sim_var_dict` whose key is the short name of the variable and whose value is an | ||
anonymous function that returns an `OutputVar`. For each variable, any | ||
preprocessing, such as unit conversion and shifting the dates, should be done in | ||
the corresponding anonymous function. | ||
The variable should have only three dimensions: time, longitude, and latitude. | ||
""" | ||
function get_sim_var_dict(diagnostics_folder_path) | ||
# Dict for loading in simulation data | ||
sim_var_dict = Dict{String, Any}() | ||
|
||
sim_var_dict["lwu"] = | ||
() -> begin | ||
sim_var = get( | ||
ClimaAnalysis.SimDir(diagnostics_folder_path), | ||
short_name = "lwu", | ||
) | ||
sim_var = | ||
ClimaAnalysis.shift_to_start_of_previous_month(sim_var) | ||
return sim_var | ||
end | ||
|
||
sim_var_dict["et"] = | ||
() -> begin | ||
sim_var = get( | ||
ClimaAnalysis.SimDir(diagnostics_folder_path), | ||
short_name = "et", | ||
) | ||
sim_var = | ||
ClimaAnalysis.shift_to_start_of_previous_month(sim_var) | ||
(ClimaAnalysis.units(sim_var) == "kg m^-2 s^-1") && ( | ||
sim_var = ClimaAnalysis.convert_units( | ||
sim_var, | ||
"mm / day", | ||
conversion_function = units -> units * 86400.0, | ||
) | ||
) | ||
return sim_var | ||
end | ||
|
||
|
||
sim_var_dict["gpp"] = | ||
() -> begin | ||
sim_var = get( | ||
ClimaAnalysis.SimDir(diagnostics_folder_path), | ||
short_name = "gpp", | ||
) | ||
sim_var = | ||
ClimaAnalysis.shift_to_start_of_previous_month(sim_var) | ||
# Convert from `mol CO2 m^-2 s^-1` in sim to `g C m-2 day-1` in obs | ||
(ClimaAnalysis.units(sim_var) == "mol CO2 m^-2 s^-1") && ( | ||
sim_var = ClimaAnalysis.convert_units( | ||
sim_var, | ||
"g m-2 day-1", | ||
conversion_function = units -> units * 86400.0 * 12.011, | ||
) | ||
) | ||
return sim_var | ||
end | ||
return sim_var_dict | ||
end | ||
|
||
""" | ||
get_obs_var_dict() | ||
Return a dictionary mapping short names to functions that return an `OutputVar` | ||
containing preprocessed observational data. This is used by the function | ||
`compute_leaderboard`. | ||
To add a variable for the leaderboard, add a key-value pair to the dictionary | ||
`obs_var_dict` whose key is the short name of the variable and whose value is an | ||
anonymous function that returns an `OutputVar`. The function must take in a | ||
start date, which is used to align the times in the observational data with | ||
the simulation data. The short name must be the same as in `sim_var_dict` in the | ||
function `get_sim_var_dict`. For each variable, any preprocessing, such as unit | ||
conversion and shifting the dates, is done in the corresponding anonymous | ||
function. | ||
The variable should have only three dimensions: latitude, longitude, and time. | ||
""" | ||
function get_obs_var_dict() | ||
# Dict for loading in observational data | ||
obs_var_dict = Dict{String, Any}() | ||
obs_var_dict["et"] = | ||
(start_date) -> begin | ||
obs_var = ClimaAnalysis.OutputVar( | ||
ClimaLand.Artifacts.ilamb_dataset_path(; | ||
context = "evspsbl_MODIS_et_0.5x0.5.nc", | ||
), | ||
"et", | ||
new_start_date = start_date, | ||
shift_by = Dates.firstdayofmonth, | ||
) | ||
(ClimaAnalysis.units(obs_var) == "kg/m2/s") && ( | ||
obs_var = ClimaAnalysis.convert_units( | ||
obs_var, | ||
"mm / day", | ||
conversion_function = units -> units * 86400.0, | ||
) | ||
) | ||
obs_var = ClimaAnalysis.replace(obs_var, missing => NaN) | ||
return obs_var | ||
end | ||
|
||
obs_var_dict["gpp"] = | ||
(start_date) -> begin | ||
obs_var = ClimaAnalysis.OutputVar( | ||
ClimaLand.Artifacts.ilamb_dataset_path(; | ||
context = "gpp_FLUXCOM_gpp.nc", | ||
), | ||
"gpp", | ||
new_start_date = start_date, | ||
shift_by = Dates.firstdayofmonth, | ||
) | ||
ClimaAnalysis.dim_units(obs_var, "lon") == "degree" && | ||
(obs_var.dim_attributes["lon"]["units"] = "degrees_east") | ||
ClimaAnalysis.dim_units(obs_var, "lat") == "degree" && | ||
(obs_var.dim_attributes["lat"]["units"] = "degrees_north") | ||
obs_var = ClimaAnalysis.replace(obs_var, missing => NaN) | ||
return obs_var | ||
end | ||
|
||
obs_var_dict["lwu"] = | ||
(start_date) -> begin | ||
obs_var = ClimaAnalysis.OutputVar( | ||
ClimaLand.Artifacts.ilamb_dataset_path(; | ||
context = "rlus_CERESed4.2_rlus.nc", | ||
), | ||
"rlus", | ||
new_start_date = start_date, | ||
shift_by = Dates.firstdayofmonth, | ||
) | ||
ClimaAnalysis.units(obs_var) == "W m-2" && | ||
(obs_var = ClimaAnalysis.set_units(obs_var, "W m^-2")) | ||
return obs_var | ||
end | ||
return obs_var_dict | ||
end | ||
|
||
""" | ||
get_mask_dict() | ||
Return a dictionary mapping short names to a function which takes in `sim_var`, | ||
an `OutputVar` containing simulation data, and `obs_var`, an `OutputVar` | ||
containing observational data, and returns a masking function. | ||
To add a variable to the leaderboard, add a key-value pair to the dictionary | ||
`mask_dict` whose key is the same short name as in `sim_var_dict` and whose value | ||
is a function that takes in an `OutputVar` representing simulation data and an | ||
`OutputVar` representing observational data and returns a masking function, or | ||
`nothing` if a masking function is not needed. The masking function is used to | ||
correctly normalize the global bias and global RMSE. | ||
""" | ||
function get_mask_dict() | ||
# Dict for loading in masks | ||
mask_dict = Dict{String, Any}() | ||
|
||
mask_dict["et"] = | ||
(sim_var, obs_var) -> begin | ||
return ClimaAnalysis.make_lonlat_mask( | ||
ClimaAnalysis.slice( | ||
obs_var, | ||
time = ClimaAnalysis.times(obs_var) |> first, | ||
); | ||
set_to_val = isnan, | ||
) | ||
end | ||
|
||
mask_dict["gpp"] = | ||
(sim_var, obs_var) -> begin | ||
return ClimaAnalysis.make_lonlat_mask( | ||
ClimaAnalysis.slice( | ||
obs_var, | ||
time = ClimaAnalysis.times(obs_var) |> first, | ||
); | ||
set_to_val = isnan, | ||
) | ||
end | ||
|
||
mask_dict["lwu"] = (sim_var, obs_var) -> begin | ||
return nothing | ||
end | ||
return mask_dict | ||
end | ||
|
||
""" | ||
get_compare_vars_biases_plot_extrema() | ||
Return a dictionary mapping short names to ranges for the bias plots. | ||
To add a variable to the leaderboard, add a key-value pair to the dictionary | ||
`compare_vars_biases_plot_extrema` whose key is the same short name as in | ||
`sim_var_dict` in the function `get_sim_var_dict` and whose value is a tuple, | ||
where the first element is the lower bound and the last element is the upper | ||
bound for the bias plots. | ||
""" | ||
function get_compare_vars_biases_plot_extrema() | ||
compare_vars_biases_plot_extrema = | ||
Dict("et" => (-2.0, 2.0), "gpp" => (-6.0, 6.0), "lwu" => (-40.0, 40.0)) | ||
return compare_vars_biases_plot_extrema | ||
end |
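
The preprocessing in `data_sources.jl` relies on two small pieces of arithmetic: the unit conversion factors passed as `conversion_function`, and the shift of monthly-average dates to the start of the previous month. The sketch below reproduces both in plain Julia so the numbers can be checked by hand. The function names `et_to_mm_per_day`, `gpp_to_gC_per_day`, and `shift_date_to_previous_month_start` are hypothetical stand-ins and are not part of ClimaAnalysis; the date helper only mirrors the behavior described in the documentation comment ("shift to the first day and subtract one month") and operates on a single `Date` rather than an `OutputVar`.

```julia
import Dates

# kg m^-2 s^-1 of water -> mm / day.
# 1 kg of water spread over 1 m^2 is a 1 mm layer, and a day has 86400 s.
et_to_mm_per_day(x) = x * 86400.0

# mol CO2 m^-2 s^-1 -> g C m^-2 day^-1.
# One mole of CO2 contains one mole of carbon, which weighs 12.011 g.
gpp_to_gC_per_day(x) = x * 86400.0 * 12.011

# Hypothetical stand-in for the date shift applied to simulation output:
# move to the first day of the month, then go back one month.
shift_date_to_previous_month_start(date::Dates.Date) =
    Dates.firstdayofmonth(date) - Dates.Month(1)

# A monthly average for January 2010 is reported on 1 Feb 2010
reported = Dates.Date(2010, 2, 1)
shifted = shift_date_to_previous_month_start(reported)

println("et:  ", et_to_mm_per_day(1.0e-5), " mm / day")
println("gpp: ", gpp_to_gC_per_day(1.0e-6), " g C m^-2 day^-1")
println("date: ", reported, " -> ", shifted)
```

The date shift matters in January as well: a December average reported on 1 Jan moves to 1 Dec of the previous year.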