OKR O.3.2.2: A flexible diagnostic module for ClimaAtmos #2043

Closed
Sbozzolo opened this issue Aug 31, 2023 · 10 comments
@Sbozzolo
Member

Sbozzolo commented Aug 31, 2023

The Climate Modeling Alliance

Software Design Issue 📜

Purpose

This SDI proposes to add a flexible module to compute arbitrary diagnostics from the simulation.

We want to be able to:

  • compute arbitrary diagnostics with possibly arbitrary reductions in time (e.g., arithmetic average, min, max) on arbitrary intervals (e.g., daily, monthly),
  • let users specify what they want/where/when to output.

Our goals are:

  • it has to be easy for users to specify what they want to output and for developers to add new diagnostics,
  • diagnostics that are not used should not be computed,
  • there should be at least one way for users/developers to achieve complex behaviors that are not pre-defined in ClimaAtmos (e.g., adding a new reduction operation),
  • our abstractions should not significantly degrade performance.

Cost/Benefits/Risks

Diagnostics are currently hard-coded, so this is an important step towards a general and usable ClimaAtmos.

A proof-of-concept implementation is already available, and some of the challenges have been addressed.

A possible performance problem with the design outlined below is that diagnostics cannot trivially use information computed in other diagnostics.

People and Personnel

This design was discussed with @simonbyrne

Components

Diagnostics are implemented as callbacks in the integrator. At fixed intervals of (integration) time, the diagnostics are computed from the state and output to disk. In a nutshell, this SDI discusses a module to produce a list of callbacks.

We will initially focus on point-wise operations and HDF5 files. Online remapping and producing NetCDF files can be implemented as a different output_writer (see below) and will be tackled after the main infrastructure is put in place.

The low-level details

(Snippets of code below are to be considered pseudo-code.)

DiagnosticVariable

We represent a diagnostic variable as a struct that looks like (roughly following ClimateMachine)

struct DiagnosticVariable
    short_name::String
    long_name::String
    units::String
    description::String
    compute_from_integrator::Function
end

Fundamentally, a DiagnosticVariable is a recipe for computing a given diagnostic variable. Arguably, most of this struct is not really needed. The key field is compute_from_integrator, which provides the recipe for obtaining the value of the diagnostic variable from the integrator. long_name, units, and description are provided for documentation. In the future, we can put in place a simple script that produces a table listing the diagnostics that can be computed, to be added to the documentation (as in https://clima.github.io/ClimateMachine.jl/latest/DevDocs/DiagnosticVariableList/). We also add these fields to encourage good practices in documenting the diagnostic variables.

The short_name is primarily the variable name in the output files, and is unique. The long_name is a descriptive name. We will follow the CMIP table wherever available. We don't need a standard_name as we are already following CMIP for short names.

compute_from_integrator has to be a function that takes two arguments: the integrator object, and an optional pre-allocated output space, out. If out is not nothing, the diagnostic is computed in-place; otherwise, new memory is allocated. An example of compute_from_integrator to compute air temperature might look like

    function compute_from_integrator(integrator, out)
        thermo_params = CAP.thermodynamics_params(integrator.p.params)
        out .= TD.air_temperature.(thermo_params, integrator.p.ᶜts)
    end

Supporting this syntax (with the optional out) requires adding a new method to ClimaCore.

The DiagnosticVariable struct is also an (optional) public interface. Users/developers that want to add more diagnostic variables can define their own.

ClimaAtmos will provide a collection of DiagnosticVariables in a dictionary all_diagnostics. Developers can make new diagnostics available by adding new DiagnosticVariables. The integrator contains the atmospheric model, so developers can dispatch model-specific calculations on that.
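
As a rough illustration of the ideas in this section, here is a minimal, self-contained sketch. It is not the ClimaAtmos implementation: plain arrays stand in for ClimaCore fields, a NamedTuple stands in for the integrator, and `compute_air_temperature`, `fake_integrator`, its `temperature` field, and the metadata of the `ta` entry are all hypothetical names chosen for the example.

```julia
# Sketch only: plain arrays stand in for ClimaCore fields, and a NamedTuple
# stands in for the integrator. Field and function names are hypothetical.
struct DiagnosticVariable
    short_name::String
    long_name::String
    units::String
    description::String
    compute_from_integrator::Function
end

# Compute in place when `out` is provided, otherwise allocate a new array.
function compute_air_temperature(integrator, out)
    if isnothing(out)
        return copy(integrator.temperature)
    else
        out .= integrator.temperature
        return out
    end
end

# Collection of the diagnostics we know how to compute, keyed by short name.
all_diagnostics = Dict{String, DiagnosticVariable}()
all_diagnostics["ta"] = DiagnosticVariable(
    "ta",
    "Air Temperature",
    "K",
    "Temperature of the air",
    compute_air_temperature,
)

fake_integrator = (; temperature = [280.0, 285.0, 290.0])
ta_diagnostic = all_diagnostics["ta"]

# First call allocates; later calls can reuse pre-allocated storage.
allocated = ta_diagnostic.compute_from_integrator(fake_integrator, nothing)
preallocated = zeros(3)
ta_diagnostic.compute_from_integrator(fake_integrator, preallocated)
```

Keeping the allocating and in-place paths behind the same two-argument function means the caller decides when memory is allocated, which is what allows the accumulators to be pre-allocated once at initialization.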

ScheduledDiagnostic

DiagnosticVariables are the ones that we know how to compute. The ones we are actually computing in a given simulation are described by ScheduledDiagnostic objects, which are

Base.@kwdef struct ScheduledDiagnostic
    variable::DiagnosticVariable
    # `nothing` means that no reduction is performed
    reduction_time_func::Union{Nothing, Function} = nothing
    reduction_space_func::Union{Nothing, Function} = nothing
    period_iterations::Int
    output_writer
end

A ScheduledDiagnostic is a variable that is computed and output. The struct contains:

  • The variable we want to compute (and internally, this gives us information about how to compute the diagnostic and its name)
  • If we want to perform any reduction in time (for example, if we want to take time averages, or a daily max), we can pass a function to reduction_time_func. When a function is passed to reduction_time_func, we allocate an accumulator for this specific ScheduledDiagnostic and we repeatedly apply reduction_time_func at the end of every iteration until period_iterations iterations have elapsed. If reduction_time_func is nothing, no time reduction is performed; instead, the variable is output as is every period_iterations iterations.
  • If we want to perform any reduction in space (for example, if we want to take space averages), we can pass a function to reduction_space_func. This will be called before writing the diagnostic. This will not be implemented at this stage and only point-wise diagnostics are considered.
  • A function/object responsible for writing the variable to disk. output_writer is expected to take three arguments: the value that has to be written, the DiagnosticVariable, and the integrator.

Details about this struct might change as more complexity is added (e.g., we might want to add a skip_initial field).

We will provide factories to produce output_writers for standard use-cases. For example, to write to HDF5 files given their path. Having a rich output_writer function allows us to support complex behaviors (such as creating new files, or appending to existing, or all sorts of combinations).
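
The accumulate-then-write behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual callback machinery: `SimpleScheduledDiagnostic` and `run_diagnostic!` are hypothetical names, the "model" just adds 1 to the state at every step, and the writer appends to an in-memory list instead of an HDF5 file.

```julia
# Sketch only: a simplified scheduled diagnostic. Names are hypothetical.
struct SimpleScheduledDiagnostic
    short_name::String
    compute::Function                       # (integrator, out) -> value
    reduction_time_func::Union{Nothing, Function}
    period_iterations::Int
    output_writer::Function                 # (value, short_name, integrator)
end

# Run `n_iterations` fake timesteps and drive one scheduled diagnostic.
function run_diagnostic!(diag::SimpleScheduledDiagnostic, integrator, n_iterations)
    accumulator = nothing
    for iteration in 1:n_iterations
        state = integrator.state
        state .+= 1.0                       # stand-in for a model timestep
        is_output = iteration % diag.period_iterations == 0
        if isnothing(diag.reduction_time_func)
            # No reduction: compute only when it is time to output.
            is_output && diag.output_writer(
                diag.compute(integrator, nothing), diag.short_name, integrator)
            continue
        end
        value = diag.compute(integrator, nothing)
        if isnothing(accumulator)
            accumulator = copy(value)       # first sample of the period
        else
            accumulator .= diag.reduction_time_func.(accumulator, value)
        end
        if is_output
            diag.output_writer(accumulator, diag.short_name, integrator)
            accumulator = nothing           # start a new period
        end
    end
    return nothing
end

written = Vector{Float64}[]
writer = (value, short_name, integrator) -> push!(written, copy(value))
compute = (integrator, out) -> copy(integrator.state)

run_diagnostic!(SimpleScheduledDiagnostic("x", compute, max, 3, writer), (; state = zeros(2)), 6)
run_diagnostic!(SimpleScheduledDiagnostic("x", compute, min, 3, writer), (; state = zeros(2)), 6)
```

Here the accumulator is seeded with the first sample of each period, which sidesteps the need for an identity element; pre-allocating accumulators at initialization, as proposed in this SDI, requires a different seeding strategy.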

We work with iterations instead of time because the number of iterations is well-defined and unambiguous. We will provide a second constructor for ScheduledDiagnostic that is more intuitive and that enforces constraints. For example,

function ScheduledDiagnostic(variable::DiagnosticVariable,
                             time_period::Real,
                             output_writer;
                             # optional reductions are keyword arguments
                             reduction_time_func = nothing,
                             reduction_space_func = nothing
)
    # if dt is the simulation timestep
    time_period % dt == 0 || error("time_period has to be a multiple of the simulation timestep")
    ...
end

We allow multiple ScheduledDiagnostics for a given DiagnosticVariable (for example, if we want to have mean daily and yearly temperature).
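
One detail worth illustrating is the conversion from a time period to a number of iterations. The sketch below uses a hypothetical helper, `period_to_iterations`, and compares with a tolerance, on the assumption that timesteps may be non-integer: an exact `%` check can fail because of floating-point rounding. The same helper serves the daily and the yearly diagnostic of a given variable.

```julia
# Sketch only: convert an output period in seconds into a number of
# iterations. `period_to_iterations` is a hypothetical helper name.
function period_to_iterations(time_period::Real, dt::Real)
    ratio = time_period / dt
    iterations = round(Int, ratio)
    # Compare with a tolerance: an exact remainder check can fail for
    # non-integer timesteps because of floating-point rounding.
    (isapprox(ratio, iterations; rtol = 1e-10) && iterations >= 1) ||
        error("time_period has to be a positive multiple of the simulation timestep")
    return iterations
end

dt = 0.1                                   # simulation timestep in seconds
daily = period_to_iterations(86400, dt)
yearly = period_to_iterations(365 * 86400, dt)
```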

Note that this is also an (optional) public interface. Users/developers that want to add or change diagnostics can define their own.

To run a simulation, we collect all the ScheduledDiagnostics we want to run into a DiagnosticTable (which is just an iterable; we will not define a new type for this). Upon initialization of the simulation, the DiagnosticTable is parsed to pre-allocate all the accumulators and counters and prepare all the callbacks that are going to compute and output the diagnostics.

A technical note here is that we will have to restrict the space of allowed reductions in time to the subset of operations for which we know the identity element, since that is the value the pre-allocated accumulators start from (e.g., for the sum behind an arithmetic average, the value 0 is the identity; we will have to hard-code this).
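
A minimal sketch of what hard-coding the identity elements could look like is given below. The function names `identity_of_reduction` and `reduce_samples` are hypothetical, and the list of reductions is only an example.

```julia
# Sketch only: hard-coded identity elements for the supported reductions.
# `identity_of_reduction` and `reduce_samples` are hypothetical names.
identity_of_reduction(::typeof(+), ::Type{T}) where {T} = zero(T)
identity_of_reduction(::typeof(max), ::Type{T}) where {T} = typemin(T)
identity_of_reduction(::typeof(min), ::Type{T}) where {T} = typemax(T)
identity_of_reduction(f, ::Type) = error("no known identity for reduction $f")

# Pre-allocate an accumulator filled with the identity, then fold samples in.
function reduce_samples(reduction, samples::Vector{Vector{Float64}})
    accumulator = fill(identity_of_reduction(reduction, Float64), length(first(samples)))
    for sample in samples
        accumulator .= reduction.(accumulator, sample)
    end
    return accumulator
end

samples = [[1.0, 5.0], [3.0, 2.0], [2.0, 4.0]]
maxima = reduce_samples(max, samples)
# The arithmetic average is a sum (identity 0) divided by the sample count.
average = reduce_samples(+, samples) ./ length(samples)
```

With this approach, a user-defined reduction only needs one extra method of the identity function to become usable.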

The higher level interfaces

The interfaces described in the previous section are available to be used by users and developers, but are too detailed for running most simulations (we still expect developers to use them to extend ClimaAtmos). Therefore, we will also provide higher level functions for common operations and model-dependent defaults. get_default_diagnostics(AtmosModel) will return the list of default diagnostics for the various components in AtmosModel. This will be done by recursively asking the various submodules for their defaults, so that users can obtain the defaults of only specific submodules if they want to. With this function, we expect that only one line of code will be needed for users that want to output the default diagnostics for their specific model.
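
The recursive lookup of defaults could be organized with multiple dispatch, roughly as in the sketch below. The model types (`DryModel`, `MoistModel`, `FakeAtmosModel`) are hypothetical stand-ins for the real model components, the short names are only illustrative, and for brevity the function returns short names rather than ScheduledDiagnostics.

```julia
# Sketch only: hypothetical model components; the real AtmosModel differs.
struct DryModel end
struct MoistModel end
struct FakeAtmosModel{M}
    moisture_model::M
end

# Defaults for each component, selected by dispatch on the component type.
get_default_diagnostics(::DryModel) = String[]
get_default_diagnostics(::MoistModel) = ["hus", "hur"]

# Defaults for the full model: core diagnostics plus those of each component.
function get_default_diagnostics(model::FakeAtmosModel)
    core = ["ta", "ua"]
    components = (getfield(model, name) for name in fieldnames(typeof(model)))
    return vcat(core, (get_default_diagnostics(c) for c in components)...)
end

moist_defaults = get_default_diagnostics(FakeAtmosModel(MoistModel()))
dry_defaults = get_default_diagnostics(FakeAtmosModel(DryModel()))
```

Because each component has its own method, calling `get_default_diagnostics(MoistModel())` directly returns the defaults of that submodule alone.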

Examples of other convenience functions that we will provide are (given a DiagnosticTable):

  • add_diagnostics!(diagnostic_table::List, variables::List[DiagnosticVariable], output_file::String)
  • add_daily_averages!(diagnostic_table::List, variables::List[DiagnosticVariable], output_file::String)
  • add_precipitation_diagnostics!(diagnostic_table::List, variable_names::List[String], output_file::String)
  • ...

This interface can be used in a script. Alternatively, users can specify the diagnostics they want to have in a YAML file that looks like:

diagnostics:
  short_name: u
  time_period: 5sec
  reduction: max
  output_file: u_5sec_max.h5

Internally, this is parsed and evaluated with the constructor for ScheduledDiagnostic, and then a DiagnosticTable is compiled, so that we are brought back to the low-level case.
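
To make the parsing step concrete, here is a small sketch that starts from a dictionary mimicking what a YAML parser might return for the entry above. Everything beyond the four keys in the example is an assumption: the unit table, the set of reduction names, and the helper names `parse_time_period` and `parse_diagnostic_entry` are hypothetical.

```julia
# Sketch only: `entry` mimics what a YAML parser might return for the example
# above. The unit table and helper names are hypothetical.
seconds_per_unit = Dict("sec" => 1, "min" => 60, "hour" => 3600, "day" => 86400)
reductions = Dict("max" => max, "min" => min)

# Turn a string such as "5sec" into a number of seconds.
function parse_time_period(text::AbstractString)
    m = match(r"^(\d+)([a-z]+)$", text)
    isnothing(m) && error("cannot parse time period: $text")
    unit = String(m[2])
    haskey(seconds_per_unit, unit) || error("unknown time unit: $unit")
    return parse(Int, m[1]) * seconds_per_unit[unit]
end

# Collect the arguments that a ScheduledDiagnostic constructor would need.
function parse_diagnostic_entry(entry::AbstractDict)
    reduction_name = get(entry, "reduction", nothing)
    return (;
        short_name = entry["short_name"],
        time_period = parse_time_period(entry["time_period"]),
        reduction_time_func = isnothing(reduction_name) ? nothing : reductions[reduction_name],
        output_file = entry["output_file"],
    )
end

entry = Dict(
    "short_name" => "u",
    "time_period" => "5sec",
    "reduction" => "max",
    "output_file" => "u_5sec_max.h5",
)
settings = parse_diagnostic_entry(entry)
```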

Results and Deliverables

The diagnostic module is an important part of ClimaAtmos. We will target outstanding levels of documentation. We will verify that the overhead due to the abstractions we put in place does not degrade performance significantly with respect to the main integration loop.

Task Breakdown And Schedule

We have a first proof-of-concept implementation: https://github.com/CliMA/ClimaAtmos.jl/tree/gb/diagnostics

This first implementation already does everything we want, but with several hard-coded values and some workarounds. This implementation is currently being built on top of an experimental interface that bypasses the driver (see upcoming SDI). We will add the changes to the driver once the implementation stabilizes.

Rough timeline:

  • Point-wise diagnostics (without remapping) in HDF5
  • Performance assessment
  • Add support for remapping and support for NetCDF

SDI Revision Log

CC

@tapios @simonbyrne @cmbengue

EDITS: Typos

@Sbozzolo Sbozzolo added the SDI Software Design Issue label Aug 31, 2023
@Sbozzolo Sbozzolo changed the title OKR 0.3.2: A flexible diagnostic module for ClimaAtmos OKR O.3.2.2: A flexible diagnostic module for ClimaAtmos Aug 31, 2023
@tapios
Contributor

tapios commented Aug 31, 2023

This looks good. Thank you. A couple comments/requests:

  • long_name, units, and description are absolutely essential. They are standard attributes in the netCDF files the community uses.
  • Please use the standard variable names as far as possible (see, e.g., here for the CORDEX-CMIP subset and here for CMIP). This is essential. We want to compare our model with others, and give others the opportunity to do so. This will get much easier with standard variable names. We do not need to invent our own names now just to have to revise them later.
  • Remapping to a diagnostic grid will also be important. Whatever you do now, please make sure that this is not getting more complicated than needed in the end (e.g., it may be useful to distinguish between 2d, surface and TOA, fields, and 3d fields).

@szy21 szy21 added this to the O3.2.2 Add a diagnostic module milestone Aug 31, 2023
@szy21
Member

szy21 commented Aug 31, 2023

We will add the standard_name attribute and use the CMIP convention when it is available. The question is more about the short_name (which is called output_variable in the CMIP table) and long_name. Do we want to follow the CMIP standard as well? I am leaning towards not as CMIP doesn't have all the output_variable we will have, and some of the names are not straightforward to understand, but I am ok with following CMIP.

@tapios
Contributor

tapios commented Aug 31, 2023

You can use the short_names in https://clima.github.io/ClimateMachine.jl/latest/DevDocs/DiagnosticVariableList/, and derivatives thereof. But let's please not start inventing new names now.

If you want to add the CMIP names anyway, why not do it right away, as you are adding the variables? It will make comparisons with other models, which should start soon, easier.

@szy21
Member

szy21 commented Sep 1, 2023

Based on an offline discussion, here is what we are going to do for names:
short_name: Used as output variable names and in output file names. We will have our own list following ClimateMachine where available.
long_name: Unique and used as an identifier for diagnostics. We will follow the CMIP table where available, but replace the space with an underscore, and modify as needed if the names are not unique.
standard_name: Not used anywhere. It will be an attribute for the output variable. We will follow the CMIP table where available.

I will update the SDI.

@szy21
Member

szy21 commented Sep 6, 2023

Based on (another) offline discussion, we will use CMIP names for both short_name and long_name. We don't need a standard_name then. I will update the SDI.

@Sbozzolo
Member Author

Sbozzolo commented Sep 12, 2023

#2064 contains a fully working implementation of the infrastructure underpinning this SDI. The diagnostic variables and the defaults are not yet populated.

I am leaving some comments here on rough edges that will likely not be fixed at this point because they depend on other work:

  • Each time we compute a diagnostic, we are allocating new memory. Everything is ready to avoid doing that by allocating only the first time (by using in-place operations). However, we will likely need ClimaCore to support this operation. This is no longer the case, but the code now has several if/else statements that could be removed in the future.

  • Before computing the diagnostics, we explicitly call set_precomputed_quantities!, which mutates the state. Work on set_precomputed_quantities! is actively ongoing, so this will soon be fixed.

  • The writers (HDF5 and NetCDF) should be considered stub implementations. They don't have many options, and they do not support everything we want to support (e.g., the NetCDF writer does not support distributed runs). Some of this also depends on other work (e.g., remapping with distributed runs).

@Sbozzolo
Member Author

Sbozzolo commented Sep 25, 2023

#2064 is being merged, implementing the majority of this SDI.

My next step is to work on the remapping, so that we can produce lat-long-z files for generic configurations.

ClimaCore implements a simple pointwise remapping function. However, this function relies on the assumption that the process contains all the points. So, the function does not work in distributed environments. I am going to generalize this function so that it can work on single-threaded runs, as well as MPI runs and GPUs. This requires changing how the function works and casting the problem into a preprocessing step where we prepare a weights matrix, and the actual remapping (which we will formulate as a matrix-vector multiplication).
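
To illustrate the split between preprocessing and remapping, here is a minimal sketch that writes 1D linear interpolation as a sparse weights matrix applied to a vector of source values. It is only an analogy under simplifying assumptions (one dimension, sorted source points, a single process); it is not the ClimaCore algorithm, and `build_weights` is a hypothetical name.

```julia
using SparseArrays

# Sketch only: 1D linear interpolation written as a sparse weights matrix.
# Preprocessing step: build W such that target_values = W * source_values.
function build_weights(source_points::Vector{Float64}, target_points::Vector{Float64})
    rows, cols, vals = Int[], Int[], Float64[]
    for (row, x) in enumerate(target_points)
        # Index of the source interval [source_points[i], source_points[i + 1]]
        # containing x; source_points must be sorted and x within its range.
        i = clamp(searchsortedlast(source_points, x), 1, length(source_points) - 1)
        t = (x - source_points[i]) / (source_points[i + 1] - source_points[i])
        append!(rows, (row, row))
        append!(cols, (i, i + 1))
        append!(vals, (1 - t, t))
    end
    return sparse(rows, cols, vals, length(target_points), length(source_points))
end

source_points = [0.0, 1.0, 2.0, 3.0]
target_points = [0.5, 1.25, 3.0]
weights = build_weights(source_points, target_points)   # computed once

# Remapping step: repeated at every output with the current source values.
source_values = [0.0, 10.0, 20.0, 30.0]
remapped = weights * source_values
```

The point of this formulation is that the expensive search for where each target point lives is done once, while every subsequent output only costs a sparse matrix-vector product.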

bors bot added a commit that referenced this issue Sep 25, 2023
2064: Add diagnostic module r=Sbozzolo a=Sbozzolo

This PR adds a new diagnostic module that roughly follows what is described in #2043. 



Co-authored-by: Gabriele Bozzola <gbozzola@caltech.edu>
Co-authored-by: LenkaNovak <lenka@caltech.edu>
Co-authored-by: Zhaoyi Shen <11598433+szy21@users.noreply.github.com>
@Sbozzolo
Member Author

The above-mentioned distributed remapping is being implemented in CliMA/ClimaCore.jl#1475

@Sbozzolo
Member Author

Sbozzolo commented Oct 2, 2023

#2179 implements the ClimaAtmos side. Once this is merged, ClimaAtmos will directly produce remapped netCDF files as the simulation runs.

@Sbozzolo
Member Author

Sbozzolo commented Oct 5, 2023

CliMA/ClimaCore.jl#1475 was merged.

The two(/three) main outstanding items are:
