
Replace cache namedtuple with explicit struct #2217

Closed · Tracked by #1980 · Fixed by #2296
Sbozzolo opened this issue Oct 9, 2023 · 11 comments
@Sbozzolo (Member) commented Oct 9, 2023

Currently, the integrator cache is a complex heterogeneous NamedTuple. The cache is partially flat, partially nested. For instance, precomputed_quantities(Y, atmos) is unpacked into the cache, but simulation is not. The details are not super easy to follow given the mix of splatting, unpacking, and merging that occurs when building the cache.

Here are some of the fields in the cache:

    is_init
    simulation
    spaces
    atmos
    comms_ctx
    sfc_setup
    test
    moisture_model
    model_config
    Yₜ
    limiter
    ᶜΦ
    ᶠgradᵥ_ᶜΦ
    ᶜρ_ref
    ᶜp_ref
    ᶜT
    ᶜf
    ∂ᶜK∂ᶠu₃_data
    params
    energy_upwinding
    tracer_upwinding
    density_upwinding
    edmfx_upwinding
    do_dss
    ghost_buffer
    net_energy_flux_toa
    net_energy_flux_sfc
    env_thermo_quad
    ᶜspecific
    ᶜu
    ᶠu³
    ᶜK
    ᶜts
    ᶜp
    ᶜh_tot
    sfc_conditions
    ᶠtemp_scalar
    ᶜtemp_scalar
    ᶜtemp_scalar_2
    temp_data_level
    temp_data_level_2
    temp_data_level_3
    ᶜtemp_CT3
    ᶠtemp_CT3
    ᶠtemp_CT12
    ᶠtemp_CT12ʲs
    ᶠtemp_C123
    ᶜtemp_UVWxUVW
    sfc_temp_C3
    ᶜ∇²u
    ᶜ∇²specific_energy
    ᶜ∇²specific_tracers
    hyperdiffusion_ghost_buffer
    ᶜ∇²uʲs
    center_space
    radiation_model
    rayleigh_sponge_cache
    viscous_sponge_cache
    precipitation_cache
    subsidence_cache
    large_scale_advection_cache
    edmf_coriolis_cache
    forcing_cache
    radiation_cache
    non_orographic_gravity_wave_cache
    orographic_gravity_wave_cache
    edmfx_nh_pressure_cache
    Δt
    turbconv_cache

Some of the cache items are always added (e.g., non_orographic_gravity_wave_cache); others are added conditionally. Some fields in the cache are directly controlled by flags in parsed_args (e.g. use_reference_state, test_dycore_consistency).

The cache also contains information that is not related to the model (e.g., output_dir), information that is available elsewhere (e.g., Δt), or information that is redundant (model_config = atmos.model_config).

Some values are hardcoded in the computation of the cache (e.g., T_ref = FT(255)), others are added with possibly fragile checks (e.g., ᶜf is set by checking if ᶜcoord is a LatLongZPoint, and otherwise set using f_plane_coriolis_frequency).
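
To make the pattern concrete, here is a minimal sketch (with stand-in fields and values, not the actual ClimaAtmos code) of how a NamedTuple cache like this ends up partially flat, partially nested:

```julia
# Minimal sketch of the current pattern: some sub-caches are splatted
# (flattened) into the top level, others are kept nested, and extra
# entries are merged in afterwards.
precomputed_quantities(Y, atmos) = (; ᶜT = copy(Y), ᶜp = copy(Y))

default_cache(Y, atmos) = (;
    simulation = (; output_dir = "output", job_id = "test"),  # stays nested
    precomputed_quantities(Y, atmos)...,                      # gets flattened
)

cache = merge(default_cache(zeros(3), nothing), (; Δt = 600.0))
# cache.ᶜT works directly, but cache.simulation.output_dir needs an extra hop
```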

@Sbozzolo (Member, Author) commented Oct 9, 2023

This is where I started my experiment to turn the cache into a struct a little while ago. (I did everything quickly and poorly; my goal was to check the compilation time)

@simonbyrne (Member) commented:
Maybe we should start by trying to identify stuff which is not needed, or can be accessed from somewhere else (e.g. comms_ctx)

@Sbozzolo (Member, Author) commented Oct 10, 2023

> Maybe we should start by trying to identify stuff which is not needed, or can be accessed from somewhere else (e.g. comms_ctx)

At the least the following are trivially redundant:

  • do_dss (obtained from the space)
  • moisture_model, radiation_model, turbconv_model, ls_adv, forcing_type, model_config, precip_model, (from the atmos model)
  • comms_ctx (from the space)
  • Δt (from the integrator)

Another one that should probably be removed is simulation, which contains comms_ctx, is_debugging_tc, output_dir, restart, job_id, dt, start_date, t_end: information that we can pass directly to the integrator/diagnostics.
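
As a rough sketch of what removing these redundancies could look like (the accessor names and the ClimaComms call here are assumptions, not the verified API):

```julia
import ClimaComms

# Fetch values from the objects that already own them, instead of caching them:
comms_ctx(space) = ClimaComms.context(space)  # assumed ClimaComms/ClimaCore method
model_config(atmos) = atmos.model_config      # redundant cache entry
timestep(integrator) = integrator.dt          # Δt lives on the integrator
```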

@charleskawczynski (Member) commented:

> This is where I started my experiment to turn the cache into a struct a little while ago. (I did everything quickly and poorly; my goal was to check the compilation time)

Can you make the struct concretely typed and confirm that it still compiles quickly? Maybe that was the main runtime performance issue?
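
For reference, making a struct "concretely typed" usually means parametrizing it on its field types; a generic sketch (not the actual ClimaAtmos layout):

```julia
# Each field gets its own type parameter, so every instance is fully concrete
# for the compiler; `struct Cache; ᶜT; ᶜp; end` would leave the fields as Any.
struct Cache{T1, T2}
    ᶜT::T1
    ᶜp::T2
end
```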

@charleskawczynski (Member) commented:

This issue is entangled with two important design tradeoffs:

  • Computing variables on the fly vs. using a cache.
  • Using a “scratch”-like cache, where the same temporary field is used to compute more than one intermediate quantity.

In my opinion the first bullet has a trade-off:

  • on the fly: less storage is needed, less stateful, may lessen this latency issue, but requires recomputation
  • cached: recomputation is avoided, but more stateful and more storage is required

Also, every cached variable can be thought of as having some sort of efficiency. A good example of a high-efficiency cache is the thermo state: two fields are needed, but we can compute many variables from it. Another way to put this is: adding a cache could increase or decrease the number of heap reads/writes. I think that this is a good quantitative metric we can use to decide whether a variable should be cached or computed on the fly, if we want a balanced solution.
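
As an illustration of the second bullet, a scratch-style cache reuses one preallocated buffer for several intermediate quantities (the field name and values here are made up):

```julia
# One temporary buffer serves multiple intermediate computations in sequence.
scratch = (; ᶜtemp_scalar = zeros(100))

function tendencies(scratch)
    buf = scratch.ᶜtemp_scalar
    @. buf = 1.5        # first intermediate quantity fills the buffer...
    a = sum(buf)
    @. buf = buf^2      # ...then the same buffer is reused for another
    a + sum(buf)
end

tendencies(scratch)
```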

@Sbozzolo (Member, Author) commented:

> > This is where I started my experiment to turn the cache into a struct a little while ago. (I did everything quickly and poorly; my goal was to check the compilation time)
>
> Can you make the struct concretely typed and confirm that it still compiles quickly? Maybe that was the main runtime performance issue?

```julia
@time CA.get_integrator(CA.AtmosConfig())
```

```
Struct with concrete types:
66.282692 seconds (137.65 M allocations: 8.298 GiB, 3.16% gc time, 99.81% compilation time: <1% of which was recompilation)

Struct with no types:
51.040393 seconds (140.87 M allocations: 8.512 GiB, 4.08% gc time, 99.75% compilation time: <1% of which was recompilation)

Mutable fields:
42.463701 seconds (141.20 M allocations: 8.529 GiB, 4.92% gc time, 99.29% compilation time: <1% of which was recompilation)

NamedTuple:
77.731601 seconds (141.11 M allocations: 8.524 GiB, 3.20% gc time, 99.84% compilation time: <1% of which was recompilation)
```

So yes, there is a performance penalty in using concrete types, but fixing the root of the issue is still much faster.

> This issue is entangled with two important design tradeoffs:
>
> • Computing variables on the fly vs. using a cache.
> • Using a “scratch”-like cache, where the same temporary field is used to compute more than one intermediate quantity.

When it comes to design, I also think that this is a good moment to ensure that we make the cache composable and extensible. I believe that this was the original intent with the default_cache and the additional_cache, and it is mostly already implemented. For some of the fields in the additional_cache, *_cache functions are defined with dispatches over the value of the respective entry in atmos_model. It is not fully implemented, in that the default cache contains all sorts of stuff and not all the entries follow the pattern (e.g., the gravity waves).

If we were to just look at a clean design and ignore performance, we could have an AtmosCache struct with some default fields (that include the scratch space) and one subfield for each entry in the AtmosModel.

E.g.,

```julia
struct AtmosCache
    core
    temporary
    moisture_model
    precipitation_model
    ...
end
```

Different models would implement their own struct for what they need. E.g.,

```julia
abstract type AbstractCache end

struct DryModelCache <: AbstractCache
    var1
    var2
end

struct EquilMoistModelCache <: AbstractCache
    var1
    var2
    var3
    var4
end
```

This would mean enforcing the above-mentioned pattern and moving all the named tuples to structs.
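
A sketch of that dispatch pattern, reusing the cache structs above (the model types and constructor arguments here are stand-ins for the real AtmosModel components):

```julia
# Each model component gets its own *_cache method, dispatching on the
# component's type; the top-level cache constructor would then assemble
# AtmosCache from these parts.
struct DryModel end
struct EquilMoistModel end

moisture_model_cache(Y, ::DryModel) = DryModelCache(similar(Y), similar(Y))
moisture_model_cache(Y, ::EquilMoistModel) =
    EquilMoistModelCache(similar(Y), similar(Y), similar(Y), similar(Y))
```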

@charleskawczynski (Member) commented:

> So yes, there is a performance penalty in using concrete types, but fixing the root of the issue is still much faster.

👍🏻, it's good to know that the main issue is the ClimaCore field.

And I agree with the cache design points.

@Sbozzolo (Member, Author) commented Oct 11, 2023

Incidentally, the function get_cache has a non-negligible contribution to the latency. Even with the mutable workaround, it takes 30 seconds to infer/compile on my laptop (subsequent evaluations take less than 1 second).

Hopefully, cleaning up the cache will also reduce that time (which, with the mutable fix, can be 25% of the time to get the first integrator).

@Sbozzolo (Member, Author) commented:

30% of the compilation time for the cache goes to compiling the orographic gravity waves (mostly compute_OGW_info) and radiation (mostly RRTMGPI.RRTMGPModel).

@charleskawczynski (Member) commented:

Related: CliMA/RRTMGP.jl#391

@simonbyrne (Member) commented:

The radius we can get from the "global geometry" object (though, in the longer term, we shouldn't be computing the gradient based on lat/long).
