add coupler output summary table
juliasloan25 committed May 10, 2024
1 parent ad4e9c4 commit 5be2abd
Showing 56 changed files with 955 additions and 1,135 deletions.
47 changes: 47 additions & 0 deletions .buildkite/benchmarks/README.md
@@ -0,0 +1,47 @@
## ClimaCoupler Benchmarks Pipeline

### Purpose
The benchmarks pipeline provides concrete comparisons between analogous
simulations run with different setups and on different architectures.
This lets us compare metrics such as performance and allocations between
atmosphere-only and coupled runs, and between CPU and GPU.

This pipeline is triggered manually rather than on a schedule, so that we
can track these metrics after specific changes to the code.

### Simulation Setups
#### All simulations
- Timestep: 120 seconds
- Horizontal resolution: 30 spectral elements (~110 km)
- Vertical resolution: 63 levels
- Configuration copied from ClimaAtmos.jl v0.23.0
[gpu_aquaplanet_diagedmf.yml](https://github.com/CliMA/ClimaAtmos.jl/blob/v0.23.0/config/gpu_configs/gpu_aquaplanet_diagedmf.yml),
with minor modifications
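As a rough check on the "~110 km" figure, the mean equatorial grid spacing of a cubed-sphere mesh can be estimated from the element count. This sketch assumes that the 30 elements lie along each cube-panel edge, a spherical Earth of radius 6371 km, and polynomial degree 3 inside each element (`nh_poly = 3`, which we believe is the ClimaAtmos default but have not confirmed for this config). `approx_resolution_km` is a hypothetical helper.

```julia
# Back-of-the-envelope horizontal grid spacing for a cubed-sphere mesh.
# Assumptions (not read from the benchmark configs): spherical Earth with
# radius 6371 km, `h_elem` elements along each cube-panel edge, and
# polynomial degree `nh_poly` (default 3) inside each element.
const EARTH_RADIUS_KM = 6371.0

"""
    approx_resolution_km(h_elem; nh_poly = 3)

Hypothetical helper: the equator crosses 4 cube panels, each split into
`h_elem` elements, and each element spans `nh_poly` node intervals, so the
mean nodal spacing is the circumference divided by `4 * h_elem * nh_poly`.
"""
function approx_resolution_km(h_elem::Integer; nh_poly::Integer = 3)
    circumference_km = 2π * EARTH_RADIUS_KM
    return circumference_km / (4 * h_elem * nh_poly)
end

println(round(approx_resolution_km(30); digits = 1))  # 111.2
```

Because the nodes within a spectral element are not evenly spaced, this is an average spacing, not the minimum distance between nodes.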

#### CPU ClimaAtmos with diagnostic EDMF
- Atmosphere-only simulation
- Run on 64 CPU cores (64 Slurm tasks on 1 node)

#### CPU AMIP with diagnostic EDMF
- ClimaAtmos coupled to ClimaLand bucket model, with prescribed sea surface
temperature and sea ice
- Run on 64 CPU cores (64 Slurm tasks on 1 node)

#### GPU ClimaAtmos with diagnostic EDMF
- Atmosphere-only simulation
- Run on 4 A100 GPUs sharing 1 node

#### GPU AMIP with diagnostic EDMF
- ClimaAtmos coupled to ClimaLand bucket model, with prescribed sea surface
temperature and sea ice
- Run on 4 A100 GPUs sharing 1 node

### Comparison Metrics
- Simulated years per day (SYPD): the number of years of simulated time that
can be completed in 1 day of walltime.
- CPU simulation object allocations: the memory, in GB, allocated for the
simulation object, which contains everything needed to run the simulation.
In the atmosphere-only case, this is the `AtmosSimulation` object.
In the coupled case, this is the `CoupledSimulation` object, which includes
all of the component models, coupler fields, and auxiliary objects. More
information on this object can be found in the `Interfacer` docs.
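Both metrics can be sketched in a few lines of Julia. This is not the code in `experiments/AMIP/user_io/benchmarks.jl`. The helpers `sypd` and `object_size_gb` are hypothetical, SYPD here assumes a 365-day year, and `Base.summarysize` (the total bytes reachable from an object) is used only as one plausible way to measure the size of a simulation object.

```julia
# Sketch of the two comparison metrics (hypothetical helpers, not the
# benchmarks.jl implementation). Assumes a 365-day year and 1 GB = 1e9 bytes.
const SECONDS_PER_DAY = 86_400
const SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

"""
    sypd(simulated_seconds, walltime_seconds)

Simulated years per day: model years completed per day of walltime.
"""
sypd(simulated_seconds::Real, walltime_seconds::Real) =
    (simulated_seconds / SECONDS_PER_YEAR) / (walltime_seconds / SECONDS_PER_DAY)

"""
    object_size_gb(obj)

Total size in GB of everything reachable from `obj`, via `Base.summarysize`.
"""
object_size_gb(obj) = Base.summarysize(obj) / 1e9

# A 12-hour benchmark run (t_end = 12hours) that took 1 hour of walltime:
println(sypd(12 * 3600, 3600))  # 12/365 ≈ 0.0329

# A stand-in object: one million Float64 values (about 8 MB of data).
println(object_size_gb(zeros(Float64, 1_000_000)))
```

Because the benchmark runs are only 12 simulated hours long, SYPD is obtained by scaling up a short run, so it is sensitive to one-off costs such as initialization if those are included in the walltime.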
108 changes: 108 additions & 0 deletions .buildkite/benchmarks/pipeline.yml
@@ -0,0 +1,108 @@
agents:
  queue: clima
  slurm_time: 24:00:00
  modules: common

env:
  JULIA_NVTX_CALLBACKS: gc
  OPENBLAS_NUM_THREADS: 1
  OMPI_MCA_opal_warn_on_missing_libcuda: 0
  SLURM_KILL_BAD_EXIT: 1
  SLURM_GRES_FLAGS: "allow-task-sharing"
  BENCHMARK_CONFIG_PATH: "config/benchmark_configs"

steps:
  - label: "init :GPU:"
    key: "init_gpu_env"
    command:
      - echo "--- Instantiate experiments/AMIP"
      - julia --project=experiments/AMIP -e 'using Pkg; Pkg.instantiate(;verbose=true)'
      - julia --project=experiments/AMIP -e 'using Pkg; Pkg.precompile()'
      - julia --project=experiments/AMIP -e 'using Pkg; Pkg.status()'

      - echo "--- Instantiate test env"
      - "julia --project=test/ -e 'using Pkg; Pkg.develop(path=\".\")'"
      - "julia --project=test/ -e 'using Pkg; Pkg.instantiate(;verbose=true)'"
      - "julia --project=test/ -e 'using Pkg; Pkg.precompile()'"
      - "julia --project=test/ -e 'using Pkg; Pkg.status()'"

      - echo "--- Download artifacts"
      - "julia --project=artifacts -e 'using Pkg; Pkg.instantiate(;verbose=true)'"
      - "julia --project=artifacts -e 'using Pkg; Pkg.precompile()'"
      - "julia --project=artifacts -e 'using Pkg; Pkg.status()'"
      - "julia --project=artifacts artifacts/download_artifacts.jl"

    agents:
      slurm_gpus: 1
      slurm_cpus_per_task: 8
    env:
      JULIA_NUM_PRECOMPILE_TASKS: 8
      JULIA_MAX_NUM_PRECOMPILE_FILES: 50

  - wait

  - group: "CPU benchmarks"
    steps:
      - label: "CPU ClimaAtmos with diagnostic EDMF"
        key: "climaatmos_diagedmf"
        command: "srun julia --color=yes --project=test/ test/component_model_tests/climaatmos_standalone/atmos_driver.jl --config_file $BENCHMARK_CONFIG_PATH/climaatmos_diagedmf.yml"
        artifact_paths: "experiments/AMIP/output/climaatmos/climaatmos_diagedmf_artifacts/*"
        env:
          BUILD_HISTORY_HANDLE: ""
          CLIMACOMMS_DEVICE: "CPU"
        agents:
          slurm_ntasks_per_node: 64
          slurm_nodes: 1
          slurm_mem_per_cpu: 4GB

      - label: "CPU AMIP with diagnostic EDMF"
        key: "amip_diagedmf"
        command: "srun julia --color=yes --project=experiments/AMIP/ experiments/AMIP/coupler_driver.jl --config_file $BENCHMARK_CONFIG_PATH/amip_diagedmf.yml"
        artifact_paths: "experiments/AMIP/output/amip/amip_diagedmf_artifacts/*"
        env:
          BUILD_HISTORY_HANDLE: ""
          CLIMACOMMS_DEVICE: "CPU"
        agents:
          slurm_ntasks_per_node: 64
          slurm_nodes: 1
          slurm_mem_per_cpu: 4GB

  - group: "GPU benchmarks"
    steps:
      - label: "GPU ClimaAtmos with diagnostic EDMF"
        key: "gpu_climaatmos_diagedmf"
        command: "srun julia --threads=3 --color=yes --project=test/ test/component_model_tests/climaatmos_standalone/atmos_driver.jl --config_file $BENCHMARK_CONFIG_PATH/gpu_climaatmos_diagedmf.yml"
        artifact_paths: "experiments/AMIP/output/climaatmos/gpu_climaatmos_diagedmf_artifacts/*"
        agents:
          slurm_gpus_per_task: 1
          slurm_cpus_per_task: 4
          slurm_ntasks: 2
          slurm_mem: 16GB

      - label: "GPU AMIP with diagnostic EDMF"
        key: "gpu_amip_diagedmf"
        command: "srun julia --threads=3 --color=yes --project=experiments/AMIP/ experiments/AMIP/coupler_driver.jl --config_file $BENCHMARK_CONFIG_PATH/gpu_amip_diagedmf.yml"
        artifact_paths: "experiments/AMIP/output/amip/gpu_amip_diagedmf_artifacts/*"
        agents:
          slurm_gpus_per_task: 1
          slurm_cpus_per_task: 4
          slurm_ntasks: 2
          slurm_mem: 16GB

  - group: "Generate output table"
    steps:
      - label: "Compare AMIP/Atmos-only with diagnostic EDMF"
        key: "compare_amip_climaatmos_amip_diagedmf"
        command: "julia --color=yes --project=experiments/AMIP/ experiments/AMIP/user_io/benchmarks.jl --cpu_run_name_coupled amip_diagedmf --cpu_run_name_atmos climaatmos_diagedmf --gpu_run_name_coupled gpu_amip_diagedmf --gpu_run_name_atmos gpu_climaatmos_diagedmf --mode_name amip --build_id $BUILDKITE_BUILD_NUMBER"
        artifact_paths: "experiments/AMIP/output/compare_amip_climaatmos_amip_diagedmf/*"
        depends_on:
          - "climaatmos_diagedmf"
          - "amip_diagedmf"
          - "gpu_climaatmos_diagedmf"
          - "gpu_amip_diagedmf"

      - label: ":envelope: Slack report: CPU/GPU AMIP/Atmos-only table"
        depends_on:
          - "compare_amip_climaatmos_amip_diagedmf"
        command:
          - slack-upload -c "#coupler-report" -f experiments/AMIP/output/compare_amip_climaatmos_amip_diagedmf/table.txt -m txt -n compare_amip_climaatmos_amip_diagedmf_table -x "Coupler CPU/GPU Comparison Table"
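The "Generate output table" step receives the four run names on the command line. The sketch below shows one way a script could map those flags to the artifact directories written by the four benchmark steps, following the `artifact_paths` patterns above. It is not the actual `benchmarks.jl`: `parse_flags` and `artifact_dir` are hypothetical, `--build_id` is left out, and treating `mode_name` as the output subdirectory for coupled runs is an assumption read off the paths above. A real script would more likely use a package such as ArgParse.jl.

```julia
# Hypothetical sketch: map the run names passed to the comparison step to the
# artifact directories uploaded by the benchmark steps in this pipeline.

# Parse "--name value" pairs into a Dict (only basic validation).
function parse_flags(args::Vector{String})
    iseven(length(args)) || error("expected --flag value pairs")
    flags = Dict{String,String}()
    for i in 1:2:length(args)
        startswith(args[i], "--") || error("expected a --flag, got $(args[i])")
        flags[args[i][3:end]] = args[i + 1]
    end
    return flags
end

# Assumed layout: coupled runs write to output/<mode_name>/,
# atmosphere-only runs write to output/climaatmos/.
artifact_dir(run_name::AbstractString, subdir::AbstractString) =
    "experiments/AMIP/output/$(subdir)/$(run_name)_artifacts"

flags = parse_flags([
    "--cpu_run_name_coupled", "amip_diagedmf",
    "--cpu_run_name_atmos", "climaatmos_diagedmf",
    "--gpu_run_name_coupled", "gpu_amip_diagedmf",
    "--gpu_run_name_atmos", "gpu_climaatmos_diagedmf",
    "--mode_name", "amip"
])

for key in ("cpu_run_name_coupled", "gpu_run_name_coupled")
    println(artifact_dir(flags[key], flags["mode_name"]))
end
for key in ("cpu_run_name_atmos", "gpu_run_name_atmos")
    println(artifact_dir(flags[key], "climaatmos"))
end
```

The `depends_on` list guarantees that all four directories exist before this step runs. Because the step runs on a different agent from the benchmark steps, a real script would also need to download those artifacts first.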
7 changes: 3 additions & 4 deletions .buildkite/longruns/pipeline.yml
@@ -11,7 +11,6 @@ env:
 JULIA_MAX_NUM_PRECOMPILE_FILES: 100
 GKSwstype: 100
 SLURM_KILL_BAD_EXIT: 1
-
 CONFIG_PATH: "config/longrun_configs"
 
 timeout_in_minutes: 1440
@@ -300,11 +299,11 @@ steps:
 
 # DYAMOND AMIP: 1 day (convection resolving)
 - label: "GPU AMIP SUPERFINE: dyamond_target"
-key: "gpu_dyamond_target"
+key: "gpu_longrun_amip_dyamond"
 command:
 - echo "--- Run simulation"
-- "julia --color=yes --project=experiments/AMIP/ experiments/AMIP/coupler_driver.jl --config_file $CONFIG_PATH/gpu_dyamond_target.yml"
-artifact_paths: "experiments/AMIP/output/amip/gpu_dyamond_target_artifacts/*"
+- "julia --color=yes --project=experiments/AMIP/ experiments/AMIP/coupler_driver.jl --config_file $CONFIG_PATH/gpu_longrun_amip_dyamond.yml"
+artifact_paths: "experiments/AMIP/output/amip/gpu_longrun_amip_dyamond_artifacts/*"
 agents:
 queue: clima
 slurm_mem: 20GB
5 changes: 2 additions & 3 deletions .buildkite/pipeline.yml
@@ -13,9 +13,8 @@ env:
 GKSwstype: 100
 SLURM_KILL_BAD_EXIT: 1
 
-CONFIG_PATH: "config/model_configs"
+CONFIG_PATH: "config/ci_configs"
 PERF_CONFIG_PATH: "config/perf_configs"
-MPI_CONFIG_PATH: "config/mpi_configs"
 
 timeout_in_minutes: 240
 
@@ -81,7 +80,7 @@ steps:
 steps:
 - label: "MPI Regridder unit tests"
 key: "regridder_mpi_tests"
-command: "srun julia --color=yes --project=test/ test/mpi_tests/regridder_mpi_tests.jl --config_file $MPI_CONFIG_PATH/regridder_mpi.yml"
+command: "srun julia --color=yes --project=test/ test/mpi_tests/regridder_mpi_tests.jl --config_file $CONFIG_PATH/regridder_mpi.yml"
 timeout_in_minutes: 20
 env:
 CLIMACORE_DISTRIBUTED: "MPI"
2 changes: 2 additions & 0 deletions Project.toml
@@ -4,6 +4,7 @@ authors = ["CliMA Contributors <clima-software@caltech.edu>"]
 version = "0.0.1"
 
 [deps]
+CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
 ClimaComms = "3a4d1b5c-c61d-41fd-a00a-5873ba7a1b0d"
 ClimaCore = "d414da3d-4745-48bb-8d80-42e94e092884"
 ClimaCoreTempestRemap = "d934ef94-cdd4-4710-83d6-720549644b70"
@@ -23,6 +24,7 @@ Thermodynamics = "b60c26fb-14c3-4610-9d3e-2d17fe7ff00c"
 ClimaComms = "0.5.6"
 ClimaCore = "0.13"
 ClimaCoreTempestRemap = "0.3"
+CUDA = "5"
 Dates = "1"
 DocStringExtensions = "0.8, 0.9"
 JLD2 = "0.4"
16 changes: 16 additions & 0 deletions config/benchmark_configs/amip_diagedmf.yml
@@ -0,0 +1,16 @@
FLOAT_TYPE: "Float32"
anim: false
atmos_config_file: "config/gpu_configs/gpu_aquaplanet_diagedmf.yml"
dt_cpl: 120
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
job_id: "amip_diagedmf"
land_albedo_type: "map_temporal"
mode_name: "amip"
mono_surface: false
monthly_checkpoint: false
run_name: "amip_diagedmf"
start_date: "19790301"
t_end: "12hours"
turb_flux_partition: "CombinedStateFluxes"
30 changes: 30 additions & 0 deletions config/benchmark_configs/climaatmos_diagedmf.yml
@@ -0,0 +1,30 @@
FLOAT_TYPE: "Float32"
approximate_linear_solve_iters: 2
dt: 120secs
dt_cloud_fraction: 1hours
dt_rad: 1hours
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
dz_bottom: 30.0
dz_top: 3000.0
edmfx_detr_model: "Generalized"
edmfx_entr_model: "Generalized"
edmfx_nh_pressure: true
edmfx_sgs_diffusive_flux: true
edmfx_sgs_mass_flux: true
edmfx_upwinding: first_order
h_elem: 30
idealized_insolation: false
implicit_diffusion: true
job_id: "climaatmos_diagedmf"
moist: equil
output_default_diagnostics: false
precip_model: 0M
prognostic_tke: true
rad: allskywithclear
surface_setup: DefaultMoninObukhov
t_end: 12hours
toml: [toml/diagnostic_edmfx_box.toml]
turbconv: diagnostic_edmfx
z_elem: 63
z_max: 55000.0
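The duration strings in this config determine how much work each 12-hour benchmark does. As an illustration only (ClimaAtmos parses these strings itself), the hypothetical helper `to_seconds` below handles just the forms used in these files and counts the atmosphere timesteps and radiation updates:

```julia
# Hypothetical helper for illustration only; ClimaAtmos has its own parsing.
# Handles just the forms used in these configs: "<number><unit>" or "Inf".
const UNIT_SECONDS = Dict("secs" => 1, "mins" => 60, "hours" => 3600, "days" => 86_400)

function to_seconds(s::AbstractString)
    s == "Inf" && return Inf
    m = match(r"^([0-9.]+)(secs|mins|hours|days)$", s)
    m === nothing && error("unrecognized duration: $s")
    return parse(Float64, m.captures[1]) * UNIT_SECONDS[String(m.captures[2])]
end

t_end = to_seconds("12hours")   # simulated duration
dt = to_seconds("120secs")      # atmosphere timestep
dt_rad = to_seconds("1hours")   # radiation update interval

println("atmosphere steps:  ", round(Int, t_end / dt))      # 360
println("radiation updates: ", round(Int, t_end / dt_rad))  # 12
```

With `dt_save_state_to_disk` and `dt_save_to_sol` both set to `"Inf"`, no output is written during the run, so these step counts describe essentially all of the timed work.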
16 changes: 16 additions & 0 deletions config/benchmark_configs/gpu_amip_diagedmf.yml
@@ -0,0 +1,16 @@
FLOAT_TYPE: "Float32"
anim: false
atmos_config_file: "config/gpu_configs/gpu_aquaplanet_diagedmf.yml"
dt_cpl: 120
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
energy_check: false
job_id: "gpu_amip_diagedmf"
land_albedo_type: "map_temporal"
mode_name: "amip"
mono_surface: false
monthly_checkpoint: false
run_name: "gpu_amip_diagedmf"
start_date: "19790301"
t_end: "12hours"
turb_flux_partition: "CombinedStateFluxes"
30 changes: 30 additions & 0 deletions config/benchmark_configs/gpu_climaatmos_diagedmf.yml
@@ -0,0 +1,30 @@
FLOAT_TYPE: "Float32"
approximate_linear_solve_iters: 2
dt: 120secs
dt_cloud_fraction: 1hours
dt_rad: 1hours
dt_save_state_to_disk: "Inf"
dt_save_to_sol: "Inf"
dz_bottom: 30.0
dz_top: 3000.0
edmfx_detr_model: "Generalized"
edmfx_entr_model: "Generalized"
edmfx_nh_pressure: true
edmfx_sgs_diffusive_flux: true
edmfx_sgs_mass_flux: true
edmfx_upwinding: first_order
h_elem: 30
idealized_insolation: false
implicit_diffusion: true
job_id: "gpu_climaatmos_diagedmf"
moist: equil
output_default_diagnostics: false
precip_model: 0M
prognostic_tke: true
rad: allskywithclear
surface_setup: DefaultMoninObukhov
t_end: 12hours
toml: [toml/diagnostic_edmfx_box.toml]
turbconv: diagnostic_edmfx
z_elem: 63
z_max: 55000.0
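The CPU and GPU atmosphere-only configs are identical except for `job_id`, and the AMIP pair differs only in `job_id` and `run_name`. Each pair therefore isolates the effect of the hardware. A small helper can check this kind of property. The sketch below is hypothetical and compares hand-typed `Dict`s holding a subset of the keys above. Comparing the actual files would require a YAML parser such as YAML.jl, which is not assumed here.

```julia
# Hypothetical helper: list the keys whose values differ between two configs.
# A key present on only one side counts as a difference.
function config_diff(a::AbstractDict, b::AbstractDict)
    all_keys = union(keys(a), keys(b))
    return sort!([k for k in all_keys if !isequal(get(a, k, nothing), get(b, k, nothing))])
end

# A subset of climaatmos_diagedmf.yml, typed in by hand.
cpu = Dict{String,Any}(
    "FLOAT_TYPE" => "Float32",
    "dt" => "120secs",
    "h_elem" => 30,
    "z_elem" => 63,
    "t_end" => "12hours",
    "job_id" => "climaatmos_diagedmf",
)
# The GPU config changes only the job ID.
gpu = merge(cpu, Dict{String,Any}("job_id" => "gpu_climaatmos_diagedmf"))

println(config_diff(cpu, gpu))  # ["job_id"]
```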
13 files renamed without changes.
@@ -4,12 +4,12 @@ dt_cpl: 50
 dt_save_state_to_disk: "0.5days"
 dt_save_to_sol: "0.5days"
 energy_check: false
-job_id: "gpu_dyamond_target"
+job_id: "gpu_longrun_amip_dyamond"
 land_albedo_type: "map_temporal"
 mode_name: "amip"
 mono_surface: false
 monthly_checkpoint: false
-run_name: "gpu_dyamond_target"
+run_name: "gpu_longrun_amip_dyamond"
 start_date: "19790301"
 t_end: "1days"
 turb_flux_partition: "CombinedStateFluxes"
4 changes: 2 additions & 2 deletions experiments/AMIP/Manifest.toml
@@ -2,7 +2,7 @@
 
 julia_version = "1.10.2"
 manifest_format = "2.0"
-project_hash = "c00c8204c76db2774e82408096e51d91be9ef6bf"
+project_hash = "36cae8e3da41534867db0a0941600724a7517b72"
 
 [[deps.ADTypes]]
 git-tree-sha1 = "016833eb52ba2d6bea9fcb50ca295980e728ee24"
@@ -401,7 +401,7 @@ uuid = "d934ef94-cdd4-4710-83d6-720549644b70"
 version = "0.3.14"
 
 [[deps.ClimaCoupler]]
-deps = ["ClimaAtmos", "ClimaComms", "ClimaCore", "ClimaCoreTempestRemap", "ClimaLand", "ClimaParams", "Dates", "DocStringExtensions", "Insolation", "JLD2", "NCDatasets", "Plots", "SciMLBase", "StaticArrays", "Statistics", "SurfaceFluxes", "TempestRemap_jll", "Thermodynamics"]
+deps = ["CUDA", "ClimaComms", "ClimaCore", "ClimaCoreTempestRemap", "Dates", "DocStringExtensions", "JLD2", "NCDatasets", "Plots", "SciMLBase", "StaticArrays", "Statistics", "SurfaceFluxes", "TempestRemap_jll", "Thermodynamics"]
 path = "../.."
 uuid = "4ade58fe-a8da-486c-bd89-46df092ec0c7"
 version = "0.0.1"