Use 4xA100 to achieve > 1SYPD for target EDMF AMIP #740
Comments
What's the issue with the P100?
Apparently they're too slow and we're using this run as an SYPD benchmark.
If you are interested, this build https://buildkite.com/clima/climacoupler-longruns/builds/628 can be fixed by asking for more memory per CPU.
That's useful to know, thanks! We'll still run the benchmark on the A100 (as requested by the OKR), but this will be useful for the scaling tables (Cc'ing @juliasloan25).
Update: The above results are for 200 km resolution. At 100 km resolution, SYPD on 4xA100 is between 0.8 and 1.5 (see builds). This partly depends on whether we use coupler/atmos diagnostics, but removing diagnostics didn't always lead to better SYPD. We also see large variability between runs of the same config, and even within one simulation. A more thorough investigation is under way as part of CliMA/ClimaAtmos.jl#2914, and we will present a like-for-like comparison and scaling as part of #663. Notes:
While it should be reasonably accurate, I would encourage you not to rely on the SYPD printed by the progress log. That is an estimate and does not reflect the actual SYPD in some cases (e.g., during the first iterations, or when callbacks/diagnostics are called).
Very true, but for the avoidance of doubt, this run shows we can achieve at least 380 simulated days in 1 day of walltime.
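For reference, here is a minimal sketch of how SYPD can be computed directly from total simulated time and total walltime, rather than read from the progress log. The function name `sypd` and the 365-day year are assumptions made for illustration; they are not taken from ClimaCoupler or ClimaAtmos.

```python
SECONDS_PER_DAY = 86400.0
DAYS_PER_YEAR = 365.0  # assumption: 365-day year; adjust to the model calendar


def sypd(simulated_days: float, walltime_seconds: float) -> float:
    """Simulated years per wall-clock day (hypothetical helper)."""
    if walltime_seconds <= 0:
        raise ValueError("walltime_seconds must be positive")
    simulated_years = simulated_days / DAYS_PER_YEAR
    walltime_days = walltime_seconds / SECONDS_PER_DAY
    return simulated_years / walltime_days


# 380 simulated days in one day of walltime, as quoted above
print(round(sypd(380, SECONDS_PER_DAY), 2))  # 1.04
```

Under that assumption, 380 simulated days per day of walltime corresponds to roughly 1.04 SYPD, just above the 1 SYPD target.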
Since gpu_amip_topo_target_diagedmf is our current target, we want to run it on the faster nodes: either clima's A100 or new-central's V100 / H100.
Results
Running on Clima should be sufficient.
clima A100:
new-central P100
Note
Components in PR