Improve performance of diagnostic edmf #2868

Open
charleskawczynski opened this issue Apr 3, 2024 · 5 comments
@charleskawczynski
Member

Diagnostic EDMF is slow. We need to identify the bottleneck and improve its performance.

@charleskawczynski
Member Author

Here is an Nsight Systems report:

image

from this build: https://buildkite.com/clima/climaatmos-target-gpu-simulations/builds/250 (from #2846)

@charleskawczynski
Member Author

Zooming into ldiv! and set_precomputed_quantities!

image

shows that there are many, many kernel launches in set_precomputed_quantities!. So, this is likely due to the loop in set_diagnostic_edmf_precomputed_quantities_do_integral!. We can add more NVTX annotations to confirm, but this is what I suspected and it makes sense based on the report.
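To illustrate why a loop over vertical levels can translate into many kernel launches, here is a minimal CPU-only sketch. It is not ClimaAtmos code: launch!, LAUNCHES, integrate_per_level! and integrate_per_column! are hypothetical names, and launch! only counts calls as a stand-in for a GPU kernel launch. It assumes the integral is a simple cumulative sum up each column.

```julia
# Hypothetical stand-in for a GPU kernel launch: every call counts as one launch.
const LAUNCHES = Ref(0)

function launch!(f!, args...)
    LAUNCHES[] += 1
    f!(args...)
    return nothing
end

# Pattern A: march up the column and issue one "launch" per level.
# Each launch updates level k of every column at once.
function integrate_per_level!(out, src, dz)
    nlev = size(src, 1)
    launch!(out, src) do o, s
        @views o[1, :] .= s[1, :] .* dz
    end
    for k in 2:nlev
        launch!(out, src) do o, s
            @views o[k, :] .= o[k - 1, :] .+ s[k, :] .* dz
        end
    end
    return out
end

# Pattern B: a single "launch" in which each column is integrated sequentially.
# On a GPU, the outer loop over columns would map to one thread per column.
function integrate_per_column!(out, src, dz)
    launch!(out, src) do o, s
        for j in axes(s, 2)
            acc = zero(eltype(o))
            for k in axes(s, 1)
                acc += s[k, j] * dz
                o[k, j] = acc
            end
        end
    end
    return out
end
```

With 63 levels (as in the nz63 job), pattern A issues 63 launches per integral while pattern B issues one, and both produce the same result. Whether the real integral can be restructured this way depends on details of the EDMF code that this sketch does not model.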

@szy21
Member

szy21 commented Apr 4, 2024

Could we look at the gpu_hs_rhoe_equil_55km_nz63_0M job in that build first? Comparing that with the one in the main branch shows a significant slowdown in set_precomputed_quantities! due to get_cloud_fraction.

@charleskawczynski
Member Author

Yes, of course. Which two builds are we comparing? Maybe we can add the option to do set_cloud_fraction! per stage vs per step/callback, so that we can merge an example into main. I'll try to do this now.
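As a rough sketch of what a per-stage vs per-step option could look like, the toy below counts how often an expensive update runs under each choice. Everything here is hypothetical (ToyCache, expensive_cloud_fraction!, toy_run! and the per_stage keyword are illustrative names, not the ClimaAtmos API), and the end-of-step call merely stands in for a callback.

```julia
# All names here are hypothetical; this is not the ClimaAtmos API.
mutable struct ToyCache
    n_cloud_fraction_updates::Int
end

# Stand-in for the expensive cloud fraction computation: it only counts calls.
function expensive_cloud_fraction!(cache::ToyCache)
    cache.n_cloud_fraction_updates += 1
    return nothing
end

# Called once per stage of the time stepper.
function toy_set_precomputed_quantities!(cache::ToyCache, per_stage::Bool)
    per_stage && expensive_cloud_fraction!(cache)
    return nothing
end

# Run a toy multi-stage time stepper and return the number of updates performed.
function toy_run!(cache::ToyCache; n_steps, n_stages, per_stage)
    for _ in 1:n_steps
        for _ in 1:n_stages
            toy_set_precomputed_quantities!(cache, per_stage)
        end
        # Per-step mode: update once at the end of the step, like a callback.
        per_stage || expensive_cloud_fraction!(cache)
    end
    return cache.n_cloud_fraction_updates
end
```

For 10 steps of a 3-stage scheme, the per-stage mode runs the update 30 times and the per-step mode runs it 10 times. The trade-off is that the per-step value is stale during intermediate stages.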

@charleskawczynski
Member Author

I'm not working on this, but I did open up the nvtx report with more annotations and can confirm that the issue is the large number of kernel launches in the integral function:

image

zoomed out:

image

I need to confirm (Nsight Systems crashed on me), but I think these ranges are on the GPU, in which case 60% of the time spent in step! is in set_diagnostic_edmf_precomputed_quantities_do_integral! (for a non-radiation step!). Even if they are not, when I clicked on the range, a fairly large number of kernels was highlighted.
