Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assorted small improvements #2609

Merged
merged 3 commits into from
Feb 2, 2024
Merged

Assorted small improvements #2609

merged 3 commits into from
Feb 2, 2024

Conversation

Sbozzolo
Copy link
Member

@Sbozzolo Sbozzolo commented Jan 30, 2024

I was trying to debug the issue with a test taking forever to run on ubuntu on GitHub and I ended up cleaning a few small things. I identified the problem with the interpolating the diagnostics. I don't still don't know why this takes so long on GitHub: it is approximately 1000x slower than everywhere else. I verified that the package versions are identical (between a "good" run and a "bad" one). In any case, we don't need any diagnostic for the specific test, so I disabled them.

The main real change in this PR is to reduce the number of syncs that are forced for NetCDF files. sync (= flush) was introduced because when running on a GPU, the NetCDF file will be buffered until the end of the run. This was problematic for simulations that are interrupted for any reason (all the data is lost).

Now, sync is called once per cycle, ie, once all the diagnostics are called, accumulated, and reduced.

Working (ie, fast) build: https://github.com/CliMA/ClimaAtmos.jl/actions/runs/7759147548/job/21162571999

@Sbozzolo Sbozzolo force-pushed the gb/test_ci branch 5 times, most recently from bc0cf37 to ccab9c0 Compare January 30, 2024 23:02
@Sbozzolo Sbozzolo force-pushed the gb/test_ci branch 2 times, most recently from 540ac34 to 2da4805 Compare February 2, 2024 17:04
@Sbozzolo Sbozzolo changed the title Debugging CI timeout Assorted small improvements Feb 2, 2024
@Sbozzolo Sbozzolo marked this pull request as ready for review February 2, 2024 17:37
@charleskawczynski
Copy link
Member

I think we should enable a way to statically elide the diagnostics (#2598). I like a handful of changes in this PR (using ClimaCore API for getproperty calls, GHA timeout times), but the diagnostic pieces are pretty different, and I think need more though, since the flame graph is showing a pretty big perf hit.

Can you split the PR up? I'm happy to get non-controversial parts in asap.

This value is not important for the test, but it seems to cause problems
for GitHub actions on Ubuntu (it's extremely slow).
If these jobs run for more than 30 minutes, there's something wrong
@charleskawczynski charleskawczynski added this pull request to the merge queue Feb 2, 2024
Merged via the queue into main with commit 20cb006 Feb 2, 2024
9 of 10 checks passed
@charleskawczynski charleskawczynski deleted the gb/test_ci branch February 2, 2024 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants