"custom" restart leads to init time doubling (reading restart files tenfolding) #2896

Open
mahf708 opened this issue Jul 10, 2024 · 2 comments
Labels
bug (Something isn't working), I/O, priority:low

Comments

@mahf708
Contributor

mahf708 commented Jul 10, 2024

Usually, a restart happens from within the same run dir as the original run. However, consider an alternative setup in which we create a new run dir, move all needed files there, and then continue the run there. One would expect both to be identical in terms of perf, but that's not the case!

In the second ("custom") case, the atm init time doubles. A more detailed reading of the timings (thanks to @ndkeen) shows that the time spent reading restart files (take the main scream restart file as an example) actually increases tenfold, i.e. by an order of magnitude.
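For anyone who wants to compare the two run dirs outside the model, here is a minimal sketch that times a plain sequential read of a file. It is not the timer scream uses, and `time_raw_read` and `chunk_size` are hypothetical names made up for this illustration. It only measures raw byte throughput, so it says nothing about how the I/O library reads the netcdf file, and OS page caching can make a second read of the same file look much faster than the first.

```python
import time


def time_raw_read(path, chunk_size=64 * 1024 * 1024):
    """Sequentially read `path` and return (bytes_read, elapsed_seconds).

    Hypothetical helper for illustration only: it measures raw file
    throughput, not the model's actual restart-read path.
    """
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Running it on the same restart file in the original run dir and in the new one would show whether the slowdown is already visible at the filesystem level or only appears inside the model's I/O layer.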

I am filing this issue and labeling it "bug" (as it doesn't match expectation). The binary netcdf files were sent to HPSS archives and then recalled, but I am inclined to think that doesn't change anything. Additionally, I am filing this issue because I think it is related to my other IO issues, so maybe it will help us narrow our search for the elusive bug...

xref #2892 #2891 #2890 #2889

mahf708 added the bug, I/O, and priority:low labels Jul 10, 2024
@bartgol
Contributor

bartgol commented Jul 25, 2024

Does the atm.log file show that you are reading the file in the new run dir? Also, just to avoid the obvious, is the new run dir on the same filesystem as the original one?
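A quick way to check the filesystem question is to compare the device IDs of the two directories. This is only a sketch: `same_filesystem` is a hypothetical helper name, and it assumes that matching `st_dev` values from `os.stat` mean the same mount, which holds on typical POSIX systems but may not tell the whole story on some parallel or network filesystems.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same device ID (st_dev).

    Hypothetical helper for illustration; the check is an assumption
    about POSIX behavior, not something taken from this project.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

Calling it with the original run dir and the new run dir returns `True` when both sit on the same device.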

@mahf708
Contributor Author

mahf708 commented Jul 25, 2024

> Does the atm.log file show that you are reading the file in the new run dir? Also, just to avoid the obvious, is the new run dir on the same filesystem as the original one?

Yes and yes. This is actually my automated way to recover the missing files reported in #2890, and it worked.
