Flush frequency in yaml outputs #2766

Open
ndkeen opened this issue Mar 21, 2024 · 4 comments

Comments

@ndkeen
Contributor

ndkeen commented Mar 21, 2024

I recently had a case running (it happened to be a PPE member) that was past its second day when I needed to cancel it, thinking the data for that second day had already been written. Afterwards, looking at the data, the file is there but empty.

In atm.log, it does indicate we are "done" with the file:

[EAMxx::output_manager] - Writing model-output:
[EAMxx::output_manager]      FILE: output.scream.AutoCal.daily_avg_cosp_ne30pg2.AVERAGE.nhours_x24.2016-08-07-00000.nc
[EAMxx::scorpio_output] Writing variables to file
  file name: output.scream.AutoCal.daily_avg_cosp_ne30pg2.AVERAGE.nhours_x24.2016-08-07-00000.nc
  Done! Elapsed time: 0.004000 seconds
Atmosphere step = 6048
  model start-of-step time = 2016-08-08 00:00:00

Atmosphere step = 6049
  model start-of-step time = 2016-08-08 00:01:40

@bartgol explains that it might be scorpio not flushing, and that we have some control over this by adding flush_frequency: 1 to the yamls.

There must be some perf impact of doing this, but unless it's severe, I would think we would generally want this?
Could it actually explain why some of the data from Cess sims are missing?

@bartgol
Contributor

bartgol commented Mar 21, 2024

@jayeshkrishna do you know how big a performance impact we'd see if we flushed the output file after every write? I'm assuming it's non-negligible, but maybe still relatively small?

Edit: I don't mean "after each write_darray call", but rather "after all the write_darray and put_var calls within a timestep"...
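
To make the trade-off concrete, here is a toy Python model of a buffered output stream. It is only a sketch: `ToyOutputStream` and `snapshots_surviving_crash` are hypothetical names, nothing here calls EAMxx or scorpio, and the assumption is that one "snapshot" stands for all the writes issued within a single output step and that a flush is what makes buffered snapshots survive a crash.

```python
class ToyOutputStream:
    """Toy model of a buffered output stream (not the EAMxx or scorpio API)."""

    def __init__(self, flush_frequency=None):
        # flush_frequency=None models "flush only when the file is closed".
        self.flush_frequency = flush_frequency
        self.buffered = []  # snapshots written but not yet flushed
        self.on_disk = []   # snapshots that would survive a crash

    def write_snapshot(self, snapshot):
        # One snapshot stands for every write issued within one output step.
        self.buffered.append(snapshot)
        if self.flush_frequency and len(self.buffered) >= self.flush_frequency:
            self.flush()

    def flush(self):
        # Move everything buffered so far to "disk".
        self.on_disk.extend(self.buffered)
        self.buffered.clear()

    def close(self):
        self.flush()


def snapshots_surviving_crash(flush_frequency, n_written):
    """Write n_written snapshots, then 'crash' before close() is ever called."""
    stream = ToyOutputStream(flush_frequency)
    for i in range(n_written):
        stream.write_snapshot(i)
    return len(stream.on_disk)


print(snapshots_surviving_crash(None, 47))  # 0: nothing flushed before the crash
print(snapshots_surviving_crash(1, 47))     # 47: flushed after every snapshot
print(snapshots_surviving_crash(24, 47))    # 24: the last 23 were still buffered
```

In this model, flushing once per snapshot bounds the loss to the step in flight, at the cost of one flush per output step. The real cost of that flush is the open question in this thread.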

@AaronDonahue
Contributor

@bartgol, @jayeshkrishna I want to bring this issue back to life. We had a discussion about this in the eval call today.

@AaronDonahue
Contributor

@crterai can you comment briefly on how this impacted the Cess sims?

@crterai
Contributor

crterai commented Jul 17, 2024

We had portions of the Cess production run that we're having to re-run because we are missing outputs from certain periods. This happened when the model crashed fairly soon after a restart write while one of the output files was still filling up but hadn't been flushed. An example is in
/lustre/orion/cli115/proj-shared/noel/e3sm_scratch/cess-oct2/cess-control.ne1024pg2_ne1024pg2.F2010-SCREAMv1.cess-oct2/run

  • On 2020-04-20-03600 the output.scream.Cess.hourly2DVars. output stream started writing a new file.
  • On 2020-04-22-00000 a restart was written, but the output.scream.Cess.hourly2DVars. output stream wasn't flushed.
  • On 2020-04-22 17:21:40 the model crashed and output.scream.Cess.hourly2DVars.INSTANT.nhours_x1.2020-04-20-03600.nc remained empty because the output hadn't flushed.
  • When we went to restart the model, we started on 2020-04-22-00000. At that point, we started writing to a new file output.scream.Cess.hourly2DVars.INSTANT.nhours_x1.2020-04-22-03600.nc. That left us missing the data for 2020-04-20-03600 to 2020-04-22-00000.
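
The size of the gap in the timeline above can be worked out with a short Python sketch. It assumes the file-name stamps follow the `YYYY-MM-DD-SSSSS` convention (date plus seconds since midnight) and that hourly snapshots fall on both endpoints of the missing window; `parse_stamp` is a hypothetical helper, not part of any EAMxx tooling.

```python
from datetime import datetime, timedelta


def parse_stamp(stamp):
    """Parse 'YYYY-MM-DD-SSSSS', where SSSSS is seconds since midnight."""
    date_part, seconds = stamp.rsplit("-", 1)
    return datetime.strptime(date_part, "%Y-%m-%d") + timedelta(seconds=int(seconds))


file_start = parse_stamp("2020-04-20-03600")  # the stream started a new file
restart = parse_stamp("2020-04-22-00000")     # last restart before the crash
crash = datetime(2020, 4, 22, 17, 21, 40)     # model crashed

# Output written before the restart but never flushed: not regenerated on rerun.
lost_output = restart - file_start

# Simulated time after the restart that has to be run again.
rerun = crash - restart

# Hourly snapshots in the missing window, assuming both endpoints are included.
hourly_snapshots = int(lost_output / timedelta(hours=1)) + 1

print(lost_output)       # 1 day, 23:00:00
print(rerun)             # 17:21:40
print(hourly_snapshots)  # 48
```

The 17 hours after the restart are recovered simply by rerunning. The 47 hours before it are what the rerun cannot regenerate, because the restart point lies after that window.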
