Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intake conversion Fig3-GlobalTimeseries #344

Merged
merged 5 commits into from
Aug 4, 2024

Conversation

rbeucher
Copy link
Contributor

@rbeucher rbeucher commented Jun 6, 2024

Following the discussion in issue #313, we propose converting the recipes to use Intake, given that the Cookbook is no longer supported and the ACCESS-NRI Intake catalog is now available.

A few months ago, @max-anu began working on this transition. This pull request contains the changes @max-anu made to the notebook specified in the title.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@julia-neme julia-neme self-requested a review July 22, 2024 06:11
@julia-neme
Copy link
Collaborator

What is expdata? I can't run this because the module is not available. Maybe it is outdated, do I need to find another way of opening and concatenating all cycles?

@AndyHoggANU
Copy link
Contributor

It is an old system we used for the GMD paper.
But I simplified that in this example:
https://github.com/COSIMA/cosima-recipes/blob/main/ACCESS-OM2-GMD-Paper-Figs/Fig4-DrakePassageTransport.ipynb
So maybe follow my lead there to replace exptdata methods?

@adele-morrison
Copy link
Collaborator

Do we actually think it's worth converting all these notebooks from the Kiss et al. 2020 paper to intake? I thought they were more here for completeness and historical records. I don't think they're really still used?

@julia-neme
Copy link
Collaborator

I don't think they are in use, but I think that if we choose to keep them in the repo they should be runnable.

@adele-morrison
Copy link
Collaborator

Is it worth considering archiving them to zenodo or similar?

@AndyHoggANU
Copy link
Contributor

I think they are good as extended contributed examples, so would be nice to keep some of them. Especially as we start to evaluate OM3 ...

@julia-neme
Copy link
Collaborator

julia-neme commented Jul 29, 2024

I am not finding this conversion to intake easy.... could someone give me a hand understanding these errors? @rbeucher
image

RuntimeError: NetCDF: Not a valid ID

@anton-seaice
Copy link
Collaborator

RuntimeError: NetCDF: Not a valid ID

This normally means we need to set threads_per_worker=1 for the dask client. See #409

@rbeucher
Copy link
Contributor Author

Sorry @julia-neme . Hopefully we can make it easier soon.

@julia-neme
Copy link
Collaborator

I have managed to load all the scalars via intake. However, the figure I am getting is not the same as in Kiss et al. (2020). I'm not sure what I'm doing wrong, I've just changed the loading and added one more cycle.... everything else is the same. It is also extremely slow to run and I am using XXlarge

Maybe @aekiss @AndyHoggANU @rbeucher have some idea why?

@dougiesquire
Copy link
Collaborator

dougiesquire commented Jul 30, 2024

Hi @julia-neme. The old (cosima-cookbook) and new (intake) versions of this recipe are loading different data, which is why the plots are different. Someone that knows about the history of these experiments will need to comment on this (@aekiss, @AndyHoggANU). For their reference:

The cc version uses experiments:

  • 1deg_jra55v13_iaf_spinup1_B1
  • 025deg_jra55v13_iaf_gmredi6
  • 01deg_jra55v13_iaf

giving

Screenshot 2024-07-30 at 12 11 36 PM

The intake version uses experiments:

  • 1deg_jra55_iaf_omip2_cycle.*
  • 025deg_jra55_iaf_omip2_cycle.*
  • 01deg_jra55v140_iaf.*

giving

Screenshot 2024-07-30 at 12 34 54 PM

Regarding the timing, the relevant baseline is the time taken before changing to the intake catalog. For me, the old version took ~7 mins to run (bearing in mind that this PR also changes the experiments being used, so we're not really comparing apples in comparing the timing of the old and new versions). The big reason your recipe is slow at the moment is because the chunking of the data has not been considered. By default, both the cookbook and the intake catalog open the data using the netcdf chunking. This chunking is particularly sub-optimal for this recipe as the scalar data are chunked in time. Simply adding chunks={"time": -1} when opening the data speeds things up a lot. When I add this to xarray_open_kwargs in the to_dask call in this PR, I can run the recipe to generate the plots in ~5.5 mins (remembering that this new version also includes more 01deg data than the old version).

@julia-neme
Copy link
Collaborator

That's great @dougiesquire thank you so much. I'll try to keep in mind all the intake tips I'm getting jaja, apologies for all my problems.

The intake catalog doesn't have the original experiments. So I guess maybe keeping these notebooks here, in a current runnable version is not possible...

@dougiesquire
Copy link
Collaborator

@julia-neme would you like me to push the changes I made to the notebook in this PR to generate the above plots in 5.5 mins?

@julia-neme
Copy link
Collaborator

On one hand yes, but on the other hand I think this is supposed to be a reproduction of the Kiss et al. (2020) figures. If we can't do that reproduction anymore because the experiments are not in intake, I'm not sure we want to update these at all... or even whether it makes sense to have them.

@AndyHoggANU @adele-morrison @navidcy what do you think?

@dougiesquire
Copy link
Collaborator

dougiesquire commented Jul 30, 2024

The intake catalog doesn't have the original experiments. So I guess maybe keeping these notebooks here, in a current runnable version is not possible...

We can add those experiments to the catalog if they're important to people? Data requests can be made here

@anton-seaice
Copy link
Collaborator

The intake catalog doesn't have the original experiments. So I guess maybe keeping these notebooks here, in a current runnable version is not possible...

I think it makes sense to move to the newer experiments, as those are the ones we use and refer to now.

I'm not sure the goal is to literally replicate the exact results in the paper, I think the goal is to have the same figures available (and up to date)?

@aekiss
Copy link
Contributor

aekiss commented Jul 30, 2024

Agreed, I think it's worthwhile having this as an example, even if it doesn't reproduce the paper figure - thanks for getting it working!

@aekiss
Copy link
Contributor

aekiss commented Jul 30, 2024

It would be good to widen the y axis limits in panel (a) so we can see the complete 0.25° and 0.1° timeseries.

@aekiss
Copy link
Contributor

aekiss commented Jul 30, 2024

The differences between the 0.25° runs in panels (a) is a bit surprising. The initial condition is nearly 0.1°C warmer in the new run and the drift also seems a bit stronger.

I haven't thought of a plausible explanation for this.

The new run fixed an initial condition bug, but the resulting temperature change seems too small to explain it, and also doesn't explain why the initial difference at 1° is smaller and of the opposite sign.

The new config uses the updated bathymetry from input_20201102 (which is itself not the latest - see COSIMA/access-om2#265), so the initial state is based on WOA13 data interpolated to a different set of points. But the new 0.25° is generally somewhat deeper, which I'd expect would reduce the global mean temperature.

@AndyHoggANU
Copy link
Contributor

Yeah, I think we should use more recent runs, but retain the style of plot that we used in the original GMD paper. The key is that people may want to use these as examples in the future, so it’s important that they work — and it’s not important for the intake catalog to include very old experiments that we won’t use for science any more …

@navidcy
Copy link
Collaborator

navidcy commented Jul 30, 2024

We can update the description of this directory in the README to say something along yhe lines of “reproduce figures in the same style as the Kiss et al. (2020) paper using different model output”

@dougiesquire
Copy link
Collaborator

I've just pushed the minor changes I made to speed things up and finish making the plots. Note, I also added a .load() to the global_scalar function - these data are tiny, so it doesn't help to delay the calculations here. This brought the run time down to ~3.5 mins, so ~2x faster than the original.

@julia-neme julia-neme merged commit 1907aaf into main Aug 4, 2024
2 of 3 checks passed
@julia-neme julia-neme deleted the INTAKE_Fig3-GlobalTimeseries branch August 4, 2024 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants