
Intake conversion NearestNeighbourDistance #363

Open
wants to merge 2 commits into base: main

Conversation

@rbeucher (Contributor) commented Jun 6, 2024

Following the discussion in issue #313, we propose converting the recipes to use Intake, given that the Cookbook is no longer supported and the ACCESS-NRI Intake catalog is now available.

A few months ago, @max-anu began working on this transition. This pull request contains the changes @max-anu made to the notebook specified in the title.
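For context, the Intake-based pattern the converted recipes follow looks roughly like this. This is a minimal sketch that assumes the `access_nri_intake` package (which registers the catalog with Intake) in an NCI analysis environment; the experiment, variable, and frequency names are illustrative:

```python
# Minimal sketch of the ACCESS-NRI Intake catalog workflow (runs on NCI only;
# experiment/variable/frequency names below are illustrative examples).
import intake

catalog = intake.cat.access_nri                          # ACCESS-NRI catalog
esm_datastore = catalog["01deg_jra55v140_iaf_cycle4"]    # pick an experiment
ds = (
    esm_datastore
    .search(variable="aice_m", frequency="1mon")         # narrow the datastore
    .to_dask()                                           # open as an xarray Dataset
)
```

This replaces the old `cosima_cookbook.querying.getvar` calls with a catalog search followed by `to_dask()`.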


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@navidcy (Collaborator) commented Jul 19, 2024

This is ready to be reviewed. Any volunteers @COSIMA/cosima-contrib?

@navidcy (Collaborator) commented Jul 19, 2024

Ready to review!


@julia-neme julia-neme Jul 22, 2024


Line #3.    import pandas as pd

Remove, as it's not used.




@julia-neme julia-neme Jul 22, 2024


Line #2.    client = Client("tcp://10.6.48.33:8786")

Curious as to what this means and why we need it. Is it better than doing Client()? I get this error when running it:

OSError: Timed out trying to connect to tcp://10.6.48.33:8786 after 30 s

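For what it's worth, a hard-coded `tcp://...` address points at a Dask scheduler that was running on one particular node of a past session, which is why connecting to it now times out. A minimal sketch of the more portable alternative, assuming only that `dask.distributed` is installed:

```python
# Start a fresh local Dask cluster instead of connecting to a hard-coded
# scheduler address from somebody else's session.
from dask.distributed import Client

client = Client(threads_per_worker=1)  # spins up a local cluster for this session
print(client.dashboard_link)           # dashboard URL for monitoring tasks
client.close()                         # shut the cluster down when done
```

With no arguments (or a few tuning kwargs, as here), `Client()` creates its own local cluster, so the notebook works in any ARE session.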


@julia-neme julia-neme Jul 22, 2024


Running this with a "small" ARE session crashes the kernel. I think this is because it isn't efficient to open everything and then subset the times...




@julia-neme julia-neme Jul 22, 2024


Line #2.    cat_subset = catalog['01deg_jra55v140_iaf_cycle4']

Already run in the cell above.




@julia-neme julia-neme Jul 22, 2024


Line #4.    var_search = var_search.search(path=var_search.df['path'][0])

What does this do, and why do we need it?



Collaborator


This is a hack to load this data from only one file (it doesn't change in time, but is saved multiple times in the model output).

It's equivalent to the n=1 argument supplied to getvar in the cookbook version.
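In full, the trick looks something like the sketch below (needs the ACCESS-NRI catalog, so it only runs on NCI; `ANGLET` is used as an illustrative static CICE grid variable, and `cat_subset` is assumed to be the experiment's datastore):

```python
# Static grid fields are duplicated in every output file, so restrict the
# search to a single path before loading to avoid opening the whole run.
var_search = cat_subset.search(variable="ANGLET")                  # all files
var_search = var_search.search(path=var_search.df["path"][0])      # first file only
ds_static = var_search.to_dask()                                   # tiny, fast load
```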

@julia-neme

I have only commented on the Intake part since this is what this issue is about, but I'm wondering whether we want to convert this to a model-agnostic / cf_xarray sort of example.

But my biggest concern is that I tried to run it using a small ARE session, and my kernel crashed trying to open the data (3rd cell). It is perfectly fine to open it with the cosima_cookbook. Related to my test here.

@navidcy (Collaborator) commented Jul 22, 2024

> But my biggest concern is that I tried to run it using a small ARE session, and my kernel crashed trying to open the data (3rd cell). It is perfectly fine to open it with the cosima_cookbook. Related to my test here.

I'm cc'ing here @rbeucher, @anton-seaice, @dougiesquire, @AndyHoggANU in case they want to comment on this?


@julia-neme julia-neme left a comment


See comments in ReviewNB

@dougiesquire (Collaborator) commented Jul 22, 2024

> I'm cc'ing here @rbeucher, @anton-seaice, @dougiesquire, @AndyHoggANU in case they want to comment on this?

I think this is due to the way the CICE data are structured (with the 2D coordinates saved in every file). The issue has been brought up in the past on the forum in the context of the cookbook - see here. There are two solutions described in that topic:

  • open with decode_coords=False, which drops the 2D coords. This is what the cookbook version did before this PR (hence why it worked, @julia-neme), but the kwarg was dropped in this PR (it's actually still mentioned in a comment).
  • open with compat="override", data_vars="minimal", coords="minimal" to skip coordinate verification. This is the better approach, as you'll keep the 2D coords in your returned Dataset.

So, pass:

```python
xarray_combine_by_coords_kwargs=dict(
    compat="override",
    data_vars="minimal",
    coords="minimal",
)
```

to your to_dask() calls. I've tried this and the notebook runs on a small ARE instance.
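The effect of those kwargs can be sketched on synthetic data (shapes and names below are illustrative, not the real CICE output; `xarray_combine_by_coords_kwargs` is forwarded to `xr.combine_by_coords` under the hood, as the kwarg name suggests). With compat="override" and coords="minimal", xarray takes the repeated static coordinate from the first dataset instead of comparing it across every file:

```python
import numpy as np
import xarray as xr

def cice_like(t0):
    # Each "file" carries the same static coordinate (TLAT, here 1D along y
    # for brevity) alongside the time-varying data variable.
    return xr.Dataset(
        {"aice": (("time", "y"), np.ones((2, 3)))},
        coords={
            "time": [t0, t0 + 1],
            "y": [0, 1, 2],
            "TLAT": ("y", np.array([10.0, 20.0, 30.0])),
        },
    )

combined = xr.combine_by_coords(
    [cice_like(0), cice_like(2)],
    compat="override",     # take conflicting-but-identical coords from the first dataset
    data_vars="minimal",   # only concatenate variables that have the concat dim
    coords="minimal",      # don't concatenate coords that lack the concat dim
)
print(combined.sizes)  # time is concatenated; TLAT is kept, taken from the first dataset
```

Skipping the element-wise comparison of the big 2D coordinates is what keeps the memory footprint small enough for a small ARE session.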


@anton-seaice anton-seaice Jul 22, 2024


As Dougie noted, adding the xarray_combine_by_coords_kwargs would be better than using decode_coords as referred to in this note. This note needs updating, along with the cell below.




@anton-seaice anton-seaice Jul 22, 2024


I suggest getting rid of the cat_subset line and merging it into the var_search line,

and changing darray to ds, because it's a Dataset, not a DataArray.

This would be similar to cell 4 in https://github.com/COSIMA/cosima-recipes/blob/4fd51150331ebb1669fdb8bc58ff4ac16b313b0e/Examples/Bathymetry.ipynb


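The suggested tidy-up would look roughly like this (a sketch that only runs against the ACCESS-NRI catalog on NCI; the experiment, variable, and frequency names are illustrative, following the pattern of the Bathymetry example linked above):

```python
# Chain the search directly off the catalog entry, skipping the separate
# cat_subset line, and name the result `ds` since to_dask() returns a Dataset.
ds = (
    catalog["01deg_jra55v140_iaf_cycle4"]
    .search(variable="aice_m", frequency="1mon")
    .to_dask()
)
```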


@anton-seaice anton-seaice Jul 22, 2024


I think we can make the var_ice chunks larger to get rid of these warnings?


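One way to do that is to pass chunk sizes through to the underlying xarray open call. This is a sketch only: `xarray_open_kwargs` is intake-esm's pass-through to xarray, but the dimension names (`nj`, `ni` for CICE) and chunk sizes here are assumptions that would need checking against the actual files:

```python
# Ask for larger chunks at open time so each Dask task works on a bigger
# slab of data and the many-small-chunks warnings go away.
ds = var_search.to_dask(
    xarray_open_kwargs=dict(chunks={"time": 12, "nj": -1, "ni": -1})  # -1 = whole dim
)
```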


6 participants