Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use xarray functionality in Documented Examples #321

Open
navidcy opened this issue Feb 25, 2024 · 4 comments
Open

Use xarray functionality in Documented Examples #321

navidcy opened this issue Feb 25, 2024 · 4 comments
Assignees
Labels

Comments

@navidcy
Copy link
Collaborator

navidcy commented Feb 25, 2024

Can we replace the for loop in cell 8 here:

https://cosima-recipes.readthedocs.io/en/latest/DocumentedExamples/Meridional_heat_transport.html#Method-2:-using-from-the-net-surface-heat-flux-(assuming-steady-state)

with some xarray-dask friendly functionality?

cc @dhruvbhagtani, @angus-g

@navidcy navidcy added the ❓ question Further information is requested label Feb 25, 2024
@angus-g
Copy link
Contributor

angus-g commented Feb 26, 2024

Yeah, that loop looks a bit suspect to me... Its performance is O(N^2)-ish, because the cumulative sum is recomputing the entire array south of the current loop index.

I think it should be something like the following:

shflux = cc.querying.getvar(experiment, 'net_sfc_heating', session, n=3)
shflux_am = shflux.mean('time').load()

area = cc.querying.getvar(experiment, 'area_t', session, ncfile="ocean_grid.nc", n=1)
lat  = cc.querying.getvar(experiment, 'geolat_t', session, ncfile="ocean_grid.nc", n=1)
latv = cc.querying.getvar(experiment, 'yt_ocean', session, ncfile="ocean_grid.nc", n=1)

# create left edge for bottom bin
latv_bins = np.hstack(([-90], latv.values))

# replicate original loop
mhf = (
    (shflux_am * area)
    .groupby_bins("geolat_t", latv_bins)
    .sum()
    .cumsum()
    .rename(geolat_t_bins="yt_ocean")
)
mhf.coords["yt_ocean"] = latv

mht_method2 = mhf + (mhf.isel(yt_ocean=0) - mhf.isel(yt_ocean=-1)) / 2

This takes about 8s with 4 workers, vs. 3 min for the original method, and the answers have a relative accuracy of around 1e-5...

@angus-g angus-g added the 🐣 good first issue Good for newcomers label Feb 26, 2024
@navidcy
Copy link
Collaborator Author

navidcy commented Feb 26, 2024

22x speedup I think it's great!!
thanks @angus-g

@navidcy
Copy link
Collaborator Author

navidcy commented Feb 26, 2024

Here's another example from remap_depth(remap_dens, psi_args, psi_avg, session, model) method in the "Zonally-averaged overturning example"

Specifically,

    for ii in range(nmin, nmax):
        rho1 = rho2_zonal_mean.cf.isel(latitude=ii)
        rho1v = rho1.copy()
        z = rho1.cf['vertical']
        rho1 = rho1.rename({rho1.cf['vertical'].name: 'rho_ref'})
        rho1['rho_ref'] = np.array(rho1v)
        rho1.name = rho2_zonal_mean.cf['vertical'].name
        rho1.values = np.array(z)
        
        rho1 = rho1.isel(rho_ref = ~np.isnan(rho1.rho_ref)).drop_duplicates(dim='rho_ref', keep='first')

        rho1 = rho1.interp(rho_ref = psi_avg.cf['vertical'].values,
                           kwargs={"bounds_error": False, "fill_value": (0, 6000)})
        psi_depth[:, ii] = rho1.rename({'rho_ref': psi_avg.cf['vertical'].name})

(See #324)

@polinaesia
Copy link

I compared Angus' method with the initial one, and the results are in good agreement except at a couple of points, as seen in the screenshot:
Image
Should I push the commit using the loop created by Angus?

I also tried handling units with pint package, but an issue occurred when pint encountered degrees Celsius (in Cp). It seems pint doesn't handle Celsius well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants