Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance hit when using dask to compute missing_wmo #1820

Open
1 of 2 tasks
huard opened this issue Jul 3, 2024 · 1 comment
Open
1 of 2 tasks

Performance hit when using dask to compute missing_wmo #1820

huard opened this issue Jul 3, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@huard
Copy link
Collaborator

huard commented Jul 3, 2024

Setup Information

  • Xclim version: 0.47
  • Python version: 3.10

Description

I noticed a very significant performance hit when running missing_wmo with and without dask. The dask version seems to slow things down, and I had to load the array to get results in a reasonable amount of time.

Steps To Reproduce

No response

Additional context

No response

Contribution

  • I would be willing/able to open a Pull Request to address this bug.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@huard huard added the bug Something isn't working label Jul 3, 2024
@aulemahal
Copy link
Collaborator

MWE and dask analysis:

import xclim as xc
from xclim.testing import open_dataset

# Open a dataset of a single chunk
ds = open_dataset('sdba/CanESM2_1950-2100.nc', chunks={'time': -1, 'location': -1})

pr_valid = xc.core.missing.missing_wmo(ds.pr, freq="YS")

The last line took me 115 s. But most importantly, counting the number of tasks with len(ds.pr.__dask_graph__().keys()), I see an increase from 6 to 95304. This is insane!

A probable solution would be to wrap as much as possible into single apply_ufunc or map_blocks call to aggregate the tasks. Maybe we can look into flox for help since we are applying a function along the time axis and grouping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants