Fast MBCn (a la groupies) #1580

coxipi · 2024-01-09T17:32:33Z

Pull Request Checklist:

This PR addresses an already opened issue (for bug fixes / features)
- This PR fixes #xyz
Tests for the changes have been added (for bug fixes / features)
- (If applicable) Documentation has been added / updated (for bug fixes / features)
CHANGES.rst has been updated (with summary of main changes)
- Link to issue (:issue:number) and pull request (:pull:number) has been added

What kind of change does this PR introduce?

New MBCn TrainAdjust class. The train part finds adjustment factors for the npdf transform. The adjust part does the rest.

A single numpy function performs all rotations of the npdf_transform, which makes the process faster.
Grouping is handled using the same logic as in numpy_groupies. I initially tried to stop using map_blocks by using what I call the Big Dataset (BD) solution: a dataset that included the group windowed blocks. This worked well but sometimes caused dask workers to die. Maybe better chunking could have solved this problem. Instead of constructing a BD, we now simply loop over blocks and specify the time indices of each block (à la groupies) in the original datasets. The resulting code is a bit messier, but it seems to perform well.
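To illustrate the index-based grouping described above, here is a minimal sketch, not the code in this PR. It assumes a no-leap calendar, a cyclic day-of-year group, and plain NumPy arrays. The helper name `windowed_group_indices` is hypothetical.

```python
import numpy as np


def windowed_group_indices(group_ids, n_groups, window):
    """Return, for each group, the time indices inside its window.

    group_ids : 1D integer array giving the group (e.g. day of year,
                starting at 0) of every time step.
    n_groups  : number of distinct groups (e.g. 365).
    window    : odd window width, in groups, centred on each group.
    """
    half = window // 2
    indices = []
    for g in range(n_groups):
        # Cyclic distance between each time step's group and group g,
        # so that late December and early January are neighbours.
        dist = np.abs(group_ids - g)
        dist = np.minimum(dist, n_groups - dist)
        indices.append(np.flatnonzero(dist <= half))
    return indices


# Three no-leap years of daily data for two variables.
doy = np.tile(np.arange(365), 3)
data = np.random.default_rng(0).normal(size=(2, doy.size))

idx = windowed_group_indices(doy, n_groups=365, window=31)

# A block is selected from the original array by integer indexing;
# no enlarged "big dataset" is ever materialized.
block = data[:, idx[0]]
```

Each block here holds 31 days times 3 years of time steps. Only the index lists are stored, so memory stays close to the size of the original data.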

The function also changes how windowed group blocks are handled throughout the computation. A block now keeps its form from the start to the end of the MBCn computation.

This is in contrast to the current way, which groups and ungroups blocks between each iteration of the NpdfTransform.
The standardization is performed on a block.
The univariate bias correction is maintained as blocks, reordered, then the blocks are ungrouped.
In the sdba notebook, it was suggested that we should give the univariate bias-corrected datasets to the npdf transform. But following (Cannon, 2018), we should input the raw datasets to the npdf transform. This should not matter much in practice, but it is necessary to perform the MBCn exactly as presented by Cannon.
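For readers unfamiliar with the npdf transform, the following is a rough sketch of the idea of running every rotation inside one NumPy function. It is not the function added in this PR. It assumes arrays of shape `(n_vars, n_time)`, random orthogonal matrices drawn by QR decomposition, and plain empirical quantile mapping on `hist` only. The names `random_rotations` and `npdf_transform_sketch` are hypothetical.

```python
import numpy as np


def random_rotations(n_iter, n_vars, rng):
    """Draw n_iter random orthogonal matrices via QR decomposition."""
    rots = np.empty((n_iter, n_vars, n_vars))
    for i in range(n_iter):
        q, _ = np.linalg.qr(rng.normal(size=(n_vars, n_vars)))
        rots[i] = q
    return rots


def npdf_transform_sketch(ref, hist, rots, n_quantiles=50):
    """Iteratively pull the multivariate distribution of hist toward ref.

    ref, hist : arrays of shape (n_vars, n_time)
    rots      : array of shape (n_iter, n_vars, n_vars)
    """
    q = np.linspace(0, 1, n_quantiles)
    for rot in rots:
        # Rotate both datasets into the same random basis.
        ref_r = rot @ ref
        hist_r = rot @ hist
        # Univariate empirical quantile mapping along each rotated axis.
        for v in range(ref_r.shape[0]):
            ref_q = np.quantile(ref_r[v], q)
            hist_q = np.quantile(hist_r[v], q)
            hist_r[v] = np.interp(hist_r[v], hist_q, ref_q)
        # Rotate back; the inverse of an orthogonal matrix is its transpose.
        hist = rot.T @ hist_r
    return hist


rng = np.random.default_rng(0)
ref = rng.multivariate_normal([1.0, 2.0], [[1.0, 0.8], [0.8, 1.0]], size=2000).T
hist = rng.normal(size=(2, 2000))
rots = random_rotations(20, 2, rng)
out = npdf_transform_sketch(ref, hist, rots)
```

Keeping the whole loop in NumPy means the rotation matrices can be generated once and passed in, and no xarray objects are built between iterations.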

All these changes will result in a different output for window>1 and our implementation should now match that of Cannon.
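The reordering step mentioned above can be sketched as follows. This is an illustration, assuming arrays of shape `(n_vars, n_time)` with no NaNs and no ties. The name `reorder_by_rank` is hypothetical and is not the function used in xclim. Each variable keeps its univariate bias-corrected values, but they are shuffled in time to follow the rank order of the npdf transform output.

```python
import numpy as np


def reorder_by_rank(univariate, multivariate):
    """Reorder each row of `univariate` to follow the ranks of `multivariate`.

    Both inputs have shape (n_vars, n_time).
    """
    # Rank (0 = smallest) of every value along time in the npdf output.
    ranks = np.argsort(np.argsort(multivariate, axis=1), axis=1)
    # Sorted univariate values, then picked according to those ranks.
    sorted_uni = np.sort(univariate, axis=1)
    return np.take_along_axis(sorted_uni, ranks, axis=1)


univariate = np.array([[10.0, 20.0, 30.0]])
multivariate = np.array([[0.5, -1.0, 2.0]])
reordered = reorder_by_rank(univariate, multivariate)
# reordered is [[20., 10., 30.]]
```

The marginal distribution of each variable is unchanged by this step; only the time ordering, and therefore the dependence between variables, comes from the multivariate output.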

Does this PR introduce a breaking change?

No

Other information

It might be worthwhile to retest map_blocks to see whether, with the rest of the changes, it can offer good performance. The code would be cleaner.
Using BD would also simplify many things. It is worth re-exploring if it can maintain the performance.

aulemahal · 2024-07-19T17:25:56Z

Woups. I didn't mean to approve, only to comment.

Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

docs/notebooks/sdba.ipynb

aulemahal

This is an actual approval.

Big and beautiful job @coxipi! Two robot pieces: 🤲 🤖 🤖.

xclim/sdba/_adjustment.py

coxipi · 2024-07-19T21:18:39Z

Nice!

Thanks for the review, which must have taken some juice too! I'm trying to make more, smaller PRs...

coxipi added 30 commits December 7, 2023 18:44

npdf_transform relying more on numpy

5fce024

Fix check rot_matrices is None in NpdfTransform

60c58f1

allow rotation_matrices input in fast_npdf (for real)

c915cd7

not None -> None (duh)

4a5fcb2

modular training/adjustment

2767b56

handle default kws with .setdefault

3351727

put back None -> {}

9221fac

convert quantiles to array

c00558e

npdf_adj 'sim' dataset with movingwin dim

5c0041d

fusion _single_qdm and single_qdm (much less code)

a05eefb

forgotten indices

674e1b0

fixes to fast_npdf_adjust support for movingwin

55b4e03

opti loop order, fix slice bug np, (2,1,10950) -> (10950,1)

58903cd

_adj -> _adjust

351f73e

Minimal working example, fast npdf a la xr

254f4d4

npdf_train/adjust methods for NpdfTransform

f0a8ab3

fix ungrouping (group -> group.name)

7308f20

optimize interp / quantile computation

ae5bc2d

assign correct values to quantiles in af_q

0c50fc1

use map_blocks

8288c92

revert to no map_blocks

388995f

adjust now performs reordering

e4ebf64

map_blocks in training

dce3241

map_blocks for adjust too

f0fb4dc

fixed train map_blocks, adjust map_blocks still problematic

62aad22

remove map_blocks, chunk group dimension (e.g. doy)

bd0a415

control number of points in group chunks

f6a1a5e

rotations in numpy call

a01e465

train with map groups

660b0ec

loop over groups

d2e00ea

coxipi and others added 11 commits July 19, 2024 13:49

format docstrings

bfc6866

Merge branch 'npdf_gpies' of github.com:Ouranosinc/xclim into npdf_gpies

3956044

jitter_under_thresh defaults to None

605f068

simple harmonize_units & better docstrings

2232ea8

rem list comprehension

2868ae1

Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

directly code adjustment factor

a734038

Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

better doc part1

ce44d8a

Co-authored-by: Pascal Bourgault <bourgault.pascal@ouranos.ca>

improve doc part2

385cdab

Merge branch 'npdf_gpies' of github.com:Ouranosinc/xclim into npdf_gpies

5f6e7f5

fast path _harmonize_units_multivariate

9043d5d

test harmonize_units

53380ac

aulemahal reviewed Jul 19, 2024

docs/notebooks/sdba.ipynb (resolved)

coxipi added 3 commits July 19, 2024 16:35

fillna(-1) before selecting group indexes

fcdaf86

update doc in nb

9d46a6a

Merge branch 'main' of github.com:Ouranosinc/xclim into npdf_gpies

aa311c2

aulemahal approved these changes Jul 19, 2024

github-actions bot added the approved Approved for additional tests label Jul 19, 2024

coxipi commented Jul 19, 2024

xclim/sdba/_adjustment.py (outdated, resolved)

rem unintended change in DQM docstring

e4f248f

coxipi and others added 7 commits July 22, 2024 13:40

Merge branch 'main' of github.com:Ouranosinc/xclim into npdf_gpies

aa48579

Merge branch 'npdf_gpies' of github.com:Ouranosinc/xclim into npdf_gpies

c7c93c1

Merge branch 'main' into npdf_gpies

f6cd662

Merge branch 'main' into npdf_gpies

a9fc293

multivar da need units in test (no skip check)

13ce8ab

_quantile: default behaviour == np.nanquantile & better error msg

2116c63

don't see units for multivariate scen

02f3c55

coxipi merged commit 1d91900 into main Jul 24, 2024
19 checks passed

coxipi deleted the npdf_gpies branch July 24, 2024 01:51
