
Speed up quantiles with sorting #1513

Merged: 63 commits merged on Jul 23, 2024
63 commits
7d752a4
Improvements to sdba _quantile function
SarahG-579462 Oct 26, 2023
2de9086
Add to vecquantiles
SarahG-579462 Oct 26, 2023
65ddbdd
fix vecquantiles?
SarahG-579462 Oct 26, 2023
719e9c6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 26, 2023
a798e8a
pre-commit and proper signature orders
SarahG-579462 Oct 26, 2023
a134bc2
Remove changes to vecquantiles
SarahG-579462 Oct 30, 2023
749a467
Reformat sdba-speed notebook
SarahG-579462 Oct 30, 2023
ea57cf5
Update AUTHORS.rst
SarahG-579462 Oct 30, 2023
7393c34
change criteria for nanquantile algorithm
SarahG-579462 Oct 30, 2023
8845208
precompilation for most nbutils
SarahG-579462 Oct 31, 2023
6e35ec4
fix escores
juliettelavoie Nov 16, 2023
88aadeb
Merge branch 'master' into speed-up-quantile
Zeitsperre Nov 16, 2023
500428f
Merge branch 'master' into speed-up-quantile
Zeitsperre Dec 12, 2023
f81f484
Merge branch 'master' into speed-up-quantile
coxipi Jan 15, 2024
5741c2d
`allow_sortquantile` option
coxipi Apr 29, 2024
006d26e
Merge branch 'main' of github.com:Ouranosinc/xclim into speed-up-quan…
coxipi Apr 30, 2024
33a79b0
time dimensions not stacked = faster quantile
coxipi Apr 30, 2024
b629e7d
Merge branch 'main' of github.com:Ouranosinc/xclim into speed-up-quan…
coxipi Apr 30, 2024
8e1e362
correct nd vs.1d condition & clean
coxipi Apr 30, 2024
e4da29e
np reshape and apply_ufunc -> avoid stack, etc
coxipi May 3, 2024
3d42f87
add fastnanquantile as option, _sort True by default
coxipi May 6, 2024
d4430d6
Merge branch 'main' of github.com:Ouranosinc/xclim into speed-up-quan…
coxipi May 6, 2024
354d78f
update doc
coxipi May 6, 2024
5c8667a
move benchmark tests to hidden folder
coxipi May 6, 2024
2842b39
update doc, ignore notebook
coxipi May 6, 2024
3a1cde8
Merge branch 'main' of github.com:Ouranosinc/xclim into speed-up-quan…
coxipi May 8, 2024
d63af58
add 1d numba-compiled implementation of core.utils._nan_quantile
SarahG-579462 May 14, 2024
915cd6a
add test_quantile.ipynb temporarily, to test numpy-level implementations
SarahG-579462 May 14, 2024
2810d86
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 14, 2024
b2e08cb
Merge branch 'main' into speed-up-quantile
coxipi Jun 6, 2024
89fdf00
Merge branch 'main' into speed-up-quantile
Zeitsperre Jun 11, 2024
11858ae
Merge branch 'main' into speed-up-quantile
coxipi Jun 17, 2024
5ab3f2d
Merge branch 'main' of https://github.com/Ouranosinc/xclim into speed…
coxipi Jun 18, 2024
7bb324e
remove _sortquantile, use utils._nan_quantile
coxipi Jun 18, 2024
8333357
_nan_quantile in vecquantiles
coxipi Jun 18, 2024
b9d2126
Merge branch 'speed-up-quantile' of https://github.com/Ouranosinc/xcl…
coxipi Jun 18, 2024
6dc179f
remove mentions of _sortquantile
coxipi Jun 18, 2024
d6c87c0
revert: np.nanquantile in vecquantiles
coxipi Jun 18, 2024
0c1ea62
quantile -> nanquantile (duh)
coxipi Jun 18, 2024
8191aa0
add issue number
coxipi Jun 18, 2024
1159cdd
public nan_quantile & add back docstring
coxipi Jun 20, 2024
8d5e4d6
Delete test_quantile.ipynb
coxipi Jun 21, 2024
c001863
remove comments
coxipi Jun 21, 2024
a6807ee
revert unwanted AUTHORS change
coxipi Jun 21, 2024
2204a80
Merge branch 'main' of https://github.com/Ouranosinc/xclim into speed…
coxipi Jun 27, 2024
3e32930
Merge branch 'main' into speed-up-quantile
Zeitsperre Jul 2, 2024
980ec94
1d numba compatible _nan_quantile in sdba
coxipi Jul 9, 2024
99c368a
Merge branch 'speed-up-quantile' of https://github.com/Ouranosinc/xcl…
coxipi Jul 9, 2024
ccc8291
test nbu.quantile
coxipi Jul 9, 2024
7905a15
conserve arr.dtype
coxipi Jul 9, 2024
d5b81c0
Merge branch 'main' of https://github.com/Ouranosinc/xclim into speed…
coxipi Jul 9, 2024
a2682ef
CHANGELOG formatting
coxipi Jul 9, 2024
1585fc0
unwanted space
coxipi Jul 9, 2024
10b13dd
respect np2 new conventions
coxipi Jul 9, 2024
53c9f23
extras dependency (fastnanquantile)
coxipi Jul 12, 2024
48b2e14
Merge branch 'main' of https://github.com/Ouranosinc/xclim into speed…
coxipi Jul 12, 2024
9206b73
extra mention of extra
coxipi Jul 12, 2024
d876376
Merge branch 'main' of https://github.com/Ouranosinc/xclim into speed…
coxipi Jul 19, 2024
0b58a78
Merge branch 'main' into speed-up-quantile
Zeitsperre Jul 19, 2024
6542128
Merge branch 'main' into speed-up-quantile
Zeitsperre Jul 19, 2024
f93e71f
Merge branch 'main' of https://github.com/Ouranosinc/xclim into speed…
coxipi Jul 19, 2024
e3d0112
Merge branch 'speed-up-quantile' of https://github.com/Ouranosinc/xcl…
coxipi Jul 19, 2024
df6963b
Merge branch 'main' into speed-up-quantile
coxipi Jul 23, 2024
2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -135,7 +135,7 @@ jobs:
- tox-env: py310-coverage-lmoments # No markers -- includes slow tests
python-version: "3.10"
os: ubuntu-latest
- - tox-env: py311-coverage-sbck
+ - tox-env: py311-coverage-sbck-extras
python-version: "3.11"
markers: -m 'not slow'
os: ubuntu-latest
2 changes: 1 addition & 1 deletion AUTHORS.rst
@@ -29,7 +29,7 @@ Contributors
* David Caron `@davidcaron <https://github.com/davidcaron>`_
* Carsten Ehbrecht <ehbrecht@dkrz.de> `@cehbrecht <https://github.com/cehbrecht>`_
* Jeremy Fyke `@jeremyfyke <https://github.com/jeremyfyke>`_
- * Sarah Gammon <gammon.sarah@ouranos.ca> `@SarahG-579462 <https://github.com/SarahG-579462>`_
+ * Sarah Gammon `@SarahG-579462 <https://github.com/SarahG-579462>`_
* Tom Keel <thomas.keel.18@ucl.ac.uk> `@Thomasjkeel <https://github.com/Thomasjkeel>`_
* Marie-Pier Labonté <labonte.marie-pier@ouranos.ca> `@marielabonte <https://github.com/marielabonte>`_
* Ludwig Lierhammer <ludwig.lierhammer@hereon.de> `@ludwiglierhammer <https://github.com/ludwiglierhammer>`_
6 changes: 5 additions & 1 deletion CHANGELOG.rst
@@ -4,7 +4,11 @@ Changelog

v0.52.0 (unreleased)
--------------------
- Contributors to this version: David Huard (:user:`huard`), Trevor James Smith (:user:`Zeitsperre`).
+ Contributors to this version: David Huard (:user:`huard`), Trevor James Smith (:user:`Zeitsperre`), Éric Dupuis (:user:`coxipi`), Sarah Gammon (:user:`SarahG-579462`).

+ New features and enhancements
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ * ``xclim.sdba.nbutils.quantile`` and its child functions are now faster. If the module `fastnanquantile` is installed, it is used as the backend for the computation of quantiles and yields even faster results. (:issue:`1255`, :pull:`1513`).
+
Internal changes
^^^^^^^^^^^^^^^^
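The CHANGELOG entry says `xclim.sdba.nbutils.quantile` became faster, and the PR title attributes the speed-up to sorting. As an illustration only (this is not xclim's actual `nan_quantile`), here is a minimal NumPy sketch of a sort-based, NaN-aware quantile for a 1-D series. It assumes NumPy's default linear interpolation (`method="linear"`), where level q sits at the fractional position q * (n - 1) among the n valid values; the name `sorted_nan_quantile` is hypothetical.

```python
import numpy as np


def sorted_nan_quantile(arr: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Quantiles of a 1-D array, ignoring NaNs, using one sort and linear interpolation.

    Hypothetical helper for illustration; not xclim's implementation.
    Level q lies at fractional position q * (n - 1) in the sorted valid values.
    """
    arr = np.asarray(arr, dtype=float)
    q = np.asarray(q, dtype=float)
    # np.sort places NaNs at the end, so the first n entries are the valid values.
    sorted_arr = np.sort(arr)
    n = int(np.count_nonzero(~np.isnan(arr)))
    if n == 0:
        # An all-NaN series has no defined quantiles.
        return np.full(q.shape, np.nan)
    pos = q * (n - 1)                 # fractional positions into sorted_arr[:n]
    lo = np.floor(pos).astype(int)    # index of the value just below each position
    hi = np.minimum(lo + 1, n - 1)    # index just above, clipped to the last valid value
    frac = pos - lo                   # interpolation weight between the two neighbours
    return sorted_arr[lo] + frac * (sorted_arr[hi] - sorted_arr[lo])
```

One sort serves every requested level in `q`, so after sorting each extra quantile costs only two lookups and an interpolation. That is one plausible reason a sorting approach pays off when many levels are requested, such as the 50 levels used in the benchmark notebook.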
1 change: 1 addition & 0 deletions docs/conf.py
@@ -267,6 +267,7 @@ class XCStyle(AlphaStyle):
"_build",
"Thumbs.db",
".DS_Store",
+ "notebooks/benchmarks",
"notebooks/xclim_training",
"paper/paper.md",
"**.ipynb_checkpoints",
142 changes: 142 additions & 0 deletions docs/notebooks/benchmarks/sdba_quantile.ipynb
@@ -0,0 +1,142 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from __future__ import annotations\n",
"\n",
"import time\n",
"\n",
"import numpy as np\n",
"\n",
"import xclim\n",
"from xclim import sdba\n",
"from xclim.testing import open_dataset\n",
"\n",
"ds = open_dataset(\"sdba/CanESM2_1950-2100.nc\")\n",
"tx = ds.sel(time=slice(\"1950\", \"1980\")).tasmax\n",
"kws = {\"dim\": \"time\", \"q\": np.linspace(0, 1, 50)}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tests with %%timeit (full 30 years)\n",
"\n",
"Here, `fastnanquantile` is the fastest of the following algorithms:\n",
"* `xr.DataArray.quantile`\n",
"* `nbutils.quantile`, using: \n",
" * `xclim.core.utils.nan_quantile`\n",
" * `fastnanquantile`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%timeit\n",
"tx.quantile(**kws).compute()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%timeit\n",
"sdba.nbutils.USE_FASTNANQUANTILE = False\n",
"sdba.nbutils.quantile(tx, **kws).compute()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install fastnanquantile"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%timeit\n",
"sdba.nbutils.USE_FASTNANQUANTILE = True\n",
"sdba.nbutils.quantile(tx, **kws).compute()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test computation time as a function of number of points\n",
"\n",
"For smaller numbers of time steps (<= 2000), `_sortquantile` is generally the best algorithm."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import xarray as xr\n",
"\n",
"num_tests = 500\n",
"timed = {}\n",
"# time nbutils.quantile with and without the fastnanquantile backend\n",
"\n",
"for use_fnq in [True, False]:\n",
"    sdba.nbutils.USE_FASTNANQUANTILE = use_fnq\n",
"    timed[use_fnq] = []\n",
"    # warm up (e.g. JIT compilation) so one-off costs are not timed\n",
"    sdba.nbutils.quantile(\n",
"        xr.DataArray(np.array([0, 1.5])), dim=\"dim_0\", q=np.array([0.5])\n",
"    )\n",
"    for size in np.arange(250, 2000 + 250, 250):\n",
"        da = tx.isel(time=slice(0, size))\n",
"        t0 = time.time()\n",
"        for ii in range(num_tests):\n",
"            sdba.nbutils.quantile(da, **kws).compute()\n",
"        timed[use_fnq].append([size, time.time() - t0])\n",
"\n",
"for k, lab in zip([True, False], [\"fastnanquantile\", \"xclim.core.utils.nan_quantile\"]):\n",
"    arr = np.array(timed[k])\n",
"    plt.plot(arr[:, 0], arr[:, 1] / num_tests, label=lab)\n",
"plt.legend()\n",
"plt.title(\"Quantile computation, average time vs array size, for 50 quantiles\")\n",
"plt.xlabel(\"Number of time steps in the distribution\")\n",
"plt.ylabel(\"Computation time (s)\")"
]
}
],
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
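The benchmark notebook warms up the computation before timing and averages over repeated calls. A minimal, generic sketch of that pattern follows. The helper `mean_runtime` is hypothetical, `time.perf_counter` is swapped in for `time.time` because it is the standard-library clock intended for measuring short intervals, and `np.nanquantile` stands in for `sdba.nbutils.quantile` so the snippet runs without xclim.

```python
import time
from collections.abc import Callable

import numpy as np


def mean_runtime(func: Callable[[], object], repeats: int = 20, warmup: int = 1) -> float:
    """Average wall-clock seconds per call of `func`, after `warmup` untimed calls.

    Hypothetical helper. The untimed calls absorb one-off costs such as
    numba JIT compilation, so they do not inflate the timed measurements.
    """
    for _ in range(warmup):
        func()
    t0 = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - t0) / repeats


# Time np.nanquantile (a stand-in for nbutils.quantile) for growing series lengths,
# mirroring the 250..2000 time-step sweep of the notebook.
rng = np.random.default_rng(0)
q = np.linspace(0, 1, 50)
timings = {}
for size in range(250, 2001, 250):
    data = rng.normal(size=(10, size))
    timings[size] = mean_runtime(lambda: np.nanquantile(data, q, axis=-1), repeats=5)
```

Plotting `timings` against `size` gives the same kind of "average time vs array size" curve as the last notebook cell.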
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -110,7 +110,8 @@ docs = [
"sphinxcontrib-bibtex",
"sphinxcontrib-svg2pdfconverter[Cairosvg]"
]
- all = ["xclim[dev]", "xclim[docs]"]
+ extras = ["fastnanquantile"]
+ all = ["xclim[dev]", "xclim[docs]", "xclim[extras]"]

[project.scripts]
xclim = "xclim.cli:cli"
36 changes: 36 additions & 0 deletions tests/test_sdba/test_nbutils.py
@@ -0,0 +1,36 @@
from __future__ import annotations

import numpy as np
import pytest
import xarray as xr

from xclim.sdba import nbutils as nbu


class TestQuantiles:
    @pytest.mark.parametrize("uses_dask", [True, False])
    def test_quantile(self, open_dataset, uses_dask):
        da = (
            open_dataset("sdba/CanESM2_1950-2100.nc").sel(time=slice("1950", "1955")).pr
        ).load()
        if uses_dask:
            da = da.chunk({"location": 1})
        else:
            da = da.load()
        q = np.linspace(0.1, 0.99, 50)
        out_nbu = nbu.quantile(da, q, dim="time").transpose("location", ...)
        out_xr = da.quantile(q=q, dim="time").transpose("location", ...)
        np.testing.assert_array_almost_equal(out_nbu.values, out_xr.values)

    def test_edge_cases(self, open_dataset):
        q = np.linspace(0.1, 0.99, 50)

        # only 1 non-null value
        da = xr.DataArray([1] + [np.nan] * 100, dims="dim_0")
        out_nbu = nbu.quantile(da, q, dim="dim_0")
        np.testing.assert_array_equal(out_nbu.values, np.full_like(q, 1))

        # only NANs
        da = xr.DataArray([np.nan] * 100, dims="dim_0")
        out_nbu = nbu.quantile(da, q, dim="dim_0")
        np.testing.assert_array_equal(out_nbu.values, np.full_like(q, np.nan))
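The tests above pin down two edge cases: a series with a single non-null value returns that value for every quantile, and an all-NaN series returns NaN. The sketch below extends a sort-based quantile to N-D arrays by moving the reduced axis last and reshaping to 2-D rows, loosely in the spirit of the "np reshape and apply_ufunc -> avoid stack" commit. It is an illustration under stated assumptions (NumPy's default linear interpolation, quantile axis placed last in the output), not the code merged in this PR, and `nan_quantile_along` is a hypothetical name.

```python
import numpy as np


def nan_quantile_along(arr: np.ndarray, q: np.ndarray, axis: int = -1) -> np.ndarray:
    """NaN-aware quantiles along `axis`, vectorized over all other axes.

    Hypothetical helper; not xclim's implementation. The reduced axis is moved
    last and the array is flattened to 2-D rows, so every series is sorted in a
    single np.sort call. The quantile axis is placed last in the output.
    """
    arr = np.moveaxis(np.asarray(arr, dtype=float), axis, -1)
    q = np.asarray(q, dtype=float)
    batch_shape = arr.shape[:-1]
    rows = np.sort(arr.reshape(-1, arr.shape[-1]), axis=-1)  # NaNs sort to the end
    n = np.count_nonzero(~np.isnan(rows), axis=-1)           # valid values per row
    out = np.full((rows.shape[0], q.size), np.nan)           # all-NaN rows stay NaN
    valid = n > 0
    n_valid = n[valid, None]                                 # shape (rows_valid, 1)
    pos = q[None, :] * (n_valid - 1)                         # fractional positions
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_valid - 1)
    frac = pos - lo
    r = rows[valid]
    lo_v = np.take_along_axis(r, lo, axis=-1)
    hi_v = np.take_along_axis(r, hi, axis=-1)
    out[valid] = lo_v + frac * (hi_v - lo_v)
    return out.reshape(*batch_shape, q.size)
```

With a single non-NaN value, every position q * (n - 1) is 0, so each quantile is exactly that value; rows with no valid values are skipped and keep their NaN fill, matching both edge-case tests.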
4 changes: 3 additions & 1 deletion tox.ini
@@ -113,7 +113,9 @@ passenv =
LD_LIBRARY_PATH
SKIP_NOTEBOOKS
XCLIM_*
- extras = dev
+ extras =
+     dev
+     extras: extras
deps =
upstream: -r CI/requirements_upstream.txt
sbck: pybind11