Testing data fetching via pooch (#62)
### What kind of change does this PR introduce?

* Added `pooch` as a hard dependency (this can be revisited)
* Added `xhydro/testing/helpers.py` and `xhydro/testing/registry.txt`
* Added functions:
  * `generate_registry`: Parses the data found in the package (`xhydro.testing.data`) and adds it to `registry.txt`
  * `load_registry`: Loads the installed (or a custom) registry and returns it as a dictionary
  * `populate_testing_data`: Fetches the files listed in the registry and optionally caches them at a different location (helpful for `pytest-xdist`)
* Added a pre-commit hook for validating NumPy docstrings, fixed a few
docstrings.
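
As a rough illustration of the registry workflow described above, here is a minimal standard-library sketch. It is not the code added in this PR (which relies on `pooch`): the function bodies, the `copy_cached_data` helper, and the one-`<name> <sha256>`-pair-per-line file layout are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def generate_registry(data_dir: Path, registry_file: Path) -> None:
    """Write one "<relative path> <sha256>" line per file found under data_dir."""
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{path.relative_to(data_dir).as_posix()} {digest}")
    registry_file.write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_registry(registry_file: Path) -> dict[str, str]:
    """Parse the registry file into a {filename: hash} dictionary."""
    registry = {}
    for line in registry_file.read_text(encoding="utf-8").splitlines():
        if line.strip():
            # Assumes file names contain no whitespace.
            name, digest = line.split()
            registry[name] = digest
    return registry


def copy_cached_data(registry: dict[str, str], cache_dir: Path, target_dir: Path) -> None:
    """Copy already-cached files elsewhere (hypothetical helper, e.g. one copy per test worker)."""
    for name in registry:
        destination = target_dir / name
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cache_dir / name, destination)
```

Giving each `pytest-xdist` worker its own copy means parallel tests never write to the same cached files, which is presumably why `populate_testing_data` can cache to a different location.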

### Does this PR introduce a breaking change?

Yes. Testing data management is largely handled at the environment
variable level:
* `XHYDRO_DATA_DIR` can be used to override the default cache location set for `pooch` (on *nix: `$XDG_CACHE_HOME`). It must be an absolute path.
* `XHYDRO_TESTDATA_BRANCH` can be used to override the branch that the testing data should be fetched from (default: `main`).
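
A minimal sketch of how these two variables might be resolved, using only the standard library. The function name `resolve_testdata_settings`, the `default_cache` argument, and the raw-file URL layout are assumptions for illustration; the helpers added in this PR may resolve them differently.

```python
import os
from pathlib import Path

# Assumption: raw files are served from this URL layout, one folder per branch.
URL_TEMPLATE = "https://raw.githubusercontent.com/hydrologie/xhydro-testdata/{branch}/"


def resolve_testdata_settings(default_cache: Path, environ=None) -> tuple[Path, str]:
    """Return (cache directory, base URL) after applying the environment overrides."""
    env = os.environ if environ is None else environ

    # Branch to fetch the testing data from; falls back to "main".
    branch = env.get("XHYDRO_TESTDATA_BRANCH", "main")

    # Optional cache override; relative paths are rejected.
    data_dir = env.get("XHYDRO_DATA_DIR")
    if data_dir is None:
        cache = default_cache
    else:
        cache = Path(data_dir)
        if not cache.is_absolute():
            raise ValueError("XHYDRO_DATA_DIR must be an absolute path.")

    return cache, URL_TEMPLATE.format(branch=branch)
```

Passing the environment as a mapping keeps the sketch easy to test without modifying `os.environ`.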

### Other information:
https://www.fatiando.org/pooch/latest/sample-data.html
Based on a similar approach taken for:
https://github.com/Ouranosinc/xclim-testdata

Typically, the loaded `pooch` registries are given dog names, so I
landed on `DEVEREAUX` as an appropriate name for this project.
Zeitsperre authored Jan 9, 2024
2 parents ee25ba8 + 04dcc16 commit 643c2eb
Showing 13 changed files with 306 additions and 21 deletions.
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -47,6 +47,11 @@ repos:
- id: flake8
additional_dependencies: [ 'flake8-alphabetize', 'flake8-rst-docstrings' ]
args: [ '--config=.flake8' ]
- repo: https://github.com/numpy/numpydoc
rev: v1.6.0
hooks:
- id: numpydoc-validation
exclude: 'tests|docs/conf.py'
- repo: https://github.com/keewis/blackdoc
rev: v0.3.9
hooks:
14 changes: 14 additions & 0 deletions CHANGES.rst
@@ -10,6 +10,20 @@ Contributors to this version: Trevor James Smith (:user:`Zeitsperre`), Thomas-Ch
New features and enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Added French language support to the documentation. (:issue:`53`, :pull:`55`).
* Added a new set of functions to support creating and updating `pooch` registries, caching testing datasets from `hydrologie/xhydro-testdata`, and ensuring that testing datasets can be loaded into temporary directories.
* `xhydro` is now configured to use `pooch` to download and cache testing datasets from `hydrologie/xhydro-testdata`. (:pull:`62`).

Breaking changes
^^^^^^^^^^^^^^^^
* Added `pooch` as an installation dependency. (:pull:`62`).

Internal changes
^^^^^^^^^^^^^^^^
* Added a new module for testing purposes: `xhydro.testing.helpers` with some new functions. (:pull:`62`):
* `generate_registry`: Parses data found in package (`xhydro.testing.data`), and adds it to the `registry.txt`
* `load_registry`: Loads installed (or custom) registry and returns dictionary
* `populate_testing_data`: Fetches the registry and optionally caches files at a different location (helpful for `pytest-xdist`).
* Added a `pre-commit` hook (`numpydoc`) to ensure that `numpy` docstrings are formatted correctly. (:pull:`62`).

v0.3.0 (2023-12-01)
-------------------
14 changes: 14 additions & 0 deletions CONTRIBUTING.rst
@@ -106,6 +106,14 @@ Ready to contribute? Here's how to set up ``xhydro`` for local development.
# Or, to run multiple build tests
$ tox

.. note::

Running `pytest` or `tox` will automatically fetch and cache the testing data for the package to your local cache (using the `platformdirs` library). On Linux, this is located at ``XDG_CACHE_HOME`` (usually ``~/.cache``). On Windows, this is located at ``%LOCALAPPDATA%`` (usually ``C:\Users\username\AppData\Local``). On macOS, this is located at ``~/Library/Caches``.

If for some reason you wish to cache this data elsewhere, you can set the ``XHYDRO_DATA_DIR`` environment variable to a different location before running the tests. For example, to cache the data in the current working directory, run::

$ export XHYDRO_DATA_DIR=$(pwd)/.cache

#. Commit your changes and push your branch to GitHub::

$ git add .
@@ -134,6 +142,12 @@ Ready to contribute? Here's how to set up ``xhydro`` for local development.

You will have contributed your first changes to ``xhydro``!

.. warning::

If your Pull Request relies on modifications to the testing data of `xhydro`, you will need to update the testing data repository as well. As a preliminary testing measure, the branch of the testing data can be modified at testing time (from `main`) by setting the ``XHYDRO_TESTDATA_BRANCH`` environment variable to the branch name of the ``xhydro-testdata`` repository.

Be sure to consult the ReadMe found at https://github.com/hydrologie/xhydro-testdata as well.

Pull Request Guidelines
-----------------------

2 changes: 2 additions & 0 deletions environment-dev.yml
@@ -6,6 +6,8 @@ dependencies:
# Don't forget to sync changes between environment.yml, environment-dev.yml, and pyproject.toml!
# Main packages
- numpy
- pooch >=1.8.0
- pydantic >=2.0,<2.5.3 # FIXME: Remove pin once our dependencies (xclim, xscen) support pydantic 2.5.3
- statsmodels
- xarray
- xclim >=0.45.0
2 changes: 2 additions & 0 deletions environment.yml
@@ -6,6 +6,8 @@ dependencies:
# Don't forget to sync changes between environment.yml, environment-dev.yml, and pyproject.toml!
# Main packages
- numpy
- pooch >=1.8.0
- pydantic >=2.0,<2.5.3
- statsmodels
- xarray
- xclim >=0.45.0
24 changes: 22 additions & 2 deletions pyproject.toml
@@ -37,6 +37,8 @@ dynamic = ["description", "version"]
dependencies = [
# Don't forget to sync changes between environment.yml, environment-dev.yml, and pyproject.toml!
"numpy",
"pooch>=1.8.0",
"pydantic>=2.0,<2.5.3",
"statsmodels",
"xarray",
"xclim>=0.45.0",
@@ -146,7 +148,8 @@ include = [
  "docs/make.bat",
  "tests/*.py",
  "tox.ini",
- "xhydro"
+ "xhydro",
+ "xhydro/testing/registry.txt"
]
exclude = [
"*.py[co]",
@@ -161,7 +164,8 @@ exclude = [
  "Makefile",
  "docs/_*",
  "docs/apidoc/modules.rst",
- "docs/apidoc/xhydro*.rst"
+ "docs/apidoc/xhydro*.rst",
+ "xhydro/testing/data/*"
]

[tool.isort]
@@ -178,6 +182,22 @@ warn_unused_configs = true
module = []
ignore_missing_imports = true

[tool.numpydoc_validation]
checks = [
"all", # report on all checks, except the below
"ES01",
"EX01",
"GL01",
"SA01"
]
exclude = [
# don't report on objects that match any of these regex
'\.undocumented_method$',
'\.__repr__$',
# any object starting with an underscore is a private object
'\._\w+'
]

[tool.pytest.ini_options]
addopts = [
"--verbose",
1 change: 1 addition & 0 deletions tox.ini
@@ -40,6 +40,7 @@ setenv =
PYTHONPATH = {toxinidir}
passenv =
CI
ESMFMKFILE
COVERALLS_*
GITHUB_*
extras =
17 changes: 15 additions & 2 deletions xhydro/cc.py
@@ -1,4 +1,5 @@
"""Module to compute climate change statistics using xscen functions."""
import xarray

# Special imports from xscen
from xscen import ( # FIXME: To be replaced with climatological_op once available
@@ -17,8 +18,20 @@


# FIXME: To be deleted once climatological_op is available in xscen
-def climatological_op(ds, **kwargs):
-    """Compute climatological operation.
+def climatological_op(ds: xarray.Dataset, **kwargs: dict) -> xarray.Dataset:
+    r"""Compute climatological operation.
+
+    Parameters
+    ----------
+    ds : xarray.Dataset
+        Input dataset.
+    \*\*kwargs : dict
+        Keyword arguments passed to :py:func:`xscen.aggregate.climatological_mean`.
+
+    Returns
+    -------
+    xarray.Dataset
+        Output dataset.

     Notes
     -----
31 changes: 16 additions & 15 deletions xhydro/indicators.py
@@ -64,36 +64,37 @@ def get_yearly_op(
     missing_options: Optional[dict] = None,
     interpolate_na: bool = False,
 ) -> xr.Dataset:
-    """
-    Compute yearly operations on a variable.
+    """Compute yearly operations on a variable.

     Parameters
     ----------
-    ds: xr.Dataset
+    ds : xr.Dataset
         Dataset containing the variable to compute the operation on.
-    op: str
+    op : str
         Operation to compute. One of ["max", "min", "mean", "sum"].
-    input_var: str
+    input_var : str
         Name of the input variable. Defaults to "streamflow".
-    window: int
+    window : int
         Size of the rolling window. A "mean" operation is performed on the rolling window before the call to xclim.
         This parameter cannot be used with the "sum" operation.
-    timeargs: dict, optional
+    timeargs : dict, optional
         Dictionary of time arguments for the operation.
         Keys are the name of the period that will be added to the results (e.g. "winter", "summer", "annual").
         Values are up to two dictionaries, with both being optional.
         The first is {'freq': str}, where str is a frequency supported by xarray (e.g. "YS", "AS-JAN", "AS-DEC").
         It needs to be a yearly frequency. Defaults to "AS-JAN".
-        The second is an indexer as supported by :py:func:`xclim.core.calendar.select_time`. Defaults to {}, which means the whole year.
+        The second is an indexer as supported by :py:func:`xclim.core.calendar.select_time`.
+        Defaults to {}, which means the whole year.
         See :py:func:`xclim.core.calendar.select_time` for more information.
-        Examples: {"winter": {"freq": "AS-DEC", "date_bounds": ['12-01', '02-28']}}, {"jan": {"freq": "YS", "month": 1}}, {"annual": {}}.
-    missing: str
+        Examples: {"winter": {"freq": "AS-DEC", "date_bounds": ["12-01", "02-28"]}}, {"jan": {"freq": "YS", "month": 1}}, {"annual": {}}.
+    missing : str
         How to handle missing values. One of "skip", "any", "at_least_n", "pct", "wmo".
         See :py:func:`xclim.core.missing` for more information.
-    missing_options: dict, optional
+    missing_options : dict, optional
         Dictionary of options for the missing values' method. See :py:func:`xclim.core.missing` for more information.
-    interpolate_na: bool
-        Whether to interpolate missing values before computing the operation. Only used with the "sum" operation. Defaults to False.
+    interpolate_na : bool
+        Whether to interpolate missing values before computing the operation. Only used with the "sum" operation.
+        Defaults to False.
Returns
-------
@@ -105,7 +106,6 @@ def get_yearly_op(
-----
If you want to perform a frequency analysis on a frequency that is finer than annual, simply use multiple timeargs
(e.g. 1 per month) to create multiple distinct variables.
"""
missing_options = missing_options or {}
timeargs = timeargs or {"annual": {}}
@@ -174,7 +174,8 @@ def get_yearly_op(
  and freq != "AS-DEC"
  ):
  warnings.warn(
- "The frequency is not AS-DEC, but the season indexer includes DJF. This will lead to misleading results."
+ "The frequency is not AS-DEC, but the season indexer includes DJF. "
+ "This will lead to misleading results."
)
elif (
"doy_bounds" in indexer.keys()
5 changes: 4 additions & 1 deletion xhydro/testing/__init__.py
@@ -1 +1,4 @@
- """Helpers for testing."""
+ """Testing utilities and helper functions."""
+
+ from .helpers import *
+ from .utils import *