Adding first gis module to perform geospatial operations, notebook builds #61

sebastienlanglois · 2023-12-19T00:12:58Z

Pull Request Checklist:

- This PR addresses an already opened issue (for bug fixes / features)
  - This PR fixes Geospatial operations for hydrological analysis #60
- (If applicable) Documentation has been added / updated (for bug fixes / features).
- (If applicable) Tests have been added.
- CHANGES.rst has been updated (with summary of main changes).
  - Link to issue (:issue:number) and pull request (:pull:number) has been added.

What kind of change does this PR introduce?

This PR adds a GIS module for geospatial operations that are common in hydrology, such as watershed delineation and watershed property extraction. It adapts the work done in ravenpy and adds some new functionality.

Watershed Delineation

- Support concurrent delineation of multiple watersheds.
- Enable access to official watershed polygons (shapefiles/geojson/geoparquet) from authoritative sources (DEH, HYDAT, USGS, HQ, etc.). This was implemented collaboratively with xdatasets.

Physiographic Variable (or others) Extraction

- Support simultaneous extraction of physiographic variables across multiple watersheds.
- Facilitate the extraction of variables present in STAC catalogs (e.g., Planetary Computer).
- Implement extraction that uses pixel weighting rather than an "all_touched" approach, because this choice can significantly affect the final results. This was implemented collaboratively with xdatasets.
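The following minimal NumPy sketch shows why the weighting choice matters. It compares an "all_touched" mean with a mean weighted by pixel overlap. It assumes that the fraction of each pixel covered by the watershed polygon is already known, and it does not show how those fractions are computed, which is the hard part. The grid values and function names are hypothetical and are not the xhydro API.

```python
import numpy as np


def all_touched_mean(values: np.ndarray, coverage: np.ndarray) -> float:
    """Average every pixel that touches the polygon, however small the overlap."""
    return float(values[coverage > 0].mean())


def weighted_mean(values: np.ndarray, coverage: np.ndarray) -> float:
    """Average where each pixel counts in proportion to its overlap with the polygon.

    Assumes equal-area pixels; on a lat/lon grid the weights would also need
    to include each cell's area.
    """
    return float((values * coverage).sum() / coverage.sum())


# Hypothetical 3x3 elevation grid (m) and the fraction of each pixel lying
# inside a watershed polygon (1 = fully inside, 0.05 = barely clipped).
elevation = np.array([
    [100.0, 200.0, 900.0],
    [150.0, 250.0, 950.0],
    [120.0, 220.0, 980.0],
])
coverage = np.array([
    [1.0, 1.0, 0.05],
    [1.0, 1.0, 0.05],
    [1.0, 1.0, 0.05],
])

print(round(all_touched_mean(elevation, coverage), 1))  # 430.0
print(round(weighted_mean(elevation, coverage), 1))     # 192.1
```

In this example, the barely clipped high-elevation column counts fully under "all_touched". This more than doubles the mean elevation.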

Does this PR introduce a breaking change?

No

Other information:

This PR also integrates the changes from #65 and #68

sebastienlanglois · 2023-12-19T13:47:00Z

@Zeitsperre @TC-FF
I would like to prepare the .po file for the documentation I'll be adding with this PR. Is there any documentation about the process?

Zeitsperre · 2023-12-19T15:41:28Z

@sebastienlanglois

I mentioned this to @TC-FF in my other Pull Request, but I haven't set things up to translate the functions and classes of the library. The reason is that docstrings change much more often than the rest of the documentation, so the po files would need to be manually touched up on every pull request that changes code.

From a quick look at the documentation on po files (https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html), it doesn't appear that the #: comments require a message to be on a particular line (which is good). However, if we want translated docstrings, we will likely need to document how to produce them.

@TC-FF Would you want to add a new section in CONTRIBUTING.rst explaining how to do this (with poedit)? I'll open an issue to discuss committing the docstring (autodoc) files.

sebastienlanglois · 2023-12-19T16:51:54Z

Sorry, I wasn't very clear: I meant the documentation for the examples based on the notebooks. I agree that translating docstrings/classes/functions is probably too difficult to maintain. For notebook and .rst translations, though, I agree that a new section in CONTRIBUTING.rst would be useful.

sebastienlanglois · 2023-12-20T15:14:05Z

@Zeitsperre
I need xagg in the environment that runs the notebooks for the RTD documentation. I've added the library to the environment-dev.yml file, but I'm unsure where to put it in pyproject.toml.

Is it part of dev = [...] or docs = [ ... ]? Thanks!

environment-dev.yml

Zeitsperre · 2023-12-20T16:15:40Z

So we listed xagg in the extra recipe of xdatasets. If the dependency is needed for extra functionality of xdatasets within the xhydro library, I'd add it to the pyproject.toml like so:

```toml
[project.optional-dependencies]
...
extra = ["xdatasets[extra]>=0.3.1"]
```

If it's exclusively for an xagg function to show off in a notebook, I'd add xagg to the docs recipe:

```toml
[project.optional-dependencies]
docs = [
  ...
  "xagg"
]
```

sebastienlanglois · 2023-12-20T16:24:05Z

It is indeed for the extra functionality of xdatasets, but not within the xhydro library. In the notebook example, we first call xdatasets[extra] and then call xhydro on the results. Would this cover that use case?

```toml
[project.optional-dependencies]
docs = [
  ...
  "xagg"
]
```

Zeitsperre · 2023-12-20T17:45:28Z

That looks good to me!

sebastienlanglois · 2023-12-20T18:09:53Z

@Zeitsperre
Another question: the RTD build is failing because extracting and processing the data from xdatasets uses too much memory. I could reduce the amount of data and computation involved, but I would prefer to show a real and useful example in the documentation.

As an alternative, I was thinking that I could:

- Convert my code cell to markdown
- Precalculate and store the query's results from xdatasets, perhaps in xhydro-testdata or in cloud storage
- Use a hidden cell that would fetch the precalculated data

This would reduce memory consumption during the RTD build and hide the fact that the data is precalculated. We assume that users will have more than enough memory (8 GB or more) when they run the notebook in their own environment.

As I'm not that familiar with the RTD build, do you know if there is a way (cell tag, metadata, etc.) to hide a cell when building the docs while still allowing it to execute? Or do you have a better idea than this? Thanks!

N.B.: The same xdatasets query runs fine on a GitHub runner (7 GB RAM). I'm curious to know the specs of the RTD runners.
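nbsphinx documents a per-cell metadata entry, "nbsphinx": "hidden". A cell marked this way still runs during the docs build but is left out of the rendered page, which matches the hidden-cell idea above. The following minimal sketch applies that entry to every code cell whose source starts with a marker comment. The marker string, the helper name, and the load_precalculated_results call in the sample cell are hypothetical conventions, not something nbsphinx or xhydro defines.

```python
def hide_marked_cells(notebook: dict, marker: str = "# docs: hidden") -> int:
    """Set "nbsphinx": "hidden" on code cells whose source starts with `marker`."""
    hidden = 0
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows the source to be either a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code" and source.lstrip().startswith(marker):
            cell.setdefault("metadata", {})["nbsphinx"] = "hidden"
            hidden += 1
    return hidden


# Hypothetical notebook: the query is shown as markdown, and the hidden cell
# fetches precalculated results instead of running the expensive query.
nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "Query shown as text only."},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "source": ["# docs: hidden\n", "ds = load_precalculated_results()\n"]},
    ]
}
print(hide_marked_cells(nb))  # 1
```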

Zeitsperre · 2023-12-20T18:42:35Z

@sebastienlanglois

I think the total memory allocation for each builder is around 4GB (but it can be bumped up if you send the ReadTheDocs maintainers a message from the admin page of xhydro on RTD with some justifications). Given that it's a completely free service, I don't like the idea of asking for more memory too often (we've been granted more memory for a few projects at Ouranos).

If it isn't important to run the notebook on every change to xhydro, one thing you could do is set it so that that particular notebook is not run on ReadTheDocs. This would be as simple as doing the following:

- Set the nbstripout pre-commit hook (https://github.com/kynan/nbstripout) to keep the outputs of your particular notebook, and set nbsphinx_execute to auto.
  - This means that the notebook is never run (because it always has outputs), but because the other notebooks are stripped of outputs, they are always run.

This approach is a bit problematic since it means that the notebook is never checked at all. An alternative approach could look like:

- Do the same as above, but add a notebook-checking build on GitHub that runs all notebooks, including the xagg-dependent one. Notebook failures will block Pull Request merges.
  - If the notebooks are relatively stable, this shouldn't often cause problems for contributors.
- Disable notebook execution entirely on ReadTheDocs, and have a GitHub Workflow that runs all notebooks and commits their outputs every time a merge occurs on the main branch.
  - Fancier / more complex, but leverages GitHub's more generous memory/processing limits to reduce strain on ReadTheDocs.
  - Could be made simpler if the notebooks are run on a cron-based schedule (e.g. every Sunday night) or when preparing a new release (also complex, but possible).
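The first two options depend on how nbsphinx_execute = "auto" treats stored outputs. In that mode, nbsphinx executes only notebooks that have no stored outputs, so a notebook whose outputs survive nbstripout is rendered as-is. A notebook-level "nbsphinx": {"execute": ...} metadata entry overrides the conf.py setting. The following simplified model encodes that rule. should_execute is a hypothetical helper, not part of nbsphinx, and it ignores edge cases that nbsphinx itself may handle differently.

```python
def should_execute(notebook: dict, nbsphinx_execute: str = "auto") -> bool:
    """Approximate nbsphinx's decision to run a notebook during the docs build."""
    # A notebook-level "nbsphinx": {"execute": ...} entry overrides conf.py.
    mode = notebook.get("metadata", {}).get("nbsphinx", {}).get("execute", nbsphinx_execute)
    if mode == "always":
        return True
    if mode == "never":
        return False
    # "auto": run only when no code cell has stored outputs.
    return not any(
        cell.get("cell_type") == "code" and cell.get("outputs")
        for cell in notebook.get("cells", [])
    )


stripped = {"metadata": {}, "cells": [{"cell_type": "code", "source": "1 + 1", "outputs": []}]}
kept = {
    "metadata": {},
    "cells": [{"cell_type": "code", "source": "1 + 1",
               "outputs": [{"output_type": "execute_result", "data": {"text/plain": "2"}}]}],
}

print(should_execute(stripped))  # True: outputs were stripped, so it is executed
print(should_execute(kept))      # False: stored outputs are rendered as-is
```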

There are a lot of ways to go about doing this. I've implemented similar approaches in xclim (see: https://github.com/Ouranosinc/xclim/blob/e678308f2c5cde75a3c63cddcab7835dab1e422e/docs/conf.py#L206). Curious to hear what you think of these.

sebastienlanglois · 2023-12-20T21:20:40Z

@Zeitsperre
Thanks, those are some really great suggestions!

Looking ahead, we might come across other memory-intensive notebooks, in which case switching to GitHub Workflows would make sense. However, since this might involve quite a bit of work, we could start with your second suggestion for now:

pre-commit hook for nbstripout + a notebook-checking build on GitHub.

So, if I understand correctly, the notebook-checking build on GitHub goes through all the notebooks. Does this mean that the notebooks whose outputs are stripped by nbstripout (for now, all notebooks except the xagg-dependent one) would run twice: once in the GitHub checking build and again during the ReadTheDocs build (assuming the GitHub check completed successfully)?

sebastienlanglois · 2023-12-22T00:28:35Z

@Zeitsperre
For now, I have disabled nbsphinx execution for this notebook only (the GIS notebook), following this approach:
https://nbsphinx.readthedocs.io/en/0.9.3/never-execute.html
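The approach on that page comes down to a notebook-level metadata entry, "nbsphinx": {"execute": "never"}. The following minimal sketch sets it using only the standard library. It assumes the notebook is a regular nbformat 4 JSON file. The helper name and the example path are hypothetical.

```python
import json
from pathlib import Path


def disable_nbsphinx_execution(path: Path) -> None:
    """Add "nbsphinx": {"execute": "never"} to a notebook's top-level metadata."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    nb.setdefault("metadata", {}).setdefault("nbsphinx", {})["execute"] = "never"
    # Mimic Jupyter's on-disk formatting (1-space indent, sorted keys) to keep diffs small.
    path.write_text(
        json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


if __name__ == "__main__":
    # Hypothetical location of the GIS notebook; adjust to the real path.
    target = Path("docs/notebooks/gis.ipynb")
    if target.exists():
        disable_nbsphinx_execution(target)
```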

When we're back from the holidays, we can set this up properly!

RondeauG

The comments I left earlier are still valid, but they are minor and should be easy to address. @sebastienlanglois, when do you think you'll have time to look at them?

sebastienlanglois · 2024-03-04T23:14:51Z

I'm using the March break to finish this! It will be done by the end of the week, without fail!

sebastienlanglois · 2024-03-11T17:53:33Z

@Zeitsperre @RondeauG
I've updated this branch based on @RondeauG's review. It should now be ready to merge, but for some reason the notebook test suite is failing and I don't know why. The GIS notebook runs correctly on my local machine (Python 3.10) in a conda environment created from environment-dev.yml.

Zeitsperre · 2024-03-11T18:18:17Z

@sebastienlanglois The kernelspec is a weird artifact that notebooks use. I have a hook in a few other projects that simply removes that field (since if it's wrong, that causes errors; no problem if missing). I'll see if I can find the config.
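The following sketch shows one way such a hook could work, using only the standard library. It drops metadata.kernelspec from each notebook passed on the command line and exits with a non-zero code when it changed a file, which is how pre-commit hooks conventionally signal modifications. The script and function names are hypothetical, and it is not necessarily the hook used in those other projects.

```python
import json
import sys
from pathlib import Path


def strip_kernelspec(notebook: dict) -> bool:
    """Remove metadata.kernelspec in place; return True if something was removed."""
    return notebook.get("metadata", {}).pop("kernelspec", None) is not None


def main(paths: list[str]) -> int:
    changed = 0
    for name in paths:
        path = Path(name)
        nb = json.loads(path.read_text(encoding="utf-8"))
        if strip_kernelspec(nb):
            path.write_text(
                json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n",
                encoding="utf-8",
            )
            print(f"removed kernelspec from {path}")
            changed += 1
    # A non-zero exit code tells pre-commit that files were modified.
    return 1 if changed else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # pre-commit passes the staged notebook paths as arguments.
    sys.exit(main(sys.argv[1:]))
```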

Zeitsperre · 2024-03-11T18:32:27Z

@sebastienlanglois

The last thing we need to fix here is that the notebooks need to be re-run with their outputs saved. ReadTheDocs can't handle running the notebook examples, but if they are saved to the notebooks, they can parse and display them at least.

sebastienlanglois · 2024-03-11T18:46:30Z

Thanks! That's what I figured from our previous discussions so I was surprised that it didn't work out. Now I understand that the culprit was the kernelspec property in the notebook. I will push the notebook with outputs in a few minutes and hopefully, we are done with this PR!

RondeauG

A few very minor suggestions, but it looks ready to me! Well done :)

RondeauG · 2024-03-12T13:23:56Z

Makefile

```diff
+ESMF_VERSION := $(shell cat $(ESMFMKFILE) | grep "ESMF_VERSION_STRING=" | awk -F= '{print $$2}' | tr -d "'")
+install-esmpy: clean ## install esmpy from git based on installed ESMF_VERSION
+	pip install git+https://github.com/esmf-org/esmf.git@v$(ESMF_VERSION)\#subdirectory=src/addon/esmpy
+
+install: install-esmpy ## install the package to the active Python's site-packages
```
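This recipe reads ESMF_VERSION_STRING from the file that $ESMFMKFILE points to (using grep, awk, and tr). It then pip-installs the matching esmpy tag from the ESMF repository. The following sketch is a rough Python equivalent that avoids grep, awk, and tr, for example for Windows shells. It assumes the entry looks like ESMF_VERSION_STRING=8.6.0, optionally quoted. The function names are hypothetical.

```python
import os


def esmf_version_from_mkfile(text: str) -> str:
    """Mimic: grep "ESMF_VERSION_STRING=" | awk -F= '{print $2}' | tr -d "'"."""
    for line in text.splitlines():
        if "ESMF_VERSION_STRING=" in line:
            # awk's $2 is the second "="-separated field; tr -d "'" drops quotes.
            # Unlike grep, only the first matching line is used here.
            return line.split("=")[1].replace("'", "").strip()
    raise ValueError("ESMF_VERSION_STRING not found in esmf.mk")


def esmpy_pip_spec(version: str) -> str:
    """Build the pip requirement used by the install-esmpy target."""
    # The Makefile escapes '#' as '\#'; Python strings need no escaping.
    return f"git+https://github.com/esmf-org/esmf.git@v{version}#subdirectory=src/addon/esmpy"


if __name__ == "__main__":
    mkfile = os.environ.get("ESMFMKFILE")
    if mkfile:
        with open(mkfile, encoding="utf-8") as f:
            # Pass the printed spec to `python -m pip install`.
            print(esmpy_pip_spec(esmf_version_from_mkfile(f.read())))
```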


@Zeitsperre We can leave this here as an additional safeguard, but this should not be required anymore (yay!)

If xESMF is badly installed or initialized, xscen will simply deactivate the functions that use it. xhydro/xdatasets do not use it at all, for now.

esmf/esmpy 8.6.0 has been released and should have fixed that bug too.

If we want to, we can remove the install-esmpy step from install. This just means that we don't install the GitHub repo version when we run $ make install. Disabling it could benefit users on systems that don't have grep, awk, and tr installed (probably PowerShell and Command Prompt).

Let's leave that for another PR.

Just wanted to point out that xdatasets has xesmf as a dependency because of xagg. I had initially created a spinoff xagg library with no xesmf dependency (because xesmf is used for regridding but not for spatial averaging in xagg, which is the only thing required for xdatasets) but we now have xagg as a dependency.

xesmf is required for the notebooks but might not be for the tests.

RondeauG · 2024-03-12T16:53:59Z

@sebastienlanglois There was a mix-up in the test triggers, but it's fixed now. You can merge whenever you're ready!

sebastienlanglois · 2024-03-22T20:40:48Z

@Zeitsperre @RondeauG I know this PR is closed, but I wanted to point out that xagg has made xesmf an optional dependency (only required for regridding). As a result, installing xagg no longer brings in xesmf as a dependency of xdatasets.

adding first gis module to perform geospatial operations required for hydrological analysis

ec632c4

add documentation

bd38932

sebastienlanglois added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels Dec 19, 2023

sebastienlanglois added 2 commits December 18, 2023 19:36

add backward compatibility for type hints

629572c

change kernel in specs

a9c8fb7

sebastienlanglois and others added 4 commits December 20, 2023 09:55

add climatology extraction from watershed boundaries

99fc6b0

Merge branch 'main' of https://github.com/hydrologie/xhydro into basin-delineation

96d53be

correct kernel name

c761419

[pre-commit.ci] auto fixes from pre-commit.com hooks

0849dda

for more information, see https://pre-commit.ci

Zeitsperre reviewed Dec 20, 2023

environment-dev.yml (outdated, resolved)

sebastienlanglois added 2 commits December 20, 2023 11:47

complete climatology example and add correct dependencies

b10b342

Merge branch 'basin-delineation' of https://github.com/hydrologie/xhydro into basin-delineation

f8ef52b

update gis notebook with no execution from nbsphinx at the notebook level

6207fe1

sebastienlanglois added 4 commits December 21, 2023 19:42

correct typos

f59e9b6

precalculate map

7baaccf

test again readthedocs build

cbf3cb4

temporary fix for pydantic dependency

cb15134

RondeauG requested changes Mar 4, 2024

Zeitsperre and others added 5 commits March 8, 2024 11:13

Merge branch 'main' into basin-delineation

038ac20

update gis notebook

1128b72

Merge branch 'basin-delineation' of https://github.com/hydrologie/xhydro into basin-delineation

cec771a

updated branch after review

b3639fb

Merge branch 'main' into basin-delineation

f31919b

sebastienlanglois added 2 commits March 11, 2024 13:56

try pipeline with striped outputs in gis notebook

ed2e8cf

update kernelspec

4d4ba4c

remove kernelspec field

32b07b5

sebastienlanglois added 2 commits March 11, 2024 14:46

remove kernelspec and add outputs to notebook

750b5d5

Merge branch 'basin-delineation' of https://github.com/hydrologie/xhydro into basin-delineation

a0eb88a

Zeitsperre requested a review from RondeauG March 11, 2024 18:59

RondeauG approved these changes Mar 12, 2024

github-actions bot added the approved (Approved for additional tests) label Mar 12, 2024

sebastienlanglois and others added 4 commits March 12, 2024 11:31

correct gravelius property name

36901a6

Co-authored-by: RondeauG <38501935+RondeauG@users.noreply.github.com>

Update environment-dev.yml to add missing library planetary-computer

12efda8

Co-authored-by: RondeauG <38501935+RondeauG@users.noreply.github.com>

update order of notebooks

d90cc27

Co-authored-by: RondeauG <38501935+RondeauG@users.noreply.github.com>

Update .github/workflows/main.yml to add python 3.12

8043acb

Co-authored-by: RondeauG <38501935+RondeauG@users.noreply.github.com>

Zeitsperre approved these changes Mar 12, 2024

Zeitsperre added 2 commits March 12, 2024 12:09

fix triggers

0130c05

harden runners

6bf6c62

sebastienlanglois merged commit 376684f into main Mar 12, 2024
24 checks passed

sebastienlanglois deleted the basin-delineation branch March 12, 2024 17:25
