Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpin Xarray: dataArray -> dataArray.variable #492

Merged
merged 13 commits into from
Nov 10, 2023

Conversation

cyschneck
Copy link
Contributor

@cyschneck cyschneck commented Oct 20, 2023

PR Summary

Closes #381

PR Checklist

General

  • Make an issue if one doesn't already exist
  • Link the issue this PR resolves by adding closes #XXX to the PR description where XXX is the number of the issue.
  • Add a brief summary of changes to docs/release-notes.rst in a relevant section for the next unreleased release. Possible sections include: Documentation, New Features, Bug Fixes, Internal Changes, Breaking Changes/Deprecated
  • Add appropriate labels to this PR
  • Make your changes in a forked repository rather than directly in this repo
  • Open this PR as a draft if it is not ready for review
  • Convert this PR from a draft to a full PR before requesting reviewers
  • Passes precommit. To set up on your local, run pre-commit install from the top level of the repository. To manually run pre-commits, use pre-commit run --all-files and re-add any changed files before committing again and pushing.
  • If needed, squash and merge PR commits into a single commit to clean up commit history

@cyschneck cyschneck self-assigned this Oct 20, 2023
@cyschneck
Copy link
Contributor Author

cyschneck commented Oct 24, 2023

Test errors appear to arise in most recent xarray (2023.10.1) interaction with existing tests

/xarray/core/dataarray.py:869: in __setitem__

self.variable[key] = value

Calls: /xarray/core/variable.py:880: in __setitem__

value = value.set_dims(dims).data

Throws: ValueError: new dimensions ('sigma',) must be a superset of existing dimensions ('dim_0',)

If dims are omitted from xarray arguments dimension names are taken from coords (if possible) and otherwise default to ['dim_0', ... 'dim_n'].

Issue appears to be arising from interpolation.py function interp_sigma_to_hybrid. By specify the dims when the DataArray is created, "sigma" is used instead of "dim_0"

@cyschneck
Copy link
Contributor Author

cyschneck commented Nov 1, 2023

Current test failure: test/test_climatologies.py::Test_Month_to_Season::test_month_to_season_custom_time_coordinate[ds4-dataset1-None-Tair-expected1:350

Throws the error ValueError: unit abbreviation w/o a number, but the actual issue is loffset in climatologies.py:303 when xarray resample is called

means = data_filter.resample({time_coord_name: quarter}, loffset='MS').mean(keep_attrs=keep_attrs)

ValueError originates in xarray resample_cftime.py:155 where xarray calls pandas to_timedelta with loffset

labels = labels + pd.to_timedelta(self.loffset)

loffset is accepted since CFTimeGrouper:first_items expected loffset to be a a string, datetime.timedelta, or instance of a BaseCFTimeOffset. As a string, loffset="MS" with pandas function ``pd.to_timedelta("MS"). However, pandas has dropped loffset string support since pandas version 1.1.0. The ValueError being thrown is a deceptive error due to when pandas attempts to convert the loffset "MS" to a timedelta, but as loffset is deprecated within pandas when "MS" is attempted to be converted to TimeDelta without units

result = Timedelta(r="MS", unit=None)

It throws an invalid ValueError. As an invalid Timedelta string, the error can be replicated as pandas.to_timedelta("MS")

>>> import pandas as pd
>>> pd.to_timedelta("MS")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda3/envs/geocat_comp_build/lib/python3.11/site-packages/pandas/core/tools/timedeltas.py", line 223, in to_timedelta
    return _coerce_scalar_to_timedelta_type(arg, unit=unit, errors=errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/geocat_comp_build/lib/python3.11/site-packages/pandas/core/tools/timedeltas.py", line 233, in _coerce_scalar_to_timedelta_type
    result = Timedelta(r, unit)
             ^^^^^^^^^^^^^^^^^^
  File "timedeltas.pyx", line 1820, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "timedeltas.pyx", line 653, in pandas._libs.tslibs.timedeltas.parse_timedelta_string
ValueError: unit abbreviation w/o a number

The current issue appears to arise as a result of xarray not entirely deprecating loffset (as of version 2023.10.1 with pydata/xarray#7444) so invalid arguments are internally passed to pandas (as of version 2.1.2) which has deprecated loffset

@kafitzgerald
Copy link
Contributor

kafitzgerald commented Nov 1, 2023

Looks like handling of the time offset arithmetic (as needed for pydata/xarray#7596) is different for cftime.

Probably why xr.coding.cftime_offsets.to_offset exists (though this is also private API) and why pydata/xarray#5687 would be nice.

@kafitzgerald
Copy link
Contributor

kafitzgerald commented Nov 1, 2023

The current issue appears to arise as a result of xarray not entirely deprecating loffset (as of version 2023.10.1 with pydata/xarray#7444) so invalid arguments are internally passed to pandas (as of version 2.1.2) which has deprecated loffset

I think this is a little more nuanced.

The xarray fix to postpone the deprecation of the loffset argument should still work for most cases, but a later PR introduced a line that converts the loffset argument to a pandas timedelta (via pd.to_timedelta) and that fails since "MS" is invalid input.

Unfortunately, this means that the xarray resample/loffset fix no longer works for us and as mentioned above and shown in the CI failures the time offset arithmetic is a little more complicated in this case.

I'm not sure the nuance here really changes a lot, but I might go log an issue on xarray in case someone else runs into this.

@cyschneck cyschneck marked this pull request as ready for review November 2, 2023 16:24
@cyschneck cyschneck added bug Something isn't working support Support request opened by outside user/collaborator labels Nov 2, 2023
@cyschneck cyschneck changed the title Private API Xarray: dataArray -> dataArray.variable Unpin Xarray: dataArray -> dataArray.variable Nov 2, 2023
@cyschneck cyschneck requested review from anissa111 and jukent November 2, 2023 16:57
Copy link
Member

@anissa111 anissa111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget the pin in setup.cfg!

EDIT: also, why are there changes to interpolation.py here? Are those intentional?

@kafitzgerald
Copy link
Contributor

Having missed some version pins myself recently... I think this version pin is also in some additional *.yml files and the requirements.txt in addition to the setup.cfg

@cyschneck
Copy link
Contributor Author

The changes to interpolation.py are to fix the issue with the interpolation tests throwing

ValueError: new dimensions ('sigma',) must be a superset of existing dimensions ('dim_0',)

Copy link
Contributor

@kafitzgerald kafitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here all look reasonable to me (introducing more private API from xarray is non-ideal, but should be less of a risk with expanded upstream testing and hopefully we do something different in the long run).

I do think we should better document the change in interpolation.py and the loffset change. At least including them in the PR description and changing the title and release notes blurb to something more inclusive of the changes here.

Probably a good idea to open another issue about Xarray private API as well since we'll be introducing more rather than removing it.

Will be nice to see this get merged though!

docs/release-notes.rst Outdated Show resolved Hide resolved
geocat/comp/interpolation.py Show resolved Hide resolved
geocat/comp/climatologies.py Show resolved Hide resolved
@cyschneck
Copy link
Contributor Author

Added an issue to the backlog (#508) to come back to these changes in the future

@@ -23,12 +23,14 @@ Documentation
Bug Fixes
^^^^^^^^^
* Fix Python version in upstream CI by `Philip Chmielowiec`_ in (:pr:`436`)
* Unpin xarray in enviroment builds with changes to interpolation.py (specify dims in xr.DataArray) and climatologies.py (replace loffset with to_offset) by `Cora Schneck`_ in (:pr:`381`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this still needs to get bumped up to a v2023.11.0 (unreleased) section.

Otherwise, things look good to me!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seconded, I'll approve after this, too!

@kafitzgerald kafitzgerald self-requested a review November 8, 2023 16:42
Copy link
Contributor

@kafitzgerald kafitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@cyschneck cyschneck requested a review from anissa111 November 8, 2023 16:46
Copy link
Member

@anissa111 anissa111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed the re-review request, this looks good!

@anissa111 anissa111 merged commit e5b5635 into NCAR:main Nov 10, 2023
8 checks passed
@cyschneck cyschneck deleted the xr_dataArray_381 branch November 17, 2023 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working support Support request opened by outside user/collaborator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove Xarray Private API and remove pinning
5 participants