Add weight threshold option for spatial averaging #672

pochedls · 2024-06-28T19:58:43Z

Description

This PR adds an optional argument that requires a minimum fraction of data be available to perform a spatial average. The initial PR is for spatial averaging only (it would need to be expanded to handle temporal averaging).
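As a rough illustration of the idea described above, here is a plain-Python sketch (not the xarray-based implementation in this PR). The function name `thresholded_weighted_mean`, the 1-D list inputs, and the choice to mask when the available fraction is strictly below the threshold are all assumptions made for the example.

```python
import math
from typing import Optional, Sequence


def thresholded_weighted_mean(
    values: Sequence[float],
    weights: Sequence[float],
    required_weight: Optional[float] = None,
) -> float:
    """Weighted mean that returns NaN when too little weight is available.

    Hypothetical helper for illustration only; it is not part of xcdat.
    """
    # Treat an unset threshold as 0.0 (no minimum requirement).
    threshold = 0.0 if required_weight is None else required_weight

    # Zero out the weight of every cell whose value is missing.
    masked = [0.0 if math.isnan(v) else w for v, w in zip(values, weights)]

    total_weight = sum(weights)
    available_weight = sum(masked)
    if available_weight == 0.0:
        return math.nan

    # Fraction of the total weight that is backed by non-missing data.
    fraction = available_weight / total_weight
    if fraction < threshold:  # assumption: strictly-below means "masked"
        return math.nan

    weighted_sum = sum(
        v * w for v, w in zip(values, masked) if not math.isnan(v)
    )
    return weighted_sum / available_weight


values = [1.0, math.nan, 3.0, math.nan]
weights = [1.0, 1.0, 1.0, 1.0]

print(thresholded_weighted_mean(values, weights))
print(thresholded_weighted_mean(values, weights, required_weight=0.75))
```

With half of the weight missing, the first call still averages the two valid cells, while the second call returns NaN because only 50% of the weight is available and 75% is required.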

Closes [Enhancement]: Add weight threshold option for averaging operations #531

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules

If applicable:

I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass with my changes (locally and CI/CD build)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

codecov · 2024-06-28T20:02:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (1d57e25) to head (9967127).

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #672   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           15        15           
  Lines         1544      1558   +14     
=========================================
+ Hits          1544      1558   +14


pochedls · 2024-06-28T20:01:29Z

xcdat/spatial.py

+ # zero out cells with missing values in data_var
+ weights = xr.where(~np.isnan(data_var), weights, 0)
+ # sum all weights (including zero for missing values)
+ weight_sum_masked = weights.sum(dim=dim) # type: ignore


Needed to add type: ignore for unclear reasons...

I think dim is expected to be a string but we pass a Hashable (dict key).

# type: ignore is fine here, or set dim=str(dim) so that the comment is no longer needed.

pochedls · 2024-06-28T20:02:15Z

xcdat/spatial.py

@@ -716,6 +725,9 @@ def _averager(self, data_var: xr.DataArray, axis: List[SpatialAxis]):
 Data variable inside a Dataset.
 axis : List[SpatialAxis]
 List of axis dimensions to average over.
+ required_weight : optional, float


Is required_weight a good parameter name? What should the default be? It is currently zero to keep the default behavior backwards compatible.

Maybe required_weight_pct? required_weight sounds like a boolean parameter.

I think 0 is the appropriate default value.

Actually I like None as a default value to indicate the arg is not set by default.

pochedls · 2024-06-28T20:02:30Z

xcdat/spatial.py

@@ -729,11 +741,48 @@ def _averager(self, data_var: xr.DataArray, axis: List[SpatialAxis]):
 """
 weights = self._weights.fillna(0)


I don't think this does anything.

I recall we ran into issues with Xarray's weighted mean API when weights contain np.nan. The notes above it say:

xcdat/xcdat/spatial.py

Lines 727 to 728 in cdae826

``weights`` must be a DataArray and cannot contain missing values.

Missing values are replaced with 0 using ``weights.fillna(0)``.
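To illustrate why missing weights are replaced with zero, here is a plain-Python sketch (it does not use xarray, and `weighted_mean` is a hypothetical helper). It only shows how a NaN weight propagates through an ordinary weighted mean, and how filling it with 0 drops that cell from the result instead.

```python
import math
from typing import Sequence


def weighted_mean(vals: Sequence[float], wts: Sequence[float]) -> float:
    """Ordinary weighted mean with no special handling of NaN."""
    return sum(v * w for v, w in zip(vals, wts)) / sum(wts)


values = [10.0, 20.0, 30.0]
weights = [1.0, math.nan, 3.0]

# A single NaN weight poisons both the numerator and the denominator.
naive = weighted_mean(values, weights)

# Replacing NaN weights with 0 removes that cell from the average.
filled = [0.0 if math.isnan(w) else w for w in weights]
cleaned = weighted_mean(values, filled)

print(naive)    # nan
print(cleaned)  # 25.0
```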

pochedls · 2024-06-28T20:12:14Z

tests/test_spatial.py

+ def test_spatial_average_with_required_weight_as_None(self):
+ ds = self.ds.copy()
+
+ result = ds.spatial.average(
+ "ts",
+ axis=["X", "Y"],
+ lat_bounds=(-5.0, 5),
+ lon_bounds=(-170, -120.1),
+ required_weight=None,
+ )
+
+ expected = self.ds.copy()
+ expected["ts"] = xr.DataArray(
+ data=np.array([2.25, 1.0, 1.0]),
+ coords={"time": expected.time},
+ dims="time",
+ )
+
+ xr.testing.assert_allclose(result, expected)


This test is related to the code chunk in _averager that deals with the case where required_weight=None. I don't know why it would ever be None (it is supposed to be Optional[float] = 0., so I don't know how it could be None). If you don't have those conditionals, you get an error (maybe from mypy) about comparing None to int | float on the lines that check whether required_weight > 0.

So if we could ensure that required_weight is always of type float we could remove this test and other conditionals in _averager. That would be ideal...

Using Optional indicates that the argument can also be None. You can just specify float as the type annotation if None is not expected.

I like None as the default value
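The concern about comparing None to a number can be seen in a small standalone sketch. The helper name `normalize_required_weight` is hypothetical; the point is only that an `Optional[float]` argument has to be narrowed to a float before any `>` comparison, both for the type checker and at runtime.

```python
from typing import Optional


def normalize_required_weight(required_weight: Optional[float] = None) -> float:
    """Map an unset threshold (None) to 0.0 so later comparisons are safe.

    Hypothetical helper for illustration only.
    """
    return 0.0 if required_weight is None else float(required_weight)


# Without the guard, Python 3 refuses to order None against a float.
try:
    None > 0.0  # type: ignore[operator]
except TypeError as err:
    print(f"unguarded comparison fails: {err}")

print(normalize_required_weight() > 0.0)     # False
print(normalize_required_weight(0.8) > 0.0)  # True
```

If None were never a valid input, annotating the parameter as plain `float` would let the type checker reject None at the call site, and the guard could be dropped.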

tomvothecoder · 2024-07-30T21:29:24Z

I'm going to open a separate PR to address this enhancement with the temporal APIs because those require more work and we can merge this earlier when it is ready.

tomvothecoder · 2024-07-30T22:40:00Z

xcdat/spatial.py

+ # ensure required weight is between 0 and 1
+ if required_weight is None:
+ required_weight = 0.0
+
+ if required_weight < 0.0:
+ raise ValueError(
+ "required_weight argment is less than zero. "
+ "required_weight must be between 0 and 1."
+ )
+
+ if required_weight > 1.0:
+ raise ValueError(
+ "required_weight argment is greater than zero. "
+ "required_weight must be between 0 and 1."
+ )
+
+ # need weights to match data_var dimensionality
+ if required_weight > 0.0:
+ weights, data_var = xr.broadcast(weights, data_var)


Suggested change

# ensure required weight is between 0 and 1
if required_weight is None:
    required_weight = 0.0
elif required_weight < 0.0:
    raise ValueError(
        "required_weight argument is less than 0. "
        "required_weight must be between 0 and 1."
    )
elif required_weight > 1.0:
    raise ValueError(
        "required_weight argument is greater than 1. "
        "required_weight must be between 0 and 1."
    )

# need weights to match data_var dimensionality
if required_weight > 0.0:
    weights, data_var = xr.broadcast(weights, data_var)
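One way to exercise the range check from this suggestion is to pull it into a small standalone function. This is a sketch only: the name `validate_required_weight` is hypothetical and is not taken from this PR or from #683.

```python
from typing import Optional


def validate_required_weight(required_weight: Optional[float]) -> float:
    """Return a threshold in [0, 1], treating None as 0.0.

    Hypothetical helper mirroring the checks in the suggested change.
    """
    if required_weight is None:
        return 0.0
    if required_weight < 0.0:
        raise ValueError(
            "required_weight argument is less than 0. "
            "required_weight must be between 0 and 1."
        )
    if required_weight > 1.0:
        raise ValueError(
            "required_weight argument is greater than 1. "
            "required_weight must be between 0 and 1."
        )
    return float(required_weight)


for candidate in (None, 0.0, 0.5, 1.0, -0.1, 1.5):
    try:
        print(candidate, "->", validate_required_weight(candidate))
    except ValueError as err:
        print(candidate, "-> ValueError:", err)
```

Keeping the checks in one place would also make it easier to share them between the spatial and temporal code paths, should that turn out to be desirable.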

tomvothecoder · 2024-07-30T22:42:37Z

I started PR #683 for temporal operations. I used some of your code and split it up into reusable functions. We can think about making these functions generalizable across the spatial and temporal classes.

Check them out here: https://github.com/xCDAT/xcdat/pull/683/files

initial attempt at #531 (for spatial averaging)

73808e1

github-actions bot added the type: enhancement New enhancement request label Jun 28, 2024

pochedls commented Jun 28, 2024

cleanup print statement and complete code coverage

9967127

pochedls commented Jun 28, 2024

tomvothecoder changed the title ~~Set (optional) weight threshold for averaging operations~~ Add weight threshold option for spatial averaging Jul 30, 2024

tomvothecoder mentioned this pull request Jul 30, 2024

Add weight threshold option for temporal operations #683

Draft

9 tasks

tomvothecoder reviewed Jul 30, 2024
