Add weight threshold option for spatial averaging #672
Codecov Report
All modified and coverable lines are covered by tests ✅
@@            Coverage Diff            @@
##              main      #672   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           15        15
  Lines         1544      1558   +14
=========================================
+ Hits          1544      1558   +14
# zero out cells with missing values in data_var
weights = xr.where(~np.isnan(data_var), weights, 0)
# sum all weights (including zero for missing values)
weight_sum_masked = weights.sum(dim=dim)  # type: ignore
Needed to add `# type: ignore` for unclear reasons...
I think `dim` is expected to be a string but we pass a Hashable (dict key). `# type: ignore` is fine here, or set `dim=str(dim)` to remove that comment.
@@ -716,6 +725,9 @@ def _averager(self, data_var: xr.DataArray, axis: List[SpatialAxis]):
    Data variable inside a Dataset.
axis : List[SpatialAxis]
    List of axis dimensions to average over.
required_weight : optional, float
Is `required_weight` a good parameter name? What should the default be? (It is currently zero to keep the default behavior backwards compatible.)
Maybe `required_weight_pct`? `required_weight` sounds like a boolean parameter.
I think 0 is the appropriate default value.
Actually I like `None` as a default value to indicate the arg is not set by default.
@@ -729,11 +741,48 @@ def _averager(self, data_var: xr.DataArray, axis: List[SpatialAxis]):
"""
weights = self._weights.fillna(0)
I don't think this does anything.
I recall we ran into issues with Xarray's weighted mean API when weights contain `np.nan`. The notes above it say:
Lines 727 to 728 in cdae826
``weights`` must be a DataArray and cannot contain missing values.
Missing values are replaced with 0 using ``weights.fillna(0)``.
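To illustrate why NaN weights are a problem, here is a minimal sketch using plain NumPy as a stand-in for the xarray weighted API (the arrays and values are made up for illustration and are not taken from the PR). A single NaN weight poisons the whole weighted mean, while replacing it with 0 simply drops that cell from the average.

```python
import numpy as np

# Illustrative values only (not from the PR).
data = np.array([1.0, 2.0, 3.0])
weights = np.array([1.0, np.nan, 1.0])

# A NaN weight propagates through both the numerator and the denominator.
raw_mean = np.average(data, weights=weights)

# NumPy analogue of ``weights.fillna(0)``: the NaN-weighted cell is ignored.
clean_weights = np.nan_to_num(weights, nan=0.0)
clean_mean = np.average(data, weights=clean_weights)

print(raw_mean, clean_mean)
```

With the NaN weight zeroed out, only the first and last cells contribute to the mean.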
def test_spatial_average_with_required_weight_as_None(self):
    ds = self.ds.copy()

    result = ds.spatial.average(
        "ts",
        axis=["X", "Y"],
        lat_bounds=(-5.0, 5),
        lon_bounds=(-170, -120.1),
        required_weight=None,
    )

    expected = self.ds.copy()
    expected["ts"] = xr.DataArray(
        data=np.array([2.25, 1.0, 1.0]),
        coords={"time": expected.time},
        dims="time",
    )

    xr.testing.assert_allclose(result, expected)
This test is related to the code chunk in `_averager` that deals with the case where `required_weight=None`. I don't know why it would ever be `None` (it is supposed to be `Optional[float] = 0.`, so I don't know how it could be `None`). Without those conditionals you get an error (maybe from mypy) about comparing `None` to `int | float` on the lines that check `if required_weight > 0.`.
So if we could ensure that `required_weight` is always of type `float`, we could remove this test and the other conditionals in `_averager`. That would be ideal...
Using `Optional` indicates that the argument can also be `None`. You can just specify `float` as the type annotation if `None` is not expected.
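A small sketch of the difference between the two annotations (both function names are hypothetical and only for illustration). With a plain `float`, the comparison needs no guard; with `Optional[float]`, a type checker expects `None` to be ruled out before `>` is used.

```python
from typing import Optional


def threshold_is_active(required_weight: float = 0.0) -> bool:
    # Plain ``float``: None is not an accepted value, so no guard is needed.
    return required_weight > 0.0


def threshold_is_active_optional(required_weight: Optional[float] = None) -> bool:
    # ``Optional[float]``: None must be excluded before the comparison,
    # otherwise a type checker reports comparing None with a float.
    return required_weight is not None and required_weight > 0.0
```

Either form keeps the default behavior unchanged: an unset threshold (0.0 or `None`) leaves the check inactive.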
I like `None` as the default value.
I'm going to open a separate PR to address this enhancement with the temporal APIs because those require more work and we can merge this earlier when it is ready.
# ensure required weight is between 0 and 1
if required_weight is None:
    required_weight = 0.0

if required_weight < 0.0:
    raise ValueError(
        "required_weight argment is less than zero. "
        "required_weight must be between 0 and 1."
    )

if required_weight > 1.0:
    raise ValueError(
        "required_weight argment is greater than zero. "
        "required_weight must be between 0 and 1."
    )

# need weights to match data_var dimensionality
if required_weight > 0.0:
    weights, data_var = xr.broadcast(weights, data_var)
- # ensure required weight is between 0 and 1
- if required_weight is None:
-     required_weight = 0.0
- if required_weight < 0.0:
-     raise ValueError(
-         "required_weight argment is less than zero. "
-         "required_weight must be between 0 and 1."
-     )
- if required_weight > 1.0:
-     raise ValueError(
-         "required_weight argment is greater than zero. "
-         "required_weight must be between 0 and 1."
-     )
- # need weights to match data_var dimensionality
- if required_weight > 0.0:
-     weights, data_var = xr.broadcast(weights, data_var)
+ # ensure required weight is between 0 and 1
+ if required_weight is None:
+     required_weight = 0.0
+ elif required_weight < 0.0:
+     raise ValueError(
+         "required_weight argument is less than 0. "
+         "required_weight must be between 0 and 1."
+     )
+ elif required_weight > 1.0:
+     raise ValueError(
+         "required_weight argument is greater than 1. "
+         "required_weight must be between 0 and 1."
+     )
+ # need weights to match data_var dimensionality
+ if required_weight > 0.0:
+     weights, data_var = xr.broadcast(weights, data_var)
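The validation in this suggestion could also be pulled into a small standalone helper. The sketch below is one possible shape, not code from the PR: the name `validate_required_weight` and the single combined error message are assumptions made for illustration.

```python
from typing import Optional


def validate_required_weight(required_weight: Optional[float]) -> float:
    """Hypothetical helper: return a threshold guaranteed to lie in [0, 1]."""
    # An unset threshold means "no minimum weight required".
    if required_weight is None:
        return 0.0

    # One chained comparison covers both bounds and is also False for NaN.
    if not 0.0 <= required_weight <= 1.0:
        raise ValueError(
            f"required_weight must be between 0 and 1, got {required_weight}."
        )

    return float(required_weight)
```

One design difference worth noting: the chained comparison rejects NaN, whereas separate `< 0.0` and `> 1.0` checks both evaluate to False for NaN and would let it through.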
I started PR #683 for temporal operations. I used some of your code and split it up into reusable functions. We can think about making these functions generalizable across the spatial and temporal classes. Check them out here: https://github.com/xCDAT/xcdat/pull/683/files
Description
This PR adds an optional argument that requires a minimum fraction of data be available to perform a spatial average. The initial PR is for spatial averaging only (it would need to be expanded to handle temporal averaging).
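As a rough illustration of the idea, the sketch below applies a minimum-weight threshold to a 1-D weighted average using plain NumPy. It is not the PR's implementation: the function name `thresholded_average` is hypothetical, and returning NaN when the available fraction is strictly below the threshold is an assumption about the intended comparison.

```python
import numpy as np


def thresholded_average(data, weights, required_weight=0.0):
    """Hypothetical 1-D sketch of a weight-thresholded average."""
    data = np.asarray(data, dtype=float)
    # Treat missing weights as zero.
    weights = np.nan_to_num(np.asarray(weights, dtype=float), nan=0.0)

    # Zero out the weights of cells whose data value is missing.
    valid = ~np.isnan(data)
    masked_weights = np.where(valid, weights, 0.0)

    total = weights.sum()
    available = masked_weights.sum()
    if total == 0.0 or available == 0.0:
        return float("nan")

    # Assumption: mask the result when too little weight is available.
    if available / total < required_weight:
        return float("nan")

    filled = np.where(valid, data, 0.0)
    return float(np.sum(filled * masked_weights) / available)


# Half of the (equal) weight falls on missing cells.
data = [1.0, np.nan, 3.0, np.nan]
weights = [1.0, 1.0, 1.0, 1.0]
print(thresholded_average(data, weights))
print(thresholded_average(data, weights, required_weight=0.75))
```

With the default threshold of 0 the missing cells are simply ignored, which matches the backwards-compatible behavior discussed in the review; with a threshold of 0.75 only 50% of the weight is available, so the result is masked.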