Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug with GDAC data source and BGC dataset #418

Merged
merged 11 commits into from
Dec 12, 2024

Conversation

gmaze
Copy link
Member

@gmaze gmaze commented Dec 10, 2024

The data mode splitting method was found to provide erroneous results due to unexpected results from xarray dropna

The ds.argo.datamode.split method now use a clean ufunc method to properly extract parameter data mode

gmaze added 10 commits December 10, 2024 13:40
Allows for:
- deep copy (new instance with same parameters and cleared search)
- shallow copy (new instance with same parameters and search results if any)
- fix bug raised when dealing with a STATION_PARAMETER index having an empty string
Re-designed how we compute the data mode of a parameter
fix compute to handle a collection of profiles (presumably with STATION_PARAMETERS and PARAMETER_DATA_MODE) or a collection of points (presumably without STATION_PARAMETERS and PARAMETER_DATA_MODE)
@gmaze gmaze added invalid This doesn't seem right internals Internal machinery argo-BGC About biogeochemical variables labels Dec 10, 2024
@gmaze gmaze self-assigned this Dec 10, 2024
Copy link

codecov bot commented Dec 10, 2024

❌ 174 Tests Failed:

Tests completed Failed Passed Skipped
1663 174 1489 534
View the top 3 failed tests by shortest run time
test_stores_index.py::Test_IndexStore_pyarrow_BGC_bio::test_hosts[ftp_mocked]
Stack Traces | 0.005s run time
No failure message available
test_stores_index.py::Test_IndexStore_pyarrow_BGC_synthetic::test_hosts[ftp_mocked]
Stack Traces | 0.005s run time
No failure message available
test_stores_index.py::Test_IndexStore_pyarrow_CORE::test_hosts[ftp_mocked]
Stack Traces | 0.006s run time
No failure message available

To view more test analytics, go to the Test Analytics Dashboard
📢 Thoughts on this report? Let us know!

ftp fsspec is not accepting timeout=0 anymore
ValueError: Non-blocking socket (timeout=0) is not supported
"""
idx = copy.copy(indexfs) if isinstance(indexfs, ArgoIndex) else ArgoIndex()
idx = indexfs.copy(deep=True) if isinstance(indexfs, ArgoIndex) else ArgoIndex()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we have a new method copy for the ArgoIndex

def split(self):
def compute(self, indexfs: Union[None, ArgoIndex]) -> xr.Dataset:
"""Compute <PARAM>_DATA_MODE variables"""
if "STATION_PARAMETERS" in self._obj and "PARAMETER_DATA_MODE" in self._obj:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not spot this subtility before

@@ -177,6 +177,8 @@ def __init__(self, mode: str = "", src: str = "", ds: str = "", **fetcher_kwargs
raise OptionValueError(
"The 'argovis' data source fetching is only available in 'standard' user mode"
)
if self._src == "gdac" and "bgc" in self._dataset_id:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont remember why this was removed

@@ -352,24 +354,48 @@ def split_data_mode(ds: xr.Dataset) -> xr.Dataset:
-------
:class:`xr.Dataset`
"""
if ds.argo._type != "profile":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raise an error to avoid passing through for the wrong reasons and not detecting it

@gmaze gmaze merged commit 3a876a6 into master Dec 12, 2024
37 checks passed
@gmaze gmaze deleted the fix-for-bad-STATION_PARAMETERS branch December 12, 2024 07:30
gmaze added a commit that referenced this pull request Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
argo-BGC About biogeochemical variables internals Internal machinery invalid This doesn't seem right
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant