Add a fetcher for AWS S3 Argo GDAC data #385

Open
wants to merge 54 commits into
base: master

gmaze
Member

@gmaze gmaze commented Sep 3, 2024

Support for AWS S3 data files

This support is experimental and is made available primarily for benchmarking, as part of the activities of the ADMT working group on Argo cloud formats.

In this PR, we provide:

  • a data fetcher for NetCDF files on AWS S3
  • a kerchunk helper to lazily load NetCDF files from AWS S3

More work is ongoing in #424 and #423 to make handling Argo data in the cloud easier and more appropriate.

@gmaze gmaze added enhancement New feature or request backends performance labels Sep 3, 2024
@gmaze gmaze self-assigned this Sep 3, 2024
@gmaze gmaze linked an issue Sep 3, 2024 that may be closed by this pull request
@gmaze gmaze marked this pull request as draft September 4, 2024 06:08
commit 62ba4cb
Merge: 919484e ce6fed9
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 13:50:16 2024 +0200

    Merge pull request #389 from euroargodev/other-major-breaking-refactoring

    Implement other than bgc-2024 branch major breaking refactoring for major release v1.0.0

commit ce6fed9
Merge: fa05fa7 919484e
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 12:08:02 2024 +0200

    Merge branch 'master' into other-major-breaking-refactoring

commit fa05fa7
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 12:07:02 2024 +0200

    Delete test_deprecated.py

commit 919484e
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 11:37:21 2024 +0200

    Fix ci tests env

    fix error    libmamba Could not solve for environment specs
          The following packages are incompatible
          ├─ fsspec 2024.9.0*  is requested and can be installed;
          └─ s3fs 2024.6.1*  is not installable because it requires
             └─ fsspec 2024.6.1 , which conflicts with any installable versions previously reported.
      critical libmamba Could not solve for environment specs

commit 0dc9834
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 11:31:21 2024 +0200

    Add upstream tests with python 3.11 and 3.12

commit a1aedc5
Merge: 747ba13 549d8c3
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 11:25:09 2024 +0200

    Merge branch 'master' into other-major-breaking-refactoring

commit 549d8c3
Merge: 1e79ec0 2d4785d
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 11:20:42 2024 +0200

    Merge pull request #356 from euroargodev/bgc-2024

    Work on BGC from 2024 LOV visit

commit 2d4785d
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 10:30:17 2024 +0200

    Remove 45mins timeout for CI tests

commit 1797037
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 08:03:22 2024 +0200

    Update CI tests data

    include standard and research mode for erddap BGC

commit 82c20c8
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 07:44:50 2024 +0200

    Update CI tests data

commit f7ebc21
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 25 07:39:34 2024 +0200

    Update test_deprecated.py

commit 51355c3
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Tue Sep 24 12:08:05 2024 +0200

    update CI tests data

commit 809adc9
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Tue Sep 24 10:37:20 2024 +0200

    Update create_json_assets

commit 2ff193f
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Tue Sep 24 10:37:15 2024 +0200

    Update argovis_data.py

    make sure argovis is only using a single filestore

commit a73f727
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Tue Sep 24 10:36:53 2024 +0200

    Update CI tests data

commit cf41ba4
Merge: 4681d55 1e79ec0
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Mon Sep 23 14:59:03 2024 +0200

    Merge branch 'master' into bgc-2024

commit 4681d55
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Mon Sep 23 14:57:32 2024 +0200

    Clear CI tests for easier merge with master [skip-ci]

commit 1e79ec0
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Mon Sep 23 14:56:59 2024 +0200

    Clear CI tests data for easier merge [skip-ci]

commit c9de8b9
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Mon Sep 23 14:54:43 2024 +0200

    Clear CI tests data before merge

commit a21a644
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Mon Sep 23 09:56:26 2024 +0200

    Update whats-new.rst

commit fe8b91c
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 15:38:26 2024 +0200

    Update requirements.txt

commit 4ae5aab
Merge: 0f5a754 b135bfa
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 15:36:58 2024 +0200

    Merge pull request #394 from euroargodev/releasev0.1.17

    Prepare for v0.1.17 Bat Release 🦇

commit b135bfa
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 14:16:32 2024 +0200

    Update dev env definitions

commit 0f5a754
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 13:54:21 2024 +0200

    Update HOW_TO_RELEASE.md [skip-ci]

commit 4bc625e
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 13:49:08 2024 +0200

    Flake8

commit 34d1a46
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 13:45:15 2024 +0200

    codespell

commit 6259011
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 13:42:32 2024 +0200

    Fix CI tests data update

commit c5ab622
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 13:36:15 2024 +0200

    Update cheatsheet.rst

commit cb66217
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 13:28:25 2024 +0200

    Update cheatsheet PDF

commit 10ff2cf
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 11:50:15 2024 +0200

    Update CI tests data

commit ec0b14c
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 11:48:41 2024 +0200

    Update HOW_TO_RELEASE.md [skip-ci]

commit e2df789
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 11:28:55 2024 +0200

    Update static assets

commit cffefc0
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 11:28:24 2024 +0200

    Update reference_tables.py

commit 6cf2644
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 11:07:15 2024 +0200

    Update whats-new.rst

commit eb7e689
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 11:07:12 2024 +0200

    Update fetchers.py

commit d8121d8
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 10:58:12 2024 +0200

    Update HOW_TO_RELEASE.md [skip-ci]

commit 88ff363
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 10:34:20 2024 +0200

    Move to v0.1.17, to Beta

commit e48ab55
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 09:47:51 2024 +0200

    Update xarray.py

    don't anticipate too much on the upcoming filter_data_mode replacement

commit 29a5cfc
Merge: 5a31057 f3b0a56
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 09:45:45 2024 +0200

    Merge pull request #388 from euroargodev/deprec-before-major

    Introduces deprecation warnings before major v1.0.0 release

commit f3b0a56
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 20 08:56:53 2024 +0200

    Better deprecation introduction

commit 5a31057
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Thu Sep 19 14:15:02 2024 +0200

    Pin erddapy for python < 3.10

    See ioos/erddapy#359

commit 747ba13
Merge: 37f2495 0095fe6
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 18 15:33:08 2024 +0200

    Merge branch 'master' into other-major-breaking-refactoring

commit 37f2495
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 18 15:32:46 2024 +0200

    Update monitored_threadpool.py

commit 6d9be49
Merge: 62ece42 0095fe6
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 18 15:30:38 2024 +0200

    Merge branch 'master' into bgc-2024

commit 2669301
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 13 14:46:38 2024 +0200

    [skip-ci]

commit e87afe1
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 13 14:32:06 2024 +0200

    Create test_deprecated.py

    Ensure we're having warnings for deprecations

commit c319d0a
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 13 14:31:32 2024 +0200

    Update xarray.py

    fix deprecation warning

commit 19daad3
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 13 14:31:13 2024 +0200

    New deprecation for option 'ftp' replaced by 'gdac'

commit c890602
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Fri Sep 13 14:30:32 2024 +0200

    introduce new "OptionDeprecatedWarning"

commit 850adf1
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 4 11:14:10 2024 +0200

    Deprec for 'dataset' option replaced by 'ds'

commit 1371625
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 4 10:12:43 2024 +0200

    Update whats-new.rst

commit a988d79
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 4 10:10:43 2024 +0200

    Update xarray.py

commit acc789e
Author: Guillaume Maze <gmaze@ifremer.fr>
Date:   Wed Sep 4 10:08:07 2024 +0200

    Update xarray.py

codecov bot commented Oct 23, 2024

❌ 17 Tests Failed:

Tests completed Failed Passed Skipped
1680 17 1663 546
View the top 3 failed tests by shortest run time
test_fetchers_data_gdac.py::TestBackend::test_fetching_cached[host='s3', ds='phy', mode='expert', {'float': [13857]}]
Stack Traces | 0.409s run time
self = <argopy.tests.test_fetchers_data_gdac.TestBackend object at 0x0000019989963890>
mocked_httpserver = 'http://127.0.0.1:9898'
cached_fetcher = <datafetcher.gdac>
🌐 Name: Ifremer GDAC Argo data fetcher for floats
🤖 Domain: WMO13857
🔗 API: s3:.../argo-gdac-sandbox/...searched: True (98 matches, 100.0000%)
🏄 User mode: expert
🟡+🔵 Dataset: phy
🌤  Performances: cache=True, parallel=False

    @pytest.mark.parametrize("cached_fetcher", VALID_ACCESS_POINTS, indirect=True, ids=VALID_ACCESS_POINTS_IDS)
    def test_fetching_cached(self, mocked_httpserver, cached_fetcher):
        # Assert the fetcher (this trigger data fetching, hence caching as well):
>       assert_fetcher(mocked_httpserver, cached_fetcher, cacheable=True)

argopy\tests\test_fetchers_data_gdac.py:205: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
argopy\tests\test_fetchers_data_gdac.py:98: in assert_fetcher
    assert isinstance(this_fetcher.to_xarray(errors='raise'), xr.Dataset)
argopy\fetchers.py:618: in to_xarray
    xds = self.fetcher.to_xarray(**kwargs)
argopy\data_fetchers\gdac_data.py:341: in to_xarray
    URI = self.uri  # Call it once
argopy\data_fetchers\gdac_data.py:545: in uri
    self._list_of_argo_files = self.uri_mono2multi(URIs)
argopy\data_fetchers\gdac_data.py:227: in uri_mono2multi
    new_uri = [mono2multi(uri) for uri in URIs]
argopy\data_fetchers\gdac_data.py:227: in <listcomp>
    new_uri = [mono2multi(uri) for uri in URIs]
argopy\data_fetchers\gdac_data.py:200: in mono2multi
    meta = argo_split_path(mono_path)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

this_path = 's3:.../13857/profiles/R13857_001.nc'

    def argo_split_path(this_path):  # noqa C901
        """Split path from a GDAC ftp style Argo netcdf file and return information
    
        >>> argo_split_path('.../6901035/profiles/D6901035_001D.nc')
        >>> argo_split_path('https://data-argo.ifremer..../5903939/profiles/D5903939_103.nc')
    
        Parameters
        ----------
        str
    
        Returns
        -------
        dict
        """
        dacs = [
            "aoml",
            "bodc",
            "coriolis",
            "csio",
            "csiro",
            "incois",
            "jma",
            "kma",
            "kordi",
            "meds",
            "nmdis",
        ]
        output = {}
    
        start_with = (
            lambda f, x: f[0 : len(x)] == x if len(x) <= len(f) else False
        )  # noqa: E731
    
        def split_path(p, sep="/"):
            """Split a pathname.  Returns tuple "(head, tail)" where "tail" is
            everything after the final slash.  Either part may be empty."""
            # Same as posixpath.py but we get to choose the file separator !
            p = os.fspath(p)
            i = p.rfind(sep) + 1
            head, tail = p[:i], p[i:]
            if head and head != sep * len(head):
                head = head.rstrip(sep)
            return head, tail
    
        def fix_localhost(host):
            if "ftp://localhost:" in host:
                return "ftp://%s" % (urlparse(host).netloc)
            if "http://127.0.0.1:" in host:
                return "http://%s" % (urlparse(host).netloc)
            else:
                return ""
    
        known_origins = [
            "https://data-argo.ifremer.fr",
            "ftp://ftp.ifremer.fr/ifremer/argo",
            "ftp://usgodae..../pub/outgoing/argo",
            fix_localhost(this_path),
            "",
        ]
    
        output["origin"] = [
            origin for origin in known_origins if start_with(this_path, origin)
        ][0]
        output["origin"] = "." if output["origin"] == "" else output["origin"] + "/"
        sep = "/" if output["origin"] != "." else os.path.sep
    
        (path, file) = split_path(this_path, sep=sep)
    
        output["path"] = path.replace(output["origin"], "")
        output["name"] = file
    
        # Deal with the path:
        # dac/<DAC>/<FloatWmoID>/
        # dac/<DAC>/<FloatWmoID>/profiles
        path_parts = path.split(sep)
    
        try:
            if path_parts[-1] == "profiles":
                output["type"] = "Mono-cycle profile file"
                output["wmo"] = path_parts[-2]
                output["dac"] = path_parts[-3]
            else:
                output["type"] = "Multi-cycle profile file"
                output["wmo"] = path_parts[-1]
>               output["dac"] = path_parts[-2]
E               IndexError: list index out of range

argopy\utils\format.py:116: IndexError
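
The traceback above is consistent with the `s3` path not matching any entry of `known_origins`: the origin then falls back to `"."`, the separator becomes `os.path.sep` (a backslash on the Windows runner), and the path is never split into directories. A minimal sketch of that failure mode follows. `split_dirs`, the bucket name, and the trimmed origin list are hypothetical simplifications for illustration, not argopy code.

```python
def split_dirs(this_path, known_origins, local_sep):
    """Simplified stand-in for the directory split done in argo_split_path."""
    # Pick the first origin that prefixes the path; "" always matches.
    origin = [o for o in known_origins if this_path.startswith(o)][0]
    # With no recognized origin the path is treated as local, so the
    # local separator is used instead of "/".
    sep = "/" if origin != "" else local_sep
    head = this_path[: this_path.rfind(sep) + 1].rstrip(sep)
    return head.split(sep)


# Hypothetical S3 path; the real one is elided in the report above.
path = "s3://example-bucket/dac/aoml/13857/profiles/R13857_001.nc"

# Windows separator, S3 origin unknown: a single empty part, so [-2] fails.
unknown = split_dirs(path, ["https://data-argo.ifremer.fr", ""], "\\")

# Same separator, S3 origin registered: "/" is used and the parts resolve.
known = split_dirs(
    path, ["https://data-argo.ifremer.fr", "s3://example-bucket", ""], "\\"
)
dac, wmo, kind = known[-3], known[-2], known[-1]
```

Under these assumptions, registering the S3 root among the known origins would be one way to avoid the `IndexError`; the actual fix in the PR may differ.
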
test_fetchers_data_gdac.py::TestBackend::test_fetching_cached[host='c', ds='phy', mode='research', {'region': [-20, -16.0, 0, 1, 0, 100.0, '1997-07-01', '1997-09-01']}]
Stack Traces | 0.425s run time
self = <argopy.tests.test_fetchers_data_gdac.TestBackend object at 0x000001B540AA2080>
mocked_httpserver = 'http://127.0.0.1:9898'
cached_fetcher = <datafetcher.gdac>
🌐 Name: Ifremer GDAC Argo data fetcher for a space/time region
🗺  Domain: [x=-20.00/-16.00; y=0.00/... searched: True (3 matches, 0.1110%)
🚣 User mode: research
🟡+🔵 Dataset: phy
🌤  Performances: cache=True, parallel=False

    @pytest.mark.parametrize("cached_fetcher", VALID_ACCESS_POINTS, indirect=True, ids=VALID_ACCESS_POINTS_IDS)
    def test_fetching_cached(self, mocked_httpserver, cached_fetcher):
        # Assert the fetcher (this trigger data fetching, hence caching as well):
>       assert_fetcher(mocked_httpserver, cached_fetcher, cacheable=True)

argopy\tests\test_fetchers_data_gdac.py:205: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
argopy\tests\test_fetchers_data_gdac.py:98: in assert_fetcher
    assert isinstance(this_fetcher.to_xarray(errors='raise'), xr.Dataset)
argopy\fetchers.py:618: in to_xarray
    xds = self.fetcher.to_xarray(**kwargs)
argopy\data_fetchers\gdac_data.py:399: in to_xarray
    results = self.fs.open_mfdataset(URI, **opts)
argopy\stores\filesystems.py:587: in open_mfdataset
    data = future.result()
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\concurrent\futures\_base.py:451: in result
    return self.__get_result()
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\concurrent\futures\_base.py:403: in __get_result
    raise self._exception
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\concurrent\futures\thread.py:58: in run
    result = self.fn(*self.args, **self.kwargs)
argopy\stores\filesystems.py:491: in _mfprocessor
    ds = self.open_dataset(url, **open_dataset_opts)
argopy\stores\filesystems.py:455: in open_dataset
    target, _ = load_in_memory(path, errors=errors, xr_opts=xr_opts)
argopy\stores\filesystems.py:415: in load_in_memory
    data = self.fs.cat_file(path)
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:449: in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\spec.py:766: in cat_file
    with self.open(path, "rb", **kwargs) as f:
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:449: in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\spec.py:1293: in open
    f = self._open(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:449: in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:718: in _open
    self.save_cache()
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:449: in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:206: in save_cache
    self._metadata.save()
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cache_metadata.py:227: in save
    self._save(cache, fn)
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cache_metadata.py:71: in _save
    with atomic_write(fn, mode="w") as f:
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\contextlib.py:142: in __exit__
    next(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

path = 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpd_l21mp7\\cache'
mode = 'w'

    @contextlib.contextmanager
    def atomic_write(path: str, mode: str = "wb"):
        """
        A context manager that opens a temporary file next to `path` and, on exit,
        replaces `path` with the temporary file, thereby updating `path`
        atomically.
        """
        fd, fn = tempfile.mkstemp(
            dir=os.path.dirname(path), prefix=os.path.basename(path) + "-"
        )
        try:
            with open(fd, mode) as fp:
                yield fp
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(fn)
            raise
        else:
>           os.replace(fn, path)
E           PermissionError: [WinError 5] Access is denied: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpd_l21mp7\\cache-fkvsaa61' -> 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpd_l21mp7\\cache'

C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\utils.py:630: PermissionError
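
On Windows, `os.replace` can raise `PermissionError` when the destination is still open elsewhere, which is one plausible reading of the trace above (the fetch runs through a thread pool). The sketch below shows a generic retry wrapper as one possible mitigation. `replace_with_retry` and its parameters are hypothetical and are not part of fsspec or argopy.

```python
import os
import tempfile
import time


def replace_with_retry(src, dst, attempts=5, delay=0.05):
    """Call os.replace, retrying briefly if the OS denies access."""
    for attempt in range(attempts):
        try:
            os.replace(src, dst)
            return
        except PermissionError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay * (attempt + 1))  # linear back-off


# Usage: atomically swap a freshly written temp file into place.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "cache")
fd, tmp = tempfile.mkstemp(dir=workdir, prefix="cache-")
with open(fd, "w") as fp:
    fp.write("{}")
replace_with_retry(tmp, target)
with open(target) as fp:
    content = fp.read()
```

A retry only hides short-lived contention; if several workers keep rewriting the same cache metadata file, serializing those writes is likely the more robust choice.
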
test_fetchers_data_gdac.py::TestBackend::test_fetching_cached[host='http_mocked', ds='phy', mode='research', {'region': [-20, -16.0, 0, 1, 0, 100.0]}]
Stack Traces | 0.553s run time
self = <argopy.tests.test_fetchers_data_gdac.TestBackend object at 0x000001B540AA0FA0>
mocked_httpserver = 'http://127.0.0.1:9898'
cached_fetcher = <datafetcher.gdac>
🌐 Name: Ifremer GDAC Argo data fetcher for a space/time region
🗺  Domain: [x=-20.00/-16.00; y=0.00/... searched: True (3 matches, 3.0000%)
🚣 User mode: research
🟡+🔵 Dataset: phy
🌤  Performances: cache=True, parallel=False

    @pytest.mark.parametrize("cached_fetcher", VALID_ACCESS_POINTS, indirect=True, ids=VALID_ACCESS_POINTS_IDS)
    def test_fetching_cached(self, mocked_httpserver, cached_fetcher):
        # Assert the fetcher (this trigger data fetching, hence caching as well):
>       assert_fetcher(mocked_httpserver, cached_fetcher, cacheable=True)

argopy\tests\test_fetchers_data_gdac.py:205: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
argopy\tests\test_fetchers_data_gdac.py:98: in assert_fetcher
    assert isinstance(this_fetcher.to_xarray(errors='raise'), xr.Dataset)
argopy\fetchers.py:618: in to_xarray
    xds = self.fetcher.to_xarray(**kwargs)
argopy\data_fetchers\gdac_data.py:399: in to_xarray
    results = self.fs.open_mfdataset(URI, **opts)
argopy\stores\filesystems.py:1386: in open_mfdataset
    data = future.result()
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\concurrent\futures\_base.py:451: in result
    return self.__get_result()
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\concurrent\futures\_base.py:403: in __get_result
    raise self._exception
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\concurrent\futures\thread.py:58: in run
    result = self.fn(*self.args, **self.kwargs)
argopy\stores\filesystems.py:1064: in _mfprocessor_dataset
    ds = self.open_dataset(url, **open_dataset_opts)
argopy\stores\filesystems.py:1005: in open_dataset
    target, _ = load_in_memory(
argopy\stores\filesystems.py:946: in load_in_memory
    data = self.download_url(url, **dwn_opts)
argopy\stores\filesystems.py:859: in download_url
    data, n = make_request(
argopy\stores\filesystems.py:811: in make_request
    data = ffs.cat_file(url, **cat_opts)
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:449: in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\spec.py:766: in cat_file
    with self.open(path, "rb", **kwargs) as f:
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:449: in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\spec.py:1293: in open
    f = self._open(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:449: in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:718: in _open
    self.save_cache()
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:449: in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cached.py:206: in save_cache
    self._metadata.save()
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cache_metadata.py:227: in save
    self._save(cache, fn)
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\implementations\cache_metadata.py:71: in _save
    with atomic_write(fn, mode="w") as f:
C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\contextlib.py:142: in __exit__
    next(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

path = 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpd_l21mp7\\cache'
mode = 'w'

    @contextlib.contextmanager
    def atomic_write(path: str, mode: str = "wb"):
        """
        A context manager that opens a temporary file next to `path` and, on exit,
        replaces `path` with the temporary file, thereby updating `path`
        atomically.
        """
        fd, fn = tempfile.mkstemp(
            dir=os.path.dirname(path), prefix=os.path.basename(path) + "-"
        )
        try:
            with open(fd, mode) as fp:
                yield fp
        except BaseException:
            with contextlib.suppress(FileNotFoundError):
                os.unlink(fn)
            raise
        else:
>           os.replace(fn, path)
E           PermissionError: [WinError 5] Access is denied: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpd_l21mp7\\cache-w9_o2r2g' -> 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpd_l21mp7\\cache'

C:\Users\runneradmin\micromamba\envs\argopy-tests\lib\site-packages\fsspec\utils.py:630: PermissionError


gmaze added 26 commits October 24, 2024 16:37
Easy management of misc GDAC protocols
- fix isAPIconnected to use check_gdac_path when the gdac data source
Now uses ArgoIndex !
- Introduces gdacfs helper (a file system for any gdac path)
- Introduces ArgoKerchuncker to allow lazy netcdf file opening
- allows to pass "overwrite" to open_dataset in lazy mode for kerchunk translate
- fix bug in load_in_memory open_dataset
@gmaze gmaze marked this pull request as ready for review December 19, 2024 13:06

Successfully merging this pull request may close these issues.

New data source for GDAC from Amazon S3
1 participant