Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

earthaccess.download() ignores errors #581

Open
mfisher87 opened this issue May 21, 2024 · 5 comments
Open

earthaccess.download() ignores errors #581

mfisher87 opened this issue May 21, 2024 · 5 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@mfisher87
Copy link
Member

mfisher87 commented May 21, 2024

We found the pqdm library is the thing swallowing all the error messages. pqdm has an exception_behavior option, which defaults to "ignore", surprisingly. The other two settings, "deferred" and "immediate", are both good options for earthaccess.

@chuckwondo and I agree "immediate" is the most intuitive default, and we can establish a new flag fail_fast: bool = True to override it to "deferred" by passing download(..., fail_fast=False). Prior discussion in #36. Please continue discussion here :)

I feel exposing the "ignore" behavior isn't necessary because the user can do their own exception handling with try.

Thoughts?

@Sherwin-14
Copy link
Contributor

@mfisher87 This seems interesting, what should we proceed immediateor deferred?

@chuckwondo
Copy link
Collaborator

@mfisher87 This seems interesting, what should we proceed immediate or deferred?

@Sherwin-14, we want to proceed with both, meaning that we want both options available to the user by doing the following:

  • Add a keyword argument fail_fast: bool = True to the download method/function (i.e., by default we want to "fail fast" [immediately upon the first error])
  • When fail_fast is True, pass exception_behavior="immediate" to pdqm
  • When fail_fast is False, pass exception_behavior="deferred" to pdqm
  • Enhance the docstring for download to indicate this behavior, describing that when fail_fast is True (default), the first download error will raise the error from download, and when fail_fast is False, the method/function will return a list with possibly a mix of successful downloads and errors (you'll have to do a bit of experimentation here, as I don't know if that's completely correct)

@mfisher87
Copy link
Member Author

@Sherwin-14 don't forget to take a look at #36 for more context! Do you want to be assigned this issue?

@Sherwin-14
Copy link
Contributor

def get(
        self,
        granules: Union[List[DataGranule], List[str]],
        local_path: Union[Path, str, None] = None,
        provider: Optional[str] = None,
        threads: int = 8,
    ) -> List[str]:

        if local_path is None:
            today = datetime.datetime.today().strftime("%Y-%m-%d")
            uuid = uuid4().hex[:6]
            local_path = Path.cwd() / "data" / f"{today}-{uuid}"
        elif isinstance(local_path, str):
            local_path = Path(local_path)

        if len(granules):
            files = self._get(granules, local_path, provider, threads)
            return files
        else:
            raise ValueError("List of URLs or DataGranule instances expected")

 @singledispatchmethod
 def _get(
        self,
        granules: Union[List[DataGranule], List[str]],
        local_path: Path,
        provider: Optional[str] = None,
        threads: int = 8,
    ) -> List[str]:

        raise NotImplementedError(f"Cannot _get {granules}")

@mfisher87 @chuckwondo I had been looking into this and it seems like pqdm isn't used in the get and _get functions which are used in download function. But as chuck mentioned in #36 the 'pretty progress bars" are a result of pqdm which I too had when I tried to recreate the error as described in the linked post above.

@chuckwondo
Copy link
Collaborator

@Sherwin-14, just search the code base for pqdm.

Whichever functions you find it in, do the following:

  1. Add a fail_fast: bool = True parameter to the end of the parameter list of the function that is calling pqdm
  2. Immediately before the call to pqdm, add the following line:
    exception_behavior = "immediate" if fail_fast else "deferred"
  3. In the call to pqdm, add the following to the end of the args passed to pqdm: exception_behavior=exception_behavior

Then, you have to find the places where those functions are called (i.e., where the functions that call pqdm are called), and do the following:

  1. add a fail_fast param, just like described above
  2. pass the fail_fast arg to the calls to the functions modified above
  3. repeat this process up to the point where the call chain is initiated from any of the "download" or "open" functions

Also, in each modified function, update the docstring appropriately to describe the added fail_fast parameter.

Writing tests for these changes will likely be tricky, so you can ignore this for the moment and focus first on the steps above and make sure that existing tests pass (i.e., make sure you didn't break existing stuff before worrying about writing tests for the new stuff).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
Status: 🆕 New
Development

No branches or pull requests

3 participants