Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http file system directed to stream by an "Accept-Ranges": "none" response #1631

Merged
merged 6 commits into from
Sep 24, 2024

Conversation

itcarroll
Copy link
Contributor

@itcarroll itcarroll commented Jun 19, 2024

Background

HTTPFileSystem._open chooses between HTTPFile or HTTPStreamFile, and only the former allows random access, by using the "Range" header in subsequent HTTP GET requests. Servers do not always respond predictably to partial content requests, but they have the option of explicitly discouraging the use of "Range" by including "Accept-Ranges": "none" in response headers. It's rarely done, but the lack of a "Accept-Ranges" response does not (in practice) indicate the lack of support for the "Range" header.

Changes

Previously, HTTPFileSystem._open only used HTTPStreamFile when NOT block caching (which is not the default) or when the content size could not be determined in advance (i.e. rarely). This PR makes a response header with "Accept-Range": "none" into another rare cause for the switch to streaming.

  • HTTPFileSystem._open now always makes a self.info call (even given size) 🤨 may see a "partial" key in self.info results
  • http._file_info checks for the "Accept-Ranges" header in the response
  • The return value of HTTPFileSystem._info may include "partial": False
  • Tested with test_no_range_support
    • in its first parameterization of header, the ignore_range value was irrelevant and removed
    • a new parameterization was added with accept_range with conftest updated to implement

Alternative

It seems like an alternative approach is to stick with HTTPFile but force _fetch_all for read. Please comment if that is preferred ... I haven't figured out what does and doesn't use caching.

## Draft

I'm in the process of working with a NASA DAAC that does not provide partial content to test the approach, and will remove the "Draft" status when successful. Reviews in the meantime are very welcome. Ugh, I give up waiting for them. Please review when able!

@itcarroll itcarroll changed the title http file system directed to only stream by an "Accept-Ranges": "none" response to HEAD http file system directed to stream by an "Accept-Ranges": "none" response Jun 19, 2024
@itcarroll
Copy link
Contributor Author

Testing with the DAAC providing "Accept-Range": "none" and earthaccess jumped through one hoop (getting HTTPStreamFile rather than HTTPFile, but revealed another. When I get an HTTPStreamFile through earthaccess.open, there's an error in fsspec.caching.BaseCache._fetch because self.size is not set. First step to resolving is making a test to cover it, but I'm currently not sure how to trigger this use of caching directloy (i.e. without earthaccess.open). Investigating.

@martindurant
Copy link
Member

You seem to have the same situation as #1626 - there are some suggestions in that thread of what might be done.

@itcarroll
Copy link
Contributor Author

I think I'm actually running into nsidc/earthaccess#610. If I understand correctly what you say about the cache type being hard coded to "none" (i.e. BaseCache), then there should not have been a call to the cache (and no problem with the lack of self.size) if not for the earthaccess bug.

If they agree, I will mark this ready for review. Caching on HTTPStreamFile would be great, but a separate (harder!) contribution.

@martindurant
Copy link
Member

Is there any update here?

@itcarroll
Copy link
Contributor Author

Waiting for a review of nsidc/earthaccess#620, and I won't forget to pick this up after.

@martindurant
Copy link
Member

OK thanks :)

@itcarroll itcarroll marked this pull request as ready for review July 29, 2024 02:34
@itcarroll
Copy link
Contributor Author

@martindurant The downstream issue has resolved as I'd hoped. I've sync'd this branch and ran tests locally. Ready for workflow approval and review. Thanks for your patience!

@martindurant
Copy link
Member

Perfect, thank you.

@martindurant martindurant merged commit b842cf4 into fsspec:master Sep 24, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants