http file system directed to stream by an "Accept-Ranges": "none" response #1631
## Background

`HTTPFileSystem._open` chooses between `HTTPFile` and `HTTPStreamFile`, and only the former allows random access, by using the `"Range"` header in subsequent HTTP GET requests. Servers do not always respond predictably to partial content requests, but they have the option of explicitly discouraging the use of `"Range"` by including `"Accept-Ranges": "none"` in response headers. It's rarely done, but the lack of an `"Accept-Ranges"` response header does not (in practice) indicate the lack of support for the `"Range"` header.

## Changes
Previously, `HTTPFileSystem._open` only used `HTTPStreamFile` when NOT block caching (which is not the default) or when the content size could not be determined in advance (i.e. rarely). This PR makes a response header with `"Accept-Ranges": "none"` into another rare cause for the switch to streaming.

- `HTTPFileSystem._open` now always makes a `self.info` call (even given `size`) 🤨
  - may see a `"partial"` key in `self.info` results
- `http._file_info` checks for the `"Accept-Ranges"` header in the response
  - `HTTPFileSystem._info` may include `"partial": False`
- `test_no_range_support`, with `conftest` updated to implement the `accept_range` header; the `ignore_range` value was irrelevant and removed

## Alternative
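It seems like an alternative approach is to stick with `HTTPFile` but force `_fetch_all` for read. Please comment if that is preferred ... I haven't figured out what does and doesn't use caching.

## Draft

I'm in the process of working with a NASA DAAC that does not provide partial content to test the approach, and will remove the "Draft" status when successful. Reviews in the meantime are very welcome.

Ugh, I give up waiting for them. Please review when able!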
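
For reviewers, the selection described under Changes can be sketched roughly as follows. This is an illustrative stand-in and not the code in this PR: `file_info_from_headers` and `choose_file_class` are hypothetical helpers, and treating a `block_size` of `0` as "not block caching" is an assumption. Only the `"partial"` key and the `"Accept-Ranges"` header are taken from the description above.

```python
def file_info_from_headers(headers: dict) -> dict:
    """Build a small info dict from response headers (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    info = {}
    if "content-length" in lowered:
        info["size"] = int(lowered["content-length"])
    # Only an explicit "none" counts as a refusal; a missing header does not.
    if lowered.get("accept-ranges", "").strip().lower() == "none":
        info["partial"] = False
    return info


def choose_file_class(info: dict, block_size: int) -> str:
    """Return the name of the file class this sketch would pick."""
    # Assumption: block_size == 0 stands in for "not block caching".
    can_seek = (
        block_size != 0
        and info.get("size") is not None
        and info.get("partial", True)
    )
    return "HTTPFile" if can_seek else "HTTPStreamFile"
```

In this sketch a server that omits `"Accept-Ranges"` still gets the random-access class, which matches the observation in Background that a missing header says little about actual `"Range"` support.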
for read. Please comment if that is preferred ... I haven't figured out what does and doesn't use caching.## DraftI'm in the process of working with a NASA DAAC that does not provide partial content to test the approach, and will remove the "Draft" status when successful. Reviews in the meantime are very welcome.Ugh, I give up waiting for them. Please review when able!