Run scrapers against live data sources #1059
Open
Fixes #1057.
The new `HTV_TEST_MOCK_REQUESTS` environment variable can be set to `false` to disable HTTP request mocking. This means that all HTTP requests initiated during tests will be passed to the original source.

Not all tests can be run easily against live data sources. For example, some tests cover edge cases that cannot be reproduced reliably using live data. The test marker `always_mock_requests` can be used to override the global setting for individual tests. A test carrying this marker will never send a real HTTP request, even when `HTV_TEST_MOCK_REQUESTS=false`.

I have chosen to explicitly mark tests that should never send real HTTP requests (rather than the opposite) to encourage writing tests that can be executed against the live data sources. Writing a (scraper) test that needs to mock HTTP requests should be the exception.
For now, I've marked the tests covering the `RCVListScraper` (they cannot easily be executed against the live source and, given that they use an XML source, they are probably less likely to break than scrapers using an HTML source) as well as a few tests covering generic functionality (such as timeout handling).

I have also updated a few scrapers and fixtures because the live source data has changed, although the scraper was actually broken in only one case (geographic areas).
With regard to the implementation (see the `responses` fixture in `conftest.py`), I'm a little unsure whether the fact that it's quite hacky indicates it's something you shouldn't do or if it's simply a use case that isn't super common.