Run scrapers against live data sources #1059

tillprochaska · 2024-11-17T13:03:07Z

The new HTV_TEST_MOCK_REQUESTS environment variable can be set to false to disable HTTP request mocking. This means that all HTTP requests initiated during tests will be passed to the original source.

Not all tests can be run easily against live data sources. For example, some tests cover edge cases that cannot be reproduced reliably using live data. The test marker always_mock_requests can be used to override the global setting for individual tests.

For example, the following test will never send a real HTTP request, even when HTV_TEST_MOCK_REQUESTS=false.

import pytest
import requests


@pytest.mark.always_mock_requests
def test_lorem_ipsum(responses):
    # Register a mocked response so that no real request is sent.
    responses.get("https://example.org", body="Lorem ipsum")

    assert requests.get("https://example.org").text == "Lorem ipsum"

I have chosen to explicitly mark tests that should never send real HTTP requests (rather than the opposite) to encourage writing tests that can be executed against the live data sources. Writing a (scraper) test that needs to mock HTTP requests should be the exception.

For now, I’ve marked the tests covering the RCVListScraper as well as a few tests covering generic functionality (such as timeout handling). The RCVListScraper tests cannot easily be executed against the live source, and because they use an XML source they are probably less likely to break than tests for scrapers that use an HTML source.

I have also updated a few scrapers and fixtures because the live source data has changed, although the scraper was actually broken in only one case (geographic areas).

With regard to the implementation (see the responses fixture in conftest.py), I’m a little unsure whether the fact that it’s quite hacky indicates that it’s something you shouldn’t do or whether it’s simply a use case that isn’t very common.
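To make the intended precedence between the global setting and the per-test marker easier to review, here is a minimal sketch of the decision logic only. It is not the code in conftest.py: the helper name `should_mock_requests`, the two constants, and the rule that only the literal value "false" (case-insensitive) disables mocking are assumptions made for illustration.

```python
from typing import Iterable, Mapping

# Names taken from the PR description.
ENV_VAR = "HTV_TEST_MOCK_REQUESTS"
ALWAYS_MOCK_MARKER = "always_mock_requests"


def should_mock_requests(env: Mapping[str, str], marker_names: Iterable[str]) -> bool:
    """Decide whether a single test should have its HTTP requests mocked.

    Hypothetical helper, not part of the actual conftest.py.
    """
    # The per-test marker always wins over the global setting.
    if ALWAYS_MOCK_MARKER in set(marker_names):
        return True
    # Assumption: mocking stays enabled unless the variable is explicitly "false".
    return env.get(ENV_VAR, "true").strip().lower() != "false"
```

In a pytest fixture, a helper like this could be called with `os.environ` and the marker names collected from `request.node.iter_markers()`. The fixture would then either set up mocked responses or let requests pass through to the live source.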

This test was using an outdated fixture. As the 9th plenary term is over, there are no ongoing group memberships anymore and all group memberships now have an end date. I've updated the test to use Markus Weber's group memberships from the current term. We will have to update the test when something changes (at the latest for the 11th term), but that should be manageable.


Optionally disable all HTTP request mocks

73e0ffe

tillprochaska force-pushed the 1057-scraper-tests-live-sources branch from b3f1927 to 73e0ffe Compare November 17, 2024 13:05

tillprochaska added 6 commits November 17, 2024 15:59

Run scraper tests against live data sources every Tuesday night

52120e1

Update procedure scraper URLs

3e72aa8

This isn’t a breaking change in practice as the old URLs still work and redirect to the new URLs.

Fix extraction of geographic areas after OEIL markup change

b10164e

Allow some tests to always mock HTTP requests, even when request mock…

1adbaac

…s are disabled globally

Update members scraper so that it can run against live data sources

a47a129

The test fixture used in this case was different from the live source. I've replaced it with a copy of the original source.

tillprochaska force-pushed the 1057-scraper-tests-live-sources branch from 8b64299 to a47a129 Compare November 17, 2024 17:31

tillprochaska marked this pull request as ready for review November 17, 2024 17:40

tillprochaska requested a review from linusha November 17, 2024 17:40

tillprochaska changed the title ~~Optionally disable all HTTP request mocks~~ Run scrapers against live data sources Nov 17, 2024
