Commit

Adjust logic to always use downloader middleware when SW_WACZ_SOURCE_URI is configured
Wesley van Lee committed Jan 10, 2025
1 parent cd62579 commit 14f1f36
Showing 3 changed files with 3 additions and 7 deletions.
2 changes: 1 addition & 1 deletion docs/settings.md
````diff
@@ -57,4 +57,4 @@ This setting defines the location of the WACZ file that should be used as a sour
 SW_WACZ_CRAWL = True
 ```
 
-Setting to control the scraping behavior. If set to `False`, the scraper will bypass the WACZ middleware/downloadermiddleware during the crawling process.
+Setting to ignore original `start_requests`, just yield all responses found in WACZ.
````
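This hunk changes what `SW_WACZ_CRAWL` is documented to do: instead of toggling the middleware on or off, it now decides whether the spider's own `start_requests` are replaced by everything stored in the WACZ. The function below is a minimal sketch of that documented rule, assuming URLs are enough to stand in for requests and responses. `select_start_urls` and its parameters are hypothetical names invented for this illustration, not part of scrapy-webarchive, and the `False` branch is our reading of the docs rather than something the diff states.

```python
from typing import Iterable, Iterator


def select_start_urls(
    start_urls: Iterable[str],
    archived_urls: Iterable[str],
    wacz_crawl: bool,
) -> Iterator[str]:
    """Hypothetical model of the documented SW_WACZ_CRAWL behaviour.

    start_urls    -- what the spider's original start_requests would produce
    archived_urls -- every URL recorded in the WACZ
    wacz_crawl    -- stand-in for the SW_WACZ_CRAWL setting
    """
    if wacz_crawl:
        # SW_WACZ_CRAWL = True: ignore the spider's own start requests and
        # yield everything found in the archive instead.
        yield from archived_urls
    else:
        # Assumption: with the setting off, the spider starts as usual.
        yield from start_urls


if __name__ == "__main__":
    archive = ["https://example.com/a", "https://example.com/b"]
    print(list(select_start_urls(["https://example.com/"], archive, wacz_crawl=True)))
```

Keeping the choice in one generator mirrors how a Scrapy `start_requests` method yields lazily, so the archive never has to be materialised up front.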
2 changes: 1 addition & 1 deletion pyproject.toml
```diff
@@ -19,7 +19,7 @@ description = "A webarchive extension for Scrapy"
 readme = "README.md"
 keywords = ["Scrapy", "Webarchive", "WARC", "WACZ"]
 classifiers = [
-    "Development Status :: 3 - Alpha",
+    "Development Status :: 4 - Beta",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python",
 ]
```
6 changes: 1 addition & 5 deletions scrapy_webarchive/downloadermiddlewares.py
```diff
@@ -39,12 +39,8 @@ def _check_ignore_conditions(self, request: Request, spider: Spider) -> None:
     def process_request(self, request: Request, spider: Spider):
         """Called for each request that goes through the downloader."""
 
-        # Continue default crawl behaviour.
-        if not self.crawl:
-            return None
-
         # If the attribute has not been set, none of the WACZ could be opened.
-        if self.crawl and not hasattr(self, "wacz"):
+        if not hasattr(self, "wacz"):
             raise WaczMiddlewareException("Could not open any WACZ files, check your WACZ URIs and authentication.")
 
         # Check if the request should be ignored.
```
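The effect of removing the `if not self.crawl: return None` early exit is that a failed WACZ open now raises even when `SW_WACZ_CRAWL` is off, and requests are no longer handed straight to Scrapy's normal downloader. The class below is a minimal, Scrapy-free sketch of that guard, assuming a plain dict from URL to body can stand in for an opened WACZ. `SketchMiddleware`, its `archive` and `legacy` parameters, and the dict lookup are hypothetical; only the `hasattr(self, "wacz")` check, the exception name, and its message come from the diff.

```python
from typing import Dict, Optional


class WaczMiddlewareException(Exception):
    """Local stand-in for the exception of the same name in the diff."""


class SketchMiddleware:
    """Hypothetical model of the process_request guard, not the real class.

    archive -- dict from URL to body, standing in for an opened WACZ
    crawl   -- mirrors SW_WACZ_CRAWL
    legacy  -- True reproduces the logic from before this commit
    """

    def __init__(self, archive: Optional[Dict[str, bytes]], crawl: bool, legacy: bool = False):
        self.crawl = crawl
        self.legacy = legacy
        if archive is not None:
            # As in the real middleware, `wacz` only exists if opening succeeded.
            self.wacz = archive

    def process_request(self, url: str) -> Optional[bytes]:
        if self.legacy and not self.crawl:
            # Pre-commit behaviour: skip the archive and let Scrapy download.
            return None
        if not hasattr(self, "wacz"):
            raise WaczMiddlewareException(
                "Could not open any WACZ files, check your WACZ URIs and authentication."
            )
        # Hypothetical lookup: None stands in for "not found in the archive".
        return self.wacz.get(url)


if __name__ == "__main__":
    # Old logic silently passes through when the WACZ could not be opened.
    print(SketchMiddleware(None, crawl=False, legacy=True).process_request("https://example.com/"))
    # New logic serves from the archive regardless of SW_WACZ_CRAWL.
    mw = SketchMiddleware({"https://example.com/": b"<html></html>"}, crawl=False)
    print(mw.process_request("https://example.com/"))
```

Failing loudly here is the point of the change: with `SW_WACZ_SOURCE_URI` configured, a spider that quietly fell back to live downloads would fetch pages the archive was meant to supply.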
