A scalable, mature and versatile web crawler based on Apache Storm
Updated Nov 25, 2024 (Java)
Resources for running StormCrawler with Docker services
Process web archives (WARC format) with StormCrawler and index content into Elasticsearch or Solr
Ansible playbook for deploying a Storm cluster
StormCrawler topology to evaluate the performance of different backends and configurations