This project illustrates the basics of web scraping by pulling information from the HackerNews RSS feed. It builds from a simple web scraper in scraping.py into an automated scraping tool in tasks.py.
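As a rough illustration of the first step, the sketch below parses an RSS 2.0 document and pulls out each item's title, link, and publication date. It is not the code in scraping.py: the function name `parse_items`, the `SAMPLE_RSS` string, and the use of the standard-library XML parser are all assumptions made so the example runs offline. A real scraper would first download the feed over HTTP and pass the response text in.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the RSS 2.0 shape, so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Hacker News</title>
    <item>
      <title>Example story</title>
      <link>https://example.com/story</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one dict per <item> element found in an RSS document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                # findtext returns the default when a child element is missing
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "published": item.findtext("pubDate", default=""),
            }
        )
    return items


articles = parse_items(SAMPLE_RSS)
for article in articles:
    print(article["title"], article["link"])
```

Returning plain dictionaries keeps the parsed articles easy to write out as JSON or CSV later.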
- Building an RSS feed scraper with Python is available here.
- Automated web scraping with Python and Celery is available here.
The following commands start the scheduled scraping with Celery in tasks.py.
Starting our RabbitMQ server (terminal #1):
rabbitmq-server
Starting the scraping (terminal #2):
celery -A tasks worker -B -l INFO
MIT License.