A basic repo for all of the automated HODP scraping scripts
Please refer to `CONTRIBUTING.md` for instructions on how to add your own scraper.
- If you have access, ssh into the instance and run `sudo su` to log in again as the root user.
- Navigate to the project and run
source hodp/bin/activate
./resolve_reqs
./init_crontab
crontab scrape.tab
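
For orientation, the dependency-resolution step can be sketched as follows. This is a minimal sketch that assumes the behavior described in the changelog (collecting every `requirements.txt` under `scrapers/` and updating the root `requirements.txt`). It is not the repo's actual `resolve_reqs.py`, and the function names `collect_requirements` and `resolve_reqs` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def _requirement_lines(path):
    """Return the non-blank, non-comment lines of one requirements file."""
    lines = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def collect_requirements(scrapers_dir):
    """Gather requirement lines from each immediate subdirectory of scrapers_dir."""
    found = set()
    for req_file in sorted(Path(scrapers_dir).glob("*/requirements.txt")):
        found.update(_requirement_lines(req_file))
    return sorted(found)


def resolve_reqs(root_dir, install=False):
    """Merge scraper requirements into root_dir/requirements.txt.

    De-duplication is by exact string only, so "requests" and
    "requests==2.22.0" would both be kept.
    """
    root = Path(root_dir)
    root_file = root / "requirements.txt"
    merged = set(collect_requirements(root / "scrapers"))
    if root_file.exists():
        merged.update(_requirement_lines(root_file))
    lines = sorted(merged)
    root_file.write_text("\n".join(lines) + "\n" if lines else "")
    if install and lines:
        # Install into whichever environment is running this script.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "-r", str(root_file)]
        )
    return lines
```

Calling `resolve_reqs(".", install=True)` from the project root, with the virtual environment active, would then roughly correspond to the `./resolve_reqs` step.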
- Add logging system
- Add unit tests
- 05/25/19 #2 (kevalii)
  - Reverted the routing referenced in #1.
  - Added `init_crontab.sh`, `write_cron.py`, and `resolve_reqs.py`.
    - `write_cron.py` writes cron jobs to a crontab (`scrape.tab`) using an API provided by the `python-crontab` package. You can still write cron jobs directly into `scrape.tab`; this just provides a perhaps more organized way of writing cron jobs.
      - NOTE: `scrape.tab` isn't provided in the repo. Either `touch` it locally or execute `init_crontab.sh`.
      - NOTE: `init_crontab.sh` executes `write_cron.py` but it does not set the crontab. Make any changes to `scrape.tab` and then execute `crontab scrape.tab`.
    - `resolve_reqs.py` goes through all the `requirements.txt` files in each subdirectory of `scrapers/` and installs the dependencies, updating the root directory's `requirements.txt` as well.
  - `scrapers/crime/scrape_crime.py` no longer features the `scrape` function referenced in #1.
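
To make the tab-file workflow in this entry concrete, here is a minimal sketch of appending a job line to `scrape.tab`. It deliberately uses only the standard library rather than `python-crontab`, so it is not how `write_cron.py` works. The function name `add_cron_job`, the schedule, and the command path in the usage note are all hypothetical.

```python
from pathlib import Path


def add_cron_job(tabfile, schedule, command, comment=""):
    """Append one cron entry to tabfile unless an identical line already exists."""
    if len(schedule.split()) != 5:
        raise ValueError(
            "schedule must have five fields: minute hour day month weekday"
        )
    entry = f"{schedule} {command}"
    if comment:
        entry += f" # {comment}"
    path = Path(tabfile)
    # Create the file if it is missing, mirroring the `touch` step noted above.
    path.touch()
    existing = path.read_text().splitlines()
    if entry not in existing:
        existing.append(entry)
        path.write_text("\n".join(existing) + "\n")
    return entry
```

For example, `add_cron_job("scrape.tab", "0 6 * * *", "python scrapers/crime/scrape_crime.py", comment="crime")` would add a daily 06:00 job. As with `write_cron.py`, writing the file does not install it; `crontab scrape.tab` still has to be run afterwards.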
- 05/25/19 #1 (kevalii)
  - Set up routing, enabling us to add more scrapers (and schedule them) in a sustainable manner.
  - Added `gocrimson` scraper at `/scrape/gocrimson` and a corresponding cron job.
    - While the scraper has also been added to this repo, it is actually executed by a GCloud function that uses a local copy of the scraper's source code. In the future, we'll have to adjust this so that the function is sourced from this repo instead.
  - Modified `crime` scraper to route to `/scrape/crime`.
    - Wrapped relevant code in a `scrape` function in the renamed source file `scrape_crime.py` so that the scraper is executed by a call to `scrape` instead of just running at the top level.
  - Moved each scraper to a respective folder in `scrapers/` that also contains each scraper's respective dependencies in a `requirements.txt`.
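
The refactor described for the `crime` scraper in #1, wrapping top-level code in a `scrape` function, can be illustrated as follows. This is only a sketch: the body of `scrape` and the `fetch` parameter are placeholders, not the actual contents of `scrape_crime.py`.

```python
def scrape(fetch=None):
    """Run the scraper once and return the rows it collected.

    `fetch` is a hypothetical hook standing in for the real download and
    parsing logic; with no hook supplied, nothing is scraped.
    """
    rows = list(fetch()) if fetch is not None else []
    return rows


if __name__ == "__main__":
    # Running the file directly still performs a scrape, but importing the
    # module no longer triggers one as a side effect.
    print(f"scraped {len(scrape())} rows")
```

Keeping the work inside a function means a scheduler or route handler can import the module and decide when to call `scrape`.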