- Get data from URL
- Clean data in a pandas DataFrame
- Convert the DataFrame to .csv
- Store the .csv in an S3 bucket as a text file
- Deploy the scraper to AWS Lambda and run it there every 15 minutes
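Taken together, the pipeline might look roughly like the sketch below. The URL, bucket name, and object key are placeholders, and the cleaning step stands in for whatever each source actually needs:

```python
import boto3
import pandas as pd
import requests

DATA_URL = "https://example.com/data.json"  # placeholder source URL
S3_BUCKET = "my-scraper-bucket"             # placeholder bucket name


def run_scraper():
    # 1. Get data from the URL
    response = requests.get(DATA_URL, timeout=30)
    response.raise_for_status()

    # 2. Clean the data in a pandas DataFrame
    df = pd.DataFrame(response.json())
    df = df.dropna().drop_duplicates()

    # 3. Convert the DataFrame to CSV text
    csv_text = df.to_csv(index=False)

    # 4. Store the CSV in the S3 bucket as a text file
    boto3.client("s3").put_object(Bucket=S3_BUCKET, Key="data.csv", Body=csv_text)


if __name__ == "__main__":
    run_scraper()
```

On AWS Lambda, `run_scraper` would be invoked by the handler on the 15-minute schedule (e.g. via an EventBridge scheduled rule).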
- Create a copy of the `scraper_template.py` file or one of the existing scrapers
- Implement your scraper (a sketch follows after this list)
- Add your scraper to the list of scrapers in `handler.py` (see the registration snippet below)
- Run your scraper script: `python get_data_scrapername.py`
- Push your code to the repo; it will be deployed automatically
- Make a pull request (or ask for repo membership)
- After review, your new scraper will be deployed automatically
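For the implementation step above, a new scraper module could look roughly like this. The module layout and the `get_data` name are assumptions based on the file naming convention; check `scraper_template.py` for the real interface:

```python
# get_data_scrapername.py -- hypothetical scraper following the template pattern
import pandas as pd
import requests

URL = "https://example.com/source"  # placeholder for the real data source


def get_data() -> pd.DataFrame:
    """Fetch the raw data and return it as a cleaned DataFrame."""
    response = requests.get(URL, timeout=30)
    response.raise_for_status()
    df = pd.DataFrame(response.json())
    # Basic cleaning; a real scraper needs source-specific logic here
    return df.dropna().drop_duplicates()


if __name__ == "__main__":
    # Allows running the scraper locally: python get_data_scrapername.py
    print(get_data().head())
```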
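Registering it in `handler.py` would then be a one-line addition, assuming the handler keeps a list of scraper modules and loops over them (the `SCRAPERS` name and the loop are hypothetical):

```python
# handler.py -- hypothetical registration and Lambda entry point
import get_data_scrapername

SCRAPERS = [
    # ... existing scrapers ...
    get_data_scrapername,  # newly added scraper
]


def handler(event, context):
    # AWS Lambda entry point: run every registered scraper
    for scraper in SCRAPERS:
        scraper.get_data()
```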
- Any errors in a scraper are reported to Sentry
- Use asserts to confirm the data is valid (see the sketch after this list)
- If non-critical but interesting changes to the data are noticed, report them to Slack via a Sentry message
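A minimal sketch of both checks, assuming the repo uses the `sentry_sdk` package for Sentry reporting (the DSN, threshold, and column names are made up):

```python
import sentry_sdk

sentry_sdk.init(dsn="https://publicKey@o0.ingest.sentry.io/0")  # placeholder DSN


def validate(df):
    # Critical checks: a failing assert raises AssertionError,
    # which surfaces in Sentry as a scraper error
    assert not df.empty, "scraper returned no rows"
    assert "date" in df.columns, "expected a 'date' column"

    # Non-critical but interesting change: send an info-level message;
    # Sentry's Slack integration can forward it to a channel
    if len(df) > 10_000:
        sentry_sdk.capture_message(
            f"Row count unusually high: {len(df)}", level="info"
        )
```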