- Get data from URL
- Clean data in a pandas DataFrame
- Convert the DataFrame to .csv
- Store the .csv in an S3 bucket as a text file
- Deploy the scraper to AWS Lambda and run it there every 15 minutes
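Taken together, the pipeline might look roughly like the sketch below. The URL, bucket name, and object key are placeholders, and the cleaning step stands in for whatever each source actually needs:

```python
import boto3
import pandas as pd
import requests

DATA_URL = "https://example.com/data.json"  # placeholder source URL
S3_BUCKET = "my-scraper-bucket"             # placeholder bucket name


def run_scraper():
    # 1. Get data from the URL
    response = requests.get(DATA_URL, timeout=30)
    response.raise_for_status()

    # 2. Clean the data in a pandas DataFrame
    df = pd.DataFrame(response.json())
    df = df.dropna().drop_duplicates()

    # 3. Convert the DataFrame to CSV text
    csv_text = df.to_csv(index=False)

    # 4. Store the CSV in the S3 bucket as a text file
    boto3.client("s3").put_object(Bucket=S3_BUCKET, Key="data.csv", Body=csv_text)


if __name__ == "__main__":
    run_scraper()
```

On AWS Lambda, `run_scraper` would be invoked by the handler on the 15-minute schedule (e.g. via an EventBridge scheduled rule).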
- Create a copy of the `scraper_template.py` file or one of the existing scrapers
- Implement your scraper (a sketch follows after this list)
- Add your scraper to the list of scrapers in `handler.py` (see the registration snippet below)
- Run your scraper script: `python get_data_scrapername.py`
- Push your code to the repo; it will be deployed automatically
- Make a pull request (or ask for repo membership)
- After review, your new scraper will be deployed automatically
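For the implementation step above, a new scraper module could look roughly like this. The module layout and the `get_data` name are assumptions based on the file naming convention; check `scraper_template.py` for the real interface:

```python
# get_data_scrapername.py -- hypothetical scraper following the template pattern
import pandas as pd
import requests

URL = "https://example.com/source"  # placeholder for the real data source


def get_data() -> pd.DataFrame:
    """Fetch the raw data and return it as a cleaned DataFrame."""
    response = requests.get(URL, timeout=30)
    response.raise_for_status()
    df = pd.DataFrame(response.json())
    # Basic cleaning; a real scraper needs source-specific logic here
    return df.dropna().drop_duplicates()


if __name__ == "__main__":
    # Allows running the scraper locally: python get_data_scrapername.py
    print(get_data().head())
```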
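Registering it in `handler.py` would then be a one-line addition, assuming the handler keeps a list of scraper modules and loops over them (the `SCRAPERS` name and the loop are hypothetical):

```python
# handler.py -- hypothetical registration and Lambda entry point
import get_data_scrapername

SCRAPERS = [
    # ... existing scrapers ...
    get_data_scrapername,  # newly added scraper
]


def handler(event, context):
    # AWS Lambda entry point: run every registered scraper
    for scraper in SCRAPERS:
        scraper.get_data()
```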
- Any errors in a scraper are reported to Sentry
- Use asserts to confirm the data is valid (see the sketch after this list)
- If non-critical but interesting changes to the data are noticed, report them to Slack via a Sentry message
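A minimal sketch of both checks, assuming the repo uses the `sentry_sdk` package for Sentry reporting (the DSN, threshold, and column names are made up):

```python
import sentry_sdk

sentry_sdk.init(dsn="https://publicKey@o0.ingest.sentry.io/0")  # placeholder DSN


def validate(df):
    # Critical checks: a failing assert raises AssertionError,
    # which surfaces in Sentry as a scraper error
    assert not df.empty, "scraper returned no rows"
    assert "date" in df.columns, "expected a 'date' column"

    # Non-critical but interesting change: send an info-level message;
    # Sentry's Slack integration can forward it to a channel
    if len(df) > 10_000:
        sentry_sdk.capture_message(
            f"Row count unusually high: {len(df)}", level="info"
        )
```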