The script will be set up as a cron job on a GCE f1-micro instance, scheduled to run every 15 minutes; a sketch of the setup follows below. Scraped data will be dumped to Cloud Storage for later analysis.
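As a minimal sketch of that setup, assuming a Python 3 script and the `google-cloud-storage` client library: the crontab entry, script path, bucket name, and object naming scheme below are illustrative assumptions, not the project's actual values.

```python
# Hypothetical crontab entry on the f1-micro instance (runs every 15 minutes):
# */15 * * * * /usr/bin/python3 /home/user/scraper/scrape.py

import datetime
import json

from google.cloud import storage  # pip install google-cloud-storage


def dump_to_gcs(records, bucket_name="mmda-traffic-dumps"):
    """Upload one scrape's worth of records to Cloud Storage as JSON.

    Bucket name and object naming scheme are illustrative only.
    """
    client = storage.Client()  # uses the instance's default service account
    bucket = client.bucket(bucket_name)
    # Timestamp each dump so objects sort chronologically for later analysis.
    stamp = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H%M%S")
    blob = bucket.blob(f"dumps/{stamp}.json")
    blob.upload_from_string(json.dumps(records), content_type="application/json")
```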
Related web-scraping work has been done by like-minded individuals in the past. The goal of this project is to build on that work with a data engineering mindset: as a deployable, continuously running data pipeline.
- https://panjib.wixsite.com/blog/single-post/2018/07/25/Building-my-First-Python-Web-Scraper-Part-1---The-Code
- https://panjib.wixsite.com/blog/single-post/2018/10/03/MMDA-Twitter-Analysis-One-Month-of-Traffic-Incidents-in-Manila
- https://www.kaggle.com/esparko/mmda-traffic-incident-data
- https://erikafille.wordpress.com/2015/09/01/web-scraping-with-urllib2-and-beautifulsoup/
- https://business.inquirer.net/6043/mmda-launches-traffic-monitoring-website
- 2020-08-01
  - Changed the update timestamp to the status's last-updated time, scraped from the source
  - Added a scrape timestamp to the data dump so the actual status update time can be derived downstream
  - Applied the modifications to the cron job, with the data scrape dump scheduled at 2020-08-01 7:15 AM
- 2020-08-02
  - Added the line head name (e.g., EDSA, QUEZON AVE.) to the scraped data; a sample record is sketched after this list
  - Applied the modifications to the scrape script for the cron job scheduled at 2020-08-02 6:30 AM
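Putting the changes above together, a single record in the data dump might now carry the fields below. This is a hypothetical sketch: the field names and values are assumptions for illustration; only the presence of the line head name, status update time, and scrape timestamp follows from the changelog.

```python
# Hypothetical shape of one dumped record after the 2020-08-01/02 changes.
# Field names and values are assumed; only the fields' presence follows the changelog.
record = {
    "line": "EDSA",                              # line head name (added 2020-08-02)
    "status": "LIGHT",                           # scraped traffic status
    "status_updated_at": "2020-08-02 06:25:00",  # when the status was last updated, scraped from the source
    "scraped_at": "2020-08-02 06:30:01",         # scrape timestamp (added 2020-08-01)
}
```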