A hobby project that served as my self-taught introduction to data science: a web scraping app that plots emergency room (ER) wait times in Alberta.
Data was collected on my local machine before being published to Heroku. The app uses the following:
- Python 3.9.13
- Data captured every hour for each city/hospital.
- Plotly and Dash for visual display of the ER wait data.
- MongoDB for storage and retrieval of the data.
- Prior to MongoDB, data was collected in .csv files.
- A trial Twilio account to send SMS messages if exceptions happen during production. SMS messages are sent to my phone number only.
- Environment variables for safekeeping of the MongoDB cloud URL and Twilio account information
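The SMS-on-exception behaviour listed above could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the function names send_sms and run_with_alert are hypothetical, and the environment variable names are assumed to be the ones listed in the setup instructions. The twilio import sits inside send_sms so that the wrapper can be tried without the package installed.

```python
import os


def send_sms(body):
    """Send an SMS through Twilio (assumes the twilio package is installed)."""
    from twilio.rest import Client  # imported lazily; only needed when alerting

    client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])
    client.messages.create(
        body=body,
        from_=os.environ["MY_TWILIO_NUM"],
        to=os.environ["MY_PHONE_NUM"],
    )


def run_with_alert(task, notify=send_sms):
    """Run task(); on any exception, send a short alert and re-raise."""
    try:
        return task()
    except Exception as exc:
        # Truncate the text so the alert stays within a single short message.
        notify(f"ER wait capture failed: {type(exc).__name__}: {exc}"[:160])
        raise
```

Passing a different notify callable (for example, a list's append method) makes it possible to exercise the wrapper without sending a real message.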
Data is captured for all available Calgary and Edmonton hospitals. It is scraped from:
https://www.albertahealthservices.ca/waittimes/waittimes.aspx
Additional cities in Alberta can be added as a future update to the project.
Note: There is no retrospective data; data is only available from when I started collecting in May/June 2022 and is subject to outages beyond my control.
Data is visualized as follows:
- Line plots of the ER wait time over time, with one data point per hour.
- A table showing the average and standard deviation of all data points for each hospital
- Violin sub-plots (kernel density plots) of each hospital in a 24-hour period
- Complete violin and table data for each hospital hour (average and standard deviation) when clicking on the hyperlink of the hospital name in the violin sub-plot
- A sinusoid (cosine) "best fit curve" and its equation, shown for each hospital's 24-hour violin plot
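As an illustration of the cosine "best fit" idea, here is a minimal sketch under stated assumptions; the app's own fitting routine is not shown in this README and may differ. It assumes the period is fixed at 24 hours and that the input is exactly one mean wait time per hour (0 to 23). Under those assumptions the least-squares fit reduces to the first Fourier coefficients. The function name fit_daily_cosine is hypothetical.

```python
import math


def fit_daily_cosine(hourly_means):
    """Fit y = offset + amplitude * cos(2*pi/24 * (hour - peak_hour)).

    Assumes exactly 24 values, one mean wait time per hour 0..23. With
    evenly spaced samples over one full period, the least-squares fit for
    a fixed 24-hour period reduces to the first Fourier coefficients.
    """
    n = len(hourly_means)
    if n != 24:
        raise ValueError("expected one value per hour (24 values)")
    omega = 2 * math.pi / 24
    offset = sum(hourly_means) / n
    # Cosine and sine components of the 24-hour harmonic.
    a = 2 / n * sum(y * math.cos(omega * h) for h, y in enumerate(hourly_means))
    b = 2 / n * sum(y * math.sin(omega * h) for h, y in enumerate(hourly_means))
    amplitude = math.hypot(a, b)
    peak_hour = (math.atan2(b, a) / omega) % 24
    return offset, amplitude, peak_hour


if __name__ == "__main__":
    # Synthetic example data (not real wait times): mean 120 min, swing 30 min, peak at 05:00.
    synthetic = [120 + 30 * math.cos(2 * math.pi / 24 * (h - 5)) for h in range(24)]
    print(fit_daily_cosine(synthetic))
```

The returned triple maps directly onto an equation of the form offset + amplitude * cos(2π/24 * (hour - peak_hour)), which is one way such a curve could be labelled on a plot.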
Interactive features include:
- Viewing the page/plots in light or dark mode
- Computing a rolling average (boxcar filter) based on input hours selected
- Selecting date ranges by clicking relative time buttons
- Enabling/disabling hospitals in the legend
- Click-dragging zoom levels for all plots
- Click-dragging pan of all plots
- Toolbar control features (save as .png, zoom control, etc.)
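The rolling average (boxcar filter) mentioned above can be sketched in plain Python as follows. This is a hedged illustration: it assumes a trailing window over hourly samples, whereas the app may use a centered window or a library routine. The name rolling_average is hypothetical.

```python
def rolling_average(values, window):
    """Trailing boxcar average; None until a full window is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the sample that just left the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


if __name__ == "__main__":
    # Made-up wait times in minutes, one per hour, averaged over 2 hours.
    print(rolling_average([60, 90, 120, 150], 2))  # [None, 75.0, 105.0, 135.0]
```

Every sample in the window gets equal weight, which is what distinguishes a boxcar filter from weighted smoothers.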
The webpage is best viewed on a desktop PC, but mobile is also supported. For best results on a phone, rotate it to landscape orientation.
Note: The instructions below are for a Windows platform using Python 3.9.X.
Create a folder on your PC to host the project files. Navigate to the root folder and open a command window (Windows Key, then type cmd.exe) at this location.
To set the environment variables, press the Windows Key, type path, and press Enter. Then select Environment Variables... In the bottom half of the window, under "System variables", click New... and add each of the following environment variables:
MONGO_DB_URL
TWILIO_ACCOUNT_SID
TWILIO_AUTH_TOKEN
MY_TWILIO_NUM
MY_PHONE_NUM
Click OK for the changes to take effect.
IMPORTANT! After updating environment variables for the OS, new instances of command windows (e.g. cmd.exe) must be opened, and/or IDEs like PyCharm or VS Code must be restarted, before the new values are visible.
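A quick way to confirm that a newly opened command window sees the variables is a small check like the one below. This is a sketch only; the helper name missing_env_vars is hypothetical and is not part of the project files.

```python
import os

# The five variable names listed in the setup instructions.
REQUIRED_VARS = (
    "MONGO_DB_URL",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "MY_TWILIO_NUM",
    "MY_PHONE_NUM",
)


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If the script reports a variable as missing right after it was added, the most likely cause is a command window or IDE that was opened before the change.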
Create the virtual environment in the root folder by running the following command:
python -m venv waittimes
Next, activate the virtual environment. On Windows, this means going into the waittimes\Scripts folder (using the cd command in cmd.exe) and running activate from the command prompt. The prompt now shows (waittimes), indicating that this window is using the project's virtual environment, which will contain only the dependencies the project requires (i.e. those from requirements.txt).
Once your virtual environment is set up and active, install the dependencies by navigating back to the root directory in the command window (cd .. twice) and running:
pip install -r requirements.txt
This installs all of the required packages listed in the requirements.txt file.
MongoDB is the cloud host for my data. It can be run locally at:
mongodb://127.0.0.1:27017
Start the local MongoDB server and shells by running each of the following in a separate cmd.exe window:
mongod (the database server)
mongo (the legacy shell)
mongosh (the newer shell)
All of these tools are available on the PATH through the following folders:
C:\Program Files\MongoDB\Server\5.0\bin
C:\Program Files\MongoDB\Tools\100\bin
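Once a server is reachable, connectivity can be checked from Python with a sketch like the one below. It assumes the pymongo package is installed and that falling back to the local URL when MONGO_DB_URL is unset is acceptable; the app's own connection logic is not shown here and may differ. The helper names are hypothetical, while the database name erWaitTimesDB and the collection names Calgary and Edmonton are the ones used in the import commands in this README.

```python
import os

LOCAL_URL = "mongodb://127.0.0.1:27017"


def mongo_url(environ=None):
    """Use MONGO_DB_URL when set, otherwise the local server (an assumption)."""
    env = os.environ if environ is None else environ
    return env.get("MONGO_DB_URL") or LOCAL_URL


def count_documents(city, environ=None):
    """Return the number of documents stored for a city collection."""
    from pymongo import MongoClient  # imported lazily; assumes pymongo is installed

    # Fail after 5 seconds instead of waiting on an unreachable server.
    client = MongoClient(mongo_url(environ), serverSelectionTimeoutMS=5000)
    try:
        return client["erWaitTimesDB"][city].count_documents({})
    finally:
        client.close()


if __name__ == "__main__":
    for city in ("Calgary", "Edmonton"):
        print(city, count_documents(city))
```

A count of zero for both cities simply means nothing has been imported yet.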
Data is imported from the CSV files by following these instructions:
- Copy and paste Calgary_hospital_stats.csv and Edmonton_hospital_stats.csv in the same folder, which will create Calgary_hospital_stats - Copy.csv and Edmonton_hospital_stats - Copy.csv.
- Open each copied .csv file in Notepad++ and press CTRL+H (to find/replace). Find all instances of \r\r and replace them with \r. Save these files and close Notepad++.
- Open the updated .csv files in Excel, and simply save them in Excel so that they are properly formatted as CSV files that mongoimport can recognize.
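The copy and find/replace steps above could also be scripted. The following is a hedged sketch of that idea only: it performs the \r\r to \r replacement on a copy of each file, it does not reproduce the Excel re-save step, and whether its output is accepted by mongoimport without that step is untested here. The function name collapse_double_cr is hypothetical.

```python
from pathlib import Path


def collapse_double_cr(src, dst):
    """Copy src to dst, replacing each \\r\\r byte pair with a single \\r.

    Works on raw bytes so Python does not translate line endings itself.
    Returns the number of replacements made.
    """
    data = Path(src).read_bytes()
    Path(dst).write_bytes(data.replace(b"\r\r", b"\r"))
    return data.count(b"\r\r")


if __name__ == "__main__":
    for city in ("Calgary", "Edmonton"):
        src = f"{city}_hospital_stats.csv"
        if Path(src).exists():
            n = collapse_double_cr(src, f"{city}_hospital_stats - Copy.csv")
            print(f"{src}: replaced {n} occurrence(s)")
```

Like a single "Replace All" pass, this makes one pass over the file, so a run of three carriage returns would be reduced to two rather than one.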
Next, open a command window (cmd.exe) and run the following commands to populate the MongoDB database in the cloud:
mongoimport --uri mongodb+srv://<USERNAME>:<PASSWORD>@cluster0.chujx.mongodb.net/erWaitTimesDB --collection Calgary --type=csv --headerline --maintainInsertionOrder --file "Calgary_hospital_stats - Copy.csv"
mongoimport --uri mongodb+srv://<USERNAME>:<PASSWORD>@cluster0.chujx.mongodb.net/erWaitTimesDB --collection Edmonton --type=csv --headerline --maintainInsertionOrder --file "Edmonton_hospital_stats - Copy.csv"
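To populate only the local database, remove the --uri option and its connection string; mongoimport then connects to the local server. Since the database name was part of the URI, add --db erWaitTimesDB to keep importing into the same database name.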
Run the dashboard as:
python dash_er_wait.py
The server will start at http://127.0.0.1:8050/
Data capture runs at: https://capture-ab-er-wait.herokuapp.com/
The app was originally located at: https://alberta-er-wait-times.herokuapp.com/
It has since been moved to: https://alberta-er-wait-times.onrender.com/
Procfile for capture_er_wait_data.py:
worker: python capture_er_wait_data.py
Chrome/Chromedriver buildpacks required:
https://github.com/heroku/heroku-buildpack-google-chrome.git
https://github.com/heroku/heroku-buildpack-chromedriver.git
CLI to scale worker:
heroku scale worker=1
Point to this app in Heroku:
git remote set-url heroku https://git.heroku.com/capture-ab-er-wait.git
Procfile for dash_er_wait.py:
web: gunicorn dash_er_wait:server
Point to this app in Heroku:
git remote set-url heroku https://git.heroku.com/alberta-er-wait-times.git