A collection of Bash and Python scripts to download, process, plot, and compare COVID-19 data and forecasts. Each data set has its own organization, and the goal of this project is to provide a standard way of manipulating data from different sources. Each data set is loaded as a pandas DataFrame with the following standardized fields:
- date: calendar date for each record, stored as a datetime object
- cases: number of new cases reported each day
- cumCases: cumulative number of cases reported
- hosptialAdmissions: new hospital admissions reported each day
- hospitalizedCurrently: number of people currently hospitalized on each date
- deaths: number of new deaths reported each day
- cumDeaths: cumulative number of deaths reported
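
To illustrate how the daily and cumulative fields above relate, here is a minimal sketch. It does not use this project's loaders; it builds a small DataFrame by hand with a subset of the standardized field names, and every value in it is a synthetic placeholder rather than real data.

```python
import pandas as pd

# Synthetic placeholder values for illustration only, not real COVID-19 data.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "cases": [10, 20, 15],
    "deaths": [1, 0, 2],
})

# Cumulative fields are running totals of the corresponding daily fields.
df["cumCases"] = df["cases"].cumsum()
df["cumDeaths"] = df["deaths"].cumsum()

# Conversely, daily counts can be recovered from a cumulative series.
# The first difference is undefined, so it is filled with the first cumulative value.
daily_from_cum = df["cumCases"].diff().fillna(df["cumCases"]).astype(int)

print(df)
```

Because every source is mapped onto the same column names, a calculation like this should apply unchanged to any data set that provides the relevant fields.
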
The data and forecasts come from the following sources:
- JHU CSSE: https://coronavirus.jhu.edu/
- covidtracking.com: https://covidtracking.com/
- healthdata.gov: https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh
- NCHS: https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Week-Ending-D/r8kw-7aab
- CDC: https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36
- UMN: https://carlsonschool.umn.edu/mili-misrc-covid19-tracking-project
- CDC ensemble forecasts: https://www.cdc.gov/coronavirus/2019-ncov/covid-data/mathematical-modeling.html
- IHME: https://covid19.healthdata.org
Downloading data requires a Bash shell scripting environment. Windows users are encouraged to use the Windows Subsystem for Linux (WSL). The following prerequisites are required to run the Bash scripts: wget, curl, unzip, git
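
Before running the download scripts, it can help to confirm that the prerequisites are installed. The following is a minimal sketch, not part of this project: the `missing_tools` helper is a hypothetical name, and it only checks whether each tool can be found on the `PATH`.

```python
import shutil

# Prerequisites listed above for the Bash scripts.
REQUIRED_TOOLS = ["wget", "curl", "unzip", "git"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```
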
Data can be downloaded by running the following commands from the repository's root directory. Note that the forecast data requires more than 20 GB of storage space, and the initial download takes significant time. Running the commands again fetches and processes any new data.
Download/update forecast data:
./update-forecasts.sh
Download/update truth data:
./update-truth.sh