
This repository focuses on data understanding and preparation for the Covid-19 pandemic. The data comes from the Centers for Disease Control and Prevention.


JasonBallantyne/DataAnalytics_Prep


DataAnalytics_Prep

This repository focuses on data understanding and preparation for data on the COVID-19 pandemic. The data comes from the Centers for Disease Control and Prevention (CDC), the United States' national health protection agency. The CDC is in charge of collecting data about the COVID-19 pandemic, in particular tracking cases, deaths, and trends of COVID-19 in the United States. It collects and publishes de-identified individual-case data daily, submitted using standardized case reporting forms.

The case data covers demographic characteristics, exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and comorbidities. It also records whether each individual survived. In this analysis, we use the CDC data to build a data analytics solution for death risk prediction.

We carry out the following tasks:

  1. Prepare a data quality report for the CSV dataset.
  2. Prepare a data quality plan for the cleaned CSV file.
  3. Explore relationships between feature pairs.
  4. Transform the existing features to create new features that better capture the problem domain and the target outcome.
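
As an illustration of the first task, the sketch below computes a small data quality report (row count, percentage missing, cardinality, and mode per column) for a CSV file. It is not taken from this repository's notebooks: the function name `quality_report`, the `MISSING_TOKENS` set, and the sample columns and rows are all assumptions made for the example, and it uses only the Python standard library so that it runs without extra dependencies (pandas would be the more usual choice in practice).

```python
import csv
import io
from collections import Counter

# Assumption: these tokens mark a missing value. The real dataset's
# conventions may differ and should be checked against its data dictionary.
MISSING_TOKENS = {"", "missing", "unknown", "na"}


def quality_report(rows, missing_tokens=MISSING_TOKENS):
    """Return per-column count, missing percentage, cardinality, and mode."""
    rows = list(rows)
    report = {}
    if not rows:
        return report
    for column in rows[0].keys():
        # Normalise None (short rows) to "" and trim whitespace.
        values = [(row[column] or "").strip() for row in rows]
        present = [v for v in values if v.lower() not in missing_tokens]
        counts = Counter(present)
        mode = counts.most_common(1)[0][0] if counts else None
        report[column] = {
            "count": len(values),
            "missing_pct": 100.0 * (len(values) - len(present)) / len(values),
            "cardinality": len(counts),
            "mode": mode,
        }
    return report


# Made-up sample rows with illustrative column names, not real CDC records.
SAMPLE_CSV = """age_group,sex,hosp_yn,death_yn
60 - 69 Years,Male,Yes,No
20 - 29 Years,Female,Missing,No
80+ Years,Male,Yes,Yes
60 - 69 Years,,Unknown,No
"""

report = quality_report(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column, stats in report.items():
    print(column, stats)
```

To run this against a real file, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample. Columns with a high `missing_pct` or a cardinality of 1 are natural candidates to flag in the data quality plan of the second task.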
