I work for a summer camp planning its 2021 season. We hope to base our program and policy decisions on the level of community spread in our campers' hometowns. This project connects camper geography to the latest COVID-19 data and generates an interactive dashboard for decision-makers.

amcgaha/camp-community-covid-levels

Summer Camp COVID Risk: Assessing Local Conditions

I work for a summer camp planning its 2021 season. We hope to base our program and policy decisions on the level of community spread in our campers' home counties. This project connects camper geography to the latest COVID-19 data and generates an interactive dashboard for decision-makers. Click here to view the final product.

Contents

  1. Introduction
  2. Data Sources
  3. Methods & Tools
  4. Procedures
  5. Product

Introduction

I work for a summer camp in North Carolina. Our most important project this year is developing a safe opening plan in the context of COVID-19. With many best practices available, our team would like to know how restrictive our policies and procedures need to be this summer.

We hope to strike an intelligent balance between freedom and a safe environment. Ideally, the level of restrictions we require would be informed by the current epidemiological conditions in our campers' communities. Because the pandemic seems to be ever-changing, we would also want to know if we need to adjust our plans as conditions change throughout the spring and summer.

Our campers come from many different states and counties in the United States, but we have significant clusters in certain communities. We know that in the United States there are differences in COVID-19 outbreaks across communities, making location-specific data an important analytical tool.

Before this project, we had no way to incorporate specific data about our campers' communities into our decision-making. Without this data, our process for determining risk would be limited at best. We might, for example, base our decisions on data about the nation as a whole, missing the important distinctions in geography that would reveal higher or lower risks. Worse, we might suffer from bias and self-deception. With the pressure mounting to get good news, we might unconsciously select datapoints or anecdotes that lead us to underestimate the risks we face.

The solution to this dilemma is data.

This project provides a simple but useful snapshot of the current conditions in our campers' communities. It connects geographic COVID-19 data, including cases by county and positivity rates by state, with a database of enrolled campers that includes their home addresses.

We present the data in an interactive dashboard for decision-makers. The dashboard visualizes current COVID-19 data in our campers' states and counties, along with estimates of the risks presented to our community based on how many campers come from each place.

Click here to view the final product.

Data Sources

This project connects three categories of data: dynamic COVID data, basic geographic and demographic data, and the latest information on the campers who are enrolled in summer camp.

Camper Data

Information on our campers comes from our company's data management system, called CampMinder. This company does not have an API, so the information has to be downloaded manually. This data is also sensitive, because it shows campers' names and addresses.

To keep these details confidential, and to make querying specific information seamless, I created a local database using PostgreSQL. Throughout this project, we query camper details from this database in SQL via the Python library SQLAlchemy.
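
As a rough illustration of the kind of query this enables, the sketch below counts enrolled campers per ZIP code without touching names or addresses. To stay self-contained it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the PostgreSQL database and the SQLAlchemy connection. The table name `campers`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the local PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campers (camper_id INTEGER PRIMARY KEY, zip TEXT, session TEXT)"
)
conn.executemany(
    "INSERT INTO campers (camper_id, zip, session) VALUES (?, ?, ?)",
    [(1, "28801", "A"), (2, "28801", "B"), (3, "30303", "A")],  # made-up rows
)


def campers_per_zip(connection):
    """Return {zip: camper_count}; no names or addresses leave the database."""
    rows = connection.execute(
        "SELECT zip, COUNT(*) FROM campers GROUP BY zip ORDER BY zip"
    ).fetchall()
    return dict(rows)


counts = campers_per_zip(conn)
print(counts)
```

Aggregating inside the database means only counts, not personal details, flow into the later steps.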

Geographic Data

Prior to this project, the geographic information we stored on campers was limited to home addresses. Because COVID data is reported by health departments at the county level, we needed a way to connect the ZIP codes in those addresses to counties. We use a conversion table from the U.S. Department of Housing and Urban Development to make this conversion.
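
A ZIP code can overlap more than one county, so the conversion needs a rule for choosing one. The sketch below shows one plausible rule, which may differ from what the notebooks actually do: keep the county that holds the largest share of the ZIP code's residential addresses. The column names `zip`, `county`, and `res_ratio` and the sample ratios are assumptions made for illustration.

```python
import csv
import io

# Made-up sample in the shape of a ZIP-to-county crosswalk (column names assumed).
SAMPLE_CROSSWALK = """zip,county,res_ratio
28801,37021,1.0
30303,13121,0.9
30303,13089,0.1
"""


def best_county_per_zip(crosswalk_file):
    """Map each ZIP code to the county FIPS with the largest residential share."""
    best = {}  # zip -> (res_ratio, county)
    for row in csv.DictReader(crosswalk_file):
        ratio = float(row["res_ratio"])
        if row["zip"] not in best or ratio > best[row["zip"]][0]:
            best[row["zip"]] = (ratio, row["county"])
    return {zip_code: county for zip_code, (_, county) in best.items()}


zip_to_county = best_county_per_zip(io.StringIO(SAMPLE_CROSSWALK))
print(zip_to_county)
```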

In order to calculate the number of cases per 100,000 people, a metric that standardizes numbers according to population, we needed to find the population for each state and county. We add the state population with a convenient table from the U.S. Census Bureau. We add the county population from this webpage recommended by the CDC. These sources are ideal because the data is stripped down and relatively easy to process.
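
The per-100,000 standardization is simple arithmetic. Here is a minimal sketch, with a hypothetical function name and made-up numbers:

```python
def cases_per_100k(cases, population):
    """Scale a raw case count to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so whole-number results stay exact.
    return cases * 100_000 / population


# Made-up example: 500 cumulative cases in a county of 250,000 people.
rate = cases_per_100k(cases=500, population=250_000)
print(rate)  # 200.0
```

Standardizing this way lets a small county and a large state be compared on the same scale.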

Dynamic COVID-19 Data

We connect to two popular datasets to obtain the latest COVID-19 information. These datasets are ideal because they are carefully managed by experts in data science and health reporting. They also update frequently, have strong documentation, and are easy to access.

For county-level data, we download the most recent dataset from The New York Times. We focus on the number of cases that each county has reported most recently. This number will show up in our analysis as the cumulative number of cases per county.
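
Because the county file holds one row per county per day, "most recently reported" means keeping the latest row for each county. The sketch below illustrates that step with plain Python on made-up rows. The column layout (`date`, `county`, `state`, `fips`, `cases`, `deaths`) is my understanding of the county file and should be checked against the dataset's documentation.

```python
import csv
import io

# Made-up rows using the assumed column layout of the county-level file.
SAMPLE_NYT = """date,county,state,fips,cases,deaths
2021-03-01,Buncombe,North Carolina,37021,100,1
2021-03-02,Buncombe,North Carolina,37021,110,1
2021-03-02,Fulton,Georgia,13121,300,4
2021-03-02,Unknown,Georgia,,25,0
"""


def latest_cases_by_fips(nyt_file):
    """Keep the most recent cumulative case count for each county FIPS code."""
    latest = {}  # fips -> (date, cases)
    for row in csv.DictReader(nyt_file):
        if not row["fips"]:
            continue  # rows without a FIPS code cannot be matched to a county
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if row["fips"] not in latest or row["date"] > latest[row["fips"]][0]:
            latest[row["fips"]] = (row["date"], int(row["cases"]))
    return {fips: cases for fips, (_, cases) in latest.items()}


county_cases = latest_cases_by_fips(io.StringIO(SAMPLE_NYT))
print(county_cases)
```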

For state-level data, we connect to The COVID Tracking Project at The Atlantic using their API. This source is unique because it includes information on the number of tests conducted in each state and how many return positive. This allows us to calculate each state's positive test rate.
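
The positive test rate is the number of positive results divided by the total number of test results. The sketch below computes it from a made-up JSON payload rather than a live API call. The field names `state`, `positive`, and `totalTestResults` are assumptions about the API's response and should be verified against its documentation.

```python
import json

# Made-up payload; field names are assumed, values are illustrative only.
SAMPLE_STATES_JSON = """[
  {"state": "NC", "positive": 50, "totalTestResults": 1000},
  {"state": "GA", "positive": 120, "totalTestResults": 1200},
  {"state": "XX", "positive": 5, "totalTestResults": 0}
]"""


def positive_rates(states):
    """Return {state: positives / total tests}, skipping unusable records."""
    rates = {}
    for record in states:
        total = record.get("totalTestResults")
        positive = record.get("positive")
        if not total or positive is None:
            continue  # avoid dividing by zero or by a missing value
        rates[record["state"]] = positive / total
    return rates


rates = positive_rates(json.loads(SAMPLE_STATES_JSON))
print(rates)
```

Skipping records with no test total keeps a single incomplete state from breaking the whole calculation.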

We also need sources to help us complete "sanity checks" and validate certain information. Whenever we have doubts or questions we will consult the Johns Hopkins University Coronavirus Resource Center, which has a wealth of reliable state and county information.

For more information about the challenges of COVID-19 data collection and interpretation, read this article in Wired.

Methods & Tools

We process our data using Python and the Pandas library. Each step is completed in a series of Jupyter Notebooks so the process can be followed in detail.

The data about our campers is stored in a PostgreSQL relational database, which I created beforehand. In this project we update the database with the latest information on the campers who have enrolled this year. We access the database using SQL via the Python library SQLAlchemy.

Finally, we explore and visualize the results in Tableau.

Procedures

1. Add Geographic Details to Database

In this step we import, clean, and combine geographic information, and then add it to our database. Because this information will not change over the course of the project, we only need to complete this step once.

View Notebook

2. Update Database with New Campers

Campers enroll in summer camp throughout the year, and so we need to continually update our database to include the most recent information on who is enrolled and where they come from. In this step we process that information from our company data management system into a form that matches our database. Then we upload it.
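
The sketch below illustrates what such an update could look like; it is not taken from the notebook. It cleans ZIP codes from a manual export and then inserts new campers or updates existing ones in a single statement. An in-memory SQLite database stands in for PostgreSQL, and the `campers` table, its columns, and the sample rows are hypothetical. The `ON CONFLICT ... DO UPDATE` clause is available in both SQLite (3.24 or later) and PostgreSQL.

```python
import sqlite3


def normalize_zip(raw_zip):
    """Reduce a ZIP or ZIP+4 value to a zero-padded five-digit string."""
    return str(raw_zip).strip().split("-")[0].zfill(5)


def upsert_campers(connection, exported_rows):
    """Insert new campers and refresh the details of campers already stored."""
    connection.executemany(
        """
        INSERT INTO campers (camper_id, zip, session)
        VALUES (?, ?, ?)
        ON CONFLICT(camper_id) DO UPDATE SET
            zip = excluded.zip,
            session = excluded.session
        """,
        [(r["camper_id"], normalize_zip(r["zip"]), r["session"]) for r in exported_rows],
    )
    connection.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.execute(
    "CREATE TABLE campers (camper_id INTEGER PRIMARY KEY, zip TEXT, session TEXT)"
)
# First export: a spreadsheet has dropped the leading zero from the ZIP code.
upsert_campers(conn, [{"camper_id": 1, "zip": "2139", "session": "A"}])
# Later export: camper 1 changed sessions and camper 2 is new.
upsert_campers(conn, [
    {"camper_id": 1, "zip": "02139-4307", "session": "B"},
    {"camper_id": 2, "zip": "28801", "session": "A"},
])
rows = conn.execute(
    "SELECT camper_id, zip, session FROM campers ORDER BY camper_id"
).fetchall()
print(rows)
```

Running the same export twice leaves the table unchanged, which makes repeated manual downloads safe.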

View Notebook

3. Collect & Combine COVID Data

Next, we download the COVID datasets, process them, and combine them with information queried from the camper database. We create an output that is ready for exploration and visualization.
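
A minimal sketch of the combining step follows, assuming the earlier steps have produced simple lookups keyed by ZIP code and county FIPS code. The function name, the lookup shapes, and all numbers are hypothetical, and the project itself does this work with Pandas.

```python
from collections import Counter


def build_county_table(camper_zips, zip_to_county, county_cases, county_population):
    """Return one row per county: camper count and cumulative cases per 100,000."""
    campers = Counter(zip_to_county[z] for z in camper_zips if z in zip_to_county)
    table = []
    for fips, count in sorted(campers.items()):
        if fips not in county_cases or fips not in county_population:
            continue  # skip counties missing from either source
        rate = county_cases[fips] * 100_000 / county_population[fips]
        table.append({"fips": fips, "campers": count, "cases_per_100k": rate})
    return table


# Made-up inputs; "99999" is a ZIP code with no county match and is dropped.
table = build_county_table(
    camper_zips=["28801", "28801", "30303", "99999"],
    zip_to_county={"28801": "37021", "30303": "13121"},
    county_cases={"37021": 500, "13121": 3000},
    county_population={"37021": 250_000, "13121": 1_000_000},
)
for row in table:
    print(row)
```

In practice it would be worth logging the unmatched ZIP codes rather than dropping them silently, so that data-entry errors can be corrected.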

View Notebook

4. Generate Dashboard

In Tableau, we import the file created in the previous step. After first exploring what visualizations will be the most helpful in describing the data, we create dashboards and combine them into a story (a series of dashboards) that we can update at any point with new data.

View Tableau Workbook

Product

The final product is an interactive dashboard with four parts. It is built using Tableau. Click here to view.

1. State Indicators

The first page focuses on two indicators that come only from state-level data. The first indicator, sometimes known as the "positivity rate", is the proportion of COVID-19 tests that have returned positive. The data reported are cumulative, so the number on the dashboard reflects all tests conducted in the state to date, not only recent ones.

The bottom half of the page shows "recent increases" in COVID-19 cases, reported in the standard form of cases per 100,000 people in the state. This number is very limited, but it does alert us to states that might be "on the rise." If we are going to make decisions based on trends in certain states, we should not limit our research to this data, which is only a snapshot of recent increases. We should take any increases seen here as one clue, and then visit other resources to see a fuller picture of trends.

Both indicators are also presented in a block chart. In addition to tagging each state with its current indicator number, the block charts show the proportion of our enrolled campers who come from that state.

2. County Indicators

The second page adds new information at the county level. We see the cumulative number of COVID-19 cases by county, per 100,000 people. As on the first page, we also see these numbers on a block chart representing the counties most campers come from.

3. Risk Overview

Here we try to combine the indicators into one number that can help us understand our overall risk. To do this, we calculate a field called "risk points." These are calculated for each camper based on two numbers: the positivity rate for that camper's state, and the number of cases per 100,000 in their county. We multiply these numbers together to get risk points for each camper.

Lower Risk: 0.05 pos rate * 100 cases = 5 risk points

Higher Risk: 0.10 pos rate * 300 cases = 30 risk points

When we sum all the risk points for all campers we can get a sense of our overall risk level. When we sum the risk from certain states, counties, or camp sessions, we can compare risks and risk sources together.
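
The calculation above can be sketched in a few lines of Python. The function names, the record layout, and the sample campers are hypothetical; the two example campers from the text appear in the first two rows.

```python
from collections import defaultdict


def risk_points(state_positive_rate, county_cases_per_100k):
    """Risk points for one camper: state positivity rate times county case rate."""
    return state_positive_rate * county_cases_per_100k


def total_risk_by(campers, key):
    """Sum risk points across campers, grouped by a field such as state or session."""
    totals = defaultdict(float)
    for camper in campers:
        totals[camper[key]] += risk_points(
            camper["pos_rate"], camper["cases_per_100k"]
        )
    return dict(totals)


# Made-up campers for illustration.
campers = [
    {"state": "NC", "session": "A", "pos_rate": 0.05, "cases_per_100k": 100},
    {"state": "GA", "session": "A", "pos_rate": 0.10, "cases_per_100k": 300},
    {"state": "GA", "session": "B", "pos_rate": 0.10, "cases_per_100k": 300},
]

by_state = total_risk_by(campers, "state")
by_session = total_risk_by(campers, "session")
print(by_state)
print(by_session)
```

Because the totals are sums over campers, a place contributes more risk points either when its indicators are high or when many campers come from it.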

Note that this does not dictate what policies or procedures we should follow. There are certain procedures we must do even if all campers are 'low risk' according to this estimate, like wearing masks and social distancing. The dashboard only serves to ground us in an understanding of some basic information on what our campers' communities are like right now.

4. Risk Detail

The final page shows risk points by individual and by camp session. We could use this detailed view to understand which sessions are riskier than others, at least based on our campers' geography. Combined with our camper database, which stores campers' names and contact information, it could also support a system for contacting campers who have a high number of risk points and creating a safety plan with them.
