county-data

A manager for arbitrary collected data on US counties, at either the county or the county-year level.

This is a general system for anyone who has county-level data and wants to include it in joint analyses. The subdirectories in this repository contain county-level datasets and the logic for exposing them.

Installation

The county-data tool requires numpy, pandas, pyyaml, xlrd, and prompt_toolkit.

pip install numpy pandas pyyaml xlrd prompt_toolkit

It has only been tested with Python 2.7.

How to use the data

Depending on your use-case, there are three main ways to use the county-data package.

If you want to use an individual dataset, you can find the source data in its subdirectory. Since each dataset is in a different format, it can be helpful to use the standardized interface to extract data from it. To do so, run the following in Python:
```
import sys
sys.path.append("<path to county-data>")
from <database> import main
db = main.load()
```
where <database> is one of the subdirectories. Methods on the db object then provide access to the data. The main methods are:
- get_variables(): A list of the available variables.
- describe_variable(variable): Return a text description of a variable.
- get_unit(variable): Return the canonical unit for variable.
- get_fips(): A list of the available counties, as FIPS codes.
- get_years(): A list of the available years (or None for a single year).
- get_data(variable, year): Return an ordered list of data values, in the same order as the FIPS codes.
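
As a rough illustration of how these methods fit together, the sketch below pairs each FIPS code with its value for one variable. `StubDatabase` is a hypothetical stand-in that only mimics the method names listed above so that the snippet runs on its own; its FIPS codes, variable name, and values are made up. The helper `values_by_fips` is also hypothetical, and it assumes that `get_data` accepts `None` as the year for single-year datasets. With a real dataset you would pass the object returned by `main.load()` instead.

```python
class StubDatabase(object):
    """Hypothetical stand-in exposing the method names listed above."""

    def get_variables(self):
        return ["population"]

    def get_unit(self, variable):
        return "people"

    def get_fips(self):
        return ["01001", "01003", "01005"]

    def get_years(self):
        return [2000, 2010]

    def get_data(self, variable, year):
        # Made-up values, returned in the same order as get_fips().
        values = {2000: [10, 20, 30], 2010: [11, 22, 33]}
        return values[year]


def values_by_fips(db, variable, year=None):
    """Map each FIPS code to its value, relying on the shared ordering."""
    if year is None:
        years = db.get_years()
        # Assumption: default to the latest year when several are available,
        # and pass None through for single-year datasets.
        year = max(years) if years is not None else None
    return dict(zip(db.get_fips(), db.get_data(variable, year)))


db = StubDatabase()
population = values_by_fips(db, "population")
print("%s %s" % (population["01003"], db.get_unit("population")))
```

Building a dictionary keyed by FIPS code avoids depending on list positions once the values leave the database object.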

If you want to use multiple datasets, it is best to use the export tool, which provides an interactive interface to the data. The export tool joins the data across different FIPS orders, and exports the data as a CSV.
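
To make the idea of joining across different FIPS orders concrete, here is a minimal sketch of one way such a join could work. It is not the export tool's actual code: the function `join_by_fips`, the column names, and the values are all made up for illustration, and the choice to leave an empty cell for counties missing from a dataset is an assumption.

```python
def join_by_fips(columns):
    """Align several columns on FIPS code.

    `columns` maps a column name to a (fips_list, values) pair whose two
    lists share the same order, as with get_fips() and get_data().
    Returns a header row followed by one row per county.
    """
    lookups = {}
    all_fips = set()
    for name, (fips, values) in columns.items():
        lookups[name] = dict(zip(fips, values))
        all_fips.update(fips)

    names = sorted(columns)
    rows = [["fips"] + names]
    for code in sorted(all_fips):
        # Counties absent from a dataset get an empty cell.
        rows.append([code] + [lookups[name].get(code, "") for name in names])
    return rows


# Made-up example: two datasets listing counties in different orders.
rows = join_by_fips({
    "income": (["01003", "01001"], [50, 40]),
    "rainfall": (["01001", "01005", "01003"], [1.2, 0.9, 1.5]),
})
for row in rows:
    print(",".join(str(cell) for cell in row))
```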

If you aren't sure what data is most useful, and want to do some data-mining, you can export the most recent year of all the variables in all the datasets across all counties. To do so, run:
```
python -m analysis.alldata
```
The export takes about 1 hour, and produces a file results.csv.

The export tool

The export tool allows variables to be extracted from the datasets, merged with variables from other datasets, and exported into new files. You can run the export tool at the terminal with:

python export.py

This opens a prompt that accepts the following commands, among others (listed in the order of a typical session):

- help: Provide a list of commands, or help on a particular command with help [command].
- available: List the datasets that are available to load.
- load [dataset]: Load a dataset and prepare it for additional processing.
- list: List the variables that are available across all loaded datasets.
- add [variable]: Add one of the available variables to the export file.
- export [filename]: Create a new file with just the added variables.
- bye: Exit.

Adding new datasets

To include a new CSV file, do the following:

- Create a new subdirectory and place the data files there.
- Create an empty __init__.py file in that directory.
- Create a file main.py in that directory and include the following:

import os
import database

def get_description(variable):
    return "Ask YOURNAME about %s." % variable

def load():
    datapath = os.path.join(os.path.dirname(os.path.realpath(__file__)), "DATAFILE.csv")
    return database.CSVDatabase(datapath, 'FIPS', get_description)

Fill in DATAFILE with the filename, FIPS with the name of the FIPS code column, and YOURNAME with your name.

As written, the get_description function returns no useful information about the variables, but you are encouraged to add variable-specific descriptions.
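
One simple way to add variable-specific descriptions is a lookup table with a fallback to the generic message. This is only a sketch: the variable names and description texts below are hypothetical placeholders, not columns of any dataset in this repository.

```python
# Hypothetical variable names and descriptions, for illustration only.
DESCRIPTIONS = {
    "median_income": "Median household income.",
    "pop_density": "Population density.",
}


def get_description(variable):
    # Fall back to the generic message for variables not described yet.
    return DESCRIPTIONS.get(variable, "Ask YOURNAME about %s." % variable)
```

This version has the same signature as the get_description function in the main.py template, so it can be passed to database.CSVDatabase in the same way.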
