Add fiscally standardized cities
capnrefsmmat committed Sep 7, 2023
1 parent a84fcb1 commit 5684d5c
Showing 5 changed files with 135 additions and 0 deletions.
18 changes: 18 additions & 0 deletions _freeze/politics/standard-cities/execute-results/html.json

2 changes: 2 additions & 0 deletions cmu-statds-datasets.csv
@@ -8,10 +8,12 @@ date,datayear,title,description,subject,categories,url
2017-02-25,2017,Religion and analytic thinking,A controversial study found that encouraging analytic thinking also reduced religious belief. A replication attempt collected much more data to try to confirm the hypothesis; is it supported by the data? A simple randomized experiment with continuous outcome and demographic controls.,psychology,"ANOVA, linear regression",https://cmustatistics.github.io/data-repository/psychology/religion-analytic-thinking.html
2023-09-06,2023,Mapping Police Violence,"Each year in the United States, around 1,000 people are killed by police. Some of these deaths are accidental, in traffic accidents and other incidents; some are deliberate killings considered legally justified by authorities; and others are considered unjustified or even lead to the prosecution of the police officer. Explore data on over 10,000 such killings to identify patterns in the people killed, the stated reasons for their killings, and the situations leading to their deaths.",politics,"GLMs, linear regression, logistic regression",https://cmustatistics.github.io/data-repository/politics/mapping-police-violence.html
2023-06-20,2022,World Development Indicators,"The World Bank’s World Development Indicators (WDI) compile development information about countries around the world. Using ten years of data, study development, political stability, pollution, and other factors at the national and regional levels.",politics,"linear regression, ANOVA",https://cmustatistics.github.io/data-repository/politics/world-bank.html
2023-09-07,2020,Fiscally standardized cities,Extensive financial data on over 200 of the largest cities in the United States for over 40 years. Which cities spend the most or the least on government services?,politics,"EDA, clustering",https://cmustatistics.github.io/data-repository/politics/standard-cities.html
2017-02-22,2014,Science Forums,"A random sample of discussions at a large science discussion forum, with metadata about each.",social,"GLMs, classification, linear regression",https://cmustatistics.github.io/data-repository/social/science-forums.html
2019-08-14,2010,"House prices in Ames, Iowa","Records of homes sold in Ames, Iowa, and their sale prices, including detailed covariates that could be used to predict home prices.",money,"linear regression, nonparametric regression",https://cmustatistics.github.io/data-repository/money/ames-housing.html
2023-06-09,2021,European protected ham,"In Europe, some types of ham—like Black Forest ham—can only be legally produced in specific geographic regions. Does the size of those regions affect the price of the ham?",money,"ANOVA, linear regression, hierarchical model",https://cmustatistics.github.io/data-repository/money/european-ham.html
2022-07-27,2014,Rail trails and property values,"Rail trails are a great recreation opportunity, but do homes near them rise in value? A retrospective observational study of house values.",money,"linear regression, nonparametric regression",https://cmustatistics.github.io/data-repository/money/rail-trails.html
2019-02-09,2015,Traffic stops in Connecticut,"Records for every traffic stop made by the Connecticut State Police over several years, including the reason for the stop and demographic details. Use classifiers to study whether stop and search decisions show signs of bias.",crime-and-justice,"classification, logistic regression",https://cmustatistics.github.io/data-repository/crime-and-justice/connecticut-stops.html
2023-05-17,2020,Core temperature during surgery,Is low core temperature during surgery associated with poor surgical outcomes or death? Use logistic and survival analysis to study an observational dataset.,medicine,"logistic regression, survival analysis",https://cmustatistics.github.io/data-repository/medicine/core-temperature.html
2023-06-08,1828,Bloodletting,"Bloodletting—deliberately withdrawing large amounts of blood from patients—was a common medical practice for centuries, until evidence was finally collected on its efficacy. Use an early observational dataset to explore its use for pneumonia.",medicine,"logistic regression, contingency tables",https://cmustatistics.github.io/data-repository/medicine/bloodletting.html
2022-12-08,2017,Health exams in Vietnam,"As access to health care increases, public health advocates encourage ordinary people to get regular check-ups to detect problems before they become serious (and expensive). But not everyone follows this advice. This survey of people in Vietnam explores the factors leading people to get regular check-ups.",medicine,"surveys, classification, logistic regression",https://cmustatistics.github.io/data-repository/medicine/vietnam-health.html
Binary file added data/fisc_full_dataset_2020_update.csv.gz
Binary file added data/fisc_full_dataset_2020_update_variables.pdf
115 changes: 115 additions & 0 deletions politics/standard-cities.qmd
@@ -0,0 +1,115 @@
---
title: Fiscally standardized cities
author: Alex Reinhart
date: September 7, 2023
description: Extensive financial data on over 200 of the largest cities in the United States for over 40 years. Which cities spend the most or the least on government services?
categories:
- EDA
- clustering
data:
year: 2020
files: fisc_full_dataset_2020_update.csv.gz
---

## Motivation

In the United States, city governments provide many services: they run public
school districts, administer certain welfare and health programs, build roads
and manage airports, provide police and fire protection, inspect buildings, and
often run water and utility systems. Cities raise revenue through local taxes,
fees and permit charges, property sales, and the rates they charge for the
utilities they run.

It would be interesting to compare all these expenses and revenues across cities
and over time, but also quite difficult. Cities share many of these service
responsibilities with other government agencies: in one particular city, some
roads may be maintained by the state government, some law enforcement provided
by the county sheriff, some schools run by independent school districts with
their own tax revenue, and some utilities run by special independent utility
districts. These governmental structures vary greatly by state and by individual
city. It would be hard to make a fair comparison without taking into account all
these differences.

This dataset accounts for those differences. The Lincoln Institute of Land
Policy produces what it calls “Fiscally Standardized Cities” (FiSCs), which
aggregate all services provided to city residents, regardless of how those
services are divided among government agencies and jurisdictions. With FiSCs,
we can compare expenses and revenues across cities and study how the shares of
different costs change over time.

## Data

The dataset tracks over 200 American cities between 1977 and 2020. Each row
represents one city for one year. Revenue and expenditures are broken down into
more than 120 categories.

Values are available for each FiSC and also for the entities that make it up:
the city, the county, independent school districts, and any special districts,
such as utility districts. Each variable therefore comes in five versions: the
FiSC total, with no suffix, and four versions whose suffixes indicate the
entity. For example, `taxes` gives the FiSC's tax revenue, while `taxes_city`,
`taxes_cnty`, `taxes_schl`, and `taxes_spec` break it down for the city,
county, school districts, and special districts.
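
As a minimal sketch of working with these suffixes, the R function below
computes the share of a FiSC total contributed by each entity. The helper name
`entity_shares()` and the toy values are invented for illustration; it assumes
the four suffixed components add up to the unsuffixed total, as the breakdown
above suggests, and that the data has `year` and `city_name` columns.

```r
# Hypothetical helper: each entity's share of the FiSC total `var`.
# Assumes the four suffixed components add up to the unsuffixed total.
entity_shares <- function(df, var) {
  suffixes <- c("city", "cnty", "schl", "spec")
  parts <- paste0(var, "_", suffixes)  # e.g. "taxes_city", "taxes_cnty", ...
  # Divide each component column by the FiSC total, row by row;
  # rows with a zero total give NaN or Inf
  shares <- df[, parts, drop = FALSE] / df[[var]]
  names(shares) <- paste0("share_", suffixes)
  cbind(df[, c("year", "city_name"), drop = FALSE], shares)
}

# Toy data with invented values, only to show usage
toy <- data.frame(
  year = c(2019, 2020),
  city_name = c("XX: Example", "XX: Example"),
  taxes = c(100, 200),
  taxes_city = c(40, 100),
  taxes_cnty = c(20, 50),
  taxes_schl = c(30, 40),
  taxes_spec = c(10, 10)
)
entity_shares(toy, "taxes")
```

If the components do add up, each row of shares sums to 1, which also makes
this a quick consistency check.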

The values are organized *hierarchically*. For example, `taxes` is the sum of
`tax_property` (property taxes), `tax_sales_general` (sales taxes), `tax_income`
(income tax), and `tax_other` (other taxes). And `tax_income` is itself the sum
of two subcategories, `tax_income_indiv` (individual income tax) and
`tax_income_corp` (corporate income tax).
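
Question 1 below asks you to check this hierarchy. A minimal sketch in R, using
a hypothetical helper `check_hierarchy()` and invented toy values, flags rows
where a parent variable differs from the sum of its children by more than a
tolerance. The default tolerance of 1 dollar is an arbitrary allowance for
rounding, and missing children are treated as zero, which you may want to
handle differently.

```r
# Hypothetical helper: rows where `parent` differs from the sum of `children`
# by more than `tol` (missing children are treated as zero).
check_hierarchy <- function(df, parent, children, tol = 1) {
  child_sum <- rowSums(df[, children, drop = FALSE], na.rm = TRUE)
  gap <- df[[parent]] - child_sum
  # which() drops rows where the gap is NA (e.g. a missing parent value)
  df[which(abs(gap) > tol), c("year", "city_name", parent), drop = FALSE]
}

# Two levels of the hierarchy described above
tax_children <- c("tax_property", "tax_sales_general", "tax_income", "tax_other")
income_children <- c("tax_income_indiv", "tax_income_corp")

# Toy data with invented values; the 2020 taxes row does not add up
toy <- data.frame(
  year = c(2019, 2020),
  city_name = c("XX: Example", "XX: Example"),
  taxes = c(100, 100),
  tax_property = c(50, 50),
  tax_sales_general = c(20, 20),
  tax_income = c(25, 25),
  tax_other = c(5, 15),
  tax_income_indiv = c(20, 20),
  tax_income_corp = c(5, 5)
)
check_hierarchy(toy, "taxes", tax_children)
check_hierarchy(toy, "tax_income", income_children)
```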

### Data preview

```{r, echo=FALSE, results="asis"}
source("../preview_dataset.R")
preview_datasets()
```

### Variable descriptions

For each city and year, the following metadata is available:

| Variable | Description |
|----|-------------|
| year | Year for these values |
| city_name | Name of the city, such as "AK: Anchorage", where "AK" is the standard two-letter abbreviation for Alaska |
| city_population | Estimated city population, based on Census data |
| county_name | Name of the county the city is in |
| county_population | Estimated county population, based on Census data |
| cpi | Consumer Price Index for this year, scaled so that 2020 is 1. |
| relationship_city_school | Type of school district. 1: City-wide independent school district that serves the entire city. 2: County-wide independent school district that serves the entire county. 3: One or more independent school districts whose boundaries extend beyond the city. 4: School district run by or dependent on the city. 5: School district run by or dependent on the county. |
| enrollment | Estimated number of public school students living in the city. |
| districts_in_city | Estimated number of school districts in the city. |
| consolidated_govt | Whether the city has a consolidated city-county government (1 = yes, 0 = no). For example, Philadelphia's city and county government are the same entity; they are not separate governments. |
| id2_city | 12-digit city identifier, from the Annual Survey of State and Local Government Finances |
| id2_county | 12-digit county identifier |
| city_types | Two types: core and legacy. There are 150 core cities, "including the two largest cities in each state, plus all cities with populations of 150,000+ in 1980 and 200,000+ in 2010". Legacy cities include "95 cities with population declines of at least 20 percent from their peak, poverty rates exceeding the national average, and a peak population of at least 50,000". Some cities are both (denoted "core|legacy"). |
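
Some of these metadata columns are easier to use after a little parsing. The
sketch below, with a hypothetical helper `add_metadata_columns()` and invented
toy rows, splits `city_name` into a state code and a city name, flags core and
legacy cities from `city_types`, and labels the `relationship_city_school`
codes with short paraphrases of the table above. It assumes `city_name` always
follows the "AK: Anchorage" pattern.

```r
# Hypothetical helper: add parsed versions of some metadata columns.
add_metadata_columns <- function(df) {
  # "AK: Anchorage" -> state "AK", city "Anchorage"
  df$state <- sub(":.*$", "", df$city_name)
  df$city <- sub("^[A-Z]{2}: ", "", df$city_name)
  # city_types is "core", "legacy", or "core|legacy"
  types <- strsplit(as.character(df$city_types), "|", fixed = TRUE)
  df$is_core <- vapply(types, function(t) "core" %in% t, logical(1))
  df$is_legacy <- vapply(types, function(t) "legacy" %in% t, logical(1))
  # Label the numeric school-district codes 1-5
  df$school_relationship <- factor(
    df$relationship_city_school,
    levels = 1:5,
    labels = c("city-wide independent", "county-wide independent",
               "independent, extends beyond city", "dependent on city",
               "dependent on county")
  )
  df
}

# Toy data with invented cities and values
meta <- data.frame(
  city_name = c("XX: Example City", "YY: Other Town"),
  city_types = c("core", "core|legacy"),
  relationship_city_school = c(1, 4)
)
add_metadata_columns(meta)
```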

The revenue and expenses variables are described in [this detailed
table](../data/fisc_full_dataset_2020_update_variables.pdf). Further
documentation is available on the FiSC Database website, linked in
[References](#references) below.

All monetary values are already adjusted for inflation and are given in 2020 US
dollars per capita. If you prefer nominal values, use the Consumer Price Index
provided for each year: since `cpi` is scaled so that 2020 is 1, multiply a
value by that year's `cpi` to get it in that year's nominal dollars. The total
population is also provided if you want total values instead of per-capita
values.
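
For example, converting per-capita values to citywide totals is a single
multiplication by population. This sketch assumes the per-capita figures are
per city resident, so it uses `city_population`; the helper `to_totals()`, the
`_total` suffix it adds, and the toy values are all hypothetical.

```r
# Hypothetical helper: add a "<var>_total" column for each per-capita variable.
# Assumes the per-capita denominator is city_population.
to_totals <- function(df, vars) {
  for (v in vars) {
    df[[paste0(v, "_total")]] <- df[[v]] * df$city_population
  }
  df
}

# Toy data with invented values
toy <- data.frame(
  city_name = c("XX: Example", "YY: Other Town"),
  city_population = c(1000, 2000),
  taxes = c(100, 150)
)
to_totals(toy, "taxes")
```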

## Questions

1. Do some exploratory data analysis. Are there any outlying cities? Any
interesting trends and relationships? Also, explore the hierarchy of revenues
and expenses, and check that values add up in the way the hierarchy suggests
they should.
2. When considering expenditures, there may be different kinds of cities.
Perhaps dense cities with efficient public transit spend money in different
ways than large, sprawling cities where everyone drives, for example. Select
important expenditure variables and do a clustering analysis. Are there
distinct clusters? How many? Can you interpret what they mean? Be careful
with the hierarchical variables: a total already includes its subcategories,
so using both in the analysis counts the same spending twice.
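
One possible starting point for question 2 is k-means on standardized
per-capita expenditures for a single year. This is only a sketch: k-means is
one clustering method among many, the helper `cluster_cities()` is
hypothetical, and `exp_a` and `exp_b` are placeholder column names rather than
real FiSC variables. In practice you would pass expenditure variables chosen
from the variable table, using one year's rows so each city appears once, and
compare several values of `k`.

```r
# Hypothetical helper: k-means clusters of cities on the columns in `vars`.
# Pass rows from a single year so each city appears once.
cluster_cities <- function(df, vars, k, seed = 1) {
  # kmeans() cannot handle missing values, so drop incomplete rows
  df <- df[complete.cases(df[, vars, drop = FALSE]), , drop = FALSE]
  x <- scale(df[, vars, drop = FALSE])  # standardize each variable
  set.seed(seed)                        # kmeans uses random starting centers
  fit <- kmeans(x, centers = k, nstart = 25)
  data.frame(city_name = df$city_name, cluster = fit$cluster)
}

# Toy data with invented values and placeholder column names
toy <- data.frame(
  city_name = paste("City", 1:6),
  exp_a = c(1, 1.1, 0.9, 5, 5.2, 4.8),
  exp_b = c(2, 2.1, 1.9, 8, 8.1, 7.9)
)
cluster_cities(toy, c("exp_a", "exp_b"), k = 2)
```

Standardizing first keeps variables with large dollar amounts from dominating
the distances; whether that is appropriate depends on which spending
differences you care about.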


## References

Lincoln Institute of Land Policy. Fiscally Standardized Cities database.
<https://www.lincolninst.edu/research-data/data-toolkits/fiscally-standardized-cities>
