diff --git a/_freeze/politics/standard-cities/execute-results/html.json b/_freeze/politics/standard-cities/execute-results/html.json new file mode 100644 index 0000000..e600473 --- /dev/null +++ b/_freeze/politics/standard-cities/execute-results/html.json @@ -0,0 +1,18 @@ +{ + "hash": "d011c996f035c87cc2dc79a6e41f5dda", + "result": { + "markdown": "---\ntitle: Fiscally standardized cities\nauthor: Alex Reinhart\ndate: September 7, 2023\ndescription: Extensive financial data on over 200 of the largest cities in the United States for over 40 years. Which cities spend the most or the least on government services?\ncategories:\n - EDA\n - clustering\ndata:\n year: 2020\n files: fisc_full_dataset_2020_update.csv.gz\n---\n\n\n## Motivation\n\nIn the United States, city governments provide many services: they run public\nschool districts, administer certain welfare and health programs, build roads\nand manage airports, provide police and fire protection, inspect buildings, and\noften run water and utility systems. Cities also get revenues through certain\nlocal taxes, various fees and permit costs, sale of property, and through the\nfees they charge for the utilities they run.\n\nIt would be interesting to compare all these expenses and revenues across cities\nand over time, but also quite difficult. Cities share many of these service\nresponsibilities with other government agencies: in one particular city, some\nroads may be maintained by the state government, some law enforcement provided\nby the county sheriff, some schools run by independent school districts with\ntheir own tax revenue, and some utilities run by special independent utility\ndistricts. These governmental structures vary greatly by state and by individual\ncity. It would be hard to make a fair comparison without taking into account all\nthese differences.\n\nThis dataset takes into account all those differences. 
The Lincoln Institute of\nLand Policy produces what they call “Fiscally Standardized Cities” (FiSCs),\naggregating all services provided to city residents regardless of how they may\nbe divided up by different government agencies and jurisdictions. Using these\naggregates, we can study city expenses and revenues, and how the proportions of\ndifferent costs vary over time.\n\n## Data\n\nThe dataset tracks over 200 American cities between 1977 and 2020. Each row\nrepresents one city for one year. Revenue and expenditures are broken down into\nmore than 120 categories.\n\nValues are available for each FiSC and also for the entities that make it up:\nthe city, the county, independent school districts, and any special districts,\nsuch as utility districts. There are hence five versions of each variable, with\nsuffixes indicating the entity. For example, `taxes` gives the FiSC's tax\nrevenue, while `taxes_city`, `taxes_cnty`, `taxes_schl`, and `taxes_spec` break\nit down for the city, county, school districts, and special districts.\n\nThe values are organized *hierarchically*. For example, `taxes` is the sum of\n`tax_property` (property taxes), `tax_sales_general` (sales taxes), `tax_income`\n(income tax), and `tax_other` (other taxes). And `tax_income` is itself the sum\nof `tax_income_indiv` (individual income tax) and `tax_income_corp` (corporate\nincome tax) subcategories.\n\n### Data preview\n\n\n

fisc_full_dataset_2020_update.csv.gz

\n
\n \n
\n\n\n### Variable descriptions\n\nFor each city and year, the following metadata is available:\n\n| Variable | Description |\n|----|-------------|\n| year | Year for these values |\n| city_name | Name of the city, such as \"AK: Anchorage\", where \"AK\" is the standard two-letter abbreviation for Alaska |\n| city_population | Estimated city population, based on Census data |\n| county_name | Name of the county the city is in |\n| county_population | Estimated county population, based on Census data |\n| cpi | Consumer Price Index for this year, scaled so that 2020 is 1. |\n| relationship_city_school | Type of school district. 1: City-wide independent school district that serves the entire city. 2: County-wide independent school district that serves the entire county. 3: One or more independent school districts whose boundaries extend beyond the city. 4: School district run by or dependent on the city. 5: School district run by or dependent on the county. |\n| enrollment | Estimated number of public school students living in the city. |\n| districts_in_city | Estimated number of school districts in the city. |\n| consolidated_govt | Whether the city has a consolidated city-county government (1 = yes, 0 = no). For example, Philadelphia's city and county government are the same entity; they are not separate governments. |\n| id2_city | 12-digit city identifier, from the Annual Survey of State and Local Government Finances |\n| id2_county | 12-digit county identifier |\n| city_types | Two types: core and legacy. There are 150 core cities, \"including the two largest cities in each state, plus all cities with populations of 150,000+ in 1980 and 200,000+ in 2010\". Legacy cities include \"95 cities with population declines of at least 20 percent from their peak, poverty rates exceeding the national average, and a peak population of at least 50,000\". Some cities are both (denoted \"core|legacy\"). 
|\n\nThe revenue and expense variables are described in [this detailed\ntable](../data/fisc_full_dataset_2020_update_variables.pdf). Further\ndocumentation is available on the FiSC Database website, linked in\n[References](#references) below.\n\nAll monetary data is already adjusted for inflation, and is given in terms of\n2020 US dollars per capita. The Consumer Price Index, scaled so that 2020 is 1,\nis provided for each year if you prefer to use numbers not adjusted for\ninflation; simply multiply each value by the CPI to get the value in that year's\nnominal dollars. The total population is also provided if you want total values\ninstead of per-capita values.\n\n## Questions\n\n1. Do some exploratory data analysis. Are there any outlying cities? Any\n interesting trends and relationships? Also, explore the hierarchy of revenues\n and expenses, and check that values add up in the way the hierarchy suggests\n they should.\n2. When considering expenditures, there may be different kinds of cities.\n Perhaps dense cities with efficient public transit spend money in different\n ways than large, sprawling cities where everyone drives, for example. Extract\n important expenditure variables and do a clustering analysis. Are there\n distinct clusters? How many? Can you interpret what they mean? Be careful\n about including the hierarchical values in your analysis.\n\n\n## References\n\nLincoln Institute of Land Policy. 
Fiscally Standardized Cities database.\n\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": { + "include-in-header": [ + "\n\n" + ] + }, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/cmu-statds-datasets.csv b/cmu-statds-datasets.csv index bea2478..5b2861f 100644 --- a/cmu-statds-datasets.csv +++ b/cmu-statds-datasets.csv @@ -8,10 +8,12 @@ date,datayear,title,description,subject,categories,url 2017-02-25,2017,Religion and analytic thinking,A controversial study found that encouraging analytic thinking also reduced religious belief. A replication attempt collected much more data to try to confirm the hypothesis; is it supported by the data? A simple randomized experiment with continuous outcome and demographic controls.,psychology,"ANOVA, linear regression",https://cmustatistics.github.io/data-repository/psychology/religion-analytic-thinking.html 2023-09-06,2023,Mapping Police Violence,"Each year in the United States, around 1,000 people are killed by police. Some of these deaths are accidental, in traffic accidents and other incidents; some are deliberate killings considered legally justified by authorities; and others are considered unjustified or even lead to the prosecution of the police officer. Explore data on over 10,000 such killings to identify patterns in the people killed, the stated reasons for their killings, and the situations leading to their deaths.",politics,"GLMs, linear regression, logistic regression",https://cmustatistics.github.io/data-repository/politics/mapping-police-violence.html 2023-06-20,2022,World Development Indicators,"The World Bank’s World Development Indicators (WDI) compile development information about countries around the world. 
Using ten years of data, study development, political stability, pollution, and other factors at the national and regional levels.",politics,"linear regression, ANOVA",https://cmustatistics.github.io/data-repository/politics/world-bank.html +2023-09-07,2020,Fiscally standardized cities,Extensive financial data on over 200 of the largest cities in the United States for over 40 years. Which cities spend the most or the least on government services?,politics,"EDA, clustering",https://cmustatistics.github.io/data-repository/politics/standard-cities.html 2017-02-22,2014,Science Forums,"A random sample of discussions at a large science discussion forum, with metadata about each.",social,"GLMs, classification, linear regression",https://cmustatistics.github.io/data-repository/social/science-forums.html 2019-08-14,2010,"House prices in Ames, Iowa","Records of homes sold in Ames, Iowa, and their sale prices, including detailed covariates that could be used to predict home prices.",money,"linear regression, nonparametric regression",https://cmustatistics.github.io/data-repository/money/ames-housing.html 2023-06-09,2021,European protected ham,"In Europe, some types of ham—like Black Forest ham—can only be legally produced in specific geographic regions. Does the size of those regions affect the price of the ham?",money,"ANOVA, linear regression, hierarchical model",https://cmustatistics.github.io/data-repository/money/european-ham.html 2022-07-27,2014,Rail trails and property values,"Rail trails are a great recreation opportunity, but do homes near them rise in value? A retrospective observational study of house values.",money,"linear regression, nonparametric regression",https://cmustatistics.github.io/data-repository/money/rail-trails.html +2019-02-09,2015,Traffic stops in Connecticut,"Records for every traffic stop made by the Connecticut State Police over several years, including the reason for the stop and demographic details. 
Use classifiers to study whether stop and search decisions show signs of bias.",crime-and-justice,"classification, logistic regression",https://cmustatistics.github.io/data-repository/crime-and-justice/connecticut-stops.html 2023-05-17,2020,Core temperature during surgery,Is low core temperature during surgery associated with poor surgical outcomes or death? Use logistic and survival analysis to study an observational dataset.,medicine,"logistic regression, survival analysis",https://cmustatistics.github.io/data-repository/medicine/core-temperature.html 2023-06-08,1828,Bloodletting,"Bloodletting—deliberately withdrawing large amounts of blood from patients—was a common medical practice for centuries, until evidence was finally collected on its efficacy. Use an early observational dataset to explore its use for pneumonia.",medicine,"logistic regression, contingency tables",https://cmustatistics.github.io/data-repository/medicine/bloodletting.html 2022-12-08,2017,Health exams in Vietnam,"As access to health care increases, public health advocates encourage ordinary people to get regular check-ups to detect problems before they become serious (and expensive). But not everyone follows this advice. 
This survey of people in Vietnam explores the factors leading people to get regular check-ups.",medicine,"surveys, classification, logistic regression",https://cmustatistics.github.io/data-repository/medicine/vietnam-health.html diff --git a/data/fisc_full_dataset_2020_update.csv.gz b/data/fisc_full_dataset_2020_update.csv.gz new file mode 100644 index 0000000..6655908 Binary files /dev/null and b/data/fisc_full_dataset_2020_update.csv.gz differ diff --git a/data/fisc_full_dataset_2020_update_variables.pdf b/data/fisc_full_dataset_2020_update_variables.pdf new file mode 100644 index 0000000..b5449c1 Binary files /dev/null and b/data/fisc_full_dataset_2020_update_variables.pdf differ diff --git a/politics/standard-cities.qmd b/politics/standard-cities.qmd new file mode 100644 index 0000000..1aaa18e --- /dev/null +++ b/politics/standard-cities.qmd @@ -0,0 +1,115 @@ +--- +title: Fiscally standardized cities +author: Alex Reinhart +date: September 7, 2023 +description: Extensive financial data on over 200 of the largest cities in the United States for over 40 years. Which cities spend the most or the least on government services? +categories: + - EDA + - clustering +data: + year: 2020 + files: fisc_full_dataset_2020_update.csv.gz +--- + +## Motivation + +In the United States, city governments provide many services: they run public +school districts, administer certain welfare and health programs, build roads +and manage airports, provide police and fire protection, inspect buildings, and +often run water and utility systems. Cities also get revenues through certain +local taxes, various fees and permit costs, sale of property, and through the +fees they charge for the utilities they run. + +It would be interesting to compare all these expenses and revenues across cities +and over time, but also quite difficult. 
Cities share many of these service +responsibilities with other government agencies: in one particular city, some +roads may be maintained by the state government, some law enforcement provided +by the county sheriff, some schools run by independent school districts with +their own tax revenue, and some utilities run by special independent utility +districts. These governmental structures vary greatly by state and by individual +city. It would be hard to make a fair comparison without taking into account all +these differences. + +This dataset takes into account all those differences. The Lincoln Institute of +Land Policy produces what they call “Fiscally Standardized Cities” (FiSCs), +aggregating all services provided to city residents regardless of how they may +be divided up by different government agencies and jurisdictions. Using these +aggregates, we can study city expenses and revenues, and how the proportions of +different costs vary over time. + +## Data + +The dataset tracks over 200 American cities between 1977 and 2020. Each row +represents one city for one year. Revenue and expenditures are broken down into +more than 120 categories. + +Values are available for each FiSC and also for the entities that make it up: +the city, the county, independent school districts, and any special districts, +such as utility districts. There are hence five versions of each variable, with +suffixes indicating the entity. For example, `taxes` gives the FiSC's tax +revenue, while `taxes_city`, `taxes_cnty`, `taxes_schl`, and `taxes_spec` break +it down for the city, county, school districts, and special districts. + +The values are organized *hierarchically*. For example, `taxes` is the sum of +`tax_property` (property taxes), `tax_sales_general` (sales taxes), `tax_income` +(income tax), and `tax_other` (other taxes). And `tax_income` is itself the sum +of `tax_income_indiv` (individual income tax) and `tax_income_corp` (corporate +income tax) subcategories. 
+ +### Data preview + +```{r, echo=FALSE, results="asis"} +source("../preview_dataset.R") +preview_datasets() +``` + +### Variable descriptions + +For each city and year, the following metadata is available: + +| Variable | Description | +|----|-------------| +| year | Year for these values | +| city_name | Name of the city, such as "AK: Anchorage", where "AK" is the standard two-letter abbreviation for Alaska | +| city_population | Estimated city population, based on Census data | +| county_name | Name of the county the city is in | +| county_population | Estimated county population, based on Census data | +| cpi | Consumer Price Index for this year, scaled so that 2020 is 1. | +| relationship_city_school | Type of school district. 1: City-wide independent school district that serves the entire city. 2: County-wide independent school district that serves the entire county. 3: One or more independent school districts whose boundaries extend beyond the city. 4: School district run by or dependent on the city. 5: School district run by or dependent on the county. | +| enrollment | Estimated number of public school students living in the city. | +| districts_in_city | Estimated number of school districts in the city. | +| consolidated_govt | Whether the city has a consolidated city-county government (1 = yes, 0 = no). For example, Philadelphia's city and county government are the same entity; they are not separate governments. | +| id2_city | 12-digit city identifier, from the Annual Survey of State and Local Government Finances | +| id2_county | 12-digit county identifier | +| city_types | Two types: core and legacy. There are 150 core cities, "including the two largest cities in each state, plus all cities with populations of 150,000+ in 1980 and 200,000+ in 2010". Legacy cities include "95 cities with population declines of at least 20 percent from their peak, poverty rates exceeding the national average, and a peak population of at least 50,000". 
Some cities are both (denoted "core|legacy"). | + +The revenue and expense variables are described in [this detailed +table](../data/fisc_full_dataset_2020_update_variables.pdf). Further +documentation is available on the FiSC Database website, linked in +[References](#references) below. + +All monetary data is already adjusted for inflation, and is given in terms of +2020 US dollars per capita. The Consumer Price Index, scaled so that 2020 is 1, +is provided for each year if you prefer to use numbers not adjusted for +inflation; simply multiply each value by the CPI to get the value in that year's +nominal dollars. The total population is also provided if you want total values +instead of per-capita values. + +## Questions + +1. Do some exploratory data analysis. Are there any outlying cities? Any + interesting trends and relationships? Also, explore the hierarchy of revenues + and expenses, and check that values add up in the way the hierarchy suggests + they should. +2. When considering expenditures, there may be different kinds of cities. + Perhaps dense cities with efficient public transit spend money in different + ways than large, sprawling cities where everyone drives, for example. Extract + important expenditure variables and do a clustering analysis. Are there + distinct clusters? How many? Can you interpret what they mean? Be careful + about including the hierarchical values in your analysis. + + +## References + +Lincoln Institute of Land Policy. Fiscally Standardized Cities database. +
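The hierarchy check suggested in the first question, and the per-capita-to-total conversion mentioned in the variable descriptions, could be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not part of the page itself. The column names are taken from the description above, but the row values, the `HIERARCHY` mapping, and the helper functions `hierarchy_gaps` and `total_from_per_capita` are all hypothetical. It also assumes that per-capita values are scaled by `city_population`.

```python
import math

# Hypothetical toy row: column names follow the dataset description,
# but every number here is invented for illustration.
row = {
    "city_population": 250_000,
    "taxes": 1800.0,
    "tax_property": 1000.0,
    "tax_sales_general": 400.0,
    "tax_income": 300.0,
    "tax_other": 100.0,
    "tax_income_indiv": 220.0,
    "tax_income_corp": 80.0,
}

# Parent column -> child columns, as described in the Data section.
HIERARCHY = {
    "taxes": ["tax_property", "tax_sales_general", "tax_income", "tax_other"],
    "tax_income": ["tax_income_indiv", "tax_income_corp"],
}


def hierarchy_gaps(record, hierarchy, rel_tol=1e-6):
    """Return {parent: parent - sum(children)} for parents that do not add up."""
    gaps = {}
    for parent, children in hierarchy.items():
        total = sum(record[child] for child in children)
        if not math.isclose(record[parent], total, rel_tol=rel_tol, abs_tol=1e-9):
            gaps[parent] = record[parent] - total
    return gaps


def total_from_per_capita(record, column):
    """Convert a per-capita value to a total (assumes city_population is the base)."""
    return record[column] * record["city_population"]


gaps = hierarchy_gaps(row, HIERARCHY)
total_taxes = total_from_per_capita(row, "taxes")
print(gaps)
print(total_taxes)
```

On the real data, the same check would be run for every row and every parent category. A small tolerance is used because rounded per-capita figures may not sum exactly.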