- Presentation (Screencast)
- Slides
- UI (down) (Screencast)
- Data wrangling work (separate branch)
A tool to see where federal money is going with additional information on associated legislators.
Companies are interested in how US federal contract spending is distributed across industries and states, to inform investment and expansion decisions. Taxpayers are also interested: it is their money being spent on their behalf, and the spending should reflect their priorities.
While federal laws mandate that federal spending data be publicly accessible, the available exploratory tools focus on individual payments and present the data without context. FedSpend addresses this gap in accessibility by linking federal payments to the recipient's House and Senate representatives. The data are aggregated to abstract away government peculiarities and presented in a UI alongside state demographic information.
Federal spending data are provided by the US Government as a PostgreSQL dump archive file, updated monthly.
Every row in the federal spending dataset is an obligation (i.e., a promise to pay); outlays (i.e., actual payments) are not included. Modifications to a contract, such as an increase or decrease in the obligated amount, are entered as additional rows containing only the adjustment amount. See the example below.
Date | NAICS code | Recipient | Amount | District | State |
---|---|---|---|---|---|
2019 | 519130 | Bojack Horseman Inc. | $45,000 | 36 | CA |
2019 | 519130 | Bojack Horseman Inc. | -$20,000 | 36 | CA |
2019 | 519130 | Bojack Horseman Inc. | $5,000 | 36 | CA |
In this example, the total value of the contract is $30,000.
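The arithmetic can be sketched directly, with the amounts taken from the example table above:

```python
# Each row is one obligation entry; modifications appear as additional
# rows carrying only the adjustment amount.
rows = [45_000, -20_000, 5_000]

# The total contract value is the running sum of all entries.
total = sum(rows)
print(total)  # 30000
```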
Historical and current data on House and Senate members are maintained as a public resource by a community of volunteers. These data are linked to government contracts by the year and the city/state of the recipient.
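As a minimal sketch of that linkage, a lookup keyed on year and state can attach senators to a contract row; the field layout below is illustrative, not the real schema (a House lookup would also key on district):

```python
# Community-maintained legislator data, keyed here by (year, state)
# for the Senate side.
legislators = {
    (2019, "CA"): ["Dianne Feinstein", "Kamala Harris"],
}

# One contract row from the spending data (illustrative fields).
contract = {"year": 2019, "state": "CA", "recipient": "Bojack Horseman Inc."}

# Link the contract to the senators serving its recipient's state that year.
senators = legislators.get((contract["year"], contract["state"]), [])
print(senators)  # ['Dianne Feinstein', 'Kamala Harris']
```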
An overview is provided here, while full details are in the wiki.
- Set up a PostgreSQL database and restore the full database (~600GB) from the archive file (~60GB).
- Set up a CockroachDB cluster.
- Set up a Spark cluster for the migration of data from PostgreSQL to CockroachDB.
- Migrate each table (~60 tables) from PostgreSQL to CockroachDB via Spark.
- Calculate aggregate tables from PostgreSQL using Spark and save to CockroachDB.
- Set up a web server with Apache and UI with Flask.
- Set up Airflow to run the data pipeline and version-control procedure, then refresh the UI.
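The per-table migration step might look roughly like the following PySpark sketch. The hostnames, credentials, and table name are assumptions; because CockroachDB speaks the PostgreSQL wire protocol, the same JDBC driver can serve both ends.

```python
def jdbc_url(host, port, db):
    """Build a JDBC URL for a PostgreSQL-wire-protocol database."""
    return f"jdbc:postgresql://{host}:{port}/{db}"

# Hypothetical endpoints for the source and destination databases.
PG_URL = jdbc_url("pg-host", 5432, "usaspending")
CRDB_URL = jdbc_url("crdb-host", 26257, "fedspend")

def migrate_table(spark, table):
    """Read one table from PostgreSQL and append it to CockroachDB."""
    df = (spark.read.format("jdbc")
          .option("url", PG_URL)
          .option("dbtable", table)
          .option("user", "reader")
          .load())
    (df.write.format("jdbc")
       .option("url", CRDB_URL)
       .option("dbtable", table)
       .option("user", "writer")
       .mode("append")
       .save())

# Usage sketch (requires a running Spark cluster and both databases):
#   spark = SparkSession.builder.appName("pg-to-crdb").getOrCreate()
#   migrate_table(spark, "transaction_normalized")
```

Partitioned reads (the `partitionColumn`/`numPartitions` JDBC options) would let Spark pull each large table in parallel rather than through a single connection.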
The data come prepackaged as a PostgreSQL database, itself a stable choice, but there are several modern alternatives. Prioritizing pattern flexibility and efficiency over infinite scale, the PIE theorem points to another relational database. CockroachDB was chosen for its built-in distributed execution and its resilience.
Consider the database dump posted in February 2020. Rather than overwriting the existing data with the new archive, we select and append only the new rows (green), then run a version-control procedure on the carryover rows (yellow). The goal is to verify that the carryover data, which we expect to be unchanged, are in fact identical: a hash function is applied to the obligation column in both versions and the outputs are compared.
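A minimal sketch of that hash comparison, using Python's standard `hashlib` and illustrative row data:

```python
import hashlib

def column_digest(amounts):
    """Hash the obligation column; identical data yield identical digests."""
    h = hashlib.sha256()
    for amount in amounts:
        h.update(str(amount).encode())
    return h.hexdigest()

# Carryover rows from two consecutive dumps (illustrative values).
jan_carryover = [45_000, -20_000, 5_000]
feb_carryover = [45_000, -20_000, 5_000]

# Matching digests mean the carryover data survived the refresh unchanged.
print(column_digest(jan_carryover) == column_digest(feb_carryover))  # True
```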