This is the source code for govscent.org.
Govscent is a Django application.
- Python 3.10+
- PostgreSQL
- Required: the btree_gin PostgreSQL extension
- Required: create a .env file like the following:
ENV="dev"
SECRET_KEY="LOCAL_DEV"
OPENAI_API_ORG="org-abc"
OPENAI_API_KEY="sk-xyz"
DB_NAME="govscent"
DB_USER="postgres"
DB_PASSWORD="password"
DB_HOST="localhost"
DB_PORT="5432"
The OpenAI keys are only required if you plan to run analysis from your machine.
- Install Postgres 14+ and create a "govscent" database.
- Set up a venv to isolate dependencies:
python3 -m venv venv
- Activate the venv (for POSIX users):
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Set up the database:
./manage.py migrate
- Create your admin user!
./manage.py createsuperuser
- Import the last 1000 bills and related data from production:
./manage.py runscript sync_from_prod --script-args 1000
- Start the app!
./manage.py runserver
Then the admin panel is accessible via http://localhost:8000/admin.
The API is accessible via http://localhost:8000/api.
You can pull the latest data from prod with the sync_from_prod
script. This makes development a lot easier since you're
always working with prod data.
- From Congress Data Sources.
- → Then to On-Disk data structures (for now, may be removed in future to save disk space)
- → Then to Local Database (SQLite locally, PostgreSQL in prod)
- Analysis is performed via cron and Python scripts.
- Analysis results are saved in PostgreSQL. We parse the response from GPT to extract the topic rating and topics list, and we also store the raw response so it can be re-parsed without calling OpenAI.
- Any errors from the API are also stored on the Bill for analysis and re-running later.
- The Bill only contains the raw text as well as the response from the language model.
- The pretty HTML version of the bill is generated at runtime via govscentdotorg/services/bill_html_generator.py.
- The pretty HTML is cached on disk only for a short time, to save CPU if a page goes viral. Rendering on demand saves significant disk space compared to storing millions of HTML or PDF files.
Merging to main will trigger a deployment. Deploys take around three seconds.
Scripts are written using the runscript command from django-extensions, with the syntax python3 manage.py runscript script_name --script-args arg_one arg_two.
RunScript has its own docs, but in short, arguments are passed via --script-args, and each arg is passed as a string argument to the run method in your script.
Please add type hints to your arguments. You can also make them optional by giving them a default, like def run(input_path: str | None = None).
Bill data from 1993 onward comes from github.com/unitedstates/congress. You can run usc-run govinfo --collections=BILLS --store=html in that tool to get the data.
You'll then want to import the bills from the filesystem into the database. To keep things simple, each country has its own import script, which is run via cron in production.
For example, for the USA: python3 manage.py runscript usa_import_bills --script-args /congress-repo-path/data False False.
Bills can be analyzed via python3 manage.py runscript analyze_bills --script-args False. This script runs analysis sequentially. The cost is fairly low due to its sequential nature, so it's fairly safe to run yourself.
Bulk analysis, done with each year in parallel, is available via the analyze_bills_bulk script. Be wary: this will rack up thousands of dollars very quickly.
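The per-year fan-out is what makes bulk analysis expensive: several years have requests in flight at once. A minimal sketch using concurrent.futures, assuming a hypothetical analyze_year callable and (bill_id, year) input pairs; the real analyze_bills_bulk script may use processes, a different worker count, or another grouping.

```python
from collections import defaultdict
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def group_by_year(bills: list[tuple[str, int]]) -> dict[int, list[str]]:
    """Group (bill_id, year) pairs into per-year lists of bill ids."""
    by_year: dict[int, list[str]] = defaultdict(list)
    for bill_id, year in bills:
        by_year[year].append(bill_id)
    return dict(by_year)


def analyze_years_in_parallel(
    bills: list[tuple[str, int]],
    analyze_year: Callable[[int, list[str]], int],
    max_workers: int = 4,
) -> dict[int, int]:
    """Run analyze_year once per year concurrently; return results keyed by year.

    Up to max_workers years run at the same time, so spend can grow up to
    max_workers times faster than a single sequential run.
    """
    by_year = group_by_year(bills)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {year: pool.submit(analyze_year, year, ids) for year, ids in by_year.items()}
        return {year: future.result() for year, future in futures.items()}
```

Threads are a reasonable default here because the work is dominated by waiting on network calls rather than Python CPU time.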
There were concerns that Python would be too slow, but it has proven plenty fast. Most pages render in under 10 ms, and development productivity with type hints is good.