
Georgia Accountability Court Success Prediction

Matthew Bishop, Ashley Gates, Raheem Paxton, Swapna Subbagari

Introduction

Georgia’s accountability court programs offer an alternative to traditional adjudication and incarceration for non-violent offenders charged with various drug-related crimes and DUIs. The state of Georgia contracted with the Carl Vinson Institute of Government at the University of Georgia to estimate the financial benefits of accountability courts, finding that, on average, the programs saved the state more than $22,000 per graduate. That study also found that, when both state and local costs are considered, accountability courts cost almost $5,000 less per defendant than traditional adjudication (https://cjcc.georgia.gov/document/full-report/download). Given these potential savings for the state, it is no wonder that these programs are growing in popularity. The present study aimed to identify the features relevant for graduation and to develop a prediction model that can be deployed for public use.


Problem Statement--Questions to Examine

• What features are most relevant for graduation?

• Does time between arrest and acceptance impact graduation?

• Are there certain individual characteristics that increase one’s risk for termination?

Data points to examine:

• Acceptance type

• Age

• Arrest date

• Referral date

• Risk level

• Processing time (acceptance date minus arrest date)

• Exit date

• Exit status

• Referral source

• Demographic info: DOB, Education level, Employment status (at entry), Gender, Income level, Employment stability, Military service, Race

• Program type (See program codes below)

• Clinical Diagnosis and Level

• Diagnosis Reason

• Number of drug tests

• Count weekly judicial status meetings

• Primary drug of choice

• Secondary drug of choice

• Number of treatment sessions

• Residence County

Program Codes

• FD - Felony Drug

• DC - DUI Courts

• MH - Mental Health

• JD - Juvenile Drug

• JM - Juvenile Mental Health

• FT - Family Treatment

• VC - Veterans Court

Methodology and Results

Data Cleaning

The data were cleaned by identifying the relevant data points and items to carry forward into the final analysis. Participants with unrealistic values were removed from the analytic data frame.
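The sketch below only illustrates this cleaning step: the file name, the retained columns, and the bounds used to define “unrealistic” values are placeholders, not the project’s actual rules.

```python
# Illustrative cleaning sketch; file name, column names, and the definition
# of "unrealistic" values are placeholders, not the project's actual rules.
import pandas as pd

df = pd.read_csv("accountability_court.csv")

# Keep only the data points that move forward to the final analysis
keep_cols = ["age", "risk_level", "program_type", "acceptance_type", "exit_status"]
df = df[keep_cols].dropna()

# Drop participants with unrealistic values (e.g., impossible ages)
df = df[df["age"].between(13, 100)]
```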

Descriptive Statistics

Descriptive statistics were computed for all variables of interest. In particular, we examined the distribution of all continuous variables to determine whether they were normally distributed. The frequency counts of all categorical variables were explored. Finally, we explored the pros and cons of scaling and binning continuous variables.
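As a rough illustration of these descriptive checks (not the project’s actual notebook code; the file and column names are placeholders), the summary statistics, skew, and frequency counts could be produced with Pandas:

```python
# Descriptive checks: distributions of continuous variables and
# frequency counts of categorical variables (placeholder column names).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("accountability_court_clean.csv")   # hypothetical cleaned file

continuous = ["age", "processing_time", "num_drug_tests"]
categorical = ["program_type", "gender", "risk_level"]

print(df[continuous].describe())   # means, spread, ranges
print(df[continuous].skew())       # large skew suggests non-normal distributions

for col in categorical:
    print(df[col].value_counts())

df[continuous].hist(bins=30)       # visual check of each distribution
plt.show()
```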

Unsupervised Learning

The data were pre-processed by one-hot encoding (i.e., get_dummies) all categorical variables and standardizing continuous variables with scikit-learn's StandardScaler. We then fit a Principal Component Analysis, retaining enough components to account for 99% of the variance in the data. Finally, we used t-SNE to further reduce the identified components and derived a K-means inertia (elbow) plot to determine the number of clusters in the data. The elbow plot did not suggest a definitive cluster number; however, after exploring two to four clusters on relevant features, we settled on two clusters because of their ability to distinguish participants on relevant factors. The clusters were later visualized with Matplotlib (see the figures and the sketch below).

[Figures: Matplotlib visualizations of the two-cluster solution]
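The following is a minimal sketch of that pipeline under assumed file and column names; it is not the project’s notebook code, and the cluster range explored here (1–10) is an assumption.

```python
# Sketch of the unsupervised pipeline: one-hot encoding, scaling, PCA (99%
# variance), t-SNE, and a K-means inertia (elbow) plot. File and column
# names are placeholders.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

df = pd.read_csv("accountability_court_clean.csv")

encoded = pd.get_dummies(df, columns=["program_type", "gender", "risk_level"])
scaled = StandardScaler().fit_transform(encoded)

pca = PCA(n_components=0.99)               # keep 99% of the variance
components = pca.fit_transform(scaled)

tsne = TSNE(n_components=2, random_state=42)
embedded = tsne.fit_transform(components)  # further reduction for clustering

inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(embedded)
    inertias.append(km.inertia_)

plt.plot(range(1, 11), inertias, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.show()
```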

Supervised Learning

Initially, we used Recursive Feature Elimination (RFE) with a Random Forest estimator to determine the number of features relevant for further testing. The feature counts were evaluated using 5-fold cross-validation with 2 repeats, which produced mean accuracy and standard deviation scores for the 5 to 20 features that were considered (see the figure and the sketch below).

[Figure: mean cross-validated accuracy and standard deviation for 5 to 20 selected features]
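A hedged sketch of that search follows; the data file, feature matrix, and target column are assumptions, and the study’s exact estimator settings are not shown in this README.

```python
# Sketch: score RFE with a Random Forest estimator for 5-20 selected
# features using repeated 5-fold cross-validation (placeholder data names).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

df = pd.read_csv("accountability_court_model.csv")       # hypothetical encoded data
X, y = df.drop(columns=["graduated"]), df["graduated"]    # hypothetical target

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)

for n_features in range(5, 21):
    pipe = Pipeline([
        ("rfe", RFE(RandomForestClassifier(random_state=42),
                    n_features_to_select=n_features)),
        ("clf", RandomForestClassifier(random_state=42)),
    ])
    scores = cross_val_score(pipe, X, y, scoring="accuracy", cv=cv, n_jobs=-1)
    print(f"{n_features} features: {scores.mean():.3f} (+/- {scores.std():.3f})")
```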

The model revealed that eight features produced reasonable accuracy while preserving model parsimony. We then trained a series of five machine learning models in a recursive feature elimination pipeline, again with 5-fold cross-validation and two repeats; a sketch of this comparison follows the figure below. The final model was chosen based on mean accuracy.

[Figure: mean cross-validated accuracy for the five candidate models]
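This README does not name the five candidate models, so the set below is illustrative only; each candidate is wrapped in the same eight-feature RFE pipeline and scored with repeated 5-fold cross-validation.

```python
# Illustrative model comparison: five candidate classifiers (assumed set)
# in an 8-feature RFE pipeline, scored with repeated 5-fold CV.
import pandas as pd
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("accountability_court_model.csv")       # hypothetical encoded data
X, y = df.drop(columns=["graduated"]), df["graduated"]    # hypothetical target

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=42)

for name, clf in candidates.items():
    pipe = Pipeline([
        ("rfe", RFE(RandomForestClassifier(random_state=42), n_features_to_select=8)),
        ("clf", clf),
    ])
    scores = cross_val_score(pipe, X, y, scoring="accuracy", cv=cv, n_jobs=-1)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```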

We observed that Gradient Boosting produced the highest accuracy. The model was then tuned using a randomized grid search to obtain the best precision, specificity, recall, and R-square. The classification report on the imbalanced classes is reported below.
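A sketch of that tuning step is shown below; the parameter distributions, train/test split, and data names are assumptions rather than the values used in the study.

```python
# Illustrative randomized search over Gradient Boosting hyperparameters;
# parameter ranges and data names are assumptions, not the study's settings.
import pandas as pd
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import RandomizedSearchCV, train_test_split

df = pd.read_csv("accountability_court_model.csv")       # hypothetical encoded data
X, y = df.drop(columns=["graduated"]), df["graduated"]    # hypothetical target
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

param_distributions = {
    "n_estimators": randint(100, 500),
    "learning_rate": uniform(0.01, 0.3),
    "max_depth": randint(2, 6),
    "subsample": uniform(0.6, 0.4),
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions,
    n_iter=50,
    scoring="accuracy",
    cv=5,
    random_state=42,
    n_jobs=-1,
)
search.fit(X_train, y_train)

# Classification report on the held-out (imbalanced) test set
print(classification_report(y_test, search.predict(X_test)))
```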

Front End

The model is housed, along with other visualizations, on a Heroku website. Other tools used to complete this project were Python (Pandas), Tableau, Flask, HTML/CSS/Bootstrap, and a SQL database.
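As a rough sketch of how the trained model could be served behind the Flask front end (the route, saved-model file, and expected input fields are hypothetical, not the app’s actual interface):

```python
# Minimal Flask sketch for serving graduation predictions; the model file,
# route, and expected JSON fields are hypothetical.
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("gradient_boosting_model.pkl")   # hypothetical saved model

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object with the same feature names used in training
    features = pd.DataFrame([request.get_json()])
    prediction = int(model.predict(features)[0])
    return jsonify({"graduation_prediction": prediction})

if __name__ == "__main__":
    app.run(debug=True)
```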

Heroku App

The deployed app can be viewed here.

Final Presentation

The final presentation can be viewed here.
