Predicting functionality of Tanzanian water well pumps using ML classification methods

wvsharber/mod3project_Tanzania_water_well

Predicting Water Well Pump Functionality in Tanzania

In addition to this README, you can read more about this project on my blog.

Summary and Motivation

Easy access to water is one of the most effective ways to improve quality of life in rural areas. To identify pumps that are good candidates for updating, repairing, or installing, we built a model that predicts the functionality of existing pumps in communities across Tanzania. Our predictive model, a consensus of four individual classifiers, predicts pump functionality with 82% accuracy.

Data

Our predictive model is trained on data from Taarifa, which aggregates data from the Tanzania Ministry of Water. The data can be downloaded from https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/page/23/ after signing up for the competition. The training data consists of over 50,000 records of water well pumps across Tanzania. It includes features such as each pump's location, when it was installed, who installed it, and what kind of pump it is, as well as its current status: functional, non-functional, or functional but needs repair.

Model Development and Results

We trained four separate classifiers: Logistic Regression, K-Nearest Neighbors, Random Forest, and Gradient Boosted Forest. Individually, these models performed well (~80% accuracy on the training data), but we achieved slightly higher accuracy (82%) by combining the four models into a single Soft Voting Classifier, which computes a weighted average of the models' predicted class probabilities for each pump and selects the most probable class. We recommend this consensus model because it reduces the influence of any bias an individual model may introduce.
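To make the soft-voting step concrete, here is a minimal sketch in plain Python. The probabilities and weights below are invented for illustration and are not outputs or settings from this project; the function names are hypothetical as well. In scikit-learn, `VotingClassifier(voting="soft", weights=...)` provides the same kind of weighted probability averaging.

```python
from typing import List, Sequence

# Class labels as described in the Data section.
CLASSES = ["functional", "functional needs repair", "non functional"]


def soft_vote(probas: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Return the weighted average of per-model class probabilities."""
    total_weight = sum(weights)
    n_classes = len(probas[0])
    return [
        sum(w * p[k] for w, p in zip(weights, probas)) / total_weight
        for k in range(n_classes)
    ]


def predict_label(probas: Sequence[Sequence[float]],
                  weights: Sequence[float],
                  classes: Sequence[str] = CLASSES) -> str:
    """Pick the class with the highest averaged probability."""
    averaged = soft_vote(probas, weights)
    best_index = max(range(len(averaged)), key=averaged.__getitem__)
    return classes[best_index]


# Hypothetical predicted probabilities for ONE pump, one row per model.
model_probas = [
    [0.50, 0.10, 0.40],  # logistic regression (made-up values)
    [0.30, 0.10, 0.60],  # k-nearest neighbors (made-up values)
    [0.55, 0.05, 0.40],  # random forest (made-up values)
    [0.60, 0.10, 0.30],  # gradient boosting (made-up values)
]
# Hypothetical weights; the project's actual weights are not stated here.
model_weights = [1.0, 1.0, 2.0, 2.0]

consensus = soft_vote(model_probas, model_weights)
label = predict_label(model_probas, model_weights)
print(consensus, label)
```

Note that the second model on its own would predict "non functional", but the weighted average across all four favors "functional". This is how a consensus can override a single model's lean.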

Future Directions

Our current model could be improved by better handling the smallest class, functional but needs repair, which is underrepresented in our dataset. Given more time, we would implement methods to address this class imbalance. Given the model's potential to improve quality of life for Tanzanian communities, we also suggest extending it to prioritize pump replacement or installation in communities with larger populations or without an already functioning well.
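One common way to address class imbalance is to weight each class inversely to its frequency, so that errors on rare classes cost more during training. The sketch below is only an illustration of that idea, not a method this project has implemented. The label counts are a toy example, and `balanced_class_weights` is a hypothetical helper. It uses the formula `n_samples / (n_classes * class_count)`, which is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Toy label distribution (made up): the "needs repair" class is rare.
labels = (
    ["functional"] * 6
    + ["non functional"] * 3
    + ["functional needs repair"] * 1
)

weights = balanced_class_weights(labels)
for cls, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cls}: {weight:.3f}")
```

The rarest class receives the largest weight, which pushes a classifier that accepts class weights to pay more attention to pumps that are functional but need repair. Resampling methods are another option that could be evaluated alongside this one.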
