MLCompetition

Code used to win 2nd place out of 453 students in the Microsoft Data Science Professional Certification Capstone.

This competition focused on predicting poverty rates (a regression problem) for counties in the United States from 33 categorical and continuous features describing those counties.

More information about the competition can be found here: https://www.datasciencecapstone.org/competitions/3/county-poverty/

Six regression models (multi-layer perceptron, support vector machine, gradient boosted regressor, random forest regressor, extra trees regressor, and AdaBoost regressor) were tuned using grid search cross-validation against the training dataset to minimize mean squared error.
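As a rough illustration of the tuning step, here is a minimal sketch of grid search cross-validation for one of the six model types using scikit-learn. The synthetic data, the parameter grid, and the fold count are assumptions made for the example; they are not the values used in Model Tuner.py.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the county data (33 features, as in the competition).
X, y = make_regression(n_samples=200, n_features=33, noise=10.0, random_state=0)

# Hypothetical grid: the values actually searched are not stated in this README.
param_grid = {"n_estimators": [25, 50], "max_depth": [4, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_   # best combination found in the grid
best_cv_mse = -search.best_score_   # flip the sign back to a plain MSE
print(best_params, best_cv_mse)
```

The same pattern would apply to the other five model types by swapping the estimator and its grid.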

The predictions of each model were used as features to train a final regression model (an XGBoost regressor) using the stacking technique with the vecstack library. The stacked model was evaluated on a held-out sample of the training set. Once the selected parameters proved effective, the model was retrained on the entire training set to reduce bias.
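The stacking idea can be sketched by hand with scikit-learn alone. This is not the code in Final Stacking Model.py: it builds the out-of-fold predictions with `cross_val_predict` instead of vecstack, uses only two base models, and swaps in `GradientBoostingRegressor` for the XGBoost meta-model. The data, the helper name `stack_features`, and all parameters are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in for the training data, with a held-out sample for evaluation.
X, y = make_regression(n_samples=300, n_features=33, noise=10.0, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.25, random_state=0
)

base_models = [
    RandomForestRegressor(n_estimators=30, random_state=0),
    ExtraTreesRegressor(n_estimators=30, random_state=0),
]

def stack_features(models, X_fit, y_fit, X_new):
    """Hypothetical helper: one column of predictions per base model.

    Returns out-of-fold predictions for X_fit (so the meta-model never sees
    a prediction made on a row the base model was trained on) and ordinary
    predictions for X_new from models refit on all of X_fit.
    """
    oof_cols, new_cols = [], []
    for model in models:
        oof_cols.append(cross_val_predict(model, X_fit, y_fit, cv=4))
        new_cols.append(model.fit(X_fit, y_fit).predict(X_new))
    return np.column_stack(oof_cols), np.column_stack(new_cols)

# Level 1: base-model predictions become the features of the meta-model.
S_train, S_hold = stack_features(base_models, X_train, y_train, X_hold)

# Level 2: the meta-model (stand-in for the XGBoost regressor) is fit on them.
meta = GradientBoostingRegressor(random_state=0)
meta.fit(S_train, y_train)

holdout_rmse = float(np.sqrt(mean_squared_error(y_hold, meta.predict(S_hold))))
print(holdout_rmse)
```

After the held-out score looks acceptable, the same steps would be repeated with the full training set in place of `X_train` and `y_train` before predicting on the test data.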

The trained final model was then applied to the test dataset, resulting in a test root mean squared error (RMSE) of 2.57. A report summarizing the exploration of the dataset, the modeling procedure, and suggested actions was then created.
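For reference, root mean squared error is the square root of the average squared difference between predicted and actual values, so it is expressed in the same units as the poverty rate. A minimal sketch with made-up numbers (the function name `rmse` and the inputs are illustrative only):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5).
example = rmse([0.0, 0.0], [3.0, 4.0])
print(round(example, 4))
```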

Files in this repository:

Analysis of County Poverty Rates.pdf
Correlation matrix.py
Final Stacking Model.py
Model Tuner.py
NN Model.py
README.md
