Data Modelling on US Election Data

This project was completed as a course assignment for CS418: Introduction to Data Science at the University of Illinois at Chicago during the Fall 2019 term, together with teammates Yushenli1996 and nathanhe789.

The dataset was partly provided to us by the professor. There were two CSV files: a merged file (merged_train.csv, generated in this project) combining demographic data and election data from the 2016 US Senate elections for counties of certain US states, and a second file (demographics_test.csv) containing only the demographic data of some US counties.

The merged data file was meant to be used for training machine learning classification and prediction models that predict the winning political party for a particular county, while the demographic data file was to be used as the test set for the models.

The merged data file was partitioned into training and validation sets using the holdout method: 75% of the data was allocated for training the models and the remaining 25% for validating them.

Additionally, the numeric attributes in the training and validation sets were standardized to have a mean of 0 and variance of 1.
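The holdout split and standardization described above can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the notebook's actual code: the function names `holdout_split` and `standardize` are hypothetical, the data is synthetic, and computing the mean and standard deviation from the training rows only (then reusing them for the validation rows) is an assumption about how the scaling was done.

```python
import numpy as np

def holdout_split(X, y, train_frac=0.75, seed=0):
    """Shuffle rows and split them into training and validation parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(train_frac * len(X)))
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

def standardize(X_train, X_val):
    """Scale columns to mean 0 and variance 1 using training statistics only (assumption)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X_train - mean) / std, (X_val - mean) / std

# Synthetic stand-in for the numeric attributes of the merged file (values are made up).
rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 4))
y = rng.integers(0, 2, size=200)  # hypothetical 0/1 party label

X_train, X_val, y_train, y_val = holdout_split(X, y)
X_train_s, X_val_s = standardize(X_train, X_val)
```

After scaling, the training columns have mean 0 and variance 1 exactly, while the validation columns are only approximately standardized because they reuse the training statistics.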

The main purpose of the assignment was to perform data modelling on the merged demographic-election data. The following data modelling tasks were performed on the dataset:

- Build linear regression models
  - Using all attributes
  - By selecting different attributes to find the best set of attributes
  - Using LASSO regression
- Build classification models and select the 2 best-performing models
  - Using all attributes
  - By selecting different attributes to find the best set of attributes
- Build clustering models and select the 2 best-performing models
  - Using all attributes
  - By selecting different attributes to find the best set of attributes
- Predict the Democratic and Republican party votes of each county in the demographic test set using the best-performing regression model
- Predict the winning political party in each county in the demographic test set using the best-performing classification model
- Create a choropleth map to visualize the majority political party of each county as predicted by the best-performing classification model

Choropleth map using the political party attribute in the dataset

Choropleth map using the political party predicted by SVM
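The "selecting different attributes to find the best set" step in the task list can be approached in several ways. As one possibility, the sketch below uses greedy forward selection scored by validation R² for an ordinary least-squares model. The README does not say which selection method was actually used, so the technique, the helper names (`fit_linear`, `r2_score`, `forward_select`), and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r2_score(X, y, coef):
    """Coefficient of determination of the fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    residual = y - A @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)

def forward_select(X_train, y_train, X_val, y_val):
    """Greedily add the column that most improves validation R^2; stop when none does."""
    selected, best_score = [], -np.inf
    remaining = list(range(X_train.shape[1]))
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            coef = fit_linear(X_train[:, cols], y_train)
            scores.append((r2_score(X_val[:, cols], y_val, coef), j))
        score, j = max(scores)
        if score <= best_score:
            break  # no remaining column improves the validation score
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score

# Synthetic data: the target depends only on columns 0 and 2 (values are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=400)
X_train, X_val, y_train, y_val = X[:300], X[300:], y[:300], y[300:]

selected, score = forward_select(X_train, y_train, X_val, y_val)
```

On this synthetic data the two informative columns are picked first. With real county data the chosen subset would depend on the split, so a result like this is better treated as a starting point than as a definitive attribute ranking.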

Check out the Jupyter Notebook (Election_Modelling.ipynb) or the project report (Report.pdf) to see the data science workflow implemented.

