A-Technical-Review-of-Clustering

This project analyses three clustering methods (K-means, Agglomerative Hierarchical and DBSCAN) on three datasets: Frequent Flyer Program, Mall Customer and Wine.

Functionality

Cleaning and preprocessing: plotted pairplots, heatmaps and histograms for the three datasets to explore the data and identify which variables to exclude
K-means: used the elbow method to choose the number of clusters, then applied the K-means algorithm to the three datasets
Agglomerative Hierarchical clustering: plotted a dendrogram to choose the number of clusters, then applied the Agglomerative Hierarchical algorithm to the three datasets
DBSCAN: plotted a k-distance graph to choose the eps (neighbourhood radius) parameter, then applied the DBSCAN algorithm to the three datasets
Comparison: compared and evaluated the three methods to identify the advantages and disadvantages of each
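A minimal sketch of the elbow method with scikit-learn, assuming a synthetic `make_blobs` dataset as a stand-in for the project's data. The helper `elbow_inertias` is hypothetical, not taken from the notebooks. It fits K-means for each k and records the inertia (within-cluster sum of squared distances). The "elbow" is the k after which adding clusters stops reducing inertia much.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption): 300 points drawn around 3 centres.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=42)


def elbow_inertias(X, k_max=10, random_state=0):
    """Hypothetical helper: fit K-means for k = 1..k_max and return each model's inertia."""
    inertias = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        km.fit(X)
        inertias.append(km.inertia_)  # within-cluster sum of squared distances
    return inertias


inertias = elbow_inertias(X)
for k, inertia in enumerate(inertias, start=1):
    print(f"k={k}: inertia={inertia:.1f}")

# After reading the elbow off the curve (k=3 for this toy data), fit the final model.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

In practice the inertias are plotted against k and the elbow is read by eye. On real data the bend is often less sharp than on well-separated blobs.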
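A sketch of the dendrogram step using SciPy's hierarchical clustering, assuming Ward linkage and synthetic `make_blobs` data. The README does not say which linkage the notebooks use. `dendrogram(..., no_plot=True)` computes the tree layout without drawing it. Dropping `no_plot` draws it with matplotlib. `fcluster` then cuts the tree into a chosen number of flat clusters. The scikit-learn `AgglomerativeClustering` model performs the same kind of clustering.

```python
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Synthetic stand-in data (assumption).
X, _ = make_blobs(n_samples=150, centers=3, random_state=7)

# Ward linkage (assumption) merges the pair of clusters that least increases within-cluster variance.
Z = linkage(X, method="ward")

# Compute the dendrogram layout; remove no_plot=True to draw it with matplotlib.
tree = dendrogram(Z, no_plot=True)

# Cut the tree into 3 flat clusters, the number one would read off the dendrogram here.
scipy_labels = fcluster(Z, t=3, criterion="maxclust")

# The equivalent agglomerative model in scikit-learn.
sk_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print(sorted(set(scipy_labels)), sorted(set(sk_labels)))
```

The number of clusters is usually chosen by cutting the dendrogram across its longest vertical gap, that is, where merges suddenly become much more expensive.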
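A sketch of the k-distance graph used to pick DBSCAN's `eps`, assuming `min_samples=5` and synthetic `make_moons` data. For each point it computes the distance to its `min_samples`-th nearest neighbour, counting the point itself, which matches DBSCAN's core-point rule. It then sorts those distances. The `knee_point` helper is a hypothetical stand-in for reading the knee by eye. It picks the point farthest from the straight line joining the two ends of the curve.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in data (assumption).
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
min_samples = 5  # assumption: a common starting value

# Querying with X itself includes each point as its own nearest neighbour (distance 0),
# so the last column is the distance to the min_samples-th point, self included.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_distances = np.sort(distances[:, -1])  # the k-distance graph, in ascending order


def knee_point(y):
    """Hypothetical heuristic: index of the point farthest below the chord of an increasing curve."""
    x = np.arange(len(y), dtype=float)
    xn = x / x[-1]                       # normalise x to [0, 1]
    yn = (y - y[0]) / (y[-1] - y[0])     # normalise y to [0, 1]; the chord becomes yn = xn
    return int(np.argmax(xn - yn))


eps = float(k_distances[knee_point(k_distances)])
labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
print(f"eps={eps:.3f}, clusters={len(set(labels) - {-1})}, noise points={np.sum(labels == -1)}")
```

DBSCAN labels noise points as -1. A larger `eps` merges clusters and reduces noise, and a smaller one does the opposite.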
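One way to put the three methods side by side is the silhouette score. This is an assumption: the README does not say which evaluation measure the project used. The sketch uses standardised synthetic blobs and illustrative parameter values. It excludes DBSCAN's noise points (label -1) before scoring, and reports NaN when a method yields fewer than two clusters, because the silhouette is undefined there.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption), standardised so one eps suits both features.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X = StandardScaler().fit_transform(X)

# Illustrative parameter values (assumption), not the project's tuned settings.
models = {
    "K-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    mask = labels != -1  # drop DBSCAN noise points; other models never emit -1
    n_clusters = len(set(labels[mask]))
    if n_clusters >= 2:
        scores[name] = float(silhouette_score(X[mask], labels[mask]))
    else:
        scores[name] = float("nan")  # silhouette needs at least two clusters
    print(f"{name}: clusters={n_clusters}, silhouette={scores[name]:.3f}")
```

Excluding noise flatters DBSCAN's score, so the fraction of points labelled as noise is worth reporting alongside it.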

All documents can be found under the Documentation folder

All datasets can be found under the Datasets folder

Frequent Flyer Program: This dataset contains information about the behaviour of NZ Airline’s FFP customers. We have dropped two variables: PartnerTrans and FlightTrans. The models are built with the following variables: AwardMiles, EliteMiles, PartnerMiles, FlyingReturnsMiles, and EnrollDuration.
Mall Customer: This dataset contains basic information about the customers. None of the attributes are strongly correlated with one another, so we used all the numeric variables when building the clustering models.
Wine: This dataset contains information about different types of wine. Total Phenols, Flavanoids, Hue, OD280 and Proline show a strong negative correlation with the class label, and Ash_Alcanity is positively correlated with Ash. Based on these correlations, we dropped Ash_Alcanity, OD280 and Proanthocyanins when building the clustering models.
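The correlation screening described above can be sketched with pandas. The helper `correlated_pairs` and the 0.7 threshold are hypothetical, and the toy DataFrame stands in for the real datasets. The helper lists every pair of numeric columns whose absolute Pearson correlation reaches the threshold. The same `corr()` matrix is what a heatmap (for example `seaborn.heatmap`) would display.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.7):
    """Hypothetical helper: (col_a, col_b, r) for numeric column pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, so each pair appears once
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Toy data (not the project's datasets): "b" is almost a rescaled copy of "a"; "c" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})

pairs = correlated_pairs(toy)
print(pairs)

# Keep one column from each highly correlated pair.
reduced = toy.drop(columns=["b"])
```

On the project's data, the same `DataFrame.drop(columns=[...])` call would remove, for example, PartnerTrans and FlightTrans from the Frequent Flyer Program data. This assumes the data has been loaded into a DataFrame with those column names.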

Repository contents
Datasets/
Documentation/
FrequentFlyerProgram.ipynb
Mall_Customers.ipynb
Wine.ipynb
README.md
.gitattributes