EIGEN FREQUENCY CLUSTERING USING [KMEANS] [KMEANS & PCA] [DBSCAN] [HDBSCAN]
KMeans has been used for years as a clustering algorithm and has performed reasonably well. Its popularity, however, is not due to any complex structure but to a rather simple approach to data clustering: a K value, which is the number of mean centroids. These mean values then serve as reference points for the clusters surrounding them.
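To make the centroid idea concrete, here is a minimal sketch of the KMeans loop in plain Python, assuming one-dimensional data. The function name `kmeans_1d` and the `frequencies` values are hypothetical and only for illustration; this is not the code used in this folder.

```python
import random


def kmeans_1d(values, k, iterations=50, seed=0):
    """Minimal Lloyd's algorithm for 1-D data (illustrative sketch only)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # start from k values picked from the data
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return sorted(centroids)


# Hypothetical frequency values forming two obvious groups.
frequencies = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
centroids = kmeans_1d(frequencies, k=2)
print(centroids)
```

With K set to 2, the two centroids should settle near the middle of each group, and every value is then described by the centroid closest to it.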
As simply and neatly as KMeans tries to solve our clustering problem, it falls short when the data contains very noisy observations. This is the disadvantage of KMeans.
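A small sketch of why noise hurts a mean-based method: a single noisy observation can drag a centroid far away from the points it is supposed to represent. The values below are made up for illustration.

```python
from statistics import mean

clean = [1.0, 1.1, 0.9]   # hypothetical tight cluster
noisy = clean + [10.0]    # the same cluster plus one noisy observation

clean_centroid = mean(clean)
noisy_centroid = mean(noisy)

print(clean_centroid, noisy_centroid)
```

The centroid of the clean cluster sits at about 1.0, while the one noisy value pulls it to about 3.25, a position where none of the actual observations lie.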
To solve this problem, a team of researchers introduced an approach that ignores the mean value for clustering and instead considers density. Hmm, reasonable, right? Indeed it is.
This algorithm, DBSCAN, clusters data based on density, according to its wiki.
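The density idea can be sketched in a few lines of plain Python. This is a simplified one-dimensional illustration, not the implementation used in this folder; the function name `dbscan_1d`, the parameter names `eps` and `min_samples`, and the data values are assumptions chosen for the example.

```python
def dbscan_1d(values, eps, min_samples):
    """Minimal DBSCAN for 1-D data; returns one label per value, -1 = noise."""
    n = len(values)
    # Neighbourhood of each point: indices within eps (the point itself included).
    neighbours = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster_id += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


# Hypothetical values: two dense groups and one isolated, noisy observation.
frequencies = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 20.0]
labels = dbscan_1d(frequencies, eps=0.5, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike KMeans, no number of clusters is given up front, and the isolated value is labelled as noise (-1) instead of being forced into a cluster.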
Researchers later developed a more sophisticated algorithm built on DBSCAN: HDBSCAN. However, how well these two algorithms perform depends on the type of data you have.
For the sample.txt file in this folder, we found that DBSCAN is sufficient and clusters the data better than its successor, HDBSCAN.