This work proposes a simple way to improve a clustering algorithm. The idea is to exploit a new distance metric called the "Euclidean Commute Time" (ECT) distance, based on a random-walk model on a graph derived from the data. Using this distance instead of the usual Euclidean distance in a k-means algorithm makes it possible to retrieve well-separated clusters of arbitrary shape, without any assumption about the underlying data distribution. Experimental results show that this distance measure significantly improves clustering quality on the tested data sets. This project is an implementation of this technique.
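To make the distance concrete, here is a minimal sketch of how ECT distances can be computed from a graph. It is not necessarily how this project implements them. It assumes the graph is connected and undirected, and that it is given as a symmetric weight matrix `W`; the function name `ect_distance_matrix` is hypothetical. The sketch uses the standard result that the average commute time between nodes i and j is V_G (l+_ii + l+_jj - 2 l+_ij), where L+ is the pseudoinverse of the graph Laplacian and V_G is the sum of all node degrees. The ECT distance is the square root of that quantity.

```python
import numpy as np

def ect_distance_matrix(W):
    """Return pairwise ECT distances for a connected, undirected weighted graph.

    W is a symmetric (n, n) array of non-negative edge weights.
    Assumption: the graph is connected; otherwise the matrix inversion below
    fails or gives meaningless values.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W
    # For a connected graph, the pseudoinverse of the Laplacian equals
    # inv(L + ee^T/n) - ee^T/n, where e is the all-ones vector.
    ones_over_n = np.full((n, n), 1.0 / n)
    l_plus = np.linalg.inv(laplacian + ones_over_n) - ones_over_n
    volume = degrees.sum()  # V_G: sum of all node degrees
    diag = np.diag(l_plus)
    # Average commute time: V_G * (l+_ii + l+_jj - 2 * l+_ij)
    commute = volume * (diag[:, None] + diag[None, :] - 2.0 * l_plus)
    commute = np.maximum(commute, 0.0)  # clip tiny negative round-off
    return np.sqrt(commute)

# Toy example: the path graph 0 - 1 - 2 with unit edge weights.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
D = ect_distance_matrix(W)
print(D)
```

The resulting matrix `D` could then replace Euclidean distances in a k-means-style procedure. How the graph is built from the raw data (for example, a nearest-neighbour graph) is a separate choice that this sketch leaves open.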
- Python 3.7.0
- Python packages:
  - pykov
  - pandas
  - networkx
  - matplotlib
  - numpy
  - scikit_learn
  - seaborn
  - cmake
First, check whether Python 3 is already installed:
python3 --version
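If Python 3 is not installed on your computer, you can install it together with the required packages using the commands below (Debian/Ubuntu; the `pip3` commands require pip):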
sudo apt-get update
sudo apt-get install python3
sudo pip3 install pandas networkx matplotlib numpy scikit_learn seaborn cmake
sudo pip3 install git+https://github.com/riccardoscalco/Pykov@master
If pip is not installed yet, install it first by running the commands below in your terminal:
sudo apt-get update
sudo apt install python3-pip
Then upgrade pip to the latest version:
pip3 install --upgrade pip
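Once everything is installed, a short script like the following can confirm that the libraries are importable. This is only a sketch: the import names in `REQUIRED` are assumptions based on the package list above (`scikit_learn` is imported as `sklearn`), the helper `missing_modules` is hypothetical, and `cmake` is left out because it is a build tool rather than a library the code imports.

```python
import importlib.util

# Import names assumed for the packages listed in this README.
REQUIRED = ["pykov", "pandas", "networkx", "matplotlib", "numpy", "sklearn", "seaborn"]

def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Any name reported as missing can be installed with `pip3 install` as described above.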