Kmeans-Based-on-ECT-Distances

Abstract

This work proposes a simple way to improve a clustering algorithm. The idea is to use a new distance metric, the "Euclidean Commute Time" (ECT) distance, which is based on a random-walk model on a graph derived from the data. Using this distance instead of the usual Euclidean distance in a k-means algorithm makes it possible to retrieve well-separated clusters of arbitrary shape, without making assumptions about the distribution of the data. Experimental results show that using this distance measure significantly improves clustering quality on the tested data sets. This project is an implementation of this technique.
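In the random-walk literature, the ECT distance is usually computed from the pseudoinverse L+ of the graph Laplacian L = D - A: the commute time between nodes i and j is n(i, j) = V_G (e_i - e_j)^T L+ (e_i - e_j), where V_G is the sum of all node degrees, and the ECT distance is the square root of n(i, j). The following is a minimal NumPy sketch of that idea, not this project's code. It assumes the graph is a symmetric, unweighted k-nearest-neighbour graph, and the helpers `knn_adjacency`, `ect_embedding`, `pairwise_distances` and `kmeans` (and the parameter `n_neighbors`) are hypothetical names. k-means is run in a coordinate space whose Euclidean distances equal the ECT distances.

```python
import numpy as np


def knn_adjacency(X, n_neighbors=10):
    """Symmetric, unweighted k-NN adjacency matrix (an assumed graph construction)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dists, np.inf)             # a point is not its own neighbour
    nearest = np.argsort(sq_dists, axis=1)[:, :n_neighbors]
    A = np.zeros_like(sq_dists)
    rows = np.repeat(np.arange(len(X)), n_neighbors)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                      # make the graph undirected


def ect_embedding(A, tol=1e-10):
    """Coordinates whose pairwise Euclidean distances are the ECT distances of graph A."""
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A                       # graph Laplacian L = D - A
    volume = degrees.sum()                         # V_G: sum of all node degrees
    lam, U = np.linalg.eigh(L)                     # L = U diag(lam) U^T
    inv_sqrt = np.zeros_like(lam)
    keep = lam > tol                               # skip zero eigenvalues (pseudoinverse)
    inv_sqrt[keep] = 1.0 / np.sqrt(lam[keep])
    # Row i is sqrt(V_G) * (L+)^(1/2) e_i expressed in the eigenbasis U.
    return np.sqrt(volume) * U * inv_sqrt


def pairwise_distances(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd iterations with centres initialised from random data points."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Illustrative data only: two Gaussian blobs.
    X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(3.0, 0.3, (30, 2))])
    labels = kmeans(ect_embedding(knn_adjacency(X, n_neighbors=5)), k=2)
    print(labels)
```

As a sanity check, for a path graph 0-1-2-3 with unit edge weights, V_G = 6 and the effective resistance between the end nodes is 3, so their ECT distance is sqrt(18). The commute-time formula assumes a connected graph; if the k-NN graph splits into several components, the pseudoinverse still returns finite distances between them, so a larger `n_neighbors` may be needed. scikit-learn's `KMeans` could replace the hand-written Lloyd loop.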

To use this work in your research or projects, you need:

  • Python 3.7.0
  • Python packages:
    • pykov
    • pandas
    • networkx
    • matplotlib
    • numpy
    • scikit-learn
    • seaborn
    • cmake
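A quick way to confirm that these packages are importable is a small script like the one below. It is a sketch, not part of the project: the import names are assumptions (scikit-learn is imported as `sklearn`, and the `cmake` package is assumed to expose a `cmake` module), `missing_modules` is a hypothetical helper, and the script only reports what is missing rather than installing anything.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ("pykov", "pandas", "networkx", "matplotlib", "numpy",
                    "sklearn", "seaborn", "cmake")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    if sys.version_info < (3, 7):
        print("Warning: this README lists Python 3.7.0")
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```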

To install Python:

First, check whether it is already installed:

python3 --version

If Python 3 is not installed on your computer, you can install it with the commands below (Debian/Ubuntu):

sudo apt-get update
sudo apt-get install python3

To install the packages with pip:

sudo pip3 install pandas networkx matplotlib numpy scikit-learn seaborn cmake
sudo pip3 install git+https://github.com/riccardoscalco/Pykov@master

If pip is not installed, run the following commands in your terminal:

sudo apt-get update
sudo apt install python3-pip

You should also check your pip version and upgrade it:

pip3 --version
pip3 install --upgrade pip