codecov pypi Download Status Documentation Status License: MIT

CLASSIX is a fast and explainable clustering algorithm based on sorting. Here are a few highlights:

  • Ability to cluster low and high-dimensional data of arbitrary shape efficiently.
  • Ability to detect and deal with outliers in the data.
  • Ability to provide textual explanations for the generated clusters.
  • Full reproducibility of all tests in the accompanying paper.
  • Support for Cython compilation.

CLASSIX is a contrived acronym of CLustering by Aggregation with Sorting-based Indexing and the letter X for explainability. CLASSIX clustering consists of two phases: a greedy aggregation phase that collects the sorted data into groups of nearby data points, followed by a merging phase that combines groups into clusters. The algorithm is controlled by two parameters: the distance parameter radius, used for the group aggregation, and the minPts parameter, which sets the minimum cluster size.
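The two phases can be illustrated with a small pure-Python sketch. This is not the CLASSIX implementation, only a toy version under stated assumptions: points are sorted by their first coordinate (the package itself offers options such as sorting='pca'), two groups are merged when their starting points lie within merge_scale * radius of each other (merge_scale is a hypothetical parameter chosen for illustration), and the minPts handling is left out entirely. The function names aggregate, merge and cluster are likewise illustrative.

```python
import math


def aggregate(points, radius):
    """Greedy aggregation of sorted points into groups (toy version).

    Returns a group label for every point and the index of each group's
    starting point.
    """
    # Assumption: sort by the first coordinate. Because this is a projection,
    # a key difference larger than `radius` implies a distance larger than
    # `radius`, which justifies the early `break` below.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    labels = [-1] * len(points)
    starts = []
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already absorbed by an earlier group
        group = len(starts)
        starts.append(i)
        labels[i] = group
        for k in range(pos + 1, len(order)):
            j = order[k]
            if points[j][0] - points[i][0] > radius:
                break  # all later points are farther away than radius
            if labels[j] == -1 and math.dist(points[i], points[j]) <= radius:
                labels[j] = group
    return labels, starts


def merge(points, starts, merge_dist):
    """Merge groups whose starting points are within merge_dist (union-find)."""
    parent = list(range(len(starts)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a in range(len(starts)):
        for b in range(a + 1, len(starts)):
            if math.dist(points[starts[a]], points[starts[b]]) <= merge_dist:
                parent[find(a)] = find(b)
    return [find(a) for a in range(len(starts))]


def cluster(points, radius, merge_scale=1.5):
    """Aggregate, merge, and relabel clusters as 0, 1, 2, ..."""
    group_of, starts = aggregate(points, radius)
    root = merge(points, starts, merge_scale * radius)
    ids = {}
    return [ids.setdefault(root[g], len(ids)) for g in group_of]


two_blobs = [(0, 0), (0.1, 0), (0.2, 0.1), (5, 5), (5.1, 5), (5.05, 5.1)]
print(cluster(two_blobs, radius=0.5))
```

In this sketch the sorting only serves to cut the inner search short; the pairwise loop in merge is quadratic in the number of groups and is kept simple for readability rather than speed.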

Installation and example

CLASSIX has the following dependencies for its clustering functionality:

  • cython
  • numpy
  • scipy
  • requests

and requires the following packages for data visualization:

  • matplotlib
  • pandas

To install the current CLASSIX release via pip, use:

pip install classixclustering

To check the CLASSIX installation you can use:

python -m pip show classixclustering

Download the repository via:

git clone https://github.com/nla-group/classix.git

Example usage:

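# scikit-learn is used here only to generate synthetic data; it is not among
# the dependencies listed above, so it may need to be installed separately.
from sklearn import datasets
from classix import CLASSIX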

# Generate synthetic data
X, y = datasets.make_blobs(n_samples=2000000, centers=4, n_features=10, random_state=1)

# Employ CLASSIX clustering
clx = CLASSIX(sorting='pca', verbose=1)
clx.fit(X)

Citation

@techreport{CG22b,
  title   = {Fast and explainable clustering based on sorting},
  author  = {Chen, Xinye and G\"{u}ttel, Stefan},
  year    = {2022},
  number  = {arXiv:2202.01456},
  pages   = {25},
  institution = {The University of Manchester},
  address = {UK},
  type    = {arXiv EPrint},
  url     = {https://arxiv.org/abs/2202.01456}
}