SubspaceKMeans

Mautz, Dominik, et al. "Towards an Optimal Subspace for K-Means." Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017. (link)

An original implementation of the algorithm described in the article above.

This package provides the SubspaceKMeans class, which implements that algorithm and behaves like scikit-learn's KMeans (sklearn.cluster.KMeans).

Articles corresponding to this repository (Qiita)

The following articles are written in Japanese only.

Install

> pip install "git+https://github.com/tetutaro/subspacekmeans"

Very simple usage

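from subspacekmeans import SubspaceKMeans

# data: dense array-like of shape (n_samples, n_features)
subspace_km = SubspaceKMeans(n_clusters=8)
# fit the model and get a cluster label for each sample
predicted = subspace_km.fit_predict(data)
transformed = subspace_km.transform(data)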

Notices

  • This implementation is currently based on scikit-learn==0.24.1.
  • This implementation does not support sparse matrices as input data.
  • This implementation uses neither Cython nor the algorithms adopted by scikit-learn (lloyd and elkan).
    • It uses only NumPy.
    • It is easy to understand but slow.
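
Because sparse matrices are not supported, sparse input would need to be converted to a dense array before fitting. The following is a minimal sketch of that conversion, assuming the input is a SciPy sparse matrix. `to_dense` is a hypothetical helper written for illustration and is not part of this package.

```python
import numpy as np
from scipy import sparse


def to_dense(data):
    """Return a dense float ndarray (hypothetical helper, not part of subspacekmeans)."""
    if sparse.issparse(data):
        # toarray() materializes every zero entry, so memory use grows to
        # n_samples * n_features values; check that this fits in memory.
        return data.toarray().astype(float)
    return np.asarray(data, dtype=float)


# Small sparse matrix used only to demonstrate the conversion.
sparse_data = sparse.csr_matrix(np.eye(3))
dense_data = to_dense(sparse_data)
```

The resulting `dense_data` can then be passed to `fit_predict` in place of the sparse matrix.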

Detailed usage

  • Comparing Subspace k-Means with PCA + k-Means: see wine.ipynb
  • Finding the best $k$ for (Subspace) $k$-Means: see pendigits.ipynb
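
For orientation, the PCA + k-Means side of the first comparison might look roughly like the sketch below. It assumes the wine dataset bundled with scikit-learn, standardized features, and two principal components. These are illustrative choices, and the notebook may use different data, preprocessing, or parameters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the wine data bundled with scikit-learn (178 samples, 13 features).
data, target = load_wine(return_X_y=True)

# Standardize features so that no single feature dominates the variance.
scaled = StandardScaler().fit_transform(data)

# Baseline: project onto 2 principal components first, then cluster.
# n_components=2 is an arbitrary choice for this sketch.
reduced = PCA(n_components=2).fit_transform(scaled)
baseline_km = KMeans(n_clusters=3, n_init=10, random_state=0)
baseline_labels = baseline_km.fit_predict(reduced)
```

With the package installed, `SubspaceKMeans(n_clusters=3).fit_predict(scaled)` would give the other side of the comparison, since the class follows the same `fit_predict` interface.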

Sample