Mautz, Dominik, et al. "Towards an Optimal Subspace for K-Means." Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017.

SubspaceKMeans

Mautz, Dominik, et al. "Towards an Optimal Subspace for K-Means." Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017. (link)

Original implementation of the above article.

This package provides the SubspaceKMeans class, which implements the above algorithm and behaves like scikit-learn's KMeans (sklearn.cluster.KMeans).

Articles corresponding to this repository (Qiita)

The following articles are written in Japanese only.

Install

> pip install "git+https://github.com/tetutaro/subspacekmeans"

Very simple usage

from subspacekmeans import SubspaceKMeans

# data: dense 2-D array of shape (n_samples, n_features)
subspace_km = SubspaceKMeans(n_clusters=8)
predicted = subspace_km.fit_predict(data)  # one cluster label per sample
transformed = subspace_km.transform(data)
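
The snippet above assumes a `data` array already exists. As a minimal sketch, the helpers below build a small synthetic dataset and run any estimator that exposes `fit_predict` and `transform`. The names `make_demo_data` and `fit_and_transform` are hypothetical and are not part of this package; the cluster layout and noise dimensions are arbitrary choices made for illustration.

```python
import numpy as np


def make_demo_data(n_per_cluster=50, n_noise_dims=3, seed=0):
    """Hypothetical helper: three Gaussian blobs in two informative
    dimensions, padded with dimensions that contain only noise."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
    informative = np.vstack(
        [center + rng.normal(size=(n_per_cluster, 2)) for center in centers]
    )
    noise = rng.normal(size=(informative.shape[0], n_noise_dims))
    return np.hstack([informative, noise])


def fit_and_transform(estimator, data):
    """Hypothetical helper: call the two methods used in the usage snippet."""
    labels = estimator.fit_predict(data)
    transformed = estimator.transform(data)
    return labels, transformed


# Usage (requires the package from the Install section):
#   from subspacekmeans import SubspaceKMeans
#   data = make_demo_data()
#   labels, transformed = fit_and_transform(SubspaceKMeans(n_clusters=3), data)
```

Because the helper only relies on `fit_predict` and `transform`, the same call should also work with `sklearn.cluster.KMeans`, which makes side-by-side comparisons straightforward.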

Notices

  • This implementation is currently based on scikit-learn==0.24.1
  • This implementation does not support sparse matrices as input data
  • This implementation uses neither Cython nor the algorithm implementations adopted by scikit-learn (lloyd and elkan)
    • it uses only NumPy
    • it is easy to understand but slow
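
To illustrate what a NumPy-only k-means loop looks like, here is a minimal sketch of the standard Lloyd iteration. It is not this repository's code, and it omits the subspace rotation step described in the paper; the function name `lloyd_kmeans` and its parameters are hypothetical.

```python
import numpy as np


def lloyd_kmeans(data, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd iteration in NumPy (illustrative sketch only)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from distinct, randomly chosen samples.
    centers = data[rng.choice(len(data), size=n_clusters, replace=False)].copy()

    def assign(current_centers):
        # Squared Euclidean distances, shape (n_samples, n_clusters).
        diffs = data[:, None, :] - current_centers[None, :, :]
        return (diffs ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = data[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Recompute labels so they match the returned centers.
    return assign(centers), centers
```

The distance computation materializes an array of shape (n_samples, n_clusters, n_features), which is simple to read but memory-hungry and slow on large inputs. This is the kind of trade-off the notice above describes.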

Detailed usage

  • Comparing Subspace $k$-Means with PCA + $k$-Means: see wine.ipynb
  • Finding the best $k$ for (Subspace) $k$-Means: see pendigits.ipynb
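
For orientation, a PCA + k-Means baseline can be sketched with scikit-learn alone. This is an assumption about what such a comparison might look like, not a copy of wine.ipynb: the standardization step, the choice of two components, and scoring with normalized mutual information are all illustrative, and `pca_kmeans_baseline` is a hypothetical name.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import normalized_mutual_info_score
from sklearn.preprocessing import StandardScaler


def pca_kmeans_baseline(n_components=2, n_clusters=3, seed=0):
    """Hypothetical baseline: standardize, reduce with PCA, then run k-means."""
    wine = load_wine()
    scaled = StandardScaler().fit_transform(wine.data)
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(scaled)
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(reduced)
    # Compare the clustering with the known wine cultivars.
    score = normalized_mutual_info_score(wine.target, labels)
    return score, labels


if __name__ == "__main__":
    score, _ = pca_kmeans_baseline()
    print(f"PCA + k-means NMI on wine: {score:.3f}")
```

Running `SubspaceKMeans(n_clusters=3).fit_predict(scaled)` on the same standardized data and scoring it the same way would give a comparable number, assuming the package is installed.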

Sample
