Mautz, Dominik, et al. "Towards an Optimal Subspace for K-Means." Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2017.
Original implementation of the above article.
This package provides the `SubspaceKMeans` class, which implements the above algorithm and behaves like scikit-learn's `KMeans` (`sklearn.cluster.KMeans`).
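To show what the algorithm does, here is a minimal NumPy sketch of the Subspace k-Means iteration as described in the paper. Points are assigned to the nearest center inside an $m$-dimensional clustered subspace spanned by the first $m$ columns of an orthonormal rotation $V$, and centers are recomputed as cluster means. $V$ is then replaced by the eigenvectors of $\sum_i S_i - S_D$ (within-cluster scatter minus total scatter) in ascending eigenvalue order, and $m$ is set to the number of negative eigenvalues. This is not this package's code. The function name `subkmeans_sketch`, its parameters, the random initialization, and the numerical tolerance for "negative" are assumptions made for illustration.

```python
import numpy as np


def subkmeans_sketch(X, k, max_iter=100, seed=0):
    """Hypothetical helper sketching Subspace k-Means (Mautz et al., 2017).

    Returns (labels, V, m): cluster labels, an orthonormal rotation V, and the
    dimensionality m of the clustered subspace (the first m columns of V).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed initialization: random orthonormal rotation, m = d / 2,
    # and k distinct data points as the initial centers.
    V, _ = np.linalg.qr(rng.normal(size=(d, d)))
    m = max(d // 2, 1)
    centers = X[rng.choice(n, size=k, replace=False)]
    # Total scatter matrix S_D of the data around its global mean.
    Xc = X - X.mean(axis=0)
    S_D = Xc.T @ Xc
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # Assignment: nearest center after projecting onto the clustered subspace.
        P = V[:, :m]
        diff = (X @ P)[:, None, :] - (centers @ P)[None, :, :]
        new_labels = (diff ** 2).sum(axis=2).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so stop
        labels = new_labels
        # Update: centers are cluster means in the full space (empty clusters keep theirs).
        for i in range(k):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
        # Sum of the within-cluster scatter matrices S_i.
        S_W = np.zeros((d, d))
        for i in range(k):
            R = X[labels == i] - centers[i]
            S_W += R.T @ R
        # New rotation: eigenvectors of S_W - S_D, ascending eigenvalues (eigh order).
        eigvals, V = np.linalg.eigh(S_W - S_D)
        # New m: number of clearly negative eigenvalues (assumed tolerance), at least 1.
        tol = 1e-10 * max(np.abs(eigvals).max(), 1.0)
        m = max(int((eigvals < -tol).sum()), 1)
    return labels, V, m
```

When the centers are exact cluster means, $\sum_i S_i - S_D$ equals minus the between-cluster scatter matrix, so in this sketch $m$ in practice ends up no larger than $k-1$.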
The following articles were written in Japanese only.
```shell
pip install "git+https://github.com/tetutaro/subspacekmeans"
```
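```python
import numpy as np
from subspacekmeans import SubspaceKMeans

data = np.random.rand(200, 10)  # example input: 200 samples x 10 features
subspace_km = SubspaceKMeans(n_clusters=8)
predicted = subspace_km.fit_predict(data)  # cluster label for each sample
transformed = subspace_km.transform(data)
```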
- This implementation is currently based on `scikit-learn==0.24.1`
- This implementation does not support sparse matrices as input data
- This implementation uses neither Cython nor the algorithms adopted by scikit-learn (`lloyd` and `elkan`); it uses plain NumPy only
- As a result, it is easy to understand but slow
- Comparing Subspace k-Means with PCA + k-Means: see `wine.ipynb`
- Finding the best $k$ for (Subspace) $k$-Means: see `pendigits.ipynb`
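For context on the PCA + k-Means comparison in `wine.ipynb`, the baseline can be sketched in plain NumPy. PCA picks the projection from variance alone, before any clustering, and Lloyd's k-means then runs in the reduced space. Subspace k-Means, by contrast, updates its subspace together with the cluster assignments. This is an illustrative sketch, not code taken from the notebook; the helper name `pca_kmeans_sketch` and its parameters are hypothetical.

```python
import numpy as np


def pca_kmeans_sketch(X, k, n_components, n_iter=100, seed=0):
    """Hypothetical baseline: project with PCA, then run plain Lloyd k-means."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # PCA via SVD of the centered data; keep the top n_components directions.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T
    # Lloyd's k-means in the reduced space, starting from k distinct points.
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            Z[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, Z
```

Because the PCA projection is fixed up front, directions with high variance but no cluster structure can dominate the reduced space; that is the weakness a jointly learned subspace is meant to avoid.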