An implementation of the FLS++ algorithm for k-means Clustering, as presented in [1]. This is an improvement of the LS++ algorithm presented by Lattanzi and Sohler in [2].
You can try FLS++ out on our Clustering Toolkit!
One can think of the algorithm as working in 3 phases:
- Initializing centers with k-means++ (as presented in [3]).
- Improving the center set by performing local search swaps (for details, see [1]).
- Converging the solution with LLoyd's algorithm (also known as "the" k-means algorithm).
In [2] it is shown that O(k log log k)
local search iterations yield a constant factor approximation. However, in both [1] and [2] it is shown that, in practice, a very small number of iterations (e.g. 20) already yields very good results, at very reasonable runtime.
The interface is built in the same way as scikit-learn's KMeans for better compatibility.
In the following plots, we compare the performance of FLS++ (GFLS++) with various local search steps (5, 10, 15) with k-means++ (kM++), greedy k-means++ (GkM++) and the local search algorithm [2] with 25 local search steps (GLS++). The results are computed for the KDD Phy Test and the Tower datasets and averaged over 50 runs.
[1] Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt. Local Search k-means++ with Foresight. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
[2] Silvio Lattanzi and Christian Sohler. A better k-means++ algorithm via local search. In Proc.444 of the 36th ICML, volume 97 of Proceedings of Machine Learning Research, pages 3662–3671.445 PMLR, 09–15 Jun 2019
[3] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In409 Proceedings of the 18th SODA, page 1027–1035, USA, 2007
pip install flspp
from flspp import FLSpp
example_data = [
[1.0, 1.0, 1.0],
[1.1, 1.1, 1.1],
[1.2, 1.2, 1.2],
[2.0, 2.0, 2.0],
[2.1, 2.1, 2.1],
[2.2, 2.2, 2.2],
]
flspp = FLSpp(n_clusters=2)
labels = flspp.fit_predict(example_data)
centers = flspp.cluster_centers_
print(labels) # [1, 1, 1, 0, 0, 0]
print(centers) # [[2.1, 2.1, 2.1], [1.1, 1.1, 1.1]]
Install poetry
curl -sSL https://install.python-poetry.org | python3 -
Install clang
sudo apt-get install clang
Set clang variables
export CXX=/usr/bin/clang++
export CC=/usr/bin/clang
Install the package
poetry install
If the installation does not work and you do not see the C++ output, you can build the package to see the stack trace
poetry build
Run the tests
poetry run python -m unittest discover tests -v
If you use this code, please cite the following paper:
T. Conrads, L. Drexler, J. Könen, D. R. Schmidt, and M. Schmidt, "Local Search k-means++ with Foresight," in 22nd International Symposium on Experimental Algorithms (SEA 2024), 2024, pp. 7:1–7:20. doi: 10.4230/LIPIcs.SEA.2024.7.