Skip to content
/ kmeans Public

A small, header-only, parallel implementation of kmeans clustering for arbitrary-long byte vectors.

Notifications You must be signed in to change notification settings

jermp/kmeans

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KMeans for byte vectors

A small, header-only, parallel implementation of kmeans clustering for arbitrary-long byte vectors.

The code is inspired by the dkm library.

Compiling the code

The code is tested on Linux with gcc and on MacOS with clang. To build the code, CMake is required.

First clone the repository with

git clone --recursive https://github.com/jermp/kmeans.git

If you forgot --recursive when cloning, do

git submodule update --init --recursive

before compiling.

To compile the code for a release environment (see file CMakeLists.txt for the used compilation flags), it is sufficient to do the following, within the parent kmeans directory:

mkdir build
cd build
cmake ..
make -j

For a testing environment, use the following instead:

mkdir debug_build
cd debug_build
cmake .. -D CMAKE_BUILD_TYPE=Debug -D KMEANS_USE_SANITIZERS=On
make -j

Dependencies

Parallelism is achieved via OpenMP. To install it, do

brew install libomp

on a Mac, or

sudo apt install libomp

on Linux.

Examples

The tool tools/cluster can be used to cluster a collection of byte vectors. We assume the input collection vectors.bin is a binary file where: the first 8 bytes encode the number of bytes per vector, say p; the next 8 bytes encode the number of vectors in the collection, say n; we have then p bytes for vector (a total of np bytes).

./cluster -i vectors.bin -k 16 -d 0.0 -s 13 > labels.txt

./cluster -i vectors.bin -m 7 -d 0.001 -s 13 --mse 500 --mcs 10 > labels.txt

./cluster -i vectors.bin -m 7 -d 0.001 -s 13 --mse 50 --mcs 1 > labels.txt

About

A small, header-only, parallel implementation of kmeans clustering for arbitrary-long byte vectors.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published