We have implemented KMeans with the omp library to parallelize and decrease clustering time consumption.
Platform | Minimum g++ Version | Minimum CMake Version | Status |
---|---|---|---|
Linux | 5.0 | 3.21 | Tested |
- Used 4 Approaches To Use OMP + 1 Serial Approach.
- Structured Implementation.
- Support Multi Dimension DataSets.
- Support Generic DataSets.
- Plotting Final Cluster.
We Used 4 Approaches to use omp constructs 1 Serial and a simple one. Each of them is a class that implements an algorithm and specific constructs:
- SerialKMeans
- CriticalKMeans
- SPMDKMeans
- ForKmeans
- PadKmeans
SerialKMeans is the simple serial implementation of KMeans and uses no parallelism.
In this approach, We used omp parallel
construction, and to avoid data-race, 'omp critical' was used to modify clusters.
In computing, SPMD (single program, multiple data) is a technique employed to achieve parallelism; it is a subcategory of MIMD. Tasks are split up and run simultaneously on multiple processors with different inputs to obtain results faster. SPMD is the most common style of parallel programming.
So to achieve this approach, We used a 2D Array to store Clusters for each thread and modify each of them on its running thread; this way, we split up our tasks and avoided data races, but in the end, we should merge the results of each thread's cluster.
We used omp for
construct for several for loops inside the algorithm within this approach.
As you may know, there is much going on in the back scene of omp and hardware to provide us parallelism. One of them is False-Sharing, which puts lots of pressure on the parallelism process.
To avoid False-Sharing, we use Padding in our parallel constructions.
To see if GCC is already installed on your system, open a Terminal window and enter the following command:
GCC -v
If GCC is not installed, run the following command from the Terminal window to update the Ubuntu package lists. An out-of-date Linux distribution can interfere with getting the latest packages.
sudo apt-get update
Next, install the GNU compiler, make, and the GDB debugger with this command:
sudo apt-get install build-essential gdb
Use the following command to use GUI CMake for building project:
make-GUI
Alternatively, use the command line:
For example, from the command line, we could navigate to the main root of the project's directory of the CMake source code tree and create a build directory:
mkdir Step1_build
Next, navigate to the build directory and run CMake to configure the project and generate a native build system:
cd Step1_build
cmake ../Step1
Then call that build system to compile/link the project:
CMake --build .
Finally, try to use the newly built 1_3omp
with these commands:
1_3omp
After running the project and whenever it is finished, there will be a outputResult.txt
file that logs every approach and the final centroid of clusters and more info.
The result of execution on 4-core cpu with 12 threads in seconds.
Type\Thread | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SPMD | 1.311 | 1.54 | 1.92 | 2.71 | 2.61 | 2.77 | 4.07 | 7.18 | 7.02 | 3.38 | 6.46 | 5.88 |
Serial | 1.07 | - | - | - | - | - | - | - | - | - | - | - |
Critical | 0.75 | 0.63 | 0.33 | 0.29 | 0.32 | 0.46 | 0.3 | 0.29 | 0.3 | 0.43 | 0.4 | 0.21 |
Pad | 1.21 | 1.56 | 1.97 | 3.21 | 2.83 | 4.37 | 4.61 | 3.81 | 4.39 | 3.90 | 4.89 | 4.50 |
For | 0.57 | 0.70 | 1.29 | 0.75 | 0.93 | 0.85 | 0.55 | 0.70 | 1.29 | 0.75 | 0.93 | 0.85 |
Feel free to share your ideas or any other problems. Pull requests are welcomed.
OMPKMeans was released under an MIT license. See LICENSE for more information.