Developed a GA for Graph Partitioning.
Project for System And Device Programming course @ PoliTO
We tested our program with WSL (Windows Subsystem for Linux) with an Intel Core i7-8700K CPU @ 3.70GHz. The RAM used was 8 GB and we were allowed to run 12 threads in parallel.
The goal of this project was to develop two versions (sequential and parallel) of a p-way partitioning algorithm. This means that our algorithm's goal is to split the initial graph G into p non-overlapping subgraphs while satisfying two specific properties:
- the sum of the weights of the nodes in each subgraph is balanced
- the sum of the weights of the edges crossing between subgraphs is minimized.
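The two properties above can be made concrete with a small sketch. It assumes an individual is encoded as a vector `part_of`, where `part_of[n]` is the part assigned to node `n`; the `Edge` struct and the function names `cut_weight` and `imbalance` are hypothetical and are not taken from the project's `Graph` or `GA` classes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical edge type; the project's Graph class may store edges differently.
struct Edge {
    int u;
    int v;
    int weight;
};

// Sum of the weights of the edges whose endpoints lie in different parts.
// A good partition keeps this value low.
long long cut_weight(const std::vector<Edge>& edges,
                     const std::vector<int>& part_of) {
    long long cut = 0;
    for (const Edge& e : edges) {
        if (part_of[e.u] != part_of[e.v]) cut += e.weight;
    }
    return cut;
}

// Difference between the heaviest and the lightest part.
// 0 means the node weights are perfectly balanced across the p parts.
long long imbalance(const std::vector<int>& node_weights,
                    const std::vector<int>& part_of, int p) {
    assert(p > 0 && node_weights.size() == part_of.size());
    std::vector<long long> load(p, 0);
    for (std::size_t n = 0; n < part_of.size(); ++n) {
        load[part_of[n]] += node_weights[n];
    }
    const long long heaviest = *std::max_element(load.begin(), load.end());
    const long long lightest = *std::min_element(load.begin(), load.end());
    return heaviest - lightest;
}
```

How the two values are combined into a single fitness score is a design choice that this sketch leaves open.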
To do so, we use a Genetic Algorithm; its parallel version follows the islands paradigm.
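As an illustration of the islands paradigm, the sketch below shows one possible migration step: each island sends copies of its best individuals to the next island in a ring, where they replace the worst ones. The ring topology, the replacement policy, the `Individual` stand-in struct, and the name `migrate_ring` are all assumptions made for this example; the project may migrate differently. The step is shown sequentially, so in a threaded version it would need synchronization.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the project's Individual class.
struct Individual {
    std::vector<int> genes;  // genes[n] = part assigned to node n
    double fitness;          // lower is better in this sketch
};

using Island = std::vector<Individual>;

static bool better(const Individual& a, const Individual& b) {
    return a.fitness < b.fitness;
}

// Copy the `migrants` best individuals of island i over the `migrants`
// worst individuals of island (i + 1) % n.
void migrate_ring(std::vector<Island>& islands, std::size_t migrants) {
    const std::size_t n = islands.size();
    if (n < 2 || migrants == 0) return;

    // Sort every island so that the best individuals come first.
    for (Island& island : islands) {
        assert(migrants <= island.size());
        std::sort(island.begin(), island.end(), better);
    }

    // Snapshot the emigrants first, so that an island never forwards
    // individuals it has just received in the same step.
    const std::ptrdiff_t m = static_cast<std::ptrdiff_t>(migrants);
    std::vector<Island> emigrants(n);
    for (std::size_t i = 0; i < n; ++i) {
        emigrants[i].assign(islands[i].begin(), islands[i].begin() + m);
    }

    // Overwrite the tail (worst individuals) of each destination island.
    for (std::size_t i = 0; i < n; ++i) {
        Island& dest = islands[(i + 1) % n];
        std::copy(emigrants[i].begin(), emigrants[i].end(), dest.end() - m);
    }
}
```

In terms of the command line, `migrants` would correspond to the value passed with `--mig` and the number of islands to `--isl`.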
The commands to run the entire project are in the Makefile.
These are the fundamental commands to compile our program:
- Delete previous object files and any existing data folder
make clean
make clean_data
- Compile code and extract data
make
### How to run?
These are the fundamental commands to run our program:
- OPTIONAL: create a new graph to be partitioned
./graph_generator #nodes #edges
- Run the program (details below) with a set of command-line arguments:
./graph_partitioner <data/graph_#nodes_#edges/> --binary [value] --bal --mod --parallel --part [value] --population [value] --gen [value] --era [value] --new [value] --isl [value] --mig [value] --lr [value]
- Extra: run the METIS application to obtain a reference value
sh compute_metis.sh
In order to run the same tests we did, use:
sh autorun.sh <data/graph_#nodes_#edges>
- graph with 250 k nodes & 750 k edges
- graph with 500 k nodes & 1.5 M edges
- graph with 750 k nodes & 2.25 M edges
- graph with 1 M nodes & 3 M edges
NOTE
If you want to inspect the contents of a graph, use the file with the .txt extension.
The file with the .bin extension is used to read the graph in parallel.
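A binary file lends itself to parallel reading because each reader can seek straight to its own byte range. The sketch below assumes a file made only of fixed-size edge records with no header; the `EdgeRecord` layout and the function `read_edges_parallel` are hypothetical, and the project's actual .bin format may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical fixed-size record; the real .bin layout may differ.
struct EdgeRecord {
    std::int32_t u;
    std::int32_t v;
    std::int32_t weight;
};

// Read `count` records from `path`, splitting them evenly among
// `num_threads` reader threads. Each thread opens its own stream and
// writes into a disjoint slice of the output vector, so no locking is needed.
std::vector<EdgeRecord> read_edges_parallel(const std::string& path,
                                            std::size_t count,
                                            unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<EdgeRecord> edges(count);
    std::vector<std::thread> workers;
    const std::size_t chunk = (count + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t first = std::min(count, t * chunk);
        const std::size_t last = std::min(count, first + chunk);
        if (first == last) continue;  // nothing left for this thread
        workers.emplace_back([&edges, &path, first, last] {
            std::ifstream in(path, std::ios::binary);
            in.seekg(static_cast<std::streamoff>(first * sizeof(EdgeRecord)));
            in.read(reinterpret_cast<char*>(&edges[first]),
                    static_cast<std::streamsize>((last - first) *
                                                 sizeof(EdgeRecord)));
        });
    }
    for (std::thread& w : workers) w.join();
    return edges;
}
```

Here `num_threads` plays the role of the value given to `--binary`. A text file cannot be split this way as easily, because line lengths vary and a reader cannot know where a record starts without scanning.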
- GA: sequential Genetic Algorithm (one thread for one island);
- GA: parallel Genetic Algorithm (one thread dedicated to each island).
The GA is provided with DEFAULT SETTINGS, which we tuned during the training phase. They are used if you run the algorithm without specifying any optional parameters.
Please refer to the output file if you are interested in seeing their values.
To use our program correctly, it is important to understand which arguments to pass. The only mandatory argument is the input graph folder name.
All the following arguments are optional and can be used to change the default settings.
Argument | Usage |
---|---|
--binary | Takes a value: the number of threads used to read the input file. |
--bal | Computes a balanced partition. |
--mod | Stops the algorithm when there has been no significant improvement for some consecutive eras. |
--parallel | Enables parallel computation. |
--part | Takes a value: the number of parts to split the graph into. |
--population | Takes a value: the population size. |
--gen | Takes a value: the number of generations. |
--era (--mod not active) | Takes a value: the number of eras. |
--era (--mod active) | Takes a value: the number of consecutive eras to wait without improvement. |
--new | Takes a value: the number of offspring. |
--isl | Takes a value: the number of islands. |
--mig | Takes a value: the number of migrants. |
--lr (--mod active) | Takes a value: the learning rate. |
./graph_partitioner ./data/graph_1000000_2500000 --bal --parallel --era 50 --isl 10 # Parallel execution computing a balanced partition, using 50 eras and 10 islands (other parameters keep their defaults)
./graph_partitioner ./data/graph_1000000_2500000 --mod --era 7 --new 15 --lr 20 # Sequential execution (default) generating 15 new offspring; stops if there is no improvement of at least 20% for 7 consecutive eras
./graph_partitioner ./data/graph_1000000_2500000 # All parameters keep their defaults (sequential computation, unbalanced partitions)
./graph_generator 100000 250000 # Create a graph file with 100000 nodes and 250000 edges
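The stopping rule enabled by `--mod` (stop after a given number of consecutive eras without an improvement of at least a given percentage) could be sketched as follows. The class `StagnationMonitor` is hypothetical, and it assumes that the improvement is measured relative to the last cost that counted as an improvement and that a lower cost is better; the project may measure it differently.

```cpp
#include <cassert>

// Hypothetical helper mirroring "--mod --era <patience> --lr <percent>".
class StagnationMonitor {
public:
    StagnationMonitor(int patience, double min_improvement_pct)
        : patience_(patience), min_gain_(min_improvement_pct / 100.0) {
        assert(patience > 0 && min_improvement_pct >= 0.0);
    }

    // Feed the best cost found in the era that just ended (lower is better).
    // Returns true once `patience` consecutive eras have passed without the
    // cost dropping by at least the requested percentage.
    bool should_stop(double best_cost) {
        const bool improved =
            !has_reference_ || best_cost <= reference_ * (1.0 - min_gain_);
        if (improved) {
            reference_ = best_cost;  // new baseline for later comparisons
            has_reference_ = true;
            stalled_eras_ = 0;
        } else {
            ++stalled_eras_;
        }
        return stalled_eras_ >= patience_;
    }

private:
    int patience_;
    double min_gain_;
    double reference_ = 0.0;
    bool has_reference_ = false;
    int stalled_eras_ = 0;
};
```

With `--mod --era 7 --lr 20`, the equivalent in this sketch would be `StagnationMonitor monitor(7, 20.0);`, calling `monitor.should_stop(cost)` at the end of every era.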
There are 5 main folders.
It contains a Genetic Algorithm class, which contains all the functions and parameters needed:
- GA.cpp
- GA.h
It contains all utility functions.
- utility.cpp
- utility.h
It contains a Graph class and related functions:
- Graph.cpp
- Graph.h
It contains an Individual class and related functions to encode it:
- Individual.cpp
- Individual.h