======================
This repository provides experiments and scripts for reproducing most figures and tables in our paper Glign: Taming Misaligned Graph Traversals in Concurrent Graph Processing. Our paper introduces Glign, a runtime system for evaluating concurrent graph queries. Currently, Glign is built on top of Ligra.
In the instructions below, we provide steps for compiling and running experiments.
G++ compilers with support for Cilk Plus is required (version >= 5.3.0 and <= 7.4.0). Perf tool is used for profiling.
Compilers
- g++ >= 5.3.0 and <= 7.4.0 with support for Cilk Plus
To compile, first checkout ae-checkin
branch, cd into Glign-AE/apps
folder and run
$ bash compile.sh
This will compile all six benchmakrs in the paper (SSSP_Batch, BFS_Batch, SSWP_Batch, SSNP_Batch, Viterbi_Batch, Heter_Batch).
For example, to run concurrent SSSP queries (under Glign-AE/apps
directory):
$ ./SSSP_Batch -option glign -batch 64 -max_combination 512 -mode 3 -delay -qf [queries source file] [graph]
The above will evaluate 512 queries in 8 batches, each batch contains 64 concurrent queries.
Query files are provided (in Glign-AE/query_input
folder), the queries for each graph are generated based on Section 4.1 of the paper.
-batch
denotes the batch size, -max_combination
denotes the total number of queries to evaluate. [graph] should be a weighted graph in adjacency format. User can use Ligra's tool to convert SNAP graph format to adjacency format and add random weights.
A smaller graph (LiveJournal) has been provided inside the ae-data
folder. The GridGrid graph format (used by GraphM) for LJ is also provided in ae-data/graphm
.
There are some variants of Glign in the paper which are summarized below:
Variant | parameters |
---|---|
Ligra-Seq | -option glign -mode 1 |
Ligra-C | -option ligra-c |
Glign-Intra | -option glign -mode 2 |
Glign-Inter | -option glign -mode 2 -delay |
Glign-Batch | -option glign -mode 3 |
Glign | -option glign -mode 3 -delay |
For heterogeneous queries (a batch of mixed SSSP, SSWP, BFS, and SSNP):
Variant | parameters |
---|---|
Ligra-Seq | -option glign-heter -mode 1 |
Ligra-C | -option ligra-c |
Glign-Intra | -option glign-heter -mode 2 |
Glign-Inter | -option glign-heter -mode 2 -delay |
Glign-Batch | -option glign-heter -mode 3 |
Glign | -option glign-heter -mode 3 -delay |
In the future release, we will simpilify the config parameters for different settings.
The adj graphs (for Ligra and Krill) and grid format graphs (for GraphM) can be downloaded from the shared Google Drive folder.
It is suggested that first finish data collection for one graph. If time allows, larger graphs can be further tested.
Adj Graph | File name | Size |
---|---|---|
LJ | soc-LiveJournal1.weighted.adj | 0.69GB |
WP | wikipedia_link_en_weighted.adj | 4.3GB |
UK2 | uk-2002-sym-weighted.adj | 5.5 GB |
TW | twitter-2010.weighted.adj | 16.1GB |
FR | friendster.sym.weighted.adj | 40.3GB |
RD-CA | roadNet-CA.weighted.adj | 0.067GB |
RD-US | road-road-usa.weighted.adj | 0.82GB |
Note: it is possible to download the SNAP graphs (from SNAP Project, and convert SNAP graphs to Adj format (following notes in https://github.com/jshun/ligra#graph-converters), and convert SNAP graphs to Grid format (following notes in https://github.com/Program-Anonymity/GraphM)), however, the Goolge Drive shared folder contains all necessary graph inputs.
After downloading the graph files, please copy and paste all files into .../Glign/ae-data
folder and
run the following commands (under the .../Glign/apps
folder):
1. The overall performance tests in Glign (in the .../Glign/apps
folder):
bash run_experiments.sh [name] [graph_path] [query_file]
[name] should be the abbreviation of tested graphs in the paper: LJ, WP, UK2, TW, FR, RDCA, RDUS.
[graph_path] is the location of weighted adjacency graph.
[query_file] is the query input (../query_input/[name]
).
The results are stored in ../results/[name]
.
Example commands for running different graphs are listed below:
Graph | Command | Estimated time |
---|---|---|
LJ | bash run_experiments.sh LJ ../ae-data/soc-LiveJournal1.weighted.adj ../query_input/LJ_queries.txt | less than 2 hours |
WP | bash run_experiments.sh WP ../ae-data/wikipedia_link_en_weighted.adj ../query_input/WP_queries.txt | ~4 hours |
UK2 | bash run_experiments.sh UK2 ../ae-data/uk-2002-sym-weighted.adj ../query_input/UK2_queries.txt | ~5 hours |
TW | bash run_experiments.sh TW ../ae-data/twitter-2010.weighted.adj ../query_input/TW_queries.txt | ~12 hours |
FR | bash run_experiments.sh FR ../ae-data/friendster.sym.weighted.adj ../query_input/FR_queries.txt | > 24 hours |
2. To run road networks (in the .../Glign/apps
folder) (only for Table 15):
Graph | Command | Estimated time |
---|---|---|
RD-CA | bash run_roadnets.sh RDCA ../ae-data/roadNet-CA.weighted.adj ../query_input/road-ca-512-random.txt | ~2 hours |
RD-US | bash run_roadnets.sh RDUS ../ae-data/road-road-usa.weighted.adj ../query_input/road-usa-512-random.txt | ~8 hours |
3. To run profiling experiments in Glign:
bash run_profilings.sh [name] [graph_path] [query_file]
For example, to run experiments for LJ, use bash run_profilings.sh LJ ../ae-data/soc-LiveJournal1.weighted.adj ../query_input/LJ_queries.txt
Example commands for running different graphs are listed below:
Graph | Command | Estimated time |
---|---|---|
LJ | bash run_profilings.sh LJ ../ae-data/soc-LiveJournal1.weighted.adj ../query_input/LJ_queries.txt | less than 2 hours |
WP | bash run_profilings.sh WP ../ae-data/wikipedia_link_en_weighted.adj ../query_input/WP_queries.txt | ~4 hours |
UK2 | bash run_profilings.sh UK2 ../ae-data/uk-2002-sym-weighted.adj ../query_input/UK2_queries.txt | ~5 hours |
TW | bash run_profilings.sh TW ../ae-data/twitter-2010.weighted.adj ../query_input/TW_queries.txt | ~12 hours |
FR | bash run_profilings.sh FR ../ae-data/friendster.sym.weighted.adj ../query_input/FR_queries.txt | >24 hours |
4. To run performance tests in Krill (under Krill-AE/apps
directory):
bash compile_benchmarks.sh
bash run_experiments.sh [name] [graph_path] [query_file] [output_path]
Note that [output_path] is the root directory of Glign (.../Glign-AE
).
Example commands for running different graphs are listed below:
Graph | Command | Estimated time |
---|---|---|
LJ | bash run_experiments.sh LJ ../../Glign-AE/ae-data/soc-LiveJournal1.weighted.adj ../../Glign-AE/query_input/LJ_queries.txt ../../Glign-AE | less than 2 hours |
WP | bash run_experiments.sh WP ../../Glign-AE/ae-data/wikipedia_link_en_weighted.adj ../../Glign-AE/query_input/WP_queries.txt ../../Glign-AE | ~4 hours |
UK2 | bash run_experiments.sh UK2 ../../Glign-AE/ae-data/uk-2002-sym-weighted.adj ../../Glign-AE/query_input/UK2_queries.txt ../../Glign-AE | ~6 hours |
TW | bash run_experiments.sh TW ../../Glign-AE/ae-data/twitter-2010.weighted.adj ../../Glign-AE/query_input/TW_queries.txt ../../Glign-AE | ~12 hours |
FR | bash run_experiments.sh FR ../../Glign-AE/ae-data/friendster.sym.weighted.adj ../../Glign-AE/query_input/FR_queries.txt ../../Glign-AE | >24 hours |
5. To run profiling experiments in Krill (under Krill-AE/apps
directory):
bash run_profilings.sh [name] [graph_path] [query_file] [output_path]
Example commands for running different graphs are listed below:
Graph | Command | Estimated time |
---|---|---|
LJ | bash run_profilings.sh LJ ../../Glign-AE/ae-data/soc-LiveJournal1.weighted.adj ../../Glign-AE/query_input/LJ_queries.txt ../../Glign-AE | less than 2 hours |
WP | bash run_profilings.sh WP ../../Glign-AE/ae-data/wikipedia_link_en_weighted.adj ../../Glign-AE/query_input/WP_queries.txt ../../Glign-AE | ~4 hours |
UK2 | bash run_profilings.sh UK2 ../../Glign-AE/ae-data/uk-2002-sym-weighted.adj ../../Glign-AE/query_input/query_input/UK2_queries.txt ../../Glign-AE | ~6 hours |
TW | bash run_profilings.sh TW ../../Glign-AE/ae-data/twitter-2010.weighted.adj ../../Glign-AE/query_input/TW_queries.txt ../../Glign-AE | ~12 hours |
FR | bash run_profilings.sh FR ../../Glign-AE/ae-data/friendster.sym.weighted.adj ../../Glign-AE/query_input/FR_queries.txt ../../Glign-AE | >24 hours |
6. To run performance tests in GraphM (under GraphM-AE
directory):
bash compile_benchmarks.sh
bash run_experiments.sh [name] [graph_path] [query_file] [output_path] [graph_size]
Note that [graph_size] is a required input parameter by GraphM. Use 828 for LJ, 5247 for WP, 17621 for TW, 21673 for FR, 3142 for UK2.
[output_path] is the root directory of Glign (../Glign-AE
).
Example commands for running different graphs are listed below:
Graph | Command | Estimated time |
---|---|---|
LJ | bash run_experiments.sh LJ ../Glign-AE/ae-data/graphm/LJ ../Glign-AE/query_input/LJ_queries.txt ../Glign-AE 828 | less than 2 hours |
WP | bash run_experiments.sh WP ../Glign-AE/ae-data/graphm/WP ../Glign-AE/query_input/WP_queries.txt ../Glign-AE 5247 | ~4 hours |
UK2 | bash run_experiments.sh UK2 ../Glign-AE/ae-data/graphm/UK2 ../Glign-AE/query_input/UK2_queries.txt ../Glign-AE 3142 | ~6 hours |
TW | bash run_experiments.sh TW ../Glign-AE/ae-data/graphm/TW ../Glign-AE/query_input/TW_queries.txt ../Glign-AE 17621 | >12 hours |
FR | bash run_experiments.sh FR ../Glign-AE/ae-data/graphm/FR ../Glign-AE/query_input/FR_queries.txt ../Glign-AE 21673 | >24 hours |
7. To run profiling experiments in GraphM (under GraphM-AE
directory):
bash run_profilings.sh [name] [graph_path] [query_file] [output_path] [graph_size]
Example commands for running different graphs are listed below:
Graph | Command | Estimated time |
---|---|---|
LJ | bash run_profilings.sh LJ ../Glign-AE/ae-data/graphm/LJ ../Glign-AE/query_input/LJ_queries.txt ../Glign-AE 828 | less than 2 hours |
WP | bash run_profilings.sh WP ../Glign-AE/ae-data/graphm/WP ../Glign-AE/query_input/WP_queries.txt ../Glign-AE 5247 | ~4 hours |
UK2 | bash run_profilings.sh UK2 ../Glign-AE/ae-data/graphm/UK2 ../Glign-AE/query_input/UK2_queries.txt ../Glign-AE 3142 | ~6 hours |
TW | bash run_profilings.sh TW ../Glign-AE/ae-data/graphm/TW ../Glign-AE/query_input/TW_queries.txt ../Glign-AE 17621 | >12 hours |
FR | bash run_profilings.sh FR ../Glign-AE/ae-data/graphm/FR ../Glign-AE/query_input/FR_queries.txt ../Glign-AE 21673 | >24 hours |
Under the .../Glign-AE/results
directory and run the following command to:
1. Generate data for Figure 11:
bash ./scripts/extract_figure_11.sh [graph_abbr]
[graph_abbr] is the name of graph, use LJ for LiveJournal.
graph_abbr | Command |
---|---|
LJ | bash ./scripts/extract_figure_11.sh LJ |
WP | bash ./scripts/extract_figure_11.sh WP |
UK2 | bash ./scripts/extract_figure_11.sh UK2 |
TW | bash ./scripts/extract_figure_11.sh TW |
FR | bash ./scripts/extract_figure_11.sh FR |
2. Generate data for Figure 12:
bash ./scripts/extract_figure_12.sh [graph_abbr]
3. Generate data for Figure 13:
bash ./scripts/extract_figure_13.sh [graph_abbr]
4. Generate data for Figure 14:
bash ./scripts/extract_figure_14.sh [graph_abbr]
5. Generate data for Figure 15:
bash ./scripts/extract_figure_15.sh [graph_abbr]
6. Generate data for Figure 16:
bash ./scripts/extract_figure_16.sh [graph_abbr]
7. Generate data for Table 8:
bash ./scripts/extract_table_8.sh [graph_abbr]
8. Generate data for Table 9:
bash ./scripts/extract_table_9.sh [graph_abbr]
The cache misses are collected by perf
tool, please make sure the CPU supports related hardware counters.
9. Generate data for Table 10:
bash ./scripts/extract_table_10.sh [graph_abbr]
10. Generate data for Table 12:
bash ./scripts/extract_table_12.sh [graph_abbr]
11. Generate data for Table 13:
../apps/SSSP_Batch -option ground-truth -batch 2 -max_combination 512 -qf ../query_input/LJ_queries.txt ../ae-data/soc-LiveJournal1.weighted.adj
The results table will be printed in the terminal after the program finishes (estimation: around 1 hour).
12. Data for Table 14:
Please check any of Glign's output file inside the results folder, there is a line Profiling cost:
in each file.
For example, in results/LJ/BFS_glign_batch_LJ_2.txt
line 7, the profiling cost is 0.34739 seconds.
13. Data for Table 15:
bash ./scripts/extract_table_15.sh [graph_abbr]
Graph | Command |
---|---|
RD-CA | bash ./scripts/extract_table_15.sh RDCA |
RD-US | bash ./scripts/extract_table_15.sh RDUS |
14. Data for Table 16:
bash ./scripts/extract_table_16.sh [graph_abbr]