AccelGraph-CAPI is an open source graph processing framework. It is designed as a modular benchmarking suite for graph processing algorithms. It provides an end to end evaluation infrastructure which includes the preprocessing stage of forming the graph structure and the graph algorithm. The OpenMP part of AccelGraph-CAPI has been developed on Ubuntu 18.04, with PowerPC/Intel architecture taken into account. AccelGraph-CAPI is coded using C giving the researcher full flexibility with modifying data structures and other algorithmic optimizations. Furthermore, this benchmarking suite has been fully integrated with IBM Coherent Accelerator Processor Interface (CAPI), demonstrating the contrast in performance between Shared Memory Accelerators and Parallel Processors.
- Judy Arrays
AccelGraph@CAPI:~$ sudo apt-get install libjudy-dev
- OpenMP is already a feature of the compiler, so this step is not necessary.
AccelGraph@CAPI:~$ sudo apt-get install libomp-dev
- Simulation and Synthesis
- This framework was developed on Ubuntu 18.04 LTS.
- ModelSim is used for simulation and installed along side Quartus II 18.1.
- Synthesis requires ALTERA Quartus, starting from release 15.0 of Quartus II should be fine.
- Nallatech P385-A7 card with the Altera Stratix-V-GX-A7 FPGA is supported.
- Environment Variable setup,
HOME
andALTERAPATH
depend on where you clone the repository and install ModelSim.
#quartus 18.1 env-variables
export ALTERAPATH="${HOME}/intelFPGA/18.1"
export QUARTUS_INSTALL_DIR="${ALTERAPATH}/quartus"
export LM_LICENSE_FILE="${ALTERAPATH}/licenses/psl_A000_license.dat:${ALTERAPATH}/licenses/common_license.dat"
export QSYS_ROOTDIR="${ALTERAPATH}/quartus/sopc_builder/bin"
export PATH=$PATH:${ALTERAPATH}/quartus/bin
export PATH=$PATH:${ALTERAPATH}/nios2eds/bin
#modelsim env-variables
export PATH=$PATH:${ALTERAPATH}/modelsim_ase/bin
#AccelGraph project folder
export PSLSE_ROOT="AccelGraph/01_capi_precis"
#CAPI framework env variables
export PSLSE_INSTALL_DIR="${HOME}/Documents/github_repos/${PSLSE_ROOT}/01_capi_integration/pslse"
export VPI_USER_H_DIR="${ALTERAPATH}/modelsim_ase/include"
export PSLVER=8
export BIT32=n
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PSLSE_INSTALL_DIR/libcxl:$PSLSE_INSTALL_DIR/afu_driver/src"
#PSLSE env variables
export PSLSE_SERVER_DIR="${HOME}/Documents/github_repos/${PSLSE_ROOT}/01_capi_integration/accelerator_sim/server"
export PSLSE_SERVER_DAT="${PSLSE_SERVER_DIR}/pslse_server.dat"
export SHIM_HOST_DAT="${PSLSE_SERVER_DIR}/shim_host.dat"
export PSLSE_PARMS="${PSLSE_SERVER_DIR}/pslse.parms"
export DEBUG_LOG_PATH="${PSLSE_SERVER_DIR}/debug.log"
- AFU Communication with PSL
- please check (OpenGraph).
- please check (CAPIPrecis).
- please check (CAPI User's Manual).
- Clone AccelGraph-CAPI.
AccelGraph@CAPI:~$ git https://github.com/atmughrabi/AccelGraph.git
- From the home directory go to the AccelGraph directory:
AccelGraph@CAPI:~$ cd AccelGraph/
- Setup the CAPI submodules.
AccelGraph@CAPI:~AccelGraph$ git submodule update --init --recursive
- The default compilation is
openmp
mode:
AccelGraph@CAPI:~AccelGraph$ make
- From the root directory you can modify the Makefile with the (parameters) you need for OpenMP:
AccelGraph@CAPI:~AccelGraph$ make run
- OR
AccelGraph@CAPI:~AccelGraph$ make run-openmp
- NOTE: You need CAPI environment setup on your machine (tested on Power8 8247-22L).
- CAPI Education Videos
- We are not supporting CAPI-SNAP since our processing suite supports accelerator-cache. SNAP does not support this feature yet. So if you are interested in streaming applications or do not benefit from caches SNAP is also good candidate.
- To check the SNAP framework: https://github.com/open-power/snap.
- NOTE: You need three open terminals, for running vsim, pslse, and the application.
- On terminal 1. Run [ModelSim vsim] for
simulation
this step is not needed when running on real hardware, this just simulates the AFU that resides on your (CAPI supported) FPGA :
AccelGraph@CAPI:~AccelGraph$ make run-vsim
- The previous step will execute vsim.tcl script to compile the design, to start the running the simulation just execute the following command at the transcript terminal of ModelSim :
r #recompile design
,c #run simulation
ModelSim> rc
For simulation with floating point IPs (Altera)
ModelSim> rcf
- On Terminal 2. Run PSL Simulation Engine (PSLSE) for
simulation
this step is not needed when running on real hardware, this just emulates the PSL that resides on your (CAPI supported) IBM-PowerPC machine :
AccelGraph@CAPI:~AccelGraph$ make run-pslse
- On Terminal 3. Run the algorithm that communicates with the PSLSE (simulation):
AccelGraph@CAPI:~AccelGraph$ make run-capi-sim
- On Terminal 3. Run the algorithm that communicates with the PSLSE (simulation) printing out stats based on the responses received to the AFU-Control layer:
AccelGraph@CAPI:~AccelGraph$ make run-capi-sim-verbose
- Example output: please check (CAPI User's Manual), for each response explanation. The stats are labeled
RESPONSE_COMMANADTYPE_count
.
*-----------------------------------------------------*
| AFU Stats |
-----------------------------------------------------
| CYCLE_count : #Cycles |
*-----------------------------------------------------*
| Responses Stats |
-----------------------------------------------------
| DONE_count : (#) Commands successful |
-----------------------------------------------------
| DONE_READ_count : (#) Reads successful |
| DONE_WRITE_count : (#) Writes successful |
-----------------------------------------------------
| DONE_RESTART_count : (#) Bus Restart |
-----------------------------------------------------
| DONE_PREFETCH_READ_count : (#) Read Prefetches |
| DONE_PREFETCH_WRITE_count: (#) Write Prefetches |
-----------------------------------------------------
| PAGED_count : 0 |
| FLUSHED_count : 0 |
| AERROR_count : 0 |
| DERROR_count : 0 |
| FAILED_count : 0 |
| NRES_count : 0 |
| NLOCK_count : 0 |
*-----------------------------------------------------*
These steps require ALTERA Quartus synthesis tool, starting from release 15.0 of Quartus II should be fine.
- From the root directory (using terminal)
AccelGraph@CAPI:~AccelGraph$ make run-capi-synth
- Check AccelGraph.sta.rpt for timing requirements violations
- From the root directory (using terminal)
AccelGraph@CAPI:~AccelGraph$ make run-capi-gui
- Synthesize using Quartus GUI
- From the root directory go to CAPI integration directory -> AccelGraph synthesis folder
AccelGraph@CAPI:~AccelGraph$ cd 03_capi_integration/accelerator_synth/
- invoke synthesis from terminal
AccelGraph@CAPI:~AccelGraph/03_capi_integration/accelerator_synth$ make
- From the root directory go to CAPI integration directory -> AccelGraph synthesis folder
AccelGraph@CAPI:~AccelGraph$ cd 03_capi_integration/accelerator_synth/
- invoke synthesis from terminal
AccelGraph@CAPI:~AccelGraph/03_capi_integration/accelerator_synth$ make gui
- From the root directory go to CAPI integration directory -> AccelGraph binary images:
AccelGraph@CAPI:~AccelGraph$ cd 03_capi_integration/accelerator_bin/
- Flash the image to the corresponding
#define DEVICE
you can modify it according to your Power8 system from00_bench/include/capi_utils/capienv.h
AccelGraph@CAPI:~AccelGraph/03_capi_integration/accelerator_bin$ sudo capi-flash-script accel-graph_GITCOMMIT#_DATETIME.rbf
- Runs algorithm that communicates with the or PSL (real HW):
AccelGraph@CAPI:~AccelGraph$ make run-capi-fpga
This run outputs different AFU-Control stats based on the responses received from the PSL
- Runs algorithm that communicates with the or PSL (real HW):
AccelGraph@CAPI:~AccelGraph$ make run-capi-fpga-verbose
-m, --afu-config=[DEFAULT:0x1]
CAPI FPGA integration: AFU-Control
buffers(read/write/prefetcher) arbitration 0x01
round robin 0x10 fixed priority.
-q, --cu-config=[DEFAULT:0x01]
CAPI FPGA integration: CU configurations for
requests cached/non cached/prefetcher active or
not check README for more explanation.
Usage: open-graph-openmp [OPTION...]
-f <graph file> -d [data structure] -a [algorithm] -r [root] -n
[num threads] [-h -c -s -w]
OpenGraph is an open source graph processing framework, it is designed to be a
benchmarking suite for various graph processing algorithms using pure C.
-a, --algorithm=[DEFAULT:[0]-BFS]
[0]-BFS,
[1]-Page-rank,
[2]-SSSP-DeltaStepping,
[3]-SSSP-BellmanFord,
[4]-DFS,
[5]-SPMV,
[6]-Connected-Components,
[7]-Betweenness-Centrality,
[8]-Triangle Counting,
[9-BUGGY]-IncrementalAggregation.
-b, --delta=[DEFAULT:1]
SSSP Delta value [Default:1].
-c, --convert-format=[DEFAULT:[1]-binary-edgeList]
[serialize flag must be on --serialize to write]
Serialize graph text format (edge list format) to
binary graph file on load example:-f <graph file>
-c this is specifically useful if you have Graph
CSR/Grid structure and want to save in a binary
file format to skip the preprocessing step for
future runs.
[0]-text-edgeList,
[1]-binary-edgeList,
[2]-graphCSR-binary.
-C, --cache-size=<LLC>
LLC cache size for MASK vertex reodering
-d, --data-structure=[DEFAULT:[0]-CSR]
[0]-CSR,
[1]-Grid,
[2]-Adj LinkedList,
[3]-Adj ArrayList
[4-5] same order bitmap frontiers.
-e, --tolerance=[EPSILON:0.0001]
Tolerance value of for page rank
[default:0.0001].
-f, --graph-file=<FILE>
Edge list represents the graph binary format to
run the algorithm textual format change
graph-file-format.
-F, --labels-file=<FILE>
Read and reorder vertex labels from a text file,
Specify the file name for the new graph reorder,
generated from Gorder, Rabbit-order, etc.
-g, --bin-size=[SIZE:512]
You bin vertices's histogram according to this
parameter, if you have a large graph you want to
illustrate.
-i, --num-iterations=[DEFAULT:20]
Number of iterations for page rank to converge
[default:20] SSSP-BellmanFord [default:V-1].
-j, --verbosity=[DEFAULT:[0:no stats output]
For now it controls the output of .perf file and
PageRank .stats (needs --stats enabled)
filesPageRank .stat [1:top-k results] [2:top-k
results and top-k ranked vertices listed.
-k, --remove-duplicate
Removers duplicate edges and self loops from the
graph.
-K, --Kernel-num-threads=[DEFAULT:algo-num-threads]
Number of threads for graph processing kernel
(critical-path) (graph algorithm)
-l, --light-reorder-l1=[DEFAULT:[0]-no-reordering]
Relabels the graph for better cache performance
(first layer).
[0]-no-reordering,
[1]-out-degree,
[2]-in-degree,
[3]-(in+out)-degree,
[4]-DBG-out,
[5]-DBG-in,
[6]-HUBSort-out,
[7]-HUBSort-in,
[8]-HUBCluster-out,
[9]-HUBCluster-in,
[10]-(random)-degree,
[11]-LoadFromFile (used for Rabbit order).
-L, --light-reorder-l2=[DEFAULT:[0]-no-reordering]
Relabels the graph for better cache performance
(second layer).
[0]-no-reordering,
[1]-out-degree,
[2]-in-degree,
[3]-(in+out)-degree,
[4]-DBG-out,
[5]-DBG-in,
[6]-HUBSort-out,
[7]-HUBSort-in,
[8]-HUBCluster-out,
[9]-HUBCluster-in,
[10]-(random)-degree,
[11]-LoadFromFile (used for Rabbit order).
-O, --light-reorder-l3=[DEFAULT:[0]-no-reordering]
Relabels the graph for better cache performance
(third layer).
[0]-no-reordering,
[1]-out-degree,
[2]-in-degree,
[3]-(in+out)-degree,
[4]-DBG-out,
[5]-DBG-in,
[6]-HUBSort-out,
[7]-HUBSort-in,
[8]-HUBCluster-out,
[9]-HUBCluster-in,
[10]-(random)-degree,
[11]-LoadFromFile (used for Rabbit order).
-M, --mask-mode=[DEFAULT:[0:disabled]]
Encodes [0:disabled] the last two bits of
[1:out-degree]-Edgelist-labels
[2:in-degree]-Edgelist-labels or
[3:out-degree]-vertex-property-data
[4:in-degree]-vertex-property-data with hot/cold
hints [11:HOT]|[10:WARM]|[01:LUKEWARM]|[00:COLD]
to specialize caching. The algorithm needs to
support value unmask to work.
-n, --pre-num-threads=[DEFAULT:MAX]
Number of threads for preprocessing (graph
structure) step
-N, --algo-num-threads=[DEFAULT:MAX]
Number of threads for graph processing (graph
algorithm)
-o, --sort=[DEFAULT:[0]-radix-src]
[0]-radix-src,
[1]-radix-src-dest,
[2]-count-src,
[3]-count-src-dst.
-p, --direction=[DEFAULT:[0]-PULL]
[0]-PULL,
[1]-PUSH,
[2]-HYBRID.
NOTE: Please consult the function switch table for each
algorithm.
-r, --root=[DEFAULT:0]
BFS, DFS, SSSP root
-s, --symmetrize
Symmetric graph, create a set of incoming edges.
-S, --stats
Write algorithm stats to file. same directory as
the graph.PageRank: Dumps top-k ranks matching
using QPR similarity metrics.
-t, --num-trials=[DEFAULT:[1 Trial]]
Number of trials for whole run (graph algorithm
run) [default:1].
-w, --generate-weights
Load or Generate weights. Check ->graphConfig.h
#define WEIGHTED 1 beforehand then recompile using
this option.
-x, --serialize
Enable file conversion/serialization use with
--convert-format.
-z, --graph-file-format=[DEFAULT:[1]-binary-edgeList]
Specify file format to be read, is it textual edge
list, or a binary file edge list. This is
specifically useful if you have Graph CSR/Grid
structure already saved in a binary file format to
skip the preprocessing step.
[0]-text edgeList,
[1]-binary edgeList,
[2]-graphCSR binary.
-?, --help Give this help list
--usage Give a short usage message
-V, --version Print program version
-
00_graph_bench
include
- Major function headersalgorithms
- supported Graph algorithmscapi
- capi integrationBFS.h
- Breadth First SearchDFS.h
- Depth First SearchSSSP.h
- Single Source Shortest PathbellmanFord.h
- Single Source Shortest Path using Bellman FordincrementalAgreggation.h
- Incremental Aggregation for clusteringpageRank.h
- Page Rank AlgorithmSPMV.h
- Sparse Matrix Vector Multiplication
src
- Major function Source filesalgorithms
- supported Graph algorithmscapi
- CAPI integrationBFS.c
- Breadth First SearchDFS.c
- Depth First SearchSSSP.c
- Single Source Shortest PathbellmanFord.c
- Single Source Shortest Path using Bellman FordincrementalAgreggation.c
- Incremental Aggregation for clusteringpageRank.c
- Page Rank AlgorithmSPMV.c
- Sparse Matrix Vector Multiplication
-
Makefile
- Global makefile
- Finish integration with CAPI Simulation
- Finish integration with CAPI Cache
- Finish Synthesis with CAPI (Meets time requirements)
- Finish graph algorithms suite CAPI
- BFS (Breadth First Search)
- PR (Page-Rank)
- DFS (Depth First Search)
(work in progress)
- IA (Incremental Aggregation) (Needs Atomic Operation -> CAPI v2.0)
- SSSP (BellmanFord) (Needs Atomic Operation -> CAPI v2.0)
- SSSP (Dijkstra) (Needs Atomic Operation -> CAPI v2.0)
- CC (Connected Components)
- TC (Triangle Counting)
(work in progress)
- SPMV (Sparse Matrix-vector Multiplication)
- BC (Betweenness Centrality)
(work in progress)
- Support testing
Report bugs to atmughra@ncsu.edu