diff --git a/docs/elements/balar/BalarInDepth.md b/docs/elements/balar/BalarInDepth.md
new file mode 100644
index 00000000..3653fb20
--- /dev/null
+++ b/docs/elements/balar/BalarInDepth.md
@@ -0,0 +1,127 @@
+---
+title: Balar In Depth
+---
+
+This doc provides some high-level views on various aspects of *balar*.
+
+## balar CUDA calls dispatch mechanism
+
+In *balar*, every CUDA API call and its return are represented by `SST::BalarComponent::BalarCudaCallPacket_t` and `SST::BalarComponent::BalarCudaCallReturnPacket_t`, respectively. These two structures contain the necessary arguments for CUDA function calls and their return values.
+
+Since *balar* is an [MMIO](https://sst-simulator.org/sst-docs/docs/elements/memHierarchy/stdmem#mmio) (memory mapped IO) device, it receives CUDA call packets via incoming writes to its mapped address. Specifically, it follows this dispatch sequence:
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant dmaEngine
+    participant balarMMIO
+    participant balarTestCPU
+    participant memory
+    balarTestCPU->>memory: Write CUDA API packet to scratch memory location
+    balarTestCPU->>balarMMIO: Write pointer to scratch memory location
+    balarMMIO->>dmaEngine: Issue a Read to retrieve the CUDA packet
+    dmaEngine->>balarMMIO: Return packet
+    balarMMIO->>balarMMIO: Call GPGPU-Sim functions
+    balarMMIO->>memory: Write CUDA return packet to pointer
+    balarMMIO->>balarTestCPU: Send response to the initial write
+```
+
+:::note
+
+*BalarTestCPU* writes the pointer to the CUDA packet into *balar*'s MMIO address range; *balar* then uses this pointer to copy the actual packet content into simulator memory space.
+
+:::
+
+:::note
+
+With direct-execution, `cudaMemcpy()` calls are handled slightly differently. Specifically, *balar* needs to copy data from the SST memory system for `cudaMemcpyHostToDevice` and copy data from simulator memory space into SST memory for `cudaMemcpyDeviceToHost`, both via *dmaEngine*.
+
+:::
+
+## Custom CUDA runtime library
+
+Located in `src/sst/elements/balar/tests/vanadisLLVMRISCV`, the custom runtime library `cuda_runtime_api_vanadis.cc` is linked with CUDA programs. For most CUDA APIs, it creates `SST::BalarComponent::BalarCudaCallPacket_t` packets and sends pointers to those packets to *balar*.
+
+For each CUDA call issued via `makeCudaCall()`, *balar*'s MMIO range is first mapped into *vanadis*'s virtual memory, surrounded by memory fencing ops. The actual `mmap` call is performed via inline assembly to avoid invalid accesses into *balar*'s MMIO address range due to OoO execution. The range is unmapped immediately after the pointer is written, for the same reason.
+
+- For blocking CUDA calls, the last CUDA API return status is polled via `readLastCudaStatus()` until the operation is completed.
+- Non-blocking CUDA calls return immediately.
+
+## Trace-driven mode component setup
+
+We provide a config script `src/sst/elements/balar/tests/testBalar-testcpu.py` to run with trace information. The configuration graph roughly looks like this:
+
+```mermaid
+flowchart TD
+    balarTestCPU
+    balarMMIO
+    dmaEngine
+    memory
+    router
+    balarTestCPU <--> router
+    balarMMIO <--mmio_iface--> router
+    dmaEngine <--mem_iface--> router
+    dmaEngine <--mmio_iface--> router
+    memory <--> router
+```
+
+*dmaEngine* has two memory interfaces: one for receiving commands (`mmio_iface`) and one for accessing data (`mem_iface`).
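+
+For reference, this trace-driven setup can be exercised with the same invocation shown on the [QuickStart](./QuickStart.md) page, run from balar's tests folder (the config file, vector add binary, and trace names below follow that example):
+
+```bash
+# From balar's tests folder
+cd SST_ELEMENT_SOURCE/src/sst/elements/balar/tests
+
+# Build the vector add test program and replay its CUDA API trace
+make -C vectorAdd
+sst testBalar-testcpu.py --model-options="-c gpu-v100-mem.cfg -x ./vectorAdd/vectorAdd -t cuda_calls.trace"
+```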
+
+## Direct-execution mode component setup
+
+For direct-execution with *vanadis*, the config script is at `src/sst/elements/balar/tests/testBalar-vanadis.py`, with the following configuration graph:
+
+```mermaid
+flowchart TD
+    vanadisCore
+    coreTLB
+    coreCache
+    vanadisOS
+    osMMU
+    balar
+    balarTLB
+    dmaEngine
+    memory
+    router
+    coreCacheBus{{coreCacheBus}}
+
+    subgraph " "
+    direction LR
+        subgraph VanadisCPU
+        direction TB
+            vanadisCore <--> coreTLB
+            coreTLB <--> coreCacheBus
+            coreCacheBus <--> coreCache
+        end
+
+        subgraph OS
+        direction TB
+            vanadisOS <--> osMMU
+        end
+        balarTLB <--> coreCacheBus
+        balarTLB <--MMU::m_nicTlbLink--> osMMU
+        vanadisCore <--> vanadisOS
+        coreTLB <--MMU::m_coreLinks--> osMMU
+        subgraph Balar
+        direction TB
+            dmaEngine <--> balarTLB
+            balar
+        end
+    end
+    balar <--mmio interface--> router
+    coreCache <--> router
+    dmaEngine <--mmio interface--> router
+    router <--> memory
+```
+
+:::note
+
+Some details are omitted for simplicity.
+
+:::
+
+:::note
+
+*balar* needs a TLB because *vanadis* works in virtual memory space. That part of the configuration script is based on the test example for [*rdmaNic*](../rdmaNic/intro.md).
+
+:::
diff --git a/docs/elements/balar/CompilingRISCVCUDA.md b/docs/elements/balar/CompilingRISCVCUDA.md
new file mode 100644
index 00000000..c0e233c0
--- /dev/null
+++ b/docs/elements/balar/CompilingRISCVCUDA.md
@@ -0,0 +1,84 @@
+---
+title: Compiling RISCV + CUDA
+---
+
+This page provides information on compiling a CUDA program from source code into a binary that can be run with *vanadis* and *balar*.
+
+:::info
+
+This documentation assumes you have followed the [QuickStart](./QuickStart.md#llvm--riscv-gnu-toolchain) guide on setting up LLVM and the RISCV GNU toolchain.
+
+:::
+
+Since no real GPU is used during the simulation, a custom CUDA runtime library is needed to intercept the CUDA API calls from CPU code and dispatch them to *balar* and GPGPU-Sim. You can find the source code for the custom CUDA runtime at `src/sst/elements/balar/tests/vanadisLLVMRISCV/cuda_runtime_api_vanadis.cc`.
+
+In the same folder, there is a Makefile that handles compiling the custom runtime and a simple vector add program.
+
+```Makefile title="Makefile"
+# Custom Vanadis CUDA lib
+CUSTOM_CUDA_LIB_SRC := cuda_runtime_api_vanadis.cc
+CUSTOM_CUDA_LIB := cudart_vanadis
+
+# Clang prefix
+CLANG_PREFIX := $(LLVM_INSTALL_PATH)/bin/
+
+# RISCV Toolchain path
+# Need to set RISCV_TOOLCHAIN_INSTALL_PATH env variable
+RISCV_PREFIX := $(RISCV_TOOLCHAIN_INSTALL_PATH)/bin/riscv64-unknown-linux-gnu-
+RISCV_SYSROOT := $(RISCV_TOOLCHAIN_INSTALL_PATH)/sysroot/
+RISCV_CXX := $(wildcard $(RISCV_TOOLCHAIN_INSTALL_PATH)/riscv64-unknown-linux-gnu/include/c++/*/)
+
+# Clang flags
+C_FLAGS_RISCV_LINUX := --target=riscv64-unknown-linux-gnu -march=rv64gc -static-libgcc --gcc-toolchain=$(RISCV_TOOLCHAIN_INSTALL_PATH) --sysroot=$(RISCV_SYSROOT)
+CXX_FLAGS_RISCV_LINUX := -static-libgcc -static-libstdc++ -static --gcc-toolchain=$(RISCV_TOOLCHAIN_INSTALL_PATH) --sysroot=$(RISCV_SYSROOT) -nostdlibinc -stdlib++-isystem$(RISCV_CXX)/ -stdlib++-isystem$(RISCV_SYSROOT)/usr/include -stdlib++-isystem$(RISCV_CXX)/riscv64-unknown-linux-gnu --target=riscv64-unknown-linux-gnu -march=rv64gc
+CXX_FLAGS_RISCV_LINUX_CLANG_ONLY := --gcc-toolchain=$(RISCV_TOOLCHAIN_INSTALL_PATH) --sysroot=$(RISCV_SYSROOT) -I$(RISCV_SYSROOT)/usr/include --target=riscv64 -march=rv64gc
+CXX_FLAGS_CUDA := --cuda-gpu-arch=$(GPU_ARCH) --cuda-path=$(CUDA_INSTALL_PATH) -L$(CUDA_INSTALL_PATH)/lib64 -lcudart -ldl -lrt -pthread
+CXX_FLAGS_RISCV_CUDA := -static --cuda-gpu-arch=$(GPU_ARCH) --cuda-path=$(CUDA_INSTALL_PATH) -L. -Wl,-static -l$(CUSTOM_CUDA_LIB) -L$(CUDA_INSTALL_PATH)/lib64 -Wl,-static -ldl -lrt -pthread
+CXX_FLAGS_CUDA_WRAPPER := -include __clang_cuda_runtime_wrapper.h
+
+all: helloworld vecadd
+
+test_llvm_env:
+ifeq ($(LLVM_INSTALL_PATH),)
+	$(error env LLVM_INSTALL_PATH is not set)
+endif
+
+test_vanadis_envs:
+ifeq ($(RISCV_TOOLCHAIN_INSTALL_PATH),)
+	$(error env RISCV_TOOLCHAIN_INSTALL_PATH is not set)
+endif
+ifeq ($(GPU_ARCH),)
+	$(error env GPU_ARCH is not set)
+endif
+
+# Sample helloworld program, pure CPU code
+# test if clang is working properly
+helloworld: helloworld.c test_llvm_env
+	$(CLANG_PREFIX)clang -static -mno-relax $(C_FLAGS_RISCV_LINUX) $< -o $@
+
+# Custom CUDA runtime target
+# Have to embed the CUDA version information for GPGPU-Sim
+vanadis_cuda: $(CUSTOM_CUDA_LIB_SRC) test_vanadis_envs
+	$(RISCV_PREFIX)gcc -c -static -I$(CUDA_INSTALL_PATH)/include -I. -DBALAR_CUDA_VERSION=\"libcudart_vanadis.a.$(shell echo $$CUDA_VERSION_NUMBER | cut -c 1-2)\" -fpic $<
+	$(RISCV_PREFIX)gcc -shared -o lib$(CUSTOM_CUDA_LIB).so $(CUSTOM_CUDA_LIB_SRC:.cc=.o)
+	$(RISCV_PREFIX)ar rcs lib$(CUSTOM_CUDA_LIB).a $(CUSTOM_CUDA_LIB_SRC:.cc=.o)
+
+# RISCV + CUDA binary
+vecadd: vecadd.cu test_vanadis_envs test_llvm_env vanadis_cuda
+	$(CLANG_PREFIX)clang++ $< -o $@ $(CXX_FLAGS_RISCV_CUDA) $(CXX_FLAGS_CUDA_WRAPPER) $(CXX_FLAGS_RISCV_LINUX)
+
+# x86 + CUDA binary
+vecadd_normal: vecadd.cu
+	$(CLANG_PREFIX)clang++ $< -o $@ $(CXX_FLAGS_CUDA_WRAPPER) $(CXX_FLAGS_CUDA)
+
+clean:
+	rm -f lib$(CUSTOM_CUDA_LIB).so lib$(CUSTOM_CUDA_LIB).a $(CUSTOM_CUDA_LIB_SRC:.cc=.o) vecadd helloworld
+```
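+
+As a rough usage sketch of this Makefile (assuming the toolchains from the [QuickStart](./QuickStart.md) page are installed; the paths below are placeholders for your own install locations):
+
+```bash
+# Toolchain locations and target GPU architecture expected by the Makefile
+export CUDA_INSTALL_PATH=/usr/local/cuda
+export LLVM_INSTALL_PATH=PATH_TO/llvm-install
+export RISCV_TOOLCHAIN_INSTALL_PATH=PATH_TO/riscv-gnu-install
+export GPU_ARCH=sm_70
+
+# Build the custom runtime, then the helloworld and vecadd test binaries
+cd SST_ELEMENTS_SRC/src/sst/elements/balar/tests/vanadisLLVMRISCV
+make vanadis_cuda
+make
+```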
+
+For more complicated CUDA programs, you can refer to the flags and CC setup in [gpu-app-collection's Makefile](https://github.com/accel-sim/gpu-app-collection/blob/7db54738af0aed3760f496f6c968ee5a40c0ee46/src/cuda/common/common.mk#L77-L106).
+
+:::note
+
+Vanadis currently only supports programs that are statically linked.
+
+:::
diff --git a/docs/elements/balar/QuickStart.md b/docs/elements/balar/QuickStart.md
new file mode 100644
index 00000000..640e17c6
--- /dev/null
+++ b/docs/elements/balar/QuickStart.md
@@ -0,0 +1,181 @@
+---
+title: QuickStart
+---
+
+This page provides instructions to set up balar and run the test examples.
+
+## Prerequisites
+
+### CUDA
+Please refer to [NVIDIA's website](https://developer.nvidia.com/cuda-downloads) for setting up CUDA.
+
+After CUDA is installed, you will need to set the `CUDA_INSTALL_PATH` environment variable:
+
+```bash
+# Assuming CUDA is installed at /usr/local/cuda
+export CUDA_INSTALL_PATH=/usr/local/cuda
+```
+
+### GPGPU-Sim
+First, install the prerequisites for GPGPU-Sim:
+
+```bash
+sudo apt-get install cmake build-essential xutils-dev bison zlib1g-dev flex libglu1-mesa-dev
+```
+
+Then build GPGPU-Sim:
+```bash
+git clone https://github.com/accel-sim/gpgpu-sim_distribution.git
+cd gpgpu-sim_distribution
+source setup_environment
+make -j4
+```
+
+### LLVM + RISCV GNU Toolchain
+
+If you wish to run CUDA binaries with [vanadis](../vanadis/intro.md) and balar, you will need to clone LLVM and the RISCV GNU toolchain to compile CUDA source code.
+
+```bash
+# Create installation dirs
+mkdir llvm-install
+mkdir riscv-gnu-install
+
+# Set up environment vars to LLVM and RISCV GCC installation folders
+export LLVM_INSTALL_PATH=$(pwd)/llvm-install
+export RISCV_TOOLCHAIN_INSTALL_PATH=$(pwd)/riscv-gnu-install
+
+# Build LLVM with RISC-V, x86, and CUDA support from source
+# x86 is included for testing purposes; you can remove it if
+# you will only run the CUDA binary with SST
+git clone https://github.com/llvm/llvm-project.git
+
+cd llvm-project
+mkdir build && cd build
+cmake -DLLVM_TARGETS_TO_BUILD="RISCV;X86;NVPTX" -DLLVM_DEFAULT_TARGET_TRIPLE=riscv64-unknown-linux-gnu \
+      -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lld" -DCMAKE_INSTALL_PREFIX=$LLVM_INSTALL_PATH ../llvm
+cmake --build . -j8
+cmake --build . --target install
+cd ..
+
+# Build RISC-V GCC toolchain
+git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git
+
+cd riscv-gnu-toolchain
+./configure --prefix=$RISCV_TOOLCHAIN_INSTALL_PATH
+make linux -j8
+cd ..
+
+# Match with the GPU config file we have (V100)
+export GPU_ARCH=sm_70
+```
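+
+As an optional sanity check, both toolchains should now be runnable from their install prefixes, which is where the balar test Makefile expects to find them:
+
+```bash
+# Print the versions of the freshly built toolchains
+$LLVM_INSTALL_PATH/bin/clang --version
+$RISCV_TOOLCHAIN_INSTALL_PATH/bin/riscv64-unknown-linux-gnu-gcc --version
+```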
+
+### GPU App Collection
+
+To run balar's unit tests, you will need to clone the [GPU app collection repo](https://github.com/accel-sim/gpu-app-collection/tree/sst_support). The unit test script will handle compiling these kernels with the custom CUDA runtime lib.
+
+```bash
+git clone git@github.com:accel-sim/gpu-app-collection.git
+cd gpu-app-collection
+git checkout sst_support
+
+# Set up environ vars for the apps; requires the
+# LLVM_INSTALL_PATH and RISCV_TOOLCHAIN_INSTALL_PATH env vars
+# If you plan to compile the apps directly, you will
+# also need to set SST_CUSTOM_CUDA_LIB_PATH to
+# the directory of the custom CUDA library,
+# which normally will be `SST_ELEMENTS_SRC/src/sst/elements/balar/tests/vanadisLLVMRISCV`
+source ./src/setup_environment sst
+```
+
+:::note Manual Compilation
+If you want to compile the Rodinia benchmarks manually or want to test out other kernels in the gpu-app-collection repo, you will need to set the `SST_CUSTOM_CUDA_LIB_PATH` env var and compile the custom CUDA runtime first.
+
+```bash
+# Set SST_CUSTOM_CUDA_LIB_PATH
+export SST_CUSTOM_CUDA_LIB_PATH=SST_ELEMENTS_SRC/src/sst/elements/balar/tests/vanadisLLVMRISCV
+
+# Build custom CUDA runtime
+cd SST_ELEMENTS_SRC/src/sst/elements/balar/tests/vanadisLLVMRISCV
+make
+
+# Compile Rodinia 2.0 and pull data
+cd PATH_TO/gpu-app-collection
+make rodinia_2.0-ft -i -j4 -C ./src
+make data -C ./src
+
+# The compiled binaries will be located at
+# PATH_TO/gpu-app-collection/bin/CUDA_VERSION_NUM/release
+```
+
+:::
+
+## Compilation
+
+There are some subtle details that need to be taken care of for *sst-core* and *sst-elements*:
+
+```bash
+# For sst-core, you will need to disable MPI and mempools
+cd PATH_TO/SST_CORE_SOURCE/
+./configure --prefix=$SST_CORE_HOME --disable-mpi --disable-mem-pools
+make -j4
+make install
+
+# For sst-elements, you will need to specify the CUDA and GPGPU-Sim paths
+# GPGPUSIM_ROOT will be set by sourcing GPGPU-Sim's setup_environment script
+cd PATH_TO/SST_ELEMENTS_SOURCE/
+./configure --prefix=$SST_ELEMENTS_HOME --with-sst-core=$SST_CORE_HOME --with-cuda=$CUDA_INSTALL_PATH --with-gpgpusim=$GPGPUSIM_ROOT
+make -j4
+make install
+```
+
+After configuring *sst-elements*, you should see command line output stating that balar will be built. If not, check that you have CUDA and GPGPU-Sim installed and compiled properly.
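+
+If you want to double-check the result, one option (assuming your sst-core install is at `$SST_CORE_HOME`) is to query the element library with `sst-info`:
+
+```bash
+# Should list balar's registered components if the build succeeded
+$SST_CORE_HOME/bin/sst-info balar
+```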
+
+## Testing
+
+Balar divides its testcases into three testsuites based on time cost:
+
+- *simple*: takes about 10 mins to complete
+- *medium*: takes about 1 hr to complete
+- *long*: takes 1~2 hrs to complete
+
+All of them can be run in parallel with the `-c NUM_CORES` flag.
+
+```bash
+# Run simple tests sequentially
+$SST_CORE_HOME/bin/sst-test-elements -w "*balar*simple*"
+
+# Run medium testcases with 2 processes
+$SST_CORE_HOME/bin/sst-test-elements -c 2 -w "*balar*medium*"
+
+# Run long tests with 4 processes
+$SST_CORE_HOME/bin/sst-test-elements -c 4 -w "*balar*long*"
+
+# Run all tests with 8 processes
+$SST_CORE_HOME/bin/sst-test-elements -c 8 -w "*balar*"
+```
+
+When running each testsuite, it will first compile the custom CUDA library at `SST_ELEMENT_SOURCE/src/sst/elements/balar/tests/vanadisLLVMRISCV/` and link it with the Rodinia 2.0 kernels in [gpu-app-collection](https://github.com/accel-sim/gpu-app-collection/tree/sst_support).
+
+## Running examples
+
+```bash
+# cd into balar's tests folder
+cd SST_ELEMENT_SOURCE/src/sst/elements/balar/tests
+
+# With testcpu
+make -C vectorAdd
+sst testBalar-testcpu.py --model-options="-c gpu-v100-mem.cfg -x ./vectorAdd/vectorAdd -t cuda_calls.trace"
+
+# With vanadis
+# Run helloworld example, pure CPU code, no CUDA calls
+make -C vanadisLLVMRISCV
+vanadis_EXE=./vanadisLLVMRISCV/helloworld \
+vanadis_ISA=RISCV64 \
+sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
+
+# Run a simple integer vector add example
+vanadis_EXE=./vanadisLLVMRISCV/vecadd \
+vanadis_ISA=RISCV64 \
+BALAR_CUDA_EXE_PATH=./vanadisLLVMRISCV/vecadd \
+sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
+```
diff --git a/docs/elements/balar/TracingCUDAProgram.md b/docs/elements/balar/TracingCUDAProgram.md
new file mode 100644
index 00000000..0726e5fb
--- /dev/null
+++ b/docs/elements/balar/TracingCUDAProgram.md
@@ -0,0 +1,53 @@
+---
+title: Tracing CUDA Program
+---
+
+This page provides steps to use the tracer tool to generate CUDA API traces for running *balar* with *BalarTestCPU*.
+
+:::warning
+
+We are working on providing a more robust version of the tracer with the new NVBit release, including a better trace format and better computation validation.
+
+:::
+
+In order to run *balar* in trace-driven mode, you will need to supply the *BalarTestCPU* component with a trace file and the associated GPU memory copy data dumps. We have created an instruction tracer tool based on [NVBit](https://github.com/NVlabs/NVBit) to generate those. You can find the relevant code inside the [Accel-Sim](https://github.com/accel-sim/accel-sim-framework) framework: `ACCEL_SIM_SRC/util/tracer_nvbit/others/cuda_api_tracer_tool`.
+
+:::note
+
+To set up the tool and generate traces, you will need a machine with an NVIDIA GPU installed. Also, [NVBit](https://github.com/NVlabs/NVBit) has some requirements for both GPU hardware and software versions. Refer to its README for more info.
+
+:::
+
+To pull and compile the tracer tool:
+
+```bash
+# Get the Accel-Sim framework
+git clone git@github.com:accel-sim/accel-sim-framework.git
+
+# cd into tracer tool folder
+cd accel-sim-framework/util/tracer_nvbit
+
+# Install nvbit
+./install_nvbit.sh
+
+# Compile tracer tool
+# Which will generate a 'cuda_api_tracer_tool.so' file at
+# './others/cuda_api_tracer_tool/cuda_api_tracer'
+make -C ./others/cuda_api_tracer_tool
+```
+
+Then, in order to dump traces, put the path to the tracer tool shared object in `LD_PRELOAD`:
+
+```bash
+LD_PRELOAD=PATH_TO/cuda_api_tracer_tool.so CUDA_PROG
+```
+
+This will generate the following files when the traced program exits:
+
+- `cuda_calls.trace`: the API trace file tracking
+  - `cudaMemcpy`
+  - `cudaMalloc`
+  - cuda kernel launches
+  - `cudaFree`
+- `cuMemcpyD2H-X-X.data`: cuda memcpy device to host data payload
+- `cuMemcpyH2D-X-X.data`: cuda memcpy host to device data payload
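+
+For example, a minimal end-to-end flow might look like the following (the program name here is a placeholder for whatever natively compiled CUDA binary you want to trace; it has to actually run on the local GPU):
+
+```bash
+# Run the program once under the tracer
+LD_PRELOAD=PATH_TO/cuda_api_tracer_tool.so ./my_cuda_prog
+
+# The trace and memcpy data dumps are written to the working directory
+ls cuda_calls.trace cuMemcpy*.data
+
+# cuda_calls.trace is what testBalar-testcpu.py consumes through its -t option
+# (see the QuickStart page for the full sst invocation)
+```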
diff --git a/docs/elements/balar/intro.md b/docs/elements/balar/intro.md
index 3574cf07..1784cbf8 100644
--- a/docs/elements/balar/intro.md
+++ b/docs/elements/balar/intro.md
@@ -2,9 +2,16 @@
 title: balar
 ---
 
-The *BalarTestCPU* component is a trace-based test CPU that is included inside `balarMMIO` to run simulations with CUDA API call traces and data collected from a real GPU. It works by consuming a trace file and associated CUDA memory copy data files. The cudaMemcpyH2D data payload is collected for program correctness. The cudaMemcpyD2H data is collected to validate computation.
+The *balar* library provides an interface between SST and [GPGPU-Sim](https://github.com/accel-sim/gpgpu-sim_distribution), a cycle-level simulator modeling contemporary graphics processing units (GPUs) running GPU computing workloads written in CUDA. It supports two execution modes: trace-driven and direct-execution.
 
-*BalarMMIO* is responsible for relaying CUDA API requests from SST to GPGPU-Sim. Currently it supports running with CUDA traces without a real CPU model (with BalarTestCPU) or with a [Vanadis](../vanadis/intro) core (under active development with limited support for MIPS32 and no support for RV64).
+- Trace-driven: balar is driven by a test CPU that consumes CUDA API traces and launches CUDA calls.
+- Direct-execution: a CPU model executes a CUDA binary linked with a custom `libcudart` and dispatches CUDA API calls to balar and GPGPU-Sim.
+
+The *BalarTestCPU* component is a trace-based test CPU that is included inside the balar folder (`./testcpu/`) to run simulations with CUDA API call traces and data collected from a real GPU. It works by consuming a trace file and associated CUDA memory copy data files. The `cudaMemcpyH2D` data payload is collected for program correctness. The `cudaMemcpyD2H` data is collected to validate computation.
+
+*BalarMMIO* is responsible for relaying CUDA API requests from SST to GPGPU-Sim. Currently it supports running with CUDA traces without a real CPU model (with BalarTestCPU) or with a [Vanadis](../vanadis/intro) core running a RISCV + CUDA binary with a custom CUDA runtime (`libcudart_vanadis` inside `./tests/vanadisLLVMRISCV/`). This mode has been tested with a subset of Rodinia 2.0 benchmark kernels in the unit tests.
+
+The *dmaEngine* component performs memory data transfers between SST cache memory space and simulator memory space. It is required because balar reads/writes CPU data (e.g. for `cudaMemcpy()` with vanadis) and places it into GPGPU-Sim's memory space for functional simulation. In addition, dmaEngine is also used to read CUDA dispatch packets and write return values for the custom CUDA runtime.
 
 :::note At a Glance
@@ -16,13 +23,32 @@ The *BalarTestCPU* component is a trace-based test CPU that is included inside `
 
 :::
 
-### Required dependencies
-* **CUDA** Version 10.1 is recommended
-* **GPGPUSim** A link to the GPGPUSim version can be found in balar's [README](https://github.com/sstsimulator/sst-elements/tree/master/src/sst/elements/balar/README.md).
+:::warning
 
-### Optional dependencies
-*None*
+Support for trace-driven mode is currently limited, as it was used as an early-stage validation for the balar implementation. It has only been tested with a simple integer vector add example.
 
-## Additional documentation
-A detailed [README](https://github.com/sstsimulator/sst-elements/tree/master/src/sst/elements/balar/README.md) on the balar element can be found in the balar source code.
+We are working on providing a more robust version of this with the new NVBit release, including a better trace format and better computation validation.
+:::
+
+### Required dependencies
+* **CUDA** Version 11.0+ is recommended
+* **GPGPUSim** Use the *dev* branch from [accel-sim/gpgpu-sim_distribution](https://github.com/accel-sim/gpgpu-sim_distribution)
+
+### Optional dependencies
+* [**LLVM**](https://github.com/llvm/llvm-project) For compiling RISCV + CUDA binaries
+* [**RISCV GNU Toolchain**](https://github.com/riscv-collab/riscv-gnu-toolchain) For compiling RISCV + CUDA binaries
+* [**gpu-app-collection**](https://github.com/accel-sim/gpu-app-collection/tree/sst_support) For running unit tests with Rodinia 2.0 kernels
+* **Test Docker image** You can also use the prebuilt Docker image with all dependencies set up except for GPGPU-Sim
+  * OS: Ubuntu 22.04
+  * CUDA: 11.7
+  * LLVM: 18.1.8
+  * RISCV: 2024.08.06-nightly
+  * ```bash
+    # Pull prebuilt image
+    docker pull tgrogers/accel-sim_regress:SST-Integration-Ubuntu-22.04-cuda-11.7-llvm-18.1.8-riscv-gnu-2024.08.06-nightly
+    ```
+
+import DocCardList from '@theme/DocCardList';
+
+<DocCardList />
diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index baf3cb54..9a2e829b 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -131,7 +131,14 @@ module.exports= async function createConfigAsync() {
         "apiKey": "21b099cfb417d9b97f19045cc7c20cfe",
         "indexName": "sst-simulator",
         "placeholder": "Search Docs (not working yet)"
-      }
-    }
+      },
+      prism: {
+        additionalLanguages: ['bash', 'makefile'],
+      },
+    },
+    markdown: {
+      mermaid: true,
+    },
+    themes: ['@docusaurus/theme-mermaid'],
   };
 };
diff --git a/website/package.json b/website/package.json
index 155cfc62..21a6cd82 100644
--- a/website/package.json
+++ b/website/package.json
@@ -15,13 +15,14 @@
     "@docusaurus/core": "3.3.2",
     "@docusaurus/plugin-client-redirects": "3.3.2",
     "@docusaurus/preset-classic": "3.3.2",
+    "@docusaurus/theme-mermaid": "^3.3.2",
     "@mdx-js/react": "^3.0.0",
     "clsx": "^1.2.1",
     "katex": "^0.16.9",
     "prism-react-renderer": "^2.1.0",
     "react": "^18.2.0",
     "react-dom": "^18.2.0",
-    "rehype-katex":"^7.0.0",
+    "rehype-katex": "^7.0.0",
     "remark-math": "^6.0.0"
   },
   "devDependencies": {
diff --git a/website/sidebars.js b/website/sidebars.js
index 80f6e5ae..0583bb61 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -640,7 +640,18 @@ const sidebars = {
   elements: [
     'elements/intro',
     "elements/ariel/intro",
-    "elements/balar/intro",
+    /* Balar */
+    {
+      type: 'category',
+      label: 'balar',
+      link: {type: 'doc', id: 'elements/balar/intro'},
+      items: [
+        "elements/balar/QuickStart",
+        "elements/balar/TracingCUDAProgram",
+        "elements/balar/CompilingRISCVCUDA",
+        "elements/balar/BalarInDepth",
+      ]
+    },
     "elements/cacheTracer/intro",
     "elements/cassini/intro",
     "elements/cramsim/intro",