aurora-runtime

This package is an attempt to reproduce NVIDIA's CUDA Runtime API [1], i.e. to let the user write device kernels and launch them in a quasi-grid structure on NEC's SX-Aurora TSUBASA vector engine.

To that end, we wrap NEC's VE Offload [2] and UDMA [3] APIs such that their usage mimics CUDA's runtime API.

Installation and Example

Installation is straightforward. The target system needs the following dependencies:

python (>= 3.5)
cmake (>= 3.10)
a reasonably recent gcc/g++ (e.g. from scl devtoolset-8)
NEC Aurora SDK (ncc, libs), installed under /opt/nec
LLVM-VE (llvm/clang) from https://sx-aurora.com/repos/veos/ef_extra, installed under /opt/nec

To install:

Clone this repository:

$ git clone https://github.com/dthuerck/aurora_runtime.git

Download and build dependencies:

$ cd aurora_runtime
$ chmod +x init.sh
$ ./init.sh

That's it! Now we can build an example application featuring GEMA (256x256 batched matrix addition) and GEMM (256x256 batched matrix multiplication):

$ mkdir build && cd build
$ cmake ..
$ make

Finally, run the example with ./app-test and watch your Aurora hard at work!
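
For orientation, the computation behind the GEMA example (batched matrix addition) can be sketched in plain host-side C. This is not the repository's gema.cve: the function name batched_add, the double element type and the row-major, back-to-back storage of the matrices are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Reference batched matrix addition: c[m] = a[m] + b[m] for each of the
 * `batch` matrices. Assumption: every matrix is dim x dim, row-major,
 * and the matrices of a batch are stored back to back. */
void batched_add(const double *a, const double *b, double *c,
                 size_t batch, size_t dim)
{
    assert(a != NULL && b != NULL && c != NULL);

    const size_t elems = dim * dim;          /* entries per matrix */
    for (size_t m = 0; m < batch; ++m) {     /* one batch entry at a time */
        const size_t off = m * elems;        /* start of matrix m */
        for (size_t i = 0; i < elems; ++i) {
            c[off + i] = a[off + i] + b[off + i];
        }
    }
}
```

In the example application the matrices are 256x256, so dim would be 256 and batch the number of matrices in the batch.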

Using the runtime

The runtime API functions are listed in .runtime/include/aurora_runtime.h; their usage is demonstrated in the example (see app-test.cc).

The runtime centers around the concept of a (virtual) processing group: we write kernels, and each kernel is then executed as a batch of n processing groups via offload and OpenMP. Roughly speaking (for people familiar with CUDA), each processing group corresponds to a block and the batch to a grid of size n. The runtime sets the following variables inside kernel functions:

__pg__ix: the index of the processing group (index in the batch)
__num_pgs: the batch size / number of processing groups
__pe__ix / __pg_size: reserved for future use
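
The execution model described above can be emulated on the host in a few lines of C. This is a sketch of the idea only, not the runtime's API: the names pg_ctx, kernel_fn, emulate_launch and record_index are hypothetical, and the processing-group index and batch size are passed in a struct here, whereas the actual runtime provides them as __pg__ix and __num_pgs inside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the variables the runtime sets in a kernel. */
typedef struct {
    int pg_ix;    /* plays the role of __pg__ix: index within the batch */
    int num_pgs;  /* plays the role of __num_pgs: batch size n */
} pg_ctx;

/* Hypothetical kernel signature used by this emulation only. */
typedef void (*kernel_fn)(pg_ctx ctx, void *args);

/* Runs `kernel` once per processing group, for a batch of size n.
 * Built with -fopenmp the iterations run in parallel; without it the
 * pragma is ignored and the loop runs sequentially. */
void emulate_launch(kernel_fn kernel, int n, void *args)
{
    assert(kernel != NULL && n >= 0);

    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        pg_ctx ctx = { i, n };
        kernel(ctx, args);
    }
}

/* Example kernel: each processing group writes only to its own slot,
 * so the groups never touch the same memory. */
void record_index(pg_ctx ctx, void *args)
{
    int *out = (int *)args;
    out[ctx.pg_ix] = ctx.pg_ix * ctx.num_pgs;
}
```

Calling emulate_launch(record_index, 4, out) on an int array of length 4 fills slot i with i * 4. As with CUDA blocks, a kernel written this way should not rely on the order in which processing groups run.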

Lastly, the most important part: kernels are conventional C functions annotated with ve_kernel and saved in files with a .cve extension.

The build process is fully automated and supported by CMake. For details, please refer to CMakeLists.txt.

Creating a new project

Ideally, use this repository as scaffolding:

Clone this repository and run init.sh.
Replace gema.cve and gemm.cve with your kernels.
Replace app-test.cc with your application's source.
Change CMakeLists.txt accordingly.

That's it!

Standing on the shoulders of giants...

This project uses the following packages:

VE Offload [2]
VE UDMA [3]
NEC's LLVM
pycparser
