This is the development repository for ISAAC, an input-aware auto-tuning framework and code generator for HPC/DL. This version is only compatible with NVIDIA hardware (it generates PTX source code). For OpenCL/CUDA compatibility, visit the Intel fork (https://github.com/intel/isaac) or the v1.0 branch (deprecated).
ISAAC is distributed under the MIT/X11 license.
In order to compile and use ISAAC, only a proprietary NVIDIA driver is necessary. No CUDA SDK is required (except for testing and benchmarking against cuBLAS/cuDNN).
git clone https://github.com/ptillet/isaac.git
cd isaac;
mkdir build;
cd build;
cmake ../ ; make -j8;
Basic benchmarks for GEMM and CONV for DeepBench can be obtained using the isaac-tools binary interface:
./examples/isaac-tools --gemm --bench --suite deepbench --dtype float32
./examples/isaac-tools --conv --bench --suite deepbench --dtype float32
Note that only float32 and float64 are supported at the moment.
The Tensorflow wrapper can be installed as follows, in an environment where Tensorflow is already present:
cd python;
python setup.py build;
python setup.py install;
You can test the installation by executing:
python ./python/examples/benchmark.py
What the script does is pretty straightforward:
import tensorflow as tf
import isaac as sc
# load ISAAC's custom TensorFlow ops (sc.tensorflow is the path to the compiled library)
isaac = tf.load_op_library(sc.tensorflow)
This exposes isaac.conv2d and isaac.conv3d, which you can use like you'd use tf.nn.conv2d and tf.nn.conv3d.
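For illustration, here is a minimal usage sketch in TF1-style graph mode. It assumes isaac.conv2d accepts the same arguments as tf.nn.conv2d, as suggested above; the shapes are arbitrary:
import numpy as np
import tensorflow as tf
import isaac as sc

isaac = tf.load_op_library(sc.tensorflow)
# NHWC input and HWIO filter, as with tf.nn.conv2d
x = tf.placeholder(tf.float32, shape=[16, 32, 32, 64])
w = tf.constant(np.random.rand(3, 3, 64, 128), dtype=tf.float32)
# assumption: isaac.conv2d mirrors tf.nn.conv2d's argument list
y = isaac.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
with tf.Session() as sess:
    out = sess.run(y, feed_dict={x: np.random.rand(16, 32, 32, 64)})
    print(out.shape)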
If you don't want to use Tensorflow, it is possible to use the python bindings directly. See the "tune/" folder for an example.
If you want, you can also use isaac-tools to dump the PTX source code generated by ISAAC for a given shape:
./examples/isaac-tools --gemm --dump --format ptx --shape 2048,2048,2048 --layout NT --dtype float32
If you really know what you're doing, you can also capture the tiling parameters found by ISAAC:
./examples/isaac-tools --gemm --dump --format params --shape 2048,2048,2048 --layout NT --dtype float32
You will get the following output:
Tuning parameters: 4, 16, 8, 8, 8, 8, 16, 8, 16, 8, 1, 1, 1
The parameters respectively mean: (1) shared memory loads have a width of 4; (2) each block comprises 16x8 threads; (3) each thread computes a tile of 8x8 elements; (4) each loop iteration processes 8 elements along the K axis; (5) threads are rearranged as a 16x8 block for loading A and a 16x8 block for loading B; (6) the reduction is split across 1, 1 and 1 independent batches within each thread, thread-block and grid respectively, and the results are accumulated after the inner loop.
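To make the mapping concrete, here is a small sketch that attaches names to the 13 values above, following the description in the previous paragraph (the field names are illustrative, not ISAAC's internal ones):
from collections import OrderedDict

def name_gemm_params(values):
    keys = [
        "smem_load_width",                       # (1) width of shared-memory loads
        "block_threads_x", "block_threads_y",    # (2) threads per block (16 x 8)
        "thread_tile_m", "thread_tile_n",        # (3) elements computed per thread (8 x 8)
        "k_per_iteration",                       # (4) K elements processed per loop iteration
        "load_a_x", "load_a_y",                  # (5) thread layout for loading A (16 x 8)
        "load_b_x", "load_b_y",                  #     thread layout for loading B (16 x 8)
        "split_k_thread", "split_k_block", "split_k_grid",  # (6) reduction splits
    ]
    return OrderedDict(zip(keys, values))

print(name_gemm_params([4, 16, 8, 8, 8, 8, 16, 8, 16, 8, 1, 1, 1]))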
Benchmark charts (omitted here): Tesla P100 - SGEMM; Tesla P100 - SCONV (vs cuDNN's IMPLICIT_PRECOMP_GEMM).
I would consider both GEMM and CONV production-ready. Kernel selection is performed for each new shape, and the best kernel is cached in RAM. I wouldn't advise this library for applications that use thousands of different shapes exactly once (e.g., blocked SVD).
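As a rough illustration of that trade-off (not ISAAC's actual internals), per-shape selection plus an in-RAM cache amortizes well when shapes repeat, but pays the selection cost on every call when each shape occurs only once:
import functools

@functools.lru_cache(maxsize=None)           # in-RAM cache keyed by the shape tuple
def select_kernel(M, N, K, dtype):
    # stand-in for ISAAC's input-aware kernel selection, which has a
    # non-trivial one-time cost for each new (M, N, K, dtype) combination
    return "gemm_kernel_%dx%dx%d_%s" % (M, N, K, dtype)

select_kernel(2048, 2048, 2048, "float32")   # first call: selection cost paid
select_kernel(2048, 2048, 2048, "float32")   # repeated shape: cache hit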
This work was partially supported by the National Science Foundation (IIS 1409097) and by IARPA (contract D16PC00002).