This repository contains optimized versions of compute kernels used in genomics applications like GATK and HTSJDK. These kernels are optimized to run on Intel Architecture (AVX, AVX2, AVX-512, and multicore) under 64-bit Linux and Mac OSX.
Kernels included:
- PairHMM
  - AVX and AVX-512 optimized versions of PairHMM used in GATK HaplotypeCaller and MuTect2.
  - OpenMP support for multicore processors.
- Smith-Waterman
  - AVX2 and AVX-512 optimized versions of Smith-Waterman used in GATK HaplotypeCaller and MuTect2.
- DEFLATE Compression/Decompression
  - Performance-optimized Level 1 and 2 compression and decompression from Intel's ISA-L library.
  - Performance-optimized Level 3 through 9 compression from Intel's Open Source Technology Center zlib library.
- Partially Determined HMM (PDHMM)
  - AVX2 and AVX-512 optimized versions of PDHMM used in GATK.
  - Serial implementation for CPUs with no AVX.
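
Which optimized variant applies depends on the instruction sets the host CPU reports. The following is a minimal diagnostic sketch, not part of GKL and not how GKL itself detects CPU features: it assumes a Linux host and reads the `flags` line of `/proc/cpuinfo`. The class name `CpuFlagCheck` is hypothetical, and `avx512f` is used only as a representative AVX-512 flag; the exact AVX-512 subsets each kernel needs are not stated here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of GKL: reports which AVX-related
// flags the Linux kernel lists for this CPU.
public class CpuFlagCheck {
    // Flag names as they appear in /proc/cpuinfo on x86 Linux.
    static final String[] FLAGS = {"avx", "avx2", "avx512f"};

    // Returns the set of flags from the first "flags" line, or an empty
    // set if no such line exists (for example on non-x86 hosts).
    static Set<String> parseFlags(List<String> cpuinfoLines) {
        for (String line : cpuinfoLines) {
            int colon = line.indexOf(':');
            if (line.startsWith("flags") && colon >= 0) {
                String[] parts = line.substring(colon + 1).trim().split("\\s+");
                return new HashSet<>(Arrays.asList(parts));
            }
        }
        return new HashSet<>();
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Paths.get("/proc/cpuinfo");
        if (!Files.isReadable(cpuinfo)) {
            System.out.println("/proc/cpuinfo is not available on this system");
            return;
        }
        Set<String> flags =
                parseFlags(Files.readAllLines(cpuinfo, StandardCharsets.UTF_8));
        for (String flag : FLAGS) {
            System.out.println(flag + ": " + flags.contains(flag));
        }
    }
}
```

If none of the flags are reported, only the non-AVX paths listed above (such as the serial PDHMM implementation) would be expected to apply.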
GKL release binaries are built on CentOS 7 so that they run on most Linux distributions (see holy-build-box for a good description of portability issues).
Building GKL requires:
- Java JDK 8
- Git >= 2.5
- CMake >= 2.8.12.2
- GCC g++ >= 5.3.1
- GNU patch >= 2.6
- GNU libtool >= 2.2.6
- GNU automake >= 1.11.1
- Yasm >= 1.2.0
- zlib-devel >= 1.2.7
Run these commands to set up the build environment on CentOS:

    sudo yum install -y java-1.8.0-openjdk-devel git cmake patch libtool automake yasm zlib-devel centos-release-scl help2man
    sudo yum install -y devtoolset-7-gcc-c++
    source scl_source enable devtoolset-7

Once the build requirements are met, clone the repository and build:

    git clone https://github.com/Intel-HLS/GKL.git
    cd GKL
    ./gradlew build

For more details, see build.sh.
Known issues:
- (Version 0.8.11 only) Some GKL dependencies are incorrectly declared as `implementation`, which makes them inaccessible to projects that depend on GKL unless the project itself also declares those dependencies. As a workaround, add the following dependencies manually to affected projects:
  - `implementation 'org.broadinstitute:gatk-native-bindings:1.0.0'`
  - `implementation 'com.github.samtools:htsjdk:3.0.5'`

  A fix for this issue is present in the master branch.
- When compressing with the ISA-L library (compression levels 1 and 2), the size of the compressed output can differ by a small number of bytes (up to 100) for the same input. This does not affect the original uncompressed contents. Investigation of this issue is ongoing.
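
Because the compressed size may vary between runs, tests that compare compressed bytes directly can fail spuriously; comparing the decompressed contents is more robust. The sketch below illustrates that idea using only the JDK's `java.util.zip` classes, not GKL's own deflater. The class `RoundTripCheck` and its methods are hypothetical, and two different compression levels merely stand in for two runs whose compressed output may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of GKL: compares DEFLATE streams by
// their decompressed contents rather than by their compressed bytes.
public class RoundTripCheck {
    // Compresses input at the given level with the JDK's Deflater.
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses a complete zlib-wrapped DEFLATE stream.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            // No progress and no more input means the stream is incomplete.
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                inflater.end();
                throw new DataFormatException("truncated or unsupported stream");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    // True if both streams decompress to identical bytes.
    static boolean sameContents(byte[] a, byte[] b) throws DataFormatException {
        return Arrays.equals(inflate(a), inflate(b));
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] data = "ACGTACGTTTGACCAGGTACGT".getBytes(StandardCharsets.US_ASCII);
        byte[] first = deflate(data, 1);
        byte[] second = deflate(data, 2);
        System.out.println("compressed sizes: " + first.length + ", " + second.length);
        System.out.println("same decompressed contents: " + sameContents(first, second));
    }
}
```

A check written this way stays valid whether or not the two compressed buffers happen to have the same length.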
All code is licensed under the MIT License, except:
- PairHMM code from GATK is licensed under the BSD 3-Clause License.
- ISA-L code is licensed under the BSD 3-Clause License.
- Intel Open Source Technology Center zlib (otc_zlib) code is licensed under the Zlib License.
- zlib code is licensed under the Zlib License.