Open-Omics-Acceleration-Framework-2.0
This v2.0 release adds accelerated versions of the following new pipelines and their corresponding tools.
- Containerized AlphaFold2-based pipeline for protein folding that takes protein sequences as input and outputs predicted protein structures. It includes:
  - Open-Omics-AlphaFold: a PyTorch implementation of the AlphaFold2 (v2.2.2) monomer, accelerated on 4th Generation Intel® Xeon® CPUs.
  - HMMER and HH-suite, accelerated through the 256- and 512-bit SIMD instructions (AVX2, AVX-512) available on modern x86 CPUs.
  - Efficient, load-balanced folding of multiple proteins in parallel on a dual-socket CPU.
  - A Dockerfile for seamless installation and execution.
  - Support for folding proteins of up to ~9k residues on a machine with 1 TB of memory.
  - To the best of our knowledge, it is faster than any prior CPU or GPU implementation for folding a set of proteins.
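One way to picture load-balanced folding of multiple proteins is a greedy longest-first assignment. The sketch below is illustrative only and is not taken from the framework: the function name `balance_by_length` is hypothetical, and it assumes residue count is a usable stand-in for folding cost (the real cost grows faster than linearly with length). Each protein, longest first, goes to whichever worker currently has the smallest total load.

```python
import heapq


def balance_by_length(seq_lengths, n_workers):
    """Assign proteins to workers, longest first, always picking the
    least-loaded worker. Hypothetical helper, not the framework's scheduler.

    seq_lengths: dict mapping protein name -> residue count (cost proxy).
    Returns a list with one list of protein names per worker.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")

    # Min-heap of (current load, worker index).
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]

    # Longest proteins first; ties broken by name for determinism.
    ordered = sorted(seq_lengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, length in ordered:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(name)
        heapq.heappush(heap, (load + length, worker))
    return assignment


if __name__ == "__main__":
    # Two workers, e.g. one per socket on a dual-socket machine.
    print(balance_by_length({"A": 900, "B": 500, "C": 400, "D": 100}, 2))
```

Placing the longest sequences first keeps a single large protein from landing on an already busy worker late in the schedule.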
- Containerized DeepVariant-based variant calling (fq2vcf) pipeline that takes paired fastq.gz files as input and outputs a VCF file. It runs efficiently across multiple CPU nodes and includes:
  - BWA-MEM2: an architecture-efficient version of BWA-MEM that is 1.8-3.0 times faster than the original.
  - SAMtools sort utility for sorting SAM/BAM files.
  - Open-Omics-DeepVariant: a new version of DeepVariant v1.5.0, accelerated on 4th Generation Intel® Xeon® CPUs.
  - A distributed-memory framework that scales this pipeline efficiently across several CPU nodes.
  - To the best of our knowledge, it is faster than any prior CPU or GPU implementation on a 30x WGS dataset.
- An fq2sortedbam pipeline, accelerated on modern CPUs, that takes FASTQ read files as input and outputs a sorted BAM file. It includes:
  - BWA-MEM2: an architecture-efficient version of BWA-MEM that is 1.8-3.0 times faster than the original.
  - SAMtools sort utility for sorting SAM/BAM files.
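The two stages of fq2sortedbam can be sketched as a plain pipe from `bwa-mem2 mem` into `samtools sort`. This is a minimal single-node illustration using the stock command-line tools, not the framework's own launcher; the helper names, file names, and thread count are hypothetical, and it assumes both tools are on `PATH` and the reference has already been indexed with `bwa-mem2 index`.

```python
import subprocess


def build_fq2sortedbam_commands(ref, reads, out_bam, threads=8):
    """Return the argv lists for alignment and sorting (hypothetical helper)."""
    # bwa-mem2 writes SAM to stdout; -t sets the thread count.
    align = ["bwa-mem2", "mem", "-t", str(threads), ref, *reads]
    # samtools sort reads SAM from stdin ("-") and writes the sorted BAM to out_bam.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    return align, sort


def run_fq2sortedbam(ref, reads, out_bam, threads=8):
    """Stream alignments straight into the sorter without an intermediate SAM file."""
    align, sort = build_fq2sortedbam_commands(ref, reads, out_bam, threads)
    aligner = subprocess.Popen(align, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort, stdin=aligner.stdout)
    # Close our copy of the pipe so the aligner sees a broken pipe if the sorter exits early.
    aligner.stdout.close()
    sorter_rc = sorter.wait()
    aligner_rc = aligner.wait()
    if aligner_rc != 0 or sorter_rc != 0:
        raise RuntimeError(
            f"pipeline failed: bwa-mem2 exited {aligner_rc}, samtools exited {sorter_rc}"
        )


if __name__ == "__main__":
    # File names below are placeholders.
    align, sort = build_fq2sortedbam_commands(
        "ref.fa", ["reads_R1.fastq.gz", "reads_R2.fastq.gz"], "sorted.bam"
    )
    print(" ".join(align), "|", " ".join(sort))
```

Calling `run_fq2sortedbam` with the same arguments would execute the pipe; the `__main__` block only prints the equivalent shell command so it can be inspected first.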