Releases: IntelLabs/Open-Omics-Acceleration-Framework
Open-Omics-Acceleration-Framework-2.2
The v2.2 release adds and updates Docker files for the major pipelines.
- Adds a docker file for the fq2sortedbam pipeline
- Adds support for minimap2 and extends the fq2sortedbam pipeline to long-read analysis
- Provides standalone dockers for the pre-processing and inference stages of the AlphaFold2 pipeline
- Provides standalone dockers for the fq2sortedbam and DeepVariant-based inference stages of the DeepVariant-based germline fq2vcf pipeline
Open-Omics-Acceleration-Framework-2.1
The v2.1 release updates and fixes the AlphaFold2 pipeline and the DeepVariant-based germline fq2vcf pipeline.
- AlphaFold2-based protein folding pipeline:
  - Enabled inference using different models
  - Fixed bugs when running Models 3, 4, and 5
  - Removed unnecessary paths from the run script
  - Enabled use of contiguous tensors inside the TPP PyTorch extension
- DeepVariant-based germline variant calling (fq2vcf) pipeline:
  - Enabled support for a gzipped reference sequence file as input
  - Enabled support for reads and reference sequence data files located in different folders
  - Enabled cleanup of all intermediate data generated during the pipeline's run
  - Provided an option to keep the intermediate SAM files produced by bwa-mem2
  - Fixed the messages shown to the user when a run fails
  - Updated the README with precise instructions for running on various types of compute environments
  - In the AWS ParallelCluster environment: enabled index creation on a compute node instead of the master node
  - Added details to the README about memory and disk requirements for a run using a human WGS dataset
Open-Omics-Acceleration-Framework-2.0
The v2.0 release adds accelerated versions of the following new pipelines and their corresponding tools.
- Containerized AlphaFold2-based pipeline for protein folding that takes protein sequences as input and outputs predicted protein structures. It includes:
  - Open-Omics-AlphaFold: a PyTorch implementation of AlphaFold2 (v2.2.2) monomer, accelerated using 4th generation Intel® Xeon® CPUs.
  - Hmmer and HH-suite, accelerated through the 256- and 512-bit SIMD instructions (AVX2, AVX512) available on modern x86 CPUs.
  - Efficient, load-balanced folding of multiple proteins in parallel on a dual-socket CPU.
  - A docker file for seamless installation and execution.
  - Support for folding proteins of up to ~9k residues on a machine with 1 TB of memory.
  - To the best of our knowledge, faster than any prior CPU/GPU implementation for folding a set of proteins.
- Containerized DeepVariant-based variant calling (fq2vcf) pipeline that takes paired fastq.gz files as input and outputs a VCF file. It runs efficiently across multiple CPU nodes and includes:
  - BWA-MEM2: an architecture-efficient version of BWA-MEM that is 1.8-3.0 times faster.
  - SAMtools sort utility for sorting SAM/BAM files.
  - Open-Omics-DeepVariant: a new version of DeepVariant v1.5.0, accelerated using 4th generation Intel® Xeon® CPUs.
  - A distributed-memory framework that achieves excellent scaling for this pipeline across several CPU nodes.
  - To the best of our knowledge, faster than any prior CPU/GPU implementation for a 30x WGS dataset.
- A fq2sortedbam pipeline accelerated using modern CPUs that takes read fastq files as input and outputs a sorted BAM file. It consists of:
  - BWA-MEM2: an architecture-efficient version of BWA-MEM that is 1.8-3.0 times faster.
  - SAMtools sort utility for sorting SAM/BAM files.
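
For orientation, the align-then-sort flow that fq2sortedbam is built around can be sketched in Python as below. This is a minimal illustration, not the framework's actual implementation: the function names (`fq2sortedbam_commands`, `as_shell_pipeline`, `run_pipeline`) and the file names in the usage note are hypothetical, and it assumes `bwa-mem2` and `samtools` are on `PATH` and that the reference has already been indexed with `bwa-mem2 index`.

```python
import shlex
import subprocess


def fq2sortedbam_commands(ref, reads, out_bam, threads=8):
    """Build the argument lists for a bwa-mem2 -> samtools sort pipe.

    Hypothetical helper for illustration; the real pipeline may add
    further options or stages.
    """
    if not 1 <= len(reads) <= 2:
        raise ValueError("expected one (single-end) or two (paired-end) FASTQ files")
    # bwa-mem2 mem writes SAM records to stdout.
    align = ["bwa-mem2", "mem", "-t", str(threads), ref, *reads]
    # The trailing "-" tells samtools sort to read from stdin;
    # -@ sets additional threads, -o names the sorted BAM output.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    return align, sort


def as_shell_pipeline(align, sort):
    """Render both commands as a single shell pipeline string."""
    return " | ".join(shlex.join(cmd) for cmd in (align, sort))


def run_pipeline(align, sort):
    """Run the two tools connected by a pipe; return their exit codes."""
    aligner = subprocess.Popen(align, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort, stdin=aligner.stdout)
    # Close our copy so the aligner sees SIGPIPE if the sorter exits early.
    aligner.stdout.close()
    sorter.communicate()
    aligner.wait()
    return aligner.returncode, sorter.returncode
```

Calling `fq2sortedbam_commands("ref.fa", ["r1.fq.gz", "r2.fq.gz"], "out.bam")` and passing the result to `as_shell_pipeline` shows the equivalent one-line shell command without running anything; `run_pipeline` executes it only when both tools are installed.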
v1.0
First version of Open Omics Acceleration Framework containing the accelerated versions of the following tools and pipelines of digital biology.
Tools:
- BWA-MEM2: accelerated version of BWA-MEM
- MM2-Fast: accelerated version of minimap2
- UMAP_fast: accelerated version of the UMAP algorithm used for visualization
- Accelerated version of AtacWorks that performs denoising of ATAC-Seq data
Pipelines:
- A single-cell RNA-Seq analysis pipeline for clustering cells, starting from a cell-by-gene matrix.