
Releases: IntelLabs/Open-Omics-Acceleration-Framework

Open-Omics-Acceleration-Framework-2.2

07 Nov 06:00
724f90b

The v2.2 release adds and updates docker files for the major pipelines.

  • Adds a docker file for the fq2sortedbam pipeline
  • Adds support for minimap2 and extends the fq2sortedbam pipeline to long-read analysis
  • Provides standalone dockers for the pre-processing and inference stages of the AlphaFold2 pipeline
  • Provides standalone dockers for the fq2sortedbam and DeepVariant-based inference stages of the DeepVariant-based germline fq2vcf pipeline

Open-Omics-Acceleration-Framework-2.1

08 Apr 16:17
0878db8

The v2.1 release updates and fixes the AlphaFold2 pipeline and the DeepVariant-based germline fq2vcf pipeline.

  • AlphaFold2-based protein folding pipeline:

    • Enabled inference using different models
    • Fixed bugs in running Models 3, 4, and 5
    • Removed unnecessary paths from the run script
    • Enabled use of contiguous tensors inside the TPP PyTorch extension
  • DeepVariant-based germline variant calling (fq2vcf) pipeline:

    • Enabled support for gzipped reference sequence file as input
    • Enabled support for reads and reference sequence data files to be in different folders
    • Enabled cleanup of all intermediate data generated during the pipeline's run
    • Provided an option to keep the intermediate SAM files produced by bwa-mem2
    • Fixed the messaging to the user in case of a failed run
    • Updated README with precise instructions to run on various types of compute environments
    • In the AWS ParallelCluster environment: enabled index creation on a compute node instead of the master node
    • Added details in README about memory and disk requirements for a run using a Human WGS dataset

Open-Omics-Acceleration-Framework-2.0

22 Nov 14:50
0eaf428

The v2.0 release adds accelerated versions of the following new pipelines and their corresponding tools.

  • Containerized AlphaFold2-based pipeline for protein folding that takes protein sequences as input and outputs predicted protein structures. It consists of

    • Open-Omics-AlphaFold: a PyTorch implementation of the AlphaFold2 (v2.2.2) monomer model, accelerated on 4th generation Intel® Xeon® CPUs.
    • Hmmer and HH-suite accelerated through 256- and 512-bit SIMD instructions (AVX2, AVX512) available on modern x86 CPUs.
    • Efficient, load-balanced folding of multiple proteins in parallel on a dual-socket CPU.
    • A docker file for seamless installation and execution.
    • Can perform folding on proteins of length up to ~9k residues on a 1 TB memory machine.
    • To the best of our knowledge, faster than any prior CPU/GPU implementation for folding a set of proteins.
  • Containerized DeepVariant-based variant calling (fq2vcf) pipeline that takes paired fastq.gz files as input and outputs a VCF file. It achieves efficient performance across multiple CPU nodes and consists of

    • BWA-MEM2: an architecture-efficient version of BWA-MEM that is 1.8-3.0 times faster.
    • SAMtools sort utility for sorting SAM/BAM files.
    • Open-Omics-DeepVariant: a new version of DeepVariant v1.5.0 accelerated on 4th generation Intel® Xeon® CPUs.
    • A distributed memory framework that achieves excellent scaling for this pipeline across several CPU nodes.
    • To the best of our knowledge, faster than any prior CPU/GPU implementation for a 30x WGS dataset.
  • A fq2sortedbam pipeline accelerated using modern CPUs that takes read fastq files as input and outputs a sorted BAM file. It consists of

    • BWA-MEM2: an architecture-efficient version of BWA-MEM that is 1.8-3.0 times faster.
    • SAMtools sort utility for sorting SAM/BAM files.
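
The align-then-sort flow of the fq2sortedbam pipeline can be illustrated with a minimal sketch. This is not the framework's own run script: the function names, file names, and thread count below are hypothetical, and the sketch only assumes the standard command-line forms `bwa-mem2 mem` (SAM written to standard output) and `samtools sort` (input read from standard input via `-`). It builds the two commands without executing them, so it runs even where neither tool is installed.

```python
import shlex
from typing import List, Tuple


def build_fq2sortedbam_commands(
    reference: str,
    reads: List[str],
    output_bam: str,
    threads: int = 8,
) -> Tuple[List[str], List[str]]:
    """Return (align_cmd, sort_cmd) argument lists for an align-then-sort run.

    Hypothetical helper for illustration; not part of the framework.
    """
    if not 1 <= len(reads) <= 2:
        raise ValueError("expected one (single-end) or two (paired-end) FASTQ files")
    # bwa-mem2 aligns the reads to the indexed reference and writes SAM to stdout.
    align_cmd = ["bwa-mem2", "mem", "-t", str(threads), reference, *reads]
    # samtools sort reads SAM from stdin ("-") and writes a coordinate-sorted BAM;
    # "-@" sets the number of additional sorting threads.
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", output_bam, "-"]
    return align_cmd, sort_cmd


def as_shell_pipeline(align_cmd: List[str], sort_cmd: List[str]) -> str:
    """Join the two commands into one shell pipeline string for display."""
    return " | ".join(shlex.join(cmd) for cmd in (align_cmd, sort_cmd))


if __name__ == "__main__":
    # Placeholder file names; substitute real paths before running the pipeline.
    align, sort = build_fq2sortedbam_commands(
        "ref.fa", ["r1.fastq.gz", "r2.fastq.gz"], "out.sorted.bam", threads=4
    )
    print(as_shell_pipeline(align, sort))
```

Piping the aligner straight into the sorter avoids writing the intermediate SAM file to disk; the reference must already be indexed with `bwa-mem2 index` before the alignment step.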

v1.0

19 May 14:12

First version of the Open Omics Acceleration Framework, containing accelerated versions of the following tools and pipelines for digital biology.

Tools:

  • BWA-MEM2: accelerated version of BWA-MEM
  • MM2-Fast: accelerated version of minimap2
  • UMAP_fast: accelerated version of UMAP algorithm used for visualization
  • Accelerated version of AtacWorks that performs denoising of ATAC-Seq data

Pipelines:

  • A single-cell RNA-Seq analysis pipeline for clustering cells, starting from a cell-by-gene matrix.