Seamless Integration of DIAMOND2 Sequence Searches in R

@HajkD HajkD released this 10 Sep 13:20
· 13 commits to master since this release

R interface to the DIAMOND2 Pairwise Sequence Aligner.

Motivation

We are excited to introduce DIAMOND2, a pairwise protein aligner built to meet the demands of the Earth BioGenome Project and other large-scale genomics initiatives. DIAMOND2 is designed to accelerate BLAST searches by a factor of up to 10,000. To give researchers more flexibility, we provide rdiamond, a dedicated interface package for running and handling DIAMOND2 sequence searches programmatically from R.

The rdiamond package provides interface functions that let users run DIAMOND2 directly from within R. It is designed to handle very large outputs: terabytes of DIAMOND2 hit files can be processed directly from disk on a local machine, so the results do not have to fit into memory.

Furthermore, when paired with the biomartr R package, users can automatically fetch large-scale genomic data and then search through it using rdiamond.

Install rdiamond

For Linux Users:

Please install the libpq-dev library on your Linux machine by typing into the terminal:

sudo apt-get install libpq-dev

For all systems, install rdiamond by typing the following in R:

# install Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()

# install Biostrings -> see here for different Biostrings versions:
# http://bioconductor.org/about/release-announcements/
BiocManager::install(c("Biostrings"))

# install.packages("devtools")
# install the current version of rdiamond on your system
devtools::install_github("drostlab/rdiamond", build_vignettes = TRUE, dependencies = TRUE)
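After the installation steps above, a quick check along the following lines may help confirm that everything is in place. This is only a minimal sketch using base R and is not part of the rdiamond documentation. It assumes that the DIAMOND2 command-line tool is installed separately and that its executable is named diamond and is reachable via the PATH.

```r
# Hypothetical post-install check (base R only, not part of rdiamond itself).
pkgs <- c("Biostrings", "rdiamond")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Assumption: the DIAMOND2 executable is called "diamond" and is on the PATH.
# Sys.which() returns an empty string when the executable cannot be found.
diamond_path <- Sys.which("diamond")
if (!nzchar(diamond_path)) {
  message("No 'diamond' executable found on the PATH.")
} else {
  message("Found diamond at: ", diamond_path)
}
```

If any entry of installed is FALSE, re-running the corresponding installation command above is a reasonable first step.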

Citation

This R package is not formally published yet, but please cite the following paper when using this software for your research:

Buchfink B, Reuter K, Drost HG, "Sensitive protein alignments at tree-of-life scale using DIAMOND", Nature Methods 18, 366–368 (2021). doi:10.1038/s41592-021-01101-x