Merge pull request #752 from Kincekara/ki-fastani-1.34
updated FastANI to 1.34
Showing 8 changed files with 299 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
## build RGDv2 ## | ||
FROM staphb/ncbi-datasets:15.11.0 as stage | ||
|
||
# copy in list of NCBI accessions and species list | ||
COPY RGDv2-NCBI-assembly-accessions.txt /RGDv2/RGDv2-NCBI-assembly-accessions.txt | ||
COPY RGDv2-NCBI-assembly-accessions-and-species.txt /RGDv2/RGDv2-NCBI-assembly-accessions-and-species.txt | ||
|
||
# download RGD genomes using NCBI datasets tools; cleanup unnecessary files; | ||
# move and re-name assemblies to include Species in the filename | ||
# make fasta files readable to all users; create File Of FileNames for all 43 assemblies (to be used with fastANI) | ||
RUN for ID in $(cat /RGDv2/RGDv2-NCBI-assembly-accessions.txt); do \ | ||
SPECIES=$(grep "${ID}" /RGDv2/RGDv2-NCBI-assembly-accessions-and-species.txt | cut -f 1) && \ | ||
echo "downloading ${ID}, species ${SPECIES}, from NCBI..."; \ | ||
datasets download genome accession ${ID} --filename ${ID}.zip; \ | ||
unzip ${ID}.zip; \ | ||
rm ${ID}.zip; \ | ||
mv -v ncbi_dataset/data/${ID}/${ID}*.fna /RGDv2/${ID}.${SPECIES}.fasta; \ | ||
rm -rfv ncbi_dataset/; \ | ||
rm -v README.md; \ | ||
done && \ | ||
ls /RGDv2/*.fasta >/RGDv2/FOFN-RGDv2.txt &&\ | ||
chmod 664 /RGDv2/* | ||
|
||
## App ## | ||
FROM ubuntu:jammy as app | ||
|
||
# for easy upgrade later. ARG variables only persist at build time | ||
ARG FASTANI_VER="v1.34" | ||
|
||
LABEL base.image="ubuntu:jammy" | ||
LABEL dockerfile.version="1" | ||
LABEL software="FastANI" | ||
LABEL software.version=${FASTANI_VER} | ||
LABEL description="Fast alignment-free computation of whole-genome Average Nucleotide Identity" | ||
LABEL website="https://github.com/ParBLiSS/FastANI" | ||
LABEL license="https://github.com/ParBLiSS/FastANI/blob/master/LICENSE" | ||
LABEL maintainer="Kelsey Florek" | ||
LABEL maintainer.email="kelsey.florek@slh.wisc.edu" | ||
LABEL maintainer2="Curtis Kapsak" | ||
LABEL maintainer2.email="kapsakcj@gmail.com" | ||
LABEL maintainer3="Kutluhan Incekara" | ||
LABEL maintainer3.email="kutluhan.incekara@ct.gov" | ||
|
||
# install dependencies; cleanup apt garbage | ||
RUN apt-get update && apt-get install --no-install-recommends -y \ | ||
wget \ | ||
unzip \ | ||
libgomp1 && \ | ||
apt-get clean && rm -rf /var/lib/apt/lists/* | ||
|
||
# download pre-compiled binary; unzip; put binary in /usr/local/bin | ||
# apt dependencies: libgomp1 unzip wget | ||
RUN wget --no-check-certificate https://github.com/ParBLiSS/FastANI/releases/download/${FASTANI_VER}/fastANI-Linux64-${FASTANI_VER}.zip && \ | ||
unzip fastANI-Linux64-${FASTANI_VER}.zip -d /usr/local/bin && \ | ||
rm fastANI-Linux64-${FASTANI_VER}.zip | ||
|
||
# copy RGDv2 from stage | ||
COPY --from=stage /RGDv2/ /RGDv2/ | ||
|
||
# default run command | ||
CMD fastANI -h | ||
|
||
# singularity compatibility | ||
ENV LC_ALL=C | ||
|
||
# set working directory | ||
WORKDIR /data | ||
|
||
## Test ## | ||
FROM app as test | ||
|
||
# test against RGDv2 | ||
RUN wget --no-check-certificate -P /data https://github.com/ParBLiSS/FastANI/raw/master/tests/data/Escherichia_coli_str_K12_MG1655.fna && \ | ||
fastANI -t 8 -q /data/Escherichia_coli_str_K12_MG1655.fna --rl /RGDv2/FOFN-RGDv2.txt -o fastANI.RGDv2.out.tsv &&\ | ||
echo "output TSV from fastANI test:" && \ | ||
cat fastANI.RGDv2.out.tsv | ||
|
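Review note on the download loop: it resolves each accession to a species with `grep "${ID}" ... | cut -f 1`. grep treats the accession as a regular expression, so the `.` before the version number matches any character, and `cut -f 1` relies on the mapping file being tab-delimited. With the 43 accessions in this commit that does not appear to cause a collision, but a stricter lookup is cheap. The following is a minimal sketch, assuming a two-column mapping file (species, accession); `species_for_accession` is a hypothetical helper name and is not part of the Dockerfile.

```bash
#!/usr/bin/env bash
# Sketch: look up the species for one accession with an exact match on column 2.
# Assumes two whitespace-separated columns per line: species, accession.
# species_for_accession is a hypothetical helper, not part of the Dockerfile.
species_for_accession() {
    local mapping_file="$1" accession="$2"
    # $2 == acc compares the whole field as a string, so "." is not a wildcard
    awk -v acc="$accession" '$2 == acc { print $1; exit }' "$mapping_file"
}

# Usage with a throwaway mapping file (two rows copied from the list in this commit)
demo_map="$(mktemp)"
printf 'Vibrio_cholerae\tGCA_009665515.2\nVibrio_cidicii\tGCA_009665415.1\n' > "$demo_map"
species_for_accession "$demo_map" "GCA_009665515.2"
```

Because awk splits on any run of whitespace by default, this form works whether the file uses tabs or spaces between the two columns.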
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# fastANI container | ||
|
||
Main tool : [fastANI](https://github.com/ParBLiSS/FastANI) | ||
|
||
Full documentation: https://github.com/ParBLiSS/FastANI | ||
|
||
FastANI was developed for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as mean nucleotide identity of orthologous gene pairs shared between two microbial genomes. | ||
|
||
This docker image contains the Reference Genome Database version 2 (RGDv2) from the Enteric Diseases Laboratory Branch at the CDC. It contains the genomes of 43 enteric bacterial isolates that are used for species identification of bacterial isolate WGS data. This database is NOT meant to be comprehensive - it contains the genomes of enteric pathogens commonly sequenced by EDLB and some closely related species. | ||
|
||
The FASTA files for RGDv2 can be found within `/RGDv2/` inside the docker image. | ||
|
||
## Example Usage | ||
|
||
```bash | ||
# query one genome against another genome | ||
fastANI -t 8 -q bacterial-genome1.fasta -r bacterial-genome2.fasta -o fastANI.out.tsv | ||
|
||
# query one genome against the 43 genomes in RGDv2 (requires a File Of FileNames as input) | ||
fastANI -t 8 -q bacterial-genome.fasta --rl /RGDv2/FOFN-RGDv2.txt -o fastANI.RGDv2.out.tsv | ||
``` |
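A query against RGDv2 can produce one output row per reference genome, so a species call usually means picking the row with the highest ANI. The sketch below assumes the tab-delimited column order described in the FastANI documentation (query, reference, ANI, bidirectional fragment mappings, total query fragments). `top_ani_hit` is a hypothetical helper name, and the demo rows are made-up values, not real fastANI results.

```bash
#!/usr/bin/env bash
# Sketch: report the best match (reference path and ANI) from a fastANI output file.
# Assumed columns: query, reference, ANI, fragment mappings, total query fragments.
# top_ani_hit is a hypothetical helper name.
top_ani_hit() {
    local tsv="$1"
    # sort numerically on column 3 (ANI), highest first, then keep the first row
    sort -t "$(printf '\t')" -k3,3nr "$tsv" | head -n 1 | cut -f 2,3
}

# Usage with made-up rows (illustrative values only)
demo_tsv="$(mktemp)"
printf 'q.fasta\t/RGDv2/A.fasta\t81.5\t100\t1500\n'  > "$demo_tsv"
printf 'q.fasta\t/RGDv2/B.fasta\t97.2\t1400\t1500\n' >> "$demo_tsv"
top_ani_hit "$demo_tsv"
```

Since the RGDv2 file names have the form `<accession>.<species>.fasta`, the reference path in the top row already carries the species name.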
fastani/1.34-RGDV2/RGDv2-NCBI-assembly-accessions-and-species.txt (43 additions, 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
Campylobacter_coli GCA_008011635.1 | ||
Campylobacter_fetus GCA_000015085.1 | ||
Campylobacter_fetus GCA_000495505.1 | ||
Campylobacter_fetus GCA_000759515.1 | ||
Campylobacter_hyointestinalis GCA_001643955.1 | ||
Campylobacter_jejuni GCA_000017485.1 | ||
Campylobacter_jejuni GCA_008011525.1 | ||
Campylobacter_lari GCA_000019205.1 | ||
Campylobacter_lari GCA_000816225.1 | ||
Campylobacter_upsaliensis GCA_008011615.1 | ||
Escherichia_albertii GCA_000512125.1 | ||
Escherichia_coli GCA_002741475.1 | ||
Escherichia_fergusonii GCA_000026225.1 | ||
Grimontia_hollisae GCA_009665295.1 | ||
Listeria_innocua GCA_017363615.1 | ||
Listeria_innocua GCA_017363655.1 | ||
Listeria_ivanovii GCA_000252975.1 | ||
Listeria_marthii GCA_017363645.1 | ||
Listeria_monocytogenes GCA_001466295.1 | ||
Listeria_monocytogenes GCA_013625895.1 | ||
Listeria_monocytogenes GCA_013625995.1 | ||
Listeria_monocytogenes GCA_013626145.1 | ||
Listeria_monocytogenes GCA_014526935.1 | ||
Listeria_seeligeri GCA_017363605.1 | ||
Listeria_welshimeri GCA_002489005.1 | ||
Photobacterium_damselae GCA_009665375.1 | ||
Salmonella_bongori GCA_013588055.1 | ||
Salmonella_enterica GCA_011388235.1 | ||
Vibrio_alginolyticus GCA_009665435.1 | ||
Vibrio_cholerae GCA_009665515.2 | ||
Vibrio_cidicii GCA_009665415.1 | ||
Vibrio_cincinnatiensis GCA_009665395.1 | ||
Vibrio_fluvialis GCA_009665355.1 | ||
Vibrio_furnissii GCA_009665335.1 | ||
Vibrio_harveyi GCA_009665315.1 | ||
Vibrio_metoecus GCA_009665255.1 | ||
Vibrio_metoecus GCA_009665275.1 | ||
Vibrio_metschnikovii GCA_009665235.1 | ||
Vibrio_mimicus GCA_009665195.1 | ||
Vibrio_navarrensis GCA_009665215.1 | ||
Vibrio_parahaemolyticus GCA_009665495.1 | ||
Vibrio_vulnificus GCA_009665455.1 | ||
Vibrio_vulnificus GCA_009665475.1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
GCA_008011635.1 | ||
GCA_000015085.1 | ||
GCA_000495505.1 | ||
GCA_000759515.1 | ||
GCA_001643955.1 | ||
GCA_000017485.1 | ||
GCA_008011525.1 | ||
GCA_000816225.1 | ||
GCA_000019205.1 | ||
GCA_008011615.1 | ||
GCA_000512125.1 | ||
GCA_002741475.1 | ||
GCA_000026225.1 | ||
GCA_009665295.1 | ||
GCA_017363655.1 | ||
GCA_017363615.1 | ||
GCA_000252975.1 | ||
GCA_017363645.1 | ||
GCA_001466295.1 | ||
GCA_014526935.1 | ||
GCA_013626145.1 | ||
GCA_013625995.1 | ||
GCA_013625895.1 | ||
GCA_017363605.1 | ||
GCA_002489005.1 | ||
GCA_009665375.1 | ||
GCA_013588055.1 | ||
GCA_011388235.1 | ||
GCA_009665435.1 | ||
GCA_009665515.2 | ||
GCA_009665415.1 | ||
GCA_009665395.1 | ||
GCA_009665355.1 | ||
GCA_009665335.1 | ||
GCA_009665315.1 | ||
GCA_009665275.1 | ||
GCA_009665255.1 | ||
GCA_009665235.1 | ||
GCA_009665195.1 | ||
GCA_009665215.1 | ||
GCA_009665495.1 | ||
GCA_009665475.1 | ||
GCA_009665455.1 |
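The accession-only list and the species/accession list are kept as separate files and are not in the same order, so they can drift apart when one is edited. A minimal sketch of a consistency check follows, assuming the accession is the last column of the mapping file; `same_accessions` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: confirm two files list the same set of accessions, regardless of order.
# same_accessions is a hypothetical helper; arguments are the accession-only list
# and the species/accession mapping (accession assumed to be the last column).
same_accessions() {
    local list_file="$1" mapping_file="$2" a b
    a="$(sort -u "$list_file")"
    b="$(awk '{ print $NF }' "$mapping_file" | sort -u)"
    [ "$a" = "$b" ]
}

# Usage with two Campylobacter_lari rows from this commit, listed in different orders
demo_list="$(mktemp)"
demo_mapping="$(mktemp)"
printf 'GCA_000816225.1\nGCA_000019205.1\n' > "$demo_list"
printf 'Campylobacter_lari\tGCA_000019205.1\nCampylobacter_lari\tGCA_000816225.1\n' > "$demo_mapping"
same_accessions "$demo_list" "$demo_mapping" && echo "lists match"
```

The function returns a non-zero status when the sets differ, so it could gate a build step or a CI check.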
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
Species BioSample NCBI Assembly Strain ID | ||
Campylobacter coli SAMN12323645 GCA_008011635.1 2013D-9606 | ||
Campylobacter fetus SAMN02604050 GCA_000015085.1 82-40 | ||
Campylobacter fetus SAMN02604287 GCA_000495505.1 03-427 | ||
Campylobacter fetus SAMN02870596 GCA_000759515.1 97-608 | ||
Campylobacter hyointestinalis SAMN03737973 GCA_001643955.1 LMG 9260 | ||
Campylobacter jejuni SAMN02604056 GCA_000017485.1 NC_009707 | ||
Campylobacter jejuni SAMN12323651 GCA_008011525.1 D0133 | ||
Campylobacter lari SAMN02604025 GCA_000019205.1 RM2100 | ||
Campylobacter lari SAMN03248542 GCA_000816225.1 LMG 11760 | ||
Campylobacter upsaliensis SAMN12323647 GCA_008011615.1 D1914 | ||
Escherichia albertii SAMN02641387 GCA_000512125.1 KF1 | ||
Escherichia coli SAMN07731009 GCA_002741475.1 B4103-1 | ||
Escherichia fergusonii SAMEA3138228 GCA_000026225.1 ATCC_35469 | ||
Grimontia hollisae SAMN10812938 GCA_009665295.1 2013V-1029 | ||
Listeria innocua SAMN10869157 GCA_017363615.1 2010L-2059 | ||
Listeria innocua SAMN10869156 GCA_017363655.1 H0996 L | ||
Listeria ivanovii SAMEA3138408 GCA_000252975.1 PAM55 | ||
Listeria marthii SAMN10869158 GCA_017363645.1 FSL S4-696 | ||
Listeria monocytogenes SAMN02944835 GCA_001466295.1 G4599 | ||
Listeria monocytogenes SAMN02847829 GCA_013625895.1 2014L-6256 | ||
Listeria monocytogenes SAMN03067768 GCA_013625995.1 J0099 | ||
Listeria monocytogenes SAMN02950479 GCA_013626145.1 2014L-6393 | ||
Listeria monocytogenes SAMN03761815 GCA_014526935.1 2011L-2626 | ||
Listeria seeligeri SAMN10869159 GCA_017363605.1 F5761 | ||
Listeria welshimeri SAMN03462185 GCA_002489005.1 SLCC5334 | ||
Photobacterium damselae SAMN10702680 GCA_009665375.1 2012V-1072 | ||
Salmonella bongori SAMN13207407 GCA_013588055.1 04-0440 | ||
Salmonella enterica SAMN08167480 GCA_011388235.1 2010K-2370 | ||
Vibrio alginolyticus SAMN10702675 GCA_009665435.1 2013V-1302 | ||
Vibrio cholerae SAMN10863496 GCA_009665515.2 2010EL-1786 | ||
Vibrio cidicii SAMN10863497 GCA_009665415.1 2423-01 | ||
Vibrio cincinnatiensis SAMN10812936 GCA_009665395.1 2409-02 | ||
Vibrio fluvialis SAMN10812937 GCA_009665355.1 2013V-1049 | ||
Vibrio furnissii SAMN10702681 GCA_009665335.1 2419-04 | ||
Vibrio harveyi SAMN10702676 GCA_009665315.1 2011V-1164 | ||
Vibrio metoecus SAMN10702677 GCA_009665255.1 2011V-1169 | ||
Vibrio metoecus SAMN10863498 GCA_009665275.1 08-2459 | ||
Vibrio metschnikovii SAMN10702671 GCA_009665235.1 2012V-1020 | ||
Vibrio mimicus SAMN10812939 GCA_009665195.1 2011V-1073 | ||
Vibrio navarrensis SAMN10863499 GCA_009665215.1 08-2462 | ||
Vibrio parahaemolyticus SAMN10702672 GCA_009665495.1 2012AW-0154 | ||
Vibrio vulnificus SAMN10702674 GCA_009665455.1 2009V-1035 | ||
Vibrio vulnificus SAMN10702673 GCA_009665475.1 2142-77 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
FROM ubuntu:jammy as app | ||
|
||
# for easy upgrade later. ARG variables only persist at build time | ||
ARG FASTANI_VER="v1.34" | ||
|
||
LABEL base.image="ubuntu:jammy" | ||
LABEL dockerfile.version="1" | ||
LABEL software="FastANI" | ||
LABEL software.version=${FASTANI_VER} | ||
LABEL description="Fast alignment-free computation of whole-genome Average Nucleotide Identity" | ||
LABEL website="https://github.com/ParBLiSS/FastANI" | ||
LABEL license="https://github.com/ParBLiSS/FastANI/blob/master/LICENSE" | ||
LABEL maintainer="Kelsey Florek" | ||
LABEL maintainer.email="kelsey.florek@slh.wisc.edu" | ||
LABEL maintainer2="Curtis Kapsak" | ||
LABEL maintainer2.email="kapsakcj@gmail.com" | ||
LABEL maintainer3="Kutluhan Incekara" | ||
LABEL maintainer3.email="kutluhan.incekara@ct.gov" | ||
|
||
# install dependencies; cleanup apt garbage | ||
RUN apt-get update && apt-get install --no-install-recommends -y \ | ||
wget \ | ||
unzip \ | ||
libgomp1 && \ | ||
apt-get clean && rm -rf /var/lib/apt/lists/* | ||
|
||
# download pre-compiled binary; unzip; put binary in /usr/local/bin | ||
# apt dependencies: libgomp1 unzip wget | ||
RUN wget --no-check-certificate https://github.com/ParBLiSS/FastANI/releases/download/${FASTANI_VER}/fastANI-Linux64-${FASTANI_VER}.zip && \ | ||
unzip fastANI-Linux64-${FASTANI_VER}.zip -d /usr/local/bin && \ | ||
rm fastANI-Linux64-${FASTANI_VER}.zip | ||
|
||
# default run command | ||
CMD fastANI -h | ||
|
||
# singularity compatibility | ||
ENV LC_ALL=C | ||
|
||
# set working directory | ||
WORKDIR /data | ||
|
||
## Test ## | ||
FROM app as test | ||
|
||
# download 2 genomes from fastANI GitHub; compare the 2; cat the output file | ||
RUN wget --no-check-certificate -P /data https://github.com/ParBLiSS/FastANI/raw/master/tests/data/Escherichia_coli_str_K12_MG1655.fna && \ | ||
wget --no-check-certificate -P /data https://github.com/ParBLiSS/FastANI/raw/master/tests/data/Shigella_flexneri_2a_01.fna && \ | ||
fastANI -q /data/Shigella_flexneri_2a_01.fna -r /data/Escherichia_coli_str_K12_MG1655.fna -o /data/fastANI-test-ShigellaFlexneri-EcoliK12.tsv && \ | ||
echo "output TSV from fastANI test:" && \ | ||
cat /data/fastANI-test-ShigellaFlexneri-EcoliK12.tsv | ||
|
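Review note on the test stage: it prints the output file, but `cat` succeeds on an empty file too, so the build would still pass if fastANI wrote no rows. The sketch below shows one way to make the check stricter, assuming ANI is in the third tab-delimited column. `assert_min_ani` and the threshold of 95 are illustrative assumptions, and the demo row uses made-up values.

```bash
#!/usr/bin/env bash
# Sketch: fail unless a fastANI output file is non-empty and every row has
# ANI (column 3) at or above a minimum. assert_min_ani is a hypothetical helper.
assert_min_ani() {
    local tsv="$1" min_ani="$2"
    # -s is true only when the file exists and is not empty
    [ -s "$tsv" ] || { echo "no fastANI output in $tsv" >&2; return 1; }
    # flag any row whose ANI is below the minimum; exit status 1 if one is found
    awk -F '\t' -v min="$min_ani" '$3 + 0 < min + 0 { bad = 1 } END { exit bad + 0 }' "$tsv"
}

# Usage with a made-up row (illustrative values, not a real fastANI result)
demo_out="$(mktemp)"
printf 'Shigella.fna\tEcoli.fna\t97.7\t1300\t1600\n' > "$demo_out"
assert_min_ani "$demo_out" 95 && echo "ANI check passed"
```

In a Dockerfile test stage the same logic could replace the final `cat`, so that a missing or low-identity result stops the build.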
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# fastANI container | ||
|
||
Main tool : [fastANI](https://github.com/ParBLiSS/FastANI) | ||
|
||
Full documentation: https://github.com/ParBLiSS/FastANI | ||
|
||
FastANI was developed for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as mean nucleotide identity of orthologous gene pairs shared between two microbial genomes. | ||
|
||
This docker image contains no references. | ||
|
||
## Example Usage | ||
|
||
```bash | ||
# query one genome against another genome | ||
fastANI -t 8 -q bacterial-genome1.fasta -r bacterial-genome2.fasta -o fastANI.out.tsv | ||
|
||
# query one genome against multiple reference genomes (requires a File Of FileNames as input; this image does not include RGDv2) | ||
fastANI -t 8 -q bacterial-genome.fasta --rl reference-list.txt -o fastANI.out.tsv | ||
``` |
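This image ships without reference genomes, so `--rl` needs a File Of FileNames that you build yourself. The following is a minimal sketch that mirrors the `ls *.fasta > FOFN` step used for the RGDv2 image. `make_fofn` is a hypothetical helper name, and the directory layout is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: write a File Of FileNames listing every FASTA file in one directory.
# make_fofn is a hypothetical helper; it assumes references end in ".fasta".
make_fofn() {
    local ref_dir="$1" fofn="$2"
    # find prints one path per line; sort keeps the list order reproducible
    find "$ref_dir" -maxdepth 1 -type f -name '*.fasta' | sort > "$fofn"
}

# Usage with empty placeholder files standing in for real genomes
demo_refs="$(mktemp -d)"
touch "$demo_refs/ref1.fasta" "$demo_refs/ref2.fasta" "$demo_refs/notes.txt"
make_fofn "$demo_refs" "$demo_refs/FOFN.txt"
cat "$demo_refs/FOFN.txt"

# The resulting list would then be passed to fastANI, for example:
# fastANI -t 8 -q bacterial-genome.fasta --rl "$demo_refs/FOFN.txt" -o fastANI.out.tsv
```

When running through Docker, the paths in the list need to be valid inside the container, for example under the `/data` working directory if the references are mounted there.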