ARG identification from nanopore 1D/2D reads
ARGpore2 is an easy-to-use bioinformatics pipeline that codifies the current best practice for identifying antibiotic resistance genes (ARGs) and their host populations from nanopore reads (reads longer than 1 kb, in FASTA format).
Please read the instructions below carefully to avoid unnecessary errors.
ARGpore2 depends on:
python2.7 ### sudo apt install python2.7
GNU parallel ### sudo apt install parallel
git lfs ### sudo apt install git-lfs
R with the libraries plyr, data.table, doParallel, foreach ### in R: install.packages(c("plyr", "data.table", "doParallel", "foreach"))
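Before running setup.sh, a quick check along these lines can report which prerequisites are still missing. This is a minimal sketch, not part of ARGpore2: the helper names `check_tools` and `check_r_packages` are hypothetical, and it assumes the installed command names are `python2.7`, `parallel`, `git-lfs` and `Rscript`.

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite check (not part of ARGpore2).
# Command names below are ASSUMED to match what the packages above install.

# Print each command that is not found on PATH; return 1 if any is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Print each R package that cannot be loaded; return 1 if any is missing.
# Package names are passed to R as trailing arguments of Rscript.
check_r_packages() {
  command -v Rscript >/dev/null 2>&1 || { echo "missing tool: Rscript"; return 1; }
  Rscript -e 'pkgs <- commandArgs(trailingOnly = TRUE); ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE); for (p in pkgs[!ok]) cat("missing R package:", p, "\n"); quit(status = if (all(ok)) 0 else 1)' "$@"
}

# "|| true" keeps the script going so every missing item gets reported.
check_tools python2.7 parallel git-lfs || true
check_r_packages plyr data.table doParallel foreach || true
```

Any line starting with "missing" points to a dependency to install before running setup.sh.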
git clone https://github.com/sustc-xylab/ARGpore2.git
cd ARGpore2
bash ./setup.sh
setup.sh installs blast+ and Centrifuge, then downloads the bacteria+archaea+virus database for Centrifuge, the MetaPhlan2 marker-gene database and the PLSDB database for you. Setup takes at least 4 hours to finish, so please be patient :)
Once ARGpore2 is installed, the whole analysis is wrapped in a single executable, argpore.sh. Please run argpore.sh with bash, not sh.
NOTICE: To prevent intermediate files of different runs from overwriting each other, give each ARGpore2 run its own working directory. To improve annotation accuracy, the reads in your input FASTA should be longer than 1 kb.
mkdir -p demo
cp test.fa demo
cd demo
bash $PATH_to_ARGpore2/argpore.sh -f test.fa -t 60 > ARGpore.log
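The 1 kb recommendation can be applied before a run with a small length filter. The sketch below is an illustration under stated assumptions, not part of ARGpore2: `filter_fasta_by_length` is a hypothetical helper, it keeps reads of at least 1,000 bp by default, and it writes each sequence unwrapped on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ARGpore2): keep FASTA records whose
# sequence is at least MIN_LEN bp long (default 1000). Reads stdin,
# writes stdout; sequences are written unwrapped, one line per record.
filter_fasta_by_length() {
  local min_len="${1:-1000}"
  awk -v min="$min_len" '
    # Print the buffered record if its sequence is long enough.
    function flush() {
      if (header != "" && length(seq) >= min + 0) print header "\n" seq
    }
    /^>/ { flush(); header = $0; seq = ""; next }
    # Sequence line: drop spaces, tabs and carriage returns, then append.
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { flush() }
  '
}

# Usage sketch: one fresh working directory per run, as the NOTICE advises.
# mkdir -p run1
# filter_fasta_by_length 1000 < test.fa > run1/test.1kb.fa
# cd run1 && bash "$PATH_to_ARGpore2/argpore.sh" -f test.1kb.fa -t 60 > ARGpore.log
```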
All output files of an ARGpore2 run are stored in a folder named $INPUT_FASTA_ARGpore_nowtime in the working directory.
Main output files include:
input_arg.tab: ARG quantification (number of reads)
input_arg.w.taxa.tab: ARG-containing nanopore reads with taxonomy assignment and putative plasmid identification
input_plasmid.like.tab: identified plasmid-like nanopore reads
input_taxa.tab: taxonomy assignment of all nanopore reads
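To locate the results of the most recent run, something like the following may help. It is a sketch, not part of ARGpore2: `latest_argpore_dir` is a hypothetical helper, and it assumes the output folder name matches the glob `*_ARGpore_*` (from the $INPUT_FASTA_ARGpore_nowtime pattern) and that the table names end in the suffixes listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ARGpore2): print the most recently
# modified folder in DIR (default: current directory) whose name matches
# *_ARGpore_*, the ASSUMED shape of $INPUT_FASTA_ARGpore_nowtime.
latest_argpore_dir() {
  local dir="${1:-.}"
  # -t sorts newest first; -d lists the folders themselves, not their contents.
  ls -td -- "$dir"/*_ARGpore_*/ 2>/dev/null | head -n 1
}

# Usage sketch, assuming the "input" prefix of each table name is replaced
# by your input file name (e.g. test.fa_arg.tab); adjust the glob if not.
# out="$(latest_argpore_dir)"
# head "$out"*_arg.tab
```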
Plasmid-like nanopore reads are identified by querying the reads against PLSDB with LAST; only an alignment with > 70% similarity over 70% of its length to a known plasmid in PLSDB is counted as a valid plasmid hit.
NOTICE: This method cannot fully distinguish plasmids from chromosomes, so input_plasmid.like.tab reports only plasmid-like nanopore reads. A plasmid-like read that also appears circular is more likely to be a genuine plasmid; you may use ccontigs (https://github.com/Microbiology/ccontigs.git) to check whether nanopore reads are circular. Alternatively, you may use PlasFlow (https://github.com/smaegol/PlasFlow) to double-check these plasmid-like reads by comparing their k-mer frequencies with those of known plasmids.
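The hit-acceptance rule can be illustrated with a short filter. This is a hypothetical sketch, not ARGpore2's implementation: `plasmid_like_reads` is an invented helper, it reads a made-up tab-separated table (read_id, plasmid_id, identity_pct, aligned_len, read_len) rather than LAST's native output, and it assumes "its length" refers to the read, so coverage is aligned_len / read_len.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the PLSDB hit rule (not ARGpore2's own code).
# Input on stdin, tab-separated, in an ASSUMED format:
#   read_id  plasmid_id  identity_pct  aligned_len  read_len
# A read is printed once if any of its hits has identity > MIN_ID (%) and
# covers > MIN_COV (%) of the read (coverage = aligned_len / read_len, ASSUMED).
plasmid_like_reads() {
  local min_id="${1:-70}" min_cov="${2:-70}"
  awk -F'\t' -v min_id="$min_id" -v min_cov="$min_cov" '
    # Skip rows without a usable read length (e.g. a header line).
    ($5 + 0) <= 0 { next }
    {
      cov = 100 * $4 / $5
      # Strict ">" on both thresholds; print each qualifying read only once.
      if (($3 + 0) > (min_id + 0) && cov > (min_cov + 0) && !seen[$1]++) print $1
    }
  '
}
```

For example, a hit with exactly 70% identity is rejected, because the rule asks for more than 70%.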
Taxonomy of nanopore reads is assigned by combining the results of Centrifuge and the MetaPhlan2 marker-gene database. To maximize the fraction of classified reads, ARGpore2 merges both results; where the two annotations disagree, Centrifuge takes priority (Centrifuge > markergene).
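The priority rule can be sketched as a merge in which Centrifuge calls are read first and marker-gene calls only fill the gaps. The two-column input format, the treatment of `unclassified` as no call, and the helper name `merge_taxa` are assumptions for illustration; ARGpore2's actual intermediate files and merging code may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "Centrifuge > markergene" priority rule.
# Inputs (ASSUMED format): two tab-separated files with read_id and taxon,
# one from Centrifuge and one from the marker-gene search. An empty taxon
# or the word "unclassified" is treated as no call.
# Output: read_id, taxon, source for every read classified by either tool.
merge_taxa() {
  local centrifuge_tsv="$1" marker_tsv="$2"
  awk -F'\t' -v OFS='\t' '
    function called(t) { return t != "" && t != "unclassified" }
    # Only the first call per read is kept. The Centrifuge file is read
    # first, so a read it classifies is never overwritten; the marker-gene
    # file only fills reads that are still unassigned.
    called($2) && !($1 in taxon) {
      taxon[$1] = $2
      src[$1] = (FILENAME == ARGV[1]) ? "centrifuge" : "markergene"
      order[++n] = $1
    }
    END { for (i = 1; i <= n; i++) print order[i], taxon[order[i]], src[order[i]] }
  ' "$centrifuge_tsv" "$marker_tsv"
}
```

Reading the Centrifuge table first encodes the Centrifuge > markergene ordering, while reads that only the marker-gene search classifies are still kept, which is what raises the classification ratio.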
If you use ARGpore2 to analyze your nanopore datasets, please cite:
Ziqi Wu, You Che, Chenyuan Dang, Miao Zhang, Xuyang Zhang, Yuhong Sun, Xiang Li, Tong Zhang, Yu Xia*. 2022. Nanopore-based Long-read Metagenomics Uncover the Resistome Intrusion by Antibiotic Resistant Bacteria from Treated Wastewater in Receiving Water Body. Water Research. 226:119282. (https://doi.org/10.1016/j.watres.2022.119282)
ARGpore2 makes use of the following tools and databases: last, blast+, Centrifuge, MetaPhlan2, PLSDB, GNU parallel, R, python