Run VirFinder in parallel, saving both the scores and a FASTA file
Via conda:
conda install -c bioconda -c conda-forge parallel-virfinder
parallel-virfinder.py -i input.fasta -o output.csv -n THREADS [-f output.fasta]
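When calling the tool from a Python pipeline, the command line above can be assembled programmatically. The following is a minimal sketch, assuming `parallel-virfinder.py` is installed and available on `PATH`. `build_command` is a hypothetical helper name, and only flags listed in the tool's help output are used (`-n` for the number of parallel processes).

```python
import shlex
import subprocess
from typing import List, Optional


def build_command(fasta_in: str, csv_out: str, processes: int = 4,
                  fasta_out: Optional[str] = None) -> List[str]:
    """Assemble an argument list using only flags shown in the help text.

    build_command is a hypothetical helper, not part of parallel-virfinder.
    """
    cmd = ["parallel-virfinder.py", "-i", fasta_in, "-o", csv_out,
           "-n", str(processes)]
    if fasta_out is not None:
        # -f is optional: also save a FASTA file
        cmd += ["-f", fasta_out]
    return cmd


if __name__ == "__main__":
    cmd = build_command("input.fasta", "output.csv", processes=8,
                        fasta_out="output.fasta")
    # Print the command so it can be inspected before running it
    print(shlex.join(cmd))
    # Uncomment to actually run it (requires the tool to be installed):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.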
usage: parallel-virfinder.py [-h] -i INPUT -o OUTPUT [-f FASTA] [-n PARALLEL] [-t TMPDIR] [-s MIN_SCORE] [-p MAX_P_VALUE] [--no-check] [-v] [-d]

Execute virfinder on a FASTA file in parallel

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input FASTA file
  -o OUTPUT, --output OUTPUT
                        Output CSV file
  -f FASTA, --fasta FASTA
                        Save FASTA file
  -n PARALLEL, --parallel PARALLEL
                        Number of parallel processes [default: 4]
  -t TMPDIR, --tmpdir TMPDIR
                        Temporary directory [default: /tmp]

VirFinder options:
  -s MIN_SCORE, --min-score MIN_SCORE
                        Minimum score [default: 0.9]
  -p MAX_P_VALUE, --max-p-value MAX_P_VALUE
                        Maximum p-value [default: 0.05]

Running options:
  --no-check            Do not check dependencies at startup
  -v, --verbose         Verbose output
  -d, --debug           Debug output and do not remove temporary files
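To illustrate what the `-s` and `-p` thresholds mean, here is a minimal sketch of filtering a table of predictions by minimum score and maximum p-value. It is not the wrapper's own code. It assumes the CSV has columns named `score` and `pvalue` (the actual output header may differ), that a sequence must pass both thresholds, and that the comparisons are inclusive. `filter_predictions` is a hypothetical name and the example rows are made up.

```python
import csv
import io
from typing import Dict, Iterable, List


def filter_predictions(rows: Iterable[Dict[str, str]],
                       min_score: float = 0.9,
                       max_p_value: float = 0.05) -> List[Dict[str, str]]:
    """Keep rows with score >= min_score and pvalue <= max_p_value.

    Column names 'score' and 'pvalue' are assumptions about the CSV layout.
    """
    kept = []
    for row in rows:
        if float(row["score"]) >= min_score and float(row["pvalue"]) <= max_p_value:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up rows for illustration only, not real VirFinder output
    example = io.StringIO(
        "name,length,score,pvalue\n"
        "contig_1,5000,0.97,0.01\n"
        "contig_2,3200,0.95,0.20\n"
        "contig_3,4100,0.40,0.03\n"
    )
    kept = filter_predictions(csv.DictReader(example))
    print([row["name"] for row in kept])
```

With the default thresholds, only rows that satisfy both conditions are retained; loosening either `min_score` or `max_p_value` lets more sequences through.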
Clone this repository, activate the conda environment and run:
# Activate the appropriate conda environment, if needed
bash test/test.sh
Compared with a parallel implementation in R, this wrapper performs better (shorter run times, lower memory usage). See the benchmark.
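One common way to parallelize a per-sequence tool is to split the input FASTA into chunks, score each chunk independently, and merge the results. The sketch below illustrates that general pattern only; whether this wrapper works exactly this way is an assumption. All function names are hypothetical, `score_chunk` is a placeholder that returns sequence lengths rather than VirFinder scores, and a thread pool stands in for separate processes to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (name, sequence) pairs."""
    records: List[Record] = []
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            # Use the first word of the header as the sequence name
            name, parts = (line[1:].split() or [""])[0], []
        elif name is not None:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))
    return records


def split_round_robin(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute records over at most n_chunks non-empty chunks."""
    n_chunks = max(1, min(n_chunks, len(records) or 1))
    return [records[i::n_chunks] for i in range(n_chunks)]


def score_chunk(chunk: List[Record]) -> List[Tuple[str, int]]:
    """Placeholder for the real scoring step: returns sequence lengths."""
    return [(name, len(seq)) for name, seq in chunk]


def run_in_parallel(records: List[Record], workers: int = 4) -> List[Tuple[str, int]]:
    """Score each chunk concurrently and merge the per-chunk results."""
    chunks = split_round_robin(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        merged = [item for part in pool.map(score_chunk, chunks) for item in part]
    return merged


if __name__ == "__main__":
    fasta = ">a first\nACGT\nAC\n>b\nGG\n>c\nT\n"
    for name, value in sorted(run_in_parallel(parse_fasta(fasta), workers=2)):
        print(name, value)
```

Round-robin splitting keeps chunk sizes balanced by record count, at the cost of changing the order of the merged results, which is why the usage example sorts them.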
If you use parallel-virfinder, please cite the following paper:
- Ren, J., Ahlgren, N. A., Lu, Y. Y., Fuhrman, J. A., & Sun, F. (2017). VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome, 5(1), 1-20.
- Telatin, A., Fariselli, P., & Birolo, G. (2021). SeqFu: a suite of utilities for the robust and reproducible manipulation of sequence files. Bioengineering, 8(5), 59.
VirFinder (see license) is free for academic or non-commercial use only. SeqFu and this wrapper are free to use.