Using the full pipeline with Snakemake

Andrew Zheng edited this page May 20, 2023 · 8 revisions

If you have only reads and references and need to generate a SAM file, the steps below show how to download and use an initial aligner such as Bowtie2, Pufferfish, or minimap2.

Installing Initial Aligners

Run the following command to download all three initial aligners.

bash install.sh

The script downloads the aligners one at a time. If an error occurs, consult the installation page of the aligner that failed. To skip an aligner, open install.sh and comment out the corresponding lines. If an aligner is already installed elsewhere, set the BINARIES path in the config file to its location.

Update Configurations

After installation finishes, update the configuration file (config/config.yaml) with the directories of all references, reads, and initial aligners. You can also adjust the MORA parameters in the same file if needed. The configuration parameters are listed below.

Config File

Parameter                 Description
BINARIES                  Directory containing the aligner binaries (default: binaries); edit if the aligners are located somewhere else
REFERENCES                Directory to reference FASTA file
SAMPLES_DIR               Directory containing the query FASTA files
RESULTS                   Directory to write the results
FILES_EXT                 Query file extension, e.g. .fq or .fq.gz
MAPPING_MODE              Algorithm for the initial mapping (pufferfish, bowtie2, or minimap2)
STRATEGY                  "PE" for paired-end samples or "SE" for single-end samples
TYPE                      RNA or DNA host-specific samples (currently only DNA is supported)
MIN_CNT                   Minimum number of counts for a reference to be considered valid
MIN_SCORE_DIFFERENCE      Minimum score difference for a query to be assigned second
MAX_ABUNDANCE_DIFFERENCE  Maximum difference allowed between the initial abundance estimation and the abundances created from assignments
SEGMENT_SIZE              Size to split references into bins
ABUNDANCE_OUTPUT          Whether to output estimated abundance levels
TAXONOMY                  Directory of taxonomic information to write results with taxonomic classes (False to not include taxonomic information in the results)
MEM_MB                    Amount of memory to be allocated to Snakemake
TPS                       Number of threads to be used per sample
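
A quick sanity check of the configuration before a long run can catch a misspelled MAPPING_MODE or a missing entry. The following is a minimal sketch, not part of MORA: the helper name validate_config is hypothetical, it assumes the config has already been loaded into a Python dict, it treats every parameter in the table as required, and it assumes TYPE is spelled "DNA". All values in example_config are illustrative placeholders rather than MORA defaults (except BINARIES, whose default is given in the table).

```python
# Hypothetical helper, not part of MORA: checks a config mapping against
# the constraints stated in the parameter table above.

# Assumption: every parameter listed in the table must be present.
EXPECTED_KEYS = [
    "BINARIES", "REFERENCES", "SAMPLES_DIR", "RESULTS", "FILES_EXT",
    "MAPPING_MODE", "STRATEGY", "TYPE", "MIN_CNT", "MIN_SCORE_DIFFERENCE",
    "MAX_ABUNDANCE_DIFFERENCE", "SEGMENT_SIZE", "ABUNDANCE_OUTPUT",
    "TAXONOMY", "MEM_MB", "TPS",
]
VALID_MAPPING_MODES = {"pufferfish", "bowtie2", "minimap2"}
VALID_STRATEGIES = {"PE", "SE"}


def validate_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in config:
            problems.append(f"missing parameter: {key}")

    mode = config.get("MAPPING_MODE")
    if mode is not None and mode not in VALID_MAPPING_MODES:
        problems.append(f"unknown MAPPING_MODE: {mode!r}")

    strategy = config.get("STRATEGY")
    if strategy is not None and strategy not in VALID_STRATEGIES:
        problems.append(f"STRATEGY must be 'PE' or 'SE', got {strategy!r}")

    # Assumption: the supported sample type is written as "DNA".
    sample_type = config.get("TYPE")
    if sample_type is not None and str(sample_type).upper() != "DNA":
        problems.append(f"only DNA samples are supported, got {sample_type!r}")

    return problems


# Placeholder values for illustration only; paths and numbers are made up.
example_config = {
    "BINARIES": "binaries",
    "REFERENCES": "data/references",
    "SAMPLES_DIR": "data/samples",
    "RESULTS": "results",
    "FILES_EXT": ".fq",
    "MAPPING_MODE": "bowtie2",
    "STRATEGY": "PE",
    "TYPE": "DNA",
    "MIN_CNT": 1,
    "MIN_SCORE_DIFFERENCE": 1,
    "MAX_ABUNDANCE_DIFFERENCE": 0.1,
    "SEGMENT_SIZE": 1000,
    "ABUNDANCE_OUTPUT": True,
    "TAXONOMY": False,
    "MEM_MB": 140000,
    "TPS": 4,
}

for problem in validate_config(example_config):
    print(problem)
```

With the placeholder values above, the loop prints nothing because no problem is found. In practice the dict would come from parsing config/config.yaml with a YAML library of your choice.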

Running Snakemake

After updating the configuration, run the following command.

snakemake --snakefile MORA --cores 24 --resources mem_mb=140000
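
The core count and memory limit in the command above usually need adjusting to the machine, and Snakemake's --dry-run flag lists the planned jobs without executing them. As a rough sketch, the command line could be assembled like this; the helper name build_snakemake_command is hypothetical and not part of MORA, and the values 24 and 140000 are simply the ones from the command above.

```python
import shlex


def build_snakemake_command(cores, mem_mb, dry_run=False):
    """Assemble the Snakemake invocation from this page as an argument list."""
    command = [
        "snakemake",
        "--snakefile", "MORA",
        "--cores", str(cores),
        "--resources", f"mem_mb={mem_mb}",
    ]
    if dry_run:
        # --dry-run makes Snakemake show what would run without running it.
        command.append("--dry-run")
    return command


# Print the command as a shell-ready string.
print(shlex.join(build_snakemake_command(24, 140000)))
# prints: snakemake --snakefile MORA --cores 24 --resources mem_mb=140000
```

Passing dry_run=True appends --dry-run, which is a cheap way to check the configuration before committing to a full run. The returned list can be handed to subprocess.run directly.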