Using the full pipeline with Snakemake
If you only have reads and references and need to generate a SAM file, the steps below show how to download and run an initial aligner: Bowtie2, Pufferfish, or minimap2.
To download all three initial aligners, run the following command.
bash install.sh
The script downloads each aligner in turn. If an error occurs, consult the installation page of the aligner that failed. To skip an aligner, comment out its lines in the install.sh file. If an aligner is already installed elsewhere, set BINARIES in the config file to the path of that aligner.
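Before editing BINARIES, it can help to see whether an aligner is already on your PATH and where it lives. The following is a minimal sketch, not part of MORA: `locate_aligner` is a hypothetical helper, and the executable names `bowtie2`, `pufferfish`, and `minimap2` are assumed to be the command names these tools install under.

```shell
# Hypothetical helper (not part of MORA): print the directory that holds an
# executable found on PATH, or return non-zero if it is not installed.
locate_aligner() {
    bin_path=$(command -v "$1") || return 1
    dirname "$bin_path"
}

# Assumed executable names; adjust if your installation uses different ones.
for aligner in bowtie2 pufferfish minimap2; do
    if dir=$(locate_aligner "$aligner"); then
        echo "$aligner found in $dir"
    else
        echo "$aligner not found on PATH"
    fi
done
```

The directory layout that MORA expects under BINARIES is defined by its Snakefile, so check it before pointing the config at a directory reported here.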
After installation finishes, update the configuration file (config/config.yaml) with the locations of the references, reads, and initial aligners. Other MORA parameters can also be changed in the same file if needed. The configuration parameters are listed below.
Parameter | Description |
---|---|
BINARIES | Directory containing the aligner binaries (default: binaries); edit if an aligner is located somewhere else |
REFERENCES | Path to the reference FASTA file |
SAMPLES_DIR | Path to the folder containing the query FASTA files |
RESULTS | Directory to write the results |
FILES_EXT | Extension of the query files, e.g. .fq or .fq.gz |
MAPPING_MODE | Aligner used for the initial mapping (pufferfish, bowtie2, or minimap2) |
STRATEGY | "PE" for paired-end samples or "SE" for single-end samples |
TYPE | RNA or DNA host-specific samples (currently only DNA is supported) |
MIN_CNT | Minimum number of counts for a reference to be considered valid |
MIN_SCORE_DIFFERENCE | Minimum score difference for a query to be assigned second |
MAX_ABUNDANCE_DIFFERENCE | Maximum difference allowed between the initial abundance estimation and the abundances created from assignments |
SEGMENT_SIZE | Size to split references into bins |
ABUNDANCE_OUTPUT | Whether to output estimated abundance levels |
TAXONOMY | Directory of taxonomic information used to report results with taxonomic classes (set to False to omit taxonomic information from the results) |
MEM_MB | Amount of memory to be allocated to Snakemake |
TPS | Number of threads to be used per sample |
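Several parameters in the table accept only a fixed set of values, so a quick check before launching can catch a typo early. This is a rough sketch rather than anything MORA provides: `check_choice` is a hypothetical helper, the values passed to it are placeholders to be replaced with those in your own config/config.yaml, and the allowed values are taken from the table above.

```shell
# Hypothetical helper (not part of MORA).
# Usage: check_choice NAME VALUE ALLOWED...
# Succeeds only if VALUE exactly matches one of the ALLOWED strings.
check_choice() {
    name=$1
    value=$2
    shift 2
    for allowed in "$@"; do
        if [ "$value" = "$allowed" ]; then
            return 0
        fi
    done
    echo "invalid $name: '$value' (expected one of: $*)" >&2
    return 1
}

# Placeholder values: replace the second argument with what your config uses.
check_choice MAPPING_MODE bowtie2 pufferfish bowtie2 minimap2
check_choice STRATEGY PE PE SE
check_choice TYPE DNA DNA
```

The comparison is an exact string match, which may be stricter than what MORA itself accepts.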
After updating the configuration, run the following command.
snakemake --snakefile MORA --cores 24 --resources mem_mb=140000
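The core count and memory limit in the command above suit a large machine and can be lowered for a smaller one. As a sketch, the hypothetical wrapper below (not part of MORA) assembles the same command with adjustable limits and appends any extra flags. Here it is used with `-n`, Snakemake's dry-run flag, which lists the planned jobs without executing them. The values 8 and 32000 are illustrative only.

```shell
# Hypothetical wrapper (not part of MORA): print the Snakemake command with
# the given core count and memory limit; any extra arguments are appended.
mora_command() {
    cores=$1
    mem_mb=$2
    shift 2
    echo "snakemake --snakefile MORA --cores $cores --resources mem_mb=$mem_mb${*:+ $*}"
}

# Print a dry-run command for a smaller machine (illustrative values).
mora_command 8 32000 -n
```

Run the printed command to preview the workflow, then run it again without `-n` to start the pipeline.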