Document version 1.1 // 18.01.2023
This document can be found at https://github.com/NIB-SI/ARMOR/README-NIB-SI.md.
Please read this entire document at least once before starting - it is not long. These instructions are specific to our servers and are meant to make running ARMOR there easier.
If you run into issues, let Carissa know (Slack/email). Do not delete your project folder; the logs will be needed for debugging.
mkdir projectX && cd projectX
cp -r --no-preserve=mode /DKHA/proj/conda/armor/ARMOR-bare/* .
<configure data and config file>
conda activate snakemake
snakemake --use-conda --conda-prefix /swalt/conda/envs/armor/ -j<num cores>
The ARMOR GitHub repo has been forked from https://github.com/csoneson/ARMOR to https://github.com/NIB-SI/ARMOR. The fork addresses dependency issues that could not be resolved in the original version. Feel free to make a pull request to the NIB-SI repo if you have improvements.
On Heron the NIB-SI fork has been downloaded to
/DKHA/proj/conda/armor/ARMOR/
It is very unlikely that you will need to use the downloaded repo directly or to re-download it.
The Snakefile contains a number of rules (workflow steps) that call applications. These applications are managed in conda environments: one for shell applications (defined in ./envs/environment_shell.yaml) and one for R (defined in ./envs/environment_R.yaml). Each new snakemake --use-conda call in a new working directory will recreate these environments. To prevent this waste of resources (and installation time), pass a common directory for the conda environments with the --conda-prefix option each time (the environments are already installed in the /swalt/conda folder). This way, the environments only need to be recreated when the environment yaml files are edited.
With the changes made in the NIB-SI fork, the workflow effectively has to be run using conda. (Probably; running it without conda has not been tested.)
You will notice the environment names are a bit odd - they are automatically generated using a hash of the environment file, and if the environment file is changed, the hash changes, and so Snakemake knows to re-create the environment. Neat!
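The idea behind hash-based names can be illustrated with a small sketch. It does not reproduce Snakemake's actual naming scheme (the exact hash function and its inputs are not covered here); env_name_for is a hypothetical helper that simply derives a short name from the file contents with sha256sum.

```shell
# Hypothetical illustration: derive a short, content-based name for an
# environment file. This is NOT Snakemake's real naming scheme; it only shows
# why editing the yaml file leads to a different environment name.
env_name_for() {
    local yaml_file="$1"
    # sha256sum prints "<hash>  <filename>"; keep the first 8 hex characters.
    sha256sum "$yaml_file" | cut -c1-8
}
```

Calling env_name_for ./envs/environment_R.yaml before and after editing the file would print two different names, while an unchanged file always maps to the same name.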
Snakemake provides a ton of logging (errors, versions, output, progress, ...) in a .snakemake folder. Therefore it is recommended to have a new, clean folder for each ARMOR project.
A number of files are needed in the project folder. To keep things simple, a clean project "startup" folder with only the necessary files is in /DKHA/proj/conda/armor/ARMOR-bare/.
DO NOT edit or run snakemake in this folder. This folder and all its contents have been made read-only to prevent inadvertently corrupting it for new projects. Instead, make your own ARMOR project folder and copy the contents there. Let's pretend to be working on "projectX". The first step is:
mkdir projectX && cd projectX
cp -r --no-preserve=mode /DKHA/proj/conda/armor/ARMOR-bare/* .
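Note that the --no-preserve=mode option is needed so that the copied files get the default permissions of the destination rather than the read-only permissions of the source folder.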
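The two setup commands above can be wrapped in a small function so that an existing project is never overwritten by accident. This is only a sketch: new_armor_project is a hypothetical name, and the second argument (an alternative template folder) is an addition for illustration. It assumes GNU cp, as the original command does.

```shell
# Hypothetical helper: create a new ARMOR project folder from the read-only
# template. The default template path is the one given in this document.
new_armor_project() {
    local project_dir="$1"
    local template_dir="${2:-/DKHA/proj/conda/armor/ARMOR-bare}"
    # Never copy into a path that already exists.
    if [ -e "$project_dir" ]; then
        echo "Refusing to overwrite existing path: $project_dir" >&2
        return 1
    fi
    mkdir -p "$project_dir" || return 1
    # --no-preserve=mode: copied files get default (writable) permissions.
    cp -r --no-preserve=mode "$template_dir"/* "$project_dir"/
}
```

For example, new_armor_project projectX && cd projectX has the same effect as the two commands above.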
Next, you need to configure the ARMOR workflow to your needs. This includes the input data paths, the output data location, and parameters to the sub-calls. This is done by editing projectX/config.yaml (it is currently set up to run an example dataset). For complete instructions regarding the config file see https://github.com/csoneson/ARMOR/wiki/The-config.yaml-configuration-file and for the input files see https://github.com/csoneson/ARMOR/wiki/Preparing-the-input-files.
For example, make new folders within projectX (perhaps input_data and armor_output), put the necessary input files (or symlinks to them) there, and edit the paths in the following sections of config.yaml:
- Paths to existing reference files
- Paths to indexes that will be generated by the workflow
- Path to metadata text file.
- Path to a folder containing gzipped fastq files, and the file suffix (typically, either fastq or fq).
- Path to a folder that will store the output generated by the workflow.
(See below for a quick way to find and replace in vim.)
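The folder layout suggested above could be prepared along these lines. This is a sketch, not part of ARMOR: link_fastq_inputs is a hypothetical helper, the folder names input_data and armor_output are just the examples from the text, and the paths in config.yaml still have to be edited by hand afterwards.

```shell
# Hypothetical helper: create input/output folders inside a project and
# symlink gzipped FASTQ files into input_data instead of copying them.
# The suffix defaults to "fastq"; pass "fq" as third argument if needed.
link_fastq_inputs() {
    local source_dir="$1" project_dir="$2" suffix="${3:-fastq}"
    local abs_source fastq_file
    # Resolve to an absolute path so the symlinks stay valid from anywhere.
    abs_source="$(cd "$source_dir" && pwd)" || return 1
    mkdir -p "$project_dir/input_data" "$project_dir/armor_output" || return 1
    for fastq_file in "$abs_source"/*."$suffix".gz; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$fastq_file" ] || continue
        ln -s "$fastq_file" "$project_dir/input_data/"
    done
}
```

Symlinking avoids duplicating large sequencing files, at the cost that the links break if the original files are moved.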
To resolve dependencies, conda is used as the environment/package manager.
Before running snakemake activate the snakemake environment:
conda activate snakemake
The snakemake environment contains basic functions and the Snakemake tool. It was already created on Heron using the command:
mamba env create -f ARMOR/envs/snakemake.yaml
Within the snakemake environment, run the workflow with the following command:
snakemake --use-conda --conda-prefix /swalt/conda/envs/armor/ -j<num cores>
(replacing <num cores> with your desired number of cores).
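To make the <num cores> substitution concrete, the sketch below prints the full command for a given core count and rejects anything that is not a positive integer. armor_command is a hypothetical helper; it only prints the command and does not run snakemake.

```shell
# Hypothetical helper: print the snakemake command for a given core count.
# The shared environment path is the one used throughout this document.
armor_command() {
    local cores="$1"
    case "$cores" in
        ''|*[!0-9]*|0)
            echo "Usage: armor_command <num cores> (positive integer)" >&2
            return 1
            ;;
    esac
    echo "snakemake --use-conda --conda-prefix /swalt/conda/envs/armor/ -j$cores"
}
```

For instance, armor_command 8 prints the command with -j8. Adding snakemake's -n (dry run) flag to the printed command lists the planned jobs without executing them, which is a cheap way to check the configuration first.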
As mentioned before, the workflow depends on two other environments. These are already installed to /swalt/conda/envs/armor/, and this path is defined with the --conda-prefix argument to snakemake. Snakemake takes care of activating these as needed.
If you have any issues with the environments (e.g. permissions after changing the environment.yaml file), you can drop the --conda-prefix argument, and snakemake will create the environments locally (e.g. in projectX/.snakemake). Usually this takes an unnecessary amount of time and redundant space, hence the shared environments in swalt.
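A minimal sketch of this fallback: use the shared prefix when the folder is accessible, otherwise pass nothing so that snakemake creates the environments locally. conda_prefix_option is a hypothetical helper, and it only covers the case where the shared folder cannot be accessed at all; permission problems that appear after editing an environment file still call for dropping the argument by hand.

```shell
# Hypothetical helper: print "--conda-prefix <dir>" if the shared environment
# folder exists and can be read and entered; print nothing otherwise.
conda_prefix_option() {
    local shared_dir="${1:-/swalt/conda/envs/armor/}"
    if [ -d "$shared_dir" ] && [ -r "$shared_dir" ] && [ -x "$shared_dir" ]; then
        echo "--conda-prefix $shared_dir"
    else
        echo "Shared environments not accessible, falling back to local ones" >&2
    fi
}
```

It could be used as snakemake --use-conda $(conda_prefix_option) -j<num cores>, leaving the substitution unquoted so that the option and its path are passed as two separate arguments.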
The config.yaml file in /DKHA/proj/conda/armor/ARMOR-bare/ is set up to run the "example_data" test dataset.
If you want to run it, copy the dataset to your project folder (after copying ARMOR-bare):
cd projectX
cp -r /DKHA/proj/conda/armor/ARMOR/example_data .
and run snakemake as above.
If the workflow crashes and you want to continue the workflow from the last successful point/rule, use the --ri (rerun-incomplete) option, i.e.:
snakemake --use-conda --conda-prefix /swalt/conda/envs/armor -j20 --ri
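Before rerunning after a crash, it usually helps to look at the most recent log. The sketch below prints the newest file in the log folder of a project; latest_snakemake_log is a hypothetical helper, and the .snakemake/log location is an assumption about how Snakemake lays out its folder, so check your own project folder if nothing is found.

```shell
# Hypothetical helper: print the path of the newest file in
# <project>/.snakemake/log (assumed log location).
latest_snakemake_log() {
    local project_dir="${1:-.}"
    local log_dir="$project_dir/.snakemake/log"
    local newest
    if [ ! -d "$log_dir" ]; then
        echo "No log folder at $log_dir" >&2
        return 1
    fi
    # ls -t sorts by modification time, newest first.
    newest="$(ls -t "$log_dir" | head -n 1)"
    [ -n "$newest" ] || return 1
    echo "$log_dir/$newest"
}
```

For example, less "$(latest_snakemake_log projectX)" opens the latest log, which is also the file to send along when reporting a problem.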
In vim, in command mode (press Esc), use the following command to find and replace all occurrences:
:%s/<search_term>/<replace_term>/<option>
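(Do not forget the :%)
<option> can be one or more of:
- g - replace all occurrences on each line (without it, only the first occurrence on each line is replaced)
- c - confirm before each replace
- i - ignore case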
For example, to replace all "example_data" occurrences with "adapt_data" as the input folder path in config.yaml, confirming each change, you can use:
:%s/example_data/adapt_data/gc
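The same replacement can also be done outside vim. The sketch below uses sed instead (a different tool, shown only as an alternative); replace_in_file is a hypothetical helper. Unlike the c option in vim there is no confirmation step, so the original file is kept as a .bak copy.

```shell
# Hypothetical helper: replace every occurrence of one string with another in
# a file, keeping the original as <file>.bak. Uses sed rather than vim.
# Assumes neither string contains "#" or characters special to sed.
replace_in_file() {
    local search="$1" replacement="$2" file="$3"
    # "#" is used as the delimiter so that paths containing "/" work as-is.
    sed -i.bak "s#$search#$replacement#g" "$file"
}
```

For example, replace_in_file example_data adapt_data config.yaml changes all occurrences at once; comparing config.yaml with config.yaml.bak (e.g. with diff) shows exactly what was changed.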