ARMOR

Document version 1.1 // 18.01.2023

This document can be found at https://github.com/NIB-SI/ARMOR/README-NIB-SI.md.

Please read this entire document at least once before starting; it is not long. These instructions are specific to our servers and are meant to make running ARMOR there easier.

If you have issues, let Carissa know (Slack/email). Do not delete your project folder: the logs will be needed for debugging.

Summary

mkdir projectX && cd projectX
cp -r --no-preserve=mode /DKHA/proj/conda/armor/ARMOR-bare/* .
<configure data and config file>
conda activate snakemake
snakemake --use-conda --conda-prefix /swalt/conda/envs/armor/ -j<num cores>

NIB-SI/ARMOR

The ARMOR GitHub repo has been forked from https://github.com/csoneson/ARMOR to https://github.com/NIB-SI/ARMOR to address dependency issues that could not be resolved in the original version. Feel free to make a pull request to the NIB-SI repo if you have improvements.

On Heron, the NIB-SI fork has been downloaded to /DKHA/proj/conda/armor/ARMOR/.

You are very unlikely to need to use this downloaded copy directly or to download the repo again.

Snakemake and conda

The Snakefile contains a number of rules (workflow steps) that call applications. These applications are managed in conda environments: one for shell applications (defined in ./envs/environment_shell.yaml) and one for R (defined in ./envs/environment_R.yaml). By default, every snakemake --use-conda call in a new working directory will recreate these environments. To avoid this waste of resources (and installation time), point every run at a common directory for the conda environments with --conda-prefix (they are already installed under /swalt/conda). This way, the environments only need to be recreated when the environment yaml files are edited.

Because of the changes made in the NIB-SI fork, the workflow "has" to be run using conda. (Probably; I haven't checked whether it works otherwise.)

You will notice that the environment names are a bit odd: they are generated automatically from a hash of the environment file. If the environment file changes, the hash changes, and Snakemake knows to re-create the environment. Neat!

Running ARMOR

1. The project folder

Snakemake writes a lot of logging (errors, versions, output, progress, ...) to a .snakemake folder in the directory where it is run. It is therefore recommended to use a new, clean folder for each ARMOR project.
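When something breaks, the log of the failed run is usually the first thing to look at (and to send along). As far as we know, Snakemake writes one log per run to .snakemake/log/ in the project folder, with names ending in .snakemake.log; check your own folder if yours differs. A minimal sketch of a helper that prints the newest one (the function name latest_snakemake_log is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the most recent Snakemake run log.
# Assumes Snakemake's default log location, .snakemake/log/, inside the
# project folder given as $1 (defaults to the current directory).
latest_snakemake_log() {
    local project_dir="${1:-.}"
    local log_dir="$project_dir/.snakemake/log"
    if [ ! -d "$log_dir" ]; then
        echo "no $log_dir folder found" >&2
        return 1
    fi
    local newest
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$log_dir"/*.snakemake.log 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no Snakemake logs in $log_dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```

For example, less "$(latest_snakemake_log projectX)" opens the most recent log of projectX.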

A number of files are needed in the project folder. To keep things simple, a clean project "startup" folder with only the necessary files is in /DKHA/proj/conda/armor/ARMOR-bare/.

DO NOT edit or run snakemake in this folder. This folder and all its contents have been made read-only to prevent inadvertently corrupting it for new projects. Instead, make your own ARMOR project folder and copy the contents there. Let's pretend to be working on "projectX". The first step is:

mkdir projectX && cd projectX
cp -r --no-preserve=mode /DKHA/proj/conda/armor/ARMOR-bare/* .

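Note: the --no-preserve=mode option is needed so that the copied files get the default permissions of the destination (i.e. writable by you) instead of inheriting the read-only permissions of ARMOR-bare.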
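To see what the option changes, here is a small self-contained demonstration in a throwaway folder (GNU cp as found on Linux is assumed; the folder names and file content are made up). The read-only file keeps its read-only mode in a plain copy but gets the default mode in the --no-preserve=mode copy:

```shell
#!/usr/bin/env bash
# Demonstrate cp --no-preserve=mode in a temporary folder (GNU coreutils assumed).
umask 022                          # new files default to rw-r--r-- (644)
demo=$(mktemp -d)
mkdir "$demo/bare" "$demo/plain_copy" "$demo/mode_copy"
echo "placeholder: example_data" > "$demo/bare/config.yaml"   # made-up content
chmod 444 "$demo/bare/config.yaml" # read-only, like the files in ARMOR-bare

cp -r "$demo/bare/"* "$demo/plain_copy/"                      # keeps the read-only mode
cp -r --no-preserve=mode "$demo/bare/"* "$demo/mode_copy/"    # uses the default mode

# Print the octal mode of each copy.
stat -c '%a %n' "$demo/plain_copy/config.yaml" "$demo/mode_copy/config.yaml"
```

With umask 022, stat should report 444 for the plain copy and 644 for the --no-preserve=mode copy. Remove the demo folder with rm -rf "$demo" when done.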

2. Configuration

Next, you need to configure the ARMOR workflow for your project: the input data paths, the output data location, and the parameters passed to the sub-calls. This is done by editing projectX/config.yaml (it is currently set up to run an example dataset). For complete instructions regarding the config file see https://github.com/csoneson/ARMOR/wiki/The-config.yaml-configuration-file, and for the input files see https://github.com/csoneson/ARMOR/wiki/Preparing-the-input-files.

For example, make new folders within projectX (perhaps input_data and armor_output), put the necessary input files there (or symlinks to them), and edit the paths in the following sections of config.yaml:

  • Paths to existing reference files
  • Paths to indexes that will be generated by the workflow
  • Path to the metadata text file
  • Path to a folder containing gzipped fastq files, and the file suffix (typically either fastq or fq)
  • Path to a folder that will store the output generated by the workflow
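A minimal sketch of this setup, assuming your gzipped reads already sit in some folder on the server (the helper name setup_project_dirs, the reads folder, and the *.fastq.gz pattern are placeholders). It symlinks the reads instead of copying them and then lists the config.yaml lines that still mention example_data, i.e. the paths you still have to edit:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepare input/output folders in an ARMOR project and
# report config.yaml lines that still reference example_data.
# Usage: setup_project_dirs <project_dir> <raw_reads_dir>
setup_project_dirs() {
    local project_dir="$1" reads_dir="$2"
    mkdir -p "$project_dir/input_data" "$project_dir/armor_output"
    # Symlink (not copy) the gzipped reads, using absolute targets.
    local fq
    for fq in "$reads_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue          # no match: the glob stays literal
        ln -sf "$(readlink -f "$fq")" "$project_dir/input_data/"
    done
    # Show remaining example_data paths (with line numbers) that need editing.
    grep -n "example_data" "$project_dir/config.yaml" || echo "no example_data paths left"
}
```

For example: setup_project_dirs ~/projectX /path/to/raw_reads. Adjust the glob if your files end in .fq.gz.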

(See below for a quick way to find and replace in vim.)

3. The environment

To resolve dependencies, conda is used as the environment/package manager.

Before running snakemake, activate the snakemake environment:

conda activate snakemake

The snakemake environment contains some basic tools and Snakemake itself. It has already been created on Heron with the command:

mamba env create -f ARMOR/envs/snakemake.yaml

4. And Go!

Within the snakemake environment, run the workflow with the following command:

snakemake --use-conda --conda-prefix /swalt/conda/envs/armor/ -j<num cores>

(replacing <num cores> with your desired number of cores).
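If you are unsure what to use for <num cores>, one option is to derive it from the machine's core count with nproc (GNU coreutils), leaving a couple of cores free for other users. A minimal sketch; the function name suggest_cores and the default of keeping 2 cores free are assumptions, not a server policy:

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a core count for snakemake -j,
# keeping some cores free (default 2) and never going below 1.
suggest_cores() {
    local keep_free="${1:-2}"
    local total cores
    total=$(nproc)
    cores=$(( total - keep_free ))
    if [ "$cores" -lt 1 ]; then
        cores=1
    fi
    echo "$cores"
}

# Print (rather than run) the command so it can be checked first.
echo snakemake --use-conda --conda-prefix /swalt/conda/envs/armor/ -j"$(suggest_cores)"
```

Copy the printed command, or drop the leading echo to run it directly.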

As mentioned before, the workflow depends on two other environments. These are already installed in /swalt/conda/envs/armor/, and this path is passed to snakemake with the --conda-prefix argument. Snakemake takes care of activating these environments as needed.

If you have any issues with the environments (e.g. permission errors after changing an environment yaml file), you can drop the --conda-prefix argument, and snakemake will create the environments locally (e.g. in projectX/.snakemake). This usually takes an unnecessary amount of time and disk space, hence the shared environments in /swalt.

Tips

Example data

The config.yaml file in /DKHA/proj/conda/armor/ARMOR-bare/ is set up to run the "example_data" test dataset. If you want to run it, copy the dataset to your project folder (after copying ARMOR-bare):

cd projectX
cp -r /DKHA/proj/conda/armor/ARMOR/example_data .

and run snakemake as above.

Restart from a stop

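If the workflow crashes, you can continue from the last successful rule by running snakemake again with the --ri (--rerun-incomplete) option. Outputs that already exist are normally not regenerated, while --ri makes Snakemake redo any jobs whose output was left incomplete. For example: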

snakemake --use-conda --conda-prefix /swalt/conda/envs/armor -j20 --ri
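If the crash was a hard one (e.g. the process was killed or the server restarted), Snakemake may also refuse to start because the working directory is still locked; Snakemake's --unlock flag removes that lock. A sketch of a wrapper that does both steps; the function name resume_armor and the DRY_RUN switch are hypothetical, and DRY_RUN=1 only prints the commands:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: unlock the project folder, then resume the workflow.
# Set DRY_RUN=1 to print the commands instead of running them.
resume_armor() {
    local cores="${1:-20}"
    local prefix="/swalt/conda/envs/armor/"
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # Remove a stale lock left by a killed run (should be harmless if there is none).
    $run snakemake --unlock || return 1
    # Rerun, redoing any jobs whose output was left incomplete.
    $run snakemake --use-conda --conda-prefix "$prefix" -j"$cores" --ri
}
```

Run it from inside the project folder with the snakemake environment active, e.g. resume_armor 20. Only unlock when no other snakemake run is active in that folder, since the lock is what stops two runs from writing the same outputs.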

Find and Replace in VIM

In normal mode (press Esc), type the following command to find and replace all occurrences in the file:

:%s/<search_term>/<replace_term>/<option>

(Do not forget the :%. The % makes the substitution apply to every line of the file.) <option> can be one or more of:

  • g - replace all occurrences on each line (without g, only the first match on each line is replaced)
  • c - confirm before each replace
  • i - ignore case

For example, to replace every occurrence of "example_data" with "adapt_data" (e.g. as the input folder path) in config.yaml, confirming each change, you can use:

:%s/example_data/adapt_data/gc
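Outside vim, the same global replacement can be done non-interactively with sed (GNU sed assumed for -i). A sketch that first previews the matching lines with grep and keeps a backup copy (config.yaml.bak) before editing in place; the helper name replace_in_config is hypothetical, and adapt_data is just the example name from above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every occurrence of one path fragment with
# another in a file, keeping a .bak backup (GNU sed -i syntax).
# Usage: replace_in_config <old> <new> [file]
replace_in_config() {
    local old="$1" new="$2" file="${3:-config.yaml}"
    # Preview the lines that will change; stop if there is nothing to replace.
    grep -n "$old" "$file" || { echo "no '$old' found in $file" >&2; return 1; }
    # '|' as the sed delimiter so paths containing '/' need no escaping.
    sed -i.bak "s|$old|$new|g" "$file"
}
```

For example, replace_in_config example_data adapt_data config.yaml. Both arguments are interpreted by grep and sed as patterns, so avoid characters such as |, & or . unless you escape them.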