conda-hpc-snakemake-example

Practical instructions

The pipeline in this repository analyses data collected from online reviews of prescribed medications.

Specifically, it:

Identifies the 4 most common conditions experienced by people writing reviews
Finds words and bigrams used when describing the side effects of medications prescribed for each condition
Plots wordclouds to visualise the most common words and bigrams used to describe the side effects associated with these medications

Practical 1: Conda on HPC

Setting up Git and Conda

Git

Log in to BlueCrystal Phase 4 (bc4)
Add the git module
Set up SSH access for git
Fork this repository
Clone forked repository to home partition
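
The Git steps above might look roughly like the following on the login node. This is only a sketch under assumptions: the module name, email address and fork URL are not given in this README, so they are passed in as arguments, and `ed25519` is just one common choice of key type. Running `module avail git` first should show the module name available on your cluster.

```shell
# Sketch: set up SSH access to GitHub and clone your fork.
# All three arguments are placeholders that you must supply yourself.
setup_git_and_clone() {
    git_module="$1"   # module name as listed by `module avail git`
    email="$2"        # email address used to label the new SSH key
    fork_url="$3"     # SSH URL of your fork, copied from GitHub

    module add "$git_module"

    # Generate a key pair (prompts for a file location and passphrase).
    ssh-keygen -t ed25519 -C "$email"

    # Print the public key so it can be pasted into GitHub's SSH key settings.
    # Assumes the default file location was accepted above.
    cat "$HOME/.ssh/id_ed25519.pub"
    printf 'Add the key above to GitHub, then press Enter: '
    read -r _

    # Check that GitHub accepts the key.
    ssh -T git@github.com

    # Clone the fork into the home partition.
    cd "$HOME" && git clone "$fork_url"
}
```

The function only runs when called, for example `setup_git_and_clone <git-module> you@example.com git@github.com:<your-username>/conda-hpc-snakemake-example.git`.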

Conda / Mamba

Follow ACRC's instructions to install mamba on your work partition
Navigate to cloned repository and create a conda environment from environment.yml
Activate the conda environment
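
As a rough sketch of the last two steps, the helper below creates the environment from environment.yml and activates it. It assumes that mamba is already installed and initialised for your shell, that you are in the root of the cloned repository, and that environment.yml contains a `name:` field; the function names are made up for illustration.

```shell
# Print the environment name declared in a conda environment file.
env_name_from_yaml() {
    sed -n 's/^name:[[:space:]]*//p' "$1"
}

# Sketch: build and activate the environment defined in environment.yml.
create_and_activate_env() {
    env_name="$(env_name_from_yaml environment.yml)"

    mamba env create --file environment.yml

    # Activation needs the shell set up by the mamba installer;
    # `conda activate` is the equivalent on a conda-based install.
    mamba activate "$env_name"
}
```

Call `create_and_activate_env` from the repository root. Activating the environment inside your current shell (rather than a subshell) is why this is written as a function and not a separate script.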

Run analysis using a conda environment

Look at practical_1/run_analysis.sh: this is a submission script to run the pipeline on BlueCrystal. It does not make use of Snakemake.

Add comments to the submission script, describing what each step does
Submit the submission script to the job queue
Add, commit and push the results to git so that you can view the figures locally
Extension: edit the submission script to run as an array job
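
For the extension, one possible shape for an array job is sketched below. It assumes that the per-condition steps can run independently once the four conditions are known. The condition names, time limit and memory request are placeholders, not values taken from this repository, and any account or partition directives should be copied from the existing submission script.

```shell
#!/bin/bash
#SBATCH --job-name=run_analysis_array
#SBATCH --array=0-3            # one task per condition (four in total)
#SBATCH --time=00:10:00        # assumed time limit; adjust as needed
#SBATCH --mem=1G               # assumed memory request

# Slurm sets SLURM_ARRAY_TASK_ID for each task; default to 0 so the
# script can also be tried outside the scheduler.
task_id="${SLURM_ARRAY_TASK_ID:-0}"

# Placeholder condition names: replace with the real ones.
case "$task_id" in
    0) condition="condition_1" ;;
    1) condition="condition_2" ;;
    2) condition="condition_3" ;;
    3) condition="condition_4" ;;
    *) echo "Unexpected task id: $task_id" >&2; exit 1 ;;
esac

# Replace this echo with the per-condition commands from
# practical_1/run_analysis.sh.
echo "Task ${task_id}: processing ${condition}"
```

Submitting this once with `sbatch` queues four tasks, each of which sees a different value of `SLURM_ARRAY_TASK_ID`.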

Practical 2: Snakemake on HPC

You will now re-run the analysis using Snakemake. First, navigate to the root directory and clear all derived data and results:

rm -r ./results/

rm -r ./data/derived/

Adapt the bash pipeline to work with snakemake

Look at practical_2/Snakefile: this is a snakemake workflow for the analysis.

Compare this to the commands in run_analysis.sh. What commands aren't executed by the Snakefile?
Execute snakemake with a dry run (snakemake -n) -- what error(s) do you see?
Look at make_config.sh: which parts of the pipeline are missing? Complete and execute.
Run snakemake with a dry run again -- the errors should have gone.
Execute the snakemake workflow on the login node, timing how long it takes (time snakemake -j1)

Run your workflow using a slurm profile

Install the snakemake executor plugin for slurm
Create a slurm profile and save this in your config directory: ~/.config/snakemake/slurm_profile/config.yaml
Clean your snakemake workflow (snakemake clean)
Run your workflow using your profile, timing it: time snakemake --executor slurm --profile slurm_profile
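
A minimal profile could be created as sketched below. This assumes Snakemake 8 or later with the snakemake-executor-plugin-slurm package. The account and partition are placeholders you must replace, and the job count, runtime (in minutes) and memory are illustrative guesses, not recommended settings.

```shell
# Install the Slurm executor plugin into the active conda environment.
# (Commented out so the rest of the sketch runs without network access.)
# pip install snakemake-executor-plugin-slurm

profile_dir="$HOME/.config/snakemake/slurm_profile"
mkdir -p "$profile_dir"

# Write the profile. Note that this overwrites any existing config.yaml.
cat > "$profile_dir/config.yaml" <<'EOF'
executor: slurm
jobs: 4
default-resources:
  slurm_account: "<your_account>"
  slurm_partition: "<your_partition>"
  runtime: 10
  mem_mb: 1000
EOF
```

Because the profile already sets `executor: slurm`, passing `--executor slurm` on the command line as well is redundant but harmless.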

Things to consider

If you're making changes, consider raising an Issue in your forked GitHub repository, creating and checking out a new branch, committing and pushing the changes there, and then making a pull request.

Once finished, you could update this README.md to explain how to run your pipeline in different environments.
