The pipeline in this repository analyses data collected from online reviews of prescribed medications.
Specifically, it:
- Identifies the 4 most common conditions experienced by people writing reviews
- Finds words and bigrams used when describing the side effects of medications prescribed for each condition
- Plots wordclouds to visualise the most common words and bigrams used to describe the side effects associated with these medications
- Log in to bc4
- Add the git module
- Set up SSH access for git
- Fork this repository
- Clone forked repository to home partition
- Follow ACRC's instructions to install mamba on your work partition
- Navigate to cloned repository and create a conda environment from environment.yml
- Activate the conda environment
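The setup steps above might look roughly like the sketch below once you are logged in to bc4. It is a dry run by default: each command is printed rather than executed, so you can check it before setting `DRY_RUN=0`. The GitHub username, repository name, environment name and git module name are all placeholders or assumptions, not values taken from this repository, and installing mamba itself is left to ACRC's instructions.

```shell
#!/bin/bash
# Sketch of the setup steps, to be run after logging in to bc4.
# Each command is only printed unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Placeholders: replace with your own values.
GITHUB_USER="your-username"
REPO_NAME="your-fork"
# Placeholder: the real environment name is defined in environment.yml.
ENV_NAME="your-env-name"

# Assumption: the module is simply called "git"; check with `module avail git`.
run module load git

# Clone your fork over SSH into your home partition.
run git clone "git@github.com:${GITHUB_USER}/${REPO_NAME}.git" "${HOME}/${REPO_NAME}"

# Create and activate the conda environment from the repository root.
run cd "${HOME}/${REPO_NAME}"
run mamba env create -f environment.yml
run conda activate "${ENV_NAME}"
```

Printing the commands first is a cautious choice for a shared cluster: it lets you confirm paths and names before anything is written to your home or work partition.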
Look at `practical_1/run_analysis.sh`: this is a submission script that runs the pipeline on BlueCrystal. It does not make use of Snakemake.
- Add comments to the submission script, describing what each step does
- Submit the submission script to the job queue
- Add, commit and push the results to git so that you can view the figures locally
- Extension: edit the submission script to run as an array job
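For the array-job extension, one possible shape is sketched below. It is not the contents of `run_analysis.sh`: the job name, account code, resource requests and condition labels are hypothetical, and it assumes the per-condition steps can be run independently, one array task per condition. Slurm sets `SLURM_ARRAY_TASK_ID` for each task; the sketch defaults it to 1 so the selection logic can be tried outside the scheduler.

```shell
#!/bin/bash
# All values in the directives below are placeholders.
#SBATCH --job-name=review_analysis
#SBATCH --account=ACCOUNT_CODE
#SBATCH --time=00:10:00
#SBATCH --mem=1G
# One array task per condition (four conditions in total).
#SBATCH --array=1-4

# Hypothetical condition labels, one per line.
conditions="condition_a
condition_b
condition_c
condition_d"

# Slurm sets SLURM_ARRAY_TASK_ID for each task; default to 1 for local testing.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Pick the line of the list whose number matches this task's index.
condition=$(printf '%s\n' "${conditions}" | sed -n "${task_id}p")

echo "Task ${task_id}: processing ${condition}"
# The per-condition commands from the submission script would go here,
# parameterised by "${condition}".
```

With this layout the four conditions are processed in parallel rather than one after another, at the cost of needing a separate step to run anything that must happen once for all conditions.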
You will now re-run the analysis using Snakemake. First, navigate to the root directory and clear all derived data and results:
rm -r ./results/
rm -r ./data/derived/
Look at `practical_2/Snakefile`: this is a snakemake workflow for the analysis.
- Compare this to the commands in `run_analysis.sh`. What commands aren't executed by the Snakefile?
- Execute snakemake with a dry run (`snakemake -n`). What error(s) do you see?
- Look at `make_config.sh`: which parts of the pipeline are missing? Complete and execute it.
- Run snakemake with a dry run again. The errors should have gone.
- Execute the snakemake workflow on the login node, timing how long it takes (`time snakemake -j1`)
- Install the snakemake executor plugin for slurm
- Create a slurm profile and save this in your config directory: `~/.config/snakemake/slurm_profile/config.yaml`
- Clean your snakemake workflow (`snakemake clean`)
- Run your workflow using your profile, timing it: `time snakemake --executor slurm --profile slurm_profile`
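As a starting point for the profile step, the sketch below writes a minimal `config.yaml` into the profile directory. The account code, partition name, job limit and resource values are placeholders to be replaced with values valid for your BlueCrystal project. It assumes Snakemake 8 or later with the `snakemake-executor-plugin-slurm` package installed in the active environment.

```shell
#!/bin/bash
# Directory where Snakemake looks for a profile called "slurm_profile".
profile_dir="${HOME}/.config/snakemake/slurm_profile"
mkdir -p "${profile_dir}"

# Write a minimal profile. All values below are placeholders.
cat > "${profile_dir}/config.yaml" <<'EOF'
executor: slurm
jobs: 4
default-resources:
  slurm_account: "ACCOUNT_CODE"
  slurm_partition: "PARTITION_NAME"
  runtime: 10
  mem_mb: 1000
EOF

echo "Wrote ${profile_dir}/config.yaml"
```

Here `runtime` is in minutes and `mem_mb` in megabytes. Keeping these defaults in the profile rather than in the Snakefile means the same workflow can still be run on the login node with `snakemake -j1`, without any Slurm-specific settings.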
If you're making changes, consider raising an issue in your forked GitHub repository, creating and checking out a new branch, committing and pushing your changes there, and then opening a pull request.
Once finished, you could update this README.md to explain how to run your pipeline in different environments.