A Snakemake workflow for standardised sc/snRNAseq analysis.
Find us in the "Standardized Usage" section of the Snakemake Workflow Catalog.
Every single-cell analysis is slightly different. This represents what I would call a "core" analysis, as nearly every analysis I perform starts with something very akin to this. Given the bespoke nature of single-cell work, this workflow is not designed to be all-encompassing. Rather, it aims to be extensible, modular, and reproducible. Any given step can be easily modified (each is a self-contained script), and a new rule can be easily added (see the downstream rules for an example). Finally, by taking advantage of Snakemake's integrated Conda and Singularity support, the whole thing can run in an isolated environment.
A full walkthrough on how to install and use this pipeline can be found here.
To take advantage of Singularity, you'll need to install it separately. If you are running on Linux, Singularity can be installed from conda like so:
conda install -n snakemake -c conda-forge singularity
It's a bit more challenging on other operating systems; your best bet is to follow their instructions here. But don't worry! Singularity is not required! Snakemake will still run each step in its own Conda environment; it just won't put each Conda environment in a container.
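In Snakemake, this isolation is switched on with the --use-conda and --use-singularity flags. The sketch below assumes you run it from the repository root and that the workflow needs no further command-line options. It adds --use-singularity only when a singularity binary is found on the PATH. The helper name snakemake_flags and the core count are illustrative and not part of this repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the Snakemake flags
# for environment isolation. Conda is always used; Singularity only if installed.
snakemake_flags() {
  local flags="--use-conda"
  if command -v singularity >/dev/null 2>&1; then
    flags="$flags --use-singularity"
  fi
  printf '%s\n' "$flags"
}

# Example invocation from the repository root (the core count is arbitrary);
# the command substitution is left unquoted on purpose so the flags split into words:
#   snakemake --cores 4 $(snakemake_flags)
```

Because the Singularity flag is dropped automatically when the binary is missing, the same command works on machines with and without it.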
Alternatively, you may grab the source code. You will likely not need these steps unless you plan to contribute.
Navigate to our release page on GitHub and download the most recent version. The following will do the trick:
curl -s https://api.github.com/repos/IMS-Bio2Core-Facility/single_snake_sequencing/releases/latest |
grep tarball_url |
cut -d " " -f 4 |
tr -d '",' |
xargs -n1 curl -sL |
tar xzf -
After querying the GitHub API for the most recent release information, we grep for the desired URL, split the line and extract the field, trim superfluous characters, use xargs to pipe this to curl while allowing for redirects, and un-tar the files.
Easy!
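You can inspect the middle of that pipeline without touching the network. The sketch below feeds one line, shaped like the API's tarball_url entry, through the same grep, cut, and tr steps. The release tag v0.0.0 is a placeholder. The two leading spaces are an assumption about the API's JSON indentation, and they are why the URL lands in field 4 rather than field 2:

```shell
#!/usr/bin/env bash
# Placeholder input: one line in the shape of the GitHub API's JSON response.
# The tag "v0.0.0" is hypothetical; the leading two spaces mimic the indentation.
sample='  "tarball_url": "https://api.github.com/repos/IMS-Bio2Core-Facility/single_snake_sequencing/tarball/v0.0.0",'

# Splitting on single spaces gives: "" | "" | "tarball_url": | "<url>",
# so field 4 holds the quoted URL; tr then deletes the quotes and trailing comma.
url=$(printf '%s\n' "$sample" | grep tarball_url | cut -d " " -f 4 | tr -d '",')
echo "$url"
```

In the real command, xargs then hands this URL to curl -sL. The -L flag matters because the API's tarball endpoint answers with a redirect to the actual archive.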
Alternatively, for the bleeding edge, please clone the repo like so:
git clone https://github.com/IMS-Bio2Core-Facility/single_snake_sequencing
⚠️ Heads Up! The bleeding edge may not be stable, as it contains all active development.
However you choose to install it, cd into the directory.
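The directory name depends on the route you took. git clone creates single_snake_sequencing, while GitHub release tarballs usually unpack into a folder named after the owner, the repository, and a short commit hash. The sketch below tries both names. The function enter_repo is hypothetical, and the tarball folder pattern is an assumption about GitHub's current behaviour:

```shell
#!/usr/bin/env bash
# Hypothetical helper: cd into the first matching checkout or extracted release.
# Assumes release tarballs unpack to IMS-Bio2Core-Facility-single_snake_sequencing-<short sha>.
enter_repo() {
  local d
  for d in single_snake_sequencing IMS-Bio2Core-Facility-single_snake_sequencing-*; do
    # An unmatched glob stays literal, so the -d test also filters that case out.
    if [ -d "$d" ]; then
      cd "$d" && return 0
    fi
  done
  echo "no workflow directory found in $(pwd)" >&2
  return 1
}
```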
This pipeline expects de-multiplexed fastq.gz files, normally produced by some derivative of bcl2fastq after sequencing. They can (technically) be placed anywhere, but we recommend creating a data directory in your project for them.
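Before running anything, it can help to confirm that the files in data look like standard bcl2fastq output. bcl2fastq follows the Illumina pattern SampleName_S1_L001_R1_001.fastq.gz, with the lane part omitted when lanes are not split, and this is also the naming Cellranger expects. The sketch below checks names against that pattern. The helper names are hypothetical, and it does not cover any further naming rules this workflow may impose:

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a file name follow the bcl2fastq naming convention?
# Pattern: <sample>_S<number>[_L<3-digit lane>]_<read type>_001.fastq.gz
is_bcl2fastq_name() {
  local re='^.+_S[0-9]+(_L[0-9]{3})?_(R1|R2|I1|I2)_001\.fastq\.gz$'
  [[ "$(basename "$1")" =~ $re ]]
}

# Hypothetical helper: report any fastq.gz file in a directory (default: data)
# that does not match, returning non-zero if at least one is found.
check_fastq_dir() {
  local dir="${1:-data}" f status=0
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    if ! is_bcl2fastq_name "$f"; then
      echo "unexpected name: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_fastq_dir data. An empty directory passes, so you may also want to confirm that the files are actually there.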
The analysis pipeline was run using Snakemake v6.6.1. Full version and software lists can be found in the relevant YAML files in workflow/envs.
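To match the reported environment, one option is to pin Snakemake when creating the conda environment, for example with conda create -n snakemake -c conda-forge -c bioconda snakemake=6.6.1. The sketch below instead compares an installed version against 6.6.1. version_ge is a hypothetical helper that relies on sort -V (available in GNU coreutils). 6.6.1 is simply the version reported above, not a stated minimum:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2.
# sort -V orders version strings; if $2 sorts first (or they are equal), $1 >= $2.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed Snakemake (if any) with the version used for the analysis.
installed="$(snakemake --version 2>/dev/null || echo 0)"
if version_ge "$installed" 6.6.1; then
  echo "Snakemake $installed found (analysis used 6.6.1)"
else
  echo "Snakemake older than 6.6.1 or not found (got: $installed)" >&2
fi
```

A plain lexical comparison would rank 6.10.0 below 6.6.1. Version-aware sorting avoids that error.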
All reasonable efforts have been made to ensure that the repository adheres to the best practices outlined here.
For a full discussion on the analysis methods, please see the technical documentation.
Briefly, the count matrix was produced using Cellranger, droplet calling with DropletUtils::emptyDrops, doublet detection with SOLO from the scVI family, batch effect removal with harmonypy, and general analysis and data handling with scanpy.
- Supply tests
- Track lane in samples that have been pooled and de-multiplexed
- Parallelise emptyDrops
- Support custom references
- Support SCTransform?