SkeletalVis-Transcriptomics is a bioinformatics pipeline for reproducible analyses of microarray and RNA-seq data.
The pipeline is built using Nextflow, a portable workflow tool to run tasks across multiple compute infrastructures. It uses a Singularity container that bundles all the software needed to run the analysis, making installation simple and the results reproducible.
The SkeletalVis-Transcriptomics pipeline takes a sample table and a parameter file defining the experiment as input. If not provided locally, microarray data and fastq files are downloaded automatically using the provided accession numbers/sample identifiers.
(a) Download of raw microarray data from GEO, and of fastq files either directly from ENA or via conversion of sra files from SRA
(b) Microarray QC with affyQCReport; RNA-seq read quality trimming with trimmomatic, and QC reports with fastqc and multiQC
(c) RNA-seq quantification using kallisto and processing with tximport to produce a sample x gene expression table
(d) Differential expression analysis with limma, DESeq2 and Characteristic Direction
(e) Pathway and gene ontology enrichment analysis with goseq
(f) Active subnetwork identification with GIGA
(i) Identify transcription factors potentially driving differential expression with CHEA3
Analyses are run in parallel. If an error occurs, you can add the `-resume` option to re-run the pipeline from the point of failure rather than from the start.
Try the pipeline on an example dataset (all inputs will be automatically downloaded):

- Install Nextflow
- Install Singularity
- Download the pipeline:
  `nextflow clone CBFLivUni/SkeletalVis-Transcriptomics`
- Configure the resource profile for your HPC or local computer. A template for slurm schedulers is provided as an example in `nextflow.config`.
  A utility script is provided to help replace paths within the config text files:
```console
bash scripts/install/replacePath.sh nextflow.config /mnt/hc-storage/groups/cbf/Nextflow/SkeletalVis-Transcriptomics "$(pwd -P)"
```
- Test on the example dataset:
  `nextflow run main.nf -profile slurm -params-file params/GSE152805.yaml -with-singularity library://jsoul/default/skeletalvis-transcriptomics:latest`
- Define the sampleTable for RNA-seq data
Create a tab-separated table with unique Sample names, SRR accession numbers (if download is needed) and any additional metadata, e.g.
Sample | File | Condition |
---|---|---|
Control_1 | SRRXXX | Control |
Control_2 | SRRXXX | Control |
Treated_1 | SRRXXX | Treated |
Treated_2 | SRRXXX | Treated |
Note that for microarray data the metadata is retrieved directly from GEO, so you only need to specify the columns of interest that define the variables to compare.
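As a minimal sketch, the RNA-seq sample table shown above could be written and sanity-checked from the shell as follows. The file name `sampleTable.txt` and the `SRRXXX` entries are placeholders for illustration only; substitute your own sample names, accessions and metadata columns.

```shell
# Hypothetical output file name; use whatever path your parameter file points to.
table="sampleTable.txt"

# Write a tab-separated table: one header row, then one row per sample.
# SRRXXX is a placeholder, not a real accession.
{
  printf 'Sample\tFile\tCondition\n'
  printf 'Control_1\tSRRXXX\tControl\n'
  printf 'Control_2\tSRRXXX\tControl\n'
  printf 'Treated_1\tSRRXXX\tTreated\n'
  printf 'Treated_2\tSRRXXX\tTreated\n'
} > "$table"

# Sample names must be unique: collect any name that occurs more than once.
duplicates="$(tail -n +2 "$table" | cut -f1 | sort | uniq -d)"
if [ -n "$duplicates" ]; then
  echo "Duplicate sample names found: $duplicates" >&2
else
  echo "All sample names in $table are unique"
fi
```

Using `printf` with explicit `\t` avoids editors silently converting tabs to spaces, which would break a tab-separated file.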
- Define the configuration
Most parameters are set to sensible defaults within the main nextflow script, and only a few need to be changed in typical use. Note that the boolean style (Groovy, Python or R) differs between parameters; see the Options column.
Parameter | Description | Options |
---|---|---|
accession | A unique identifier for the experiment to be analysed, e.g. the GEO accession of the data - used to name output data and download fastq files | |
species | The species the reads originate from - used to create the kallisto index | Human, Mouse, Rat, Cow, Pig |
single | Is the data single-end RNA-seq? | true, false |
batchCorrect | Should batch effect correction (sva) be used? | TRUE, FALSE |
skipTrimming | Should read trimming be skipped? | false (default), true |
Parameters should be defined within a yaml file. See `params/GSE152805.yaml` for an example.
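For illustration, a parameter file covering only the parameters listed in the table above might be written like this. All values are placeholders, the file name `ownData.yaml` is simply the one used in the run command in this README, and further keys (for example one pointing at the sample table) may well be required, so treat `params/GSE152805.yaml` as the authoritative template.

```shell
# Write a minimal, hypothetical parameter file.
# Replace every value with the one describing your own experiment.
cat > ownData.yaml <<'EOF'
accession: "GSEXXXXXX"   # placeholder identifier, also used to name outputs
species: "Human"         # one of Human, Mouse, Rat, Cow, Pig
single: false            # Groovy-style boolean; paired-end data in this sketch
batchCorrect: "FALSE"    # R-style boolean; the quoting here is an assumption
skipTrimming: false      # default: reads are trimmed
EOF

# Show the result for a quick visual check.
cat ownData.yaml
```

Whether the R-style `TRUE`/`FALSE` values need quoting is not stated here; compare against the shipped example file before relying on this form.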
The `accession` parameter defines the default search path for fastq.gz files (data/`accession`/fastqFiles/). Trimmed unpaired reads, e.g. `*_R0.fastq.gz`, are skipped by default. If fastq files are not found locally, the data will be downloaded using the provided accession number.
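The local lookup described above can be approximated from the shell to preview which files would be found before a run. This is a sketch of the documented behaviour, not the pipeline's own code; the function name `list_local_fastq` is hypothetical, and whether the pipeline searches subdirectories is an assumption.

```shell
# List fastq.gz files under data/<accession>/fastqFiles/, leaving out
# trimmed unpaired reads (*_R0.fastq.gz), which are skipped by default.
list_local_fastq() {
  local dir="data/$1/fastqFiles"
  if [ ! -d "$dir" ]; then
    return 0   # no local directory, so nothing to list
  fi
  find "$dir" -type f -name '*.fastq.gz' ! -name '*_R0.fastq.gz' | sort
}

# Example: preview the local files for the example accession.
files="$(list_local_fastq GSE152805)"
if [ -z "$files" ]; then
  echo "No local fastq files found; the data would need to be downloaded."
else
  echo "$files"
fi
```

Running this before launching the pipeline makes it easier to spot a misnamed directory, which would otherwise trigger an unnecessary download.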
- Run the pipeline with your own parameters:
  `nextflow run soulj/SkeletalVis-Transcriptomics -profile slurm -params-file ownData.yaml -with-singularity library://jsoul/default/skeletalvis-transcriptomics`
Modules can be tested using the `pytest-workflow` framework. Module test directories within the `tests` folder contain a nextflow script and a configuration yaml file defining the test for each module.
- Install pytest-workflow:
  `conda install pytest-workflow`
- Run the tests, e.g. to test the pathway enrichment module:
  `pytest --symlink --kwdof --tag pathwayEnrichment`