Here we describe the differential expression part of the workflow.
An analysis should take no more than 10 minutes.
First, check that the following tools are installed on your server or computer.
The scripts provided here are written in Python 3.
Installing Conda is not required, but it is advised if Python 3 is not set up on your computer.
It makes it easier to install tools or to switch to an older Python version if needed.
Conda : here
DIFFERENTIAL EXPRESSION :
R : R here
Install the following packages for R :
source("https://bioconductor.org/biocLite.R")
biocLite("data.table")
biocLite("reshape2")
biocLite("edgeR")
biocLite("DESeq2")
biocLite("limma")
biocLite("RColorBrewer")
biocLite("gplots")
biocLite("heatmap3")
biocLite("grDevices")
biocLite("genefilter")
biocLite("ggplot2")
biocLite("GenomicFeatures")
biocLite("AnnotationDbi")
biocLite("biomaRt")
biocLite("stringr")
biocLite("org.Hs.eg.db")
#biocLite("vsn")
biocLite("plyr")
biocLite("pheatmap")
biocLite("PoiClaClu")
biocLite("gtools")
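Before installing anything, it can help to check which tools are already reachable from your shell. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and it assumes the executables are named `python3` and `Rscript` on your system.

```python
import shutil


def missing_tools(tools=("python3", "Rscript")):
    """Return the names of the tools that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

This only checks the executables, not the R packages listed above.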
You need an init.json and a diff_exp.json to launch this script.
init.json is loaded automatically. Create the file in the configs directory.
Only the scriptDir variable needs to be set in your init.json :
"scriptDir" : "/home/jean-philippe.villemin/code/RNA-SEQ/",
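If you prefer to set this value from a script rather than by hand, something like the following may work. It is a sketch under assumptions: the helper name `set_script_dir` is hypothetical, and it assumes that init.json is a JSON object with `scriptDir` as a top-level key. Check the example in the configs directory before relying on it.

```python
import json
from pathlib import Path


def set_script_dir(init_path, script_dir):
    """Set "scriptDir" in init.json, keeping any other keys already present."""
    path = Path(init_path)
    # Start from the existing file if there is one, otherwise from an empty object.
    config = json.loads(path.read_text()) if path.exists() else {}
    # Assumption: "scriptDir" is a top-level key of the JSON object.
    config["scriptDir"] = script_dir
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

For example, `set_script_dir("configs/init.json", "/path/to/RNA-SEQ/")` would update the file in place, where both paths are placeholders for your own setup.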
To get an overview of the JSON files, look in the configs directory.
Here we show an example for the diff_exp.json :
python3 pathTo/diffGeneExp.py -c pathToConfigFile/diff_exp.json -p TestConditionName_vs_NormalConditionName
This script is a wrapper that calls an R script named diff_exp.R. diff_exp.R uses Design.csv and Raw_read_counts.csv, which the Python wrapper creates from the JSON configuration file.
Design.csv and Raw_read_counts.csv should be in the $path_to_output/output/$project_name/ directory.
If you already have Design.csv and Raw_read_counts.csv, you can run the R script directly as follows :
Rscript ${PATH_TO_SCRIPT}/diff_exp.R --dir ${PATH_TO_DATA}/[DIR_NAME] --cond1 [COND1] --cond2 [COND2] ${PATH_TO_DATA}/[DESIGN.csv] ${PATH_TO_DATA}/[GENE_READ_COUNT.csv]
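The same call can be assembled from Python, which avoids quoting problems when paths contain spaces. This is a sketch only: the function name `build_diff_exp_command` and its parameters are hypothetical, and the argument order simply mirrors the command line shown above.

```python
from pathlib import Path


def build_diff_exp_command(script_dir, data_dir, dir_name, cond1, cond2,
                           design="Design.csv", counts="Raw_read_counts.csv"):
    """Build the argument list for a direct diff_exp.R call."""
    data_dir = Path(data_dir)
    return [
        "Rscript", str(Path(script_dir) / "diff_exp.R"),
        "--dir", str(data_dir / dir_name),
        "--cond1", cond1,
        "--cond2", cond2,
        # Positional arguments: the design file first, then the read counts.
        str(data_dir / design),
        str(data_dir / counts),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if the R script exits with a non-zero status.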
This is how Design.csv should be :
When you call the script, the -p parameter must be set to TestConditionName_vs_NormalConditionName. The condition names must match those written in Design.csv.
Note : The last column is not required.
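To catch a mistyped comparison name before launching the analysis, the value given to -p can be split into its two condition names and compared by hand with Design.csv. The helper below is a hypothetical sketch, not the wrapper's own parsing logic. It assumes the two names are joined by the literal string `_vs_`, as in the example above.

```python
def split_comparison(name, separator="_vs_"):
    """Split 'Test_vs_Normal' into the pair (test, normal)."""
    test, found, normal = name.partition(separator)
    # Reject names with a missing side or with more than one separator.
    if not found or not test or not normal or separator in normal:
        raise ValueError(
            f"Expected TestConditionName{separator}NormalConditionName, got {name!r}"
        )
    return test, normal
```

Calling `split_comparison("TestConditionName_vs_NormalConditionName")` gives the two names to look for in Design.csv.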
This is how Raw_read_counts.csv should be :
Finally you get the following directories as output :