From 9b2075762aa12c4a9ed470a4a2753fc0c2f1a7e4 Mon Sep 17 00:00:00 2001
From: Candace Savonen
Date: Mon, 16 Nov 2020 11:47:24 -0500
Subject: [PATCH 1/3] Try out intro and fix filenames

---
 .../pathway-analysis_microarray_00_intro.Rmd  |   31 -
 .../pathway-analysis_microarray_01_ora.Rmd    |  502 ++++++
 ...> pathway-analysis_microarray_01_ora.html} | 1385 ++++++++++++++---
 ...is_microarray_01_ortholog_mapping_kegg.Rmd |  272 ----
 ...> pathway-analysis_microarray_02_gsea.Rmd} |    0
 ... pathway-analysis_microarray_02_gsea.html} |    0
 .../pathway-analysis_microarray_02_ora.Rmd    |  103 +-
 ...sis_microarray_03_qusage_meta_analysis.Rmd |  512 ------
 ...icroarray_04_qusage_replicate_vignette.Rmd |  465 ------
 ...is_microarray_05_qusage_single_dataset.Rmd |  481 ------
 .../pathway-analysis_microarray_06_ssgsea.Rmd |  445 ------
 Snakefile                                     |    4 +-
 components/_navbar.html                       |    6 +-
 13 files changed, 1709 insertions(+), 2497 deletions(-)
 delete mode 100644 02-microarray/pathway-analysis_microarray_00_intro.Rmd
 create mode 100644 02-microarray/pathway-analysis_microarray_01_ora.Rmd
 rename 02-microarray/{pathway-analysis_microarray_02_ora.html => pathway-analysis_microarray_01_ora.html} (90%)
 delete mode 100644 02-microarray/pathway-analysis_microarray_01_ortholog_mapping_kegg.Rmd
 rename 02-microarray/{pathway-analysis_microarray_03_gsea.Rmd => pathway-analysis_microarray_02_gsea.Rmd} (100%)
 rename 02-microarray/{pathway-analysis_microarray_03_gsea.html => pathway-analysis_microarray_02_gsea.html} (100%)
 delete mode 100644 02-microarray/pathway-analysis_microarray_03_qusage_meta_analysis.Rmd
 delete mode 100644 02-microarray/pathway-analysis_microarray_04_qusage_replicate_vignette.Rmd
 delete mode 100644 02-microarray/pathway-analysis_microarray_05_qusage_single_dataset.Rmd
 delete mode 100644 02-microarray/pathway-analysis_microarray_06_ssgsea.Rmd

diff --git a/02-microarray/pathway-analysis_microarray_00_intro.Rmd b/02-microarray/pathway-analysis_microarray_00_intro.Rmd
deleted file mode 100644
index 696e6724..00000000
--- a/02-microarray/pathway-analysis_microarray_00_intro.Rmd
+++ /dev/null
@@ -1,31 +0,0 @@
----
-title: "Pathway Analysis Introduction"
-output:
-  html_notebook:
-    toc: true
-    toc_float: true
-    number_sections: true
----
-
-## Background
-
-Over-representation analysis (ORA) is a method of pathway or gene set analysis
-where one can ask if a set of genes (e.g., those differentially expressed
-using some cutoff) shares more or less genes with gene sets/pathways than
-we would expect at random.
-The other methodologies introduced throughout this module such as QuSAGE and
-GSEA can require more samples than a different expression analysis.
-For instance, the sample label permutation step of GSEA is reported to
-perform poorly with 7 samples or less in each group
-([](https://doi.org/10.1093/nar/gkt660)).
-It is not uncommon to have n ~ 3 for each group in a treatment-control
-transcriptomic study, at which point identifying differentially expressed genes
-is possible.
-If you are interested in performing pathway analysis on a small study, ORA may
-be your best bet.
-There are some limitations to ORA methods to be aware such as ignoring
-gene-gene correlation.
-See [](https://doi.org/10.1371/journal.pcbi.1002375)
-to learn more about the different types of pathway analysis and their
-limitations.
-
diff --git a/02-microarray/pathway-analysis_microarray_01_ora.Rmd b/02-microarray/pathway-analysis_microarray_01_ora.Rmd
new file mode 100644
index 00000000..ff433031
--- /dev/null
+++ b/02-microarray/pathway-analysis_microarray_01_ora.Rmd
@@ -0,0 +1,502 @@
+---
+title: "Over-representation analysis - Microarray"
+author: "CCDL for ALSF"
+date: "`r format(Sys.time(), '%B %Y')`"
+output:
+  html_notebook:
+    toc: true
+    toc_float: true
+    number_sections: true
+---
+
+# Purpose of this analysis
+
+This example is part of the pathway analysis module set; we recommend looking at the [pathway analysis table below](#how-to-choose-a-pathway-analysis) to help you determine which pathway analysis method is best suited for your purposes.
+
+This particular example analysis shows how you can use over-representation analysis (ORA) to determine if a set of genes (e.g., those differentially expressed using some cutoff) shares more or fewer genes with gene sets/pathways than we would expect at random.
+This pathway analysis method does not require any particular sample size, since the only input from your dataset is a set of genes of interest [@Yaari2013].
+
+⬇️ [**Jump to the analysis code**](#analysis) ⬇️
+
+### What is pathway analysis?
+
+We refer to any technique that uses predetermined sets of genes that are related or coordinated in their expression in some way (e.g., participate in the same molecular process, are regulated by the same transcription factor) to interpret a high-throughput experiment as pathway analysis.
+In the context of refine.bio, we use these techniques to analyze and interpret genome-wide gene expression experiments.
+The rationale for performing pathway analysis is that looking at the pathway level may be more biologically meaningful than considering individual genes, especially if a large number of genes are differentially expressed between conditions of interest.
+In addition, many relatively small changes in the expression values of genes in the same pathway could lead to a phenotypic outcome, and these small changes may go undetected in differential gene expression analysis.
+
+We highly recommend taking a look at Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges from @Khatri for a more comprehensive overview, and reading the primary publications and documentation of the methods and sources we will introduce below.
+
+### How to choose a pathway analysis?
+
+This table summarizes the pathway analysis examples in this module.
+
+|Analysis|What is required for input|What output looks like |✅ Pros|⚠️ Cons|
+|--------|--------------------------|-----------------------|-------|------|
+|[**ORA (Over-representation Analysis)**](link)|A list of gene IDs (no stats needed)|A per-pathway hypergeometric test result|