Polish Microarray GSEA example #434

Merged 9 commits on Dec 18, 2020
22 changes: 12 additions & 10 deletions 02-microarray/pathway-analysis_microarray_01_ora.Rmd
@@ -132,7 +132,7 @@ From here you can customize this analysis example to fit your own scientific que

See our Getting Started page with [instructions for package installation](https://alexslemonade.github.io/refinebio-examples/01-getting-started/getting-started.html#what-you-need-to-install) for a list of the other software you will need, as well as more tips and resources.

- In this analysis, we will be using [`clusterProfiler`](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) package to perform ORA and the [`msigdbr`](https://cran.r-project.org/web/packages/msigdbr/index.html) package which contains gene sets from the [Molecular Signatures Database (MSigDB)](https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) already in the tidy format required by `clusterProfiler` [@Igor2020; @Subramanian2005].
+ In this analysis, we will be using [`clusterProfiler`](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) package to perform ORA and the [`msigdbr`](https://cran.r-project.org/web/packages/msigdbr/index.html) package which contains gene sets from the [Molecular Signatures Database (MSigDB)](https://www.gsea-msigdb.org/gsea/msigdb/index.jsp) already in the tidy format required by `clusterProfiler` [@Yu2012; @Igor2020; @Subramanian2005; @Liberzon2011].

We will also need the [`org.Dr.eg.db`](https://bioconductor.org/packages/release/data/annotation/html/org.Dr.eg.db.html) package to perform gene identifier conversion and [`ggupset`](https://cran.r-project.org/web/packages/ggupset/readme/README.html) to make an UpSet plot [@Carlson2019-zebrafish; @Ahlmann-Eltze2020].

@@ -239,7 +239,7 @@ dge_df

## Getting familiar with MSigDB gene sets available via `msigdbr`

- The Molecular Signatures Database (MSigDB) is a resource that contains annotated gene sets that can be used for pathway or gene set analyses [@Subramanian2005].
+ The Molecular Signatures Database (MSigDB) is a resource that contains annotated gene sets that can be used for pathway or gene set analyses [@Subramanian2005; @Liberzon2011].
We can use the `msigdbr` package to access these gene sets in a format compatible with the package we'll use for analysis, `clusterProfiler` [@Igor2020; @Yu2012].

The gene sets available directly from MSigDB are applicable to human studies.
@@ -257,8 +257,8 @@ The data we're interested in here comes from zebrafish samples, so we can obtain
dr_msigdb_df <- msigdbr(species = "Danio rerio")
```

- MSigDB contains [8 different gene set collections](https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp) [@Subramanian2005] that are distinguished by how they are derived (e.g., computationally mined, curated).
- In this example, we will use pathways that are gene sets considered to be "canonical representations of a biological process compiled by domain experts" and are a subset of `C2: curated gene sets` [@Subramanian2005].
+ MSigDB contains [8 different gene set collections](https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp) [@Subramanian2005; @Liberzon2011] that are distinguished by how they are derived (e.g., computationally mined, curated).
+ In this example, we will use pathways that are gene sets considered to be "canonical representations of a biological process compiled by domain experts" and are a subset of `C2: curated gene sets` [@Subramanian2005; @Liberzon2011].

Specifically, we will use the [KEGG (Kyoto Encyclopedia of Genes and Genomes)](https://www.genome.jp/kegg/) pathways [@Kanehisa2000].
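
As a rough sketch of what selecting the KEGG pathways could look like, the snippet below filters a small stand-in data frame rather than the real `msigdbr()` output. The column names `gs_cat` and `gs_subcat`, the `"CP:KEGG"` label, and all of the toy rows are assumptions made for illustration; column names in `msigdbr` differ between package versions, so it is worth checking `colnames(dr_msigdb_df)` before adapting this.

```r
# Toy stand-in for the data frame returned by msigdbr(); all values are made up
toy_msigdb_df <- data.frame(
  gs_cat = c("C2", "C2", "C5", "H"),
  gs_subcat = c("CP:KEGG", "CP:REACTOME", "BP", ""),
  gs_name = c(
    "KEGG_EXAMPLE_SET", "REACTOME_EXAMPLE_SET",
    "GO_EXAMPLE_SET", "HALLMARK_EXAMPLE_SET"
  ),
  gene_symbol = c("gene_a", "gene_b", "gene_c", "gene_d"),
  stringsAsFactors = FALSE
)

# Keep only rows from the curated collection (C2) that are labeled as KEGG
toy_kegg_df <- toy_msigdb_df[
  toy_msigdb_df$gs_cat == "C2" & toy_msigdb_df$gs_subcat == "CP:KEGG",
]
```

The same logical condition could be passed to `dplyr::filter()` if you prefer the tidyverse style used elsewhere in this notebook.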

@@ -305,18 +305,20 @@ keytypes(org.Dr.eg.db)

Even though we'll use this package to convert from Ensembl gene IDs (`ENSEMBL`) to gene symbols (`SYMBOL`), we could just as easily use it to convert from an Ensembl transcript ID (`ENSEMBLTRANS`) to Entrez IDs (`ENTREZID`).

- The function we will use to map from Ensembl gene IDs to gene symbols is called `mapIds()`.
+ The function we will use to map from Ensembl gene IDs to gene symbols is called `mapIds()` and comes from the `AnnotationDbi` package.
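
To make the shape of the `mapIds()` output concrete, here is a minimal sketch that skips the database lookup and starts from a hand-made named vector. The identifiers, symbols, and object names (`toy_symbols_vector`, `toy_mapped_df`) are placeholders invented for illustration, not values from `org.Dr.eg.db`.

```r
# Hypothetical stand-in for what mapIds() returns: a named character vector
# whose names are the keys (Ensembl gene IDs) and whose values are the
# mapped gene symbols. NA marks a key that had no match.
toy_symbols_vector <- c(
  ENSDARG_PLACEHOLDER_1 = "symbol_a",
  ENSDARG_PLACEHOLDER_2 = NA,
  ENSDARG_PLACEHOLDER_3 = "symbol_c"
)

# Convert the named vector to a two-column data frame
toy_mapped_df <- data.frame(
  Ensembl = names(toy_symbols_vector),
  gene_symbol = unname(toy_symbols_vector),
  stringsAsFactors = FALSE
)

# Drop rows where no gene symbol was found
toy_mapped_df <- toy_mapped_df[!is.na(toy_mapped_df$gene_symbol), ]
```

Keeping the original identifiers in their own column makes it straightforward to join the symbols back onto a table such as `dge_df` later.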

```{r}
# This returns a named vector which we can convert to a data frame, where
# the keys (Ensembl IDs) are the names
- symbols_vector <- mapIds(org.Dr.eg.db, # Specify the annotation package
+ symbols_vector <- mapIds(
+ # Replace with annotation package for the organism relevant to your data
+ org.Dr.eg.db,
# The vector of gene identifiers we want to map
keys = dge_df$Gene,
- # The type of gene identifier we want returned
- column = "SYMBOL",
- # What type of gene identifiers we're starting with
+ # Replace with the type of gene identifiers in your data
keytype = "ENSEMBL",
+ # Replace with the type of gene identifiers you would like to map to
+ column = "SYMBOL",
# In the case of 1:many mappings, return the
# first one. This is default behavior!
multiVals = "first"
@@ -452,7 +454,7 @@ kegg_result_df <- data.frame(kegg_ora_results@result)

Let's print out a sneak peek of it here and see how many sets fit our cutoff of `0.1` FDR.

- ```{r}
+ ```{r rownames.print = FALSE}
kegg_result_df %>%
dplyr::filter(p.adjust < 0.1)
```
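
Counting the gene sets that pass the cutoff can be sketched as follows. The toy data frame stands in for `kegg_result_df`; its IDs and adjusted p-values are invented, and only the `p.adjust` column name is taken from the chunk above.

```r
# Toy stand-in for kegg_result_df; IDs and p-values are made up
toy_result_df <- data.frame(
  ID = c("SET_A", "SET_B", "SET_C", "SET_D"),
  p.adjust = c(0.01, 0.08, 0.10, 0.45),
  stringsAsFactors = FALSE
)

# FDR cutoff used in the text
fdr_cutoff <- 0.1

# Keep gene sets whose adjusted p-value is strictly below the cutoff
significant_df <- toy_result_df[toy_result_df$p.adjust < fdr_cutoff, ]

# Number of gene sets that pass the cutoff
n_significant <- nrow(significant_df)
```

Because the comparison is strict (`<`), a set whose adjusted p-value is exactly `0.1` is not counted.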