WIP: ORA for RNA-seq (with WGCNA module genes) #381

cansavvy · 2020-11-30T15:15:37Z

Analysis Purpose

Add an "RNA-seq" version of ORA pathway analysis.

Pull Request Stage

This is a Draft PR - needs review of big concepts and outline.

To help reviewers save time, I'm using slightly different tags to alert you to the few places that have changed in this module so far.
**IDENTICAL TO MICROARRAY EXAMPLE** means that, until a different tag shows up, everything there is 99% the same as the microarray example. Any differences are noted in parentheses, like **IDENTICAL TO MICROARRAY EXAMPLE** (except microarray. --> rnaseq)

**REVIEW** doesn't mean it's 100% polished, but it does mean that it differs from the microarray example.

Strategy

So far, not much has changed from the microarray example to the RNA-seq example besides using the WGCNA gene modules instead of a DGE results table. Note that for the purposes of this Draft PR, the descriptions are not completely polished yet, but I wanted to get some feedback on the larger steps and see if there's anything we want to change.

Concerns/Questions for reviewers:

What I want to know from this Draft PR is: what do we want to change from the microarray example? Related to #223.

Do we want to do different plots? That seems to be the main thing we could play with, though for story purposes I do like what the clusterProfiler plots here illustrate.
For the background gene list, should we use all the detected genes, or should we pull all possible genes from Ensembl as background genes? The latter feels like too much, but let me know if you disagree.

Another idea: if we are okay with the steps here, and given the other examples we want to get to before going live, perhaps we leave these steps as is and file an issue about coming back to this if we have inspiration about how to show something different with ORA later?

Analysis Pull Request Check List (roughly in order):

Content checks

All {{BLANKS}} have been replaced with the correct content.
Sources are cited
Seed is set (if applicable) -- not set here.

Formatting Checks

Removed any manual numbering of sections.
Removed any instances of chunk naming.
Comments and documentation are up to date.
All links have been checked and are properly formatted.

Add datasets to S3

Added data and metadata files to S3.

Docker/Snakemake rendering components

Added the .html link to the navigation bar.
Any not yet added packages needed for this analysis have been added to the Dockerfile and it successfully builds.

03-rnaseq/pathway-analysis_rnaseq_01_ora.Rmd

cbethell

Looks good @cansavvy!

I just have two suggestions which I left below, and a question re a warning message that pops up upon running the dotplot() function.

cbethell · 2020-12-01T14:27:56Z

03-rnaseq/pathway-analysis_rnaseq_01_ora.Rmd

+Even though we'll use this package to convert from Ensembl gene IDs (`ENSEMBL`) to gene symbols (`SYMBOL`), we could just as easily use it to convert from an Ensembl transcript ID (`ENSEMBLTRANS`) to Entrez IDs (`ENTREZID`).
+
+The function we will use to map from Ensembl gene IDs to gene symbols is called `mapIds()`.
+


We mapped to gene symbols in the ORA for microarray example analysis, perhaps we should map to Entrez IDs here? The reason would be to show alternate methods between the two ORA example analyses.

Yeah! That's a good idea. Let me try that.

cbethell · 2020-12-01T14:35:37Z

03-rnaseq/pathway-analysis_rnaseq_01_ora.Rmd

+
+ORA generally requires you make some sort of arbitrary decision to obtain your genes of interest list and this is one of the approach's weaknesses -- to get to a gene list we've removed all other context.
+
+In this example, we will focus on one module, module 19, which we previously identified as differentially expressed across our time point variable.  


We will likely want to show how we identified module 19 here or point to where we previously identified module 19 as differentially expressed across the time point variable. If possible, adding a chunk or two to identify the module of interest before we get to this step would be ideal (if it is not too complex for this use case).

I was trying to be brief and not get bogged down too much in the details of the previous analysis, but I will add more than just this Lol.

cbethell · 2020-12-01T14:39:19Z

03-rnaseq/pathway-analysis_rnaseq_01_ora.Rmd

+The `enrichplot::dotplot()` function will only plot gene sets that are significant according to the multiple testing corrected p values (in the `p.adjust` column) and the `pvalueCutoff` you provided in the [`enricher()` step](#run-ora-using-the-enricher-function).
+
+```{r}
+enrich_plot <- enrichplot::dotplot(kegg_ora_results)


Any idea why the following message is showing up at this step?

## wrong orderBy parameter; set to default orderBy = "x"

Oh I see it. Hmm...

This is an error that is known in this version of enrichplot: YuLab-SMU/enrichplot#22
It looks like if we bumped up to a more recent version we may not have this problem. The users may not encounter this issue depending on what version they install.

Because it doesn't show up in the render and users may not encounter it and it doesn't seem to affect the plot, I'm inclined to let this one go. But it was good to look into. 👍

envest

Great -- thanks for clear indication of what's identical / to be reviewed.

In terms of big picture Qs you asked:

Overall I like the similarity between ORA with microarray and RNA-seq. I like your idea to keep the existing plots and file an issue to return in future rounds of changes. Using the set of detected genes seems right to me, and there could even be a little explanation in that section to say that by not testing genes that are not in our data, we can avoid having a too-conservative adjusted p-value (right?).
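As a rough check on how the choice of background enters the test, here is a minimal base R sketch of the hypergeometric calculation that ORA is generally built on. All numbers and the `ora_p_value()` helper are hypothetical and are not taken from this analysis. The sketch also assumes the pathway has the same number of genes in both backgrounds, which will not hold exactly in real data.

```r
# Toy numbers, all hypothetical: they are not taken from the analysis in this PR.
module_size <- 100 # genes in the module of interest
set_size <- 200    # genes annotated to one pathway (assumed identical in both backgrounds)
overlap <- 10      # module genes that are also in the pathway

# Hypothetical helper: one-sided hypergeometric p value for over-representation
ora_p_value <- function(overlap, module_size, set_size, background_size) {
  # P(X >= overlap) when drawing `module_size` genes from the background
  phyper(
    q = overlap - 1,
    m = set_size,
    n = background_size - set_size,
    k = module_size,
    lower.tail = FALSE
  )
}

# Background restricted to detected genes versus a larger "all genes" background
p_detected <- ora_p_value(overlap, module_size, set_size, background_size = 10000)
p_all <- ora_p_value(overlap, module_size, set_size, background_size = 20000)

c(detected = p_detected, all_genes = p_all)
```

In this toy setup, the larger background gives the smaller p value, because the same overlap looks more surprising when the pathway makes up a smaller share of the universe. So the direction of the effect may be worth checking on the real data before we describe it in the text.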

See minor comments below! 🎉

03-rnaseq/pathway-analysis_rnaseq_01_ora.Rmd

cansavvy · 2020-12-02T15:12:20Z

Don't need this anymore. #395 is merged.

cansavvy added 5 commits November 30, 2020 09:33

Add the file. It works

8b83db4

Add components

83465f7

re-render

f05c570

Add review tags

c009d63

Update wording around detectable genes

2e69b34

cansavvy requested a review from cbethell November 30, 2020 17:40

cansavvy added 3 commits November 30, 2020 13:04

Add some words to dictionary.txt

fd5812b

re-render

f4b3aa7

Switch to PNG

865f8f8

cansavvy mentioned this pull request Nov 30, 2020

WGCNA: Switch plots to save as png (instead of PDF) #382

Merged

cansavvy requested a review from envest November 30, 2020 20:05

cbethell reviewed Dec 1, 2020

envest reviewed Dec 1, 2020

cansavvy mentioned this pull request Dec 1, 2020

ORA upset plot is not saved, only enrich plot 2x! #393

Closed

cansavvy added 2 commits December 1, 2020 13:12

Incorporating cbethell 's and envest 's review

6b45545

Switch from using gene symbols to Entrez IDs

e9d3f52

This was referenced Dec 1, 2020

ORA RNA-seq: Part 1 - The Set Up #394

Merged

ORA RNA-seq: Part 2 - Run ORA and get results! #395

Merged

cansavvy closed this Dec 2, 2020
