Add the distinction between PANTHER's over-representation and enrichment analysis to vignettes
moosa-r · Dec 28, 2024 · cf15ad3
1 parent d1db943
commit cf15ad3
Showing 2 changed files with 97 additions and 20 deletions.
diff --git a/vignettes/rbioapi_do_enrich.Rmd b/vignettes/rbioapi_do_enrich.Rmd
@@ -237,7 +237,7 @@ rba_reactome_analysis_pdf(token = reactome$summary$token,
 
 PANTHER (protein analysis through evolutionary relationships) is a project that provides classification systems of genes and proteins. PANTHER also provides an enrichment service. In fact, the enrichment tool available on the [Gene Ontology (GO) website](https://geneontology.org/ "Gene Ontology Resource - Unifying Biology") is powered by [PANTHER](https://pantherdb.org/). Just like the previous sections, we will only review the enrichment functionality here; for an in-depth review of PANTHER refer to the vignette article "[PANTHER & rbioapi](rbioapi_panther.html "2.D: PANTHER & rbioapi")".
 
-To perform the analysis, first, we need to choose the classifications (i.e., gene sets) that we want to compare our results against it. To retrieve a list of available gene sets, we call the following function:
+To perform the analysis, we first need to choose the annotation dataset (i.e., a collection of gene sets) against which we want to compare our results. To retrieve a list of available annotation datasets, we call the following function:
 
 ```{r panther_sets, echo=TRUE, message=TRUE}
 panther_sets <- rba_panther_info(what = "datasets")
@@ -258,9 +258,19 @@ if (is.data.frame(panther_sets)) {
 }
 ```
 
-The Gene Ontology (GO) project is one of the pinnacles of scientists' collective effort in bioinformatics. The GO Consortium provides a comprehensive model of biological systems. In short, GO curates a thoroughly designed directed acyclic graph (DAG) of ontologies. You may think of it as a tree of terms, where as it branches out, the terms become more specific). Each protein may be annotated with one or more terms. The terms are organized in three domains: "Molecular Function," "Biological Process," and "Cellular Component". GO slim datasets refer to subsets which are a cut-down version of GO terms. If you are not familiar with GO, I strongly encourage you to see this page and follow the links it provides: [About the GO resource](https://geneontology.org/docs/introduction-to-go-resource/ "About the GO resource").
+Note that you should enter the "id" of the dataset, not its label. For example, entering "biological_process" is incorrect; you should rather use "GO:0008150".
 
-Here, we demonstrate the enrichment analysis using the Biological Process domain. Note that you should enter the "id" of the datasets, not its label. For example, entering "biological_process" is incorrect, you should rather enter the following:.
+Here, we demonstrate the enrichment analysis using the Biological Process annotations. The Gene Ontology (GO) project is one of the pinnacles of scientists' collective effort in bioinformatics. The GO Consortium provides a comprehensive model of biological systems. In short, GO curates a thoroughly designed directed acyclic graph (DAG) of ontologies. You may think of it as a tree of terms in which the terms become more specific as it branches out. Each protein may be annotated with one or more terms. The terms are organized in three domains: "Molecular Function," "Biological Process," and "Cellular Component". GO slim datasets are cut-down subsets of the GO terms. If you are not familiar with GO, I strongly encourage you to see this page and follow the links it provides: [About the GO resource](https://geneontology.org/docs/introduction-to-go-resource/ "About the GO resource").
+
+Depending on the provided input, PANTHER will conduct one of two types of analysis:
+
+1.  If a character vector is supplied, an over-representation analysis will be performed using either Fisher's exact test or the binomial test.
+
+2.  If a data frame with gene identifiers and their corresponding expression values is supplied, a statistical enrichment test will be performed using the Mann-Whitney U (Wilcoxon Rank-Sum) test.
+
+rbioapi determines the proper analysis based on the class of the `genes` parameter. Please refer to the details section of the `rba_panther_enrich()` function manual and the [rbioapi & PANTHER vignette article](rbioapi_panther.html#submit-the-analysis-request) for more information.
+
+Here, we only demonstrate using a gene list, without the expression values:
 
 ```{r panther_enrich, echo=TRUE, message=TRUE}
 panther_enrich <- rba_panther_enrich(genes = covid_critical,
@@ -269,13 +279,13 @@ panther_enrich <- rba_panther_enrich(genes = covid_critical,
                                      )
 ```
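
For illustration, the two input shapes that select between the analyses could look like the sketch below. `describe_panther_input()` is a hypothetical helper written for this note, not an rbioapi function; it only mirrors the rule stated above (a character vector selects over-representation, a data frame selects statistical enrichment) and makes no web request. The gene identifiers and expression values are made up.

```r
# Hypothetical helper (not part of rbioapi): report which PANTHER analysis
# the class of `genes` would select, following the rule described above.
describe_panther_input <- function(genes) {
  if (is.data.frame(genes)) {
    # Expect identifiers in column 1 and numeric values in column 2
    stopifnot(ncol(genes) == 2, is.numeric(genes[[2]]))
    "statistical enrichment (Mann-Whitney U test)"
  } else if (is.character(genes)) {
    "over-representation (Fisher's exact or binomial test)"
  } else {
    stop("`genes` should be a character vector or a two-column data frame.")
  }
}

# Made-up inputs for illustration
genes_vec <- c("BRCA1", "CDK1", "PLK1")
genes_df <- data.frame(genes = genes_vec,
                       expression = c(2.4, 7.1, 5.3))

analysis_vec <- describe_panther_input(genes_vec)
analysis_df <- describe_panther_input(genes_df)
print(analysis_vec)
print(analysis_df)
```

Checking the input shape before submitting a request in this way may help catch a data frame with the columns in the wrong order, since the expression values are expected in the second column.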
 
-In addition to the enrichment results, PANTHER returns other useful information about your analysis. The names are self-explanatory:
+In addition to the results table, PANTHER returns other useful information about your analysis. The names are self-explanatory:
 
 ```{r panther_enrich_str, echo=TRUE, message=TRUE}
 str(panther_enrich, 2)
 ```
 
-The enrichment results are returned as a Data Frame with an element named result:
+The analysis results are returned as a data frame in the element named `result`:
 
 ```{r panther_enrich_df, echo=FALSE}
 if (utils::hasName(panther_enrich, "result") && is.data.frame(panther_enrich$result)) {

diff --git a/vignettes/rbioapi_panther.Rmd b/vignettes/rbioapi_panther.Rmd
@@ -33,7 +33,6 @@ library(rbioapi)
 rba_options(timeout = 30, skip_error = TRUE)
 ```
 
-
 # Introduction {#introduction}
 
 Directly quoting the paper published by [PANTHER](https://www.pantherdb.org "Protein Analysis THrough Evolutionary Relationships (PANTHER)") (Protein Analysis THrough Evolutionary Relationships) authors:
@@ -64,27 +63,66 @@ The available tools in PANTHER's **RESTful API services** can be divided into 3
 
 ------------------------------------------------------------------------
 
-# Research tools {#research-tools}
+# Gene List Analysis {#gene-list-analysis}
+
+`rba_panther_enrich()` is equivalent to the [Gene List analysis tool's webpage](https://www.pantherdb.org/index.jsp "PANTHER Gene List Analysis"). Depending on the class of the provided input, PANTHER will perform either over-representation analysis or statistical enrichment analysis. Below we demonstrate how to perform such analyses.
 
-## Gene List Analysis {#gene-list-analysis}
+## Get the available annotation datasets {#analysis-get-available-annotation-datasets}
 
-`rba_panther_enrich()` is an equivalent to [Gene List analysis tool's webpage.](https://www.pantherdb.org/index.jsp "PANTHER Gene List Analysis"). Here is a usage example:
+First, we need to select an annotation dataset on which to base the analysis. Each annotation dataset contains a collection of terms, where each term is associated with a group of genes.
 
-```{r rba_panther_enrich, message=TRUE}
-## 1 We get the available annotation datasets in PANTHER (we need to select one of them to submit an enrichment request)
+To retrieve the list of available annotation datasets in PANTHER, use the following command:
+
+```{r enrich_available_annotations}
 annots <- rba_panther_info(what = "datasets")
-# Note that you should enter the "id" of the datasets, not its label (e.g. entering "biological_process" is incorrect, you should rather enter "GO:0008150").
-## 2 We create a variable with our genes' IDs
-genes <- c("p53", "BRCA1", "cdk2", "Q99835", "CDC42","CDK1","KIF23","PLK1",
-           "RAC2","RACGAP1","RHOA","RHOB", "PHF14", "RBM3", "MSL1")
-## 3 Now we can submit the enrichment request.
-enriched <- rba_panther_enrich(genes = genes,
+```
+
+```{r enrich_available_annotations_results, echo=FALSE}
+if (is.data.frame(annots)) {
+  DT::datatable(data = annots,
+              options = list(scrollX = TRUE, 
+                             paging = TRUE,
+                             fixedHeader = TRUE,
+                             keys = TRUE))
+} else {
+  print("Vignette building failed. It is probably because the web service was down during the building.")
+}
+```
+
+Please note that you should use the ID of the desired annotation dataset, not its label. For example, using `"biological_process"` is incorrect; you should rather use `"GO:0008150"`.
+
+## Submit the analysis request {#submit-the-analysis-request}
+
+Depending on the provided input, PANTHER will conduct one of two types of analysis:
+
+1.  If a character vector is supplied, an over-representation analysis will be performed using either Fisher's exact test or the binomial test.
+
+2.  If a data frame with gene identifiers and their corresponding expression values is supplied, a statistical enrichment test will be performed using the Mann-Whitney U (Wilcoxon Rank-Sum) test.
+
+rbioapi determines the proper analysis based on the class of the `genes` parameter. Please refer to the details section of the `rba_panther_enrich()` function manual for more information.
+
+### Over-representation analysis {#over-representation-analysis}
+
+Now, suppose we want to perform an over-representation analysis against the 'GO biological process' annotation dataset. In this example, we only provide the gene names, thus over-representation analysis will be conducted:
+
+```{r rba_panther_overrep, message=TRUE}
+# Create a variable to store the genes vector
+my_genes_vec <- c("p53", "BRCA1", "cdk2", "Q99835", "CDC42",
+                  "CDK1","KIF23","PLK1", "RAC2","RACGAP1","RHOA",
+                  "RHOB", "PHF14", "RBM3", "MSL1")
+
+# Submit the analysis request.
+enriched <- rba_panther_enrich(genes = my_genes_vec,
                                organism = 9606,
-                               annot_dataset = "ANNOT_TYPE_ID_PANTHER_PATHWAY",
+                               annot_dataset = "GO:0008150",
                                cutoff = 0.05)
+
+# Note that we didn't supply the `test_type` parameter.
+# In this case, the function will default to using Fisher's exact test
+# (i.e. `test_type = "FISHER"`).
+# You may also use the binomial test for the over-representation analysis
+# (i.e. `test_type = "BINOMIAL"`).
 ```
 
-```{r enriched_df, echo=FALSE}
+```{r rba_panther_overrep_results, echo=FALSE}
 if (utils::hasName(enriched, "result") && is.data.frame(enriched$result)) {
   DT::datatable(data = enriched$result,
               options = list(scrollX = TRUE, 
@@ -97,9 +135,38 @@ if (utils::hasName(enriched, "result") && is.data.frame(enriched$result)) {
 }
 ```
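
To make the over-representation test more concrete, here is a minimal base-R sketch of a one-sided Fisher's exact test, and its binomial alternative, for a single term. All counts are invented for illustration, and this is not necessarily how PANTHER computes its statistics internally; it only shows the kind of comparison such a test makes, assuming the gene list is a subset of the reference set.

```r
# Invented counts for one annotation term (illustration only)
list_in_term <- 6      # genes in our list annotated with the term
list_size    <- 15     # genes in our list
ref_in_term  <- 400    # genes in the reference set annotated with the term
ref_size     <- 20000  # genes in the reference set (includes our list)

# Genes outside our list, split by term membership
rest_in_term     <- ref_in_term - list_in_term
rest_not_in_term <- (ref_size - list_size) - rest_in_term

# 2x2 contingency table: list membership (rows) by term membership (columns)
counts <- matrix(
  c(list_in_term, list_size - list_in_term,
    rest_in_term, rest_not_in_term),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("in_list", "not_in_list"),
                  c("in_term", "not_in_term"))
)

# One-sided Fisher's exact test: is the term over-represented in the list?
fisher_res <- fisher.test(counts, alternative = "greater")

# Binomial alternative: compare the observed count in the list with the
# proportion of the reference set that carries the term
binom_res <- binom.test(list_in_term, list_size,
                        p = ref_in_term / ref_size,
                        alternative = "greater")

print(fisher_res$p.value)
print(binom_res$p.value)
```

With these invented numbers, only 0.3 of the 15 genes would be expected to carry the term by chance, so both tests should report a very small p-value.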
 
+### Statistical enrichment analysis {#statistical-enrichment-analysis}
+
+As you can see in the above example, only a vector of gene names was used. We can also use the corresponding expression values of the genes. In this case, PANTHER will perform a statistical enrichment analysis.
+
+To do so, the only change is to supply a data frame to the `genes` parameter. Note that in this case, the Mann-Whitney U test will be performed. The data frame should have two columns: the first column should contain the gene identifiers as a character vector; the second column should contain the corresponding expression values as a numeric vector.
+
+```{r rba_panther_enrich, eval=FALSE}
+# Create a variable to store the data frame
+my_genes_df <- data.frame(
+  genes = c("p53", "BRCA1", "cdk2", "Q99835", "CDC42",
+            "CDK1", "KIF23", "PLK1", "RAC2", "RACGAP1", "RHOA",
+            "RHOB", "PHF14", "RBM3", "MSL1"),
+  ## generate random expression values
+  expression = runif(15, 0, 10)
+)
+
+# Submit the analysis request.
+enriched <- rba_panther_enrich(genes = my_genes_df,
+                               organism = 9606,
+                               annot_dataset = "GO:0008150",
+                               cutoff = 0.05)
+
+# Note that we didn't supply the `test_type` parameter.
+# In this case, the function will default to Mann-Whitney U Test
+# (i.e. `test_type = "Mann-Whitney"`).
+# This is the only valid value for the statistical enrichment analysis test,
+# thus omitting or supplying it will not make a difference.
+```
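
As a rough picture of what a statistical enrichment test does, the base-R sketch below compares the expression values of genes annotated with a term against the remaining genes using `wilcox.test()`. The gene names, values, and term membership are made up, and PANTHER's own implementation may differ in its details.

```r
# Made-up expression values for ten genes (illustration only)
expression <- c(g1 = 9.1, g2 = 8.7, g3 = 8.2, g4 = 7.9, g5 = 3.1,
                g6 = 2.8, g7 = 2.5, g8 = 2.2, g9 = 1.9, g10 = 1.4)

# Hypothetical term membership: the first four genes carry the term
in_term <- names(expression) %in% c("g1", "g2", "g3", "g4")

# Mann-Whitney U (Wilcoxon rank-sum) test: do the values of the genes in the
# term differ in distribution from the values of the other genes?
mw_res <- wilcox.test(expression[in_term], expression[!in_term])

print(mw_res$p.value)
```

Unlike the over-representation test, no cutoff is used here to decide which genes belong to the list; every gene contributes its value, and the test looks at how the values of the term's genes rank relative to the rest.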
+
 **Please Note:** Other services supported by rbioapi also provide Over-representation analysis tools. Please see the vignette article [Do with rbioapi: Over-Representation (Enrichment) Analysis in R](rbioapi_do_enrich.html) ([link to the documentation site](https://rbioapi.moosa-r.com/articles/rbioapi_do_enrich.html)) for an in-depth review.
 
-## Tree grafter {#tree-grafter}
+# Tree grafter {#tree-grafter}
 
 `rba_panther_tree_grafter()` is an equivalent to the "[Graft sequence into PANTHER library of trees](https://www.pantherdb.org/tools/sequenceSearchForm.jsp)" tool.