Pathway Enrichment Analysis

kevincjnixon edited this page Jul 6, 2021 · 4 revisions

Once differentially expressed genes are determined, a pathway enrichment analysis is typically performed to identify which pathways, processes, etc. are over-represented among them. BinfTools' GO_GEM() function uses gprofiler2's gost() function to run the enrichment analysis, draw summary figures, and export the results in two formats. GO_GEM() takes the following arguments:

  • geneList A Character vector or named list of gene symbols or IDs to be analyzed (e.g. up-/down-regulated DEGs). This can also be the output of volcanoPlot() or MA_Plot() when the argument returnDEG is set to TRUE.
  • species A Character indicating the species (e.g. "hsapiens", "mmusculus")
  • bg A Character vector of genes indicating the background (i.e. all genes that were analyzed). NULL means no custom background is given, so the entire genome is used. Use a background if possible.
  • source A Character vector indicating the data source(s) to be used. Currently supported: "GO" ("GO:BP", "GO:MF", "GO:CC"), "KEGG", "REAC", "TF", "MIRNA", "CORUM", "HP", "HPA", "WP".
  • corr A Character indicating the type of p-value correction method to use. Default is "fdr".
  • iea Boolean indicating if electronic annotations should be excluded. Default is FALSE.
  • prefix A Character vector describing the path and prefix of the output files (should not include any file extensions).
  • ts An integer indicating the minimum term size to be used when generating the plot for most enriched terms. Default is 10.
  • pdf A Boolean indicating if the bar plot should be printed to pdf. Default is TRUE.
  • fig A Boolean indicating if the bar plot should be printed to R output. Default is TRUE.
  • returnGost A Boolean indicating if the gprofiler2 gost() results should be returned. This is useful when wanting to look at a figure interactively using gprofiler2::gostplot(gost) where the gost object is returned when returnGost=TRUE. Default is FALSE.
  • writeRes A Boolean indicating if the complete results should be written to the file prefix.GO.txt.
  • writeGem A Boolean indicating if the GEM-formatted results should be written to the file prefix.gem.txt.
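
For example, to test the upregulated genes for enriched GO biological process terms:

# Get the upregulated genes from the results (adjusted p < 0.05, fold change > 1.5):
up <- rownames(subset(res, padj < 0.05 & log2FoldChange > log2(1.5)))
# Create an output directory:
dir.create("GO")
# Run GO for biological process, using all analyzed genes as the background:
GO_GEM(up, species = "dmelanogaster", bg = rownames(res), source = "GO:BP", prefix = "GO/Up_BP")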
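
The same call can be repeated for the downregulated genes. The sketch below is one possible way to do that, using only the arguments listed above. The `res` table here is a made-up stand-in for real results, and `degs`, `prefixes`, and the `run_enrichment` switch are hypothetical names introduced for illustration. The GO_GEM() call is switched off by default because it needs BinfTools installed and access to g:Profiler.

```r
# Toy stand-in for a DESeq2-style results table (values are invented).
# Replace this with your real `res` object.
res <- data.frame(
  log2FoldChange = c(2.1, -1.8, 0.2, 1.0, -0.9, 0.7),
  padj           = c(0.001, 0.01, 0.60, 0.04, 0.03, 0.20),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF")
)

lfc_cut <- log2(1.5)  # same 1.5-fold threshold as in the example above

# Keep significant genes, then split them by direction of change.
sig  <- subset(res, padj < 0.05 & abs(log2FoldChange) > lfc_cut)
degs <- list(
  Up   = rownames(sig)[sig$log2FoldChange > 0],
  Down = rownames(sig)[sig$log2FoldChange < 0]
)

# One prefix per direction so the output files do not overwrite each other.
prefixes <- file.path("GO", paste0(names(degs), "_BP"))
names(prefixes) <- names(degs)

# Hypothetical switch: set to TRUE once BinfTools is installed and
# g:Profiler is reachable (the toy gene names will not map to real genes).
run_enrichment <- FALSE
if (run_enrichment) {
  dir.create("GO", showWarnings = FALSE)
  for (direction in names(degs)) {
    BinfTools::GO_GEM(degs[[direction]], species = "dmelanogaster",
                      bg = rownames(res), source = "GO:BP",
                      prefix = prefixes[[direction]])
  }
}
```

Keeping the two directions in separate runs, each with its own prefix, gives one set of figures and result files per gene list.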

This function will generate two figures:

  1. a bar plot of the top ten enriched terms (minimum genes/term = ts), with bars showing the enrichment and significance of each term
  2. a bar plot of the top ten significant terms (maximum 500 genes/term), with bars showing the enrichment and significance of each term

These figures will be output in the R environment, and also as a pdf named prefix.top10.pdf.
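
To make the two selections concrete, here is a minimal base-R sketch of how the terms for each plot could be picked from a results table. It is not taken from the GO_GEM() source: the toy table, the `max_size` and `top_n` names, and the choice of inclusive cut-offs (`>=` and `<=`) are all assumptions made for illustration.

```r
# Toy results table (values invented), using column names from prefix.GO.txt.
go <- data.frame(
  term_name  = c("term A", "term B", "term C", "term D", "term E"),
  term_size  = c(8, 25, 120, 650, 40),
  p_value    = c(1e-6, 3e-4, 2e-5, 1e-9, 0.02),
  enrichment = c(12.0, 6.5, 3.1, 1.4, 4.2),
  stringsAsFactors = FALSE
)

ts       <- 10   # minimum term size for the "most enriched" plot
max_size <- 500  # maximum term size for the "most significant" plot
top_n    <- 10   # number of terms shown per plot

# Most enriched: drop very small terms, then sort by enrichment (largest first).
enriched <- go[go$term_size >= ts, ]
enriched <- head(enriched[order(-enriched$enrichment), ], top_n)

# Most significant: drop very large terms, then sort by p-value (smallest first).
significant <- go[go$term_size <= max_size, ]
significant <- head(significant[order(significant$p_value), ], top_n)

print(enriched$term_name)
print(significant$term_name)
```

The size filters matter because very small terms can reach large enrichment values from only a few genes, while very large terms tend to be significant but too general to interpret.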

The complete analysis results will be output in a .txt file named prefix.GO.txt. Columns of interest in this file are as follows:

  • p_value Adjusted p-value (using the correction method indicated by corr) of the term.
  • term_size Number of genes out of the background belonging to the term.
  • query_size Number of genes in geneList that are annotated to a term in the data sources.
  • intersection_size Number of genes from query (geneList) that belong to the term.
  • term_id Code from data source for the term.
  • term_name The enriched term.
  • effective_domain_size The total number of genes in the background that are annotated to a term in the data sources.
  • intersection Genes from query (geneList) that belong to the term.
  • enrichment Enrichment of the term.
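
The four size columns are the inputs of a standard over-representation test, so the results file can be re-examined in R. The sketch below writes a toy table to a temporary file, reads it back, and computes a fold enrichment and an unadjusted hypergeometric p-value. Several things here are assumptions: that prefix.GO.txt is tab-delimited, that the toy values resemble real output, and that the fold-enrichment formula shown (a common definition) matches the file's own enrichment column. The unadjusted p-value will generally differ from p_value, which is already corrected.

```r
# Toy table with the columns described above (values invented).
toy <- data.frame(
  term_id               = c("GO:0000001", "GO:0000002"),
  term_name             = c("term A", "term B"),
  p_value               = c(0.001, 0.2),
  term_size             = c(50, 200),
  query_size            = c(100, 100),
  intersection_size     = c(10, 4),
  effective_domain_size = c(10000, 10000),
  stringsAsFactors = FALSE
)

# Stand-in for prefix.GO.txt; a tab-delimited layout is an assumption.
path <- tempfile(fileext = ".GO.txt")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

go <- read.delim(path, stringsAsFactors = FALSE)

# Fold enrichment: fraction of query genes in the term divided by
# the fraction of background genes in the term.
go$fold_enrichment <- (go$intersection_size / go$query_size) /
                      (go$term_size / go$effective_domain_size)

# Unadjusted hypergeometric tail probability P(X >= intersection_size).
go$p_hyper <- phyper(go$intersection_size - 1,
                     go$term_size,
                     go$effective_domain_size - go$term_size,
                     go$query_size,
                     lower.tail = FALSE)

# Keep terms that pass the adjusted p-value threshold.
sig <- go[go$p_value < 0.05, ]
print(sig[, c("term_id", "term_name", "p_value", "fold_enrichment")])
```

In this toy table, term A has 10 of 100 query genes against 50 of 10000 background genes, which gives a fold enrichment of about 20.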

Finally, a .gem file with the results will be output to prefix.gem.txt. This file is compatible with the Cytoscape app "EnrichmentMap" for visualization (see this paper for a step-by-step description).
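
For orientation, the sketch below builds a table in the generic enrichment layout that EnrichmentMap reads, starting from a toy results table. The column names (GO.ID, Description, p.Val, FDR, Phenotype, Genes) follow the generic format as it is commonly prepared from gprofiler2 results. Whether GO_GEM() writes exactly these columns is an assumption, and the input values are invented.

```r
# Toy enrichment results (values invented), using columns from prefix.GO.txt.
res_tab <- data.frame(
  term_id      = c("GO:0000001", "GO:0000002"),
  term_name    = c("term A", "term B"),
  p_value      = c(0.001, 0.02),
  intersection = c("geneA,geneD", "geneB,geneE,geneF"),
  stringsAsFactors = FALSE
)

# Rearrange into the generic enrichment layout (assumed column names).
gem <- data.frame(
  GO.ID       = res_tab$term_id,
  Description = res_tab$term_name,
  p.Val       = res_tab$p_value,
  FDR         = res_tab$p_value,     # p_value is already adjusted
  Phenotype   = "+1",                # one phenotype for a single gene list
  Genes       = res_tab$intersection,
  stringsAsFactors = FALSE
)

# Write as tab-delimited text; a temporary file stands in for prefix.gem.txt.
gem_path <- tempfile(fileext = ".gem.txt")
write.table(gem, gem_path, sep = "\t", quote = FALSE, row.names = FALSE)
print(gem)
```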