netGO is an R/Shiny package for network-integrated pathway enrichment analysis.
netGO provides user-interactive visualization of enrichment analysis results and related networks.
Currently, netGO supports analysis for four species (Human, Mouse, Arabidopsis thaliana, and Yeast).
The data for these species are available from the netGO-Data repository.
The following R packages must be installed before running netGO (in alphabetical order):
devtools, doParallel, doSNOW, DT, foreach, googleVis, htmlwidgets, shiny, shinyCyJS, shinyjs, V8
- Most of the packages are available from CRAN, but shinyCyJS should be installed from GitHub.
- Linux users have to install V8 after installing the other packages.
- Note that netGO is not supported on CentOS 8, because V8 is not available on CentOS 8.
On Debian / Ubuntu: libv8-dev or libnode-dev.
On Fedora: v8-devel
more information
The following code installs the required packages.
install.packages('devtools') # 2.2.1
library(devtools) # check Rcpp package is installed.
install_github('unistbig/shinyCyJS')
install.packages('doParallel') # 1.0.15
install.packages('doSNOW') # 1.0.18
install.packages('DT') # 0.11
install.packages('foreach') # 1.4.7
install.packages('googleVis') # 0.6.4
install.packages('htmlwidgets') # 1.5.1
install.packages('shiny') # 1.4.0
install.packages('shinyjs') # 1.0
install.packages('V8') # 2.3
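Before reinstalling everything, it may help to check which of the CRAN packages are still missing. The following is a minimal sketch in base R; the helper name `missing_packages` is hypothetical and not part of netGO, and shinyCyJS is left out because it is installed from GitHub.

```r
# CRAN packages required by netGO (shinyCyJS comes from GitHub instead).
# V8 is listed last because Linux users should install it after the others.
required <- c("devtools", "doParallel", "doSNOW", "DT", "foreach",
              "googleVis", "htmlwidgets", "shiny", "shinyjs", "V8")

# Hypothetical helper (not part of netGO): packages that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
} else {
  message("All required CRAN packages are installed.")
}
```

The install call is commented out so that the check itself has no side effects.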
The following code runs netGO on the breast tumor dataset (GEO GSE3744).
library(devtools)
install_github('unistbig/netGO') # install netGO library
library(netGO) # load netGO library
DownloadExampleData() # Download and load the breast tumor data
obj = netGO(genes = brca[1:30], genesets, network, genesetV)
# The user may also load the pre-calculated result using the following command
# load("brcaresult.RData")
For custom data analysis:
library(netGO)
userGenesetV = BuildGenesetV(genesets = userGenesets, network = userNetwork)
obj = netGO(genes = userGenes, genesets = userGenesets, network = userNetwork, genesetV = userGenesetV)
Running this example takes 5 to 25 minutes depending on the system used. The analysis results of netGO are shown below.
The analysis results can be visualized using the following code:
netGOVis(obj, genes = brca[1:30], genesets, network, R = 50, Q = 0.25 ) # visualize netGO's result
If the user wants to access the results without the Shiny web application, the following functions can be used to export them as text files:
# exportGraphTxt
table = exportGraphTxt(gene = brca[1:30], geneset =
genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']], network) # table
head(table)
# exportGraph
graph = exportGraph(brca[1:30], geneset =
genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']], network) # shinyCyJS graph object
shinyCyJS(graph)
# exportTable
table = exportTable(obj, R = 50, Q = 0.25) # table
head(table)
dtable = exportTable(obj, type='D', R = 50, Q = 0.25) # data.table
dtable
Example Datasets (netGO-Data repository)
Data | genes | genesets | network | genesetV |
---|---|---|---|---|
Breast Tumor | brca.RData | c2gs.RData | networkString.RData networkHumannet.RData | genesetVString1,2.RData genesetVHumannet1,2.RData |
P53 | p53.RData | c2gs.RData | networkString.RData networkHumannet.RData | genesetVString1,2.RData genesetVHumannet1,2.RData |
Diabetes | dg.RData | cpGenesets.RData | networkString.RData networkHumannet.RData | cpgenesetV1,2.RData |
The user can download the breast tumor data using the DownloadExampleData function (recommended).
Data | genes | genesets | network | genesetV |
---|---|---|---|---|
ShadowResponse | Aragenes.RData | KEGGara.RData | networkAranet.RData | AragenesetV.RData |
Species | genesets | network |
---|---|---|
Mouse | KEGGmouse.Rdata | networkMousenet.Rdata |
Yeast | KEGGyeast.Rdata | networkYeastnet.Rdata |
netGO requires the following four data types.
- genes: a character vector of input genes (e.g., differentially expressed genes).
- genesets: a named list of gene-sets consisting of groups of genes to be tested.
- network: a numeric matrix of network data. The network scores are normalized to the unit interval [0,1] by dividing each score by the maximum score.
- genesetV: a numeric matrix of pre-calculated interaction data between genes and gene-sets.
The dimension of the matrix must be [{number of genes}, {number of gene-sets}].
It can be built using the BuildGenesetV function with the network and genesets objects as input arguments:
genesetV = BuildGenesetV(network, genesets)
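The shapes of these inputs can be illustrated with toy data. The sketch below uses base R only; the gene names, gene-set names, and raw scores are made up, and the normalization simply divides by the maximum score as described above.

```r
# Toy input genes (for example, differentially expressed genes).
genes <- c("A", "B")

# Toy gene-sets: a named list of character vectors.
genesets <- list(SetX = c("A", "C"), SetY = c("B", "C", "D"))

# Toy symmetric network with raw (unnormalized) scores; values are made up.
ids <- c("A", "B", "C", "D")
raw <- matrix(0, nrow = 4, ncol = 4, dimnames = list(ids, ids))
raw["A", "B"] <- raw["B", "A"] <- 100
raw["A", "C"] <- raw["C", "A"] <- 760
raw["B", "D"] <- raw["D", "B"] <- 950

# Normalize to the unit interval [0, 1] by dividing by the maximum score.
network <- raw / max(raw)

# Basic format checks for the three objects.
stopifnot(is.character(genes), is.list(genesets), !is.null(names(genesets)))
stopifnot(is.numeric(network), min(network) >= 0, max(network) == 1)
print(round(network, 3))
```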
The netGO function tests the significance of the gene-sets for the input gene list
and returns a data frame of gene-sets with their p-values and q-values derived from netGO+, Fisher’s exact test, and netGO (optional), as well as the scores for the network interaction and overlap.
Input arguments
- genes: a character vector of input genes (e.g., differentially expressed genes).
- genesets: a list of gene-sets consisting of groups of genes.
- network: a numeric matrix of network data. The network scores are normalized to the unit interval [0,1]; 1 represents a strong interaction and 0 represents no interaction. For example:
Gene | A | B | C |
---|---|---|---|
A | 0 | 0.1 | 0.76 |
B | 0.1 | 0 | 0.324 |
C | 0.76 | 0.324 | 0 |
- genesetV: a numeric matrix of pre-calculated interaction data between genes and gene-sets. This object can be built with the BuildGenesetV function. For example:
Gene | Gene-set1 | Gene-set2 | Gene-set3 |
---|---|---|---|
A | 0.837 | 1.647 | 0.074 |
B | 0 | 1.75 | 0.113 |
C | 0.464 | 0.486 | 2.442 |
- alpha (optional): a numeric parameter (≥ 1; the default is 20) that weights the contribution of network connections in enrichment analysis.
- beta (optional): a numeric parameter (∈ [0,1]; the default is 0.5) that balances the weights between the relative and absolute network scores.
- nperm (optional): a numeric parameter to determine the bin size (number of genes) to be used during resampling. The default is NULL, which assigns approximately 2000 genes to each bin.
- pvalue (optional): a boolean parameter to determine whether to return Q-values only (FALSE) or both P-values and Q-values (TRUE).
- plus (optional): a boolean parameter to determine whether to run both netGO and netGO+ (plus = FALSE) or netGO+ only (plus = TRUE, default).
- verbose (optional): a boolean parameter to determine whether to print more detailed progress messages of netGO.
Note that the input genes should be given as gene symbols when using the default networks and gene-sets (STRING and MSigDB).
Other types of gene names are also allowed if the corresponding customized data (networks and gene-set data) are used.
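Because the identifiers of the input genes must match those used in the network and gene-sets, a quick check before running the analysis may save time. The sketch below is an illustration in base R with made-up data; `unmatched_genes` is a hypothetical helper, not a netGO function. With real data, `network_ids` would typically be `rownames(network)`.

```r
# Toy data; in practice these would be the user's genes, network, and genesets.
genes <- c("TP53", "BRCA1", "ENSG00000012048")
network_ids <- c("TP53", "BRCA1", "EGFR")
genesets <- list(SetX = c("TP53", "EGFR"), SetY = c("BRCA1", "MYC"))

# Hypothetical helper (not part of netGO): input genes that appear neither
# in the network nor in any gene-set.
unmatched_genes <- function(genes, network_ids, genesets) {
  known <- union(network_ids, unlist(genesets, use.names = FALSE))
  setdiff(genes, known)
}

unmatched <- unmatched_genes(genes, network_ids, genesets)
print(unmatched)
```

A non-empty result, as with the Ensembl-style identifier here, suggests that the gene names should be converted or that customized network and gene-set data are needed.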
The netGOVis function visualizes the analysis results in the web browser (Google Chrome is recommended).
The resulting graphs (svg format) and table are downloadable from the web browser.
Input arguments
- obj: the data frame of analysis results obtained by running the netGO function. It consists of multiple columns, including the gene-set name and the p- and q-values evaluated using netGO (optional), netGO+, and Fisher’s exact test, as well as the scores for the overlap and networks.
- genes, genesets, network: the same as those in the netGO function.
- R (optional): gene-set rank threshold. The default is 50 (the top 50 gene-sets in either method will be shown).
- Q (optional): gene-set Q-value threshold. The default is 0.25 (gene-sets with Q-value ≤ 0.25 will be used).
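The effect of the R and Q thresholds can be sketched on a small table. This is only an illustration: the column names are hypothetical (the real columns of obj may differ), and it assumes that a gene-set is kept when it passes both the rank threshold and the Q-value threshold in at least one method, which the text above does not spell out.

```r
# Hypothetical result table; the real column names of obj may differ.
res <- data.frame(
  geneset  = c("SetA", "SetB", "SetC", "SetD"),
  netgop_q = c(0.01, 0.30, 0.20, 0.60),
  fisher_q = c(0.40, 0.05, 0.22, 0.70),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the R (rank) and Q (Q-value) thresholds.
# Assumption: the two thresholds are combined with a logical AND.
filter_genesets <- function(res, R = 50, Q = 0.25) {
  # Rank gene-sets within each method (1 = most significant).
  rank_netgop <- rank(res$netgop_q, ties.method = "min")
  rank_fisher <- rank(res$fisher_q, ties.method = "min")
  top_ranked  <- rank_netgop <= R | rank_fisher <= R
  significant <- res$netgop_q <= Q | res$fisher_q <= Q
  res[top_ranked & significant, , drop = FALSE]
}

shown <- filter_genesets(res, R = 2, Q = 0.25)
print(shown)
```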
After running the netGO function, progress logs are printed in the R console,
and the user's default web browser (netGO was built in a Chrome environment) shows the interactive visualization.
The BuildGenesetV function builds the genesetV object from the given network and genesets.
genesetV holds pre-calculated interaction data used to reduce the running time of netGO.
Input arguments
- genesets, network: the same as those in the netGO function.
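Since genesetV exists to avoid repeated computation, it can be worth storing it on disk once it has been built. The sketch below uses base R `saveRDS` and `readRDS` on a toy matrix (the values are copied from the genesetV example in this document); the file name is hypothetical, and with real data the matrix would come from BuildGenesetV.

```r
# Toy genesetV matrix; in practice it would come from BuildGenesetV.
genesetV <- matrix(
  c(0.837, 1.647, 0.074,
    0,     1.75,  0.113,
    0.464, 0.486, 2.442),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"),
                  c("Gene-set1", "Gene-set2", "Gene-set3"))
)

# Save once, then reload in later sessions instead of rebuilding.
cache_file <- file.path(tempdir(), "userGenesetV.rds")  # hypothetical file name
saveRDS(genesetV, cache_file)
reloaded <- readRDS(cache_file)

stopifnot(identical(genesetV, reloaded))
print(dim(reloaded))  # rows = genes, columns = gene-sets
```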
This function downloads the example data into the user's working directory and loads the data (breast tumor, GSE3744) into the user's R environment.
Note that if the objects already exist in the working directory, this function will not download the data again, so we recommend removing and downloading them again when the netGO package is updated.
Input arguments
- none
- R objects named brca, genesets, genesetV, network, and obj will be loaded.
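Removing stale files before a fresh download could be scripted as follows. This is a sketch in base R: `remove_cached` is a hypothetical helper, and the file names passed to it are assumptions taken from the example dataset table, since the exact files written by DownloadExampleData are not listed here. The demonstration runs in a temporary directory with placeholder files.

```r
# Hypothetical helper (not part of netGO): delete cached example files so that
# they can be downloaded again. Returns the names of the removed files.
remove_cached <- function(files, dir = getwd()) {
  paths <- file.path(dir, files)
  existing <- paths[file.exists(paths)]
  removed <- file.remove(existing)
  basename(existing[removed])
}

# Demonstration in a temporary directory with empty placeholder files.
demo_dir <- file.path(tempdir(), "netgo_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("brca.RData", "c2gs.RData"))))

# File names are assumptions; the third one does not exist and is skipped.
removed <- remove_cached(c("brca.RData", "c2gs.RData", "networkString.RData"),
                         dir = demo_dir)
print(removed)
```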
The exportGraph function exports network data from the netGO analysis result as a graph object that can be displayed using the shinyCyJS function.
Input arguments
- genes, network: the same as those in the netGO function.
- geneset: a character vector of gene symbols (e.g., a member of the genesets object in netGO).
For example,
geneset = genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']]
graph = exportGraph(brca[1:30], geneset =
genesets[['SMID_BREAST_CANCER_NORMAL_LIKE_UP']], network) # shinyCyJS graph object
shinyCyJS(graph)
Note, however, that the default viewer of R (as opposed to a web browser) does not apply the layout functions.
The exportGraphTxt function exports network data from the netGO analysis result in table format.
Input arguments
- genes, network, geneset : the same as those in the exportGraph function.
For example,
table = exportGraphTxt(brca[1:30], geneset, network)
head(table)
The exported data look like this:
geneA | geneB | strength | type |
---|---|---|---|
A | B | 0.1 | Inter |
C | D | 0.82 | Inner |
'Inter' means geneB belongs to the intersection of genes and the gene-set. 'Inner' means geneB belongs to the set difference of the gene-set and genes (gene-set members that are not input genes).
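The 'Inter' / 'Inner' labelling can be reproduced on toy data. The sketch below is not the implementation of exportGraphTxt; `classify_edges` is a hypothetical helper that assumes geneA is always an input gene and that only edges with a positive score are listed.

```r
# Toy inputs; gene names and scores are made up.
genes   <- c("A", "C", "E")
geneset <- c("B", "C", "D")
ids <- c("A", "B", "C", "D", "E")
network <- matrix(0, 5, 5, dimnames = list(ids, ids))
network["A", "C"] <- network["C", "A"] <- 0.1
network["A", "D"] <- network["D", "A"] <- 0.82
network["E", "B"] <- network["B", "E"] <- 0.5

# Hypothetical helper: list edges from input genes to gene-set members and
# label them by whether geneB is also an input gene.
classify_edges <- function(genes, geneset, network) {
  rows <- list()
  for (a in genes) {
    for (b in setdiff(geneset, a)) {
      w <- network[a, b]
      if (w > 0) {
        type <- if (b %in% genes) "Inter" else "Inner"
        rows[[length(rows) + 1]] <- data.frame(
          geneA = a, geneB = b, strength = w, type = type,
          stringsAsFactors = FALSE)
      }
    }
  }
  do.call(rbind, rows)
}

edges <- classify_edges(genes, geneset, network)
print(edges)
```

Here the edge A to C is labelled 'Inter' because C is both an input gene and a gene-set member, while A to D and E to B are 'Inner'.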
The exportTable function exports the result object of netGO as a table or data.table.
Input arguments
- obj, R, Q: the same as those in the netGOVis function.
For example,
table = exportTable(obj, R = 50, Q = 0.25) # table
head(table)
dtable = exportTable(obj, type='D', R = 50, Q = 0.25) # data.table
dtable
The exported data have the format as follows:
geneset name | netGO+ q-value | Fisher q-value |
---|---|---|
genesetA | 0.11 | 0.2 |
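To keep such a table as a text file, base R `write.table` is usually enough. The sketch below writes a hypothetical data frame shaped like the example above and reads it back; the column names and the output file name are assumptions, and with real data the data frame would be the one returned by exportTable.

```r
# Hypothetical table shaped like the exported result; column names are assumed.
result <- data.frame(
  geneset  = "genesetA",
  netGOp_q = 0.11,
  fisher_q = 0.2,
  stringsAsFactors = FALSE
)

# Write a tab-delimited text file and read it back to verify the round trip.
out_file <- file.path(tempdir(), "netGO_result.txt")  # hypothetical file name
write.table(result, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
check <- read.delim(out_file, stringsAsFactors = FALSE)
print(check)
```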
The netGO analysis results are visualized through three panels: interaction networks, list of significant gene-sets, and the bubble chart.
- The network panel displays the input genes, selected gene-set, and the network connections between the two.
- Sky blue nodes represent input genes (e.g., differentially expressed genes)
- Yellow nodes represent genes in the selected gene-set
- Green nodes represent the intersection of input genes and the gene-set.
- The edge width represents the strength of interaction between two nodes.
- Genes without edges will not be displayed.
- The gene-set can be selected by clicking on the gene-set name in the upper-right panel.
- The user can download the graph image in SVG format.
- The table panel contains the list of significant gene-sets as well as their Q-values (or P-values) evaluated from netGO, netGO+, and Fisher’s exact test. It is downloadable by clicking the ‘Download Table’ button in the upper right corner of the table.
- This module plots the bubble chart of significant gene-sets for the netGO+ results.
- The overlap (x-axis) and network (y-axis) scores of the significant gene-sets are represented.
- The size of each bubble represents the significance level of the gene-set on the -log10 scale (Q-value).
- Hovering over or clicking on a bubble shows the corresponding statistical values.
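The -log10 scaling of the bubble size can be illustrated with a few made-up values. The data frame and its column names below are hypothetical and only mirror the quantities named above (overlap score, network score, Q-value).

```r
# Hypothetical netGO+ results for three significant gene-sets.
bubbles <- data.frame(
  geneset = c("SetA", "SetB", "SetC"),
  overlap = c(0.10, 0.25, 0.05),   # x-axis: overlap score
  network = c(0.60, 0.30, 0.90),   # y-axis: network score
  q       = c(0.001, 0.01, 0.1),   # Q-values
  stringsAsFactors = FALSE
)

# Bubble size on the -log10 scale: smaller Q-values give larger bubbles.
bubbles$size <- -log10(bubbles$q)
print(bubbles)
```

On this scale, a tenfold decrease in the Q-value increases the size by one unit.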
- Comments, suggestions, and questions are greatly appreciated.
- Jinhwan Kim @jhk0530 kjh0530@unist.ac.kr
- Prof. Dougu Nam dougnam@unist.ac.kr
This project is MIT licensed.