Skip to content

Latest commit

 

History

History
91 lines (78 loc) · 3.56 KB

README.md

File metadata and controls

91 lines (78 loc) · 3.56 KB

RE-GOA

A tool for non-coding genomic regions functional enrichment analysis based on Regulatory Elements Gene Ontology Annotation (RE-GOA). Resources for RE-GOA in mouse and human are availible.

Installation

  1. Run the following commands for installation:
wget https://github.com/AMSSwanglab/RE-GOA/archive/master.zip  
unzip master.zip
cd RE-GOA-master
  1. Download the necessary files for RE-GOA into RE-GOA-master at: https://drive.google.com/file/d/17nYBAGKc2ZZK06mY-9RQPZVSWgAoHxgu/view?usp=sharing and run the fowllowing command:
tar -zxvf REGOA_Data.tar.gz

Run RE-GOA

  1. Prepare input file: RE-GOA takes a .bed file as input, with which each line shoud have at least 3 columns: chromchromStartchromEnd. .bed file for mm9 or hg19 are accepted. Few lines of an example input file exampleinput.bed are shown as follows:
chr1	3389104	3389281
chr1	3991781	3992030
chr1	4333622	4333830
chr1	4402701	4402873
chr1	4424781	4425411
chr1	4463463	4465117
chr1	4469419	4470362
chr1	4527428	4529092
  1. Editconfigures.txt, including:
  • path for path of input file;
  • datapath=./datamgi/ for mm9 bed file and datapath=./datahg/ for hg19 bed file;
  • resultpath for path of output files;
  • filename for input file name.

For example:

path	./inputfiledict/
datapath	./datamgi/
resultpath	./outputfiledict/
filename	exampleinput.bed
  1. After preparing the input files as above, run the following command:
python peaksanalysis.py
  1. Output files in resultpath fold includes four files, exampleinput_BP.txt,exampleinput_CC.txt,exampleinput_MF.txt,exampleinput_genes.txt.

exampleinput_BP.txt,exampleinput_CC.txt,exampleinput_MF.txt list out enriched terms in Biological Process, Cellular Component, Molecular Function, and have 9 colunms in each line, which are described as follows:

1.  GO id
2.  GO term name
3.  m (Num of peaks annotated with the term)
4.  N (Num of peaks located in defined REs)
5.  p (background probability of the term)
6.  raw p-value=Binom(m,N,p)
7.  -log p-value
8.  B-H corrected Q-value
9.  -log Q-value

Terms are sorted by p-value.

exampleinput_genes.txt lists out enriched genes, and have 2 columns, which are described as follows:

1.  gene
2.  n (Num of peaks located in a RE which regulates the gene)

Genes are sorted by n.

Requirements

Python environment: python 3

Python package: pickle, math, and scipy

Memory: >= 3.0 Gb

It usually takes about one minite for computing a .bed file and and write all text output files.

REs Annotations

Annotations of REs can be browsed after unzip REs Annotations.zip. Dictionary ./hg19/ and ./mm9/ contain REs annotation of REs for human and mouse in BP, MF, and CC. Each line in a XX_REGOA.txt file contains several columns, with first column represents the RE (chromchromStartand chromEnd), and the following columns are GO terms' id which are annotated to the RE. Few lines are showm as an example as follows:

chr15_73319767_73323849	GO:0003674	GO:0005515	GO:0005488
chr15_90046441_90046590	GO:0042802	GO:0003824	GO:0005488	GO:0005515	GO:0003674
chr15_98418432_98419329	GO:0005488	GO:0060090	GO:0005515	GO:0030674	GO:0003674

Codes for generate RE-GOA

Codes for training andannotating are availible at generate RE-GOA, and associated datas are availible at https://drive.google.com/file/d/1jIU_DtBSQXID65Ky2HpTt5Z3wFojG4rh/view?usp=sharing

Citation

If you use RE-GOA or RE-GOA associated resources, please cite

Yurun, et al. Annotating regulatory elements by heterogeneous network embedding. 2021.