Predict TATA boxes in Python 3 using HMM data downloaded from the EPD (Eukaryotic Promoter Database).
EPD database citations:
- The eukaryotic promoter database in its 30th year: focus on non-vertebrate organisms. Dreos, R., Ambrosini, G., Groux, R., Périer, R., Bucher, P. Nucleic Acids Res. (2017).
- The Eukaryotic Promoter Database: expansion of EPDnew and new promoter analysis tools. Dreos, R., Ambrosini, G., Périer, R., Bucher, P. Nucleic Acids Res. (2014).
Steps:
- Prepare a sequence file in FASTA format (.fna/.fasta) and a genome annotation file in GFF format (.gff/.gff3).
- Prepare an HMM matrix. (A default TATA box matrix downloaded from EPD is included in the repository.)
- Check that the file names in the Python script match your input files.
- Run the HMM.sh script.
- Label each predicted TATA box with + or - according to a distance threshold.
- Perform statistical analysis and plotting in R (distribution plot, scatter plot, ROC plot).
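The first steps (reading the FASTA input and scanning it with a TATA box matrix) can be illustrated with a minimal sketch. It is not the repository's script: it assumes the matrix is a table of per-position base counts, converts it to log2 odds against a uniform background, and scans only the forward strand. `TOY_COUNTS`, `read_fasta`, `log_odds`, `scan`, the pseudocount and the threshold are all hypothetical; the real EPD matrix is longer and should be loaded from the file shipped with the repository.

```python
import math
from typing import Dict, Iterator, List, Tuple

BASES = "ACGT"

# Hypothetical 4-position toy matrix (base counts per position), NOT the EPD matrix.
TOY_COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}; the id is the first word of the header."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def log_odds(counts: List[Dict[str, float]], pseudocount: float = 0.5,
             background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2(p / background) scores (assumed scheme)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan(seq: str, matrix: List[Dict[str, float]],
         threshold: float) -> Iterator[Tuple[int, float]]:
    """Yield (0-based start, score) for forward-strand windows scoring >= threshold."""
    width = len(matrix)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            yield start, score


if __name__ == "__main__":
    records = read_fasta(">chr1 example\nGCTATA\nAAGC\n")
    matrix = log_odds(TOY_COUNTS)
    for rid, seq in records.items():
        for start, score in scan(seq, matrix, threshold=5.0):
            print(rid, start, round(score, 2))
```

A log-odds sum is used here only because it is the simplest way to score windows against a position matrix; the repository's HMM-based scoring may differ.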
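The labeling step can be sketched as follows, under loudly stated assumptions: a predicted TATA box gets "+" if it lies within `max_dist` bp of an annotated gene start in the GFF3 file, and "-" otherwise. The feature type `gene`, the strand-aware start (column 4 on "+", column 5 on "-"), and the names `gene_starts` and `label_hits` are hypothetical choices for illustration; the repository may define the threshold relative to a different feature. Positions are 1-based, as in GFF3, so 0-based scan offsets need `+ 1` first.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def gene_starts(gff_text: str, feature: str = "gene") -> Dict[str, List[int]]:
    """Collect strand-aware 1-based start coordinates per seqid from GFF3 text."""
    starts: Dict[str, List[int]] = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # On the minus strand the gene begins at the higher coordinate.
        starts[seqid].append(end if strand == "-" else start)
    return dict(starts)


def label_hits(hits: List[Tuple[str, int]], starts: Dict[str, List[int]],
               max_dist: int) -> List[Tuple[str, int, str]]:
    """Label each (seqid, position) hit '+' if within max_dist bp of a gene start, else '-'."""
    labelled = []
    for seqid, pos in hits:
        near = any(abs(pos - s) <= max_dist for s in starts.get(seqid, []))
        labelled.append((seqid, pos, "+" if near else "-"))
    return labelled


if __name__ == "__main__":
    gff = ("##gff-version 3\n"
           "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=g1\n"
           "chr1\tsrc\tgene\t2000\t3000\t.\t-\t.\tID=g2\n")
    for row in label_hits([("chr1", 70), ("chr1", 500)], gene_starts(gff), max_dist=50):
        print(*row)
```

A linear scan over gene starts keeps the sketch short; for whole genomes, sorting the starts and using `bisect` would be faster.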
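The ROC plot in the R step summarizes how well the prediction score separates "+" from "-" hits. To stay in one language, this sketch shows the underlying computation in Python rather than the repository's R code: sort hits by score, sweep the threshold from high to low, and record (false positive rate, true positive rate) after each distinct score, then integrate with the trapezoidal rule. `roc_points` and `auc` are hypothetical helper names.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[str]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the score threshold from high to low."""
    pos, neg = labels.count("+"), labels.count("-")
    if pos + neg != len(labels):
        raise ValueError("labels must be '+' or '-'")
    if pos == 0 or neg == 0:
        raise ValueError("need at least one '+' and one '-' label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every hit tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == "+":
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.7, 0.6], ["+", "-", "+", "-"])
    print(pts, auc(pts))
```

Grouping tied scores matters: without it, the curve would depend on the arbitrary order of equal-scoring hits.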
Example results have also been uploaded.