Palindrome tree is a tool for analyzing inverted repeats in DNA sequences using decision trees. It takes the provided sequences and uses a decision tree to find regions with a high probability of palindrome occurrence, which filters out a large portion of the data. The remaining regions are then analyzed using the API of Palindrome Analyzer. DNA Analyser is a web-based server for nucleotide sequence analysis. It was developed in cooperation between the Department of Informatics, Mendel University in Brno, and the Institute of Biophysics, Academy of Sciences of the Czech Republic.
Palindrome tree was built with Python 3.7+.
To install palindrome tree, use the PyPI repository:
pip install palindrome-tree
First initialize a palindrome tree analyzer instance, which is imported from the main package palindrome_tree.
from palindrome_tree import PalindromeTree
tree = PalindromeTree()
To predict regions with possible palindromes, run analyse without setting the validate_with_api parameter.
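from palindrome_tree import PalindromeTree
# Read the sequence; the context manager closes the file afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()
tree = PalindromeTree()
# Predict candidate regions only, without API validation.
tree.analyse(
    sequence=sequence,
)
tree.results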
The results are then stored in the results attribute as a pd.DataFrame.
| | position | sequence |
|---|---|---|
| 0 | 8 | TTTGTAGAGACAGGGTCTTGCTGTGTTTCC |
| 1 | 10 | TGTAGAGACAGGGTCTTGCTGTGTTTCCCA |
| 2 | 49 | CGAACTCCTGGCCTCTAGGCAATCCTCCCA |
| 3 | 102 | ATCCCACTCTTTTTTGAAAAATAAAATCTA |
| 4 | 105 | CCACTCTTTTTTGAAAAATAAAATCTACCA |
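Several of the predicted windows in the sample output overlap (for example positions 8 and 10). As a minimal sketch, the windows could be merged into contiguous regions before further inspection. The helper `merge_windows` is hypothetical and not part of palindrome-tree, and it assumes that `position` is the start offset of each window and that the window length equals the length of `sequence`.

```python
from typing import Iterable, List, Tuple


def merge_windows(rows: Iterable[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Merge overlapping (position, sequence) windows into [start, end) spans.

    Hypothetical helper: assumes `position` is the window start offset.
    """
    spans = sorted((pos, pos + len(seq)) for pos, seq in rows)
    merged: List[Tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend that span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Rows copied from the sample output above.
sample_rows = [
    (8, "TTTGTAGAGACAGGGTCTTGCTGTGTTTCC"),
    (10, "TGTAGAGACAGGGTCTTGCTGTGTTTCCCA"),
    (49, "CGAACTCCTGGCCTCTAGGCAATCCTCCCA"),
    (102, "ATCCCACTCTTTTTTGAAAAATAAAATCTA"),
    (105, "CCACTCTTTTTTGAAAAATAAAATCTACCA"),
]
regions = merge_windows(sample_rows)
print(regions)
```

With a real run, the rows could be taken from the DataFrame with `tree.results[["position", "sequence"]].itertuples(index=False, name=None)`, assuming the column names shown in the table.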
To predict regions with possible palindromes and validate them afterward, run analyse with the validate_with_api parameter set.
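from palindrome_tree import PalindromeTree
# Read the sequence; the context manager closes the file afterwards.
with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()
tree = PalindromeTree()
# Predict candidate regions, then validate them through the API.
tree.analyse(
    sequence=sequence,
    validate_with_api=True,
)
tree.validated_results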
The validated results are stored in the validated_results attribute as a pd.DataFrame.
| | original_index | after | before | mismatches | opposite | position | sequence | signature | spacer | stability_NNModel |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | CC | TTTGT | 2 | CTGTGTTT | 5 | AGAGACAG | 8-7-2 | GGTCTTG | {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85} |
| 1 | 0 | TGCTG | TTTGT | 2 | GGGTCT | 5 | AGAGAC | 6-1-2 | A | {'cruciform': -2.54, 'linear': -13.84, 'delta': 11.3} |
| 2 | 0 | GTGTT | TGTAG | 2 | CTTGCT | 7 | AGACAG | 6-3-2 | GGT | {'cruciform': -1.94, 'linear': -17.509999999999998, 'delta': 15.569999999999999} |
| 3 | 0 | TTCC | TAGAG | 2 | CTGTGT | 9 | ACAGGG | 6-5-2 | TCTTG | {'cruciform': -3.7399999999999998, 'linear': -20.99, 'delta': 17.25} |
| 4 | 1 | CCCA | TGT | 2 | CTGTGTTT | 3 | AGAGACAG | 8-7-2 | GGTCTTG | {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85} |
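The columns of the validated output can be cross-checked against each other. The sketch below is an interpretation inferred from the sample rows, not documented behaviour of the API: it assumes that `signature` encodes arm length, spacer length and mismatch count, that `mismatches` counts positions where `sequence` differs from the reverse complement of `opposite`, and that `delta` is `cruciform` minus `linear`. All helper names are hypothetical.

```python
import math
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(sequence: str, opposite: str) -> int:
    """Count positions where `sequence` differs from revcomp(`opposite`)."""
    target = reverse_complement(opposite)
    return sum(a != b for a, b in zip(sequence, target))


def parse_signature(signature: str) -> Tuple[int, int, int]:
    """Split e.g. '8-7-2' into (arm length, spacer length, mismatches).

    Assumed meaning, inferred from the sample rows.
    """
    arm, spacer, mismatches = (int(part) for part in signature.split("-"))
    return arm, spacer, mismatches


# First row of the sample output above.
row = {
    "sequence": "AGAGACAG",
    "opposite": "CTGTGTTT",
    "spacer": "GGTCTTG",
    "signature": "8-7-2",
    "mismatches": 2,
    "stability_NNModel": {
        "cruciform": -5.74,
        "linear": -27.590000000000003,
        "delta": 21.85,
    },
}

arm_len, spacer_len, n_mismatch = parse_signature(row["signature"])
stability = row["stability_NNModel"]
# Assumed relation: delta = cruciform - linear (holds for the sample rows).
delta_ok = math.isclose(
    stability["cruciform"] - stability["linear"], stability["delta"], abs_tol=1e-6
)
print(arm_len, spacer_len, n_mismatch, delta_ok)
```

For this row the parsed signature agrees with the lengths of `sequence` and `spacer` and with the `mismatches` column, which is what motivates the assumed reading of the fields.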
Dependencies:
- xgboost = "^1.5.1"
- pandas = "^1.3.5"
- scikit-learn = "^1.0.2"
- requests = "^2.26.0"

Authors:
- Patrik Kaura - Main developer - patrikkaura
- Jaromir Kratochvil - Developer - jaromirkratochvil
- Jiří Šťastný - Supervisor

This project is licensed under the MIT License - see the LICENSE file for details.