ITEP data limitations
By default, the presence/absence analysis will not work for more than 1997 genomes: the default maximum number of columns in a SQLite table is 2000, and 3 columns are reserved for annotation and cluster information. However, due to time and memory limitations, we recommend using ITEP for at most 300 genomes. Running BLASTP and BLASTN on 16 cores with 300 genomes would take about a week (clustering and RPSBLAST would take additional time), and the final database, including BLASTP, BLASTN, RPSBLAST, genome sequences and clustering results, would require about 2 TB of hard drive space for 300 average-sized bacterial genomes.
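The column arithmetic above can be checked before starting an analysis. The following is a minimal sketch and not part of ITEP: the names `max_genomes` and `fits_presence_absence` are hypothetical, and the constants are simply the values quoted in the text (2000 columns by default, 3 of them reserved).

```python
# Hypothetical helpers for illustration only; they are not part of ITEP.
SQLITE_DEFAULT_MAX_COLUMNS = 2000  # SQLite's default per-table column limit
RESERVED_COLUMNS = 3               # annotation and cluster information


def max_genomes(max_columns=SQLITE_DEFAULT_MAX_COLUMNS, reserved=RESERVED_COLUMNS):
    """Return how many genome columns fit next to the reserved columns."""
    return max_columns - reserved


def fits_presence_absence(n_genomes, max_columns=SQLITE_DEFAULT_MAX_COLUMNS):
    """Return True if n_genomes fits in one presence/absence table."""
    return 0 <= n_genomes <= max_genomes(max_columns)


if __name__ == "__main__":
    print(max_genomes())                # 1997
    print(fits_presence_absence(300))   # True
    print(fits_presence_absence(1998))  # False
```

Passing a different `max_columns` models a SQLite build compiled with a non-default column limit.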
The maximum length of a contig that can be imported into ITEP is 1 billion base pairs, because that is the default maximum string length in SQLite. The limit can be increased by recompiling SQLite with a larger value for the SQLITE_MAX_LENGTH compile-time option.
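To find contigs that would exceed this limit before importing them, a check along the following lines could be used. This is only a sketch under stated assumptions: it reads FASTA-formatted text purely for illustration (ITEP's own import code is not shown here), it assumes one byte per base, and the function names `contig_lengths` and `oversized_contigs` are hypothetical.

```python
import io

# Default SQLite maximum string length, as quoted in the text (assumed one byte per base).
SQLITE_DEFAULT_MAX_LENGTH = 1_000_000_000


def contig_lengths(fasta_lines):
    """Map each contig name in FASTA-formatted lines to its sequence length."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def oversized_contigs(lengths, max_length=SQLITE_DEFAULT_MAX_LENGTH):
    """Return the names of contigs longer than max_length."""
    return [name for name, length in lengths.items() if length > max_length]


if __name__ == "__main__":
    # Tiny in-memory example; a small max_length is used so the check triggers.
    example = io.StringIO(">contig1 example\nACGT\nAC\n>contig2\nGG\n")
    lengths = contig_lengths(example)
    print(lengths)                                   # {'contig1': 6, 'contig2': 2}
    print(oversized_contigs(lengths, max_length=5))  # ['contig1']
```

For a real file, the same functions can be applied to an open file handle instead of the `io.StringIO` example.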
Both the running time and the size of the database grow roughly as O(N^2), where N is the number of genomes.
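As a rough planning aid, the quadratic growth can be combined with the 300-genome figures quoted above (about 2 TB of disk and about one week of BLASTP and BLASTN on 16 cores). The sketch below is an extrapolation under the assumption that the scaling is exactly quadratic, which the text only states approximately; the function names are hypothetical and the results are order-of-magnitude estimates, not measurements.

```python
# Reference point taken from the text: 300 average-sized bacterial genomes.
REFERENCE_GENOMES = 300
REFERENCE_DISK_TB = 2.0     # approximate final database size
REFERENCE_BLAST_DAYS = 7.0  # approximate BLASTP + BLASTN time on 16 cores


def scale_quadratically(reference_value, n_genomes, reference_genomes=REFERENCE_GENOMES):
    """Scale a reference value assuming cost proportional to N^2."""
    return reference_value * (n_genomes / reference_genomes) ** 2


def estimate_disk_tb(n_genomes):
    """Rough disk estimate in TB for n_genomes."""
    return scale_quadratically(REFERENCE_DISK_TB, n_genomes)


def estimate_blast_days(n_genomes):
    """Rough BLASTP + BLASTN time in days on 16 cores for n_genomes."""
    return scale_quadratically(REFERENCE_BLAST_DAYS, n_genomes)


if __name__ == "__main__":
    for n in (100, 150, 300, 600):
        print(n, round(estimate_disk_tb(n), 2), round(estimate_blast_days(n), 2))
```

Under this assumption, halving the number of genomes cuts both estimates to a quarter, and doubling it multiplies them by four.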