Cleaning up and reclaiming disk space
ITEP creates a large number of intermediate files, and the database it builds is also quite large. This section explains the purpose of these intermediate files and what can be done to reclaim disk space.
The following files are used in various ways by different ITEP scripts and should not be deleted even after the database is loaded:
$ITEP_ROOT/aliases/aliases
$ITEP_ROOT/db/DATABASE.sqlite
$ITEP_ROOT/groups
$ITEP_ROOT/organisms
$ITEP_ROOT/orthomcl.config_sample
The input data in $ITEP_ROOT/raw and $ITEP_ROOT/genbank should also be kept in the same locations. The analysis scripts don't use them, but you will need them if you update your database, and you may end up using them for downstream analysis.
Over the course of building the database (particularly in step 1), a couple of large tables are created, manipulated, and then dropped. By default, SQLite keeps the freed space for re-use when new tables are created or new records are added. This means it does not release the space for other programs to use.
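To see how much space SQLite is holding on its free list before deciding whether a VACUUM is worthwhile, something like the following sketch can be used. It assumes the `sqlite3` command-line shell is installed; the function name `reclaimable_bytes` is made up for this example and is not part of ITEP.

```shell
# Estimate how many bytes a VACUUM could release from an SQLite database.
# Assumption: the sqlite3 command-line shell is on the PATH.
# reclaimable_bytes is a hypothetical helper, not an ITEP script.
reclaimable_bytes() {
    local db="$1"
    local free_pages
    local page_size
    # Number of unused pages SQLite is keeping for later re-use
    free_pages=$(sqlite3 "$db" "PRAGMA freelist_count;")
    # Size of one database page in bytes
    page_size=$(sqlite3 "$db" "PRAGMA page_size;")
    echo $(( free_pages * page_size ))
}

# Usage (from $ITEP_ROOT):
# reclaimable_bytes db/DATABASE.sqlite
```

A result close to zero suggests there is little to gain from vacuuming.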
After you are finished with all of the database building scripts, you can reclaim any remaining empty space in the database by running the provided wrapper script for SQLite's VACUUM command:
$ ./cleanupSqliteTables.sh TRUE
You can also issue the VACUUM command manually by using sqlite to open the db/DATABASE.sqlite file, if you prefer.
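A minimal sketch of the manual route is shown below. It assumes the SQLite command-line shell is installed under the name `sqlite3` (on some systems the binary may be named differently); `vacuum_db` is a hypothetical helper written for this example.

```shell
# Run VACUUM by hand and print the database size before and after.
# Assumption: the SQLite shell is available as "sqlite3".
# vacuum_db is a hypothetical helper, not an ITEP script.
vacuum_db() {
    local db="$1"
    echo "before: $(wc -c < "$db") bytes"
    sqlite3 "$db" "VACUUM;"
    echo "after:  $(wc -c < "$db") bytes"
}

# Usage (from $ITEP_ROOT):
# vacuum_db db/DATABASE.sqlite
```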
- WARNING: VACUUM is not run automatically by the build scripts because it constructs a temporary database file that can be as large as the original database, so vacuuming a large database requires a correspondingly large amount of free disk space. That free space must be available on the partition containing SQLite's temporary directory. SQLite uses the TMPDIR environment variable to locate temporary space, so you can try pointing it at a partition with plenty of free space if you have one.
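One way to check the space requirement described in the warning before running the cleanup script is sketched here. The helper `has_room_for_vacuum` and the path `/path/to/big/partition` are illustrative assumptions; the check compares the database size with the free space on the filesystem holding the temporary directory.

```shell
# Succeed if the filesystem holding the temp directory has at least as much
# free space as the database file occupies.
# has_room_for_vacuum is a hypothetical helper, not an ITEP script.
has_room_for_vacuum() {
    local db="$1"
    # Second argument overrides the temp directory; default is $TMPDIR or /tmp
    local tmpdir="${2:-${TMPDIR:-/tmp}}"
    local db_kb
    local free_kb
    db_kb=$(du -k "$db" | cut -f1)
    # POSIX df output: column 4 of the second line is the available space in KB
    free_kb=$(df -Pk "$tmpdir" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$db_kb" ]
}

# Usage (from $ITEP_ROOT); the export path is a placeholder:
# if ! has_room_for_vacuum db/DATABASE.sqlite; then
#     export TMPDIR=/path/to/big/partition
# fi
# ./cleanupSqliteTables.sh TRUE
```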
The FASTA files in faa/ and fna/ and the modified table files in modtable/ can be safely removed. They are automatically regenerated from the files in raw/.
Note that these directories tend to be relatively small.
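As a sketch, the regenerable files could be cleared as follows. The helper name `clear_regenerable` is made up for this example. It empties the three directories but leaves the directories themselves in place, on the assumption (not confirmed here) that some scripts may expect them to exist. It relies on the `-mindepth` and `-delete` options of `find`, which GNU and BSD `find` both provide.

```shell
# Empty the regenerable directories under an ITEP root, keeping raw/ untouched.
# clear_regenerable is a hypothetical helper, not an ITEP script.
clear_regenerable() {
    local root="$1"
    local d
    for d in faa fna modtable; do
        if [ -d "$root/$d" ]; then
            # Show how much space is about to be freed
            du -sh "$root/$d"
            # Delete the contents but keep the (now empty) directory
            find "$root/$d" -mindepth 1 -delete
        fi
    done
}

# Usage:
# clear_regenerable "$ITEP_ROOT"
```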
BLASTP and BLASTN data are generated for every pair of organisms and stored in blastres/ and blastn_res/, respectively. These files are kept so that adding more organisms to the database later is faster: pairs of organisms that have already had BLAST run between them are not re-run. If you do not plan on adding any new organisms to your database, you can safely delete these files.
The same applies to RPSBLAST data, which is stored in rpsblast_res/: if you do not plan on adding any new organisms to your database, you can safely delete these files.
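Because deleting the BLAST and RPSBLAST results means they would have to be recomputed if organisms are added later, it may help to check how much space they actually occupy first. The following is a rough sketch; `blast_cache_kb` is a hypothetical helper, not part of ITEP.

```shell
# Print the combined size (in KB) of the cached BLAST result directories.
# Per-directory sizes go to stderr, the total goes to stdout.
# blast_cache_kb is a hypothetical helper, not an ITEP script.
blast_cache_kb() {
    local root="$1"
    local d
    local kb
    local total=0
    for d in blastres blastn_res rpsblast_res; do
        if [ -d "$root/$d" ]; then
            kb=$(du -sk "$root/$d" | cut -f1)
            echo "$d: ${kb} KB" >&2
            total=$(( total + kb ))
        fi
    done
    echo "$total"
}

# Usage:
# blast_cache_kb "$ITEP_ROOT"
# If no new organisms will be added, the directories can then be removed, e.g.:
# rm -rf "$ITEP_ROOT/blastres" "$ITEP_ROOT/blastn_res" "$ITEP_ROOT/rpsblast_res"
```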