Releases: ksahlin/ultra
v0.1
Major update. Previous versions of uLTRA had several bottlenecks which made it infeasible to map larger datasets (in number of reads). Most notable updates:
- A faster and more memory-efficient seed finder, namfinder (>10x faster than the previously used MEM finders).
- Removed loading of reads/SAM files into memory in several places; the files are now streamed instead (previously, a SAM file of alignments was loaded into RAM).
- Compressed intermediate output.
This version has been tested on the datasets I used in the publication of uLTRA from 2021. The largest dataset in the evaluation is the IsoSeq Alzheimer dataset (4.5M reads). On the Alzheimer dataset using 19 cores, peak memory usage is now less than 30Gb (previously ~100Gb), the runtime is 3h 40m (previously 5h 40m), and disk usage has gone down due to compressed files (I have not measured the reductions in size).
The accuracy of v0.1 is only a very small fraction lower than that of the previous version (v0.0.4.2) on the tested simulated datasets. The output is not identical to previous versions because of the new seed finder. The improvement over other aligners in aligning to, e.g., small exons is still there.
v0.0.4.2
v0.0.4
- Fixed issue #4
- Added an option --use_NAM_seeds, which changes the seeding from MEMs to NAMs (with strobemers). NAM seeding makes uLTRA faster and produces smaller intermediate files. The memory usage with --use_NAM_seeds is "fixed" regardless of the number of cores/threads (about 80-90Gb for the human genome), whereas the memory usage of the default option grows with the number of cores. Therefore, --use_NAM_seeds results in lower peak memory usage than the default option when using more than 18 cores, and higher memory usage otherwise. The alignment accuracy is largely the same: NAM seeds decrease the accuracy by about 0.01%-0.05% compared to MEMs (i.e., 1 alignment in every 2,000-10,000). Due to the faster runtime and smaller disk usage, at the cost of high memory usage, I recommend --use_NAM_seeds for large datasets (>5M reads) when running on nodes with >90Gb memory and more than 20 cores.
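The recommendation above can be written down as a small check. This is only an illustrative sketch: the function name `recommend_nam_seeds` and its parameters are hypothetical and not part of uLTRA; the thresholds are copied from the note above and should be read as rough guidance, not hard limits.

```python
def recommend_nam_seeds(num_reads: int, node_memory_gb: float, cores: int) -> bool:
    """Return True if the rule of thumb from the release note suggests --use_NAM_seeds.

    Hypothetical helper, not part of uLTRA. Thresholds taken from the note:
    large dataset (>5M reads), node with >90Gb memory, more than 20 cores.
    """
    large_dataset = num_reads > 5_000_000
    enough_memory = node_memory_gb > 90   # NAM seeding needs roughly 80-90Gb for human
    many_cores = cores > 20               # default-mode memory grows with core count
    return large_dataset and enough_memory and many_cores


if __name__ == "__main__":
    # 8M reads on a 128Gb node with 24 cores: all three conditions hold.
    print(recommend_nam_seeds(8_000_000, 128, 24))
    # Same dataset on a 64Gb node: not enough memory for NAM seeding.
    print(recommend_nam_seeds(8_000_000, 64, 24))
```

With fewer cores the default MEM seeding is expected to use less memory, so the check deliberately returns False there even on large-memory nodes.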
Version used in published paper.
This is the version that was used in the paper published in Bioinformatics.
First stable working version of uLTRA
- Marks a milestone in the implementation.