This repo tries to optimize some Perl scripts that are part of the CRISPRAnalyzer R Shiny package and are too slow for production use in web applications. The benchmarks of the original Perl code and of the rewrites are as follows (please note: the Perl scripts are not included here).
All benchmarks were run on a ThinkPad T420s (Core i7 vPro, 16 GB RAM, Samsung 850 EVO SSD).
Perl
$ time perl CRISPR-extract.pl "ACC(.{20,21})G" ./data/TRAIL-Replicate1.fastq no
real 0m45.006s
user 0m44.191s
sys 0m0.682s
Rust
$ time fastq_parser ./data/TRAIL-Replicate1.fastq
(program output omitted)
real 0m3.866s
user 0m3.414s
sys 0m0.438s
C
$ time ./extractor default ./data/TRAIL-Replicate1.fastq no
(program output omitted)
real 0m4.409s
user 0m3.953s
sys 0m0.445s
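To make the extraction step concrete, here is a minimal Rust sketch of what the pattern `ACC(.{20,21})G` does on a FASTQ file, written without the regex crate so it runs on the standard library alone. It is not the code of `fastq_parser` or `extractor`: the names `extract_guide` and `extract_all` are hypothetical, and it assumes plain 4-line FASTQ records (no wrapped sequence lines) read from stdin. It follows Perl's matching rules: the leftmost `ACC` wins, and the greedy `{20,21}` tries a 21-character capture before a 20-character one.

```rust
use std::io::{self, BufRead};

/// Emulates the Perl regex `ACC(.{20,21})G` on one read: the leftmost
/// `ACC` wins and the greedy `{20,21}` tries 21 characters before 20.
/// Returns the captured group (the candidate sgRNA), if any.
fn extract_guide(seq: &[u8]) -> Option<&[u8]> {
    const PREFIX: &[u8] = b"ACC";
    for start in 0..seq.len() {
        if !seq[start..].starts_with(PREFIX) {
            continue;
        }
        let guide_start = start + PREFIX.len();
        // Greedy quantifier: try the longer capture first.
        for len in [21usize, 20] {
            let g = guide_start + len; // index where the trailing `G` must be
            if g < seq.len() && seq[g] == b'G' {
                return Some(&seq[guide_start..g]);
            }
        }
    }
    None
}

/// Hypothetical driver: reads 4-line FASTQ records and returns the
/// extracted guide of every read whose sequence line matches.
fn extract_all<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut guides = Vec::new();
    for (i, line) in reader.lines().enumerate() {
        let line = line?;
        // Line 2 of each record (index 1 mod 4) holds the sequence.
        if i % 4 == 1 {
            if let Some(g) = extract_guide(line.trim_end().as_bytes()) {
                guides.push(String::from_utf8_lossy(g).into_owned());
            }
        }
    }
    Ok(guides)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    for guide in extract_all(stdin.lock())? {
        println!("{guide}");
    }
    Ok(())
}
```

Compiled with `rustc -O extract.rs` (file name hypothetical), it could be run as `./extract < ./data/TRAIL-Replicate1.fastq`. Hand-rolling the match keeps the example dependency-free; a real implementation would more likely use a regex engine.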
TODO: make the regexp parsing in Rust multithreaded for very large input files.
Unbelievable: the Rust code beat the low-level C code (3.866 s vs 4.409 s wall-clock), and both are more than 10× faster than the Perl script (45.006 s). Pretty amazing!
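One possible approach to the multithreading TODO, sketched under the assumption that the sequence lines have already been loaded into memory: split them into one contiguous chunk per worker with `std::thread::scope` (Rust 1.63+) and join the workers in spawn order so the results keep the input order. `extract_parallel` and the inlined `extract_guide` are hypothetical helpers, not part of this repo.

```rust
use std::thread;

/// Leftmost/greedy emulation of the Perl pattern `ACC(.{20,21})G`.
fn extract_guide(seq: &[u8]) -> Option<&[u8]> {
    for start in 0..seq.len() {
        if !seq[start..].starts_with(b"ACC") {
            continue;
        }
        let guide_start = start + 3;
        // Greedy `{20,21}`: try 21 characters before 20.
        for len in [21usize, 20] {
            let g = guide_start + len;
            if g < seq.len() && seq[g] == b'G' {
                return Some(&seq[guide_start..g]);
            }
        }
    }
    None
}

/// Hypothetical parallel driver: one contiguous chunk of reads per thread,
/// results returned in the same order as `seqs`.
fn extract_parallel(seqs: &[Vec<u8>], n_threads: usize) -> Vec<Option<Vec<u8>>> {
    let n_threads = n_threads.max(1);
    // Ceiling division so every read lands in a chunk; `chunks(0)` would panic.
    let chunk_len = ((seqs.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = seqs
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|seq| extract_guide(seq).map(|g| g.to_vec()))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order preserves the original read order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let reads: Vec<Vec<u8>> = vec![
        format!("ACC{}G", "A".repeat(20)).into_bytes(),
        b"TTTTTTTT".to_vec(),
    ];
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    for guide in extract_parallel(&reads, workers).into_iter().flatten() {
        println!("{}", String::from_utf8_lossy(&guide));
    }
}
```

Static chunking keeps the sketch simple; for files too large to hold in memory the reads would have to be processed in batches instead.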
Perl (CRISPR-mapping.pl, not included in this repo)
$ time perl CRISPR-mapping.pl ./data/pilotscreen.fasta ./data/TRAIL-Replicate1_extracted.sam "M{20,21}$" "_"
(program output omitted)
real 1m17.280s
user 1m16.820s
sys 0m0.192s
Rust
$ time sam_mapper -f ./data/pilotscreen.fasta -s ./data/TRAIL-Replicate1_extracted.sam
(program output omitted)
real 0m7.590s
user 0m7.461s
sys 0m0.111s
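As a rough illustration of the mapping side, the sketch below assumes the job is to turn SAM alignments into per-sgRNA read counts. It takes sgRNA IDs from the library FASTA headers (so guides without reads are reported as 0) and reads the reference name from column 3 (RNAME) of each SAM record, skipping header lines and records flagged as unmapped (0x4), secondary (0x100) or supplementary (0x800). It is a hypothetical sketch, not the `sam_mapper` implementation, and it does not model the extra arguments of the Perl script (the `"M{20,21}$"` pattern and the `"_"` separator).

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Collects sgRNA IDs from FASTA headers (`>id description`).
fn library_names<R: BufRead>(fasta: R) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for line in fasta.lines() {
        let line = line?;
        if let Some(header) = line.strip_prefix('>') {
            // The ID is the header text up to the first whitespace.
            if let Some(id) = header.split_whitespace().next() {
                names.push(id.to_string());
            }
        }
    }
    Ok(names)
}

/// Counts primary, mapped SAM records per reference name (RNAME).
fn count_reads<R: BufRead>(sam: R, names: &[String]) -> io::Result<HashMap<String, u64>> {
    // Unmapped | secondary | supplementary.
    const SKIP: u16 = 0x4 | 0x100 | 0x800;
    let mut counts: HashMap<String, u64> = names.iter().map(|n| (n.clone(), 0)).collect();
    for line in sam.lines() {
        let line = line?;
        if line.starts_with('@') {
            continue; // SAM header section
        }
        let fields: Vec<&str> = line.split('\t').collect();
        if fields.len() < 3 {
            continue; // malformed or empty line
        }
        // An unparsable FLAG is treated as unmapped.
        let flag: u16 = fields[1].parse().unwrap_or(0x4);
        if flag & SKIP != 0 || fields[2] == "*" {
            continue;
        }
        *counts.entry(fields[2].to_string()).or_insert(0) += 1;
    }
    Ok(counts)
}

fn main() -> io::Result<()> {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: <program> LIBRARY.fasta READS.sam");
        std::process::exit(2);
    }
    let names = library_names(BufReader::new(File::open(&args[1])?))?;
    let counts = count_reads(BufReader::new(File::open(&args[2])?), &names)?;
    let mut rows: Vec<(String, u64)> = counts.into_iter().collect();
    rows.sort();
    for (name, n) in rows {
        println!("{name}\t{n}");
    }
    Ok(())
}
```

Counting only primary alignments avoids counting a read twice when the aligner reports extra hits for it.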