fastq_extractor_proof_of_principle

This repo tries to optimize some Perl scripts that are part of the CRISPRAnalyzer R Shiny package and are too slow for production use in web applications. The benchmark of the original code is as follows (please note: the Perl script is not included here).

All examples were run on a ThinkPad T420s (Core i7 vPro) with 16 GB RAM and an Evo 850 SSD.

PERL

time perl CRISPR-extract.pl "ACC(.{20,21})G" ./data/TRAIL-Replicate1.fastq no

real	0m45.006s
user	0m44.191s
sys	0m0.682s

RUST

time fastq_parser ./data/TRAIL-Replicate1.fastq

output

real	0m3.866s
user	0m3.414s
sys	0m0.438s
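For readers unfamiliar with the extraction step, the following is a minimal sketch of what matching the pattern from the Perl invocation, ACC(.{20,21})G, against each FASTQ record could look like. It is not the code from this repo: the function names `extract_target` and `extract_from_fastq` are hypothetical, the matcher is hand-written with the standard library only instead of using a regex engine, and it assumes ASCII input with exactly four lines per record.

```rust
/// Hypothetical helper: returns the captured group of the first (leftmost,
/// greedy) match of ACC(.{20,21})G in `seq`, or None if there is no match.
fn extract_target(seq: &str) -> Option<&str> {
    // Assumption: reads are plain ASCII, so byte offsets are char offsets.
    if !seq.is_ascii() {
        return None;
    }
    let b = seq.as_bytes();
    for i in 0..b.len() {
        if !b[i..].starts_with(b"ACC") {
            continue;
        }
        // Greedy quantifier: try the longer capture (21) before the shorter (20).
        for len in [21usize, 20] {
            let end = i + 3 + len;
            if end < b.len() && b[end] == b'G' {
                return Some(&seq[i + 3..end]);
            }
        }
    }
    None
}

/// Hypothetical helper: assumes four lines per FASTQ record, with the
/// sequence on the second line of each record.
fn extract_from_fastq(text: &str) -> Vec<&str> {
    text.lines()
        .skip(1)
        .step_by(4)
        .filter_map(extract_target)
        .collect()
}

fn main() {
    // Made-up read, only for illustration.
    let seq = "TTACCAAAAACCCCCTTTTTAAAAAGAA";
    let fastq = format!("@read1\n{}\n+\n{}\n", seq, "I".repeat(seq.len()));
    for target in extract_from_fastq(&fastq) {
        println!("{target}");
    }
}
```

A real extractor would stream the file line by line instead of holding it in memory, which this sketch skips for brevity.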

C

time ./extractor default ./data/TRAIL-Replicate1.fastq no

output

real	0m4.409s
user	0m3.953s
sys	0m0.445s

TODO: make the regexp parsing in Rust multithreaded for very large input files
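One possible direction for this TODO, sketched with the standard library only: split the reads into chunks and let scoped threads process one chunk each. Everything here is an assumption for illustration, not the repo's code: `has_target` is a hypothetical stand-in for the regex match on ACC(.{20,21})G, `count_matches_parallel` is a made-up name, and the reads are assumed to be in memory already.

```rust
use std::thread;

/// Hypothetical stand-in for the regex test: true if `seq` contains
/// ACC, then 20 or 21 arbitrary bytes, then G.
fn has_target(seq: &str) -> bool {
    let b = seq.as_bytes();
    (0..b.len()).any(|i| {
        b[i..].starts_with(b"ACC")
            && [21usize, 20]
                .iter()
                .any(|&len| b.get(i + 3 + len) == Some(&b'G'))
    })
}

/// Counts matching reads using up to `n_threads` scoped worker threads.
fn count_matches_parallel(seqs: &[&str], n_threads: usize) -> usize {
    let n_threads = n_threads.max(1);
    // Ceiling division; at least 1 because `chunks` panics on a size of 0.
    let chunk_size = ((seqs.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        // Spawn one worker per chunk; each returns its local count.
        let handles: Vec<_> = seqs
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().filter(|&&q| has_target(q)).count()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}

fn main() {
    // Made-up reads, only for illustration.
    let reads = ["TTACCAAAAACCCCCTTTTTAAAAAGAA", "ACCTTTG", "GGGG"];
    println!("{} matching reads", count_matches_parallel(&reads, 4));
}
```

Scoped threads let the workers borrow the reads directly, so nothing has to be copied or wrapped in `Arc`. Whether this pays off depends on how much of the runtime is matching rather than file I/O, which the timings above do not show.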

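In this benchmark the Rust extractor (3.9 s real time) beat the low-level C extractor (4.4 s), which is pretty amazing. Both are more than ten times faster than the Perl script (45.0 s).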

sam_mapper in RUST

PERL (not included in this repo)

time perl CRISPR-mapping.pl ./data/pilotscreen.fasta ./data/TRAIL-Replicate1_extracted.sam "M{20,21}$" "_"

output

real	1m17.280s
user	1m16.820s
sys	0m0.192s

RUST

time sam_mapper -f ./data/pilotscreen.fasta -s ./data/TRAIL-Replicate1_extracted.sam

output

real	0m7.590s
user	0m7.461s
sys	0m0.111s
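This README does not describe what sam_mapper does internally, so the following is only a rough sketch under the assumption that the core of the mapping step is counting aligned reads per reference name. It relies on the SAM layout (tab-separated fields, header lines starting with `@`, reference name in the third column, `*` for unmapped reads). The function name `count_reads_per_reference` is hypothetical, and the CIGAR pattern and the "_" separator from the Perl invocation are not handled.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: counts alignment records per reference name (RNAME).
fn count_reads_per_reference(sam: &str) -> BTreeMap<&str, u64> {
    let mut counts = BTreeMap::new();
    for line in sam.lines() {
        // Skip header lines and blank lines.
        if line.is_empty() || line.starts_with('@') {
            continue;
        }
        // RNAME is the third tab-separated field; "*" marks an unmapped read.
        if let Some(rname) = line.split('\t').nth(2) {
            if rname != "*" {
                *counts.entry(rname).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    // Made-up SAM content, only for illustration.
    let sam = "@SQ\tSN:geneA_1\tLN:20\n\
               r1\t0\tgeneA_1\t1\t42\t20M\t*\t0\t0\tACGT\tIIII\n\
               r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n";
    for (reference, n) in count_reads_per_reference(sam) {
        println!("{reference}\t{n}");
    }
}
```

A `BTreeMap` keeps the output sorted by reference name; a `HashMap` would likely be faster on large inputs if the order does not matter.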

