Skip to content

A repository of experimental code and benchmark for greedy/dynamic programming approximate string matching algorithms.

License

Notifications You must be signed in to change notification settings

GZHoffie/approximate-string-matching

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

92 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Approximate-String-Matching

Benchmark

Simulated Data

Error rate: 0.05 Total number of alignments: 1000000

[Time]
=> Needleman-Wunsch | 36.220 s
=> LEAP             | 1.550 s
=> Greedy           | 0.850 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 99.757 %
=> Greedy           | 92.975 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 97.512 %

Error rate: 0.10 Total number of alignments: 1000000

[Time]
=> Needleman-Wunsch | 34.260 s
=> LEAP             | 2.890 s
=> Greedy           | 1.210 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 98.066 %
=> Greedy           | 78.020 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 94.213 %

Error rate: 0.15 Total number of alignments: 1000000

[Time]
=> Needleman-Wunsch | 32.330 s
=> LEAP             | 3.850 s
=> Greedy           | 1.640 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 93.424 %
=> Greedy           | 57.939 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 90.418 %

Error rate: 0.20 Total number of alignments: 1000000

[Time]
=> Needleman-Wunsch | 31.550 s
=> LEAP             | 4.470 s
=> Greedy           | 1.730 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 88.579 %
=> Greedy           | 46.023 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 88.289 %

Real Data

Dataset generated by illumina reads SRR611076.

  • Number of highways per alignment: ~3.46339795332125282947
  • Percentage of mismatch: ~0.02452309963366200179
  • Percentage of insert: ~0.00046834182131581764
  • Percentage of delete: ~0.00055319598705419218

Using AVX2,

[Time]
=> Needleman-Wunsch | 108.390 s
=> LEAP             | 4.440 s
=> Greedy           | 3.460 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 89.524 %
=> Greedy           | 92.674 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 97.829 %

Using SSE,

[Time]
=> Needleman-Wunsch | 106.850 s
=> LEAP             | 4.660 s
=> Greedy           | 2.610 s
[Accuracy] (percentage of alignments matching optimal penalty)
=> Needleman-Wunsch | 100.000 %
=> LEAP             | 89.524 %
=> Greedy           | 89.725 %
[Coverage] (percentage of alignments covering all long consecutive matches)
=> Greedy           | 96.843 %

About

A repository of experimental code and benchmark for greedy/dynamic programming approximate string matching algorithms.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages