
NGMLR very slow on bovine nanopore reads #70

Open
sdjebali opened this issue Oct 11, 2019 · 2 comments

@sdjebali

Dear all,

First of all, thanks for this very nice development.

I just wanted to report that, on some fairly large ONT runs from bovine samples, NGMLR followed by samtools sort was very slow (about 4 days for 4 million reads).

I was wondering whether I am using the tool correctly (with the right parameters)?

I tried with the first 1 million reads (4,000,000 FASTQ lines, four per read) like this:
zcat $fastq | head -n 4000000 | ngmlr --presets ont -t 22 -r $genome | samtools sort -@ 6 -o $output
and it took 5 h 23 min to complete.

I then tried with the second 1 million reads like this:
zcat $fastq | tail -n +4000001 | head -n 4000000 | ngmlr --presets ont -t 22 -r $genome | samtools sort -@ 4 -o $output
and it took 24 h 10 min to complete.

I am using NGMLR version 0.2.8 and samtools version 1.9, and here are the details about my machine:
Linux tatum 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux
24 processors
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz

Any advice would be warmly welcome.

Best,
Sarah

@fritzsedlazeck (Collaborator)

Thanks Sarah,
do you have an average read length? It's likely, though unfortunate, that some of the reads in your second batch are very long.
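(If it helps, a quick sketch to pull the mean read length straight from the gzipped FASTQ; $fastq as in your commands above:)

    zcat "$fastq" | awk 'NR % 4 == 2 { n++; total += length($0) }
        END { printf "reads: %d  mean length: %.1f\n", n, total / n }'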
Thanks
Fritz

@sdjebali (Author)

Indeed there seems to be a big read length difference between the two batches.

I ran NanoPlot on them and here are the results:

  • First 1 million reads:
    General summary:
    Mean read length: 4,722.5
    Mean read quality: 4.4
    Median read length: 906.0
    Median read quality: 4.2
    Number of reads: 1,000,000.0
    Read length N50: 14,404.0
    Total bases: 4,722,479,679.0
    Number, percentage and megabases of reads above quality cutoffs

Q5: 367454 (36.7%) 3015.3Mb
Q7: 8 (0.0%) 0.1Mb
Q10: 0 (0.0%) 0.0Mb
Q12: 0 (0.0%) 0.0Mb
Q15: 0 (0.0%) 0.0Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 7.0 (17272)
2: 7.0 (9848)
3: 7.0 (25242)
4: 7.0 (12091)
5: 7.0 (25093)
Top 5 longest reads and their mean basecall quality score
1: 2210466 (3.6)
2: 1850945 (3.8)
3: 1772717 (3.6)
4: 1685671 (3.9)
5: 1563326 (3.9)

  • Second 1 million reads:
    General summary:
    Mean read length: 13,668.0
    Mean read quality: 11.1
    Median read length: 13,451.0
    Median read quality: 11.8
    Number of reads: 1,000,000.0
    Read length N50: 16,657.0
    Total bases: 13,668,019,254.0
    Number, percentage and megabases of reads above quality cutoffs

Q5: 963153 (96.3%) 13574.0Mb
Q7: 937982 (93.8%) 13387.4Mb
Q10: 781757 (78.2%) 10950.3Mb
Q12: 446035 (44.6%) 6333.8Mb
Q15: 165 (0.0%) 1.6Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 16.3 (2090)
2: 16.2 (243)
3: 16.1 (362)
4: 16.1 (570)
5: 16.1 (1509)
Top 5 longest reads and their mean basecall quality score
1: 884004 (3.7)
2: 274368 (5.2)
3: 187850 (4.8)
4: 150969 (3.8)
5: 124444 (9.8)

So the mean read length is roughly 13.7 kb in the second batch vs 4.7 kb in the first.

If we still want to use NGMLR on these data, is there an option that can speed the process up? For instance, would pre-filtering the ultra-long reads, as sketched below, be a reasonable workaround?
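A minimal sketch of such a pre-filter with plain awk, dropping reads above a length cutoff before alignment (the 500 kb cutoff and the $fastq/$genome/$output variables are illustrative; a dedicated tool like NanoFilt could do the same with a quality filter added):

    maxlen=500000   # illustrative cutoff; the megabase-scale reads above are around Q4
    zcat "$fastq" \
      | awk -v max="$maxlen" '{
            rec[NR % 4] = $0              # buffer the current 4-line record
            if (NR % 4 == 0 && length(rec[2]) <= max)
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
        }' \
      | ngmlr --presets ont -t 22 -r "$genome" \
      | samtools sort -@ 6 -o "$output"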

Best,
Sarah
