
NGMLR very slow on bovine nanopore reads #70

Open
sdjebali opened this issue Oct 11, 2019 · 2 comments

@sdjebali

Dear all,

First of all, thanks for this very nice development.

I just wanted to report that, on some fairly large ONT runs from bovine samples, NGMLR followed by samtools sort was very slow (about 4 days for 4 million reads).

I was wondering whether I am using the tool correctly (with the right parameters)?

I tried with the first 1 million reads (4,000,000 FASTQ lines, four per read) like this:
zcat $fastq | head -n 4000000 | ngmlr --presets ont -t 22 -r $genome | samtools sort -@ 6 -o $output
and it took 5 h 23 min to complete.

I then tried with the second 1 million reads like this:
zcat $fastq | tail -n +4000001 | head -n 4000000 | ngmlr --presets ont -t 22 -r $genome | samtools sort -@ 4 -o $output
and it took 24 h 10 min to complete.

I am using NGMLR version 0.2.8 and samtools version 1.9, and here are the details about my machine:
Linux tatum 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux
24 processors
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz

Any advice would be warmly welcome.

Best,
Sarah

@fritzsedlazeck (Collaborator)

Thanks Sarah,
do you have an average read length? It's likely, though unfortunate, that some of the reads in your second batch are very long.
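(If it helps, a quick sketch to pull the mean read length straight from the gzipped FASTQ; $fastq as in your commands above:)

    zcat "$fastq" | awk 'NR % 4 == 2 { n++; total += length($0) }
        END { printf "reads: %d  mean length: %.1f\n", n, total / n }'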
Thanks
Fritz

@sdjebali (Author)

Indeed there seems to be a big read length difference between the two batches.

I ran NanoPlot on them and here are the results:

  • First 1 million reads:
    General summary:
    Mean read length: 4,722.5
    Mean read quality: 4.4
    Median read length: 906.0
    Median read quality: 4.2
    Number of reads: 1,000,000.0
    Read length N50: 14,404.0
    Total bases: 4,722,479,679.0
    Number, percentage and megabases of reads above quality cutoffs

Q5: 367454 (36.7%) 3015.3Mb
Q7: 8 (0.0%) 0.1Mb
Q10: 0 (0.0%) 0.0Mb
Q12: 0 (0.0%) 0.0Mb
Q15: 0 (0.0%) 0.0Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 7.0 (17272)
2: 7.0 (9848)
3: 7.0 (25242)
4: 7.0 (12091)
5: 7.0 (25093)
Top 5 longest reads and their mean basecall quality score
1: 2210466 (3.6)
2: 1850945 (3.8)
3: 1772717 (3.6)
4: 1685671 (3.9)
5: 1563326 (3.9)

  • Second 1 million reads:
    General summary:
    Mean read length: 13,668.0
    Mean read quality: 11.1
    Median read length: 13,451.0
    Median read quality: 11.8
    Number of reads: 1,000,000.0
    Read length N50: 16,657.0
    Total bases: 13,668,019,254.0
    Number, percentage and megabases of reads above quality cutoffs

Q5: 963153 (96.3%) 13574.0Mb
Q7: 937982 (93.8%) 13387.4Mb
Q10: 781757 (78.2%) 10950.3Mb
Q12: 446035 (44.6%) 6333.8Mb
Q15: 165 (0.0%) 1.6Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 16.3 (2090)
2: 16.2 (243)
3: 16.1 (362)
4: 16.1 (570)
5: 16.1 (1509)
Top 5 longest reads and their mean basecall quality score
1: 884004 (3.7)
2: 274368 (5.2)
3: 187850 (4.8)
4: 150969 (3.8)
5: 124444 (9.8)

So the mean read length is roughly 13.7 kb in the second batch vs 4.7 kb in the first.

If we still want to use NGMLR on these data, is there an option that can speed the process up? For instance, would pre-filtering the ultra-long reads, as sketched below, be a reasonable workaround?
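A minimal sketch of such a pre-filter with plain awk, dropping reads above a length cutoff before alignment (the 500 kb cutoff and the $fastq/$genome/$output variables are illustrative; a dedicated tool like NanoFilt could do the same with a quality filter added):

    maxlen=500000   # illustrative cutoff; the megabase-scale reads above are around Q4
    zcat "$fastq" \
      | awk -v max="$maxlen" '{
            rec[NR % 4] = $0              # buffer the current 4-line record
            if (NR % 4 == 0 && length(rec[2]) <= max)
                print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
        }' \
      | ngmlr --presets ont -t 22 -r "$genome" \
      | samtools sort -@ 6 -o "$output"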

Best,
Sarah
