Nanopore #34
Comments
I was thinking the same thing when I saw your tweet this morning. It should probably work with a few mods, but I'm less sure it would work right out of the box unless I had some data to test with. Quality trimming with expected errors won't work as it is too stringent, and currently AMPtk uses that for all "clustering" steps. But you could of course set the threshold really high to bypass quality trimming altogether. I thought I saw a paper on bioRxiv about a nanopore pipeline for 16S (this seems really rough around the edges though: https://github.com/umerijaz/nanopore), and I think Schloss has one for PacBio reads recently, which means it's in mothur and is horrible to use.... I'm not sure how to deal with the "clustering" or whether you need to keep each sequence separate. But basically there isn't a reason you couldn't run normal clustering at, say, 97% and see what happens. The reference-based clustering could be useful in AMPtk as well. If you have some data and want to share, I can see if it works and whether I need to make some tweaks. It would probably only take a few hours to get something that works well enough to get through some testing.
In the past I've used Porechop https://github.com/rrwick/Porechop for demuxing and adapter trimming (Ryan writes really nice tools). I think I would not quality trim the data at all actually; it would be better to leave the ends intact and make sure you can find primers (if there are any) -- then you know what a full-length read looks like.
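A rough sketch of that full-length check, in Python. It is not part of AMPtk or Porechop: the function names `find_primer` and `is_full_length`, the toy primer sequences, and the mismatch limit are all hypothetical. It only tolerates substitutions (no indels, no degenerate bases), which is a simplification for error-prone nanopore reads.

```python
def revcomp(seq):
    """Reverse-complement a DNA string; non-ACGT characters are left as-is."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_primer(read, primer, max_mismatches=2):
    """Leftmost index where primer matches read with at most
    max_mismatches substitutions, or -1 if there is no such position."""
    m = len(primer)
    for start in range(len(read) - m + 1):
        mismatches = 0
        for a, b in zip(read[start:start + m], primer):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Inner loop finished without exceeding the mismatch limit.
            return start
    return -1


def is_full_length(read, fwd_primer, rev_primer, max_mismatches=2):
    """True if the forward primer is followed by the reverse complement
    of the reverse primer, checking both strands of the read."""
    rev_rc = revcomp(rev_primer)
    for seq in (read, revcomp(read)):
        f = find_primer(seq, fwd_primer, max_mismatches)
        if f == -1:
            continue
        rest = seq[f + len(fwd_primer):]
        if find_primer(rest, rev_rc, max_mismatches) != -1:
            return True
    return False


# Toy primers (hypothetical, far shorter than real ones).
FWD, REV = "ACGTAC", "GGTTAC"
read = "NN" + FWD + "TTTTTTTT" + revcomp(REV) + "NN"
print(is_full_length(read, FWD, REV, max_mismatches=0))
```

Reads that fail this check are either truncated or chimeric enough that a primer cannot be found, so they would be left out before any clustering.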
If I remember correctly, both Porechop and now Albacore demux files into separate folders. So something like how
Actually I think this already exists, the
So in the above folder, if you ran this command it should label everything:
This would then relabel all sequences in You could then take the resulting
Note this will keep singletons which you probably want to do (default is --minsize 2). You could get a better idea about what expected error value to use by running the following command on your input reads and investigating a little bit:
This will tell you how many reads would be retained at various expected error (EE) values and lengths.
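To make the EE idea concrete: the expected error of a read is the sum of the per-base error probabilities implied by its Phred scores, 10^(-Q/10). The sketch below is plain Python and not AMPtk's code. The helper names `expected_errors` and `retained_counts` are hypothetical, and it assumes Phred+33 encoded quality strings.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities for one read.

    Each quality character encodes Q = ord(char) - phred_offset,
    and the error probability for that base is 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def retained_counts(quality_strings, ee_thresholds, min_length=0):
    """For each EE threshold, count reads that are at least min_length
    long and have expected errors at or below the threshold."""
    ees = [(len(q), expected_errors(q)) for q in quality_strings]
    return {
        t: sum(1 for length, ee in ees if length >= min_length and ee <= t)
        for t in ee_thresholds
    }


# Toy quality strings: 'I' is Q40 and '+' is Q10 under Phred+33.
quals = ["I" * 10, "+" * 10, "+" * 30]
for threshold, n in retained_counts(quals, [0.5, 1.5, 5.0]).items():
    print(f"EE <= {threshold}: {n} reads retained")
```

Because EE is a sum over the whole read, a 1500 bp nanopore read accumulates far more expected errors than a short Illumina read at the same per-base quality, which is why thresholds tuned for short reads discard almost everything.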
Thanks Jon,
Yeah, that would be great. I have a nanopore, but haven't used it for amplicons. I probably got one a little too early, when the data wasn't as good as what I read comes off it now. Seems like the kits/technology change overnight...
@devonorourke @nextgenusfs Any updates on using amptk with nanopore? I have been playing around a little with my nanopore amplicons (16S, 16S23S, ITS, 18S), but I am not able to get nice clustering, so I thought you might have some further recommendations.
@druvus I haven't seen any data yet, so I haven't looked at it specifically. Should be able to come up with a method if a mock community was sequenced - anybody know if that data is public somewhere? Reference-based clustering in theory should work, although probably the best aligner would be
I generated a tiny bit of 16S data a few months ago; it was a totally failed experiment I was trying as part of a one-week high school workshop (apparently bad reagents killed 3 flow cells in a day... ouch). It generated maybe 2000 total reads, so probably not enough to really flesh out how well amptk can handle these kinds of data. If you wanted to do a de novo approach first, you could try miniasm on the front end. You won't be looking for forward/reverse primers, I don't think, will you? If you're base-calling nanopore data, your first task is converting the raw signal from a .fast5 file to a .fastq; at that point you'll have your demultiplexed dataset. Porechop will also demultiplex if you want it to; either way, you should probably have already split your reads before assembling or mapping. Cheers,
Well, let me know if you find some data that has a mock community... While it certainly depends on your experimental goal, I'm assuming here that we are talking about PCR amplicons, and I would very much use the forward/reverse conserved priming regions to enforce "full-length" sequences for "OTU-picking" (to use classical terminology). I don't know what length amplicons you are talking about here? If 1.5 kb or so, it should be easy for Nanopore to sequence across the entire length of these amplicons. While Porechop would be good for removing adapter sequence, I'm assuming that your initial PCR region-specific primers would still be intact, so then: 1) pick out only sequences that are full-length, 2) run dereplication, 3) cluster, 4) map reads to "OTUs" using minimap2. That would need a PAF/SAM to OTU_table script, but that shouldn't be too difficult, I wouldn't think. I would be concerned about using something like miniasm, as there should be many sequences in a community that are 95-97% identical yet are unique OTUs, so I'm not sure that collapsing/assembling those reads would yield the desired result.
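A minimal sketch of what such a PAF to OTU table script might look like, in Python. This is not an existing AMPtk script. It assumes standard PAF columns (query name in column 1, target name in column 6, residue matches in column 10, mapping quality in column 12), keeps one best hit per read by number of matching residues, and relies on a hypothetical `sample_of` callback because the read-naming scheme is not specified here.

```python
from collections import defaultdict


def paf_to_otu_table(paf_lines, sample_of, min_mapq=0):
    """Build {otu: {sample: read_count}} from PAF alignment lines.

    paf_lines: iterable of tab-separated PAF records (reads mapped to OTUs).
    sample_of: function mapping a read name to its sample name
               (hypothetical; depends on how reads were labelled).
    min_mapq:  discard alignments with mapping quality below this value.
    """
    best = {}  # read name -> (residue matches, OTU name)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        read, otu = fields[0], fields[5]
        matches, mapq = int(fields[9]), int(fields[11])
        if mapq < min_mapq:
            continue
        # Keep only the hit with the most matching residues per read.
        if read not in best or matches > best[read][0]:
            best[read] = (matches, otu)

    table = defaultdict(lambda: defaultdict(int))
    for read, (_, otu) in best.items():
        table[otu][sample_of(read)] += 1
    return {otu: dict(counts) for otu, counts in table.items()}


def make_paf(read, otu, matches, mapq):
    """Toy PAF record for a 1500 bp read aligned end to end."""
    cols = [read, 1500, 0, 1500, "+", otu, 1500, 0, 1500, matches, 1500, mapq]
    return "\t".join(str(c) for c in cols)


records = [
    make_paf("s1_r1", "OTU1", 1400, 60),
    make_paf("s1_r1", "OTU2", 1300, 0),   # secondary hit, fewer matches
    make_paf("s2_r1", "OTU1", 1350, 60),
    make_paf("s1_r2", "OTU2", 1450, 60),
]
# Assumed naming: sample name is everything before the first underscore.
otu_table = paf_to_otu_table(records, sample_of=lambda r: r.split("_")[0])
print(otu_table)
```

In practice the PAF lines would come from a file written by minimap2, and the best-hit rule (most matching residues) is only one possible choice; an identity cutoff could be added in the same loop.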
I should add -- we have a MinION, but I have only used it for long reads for genome assembly and have not run any of the PCR/amplicon procedures -- so I'm not very familiar with the adapters/primers/etc.
Is there anything in the structure of amptk that would prohibit using Nanopore amplicon data as input?
Fragment lengths are ~1500 bp. Input fastq are already adapter and quality trimmed and demultiplexed. Curious about using amptk for some comparisons with the clustering and classification steps.
Sound crazy?