Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with taxa assignation #1971

Open
martina-nasuelli opened this issue Jun 13, 2024 · 3 comments
Open

Problem with taxa assignation #1971

martina-nasuelli opened this issue Jun 13, 2024 · 3 comments

Comments

@martina-nasuelli
Copy link

Hello Ben and team,
I am having trouble with the taxa assignation using dada2 and the MIDORI reference database. I've sequenced a Metazoan COI library based on Leray et al. 2013 primers targeting an amplicon of 313 bp.
Based on the quality profile plots (attached), I've set these filters and parameters to get the taxa assignation:
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, trimLeft = c(0,26), truncLen = c(240,180),
truncQ = 5, maxN = 0, maxEE = c(2,2), rm.phix=TRUE, compress=TRUE,
multithread=TRUE)

errF <- learnErrors(filtFs, multithread=TRUE)
errR <- learnErrors(filtRs, multithread=TRUE)

mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose=TRUE, minOverlap = 12, returnRejects = TRUE)

seqtabfilt.nochim <- removeBimeraDenovo(seqtabfilt, method="consensus", allowOneOff = TRUE, multithread=TRUE, verbose=TRUE)

head(track)
input filtered denoisedF denoisedR merged nonchim
R1.0A 77811 53902 51110 52772 50672 33272
R1.0C 82972 58557 55671 57493 55300 40178
R1.0E 52498 45084 42800 44256 42565 24863
R1.0H 58371 43242 40987 42617 40833 24986
R1.0L 39734 33325 31069 32586 30850 21042
R1.1A 55957 41674 38200 39987 37869 26640

sum(seqtabfilt.nochim)/sum(seqtabfilt)
[1] 0.7844993

taxa.MIDORICOI <- assignTaxonomy(seqtabfilt.nochim, "../MIDORI2_UNIQ_NUC_GB259_CO1_DADA2.fasta.gz", tryRC = TRUE,
taxLevels = c("Phylum","Class","Order","Family","Genus","Species"),
minBoot = 50, multithread=TRUE)

After these steps, the taxa assignation I've got is mostly NA or no assignation lower than Order.
In the attached files you can find the quality profile plots of the R1 and R2 fastqs and the taxa and count csv I've obtained.
What I am missing here? Is there something I have done wrong in the library preparation or I am analyzing the data incorrectly?
COI_analysis.zip

Thank you in advance,

Martina Nasuelli

@benjjneb
Copy link
Owner

I would be a bit concerned about the fraction of reads you are losing in the chimera removal step. I'm not familiar with these primers and library preparation -- is it true that the reverse primer is 26nts long and is sequenced, but that the forward primer is not sequenced? That is what it looks like based on your filterAndTrim(..., trimLeft=c(0,26)) choice.

In addition, why have you chosen mergePairs(..., returnRejects=TRUE)? This is really only intended for troubleshooting, not as part of a normal workflow. removeBimeraDenovo(..., allowOneOff=TRUE) is also not a recommended option anymore.

I am not an expert on CO1 metabarcoding, but I would start by BLAST-ing some of your top ASVs against a broad database like nt, and comparing the best hits to what you are getting from taxonomic assignment against MIDORI. Do they mostly match up? Or is there an obvious mismatch even among the abundant ASVs?

@martina-nasuelli
Copy link
Author

martina-nasuelli commented Jun 14, 2024 via email

@benjjneb
Copy link
Owner

The primers I used are meant to amplify the CO1 gene for Metazoan; most of the best matches I obtain from BLAST, in terms of query over and similarity, are Fungi. I am referring to 98% query cover and 90% similarity, which is very low. Some Metazoan are returned with 87-88% of similarity.

One possibility is that these primers are not performing as expected, and so you are getting lots of off-target amplification. That would be my first guess based on what you describe here. It may also be worth looking at the alignments for your top hits --- are the mismatches clustered to a specific part of the reads? If so, it could be some technical issue that is solvable. If not, it is consistent with the non-specific priming idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants