Problem with taxa assignation #1971

martina-nasuelli · 2024-06-13T07:29:20Z

Hello Ben and team,
I am having trouble with the taxa assignation using dada2 and the MIDORI reference database. I've sequenced a Metazoan COI library based on Leray et al. 2013 primers targeting an amplicon of 313 bp.
Based on the quality profile plots (attached), I've set these filters and parameters to get the taxa assignation:
out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, trimLeft = c(0,26), truncLen = c(240,180),
truncQ = 5, maxN = 0, maxEE = c(2,2), rm.phix=TRUE, compress=TRUE,
multithread=TRUE)

errF <- learnErrors(filtFs, multithread=TRUE)
errR <- learnErrors(filtRs, multithread=TRUE)

mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose=TRUE, minOverlap = 12, returnRejects = TRUE)

seqtabfilt.nochim <- removeBimeraDenovo(seqtabfilt, method="consensus", allowOneOff = TRUE, multithread=TRUE, verbose=TRUE)

head(track)
input filtered denoisedF denoisedR merged nonchim
R1.0A 77811 53902 51110 52772 50672 33272
R1.0C 82972 58557 55671 57493 55300 40178
R1.0E 52498 45084 42800 44256 42565 24863
R1.0H 58371 43242 40987 42617 40833 24986
R1.0L 39734 33325 31069 32586 30850 21042
R1.1A 55957 41674 38200 39987 37869 26640

sum(seqtabfilt.nochim)/sum(seqtabfilt)
[1] 0.7844993

taxa.MIDORICOI <- assignTaxonomy(seqtabfilt.nochim, "../MIDORI2_UNIQ_NUC_GB259_CO1_DADA2.fasta.gz", tryRC = TRUE,
taxLevels = c("Phylum","Class","Order","Family","Genus","Species"),
minBoot = 50, multithread=TRUE)

After these steps, the taxa assignation I've got is mostly NA or no assignation lower than Order.
In the attached files you can find the quality profile plots of the R1 and R2 fastqs and the taxa and count csv I've obtained.
What I am missing here? Is there something I have done wrong in the library preparation or I am analyzing the data incorrectly?
COI_analysis.zip

Thank you in advance,

Martina Nasuelli

benjjneb · 2024-06-13T20:14:38Z

I would be a bit concerned about the fraction of reads you are losing in the chimera removal step. I'm not familiar with these primers and library preparation -- is it true that the reverse primer is 26nts long and is sequenced, but that the forward primer is not sequenced? That is what it looks like based on your filterAndTrim(..., trimLeft=c(0,26)) choice.

In addition, why have you chosen mergePairs(..., returnRejects=TRUE)? This is really only intended for troubleshooting, not as part of a normal workflow. removeBimeraDenovo(..., allowOneOff=TRUE) is also not a recommended option anymore.

I am not an expert on CO1 metabarcoding, but I would start by BLAST-ing some of your top ASVs against a broad database like nt, and comparing the best hits to what you are getting from taxonomic assignment against MIDORI. Do they mostly match up? Or is there an obvious mismatch even among the abundant ASVs?

martina-nasuelli · 2024-06-14T06:38:28Z

Hello Benjamin, I've rerun the dada2 script based on your suggestions, adding the Trimleft option for the forward primer (26bp). I've obtained a higher percentage of non-chimeras (0.9928853), but still got no taxonomy for my ASV. Yes, I've blasted the generated sequences. The primers I used are meant to amplify the CO1 gene for Metazoan; most of the best matches I obtain from BLAST, in terms of query over and similarity, are Fungi. I am referring to 98% query cover and 90% similarity, which is very low. Some Metazoan are returned with 87-88% of similarity. Thank you again *Martina Nasuelli, *PhD student Università del Piemonte Orientale Dipartimento per lo Sviluppo Sostenibile e la Transizione Ecologica (DiSSTE) Piazza Sant'Eusebio 5, 13100 - Vercelli, Italy Google Scholar <https://scholar.google.com/citations?hl=it&view_op=list_works&gmla=AJsN-F7yLKtw4EQIsjnfDMRSWzFmyF4E48O6LKKYsKLNV08MRAaezbKyZIVeZwdEvgWSd4-IHaEipDdykvftRsRxTOoB9XBWScIzsMYs5xaa-k21Fme9Sotb2Efe7Dt6_RfK1K9ey5WisciHR7W4GvHjF8Ct-gPvbA&user=NsdvzOIAAAAJ> ResearchGate <https://www.researchgate.net/profile/Martina-Nasuelli-2> Twitter <https://twitter.com/MNasuelli> Il giorno gio 13 giu 2024 alle ore 22:15 Benjamin Callahan < ***@***.***> ha scritto:

…

I would be a bit concerned about the fraction of reads you are losing in the chimera removal step. I'm not familiar with these primers and library preparation -- is it true that the reverse primer is 26nts long and is sequenced, but that the forward primer is not sequenced? That is what it looks like based on your filterAndTrim(..., trimLeft=c(0,26)) choice. In addition, why have you chosen mergePairs(..., returnRejects=TRUE)? This is really only intended for troubleshooting, not as part of a normal workflow. removeBimeraDenovo(..., allowOneOff=TRUE) is also not a recommended option anymore. I am not an expert on CO1 metabarcoding, but I would start by BLAST-ing some of your top ASVs against a broad database like nt, and comparing the best hits to what you are getting from taxonomic assignment against MIDORI. Do they mostly match up? Or is there an obvious mismatch even among the abundant ASVs? — Reply to this email directly, view it on GitHub <#1971 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/BJEVCRCDKICUZHZTR3AO5CLZHH4UHAVCNFSM6AAAAABJHZL2QWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRWGY4DMMZRGE> . You are receiving this because you authored the thread.Message ID: ***@***.***>

benjjneb · 2024-06-17T17:30:11Z

The primers I used are meant to amplify the CO1 gene for Metazoan; most of the best matches I obtain from BLAST, in terms of query over and similarity, are Fungi. I am referring to 98% query cover and 90% similarity, which is very low. Some Metazoan are returned with 87-88% of similarity.

One possibility is that these primers are not performing as expected, and so you are getting lots of off-target amplification. That would be my first guess based on what you describe here. It may also be worth looking at the alignments for your top hits --- are the mismatches clustered to a specific part of the reads? If so, it could be some technical issue that is solvable. If not, it is consistent with the non-specific priming idea.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem with taxa assignation #1971

Problem with taxa assignation #1971

martina-nasuelli commented Jun 13, 2024

benjjneb commented Jun 13, 2024

martina-nasuelli commented Jun 14, 2024 via email

benjjneb commented Jun 17, 2024

Problem with taxa assignation #1971

Problem with taxa assignation #1971

Comments

martina-nasuelli commented Jun 13, 2024

benjjneb commented Jun 13, 2024

martina-nasuelli commented Jun 14, 2024 via email

benjjneb commented Jun 17, 2024