Differences in chimera detection based on dataset structure #1942
Comments
The (it would be good to have clearer documentation on that, or maybe a warning message when pooling method and denoising method are misaligned)
In all three cases, I used the parameter
I have samples sequenced targeting the same 16S partial region from two different institutions, with about 120 and 300 samples respectively. I'm unsure how to correct for batch effects, so the first thing I did was try the following three commands to see what difference pooling makes:
(1) A_institution.qza, 120 samples:
qiime dada2 denoise-paired --i-demultiplexed-seqs A_institution.qza --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
(2) B_institution.qza, 300 samples:
qiime dada2 denoise-paired --i-demultiplexed-seqs B_institution.qza --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
(3) A+B_institution.qza, 420 samples:
qiime dada2 denoise-paired --i-demultiplexed-seqs A+B_institution.qza --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
In (1), chimeras were detected and filtered out, but in (2) and (3) no chimeras were detected in any of the samples.
![image](https://private-user-images.githubusercontent.com/31500750/327127485-de5cba98-5ddf-43e6-949a-573929b832d0.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIzODMyNTgsIm5iZiI6MTcyMjM4Mjk1OCwicGF0aCI6Ii8zMTUwMDc1MC8zMjcxMjc0ODUtZGU1Y2JhOTgtNWRkZi00M2U2LTk0OWEtNTczOTI5YjgzMmQwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MzAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzMwVDIzNDIzOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTgxYjlkZjFhYzU4NTU5ZmY2YjhjMDMyNjVjNWIwNjM0NTE3ODE2YmJhZmZhYzFlYzhjYmRmYjJiMWQ1OTFjNjUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.RNsCKZGvvOElv_7QOuDr_q_VhHa8yVoCow5V07_t3fQ)
i.e., for every sample from A_institution (120 samples):
the count after merging in (1) ≈ the count after merging in (3) = the count after chimera removal in (3) > the count after chimera removal in (1).
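To make that comparison concrete, here is a minimal Python sketch of how the per-sample check could be done once the merged and non-chimeric read counts are at hand. The function names and the counts are made up for illustration and are not taken from the actual runs; the real values would come from the denoising statistics of each run.

```python
def chimera_fraction(merged: int, non_chimeric: int) -> float:
    """Fraction of merged reads removed as chimeric (0.0 if nothing merged)."""
    if merged == 0:
        return 0.0
    return (merged - non_chimeric) / merged


def samples_without_chimeras(stats: dict[str, tuple[int, int]]) -> list[str]:
    """Sample IDs whose merged and non-chimeric counts are identical."""
    return sorted(
        sample
        for sample, (merged, non_chimeric) in stats.items()
        if merged == non_chimeric
    )


# Hypothetical (merged, non-chimeric) counts, for illustration only.
run_1 = {"A001": (30000, 27000), "A002": (28000, 26600)}
run_3 = {"A001": (30100, 30100), "A002": (27900, 27900)}

for name, run in (("run (1)", run_1), ("run (3)", run_3)):
    for sample, (merged, non_chimeric) in sorted(run.items()):
        fraction = chimera_fraction(merged, non_chimeric)
        print(f"{name} {sample}: {fraction:.1%} of merged reads removed")

# Samples where chimera removal changed nothing at all.
print(samples_without_chimeras(run_3))
```

If every sample in a run shows up in the second list, as in the hypothetical run (3) above, the chimera step effectively removed nothing for that run, which is the pattern described for (2) and (3).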
Are there any parameters I need to adjust for chimera detection when using a large dataset? Or could there be other causes?
The sequencing quality plots for the raw data from the two institutions are similar, but institution A has an average read count of 40,000 per sample while institution B has an average of 170,000, a difference of about 4x.