Best way to use controls for multiple batches? #135

Open
DrLCode opened this issue Sep 19, 2023 · 1 comment

DrLCode commented Sep 19, 2023

Hi everyone,

I'm planning on using Decontam (prevalence approach) to identify and filter possible contaminants from skin microbiome sequencing data and was wondering if anyone would be able to provide some input as to the best way to utilise my controls for this. For context I have ~120 animal samples, a negative control collected each day of sample collection (N=9), a negative extraction control from each extraction batch (N=13) and a PCR/sequencing control from each sequencing batch (N=2).

I had originally planned to subset my data manually into the appropriate batches, run decontam, prune and reassemble before repeating for the next set of controls (i.e. decontam the 2 sequencing batches separately, then decontam the 13 extraction batches separately and finally decontam the 9 collection days separately). From reading the available information and looking through this forum, it seems that plan would be ill-advised, as there would be only one control in each analysis.

These samples are from multiple animals with different characteristics, and were collected on separate occasions, so I'm apprehensive about running decontam on all samples and all controls together, as a legitimate contaminant that appeared in a single control and in all the samples that control directly applies to could be lost amongst the rest.

Would splitting my samples into subsets that include all 3 controls associated with each sample (creating lots of very small subsets) be an appropriate approach?

Any comments are much appreciated!

@benjjneb
Owner

It is not advised to split into many small subsets.

The prevalence method relies on multiple negative control samples to have the statistical power to effectively discriminate between contaminants and non-contaminants. Although the animals might be different, if you are using the same measurement protocol throughout, the contaminants being introduced should be consistent, and that is what is important to the decontam method.
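To make the statistical-power point concrete, here is a rough sketch in Python. It is not decontam's actual implementation (decontam is an R package and computes its score differently in detail); it simply applies a one-sided Fisher exact test to a 2x2 presence/absence table, which is the same general idea. The function name `prevalence_pvalue` and the counts used are hypothetical, chosen only for illustration.

```python
from math import comb


def prevalence_pvalue(neg_present, neg_total, samp_present, samp_total):
    """One-sided Fisher exact p-value for a taxon being more prevalent
    in negative controls than in true samples.

    Illustrative stand-in only, not decontam's own scoring code.
    """
    total = neg_total + samp_total          # all libraries in the analysis
    present = neg_present + samp_present    # libraries where the taxon is seen
    upper = min(neg_total, present)
    # Hypergeometric upper tail: P(at least neg_present controls contain it)
    tail = sum(
        comb(present, i) * comb(total - present, neg_total - i)
        for i in range(neg_present, upper + 1)
    )
    return tail / comb(total, neg_total)


# Hypothetical small subset: 1 control and 9 samples.
# The taxon is in the control and in 3 of the 9 samples.
print(prevalence_pvalue(1, 1, 3, 9))     # 4/10 = 0.4

# Hypothetical pooled analysis: 9 controls and 120 samples.
# The taxon is in 7 of 9 controls and 20 of 120 samples.
print(prevalence_pvalue(7, 9, 20, 120))  # far smaller than the single-control case
```

With a single control the p-value can never drop below 1/(n+1) for n samples, no matter how clear the contaminant is, whereas pooling the controls lets the same prevalence pattern stand out.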

Furthermore, the most effective negative controls to use with decontam (or other contaminant ID methods) are those that went through as much of your measurement methodology as possible, ideally a sterile sampling instrument that was exposed to the sampling environment but not used to actually perform sampling. That sounds like it might correspond to your "negative control collected each day of sample collection". Other types of negative controls introduced later are not as effective, as they can only inform about contaminants introduced after that step in the measurement process.
