prevalence method in Decontam not Identifying contaminants with a single control sample #152

Open
JayalalKJ opened this issue Oct 30, 2024 · 1 comment

JayalalKJ commented Oct 30, 2024

Hi, I have one control sample, and the prevalence method in decontam is not identifying contaminants effectively. The p-values appear evenly distributed, and isContaminant() flags few sequences as contaminants even with an aggressive threshold. I would appreciate advice on how to proceed, alternative approaches, or changes to the following code.

Attempted so far: an aggressive threshold (e.g., threshold = 0.3).

library(phyloseq)
library(decontam)
library(ggplot2)

# Flag the control samples
sample_data(physeq)$is.neg <- sample_data(physeq)$Sample_or_Control == "Control Sample"

# Identify contaminants using the prevalence method with an aggressive threshold
contamdf.prev <- isContaminant(physeq, method = "prevalence", neg = "is.neg", threshold = 0.3)
table(contamdf.prev$contaminant)

# Convert counts to presence/absence, then split into negative controls and true samples
ps.pa <- transform_sample_counts(physeq, function(abund) 1 * (abund > 0))
ps.pa.neg <- prune_samples(sample_data(ps.pa)$Sample_or_Control == "Control Sample", ps.pa)
ps.pa.pos <- prune_samples(sample_data(ps.pa)$Sample_or_Control == "True Sample", ps.pa)

# Create a data frame for visualization
df.pa <- data.frame(pa.pos = taxa_sums(ps.pa.pos), pa.neg = taxa_sums(ps.pa.neg),
                    contaminant = contamdf.prev$contaminant)

# Plot the prevalence of taxa in true samples vs negative controls
ggplot(data = df.pa, aes(x = pa.neg, y = pa.pos, color = contaminant)) +
  geom_point() +
  xlab("Prevalence (Negative Controls)") +
  ylab("Prevalence (True Samples)")

# Remove the taxa flagged as contaminants
physeq_clean <- prune_taxa(!contamdf.prev$contaminant, physeq)

@benjjneb
Owner

> I have one control sample

decontam-prevalence is not an appropriate method when you have only one negative control sample. It relies on repeated observation of contaminants across multiple negative controls. We recommend a minimum of 5; see our original paper for more on that: https://doi.org/10.1186/s40168-018-0605-2
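
To illustrate why a single negative control gives a prevalence test so little to work with, here is a minimal sketch in base R. It is an assumption-laden toy, not decontam's actual scoring code: it runs a one-sided Fisher's exact test on a 2x2 presence/absence table, and the helper name prevalence_pvalue and the sample counts (20 true samples, 1 or 5 controls) are made up for the example.

```r
# Illustration only: base R, no decontam or phyloseq required.
# prevalence_pvalue() is a hypothetical helper, not part of decontam.
# It asks: is the taxon more prevalent in negative controls than in true samples?
prevalence_pvalue <- function(present_neg, present_true, n_neg, n_true) {
  tab <- matrix(
    c(present_neg,  n_neg  - present_neg,
      present_true, n_true - present_true),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("negative", "true"), c("present", "absent"))
  )
  fisher.test(tab, alternative = "greater")$p.value
}

# Assumed scenario A: taxon seen in the lone control and in 2 of 20 true samples
p_one_control <- prevalence_pvalue(1, 2, n_neg = 1, n_true = 20)

# Assumed scenario B: same true-sample prevalence, but seen in all of 5 controls
p_five_controls <- prevalence_pvalue(5, 2, n_neg = 5, n_true = 20)

print(p_one_control)    # 3/21, about 0.14
print(p_five_controls)  # 21/53130, about 0.0004
```

In this toy setup, a taxon that looks like a textbook contaminant still only reaches p of about 0.14 with one control, because a single presence/absence observation cannot separate it from chance. With five controls the same pattern becomes strong evidence. decontam's scores may differ in detail, but the loss of resolution with one control is the same.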
