Removal of contamination based on a negative control occurs when all of "P = NA" #141

1023011930 · 2024-01-09T07:50:56Z

metadata.csv
pivot_data.csv
Thank you very much for your great contribution to data cleansing!
The abundance table I used was the TPM abundance of different contigs
The code I use is：

setwd("mydata/")
cran_packages <- c("reshape2", "ggplot2", "vegan", "stringr", "gridExtra", "ape", "RColorBrewer", "dplyr", "knitr","cowplot","openxlsx", "circlize", "plotly")
bioc_packages <- c("phyloseq","decontam","ComplexHeatmap")
sapply(c(cran_packages, bioc_packages), require, character.only = TRUE)

otu <- read.csv("pivot_data.csv", row.names = 1, check.names = F)
mapp <- read.csv("metadata.csv", row.names = 1)
otu <- otu_table(as.matrix(otu), taxa_are_rows = T)
mapp <- sample_data(mapp)
ps <- phyloseq(otu, mapp)
head(sample_data(ps))

df <- as.data.frame(sample_data(ps)) # Put sample_data into a ggplot-friendly data.frame
df$LibrarySize <- sample_sums(ps)
df <- df[order(df$LibrarySize),]
df$Index <- seq(nrow(df))
ggplot(data=df, aes(x=Index, y=LibrarySize, color=Sample_or_Control)) + geom_point()

sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample"
contamdf.prev <- isContaminant(ps, method="prevalence", neg="is.neg", threshold=0.5)
table(contamdf.prev$contaminant)

FALSE
3257

head(which(contamdf.prev$contaminant))

integer(0)

And the result shown as this image

I finded that all the “p.freq p.prev p” = NA
My sincere question to you, is my metadata and abundance table not done properly, or is there something wrong with my code, or is my data really not contaminated?
If you could answer this question for me, I would be very grateful!

1023011930 · 2024-01-09T07:53:21Z

the PS data goes like this

1023011930 · 2024-01-09T12:00:16Z

Also, my study is a low biomass study, does it need to be changed to isNotContaminant

1023011930 · 2024-01-09T12:15:35Z

setwd("/")
library(phyloseq)
library(decontam)

mapp <- read.table("metadata.csv", sep=",", row.names=1, header=T, comment.char="")
otu <- as.matrix(read.table("pivot_data.csv", sep=",", row.names=1, header=T, check.names=F))

otu <- otu_table(otu, taxa_are_rows=T)
mapp <- sample_data(mapp)
ps <- phyloseq(otu, mapp)
head(sample_data(ps))
sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample"
contamdf.notcontam <- isNotContaminant(ps, method="prevalence", neg="is.neg", threshold=0.5,detail=TRUE)
write.table(contamdf.notcontam, file="contamdf_notcontam.txt", sep="\t", quote=F, row.names=T)

Is there any error in my isNotContaminant code? Because the export is also wrong

benjjneb · 2024-01-11T03:57:59Z

You are using contig-level data here. Are the contigs defined separately for each sample?

1023011930 · 2024-01-11T04:41:48Z

M10 is one of my samples, I used contigs.fa assembled by M10 as index for decontamination test. Then, I used bwa-mem2 to mapping the clean_data of 64 samples (including four NC samples) back to contigs.fa, and calculated the TPM using samtools and my own python code. Finally, I got "pivot_data.csv"

1023011930 · 2024-01-11T04:41:55Z

Thank you very much for the help you can give me, it will be very helpful!

benjjneb · 2024-01-11T04:45:49Z

In the feature table you are working with, are there every any features observed in more than one sample? Or is every feature specific to a sample?

otu <- as.matrix(read.table("pivot_data.csv", sep=",", row.names=1, header=T, check.names=F))
table(rowSums(otu>0))
table(colSums(otu>0))

1023011930 · 2024-01-11T04:52:48Z

1023011930 · 2024-01-11T04:55:04Z

I think, every feature specific to a Sample(M10). Because in this test, the TPM is counted by the contig.fa assembled by M10_clean_data.fq

1023011930 · 2024-01-11T05:07:51Z

Or you mean the present times of each contig_name in the CSV table?

benjjneb · 2024-01-11T05:14:16Z

Your table(colSums(otu>0)) results yield mostly 1s, and a few 2s.

Let's nail this down: Does that mean that your features (contigs) are only being observed as present in 1 or 2 samples?

1023011930 · 2024-01-11T05:34:07Z

Your answer made me think. 这是我输入的csv文件，可以看到“M10_1001_510”在各个样品中的TPM都不为零。

Also, for each column, my data is barely zero. So "table(colSums(otu>0))" results in mostly 1's, and I don't know why, is there something wrong with the way I'm importing the data?

1023011930 · 2024-01-11T05:36:21Z

Thank you very much for the help you can give me, it will be very helpful!

1023011930 · 2024-01-11T05:40:22Z

In the "Introduction to decontam" the import data is [eadRDS(system.file("extdata", "MUClite.rds", package="decontam"))], RDS data, so i am a little confuse about how to import no-RDS data or PS data

1023011930 · 2024-01-11T06:53:32Z

After your reminder, I redid the data import
metadata.csv
pivot_datawen.csv
setwd("")
library(phyloseq)
library(decontam)

otu_matrix <- read.csv("pivot_datawen.csv", row.names = 1)
sample_data <- read.csv("metadata.csv", row.names = 1)

otu_table <- otu_table(otu_matrix, taxa_are_rows = TRUE)
sample_data <- sample_data(sample_data)
physeq <- phyloseq(otu_table, sample_data)
saveRDS(physeq, "physeq.rds")
ps <- readRDS("physeq.rds")

sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample"
contamdf.notcontam <- isNotContaminant(ps, method="prevalence", neg="is.neg", threshold=0.5,detail=TRUE)
write.table(contamdf.notcontam, file="contamdf_notcontam.txt", sep="\t", quote=F, row.names=T)

However，P also = NA in all contigs

1023011930 · 2024-01-12T08:03:58Z

Finally I find the bug, and we should change it to "Control"

I this test, Is "not.contaminant:FALSE" means this contig is a contamination?
Thanks again for taking the time to mentor me！！！

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Removal of contamination based on a negative control occurs when all of "P = NA" #141

Removal of contamination based on a negative control occurs when all of "P = NA" #141

1023011930 commented Jan 9, 2024

1023011930 commented Jan 9, 2024

1023011930 commented Jan 9, 2024

1023011930 commented Jan 9, 2024

benjjneb commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

benjjneb commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

benjjneb commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024 •

edited

Loading

1023011930 commented Jan 12, 2024

Removal of contamination based on a negative control occurs when all of "P = NA" #141

Removal of contamination based on a negative control occurs when all of "P = NA" #141

Comments

1023011930 commented Jan 9, 2024

1023011930 commented Jan 9, 2024

1023011930 commented Jan 9, 2024

1023011930 commented Jan 9, 2024

benjjneb commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

benjjneb commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

benjjneb commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024

1023011930 commented Jan 11, 2024 • edited Loading

1023011930 commented Jan 12, 2024

1023011930 commented Jan 11, 2024 •

edited

Loading