Removal of contamination based on a negative control occurs when all of "P = NA" #141
Comments
Also, my study is a low-biomass study. Does that mean I should switch to isNotContaminant?
You are using contig-level data here. Are the contigs defined separately for each sample?
M10 is one of my samples. I used the contigs.fa assembled from M10 as the index for a decontamination test. I then used bwa-mem2 to map the clean_data of 64 samples (including four NC samples) back to contigs.fa, and calculated TPM using samtools and my own Python code. The result is "pivot_data.csv".
Thank you very much for the help you can give me, it will be very helpful!
In the feature table you are working with, are there ever any features observed in more than one sample? Or is every feature specific to a sample?
I think every feature is specific to one sample (M10), because in this test the TPM is computed against the contig.fa assembled from M10_clean_data.fq.
Let's nail this down: does that mean that your features (contigs) are only being observed as present in 1 or 2 samples?
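One way to answer this concretely is to count, for each contig, the number of samples in which it has non-zero abundance. The following is a minimal base R sketch on a made-up table: the contig names, sample names, and values are hypothetical stand-ins for pivot_data.csv, assuming rows are contigs and columns are samples.

```r
# Toy feature table: rows are contigs, columns are samples (values stand in for TPM).
# All names and numbers here are made up for illustration.
tpm <- matrix(
  c(5, 0, 0, 0,
    2, 3, 0, 1,
    0, 0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("contig_1", "contig_2", "contig_3"),
                  c("M10", "M11", "NC1", "NC2"))
)

# Hypothetical flag marking which columns are negative controls.
is_neg <- c(FALSE, FALSE, TRUE, TRUE)

# Number of samples in which each contig is present (abundance > 0),
# overall and split into true samples versus negative controls.
prevalence       <- rowSums(tpm > 0)
prev_in_samples  <- rowSums(tpm[, !is_neg, drop = FALSE] > 0)
prev_in_controls <- rowSums(tpm[, is_neg, drop = FALSE] > 0)

# How many contigs are seen in 0, 1, 2, ... samples?
print(table(prevalence))
print(data.frame(prevalence, prev_in_samples, prev_in_controls))
```

On the real table, the same `rowSums(otu > 0)` call applied to the matrix read from pivot_data.csv would show whether most contigs appear in only one or two samples.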
Thank you very much for the help you can give me, it will be very helpful!
In "Introduction to decontam", the data is imported as RDS with readRDS(system.file("extdata", "MUClite.rds", package="decontam")), so I am a little confused about how to import non-RDS data or PS data.
After your reminder, I redid the data import:
otu_matrix <- read.csv("pivot_datawen.csv", row.names = 1)
otu_table <- otu_table(otu_matrix, taxa_are_rows = TRUE)
sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample"
metadata.csv
pivot_data.csv
Thank you very much for your great contribution to data cleansing!
The abundance table I used contains the TPM abundance of the different contigs.
The code I use is:
setwd("mydata/")
cran_packages <- c("reshape2", "ggplot2", "vegan", "stringr", "gridExtra", "ape", "RColorBrewer", "dplyr", "knitr","cowplot","openxlsx", "circlize", "plotly")
bioc_packages <- c("phyloseq","decontam","ComplexHeatmap")
sapply(c(cran_packages, bioc_packages), require, character.only = TRUE)
otu <- read.csv("pivot_data.csv", row.names = 1, check.names = FALSE)
mapp <- read.csv("metadata.csv", row.names = 1)
otu <- otu_table(as.matrix(otu), taxa_are_rows = TRUE)
mapp <- sample_data(mapp)
ps <- phyloseq(otu, mapp)
head(sample_data(ps))
df <- as.data.frame(sample_data(ps)) # Put sample_data into a ggplot-friendly data.frame
df$LibrarySize <- sample_sums(ps)
df <- df[order(df$LibrarySize),]
df$Index <- seq(nrow(df))
ggplot(data=df, aes(x=Index, y=LibrarySize, color=Sample_or_Control)) + geom_point()
sample_data(ps)$is.neg <- sample_data(ps)$Sample_or_Control == "Control Sample"
contamdf.prev <- isContaminant(ps, method="prevalence", neg="is.neg", threshold=0.5)
table(contamdf.prev$contaminant)
FALSE
3257
head(which(contamdf.prev$contaminant))
integer(0)
The result is shown in this image.
I found that p.freq, p.prev, and p are all NA.
My sincere question is: are my metadata and abundance table prepared incorrectly, is there something wrong with my code, or is my data really not contaminated?
If you could answer this question for me, I would be very grateful!
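A quick check on the metadata side, sketched here with made-up labels: the comparison against "Control Sample" is an exact, case-sensitive string match, so if the Sample_or_Control column never contains exactly that text, is.neg is FALSE for every sample and no sample is treated as a negative control. The data frame below is hypothetical and does not reflect the actual contents of metadata.csv.

```r
# Hypothetical metadata; the labels are illustrative, not taken from metadata.csv.
meta <- data.frame(
  Sample_or_Control = c("True Sample", "True Sample", "Control", "control sample"),
  row.names = c("M10", "M11", "NC1", "NC2")
)

# Same comparison as in the posted code: exact, case-sensitive string match.
is_neg <- meta$Sample_or_Control == "Control Sample"

# Which labels actually occur, and how many samples were flagged as controls?
print(table(meta$Sample_or_Control))
print(sum(is_neg))
```

If the count of flagged controls is zero on the real metadata, the label used in the comparison would need to match the one in the file.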