Problems with the decontamination of metagenomic data or viral data #136

Open
1023011930 opened this issue Sep 23, 2023 · 6 comments

Comments

@1023011930

Dear Sir or Madam, thank you for your significant contribution to decontaminating sequencing data.
I haven't used decontam yet, but most of the examples I have seen are based on ASV/OTU relative abundance tables.
For metagenomic data, as I understand it, contaminants can be removed by constructing MAG abundance tables.
However, viral sequences are difficult to bin into MAGs and are often only available as contigs. Can decontam decontaminate based on contig abundance, and is that scientifically sound?
From what I've read, the article at https://www.nature.com/articles/s41586-020-2192-1 uses Kraken classification data as the basis for decontamination. Is this a good approach?
Thank you very much for any guidance!

@1023011930
Author

To summarize, my problem is that some metagenomic data, such as viral data, is difficult to bin into "metaOTUs", so I cannot generate the kind of relative abundance table that 16S ASV data normally gives. How should I process such metagenomic data so that it can be used as input for decontam?

@benjjneb
Owner

You can use decontam with any feature type that has a relative abundance in each sample. This includes contigs.

@1023011930
Author

> You can use decontam with any feature type that has a relative abundance in each sample. This includes contigs.

Does this mean that I can use software such as BWA or bowtie2 to quantify the contigs, calculate RPM or TPM, and then use those values as the relative abundance for decontam? Thank you for your kindness!
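For readers following the workflow asked about here, a minimal sketch of the TPM step, assuming per-contig mapped read counts and contig lengths have already been extracted from a BWA or bowtie2 alignment. The function name `tpm` and the example numbers are hypothetical and are not part of decontam or of either aligner.

```python
def tpm(read_counts, contig_lengths_bp):
    """Return TPM values for the contigs of one sample.

    read_counts: mapped reads per contig (assumed to come from an alignment)
    contig_lengths_bp: contig lengths in base pairs, in the same order
    """
    if len(read_counts) != len(contig_lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    # Reads per kilobase: corrects for longer contigs recruiting more reads.
    rpk = [c / (l / 1000.0) for c, l in zip(read_counts, contig_lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    # Rescale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rpk]


# Hypothetical sample: three contigs with made-up counts and lengths.
counts = [500, 1500, 200]
lengths = [5000, 30000, 2000]
sample_tpm = tpm(counts, lengths)
```

Running this once per sample and stacking the results gives a samples-by-contigs table with the same shape as an ASV/OTU table.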

@1023011930
Author

1023011930 commented Sep 27, 2023

In my understanding, a standardized contig abundance table is not the same as a relative abundance table like the one from 16S data.

In a 16S abundance table, each sample sums to 100% and each OTU takes up a portion of that 100%; the percentage (0-100%) is the data.

Standardized contig abundance tables, by contrast, generally use the TPM (or read count) of each contig as the data, so their values are not necessarily in the 0-100 range.

These two kinds of abundance table do not seem the same to me. How is a contig abundance table generally handled? If there is a corresponding tutorial or literature, I would be very grateful!
Thank you for your answer.

@benjjneb
Owner

benjjneb commented Oct 7, 2023

TPM is also a relative abundance. You can use it just the same.

A "relative abundance" measure is any metric that informs about the abundance of this relative to that. If the TPM of contig X is double that of contig Y, then X has double the relative abundance (by this measure) of contig Y.
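A small worked illustration of this point, using made-up TPM values: because TPM sums to 1,000,000 within a sample, dividing by 10,000 puts it on the 0-100% scale familiar from 16S tables, and the ratio between any two contigs is unchanged by that rescaling.

```python
# Hypothetical TPM values for two contigs in the same sample.
tpm_x, tpm_y = 400.0, 200.0

# TPM sums to 1,000,000 per sample, so dividing by 10,000 gives a percentage.
pct_x, pct_y = tpm_x / 1e4, tpm_y / 1e4

# The X-to-Y ratio is the same on either scale.
ratio_tpm = tpm_x / tpm_y
ratio_pct = pct_x / pct_y
```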

@1023011930
Author

> TPM is also a relative abundance. You can use it just the same.
>
> A "relative abundance" measure is any metric that informs about the abundance of this relative to that. If the TPM of contig X is double that of contig Y, then X has double the relative abundance (by this measure) of contig Y.

I think I see what you mean. I will try using standardized contig relative abundance. Thank you very much for your reply!
