Use of frequency-based decontam method on Meta-transcriptomics data #148

Open
achald7867 opened this issue Jul 3, 2024 · 1 comment

@achald7867

Dear Benjamin,

Thank you so much for developing this wonderful tool. We are using it across multiple projects to identify contamination in our metagenomic datasets. Currently, we are analyzing a metatranscriptomics dataset, and we are wondering about the applicability of the frequency-based method. This method relies on the principle that sequences from contaminating features are likely to have frequencies that inversely correlate with sample DNA concentration. However, gene expression is influenced by environmental factors, such as antibiotics or other factors, which means this relationship probably doesn't hold true for metatranscriptomics data. Do you think we can use it for metatranscriptomics datasets?

Additionally, while using decontam on metagenomic datasets, is it better to use relative abundance that considers both known and unknown bacteria, or is focusing solely on known bacteria the best approach?

We appreciate any insights you can provide on these questions.

Best regards,
Achal Dhariwal, PhD

@benjjneb
Owner

This is probably too late to matter, but you raise a couple of good questions.

> Currently, we are analyzing a metatranscriptomics dataset, and we are wondering about the applicability of the frequency-based method. This method relies on the principle that sequences from contaminating features are likely to have frequencies that inversely correlate with sample DNA concentration. However, gene expression is influenced by environmental factors, such as antibiotics or other factors, which means this relationship probably doesn't hold true for metatranscriptomics data. Do you think we can use it for metatranscriptomics datasets?

For the reasons you cite, I would also be concerned about using the frequency method. I'd add a second concern: the difference between RNA and DNA, which matters if you are using DNA concentrations as the concentration input to the frequency method. When comparing across two different conditions, it is not uncommon, and maybe even to be expected, that the relative level of transcription differs, and hence so does the ratio of DNA to RNA. This is the kind of systematic difference across a tested condition that could cause problems with the frequency method.
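
To make the underlying assumption concrete, here is a minimal Python sketch of the idea behind a frequency-based test. It is not decontam's implementation. It only compares how well two log-log models fit one feature: frequency inversely proportional to concentration (slope -1, contaminant-like) versus frequency independent of concentration (slope 0). The function name and all numbers are hypothetical and chosen for illustration.

```python
import math

def frequency_model_fit(freqs, concs):
    """Return (ss_contam, ss_noncontam): residual sums of squares for two
    log-log models of one feature's frequency versus sample concentration.

    Simplified illustration only; not the decontam package's code.
    """
    logf = [math.log(f) for f in freqs]
    logc = [math.log(c) for c in concs]
    n = len(logf)

    # Non-contaminant model: log(f) = a, i.e. slope 0 against log(c).
    mean_f = sum(logf) / n
    ss_noncontam = sum((y - mean_f) ** 2 for y in logf)

    # Contaminant model: log(f) = b - log(c), i.e. slope fixed at -1.
    shifted = [y + x for y, x in zip(logf, logc)]
    mean_s = sum(shifted) / n
    ss_contam = sum((s - mean_s) ** 2 for s in shifted)

    return ss_contam, ss_noncontam

# Hypothetical concentrations (arbitrary units) for five samples.
concs = [1.0, 2.0, 4.0, 8.0, 16.0]

# A feature whose frequency falls as 1/concentration (contaminant-like).
contaminant = [0.08 / c for c in concs]

# A feature whose frequency is roughly constant across samples.
resident = [0.10, 0.12, 0.09, 0.11, 0.10]

contam_fit = frequency_model_fit(contaminant, concs)
resident_fit = frequency_model_fit(resident, concs)
```

A smaller first value means the slope -1 model fits better. The comparison only means something if the measured concentration tracks the amount of sequenced material. If the concentration is measured on DNA but the frequencies come from RNA, and the DNA-to-RNA ratio shifts with the tested condition, that link is broken.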

> Additionally, while using decontam on metagenomic datasets, is it better to use relative abundance that considers both known and unknown bacteria, or is focusing solely on known bacteria the best approach?

Known and unknown. Measurements of DNA concentration do not discriminate between known and unknown bacteria, so relative abundance computed over all bacteria is the right measure to use.
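
As a small illustration of why the denominator matters, the Python sketch below computes relative abundance for the same classified counts in two ways: over all reads, and over classified reads only. The taxon names and read counts are made up for the example.

```python
def relative_abundance(counts, total_reads):
    """Divide each feature's read count by the chosen total."""
    return {taxon: n / total_reads for taxon, n in counts.items()}

# Hypothetical read counts for one sample.
classified = {"taxon_A": 300, "taxon_B": 100}
unclassified = 600  # reads without a taxonomic assignment

known_total = sum(classified.values())
all_total = known_total + unclassified

# Denominator includes known and unknown reads.
rel_all = relative_abundance(classified, all_total)

# Denominator restricted to known reads only.
rel_known = relative_abundance(classified, known_total)
```

With these numbers, taxon_A is 30% of all reads but 75% of classified reads. Restricting the denominator to known taxa inflates every frequency by an amount that depends on how much of each sample was classified, while the measured DNA concentration reflects all of the material.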
