Normalization/Log2 transformation requirements #28
Comments
Hi @mallorymaynes, ImpulseDE2 uses a negative binomial noise model, which comes with assumptions on the data distribution and is built for count (i.e. non-normalised, non-logged, integer) data. This type of statistical modelling still works if your data transform does not violate the count data structure too much; log-transforming will most likely cause major issues, for example. Assuming that your transforms don't change the statistics too much, it may work, but it would be better to use count data and to supply size factors to scale the model. Filtering genes does not affect the model fits of the other genes if you define size factors.
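To make "supply size factors" concrete, here is a minimal sketch of one common way to estimate per-sample size factors from raw counts, the median-of-ratios approach. It is written in plain Python purely as an illustration: ImpulseDE2 itself is an R package, and the function name `median_of_ratios_size_factors` and the toy matrix are assumptions made for this example, not part of ImpulseDE2's interface.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample from a genes x samples count matrix.

    counts: list of genes, each a list of raw integer counts (one per sample).
    Hypothetical helper for illustration only.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        # Genes with a zero in any sample have no finite log geometric mean.
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has nonzero counts in every sample")
    # The size factor is the median ratio of a sample to the per-gene reference.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: sample 2 was sequenced about twice as deeply as sample 1.
raw_counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # skipped: contains a zero
]
size_factors = median_of_ratios_size_factors(raw_counts)
```

The counts stay as untouched integers; the depth difference between samples is carried by the size factors rather than by rescaling the data.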
Thank you, this is very helpful. It sounds like I should instead use my raw counts and include the estimated factors of unwanted variation generated by RUVg - is that what you mean by supplying factors to scale the model?
Hi David, I am still a little confused about how to input my RUVseq factors of unwanted variation into ImpulseDE2. Specifically, the output for RUVseq (called "W_1") is used as a covariate in DESeq2 or edgeR models, such that the full model for a time course in DESeq2 would be "~ W_1 + time + treatment + treatment:time," and the reduced would be: "~ W_1 + treatment + time." Given this, how do I correctly integrate W_1 into ImpulseDE2? Would this be considered vecConfounders, size factors, or something I can integrate in the dfAnnotation? Thanks for your help, it is much appreciated!
This would be an element of |
Hello and thanks for developing this model. I read in the supplemental materials that the G x S matrix for RNAseq data should be filtered for low counts, normalized, and also log2 transformed before running the model. It also gives RPKM and TPM as suggestions for the normalization; however, I would like to use upper-quantile normalized counts generated by RUVg so I can use my spike-ins easily. Will this be a problem? So far I have filtered low count genes and extracted the normalized counts from RUVg, log2 transformed them, and rounded so they are integers. I want to be sure I am understanding correctly and that my normalization procedure checks out (and also that I'm not over-normalizing).
Thanks!
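As a small illustration of why rounding log2-transformed values does not give count-like data, the sketch below applies that step to a few made-up counts. The `+ 1` pseudocount and the helper name `log2_round` are assumptions for this example; they are not taken from the thread or from RUVg.

```python
import math


def log2_round(count):
    """Log2-transform a count (with an assumed +1 pseudocount) and round it."""
    return round(math.log2(count + 1))


raw = [3, 1000, 1400, 1_000_000]
transformed = [log2_round(c) for c in raw]
# transformed == [2, 10, 10, 20]
```

The values are integers again, but 1000 and 1400 have become indistinguishable and the whole range is squeezed into 0 to 20, so the relationship between mean and variance that a count noise model expects is no longer there.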