New Analysis Example: Simpler example of gene clustering -- k-means for microarray #351
Comments
If all goes well with this example, it can also be made into an RNA-seq version, which will require additional steps for DESeq2 transformation.
I think this example could just as easily use KNN if we think that would be better for a particular reason.
My gut tells me that this is not going to be simpler than WGCNA from an explanation point of view, to be honest. Particularly the part about picking k...
I agree it's not simply "plug and chug" but at least it's mainly
No, not really. I think whenever you're going to talk about number of clusters or cluster validation it's going to be tricky.
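To make the "picking k" concern above concrete, here is a minimal sketch of one common heuristic: run k-means over a range of candidate k values and compare the total within-cluster sum of squares (the "elbow" approach). The simulated data, the seed, and the names `dat`, `candidate_k`, and `total_withinss` are illustrative assumptions; this is not a cluster validation method and is not taken from the planned example.

```r
# Hypothetical data: 90 points in 2 dimensions scattered around three centers
set.seed(351)
centers <- matrix(c(0, 0, 5, 5, 0, 5), ncol = 2, byrow = TRUE)
dat <- centers[rep(1:3, each = 30), ] +
  matrix(rnorm(180, sd = 0.5), ncol = 2)

# Fit k-means for each candidate k and record the total within-cluster
# sum of squares; nstart = 25 reruns from several random starts
candidate_k <- 1:8
total_withinss <- vapply(
  candidate_k,
  function(k) kmeans(dat, centers = k, nstart = 25)$tot.withinss,
  numeric(1)
)

# Look for the k after which the decrease levels off
elbow_df <- data.frame(k = candidate_k, total_withinss = total_withinss)
print(elbow_df)
```

Where the curve flattens is a judgment call, which is part of why explaining the choice of k to users is tricky.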
What are the goals of this new example analysis?
Currently ORA in microarray uses differential expression results and the RNA-seq one (not yet developed #344) will end up using a gene module from WGCNA.
However, we should give users a more basic way to find gene clusters from data. Some users may find WGCNA a bit daunting (it also requires some computing power). It may also be more than a user needs for their particular question, so an example that shows something like k-means would be useful.
What kind of dataset will this need?
Something with enough samples that a cluster would make some kind of sense.
I think GSE37382, which is medulloblastoma with subgroups and is used for dimension reduction, seems like a reasonable dataset to use for this too.
What steps should be included in this analysis?
These are the roughest ideas of steps I have right now that will need to be made more specific and further polished when we dig into this example more.
What packages/methods do you recommend using or looking into for this analysis?
May not need extra packages besides magrittr and tidyverse ones (which are assumed everywhere). Both k-means and prcomp are in base R.
Note: if/when this issue is completed, the ORA example should be updated to use this output (this should be its own issue and PR).
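As a rough illustration of how the two base R functions mentioned above might fit together, here is a minimal sketch that clusters genes with `kmeans()` and summarizes the same matrix with `prcomp()`. The simulated matrix stands in for real expression data, and the choice of k = 3, the row-wise scaling step, and all object names are assumptions made for illustration; none of this comes from GSE37382 or the planned example.

```r
# Hypothetical expression matrix: 60 genes (rows) x 10 samples (columns),
# built from three made-up sample-wise patterns plus noise
set.seed(2020)
n_genes <- 60
n_samples <- 10
pattern <- rbind(
  rep(c(2, -2), each = 5),
  rep(c(-2, 2), each = 5),
  rep(c(2, -2, 2, -2, 2), each = 2)
)
expr <- pattern[rep(1:3, each = 20), ] +
  matrix(rnorm(n_genes * n_samples, sd = 0.5), nrow = n_genes)
rownames(expr) <- paste0("gene_", seq_len(n_genes))
colnames(expr) <- paste0("sample_", seq_len(n_samples))

# Scale each gene (row) so clusters reflect expression pattern
# rather than overall magnitude
expr_scaled <- t(scale(t(expr)))

# k-means on genes; k = 3 is an assumption for this toy data
km <- kmeans(expr_scaled, centers = 3, nstart = 25)

# PCA on the same matrix, with genes as observations
pca <- prcomp(expr_scaled)

# Combine cluster assignments with the first two principal components
gene_clusters <- data.frame(
  gene = rownames(expr_scaled),
  cluster = km$cluster,
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2]
)
print(table(gene_clusters$cluster))
```

A data frame like `gene_clusters` could then be plotted with ggplot2, or a single cluster's gene list passed on to a downstream analysis such as ORA.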