New Analysis Example: Simpler example of gene clustering -- k-means for microarray #351

Open
cansavvy opened this issue Nov 11, 2020 · 5 comments

@cansavvy
Contributor

cansavvy commented Nov 11, 2020

What are the goals of this new example analysis?

Currently, the microarray ORA example uses differential expression results, and the RNA-seq one (not yet developed, #344) will end up using a gene module from WGCNA.

However, we should give users a more basic way to find gene clusters from data. Some users may find WGCNA a bit daunting (it also requires some computing power). WGCNA may also be more than what a user needs for their particular question, so an example that shows something like k-means would be useful.

What kind of dataset will this need?

Something with enough samples that a cluster would make some kind of sense.
I think GSE37382, which is a medulloblastoma dataset with subgroups and is already used for dimension reduction, seems like a reasonable dataset to use for this too.

What steps should be included in this analysis?

These are the roughest ideas of steps I have right now that will need to be made more specific and further polished when we dig into this example more.

  1. Import data and metadata
  2. Use k-means function
  3. Do some exploration into how "well" k-means ran -- unclear to me without doing a bit more digging what this looks like. It may be as simple as printing out some kind of summary stats.
  4. May want to run more iterations and see if you get the same-ish results?
  5. Get some kind of annotation for the genes that you can use as a test for seeing if your gene clustering seems sensible. This could be something like GO terms (but maybe not GO terms, since they overlap so much).
  6. Probably plot gene-wise PCA and label the k-means clusters as colors and another form of gene annotation as shapes and see if it makes sense.
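
A minimal sketch of how steps 2, 3, 4, and 6 might look in base R. Everything about the data here is an assumption: `expression_mat` is a small simulated gene x sample matrix standing in for the real dataset (it is not GSE37382), `centers = 3` is chosen only because the simulation has three gene groups, and base `plot()` is used to keep the block self-contained rather than because it is the intended plotting approach.

```r
# Simulated stand-in for a gene x sample expression matrix (NOT GSE37382):
# 150 genes in 3 groups of 50, measured across 20 samples.
set.seed(2020)
n_samples <- 20
expression_mat <- rbind(
  matrix(rnorm(50 * n_samples, mean = 0), nrow = 50),
  matrix(rnorm(50 * n_samples, mean = 3), nrow = 50),
  matrix(rnorm(50 * n_samples, mean = -3), nrow = 50)
)
rownames(expression_mat) <- paste0("gene_", seq_len(nrow(expression_mat)))

# Step 2: k-means on genes (rows). nstart tries several random starts and
# keeps the solution with the lowest total within-cluster sum of squares.
kmeans_results <- kmeans(expression_mat, centers = 3, nstart = 25)

# Step 3: simple summary stats: cluster sizes and the proportion of the
# total variation that falls between clusters.
cluster_sizes <- table(kmeans_results$cluster)
between_prop <- kmeans_results$betweenss / kmeans_results$totss
print(cluster_sizes)
print(between_prop)

# Step 4: rerun with a different seed and cross-tabulate the assignments.
# Cluster numbers are arbitrary, so look for one dominant cell per row.
set.seed(351)
kmeans_rerun <- kmeans(expression_mat, centers = 3, nstart = 25)
agreement <- table(first_run = kmeans_results$cluster,
                   rerun = kmeans_rerun$cluster)
print(agreement)

# Step 6: gene-wise PCA (genes are the rows), colored by k-means cluster.
pca_results <- prcomp(expression_mat, scale. = TRUE)
pca_df <- data.frame(
  pca_results$x[, 1:2],
  cluster = factor(kmeans_results$cluster)
)
plot(pca_df$PC1, pca_df$PC2, col = pca_df$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2")
```

On real data the clusters are unlikely to separate this cleanly, so the cross-tabulation in step 4 is probably the more informative check.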

What packages/methods do you recommend using or looking into for this analysis?

May not need extra packages besides magrittr and the tidyverse ones (which are assumed everywhere). Both kmeans() and prcomp() are in the stats package that ships with base R.

Note if/when this issue is completed, the ORA example should be updated to use this output (this should be its own issue and PR).

@cansavvy cansavvy changed the title New Analysis Example: Microarray simpler example of gene clustering -- k-means New Analysis Example: Simpler example of gene clustering -- k-means for microarray Nov 11, 2020
@cansavvy
Contributor Author

If all goes alright with this example, it can be made into an RNA-seq version as well, which will require additional steps for DESeq2 transformation.

@cansavvy
Contributor Author

I think this example could just as easily use KNN if we think that would be better for a particular reason.

@jaclyn-taroni
Member

My gut tells me that this is not going to be simpler than WGCNA from an explanation point of view, to be honest. Particularly the part about picking k...

@cansavvy
Contributor Author

> My gut tells me that this is not going to be simpler than WGCNA from an explanation point of view, to be honest. Particularly the part about picking k...

I agree it's not simply "plug and chug", but at least it's mainly k and not 4-5 other parameters? I think it's more straightforward than WGCNA, but that's because WGCNA has a lot of pieces in comparison.
If we don't like k-means, do you have an even simpler suggestion for finding gene groups?

@jaclyn-taroni
Member

No, not really. I think whenever you're going to talk about number of clusters or cluster validation it's going to be tricky.
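
To illustrate why picking k takes some explaining, here is a rough sketch of one common heuristic (the "elbow" approach): run k-means over a range of k and look at how the total within-cluster sum of squares drops. This is not something the thread settled on. The matrix is again simulated (an assumption, not GSE37382), and the range `1:8` is arbitrary.

```r
# Simulated gene x sample matrix with 3 built-in gene groups (NOT real data).
set.seed(2020)
n_samples <- 20
expression_mat <- rbind(
  matrix(rnorm(50 * n_samples, mean = 0), nrow = 50),
  matrix(rnorm(50 * n_samples, mean = 3), nrow = 50),
  matrix(rnorm(50 * n_samples, mean = -3), nrow = 50)
)

# Run k-means for each candidate k and record the total within-cluster
# sum of squares (lower means tighter clusters).
k_values <- 1:8
total_withinss <- vapply(
  k_values,
  function(k) kmeans(expression_mat, centers = k, nstart = 25)$tot.withinss,
  numeric(1)
)

# Look for the k after which adding clusters stops helping much.
elbow_df <- data.frame(k = k_values, total_withinss = total_withinss)
print(elbow_df)
plot(elbow_df$k, elbow_df$total_withinss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

With simulated data the bend is obvious by construction. Real expression data often gives a smooth curve with no clear elbow, which is the tricky part raised above.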
