Create microarray gene ID conversion example #113
Comments
Since this gene ID conversion module doesn't depend on the gene expression data itself, we might consider a linking-out approach like the one described for the KEGG pathway module: #131 (comment). Because microarray is listed first, this may mean we need to switch the current gene ID conversion example to a microarray dataset and then link to it from RNA-seq.
To partially address #98, we should use a mouse dataset for this gene ID conversion example.
I filed a draft PR for this issue using a mouse glioma cancer stem cell dataset with n = 15 samples (which has been uploaded to the S3 bucket for testing/review). This produces a mapped data frame that looks like:

[screenshot]

If we want to stick with a smaller dataset, we could use this transcription profiling of mouse glioma cell line dataset with n = 4 samples, which would produce a mapped data frame that looks like:

[screenshot]

Any ideas on what would work best for this notebook?
I'm not sure that the screenshots you've included help me give useful feedback for these datasets.
I think these are quick things that we may want to encourage our users to look into as well. For example, if nothing is mapping to your gene identifier, maybe you did it wrong, or maybe you should use a different gene identifier, or maybe that's expected and it's okay for your purposes. Either way, it's always good to be aware of how many genes you are potentially "losing" if you end up relying on the new gene identifier you've mapped to for your downstream analyses.

In regards to your general question about which dataset: I don't think sample size matters much for gene identifier mapping, but I would stick with the n = 15 set so that we recommend datasets that might be useful for users in other contexts. In other words, n = 4 is fine for illustrating gene identifier mapping but not necessarily for other common analyses, so we might as well invest in a dataset that our users might find useful beyond this example (n = 15 isn't so big that it would be a problem memory-wise either).
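To make the "how many genes are we losing" check concrete, here is a rough sketch of the kind of summary I have in mind. It is plain Python rather than the notebook's own code, and everything in it is an assumption for illustration: the `summarize_mapping` helper is hypothetical, and the gene IDs and mapped values are made-up placeholders, not results from either dataset.

```python
from collections import Counter


def summarize_mapping(source_ids, id_map):
    """Count how many source IDs map to zero, one, or several target IDs.

    `id_map` is a hypothetical dict of source ID -> list of target IDs.
    A missing key and an empty list are both treated as "unmapped".
    """
    counts = Counter()
    for gene_id in source_ids:
        targets = id_map.get(gene_id) or []
        if len(targets) == 0:
            counts["unmapped"] += 1
        elif len(targets) == 1:
            counts["one_to_one"] += 1
        else:
            counts["one_to_many"] += 1

    total = len(source_ids)
    return {
        "total": total,
        "unmapped": counts["unmapped"],
        "one_to_one": counts["one_to_one"],
        "one_to_many": counts["one_to_many"],
        # Guard against division by zero for an empty input
        "fraction_unmapped": counts["unmapped"] / total if total else 0.0,
    }


# Placeholder IDs and mappings, invented purely for this example
toy_ids = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
toy_map = {
    "GENE_A": ["101"],          # maps to exactly one target ID
    "GENE_B": ["202", "203"],   # maps to more than one target ID
    "GENE_C": [],               # present in the map but has no target
    # GENE_D is absent from the map entirely
}

summary = summarize_mapping(toy_ids, toy_map)
print(summary)
```

Reporting the one-to-many count separately seemed worth it, since those genes aren't lost but still need a decision about which mapped identifier to keep.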
Of course, that makes sense! I will include said summary stats in the notebook (still using the n = 15 dataset) and ping you over on the draft PR once I commit those changes 👍 Thanks for the feedback @cansavvy!
This is a parallel issue to #110
The only module I'm seeing that has an RNA-seq example and not a microarray example is the gene ID conversion: ensembl-id-convert
I'm not sure that we will need to change it that much from the RNA-seq example except that it will need to be a different dataset.