
Create microarray gene ID conversion example #113

Closed
cansavvy opened this issue Jul 7, 2020 · 5 comments · Fixed by #212
Labels
before going live Needs to be done before we can "go live" or do testing

Comments

@cansavvy
Contributor

cansavvy commented Jul 7, 2020

This is a parallel issue to #110

The only module I see that has an RNA-seq example but no microarray example is gene ID conversion: ensembl-id-convert.

I'm not sure that we will need to change it that much from the RNA-seq example except that it will need to be a different dataset.

@cansavvy
Contributor Author

Since this gene ID conversion module won't depend on the gene expression data itself, we might consider a linking-out approach like the one described for the KEGG pathway module: #131 (comment)

Because microarray is listed first, this may mean we need to switch the current gene ID conversion example to a microarray dataset and then link to it from the RNA-seq section.

@cansavvy cansavvy added the not ready Needs more details/planning before it can be acted on label Aug 18, 2020
@cansavvy cansavvy changed the title Create gene ID conversion example for a microarray dataset. Link RNA-seq section to gene ID conversion example in the microarray dataset section Aug 18, 2020
@cansavvy cansavvy changed the title Link RNA-seq section to gene ID conversion example in the microarray dataset section Create RNA-seq gene ID conversion example Aug 18, 2020
@cansavvy cansavvy removed the not ready Needs more details/planning before it can be acted on label Aug 18, 2020
@cansavvy
Contributor Author

So that we can partially address #98, we should use a mouse dataset for this gene ID conversion example.

@cansavvy cansavvy changed the title Create RNA-seq gene ID conversion example Create microarray gene ID conversion example Aug 19, 2020
@cbethell
Contributor

I filed a draft PR for this issue using a mouse glioma cancer stem cell dataset with n = 15 samples (which has been uploaded to the S3 bucket for testing/review).

This produces a mapped data frame that looks like:
[Screenshot: mapped data frame for the n = 15 dataset]

If we want to stick with a smaller dataset, we could use this transcription profiling of mouse glioma cell line dataset with n = 4 samples, which would produce a mapped data frame that looks like:
[Screenshot: mapped data frame for the n = 4 dataset]

Any ideas on what we think would work best in the case of this notebook?

@cansavvy
Contributor Author

I'm not sure that the screenshots you've included help me give useful feedback for these datasets.
Instead, it might be more helpful if you included some summary stats. These may include (but don't have to be limited to):

  • How many gene identifiers are mapped versus not mapped?
  • How many multi-mappings does each gene identifier have?
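A minimal sketch of how these two stats could be computed, in Python, assuming the mapping results come back as a long table of (source ID, target ID) pairs with `None` for identifiers that did not map. The probe and gene names and the `summarize_mapping()` helper are hypothetical, not from the notebook:

```python
from collections import Counter

# Hypothetical long-format mapping results: one (source_id, target_id) row per
# match, with target_id set to None when the source identifier did not map.
mapping_rows = [
    ("probe_1", "Gene_A"),
    ("probe_2", "Gene_B"),
    ("probe_2", "Gene_C"),  # probe_2 maps to two target identifiers
    ("probe_3", None),      # probe_3 did not map
    ("probe_4", "Gene_D"),
]


def summarize_mapping(rows):
    """Return mapped/unmapped counts and the number of targets per source ID."""
    targets_per_source = {}
    for source_id, target_id in rows:
        targets = targets_per_source.setdefault(source_id, set())
        if target_id is not None:
            targets.add(target_id)

    mapped = [s for s, t in targets_per_source.items() if t]
    unmapped = [s for s, t in targets_per_source.items() if not t]
    # How many mapped source IDs have 1, 2, 3, ... target IDs
    multi_mapping = Counter(len(t) for t in targets_per_source.values() if t)
    return {
        "n_mapped": len(mapped),
        "n_unmapped": len(unmapped),
        "unmapped_ids": sorted(unmapped),
        "targets_per_source": {s: len(t) for s, t in targets_per_source.items()},
        "multi_mapping_histogram": dict(multi_mapping),
    }


summary = summarize_mapping(mapping_rows)
print(summary["n_mapped"], summary["n_unmapped"])  # 3 1
print(summary["multi_mapping_histogram"])          # {1: 2, 2: 1}
```

Keeping unmapped IDs in the table (rather than silently dropping them) is what makes the mapped-versus-not-mapped count possible in the first place.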

I think these are quick checks that we may want to encourage our users to do as well. For example, if nothing maps to your gene identifiers, maybe you made a mistake, maybe you should use a different gene identifier, OR maybe that's expected and it's okay for your purposes. Either way, it's always good to be aware of how many genes you are potentially "losing" if you end up relying on the newly mapped gene identifier for your downstream analyses.
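Along the same lines, a hypothetical `report_loss()` helper (Python, not from the notebook) could report how many input identifiers would be dropped downstream and flag the "nothing mapped" case. The 50% `min_fraction` cutoff is an arbitrary assumption, not a recommended value:

```python
def report_loss(source_ids, mapped_ids, min_fraction=0.5):
    """Report how many input identifiers would be dropped downstream.

    min_fraction is a hypothetical threshold for flagging a suspiciously low
    mapping rate (e.g., the wrong identifier type was supplied).
    """
    source = set(source_ids)
    lost = source - set(mapped_ids)
    fraction_mapped = (len(source) - len(lost)) / len(source) if source else 0.0
    if fraction_mapped == 0:
        message = "No identifiers mapped: check the identifier type and species."
    elif fraction_mapped < min_fraction:
        message = f"Only {fraction_mapped:.0%} of identifiers mapped."
    else:
        message = f"{fraction_mapped:.0%} of identifiers mapped."
    return len(lost), fraction_mapped, message


n_lost, frac, msg = report_loss(
    ["probe_1", "probe_2", "probe_3", "probe_4"],
    ["probe_1", "probe_2", "probe_4"],
)
print(n_lost, msg)  # 1 75% of identifiers mapped.
```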

As for your general question about which dataset to use, I don't think sample size matters much for gene identifier mapping, but I would stick with the n = 15 set so that we recommend datasets that might be useful to users in other contexts. In other words, n = 4 is fine for illustrating gene identifier mapping but not necessarily for other common analyses, so we might as well invest in a dataset that our users might find useful beyond this example (n = 15 isn't so big that it would be a problem memory-wise either).

@cbethell
Contributor

Of course, that makes sense! I will include said summary stats in the notebook (still using the n = 15 dataset) and ping you over on the draft PR once I commit those changes 👍 Thanks for the feedback @cansavvy!

@cansavvy cansavvy added the before going live Needs to be done before we can "go live" or do testing label Sep 14, 2020