Create microarray gene ID conversion example #113

cansavvy · 2020-07-07T17:52:44Z

This is a parallel issue to #110.

The only module I see that has an RNA-seq example but no microarray example is gene ID conversion (ensembl-id-convert).

I don't think we will need to change it much from the RNA-seq example, except that it will need to use a different dataset.

cansavvy · 2020-07-10T18:53:42Z

Since the gene ID conversion module doesn't depend on the gene expression data itself, we might consider a linking-out approach like the one described for the KEGG pathway module: #131 (comment)

Because microarray is listed first, this may mean we need to switch the current gene ID conversion example to a microarray dataset and then link to it from the RNA-seq section.

cansavvy · 2020-08-18T18:42:48Z

To partially address #98, we should use a mouse dataset for this gene ID conversion example.

cbethell · 2020-09-10T12:27:53Z

I filed a draft PR for this issue using a mouse glioma cancer stem cell dataset with n = 15 samples (which has been uploaded to the S3 bucket for testing/review).

This produces a mapped data frame that looks like this: [screenshot not preserved]

If we want to stick with a smaller dataset, we could use this transcription profiling dataset of a mouse glioma cell line with n = 4 samples, which would produce a mapped data frame that looks like this: [screenshot not preserved]

Any thoughts on which would work best for this notebook?

cansavvy · 2020-09-10T13:14:42Z

I'm not sure that the screenshots you've included help me give useful feedback on these datasets.
Instead, it might be more helpful if you included some summary stats. These could include (but don't have to be limited to):

- How many gene identifiers are mapped versus not mapped?
- How many mappings does each gene identifier have (i.e., how common are multi-mappings)?

I think these are quick checks that we may want to encourage our users to do as well. For example, if nothing maps to your gene identifiers, maybe you made a mistake, or maybe you should use a different gene identifier, OR maybe that's expected and it's okay for your purposes. Either way, it's always good to be aware of how many genes you are potentially "losing" if you rely on the newly mapped gene identifier in your downstream analyses.
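The two summary stats suggested above can be computed in a few lines. The sketch below is a minimal, hypothetical Python illustration, not the notebook's actual code: it assumes the conversion has already produced a dictionary mapping each original identifier (e.g. an Ensembl gene ID) to a list of new identifiers. The function name `summarize_id_mapping` and the toy IDs are made up for illustration.

```python
from collections import Counter


def summarize_id_mapping(source_ids, mapping):
    """Summarize a gene identifier conversion (hypothetical helper).

    source_ids: the original identifiers, e.g. Ensembl gene IDs.
    mapping: dict from original identifier -> list of new identifiers.
        IDs missing from the dict, or mapped to an empty list, are unmapped.
    """
    # Number of new identifiers each unique original identifier received
    n_targets = {sid: len(mapping.get(sid, [])) for sid in set(source_ids)}
    total = len(n_targets)

    n_unmapped = sum(1 for n in n_targets.values() if n == 0)
    n_mapped = total - n_unmapped

    # {number of new IDs: how many original IDs received that many}
    distribution = Counter(n for n in n_targets.values() if n > 0)

    return {
        "n_mapped": n_mapped,
        "n_unmapped": n_unmapped,
        "pct_unmapped": 100 * n_unmapped / total if total else 0.0,
        "mapping_count_distribution": dict(sorted(distribution.items())),
        "multi_mapped_ids": sorted(sid for sid, n in n_targets.items() if n > 1),
    }


if __name__ == "__main__":
    # Toy, made-up identifiers for illustration only (not real gene IDs)
    source_ids = ["gene_a", "gene_b", "gene_c", "gene_d"]
    mapping = {
        "gene_a": ["X1"],
        "gene_b": ["X2", "X3"],  # one original ID maps to two new IDs
        "gene_c": [],            # looked up, but nothing mapped
    }                            # gene_d is absent, so it is also unmapped
    print(summarize_id_mapping(source_ids, mapping))
```

Reporting the unmapped count next to the distribution of mapping counts shows at a glance both how many genes would be "lost" and how many would need a rule for resolving multi-mappings.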

As for your general question about which dataset to use, I don't think sample size matters much for gene identifier mapping, but I would stick with the n = 15 set so that we recommend datasets that might be useful to users in other contexts. In other words, n = 4 is fine for illustrating gene identifier mapping but not necessarily for other common analyses, so we might as well invest in a dataset that our users might find useful beyond this example (and n = 15 isn't so big that memory would be a concern).

cbethell · 2020-09-10T13:40:43Z

Of course, that makes sense! I will include said summary stats in the notebook (still using the n = 15 dataset) and ping you over on the draft PR once I commit those changes 👍 Thanks for the feedback @cansavvy!

cansavvy mentioned this issue Aug 13, 2020: Which analyses can use the same steps across technologies? #175 (closed)

cansavvy added the "not ready" label (needs more details/planning before it can be acted on) Aug 18, 2020

cansavvy changed the title ~~Create gene ID conversion example for a microarray dataset.~~ Link RNA-seq section to gene ID conversion example in the microarray dataset section Aug 18, 2020

cansavvy changed the title ~~Link RNA-seq section to gene ID conversion example in the microarray dataset section~~ Create RNA-seq gene ID conversion example Aug 18, 2020

cansavvy removed the "not ready" label (needs more details/planning before it can be acted on) Aug 18, 2020

cansavvy changed the title ~~Create RNA-seq gene ID conversion example~~ Create microarray gene ID conversion example Aug 19, 2020

jaclyn-taroni assigned cbethell Aug 31, 2020

cbethell mentioned this issue Sep 10, 2020: Add microarray gene ID conversion example #212 (merged)

cansavvy mentioned this issue Sep 10, 2020: Update RNA-seq gene id conversion module #213 (closed)

cansavvy added the "before going live" label (needs to be done before we can "go live" or do testing) Sep 14, 2020

jaclyn-taroni closed this as completed in #212 Sep 18, 2020

cansavvy mentioned this issue Sep 23, 2020: Add Mus musculus as an example to illustrate change in annotation package #98 (closed)
