-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RNA-Seq Header Section #216
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure and content in this PR looks great to me @cansavvy! I haven't thought of anything else that should be included (yet) that you haven't covered here, but I do have some other suggestions below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
03-rnaseq/00-intro-to-rnaseq.Rmd
Outdated
### RNA-seq data **strengths**: | ||
|
||
- RNA-seq can collect data on more transcripts (it is less bound to a pre-determined set of probes like microarray is). | ||
- It's values are considered more dynamic than microarray values which are constrained to the number of probes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the dynamic range of values? Do you have a citation for microarray values which are constrained to the number of probes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrapping up the review that I sent early!
03-rnaseq/00-intro-to-rnaseq.Rmd
Outdated
### DESeq2 normalization methods | ||
|
||
Although DESeq2 has multiple normalization methods, we generally stick to `vst()` (Variance Stablizing Transformation) or `rlog()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably call these transformations, not normalization. You can normalize (e.g., adjust for size factors; counts(<dataset>, normalize = TRUE)
) without transforming. This could be confusing for someone coming in with some level of experience. Also should talk about what these are specifically doing beyond that normalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is more general and probably should be applied in other RNA-seq notebooks (e.g., 03-rnaseq/dimension_reduction_rnaseq_01_pca.Rmd
), too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed: #220
@jaclyn-taroni I think your comments have been addressed. I didn't end up adding a
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good! I had a couple remaining comments.
03-rnaseq/00-intro-to-rnaseq.Rmd
Outdated
### RNA-seq data **strengths** | ||
|
||
- RNA-seq can assay unknown transcripts, as it is not bound to a pre-determined set of probes like microarrays [@Zhong2009]. | ||
- Its values are considered more dynamic than microarray values which are constrained to the number of probes [@Zhong2009]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this point the way it is currently written - is this about the background signal point in the cited article?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From that paper: (No source to it that I can see)
and a limited dynamic range of detection owing to both background and saturation of signals.
Background and saturation. I'll put the word "saturation" in there to help make the point more clear.
03-rnaseq/00-intro-to-rnaseq.Rmd
Outdated
|
||
To normalize and transform our data with DESeq2, we generally use `vst()` (Variance Stabilizing Transformation) or `rlog()`. | ||
[Both methods are very similar](http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#the-variance-stabilizing-transformation-and-the-rlog). | ||
Both _normalize_ your data by correcting for library size differences but they also _transform_ your data by altering their distributions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking for a nod to this point (quoting this section of the vignette):
The point of these two transformations, the VST and the rlog, is to remove the dependence of the variance on the mean, particularly the high variance of the logarithm of count data when the mean is low.
But I'm not sure what the exact right level of detail is here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a tad more. I don't think we need to put too much, because a good portion of users won't really care, and the ones that do will probably look at the sources we've put here, but a tad more detail; a tad less vagueness would be good.
As I mentioned in #212 (comment), I'm playing around with the process for making sure the pull request branches are up to date with While I was doing resolving conflicts, I noticed that the last round of re-rendering wasn't run in the Docker container: https://alexslemonade.github.io/refinebio-examples/03-rnaseq/dimension-reduction_rnaseq_01_pca.html#6_print_session_info But I think everything will be rerun once the next round of edits go in anyway. |
I haven't been running these items on anything but the Docker container, so that is odd... will see if that's resolved. |
What's the html issue? A merge conflict problem? |
Yes, I think we'll have merge conflicts with the HTML files every time. |
So mentioned this on Slack, these differences are because snakemake doesn't recognize docker changes so when I switched to an in-development docker image for #206 and then switched back, it doesn't realize things need to be re-run. |
This question is somewhat related to thinking about status checks - when would these ever be run on Mac OS Mojave if people were following the contributing guidelines? The in-development image will always be Ubuntu 20.04. And is this an area where some kind automation would make our lives easier? |
I've never run it on a non-ubuntu Docker image? But I see that is in there? |
We can definitely look into this if it would be helpful, the PR checklist is admittedly long, so if this would help reduce author burden, that seems good. I'm unsure what heavy a lift it is to get this going? |
Ah, I had assumed it would be all of the ones that went in on the last PR but it makes sense given what you're saying about Snakemake. (All good info for thinking about automation or not.) |
In regards to the RNA-seq content, it's ready for a another look, @jaclyn-taroni. I added links in modules with search and replace and tested them. |
Ah, it is quite possible that the last one did not get rendered as it should have been. Although I have been running it on the Docker image thus far, I have recently been moving back and forth between the Docker for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I found one instance of a word being repeated inside and outside of a link and that was my only remaining suggestion.
Purpose:
#125
Strategy
Tried to make sure the major points discussed in the issue #125 are addressed here.
How do we feel about the structure and general information added?
Any sources to add?
I haven't added the links to the individual modules yet (the
TODO
s) will do this once we know we are okay with this set up. (The links might change).Analysis Pull Request Check List (roughly in order):
Content checks
{{BLANKS}}
have been replaced with the correct content.Formatting Checks
Add datasets to S3
Docker/Snakemake rendering components
.html
link to the navigation bar.