RNA-Seq Header Section #216

Merged: cansavvy merged 28 commits into master from cansavvy/rna-seq-header on Sep 18, 2020

@cansavvy cansavvy commented Sep 15, 2020

Purpose:

#125

Strategy

Tried to make sure the major points discussed in the issue #125 are addressed here.

  • How do we feel about the structure and general information added?

  • Any sources to add?

  • I haven't added the links to the individual modules yet (the TODOs). I will do this once we know we are okay with this setup, since the links might change.

Analysis Pull Request Check List (roughly in order):

Content checks

  • All {{BLANKS}} have been replaced with the correct content.
  • Sources are cited
  • Seed is set (if applicable)

Formatting Checks

  • Removed any manual numbering of sections.
  • Removed any instances of chunk naming.
  • Spell checked any Rmd file or md file.
  • Comments and documentation are up to date.

Add datasets to S3

Docker/Snakemake rendering components

@cansavvy cansavvy changed the title WIP: RNA-Seq Header Section RNA-Seq Header Section Sep 15, 2020
@cansavvy cansavvy marked this pull request as ready for review September 15, 2020 19:16
@cansavvy cansavvy requested a review from cbethell September 15, 2020 19:16
Contributor

@cbethell cbethell left a comment

The structure and content in this PR look great to me @cansavvy! I haven't thought of anything else that should be included (yet) that you haven't covered here, but I do have some other suggestions below.

@cansavvy cansavvy requested a review from cbethell September 16, 2020 20:47
Contributor

@cbethell cbethell left a comment

LGTM! 🚀

Member

@jaclyn-taroni jaclyn-taroni left a comment

Returning the review so we can talk about #187 and #189

03-rnaseq/00-intro-to-rnaseq.Rmd (outdated)
### RNA-seq data **strengths**:

- RNA-seq can collect data on more transcripts (it is less bound to a pre-determined set of probes like microarray is).
- It's values are considered more dynamic than microarray values which are constrained to the number of probes.

Member

Do you mean the dynamic range of values? Do you have a citation for microarray values which are constrained to the number of probes?

Member

@jaclyn-taroni jaclyn-taroni left a comment

Wrapping up the review that I sent early!

Comment on lines 91 to 93
### DESeq2 normalization methods

Although DESeq2 has multiple normalization methods, we generally stick to `vst()` (Variance Stablizing Transformation) or `rlog()`.

Member

I would probably call these transformations, not normalization. You can normalize (e.g., adjust for size factors; `counts(<dataset>, normalized = TRUE)`) without transforming. This could be confusing for someone coming in with some level of experience. We should also talk about what these are specifically doing beyond that normalization.
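To make the normalize-versus-transform distinction concrete, here is a minimal base R sketch. It assumes a toy count matrix (`raw_counts`, with hypothetical gene and sample names) and computes median-of-ratios size factors by hand, which is the idea behind DESeq2's default size factor estimation. The plain `log2()` step is only a simple stand-in for `vst()` or `rlog()`, not what those functions actually compute.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns). Hypothetical data.
# sample_b was sequenced twice as deeply as sample_a; sample_c half as deeply.
raw_counts <- matrix(
  c(
    10, 20, 5,
    100, 200, 50,
    40, 80, 20,
    8, 16, 4
  ),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    paste0("gene_", 1:4),
    c("sample_a", "sample_b", "sample_c")
  )
)

# Median-of-ratios size factors: compare each sample to a per-gene
# geometric-mean pseudo-reference and take the median ratio.
log_geo_means <- rowMeans(log(raw_counts))
size_factors <- apply(raw_counts, 2, function(sample_counts) {
  exp(median(log(sample_counts) - log_geo_means))
})

# Normalize only: divide each sample (column) by its size factor.
# The values stay on the count scale.
normalized_counts <- sweep(raw_counts, 2, size_factors, "/")

# Transform on top of normalizing: log2 with a pseudocount changes the
# distribution of the values, not just their scaling.
log_counts <- log2(normalized_counts + 1)
```

In this toy case the three samples differ only in sequencing depth, so after normalization every column of `normalized_counts` is identical, while `log_counts` additionally compresses the range of the values.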

Member

This comment is more general and probably should be applied in other RNA-seq notebooks (e.g., 03-rnaseq/dimension_reduction_rnaseq_01_pca.Rmd), too.

Contributor Author

Filed: #220

@cansavvy
Contributor Author

@jaclyn-taroni I think your comments have been addressed. I didn't end up adding a More resources section. I added the table you posted a link to, but I dropped the StatQuest FPKM video because it seemed less relevant. Since we already link to some StatQuest videos, I'm sure interested users will find it anyway.

  • I also tried to be more articulate about normalize/transform but let me know if I should go into more detail than what is here.
  • Did some editing about why genes might not show up, mainly just focused on the annotation thing, sounds like that might be the main reason.

Member

@jaclyn-taroni jaclyn-taroni left a comment

Changes look good! I had a couple remaining comments.

### RNA-seq data **strengths**

- RNA-seq can assay unknown transcripts, as it is not bound to a pre-determined set of probes like microarrays [@Zhong2009].
- Its values are considered more dynamic than microarray values which are constrained to the number of probes [@Zhong2009].

Member

I don't understand this point the way it is currently written - is this about the background signal point in the cited article?

Contributor Author

@cansavvy cansavvy Sep 18, 2020

From that paper (no source for it that I can see):

"and a limited dynamic range of detection owing to both background and saturation of signals."

Background and saturation. I'll put the word "saturation" in there to help make the point more clear.


To normalize and transform our data with DESeq2, we generally use `vst()` (Variance Stabilizing Transformation) or `rlog()`.
[Both methods are very similar](http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#the-variance-stabilizing-transformation-and-the-rlog).
Both _normalize_ your data by correcting for library size differences but they also _transform_ your data by altering their distributions.

Member

I was looking for a nod to this point (quoting this section of the vignette):

The point of these two transformations, the VST and the rlog, is to remove the dependence of the variance on the mean, particularly the high variance of the logarithm of count data when the mean is low.

But I'm not sure what the exact right level of detail is here.
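The mean-variance dependence in the quoted vignette passage can be illustrated with a short simulation. This is only a sketch under simplifying assumptions: the counts are drawn from a Poisson distribution (real RNA-seq counts are usually overdispersed), the two expression levels are made up, and `log2(count + 1)` is used as the naive transformation that `vst()` and `rlog()` are meant to improve on.

```r
# Set the seed so the simulated counts are reproducible.
set.seed(2020)

# Simulate a low-expression and a high-expression gene across 1000
# hypothetical samples.
low_counts <- rpois(1000, lambda = 2)
high_counts <- rpois(1000, lambda = 2000)

# On the raw count scale, variance grows with the mean.
raw_variance <- c(low = var(low_counts), high = var(high_counts))

# After a plain log2 transformation the dependence flips: the low-count
# gene now has the much larger variance.
log_variance <- c(
  low = var(log2(low_counts + 1)),
  high = var(log2(high_counts + 1))
)

print(raw_variance)
print(log_variance)
```

The high variance of the logged low-count gene is the behavior the vignette describes; VST and rlog are designed so that variance is roughly independent of the mean across the expression range.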

Contributor Author

I'll add a tad more. I don't think we need to add too much, because a good portion of users won't really care, and the ones that do will probably look at the sources we've put here. But a tad more detail and a tad less vagueness would be good.

@jaclyn-taroni
Member

As I mentioned in #212 (comment), I'm playing around with the process for making sure the pull request branches are up to date with master. I checked out this branch locally, ran `git merge origin/master`, and then resolved the merge conflicts in GitKraken. (I'll push that commit shortly.) For references.bib, I included both references, and for all the HTML files I used the most recent commit to master (d5f6c7a). The process wasn't too bad, but I think we're going to have the HTML issue every time. I imagine it's good that the references.bib was tended to at this point. (Was the other change that "disappeared" in references.bib?)

While I was resolving conflicts, I noticed that the last round of re-rendering wasn't run in the Docker container: https://alexslemonade.github.io/refinebio-examples/03-rnaseq/dimension-reduction_rnaseq_01_pca.html#6_print_session_info

But I think everything will be rerun once the next round of edits go in anyway.

@cansavvy
Contributor Author

While I was resolving conflicts, I noticed that the last round of re-rendering wasn't run in the Docker container: https://alexslemonade.github.io/refinebio-examples/03-rnaseq/dimension-reduction_rnaseq_01_pca.html#6_print_session_info

I haven't been running these items on anything but the Docker container, so that is odd... will see if that's resolved.

@cansavvy
Contributor Author

The process wasn't too bad, but I think we're going to have the HTML issue every time.

What's the html issue? A merge conflict problem?

@jaclyn-taroni
Member

What's the html issue? A merge conflict problem?

Yes, I think we'll have merge conflicts with the HTML files every time.

@cansavvy
Contributor Author

cansavvy commented Sep 18, 2020

While I was resolving conflicts, I noticed that the last round of re-rendering wasn't run in the Docker container: https://alexslemonade.github.io/refinebio-examples/03-rnaseq/dimension-reduction_rnaseq_01_pca.html#6_print_session_info

As I mentioned on Slack, these differences arise because snakemake doesn't recognize Docker changes. When I switched to an in-development Docker image for #206 and then switched back, it didn't realize things needed to be re-run.

@jaclyn-taroni
Member

jaclyn-taroni commented Sep 18, 2020

This question is somewhat related to thinking about status checks: when would these ever be run on macOS Mojave if people were following the contributing guidelines? The in-development image will always be Ubuntu 20.04. And is this an area where some kind of automation would make our lives easier?

@cansavvy
Contributor Author

This question is somewhat related to thinking about status checks: when would these ever be run on macOS Mojave if people were following the contributing guidelines? The in-development image will always be Ubuntu 20.04. And is this an area where some kind of automation would make our lives easier?

I've never run it on a non-Ubuntu Docker image. But I see that it is in there?

@cansavvy
Contributor Author

The only one I'm seeing Mojave on is the most recent annotation microarray PR #212. @cbethell did you forget to run snakemake on the Docker image for your last render? I missed that in my review if so.

@cansavvy
Contributor Author

And is this an area where some kind of automation would make our lives easier?

We can definitely look into this if it would be helpful. The PR checklist is admittedly long, so if this would help reduce author burden, that seems good. I'm unsure how heavy a lift it is to get this going.

@jaclyn-taroni
Member

Ah, I had assumed it would be all of the ones that went in on the last PR but it makes sense given what you're saying about Snakemake. (All good info for thinking about automation or not.)

@cansavvy
Contributor Author

cansavvy commented Sep 18, 2020

In regards to the RNA-seq content, it's ready for another look, @jaclyn-taroni. I added links in modules with search and replace and tested them.

@cbethell
Contributor

The only one I'm seeing Mojave on is the most recent annotation microarray PR #212. @cbethell did you forget to run snakemake on the Docker image for your last render? I missed that in my review if so.

Ah, it is quite possible that the last one did not get rendered as it should have been. Although I have been running it on the Docker image thus far, I have recently been moving back and forth between the Docker for OpenPBTA-analysis and Docker for refinebio-examples, so again your theory is quite possible! I'll be sure to look out for this moving forward (and will also try @cansavvy's tip of using different ports for the different repos)!

Member

@jaclyn-taroni jaclyn-taroni left a comment

LGTM! I found one instance of a word being repeated inside and outside of a link and that was my only remaining suggestion.

@cansavvy cansavvy merged commit 8a2c52c into master Sep 18, 2020
@cansavvy cansavvy deleted the cansavvy/rna-seq-header branch September 18, 2020 18:13