adds _freeze/ to version control

sib-swiss · Mar 20, 2024 · 72b5030
1 parent b45ea1f
commit 72b5030

Showing 79 changed files with 148 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -11,7 +11,7 @@ cellranger*
 course_data*
 *.rds
 _site/
-_freeze/
+# _freeze/
 .quarto/
 /.quarto/
 *cache*

diff --git a/_freeze/day1/day1-1_setup/execute-results/html.json b/_freeze/day1/day1-1_setup/execute-results/html.json
@@ -0,0 +1,14 @@
+{
+  "hash": "59de665b3347b1be9a6bd87d45b86b5a",
+  "result": {
+    "markdown": "---\ntitle: \"Setup\"\nengine: knitr\n---\n\n\n## Material\n\n\n{{< downloadthis ../assets/pdf/01_introduction.pdf dname=\"01_introduction\" label=\"Download the presentation\" icon=\"filetype-pdf\" >}}\n\n\n\n## Exercises\n\n### Login and set up\n\nChoose one of the following:\n\n- **Enrolled:** if you are enrolled in a course with a teacher\n- **Own installation:** if you want to install packages on your own local Rstudio installation\n- **Docker:** if you want to use the docker image locally\n- **renkulab.io:** if you want to easily deploy the environment outside the course\n\n::: {.panel-tabset}\n\n## Enrolled\n\nLog in to Rstudio server with the provided link, username and password.\n\n## Own installation\n\nInstall the required packages using the script [`install_packages.R`](https://raw.githubusercontent.com/sib-swiss/single-cell-training/master/Docker/install_packages.R)\n\n## Docker\n\nWith docker, you can use exactly the same environment as we use in the enrolled course, but running locally.\n\nIn the video below there's a short tutorial on how to set up a docker container for this course. 
Note that you will need administrator rights, and that if you are using Windows, you need the latest version of Windows 10.\n\n\n{{< video https://vimeo.com/481620477 width=\"640\" height=\"360\" >}}\n\n\n\nThe command to run the environment required for this course looks like this (in a terminal):\n\n::: {.callout-warning}\n## Modify the script\nThe home directory within the container is mounted to your current directory (`$PWD`). If you want to change this behaviour, change the path after `-v` to the working directory on your computer before running the command.\n:::\n\n\n::: {.cell hash='day1-1_setup_cache/html/unnamed-chunk-1_1c4b876c04ce695c706f5917064dfe4c'}\n\n```{.bash .cell-code}\ndocker run \\\n--rm \\\n-p 8787:8787 \\\n-e PASSWORD=test \\\n-v $PWD:/home/rstudio \\\ngeertvangeest/single-cell-rstudio:latest\n```\n:::\n\n\nIf this command has run successfully, open Rstudio server at this address:\n\n```\nhttp://localhost:8787\n```\n\nCopy this URL into your browser. If you used the snippet above, the credentials will be:\n\n- **Username:** `rstudio`\n- **Password:** `test`\n\nGreat! Now you will be able to use Rstudio with all required installations.\n\n::: {.callout-note}\n## About the options\nThe option `-v` mounts a local directory on your computer to the directory `/home/rstudio` in the docker container ('rstudio' is the default user for Rstudio containers). In that way, you have files available both in the container and on your computer. Use this directory on your computer as your working directory. To use a different directory, change the first path to a path on your computer that you want to use as a working directory.\n\nThe part `geertvangeest/single-cell-rstudio:latest` is the image we are going to load into the container. The image contains all the information about software and dependencies needed for this course. When you run this command for the first time it will download the image. 
Once it's on your computer, it will start immediately.\n:::\n\n## renkulab.io\nTo simply run the environment, you can use [renku](https://renkulab.io). You can find the repository (including the image) here: [https://renkulab.io/projects/geert.vangeest/single-cell-training/](https://renkulab.io/projects/geert.vangeest/single-cell-training/). \n\n:::\n\n### Create a project\n\nNow that you have access to an environment with the required installations, we will set up a project in a new directory. On the top right choose the button **Project (None)** and select **New Project...**\n\n![](../assets/images/create_new_project.png){width=200}\n\nContinue by choosing **New Directory**\n\n![](../assets/images/choose_new_directory.png){width=300}\n\nAs project type select **New Project**\n\n![](../assets/images/choose_new_project.png){width=300}\n\nFinally, type in the project name. This should be `single_cell_course`. Finish by clicking **Create Project**.\n\n![](../assets/images/define_directory_name.png){width=300}\n\nNow that we have set up a project and a project directory (it is in `/home/rstudio/single_cell_course`), we can download the data that is required for this course. We will use the built-in terminal of Rstudio. To do this, select the **Terminal** tab:\n\n![](../assets/images/select_terminal_tab.png){width=300}\n\n### Downloading the course data\n\nTo download and extract the dataset, copy-paste these commands inside the terminal tab:\n\n\n::: {.cell hash='day1-1_setup_cache/html/unnamed-chunk-2_93ed9f534c0bbee569e55b34a716c0fb'}\n\n```{.bash .cell-code}\nwget https://single-cell-transcriptomics.s3.eu-central-1.amazonaws.com/course_data.tar.gz\ntar -xvf course_data.tar.gz\nrm course_data.tar.gz\n```\n:::\n\n\n::: {.callout-note}\n## If on Windows\n\nIf you're using Windows, you can directly open the [link](https://single-cell-transcriptomics.s3.eu-central-1.amazonaws.com/course_data.tar.gz) in your browser, and downloading will start automatically. 
Unpack the tar.gz file in the directory where you want to work in during the course.\n::: \n\nHave a look at the data directory you have downloaded. It should contain the following:\n\n```\ncourse_data\n├── count_matrices\n│   ├── ETV6-RUNX1_1\n│   │   └── outs\n│   │       └── filtered_feature_bc_matrix\n│   │           ├── barcodes.tsv.gz\n│   │           ├── features.tsv.gz\n│   │           └── matrix.mtx.gz\n│   ├── ETV6-RUNX1_2\n│   │   └── outs\n│   │       └── filtered_feature_bc_matrix\n│   │           ├── barcodes.tsv.gz\n│   │           ├── features.tsv.gz\n│   │           └── matrix.mtx.gz\n│   ├── ETV6-RUNX1_3\n│   │   └── outs\n│   │       └── filtered_feature_bc_matrix\n│   │           ├── barcodes.tsv.gz\n│   │           ├── features.tsv.gz\n│   │           └── matrix.mtx.gz\n│   ├── PBMMC_1\n│   │   └── outs\n│   │       └── filtered_feature_bc_matrix\n│   │           ├── barcodes.tsv.gz\n│   │           ├── features.tsv.gz\n│   │           └── matrix.mtx.gz\n│   ├── PBMMC_2\n│   │   └── outs\n│   │       └── filtered_feature_bc_matrix\n│   │           ├── barcodes.tsv.gz\n│   │           ├── features.tsv.gz\n│   │           └── matrix.mtx.gz\n│   └── PBMMC_3\n│       └── outs\n│           └── filtered_feature_bc_matrix\n│               ├── barcodes.tsv.gz\n│               ├── features.tsv.gz\n│               └── matrix.mtx.gz\n└── reads\n    ├── ETV6-RUNX1_1_S1_L001_I1_001.fastq.gz\n    ├── ETV6-RUNX1_1_S1_L001_R1_001.fastq.gz\n    └── ETV6-RUNX1_1_S1_L001_R2_001.fastq.gz\n\n20 directories, 21 files\n```\n\nThis data comes from:\n\nCaron M, St-Onge P, Sontag T, Wang YC, Richer C, Ragoussis I, et al. **Single-cell analysis of childhood leukemia reveals a link between developmental states and ribosomal protein expression as a source of intra-individual heterogeneity.** Scientific Reports. 2020;10:1–12. Available from: http://dx.doi.org/10.1038/s41598-020-64929-x\n\nWe will use the reads to showcase the use of `cellranger count`. 
The directory contains only reads from chromosomes 21 and 22 of one sample (`ETV6-RUNX1_1`). The count matrices are the output of `cellranger count`, and we will use those for the other exercises in `R`.\n",
+    "supporting": [],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}
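
Each frozen `html.json` file added in this commit has the same shape: a top-level `hash` and a `result` object whose `markdown` field holds the rendered document. The following is a minimal Python sketch for inspecting such a file. It assumes only the keys visible in this diff; the helper name `summarize_freeze` and the shortened sample record are hypothetical, and nothing here reflects how Quarto computes or checks the hash.

```python
import json


def summarize_freeze(text):
    """Return (hash, title, filters) from the text of a freeze html.json file."""
    record = json.loads(text)
    result = record["result"]
    title = None
    # The markdown starts with a YAML front-matter block; scan it for a title line.
    for line in result["markdown"].splitlines():
        if line.startswith("title:"):
            title = line.split(":", 1)[1].strip().strip('"')
            break
    return record["hash"], title, result.get("filters", [])


# Hypothetical, shortened record modelled on the first file in this diff.
sample = json.dumps({
    "hash": "59de665b3347b1be9a6bd87d45b86b5a",
    "result": {
        "markdown": '---\ntitle: "Setup"\nengine: knitr\n---\n\n## Material\n',
        "filters": ["rmarkdown/pagebreak.lua"],
    },
})

print(summarize_freeze(sample))
```

To inspect a real file, pass the contents of a path such as `_freeze/day1/day1-1_setup/execute-results/html.json` instead of `sample`.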
diff --git a/_freeze/day1/day1-2_analysis_tools_qc/execute-results/html.json b/_freeze/day1/day1-2_analysis_tools_qc/execute-results/html.json
diff --git a/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-10-1.png b/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-10-1.png
diff --git a/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-11-1.png b/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-11-1.png
diff --git a/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-14-1.png b/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-14-1.png
diff --git a/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-15-1.png b/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-15-1.png
diff --git a/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-16-1.png b/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-16-1.png
diff --git a/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-18-1.png b/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-18-1.png
diff --git a/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-8-1.png b/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-8-1.png
diff --git a/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-9-1.png b/_freeze/day1/day1-2_analysis_tools_qc/figure-html/unnamed-chunk-9-1.png
diff --git a/_freeze/day1/day1-3_normalization_scaling/execute-results/html.json b/_freeze/day1/day1-3_normalization_scaling/execute-results/html.json
@@ -0,0 +1,14 @@
+{
+  "hash": "95d3e3ffe604f139b8db3d7f940b116f",
+  "result": {
+    "markdown": "---\ntitle: \"Normalization and scaling\"\n---\n\n::: {.callout-note}\n## Learning outcomes\n\n**After having completed this chapter you will be able to:**\n\n- Describe and perform standard procedures for normalization and scaling with the package `Seurat`\n- Select the most variable genes from a `Seurat` object for downstream analyses\n::: \n\n\n## Material\n\n- [Seurat vignette](https://satijalab.org/seurat/articles/pbmc3k_tutorial.html)\n\n## Exercises\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-1_6b13e470bf81082a514669efbd93d3cf'}\n\n:::\n\n\n### Normalization\n\nAfter removing unwanted cells from the dataset, the next step is to normalize the data.\nBy default, Seurat employs a global-scaling normalization method `\"LogNormalize\"` that normalizes the feature expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default), and log-transforms the result.\nNormalized values are stored in the \"RNA\" assay (as item of the `@assay` slot) of the seu object.\n\nThis is how you can call the function (don't run it yet! Read the exercise first):\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-2_01fca831fcc0537dc3267c3637d887b0'}\n\n```{.r .cell-code}\n# Don't run it yet! Read the exercise first\nseu <- Seurat::NormalizeData(seu,\n                     normalization.method = \"LogNormalize\",\n                     scale.factor = 10000)\n```\n:::\n\n\n\n::: {.callout-important}\n## Exercise\nHave a look at the assay data before and after running `NormalizeData()`. Did it change?\n::: \n\n::: {.callout-tip}\nYou can extract assay data with the function `Seurat::GetAssayData`. By default, the slot `data` is used (inside the slot `assay`), containing normalized counts. Use `Seurat::GetAssayData(seu, slot = \"counts\")` to get the raw counts. 
\n:::\n\n::: {.callout-tip collapse=\"true\"}\n# Answer\nYou can check out some assay data with:\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-3_8cdebfbf3090cdc9d03f1afcf7c50f08'}\n\n```{.r .cell-code}\nSeurat::GetAssayData(seu)[1:10,1:10]  \n```\n:::\n\n\nReturning:\n    \n::: {.panel-tabset}\n## Before normalization\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-4_941e6971621aacec1113997e1a465007'}\n::: {.cell-output .cell-output-stderr}\n```\nWarning in GetAssayData.StdAssay(object = object[[assay]], layer = layer): data\nlayer is not found and counts layer is used\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n10 x 10 sparse Matrix of class \"dgCMatrix\"\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n  [[ suppressing 10 column names 'PBMMC-1_AAACCTGCAGACGCAA-1', 'PBMMC-1_AAACCTGTCATCACCC-1', 'PBMMC-1_AAAGATGCATAAAGGT-1' ... ]]\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n                                 \nRP11-34P13.7  . . . . . . . . . .\nFO538757.3    . . . . . . . . . .\nFO538757.2    1 . . . . . 2 . . .\nAP006222.2    . . . . . . . . . .\nRP4-669L17.10 . . . . . . . . . .\nRP5-857K21.4  . . . . . . . . . .\nRP11-206L10.9 . . . . . . . . . .\nLINC00115     . . . . . . . . . .\nFAM41C        . . . . . . . . . .\nRP11-54O7.1   . . . . . . . . . .\n```\n:::\n:::\n\n\n## After normalization\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-5_7f02a771954dba814f30e7ed3b473dbe'}\n::: {.cell-output .cell-output-stderr}\n```\nNormalizing layer: counts\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n10 x 10 sparse Matrix of class \"dgCMatrix\"\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n  [[ suppressing 10 column names 'PBMMC-1_AAACCTGCAGACGCAA-1', 'PBMMC-1_AAACCTGTCATCACCC-1', 'PBMMC-1_AAAGATGCATAAAGGT-1' ... 
]]\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n                                               \nRP11-34P13.7  .        . . . . . .        . . .\nFO538757.3    .        . . . . . .        . . .\nFO538757.2    1.641892 . . . . . 1.381104 . . .\nAP006222.2    .        . . . . . .        . . .\nRP4-669L17.10 .        . . . . . .        . . .\nRP5-857K21.4  .        . . . . . .        . . .\nRP11-206L10.9 .        . . . . . .        . . .\nLINC00115     .        . . . . . .        . . .\nFAM41C        .        . . . . . .        . . .\nRP11-54O7.1   .        . . . . . .        . . .\n```\n:::\n:::\n\n\n:::\n::: \n\n\n\n::: {.callout-note}\n## Updating `seu`\nAs you might have noticed, this function takes the object `seu` as input, and it returns it to an object named `seu`. We can do this because the output of such calculations is added to the object, without losing information.\n:::\n\n### Variable features\n\nWe next calculate a subset of features that exhibit high cell-to-cell variation in the dataset (i.e., they are highly expressed in some cells, and lowly expressed in others). Focusing on these genes in downstream analysis helps to highlight biological signal in single-cell datasets. The procedure in Seurat models the mean-variance relationship inherent in single-cell data, and is implemented in the `FindVariableFeatures()` function. 
By default, 2,000 genes (features) per dataset are returned and these will be used in downstream analysis, like PCA.\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-6_53d5c7a0f74c91ba82d08d2477ac8e3f'}\n\n```{.r .cell-code}\nseu <- Seurat::FindVariableFeatures(seu,\n                            selection.method = \"vst\",\n                            nfeatures = 2000)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nFinding variable features for layer counts\n```\n:::\n:::\n\n\nLet's have a look at the 10 most variable genes:\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-7_9e4350f35ffc8498a1f67cc8c5026d50'}\n\n```{.r .cell-code}\n# Identify the 10 most highly variable genes\ntop10 <- head(Seurat::VariableFeatures(seu), 10)\ntop10\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n [1] \"IGKC\"   \"HBG2\"   \"IGHG3\"  \"IGHG1\"  \"JCHAIN\" \"HBG1\"   \"IGHA1\"  \"IGHGP\" \n [9] \"IGLC2\"  \"IGLC3\" \n```\n:::\n:::\n\n\nWe can plot them in a nicely labeled scatterplot:\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-8_e83bec5dcde1fd9800cbcd0ec4a61e72'}\n\n```{.r .cell-code}\nvf_plot <- Seurat::VariableFeaturePlot(seu)\nSeurat::LabelPoints(plot = vf_plot,\n            points = top10, repel = TRUE)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nWhen using repel, set xnudge and ynudge to 0 for optimal results\n```\n:::\n\n::: {.cell-output-display}\n![](day1-3_normalization_scaling_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n:::\n\n\n::: {.callout-warning}\n## Make sure the plotting window is large enough\n\nThe function `LabelPoints` will throw an error if the plotting window is too small. If you get an error, increase plotting window size in RStudio and try again. \n::: \n\nYou can see that most of the highly variable genes are antibody subunits (starting with IGH, IGL). Not very surprising since we look at bone marrow tissue. 
Later, we can have a look at which cells express them. \n\n### Scaling\n\nNext, we apply scaling, a linear transformation that is a standard pre-processing\nstep prior to dimensional reduction techniques like PCA. The `ScaleData()` function\n\n1. shifts the expression of each gene, so that the mean expression across cells is 0\n2. scales the expression of each gene, so that the variance across cells is 1\n\nThis step gives each gene equal weight in downstream analyses, so that highly-expressed genes do not dominate. The results of this are stored in `seu$RNA@scale.data`\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-9_dffac3a4073528d01e8a18f406b03094'}\n\n```{.r .cell-code}\nseu <- Seurat::ScaleData(seu)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nCentering and scaling data matrix\n```\n:::\n:::\n\n\n\n::: {.callout-note}\n## The use of `Seurat::SCTransform`\n\nThe functions `NormalizeData`, `FindVariableFeatures` and `ScaleData` can be replaced by the function `SCTransform`. The latter uses a more sophisticated way to perform the normalization and scaling, and is [argued to perform better](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1874-1). However, it is slower, and a bit less transparent compared to using the three separate functions. Therefore, we chose not to use `SCTransform` for the exercises.\n\n:::\n\n\n::: {.callout-important}\n## Bonus exercise\nRun `SCTransform` on the `seu` object. Where is the output stored?\n\n::: \n\n::: {.callout-tip collapse=\"true\"}\n## Answer\nYou can run it like so:\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-10_c47968a45f8b4cb5593648afccb82fd5'}\n\n```{.r .cell-code}\nseu <- Seurat::SCTransform(seu)\n```\n:::\n\n\nAnd it will add an extra assay to the object. 
`names(seu@assays)` returns:\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-11_203c7cf63325381fd0d3149ed1672acd'}\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"RNA\" \"SCT\"\n```\n:::\n:::\n\n\nMeaning that a whole new assay was added (including the sparse matrices with counts, normalized data and scaled data). \n:::\n\n::: {.callout-warning}\nRunning `SCTransform` will change `@active.assay` into `SCT` (instead of `RNA`; check it with `DefaultAssay(seu)`). This assay is used as a default for following function calls. To change the active assay to `RNA` run:\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-12_37b5a6048ed15406d7960fa8b9e87b20'}\n\n```{.r .cell-code}\nDefaultAssay(seu) <- \"RNA\"\n```\n:::\n\n:::\n\n### Save the dataset and clear environment\n\nNow, save the dataset so you can use it tomorrow:\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-13_4a0f196dd49ee92f3ed30a7ed684ff50'}\n\n```{.r .cell-code}\nsaveRDS(seu, \"seu_day1-3.rds\")\n```\n:::\n\n\nClear your environment:\n\n\n::: {.cell hash='day1-3_normalization_scaling_cache/html/unnamed-chunk-14_bfe15eb48168291626471e68a3857c5f'}\n\n```{.r .cell-code}\nrm(list = ls())\ngc()\n.rs.restartR()\n```\n:::\n",
+    "supporting": [],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}
diff --git a/_freeze/day1/day1-3_normalization_scaling/figure-html/unnamed-chunk-5-1.png b/_freeze/day1/day1-3_normalization_scaling/figure-html/unnamed-chunk-5-1.png
diff --git a/_freeze/day1/day1-3_normalization_scaling/figure-html/unnamed-chunk-8-1.png b/_freeze/day1/day1-3_normalization_scaling/figure-html/unnamed-chunk-8-1.png