diff --git a/_freeze/day2/day2-1_dimensionality_reduction/execute-results/html.json b/_freeze/day2/day2-1_dimensionality_reduction/execute-results/html.json
index ae6c8b7..a700b67 100644
--- a/_freeze/day2/day2-1_dimensionality_reduction/execute-results/html.json
+++ b/_freeze/day2/day2-1_dimensionality_reduction/execute-results/html.json
@@ -1,7 +1,7 @@
 {
- "hash": "48c2592c7f3af6679cde08f2270d251e",
+ "hash": "fe8e827f03879226d4d8bad89dc24e53",
 "result": {
- "markdown": "---\ntitle: \"Dimensionality reduction\"\n---\n\n\n\n## Material\n\n\n{{< downloadthis ../assets/pdf/scRNAseq_RM_Integration_dimreduction.pdf dname=\"scRNAseq_RM_Integration_dimreduction\" label=\"Download the presentation\" icon=\"filetype-pdf\" >}}\n\n\n{{< video https://youtu.be/QS6ZH_41b4s?si=fvaNGkqjsIcxTEFn >}}\n\n\n\n- Making sense of [PCA](https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues)\n- Understanding [t-SNE](https://distill.pub/2016/misread-tsne/)\n- [t-SNE explained](https://www.youtube.com/watch?v=NEaUSP4YerM) by Josh Starmer\n- Understanding [UMAP](https://pair-code.github.io/understanding-umap/)\n- [Video](https://www.youtube.com/watch?v=nq6iPZVUxZU) by one of the UMAP authors\n- More info on [UMAP parameters](https://umap.scikit-tda.org/parameters.html)\n\n\n## Exercises\n\nLoad the `seu` dataset you have created yesterday:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-1_5e0f12af8e616590ab40c6ce9aa93ace'}\n\n```{.r .cell-code}\nseu <- readRDS(\"seu_day1-3.rds\")\n```\n:::\n\n\nAnd load the following packages:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-2_29e0fe2490c5b87dc7289a7e38dba026'}\n\n```{.r .cell-code}\nlibrary(Seurat)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nLoading required package: SeuratObject\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nLoading required package: sp\n```\n:::\n\n::: {.cell-output
.cell-output-stderr}\n```\n'SeuratObject' was built under R 4.3.0 but the current version is\n4.3.2; it is recomended that you reinstall 'SeuratObject' as the ABI\nfor R may have changed\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n\nAttaching package: 'SeuratObject'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nThe following object is masked from 'package:base':\n\n intersect\n```\n:::\n:::\n\n\nOnce the data is normalized, scaled and variable features have been identified, we can start to reduce the dimensionality of the data.\nFor the PCA, by default, only the previously determined variable features are used as input, but can be defined using features argument if you wish to specify a vector of genes. The PCA will only be run on the variable features, that you can check with `VariableFeatures(seu)`.\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-3_64cd3c15bcec4ce8410601c13331edd9'}\n\n```{.r .cell-code}\nseu <- Seurat::RunPCA(seu)\n```\n:::\n\n\nTo view the PCA plot:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-4_e79522eb2ca10f7128b00787f6ca78d1'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"pca\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n\nWe can colour the PCA plot according to any factor that is present in `@meta.data`, or for any gene. For example we can take the column `percent.globin`:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-5_efeb555ea0ec5c64dc69b94a099f24f1'}\n\n```{.r .cell-code}\nSeurat::FeaturePlot(seu, reduction = \"pca\", features = \"percent.globin\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n\n::: {.callout-note}\nNote that we used a different plotting function here: `FeaturePlot`. 
The difference between `DimPlot` and `FeaturePlot` is that the first allows you to color the points in the plot according to a grouping variable (e.g. sample) while the latter allows you to color the points according to a continuous variable (e.g. gene expression).\n::: \n\n::: {.callout-important}\n## Exercise\nGenerate a PCA plot where color is according to counts of a gene (i.e. gene expression). For example, you can take `HBA1` (alpha subunit of hemoglobin), or one of the most variable genes (e.g. `IGKC`).\n::: \n\n::: {.callout-tip collapse=\"true\"}\n## Answer\nGenerating a PCA plot coloured according to gene expression (here `HBA1`):\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-6_cbdaa477c79d6dc89eb438d98abb9b9c'}\n\n```{.r .cell-code}\nSeurat::FeaturePlot(seu, reduction = \"pca\", features = \"HBA1\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n:::\n\n:::\n\n\nWe can generate heatmaps according to their principal component scores calculated in the rotation matrix:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-7_aa58393f21a91c2560011a5f2ad1dfcd'}\n\n```{.r .cell-code}\nSeurat::DimHeatmap(seu, dims = 1:12, cells = 500, balanced = TRUE)\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-7-1.png){width=672}\n:::\n:::\n\n\nThe elbowplot can help you in determining how many PCs to use for downstream analysis such as UMAP:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-8_631c0a52200b326938e29af0e3a0e4c8'}\n\n```{.r .cell-code}\nSeurat::ElbowPlot(seu, ndims = 40)\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n:::\n\n\nThe elbow plot ranks principle components based on the percentage of variance explained by each one. 
Where we observe an \"elbow\" or flattening curve, the majority of true signal is captured by this number of PCs, eg around 25 PCs for the seu dataset.\n\nIncluding too many PCs usually does not affect much the result, while including too few PCs can affect the results very much.\n\nUMAP: The goal of these algorithms is to learn the underlying manifold of the data in order to place similar cells together in low-dimensional space.\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-9_04a5164a2ff56942f0e24180b42cd7c2'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:25)\n```\n:::\n\n\nTo view the UMAP plot:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-10_fb222f914d72d76157394058a17b2ccc'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-10-1.png){width=672}\n:::\n:::\n\n\n\n::: {.callout-important}\n## Exercise\n\nTry to change:\n\n**A.** Color the dots in the UMAP according to a variable (e.g. `percent.globin` or `HBA1`). Any idea where the erythrocytes probably are in the UMAP?\n\n**B.** The number of neighbors used for the calculation of the UMAP. Which is the parameter to change and how did it affect the output. What is the default ? In which situation would you lower/increase this ?\n\n**C.** The number of dims to extremes dims = 1:5 or dims = 1:50 how\ndid it affect the output ? In your opinion better few PCAs too much or too few ?\nWhy does dims = 1:100 not work? 
When would more precision be needed?\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n\n## Answer\n\n**Answer A** \n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-11_479593327c21b576da3d99bc5c8a4e88'}\n\n```{.r .cell-code}\nSeurat::FeaturePlot(seu, features = c(\"HBA1\",\n \"percent.globin\",\n \"IGKC\",\n \"percent.mito\"))\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-11-1.png){width=672}\n:::\n:::\n\n\nThe erythrocytes are probably in the cluster with a higher percentage of globin expression.\n\n**Answer B**\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-12_10121388d85198a7767dd95d6d998d60'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:25, n.neighbors = 5)\n```\n:::\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-13_c9c53c1249f4a9a4067932b1ed076475'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n\nThe default number of neighbours is 30. It can be of interest to change the number of neighbors if one has subset the data (for instance in the situation where you would only consider the t-cells inyour data set), then maybe the number of neighbors in a cluster would anyway be most of the time lower than 30 then 30 is too much. 
In the other extreme where your dataset is extremely big an increase in the number of neighbors can be considered.\n\n**Answer C** \n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-14_35040f7cd4736efe672bfd34dec67563'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:5)\n```\n:::\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-15_b0cd06447c06f076f5cf89bd0660681a'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-15-1.png){width=672}\n:::\n:::\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-16_3920ab8cf55a7802e467b92d8f0878b2'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:50) \n```\n:::\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-17_4b23543c31c0140fbe94b2fc78631ec3'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n:::\n\n\nTaking dims = 1:100 does not work as in the step RunPCA by default only 50pcs are calculated, so the maximum that we can consider in further steps are 50, if more precision makes sense, for instance, if the genes that is of interest for your study is not present when the RunPCA was calculated, then an increase in the number of components calculated at start might be interesting tobe considered. Taking too few PCs we have a « blob » everything looks connected. Too many PCs tends to separate everything. Personally it is more interesting for me too have maybe 2 clusters separated of epithelial cells that I then group for further downstream analysis rather than having very distinct cells being clustered together. 
So I would rather take the « elbow » of the elbow plot a bit further to the right.\n\n:::\n\n::: {callout-warning}\nAfter having done these exercises, change the UMAP back to a UMAP based on the first 25 PCs, in order to replicate the exercises in the following chapters. Do this by:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-18_6ac4f8cff1ae38933bd1674de2bb98ce'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:25)\n```\n:::\n\n\n:::\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-19_694032eb8209da257c121973c32da5a6'}\n\n:::\n",
+ "markdown": "---\ntitle: \"Dimensionality reduction\"\n---\n\n\n\n## Material\n\n\n{{< downloadthis ../assets/pdf/Dimensionality_Reduction_SIB.pdf dname=\"Dimensionality_Reduction_SIB\" label=\"Download the presentation\" icon=\"filetype-pdf\" >}}\n\n\n{{< video https://youtu.be/QS6ZH_41b4s?si=fvaNGkqjsIcxTEFn >}}\n\n\n\n- Making sense of [PCA](https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues)\n- Understanding [t-SNE](https://distill.pub/2016/misread-tsne/)\n- [t-SNE explained](https://www.youtube.com/watch?v=NEaUSP4YerM) by Josh Starmer\n- Understanding [UMAP](https://pair-code.github.io/understanding-umap/)\n- [Video](https://www.youtube.com/watch?v=nq6iPZVUxZU) by one of the UMAP authors\n- More info on [UMAP parameters](https://umap.scikit-tda.org/parameters.html)\n\n\n## Exercises\n\nLoad the `seu` dataset you have created yesterday:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-1_5e0f12af8e616590ab40c6ce9aa93ace'}\n\n```{.r .cell-code}\nseu <- readRDS(\"seu_day1-3.rds\")\n```\n:::\n\n\nAnd load the following packages:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-2_29e0fe2490c5b87dc7289a7e38dba026'}\n\n```{.r .cell-code}\nlibrary(Seurat)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nLoading required package:
SeuratObject\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nLoading required package: sp\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n'SeuratObject' was built under R 4.3.0 but the current version is\n4.3.2; it is recomended that you reinstall 'SeuratObject' as the ABI\nfor R may have changed\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\n\nAttaching package: 'SeuratObject'\n```\n:::\n\n::: {.cell-output .cell-output-stderr}\n```\nThe following object is masked from 'package:base':\n\n intersect\n```\n:::\n:::\n\n\nOnce the data is normalized, scaled and variable features have been identified, we can start to reduce the dimensionality of the data.\nFor the PCA, by default, only the previously determined variable features are used as input, but can be defined using features argument if you wish to specify a vector of genes. The PCA will only be run on the variable features, that you can check with `VariableFeatures(seu)`.\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-3_64cd3c15bcec4ce8410601c13331edd9'}\n\n```{.r .cell-code}\nseu <- Seurat::RunPCA(seu)\n```\n:::\n\n\nTo view the PCA plot:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-4_e79522eb2ca10f7128b00787f6ca78d1'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"pca\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n\nWe can colour the PCA plot according to any factor that is present in `@meta.data`, or for any gene. 
For example we can take the column `percent.globin`:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-5_efeb555ea0ec5c64dc69b94a099f24f1'}\n\n```{.r .cell-code}\nSeurat::FeaturePlot(seu, reduction = \"pca\", features = \"percent.globin\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n\n::: {.callout-note}\nNote that we used a different plotting function here: `FeaturePlot`. The difference between `DimPlot` and `FeaturePlot` is that the first allows you to color the points in the plot according to a grouping variable (e.g. sample) while the latter allows you to color the points according to a continuous variable (e.g. gene expression).\n::: \n\n::: {.callout-important}\n## Exercise\nGenerate a PCA plot where color is according to counts of a gene (i.e. gene expression). For example, you can take `HBA1` (alpha subunit of hemoglobin), or one of the most variable genes (e.g. 
`IGKC`).\n::: \n\n::: {.callout-tip collapse=\"true\"}\n## Answer\nGenerating a PCA plot coloured according to gene expression (here `HBA1`):\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-6_cbdaa477c79d6dc89eb438d98abb9b9c'}\n\n```{.r .cell-code}\nSeurat::FeaturePlot(seu, reduction = \"pca\", features = \"HBA1\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n:::\n\n:::\n\n\nWe can generate heatmaps according to their principal component scores calculated in the rotation matrix:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-7_aa58393f21a91c2560011a5f2ad1dfcd'}\n\n```{.r .cell-code}\nSeurat::DimHeatmap(seu, dims = 1:12, cells = 500, balanced = TRUE)\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-7-1.png){width=672}\n:::\n:::\n\n\nThe elbowplot can help you in determining how many PCs to use for downstream analysis such as UMAP:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-8_631c0a52200b326938e29af0e3a0e4c8'}\n\n```{.r .cell-code}\nSeurat::ElbowPlot(seu, ndims = 40)\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n:::\n\n\nThe elbow plot ranks principle components based on the percentage of variance explained by each one. 
Where we observe an \"elbow\" or flattening curve, the majority of true signal is captured by this number of PCs, eg around 25 PCs for the seu dataset.\n\nIncluding too many PCs usually does not affect much the result, while including too few PCs can affect the results very much.\n\nUMAP: The goal of these algorithms is to learn the underlying manifold of the data in order to place similar cells together in low-dimensional space.\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-9_04a5164a2ff56942f0e24180b42cd7c2'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:25)\n```\n:::\n\n\nTo view the UMAP plot:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-10_fb222f914d72d76157394058a17b2ccc'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-10-1.png){width=672}\n:::\n:::\n\n\n\n::: {.callout-important}\n## Exercise\n\nTry to change:\n\n**A.** Color the dots in the UMAP according to a variable (e.g. `percent.globin` or `HBA1`). Any idea where the erythrocytes probably are in the UMAP?\n\n**B.** The number of neighbors used for the calculation of the UMAP. Which is the parameter to change and how did it affect the output. What is the default ? In which situation would you lower/increase this ?\n\n**C.** The number of dims to extremes dims = 1:5 or dims = 1:50 how\ndid it affect the output ? In your opinion better few PCAs too much or too few ?\nWhy does dims = 1:100 not work? 
When would more precision be needed?\n\n:::\n\n::: {.callout-tip collapse=\"true\"}\n\n## Answer\n\n**Answer A** \n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-11_479593327c21b576da3d99bc5c8a4e88'}\n\n```{.r .cell-code}\nSeurat::FeaturePlot(seu, features = c(\"HBA1\",\n \"percent.globin\",\n \"IGKC\",\n \"percent.mito\"))\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-11-1.png){width=672}\n:::\n:::\n\n\nThe erythrocytes are probably in the cluster with a higher percentage of globin expression.\n\n**Answer B**\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-12_10121388d85198a7767dd95d6d998d60'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:25, n.neighbors = 5)\n```\n:::\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-13_c9c53c1249f4a9a4067932b1ed076475'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-13-1.png){width=672}\n:::\n:::\n\n\nThe default number of neighbours is 30. It can be of interest to change the number of neighbors if one has subset the data (for instance in the situation where you would only consider the t-cells inyour data set), then maybe the number of neighbors in a cluster would anyway be most of the time lower than 30 then 30 is too much. 
In the other extreme where your dataset is extremely big an increase in the number of neighbors can be considered.\n\n**Answer C** \n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-14_35040f7cd4736efe672bfd34dec67563'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:5)\n```\n:::\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-15_b0cd06447c06f076f5cf89bd0660681a'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-15-1.png){width=672}\n:::\n:::\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-16_3920ab8cf55a7802e467b92d8f0878b2'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:50) \n```\n:::\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-17_4b23543c31c0140fbe94b2fc78631ec3'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-1_dimensionality_reduction_files/figure-html/unnamed-chunk-17-1.png){width=672}\n:::\n:::\n\n\nTaking dims = 1:100 does not work as in the step RunPCA by default only 50pcs are calculated, so the maximum that we can consider in further steps are 50, if more precision makes sense, for instance, if the genes that is of interest for your study is not present when the RunPCA was calculated, then an increase in the number of components calculated at start might be interesting tobe considered. Taking too few PCs we have a « blob » everything looks connected. Too many PCs tends to separate everything. Personally it is more interesting for me too have maybe 2 clusters separated of epithelial cells that I then group for further downstream analysis rather than having very distinct cells being clustered together. 
So I would rather take the « elbow » of the elbow plot a bit further to the right.\n\n:::\n\n::: {callout-warning}\nAfter having done these exercises, change the UMAP back to a UMAP based on the first 25 PCs, in order to replicate the exercises in the following chapters. Do this by:\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-18_6ac4f8cff1ae38933bd1674de2bb98ce'}\n\n```{.r .cell-code}\nseu <- Seurat::RunUMAP(seu, dims = 1:25)\n```\n:::\n\n\n:::\n\n\n::: {.cell hash='day2-1_dimensionality_reduction_cache/html/unnamed-chunk-19_694032eb8209da257c121973c32da5a6'}\n\n:::\n",
 "supporting": [],
 "filters": [
 "rmarkdown/pagebreak.lua"
diff --git a/_freeze/day2/day2-2_integration/execute-results/html.json b/_freeze/day2/day2-2_integration/execute-results/html.json
index f471e40..ea8e278 100644
--- a/_freeze/day2/day2-2_integration/execute-results/html.json
+++ b/_freeze/day2/day2-2_integration/execute-results/html.json
@@ -1,7 +1,7 @@
 {
- "hash": "1be8d3749096b9eacbc06fd6a3c5be09",
+ "hash": "52afd51c30892f3b3203123bef3b9334",
 "result": {
- "markdown": "---\ntitle: \"Integration\"\n---\n\n\n## Material\n\n\n{{< video https://youtu.be/2TsW5A53hTg?si=Bas__YskLl21-YjF >}}\n\n\n\n## Exercises\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-1_2e73120ec6ed776c4833390d0d61218b'}\n\n:::\n\n\nLet's have a look at the UMAP again. Although cells of different samples are shared amongst 'clusters', you can still see seperation within the clusters:\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-2_79eb911a4b7970c3b1eb125b14ad124a'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-2_integration_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\nTo perform the integration, we split our object by sample, resulting into a set of layers within the `RNA` assay.
The layers are integrated and stored in the reduction slot - in our case we call it `integrated.cca`. Then, we re-join the layers\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-3_1e63f55eac8216c49406128ede54eb12'}\n\n```{.r .cell-code}\nseu[[\"RNA\"]] <- split(seu[[\"RNA\"]], f = seu$orig.ident)\n\nseu <- Seurat::IntegrateLayers(object = seu, method = CCAIntegration,\n orig.reduction = \"pca\",\n new.reduction = \"integrated.cca\",\n verbose = FALSE)\n\n# re-join layers after integration\nseu[[\"RNA\"]] <- JoinLayers(seu[[\"RNA\"]])\n```\n:::\n\n\n\nWe can then use this new integrated matrix for clustering and visualization. Now, we can re-run and visualize the results with UMAP.\n\n::: {.callout-important}\n## Exercise\nCreate the UMAP again on the `integrated.cca` reduction (using the function `RunUMAP` - set the option `reduction` accordingly). After that, generate the UMAP plot. Did the integration perform well?\n::: \n\n::: {.callout-tip collapse=\"true\"}\n## Answer\n\nPerforming the scaling, PCA and UMAP:\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-4_0482443b43a57d2cfa527d2cdb471376'}\n\n```{.r .cell-code}\nseu <- RunUMAP(seu, dims = 1:30, reduction = \"integrated.cca\")\n```\n:::\n\n\nPlotting the UMAP:\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-5_f7f85ce27c690dd8dd267e206a65ebc8'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-2_integration_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n::: \n\n\n### Save the dataset and clear environment\n\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-6_120af022df6fc12be96284075d624134'}\n\n```{.r .cell-code}\nsaveRDS(seu, \"seu_day2-2.rds\")\n```\n:::\n\n\nClear your environment:\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-7_e2ee4d1c633d3765009a37395acfae7d'}\n\n```{.r .cell-code}\nrm(list = ls())\ngc()\n.rs.restartR()\n```\n:::\n",
+ "markdown": "---\ntitle: \"Integration\"\n---\n\n\n## Material\n\n\n{{< downloadthis ../assets/pdf/Integration_SIB.pdf dname=\"Integration_SIB\" label=\"Download the presentation\" icon=\"filetype-pdf\" >}}\n\n\n{{< video https://youtu.be/2TsW5A53hTg?si=Bas__YskLl21-YjF >}}\n\n\n\n## Exercises\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-1_2e73120ec6ed776c4833390d0d61218b'}\n\n:::\n\n\nLet's have a look at the UMAP again. Although cells of different samples are shared amongst 'clusters', you can still see seperation within the clusters:\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-2_79eb911a4b7970c3b1eb125b14ad124a'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-2_integration_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\nTo perform the integration, we split our object by sample, resulting into a set of layers within the `RNA` assay. The layers are integrated and stored in the reduction slot - in our case we call it `integrated.cca`. Then, we re-join the layers\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-3_1e63f55eac8216c49406128ede54eb12'}\n\n```{.r .cell-code}\nseu[[\"RNA\"]] <- split(seu[[\"RNA\"]], f = seu$orig.ident)\n\nseu <- Seurat::IntegrateLayers(object = seu, method = CCAIntegration,\n orig.reduction = \"pca\",\n new.reduction = \"integrated.cca\",\n verbose = FALSE)\n\n# re-join layers after integration\nseu[[\"RNA\"]] <- JoinLayers(seu[[\"RNA\"]])\n```\n:::\n\n\n\nWe can then use this new integrated matrix for clustering and visualization. Now, we can re-run and visualize the results with UMAP.\n\n::: {.callout-important}\n## Exercise\nCreate the UMAP again on the `integrated.cca` reduction (using the function `RunUMAP` - set the option `reduction` accordingly). After that, generate the UMAP plot.
Did the integration perform well?\n::: \n\n::: {.callout-tip collapse=\"true\"}\n## Answer\n\nPerforming the scaling, PCA and UMAP:\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-4_0482443b43a57d2cfa527d2cdb471376'}\n\n```{.r .cell-code}\nseu <- RunUMAP(seu, dims = 1:30, reduction = \"integrated.cca\")\n```\n:::\n\n\nPlotting the UMAP:\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-5_f7f85ce27c690dd8dd267e206a65ebc8'}\n\n```{.r .cell-code}\nSeurat::DimPlot(seu, reduction = \"umap\")\n```\n\n::: {.cell-output-display}\n![](day2-2_integration_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n::: \n\n\n### Save the dataset and clear environment\n\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-6_120af022df6fc12be96284075d624134'}\n\n```{.r .cell-code}\nsaveRDS(seu, \"seu_day2-2.rds\")\n```\n:::\n\n\nClear your environment:\n\n\n::: {.cell hash='day2-2_integration_cache/html/unnamed-chunk-7_e2ee4d1c633d3765009a37395acfae7d'}\n\n```{.r .cell-code}\nrm(list = ls())\ngc()\n.rs.restartR()\n```\n:::\n",
 "supporting": [],
 "filters": [
 "rmarkdown/pagebreak.lua"
diff --git a/assets/pdf/Dimensionality_Reduction_SIB.pdf b/assets/pdf/Dimensionality_Reduction_SIB.pdf
new file mode 100644
index 0000000..8df410b
Binary files /dev/null and b/assets/pdf/Dimensionality_Reduction_SIB.pdf differ
diff --git a/assets/pdf/Integration_SIB.pdf b/assets/pdf/Integration_SIB.pdf
new file mode 100644
index 0000000..80c8fdf
Binary files /dev/null and b/assets/pdf/Integration_SIB.pdf differ
diff --git a/day2/day2-1_dimensionality_reduction.qmd b/day2/day2-1_dimensionality_reduction.qmd
index 040dbc7..8c7d197 100644
--- a/day2/day2-1_dimensionality_reduction.qmd
+++ b/day2/day2-1_dimensionality_reduction.qmd
@@ -5,7 +5,7 @@ title: "Dimensionality reduction"
 
 ## Material
 
-{{< downloadthis ../assets/pdf/scRNAseq_RM_Integration_dimreduction.pdf dname="scRNAseq_RM_Integration_dimreduction" label="Download the presentation" icon="filetype-pdf" >}}
+{{< downloadthis ../assets/pdf/Dimensionality_Reduction_SIB.pdf dname="Dimensionality_Reduction_SIB" label="Download the presentation" icon="filetype-pdf" >}}
 
 {{< video https://youtu.be/QS6ZH_41b4s?si=fvaNGkqjsIcxTEFn >}}
 
diff --git a/day2/day2-2_integration.qmd b/day2/day2-2_integration.qmd
index aa2ba1d..39c1cd3 100644
--- a/day2/day2-2_integration.qmd
+++ b/day2/day2-2_integration.qmd
@@ -4,6 +4,8 @@ title: "Integration"
 
 ## Material
 
+{{< downloadthis ../assets/pdf/Integration_SIB.pdf dname="Integration_SIB" label="Download the presentation" icon="filetype-pdf" >}}
+
 {{< video https://youtu.be/2TsW5A53hTg?si=Bas__YskLl21-YjF >}}
 
 ## Exercises