Add workflow test and history for SC cell cycle regression tutorial #5640

Open: wants to merge 2 commits into base: main
Expand Up @@ -15,15 +15,15 @@ items:
- name: 'DOI: 10.5281/zenodo.7311628'
description: latest
items:
- url: https://zenodo.org/api/files/24d90230-31bf-4cc9-b1d3-b760de965c72/g2mPhase.tabular
- url: https://zenodo.org/record/7311628//files/g2mPhase.tabular
src: url
ext: tabular
info: https://doi.org/10.5281/zenodo.7311628
- url: https://zenodo.org/api/files/24d90230-31bf-4cc9-b1d3-b760de965c72/Processed_AnnData.h5ad
- url: https://zenodo.org/record/7311628/files/Processed_AnnData.h5ad
src: url
ext: h5ad
info: https://doi.org/10.5281/zenodo.7311628
- url: https://zenodo.org/api/files/24d90230-31bf-4cc9-b1d3-b760de965c72/sPhase.tabular
- url: https://zenodo.org/record/7311628//files/sPhase.tabular
src: url
ext: tabular
info: https://doi.org/10.5281/zenodo.7311628
28 changes: 15 additions & 13 deletions topics/single-cell/tutorials/scrna-case_cell-cycle/tutorial.md
Expand Up @@ -3,6 +3,10 @@ layout: tutorial_hands_on

title: Removing the effects of the cell cycle
zenodo_link: https://zenodo.org/record/7311628/
answer_histories:
- label: "UseGalaxy.eu"
history: https://singlecell.usegalaxy.eu/u/videmp/h/cell-cycle-regression-workflow
date: 2024-12-13
subtopic: tricks
priority: 2
questions:
Expand Down Expand Up @@ -33,12 +37,10 @@ contributions:
- nomadscientist
testing:
- hrukkudyr

- pavanvidem

---



Single-cell RNA sequencing can be sensitive to both biological and technical variation, which is why preparing your data carefully is an important part of the analysis. You want the results to reflect the interesting differences in expression between cells that relate to their type or state. Other sources of variation can conceal or confound this, making it harder for you to see what is going on.

One common biological confounder is the cell cycle ({% cite Luecken2019 %}). Cells express different genes during different parts of the cell cycle, depending on whether they are in their growing phase (G1), duplicating their DNA (the S or Synthesis phase), or dividing in two (G2/M or Mitosis phase). If these cell cycle genes have a big impact on your data, you could end up with separate clusters that actually represent cells of the same type at different stages of the cycle.
Expand Down Expand Up @@ -243,13 +245,13 @@ Next, we'll need a list of all the genes in our dataset, so that we can mark the
> 3. {% tool [Add column](toolshed.g2.bx.psu.edu/repos/devteam/add_value/addValue/1.0.0) %} with the following parameters:
> - {% icon param-file %} *"to Dataset"*: `table` (output of **Table Compute** {% icon tool %})
> - *"Iterate?"*: `YES`
>
>
>
> > <comment-title>Keeping the genes in order</comment-title>
> >
> > Adding these numbers will enable us to keep the genes in their original order. This is essential for adding the cell cycle gene annotation back into the AnnData dataset.
> {: .comment}
>
>
>
> 4. Rename the output `Dataset_Genes`
{: .hands_on}
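
The ordering trick described in the comment box above can be sketched outside Galaxy. This is a minimal Python illustration, not the tool's implementation: it assumes the **Add column** tool appends a counter that starts at 1 (the start value is not shown in this hunk), the helper names are hypothetical, and the gene names are made up.

```python
def add_iterating_column(rows, start=1):
    """Append a running counter to each row (assumed behaviour of 'Iterate? YES')."""
    return [row + [index] for index, row in enumerate(rows, start=start)]


def restore_order(rows):
    """Sort rows by the counter in the last column, then drop the counter."""
    return [row[:-1] for row in sorted(rows, key=lambda row: row[-1])]


# Hypothetical gene names, for illustration only.
genes = [["GeneC"], ["GeneA"], ["GeneB"]]
numbered = add_iterating_column(genes)

# A later step (for example a join) may reorder rows; simulated here by sorting by name.
shuffled = sorted(numbered, key=lambda row: row[0])

# The counter lets us recover the original gene order.
print(restore_order(shuffled) == genes)
```

Carrying an explicit index column through the intermediate steps means the final annotation table can be realigned with the AnnData variables even if a tool reorders the rows along the way.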
Expand Down Expand Up @@ -328,9 +330,9 @@ We now have a table with all the gene names in the same order as the main datase
> ```
> CC_genes
> ```
>
>
> {% snippet faqs/galaxy/datasets_create_new_file.md format="tabular" %}
>
>
>
> 3. {% tool [Concatenate datasets](cat1) %} with the following parameters:
> - {% icon param-file %} *"Concatenate Dataset"*: `Pasted Entry` dataset
Expand Down Expand Up @@ -360,7 +362,7 @@ We will need to add the annotation to both the annotated dataset `CellCycle_Anno
> - {% icon param-file %} *"Annotated data matrix"*: `CellCycle_Regressed` (output of **Scanpy RegressOut** {% icon tool %})
> - *"Function to manipulate the object"*: `Add new annotation(s) for observations or variables`
> - {% icon param-file %} *"Table with new annotations"*: `out_file1` (output of **Concatenate datasets** {% icon tool %})
>
>
>
> 4. Rename the output `CellCycle_Regressed_CC`
>
Expand All @@ -378,7 +380,7 @@ To demonstrate the power of cell cycle regression, we're going to reduce our exp
> - *"Type of filtering?"*: `By key (column) values`
> - *"Key to filter"*: `CC_genes`
> - *"Type of value to filter"*: `Boolean`
>
>
>
> 2. Rename the output `CellCycle_Annotated_CC_Only`
>
Expand All @@ -389,7 +391,7 @@ To demonstrate the power of cell cycle regression, we're going to reduce our exp
> - *"Type of filtering?"*: `By key (column) values`
> - *"Key to filter"*: `CC_genes`
> - *"Type of value to filter"*: `Boolean`
>
>
>
> 4. Rename the output `CellCycle_Regressed_CC_Only`
>
Expand All @@ -407,11 +409,11 @@ You will learn more about plotting your data in the [Filter, Plot and Explore]({
> - {% icon param-file %} *"Annotated data matrix"*: `CellCycle_Annotated_CC_Only` (output of **Manipulate AnnData** {% icon tool %})
> - *"Method used"*: `Computes PCA (principal component analysis) coordinates, loadings and variance decomposition, using 'tl.pca'`
> - *"Type of PCA?"*: `Full PCA`
>
>
> > <comment-title>Plot all the genes </comment-title>
> >
> > Make sure that you de-select the option for the {% tool Cluster, infer trajectories and embed %} tool to use highly variable genes only - some of the cell cycle genes are also HVGs, but we want our plots to include the cell cycle genes that aren't HVGs too.
> {: .comment}
> > Make sure that you de-select the option for the {% tool Cluster, infer trajectories and embed %} tool to use highly variable genes only - some of the cell cycle genes are also HVGs, but we want our plots to include the cell cycle genes that aren't HVGs too.
> {: .comment}
>
> 2. {% tool [Plot](toolshed.g2.bx.psu.edu/repos/iuc/scanpy_plot/scanpy_plot/1.7.1+galaxy1) %} with the following parameters:
> - {% icon param-file %} *"Annotated data matrix"*: `anndata_out` (output of **Cluster, infer trajectories and embed** {% icon tool %})
@@ -0,0 +1,31 @@
- doc: Test outline for Cell-Cycle-Regression-Workflow
  job:
    AnnData (After QC, normalisation, scaling):
      class: File
      location: https://zenodo.org/record/7311628/files/Processed_AnnData.h5ad
      filetype: h5ad
    S Phase Genes:
      class: File
      location: https://zenodo.org/record/7311628//files/sPhase.tabular
      filetype: tabular
    G2M Phase Genes:
      class: File
      location: https://zenodo.org/record/7311628//files/g2mPhase.tabular
      filetype: tabular
    Pasted Entry:
      class: File
      location: test-data/annotation_header.tabular
      filetype: tabular
  outputs:
    anndata_out (Step 20):
      asserts:
        has_h5_keys:
          keys: "obs/G2M_score"
          keys: "obs/S_score"
          keys: "var/CC_genes"
          keys: "uns/pca"
    out_png (Step 22):
      asserts:
        has_size:
          value: 133331
          delta: 2700
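
The `has_size` assertion above accepts an output whose byte size lies within `delta` of `value`. A minimal Python sketch of that kind of check follows. The helper name `size_within_delta` is hypothetical and is not part of Galaxy or Planemo, and treating the bounds as inclusive is an assumption.

```python
import os
import tempfile


def size_within_delta(path, value, delta):
    """Return True if the file size in bytes is within `delta` of `value`.

    Hypothetical helper for illustration; inclusive bounds are an assumption.
    """
    return abs(os.path.getsize(path) - value) <= delta


# Values taken from the out_png assertion in the test file.
EXPECTED_SIZE = 133331
ALLOWED_DELTA = 2700

if __name__ == "__main__":
    # Write a dummy file of 134000 bytes to stand in for the plot output.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"\0" * 134000)
    try:
        print(size_within_delta(handle.name, EXPECTED_SIZE, ALLOWED_DELTA))
    finally:
        os.remove(handle.name)
```

A size tolerance like this suits PNG outputs, whose exact byte count can vary slightly between plotting library versions even when the plot itself is unchanged.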