Skip to content

Commit

Permalink
Add DE set comparisons (#489)
Browse files Browse the repository at this point in the history
* Add DE set comparisons

Signed-off-by: zethson <lukas.heumos@posteo.net>

* Add docs

Signed-off-by: zethson <lukas.heumos@posteo.net>

---------

Signed-off-by: zethson <lukas.heumos@posteo.net>
  • Loading branch information
Zethson authored Jan 6, 2024
1 parent 83c0350 commit ec4fb3a
Show file tree
Hide file tree
Showing 3 changed files with 145 additions and 49 deletions.
62 changes: 14 additions & 48 deletions docs/usage/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,10 +80,6 @@ Simple functions for:

Assigning guides based on thresholds. Each cell is assigned to the most expressed gRNA if it has at least the specified number of counts.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: preprocessing
Expand Down Expand Up @@ -111,6 +107,20 @@ ga.plot_heatmap(gdo, layer="assigned_guides")

## Tools

### Differential gene expression

Differential gene expression involves the quantitative comparison of gene expression levels between two or more groups,
such as different cell types, tissues, or conditions to discern genes that are significantly up- or downregulated in response to specific biological contexts or stimuli.
Pertpy provides utilities to conduct differential gene expression tests through a common interface that supports complex designs and methods.

```{eval-rst}
.. autosummary::
:toctree: preprocessing
:nosignatures:
tools.DifferentialGeneExpression
```

### Pooled CRISPR screens

#### Mixscape
Expand All @@ -122,10 +132,6 @@ Mixscape first tries to remove confounding sources of variation such as cell cyc
Next, it determines which targeted cells were affected by the genetic perturbation (=KO) and which targeted cells were not (=NP) with the use of mixture models.
Finally, it visualizes similarities and differences across different perturbations.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand Down Expand Up @@ -155,10 +161,6 @@ See [mixscape tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/notebo
A Python implementation of Milo for differential abundance testing on KNN graphs, to ease interoperability with scverse pipelines for single-cell analysis.
See [Differential abundance testing on single-cell data using k-nearest neighbor graphs](https://www.nature.com/articles/s41587-021-01033-z) for details on the statistical framework.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand Down Expand Up @@ -195,10 +197,6 @@ milo.da_nhoods(mdata, design="~Status")
Reimplementation of scCODA for identification of compositional changes in high-throughput sequencing count data and tascCODA for sparse, tree-aggregated modeling of high-throughput sequencing data.
See [scCODA is a Bayesian model for compositional single-cell data analysis](https://www.nature.com/articles/s41467-021-27150-6) for statistical methodology and benchmarking performance of scCODA and [tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data](https://www.frontiersin.org/articles/10.3389/fgene.2021.766405/full) for statistical methodology and benchmarking performance of tascCODA.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand Down Expand Up @@ -246,10 +244,6 @@ sccoda.plot_effects_barplot(
A **work in progress (!)** Python implementation of DIALOGUE for the discovery of multicellular programs.
See [DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data](https://www.nature.com/articles/s41587-022-01288-0) for more details on the methodology.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand Down Expand Up @@ -286,10 +280,6 @@ all_results, new_mcps = dl.multilevel_modeling(

#### Enrichment

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand All @@ -312,10 +302,6 @@ pt_enricher.score(adata)
General purpose functions for distances and permutation tests.
Reimplements functions from [scperturb](http://projects.sanderlab.org/scperturb/) package.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand Down Expand Up @@ -375,10 +361,6 @@ results["summary_metrics"]

See [augur tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/notebooks/augur.html) for a more elaborate tutorial.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand All @@ -392,10 +374,6 @@ See [augur tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/notebooks
Reimplementation of scGen for perturbation response prediction of scRNA-seq data in Jax.
See [scGen predicts single-cell perturbation responses](https://www.nature.com/articles/s41592-019-0494-8) for more details.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand Down Expand Up @@ -435,10 +413,6 @@ CINEMA-OT separates confounding sources of variation from perturbation effects t
These cell pairs represent causal perturbation responses permitting a number of novel analyses, such as individual treatment effect analysis, response clustering, attribution analysis, and synergy analysis.
See [Causal identification of single-cell experimental perturbation effects with CINEMA-OT](https://www.biorxiv.org/content/10.1101/2022.07.31.502173v3.abstract) for more details.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand Down Expand Up @@ -473,10 +447,6 @@ See [CINEMA-OT tutorial](https://pertpy.readthedocs.io/en/latest/tutorials/noteb

Various modules for calculating and evaluating perturbation spaces.

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: tools
Expand Down Expand Up @@ -533,10 +503,6 @@ Available databases for mechanism of action metadata:

- [CLUE](https://clue.io/)

```{eval-rst}
.. currentmodule:: pertpy
```

```{eval-rst}
.. autosummary::
:toctree: metadata
Expand Down
75 changes: 74 additions & 1 deletion pertpy/tools/_differential_gene_expression.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,10 @@
import decoupler as dc
import numpy as np
import numpy.typing as npt
import pandas as pd
from scipy.stats import kendalltau, pearsonr, spearmanr

if TYPE_CHECKING:
import pandas as pd
from anndata import AnnData


Expand Down Expand Up @@ -143,6 +144,78 @@ def filter_by_prop(self, adata: AnnData, min_prop: float = 0.2, min_samples: int

return filtered_adata

def calculate_correlation(
self,
de_res_1: pd.DataFrame,
de_res_2: pd.DataFrame,
method: Literal["spearman", "pearson", "kendall-tau"] = "spearman",
) -> pd.DataFrame:
"""Calculate the Spearman correlation coefficient for 'pvals_adj' and 'logfoldchanges' columns.
Args:
de_res_1: A DataFrame with DE result columns.
de_res_2: Another DataFrame with the same DE result columns.
method: The correlation method to apply. One of `spearman`, `pearson`, `kendall-tau`.
Defaults to `spearman`.
Returns:
A DataFrame with the Spearman correlation coefficients for 'pvals_adj' and 'logfoldchanges'.
"""
columns_of_interest = ["pvals_adj", "logfoldchanges"]
correlation_data = {}
for col in columns_of_interest:
match method:
case "spearman":
correlation, _ = spearmanr(de_res_1[col], de_res_2[col])
case "pearson":
correlation, _ = pearsonr(de_res_1[col], de_res_2[col])
case "kendall-tau":
correlation, _ = kendalltau(de_res_1[col], de_res_2[col])
case _:
raise ValueError("Unknown correlation method.")
correlation_data[col] = correlation

return pd.DataFrame([correlation_data], columns=columns_of_interest)

def calculate_jaccard_index(self, de_res_1: pd.DataFrame, de_res_2: pd.DataFrame, threshold: float = 0.05) -> float:
"""Calculate the Jaccard index for sets of significantly expressed genes/features based on a p-value threshold.
Args:
de_res_1: A DataFrame with DE result columns, including 'pvals'.
de_res_2: Another DataFrame with the same DE result columns.
threshold: A threshold for determining significant expression (default is 0.05).
Returns:
The Jaccard index.
"""
significant_set_1 = set(de_res_1[de_res_1["pvals"] <= threshold].index)
significant_set_2 = set(de_res_2[de_res_2["pvals"] <= threshold].index)

intersection = significant_set_1.intersection(significant_set_2)
union = significant_set_1.union(significant_set_2)

return len(intersection) / len(union) if union else 0

def calculate_cohens_d(self, de_res_1: pd.DataFrame, de_res_2: pd.DataFrame) -> pd.Series:
"""Calculate Cohen's D for the logfoldchanges.
Args:
de_res_1: A DataFrame with DE result columns, including 'logfoldchanges'.
de_res_2: Another DataFrame with the same DE result columns.
Returns:
A pandas Series containing Cohen's D for each gene/feature.
"""
means_1 = de_res_1["logfoldchanges"].mean()
means_2 = de_res_2["logfoldchanges"].mean()
sd_1 = de_res_1["logfoldchanges"].std()
sd_2 = de_res_2["logfoldchanges"].std()

pooled_sd = np.sqrt((sd_1**2 + sd_2**2) / 2)
cohens_d = (means_1 - means_2) / pooled_sd

return cohens_d

def de_analysis(
self,
adata: AnnData,
Expand Down
57 changes: 57 additions & 0 deletions tests/tools/test_differential_gene_expression.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
import pandas as pd
import pertpy as pt
import pytest


@pytest.fixture
def dummy_de_results():
data1 = {"pvals": [0.1, 0.2, 0.3, 0.4], "pvals_adj": [0.1, 0.25, 0.35, 0.45], "logfoldchanges": [1, 2, 3, 4]}
data2 = {"pvals": [0.1, 0.2, 0.3, 0.4], "pvals_adj": [0.15, 0.2, 0.35, 0.5], "logfoldchanges": [2, 3, 4, 5]}
de_res_1 = pd.DataFrame(data1)
de_res_2 = pd.DataFrame(data2)

return de_res_1, de_res_2


@pytest.fixture
def pt_de():
pt_de = pt.tl.DifferentialGeneExpression()
return pt_de


def test_calculate_spearman_correlation(dummy_de_results, pt_de):
de_res_1, de_res_2 = dummy_de_results

result = pt_de.calculate_correlation(de_res_1, de_res_2, method="spearman")
assert result.shape == (1, 2)
assert all(column in result for column in ["pvals_adj", "logfoldchanges"])


def test_calculate_pearson_correlation(dummy_de_results, pt_de):
de_res_1, de_res_2 = dummy_de_results

result = pt_de.calculate_correlation(de_res_1, de_res_2, method="pearson")
assert result.shape == (1, 2)
assert all(column in result for column in ["pvals_adj", "logfoldchanges"])


def test_calculate_kendall_tau__correlation(dummy_de_results, pt_de):
de_res_1, de_res_2 = dummy_de_results

result = pt_de.calculate_correlation(de_res_1, de_res_2, method="kendall-tau")
assert result.shape == (1, 2)
assert all(column in result for column in ["pvals_adj", "logfoldchanges"])


def test_jaccard_index(dummy_de_results, pt_de):
de_res_1, de_res_2 = dummy_de_results

jaccard_index = pt_de.calculate_jaccard_index(de_res_1, de_res_2)
assert 0 <= jaccard_index <= 1


def test_calculate_cohens_d(dummy_de_results, pt_de):
de_res_1, de_res_2 = dummy_de_results

cohens_d = pt_de.calculate_cohens_d(de_res_1, de_res_2)
assert isinstance(cohens_d, float)

0 comments on commit ec4fb3a

Please sign in to comment.