possible to compute distances on a subset of genes? #640

aterceros · 2024-08-14T15:21:44Z

Description of feature

Hi!
Thank you for making this package available! I was wondering if it is possible to compute distances between groups of cells using only a subset of genes (for example, the genes differentially expressed between two groups).
Thanks in advance.

stefanpeidli · 2024-10-21T09:29:52Z

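Great suggestion! We usually calculate most distances in a lower-dimensional space (such as PCA) because distances in high-dimensional spaces become less informative: pairwise distances tend to concentrate and noise from many irrelevant dimensions dominates. Depending on how large your set of genes is, you can either: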

(If many genes, partially redundant) Compute a PCA on the gene subset only, then use that subset PCA to calculate distances. You can use the mask_var argument of scanpy.pp.pca for this; the result is stored in adata.obsm['X_pca'] by default, so pass obsm_key="X_pca" to the distance.
(If few genes) Calculate distances directly on the subset. In this case I would store the subset with adata.obsm['X_subset'] = adata[:, gene_subset].X.copy(), then specify pt.tl.Distance(metric="euclidean", obsm_key="X_subset").
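To make the two options concrete without depending on scanpy or pertpy, here is a minimal NumPy sketch on synthetic data. It assumes a dense cells × genes matrix, a hypothetical boolean `gene_mask` marking the gene subset, and a hypothetical `group` label array. PCA is done with a plain SVD rather than `scanpy.pp.pca`, and the distance is the Euclidean distance between group means, which is our assumption of what a pseudobulk-style `metric="euclidean"` computes.

```python
import numpy as np


def subset_pca(X, gene_mask, n_comps=10):
    """Option 1: PCA computed only on the selected genes (plain SVD, not scanpy)."""
    Xs = X[:, gene_mask]
    Xc = Xs - Xs.mean(axis=0)  # center each gene
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_comps, S.size)
    return U[:, :k] * S[:k]  # cell scores on the first k components


def mean_euclidean(A, B):
    """Euclidean distance between the mean profiles (pseudobulks) of two groups."""
    return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))


rng = np.random.default_rng(0)
n_cells, n_genes = 200, 500
X = rng.normal(size=(n_cells, n_genes))
group = np.array(["A"] * 100 + ["B"] * 100)
X[group == "B", :20] += 1.0  # genes 0-19 shifted in group B (synthetic "DEGs")

gene_mask = np.zeros(n_genes, dtype=bool)
gene_mask[:50] = True  # hypothetical gene subset: first 50 genes

# Option 1: distance in the PCA space of the subset
pcs = subset_pca(X, gene_mask, n_comps=10)
d_pca = mean_euclidean(pcs[group == "A"], pcs[group == "B"])

# Option 2: distance directly on the subset's expression values
X_subset = X[:, gene_mask]
d_direct = mean_euclidean(X_subset[group == "A"], X_subset[group == "B"])

print(f"subset PCA: {d_pca:.2f}, direct subset: {d_direct:.2f}")
```

Because the PC scores are an orthonormal projection of the centered subset, the mean-difference distance in PCA space can never exceed the direct one. The PCA option trades a little signal for robustness against many redundant, correlated genes, which matches the rule of thumb above.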

@Zethson Since our distance function is already flexible enough to handle this case (by specifying a different key in .obsm), I don't think we need to implement this feature directly. We could add a small example to the docs, though, because this approach is quite useful for analysis.

aterceros · 2024-10-30T15:07:19Z

Thank you for the comment! I'll try the second option!

aterceros · 2024-12-11T00:16:21Z

Hi!
Thank you for the suggestion above. I tried the second option and it seems to work well. However, when I run the bootstrap option, I get very large variances (between 120 and 160) for some comparisons only. Would you expect variance values this large?

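What I ran:
import pertpy as pt
# Store the expression of the selected genes (geneset) in .obsm
adata.obsm['X_subset'] = adata[:, geneset].X.copy()
distance = pt.tl.Distance(metric="wasserstein", obsm_key="X_subset")
# Split cells by condition, then bootstrap the distance between the two groups
X = adata.obsm["X_subset"][adata.obs["condition"] == "A"]
Y = adata.obsm["X_subset"][adata.obs["condition"] == "B"]
D = distance.bootstrap(X, Y)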

My gene subsets contain about 100 genes (DEGs).

Thank you!
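One way to judge whether a bootstrap variance of 120 to 160 is "large" is to put it on the same scale as the distance itself: variance is in squared distance units, so 120 to 160 corresponds to a standard deviation of roughly 11 to 12.6. The sketch below is a generic cell-level bootstrap (resample cells with replacement within each group, recompute the distance), which we assume approximates what `Distance.bootstrap` does. It uses the Euclidean distance between group means as a stand-in metric, not pertpy's optimal-transport Wasserstein distance, and `bootstrap_distance` is a hypothetical helper.

```python
import numpy as np


def mean_euclidean(A, B):
    """Stand-in metric: Euclidean distance between group mean profiles."""
    return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))


def bootstrap_distance(X, Y, metric, n_bootstrap=100, seed=0):
    """Resample cells with replacement in each group; return mean, variance, CV."""
    rng = np.random.default_rng(seed)
    values = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        xi = rng.integers(0, X.shape[0], size=X.shape[0])
        yi = rng.integers(0, Y.shape[0], size=Y.shape[0])
        values[i] = metric(X[xi], Y[yi])
    mean = values.mean()
    var = values.var(ddof=1)
    cv = np.sqrt(var) / mean  # spread relative to the size of the distance
    return mean, var, cv


rng = np.random.default_rng(1)
# Synthetic data: 100 genes, group Y shifted by 2 on every gene
X = rng.normal(size=(300, 100))
Y = rng.normal(loc=2.0, size=(300, 100))

mean, var, cv = bootstrap_distance(X, Y, mean_euclidean)
print(f"mean={mean:.2f} var={var:.4f} sd={np.sqrt(var):.3f} cv={cv:.3f}")
```

The coefficient of variation is the useful number here: the same variance is negligible next to a mean distance in the thousands but substantial next to a mean distance of 20. Distances computed on unscaled expression values also grow with the magnitude of the selected genes, so subsets containing highly expressed genes can produce large absolute variances.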

aterceros added the enhancement ("New feature or request") label on Aug 14, 2024

Zethson self-assigned this Aug 14, 2024
