All cells from simulated matrix are assigned to one group with probability 1 #96

agalvezm · 2022-10-07T17:37:37Z

Hello,

Thanks so much for the very useful tool! I am having some issues using the scvi implementation of CellAssign and I wanted to share my results with you.

I am trying to reproduce the benchmarking of CellAssign using data simulated by the adapted version of Splatter that you mention in your paper. I used the following parameters for the simulation:

de.facLoc = 0.1
de.facScale = 0.1
ct_prob = even across all groups (1 / number groups)
de_prob = 0.1
Number of genes: 10,000

I performed simulations for all possible combinations of the following number of cells and groups:

Number of cells: 1000, 2000, 4000, 8000 and 10,000
Number of groups: 2, 4, 6, 8
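For reference, the simulation grid described above can be sketched as follows. The values come from the parameters listed in this post; the dictionary keys and the `build_grid` helper are illustrative names only, not Splatter arguments.

```python
from itertools import product

# Values taken from the description above; key names are illustrative.
N_CELLS = [1000, 2000, 4000, 8000, 10000]
N_GROUPS = [2, 4, 6, 8]


def build_grid(n_cells=N_CELLS, n_groups=N_GROUPS):
    """Return one parameter dict per (cells, groups) combination."""
    grid = []
    for cells, groups in product(n_cells, n_groups):
        grid.append(
            {
                "n_cells": cells,
                "n_groups": groups,
                "n_genes": 10000,
                "de_prob": 0.1,
                "de_fac_loc": 0.1,
                "de_fac_scale": 0.1,
                # Even cell-type probabilities: 1 / number of groups.
                "ct_prob": [1.0 / groups] * groups,
            }
        )
    return grid


grid = build_grid()
print(len(grid))  # 5 cell counts x 4 group counts = 20
```

This gives the 20 simulated matrices referred to below.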

I followed one of the methods you describe in your paper to select marker genes, namely:
Markers for CellAssign were selected from genes in the top 20th percentile in terms of log fold change among differentially upregulated genes and the top 10th percentile in terms of expression.

The gene marker matrix therefore contains roughly 20 marker genes per group.
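A minimal sketch of how I read that selection rule, using NumPy only. It assumes a per-group log fold change matrix and a per-gene mean expression vector are already available, and that the expression percentile is taken over all genes (the paper's wording leaves this open). `select_markers` and its arguments are hypothetical names, and the toy data is random, not my simulated matrix.

```python
import numpy as np


def select_markers(log_fc, mean_expr, lfc_pct=80, expr_pct=90):
    """Return a boolean genes x groups marker matrix.

    log_fc:    (n_genes, n_groups) log fold change of each group vs. the rest.
    mean_expr: (n_genes,) mean expression per gene.
    """
    log_fc = np.asarray(log_fc, dtype=float)
    mean_expr = np.asarray(mean_expr, dtype=float)

    # Top 10th percentile in expression (assumption: computed over all genes).
    high_expr = mean_expr >= np.percentile(mean_expr, expr_pct)

    markers = np.zeros(log_fc.shape, dtype=bool)
    for g in range(log_fc.shape[1]):
        up = log_fc[:, g] > 0  # upregulated genes for this group
        if not up.any():
            continue
        # Top 20th percentile of log fold change among upregulated genes.
        cutoff = np.percentile(log_fc[up, g], lfc_pct)
        markers[:, g] = up & (log_fc[:, g] >= cutoff) & high_expr
    return markers


# Toy input, only to show the shapes involved.
rng = np.random.default_rng(0)
log_fc = rng.normal(0.0, 1.0, size=(1000, 2))
mean_expr = rng.gamma(2.0, 1.0, size=1000)
markers = select_markers(log_fc, mean_expr)
print(markers.sum(axis=0))  # number of markers per group
```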

When I run CellAssign on any of the 20 simulated matrices, I always get the same result: all cells are assigned to one of the groups with a probability of 1. I am linking a Google Colab notebook that:

1. Downloads the matrix with 1000 cells and 2 groups, and the gene marker matrix
2. Runs CellAssign using the same commands that you show in your tutorial
3. Inspects the simulated matrix to confirm that its format and content are normal

The inspections in step 3 include:

Plotting the first 2 principal components and labelling by the ground truth group (to confirm there indeed exist two different "cell types" in the data)
Running a very simple Gaussian Mixture model as a naive way of assigning cell types (we get cells assigned to both groups and not to only one)
Plotting the mean-variance relationship of the simulated matrix.
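Two of these checks can be approximated with NumPy alone, as in the sketch below. This is not the notebook's code: the normalisation (library-size scaling followed by `log1p`) is an assumption, the helper names are hypothetical, and the toy Poisson counts stand in for the simulated matrix.

```python
import numpy as np


def mean_variance(counts):
    """Per-gene mean and sample variance of a cells x genes count matrix."""
    counts = np.asarray(counts, dtype=float)
    return counts.mean(axis=0), counts.var(axis=0, ddof=1)


def first_two_pcs(counts):
    """First two principal components of log-normalised counts (via SVD)."""
    counts = np.asarray(counts, dtype=float)
    # Library-size normalisation; guard against empty cells.
    lib = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    norm = np.log1p(counts / lib * np.median(lib))
    centered = norm - norm.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2]


# Toy data: 200 cells x 50 genes, two groups differing in the first 10 genes.
rng = np.random.default_rng(0)
rates = np.full((200, 50), 5.0)
rates[100:, :10] *= 4.0
counts = rng.poisson(rates)
labels = np.repeat([0, 1], 100)

pcs = first_two_pcs(counts)
gene_mean, gene_var = mean_variance(counts)

# If the groups are real, their PC1 means should be clearly separated.
print(pcs[labels == 0, 0].mean(), pcs[labels == 1, 0].mean())
```

Plotting `pcs` coloured by `labels`, and `gene_var` against `gene_mean`, gives the first and third inspections listed above.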

We have tried a number of things to fix the problem, including:

Adding/removing genes from the gene marker matrix
Using random genes as gene markers
Playing with the number of epochs

Nothing seems to modify the behaviour.

Any help on this would be greatly appreciated. Thanks so much!!
