All cells from simulated matrix are assigned to one group with probability 1 #96

Open
agalvezm opened this issue Oct 7, 2022 · 0 comments

agalvezm commented Oct 7, 2022

Hello,

Thanks so much for the very useful tool! I am having some issues using the scvi implementation of CellAssign and I wanted to share my results with you.

I am trying to reproduce the benchmarking of CellAssign using data simulated with the adapted version of Splatter that you mention in your paper. I used the following parameters for the simulation:

de.facLoc = 0.1
de.facScale = 0.1
ct_prob = even across all groups (1 / number groups)
de_prob = 0.1
Number of genes: 10,000

I performed simulations for all possible combinations of the following number of cells and groups:

Number of cells: 1000, 2000, 4000, 8000 and 10,000
Number of groups: 2, 4, 6, 8
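
For reference, the grid of simulations can be enumerated as below. This is only a sketch of the bookkeeping: the dictionary keys are illustrative names and are not Splatter arguments.

```python
from itertools import product

n_cells_options = [1000, 2000, 4000, 8000, 10000]
n_groups_options = [2, 4, 6, 8]

# Hypothetical parameter grid: one entry per simulated matrix.
# "ct_prob" is even across groups, i.e. 1 / number of groups for each group.
simulation_grid = [
    {
        "n_cells": n_cells,
        "n_groups": n_groups,
        "ct_prob": [1.0 / n_groups] * n_groups,
    }
    for n_cells, n_groups in product(n_cells_options, n_groups_options)
]

print(len(simulation_grid))  # 5 cell counts x 4 group counts
```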

I followed one of the methods you describe in your paper to select marker genes, namely:
Markers for CellAssign were selected from genes in the top 20th percentile in terms of log fold change among differentially upregulated genes and the top 10th percentile in terms of expression.

The gene marker matrix therefore contains roughly 20 marker genes per group.
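
A minimal sketch of how the selection rule quoted above could be read, for one group. The function name `select_markers` and its arguments are hypothetical, and two details are assumptions rather than something the paper text settles: "upregulated" is taken to mean a positive log fold change, and the expression percentile is computed over all genes rather than only the upregulated ones.

```python
import numpy as np

def select_markers(log_fc, mean_expr, lfc_top=0.20, expr_top=0.10):
    """Return indices of candidate marker genes for one group.

    log_fc    : per-gene log fold change of this group versus the rest
    mean_expr : per-gene mean expression
    """
    log_fc = np.asarray(log_fc, dtype=float)
    mean_expr = np.asarray(mean_expr, dtype=float)

    # Assumption: "differentially upregulated" means log fold change > 0.
    up = log_fc > 0
    if not up.any():
        return np.array([], dtype=int)

    # Top 20% of log fold change, measured among upregulated genes only.
    lfc_cut = np.quantile(log_fc[up], 1.0 - lfc_top)
    # Assumption: top 10% of expression is measured across all genes.
    expr_cut = np.quantile(mean_expr, 1.0 - expr_top)

    keep = up & (log_fc >= lfc_cut) & (mean_expr >= expr_cut)
    return np.flatnonzero(keep)
```

Running this once per group and stacking the results into a binary genes-by-groups table would give a marker matrix of the kind described here.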

When I run CellAssign on any of the 20 simulated matrices, I always get the same result: all cells are assigned to one of the groups with a probability of 1. I am linking a Google Colab notebook that:

  1. Downloads the matrix with 1000 cells and 2 groups, along with the gene marker matrix
  2. Runs CellAssign using the same commands that you show in your tutorial
  3. Inspects the simulated matrix to confirm that its format and content look normal.

The inspections in step 3 include:

  • Plotting the first 2 principal components and labelling by the ground truth group (to confirm there indeed exist two different "cell types" in the data)
  • Running a very simple Gaussian Mixture model as a naive way of assigning cell types (we get cells assigned to both groups and not to only one)
  • Plotting the mean-variance relationship of the simulated matrix.
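
A rough sketch of two of these checks using only NumPy, assuming the matrix is laid out as cells by genes. The helper names `top_pcs` and `mean_variance` are hypothetical, the log1p transform before PCA is one common choice and not necessarily what the notebook does, and the toy Poisson matrix is a stand-in, not the simulated data. The Gaussian mixture step is not reproduced here.

```python
import numpy as np

def top_pcs(counts, n_components=2):
    """Project cells onto the leading principal components of log1p counts."""
    x = np.log1p(np.asarray(counts, dtype=float))
    x = x - x.mean(axis=0)  # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def mean_variance(counts):
    """Per-gene mean and sample variance of the raw counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.mean(axis=0), counts.var(axis=0, ddof=1)

# Toy stand-in for a simulated matrix (100 cells x 50 genes).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=5.0, size=(100, 50))

pcs = top_pcs(toy_counts)                      # shape (100, 2), ready to scatter-plot
gene_mean, gene_var = mean_variance(toy_counts)
```

Colouring a scatter plot of `pcs` by the ground truth group, and plotting `gene_var` against `gene_mean` on log axes, corresponds to the first and third checks in the list.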

We have tried a number of things to fix the problem. These include:

  • Adding/removing genes from the gene marker matrix
  • Using random genes as gene markers
  • Playing with the number of epochs

Nothing seems to modify the behaviour.
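
To make the behaviour easy to compare across runs, the predicted probabilities can be summarised as in the sketch below. It assumes the predictions are available as a cells-by-groups array of probabilities; `summarize_assignments` and the 0.99 threshold are illustrative choices, not part of CellAssign.

```python
import numpy as np

def summarize_assignments(probs, threshold=0.99):
    """Summarise a cells x groups matrix of assignment probabilities."""
    probs = np.asarray(probs, dtype=float)
    labels = probs.argmax(axis=1)  # most probable group per cell
    counts = np.bincount(labels, minlength=probs.shape[1])
    return {
        "cells_per_group": counts,
        # Share of cells whose top probability is at or above the threshold.
        "fraction_confident": float((probs.max(axis=1) >= threshold).mean()),
        # True when every cell lands in the same group.
        "collapsed": bool((counts > 0).sum() == 1),
    }

# Example mimicking the reported result: every cell in group 0 with probability 1.
collapsed_example = summarize_assignments([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
```

For the result reported in this issue, `collapsed` would be true and `fraction_confident` would be 1.0 for every one of the simulated matrices.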

Any help on this would be greatly appreciated. Thanks so much!!
