Ambiguous sample labeling for some souporcell clusters #234

Open
aranham opened this issue May 7, 2024 · 1 comment

@aranham

aranham commented May 7, 2024

Hi,

I’m working with scRNA-seq data in which three samples are pooled together, across 20 pools. I used souporcell for demultiplexing without initial whole-genome sequencing (WGS) SNP data, as it wasn’t available for all samples at the time. After the souporcell analysis I was able to demultiplex 17 pools by assigning labels from the WGS data to clusters. However, I ran into issues labeling samples in the remaining 3 pools. In each of these three pools, two souporcell clusters clearly match a single WGS sample each. The remaining souporcell cluster ambiguously matches parts of two WGS samples at levels greater than background noise but not reaching a clear match level. I suspect this ambiguous cluster might be high in heterotypic doublets.
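For readers following along, the matching step described above can be sketched roughly as follows. This is only an illustration, not souporcell's own procedure: the function name `label_clusters`, the concordance table layout, and both thresholds are assumptions, and the numbers in the usage example are made up. A cluster gets a sample label only when its best WGS match is both high and clearly ahead of the runner-up; otherwise it is flagged as ambiguous.

```python
def label_clusters(concordance, min_best=0.9, min_margin=0.2):
    """Assign each cluster to a WGS sample, or flag it as ambiguous.

    concordance: {cluster: {sample: fraction of genotype calls that agree}}
    min_best, min_margin: hypothetical thresholds; tune them to your data.
    """
    labels = {}
    for cluster, scores in concordance.items():
        # Rank candidate samples from best to worst agreement.
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best_sample, best = ranked[0]
        second = ranked[1][1] if len(ranked) > 1 else 0.0
        if best >= min_best and best - second >= min_margin:
            labels[cluster] = best_sample
        else:
            labels[cluster] = "ambiguous"
    return labels


# Illustrative values only: two clean clusters and one mixed cluster.
example = {
    "0": {"A": 0.98, "B": 0.51, "C": 0.50},
    "1": {"A": 0.52, "B": 0.97, "C": 0.49},
    "2": {"A": 0.50, "B": 0.70, "C": 0.72},
}
print(label_clusters(example))
```

Requiring a margin over the second-best sample, in addition to a high best score, is what separates a clean match from a cluster that partially matches two samples.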

To separate these ambiguous cells, I progressively increased the number of souporcell clusters from 3 to 7. This seemed to work for one of the pools, where all three WGS matches became distinct clusters with 7 clusters specified. Is increasing the number of clusters a valid approach for resolving ambiguous sample assignments, or are there potential pitfalls? We ruled out the possibility of closely related individuals based on analysis of the WGS SNP data.

Before rerunning the experiment, are there any other solutions or checks you recommend to improve sample labeling accuracy?

Thanks!

Best,
Michelle

@plijnzaad

Not entirely sure what your setup is, but if you concatenate the BAM files that you think contain identical genotypes (amongst potentially other genotypes), the genotype estimates that SoupOrCell makes tend to get better because they have more data to go on. This can be especially important when some genotypes are only present in small numbers in some of the libraries. Be sure to disambiguate the cell barcodes by their library names prior to the concatenation, though.
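As a rough sketch of the barcode disambiguation step, the filter below rewrites SAM text records so that each cell barcode carries its library name. It assumes the barcode is stored in the standard `CB:Z:` tag (as in Cell Ranger output) and that a `library_` prefix is an acceptable naming scheme; the function name `tag_barcode` and the prefix format are illustrative choices, not something prescribed by souporcell.

```python
import sys


def tag_barcode(sam_line: str, library: str) -> str:
    """Prefix the CB:Z: tag of one SAM record with the library name."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for i in range(11, len(fields)):
        if fields[i].startswith("CB:Z:"):
            fields[i] = f"CB:Z:{library}_{fields[i][5:]}"
            break
    return "\t".join(fields) + "\n"


# When run as a script with a library name, filter SAM text from stdin to stdout.
if __name__ == "__main__" and len(sys.argv) > 1:
    for line in sys.stdin:
        sys.stdout.write(tag_barcode(line, sys.argv[1]))
```

Saved under a hypothetical name such as `tag_barcodes.py`, it could sit between two samtools calls, for example `samtools view -h lib1.bam | python tag_barcodes.py lib1 | samtools view -b -o lib1.tagged.bam -`, before the tagged files are merged. The barcode list given to souporcell would need the same prefixes so that it still matches the rewritten tags.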
