Souporcell running w/ aneuploidy sample #223

Open
hypaik opened this issue Mar 13, 2024 · 6 comments


@hypaik

hypaik commented Mar 13, 2024

Dear Heaton

Thank you for developing this nice tool. I've really enjoyed analyzing my samples with Souporcell.
However, I recently got a very unusual sample.
It consists of very early-stage embryo cells with an autosomal aneuploidy. When I tried to run Souporcell on it (it is a mixed sample of fetal and maternal cells), Souporcell emitted errors and produced no clustering result at all.
Based on the published souporcell paper (Nat. Methods, 2020), I guess this is an issue with the diploidy assumption. If you can share your bioinformatic insight on this issue, please let me know. Thank you.

@wheaton5
Owner

Can you give more info on the error? I don't think the diploid assumption should make any difference until the last step, the estimation of ambient RNA, which comes after clustering and doublet detection.

@hypaik
Author

hypaik commented Mar 13, 2024

Thank you for your prompt response.

Here is the head of the error messages.
FYI, GRCh38_cellRanger.fa is the reference genome file I used.
In addition, the same reference file causes no problems in other souporcell runs.
Moreover, without souporcell, the *h5d file of this sample showed a low doublet rate via Scrublet.

"
[proj_xxx]$ head souporcell_K2RPLEndo1.err
/xxx/proj_Termination/GRCh38_cellRanger.fa: line 1: 1: command not found
/blues/ngs/data/proj_Termination/GRCh38_cellRanger.fa: line 2: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN: command not found
/xxxa/proj_Termination/GRCh38_cellRanger.fa: line 3: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN: command not found
/xxx/proj_Termination/GRCh38_cellRanger.fa: line 4: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN: command not found...."

@wheaton5
Owner

What step is this coming from? Souporcell outputs .err files for each step. Which file is this? This is pretty weird, it looks like the fasta is being treated as an executable…
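
The symptom in the log above can be reproduced in isolation. The following is a minimal Python sketch, not anything from souporcell: the function name `run_fasta_as_script` and the fake sequence line are hypothetical, and it assumes `bash` is available on `PATH`. It hands a FASTA-like file to bash as if it were a script, which is one way a job script could end up producing `command not found` messages like these.

```python
import os
import subprocess
import tempfile


def run_fasta_as_script(sequence_line="NNNNNNNNNN"):
    """Write a one-line FASTA-like file and ask bash to execute it as a script.

    Returns (exit_code, stderr_text). Hypothetical helper for illustration only.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as handle:
        handle.write(sequence_line + "\n")
        path = handle.name
    try:
        # LC_ALL=C keeps bash's error message in English regardless of locale.
        env = dict(os.environ, LC_ALL="C")
        result = subprocess.run(
            ["bash", path], capture_output=True, text=True, env=env
        )
        return result.returncode, result.stderr
    finally:
        os.unlink(path)


if __name__ == "__main__":
    code, err = run_fasta_as_script()
    print(code)
    print(err.strip())
```

Bash tries to run each sequence line as a command name, fails to find it, and reports the file path, the line number, and the offending text, which matches the shape of the messages in the `.err` file.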

@hypaik
Author

hypaik commented Mar 13, 2024

Thank you for your fast response. I figured out that something was wrong in my own bash shell script.
I use my institute's SLURM-based system, and that is what caused this weird behavior... I have now fixed the bug :-). However, I am still curious why the diploidy assumption does not impact the results. Can I send you a separate e-mail about this issue? I found on Google Scholar that your affiliation has changed.

@wheaton5
Owner

Sure, I just updated my email on Google Scholar. You can find me at whheaton@gmail.com or haynesheaton@auburn.edu

@wheaton5
Owner

wheaton5 commented Mar 13, 2024

But the short answer is: let's look at the steps.

  1. remap - clearly doesn't require a ploidy assumption
  2. candidate variants (freebayes) - we don't know how many individuals are in the sample or in what ratios, so we can't assume expected allele fractions.
  3. allele assignment to cells (vartrix) - also just whatever the data is
  4. clustering - this could have a diploid assumption, but there are also doublets, ambient RNA, and false positive variants (including RNA editing sites) that make this noisier. We have found that making no assumption about allele fractions is more accurate than including one.
  5. doublet detection - we treat this as a statistical urn problem and simply ask the question "was this cell more likely drawn from the alleles of 2 clusters or 1 cluster?"
  6. ambient RNA estimation and genotyping - both of these require a ploidy estimate because both rely on expectations of allele fractions. Currently we only support ploidy 1 and 2, not polyploid. Polyploid support could be added, but most polyploids are allopolyploid, not autopolyploid, and thus will have separate reference chromosomes for each parental lineage.
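
To make steps 5 and 6 a little more concrete, here is a minimal Python sketch. It is not souporcell's implementation: the function names, the plain binomial likelihood, the equal-mixture doublet model, the clipping constant, and the toy counts are all assumptions made for illustration. The first function lists the idealized alt-allele fractions a genotype can take at a given ploidy (the kind of expectation step 6 depends on); the second asks the step 5 question of whether a cell's allele counts look more like one cluster or an even mix of two.

```python
import math


def expected_alt_fractions(ploidy):
    """Idealized alt-allele fraction for each genotype: k alt copies out of `ploidy`."""
    return [k / ploidy for k in range(ploidy + 1)]


def binom_loglik(alt, total, p, eps=1e-3):
    """Log-probability of `alt` alt reads out of `total` given alt fraction p.

    p is clipped away from 0 and 1 (assumed error floor) so log() stays finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    log_choose = (
        math.lgamma(total + 1)
        - math.lgamma(alt + 1)
        - math.lgamma(total - alt + 1)
    )
    return log_choose + alt * math.log(p) + (total - alt) * math.log(1.0 - p)


def singlet_vs_doublet(cell_counts, frac_a, frac_b):
    """Compare 'drawn from one cluster' against 'drawn from both clusters'.

    cell_counts: list of (alt_reads, total_reads), one entry per locus.
    frac_a, frac_b: per-locus alt-allele fractions of two clusters.
    Returns (best_singlet_loglik, doublet_loglik).
    """
    ll_a = sum(binom_loglik(a, n, p) for (a, n), p in zip(cell_counts, frac_a))
    ll_b = sum(binom_loglik(a, n, p) for (a, n), p in zip(cell_counts, frac_b))
    # Assumed doublet model: both cells contribute equally, so fractions average.
    ll_ab = sum(
        binom_loglik(a, n, (pa + pb) / 2.0)
        for (a, n), pa, pb in zip(cell_counts, frac_a, frac_b)
    )
    return max(ll_a, ll_b), ll_ab


if __name__ == "__main__":
    print(expected_alt_fractions(1))  # [0.0, 1.0]
    print(expected_alt_fractions(2))  # [0.0, 0.5, 1.0]

    cluster_a = [0.0, 1.0, 0.5]
    cluster_b = [1.0, 0.0, 0.5]
    clean_cell = [(0, 10), (10, 10), (5, 10)]   # looks like cluster A
    mixed_cell = [(5, 10), (5, 10), (5, 10)]    # looks like A and B together
    print(singlet_vs_doublet(clean_cell, cluster_a, cluster_b))
    print(singlet_vs_doublet(mixed_cell, cluster_a, cluster_b))
```

Note that the singlet-versus-doublet comparison only uses the clusters' observed allele fractions, so it never needs to know the ploidy. An autosomal trisomy, by contrast, would produce genotype fractions near 1/3 and 2/3, which fall outside the ploidy 1 and ploidy 2 expectations that step 6 relies on.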
