DGE matrix with bimodal distributions of total counts #415

Open
ccuriqueo opened this issue May 4, 2024 · 1 comment

ccuriqueo commented May 4, 2024

Hello, I hope you can guide me. I downloaded the FASTQ files for SRR9843421 from SRA; it is sequence data from Microwell-seq. I used
<fastq-dump --split-files -gz SRRXX>
and then followed the Drop-seq protocol you posted. I generated the DGE matrix with the following command:

./DigitalExpression I=/mnt/d/output_control/my_clean.bam O=/mnt/d/output_control/control.dge.txt.gz SUMMARY=/mnt/d/output_control/control.dge.summary.txt TMP_DIR=/mnt/d/input_control/ NUM_CORE_BARCODES=10000

When I loaded this as an AnnData object I got something like the plot below. My question is whether I need to perform any preprocessing steps on the FASTQ files first, or whether it is enough to run the protocol directly. When I start quality control, many counts are filtered out and the library shrinks considerably.

I attach an image showing how the AnnData object looks when plotting genes by counts vs. total counts.

[Image: Screenshot 2024-05-03 234548]

@jamesnemesh
Collaborator

Hi,

I'm not at all familiar with Microwell-seq, so if it has technical requirements that Drop-seq doesn't meet, I can't answer those questions. The original Microwell-seq paper is pretty light on data processing details.

For the bimodal data you've posted, it's quite possible that by forcing extraction of 10K cell barcodes you are extracting both cell barcodes that captured cells and cell barcodes that captured only ambient RNA. Since the total counts and the number of genes are so correlated, I think it would make sense to plot a 1D density plot of total counts, which should be bimodal and would give you a better sense of how many cell barcodes are in each mode.
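
A minimal sketch of that 1D view, in Python using only the standard library. It assumes the DGE file is a gzipped, tab-delimited table with genes as rows and cell barcodes as columns (gene name in the first column). The function names, the bin width, and the synthetic demo counts are illustrative and are not part of Drop-seq tools.

```python
import gzip
import math
from collections import Counter


def total_counts_per_barcode(dge_path):
    """Sum every barcode column of a gene-by-barcode DGE text file.

    Assumed layout: header row "GENE<TAB>barcode1<TAB>barcode2...",
    then one row per gene. Lines starting with '#' are skipped.
    """
    barcodes, totals = None, None
    with gzip.open(dge_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if barcodes is None:
                barcodes = fields[1:]
                totals = [0] * len(barcodes)
                continue
            for i, value in enumerate(fields[1:]):
                totals[i] += int(value)
    if barcodes is None:
        return {}
    return dict(zip(barcodes, totals))


def log10_histogram(counts, bin_width=0.25):
    """Bin log10(count); returns {lower bin edge: number of barcodes}."""
    bins = Counter()
    for count in counts:
        if count > 0:
            bins[math.floor(math.log10(count) / bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))


def print_histogram(hist, width=50):
    """Render the histogram as text, one row per log10 bin."""
    if not hist:
        return
    peak = max(hist.values())
    for edge, n in hist.items():
        print(f"{edge:5.2f} {'#' * max(1, round(width * n / peak))} {n}")


# Synthetic demo: many low-count barcodes plus a smaller high-count group.
demo_counts = [25] * 400 + [40] * 300 + [1500] * 60 + [2500] * 40
print_histogram(log10_histogram(demo_counts))
```

On real data, the values passed to `log10_histogram` would be the values of the dictionary returned by `total_counts_per_barcode`. Binning on a log scale is a design choice made here because ambient and cell-containing barcodes often differ by orders of magnitude in total counts.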

Generally there's a cell selection step in most scRNA-seq pipelines, which you might have to implement here. If you haven't already done this, a more general approach would be to extract all cell barcodes with at least some minimum number of transcripts (20 or 100), repeat this plot, and then select the cell barcodes in the mode with the higher number of counts.
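
One possible way to automate that selection is sketched below in Python. It is not the Drop-seq tools method: it swaps in Otsu's thresholding on log10 total counts as a stand-in for picking the boundary between the two modes by eye. It assumes the per-barcode totals are already available as a dictionary; `otsu_count_threshold`, `select_cells`, and the synthetic barcodes are hypothetical names for illustration.

```python
import math


def otsu_count_threshold(counts):
    """Count threshold that maximizes between-group variance of log10(count)."""
    logs = sorted(math.log10(c) for c in counts if c > 0)
    n = len(logs)
    if n < 2:
        raise ValueError("need at least two positive counts")
    total = sum(logs)
    best_score, best_split = -1.0, None
    running = 0.0
    for i in range(1, n):
        running += logs[i - 1]
        if logs[i] == logs[i - 1]:
            continue  # cannot place a split between identical values
        mean_low = running / i
        mean_high = (total - running) / (n - i)
        score = i * (n - i) * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score = score
            best_split = (logs[i - 1] + logs[i]) / 2
    if best_split is None:
        raise ValueError("all counts are identical; nothing to separate")
    return 10 ** best_split


def select_cells(totals, min_transcripts=20):
    """Keep barcodes in the higher-count mode.

    totals: {barcode: total transcript count}
    min_transcripts: floor applied before looking for the two modes
    Returns (set of selected barcodes, count threshold used).
    """
    kept = {bc: n for bc, n in totals.items() if n >= min_transcripts}
    threshold = otsu_count_threshold(kept.values())
    return {bc for bc, n in kept.items() if n > threshold}, threshold


# Synthetic demo: near-empty barcodes, an ambient-like mode, and a cell-like mode.
demo = {f"junk{i}": 5 for i in range(100)}
demo.update({f"low{i}": 30 + i for i in range(50)})
demo.update({f"high{i}": 3000 + 10 * i for i in range(20)})
cells, cutoff = select_cells(demo, min_transcripts=20)
print(f"{len(cells)} barcodes selected above a cutoff of about {cutoff:.0f} counts")
```

The threshold is only meaningful if the filtered distribution really is bimodal, so it is worth checking the cutoff against the density plot before using the selected barcodes.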
