Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example script to process the output files #34

Open
zhanglab2008 opened this issue Dec 4, 2021 · 2 comments
Open

Example script to process the output files #34

zhanglab2008 opened this issue Dec 4, 2021 · 2 comments
Assignees
Labels
documentation Improvements or additions to documentation
Milestone

Comments

@zhanglab2008
Copy link

zhanglab2008 commented Dec 4, 2021

Hi Mervin,
I have succeeded in getting the output files from scUTRquant but I have no idea if the files are in good shape. For example, when I read the heart_10k_v3_fastq.genes.Rds in R, the gene names are like this:

   [1] "ENSMUSG00000000001.4"  "ENSMUSG00000000003.15" "ENSMUSG00000000028.15" "ENSMUSG00000048583.16"
   [5] "ENSMUSG00000000049.11" "ENSMUSG00000000058.6"  "ENSMUSG00000000078.7"  "ENSMUSG00000000085.16"
   [9] "ENSMUSG00000000088.7"  "ENSMUSG00000061689.15" "ENSMUSG00000000093.6"  "ENSMUSG00000000094.12"

and here are the transcript names from the heart_10k_v3_fastq.txs.Rds:

  [1] "ENSMUST00000000001.4"  "ENSMUST00000000003.13" "ENSMUST00000000028.13" "ENSMUST00000000033.11"
   [5] "ENSMUST00000000049.5"  "ENSMUST00000000058.6"  "ENSMUST00000000080.7"  "ENSMUST00000000087.12"
   [9] "ENSMUST00000000090.7"  "ENSMUST00000000094.13" "ENSMUST00000000095.6"  "ENSMUST00000000096.11"

Have you by any changes made a tutorial about how to process the output files? In particular, I want to know how to get the 3'UTR information from the output files. I noticed that you have made another repo called "scUTRquant-figures" to reproduce the figures in your biorxiv paper but I have difficulty in reading the codes since the codes are not starting from the scUTRquant output files. It would be really helpful if you can make a tutorial for that.
Many thanks for developing this tool again!
Zhang

@mfansler
Copy link
Collaborator

mfansler commented Dec 4, 2021

@zhanglab2008 Great to hear it finished! Those sound like correct rownames for resulting files. I appreciate all the feedback! 🙏

As for tutorial, yes, eventually this will be added as a vignette to the scUTRboot R package (which is more a work in progress). In the meantime, I'd direct you to the processing we did in Figure 5, for the 451Lu dataset where we used a GENCODE (human) target. It should be straightforward swapping in your data files.

The preprocessing and testing is performed in fig5d-451lu-pairwise-tests.Rmd. And the plotting is in fig5d-451lu-pairwise-plots.Rds.

Adjust the QC filtering to your needs, e.g., MT percent thresholds depend on cell type. The entropy/diversity filtering is optional. In our examples, we never call cell types ourselves, but just use the cell types inferred from gene-level methods. Did you provide cell_annots to include any cluster/cell type information about the cells?

Note that we don't provide functions yet for annotating custom targets with things like shortest and longest, or distinguish between intronic and tandem 3' UTRs (maybe we should include that in this pipeline for when users don't provide their own tx or gene annots?). However, the example I mentioned does provide a rudimentary annotation of first and last UTR using the genomic position of the transcripts' 3'-ends. In that file, we do a LUI test, but because we didn't distinguish intronic sites, many of the significant hits are actually intronic polyadenylation events. Not sure if that matters for you.

Also, I'd suggested running the WD test mode, since that will test any changes. However, it doesn't give any directional information - only total magnitude of isoform changes.

As for the gene results, one can process them like any other scRNA-seq dataset.

Requirements
The knitted version also has a Conda environment YAML output that you could use to recreate the environment on MacOS.

@mfansler mfansler self-assigned this Dec 4, 2021
@mfansler mfansler added the documentation Improvements or additions to documentation label Dec 4, 2021
@zhanglab2008
Copy link
Author

Hi Mervin,
Excellent! I will try the code to generate your figure 5.
Yes, I included cell_annots when I ran scUTRquant. I actually ran it using a barcode whitelist with the cells that have been passed the QC from Seurat (based on the cellranger counts).
You made a really good point about the intronic polyadenylation events. I am not sure how this may affect my data but I think I need to get the 3'UTR data out first and then see if the results make sense or not. I will get back to you if any issues pop up.
Really great work!

@mfansler mfansler added this to the 1.0.0 milestone Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

2 participants