addGeneExpressionMatrix incorrectly calculates "Gex_MitoRatio" #2139

Open
dannyconrad opened this issue Mar 20, 2024 · 1 comment
Labels
bug Something isn't working

Comments

dannyconrad commented Mar 20, 2024

This is a quick fix, but users should know that in any kind of multiome analysis, addGeneExpressionMatrix() does not calculate Gex_MitoRatio accurately. I can see two sources of error:

  1. When import10xFeatureMatrix() reads the .h5 file via .importFM(), it removes every entry in the features dataframe whose interval value (e.g. chr1:10000-20000) is missing. In my own cellranger-arc outputs, this value is missing only for the mitochondrial genes, so they are dropped when the feature matrix is loaded. The check runs only if the interval column exists, so it probably does not affect standard cellranger outputs, because I don't think the .h5 files they produce have an "interval" slot.
  if ("interval" %in% colnames(rowData(se))) {
    idxNA <- which(rowData(se)$interval == "NA")
    if (length(idxNA) > 0) {
      se <- se[-idxNA, ]
    }
    rr <- GRanges(paste0(rowData(se)$interval))
    mcols(rr) <- rowData(se)
    se <- SummarizedExperiment(assays = SimpleList(counts = assay(se)), 
      rowRanges = rr)
  }
  2. Even if the mito genes are retained when loading the feature matrix, the resulting value is inflated. This is similar to the issue raised in "Bug in addGeneExpressionMatrix using Mouse Data" (#2000) by @Nahuck. In addGeneExpressionMatrix(), the regex pattern ^MT matches almost 100 additional genes that begin with the letters "MT", including genes like MTOR and MT2A. An easy fix would be to add the hyphen that delineates the mito genes in mouse and human gene annotations and to make the pattern case-insensitive: (?i)^mt-
  MitoRatio <- Matrix::colSums(assay(seRNA)[grep("^MT", rownames(assay(seRNA))), 
    ])/nUMI
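
To illustrate the second point, here is a minimal base-R sketch on toy data. The gene list and counts are made up for illustration, and it uses `ignore.case = TRUE` instead of the inline `(?i)` flag, which is an assumption on my part about the simplest portable spelling, not what ArchR does.

```r
# Toy gene names (hypothetical): mitochondrial genes in human and mouse style,
# plus nuclear genes that merely start with "MT".
genes <- c("MT-ND1", "MT-CO1", "mt-Nd1", "MTOR", "MT2A", "MTHFR", "GAPDH")

# Toy UMI counts for two cells (made-up numbers), genes in rows.
counts <- matrix(
  c(5, 5, 0, 10, 10, 10, 60,   # cell1
    0, 0, 2,  4,  4,  0, 10),  # cell2
  nrow = length(genes),
  dimnames = list(genes, c("cell1", "cell2"))
)
nUMI <- colSums(counts)

# Pattern quoted in the issue: also matches MTOR, MT2A, MTHFR and
# misses the lowercase mouse-style name.
old_hits <- grep("^MT", genes, value = TRUE)

# Proposed pattern: require the hyphen and ignore case.
new_hits <- grep("^mt-", genes, value = TRUE, ignore.case = TRUE)

# Mito ratio per cell under each pattern.
ratio_old <- colSums(counts[old_hits, , drop = FALSE]) / nUMI
ratio_new <- colSums(counts[new_hits, , drop = FALSE]) / nUMI

print(old_hits)
print(new_hits)
print(rbind(ratio_old, ratio_new))
```

On this toy input the old pattern counts the nuclear MT* genes toward the ratio, so `ratio_old` comes out higher than `ratio_new` for both cells.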

Using ArchR version 1.0.2
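
For the first point, a rough way to see which features the `interval == "NA"` filter would drop is sketched below. The `features` data frame is a hypothetical stand-in for the feature table read from a cellranger-arc .h5 file, with placeholder coordinates. It is not ArchR's internal object.

```r
# Hypothetical feature table; the intervals are placeholders, with "NA"
# strings for the mitochondrial genes as described above.
features <- data.frame(
  name = c("GAPDH", "MTOR", "MT-ND1", "MT-CO1"),
  interval = c("chr12:1000-2000", "chr1:3000-4000", "NA", "NA"),
  stringsAsFactors = FALSE
)

# Same test as in the snippet quoted from .importFM(): rows whose interval
# is the string "NA" are the ones that get removed.
idxNA <- which(features$interval == "NA")
dropped <- features$name[idxNA]
kept <- if (length(idxNA) > 0) features$name[-idxNA] else features$name

print(dropped)
print(kept)
```

If every name in `dropped` is a mitochondrial gene, the filter is removing exactly the rows that Gex_MitoRatio depends on.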

@dannyconrad dannyconrad added the bug Something isn't working label Mar 20, 2024
@rcorces
Collaborator

rcorces commented Mar 20, 2024

Hi @dannyconrad! Thanks for using ArchR! Lately, it has been very challenging for me to keep up with maintenance of this package and all of my other
responsibilities as a PI. I have not been responding to issue posts and I have not been pushing updates to the software. We are actively searching to hire
a computational biologist to continue to develop and maintain ArchR and related tools. If you know someone who might be a good fit, please let us know!
In the meantime, your issue will likely go without a reply. Most issues with ArchR right now relate to compatibility. Try reverting to R 4.1 and Bioconductor 3.15.
Newer versions of Seurat and Matrix also are causing issues. Sorry for not being able to provide active support for this package at this time.
