How can I fix an error in the creation of Arrow Files from my fragment files? #2163

Open
Axbxh opened this issue May 23, 2024 · 3 comments
Labels
bug Something isn't working

Comments


Axbxh commented May 23, 2024

ArchR log file

ArchR-createArrows-73c43a8d3823-Date-2024-05-20_Time-18-57-29.805596.log

Description of the bug

While creating Arrow files from fragment_file_name.tsv.gz, I get a ggplot error at the Fragment Size Distribution step. The log says the following:

2024-05-20 19:16:46.377978 : (D865 : 1 of 2) Successful creation of Arrow File, 19.246 mins elapsed.
2024-05-20 19:16:47.42894 : (D865 : 1 of 2) Adding Fragment Summary, 19.267 mins elapsed.
2024-05-20 19:17:08.62645 : (D865 : 1 of 2) Plotting Fragment Size Distribution, 19.621 mins elapsed.
2024-05-20 19:17:10.105093 : Continuing through after error ggplot for Fragment Size Distribution, 19.645 mins elapsed.
2024-05-20 19:17:11.227721 : (D865 : 1 of 2) Computing TSS Enrichment Scores, 19.664 mins elapsed.
2024-05-20 19:18:25.869288 : (D865 : 1 of 2) Computed TSS Scores!, 1.244 mins elapsed.

2024-05-20 19:18:25.885971 : Detected 2 or less cells pass filter (Non-Zero median TSS = 0.94, median Frags = 39590) in file!
Check inputs such as 'filterFrags' or 'filterTSS' to keep cells! Exiting!

2024-05-20 19:18:25.893817 : createArrowFiles has encountered an error, checking if any ArrowFiles completed..

------- Completed

End Time : 2024-05-20 19:18:26.010781
Elapsed Time Minutes = 20.9341327190399
Elapsed Time Hours = 0.348902928100692

Although the log reports "Successful creation of Arrow File", I cannot find any Arrow files in my home directory.
The output consists of three folders:

  1. ArchRLogs
  2. QualityControl, which contains SampleNames > Fragment Size Distribution.pdf
  3. tmp, which is empty
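
Reading the log above, the line that actually ends the run appears to be the "2 or less cells pass filter" message rather than the ggplot one: the reported non-zero median TSS is 0.94, while the call below asks for minTSS = 2. A minimal sketch of how such a threshold thins out cells, using made-up per-cell values and a hypothetical helper `count_passing` (this is not ArchR's internal code, and the exact comparison ArchR applies is an assumption):

```r
# Hypothetical per-cell QC table; the numbers are invented for illustration.
qc <- data.frame(
  cell   = paste0("cell", 1:6),
  nFrags = c(500, 39590, 41000, 38000, 120000, 45000),
  tss    = c(0.5, 0.94, 1.1, 0.8, 2.4, 0.9)
)

# Count cells whose TSS enrichment and fragment count fall inside the cutoffs.
# Assumption: inclusive comparisons; ArchR's own filter may differ in detail.
count_passing <- function(qc, minTSS, minFrags, maxFrags) {
  sum(qc$tss >= minTSS & qc$nFrags >= minFrags & qc$nFrags <= maxFrags)
}

strict  <- count_passing(qc, minTSS = 2, minFrags = 0, maxFrags = 1e7)  # 1
relaxed <- count_passing(qc, minTSS = 0, minFrags = 0, maxFrags = 1e7)  # 6
print(c(strict = strict, relaxed = relaxed))
```

If most real cells sit near a TSS enrichment of 1, lowering minTSS might let an Arrow file be written, although an enrichment that low may be worth investigating in its own right.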

Code: To Reproduce

Code I used in RStudio

library(ArchR)

fragmentFilePath <- '~/fragment_file_name.tsv.gz'

inputFiles <- c(fragmentFile = fragmentFilePath)
inputFiles

addArchRGenome("mm10")

work_dir <- "~/"
setwd(work_dir)

addArchRThreads(threads = 16) 

ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  minTSS = 2,
  minFrags = 0,
  maxFrags = 1e+07,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE,
  offsetPlus = 0,
  offsetMinus = 0,
  force = TRUE, #overwrite the Arrow file if one already exists
  TileMatParams = list(tileSize = 5000)
)

ArrowFiles

Expected behavior

Creation of Arrow File: fragment_file_name.arrow, in the ArchR directory.

ArchR Tutorial Code
Link: https://www.archrproject.com/bookdown/creating-arrow-files.html

library(ArchR)

inputFiles <- getTutorialData("Hematopoiesis")
inputFiles

scATAC_BMMC_R1
“HemeFragments/scATAC_BMMC_R1.fragments.tsv.gz”
scATAC_CD34_BMMC_R1
“HemeFragments/scATAC_CD34_BMMC_R1.fragments.tsv.gz”
scATAC_PBMC_R1
“HemeFragments/scATAC_PBMC_R1.fragments.tsv.gz”

addArchRGenome("hg19")
addArchRThreads(threads = 16) 

Setting default genome to Hg19.
Setting default number of Parallel threads to 16.

ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  filterTSS = 4, #Dont set this too high because you can always increase later
  filterFrags = 1000, 
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

Using GeneAnnotation set by addArchRGenome(Hg19)!
Using GeneAnnotation set by addArchRGenome(Hg19)!
ArchR logging to : ArchRLogs/ArchR-createArrows-dfa159ddbf6e-Date-2020-04-15_Time-09-21-27.log
If there is an issue, please report to github with logFile!
Cleaning Temporary Files
2020-04-15 09:21:28 : Batch Execution w/ safelapply!, 0 mins elapsed.
ArchR logging successful to : ArchRLogs/ArchR-createArrows-dfa159ddbf6e-Date-2020-04-15_Time-09-21-27.log

ArrowFiles

“scATAC_BMMC_R1.arrow” “scATAC_CD34_BMMC_R1.arrow”
“scATAC_PBMC_R1.arrow”

Additional context

Windows specifications of my device:
Edition: Windows 10 Home
Version: 22H2
Installed on: ‎1/‎22/‎2021
OS build: 19045.4412
Experience: Windows Feature Experience Pack 1000.19056.1000.0
R version
R 4.3.3

@Axbxh Axbxh added the bug Something isn't working label May 23, 2024
Collaborator

rcorces commented May 23, 2024

Hi @Axbxh! Thanks for using ArchR! Lately, it has been very challenging for me to keep up with maintenance of this package and all of my other
responsibilities as a PI. I have not been responding to issue posts and I have not been pushing updates to the software. We are actively searching to hire
a computational biologist to continue to develop and maintain ArchR and related tools. If you know someone who might be a good fit, please let us know!
In the meantime, your issue will likely go without a reply. Most issues with ArchR right now relate to compatibility. Try reverting to R 4.1 and Bioconductor 3.15.
Newer versions of Seurat and Matrix also are causing issues. Sorry for not being able to provide active support for this package at this time.

Author

Axbxh commented May 24, 2024

Hi, rcorces! Thank you for your response. I have installed R 4.1, but BiocManager 1.5 and the ArchR package are unavailable for this version.

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("ArchR")
Installing package into ‘C:/Users/Abhira/Documents/R/win-library/4.1’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘ArchR’ is not available for this version of R
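
The warning above is the generic message install.packages() gives when it cannot find a package in the configured repositories, and ArchR is distributed through GitHub rather than CRAN. Note also that the guard in the command checks for devtools but then installs ArchR. A sketch of the GitHub-based install described in the ArchR README, wrapped in a hypothetical helper `install_archr_from_github` so that nothing is installed until it is called (whether it succeeds on a given R version is not something this thread confirms):

```r
# Hypothetical wrapper; defining it installs nothing by itself.
install_archr_from_github <- function(ref = "master") {
  # devtools and BiocManager are both available from CRAN.
  if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

  # ArchR itself comes from the GreenleafLab GitHub repository; the
  # Bioconductor repositories are passed so its dependencies can be resolved.
  devtools::install_github(
    "GreenleafLab/ArchR",
    ref = ref,
    repos = BiocManager::repositories()
  )
}

# Uncomment to run the installation (needs network access):
# install_archr_from_github()
```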


nigiord commented Jul 24, 2024

Hi @Axbxh , this could be related to #2150 . Even though createArrowFiles is supposed to continue after plotting fails, I think the error handling is wrongly implemented and that the output is not properly set.

ArchR/R/CreateArrow.R

Lines 213 to 237 in d9e741c

#Run With Parallel or lapply
outArrows <- tryCatch({
  unlist(.batchlapply(args))
},error = function(x){
  .logMessage("createArrowFiles has encountered an error, checking if any ArrowFiles completed..", verbose = TRUE, logFile = logFile)
  for(i in seq_along(args$outputNames)){
    out <- paste0(args$outputNames[i],".arrow")
    if(file.exists(out)){
      o <- tryCatch({
        o <- h5read(out, "Metadata/Completed") #Check if completed
      },error = function(y){
        file.remove(out) #If not completed delete
      })
    }
  }
  paste0(args$outputNames,".arrow")[file.exists(paste0(args$outputNames,".arrow"))]
})
if(subThreading){
  h5enableFileLocking()
}
.endLogging(logFile = logFile)
return(outArrows)

Currently all plots in ArchR break with the new versions of ggplot due to some functions like .fixPlotSize that convert the ggplot object to... something else for some reason.

You have to downgrade ggplot2 to 3.4.2 and it should work. Or modify the ArchR code yourself to remove the plotting.
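
A quick way to see whether the installed ggplot2 is newer than the 3.4.2 mentioned above is sketched below. The helper name `is_newer_than` is hypothetical, and treating every release after 3.4.2 as problematic is an assumption based only on this comment:

```r
# Returns TRUE when the installed version is newer than the target version.
is_newer_than <- function(installed, target = "3.4.2") {
  package_version(installed) > package_version(target)
}

# Only inspect ggplot2 if it is installed at all.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("ggplot2"))
  if (is_newer_than(installed)) {
    message("ggplot2 ", installed, " is newer than 3.4.2; a downgrade may help.")
  }
}

# One way to pin the version (needs the 'remotes' package and network access):
# remotes::install_version("ggplot2", version = "3.4.2")
```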

It’s also possible that another error occurred (for instance a wrong type for a parameter that is not checked, like the parameters that are fed through TSSParams). You can’t know for sure, because error reporting was disabled in several of the tryCatch handlers. A strange choice, but if you manage to uncomment the message/print calls in those tryCatch blocks, you might be able to understand what’s happening.
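
To illustrate the difference, here is a minimal base R sketch of a handler that reports the error before continuing instead of discarding it. `run_step` is a hypothetical helper, not part of ArchR:

```r
# Run one step; on failure, print the underlying error message and return NULL
# so that the caller can continue with the next step.
run_step <- function(step_name, step_fun) {
  tryCatch(
    step_fun(),
    error = function(e) {
      message("Step '", step_name, "' failed: ", conditionMessage(e))
      NULL
    }
  )
}

ok     <- run_step("sum",  function() 1 + 1)                  # returns 2
failed <- run_step("plot", function() stop("unused argument")) # returns NULL
```

With a handler like this, the log would carry the real cause (here "unused argument") next to the "Continuing through after error" line.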

ArchR code is very convoluted, with multiple intermediate functions dispatched across a lot of files, so unfortunately I haven’t found a way to fix the ggplot issue and send a pull request. On my side, when I really need a specific version of ggplot (because of interactions with other single-cell analysis software, for instance), I just remove all plotting in ArchR or execute it in its own outdated environment.

Cheers,
−Nils
