Could not split CESAR jobs #121

Open
AgustoLuz opened this issue Nov 14, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@AgustoLuz

Hi, when I run TOGA on my data I get the error "Could not split CESAR jobs".
I ran the following command:

./toga.py make_lastz_chains/make_lastz_chains/chicken_turkey/target.query.final.chain.gz ../chicken/genes.bed12 ../chicken/galGal6.2bit ../turkey/melGal5.2bit --pn chicken_turkey_test --kt --cjn 100 --cb 3,5 --ces --cjn 500 --m --cesar_mem_limit 300

I have attached the log file.
toga_2023_11_13_at_20_05.log

@kirilenkobm kirilenkobm added the bug Something isn't working label Jan 12, 2024
@kirilenkobm
Member

Hi @AgustoLuz

I apologize for the long delay in responding. Thank you for reporting the issue; it's the first problem reported with this module. Could you please execute the following command:

/nas2/aluzuriaganeira/birds/TOGA/./split_exon_realign_jobs.py /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/trans_to_chain_classes.tsv /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/toga_filt_ref_annot.bed /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/toga_filt_ref_annot.hdf5 /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/genome_alignment.bst /nas2/aluzuriaganeira/birds/chicken/galGal6.2bit /nas2/aluzuriaganeira/birds/turkey/melGal5.2bit /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test --jobs_dir /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/cesar_jobs --jobs_num 500 --combined /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/cesar_combined --results /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/cesar_results --buckets 8,16,32,64,128,256,512,999 --mem_limit 300 --chains_limit 100 --skipped_genes /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/rejected/SPLIT_CESAR.txt --rejected_log /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/rejected --cesar_binary /nas2/aluzuriaganeira/birds/TOGA/./CESAR2.0/cesar --paralogs_log /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/paralogs.txt --uhq_flank 50 --predefined_glp_class_path /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/predefined_glp_cesar_split.tsv --unprocessed_log /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/technical_cesar_err --log_file /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/toga_2023_11_13_at_20_05.log --cesar_logs_dir /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp_logs --mask_stops --check_loss /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/inact_mut_data --fragments_data /nas2/aluzuriaganeira/birds/TOGA/chicken_turkey_test/temp/gene_fragments.txt

and share the console output with me, assuming you haven't deleted the output directory? Regrettably, the logging for this module isn't working as expected, but I plan to fix it in the upcoming update.
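
One way to capture that console output is sketched below. Python tracebacks are written to stderr, so the sketch merges stderr into stdout before saving. The helper name `run_and_capture` and the output file name are illustrative assumptions, not part of TOGA; the command list should be replaced with the full `split_exon_realign_jobs.py` invocation above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def run_and_capture(cmd: List[str], out_path: str) -> int:
    """Run cmd, save combined stdout+stderr to out_path, return the exit code.

    Hypothetical helper for illustration; it is not shipped with TOGA.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so tracebacks are kept
        text=True,
        check=False,  # a failing step should still produce a log file
    )
    Path(out_path).write_text(proc.stdout)
    return proc.returncode


# Stand-in command for demonstration only: prints to stderr and exits non-zero.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('demo error', file=sys.stderr); sys.exit(1)",
]
exit_code = run_and_capture(demo_cmd, "split_cesar_console.txt")
print("exit code:", exit_code)
```

The resulting text file can then be attached to the issue.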

@laristide

laristide commented Sep 19, 2024

Hi @kirilenkobm ,
I'm having the same issue while running some tests with a small dataset locally. I get the "Could not split CESAR jobs" message. I've managed to trace down the issue to some specific transcripts, but I can't see what could be specifically wrong with them. This is the console output of the failed step with a reduced example with two transcripts, where one transcript is processed OK and the other one causes the crash.

split_cesar_jobs: the arguments list is:

  • orthologs_file: testGorila5/temp/trans_to_chain_classes.tsv
  • bed_file: testGorila5/temp/toga_filt_ref_annot.bed
  • bdb_bed_file: testGorila5/temp/toga_filt_ref_annot.hdf5
  • bdb_chain_file: testGorila5/temp/genome_alignment.bst
  • tDB: /home/laristide/tests/hg38.chr22.2bit
  • qDB: /home/laristide/tests/GCA_963575185.1_PGDP_GorBer_genomic_filtered_hgr38_Chr22.renamed.2bit
  • toga_out_dir: testGorila5
  • cesar_binary: ./CESAR2.0/cesar
  • jobs_num: 1
  • buckets: 12
  • mask_stops: True
  • chains_limit: 100
  • skipped_genes: testGorila5/temp/rejected/SPLIT_CESAR.txt
  • mem_limit: 15.0
  • jobs_dir: testGorila5/temp/cesar_jobs
  • combined: testGorila5/temp/cesar_combined
  • results: testGorila5/temp/cesar_results
  • check_loss: testGorila5/temp/inact_mut_data
  • u12: None
  • rejected_log: testGorila5/temp/rejected
  • paralogs_log: testGorila5/temp/paralogs.txt
  • uhq_flank: 50
  • o2o_only: False
  • no_fpi: False
  • annotate_paralogs: False
  • fragments_data: testGorila5/temp/gene_fragments.txt
  • predefined_glp_class_path: testGorila5/temp/predefined_glp_cesar_split.tsv
  • unprocessed_log: testGorila5/temp/technical_cesar_err
  • cesar_logs_dir: testGorila5/temp_logs
  • debug: False
  • mask_all_first_10p: False
  • log_file: testGorila5/toga_2023_11_13_at_20_05.log
  • quiet: False
    split_cesar_jobs: reading U12 data from None
    split_cesar_jobs: not U12 file provided: skip
    split_cesar_jobs: reading orthology data...
    split_cesar_jobs: for each transcript, find chains to produce annotations
  • selected chain class to annotate transcript ENST00000618236: ORTH
  • selected chain class to annotate transcript ENST00000406028: ORTH
    split_cesar_jobs: number of transcripts to create CESAR jobs: 2
    split_cesar_jobs: total number of 2 transcript/chain pairs
    split_cesar_jobs: skipped total of 0 transcripts
    split_cesar_jobs: out of them, transcripts not intersected by chains: 0
    split_cesar_jobs: assigning MISSING class to 0 transcripts not intersected by any chain
    split_cesar_jobs: creating a list of RAM-limit buckets based on user arguments
    split_cesar_jobs: defined memory limit: 12, RAM-limit buckets: {12: []} (to be filled with CESAR jobs)
    split_cesar_jobs: reading bed file testGorila5/temp/toga_filt_ref_annot.bed
    split_cesar_jobs: got data for 2 transcripts
    split_cesar_jobs: reading transcript fragments data from testGorila5/temp/gene_fragments.txt
    split_cesar_jobs: got data for 0 transcripts potentially fragmented in the query genome
    split_cesar_jobs: precomputing query regions for each transcript/chain pair
    split_cesar_jobs: batch size: 2
    split_cesar_jobs: first, invert gene-to-chains dict to chain-to-genes
    split_cesar_jobs: for each of 1 involved chains, precompute regions
    Traceback (most recent call last):
    File "/home/laristide/Software/TOGA/./split_exon_realign_jobs.py", line 1054, in <module>
    main()
    File "/home/laristide/Software/TOGA/./split_exon_realign_jobs.py", line 873, in main
    regions, skipped_2, predef_glp = precompute_regions(
    ^^^^^^^^^^^^^^^^^^^
    File "/home/laristide/Software/TOGA/./split_exon_realign_jobs.py", line 518, in precompute_regions
    q_start, q_end = int(q_grange[0]), int(q_grange[1])
    ^^^^^^^^^^^^^^^^
    ValueError: invalid literal for int() with base 10: ''
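
For readers following the traceback: this `ValueError` is what Python raises when `int()` is given an empty string, so at least one of the two fields of `q_grange` was empty for the failing transcript/chain pair. The sketch below only reproduces that failure mode and shows one defensive way to parse such a pair. How `q_grange` is actually built inside `precompute_regions` is not shown in this thread, and `parse_range` is a hypothetical helper, not TOGA code.

```python
from typing import Optional, Sequence, Tuple


def parse_range(q_grange: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) as ints, or None if a field is missing or non-numeric.

    Hypothetical helper: NOT part of TOGA, it only illustrates the failure mode.
    """
    if len(q_grange) < 2:
        return None
    try:
        return int(q_grange[0]), int(q_grange[1])
    except ValueError:
        # Reached, for example, when a field is the empty string.
        return None


# Reproduce the exact error message from the traceback.
try:
    int("")
except ValueError as err:
    message = str(err)

print(message)
print(parse_range(["100", "200"]))
print(parse_range(["", "200"]))
```

Returning `None` instead of crashing would let a caller skip and log the affected transcript, although whether skipping is the right behaviour here is a decision for the maintainers.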

I uploaded the files to replicate the analysis here: https://filetransfer.io/data-package/xsA37yyO#link

the command was:

./toga.py ~/tests/GorilaChain.merged.reorder.modScore.modID.chain ~/tests/hg38.genecode46.chr22.mod.reduced.bed ~/tests/hg38.chr22.2bit ~/tests/GCA_963575185.1_PGDP_GorBer_genomic_filtered_hgr38_Chr22.renamed.2bit --kt --project_dir testGorila5 -i ~/tests/hg38.isoforms.all.tsv --cb 12 --cjn 1 --chn 10 --cesar_mem_limit 12 --ncf

I would appreciate any help with this!
Thank you,
Leandro

@MichaelHiller
Collaborator

@kirilenkobm Could you have a look pls. Thx !!

@laristide

Hi @kirilenkobm, I'm sorry to insist, but have you had a chance to look at this? I would really appreciate it!

Thanks a lot.
Leandro
