Cache entry deserialization failed, entry ignored #140
Thank you for reaching out. Firstly, I consider a total runtime of around 12 hours to be quite normal. I would start to worry if it took more than a couple of days.
BR,
Hi @kirilenkobm,
Thank you very much for your response. Following your suggestion, I ran

```
/scrfs/storage/vlamba/home/TOGA/cesar_runner.py /home/vlamba/BD-gene-loss/RERUN_CESAR_JOBS/rerun_job_1_100 /home/vlamba/BD-gene-loss/temp/cesar_results/rerun_job_1_100.txt --check_loss /home/vlamba/BD-gene-loss/temp/inact_mut_data/rerun_job_1_100.txt --rejected_log /home/vlamba/BD-gene-loss/temp/cesar_jobs_crashed_again/rerun_job_1_100.txt
```

and then looked at the crashed job:

```
/scrfs/storage/vlamba/home/TOGA/CESAR_wrapper.py ENSCGOT00000028426.1 28 /home/vlamba/BD-gene-loss/temp/toga_filt_ref_annot.hdf5 /home/vlamba/BD-gene-loss/temp/genome_alignment.bst /home/vlamba/LepNud-Chain-alignment/CG.2bit /home/vlamba/BD.2bit --cesar_binary /scrfs/storage/vlamba/home/TOGA/CESAR2.0/cesar --uhq_flank 50 --temp_dir /home/vlamba/BD-gene-loss/temp/cesar_temp_files --mask_stops --check_loss --alt_frame_del --memlim 10
```

which gave the following output:

```
CESAR JOB FAILURE
Input is corrupted! Reference sequence should start with ATG!
Error! CESAR output is corrupted, target must start with ATG!
Error! CESAR output is corrupted, target must start with ATG!
Traceback (most recent call last):
  File "/scrfs/storage/vlamba/home/TOGA/CESAR_wrapper.py", line 2661, in <module>
    realign_exons(cmd_args)
  File "/scrfs/storage/vlamba/home/TOGA/CESAR_wrapper.py", line 2626, in realign_exons
    loss_report, del_mis_exons = inact_mut_check(
  File "/scrfs/storage/vlamba/home/TOGA/modules/inact_mut_check.py", line 1660, in inact_mut_check
    split_stop_codons = detect_split_stops(
  File "/scrfs/storage/vlamba/home/TOGA/modules/inact_mut_check.py", line 1468, in detect_split_stops
    position = exon_to_last_codon_of_exon[first_exon]
KeyError: 1
```

It seems that something in my data triggers this. It would be a great help if you could suggest a possible solution.
Thanks
thanks for checking this.
Otherwise, CESAR (a tool that realigns reference exons to query loci) might not process such transcripts correctly in the multi-exon mode. Also, the post-processing TOGA steps are based on the assumption that the provided reference transcripts have a correct reading frame. (Maybe, at some point, we will find a way to process a bit more diverse variants, and CESAR itself requires some optimisations.)
BR,
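If it helps, here is a minimal sketch (not part of TOGA; it assumes a standard BED12 reference annotation such as the CG.bed used in this thread, and it only checks the frame length, not the ATG/stop codons themselves) that flags transcripts whose coding region is not a multiple of 3:

```python
#!/usr/bin/env python3
"""Sketch: flag BED12 transcripts whose CDS length is not a multiple of 3.
Not part of TOGA; adjust to your own annotation as needed."""
import sys

def cds_length(fields):
    """Sum the parts of each exon block that overlap [thickStart, thickEnd)."""
    chrom_start = int(fields[1])
    thick_start, thick_end = int(fields[6]), int(fields[7])
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    total = 0
    for size, rel_start in zip(sizes, block_starts):
        exon_start = chrom_start + rel_start
        exon_end = exon_start + size
        overlap = min(exon_end, thick_end) - max(exon_start, thick_start)
        if overlap > 0:
            total += overlap
    return total

with open(sys.argv[1]) as bed:  # e.g. python3 check_frames.py CG.bed
    for line in bed:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        if cds_length(fields) % 3 != 0:
            print(fields[3])  # transcript with an incomplete reading frame
```

Anything it prints is a candidate for removal from the reference annotation before rerunning TOGA.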
Hi @kirilenkobm,
Thank you for all your responses. Is there any way to delete these two exons with incomplete reading frames? If there is any tool or command you can suggest, I would really appreciate it.
Best Regards,
I would just do something like the following.
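For example, a minimal filtering sketch (the file names and the second transcript ID are placeholders; only ENSCGOT00000028426.1 comes from the traceback above):

```python
#!/usr/bin/env python3
"""Sketch: drop transcripts with incomplete reading frames from the
reference BED12 before rerunning TOGA. IDs and paths below are placeholders."""

BAD_IDS = {"ENSCGOT00000028426.1"}  # add the second offending transcript ID here

with open("CG.bed") as src, open("CG.filtered.bed", "w") as dst:
    for line in src:
        name = line.rstrip("\n").split("\t")[3]  # BED column 4: transcript name
        if name not in BAD_IDS:
            dst.write(line)
```

The isoform table passed with -i may need the same filtering so that it does not point to the removed transcripts.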
Hello, excuse me, I've encountered a similar issue to the one described above. I've placed the tasks on a CPU computing server, but I'm unable to submit them to the Slurm system, so I opted for "local". However, it has been running for five days now, and the log files indicate that it's still in progress. I'd greatly appreciate any suggestions on how to improve its speed. Thank you very much.
My command is:
The log shows:

```
### STEP 7: Execute CESAR jobs: parallel step
Pushing 2 CESAR job lists
```
Hi Authors,
I was able to run TOGA on a couple of my species, but somehow, after running for 11 hours, I got this error for one of my species in my sbatch error file.
After loading the nextflow module and installing importlib-metadata using `pip install importlib-metadata`, I ran TOGA with the following command:

```
./toga.py /home/vlamba/BD-CG.chain /home/vlamba/CG.bed /home/vlamba/CG.2bit /home/vlamba/BD.2bit -i /home/vlamba/CG-isofrom.txt --project_dir /home/vlamba/BD_gene-loss --kt --cb 10,100 --cjn 500 --ms
```
This is what the sbatch error file contains:

```
Cache entry deserialization failed, entry ignored
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-3HJdri/pip/
You are using pip version 8.1.2, however version 23.3.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-msMJKV/importlib-metadata/
You are using pip version 8.1.2, however version 23.3.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Traceback (most recent call last):
  File "./toga.py", line 8, in <module>
    import importlib.metadata as metadata
ModuleNotFoundError: No module named 'importlib.metadata'
```
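For context, importlib.metadata entered the standard library only in Python 3.8, so this ModuleNotFoundError usually means toga.py was started with an older interpreter; judging by the pip 8.1.2 messages above, the importlib-metadata backport install also failed. A common compatibility pattern (a sketch only, not the actual toga.py code) looks like this:

```python
import sys

# importlib.metadata is in the standard library only from Python 3.8 onward;
# on older interpreters, fall back to the importlib-metadata backport package.
if sys.version_info >= (3, 8):
    import importlib.metadata as metadata
else:
    import importlib_metadata as metadata  # requires: pip install importlib-metadata

# Example usage: print the installed version of a package.
print(metadata.version("setuptools"))
```

Simply running toga.py under Python 3.8 or newer should avoid this particular import error.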
Here are some of the last lines from my TOGA log file:
```
Polling iteration 653; already waiting 39180 seconds.
Polling iteration 654; already waiting 39240 seconds.
Polling iteration 655; already waiting 39300 seconds.
CESAR jobs done
Checking whether all CESAR results are complete
2 CESAR jobs crashed, trying to run again...
!!RERUN CESAR JOBS: Pushing 2 jobs into 100 GB queue
Selected parallelization strategy: nextflow
Parallel manager: pushing job nextflow /scrfs/storage/vlamba/home/TOGA/execute_joblist.nf --joblist /home/vlamba/BD_gene-loss/_cesar_rerun_batch_100
Monitoring CESAR jobs rerun
Stated polling cluster jobs until they done
CESAR jobs done
```
I would be grateful if you could suggest any solution for this.
My second concern is running time: my first species took 9hr:40min to complete when I ran TOGA for the first time, the second took 10hr:21min, and the third failed after 11 hrs.
Kindly have a look at the command I shared above and suggest the best way to run this tool faster on my data.
Thank you