Some nextflow processes died #161
Hi! In fact, I requested a node with 40 CPUs and changed the nextflow setting to process.cpus = 40 in the SLURM config file for the CESAR jobs, but it looks like only 2 CPUs are actually being used. I don't know why it isn't using all of the resources; is that why it is so slow? If you can suggest any settings to speed up the process, I would really appreciate it.
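For context, the change I made is along these lines (a minimal sketch of the kind of SLURM config I mean for the CESAR jobs; the memory and time values are only illustrative, not my exact file):

// sketch of a nextflow SLURM config for CESAR jobs; values are illustrative assumptions
process {
    executor = 'slurm'   // submit each CESAR task as a SLURM job
    cpus = 40            // per-task CPU request (this is the value I raised to 40)
    memory = '10 GB'     // illustrative
    time = '24h'         // illustrative
}

As far as I understand, process.cpus in nextflow is a per-task request, so raising it does not by itself make more CESAR jobs run in parallel; please correct me if I have misread how TOGA uses this setting.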
Hi @kirilenkobm, I am very sorry to bother you again, but so far I have not managed to run a single instance successfully. I tried a lot, and I could not submit it to the SLURM system; it kept reporting errors. So now I am running TOGA on a master node with 40 cores, with the jobs divided into two buckets by memory (--cb 10,100). I expected CESAR to use all of the CPUs, but it is only using two. The run works, it is just too slow: at the CESAR step it seems to finish one job before moving on to the next, and it has now been running for over a week. Do you have any suggestions? I would appreciate them very much. In addition, I noticed that when CESAR runs in the 10 and 100 buckets, the jobs are not executed in command-line order, because the output does not match the order in the cesar_joblist_queue_10.txt file. What is the reason for this? If it ran in order, I could tell where the run is and how much longer it will take. By the way, my nextflow version is 21.10.6.5660, and I cloned TOGA directly with git clone. Looking forward to your reply. Best regards!
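To make the single-node setup concrete, this is roughly the shape of the local-executor configuration I have in mind (the numbers are illustrative assumptions, not my exact files):

// sketch of a local-executor nextflow config; numbers are illustrative assumptions
executor {
    name = 'local'
    cpus = 40        // total CPUs nextflow may use on this node
    queueSize = 40   // maximum number of tasks run in parallel
}
process {
    cpus = 1         // per-task request; with 1 CPU each, up to 40 CESAR jobs could run side by side
}

If process.cpus stays at 40 on a 40-CPU node, only one task fits at a time, which would match what I see (few CPUs busy and jobs finishing one after another), but please correct me if I misunderstand how TOGA dispatches the CESAR jobs.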
Hi, I have a few problems and hope to get your help.
My command is:
./toga.py /home/TOGAInput/query/hg38.H.g.final.chain /home/TOGAInput/human_hg38/toga.transcripts.bed /home/TOGA/hg38.2bit /home/TOGA/query/H.g.2bit --kt --pn /opt/synData2/Hg -i /home/TOGAInput/human_hg38/toga.isoforms.tsv --nc /home/TOGA/nextflow_config_files --cb 10,100 --cjn 300 --u12 /home/TOGAInput/human_hg38/toga.U12introns.tsv --ms -q
When the run reached the CESAR jobs step, the following error occurred:
Compiling C code...
Model found
CESAR installation found
Traceback (most recent call last):
File "/home/TOGA/./toga.py", line 1600, in
main()
File "/home/TOGA/./toga.py", line 1596, in main toga_manager.run()
File "/home/TOGA/./toga.py", line 530, in run
self.__check_cesar_completeness() File "/home/TOGA/./toga.py", line 1088, in __check_cesar_completeness
monitor_jobs(jobs_managers, die_if_sc_1=True) File "/home/TOGA/modules/parallel_jobs_manager_helpers.py", line 36, in monitor_jobs
raise AssertionError(err)
AssertionError: Error! Some para/nextflow processes died!
The log file section is as follows:
Checking whether all CESAR results are complete
1 CESAR jobs crashed, trying to run again...
!!RERUN CESAR JOBS: Pushing 1 jobs into None GB queue
Selected parallelization strategy: nextflow
Parallel manager: pushing job nextflow /home/TOGA/execute_joblist.nf --joblist /opt/synData2/Hg/_cesar_rerun_batch_None -c /opt/synData2/Hg/temp/cesar_config_16_queue.nf
Monitoring CESAR jobs rerun
## Stated polling cluster jobs until they done
Polling iteration 0; already waiting 0 seconds.
Polling iteration 1; already waiting 60 seconds.
Polling iteration 2; already waiting 120 seconds.
Polling iteration 3; already waiting 180 seconds.
.......
Polling iteration 48; already waiting 2880 seconds.
Polling iteration 49; already waiting 2940 seconds.
### CESAR jobs done ###
It's worth noting that this error occurs frequently. Sometimes running it a second time with the same command works, but each run requires a significant time investment. Do you have any suggestions for addressing this issue?
Best regards!