You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
(copy-paste from a chat)
Consider a case of 100 jobs, each for 30 min.
When adaptive scheduler submits these jobs it creates a slurm job for each one.
That requires allocating a node per job.
But in fact when the jobs are not super-long by the time the 50th node becomes available the prior nodes can be empty.
So now you end up with 98 allocated nodes and 98% of calculation finished only to wait for that last node to boot up.
I was wondering if more frequent checking and requeuing jobs would be useful for adaptive-scheduler or is it too much of a hassle?
The text was updated successfully, but these errors were encountered:
(copy-paste from a chat)
Consider a case of 100 jobs, each for 30 min.
When adaptive scheduler submits these jobs it creates a slurm job for each one.
That requires allocating a node per job.
But in fact when the jobs are not super-long by the time the 50th node becomes available the prior nodes can be empty.
So now you end up with 98 allocated nodes and 98% of calculation finished only to wait for that last node to boot up.
I was wondering if more frequent checking and requeuing jobs would be useful for adaptive-scheduler or is it too much of a hassle?
The text was updated successfully, but these errors were encountered: