Early requeue of jobs #57

aeantipov · 2020-12-11T07:12:55Z

(copy-paste from a chat)
Consider a case of 100 jobs, each taking 30 min.
When adaptive-scheduler submits these jobs, it creates a Slurm job for each one.
That requires allocating a node per job.
But when the jobs are not very long, by the time the 50th node becomes available the earlier nodes may already be idle.
So you can end up with 98 allocated nodes and 98% of the calculation finished, only to wait for the last node to boot up.

I was wondering whether more frequent checking and requeuing of jobs would be useful for adaptive-scheduler, or is it too much of a hassle?
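To make the arithmetic in this scenario concrete, here is a toy simulation. It is a sketch, not adaptive-scheduler code: the function names are hypothetical, and it assumes that node `i` becomes available `i` minutes after submission and that every job takes exactly 30 min. It compares binding each job to its own allocation with letting any already-running, idle node pick up the next pending job.

```python
import heapq


def makespan_one_node_per_job(n_jobs, job_minutes, node_ready):
    """Hypothetical: job i can only run on the node allocated for it."""
    return max(node_ready[i] + job_minutes for i in range(n_jobs))


def makespan_with_requeue(n_jobs, job_minutes, node_ready):
    """Hypothetical: any node that is up and idle takes the next pending job."""
    free_at = list(node_ready)  # time at which each node can accept work
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_jobs):
        start = heapq.heappop(free_at)  # earliest available node
        done = start + job_minutes
        finish = max(finish, done)
        heapq.heappush(free_at, done)  # that node is free again when the job ends
    return finish


# Assumption: node i comes up i minutes after submission.
ready = [float(i) for i in range(100)]
print(makespan_one_node_per_job(100, 30, ready))  # 129.0
print(makespan_with_requeue(100, 30, ready))      # 93.0
```

Under these idealized assumptions the last job finishes at 93 min instead of 129 min, and nodes that come up after minute 63 never receive any work. Real Slurm queue waits are far less regular, so the numbers only illustrate the trend.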

basnijholt · 2020-12-15T17:28:02Z

This is certainly a good idea and not too hard to implement (I think).

Let's see if we can make this a priority.

basnijholt added the feature-request label Dec 15, 2020
