I had the following situation: a small cluster with four nodes that I was using to do some MPI benchmarking. Two nodes had property 'x' and two had property 'y'. The goal was to run the OSU latency benchmark between the x nodes and between the y nodes a few hundred times to build a statistical performance model. So, I created two
I assumed that the scheduler would allocate the x nodes and work sequentially through the 1024 tests targeted at those nodes, and likewise that the y nodes would be allocated to work through their 1024 tests. Instead, the x nodes were allocated to work through the x tests, and the y nodes weren't allocated until the x tests had completed. I ended up taking advantage of Flux's hierarchical scheduling and submitted two jobs that then ran the bulksubmits, and everything worked as expected. I would, however, like to understand why the first approach didn't work; I assume there's something obvious that I don't understand about bulksubmit and/or the Flux scheduler. Also, I apologize for not capturing the configuration and outputs before I tore down the cluster; it would have made the question more succinct.
Replies: 1 comment
With the Fluxion scheduler, the default queueing policy is FCFS. I think to get the behavior you're expecting, you'd have to tune the Fluxion configuration to enable one of the backfill policies, along with a queue depth of probably 1024 or so, so that the jobs targeted for property `y` are considered for scheduling when the first `x` job can't be allocated resources.

The built-in `sched-simple` scheduler in flux-core can only do FCFS scheduling, btw.

The various tunables for the Fluxion scheduler can be found here.

BTW @garlick, @milroy or @jameshcorbett feel free to correct anything I've gotten wrong above.
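
To make the FCFS versus backfill distinction concrete, here is a toy Python model of the queue described above. It is only a sketch under simplifying assumptions and is not Fluxion's actual algorithm: every job runs for one time step, each job needs all nodes of one property, and the `makespan` function with its `depth` and `backfill` parameters is hypothetical, made up for this illustration. Real backfill policies also deal with reservations, which this model ignores.

```python
def makespan(queue, depth, backfill):
    """Return the time step at which the last job finishes.

    queue    -- list of property names, one per job, in submission order
    depth    -- how many pending jobs the scheduler examines per pass
    backfill -- if False, stop at the first job that cannot start (FCFS);
                if True, skip blocked jobs and keep scanning up to `depth`
    """
    pending = list(queue)
    free_at = {}  # property -> time step at which its nodes become free
    now = 0
    end = 0
    while pending:
        remaining = []
        blocked = False
        for i, prop in enumerate(pending):
            considered = i < depth and not blocked
            if considered and free_at.get(prop, 0) <= now:
                # Nodes with this property are free: start the job now.
                free_at[prop] = now + 1
                end = now + 1
            else:
                remaining.append(prop)
                if considered and not backfill:
                    # FCFS: a blocked job at the head blocks everything behind it.
                    blocked = True
        pending = remaining
        now += 1
    return end


# Four x jobs submitted first, then four y jobs (stand-ins for 1024 + 1024).
jobs = ["x"] * 4 + ["y"] * 4

print(makespan(jobs, depth=8, backfill=False))  # 7: y waits behind the x jobs
print(makespan(jobs, depth=2, backfill=True))   # 7: depth too shallow to reach y
print(makespan(jobs, depth=8, backfill=True))   # 4: x and y run side by side
```

In this model the y jobs only overlap with the x jobs when blocked jobs are skipped and the scan depth is large enough to reach past all the queued x jobs, which is the reasoning behind suggesting a queue depth of about 1024.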