SX-Aurora-Slurm-Plugin

This GRES plugin for Slurm allows scheduling whole Aurora cards (VEs) on nodes via GRES; it does not allow sharing a single Aurora card between jobs.

Check https://github.com/SX-Aurora/SX-Aurora-Slurm-Plugin/releases for the latest release. The plugin has been tested with Slurm 20.11.

Getting started

Compiling

The plugin is compiled as custom code inside the Slurm source tree:

  1. Clone this repo to src/plugins/gres/ve in your local copy of the Slurm sources, or unpack the release tarball and copy the files to that folder.
  2. Add src/plugins/gres/ve/Makefile to the configure.ac file in the Slurm root directory.
  3. Remove the existing configure file.
  4. Add ve to the SUBDIRS variable in src/plugins/gres/Makefile.am.
  5. Run autoreconf.
  6. Run make && make install if this is a new Slurm source tree, or patch the slurm.spec file; a sketch of these steps is shown below.
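
A minimal shell sketch of these steps, assuming the Slurm sources live in ~/slurm and this repository has been cloned to ~/SX-Aurora-Slurm-Plugin (both paths and the configure prefix are placeholders):

cd ~/slurm
cp -r ~/SX-Aurora-Slurm-Plugin src/plugins/gres/ve

# edit configure.ac by hand: add src/plugins/gres/ve/Makefile to the list of generated Makefiles
# edit src/plugins/gres/Makefile.am by hand: append ve to the SUBDIRS variable
rm -f configure

autoreconf -i
./configure --prefix=/opt/slurm
make && make install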

It is recommended to build a separate Slurm cluster on the Aurora nodes for testing before moving the setup to production. In that case a separate slurmctld and slurmdbd are needed because of the GRES usage. Remember to change the ports in slurm.conf if you have other slurmctlds running; otherwise the reconfiguration may crash the other slurmctld.
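
For example, the test cluster's slurm.conf can move the daemons to non-default ports (the parameter names are standard Slurm ones; the cluster name and port values below are arbitrary examples):

ClusterName=auroratest
SlurmctldPort=16817
SlurmdPort=16818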

Slurm-Configuration

  1. You should have the nodes configured in your slurm.conf or an appropriate include file. The node definition should look like:
GresTypes=ve
SelectType=select/cons_tres
Nodename=<nodename> Gres=ve:<count>

With multiple VEs per VH you have to create a shared partition like below. It is also recommended to define the CPUs as well as the memory of the nodes as shared resources, as in the example below with two A300-8 nodes:

GresTypes=ve
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
NodeName=vh[100-101] CPUs=80 Sockets=2 CoresPerSocket=20 ThreadsPerCore=2 RealMemory=192078 Gres=ve:8
PartitionName=aurora Shared=Yes Nodes=vh[100-101]
  2. gres.conf should contain at least:
NodeName=<nodename> Name=ve File=/dev/veslot[<ve slot numbers as csv>]

So for the example with the two A300-8 nodes it would be:

NodeName=vh[100-101] Name=ve File=/dev/veslot[0,1,2,3,4,5,6,7]
  3. cgroup.conf needs to have ConstrainDevices=yes.

  4. cgroup_allowed_devices_file.conf must contain at least

/dev/cpu/*/*
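
A sketch of the two cgroup-related files; ConstrainDevices=yes is what this setup requires, while the remaining lines are illustrative and follow the stock Slurm examples (adjust paths and device entries to your site):

# cgroup.conf
CgroupAutomount=yes
ConstrainDevices=yes
AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf

# cgroup_allowed_devices_file.conf
/dev/null
/dev/urandom
/dev/zero
/dev/cpu/*/*
/dev/pts/*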

Once the configuration files are changed, restart slurmctld and the slurmd daemons. Run a little test like:

srun -n1 --gres=ve:1 -p <yourpartition> env | sort

and check that the SLURM_* variables are set; VE_NODE_NUMBER should also be there.

This GRES module only supports single-node Aurora jobs! The environment variables inside a job are set to support NEC MPI in distributed mode, which means that inside your job script you should be able to simply run

mpirun <executable>
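
A minimal batch-script sketch, assuming the aurora partition from the example above and a hypothetical executable ./a.out (both names are placeholders):

#!/bin/bash
#SBATCH -p aurora        # partition from the example configuration
#SBATCH -N 1             # the plugin only supports single-node jobs
#SBATCH --gres=ve:2      # request two VEs on the node

# NEC MPI picks up the VE assignment from the environment set by the plugin
mpirun ./a.out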
