SlurmJob is a Python package designed to simplify setting up and monitoring interactive jobs on a Slurm cluster. It provides a CLI that abstracts away complex `srun` and `sbatch` commands, lets you connect directly to your job via a VS Code hyperlink, and keeps track of your job's status. The package also automatically constructs the `sbatch` command based on your requirements and stores it on the cluster via SSH.
slurmjob run <job_name> --<SBATCH_option1>=<value1> --<SBATCH_option2>=<value2>
This command will submit your job with the specified SBATCH options, such as `--qos=idle` or `--cpus-per-task=1-2`.
See the official `sbatch` documentation for the list of all SBATCH options.
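As a rough illustration of how `--<option>=<value>` pairs could be turned into an `sbatch` command line, here is a minimal sketch. The function name `build_sbatch_command` and the example job path are hypothetical and are not taken from SlurmJob's source; the only Slurm behaviour relied on is that options given on the `sbatch` command line take precedence over `#SBATCH` directives inside the script.

```python
import shlex


def build_sbatch_command(job_file, options):
    """Return an sbatch command line with extra --name=value options.

    Hypothetical helper for illustration; not SlurmJob's actual code.
    """
    parts = ["sbatch"]
    for name, value in options.items():
        # A value of None is treated as a bare flag, e.g. --exclusive.
        flag = f"--{name}" if value is None else f"--{name}={value}"
        # Quote every argument so unusual values cannot break the shell line.
        parts.append(shlex.quote(flag))
    parts.append(shlex.quote(job_file))
    return " ".join(parts)


command = build_sbatch_command(
    "/home/alice/jobs/myjob.sh",  # hypothetical job file path
    {"qos": "idle", "cpus-per-task": "1-2"},
)
print(command)
```

Quoting each argument with `shlex.quote` keeps the command safe to send over SSH even when a value contains spaces.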
When you use SlurmJob, it establishes an SSH connection to the Slurm cluster using the `paramiko` library. Through this SSH connection, it executes various Slurm commands and other shell commands:
- It creates the necessary folders and files (such as the logs folder and the interactive sbatch job scripts).
- It submits jobs using the `sbatch` command, with additional parameters if provided.
- It monitors the job by tailing the Slurm log file with `tail`.
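The sequence above can be sketched with `paramiko` roughly as follows. This is an assumption-laden illustration rather than SlurmJob's implementation: the helper names `remote_commands` and `run_remote`, the log file naming (`<job_name>.log`), and the exact shell commands are hypothetical. The `paramiko` calls themselves (`SSHClient`, `load_system_host_keys`, `connect`, `exec_command`) are the library's standard client API.

```python
import os
import shlex


def remote_commands(log_dir, job_dir, job_name, extra_args=()):
    """Build the shell commands in the order described above (illustrative)."""
    job_file = f"{job_dir}/{job_name}.sh"
    log_file = f"{log_dir}/{job_name}.log"  # hypothetical log file naming
    submit = ["sbatch", *extra_args, job_file]
    return [
        # 1. Create the folders for logs and job scripts.
        f"mkdir -p {shlex.quote(log_dir)} {shlex.quote(job_dir)}",
        # 2. Submit the job, with additional parameters if provided.
        " ".join(shlex.quote(part) for part in submit),
        # 3. Read the last line of the Slurm log file.
        f"tail -n 1 {shlex.quote(log_file)}",
    ]


def run_remote(hostname, username, key_location, commands):
    """Run each command over one SSH connection and return their stdout."""
    import paramiko  # third-party dependency, imported only when needed

    client = paramiko.SSHClient()
    # Only hosts already present in known_hosts are accepted here.
    client.load_system_host_keys()
    outputs = []
    try:
        client.connect(
            hostname,
            username=username,
            key_filename=os.path.expanduser(key_location),
        )
        for command in commands:
            _stdin, stdout, _stderr = client.exec_command(command)
            outputs.append(stdout.read().decode())
    finally:
        client.close()
    return outputs


commands = remote_commands(
    "/home/alice/logs", "/home/alice/jobs", "myjob", ["--qos=idle"]
)
```

Passing `commands` to `run_remote("your.slurm.hostname", "alice", "~/.ssh/id_rsa", commands)` would then execute them on the cluster; the host and user shown here are placeholders.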
SlurmJob automates a series of steps that you'd otherwise perform manually. The typical manual steps would be:
- SSH into the cluster.
- Create a Slurm batch script (`*.sh`) file for your interactive job.
- Submit this batch file using `sbatch`, optionally with additional parameters.
- Monitor job status with `squeue` and logs using `tail -f`.
- Enter the SSH credentials of your interactive job into VS Code.
SlurmJob continually polls the last line of the Slurm job's log file, looking for a specific pattern to determine when the interactive job is ready. When the pattern is found, it generates a VS Code URL which you can use to connect directly to your job.
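The polling loop might look something like the sketch below. Everything specific in it is an assumption: the readiness pattern (`ready on <host>:<port>`), the helper names, and the URL layout are hypothetical, since the actual pattern and URL that SlurmJob uses are not stated here. The URL follows the `vscode://vscode-remote/ssh-remote+<host><folder>` form used by the VS Code Remote SSH extension.

```python
import re
import time

# Hypothetical readiness marker; the real pattern is not documented here.
READY_PATTERN = re.compile(r"ready on (?P<host>[\w.-]+):(?P<port>\d+)")


def wait_until_ready(read_last_line, pattern=READY_PATTERN,
                     interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll read_last_line() until the pattern matches or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        match = pattern.search(read_last_line())
        if match:
            return match
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not become ready in time")
        sleep(interval)


def vscode_url(host, folder):
    """Build a VS Code Remote SSH link for an absolute remote folder."""
    return f"vscode://vscode-remote/ssh-remote+{host}{folder}"


# Simulated log lines stand in for `tail -n 1` output over SSH.
fake_log = iter(["Job pending", "Starting container", "ready on dlc-node1:2222"])
match = wait_until_ready(lambda: next(fake_log), sleep=lambda _seconds: None)
url = vscode_url(match["host"], "/home/alice")
print(url)
```

Injecting `read_last_line` and `sleep` as callables keeps the loop independent of the SSH layer, so it can be exercised without a cluster.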
To install SlurmJob, you can use pip:
pip install git+https://github.com/daangeijs/deepops-slurmjob.git
Or to install it locally, you can clone the repository and run:
pip install .
Run this command to set up your initial configuration. You'll be prompted for your `hostname`, `username`, and `key_location`. Advanced settings are optional.
This command will generate the `sbatch` script for your interactive job. It will prompt you for various job settings and then upload the script to the cluster.
Use this command to run the interactive job that you've created. It will submit the job with any specified SBATCH options, monitor its status, and provide a VS Code hyperlink for direct connection.
Lists all the existing job files you have in the job folder on your Slurm cluster.
Cancels a running job on the Slurm cluster using the job ID.
- hostname: The hostname of your Slurm cluster.
- username: Your username on the cluster.
- key_location: Location of your SSH private key.
- home_folder: Your home folder on the cluster, default is "home/{username}".
- log_location: Where to store log files, default is "/home/{username}/logs".
- job_location: Where to store job files, default is "/home/{username}/jobs".
- machine_prefix: Prefix for the cluster machine, default is "dlc-".
- sbatch_command: The `sbatch` command to run, default is "sbatch {job_location}/{job_name}.sh".
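To make the placeholder-style defaults concrete, the settings above could be modelled as in the sketch below. The `ClusterConfig` class and its `resolved_sbatch_command` method are hypothetical, and the assumption that `{username}`, `{job_location}` and `{job_name}` are filled in with `str.format` is mine; the default values are copied from the list above as written.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    """Illustrative container for the settings listed above."""

    hostname: str
    username: str
    key_location: str
    home_folder: str = "home/{username}"
    log_location: str = "/home/{username}/logs"
    job_location: str = "/home/{username}/jobs"
    machine_prefix: str = "dlc-"
    sbatch_command: str = "sbatch {job_location}/{job_name}.sh"

    def resolved_sbatch_command(self, job_name):
        """Fill the placeholders for one job (assumed str.format semantics)."""
        job_location = self.job_location.format(username=self.username)
        return self.sbatch_command.format(
            job_location=job_location, job_name=job_name
        )


# Placeholder host, user and key path for demonstration only.
config = ClusterConfig("your.slurm.hostname", "alice", "~/.ssh/id_rsa")
resolved = config.resolved_sbatch_command("myjob")
print(resolved)
```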
When running `slurmjob create`, you'll be prompted for the following:
- ntasks: The number of tasks to be allocated for the job.
- gpus-per-task: The number of GPUs per task (default is 0).
- cpus-per-task: The number of CPUs per task (default is 4).
- mem: The amount of memory required for the job (default is 8G).
- time: The time limit for the job (default is 4:00:00).
- container-mounts: Paths to mount into the job's container.
- container-image: The container image to use for the job.
- output: The location for output logs (This is set automatically from your config).
- SSH port: The port to be used for SSH within the job.
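As an illustration of how these answers could end up in a batch script, the sketch below renders each setting as an `#SBATCH` directive. It is not the script SlurmJob actually generates: `render_sbatch_script` is a hypothetical helper, `ntasks=1` is an example value, the body command is a placeholder, and treating `container-mounts` and `container-image` as `#SBATCH` directives assumes a cluster whose container plugin accepts them there.

```python
def render_sbatch_script(settings, command):
    """Render job settings as #SBATCH directives followed by one command."""
    lines = ["#!/bin/bash"]
    for name, value in settings.items():
        # Skip settings the user left empty; a value of 0 is kept.
        if value is None or value == "":
            continue
        lines.append(f"#SBATCH --{name}={value}")
    lines.append("")
    lines.append(command)
    return "\n".join(lines) + "\n"


settings = {
    "ntasks": 1,                # example value, no default is documented
    "gpus-per-task": 0,
    "cpus-per-task": 4,
    "mem": "8G",
    "time": "4:00:00",
    "container-mounts": "",     # left empty in this example
    "container-image": "",      # left empty in this example
    "output": "/home/alice/logs/myjob.log",  # hypothetical log path
}

# Placeholder body: the real script would start an SSH server on the chosen port.
script = render_sbatch_script(settings, 'echo "start sshd on port 2222 here"')
print(script)
```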
To securely connect to your Slurm host, you'll need to set up an SSH keypair. Follow these steps:
- On your local machine, generate an SSH keypair: `ssh-keygen -t rsa -f ~/.ssh/id_rsa`
- Add your public key (`~/.ssh/id_rsa.pub`) to the `~/.ssh/authorized_keys` file on your Slurm host. You can do this manually or use `ssh-copy-id`: `ssh-copy-id -i ~/.ssh/id_rsa username@hostname`. Replace `username` and `hostname` with your Slurm host username and hostname.
Add an entry for your Slurm host in your local `~/.ssh/config` file. Here's a sample configuration:
```
Host my-slurm-host
    HostName your.slurm.hostname
    User your-username
    IdentityFile ~/.ssh/id_rsa
```
Replace `your.slurm.hostname` with your Slurm host's hostname and `your-username` with your username on that host. This configuration will let you SSH into your Slurm host using the keypair.