Komondor Benchmarking
**Important:** This is not a guide on how to get started with either generic SLURM scheduling or BenchExec benchmarking. For the former, see this quick start guide or the Komondor documentation; for the latter, see this quick start guide. This guide covers setting up an environment on Komondor that can use BenchExec for benchmarking, primarily for verification-based workflows.
Benchmarking with BenchExec at the low level usually boils down to running `runexec`, which enforces resource limits and takes measurements. Most features of `runexec` require cgroups. Furthermore, benchmarked tools (such as Theta) often have specific requirements on the operating system and installed packages (documented here). Therefore, we containerize the tasks using Singularity.
BenchExec will collect the tasks of a benchmark and submit them as an array job. The user can aggregate multiple tasks per job, parallelize tasks within a job, and restart failed measurements. It is useful to keep the number of jobs at a minimum, but this comes at the cost of suboptimal scheduling: if we started a separate job for every task, the overall walltime would be minimal, but if we aggregate N tasks per job, an unfortunate grouping of long-running tasks into the same job can delay the entire benchmark.
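The tradeoff can be sketched with quick shell arithmetic (the numbers below are illustrative, not prescriptive):

```shell
# back-of-the-envelope: how aggregation and parallelization shape the job count
TASKS=5000   # total number of benchmark tasks (hypothetical)
AGG=32       # tasks aggregated into one job
PAR=4        # tasks run in parallel within a job
JOBS=$(( (TASKS + AGG - 1) / AGG ))   # ceiling division
WAVES=$(( (AGG + PAR - 1) / PAR ))    # sequential "waves" of tasks within one job
echo "jobs: $JOBS, sequential task waves per job: $WAVES"
```

Fewer jobs means less scheduling overhead, but each job's runtime is bounded by its slowest wave, so one slow task can hold up the other tasks packed into the same job.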
To start a simple benchmark, we'll need a tool, some data, and a benchmark definition file. Then we'll pull Benchexec, configure the containers, and finally run the benchmark.
```shell
wget https://github.com/ftsrg/theta/releases/download/v6.8.6/Theta.zip
unzip Theta.zip
git clone https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks.git
wget --directory-prefix=xmls https://gitlab.com/sosy-lab/sv-comp/bench-defs/-/raw/main/benchmark-defs/theta.xml
```
We also need the version of BenchExec that supports SLURM array jobs (not yet upstreamed at the time of writing, but planned):

```shell
git clone https://github.com/leventeBajczi/benchexec -b slurm-array-aggregate benchexec
```
We can now collect the necessary packages for the tool and `runexec` (BenchExec), and create a Singularity definition file (e.g., `svcomp.def`):
```
BootStrap: docker
From: ubuntu:24.04

%post
    apt -y update
    apt -y install libgomp1 libmpfr-dev openjdk-17-jre-headless  # tool dependencies
    apt -y install software-properties-common  # everything from here on is mandatory
    add-apt-repository ppa:sosy-lab/benchmarking
    apt -y install benchexec fuse-overlayfs
    mkdir /overlay
```
Then we need to build a SIF image. However, we don't have the necessary permissions on the cluster, so we make use of a builder service:

```shell
singularity remote login
```
Follow the instructions (you will need a user on the builder service), then:

```shell
singularity build --remote --fix-perms svcomp.sif svcomp.def
```
Now we can start the benchmark:

```shell
python3.11 ./benchexec/contrib/slurm-benchmark.py \
    --no-container \
    --slurm-array \
    --singularity ./svcomp.sif \
    --no-hyperthreading \
    --tool-directory Theta \
    -N 10 \
    -o svcomp25/ \
    --aggregation-factor 32 \
    --parallelization 4 \
    --batch-size 5000 \
    --retry-killed 10 \
    xmls/theta.xml
```

(Comments after a trailing `\` would break the line continuation, so the flags are explained here instead.)

- `--no-container`: we don't want to nest containers
- `--slurm-array`: this activates the SLURM executor
- `--singularity ./svcomp.sif`: the container image to use (relative path)
- `--no-hyperthreading`: hyperthreading is not robustly supported
- `--tool-directory Theta`: the tool directory to use
- `-N 10`: at most this many jobs running at the same time
- `-o svcomp25/`: output directory (has to end with a `/`; otherwise, it will only be used as a filename prefix)
- `--aggregation-factor 32`: how many tasks to aggregate into one job
- `--parallelization 4`: how many tasks to run in parallel within a job
- `--batch-size 5000`: how many tasks to handle at the same time (if too many, the array script can be too big for SLURM)
- `--retry-killed 10`: how many times to retry a task when it fails due to the infrastructure
- `xmls/theta.xml`: the benchmark to run
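A quick sanity check of what these flags imply, using shell arithmetic (the total task count of 12000 is hypothetical):

```shell
# sketch: batches and concurrency implied by the flags above
TOTAL=12000  # hypothetical total task count
BATCH=5000   # --batch-size
N=10         # -N (jobs running at once)
PAR=4        # --parallelization (tasks in parallel per job)
BATCHES=$(( (TOTAL + BATCH - 1) / BATCH ))  # ceiling division
CONCURRENT=$(( N * PAR ))                   # concurrent jobs x tasks per job
echo "batches: $BATCHES, max concurrent tasks: $CONCURRENT"
```

So under these assumptions the run proceeds in 3 batches, with at most 40 tasks executing at any moment across the cluster.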
This will first parse the benchmark definition, query the version of the tool and some representative information about the operating system, and then do the following:
- For each batch of (at most) `--batch-size` tasks, it creates a temporary directory with the following structure:
```
.
├── 0/
├── 1/
....
├── <batch_size>/   # <batch_size> many folders; these will hold the outputs
├── bin0.tasks
├── bin1.tasks
....
├── binX.tasks      # one task list per job, each holding <aggregation_factor> many tasks
├── logs/           # log folder, one file per job
└── array.sbatch    # batch script
```
- It starts the `array.sbatch` script and waits for it to end. The jobs of the array take their input files and parameters from the `binX.tasks` files, and populate the `{0..<batch_size>}` folders with output artifacts. The tool output will be in `output.log`, and the runexec output in `log`.
- BenchExec then parses the outputs, sets the verdicts, and returns the result (or continues with the next batch).
- The temporary directory is deleted. Results are written to the specified output directory.
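The bin files are essentially a split of the task list into chunks of `--aggregation-factor`. A local sketch with mock task names (GNU `split`'s two-digit numeric suffixes stand in for the actual bin naming, and the aggregation factor of 3 is arbitrary):

```shell
# sketch: chunk a mock task list into bins of 3 tasks each (requires GNU split)
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
printf 'task%u\n' 1 2 3 4 5 6 7 \
  | split -d -l 3 --additional-suffix=.tasks - bin
ls bin*.tasks
```

Each resulting `binNN.tasks` file then corresponds to one job of the array, and the last bin may hold fewer tasks than the aggregation factor.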
If a run was aborted (say, the login node was restarted suddenly), zip the half-finished `.logfiles` and `.files` directories (pay attention to keep the prefixes!). Then, if necessary, fix half-finished ZIP files by running `7za x <zip> && zip <zip> <folder> -rm -q` (`7za` can extract archives that `unzip` cannot, due to missing end-of-directory signatures). Then, re-run BenchExec with the same parameters and output directories, but add `--continue-interrupted` at the end. This will find the latest `logfiles.zip` and attempt to recover results from it, the corresponding `.xml(.bz2)` files, and the `files.zip` archive.
`squeue` is the fastest client to query running (R), pending (PD), and completing (CG) jobs. Filter using `-t <FLAG>`, e.g., `squeue -t CG`.

Find out more about a job that has not yet completed using `scontrol show job <JOBID>`.

Find out more about any job using `seff <JOBID>`.

Array jobs have two IDs: an array ID (containing a `_`) and a job ID (not containing a `_`). Sometimes you need one, sometimes you need the other. `seff` can parse either and outputs both.
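Converting between the two forms is simple string surgery; a sketch using shell parameter expansion (the IDs below are made up):

```shell
# split an array ID of the form <job>_<array task index> into its parts
ARRAY_ID="9009949_12"
JOB_ID=${ARRAY_ID%%_*}    # drop everything from the first '_' onward
TASK_IDX=${ARRAY_ID#*_}   # drop everything up to and including the first '_'
echo "job: $JOB_ID, array task: $TASK_IDX"
```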
Find out about personal limits using:

- `sacctmgr list assoc`: how many jobs a user can run at the same time (under `GrpJobs`)
- `squota`: how much storage a user has (pay attention to file-count quotas; those are much stricter than expected!)
- `sbalance`: how much CPU time a user has left (this is not accounted immediately; sometimes it is delayed by up to a day)
`sacct -j <JOBID>` is sometimes useful for getting data on jobs.
To get the current resource allocation:

```shell
echo "Allocated resources:"
squeue -o "%m" -t RUNNING -h | awk '{s+=$1} END {print "Memory: " s/1000/1000 " TB"}'
squeue -o "%C" -t RUNNING -h | awk '{s+=$1} END {print "CPUs: " s " cores"}'
```
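`squeue` is unavailable off-cluster, but the aggregation stage itself can be sanity-checked by feeding the `awk` pipeline mocked `%m` output (the two memory values below are made up):

```shell
# mock two jobs reporting 256000 and 128000 (MB) in place of `squeue -o "%m"`
MOCK_OUT=$(printf '256000\n128000\n' \
  | awk '{s+=$1} END {print "Memory: " s/1000/1000 " TB"}')
echo "$MOCK_OUT"
```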
To cancel a job (if it's an array job, this cancels all of its jobs): `scancel <JOBID>`
To get current job counts:

```shell
printf "R: %d, PD: %d\n" $(squeue -h -t R | wc -l) $(squeue -h -t PD | wc -l)
```
To attach to a running job: `srun --pty --overlap --jobid <JOBID> bash`. Useful for running `htop` or similar to check on `fuse-overlayfs` status, among other things.