Notable changes to RunRunner are documented in this file.
- The SlurmJob runtime computation for finished jobs is now correct. The runtime of running jobs is formatted with at most 2 decimals.
- SlurmRun no longer automatically refreshes its job details when its status is fetched; this now has to be done manually by the user by calling .get_latest_job_details. The refresh does still happen when a run is loaded from file (a usage sketch follows after this list).
- Better support for determining the SlurmJob status of Slurm IDs that can no longer be found in the Slurm database
- Fixed a bug where SlurmJobs set their state to completed while the job was actually still running
- SlurmRun.kill no longer kills each job individually; instead it kills the whole batch at once
- SlurmRun now extracts the latest information from the JSON job data instead, for improved stability
- A SlurmRun is now considered waiting when one of its jobs is waiting, instead of all jobs (under the condition that no jobs are killed, crashed or running)
- SlurmJobs no longer have the requeues property
- Minor release for a bugfix regarding the retrieval of slurm job status for a single job
- Updated logging functionality
- When checking the status of a SlurmRun's jobs, the statuses are now requested in a single subprocess call instead of one call per job (illustrated in a sketch after this list).
- Logging has been improved and now has file support.
- When loading a SlurmRun from file with load_dependencies set to false, the dependencies are now loaded with only their IDs; before, the list was simply empty.
- Shared base class for Local/Slurm Run/Job objects
- The stdout/stderr for Local runs is now configurable
- Local Jobs can now be started at the user's discretion via a new optional add_to_queue constructor argument
- SlurmJobs inherit many new properties from scontrol, and SlurmRuns offer the same options.
- Changed the loading from file to ignore dependencies in the JSON by default, to avoid unnecessary IO.
- Fixed a bug in the SlurmJob object that caused instances to share the slurm_job_details dict.
- The number of parallel jobs now defaults to None and will be calculated from the number of submitted jobs
- The logging statement for saving .json now has a verbose option to reduce excess logging when submitting a run
- Support for passing a Slurm output piping array through add_job_to_queue
- Expanded Slurm status details: all Slurm info on a job is now available through slurm_job_details. This is used to determine the job status (a simplified representation of the Slurm job status).
- Replaced the mechanism with which RunRunner Slurm job statuses are detected. This now uses the scontrol show job output instead of file detection, which caused latency issues when the commands were started within a node (e.g. when the mother process was itself a Slurm job); see the scontrol parsing sketch after this list.
- Changed the control flow of SBATCH options given to RunRunner's Slurm queue: if the --array option is given with a different number of parallel jobs, the number of parallel jobs will be replaced.
- LocalJobs previously had no logical flow if Popen threw an exception. This has now been implemented and shows the user the exception message raised by subprocess.Popen.
- The Slurm job .wait() could run into exceptional delays due to the file-checking mechanism when the managing job was being executed on a different node than the actual job. This has been fixed with a new monitoring system that uses Slurm's own mechanisms.
- Fixed a bug in the timing.py function format_seconds() which had cascading effects, causing e.g. job statuses in the LOCAL runner not to be updated when they should be, and getting stuck in an infinite loop for running times above 60 seconds (a sketch of the intended behaviour follows after this list).
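
As a usage illustration of the manual job-details refresh and the load_dependencies flag mentioned above, a minimal sketch is given below. Only get_latest_job_details and load_dependencies are named in this changelog; the import path, the from_file loader and the status attribute are assumptions and may differ from the actual RunRunner API.

```python
# Minimal sketch; names marked as assumed may differ from the real API.
from runrunner.slurm import SlurmRun  # assumed import path

# Assumed loader; load_dependencies=False keeps only the dependency IDs.
run = SlurmRun.from_file("run.json", load_dependencies=False)

# Fetching the status no longer refreshes job details automatically,
# so request the latest details from Slurm explicitly first.
run.get_latest_job_details()
print(run.status)  # assumed attribute holding the simplified run status
```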
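
The single-subprocess status check can be pictured with plain Slurm tooling: squeue accepts a comma-separated job-ID list, so the states of N jobs can be fetched in one call. This is only a sketch of the idea, not RunRunner's internal code.

```python
import subprocess

def fetch_job_states(job_ids: list[str]) -> dict[str, str]:
    """Query the state of several Slurm jobs in a single subprocess call."""
    result = subprocess.run(
        ["squeue", "--jobs", ",".join(job_ids), "--noheader", "--format", "%i %T"],
        capture_output=True, text=True, check=True,
    )
    states = {}
    for line in result.stdout.splitlines():
        job_id, state = line.split(maxsplit=1)
        states[job_id] = state
    return states  # note: finished jobs may no longer appear in squeue output
```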
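
The scontrol-based status detection can be sketched as follows: scontrol show job prints space-separated key=value pairs, which can be collected into a details dictionary (in the spirit of slurm_job_details) and the JobState field mapped to a simplified status. The helper below is an assumption about the approach, not RunRunner's implementation.

```python
import subprocess

def scontrol_job_details(job_id: str) -> dict[str, str]:
    """Parse `scontrol show job <id>` output into a key/value dictionary."""
    output = subprocess.run(
        ["scontrol", "show", "job", job_id],
        capture_output=True, text=True, check=True,
    ).stdout
    details = {}
    # Naive split: values containing spaces (e.g. Command=) are truncated,
    # which is acceptable for this sketch.
    for token in output.split():
        if "=" in token:
            key, _, value = token.partition("=")
            details[key] = value
    return details

# details["JobState"] (e.g. RUNNING, COMPLETED) can then be mapped to a
# simplified job status.
```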
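
For context on the format_seconds() fix, a correct helper of this kind might look like the sketch below; it carries whole hours and minutes over with divmod instead of looping, and limits the seconds to two decimals. This illustrates the intended behaviour, not the actual code in timing.py.

```python
def format_seconds(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS.ss (at most 2 decimals)."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{int(hours)}:{int(minutes):02d}:{secs:05.2f}"

# format_seconds(125.5) -> "0:02:05.50"
```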