Improve Exception Handling in Pre and Post Run Methods SCMSUITE-10131 SO107 #183

Merged
merged 7 commits into from
Dec 11, 2024

Conversation

dormrod
Contributor

@dormrod dormrod commented Dec 9, 2024

Description

Exceptions raised in the prerun and postrun methods of jobs were handled inconsistently, which caused several issues. For example:

  • With a serial runner, a SingleJob would raise the exception in the main thread, killing the script entirely (not just failing the job)
  • With a parallel runner, the exception was raised only on the spawned thread and never surfaced on the main thread, so behaviour differed between the two runners
  • A multi-job whose child job raised an exception would deadlock, because the parent job was never notified that the child had finished

This PR makes this behaviour consistent by catching any exception raised in these job steps, marking the job as complete, and storing the error message.
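The serial/parallel discrepancy described above follows from how Python threads treat uncaught exceptions. A minimal sketch using only the standard library (not the PLAMS runner; `failing_prerun` and `run_in_thread` are hypothetical names for illustration) shows that an exception raised in a thread target is routed to `threading.excepthook` and is not re-raised by `join()`:

```python
import threading


def failing_prerun():
    # Hypothetical stand-in for a prerun method that raises.
    raise RuntimeError("prerun failed")


def run_in_thread(target):
    """Run target in a thread and return the exceptions the thread hook saw."""
    captured = []
    original_hook = threading.excepthook
    # Temporarily record uncaught thread exceptions instead of printing them.
    threading.excepthook = lambda args: captured.append(args.exc_value)
    try:
        thread = threading.Thread(target=target)
        thread.start()
        thread.join()  # Returns normally even though the target raised.
    finally:
        threading.excepthook = original_hook
    return captured
```

Calling `failing_prerun()` directly raises in the caller, whereas `run_in_thread(failing_prerun)` returns normally and the error is only visible through the hook.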

Changes

To achieve this, the following changes were made:

  • The method get_errormsg, which was already implemented on AMSJob, has been added to the Job base class with a basic implementation
  • Add the decorator _fail_on_exception, which catches exceptions raised by the decorated function, marks the job as complete and notifies the parent multi-job
  • Apply this decorator to _prepare, _execute and _finalize
  • Store the exception trace in _error_msg, which get_errormsg can then use
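The changes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: only the names `_fail_on_exception`, `_prepare`, `_error_msg` and `get_errormsg` come from the description, while `status`, `done`, `parent`, `notify` and the toy classes are hypothetical.

```python
import functools
import threading
import traceback


def _fail_on_exception(func):
    """Sketch: turn an exception in a job step into a failed, finished job."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception:
            # Store the trace so get_errormsg can report it later.
            self._error_msg = traceback.format_exc()
            self.status = "failed"  # hypothetical status attribute
            # Mark the job as complete so nothing waits on it forever.
            self.done.set()  # hypothetical completion event
            if self.parent is not None:
                self.parent.notify(self)  # hypothetical parent callback
            return None

    return wrapper


class Job:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.status = "created"
        self.done = threading.Event()
        self._error_msg = None

    def get_errormsg(self):
        return self._error_msg

    def prerun(self):
        pass

    @_fail_on_exception
    def _prepare(self):
        self.prerun()
        self.status = "started"


class MultiJob(Job):
    def __init__(self, name):
        super().__init__(name)
        self.finished_children = []

    def notify(self, child):
        # Record that a child finished, whether it succeeded or failed.
        self.finished_children.append(child.name)


class BrokenJob(Job):
    def prerun(self):
        raise ValueError("bad input in prerun")
```

With these toy classes, calling `_prepare()` on a `BrokenJob` whose parent is a `MultiJob` does not raise; the job ends up failed, its completion event is set, and the parent is told the child has finished, which is the property that avoids the deadlock.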

The following changes were also made:

  • Fix some mypy issues that were flagged on the latest version
  • Convert Job to an actual abstract base class, rather than just raising exceptions when the unimplemented methods are called
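As a rough illustration of the abstract base class conversion, the sketch below uses the standard `abc` module. The method names `get_runscript` and `check` and the `EchoJob` subclass are placeholders chosen for the example and are not claimed to match the PLAMS code:

```python
from abc import ABC, abstractmethod


class Job(ABC):
    """Hypothetical base class; method names are placeholders."""

    @abstractmethod
    def get_runscript(self) -> str:
        """Return the script used to run the job."""

    @abstractmethod
    def check(self) -> bool:
        """Return True if the job finished successfully."""


class EchoJob(Job):
    # A concrete subclass must implement every abstract method.
    def get_runscript(self) -> str:
        return "echo hello"

    def check(self) -> bool:
        return True
```

The practical difference is when the error appears: with an abstract base class, instantiating `Job` directly (or a subclass that misses a method) raises `TypeError` at construction time, instead of failing later when the missing method is first called.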

@dormrod dormrod force-pushed the DavidOrmrodMorley/fix_multijob_locking branch 3 times, most recently from ff7e709 to cf32f9a Compare December 9, 2024 15:26
@dormrod dormrod force-pushed the DavidOrmrodMorley/fix_multijob_locking branch from cf32f9a to a7a376e Compare December 9, 2024 15:31
interfaces/adfsuite/ams.py (resolved)
core/basejob.py (resolved)
mol/molecule.py (resolved)
@dormrod dormrod changed the title from "Improve Exception Handling in Pre and Post Run Methods" to "Improve Exception Handling in Pre and Post Run Methods SCMSUITE-10131 SO107" Dec 9, 2024
@dormrod dormrod requested a review from robertrueger December 9, 2024 16:04
Member

@robertrueger robertrueger left a comment

I did not run tests yet, but code looks good to me. We should think a bit about which exceptions in job execution should be caught, and which ones should not. It seems that we are very liberally catching a lot now ...
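One way to frame the question of which exceptions to catch, as a sketch only (the PR description does not state which exception classes the decorator catches): in Python, `except Exception` catches ordinary errors but lets `KeyboardInterrupt` and `SystemExit` propagate, because those derive from `BaseException` directly. The helper `run_step` and the two step functions below are hypothetical.

```python
import traceback


def run_step(step):
    """Run a step; return the formatted trace on failure, else None."""
    try:
        step()
    except Exception:
        # Ordinary errors are recorded rather than propagated.
        return traceback.format_exc()
    return None


def bad_step():
    raise RuntimeError("step failed")


def interrupted_step():
    # KeyboardInterrupt is not a subclass of Exception, so it is not caught.
    raise KeyboardInterrupt
```

Narrowing further, for example to a specific tuple of exception types, would be a design choice for the maintainers; the sketch only shows the boundary that `except Exception` already draws.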

core/basejob.py (outdated, resolved)
core/basejob.py (resolved)
@dormrod dormrod merged commit d8a857d into trunk Dec 11, 2024
17 checks passed
@dormrod dormrod deleted the DavidOrmrodMorley/fix_multijob_locking branch December 11, 2024 14:28