Question: Custom Check, How to exit without any changes, i.e. leave node in current state? #139
I ran a few tests, and it appears that calling […]. I guess putting this particular check at the end of […]
So, to make sure I understand... You want the check to fail if the […] correctly […]

At present, NHC doesn't really have a "soft fail" or a concept of a partially (un)healthy node, and that was by design. You can, however, change existing configuration values from within the code for your check. So if you wanted the check to pass but disallow "undraining" of the node, you could do something like […]

Feel free to share the code in question if that might help clarify what you're shooting for here! 😀
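A minimal sketch of the "pass, but don't undrain" idea as an NHC custom check in bash. Several things here are assumptions, not from the thread: the check name `check_prom_metric`, a metric URL that returns a bare integer (a real Prometheus query returns JSON and would need parsing, e.g. with `jq`), and the choice of `ONLINE_NODE` as the configuration value to override. NHC documents an `ONLINE_NODE` setting naming the helper it runs to mark a node online; the sketch assumes that setting it to `:` (the shell no-op) inside a check stops that helper from running for the rest of the run. Confirm this against your NHC version's README before relying on it. When `curl` fails, the check logs the problem and returns 0 so the node is not drained, while the no-op helper keeps an already-drained node drained.

```bash
#!/bin/bash
# Hypothetical NHC custom check (e.g. a file in /etc/nhc/scripts/).
# ASSUMPTIONS: the check name, the bare-integer metric endpoint, and the
# ONLINE_NODE override are illustrative, not taken from the original code.

# Fallback stubs so the sketch also runs outside NHC. Inside NHC the real
# log and die functions already exist, so these definitions are skipped.
declare -F log >/dev/null || log() { echo "$*" >&2; }
declare -F die >/dev/null || die() { local rc="$1"; shift; echo "ERROR: $*" >&2; return "$rc"; }

# Usage: check_prom_metric <url> <max_value>
check_prom_metric() {
    local url="$1" max="$2" value

    # -f: treat HTTP errors as failures; -sS: quiet, but still print errors;
    # --max-time: keep a hung Prometheus from stalling the whole NHC run.
    if ! value=$(curl -fsS --max-time 10 "$url"); then
        # Metric unavailable: we can't judge the node, so don't drain it...
        log "$FUNCNAME: curl failed for $url; leaving node state unchanged"
        # ...and (ASSUMPTION) turn NHC's mark-online helper into a no-op so
        # an already-drained node is not undrained at the end of this run.
        ONLINE_NODE=":"
        return 0
    fi

    # Anything other than a plain integer is treated like an unreachable metric.
    if [[ ! "$value" =~ ^[0-9]+$ ]]; then
        log "$FUNCNAME: unexpected value '$value' from $url; leaving node state unchanged"
        ONLINE_NODE=":"
        return 0
    fi

    # A real reading that is out of range is a genuine failure: drain the node.
    # 10# forces base 10 so values with leading zeros are not read as octal.
    if (( 10#$value > max )); then
        die 1 "$FUNCNAME: metric value $value exceeds $max"
        return 1
    fi
    return 0
}
```

The check would be registered in `nhc.conf` like any other, e.g. `* || check_prom_metric "http://prometheus.example/metric" 90` (hypothetical URL and threshold). Because each NHC invocation is a fresh shell process, the override only affects the run in which `curl` failed; the next successful run can undrain the node normally.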
This is what I'm after, thanks! Here's the code: https://gitlab.rc.uab.edu/rc/rc-nhc/-/blob/main/uabrc_hw.nhc
Howdy, we have a custom check that retrieves a metric value from Prometheus using `curl`.

Edit: we are using Slurm as our resource manager.
The check works great; however, I need to add code to the check to prevent NHC from changing the state of the node (drained or un-drained) if the `curl` command fails, for example: […]

Is there a way to return from the function so that NHC makes no changes to the node?

- `return 0` indicates no failure and triggers an un-drain if the node is already drained, so I can't use that.
- `return 1` (or any other nonzero value) indicates failure and drains the node.

Thanks,
Mike Hanby
UAB IT Research Computing