NHC returns false "OK" when checking for mounted GPFS filesystems #77
Comments
Hey Ryan! Based on what I see here, NHC is reporting -- correctly -- that the filesystem is mounted. :-) As you know, NHC very intentionally does not call
I haven't touched GPFS in years, and we no longer use it at LANL...but I'm open to suggestions! 😀 By any chance have you looked at @treydock's GPFS check in #71? Would something like that help your use case?
I actually don't know that this is specific to GPFS. If anyone has a tip for how to create a stale file handle, I could probably experiment some (I don't know how to do it on purpose for either NFS or GPFS). Personally, I'd rather have NHC hang and report the hang than have it report a filesystem that's "technically" mounted when the node is in fact unusable. These failures are very bad because they will drain the entire job queue: every job that runs fails because of a stale file handle on the user filesystem. Would
To be honest, I'm not exactly sure whether this is because GPFS is doing something non-standard, or whether this would happen with any stale remote filesystem type.
It makes the filesystem check pretty unreliable, as a stale mount is one of the more likely things to go wrong. Any advice? This is with NHC 1.4.2, but I suspect this is not version-dependent.
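To make the failure mode concrete, here is a rough Python sketch of the kind of probe being asked for: touch the mount point from a child process and give up after a timeout, so a hung or stale filesystem is reported instead of blocking the checker. This is not how NHC works (NHC is a bash framework and none of this is its API); `probe_path`, the timeout value, and the result strings are all made up for illustration, and the Linux behaviour of returning `ESTALE` for a stale handle is assumed.

```python
import errno
import subprocess
import sys

# Child script: touch the path and exit with the errno on failure.
# Hypothetical helper, not part of NHC.
_CHILD = (
    "import os, sys\n"
    "try:\n"
    "    os.stat(sys.argv[1])\n"
    "    os.listdir(sys.argv[1])\n"
    "except OSError as e:\n"
    "    sys.exit(e.errno if e.errno and 0 < e.errno < 256 else 255)\n"
)


def probe_path(path, timeout=5.0):
    """Return 'ok', 'stale', 'hung', or 'error' for a mount point.

    The stat/listdir runs in a separate process because a call on a
    hung remote filesystem can block uninterruptibly; the parent only
    waits up to `timeout` seconds.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait again: a child stuck in
        # uninterruptible sleep may not exit until the I/O returns.
        proc.kill()
        return "hung"
    if rc == 0:
        return "ok"
    if rc == errno.ESTALE:
        return "stale"
    return "error"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    print(target, probe_path(target))
```

A result of `hung` or `stale` would be the signal to mark the node offline, even though the filesystem still appears in the mount table. The sketch deliberately leaves a timed-out child unreaped; a real check would need to decide how to handle that.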