I have a management node with flux-security, flux-core (--with-flux-security), and flux-sched installed. I have a compute node with flux-security and flux-core (--with-flux-security) installed. The management node exports /usr/local/etc/flux and the compute node mounts it.
Most commands work as expected. When, however, I do […]
but, if I do […]
As might be expected […]
While the FLUX_MODULE_PATH directory on the management node contains the shared lib for […]
In what seems like a hack to get things working, I have exported the […]
I'm no doubt missing something simple, but I would like to get a correct config, per LLNL operations, nailed down before I make my work on reproducibly deploying Flux available to my colleagues.
Replies: 2 comments 5 replies
alloc and batch start new Flux instances, so they need a scheduler on whichever node becomes their rank 0 (not necessarily the management node). Because you're sharing the etc directory, which is where fluxion installs its rc scripts, a new instance tries to run fluxion's rc script to load the fluxion modules, and on a node without fluxion installed it doesn't find them.

My first thought would be to either install fluxion everywhere or not share the entire etc directory (maybe do a selective rsync or use some sort of config management?).
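
The "don't share the entire etc directory" option could look something like the Python sketch below, which copies the tree while skipping files that match a pattern. This is only an illustration, not an established Flux procedure: the function name `copy_flux_etc` and the default pattern `*fluxion*` are assumptions, so check which files flux-sched actually installed under etc/flux on your system before choosing patterns.

```python
import shutil
from pathlib import Path


def copy_flux_etc(src, dst, exclude=("*fluxion*",)):
    """Copy the tree at src to dst, skipping entries that match exclude.

    The default pattern is an assumption about how fluxion names its
    rc scripts; adjust it to match the files present on your system.
    """
    src, dst = Path(src), Path(dst)
    # ignore_patterns builds a callable that filters names in every directory
    shutil.copytree(
        src,
        dst,
        ignore=shutil.ignore_patterns(*exclude),
        dirs_exist_ok=True,  # allow re-running against an existing copy (Python 3.8+)
    )
    # Return the relative paths of all files now present under dst
    return sorted(str(p.relative_to(dst)) for p in dst.rglob("*") if p.is_file())
```

For example, `copy_flux_etc("/usr/local/etc/flux", "/srv/export/flux-etc")` would populate a separate export directory (the destination path here is hypothetical) that the compute nodes mount instead of the management node's full etc/flux.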
You should set the setuid bit on /usr/local/libexec/flux/flux-imp, not on flux-shell
(a setuid flux-shell will make all your job tasks run as root).
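
One way to confirm the permissions on each node is a small check like the Python sketch below. It uses only the standard library; the helper names `setuid_status` and `check_flux_setuid` are made up for illustration, and the requirement that flux-imp be owned by root is an assumption that follows from the intent of making it setuid.

```python
import os
import stat


def setuid_status(path):
    """Return (is_setuid, owner_uid) for the file at path."""
    st = os.stat(path)
    return bool(st.st_mode & stat.S_ISUID), st.st_uid


def check_flux_setuid(imp_path, shell_path):
    """Return a list of problems with the setuid setup of the two binaries."""
    problems = []
    imp_setuid, imp_uid = setuid_status(imp_path)
    # flux-imp is the binary meant to carry the setuid bit (assumed root-owned)
    if not (imp_setuid and imp_uid == 0):
        problems.append(f"{imp_path} should be setuid and owned by root")
    shell_setuid, _ = setuid_status(shell_path)
    # a setuid flux-shell would run job tasks with the owner's privileges
    if shell_setuid:
        problems.append(f"{shell_path} must not be setuid")
    return problems
```

Calling `check_flux_setuid("/usr/local/libexec/flux/flux-imp", "/usr/local/libexec/flux/flux-shell")` returns an empty list when the setup matches the advice above. The flux-shell path is assumed to sit next to flux-imp; adjust it if your install prefix differs.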
On Thu, Jul 7, 2022 at 3:40 PM wkharold wrote:
Actually, it doesn't go away with flux-shell setuid everywhere; instead I get a bunch of
0.046s: flux-imp[2]: stderr: flux-shell: ERROR: output: shell_output_write: Operation not permitted
0.046s: flux-imp[3]: stderr: flux-shell: ERROR: output: shell_output_write: Operation not permitted
0.046s: flux-imp[2]: stderr: flux-shell: ERROR: output: shell_output_write: Operation not permitted