not ok - tbon.endpoint cannot be set #6336

Open
grondo opened this issue Oct 1, 2024 · 1 comment

grondo commented Oct 1, 2024

I've been seeing this failure regularly in CI, mainly in the inception builder for some reason:

2024-10-01T20:41:01.3261595Z flux-broker: setattr tbon.endpoint: File exists
2024-10-01T20:41:01.3262060Z 
2024-10-01T20:41:01.3262573Z flux-start: 0: PMI_Abort(): fatal bootstrap error
2024-10-01T20:41:01.3263033Z 
2024-10-01T20:41:01.3268384Z test_must_fail: died by non-SIGTERM signal: flux start -o,-Sbroker.rc1_path=,-Sbroker.rc3_path= -s2 -o,--setattr=tbon.endpoint=ipc:///tmp/customflux /bin/true
2024-10-01T20:41:01.3269615Z 
2024-10-01T20:41:01.3270381Z not ok 18 - tbon.endpoint cannot be set
2024-10-01T20:41:01.3271468Z not ok 18 - tbon.endpoint cannot be set
2024-10-01T20:41:01.3272240Z #	
2024-10-01T20:41:01.3272847Z #		test_must_fail_or_be_terminated flux start ${ARGS} -s2 \
2024-10-01T20:41:01.3273804Z #			-o,--setattr=tbon.endpoint=ipc:///tmp/customflux /bin/true
2024-10-01T20:41:01.3275015Z #	
2024-10-01T20:41:01.3275224Z

garlick commented Oct 1, 2024

Well, this test is racy in the sense that:

  • each broker sends the abort message then exits with a code of 1
  • flux-start receives the abort message(s) and sends SIGKILL to both brokers

So flux-start should either exit with a code of 1 or 137, and both are allowed by test_must_fail_or_be_terminated.

Apparently it's doing neither, and the test output doesn't tell us which signal it was.

I will start a PR that fixes that shell function to show the signal number and see if I can repro in CI in my private fork.
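
For illustration, here is a rough sketch of how a shell helper could report the signal number. It assumes the usual shell convention that a child killed by signal N shows up as exit status 128 + N (so SIGKILL, signal 9, gives 137). The function name `describe_exit_status` is hypothetical; this is not the actual `test_must_fail_or_be_terminated` code or the eventual fix.

```shell
# Hypothetical helper (not part of sharness or flux-core): turn an exit
# status into a readable description, including the signal number when
# the status follows the 128 + N convention.
describe_exit_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "succeeded"
    elif [ "$status" -gt 128 ] && [ "$status" -le 192 ]; then
        # Assumption: statuses in 129..192 mean "killed by signal N".
        # A program can also exit with such a code on its own.
        echo "died by signal $((status - 128))"
    else
        echo "exited with code $status"
    fi
}
```

Called as `describe_exit_status $?` right after the command under test, this would print `died by signal 9` for a status of 137 and `exited with code 1` for a status of 1.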
