Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP amplify/debug testQualifiedUsers #19294

Closed
wants to merge 7 commits into from

Conversation

martinpitt
Copy link
Member

Trying to debug this common flake. I don't get it locally, and there's nothing useful in the journal.

@martinpitt martinpitt added the no-test For doc/workflow changes, or experiments which don't need a full CI run, label Sep 7, 2023
@martinpitt martinpitt marked this pull request as draft September 7, 2023 14:07
@martinpitt
Copy link
Member Author

martinpitt commented Sep 7, 2023

Interesting - - so there is no ipaclient-uninstall.log at all. So this smells like the "closed connection" is from realmd when cockpit tries to call .leave().

At first sight this smells like a race condition with an idle timer. But I sit()ed right before clicking the Leave button, and realmd.service still stays alive after several minutes. And neither does the journal show that the service stopped. Also, if I stop realmd.service manually and then continue the test, it fails differently:

warning: Failed to leave domain: No such interface “org.freedesktop.realmd.Realm” on object at path /org/freedesktop/realmd/Sssd/cockpit_lan_2

As the next step I'll enable D-Bus debugging. But that log shows no realmd related D-Bus methods at all, just a lot of packagekit and systemd noise. I'll disable preloads and enable debug messages earlier.

But with that it's no good -- I retried the test about 7 times, and it always succeeds. Heisenbug/timing change.. So I reenabled preloads again, that reproduces it again

Next step, I added some console.logs() to trace what happens at leaving the domain, and to inspect the proxy state and close signals. This shows that it's really the D-Bus Deconfigure() call, not the cleanupWSCredentials(). But the D-Bus debug log does not actually show a Deconfigure call.

Next step, COCKPIT_DEBUG=all ; failed tests, and journal. But that doesn't reveal anything new -- the last D-Bus message is the add_match from getting the Kerberos interface.

There is no disconnected log message anywhere, but there are a few dbus close: console debug messages -- let's investigate them. The run shows no D-Bus debug messages for the realm leave, and no related close messages. Just at the very beginning there is

> wait: ph_is_present("#system_information_domain_button:not([disabled]):not([aria-disabled=true])")    
> debug: dbus close(): org.freedesktop.realmd  have channel: true     
> debug: dbus on_close: org.freedesktop.realmd {"command":"close","channel":"1:2!7"}    
> debug: dbus close_perform: org.freedesktop.realmd {"command":"close","channel":"1:2!7"}    
> debug: dbus open:    
> debug: dbus: {"watch":{"path":"/org/freedesktop/realmd"},"id":"1"}    
> debug: dbus: {"add-match":{"interface":"org.freedesktop.realmd.Realm","path_namespace":"/org/freedesktop/realmd"}}    
> debug: dbus: {"watch":{"interface":"org.freedesktop.realmd.Realm","path_namespace":"/org/freedesktop/realmd"},"id":"2"}    

which I think is just for the overview page to show/check for the current realm. But that happens waay earlier than opening the leave dialog.

@martinpitt martinpitt force-pushed the leave-domain branch 9 times, most recently from be18238 to bc3ed48 Compare September 13, 2023 04:42
@mvollmer
Copy link
Member

I'll take over for a while.

First let's disable the D-Bus logging again and start tracing backwards from the dialog error. :)

martinpitt and others added 3 commits September 14, 2023 14:09
- Show the D-Bus client name, as we usually have lots of D-Bus clients,
  and it's otherwise impossible to know to which one the message
  applies.
- Stringify the JSON objects, so that they get printed properly in our
  test logs.
- Log calls to .close(), and log rejected outstanding calls.
@mvollmer mvollmer force-pushed the leave-domain branch 4 times, most recently from e247263 to 8a35b4c Compare September 14, 2023 12:29
The client recreates all proxies when the superuser thing emits a
"changed" event. When this happens while the Leave dialog is open, the
"realm" proxy in the Leave dialog becomes unusable.
@mvollmer
Copy link
Member

mvollmer commented Sep 14, 2023

Ok, the issue is a race between the Superuser level, RealmdClient, and LeaveDialog. RealmdClient listens to superuser level changes and will call its own initProxy method twice during every login, once when constructed, and once when superuser.allowed changes from null to true or false. If a LeaveDialog is open when initProxy is called, the internal state of the dialog will become invalid. It will call Deconfigure on a DBusProxy that has been closed by initProxy.

The LeaveDialog could do useEvent(realmd_client, "changed") but that would just expose another race: There is a time where initProxy has already closed the old proxies but the new proxies have not been installed and have not been announced with a "changed" signal. (The amplified tests here hit that pretty reliably.) That could be fixed by closing the old proxies only once the new ones are available.

Another fix is to only call initProxy when superuser.allowed is not null. This is the usual pattern to avoid initializing things twice during page load that depend on the superuser level. I make a PR for that.

@mvollmer
Copy link
Member

. But that log shows no realmd related D-Bus methods at all

This is the truth: the Deconfigure method call was never made since the channel was closed when the code tried to make that call. This was super helpful information, thanks!

@mvollmer
Copy link
Member

> debug: dbus close(): org.freedesktop.realmd have channel: true

This is initProxy in realmd.jsx closing the old DBusClient before opening a new one. The LeaveDialog was already open at that time but didn't update itself with the new proxies.

@martinpitt martinpitt closed this Sep 15, 2023
@martinpitt martinpitt deleted the leave-domain branch September 15, 2023 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-test For doc/workflow changes, or experiments which don't need a full CI run,
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants