-
-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: update execution client state change logs #5771
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -428,19 +428,19 @@ export class ExecutionEngineHttp implements IExecutionEngine { | |||||
|
||||||
switch (newState) { | ||||||
case ExecutionEngineState.ONLINE: | ||||||
this.logger.info("ExecutionEngine became online"); | ||||||
this.logger.info("Execution client became online"); | ||||||
break; | ||||||
case ExecutionEngineState.OFFLINE: | ||||||
this.logger.error("ExecutionEngine went offline"); | ||||||
this.logger.error("Execution client went offline"); | ||||||
break; | ||||||
case ExecutionEngineState.SYNCED: | ||||||
this.logger.info("ExecutionEngine is synced"); | ||||||
this.logger.info("Execution client is synced"); | ||||||
break; | ||||||
case ExecutionEngineState.SYNCING: | ||||||
this.logger.info("ExecutionEngine is syncing"); | ||||||
this.logger.warn("Execution client is syncing"); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this should not be a warning. It's a normal scenario that will happen a lot of times and syncing is a normal process not something a user should be worried about. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My assumption here is that this is only printed out if the state changes, therefore this should not happen often during normal operation, if the EL gets into syncing mode there is degraded performance, the node might not be able to perform correct head votes / miss block proposals. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i think warn level is good here because it will only come if execution goes into syncing from synced, which is actually not supposed to happen often once execution syncs up with the CL, and users getting notified that node won't be able to perform its duties till execution is back up There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's the sync stats among the EL nodes, not the with the CL. And EL nodes can get unsynced and synced a lot? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That's an interesting topic, how does this actually work if you specify multiple ELs in I see issues with this
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
At least in the normal case with just one execution client I have not yet seen that the EL is getting back to syncing mode if it was already synced before and has been up 24/7 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
no they can't once synced they should stay synced with the CL head as long as CL doesn't go out of sync and then CL doesn't has to do batch updates to EL (our req/resp is batch processing) to get back to head There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think warn should be better then, let's keep the change There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not really sure about adding this, we are not printing this out when the beacon node itself is syncing, my general feeling about this is that users/operators generally understand what a syncing client implies (I guess it becomes more complicated in case of EL, see my other comment below) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@g11tech on this topic, what does "not fully" mean in that case, I saw users complain about missed head votes which was due to EL lacking behind / unsynced. My understanding is
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if el is not synced with the cl head then except gossip we do not participate in the network at all including attests, proposals aggregates etc. so none of validator duties There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In that case I think it should be fine to just print "Execution client is syncing" as warn. Maybe the scenario where all head votes are missed is related to the CL not being able to keep up with head (but close to it). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not a fan really of long explanations in the logs. I think something more concise would be: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we should keep logs simple and use commonly understood terms. For errors that crash the process or fail a CLI command we can add more details as it is a one time message and we should help the user to fix the issue without having to guess what went wrong or having to ask in discord because the error is unclear.
hmm personally not a fan, would have to add this to offline state change message as well and then add another message to online, synced that indiciates that those are not "on hold" anymore. |
||||||
break; | ||||||
case ExecutionEngineState.AUTH_FAILED: | ||||||
this.logger.error("ExecutionEngine authentication failed"); | ||||||
this.logger.error("Execution client authentication failed"); | ||||||
break; | ||||||
} | ||||||
|
||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note on this log "ExecutionEngine is synced", right now it is printed out any time the beacon node starts because the default state is
online
and then it goes tosynced
state if EL is synced.Therefore, printing "can now participate" is not that accurate in that case. I would suggest we keep the logs simple for now and revisit it if we get user feedback on it or if we notice that there is confusion around what this means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long does it take usually for Lodestar to know whether the EL is actually in a
synced
state or if this is just the defaultonline
state due to beacon startup? If there's some sort of way to delay/timeout this message to ensure it actually issynced
, that might work better?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After first request to EL we know the actual status and print out a status change message
We know the correct state at this point in time since we made an request to EL, "can now participate" is just unnecessary here because beacon node was just starting up and EL was likely always synced.
The message only makes sense to me if EL goes from syncing to synced.