Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: update execution client state change logs #5771

Merged
merged 2 commits into from
Jul 20, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions packages/beacon-node/src/execution/engine/http.ts
Original file line number Diff line number Diff line change
Expand Up @@ -428,19 +428,19 @@ export class ExecutionEngineHttp implements IExecutionEngine {

switch (newState) {
case ExecutionEngineState.ONLINE:
this.logger.info("ExecutionEngine became online");
this.logger.info("Execution client became online");
break;
case ExecutionEngineState.OFFLINE:
this.logger.error("ExecutionEngine went offline");
this.logger.error("Execution client went offline");
break;
case ExecutionEngineState.SYNCED:
this.logger.info("ExecutionEngine is synced");
this.logger.info("Execution client is synced");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.logger.info("Execution client is synced");
this.logger.info("Execution client is synced and client can now participate fully in the network");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note on this log "ExecutionEngine is synced", right now it is printed out any time the beacon node starts because the default state is online and then it goes to synced state if EL is synced.

Therefore, printing "can now participate" is not that accurate in that case. I would suggest we keep the logs simple for now and revisit it if we get user feedback on it or if we notice that there is confusion around what this means.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does it take usually for Lodestar to know whether the EL is actually in a synced state or if this is just the default online state due to beacon startup? If there's some sort of way to delay/timeout this message to ensure it actually is synced, that might work better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does it take usually for Lodestar to know whether the EL is actually in a synced state

After first request to EL we know the actual status and print out a status change message

ensure it actually is synced

We know the correct state at this point in time since we made an request to EL, "can now participate" is just unnecessary here because beacon node was just starting up and EL was likely always synced.
The message only makes sense to me if EL goes from syncing to synced.

break;
case ExecutionEngineState.SYNCING:
this.logger.info("ExecutionEngine is syncing");
this.logger.warn("Execution client is syncing");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not be a warning. It's a normal scenario that will happen a lot of times and syncing is a normal process not something a user should be worried about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption here is that this is only printed out if the state changes, therefore this should not happen often during normal operation, if the EL gets into syncing mode there is degraded performance, the node might not be able to perform correct head votes / miss block proposals.

Copy link
Contributor

@g11tech g11tech Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think warn level is good here because it will only come if execution goes into syncing from synced, which is actually not supposed to happen often once execution syncs up with the CL, and users getting notified that node won't be able to perform its duties till execution is back up

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the sync stats among the EL nodes, not the with the CL. And EL nodes can get unsynced and synced a lot?

Copy link
Member Author

@nflaig nflaig Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sync stats among the EL nodes

That's an interesting topic, how does this actually work if you specify multiple ELs in --execution.urls.

I see issues with this

  • if one beacon node has multiple ELs only one gets payload and fork choice updates, how are the fallback ELs able to sync? I thought this requires some sort of multiplexer / middleware that sends request to all ELs.
  • if a execution client is connected to multiple beacon nodes it causes frequent reorgs as it might received different fork choice updates

Copy link
Member Author

@nflaig nflaig Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EL nodes can get unsynced and synced a lot

At least in the normal case with just one execution client I have not yet seen that the EL is getting back to syncing mode if it was already synced before and has been up 24/7

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the sync stats among the EL nodes, not the with the CL. And EL nodes can get unsynced and synced a lot?

no they can't once synced they should stay synced with the CL head as long as CL doesn't go out of sync and then CL doesn't has to do batch updates to EL (our req/resp is batch processing) to get back to head

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think warn should be better then, let's keep the change

Copy link
Contributor

@g11tech g11tech Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.logger.warn("Execution client is syncing");
this.logger.warn("Execution client is syncing hence client would not be able to fully participate in the network till its synced");

Copy link
Member Author

@nflaig nflaig Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure about adding this, we are not printing this out when the beacon node itself is syncing, my general feeling about this is that users/operators generally understand what a syncing client implies (I guess it becomes more complicated in case of EL, see my other comment below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hence client would not be able to fully participate in the network till its synced

@g11tech on this topic, what does "not fully" mean in that case, I saw users complain about missed head votes which was due to EL lacking behind / unsynced.

My understanding is

  1. if EL is really far behind --> missed attestations / proposals
  2. if EL is close to being synced (~ 1-2 epochs behind) --> incorrect / missing head votes, att target / source votes still work, missed proposals
  3. if EL is slow to process blocks (can't quite keep up), same as degraded performance as point 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if el is not synced with the cl head then except gossip we do not participate in the network at all including attests, proposals aggregates etc. so none of validator duties

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not participate in the network at all

In that case I think it should be fine to just print "Execution client is syncing" as warn. Maybe the scenario where all head votes are missed is related to the CL not being able to keep up with head (but close to it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan really of long explanations in the logs. I think something more concise would be: Execution client is syncing. Validator duties on hold.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan really of long explanations in the logs

Yeah, I think we should keep logs simple and use commonly understood terms. For errors that crash the process or fail a CLI command we can add more details as it is a one time message and we should help the user to fix the issue without having to guess what went wrong or having to ask in discord because the error is unclear.

Validator duties on hold

hmm personally not a fan, would have to add this to offline state change message as well and then add another message to online, synced that indiciates that those are not "on hold" anymore.

break;
case ExecutionEngineState.AUTH_FAILED:
this.logger.error("ExecutionEngine authentication failed");
this.logger.error("Execution client authentication failed");
break;
}

Expand Down
Loading