Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeouts to refresh_health in VC #6231

Open
michaelsproul opened this issue Aug 6, 2024 · 1 comment
Open

Add timeouts to refresh_health in VC #6231

michaelsproul opened this issue Aug 6, 2024 · 1 comment
Labels
optimization Something to make Lighthouse run more efficiently. val-client Relates to the validator client binary

Comments

@michaelsproul
Copy link
Member

Description

While working on:

I noticed that our implementation of refresh_status could be optimised in several ways:

/// Perform some queries against the node to determine if it is a good candidate, updating
/// `self.status` and returning that result.
pub async fn refresh_status<T: SlotClock>(
&self,
slot_clock: Option<&T>,
spec: &ChainSpec,
log: &Logger,
) -> Result<(), CandidateError> {
let previous_status = self.status(RequireSynced::Yes).await;
let was_offline = matches!(previous_status, Err(CandidateError::Offline));
let new_status = if let Err(e) = self.is_online(was_offline, log).await {
Err(e)
} else if let Err(e) = self.is_compatible(spec, log).await {
Err(e)
} else if let Err(e) = self.is_synced(slot_clock, log).await {
Err(e)
} else {
Ok(())
};
// In case of concurrent use, the latest value will always be used. It's possible that a
// long time out might over-ride a recent successful response, leading to a falsely-offline
// status. I deem this edge-case acceptable in return for the concurrency benefits of not
// holding a write-lock whilst we check the online status of the node.
*self.status.write().await = new_status;
new_status
}

The issues are:

  • There's no timeout for any of the HTTP methods we call, so this function could run indefinitely in case of a very unresponsive BN.
  • We make 3 calls when 2 would probably be sufficient. We could use the response from is_synced to infer online status, rather than making a separate request for the version.

Version

Lighthouse v5.3.0

@michaelsproul michaelsproul added val-client Relates to the validator client binary optimization Something to make Lighthouse run more efficiently. labels Aug 6, 2024
@macladson
Copy link
Member

macladson commented Oct 26, 2024

Just want to point out with the merge of #4393:

  • This function is now called refresh_health.
  • Issue 2 is resolved, it now only makes 2 calls to the BN.
  • Issue 1 is still present.

@macladson macladson changed the title Optimise refresh_status in VC Add timeouts to refresh_health in VC Oct 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization Something to make Lighthouse run more efficiently. val-client Relates to the validator client binary
Projects
None yet
Development

No branches or pull requests

2 participants