validator: listen and get metadata only for committee validators #1787

Open: wants to merge 11 commits into base: stage

Conversation

Contributor

@nkryuchkov nkryuchkov commented Oct 10, 2024

Closes #1703

Contributor

@y0sher y0sher left a comment

I know this PR is still immature. Quoting the issue here to clarify my original intentions:

- Consider pulling metadata from the beacon node only for validators that are part of your listened subnets.
- Reject/ignore all messages that do not belong to your subnets' validators (this might happen automatically, since you won't have their status and duties).

So about the first point: I see you started it, but you're using the ValidatorSubnet calculation, which is the pre-Alan way to find your subnets. We now use CommitteeSubnet and have a different way of dividing our validators into subnets (according to committees).

About the second point: I don't see that you did anything about message validation, though if we don't have any metadata about this operator's validators then we might just naturally reject/ignore it? Please check what happens in this case.

@nkryuchkov
Contributor Author

nkryuchkov commented Oct 16, 2024

So about the first point: I see you started it, but you're using the ValidatorSubnet calculation, which is the pre-Alan way to find your subnets. We now use CommitteeSubnet and have a different way of dividing our validators into subnets (according to committees).

Added post-fork handling, please review again

About the second point: I don't see that you did anything about message validation, though if we don't have any metadata about this operator's validators then we might just naturally reject/ignore it? Please check what happens in this case.

In this case the message gets ignored:

	ErrNoShareMetadata                         = Error{text: "share has no metadata"}

Contributor

@y0sher y0sher left a comment

  • Remove the pre-fork code; as I explained, it's not needed anymore
  • To make this PR final we need a POC showing that it works on stage
  • Compare CPU and memory usage
  • Compare how this impacts the load on the beacon node
  • We probably need to ignore messages for which we don't have duties, or whose validator we don't see as active (because they're not ours), instead of rejecting them.
  • @MatheusFranco99 can you chime in in terms of protocol and networking? Other than more well-spread messages, is there any reason why ignoring these messages would impact us badly?

@MatheusFranco99
Contributor

@MatheusFranco99 can you chime in in terms of protocol and networking, other than more well-spread messages, any reason why ignoring these messages will impact us badly?

@y0sher
Do you mean ignoring messages from validators that we think are liquidated, not active, exited, or similar? I think it's not a problem to ignore instead of reject. We also have ignoring protection against an attack, so we should be fine.

But we must not do this if you meant ignoring messages about validators whose committee we are not part of.

@nkryuchkov nkryuchkov marked this pull request as ready for review October 16, 2024 23:42
@nkryuchkov
Contributor Author

  • Remove the pre-fork code; as I explained, it's not needed anymore

done

  • to make this PR final we need to POC that it works on stage

I'm still testing it but so far it seems to be working well

image

  • Compare CPU and memory usage

It was deployed between 23:46 and 00:28 and since 01:13

CPU dropped from ~2 to ~0.4-0.5:

image

Memory might have dropped a little, but the difference is small and it also varies between runs:

image

Receive bandwidth dropped from ~1.3 to ~0.4-0.5, transmit bandwidth just a little bit:

image

Although, for another node the transmit bandwidth difference is significant:

image

  • Compare how this impacts the load on the beacon node

I don't see any difference

CPU:

image

Mem:

image

  • We probably need to ignore messages for which we don't have duties, or whose validator we don't see as active (because they're not ours), instead of rejecting them.

Well, we ignore messages if there's no metadata, so if they're not ours, we should have no metadata for them, hence the message is ignored

@MatheusFranco99
Contributor

Well, we ignore messages if there's no metadata, so if they're not ours, we should have no metadata for them, hence the message is ignored

@nkryuchkov
When you say "if they're not ours", you mean non-SSV validators, right? Or do you mean non-committee validators? Or validators from topics we're not subscribed to? 😅
I think that's the part where I got a bit confused.

Great job!

@nkryuchkov
Contributor Author

@MatheusFranco99 sorry for the confusion, I meant validators from topics we're not subscribed to

Contributor

@MatusKysel MatusKysel left a comment

👍

for _, share := range shares {
	if c.operatorDataStore.GetOperatorID() != 0 && share.BelongsToOperator(c.operatorDataStore.GetOperatorID()) {
		ownShares = append(ownShares, share)
	}
	allPubKeys = append(allPubKeys, share.ValidatorPubKey[:])
	committeeSubnet := byte(commons.CommitteeSubnet(share.CommitteeID()))
	if bytes.Contains(activeSubnets, []byte{committeeSubnet}) {
Contributor

I believe we have SharedSubnets and some other subnet utility functions. It's best to use one function across the code to make changes and bug fixes easier.

Contributor Author

SharedSubnets checks the intersection of two subnet sets, but we have a single value here, so I don't think it's a good fit. However, there was a bug in my implementation: I don't need bytes.Contains, I can just check the index.

Contributor Author

However, I think we need to refactor how we work with subnets in future PRs: create a type for them and add methods.
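
The bug mentioned in this thread can be illustrated with a small sketch. It assumes that the active-subnets value is a byte slice with one 0/1 flag per subnet, indexed by subnet number, which the "just check the index" remark suggests but the thread does not spell out. The names `subnetCount`, `containsCheck`, and `indexCheck` are hypothetical, and the value 128 is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// subnetCount is an assumed value used only for this illustration.
const subnetCount = 128

// containsCheck reproduces the original check: it searches the flag slice for
// a byte whose VALUE equals the subnet number, which is not a membership test.
func containsCheck(activeSubnets []byte, subnet int) bool {
	return bytes.Contains(activeSubnets, []byte{byte(subnet)})
}

// indexCheck reads the flag stored at the subnet's position.
func indexCheck(activeSubnets []byte, subnet int) bool {
	return subnet >= 0 && subnet < len(activeSubnets) && activeSubnets[subnet] != 0
}

func main() {
	active := make([]byte, subnetCount)
	active[5] = 1 // only subnet 5 is active

	fmt.Println(containsCheck(active, 5), indexCheck(active, 5)) // false true
	fmt.Println(containsCheck(active, 1), indexCheck(active, 1)) // true false
	fmt.Println(containsCheck(active, 0), indexCheck(active, 0)) // true false
}
```

With a flag slice, `bytes.Contains` can only ever match subnets 0 and 1, so it both misses active subnets and reports inactive ones.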

@@ -1139,6 +1148,12 @@ func (c *controller) UpdateValidatorMetaDataLoop() {
	if share.Liquidated {
		return true
	}

	committeeSubnet := byte(commons.CommitteeSubnet(share.CommitteeID()))
	if !bytes.Contains(activeSubnets, []byte{committeeSubnet}) {
Contributor

ditto

@@ -1131,6 +1139,7 @@ func (c *controller) UpdateValidatorMetaDataLoop() {
	const batchSize = 512
	var sleep = 2 * time.Second

	activeSubnets := c.network.ActiveSubnets()
Contributor

@moshe-blox moshe-blox Oct 21, 2024

For performance reasons, because this would be checked >44k times on Mainnet, we should change it to return [SubnetCount]bool or convert into it here; then we can just check the array at the subnet index instead of using bytes.Contains below.

Contributor Author

@moshe-blox There was a bug in my implementation: I don't actually need bytes.Contains, I can just check the index. I don't think we need an array in this PR just for this method, but we do need to refactor subnets to have an array as the underlying type.
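
The array-based lookup suggested in this thread could look roughly like the sketch below. It is not code from the PR: `SubnetCount` is assumed to be 128, and `toLookup`, `share`, and `filterByActiveSubnets` are hypothetical names. The input is assumed to be a slice of 0/1 flags indexed by subnet.

```go
package main

import "fmt"

// SubnetCount is an assumed value; the real constant may differ.
const SubnetCount = 128

// toLookup converts a slice of 0/1 flags into a fixed-size array once, so the
// per-share check in the loop is a plain array read.
func toLookup(activeSubnets []byte) [SubnetCount]bool {
	var lookup [SubnetCount]bool
	for i, flag := range activeSubnets {
		if i >= SubnetCount {
			break // ignore any flags beyond the known subnet range
		}
		lookup[i] = flag != 0
	}
	return lookup
}

// share is a hypothetical, trimmed-down record: a public key and the subnet
// its committee maps to.
type share struct {
	PubKey string
	Subnet int
}

// filterByActiveSubnets keeps only the public keys whose subnet is active.
func filterByActiveSubnets(shares []share, lookup *[SubnetCount]bool) []string {
	var pubKeys []string
	for _, s := range shares {
		if s.Subnet >= 0 && s.Subnet < SubnetCount && lookup[s.Subnet] {
			pubKeys = append(pubKeys, s.PubKey)
		}
	}
	return pubKeys
}

func main() {
	active := make([]byte, SubnetCount)
	active[3] = 1
	lookup := toLookup(active)

	shares := []share{{"a", 3}, {"b", 4}, {"c", 3}}
	fmt.Println(filterByActiveSubnets(shares, &lookup)) // [a c]
}
```

The conversion runs once per loop iteration of the metadata updater, while the lookup runs once per share, which is where the >44k checks mentioned above would land.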
