
[Merged by Bors] - feat: hare preround proposal compaction #6129

Closed · wants to merge 19 commits into develop from hare-compact-encoding-3

Conversation


@acud acud commented Jul 11, 2024

Motivation

Current hare prerounds use 20-byte blake3 hashes of the proposals. This PR reduces the message size by adopting the changes described in #5606, with further discussion in the research forum. For a sense of scale: with, say, 50 proposals in a preround message, 20-byte hashes account for 1,000 bytes of payload, while 4-byte compact IDs would cut that to 200.

Description

Adopts the changes by adding a field to the Value type that encodes the shorter IDs (the first few bytes of the VRF signature for the block proposal). This field is used only in the preround; a rough sketch of the shape follows the issue references below.

Closes #5606
Supersedes #6060
Ref #5256 #4765
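
For illustration, a minimal sketch of the shape this could take (type and field names are assumptions for exposition, not the PR's actual definitions):

```go
package hare4

import "github.com/spacemeshos/go-spacemesh/common/types"

// CompactProposalID is a truncated identifier taken from the first bytes of
// the proposal's VRF signature (2-4 bytes are discussed in this PR).
type CompactProposalID [4]byte

// Value carries the proposal set in hare messages. The compact field would
// only be populated in the preround; later rounds keep full proposal hashes.
type Value struct {
	Proposals        []types.ProposalID  // full 20-byte hashes
	CompactProposals []CompactProposalID // preround only (new in this PR)
}
```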

Test Plan

  • unit tests
  • running it on a devnet that forks to the new protocol version at a given epoch

TODO

  • unit tests
  • change to 3- or 4-byte encoding to prevent collisions (see the back-of-the-envelope sketch after this list)
  • error handling in reconstructProposals could probably be improved to fall back to a full proposal-list exchange instead of returning the error
  • add cmd configuration (for switchover epoch)
  • Explain motivation or link existing issue(s)
  • Test changes and document test plan
  • do we need to increase the preround length?
  • handle proposal with empty eligibility array
  • can we raise the committee size again?
  • Update documentation as needed
  • Update changelog as needed
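
On the collision item above, a back-of-the-envelope birthday-bound check (my own estimate, assuming on the order of 50 proposals per preround, a number not taken from the PR):

```go
package main

import (
	"fmt"
	"math"
)

// collisionProb approximates the chance that any two of n uniformly random
// IDs of idBytes bytes collide: 1 - exp(-n(n-1) / (2 * 2^(8*idBytes))).
func collisionProb(n, idBytes int) float64 {
	space := math.Pow(2, float64(8*idBytes))
	return 1 - math.Exp(-float64(n)*float64(n-1)/(2*space))
}

func main() {
	for _, b := range []int{2, 3, 4} {
		// prints ≈1.9e-02 for 2 bytes, ≈7.3e-05 for 3, ≈2.9e-07 for 4
		fmt.Printf("%d-byte IDs, 50 proposals: p ≈ %.1e\n", b, collisionProb(50, b))
	}
}
```

Two bytes collide in roughly 2% of prerounds under this assumption, which is why the list moves to 3 or 4 bytes.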

@acud acud marked this pull request as draft July 11, 2024 19:18
@acud acud self-assigned this Jul 11, 2024
@acud acud force-pushed the hare-compact-encoding-3 branch from 9fc1d38 to 47c13c6 on July 11, 2024 22:01

codecov bot commented Jul 11, 2024

Codecov Report

Attention: Patch coverage is 82.13783% with 254 lines in your changes missing coverage. Please review.

Project coverage is 81.9%. Comparing base (13dfb49) to head (59fe1b1).
Report is 6 commits behind head on develop.

Files Patch % Lines
hare4/hare.go 80.4% 76 Missing and 29 partials ⚠️
hare4/eligibility/oracle.go 76.0% 56 Missing and 14 partials ⚠️
node/node.go 48.6% 36 Missing and 2 partials ⚠️
hare4/compat/weakcoin.go 0.0% 14 Missing ⚠️
hare4/malfeasance.go 80.4% 6 Missing and 2 partials ⚠️
hare4/legacy_oracle.go 76.6% 4 Missing and 3 partials ⚠️
hare4/tracer.go 0.0% 7 Missing ⚠️
hare4/types.go 94.0% 3 Missing and 1 partial ⚠️
p2p/server/server.go 50.0% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##           develop   #6129     +/-   ##
=========================================
  Coverage     81.9%   81.9%             
=========================================
  Files          301     308      +7     
  Lines        32406   33807   +1401     
=========================================
+ Hits         26548   27706   +1158     
- Misses        4135    4327    +192     
- Partials      1723    1774     +51     


@acud acud force-pushed the hare-compact-encoding-3 branch 6 times, most recently from 43754c2 to 0977a02 on July 17, 2024 16:37
@acud acud force-pushed the hare-compact-encoding-3 branch 2 times, most recently from d849e25 to 8fb21d1 on July 17, 2024 17:57
@acud acud force-pushed the hare-compact-encoding-3 branch from 8fb21d1 to 3ad6a2d on July 17, 2024 18:08
@acud acud marked this pull request as ready for review July 17, 2024 18:48
In hare4/hare.go (outdated diff):
return fmt.Errorf("message %s: cache miss", compactProps.MsgId)
}
resp := &CompactIdResponse{Ids: m.Body.Value.Proposals}
respBytes := codec.MustEncode(resp)
Contributor:
As above, can't we just encode resp into the stream instead of using an intermediate buffer?

Contributor Author (acud):
To do this, one must know the length the type will encode to without actually encoding it (since you must encode the number of bytes in the response before writing them). While that's possible, it's probably something the scale package should offer rather than hand-writing it, which would be fragile. Any ideas on how to do this?

Contributor:
Why do you need to encode the length first?

Contributor Author:
Because you need to know how many bytes to read out of the stream.

Contributor:
AFAIR, the scale decoding would fail if the response had the wrong length anyway.

Contributor:
Well, while we could do something like this in go-scale, it is not really fragile if the data is an array of fixed-size IDs.
Please check how it's done in the fetch package, e.g. the epoch stream handler:

if err := h.streamIDs(ctx, s, func(cbk retrieveCallback) error {

func (h *handler) streamIDs(ctx context.Context, s io.ReadWriter, retrieve retrieveFunc) error {
	started := false
	if err := retrieve(func(total int, id []byte) error {
		if !started {
			started = true
			respSize := scale.LenSize(uint32(total)) + uint32(total*len(id))
			if _, err := codec.EncodeLen(s, respSize); err != nil {
				return err
			}
			if _, err := codec.EncodeLen(s, uint32(total)); err != nil {
				return err
			}
		}
		if _, err := s.Write(id[:]); err != nil {
			return err
		}
		return nil
	}); err != nil {
		if !started {
			if err := server.WriteErrorResponse(s, err); err != nil {
				h.logger.Debug("failed to write error response", log.ZContext(ctx), zap.Error(err))
			}
		}
		return err
	}
	// If any IDs were sent:
	//   Response.Data has already been sent
	//   Response.Error has length 0
	lens := []uint32{0}
	if !started {
		// If no ATX IDs were sent:
		//   Response.Data is just a single zero byte (length 0),
		//   but the length of Response.Data is 1 so we must send it
		//   Response.Error has length 0
		lens = []uint32{1, 0, 0}
	}
	for _, l := range lens {
		if _, err := codec.EncodeLen(s, l); err != nil {
			return err
		}
	}
	return nil
}

And the client part:
return readIDSlice(s, &ed.AtxIDs, maxEpochDataAtxIDs)

func readIDSlice[V any, H scale.DecodablePtr[V]](r io.Reader, slice *[]V, limit uint32) (int, error) {
	return server.ReadResponse(r, func(respLen uint32) (int, error) {
		d := scale.NewDecoder(r)
		length, total, err := scale.DecodeLen(d, limit)
		if err != nil {
			return total, err
		}
		if int(length*types.Hash32Length)+total != int(respLen) {
			return total, errors.New("bad slice length")
		}
		*slice = make([]V, length)
		for i := uint32(0); i < length; i++ {
			n, err := H(&(*slice)[i]).DecodeScale(d)
			total += n
			if err != nil {
				return total, err
			}
		}
		return total, err
	})
}

The client part is not perfect because we could handle each ID right away instead of waiting for the whole slice to arrive, but that would require further refactoring of the fetcher code; not sure whether it is applicable here.
Nevertheless, the server side can be improved here, I think.

Maybe we could move some helpers to the codec package.
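
For instance, a hypothetical helper (name and signature are my suggestion, not existing codec API) could wrap the length-prefix bookkeeping for fixed-size IDs:

```go
package codecsketch

import (
	"fmt"
	"io"

	"github.com/spacemeshos/go-scale"
	"github.com/spacemeshos/go-spacemesh/codec"
)

// EncodeIDSlice streams a slice of fixed-size IDs with the total response
// size and the element count written up front, mirroring streamIDs above.
func EncodeIDSlice(w io.Writer, ids [][]byte, idLen int) error {
	count := uint32(len(ids))
	respSize := scale.LenSize(count) + count*uint32(idLen)
	if _, err := codec.EncodeLen(w, respSize); err != nil {
		return err
	}
	if _, err := codec.EncodeLen(w, count); err != nil {
		return err
	}
	for _, id := range ids {
		if len(id) != idLen {
			return fmt.Errorf("unexpected id length %d (want %d)", len(id), idLen)
		}
		if _, err := w.Write(id); err != nil {
			return err
		}
	}
	return nil
}
```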

Contributor Author:
> AFAIR, the scale decoding would fail if the response had the wrong length anyway.

So the reason it is done this way is the following:
When you read a response from a peer, you want to avoid reading the stream until EOF. That is generally bad because you risk a malicious peer simply feeding you data that is all read into memory, sending the node's memory usage through the roof.
That's why you want to know in advance how long the data is.
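
A minimal sketch of that defensive pattern (illustrative names and wire format; the PR itself relies on the scale codec rather than a raw 4-byte prefix):

```go
package p2psketch

import (
	"encoding/binary"
	"fmt"
	"io"
)

// readBounded reads a declared length first and refuses oversized responses,
// so a malicious peer can never force an unbounded read-to-EOF allocation.
func readBounded(r io.Reader, maxSize uint32) ([]byte, error) {
	var lenBuf [4]byte
	if _, err := io.ReadFull(r, lenBuf[:]); err != nil {
		return nil, err
	}
	n := binary.LittleEndian.Uint32(lenBuf[:])
	if n > maxSize {
		return nil, fmt.Errorf("declared size %d exceeds limit %d", n, maxSize)
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}
```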

acud (Contributor Author) commented Aug 1, 2024:
@ivan4th the server.ReadResponse part implicitly reads the respLen, which encodes the whole message size, in advance: https://github.com/spacemeshos/go-spacemesh/blob/develop/p2p/server/server.go#L511

Which is exactly what I'm doing here, except I don't want to use the leaky server abstraction because it does a bunch of things I'm not interested in.

Resolved review threads: config/presets/testnet.go, hare3/compat/weakcoin.go, hare4/legacy_oracle.go, hare4/types.go, hare3/hare.go, node/node.go, hare4/hare.go (four threads).
Comment on lines 1292 to 1300
	calls := [3]int{}
	for i, n := range cluster.nodes {
		n.mverifier.EXPECT().Verify(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()).
			DoAndReturn(func(_ signing.Domain, _ types.NodeID, _ []byte, _ types.EdSignature) bool {
				calls[i] = calls[i] + 1
				// return false on the first call, true on all subsequent calls
				return !(calls[i] == 1)
			}).AnyTimes()
	}
Contributor:
How about creating proposals with invalid signatures instead of mocking the verifier?

If mocking is easier/better, consider rewriting this as:

	for _, n := range cluster.nodes {
		gomock.InOrder(
			n.mverifier.EXPECT().
				Verify(signing.PROPOSAL, gomock.Any(), gomock.Any(), gomock.Any()).
				Return(false).
				MaxTimes(1),
			n.mverifier.EXPECT().
				Verify(signing.PROPOSAL, gomock.Any(), gomock.Any(), gomock.Any()).
				Return(true).
				AnyTimes(),
		)
	}

Contributor Author:
This is a bit more complicated: the mocked Verify fails not because the proposal signatures are incorrect, but because the node receiving the proposal matches the wrong proposal due to a local compact-ID collision, and then cannot verify the signature.
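
Roughly the situation being simulated (all names hypothetical, purely to illustrate the flow):

```go
package hare4sketch

// compactID is a truncated VRF prefix; two distinct proposals can share one.
type compactID [4]byte

// match looks up the full proposal stored under a prefix. On a collision it
// may return the *other* proposal, so verifying the sender's signature over
// it fails even though the sender was honest -- the flow the mock models.
func match(index map[compactID][]byte, id compactID) ([]byte, bool) {
	full, ok := index[id]
	return full, ok
}
```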

Contributor:
OK, I understand. Why not generate colliding proposals in a way that would trigger the desired flow? I generally prefer to simulate real flow (by creating realistic conditions that trigger the tested behavior) rather than hacking the insides of the implementation. Such tests are easier to understand and maintain.

Contributor Author:
Because those things require mining the actual data and building external tools, and the tests would start breaking whenever we change small details inside the actual domain objects. Pluggable implementations are standard practice, and personally I'd rather go that way.

acud commented Jul 23, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jul 24, 2024
@spacemesh-bors bot:

try

Build failed.

acud commented Jul 24, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Jul 24, 2024
@spacemesh-bors bot:

try

Build succeeded.

acud commented Jul 29, 2024

@poszu can I have another review please?

Resolved review threads: hare4/types.go, hare4/hare.go.
@acud acud force-pushed the hare-compact-encoding-3 branch from 6d3ef2e to f75c44c on July 31, 2024 23:58
Comment on lines +502 to +506
	if !cl.collidingProposals {
		// if we want non-colliding proposals we copy from the rng
		// otherwise it is kept as an array of zeroes
		cl.t.rng.Read(vrf[:])
	}
Contributor:
How about just overwriting the first bytes?

Suggested change:

-	if !cl.collidingProposals {
-		// if we want non-colliding proposals we copy from the rng
-		// otherwise it is kept as an array of zeroes
-		cl.t.rng.Read(vrf[:])
-	}
+	cl.t.rng.Read(vrf[:])
+	if cl.collidingProposals {
+		copy(vrf[:4], []byte("1234"))
+	}

acud and others added 4 commits August 1, 2024 07:54
Co-authored-by: Bartosz Różański <bartek.roza@gmail.com>
acud commented Aug 1, 2024

bors try

spacemesh-bors bot added a commit that referenced this pull request Aug 1, 2024
spacemesh-bors bot commented Aug 1, 2024

try

Build failed.

acud commented Aug 1, 2024

bors merge

spacemesh-bors bot pushed a commit that referenced this pull request Aug 1, 2024
## Motivation

Current `hare` prerounds use 20-byte `blake3` hashes of the proposals. This PR aims to reduce the message size by adopting the changes described in #5606 with further [discussions](https://community.spacemesh.io/t/compact-encoding-for-hare/427) in the research forum
spacemesh-bors bot commented Aug 1, 2024

Pull request successfully merged into develop.

Build succeeded.

@spacemesh-bors spacemesh-bors bot changed the title feat: hare preround proposal compaction [Merged by Bors] - feat: hare preround proposal compaction Aug 1, 2024
@spacemesh-bors spacemesh-bors bot closed this Aug 1, 2024
@spacemesh-bors spacemesh-bors bot deleted the hare-compact-encoding-3 branch August 1, 2024 21:23