Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync2: ATX integration #6448
base: develop
Are you sure you want to change the base?
sync2: ATX integration #6448
Changes from 26 commits
ca2d5c6
6e3aa16
4828b67
eff6963
4984889
a89964a
bb31cc6
3e5a401
2753560
f5cae06
9e585e5
042d0d7
294bc6c
0914f1f
1ff57ab
ce73f54
f411f7e
4ac694b
bb7226f
50afbf8
284f836
a8d87f1
24768fb
bb54fd8
244ceb1
9092fe7
2987bbc
7dab83a
858787a
f869ded
ad4924b
8a1e693
c22bd45
6e40bda
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Check warning on line 1031 in fetch/fetch.go
Codecov / codecov/patch
fetch/fetch.go#L1030-L1031
Check warning on line 1035 in fetch/fetch.go
Codecov / codecov/patch
fetch/fetch.go#L1034-L1035
Check warning on line 104 in fetch/mesh_data.go
Codecov / codecov/patch
fetch/mesh_data.go#L100-L104
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems very inefficient calling possibly millions of callbacks to communicate the same error where the caller side probably doesn't even care about any error besides the first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's rather likely to be a small subset of ATX IDs e.g. not downloaded due to
hs/1
request throttling, and these requests will be retried laterThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point still stands -
getHashes
might be called with 5.0 Mio hashes at once - if the limit cannot be acquired because throttling is active then millions of callbacks are called.As far as I can see this only affects this section of the code:
go-spacemesh/sync2/atxs.go
Lines 110 to 129 in ad4924b
This will
print a debug log with the exact same error for every ATX andincrement every element incs.state
. This could be handled much simpler (and arguably more efficiently) without requiring to keep track of the retries of every single hash:Arguably this is out of the scope of this PR but this should be addressed. It makes no sense to register a hash for a peer then requesting that hash in a batch and let the fetcher again reconstruct from which peer to fetch that hash from. Error handling is also bad, because for every fetched hash an error has to be returned via callback or aggregated in a
&fetcher.BatchError{}
. Instead imo it would be much simpler to just have a (blocking)getHash
method that fetches a hash from a given peer and returns an error if something went wrong. Then the caller can easily parallize requests and match errors to peers & hashes.Internally the fetcher can still group requests into single batches, request those from individual peers and deserialise the batched result. This also makes it easier to figure out what when wrong if something did go wrong. Right now we have a lot of log errors of the kind
validation ran for unknown hash
andhash missing from ongoing requests
because the fetcher fails to match requests of hashes to peers and/or callers with how it is structured at the moment.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do a serious
fetcher
cleanup after we switch tosyncv2
at least for ATXs and malfeasance proofs. Right now it's probably a bit too early as we'll have to update v1 syncers that are on their way out.Simplifying blob fetch logic that currently uses promises etc. should be one of the goals, other being removing non-streaming fetcher client and server code.
Check warning on line 112 in fetch/mesh_data.go
Codecov / codecov/patch
fetch/mesh_data.go#L110-L112
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above
Check warning on line 119 in fetch/mesh_data.go
Codecov / codecov/patch
fetch/mesh_data.go#L119
Check warning on line 128 in fetch/mesh_data.go
Codecov / codecov/patch
fetch/mesh_data.go#L128