Implement PeerDAS #14129
Draft: nalepae wants to merge 75 commits into `develop` from `peerDAS`.
+19,758 −3,076
nisdas force-pushed the `peerDAS` branch 3 times, most recently from `dc39693` to `3fb1fa4` (August 20, 2024 09:41).
* Add Support For Discovery Of Column Subnets * Lint for SubnetsPerNode * Manu's Review * Change to a better name
* Add Data Column Subscriber * Add Data Column Validator * Wire all Handlers In * Fix Build * Fix Test * Fix IP in Test * Fix IP in Test
* Add RPC Handler * Add Column Requests * Update beacon-chain/db/filesystem/blob.go Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com> * Update beacon-chain/p2p/rpc_topic_mappings.go Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com> * Manu's Review * Manu's Review * Interface Fixes * mock manager --------- Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
* Bump `c-kzg-4844` lib to the `das` branch. * Implement `MerkleProofKZGCommitments`. * Implement `das-core.md`. * Use `peerdas.CustodyColumnSubnets` and `peerdas.CustodyColumns`. * `CustodyColumnSubnets`: Include `i` in the for loop. * Remove `computeSubscribedColumnSubnet`. * Move `peerdas.CustodyColumns` out of the for loop.
* Remove capital letter from error messages. * `[4]byte` => `[fieldparams.VersionLength]byte`. * Prometheus: Remove extra `committee`. They are probably due to a bad copy/paste. Note: The name of the probe itself is unchanged, to ensure backward compatibility. * Implement Proposer RPC for data columns. * Fix TestProposer_ProposeBlock_OK test. * Remove default peerDAS activation. * `validateDataColumn`: Workaround to return a `VerifiedRODataColumn`
* Add new DA check * Exit early in the event no commitments exist. * Gazelle * Fix Mock Broadcaster * Fix Test Setup * Update beacon-chain/blockchain/process_block.go Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com> * Manu's Review * Fix Build --------- Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
* Update `consensus_spec_version` to `v1.5.0-alpha.1`. * `CustodyColumns`: Fix and implement spec tests. * Make deepsource happy. * `^uint64(0)` => `math.MaxUint64`. * Fix `TestLoadConfigFile` test.
…ob`. (#13957) * `SendDataColumnSidecarByRoot`: Return `RODataColumn` instead of `ROBlob`. * Make deepsource happier.
* Upgrade c-kzg-4844 package * Upgrade bazel deps
* Enable E2E And Add Fixes * Register Same Topic For Data Columns * Initialize Capacity Of Slice * Fix Initialization of Data Column Receiver * Remove Mix In From Merkle Proof * E2E: Subscribe to all subnets. * Remove Index Check * Remaining Bug Fixes to Get It Working * Change Evaluator to Allow Test to Finish * Fix Build * Add Data Column Verification * Fix LoopVar Bug * Do Not Allocate Memory * Update beacon-chain/blockchain/process_block.go Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com> * Update beacon-chain/core/peerdas/helpers.go Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com> * Update beacon-chain/core/peerdas/helpers.go Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com> * Gofmt * Fix It Again * Fix Test Setup * Fix Build * Fix Trusted Setup panic * Fix Trusted Setup panic * Use New Test --------- Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
* Add Data Structure for New Request Type * Add Data Column By Range Handler * Add Data Column Request Methods * Add new validation for columns by range requests * Fix Build * Allow Prysm Node To Fetch Data Columns * Allow Prysm Node To Fetch Data Columns And Sync * Bug Fixes For Interop * GoFmt * Use different var * Manu's Review
* PeerDAS: Implement sampling. * `TestNewRateLimiter`: Fix with the new number of expected registered topics.
* Set Custody Count Correctly * Fix Discovery Count
* Adding error wrapping * Fix `CustodyColumnSubnets` tests.
* Update ckzg4844 to latest version * Run go mod tidy * Remove unnecessary tests & run goimports * Remove fieldparams from blockchain/kzg * Add back blank line * Avoid large copies * Run gazelle * Use trusted setup from the specs & fix issue with struct * Run goimports * Fix mistake in makeCellsAndProofs --------- Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
* PeerDAS: Run reconstruction in parallel. * `isDataAvailableDataColumns` --> `isDataColumnsAvailable` * `isDataColumnsAvailable`: Return `nil` as soon as half of the columns are received. * Make deepsource happy.
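Returning from `isDataColumnsAvailable` as soon as half of the columns are received works because blob data is erasure-extended by a factor of 2, so any 50% of the columns are enough to reconstruct the rest (assuming the node is able to reconstruct, e.g. a node custodying all columns). A minimal sketch of that threshold check, using a hypothetical `columnTracker` type rather than Prysm's actual code:

```go
package main

// columnTracker is a hypothetical helper recording which column indices of one
// block have been received so far.
type columnTracker struct {
	numberOfColumns uint64
	received        map[uint64]struct{}
}

func newColumnTracker(numberOfColumns uint64) *columnTracker {
	return &columnTracker{numberOfColumns: numberOfColumns, received: make(map[uint64]struct{})}
}

// add records a column index (ignoring out-of-range indices and duplicates) and
// reports whether at least half of all columns are now present, which is enough
// to reconstruct the missing ones thanks to the 2x extension.
func (t *columnTracker) add(index uint64) bool {
	if index < t.numberOfColumns {
		t.received[index] = struct{}{}
	}
	return 2*uint64(len(t.received)) >= t.numberOfColumns
}
```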
* DeepSource: Pass heavy objects by pointers. * `removeBlockFromQueue`: Remove redundant error checking. * `fetchBlobsFromPeer`: Use same variable for `append`. * Remove unused arguments. * Combine types. * `Persist`: Add documentation. * Remove unused receiver * Remove duplicated import. * Stop using both pointer and value receivers at the same time. * `verifyAndPopulateColumns`: Remove unused parameter * Stop using an empty slice literal to declare a variable.
* `SendDataColumnsByRangeRequest`: Add some new fields in logs. * `BlobStorageSummary`: Implement `HasDataColumnIndex` and `AllDataColumnsAvailable`. * Implement `fetchDataColumnsFromPeers`. * `fetchBlobsFromPeer`: Return only one error.
* Fix the obvious... * Data columns sampling: Modify logging. * `waitForChainStart`: Make it thread-safe; wait only once. * Sampling: Wait for chain start before running the sampling. Reason: `newDataColumnSampler1D` needs `s.ctxMap`, and `s.ctxMap` is only set once the chain has started. Previously, `waitForChainStart` was only called in `s.registerHandlers`, itself called in a goroutine. ==> We had a race condition here: sometimes `newDataColumnSampler1D` was called after `s.ctxMap` was set, sometimes before. * Address Nishant's comments. * Sampling: Improve logging. * `waitForChainStart`: Remove `chainIsStarted` check.
* `sendPingRequest`: Add some comments. * `sendPingRequest`: Replace `stream.Conn().RemotePeer()` with `peerID`. * `pingHandler`: Add comments. * `sendMetaDataRequest`: Add comments and implement a unique test. * Gather `SchemaVersion`s in the same `const` definition. * Define `SchemaVersionV3`. * `MetaDataV1`: Fix comment. * Proto: Define `MetaDataV2`. * `MetaDataV2`: Generate SSZ. * `newColumnSubnetIDs`: Use shorter lines. * `metaDataHandler` and `sendMetaDataRequest`: Manage `MetaDataV2`. * `RefreshPersistentSubnets`: Refactor tests (no functional change). * `RefreshPersistentSubnets`: Refactor and add comments (no functional change). * `RefreshPersistentSubnets`: Compare cache with both ENR & metadata. * `RefreshPersistentSubnets`: Manage peerDAS. * `registerRPCHandlersPeerDAS`: Register `RPCMetaDataTopicV3`. * `CustodyCountFromRemotePeer`: Retrieve the count from metadata, falling back to the ENR, then to the default value. * Update beacon-chain/sync/rpc_metadata.go Co-authored-by: Nishant Das <nishdas93@gmail.com> * Fix duplicate case. * Remove version testing. * `debug.proto`: Stop breaking ordering. --------- Co-authored-by: Nishant Das <nishdas93@gmail.com>
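The metadata, then ENR, then default lookup order for a remote peer's custody count can be sketched as a simple fallback chain. The `peerInfo` struct and `custodyCountFromRemotePeer` function below are hypothetical; nil pointers stand for "not known from this source".

```go
package main

// peerInfo is hypothetical: what we may know about a remote peer's custody count.
type peerInfo struct {
	metadataCustodyCount *uint64 // from the peer's metadata, nil if not received
	enrCustodyCount      *uint64 // from the peer's ENR, nil if absent
}

// custodyCountFromRemotePeer prefers metadata, then the ENR, then the default.
func custodyCountFromRemotePeer(p peerInfo, defaultCount uint64) uint64 {
	if p.metadataCustodyCount != nil {
		return *p.metadataCustodyCount
	}
	if p.enrCustodyCount != nil {
		return *p.enrCustodyCount
	}
	return defaultCount
}
```

Metadata is tried first presumably because it can be refreshed over RPC, while the ENR is only updated when the peer republishes its record.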
* Persist All Changes * Fix All Tests * Fix Build * Fix Build * Fix Build * Fix Test Again * Add missing verification * Add Test Cases for Data Column Validation * Fix comments for methods * Fix comments for methods * Fix Test * Manu's Review
) * `parseIndices`: `O(n**2)` ==> `O(n)`. * PeerDAS: Implement `/eth/v1/beacon/blob_sidecars/{block_id}`. * Update beacon-chain/core/peerdas/helpers.go Co-authored-by: Sammy Rosso <15244892+saolyn@users.noreply.github.com> * Rename some functions. * `Blobs`: Fix empty slice. * `recoverCellsAndProofs` --> Move function in `beacon-chain/core/peerdas`. * peerDAS helpers: Add missing tests. * Implement `CustodyColumnCount`. * `RecoverCellsAndProofs`: Remove useless argument `columnsCount`. * Tests: Add cleanups. * `blobsFromStoredDataColumns`: Reconstruct if needed. * Make deepsource happy. * Beacon API: Use provided indices. * Make deepsource happier. --------- Co-authored-by: Sammy Rosso <15244892+saolyn@users.noreply.github.com>
* Update go.yml * Disable mnd * Update .golangci.yml * Update go.yml * Update go.yml * Update .golangci.yml * Update go.yml * Fix Lint Issues * Remove comment * Update .golangci.yml
* Update values * Update Spec To v1.5.0-alpha.5 * Fix Discovery Tests * Hardcode Subnet Count For Tests * Fix All Initial Sync Tests * Gazelle * Less Chaotic Service Initialization * Gazelle
* Use Data Column Validation Everywhere * Fix Build * Fix Lint * Fix Clock Synchronizer * Fix Panic
* Add Changes for Uint8 Csc * Fix Build * Fix Build for Sync * Fix Discovery Test
* Fix Various Bugs in PeerDAS * Remove Log * Remove useless copy var. --------- Co-authored-by: Manu NALEPA <enalepa@offchainlabs.com>
…or reconstruction (#14397) * `broadcastAndReceiveDataColumns`: Use the real `sidecar.ColumnIndex` instead of the position in the slice, and improve logging as well. * `isDataColumnsAvailable`: Improve logging. * `validateDataColumn`: Print `Accepted data column sidecar gossip` really at the end. * Subscriber: Improve logging. * `sendAndSaveDataColumnSidecars`: Use the common logging function. * `dataColumnSidecarByRootRPCHandler`: Logging - Print `all` instead of all the columns for a super node. * Verification: Improve logging. * `DataColumnsWithholdCount`: Set as `uint64` instead of `int`. * `DataColumnFields`: Improve logging. * Logging: Remove the now useless private `columnFields` function. * Avoid useless goroutines blocking for reconstruction. * Update beacon-chain/sync/subscriber.go Co-authored-by: Nishant Das <nishdas93@gmail.com> * Address Nishant's comment. * Improve logging. --------- Co-authored-by: Nishant Das <nishdas93@gmail.com>
* Add Data Column Metrics * Shift it All To Peerdas Package
* `pingPeers`: Add log with new ENR when modified. * `p2p Start`: Use idiomatic go error syntax. * P2P `start`: Fix error message. * Use no bootnodes at all if the `--chain-config-file` flag is used and no `--bootstrap-node` flag is used. Before this commit, if the `--chain-config-file` flag was used and no `--bootstrap-node` flag was used, bootnodes were (incorrectly) defaulted to the `mainnet` ones. * `validPeersExist`: Centralize logs. * `AddConnectionHandler`: Improve logging. "Peer connected" does not really reflect the fact that a new peer is actually connected. --> "New peer connection" is clearer. Also, instead of writing `0`, `1` or `2` for direction, it now writes "Unknown", "Inbound", "Outbound". * Logging: Add 2 decimals for timestamp in text and JSON logs. * Improve "no valid peers" logging. * Improve "Some columns have no peers responsible for custody" logging. * `pubsubSubscriptionRequestLimit`: Increase to be consistent with data columns. * `sendPingRequest`: Improve logging. * `FindPeersWithSubnet`: Regularly recheck in our current set of peers if we have enough peers for this topic. Before this commit, new peers HAD to be found, even if the current peers were already sufficient. For very small networks, this used to lead to an infinite search. * `subscribeDynamicWithSyncSubnets`: Use exactly the same subscription function initially and every slot. * Make deepsource happier. * Nishant's comment: Change peer disconnected log. * Nishant's comment: Change `Too many incoming subscription` log from error to debug. * `FindPeersWithSubnet`: Address Nishant's comment. * `batchSize`: Address Nishant's comment. * `pingPeers` ==> `pingPeersAndLogEnr`. * Update beacon-chain/sync/subscriber.go Co-authored-by: Nishant Das <nishdas93@gmail.com> --------- Co-authored-by: Nishant Das <nishdas93@gmail.com>
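The bootnode fix boils down to a three-way rule. Here is a minimal sketch, assuming explicit `--bootstrap-node` values always win and that, without `--chain-config-file`, the existing mainnet default is kept. `selectBootnodes` is a hypothetical function, not Prysm's flag-handling code, and the ENR strings in the usage are placeholders.

```go
package main

// selectBootnodes (hypothetical) chooses the bootnodes to dial.
func selectBootnodes(chainConfigFileSet bool, flagBootnodes, mainnetBootnodes []string) []string {
	if len(flagBootnodes) > 0 {
		// Explicit --bootstrap-node values always take precedence.
		return flagBootnodes
	}
	if chainConfigFileSet {
		// Custom network via --chain-config-file: never fall back to mainnet bootnodes.
		return nil
	}
	return mainnetBootnodes
}
```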
* `CustodyCountFromRemotePeer`: Set happy path in the outer scope. * `FindPeersWithSubnet`: Improve logging. * `listenForNewNodes`: Avoid infinite loop in a small subnet. * Address Nishant's comment. * Fix Nishant's comment.
* `sendBatchRootRequest`: Refactor and add comments. * `sendBatchRootRequest`: Only send requests to peers that custody a superset of our columns. Before this commit, we sent "data columns by root" requests for data columns the peers do not custody. * Data columns: Use subnet sampling only (instead of peer sampling). * `areDataColumnsAvailable`: Improve logs. * `GetBeaconBlock`: Improve logs. Rationale: A `begin` log should always be followed by a `success` log or a `failure` log.
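The peer filter in `sendBatchRootRequest` is a superset test: a peer is only asked for columns if its custody set contains every column we want. A minimal sketch with hypothetical helpers `custodiesAll` and `admissiblePeers`, where peers are keyed by plain string IDs instead of libp2p peer IDs:

```go
package main

import "sort"

// custodiesAll reports whether peerCustody is a superset of wanted.
func custodiesAll(peerCustody map[uint64]bool, wanted []uint64) bool {
	for _, c := range wanted {
		if !peerCustody[c] {
			return false
		}
	}
	return true
}

// admissiblePeers (hypothetical) keeps only peers that custody every wanted column.
func admissiblePeers(peers map[string]map[uint64]bool, wanted []uint64) []string {
	out := make([]string, 0, len(peers))
	for id, custody := range peers {
		if custodiesAll(custody, wanted) {
			out = append(out, id)
		}
	}
	sort.Strings(out) // deterministic order, e.g. for logging
	return out
}
```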
* `validateDataColumn`: Refactor logging. * `dataColumnSidecarByRootRPCHandler`: Improve logging. * `isDataAvailable`: Improve logging. * Add hidden debug flag: `--data-columns-reject-slot-multiple`. * Add more logs about peer disconnection. * `validPeersExist` --> `enoughPeersAreConnected` * `beaconBlocksByRangeRPCHandler`: Add remote Peer ID in logs. * Stop calling `writeErrorResponseToStream` twice in case of rate limit.
* `scheduleReconstructedDataColumnsBroadcast`: Really minor refactor. * `receivedDataColumnsFromRootLock` -> `dataColumnsFromRootLock` * `reconstructDataColumns`: Stop looking into the DB to know if we have some columns. Before this commit: Each time we received a column, we looked into the filesystem for all the columns we store. ==> For 128 columns, this amounts to 1 + 2 + 3 + ... + 128 = 128(128+1)/2 = 8256 file lookups. Also, we assumed that a column saved to the filesystem would be visible to an immediately following lookup (strict consistency). This is not always true. ==> Sometimes, we reconstructed and reseeded columns more than once, because of this lack of filesystem strict consistency. After this commit: We use a (strictly consistent) cache to determine whether we received a column or not. ==> No more consistency issues, and less stress on the filesystem. * `dataColumnSidecarByRootRPCHandler`: Improve logging. Before this commit, logged values assumed that all requested columns correspond to the same block root, which is not always the case. After this commit, we know which columns are requested for which root. * Add a log when broadcasting a data column. This is useful to debug "lost data columns" in devnet. * Address Nishant's comment
* `columnErrBuilder`: Use `Wrap` instead of `Join`. Reason: `Join` inserts a newline, which makes the log quite unreadable. * `validateDataColumn`: Improve log. * `areDataColumnsAvailable`: Improve log. * `SendDataColumnSidecarByRoot` ==> `SendDataColumnSidecarsByRootRequest`. * `handleDA`: Refactor error message. * `sendRecentBeaconBlocksRequest` ==> `sendBeaconBlocksRequest`. Reason: There is no notion at all of "recent" in the function. If the caller decides to call this function only with "recent" blocks, that's fine. However, the function itself knows nothing about the "recentness" of these blocks. * `sendBatchRootRequest`: Improve comments. * `sendBeaconBlocksRequest`: Avoid `else` usage and use a map of `bool` instead of `struct{}`. * `wrapAndReportValidation`: Remove `agent` from log. Reason: It prevents the log from fitting on one line, and it is not really useful for debugging. * `validateAggregateAndProof`: Add comments. * `GetValidCustodyPeers`: Fix typo. * `GetValidCustodyPeers` ==> `DataColumnsAdmissibleCustodyPeers`. * `CustodyHandler` ==> `DataColumnsHandler`. * `CustodyCountFromRemotePeer` ==> `DataColumnsCustodyCountFromRemotePeer`. * Implement `DataColumnsAdmissibleSubnetSamplingPeers`. * Use `SubnetSamplingSize` instead of `CustodySubnetCount` where needed. * Revert "`wrapAndReportValidation`: Remove `agent` from log." This reverts commit 55db351.
* `retrieveMissingDataColumnsFromPeers`: Improve logging. * `dataColumnSidecarByRootRPCHandler`: Stop decreasing peer's score if asking for a column we do not custody. * `dataColumnSidecarByRootRPCHandler`: If a data column is unavailable, stop waiting for it. This behaviour was useful for peer sampling. Now, just return the data column if we store it. If we don't, skip. * Dirty code comment. * `retrieveMissingDataColumnsFromPeers`: Improve logs. * `SendDataColumnsByRangeRequest`: Improve logs. * `dataColumnSidecarsByRangeRPCHandler`: Improve logs.
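The new serving behaviour of `dataColumnSidecarByRootRPCHandler` (return what we store, skip what we do not, never wait) can be sketched as a simple filter. `columnStore` and `servedColumns` are hypothetical; the real handler streams sidecars rather than returning indices.

```go
package main

// columnStore is a hypothetical view of locally stored columns for one block root.
type columnStore map[uint64][]byte

// servedColumns returns, in request order, only the requested columns we store.
// Missing columns are skipped immediately instead of being waited for.
func servedColumns(store columnStore, requested []uint64) []uint64 {
	out := make([]uint64, 0, len(requested))
	for _, idx := range requested {
		if _, ok := store[idx]; !ok {
			continue
		}
		out = append(out, idx)
	}
	return out
}
```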
Remaining tasks:
* … [1, 31]. (Starting a node from genesis works; starting a node from epoch 1 and later works.) Note: This issue may already be present with blobs (without data columns).
* … `cKZG` library by the `goKZG` one (when available). In progress.
* … `/eth/v1/beacon/blob_sidecars/{block_id}`
* … `internalBroadcastDataColumn`)
* … `DEBUG p2p: Peer connected activePeers=2` log with `INFO p2p: Connected peers` log.
* … `--minimum-peers-per-subnet=<n>`
* … `GetValidCustodyPeers` is used. ==> We need to fetch data from peers that are not super nodes.
* … `data columns by root requests` are done more than once. (This may be normal if done to multiple peers, but we have to check.)
* `prysmctl`: Implement byRoot and byRange data columns requests.