Erigon3 upstream e4eb9fc #483

Merged
merged 91 commits into node-real:erigon3 from erigon3_upstream_e4eb9fc on Aug 20, 2024

Conversation

blxdyx
Collaborator

@blxdyx blxdyx commented Aug 19, 2024

Upstream latest erigon3

dvovk and others added 30 commits August 7, 2024 09:18
- Updated gopsutil version as it has improvements in getting process
and memory info.
`murmur3.New*` methods return an interface, and you need to call at least 3
methods on it; avoiding that brings `16ns` -> `11ns`


I also benchmarked `github.com/segmentio/murmur3` vs
`github.com/twmb/murmur3` on a 60-byte hashed string.
The 2nd is faster but adds asm deps, so I'm sticking with the pure-Go dep
(because asm deps are not friendly for cross-compilation); maybe I'll try it
later, after our new release pipeline is ready. Bench results:
intel: `20ns` -> `14ns`
amd: `31ns` -> `26ns`
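
For reference, a minimal benchmark sketch of the interface-vs-one-shot difference, assuming a spaolacci-compatible murmur3 API (`New64`/`Sum64`); the 60-byte key mirrors the input size above:

```go
package murmur3bench

import (
	"testing"

	"github.com/twmb/murmur3" // assumption: spaolacci-compatible API
)

// 60-byte input, matching the hashed-string size mentioned above.
var key = make([]byte, 60)

// Interface path: allocate a hash.Hash64, Write, then Sum64 - 3 calls minimum.
func BenchmarkMurmur3Interface(b *testing.B) {
	for i := 0; i < b.N; i++ {
		h := murmur3.New64()
		h.Write(key)
		_ = h.Sum64()
	}
}

// One-shot path: a single direct call, no interface or allocation.
func BenchmarkMurmur3OneShot(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = murmur3.Sum64(key)
	}
}
```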
Before this PR we called heimdall.Synchronize as part of
heimdall.CheckpointsFromBlock and heimdall.MilestonesFromBlock. The
previous implementation of Synchronize waited for all scrapers to
be synchronised.

This is inefficient because `heimdall.CheckpointsFromBlock` needs only
the `checkpoints` scraper to be synchronised. For the initial sync we
first only need to wait for the checkpoints to be downloaded, and then we
can start downloading blocks from devp2p. While we are doing that we can
let the spans and milestones be scraped in the background. Note this is
based on the fact that fetching checkpoints has been optimised with
bulk fetching and finishes in seconds, while fetching Spans has not yet
been optimised and for bor-mainnet can take a long time.

Changes in the PR:
- splits Synchronize into 3 more fine-grained SynchronizeCheckpoints,
SynchronizeMilestones and SynchronizeSpans calls which are invoked by
the Sync algorithm at the right time (see the sketch after this list)
- Optimises SynchronizeSpans to check if it already has the
corresponding span for the given block number before blocking
- Moves the synchronisation point for Spans and State Sync Events into
`Sync.commitExecution`, just before we call
ExecutionEngine.UpdateForkChoice, to make it clearer what data needs
to be synced before calling Execution
- Changes EventNotifier and Synchronize funcs to return an err if the ctx is
cancelled or other errors have happened
- Input consistency between the heimdallSynchronizer and
bridgeSynchronizer - use blockNum instead of *types.Header
- Interface tidy-ups
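
A hedged sketch of the finer-grained shape described above; the method names are taken from this description, while the store/scraper helpers (e.g. `HasSpanFor`) are hypothetical, not the actual erigon types:

```go
package heimdall

import "context"

// Finer-grained sync points, so each caller waits only for the data it needs.
type heimdallSynchronizer interface {
	SynchronizeCheckpoints(ctx context.Context) error
	SynchronizeMilestones(ctx context.Context) error
	SynchronizeSpans(ctx context.Context, blockNum uint64) error
}

type spanStore interface {
	HasSpanFor(blockNum uint64) (bool, error) // hypothetical helper
}

type scraper interface {
	Synchronize(ctx context.Context) error
}

type service struct {
	store       spanStore
	spanScraper scraper
}

// SynchronizeSpans first checks whether the span covering blockNum is already
// stored locally; only on a miss does it block on the spans scraper - never
// on the checkpoints or milestones scrapers.
func (s *service) SynchronizeSpans(ctx context.Context, blockNum uint64) error {
	ok, err := s.store.HasSpanFor(blockNum)
	if err != nil || ok {
		return err
	}
	return s.spanScraper.Synchronize(ctx)
}
```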
Make Cell unexported
Remove ProcessTree/Keys/Update
Reviewed and refreshed all unit/bench/fuzz tests related to commitment
erigontech#11326
Change test scheduling and timeouts after Ottersync introduction.
Now we can execute tests more frequently due to the significant
reduction in test time.

Scheduled to run every night:
- tip-tracking
- snap-download
- sync-from-scratch for mainnet, minimal node

Scheduled to run on Sunday:
- sync-from-scratch for testnets, archive node
- Collecting CPU and Memory usage info about all processes running on
the machine
- Running the loop 5 times with a 2-second delay to calculate averages
- Sort by CPU usage
- Write the result to a report file
Result:
![Screenshot 2024-08-07 at 18 40 08](https://github.com/user-attachments/assets/aac1264c-1eb9-4c8e-b6a6-7e248e37855a)
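
A minimal sketch of that sampling loop using gopsutil (the library mentioned above), printing to stdout here instead of the report file:

```go
package main

import (
	"fmt"
	"sort"
	"time"

	"github.com/shirou/gopsutil/v3/process"
)

func main() {
	const samples = 5
	cpuTotals := map[int32]float64{}
	names := map[int32]string{}

	// Sample all processes 5 times with a 2-second delay between rounds.
	for i := 0; i < samples; i++ {
		procs, err := process.Processes()
		if err != nil {
			panic(err)
		}
		for _, p := range procs {
			cpu, err := p.CPUPercent()
			if err != nil {
				continue // the process may have exited between calls
			}
			cpuTotals[p.Pid] += cpu
			if _, ok := names[p.Pid]; !ok {
				names[p.Pid], _ = p.Name()
			}
		}
		time.Sleep(2 * time.Second)
	}

	// Average and sort by CPU usage, descending.
	type row struct {
		pid int32
		avg float64
	}
	rows := make([]row, 0, len(cpuTotals))
	for pid, total := range cpuTotals {
		rows = append(rows, row{pid, total / samples})
	}
	sort.Slice(rows, func(i, j int) bool { return rows[i].avg > rows[j].avg })

	for _, r := range rows {
		fmt.Printf("%-8d %-30s %6.2f%%\n", r.pid, names[r.pid], r.avg)
	}
}
```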
closes erigontech#11173

Adds tests for the Heimdall Service which cover:
- Milestone scraping
- Span scraping
- Checkpoint scraping
- `Producers` API - compares the results with results from the
`bor_getSnapshotProposerSequence` RPC API
- Added totals for CPU and memory usage to the processes table
- Added CPU usage per core

Example output:
![Screenshot 2024-08-08 at 12 46 17](https://github.com/user-attachments/assets/ec0897d0-81c8-4436-bb65-527363157e76)
Forgot to silence the logging in the heimdall service tests in a
previous PR. The logging lvl can be tweaked in times of need if debugging
is necessary.
Refactored table utils to have an option to generate a table and return it
as a string, which will be used for saving data to a file.
…ot and added clearIndexing command (erigontech#11539)

Main checks:
* No gap in steps/blocks
* Check if all indexing present
* Check if all idx, history, domain files are present
closes erigontech#11177
- adds unwind logic to the new polygon sync stage which uses astrid
- seems like we've never had this running for bor heimdall, so removing the
empty funcs
Refactored CPU info printing:
- move CPU details to table
- move CPU usage next to details table
- refactor code
…ch#11549)

relates to:
erigontech#10734
erigontech#11387

Restart Erigon with the `SAVE_HEAP_PROFILE = true` env variable and
wait until we reach 45% or more alloc in stage_headers, when
"noProgressCounter >= 5" or "Rejected header marked as bad" occurs
and also move `design` into `docs` in order to reduce the number of
top-level directories
Before we had a transaction-wide cache (map).
Now I'm changing it to EVM-wide.
The EVM is a thread-unsafe object, so it's ok to use a thread-unsafe LRU.
And ExecV3 already uses 1 EVM per worker, which means we will share the
cache between blocks (not on chain-tip for now).

bench:
- on `mainnet`: it shows a 12% improvement on a large eth_getLogs call
(re-exec of a large historical range of blocks near block 6M) - on hot state.

About chain-tip:
- don't see much impact (even when making the cache global) - because the
current mainnet/bor-mainnet bottleneck is "flushing" changes to db. But
`integration loop_exec --unwind=2` shows a 5% improvement.
- in a future PR we can share 1 LRU across many new blocks - currently we
create a new one every stage loop iteration.
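
A hedged sketch of the idea using hashicorp's non-thread-safe `simplelru`; the field name, key type and size bound are illustrative, not the actual erigon types:

```go
package vm

import (
	lru "github.com/hashicorp/golang-lru/v2/simplelru"
)

type codeHash [32]byte

// The EVM is driven by exactly one worker goroutine at a time, so a
// non-thread-safe LRU is safe here; and because ExecV3 keeps one EVM per
// worker, the cache now outlives a single transaction/block.
type EVM struct {
	analysedCode *lru.LRU[codeHash, []byte] // hypothetical field
}

func NewEVM() *EVM {
	// Size-bounded, no eviction callback needed.
	cache, _ := lru.NewLRU[codeHash, []byte](4096, nil)
	return &EVM{analysedCode: cache}
}

// analysedOrAdd returns the cached analysis for h, computing and caching it
// on a miss.
func (evm *EVM) analysedOrAdd(h codeHash, analyse func() []byte) []byte {
	if v, ok := evm.analysedCode.Get(h); ok {
		return v
	}
	v := analyse()
	evm.analysedCode.Add(h, v)
	return v
}
```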
- added the flags which were applied to the run command to the report
Notable things:
Removed all testing on Total Difficulty or Difficulty as that is a PoW
concept.
Integration tests about difficulty were removed as they only test ETH PoW.
Removed difficulty checks in a test.
I had to tweak some hashes: since I removed difficulty computation,
some hashes were different in some tests (now hardcoded to 1).

- It is running on the tip of the chain too, YAY
The CL may be able to handle a non-error message better.
Eventually I plan on improving this corner case, as FCU has an 8s
timeout and I don't think the EL waits that long before declaring itself busy.
This causes some (or all) CLs to miss slots, as they don't know what
to do now.
part of erigontech#11149
All events are supported now
Giulio2002 and others added 27 commits August 15, 2024 07:52
Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
Tidying up bor contracts code:
- Rename `GenesisContractsClient` to `StateReceiver` for clarity
- Remove unused `LastStateId` func in `StateReceiver`
- Move `StateReceiver` into its own file `state_receiver.go`
- Create a gomock for it in `state_receiver_mock.go` and use in test
- Remove unused function input for `ChainSpanner`
kurtosis assertoor tests in ci

decision points:

- since the job takes about 45 minutes to run, I decided not to run it
on every PR, but rather we run it every 12 hours at 2AM/PM UTC.
- all tests are present in
https://github.com/ethpandaops/assertoor-test/tree/master/assertoor-tests.
This ci runs
[these](https://github.com/erigontech/erigon/pull/11464/files#diff-b45e49409c33f39133315225d30c02d8cfacfd9a53d1157bca61284683a4c498R22)
tests.
- test is for ubuntu only; not for mac or windows.

Many tests are failing, especially the validator-related ones. Tracked
separately - erigontech#11590
I'd like to add `MarshallKey`, `MarshallValue` (and corresponding
`Unmarshall`) receiver functions on `EventRecordWithTime` which hide
away the `abi` detail so callers don't have to deal with the ABI as a
function input everywhere in the code base - PR
erigontech#11620

However, there is a circular dependency which blocks me from doing that:
`EventRecordWithTime` is in package `polygon/heimdall`, which imports
the `polygon/bor` package for `bor.GenesisContractStateReceiverABI`,
while `polygon/bor` imports `polygon/heimdall` for
`EventRecordWithTime` due to the [bor event fallback
hack](https://github.com/erigontech/erigon/blob/main/polygon/bor/bor.go#L1516-L1556)
we have in the Bor consensus engine (it is pending removal once we fix the
underlying issue that led to it appearing, which is a WIP).

For now, to unblock myself, I am moving the bor ABIs into a sub-package
`polygon/bor/borabi`
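
A hedged sketch of what the unblocked receiver could then look like; the `borabi` accessor name and pack arguments are assumptions for illustration, not the actual API:

```go
package heimdall

import (
	"math/big"

	"github.com/erigontech/erigon/polygon/bor/borabi" // now a leaf package: no cycle
)

// MarshallValue hides the ABI detail inside the type itself, so callers no
// longer thread an abi.ABI through every function signature.
func (e *EventRecordWithTime) MarshallValue() ([]byte, error) {
	// Hypothetical accessor; the real name/args may differ after the move.
	return borabi.GenesisContractStateReceiverABI().Pack(
		"commitState",
		big.NewInt(e.Time.Unix()),
		e.Data,
	)
}
```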
…bi (erigontech#11620)

Relates to and inspired by
erigontech#11225

Main motivation for this change is to simplify the Bridge Store
interface by removing the need to pass an `ABI` in the function
signatures. But it ended up tidying up other bits of the codebase too.

We should do the same for the entities in `polygon/heimdall EntityStore`,
as it is much, much cleaner.
This reverts PR erigontech#11556, which caused Hive tests (e.g. "Bad Hash on
NewPayload") to crash.
Things added:
* Keep genesis file stored in datadir
* Fixed race conditions in Sentinel
* Fixed scheduling of hard fork for same epoch-hardforks
* Small refactorings

---------

Co-authored-by: Kewei <kewei.train@gmail.com>
Fixed an issue with setting up the diagnostics client on erigon start
- `br.FrozenBlocks()` to ensure that all file types exist
Write MB/s is also big at prune time - NVMe handles it - but on a
bit slower disks it will affect chain-tip perf.

BorMainnet: 
<img width="1054" alt="Screenshot 2024-08-16 at 08 26 22"
src="https://github.com/user-attachments/assets/87e8d227-ac33-4800-a53a-594d872b074f">

Gnosis:
<img width="1026" alt="Screenshot 2024-08-16 at 08 36 23"
src="https://github.com/user-attachments/assets/1290e57c-3325-4e89-95cd-83baf8fa93c7">
…#11489)

- on mainnet: it shows a +10% improvement on a large eth_getLogs call
(re-exec of a large historical range of blocks near block 6M) - on hot state.

on chain tip:
- no noticeable impact, but now it lives only inside the current Tx
- maybe in future PRs I can share such an LRU between blocks somehow
- on loop_exec I also noticed no difference
- Added functionality to grab the heap profile by calling the
diagnostics endpoint
- Added support to pass the heap profile file to the diagnostics UI through
WebSocket
…rom chaindata (erigontech#11634)

We prune `CanonicalHash` and `HeaderNumber` in `chaindata`. This leads
to a -99% reduction in these 2 tables and shrinks their total size to 40
pages in TOTAL (from >500k pages).

Benchmarks:

```
Total time taken to query blocks from 1000000 to 3000000: 1720.1410 seconds (OLD)
Total time taken to query blocks from 1000000 to 3000000: 1747.4729 seconds (New)

Amortized delta cost: 0.1ms/req
```

Is the RPC working for those ranges? Yes.

## How does it work?

Query the canonical hash from snapshots via `viewHeaderByNumber` ->
`header.Hash()` if the db lookup misses.
Query the header number from snapshots via `viewHeaderByHash` ->
`header.Num64` if the db lookup misses.
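
A hedged sketch of that read path with hypothetical names; the point is the order - try the (now mostly pruned) chaindata tables first, then fall back to the immutable header snapshots:

```go
package blockio

type header interface{ Hash() [32]byte }

type chainDB interface {
	// ok=false means the entry was pruned from (or never in) chaindata.
	ReadCanonicalHash(blockNum uint64) (hash [32]byte, ok bool, err error)
}

type headerSnapshots interface {
	ViewHeaderByNumber(blockNum uint64) (header, error)
}

type blockReader struct {
	db    chainDB
	snaps headerSnapshots
}

// CanonicalHash prefers the db (recent, unpruned blocks); on a miss it
// recomputes the hash from the snapshot header, as described above.
func (r *blockReader) CanonicalHash(blockNum uint64) ([32]byte, error) {
	if hash, ok, err := r.db.ReadCanonicalHash(blockNum); err != nil || ok {
		return hash, err
	}
	h, err := r.snaps.ViewHeaderByNumber(blockNum)
	if err != nil {
		return [32]byte{}, err
	}
	return h.Hash(), nil
}
```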
10% better throughput and latency for small and big eth_getLogs.
Increased the limit to 4096, because we can re-use LRU objects without
worrying about "alloc speed".
This adds integrity checking for borevents in the following places:

1. Stage Bor Heimdall - check that the event occurs within the expected
time window
2. Dump Events to snapshots - check that the event ids are continuous in
and between blocks (will add time window tests)
3. Index event snapshots - check that the event ids are continuous in
and between blocks

It also adds an external integrity checker which runs on snapshots
checking for continuity and timeliness of events. This can be called
using the following command:
`erigon snapshots integrity --datadir=~/snapshots/bor-mainnet-patch-1
--from=45500000`
(--to specifies an end block)

This also now fixes the long-running issue with bor events causing
unexpected failures in execution:

The problem that event validation uncovered was as follows:

The **kv.BorEventNums** mapping table currently keeps the mapping of first
event id -> block. The code which produces bor-event snapshots uses it to
determine which events to put into each snapshot.

However, if no additional blocks have events by the time the block is
stored in the snapshot, the snapshot creation code does not know which
events to include - so it drops them.

This causes problems in two places:

* RPC queries & re-execution from snapshots can't find these dropped
events
* Depending on purge timing these events may erroneously get inserted
into future blocks
  
 The code change in this PR fixes that bug.
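
A minimal sketch of the continuity invariant those checks enforce, with a stand-in event type: ids must increase by exactly one within and between blocks:

```go
package integrity

import "fmt"

// borEvent is a stand-in for the real event record; only id and block matter here.
type borEvent struct {
	ID       uint64
	BlockNum uint64
}

// checkEventIDsContinuous enforces the invariant from checks 2 and 3 above:
// no gaps and no duplicates in event ids, within and between blocks.
func checkEventIDsContinuous(events []borEvent) error {
	for i := 1; i < len(events); i++ {
		prev, cur := events[i-1], events[i]
		if cur.ID != prev.ID+1 {
			return fmt.Errorf("bor event id gap between blocks %d and %d: %d -> %d",
				prev.BlockNum, cur.BlockNum, prev.ID, cur.ID)
		}
	}
	return nil
}
```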
 
It has been tested by running:

```
erigon --datadir=~/chains/e3/amoy --chain=amoy --bor.heimdall=https://heimdall-api-amoy.polygon.technology --db.writemap=false --txpool.disable --no-downloader --bor.milestone=false
```

with validation in place, and then confirmed by running the following:

```
erigon snapshots rm-all-state-snapshots --datadir=~/chains/e3/amoy
rm ~/chains/e3/amoy/chaindata/
erigon --datadir=~/chains/e3/amoy --chain=amoy --bor.heimdall=https://heimdall-api-amoy.polygon.technology --db.writemap=false --no-downloader --bor.milestone=false
```
To recreate the chain from snapshots.

It has also been checked with:

```
erigon snapshots integrity --datadir=~/chains/e3/amoy --check=NoBorEventGaps --failFast=true
```

---------

Co-authored-by: alex.sharov <AskAlexSharov@gmail.com>
Co-authored-by: Mark Holt <mark@disributed.vision>
…_e4eb9fc

# Conflicts:
#	cmd/state/exec3/state_recon.go
#	core/chain_makers.go
#	core/state/rw_v3.go
#	core/types/block.go
#	erigon-lib/go.mod
#	erigon-lib/go.sum
#	erigon-lib/kv/tables.go
#	eth/stagedsync/exec3.go
#	eth/stagedsync/stage_snapshots.go
#	go.mod
#	go.sum
#	turbo/app/snapshots_cmd.go
#	turbo/jsonrpc/tracing.go
#	turbo/stages/headerdownload/header_algo_test.go
@setunapo setunapo merged commit f296530 into node-real:erigon3 Aug 20, 2024
1 of 3 checks passed
@blxdyx blxdyx deleted the erigon3_upstream_e4eb9fc branch August 20, 2024 07:29