fix: improve forkchoice updateHead() time #5867

Merged 8 commits on Aug 11, 2023

Conversation

twoeths
Contributor

@twoeths twoeths commented Aug 9, 2023

Motivation

  • The performance test of computeDeltas does not reflect the numbers we see on goerli/mainnet
  • Update: I was able to reproduce the long updateHead time with the forkChoice updateHead vc 600000 bc 7200 eq 0 test

Description

part of #5852

@github-actions
Contributor

github-actions bot commented Aug 9, 2023

Performance Report

✔️ no performance regression detected

🚀🚀 Significant benchmark improvement detected

| Benchmark suite | Current: 8427863 | Previous: 664820a | Ratio |
|---|---|---|---|
| forkChoice updateHead vc 600000 bc 1200 eq 0 | 13.160 ms/op | 92.803 ms/op | 0.14 |

Full benchmark results
Benchmark suite Current: 8427863 Previous: 664820a Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 690.78 us/op 538.36 us/op 1.28
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 112.35 us/op 75.356 us/op 1.49
BLS verify - blst-native 1.3420 ms/op 1.2453 ms/op 1.08
BLS verifyMultipleSignatures 3 - blst-native 2.6744 ms/op 2.5238 ms/op 1.06
BLS verifyMultipleSignatures 8 - blst-native 5.5953 ms/op 5.3875 ms/op 1.04
BLS verifyMultipleSignatures 32 - blst-native 21.146 ms/op 19.666 ms/op 1.08
BLS aggregatePubkeys 32 - blst-native 31.899 us/op 25.852 us/op 1.23
BLS aggregatePubkeys 128 - blst-native 121.87 us/op 101.54 us/op 1.20
getAttestationsForBlock 111.23 ms/op 57.596 ms/op 1.93
isKnown best case - 1 super set check 685.00 ns/op 314.00 ns/op 2.18
isKnown normal case - 2 super set checks 619.00 ns/op 297.00 ns/op 2.08
isKnown worse case - 16 super set checks 584.00 ns/op 317.00 ns/op 1.84
CheckpointStateCache - add get delete 6.6610 us/op 5.0350 us/op 1.32
validate api signedAggregateAndProof - struct 3.0467 ms/op 2.8058 ms/op 1.09
validate gossip signedAggregateAndProof - struct 3.0813 ms/op 2.8122 ms/op 1.10
validate api attestation - struct 1.4438 ms/op 1.3431 ms/op 1.07
validate gossip attestation - struct 1.5066 ms/op 1.3665 ms/op 1.10
pickEth1Vote - no votes 1.5701 ms/op 1.2316 ms/op 1.27
pickEth1Vote - max votes 17.097 ms/op 8.0403 ms/op 2.13
pickEth1Vote - Eth1Data hashTreeRoot value x2048 10.886 ms/op 9.3640 ms/op 1.16
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 17.855 ms/op 15.007 ms/op 1.19
pickEth1Vote - Eth1Data fastSerialize value x2048 765.88 us/op 593.72 us/op 1.29
pickEth1Vote - Eth1Data fastSerialize tree x2048 5.6343 ms/op 7.8127 ms/op 0.72
bytes32 toHexString 862.00 ns/op 503.00 ns/op 1.71
bytes32 Buffer.toString(hex) 348.00 ns/op 291.00 ns/op 1.20
bytes32 Buffer.toString(hex) from Uint8Array 558.00 ns/op 445.00 ns/op 1.25
bytes32 Buffer.toString(hex) + 0x 321.00 ns/op 293.00 ns/op 1.10
Object access 1 prop 0.26400 ns/op 0.16400 ns/op 1.61
Map access 1 prop 0.29600 ns/op 0.14800 ns/op 2.00
Object get x1000 12.636 ns/op 7.5410 ns/op 1.68
Map get x1000 1.3050 ns/op 0.64300 ns/op 2.03
Object set x1000 110.22 ns/op 53.180 ns/op 2.07
Map set x1000 71.991 ns/op 43.503 ns/op 1.65
Return object 10000 times 0.34980 ns/op 0.24770 ns/op 1.41
Throw Error 10000 times 4.7145 us/op 3.9372 us/op 1.20
fastMsgIdFn sha256 / 200 bytes 4.7870 us/op 3.3760 us/op 1.42
fastMsgIdFn h32 xxhash / 200 bytes 704.00 ns/op 297.00 ns/op 2.37
fastMsgIdFn h64 xxhash / 200 bytes 879.00 ns/op 353.00 ns/op 2.49
fastMsgIdFn sha256 / 1000 bytes 13.392 us/op 11.665 us/op 1.15
fastMsgIdFn h32 xxhash / 1000 bytes 909.00 ns/op 436.00 ns/op 2.08
fastMsgIdFn h64 xxhash / 1000 bytes 1.0470 us/op 433.00 ns/op 2.42
fastMsgIdFn sha256 / 10000 bytes 116.84 us/op 104.81 us/op 1.11
fastMsgIdFn h32 xxhash / 10000 bytes 2.1760 us/op 1.9910 us/op 1.09
fastMsgIdFn h64 xxhash / 10000 bytes 1.5520 us/op 1.3820 us/op 1.12
enrSubnets - fastDeserialize 64 bits 2.0010 us/op 1.3610 us/op 1.47
enrSubnets - ssz BitVector 64 bits 670.00 ns/op 492.00 ns/op 1.36
enrSubnets - fastDeserialize 4 bits 316.00 ns/op 228.00 ns/op 1.39
enrSubnets - ssz BitVector 4 bits 677.00 ns/op 550.00 ns/op 1.23
prioritizePeers score -10:0 att 32-0.1 sync 2-0 134.14 us/op 126.22 us/op 1.06
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 216.59 us/op 138.56 us/op 1.56
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 224.14 us/op 223.03 us/op 1.00
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 366.23 us/op 371.82 us/op 0.98
prioritizePeers score 0:0 att 64-1 sync 4-1 404.66 us/op 413.41 us/op 0.98
array of 16000 items push then shift 1.7010 us/op 1.6410 us/op 1.04
LinkedList of 16000 items push then shift 9.7020 ns/op 9.7410 ns/op 1.00
array of 16000 items push then pop 62.419 ns/op 65.478 ns/op 0.95
LinkedList of 16000 items push then pop 9.1740 ns/op 9.5200 ns/op 0.96
array of 24000 items push then shift 2.5279 us/op 2.5320 us/op 1.00
LinkedList of 24000 items push then shift 9.6230 ns/op 10.206 ns/op 0.94
array of 24000 items push then pop 118.70 ns/op 132.05 ns/op 0.90
LinkedList of 24000 items push then pop 9.6490 ns/op 9.1160 ns/op 1.06
intersect bitArray bitLen 8 7.0680 ns/op 7.1240 ns/op 0.99
intersect array and set length 8 68.098 ns/op 72.367 ns/op 0.94
intersect bitArray bitLen 128 32.635 ns/op 33.160 ns/op 0.98
intersect array and set length 128 1.0130 us/op 1.0336 us/op 0.98
bitArray.getTrueBitIndexes() bitLen 128 1.5570 us/op 1.8290 us/op 0.85
bitArray.getTrueBitIndexes() bitLen 248 2.4170 us/op 3.0980 us/op 0.78
bitArray.getTrueBitIndexes() bitLen 512 4.7520 us/op 6.4210 us/op 0.74
Buffer.concat 32 items 1.0360 us/op 1.0760 us/op 0.96
Uint8Array.set 32 items 1.9160 us/op 1.8510 us/op 1.04
transfer serialized Status (84 B) 2.0000 us/op 1.9540 us/op 1.02
copy serialized Status (84 B) 1.7010 us/op 1.8480 us/op 0.92
transfer serialized SignedVoluntaryExit (112 B) 2.0950 us/op 2.3500 us/op 0.89
copy serialized SignedVoluntaryExit (112 B) 1.7680 us/op 2.1430 us/op 0.83
transfer serialized ProposerSlashing (416 B) 2.3930 us/op 3.2530 us/op 0.74
copy serialized ProposerSlashing (416 B) 2.2250 us/op 2.7180 us/op 0.82
transfer serialized Attestation (485 B) 2.3670 us/op 3.1800 us/op 0.74
copy serialized Attestation (485 B) 2.3280 us/op 2.9130 us/op 0.80
transfer serialized AttesterSlashing (33232 B) 2.4280 us/op 2.6610 us/op 0.91
copy serialized AttesterSlashing (33232 B) 8.3090 us/op 10.460 us/op 0.79
transfer serialized Small SignedBeaconBlock (128000 B) 2.6650 us/op 3.0000 us/op 0.89
copy serialized Small SignedBeaconBlock (128000 B) 13.191 us/op 34.322 us/op 0.38
transfer serialized Avg SignedBeaconBlock (200000 B) 3.0020 us/op 3.7140 us/op 0.81
copy serialized Avg SignedBeaconBlock (200000 B) 20.972 us/op 49.520 us/op 0.42
transfer serialized BlobsSidecar (524380 B) 3.2750 us/op 3.9470 us/op 0.83
copy serialized BlobsSidecar (524380 B) 94.372 us/op 308.90 us/op 0.31
transfer serialized Big SignedBeaconBlock (1000000 B) 3.3040 us/op 5.0620 us/op 0.65
copy serialized Big SignedBeaconBlock (1000000 B) 160.94 us/op 283.43 us/op 0.57
pass gossip attestations to forkchoice per slot 3.2127 ms/op 2.5814 ms/op 1.24
forkChoice updateHead vc 100000 bc 64 eq 0 1.7251 ms/op 2.3277 ms/op 0.74
forkChoice updateHead vc 600000 bc 64 eq 0 11.346 ms/op 11.734 ms/op 0.97
forkChoice updateHead vc 1000000 bc 64 eq 0 18.345 ms/op 19.728 ms/op 0.93
forkChoice updateHead vc 600000 bc 320 eq 0 11.134 ms/op 17.910 ms/op 0.62
forkChoice updateHead vc 600000 bc 1200 eq 0 13.160 ms/op 92.803 ms/op 0.14
forkChoice updateHead vc 600000 bc 7200 eq 0 12.511 ms/op
forkChoice updateHead vc 600000 bc 64 eq 1000 19.823 ms/op 19.784 ms/op 1.00
forkChoice updateHead vc 600000 bc 64 eq 10000 22.723 ms/op 21.701 ms/op 1.05
forkChoice updateHead vc 600000 bc 64 eq 300000 28.727 ms/op 42.396 ms/op 0.68
computeDeltas 500000 validators 300 proto nodes 20.602 ms/op
computeDeltas 500000 validators 1200 proto nodes 20.736 ms/op
computeDeltas 500000 validators 7200 proto nodes 20.485 ms/op
computeDeltas 750000 validators 300 proto nodes 31.150 ms/op
computeDeltas 750000 validators 1200 proto nodes 31.115 ms/op
computeDeltas 750000 validators 7200 proto nodes 31.945 ms/op
computeDeltas 1400000 validators 300 proto nodes 59.847 ms/op
computeDeltas 1400000 validators 1200 proto nodes 59.399 ms/op
computeDeltas 1400000 validators 7200 proto nodes 58.648 ms/op
computeDeltas 2100000 validators 300 proto nodes 90.194 ms/op
computeDeltas 2100000 validators 1200 proto nodes 87.499 ms/op
computeDeltas 2100000 validators 7200 proto nodes 87.164 ms/op
computeProposerBoostScoreFromBalances 500000 validators 3.4398 ms/op
computeProposerBoostScoreFromBalances 750000 validators 3.3195 ms/op
computeProposerBoostScoreFromBalances 1400000 validators 3.3051 ms/op
computeProposerBoostScoreFromBalances 2100000 validators 3.2900 ms/op
altair processAttestation - 250000 vs - 7PWei normalcase 2.8514 ms/op 3.1375 ms/op 0.91
altair processAttestation - 250000 vs - 7PWei worstcase 3.7891 ms/op 4.5934 ms/op 0.82
altair processAttestation - setStatus - 1/6 committees join 186.74 us/op 172.14 us/op 1.08
altair processAttestation - setStatus - 1/3 committees join 363.86 us/op 324.19 us/op 1.12
altair processAttestation - setStatus - 1/2 committees join 474.67 us/op 419.53 us/op 1.13
altair processAttestation - setStatus - 2/3 committees join 598.63 us/op 552.66 us/op 1.08
altair processAttestation - setStatus - 4/5 committees join 827.02 us/op 744.61 us/op 1.11
altair processAttestation - setStatus - 100% committees join 949.58 us/op 877.79 us/op 1.08
altair processBlock - 250000 vs - 7PWei normalcase 9.7073 ms/op 10.454 ms/op 0.93
altair processBlock - 250000 vs - 7PWei normalcase hashState 17.736 ms/op 18.554 ms/op 0.96
altair processBlock - 250000 vs - 7PWei worstcase 37.359 ms/op 40.492 ms/op 0.92
altair processBlock - 250000 vs - 7PWei worstcase hashState 63.309 ms/op 65.163 ms/op 0.97
phase0 processBlock - 250000 vs - 7PWei normalcase 2.4595 ms/op 3.2453 ms/op 0.76
phase0 processBlock - 250000 vs - 7PWei worstcase 29.519 ms/op 32.063 ms/op 0.92
altair processEth1Data - 250000 vs - 7PWei normalcase 470.07 us/op 574.48 us/op 0.82
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15 11.208 us/op 17.432 us/op 0.64
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219 36.425 us/op 80.922 us/op 0.45
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42 16.904 us/op 26.864 us/op 0.63
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18 12.578 us/op 14.381 us/op 0.87
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020 129.14 us/op 226.38 us/op 0.57
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777 1.0251 ms/op 1.4674 ms/op 0.70
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384 1.6344 ms/op 1.6772 ms/op 0.97
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384 1.4514 ms/op 2.3613 ms/op 0.61
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384 3.3925 ms/op 4.3978 ms/op 0.77
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384 2.5288 ms/op 2.7087 ms/op 0.93
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384 4.7223 ms/op 5.2722 ms/op 0.90
Tree 40 250000 create 326.01 ms/op 425.26 ms/op 0.77
Tree 40 250000 get(125000) 198.10 ns/op 218.75 ns/op 0.91
Tree 40 250000 set(125000) 997.29 ns/op 1.0415 us/op 0.96
Tree 40 250000 toArray() 19.597 ms/op 23.505 ms/op 0.83
Tree 40 250000 iterate all - toArray() + loop 17.795 ms/op 24.346 ms/op 0.73
Tree 40 250000 iterate all - get(i) 68.092 ms/op 76.561 ms/op 0.89
MutableVector 250000 create 12.346 ms/op 16.883 ms/op 0.73
MutableVector 250000 get(125000) 6.4710 ns/op 6.9020 ns/op 0.94
MutableVector 250000 set(125000) 239.33 ns/op 255.25 ns/op 0.94
MutableVector 250000 toArray() 2.9275 ms/op 4.0431 ms/op 0.72
MutableVector 250000 iterate all - toArray() + loop 3.0066 ms/op 4.1334 ms/op 0.73
MutableVector 250000 iterate all - get(i) 1.5366 ms/op 1.5601 ms/op 0.98
Array 250000 create 2.3936 ms/op 3.4001 ms/op 0.70
Array 250000 clone - spread 1.0059 ms/op 1.0728 ms/op 0.94
Array 250000 get(125000) 0.51200 ns/op 0.54900 ns/op 0.93
Array 250000 set(125000) 0.59100 ns/op 0.61200 ns/op 0.97
Array 250000 iterate all - loop 82.303 us/op 87.684 us/op 0.94
effectiveBalanceIncrements clone Uint8Array 300000 23.138 us/op 35.926 us/op 0.64
effectiveBalanceIncrements clone MutableVector 300000 295.00 ns/op 286.00 ns/op 1.03
effectiveBalanceIncrements rw all Uint8Array 300000 178.29 us/op 183.47 us/op 0.97
effectiveBalanceIncrements rw all MutableVector 300000 79.391 ms/op 90.310 ms/op 0.88
phase0 afterProcessEpoch - 250000 vs - 7PWei 114.41 ms/op 120.67 ms/op 0.95
phase0 beforeProcessEpoch - 250000 vs - 7PWei 39.145 ms/op 44.930 ms/op 0.87
altair processEpoch - mainnet_e81889 328.58 ms/op 330.67 ms/op 0.99
mainnet_e81889 - altair beforeProcessEpoch 46.654 ms/op 66.133 ms/op 0.71
mainnet_e81889 - altair processJustificationAndFinalization 13.136 us/op 16.275 us/op 0.81
mainnet_e81889 - altair processInactivityUpdates 5.4697 ms/op 5.2784 ms/op 1.04
mainnet_e81889 - altair processRewardsAndPenalties 50.063 ms/op 73.275 ms/op 0.68
mainnet_e81889 - altair processRegistryUpdates 2.4960 us/op 2.6420 us/op 0.94
mainnet_e81889 - altair processSlashings 423.00 ns/op 925.00 ns/op 0.46
mainnet_e81889 - altair processEth1DataReset 508.00 ns/op 591.00 ns/op 0.86
mainnet_e81889 - altair processEffectiveBalanceUpdates 1.2617 ms/op 1.5407 ms/op 0.82
mainnet_e81889 - altair processSlashingsReset 4.0160 us/op 5.0480 us/op 0.80
mainnet_e81889 - altair processRandaoMixesReset 7.0650 us/op 5.0980 us/op 1.39
mainnet_e81889 - altair processHistoricalRootsUpdate 660.00 ns/op 878.00 ns/op 0.75
mainnet_e81889 - altair processParticipationFlagUpdates 1.9800 us/op 2.6680 us/op 0.74
mainnet_e81889 - altair processSyncCommitteeUpdates 677.00 ns/op 654.00 ns/op 1.04
mainnet_e81889 - altair afterProcessEpoch 125.60 ms/op 146.78 ms/op 0.86
capella processEpoch - mainnet_e217614 1.0116 s/op 1.1904 s/op 0.85
mainnet_e217614 - capella beforeProcessEpoch 220.98 ms/op 299.15 ms/op 0.74
mainnet_e217614 - capella processJustificationAndFinalization 12.489 us/op 19.386 us/op 0.64
mainnet_e217614 - capella processInactivityUpdates 18.673 ms/op 23.865 ms/op 0.78
mainnet_e217614 - capella processRewardsAndPenalties 277.64 ms/op 327.07 ms/op 0.85
mainnet_e217614 - capella processRegistryUpdates 19.473 us/op 35.613 us/op 0.55
mainnet_e217614 - capella processSlashings 415.00 ns/op 1.0210 us/op 0.41
mainnet_e217614 - capella processEth1DataReset 442.00 ns/op 705.00 ns/op 0.63
mainnet_e217614 - capella processEffectiveBalanceUpdates 4.1640 ms/op 4.3704 ms/op 0.95
mainnet_e217614 - capella processSlashingsReset 2.5000 us/op 4.2670 us/op 0.59
mainnet_e217614 - capella processRandaoMixesReset 3.6180 us/op 7.0730 us/op 0.51
mainnet_e217614 - capella processHistoricalRootsUpdate 485.00 ns/op 852.00 ns/op 0.57
mainnet_e217614 - capella processParticipationFlagUpdates 2.0140 us/op 2.2280 us/op 0.90
mainnet_e217614 - capella afterProcessEpoch 299.21 ms/op 325.38 ms/op 0.92
phase0 processEpoch - mainnet_e58758 328.13 ms/op 394.79 ms/op 0.83
mainnet_e58758 - phase0 beforeProcessEpoch 117.35 ms/op 131.38 ms/op 0.89
mainnet_e58758 - phase0 processJustificationAndFinalization 14.493 us/op 17.402 us/op 0.83
mainnet_e58758 - phase0 processRewardsAndPenalties 59.099 ms/op 60.867 ms/op 0.97
mainnet_e58758 - phase0 processRegistryUpdates 9.5580 us/op 10.725 us/op 0.89
mainnet_e58758 - phase0 processSlashings 544.00 ns/op 586.00 ns/op 0.93
mainnet_e58758 - phase0 processEth1DataReset 479.00 ns/op 453.00 ns/op 1.06
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 1.0514 ms/op 2.0083 ms/op 0.52
mainnet_e58758 - phase0 processSlashingsReset 2.0940 us/op 2.3180 us/op 0.90
mainnet_e58758 - phase0 processRandaoMixesReset 4.0230 us/op 3.7540 us/op 1.07
mainnet_e58758 - phase0 processHistoricalRootsUpdate 495.00 ns/op 424.00 ns/op 1.17
mainnet_e58758 - phase0 processParticipationRecordUpdates 3.9220 us/op 4.5850 us/op 0.86
mainnet_e58758 - phase0 afterProcessEpoch 106.34 ms/op 102.55 ms/op 1.04
phase0 processEffectiveBalanceUpdates - 250000 normalcase 1.3141 ms/op 1.5509 ms/op 0.85
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.4625 ms/op 1.7861 ms/op 0.82
altair processInactivityUpdates - 250000 normalcase 23.495 ms/op 21.680 ms/op 1.08
altair processInactivityUpdates - 250000 worstcase 25.999 ms/op 22.244 ms/op 1.17
phase0 processRegistryUpdates - 250000 normalcase 10.908 us/op 9.5350 us/op 1.14
phase0 processRegistryUpdates - 250000 badcase_full_deposits 338.91 us/op 445.77 us/op 0.76
phase0 processRegistryUpdates - 250000 worstcase 0.5 126.45 ms/op 135.93 ms/op 0.93
altair processRewardsAndPenalties - 250000 normalcase 71.918 ms/op 65.859 ms/op 1.09
altair processRewardsAndPenalties - 250000 worstcase 71.377 ms/op 68.807 ms/op 1.04
phase0 getAttestationDeltas - 250000 normalcase 7.8693 ms/op 7.8307 ms/op 1.00
phase0 getAttestationDeltas - 250000 worstcase 7.7772 ms/op 8.5827 ms/op 0.91
phase0 processSlashings - 250000 worstcase 2.2562 ms/op 2.5368 ms/op 0.89
altair processSyncCommitteeUpdates - 250000 156.63 ms/op 167.57 ms/op 0.93
BeaconState.hashTreeRoot - No change 265.00 ns/op 282.00 ns/op 0.94
BeaconState.hashTreeRoot - 1 full validator 53.195 us/op 58.077 us/op 0.92
BeaconState.hashTreeRoot - 32 full validator 537.28 us/op 575.15 us/op 0.93
BeaconState.hashTreeRoot - 512 full validator 5.8260 ms/op 6.0629 ms/op 0.96
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 62.819 us/op 74.748 us/op 0.84
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 925.40 us/op 984.48 us/op 0.94
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 10.896 ms/op 13.350 ms/op 0.82
BeaconState.hashTreeRoot - 1 balances 46.416 us/op 51.132 us/op 0.91
BeaconState.hashTreeRoot - 32 balances 459.09 us/op 483.35 us/op 0.95
BeaconState.hashTreeRoot - 512 balances 4.5131 ms/op 4.9445 ms/op 0.91
BeaconState.hashTreeRoot - 250000 balances 70.853 ms/op 79.462 ms/op 0.89
aggregationBits - 2048 els - zipIndexesInBitList 15.105 us/op 23.433 us/op 0.64
regular array get 100000 times 32.999 us/op 46.124 us/op 0.72
wrappedArray get 100000 times 32.954 us/op 37.734 us/op 0.87
arrayWithProxy get 100000 times 14.746 ms/op 15.209 ms/op 0.97
ssz.Root.equals 202.00 ns/op 281.00 ns/op 0.72
byteArrayEquals 202.00 ns/op 295.00 ns/op 0.68
shuffle list - 16384 els 6.8905 ms/op 7.4069 ms/op 0.93
shuffle list - 250000 els 101.40 ms/op 108.94 ms/op 0.93
processSlot - 1 slots 8.3970 us/op 10.436 us/op 0.80
processSlot - 32 slots 1.3490 ms/op 1.6844 ms/op 0.80
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 52.675 ms/op 53.168 ms/op 0.99
getCommitteeAssignments - req 1 vs - 250000 vc 2.5204 ms/op 2.6608 ms/op 0.95
getCommitteeAssignments - req 100 vs - 250000 vc 3.7275 ms/op 3.8550 ms/op 0.97
getCommitteeAssignments - req 1000 vs - 250000 vc 4.0954 ms/op 4.2540 ms/op 0.96
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 4.5300 ns/op 5.5100 ns/op 0.82
state getBlockRootAtSlot - 250000 vs - 7PWei 721.06 ns/op 697.68 ns/op 1.03
computeProposers - vc 250000 8.6047 ms/op 9.7738 ms/op 0.88
computeEpochShuffling - vc 250000 104.21 ms/op 110.29 ms/op 0.94
getNextSyncCommittee - vc 250000 145.46 ms/op 163.91 ms/op 0.89
computeSigningRoot for AttestationData 13.411 us/op 14.543 us/op 0.92
hash AttestationData serialized data then Buffer.toString(base64) 2.3548 us/op 2.4617 us/op 0.96
toHexString serialized data 1.0766 us/op 1.2478 us/op 0.86
Buffer.toString(base64) 218.29 ns/op 257.69 ns/op 0.85

by benchmarkbot/action

@twoeths
Contributor Author

twoeths commented Aug 9, 2023

I'm not sure whether hoisting common variables out of the for loop helps when lodestar uses so much memory; the performance is the same when I test on goerli. At least it does no harm. We may want to consult Ben on whether this change is necessary.

Screenshot 2023-08-09 at 15 42 25

I'm not sure why the perf test is so much faster even though I added all votes with different current root / next root. At least the benchmark is better: it went from more than 3 ms for 250k validators

Screenshot 2023-08-09 at 15 48 31

to this new value

Screenshot 2023-08-09 at 15 49 07

@twoeths twoeths marked this pull request as ready for review August 9, 2023 10:12
@twoeths twoeths requested a review from a team as a code owner August 9, 2023 10:12
@@ -21,33 +21,38 @@ export function computeDeltas(
): number[] {
const deltas = Array.from({length: indices.size}, () => 0);
Contributor

This can also be improved: a regular for loop is faster than Array.from.

Member

What about Array.fill performance?

Contributor

Last time I checked, a regular for loop was the fastest way to populate an array. There should be benchmarks in the state transition package.

Contributor Author

  array creation
Array.from(() => 0): 9.6100125 ms
    ✔ Array.from(() => 0)
Array.from().fill(0): 8.389225 ms
    ✔ Array.from().fill(0)
Array.from(): 15.170741699999999 ms
    ✔ Array.from()
new Array(): 0.8613583 ms
    ✔ new Array()
new Array(); for loop: 1.2474166 ms
    ✔ new Array(); for loop

@dapplion I'll do new Array() and a for loop in #5882 👍
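For readers who want to reproduce the comparison, here is a minimal sketch of the array-creation variants discussed in this thread. The exact expressions used in the benchmark above are not shown in the conversation, so the function names (zerosArrayFrom, zerosFill, zerosLoop, zerosPush) and the timeMs helper are hypothetical, and timings will vary by machine and engine version.

```typescript
// Hypothetical helpers illustrating the variants discussed above.
// None of these names come from the Lodestar codebase.

/** Array.from with a mapping function, as in the original computeDeltas line. */
function zerosArrayFrom(length: number): number[] {
  return Array.from({length}, () => 0);
}

/** Preallocate, then fill with a single call. */
function zerosFill(length: number): number[] {
  return new Array<number>(length).fill(0);
}

/** Preallocate, then populate with a regular for loop. */
function zerosLoop(length: number): number[] {
  const out = new Array<number>(length);
  for (let i = 0; i < length; i++) {
    out[i] = 0;
  }
  return out;
}

/** Start from an empty array and push each element. */
function zerosPush(length: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < length; i++) {
    out.push(0);
  }
  return out;
}

/** Rough wall-clock timing of a single call, in milliseconds. */
function timeMs(fn: () => unknown): number {
  const start = Date.now();
  fn();
  return Date.now() - start;
}
```

All four functions return the same result, so they can be swapped freely; calling for example timeMs(() => zerosLoop(1_000_000)) for each variant gives a rough comparison, though a proper benchmark runner with warm-up is more reliable than a single timed call.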

Contributor

Have you benchmarked just pushing to an empty array?

Contributor Author

Not yet, how important is it? In this case there are always more than 0 proto nodes in forkchoice.

@twoeths
Contributor Author

twoeths commented Aug 10, 2023

With 0aa5224, the forkChoice updateHead vc 600000 bc 7200 eq 0 test time drops from 15 s/op to 22 ms/op in my local environment.

@twoeths twoeths changed the title from "chore: update performance test of computeDeltas" to "fix: improve forkchoice updateHead() time" Aug 10, 2023
@wemeetagain wemeetagain merged commit 7b38a1a into unstable Aug 11, 2023
11 checks passed
@wemeetagain wemeetagain deleted the tuyen/computeDelta_perf_test branch August 11, 2023 20:06
@twoeths
Contributor Author

twoeths commented Aug 23, 2023

The root cause of this slowdown is that we call getAncestor() too many times to check whether a node is a descendant of the finalized node. With n nodes, applyScoreChanges makes between n and 2 * n calls to getAncestor(), and each call walks between 1 and n nodes.

A naive function that checks whether every node is a descendant of the finalized node (not the improved version in this PR) therefore visits up to 1 + 2 + ... + n = n * (n + 1) / 2 nodes, which is O(n^2). The worst case of applyScoreChanges is then about n * (n + 1) node visits.
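The quadratic cost described above can be seen in a simplified model. The sketch below is not the code from this PR or from Lodestar's proto array: it assumes a hypothetical parents array where parents[i] is the parent index of node i (or -1 for the root) and where a parent is always stored before its children. It counts the nodes visited by a naive per-node ancestor walk and contrasts that with a single pass that reuses the parent's answer.

```typescript
// Hypothetical, simplified model of a proto array (not Lodestar's real types):
// parents[i] is the index of node i's parent, or -1 for the root.
// Assumption: a parent is always stored before its children.

/** Build a linear chain 0 <- 1 <- ... <- n-1, the worst case for walking up. */
function linearChain(n: number): number[] {
  const parents = new Array<number>(n);
  for (let i = 0; i < n; i++) {
    parents[i] = i - 1;
  }
  return parents;
}

/** Naive check: walk up from `node`, counting visited nodes, until `finalized` or the root. */
function isDescendantNaive(
  parents: number[],
  node: number,
  finalized: number
): {isDescendant: boolean; visited: number} {
  let visited = 0;
  for (let current = node; current !== -1; current = parents[current]) {
    visited++;
    if (current === finalized) {
      return {isDescendant: true, visited};
    }
  }
  return {isDescendant: false, visited};
}

/** Total nodes visited when the naive check runs once for every node. */
function totalNaiveVisits(parents: number[], finalized: number): number {
  let total = 0;
  for (let i = 0; i < parents.length; i++) {
    total += isDescendantNaive(parents, i, finalized).visited;
  }
  return total;
}

/** Single pass: each node reuses its parent's already-computed answer, O(n) overall. */
function descendantsSinglePass(parents: number[], finalized: number): boolean[] {
  const result = new Array<boolean>(parents.length);
  for (let i = 0; i < parents.length; i++) {
    const parent = parents[i];
    result[i] = i === finalized || (parent !== -1 && result[parent]);
  }
  return result;
}
```

On a linear chain of n nodes with the root finalized, totalNaiveVisits returns n * (n + 1) / 2, matching the estimate above, while the single pass touches each node once. The single-pass variant relies on the parent-before-child ordering assumption stated in the comments.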

We need to review:

  • functions that call getAncestor()
  • functions that loop through the entire proto array more than once

I opened #5901 and #5902 from this review. cc @dapplion
