core/vm: use uint256.Bytes32 and builtin copy to make MSTORE faster #637

Closed
@minh-bq minh-bq commented Nov 28, 2024

In commit f791124 ("core/vm: optimize the mstore opcode with loop unrolling"), we optimized the loop that copies each byte by unrolling it manually, since the Go compiler does not seem to do that on its own at this time. This makes the code quite ugly and might increase the number of unique instructions executed, which puts more pressure on the instruction cache.

This commit instead follows the go-ethereum commit e0a1fd5 ("core/vm: optimize Memory.Set32") by using uint256.Bytes32 and the builtin copy. uint256.Bytes32 is inlined and compiles to fewer instructions: 4x (load, bswap, store). The builtin copy can copy 32 bytes with just 2 load-store pairs using 128-bit (16-byte) xmm registers.
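The approach described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code from this PR: `word256` and its `bytes32` method are hypothetical stand-ins for `uint256.Int` and `uint256.Bytes32` (assuming four 64-bit limbs, least significant first), and `set32` is a hypothetical free function standing in for the memory store method.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// word256 is a hypothetical stand-in for uint256.Int: four 64-bit limbs,
// least significant limb first.
type word256 [4]uint64

// bytes32 returns the big-endian 32-byte encoding of w. Each limb is written
// with one 8-byte big-endian store, which on amd64 can become a
// load, bswap, store sequence.
func (w *word256) bytes32() [32]byte {
	var b [32]byte
	binary.BigEndian.PutUint64(b[0:8], w[3])
	binary.BigEndian.PutUint64(b[8:16], w[2])
	binary.BigEndian.PutUint64(b[16:24], w[1])
	binary.BigEndian.PutUint64(b[24:32], w[0])
	return b
}

// set32 writes val as 32 big-endian bytes into store at offset.
// The store is assumed to have been resized before the call.
func set32(store []byte, offset uint64, val *word256) {
	if offset+32 > uint64(len(store)) {
		panic("invalid memory: store too small")
	}
	b32 := val.bytes32()
	// A single fixed-size copy replaces the per-byte (or unrolled) loop.
	copy(store[offset:], b32[:])
}

func main() {
	mem := make([]byte, 64)
	set32(mem, 32, &word256{0x0102030405060708, 0, 0, 0})
	fmt.Printf("%x\n", mem[56:64]) // prints 0102030405060708
}
```

Whether the compiler actually emits the instruction sequence mentioned in the description depends on the Go version and target architecture, so the generated assembly is worth checking before relying on it.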

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/vm
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
                          │   old.txt    │               new.txt                │
                          │    sec/op    │    sec/op     vs base                │
EvmInsertionSort-8          108.7m ±  6%   104.6m ± 26%        ~ (p=0.631 n=10)
EvmQuickSort-8              6.848m ± 14%   6.633m ±  3%        ~ (p=0.089 n=10)
EvmSignatureValidation-8    15.96µ ±  3%   15.48µ ±  3%        ~ (p=0.052 n=10)
EvmMulticallErcTransfer-8   6.503m ± 15%   6.562m ±  6%        ~ (p=0.912 n=10)
EvmRedBlackTree-8           302.5m ±  4%   305.0m ±  2%        ~ (p=0.684 n=10)
OpMstore-8                  33.47n ± 10%   30.09n ±  6%  -10.07% (p=0.000 n=10)
geomean                     959.9µ         930.0µ         -3.11%

@minh-bq minh-bq closed this Nov 29, 2024
@minh-bq minh-bq deleted the optimize-mstore branch November 29, 2024 08:48