Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement binary EEA inversion for faster BNADD precompile #21515

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

shamatar
Copy link

@shamatar shamatar commented Sep 3, 2020

Single field inversion takes almost all the time of execution in case of point addition. Switching from inversion by exponentiation into p-2 power to binary EEA inversion reduces execution time in 3 times

Before change:
BenchmarkG1AddAndMakeAffine-12         85004         13955 ns/op         256 B/op          5 allocs/op
After change:
BenchmarkG1AddAndMakeAffine-12        261412          4593 ns/op         256 B/op          5 allocs/op

@karalabe
Copy link
Member

karalabe commented Sep 4, 2020

Oooh, this sounds nice. Would it make sense to upstream it to Cloudflare's original library? https://github.com/cloudflare/bn256

Edit: I've opened an upstream issue referencing this cloudflare/bn256#23.

@bwesterb
Copy link

bwesterb commented Sep 4, 2020

This is not constant time.

@shamatar
Copy link
Author

shamatar commented Sep 4, 2020

I’m not sure if Cloudflare needs it constant time, but since element is not zero and inverse always exists then such inversion will finish in log(p) steps (for Eth purposes to be bounded computation).

N.B. Original EIP for this functionality does not give constant time guarantees anyway

@armfazh
Copy link

armfazh commented Sep 4, 2020

EEA is not the recommeded way to perform inversion, because its execution is not performed in constant time with respect to the value of its input.
EEA is well-known to be the source of several side-channel attacks, see for example ia.cr/2020/432.

Regarding performance, point operations can be performed in projective coordinates, and at the very end make just one conversion.

@shamatar
Copy link
Author

shamatar commented Sep 4, 2020

As was mentioned above there is no need to constant time for the precompile. And addition is indeed performed in projective coordinates, but Ethereum ABI requires outputs to be in affine coordinates, so fast inversion is important for single addition performance.

@armfazh
Copy link

armfazh commented Sep 4, 2020

This change seems to be specific to the application, and it's likely not to be included in cloudflare/bn256.
You could add a method called MakeAffineInsecure which uses EEA to disinguish from the original MakeAffine method.

@shamatar
Copy link
Author

shamatar commented Sep 7, 2020

Had to do some duplication of code (to avoid global config variables), but now there are two flavors of 'constant time' and 'variable time' available functions. Fp6/Fp12 inversions do not benefit from variable time speedup much, but I've continued with the same approach everywhere. Miller loop and final exponentiations are left with constant time cause speedup there is also negligible.

@holiman
Copy link
Contributor

holiman commented Sep 8, 2020

Before and after:

name                                     old time/op    new time/op          delta
PrecompiledBn256Add/chfast1-Gas=150-6      13.6µs ± 1%           5.1µs ± 1%            -62.24%  (p=0.016 n=5+4)
PrecompiledBn256Add/chfast2-Gas=150-6      13.6µs ± 2%           5.2µs ± 1%            -61.84%  (p=0.016 n=5+4)
PrecompiledBn256Add/cdetrio1-Gas=150-6     1.02µs ± 2%          1.01µs ± 1%               ~     (p=0.286 n=5+5)
PrecompiledBn256Add/cdetrio2-Gas=150-6     1.05µs ± 1%          1.10µs ±20%               ~     (p=0.952 n=5+5)
PrecompiledBn256Add/cdetrio3-Gas=150-6     1.06µs ± 2%          1.05µs ± 4%               ~     (p=0.690 n=5+5)
PrecompiledBn256Add/cdetrio4-Gas=150-6     1.07µs ± 0%          1.10µs ± 2%             +2.21%  (p=0.029 n=4+4)
PrecompiledBn256Add/cdetrio5-Gas=150-6     1.04µs ± 3%          1.03µs ± 4%               ~     (p=0.246 n=5+5)
PrecompiledBn256Add/cdetrio6-Gas=150-6     1.36µs ± 2%          1.45µs ±18%               ~     (p=0.222 n=5+5)
PrecompiledBn256Add/cdetrio7-Gas=150-6     1.37µs ± 2%          1.44µs ±26%               ~     (p=0.841 n=5+5)
PrecompiledBn256Add/cdetrio8-Gas=150-6     1.41µs ± 1%          1.56µs ±18%               ~     (p=0.063 n=5+5)
PrecompiledBn256Add/cdetrio9-Gas=150-6     1.35µs ± 2%          1.38µs ± 1%             +2.19%  (p=0.032 n=5+4)
PrecompiledBn256Add/cdetrio10-Gas=150-6    1.37µs ± 7%          1.47µs ±23%               ~     (p=0.151 n=5+5)
PrecompiledBn256Add/cdetrio11-Gas=150-6    13.9µs ± 1%           5.6µs ± 1%            -59.74%  (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio12-Gas=150-6    13.8µs ± 1%           5.8µs ±13%            -57.82%  (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio13-Gas=150-6    13.7µs ± 1%           5.4µs ± 4%            -60.20%  (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio14-Gas=150-6    2.05µs ± 2%          2.13µs ± 5%               ~     (p=0.056 n=5+5)

It does a real difference in many cases (not all) - particularly these five:

PrecompiledBn256Add/chfast1-Gas=150-6      13.6µs ± 1%           5.1µs ± 1%            -62.24%  (p=0.016 n=5+4)
PrecompiledBn256Add/chfast2-Gas=150-6      13.6µs ± 2%           5.2µs ± 1%            -61.84%  (p=0.016 n=5+4)
PrecompiledBn256Add/cdetrio11-Gas=150-6    13.9µs ± 1%           5.6µs ± 1%            -59.74%  (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio12-Gas=150-6    13.8µs ± 1%           5.8µs ±13%            -57.82%  (p=0.008 n=5+5)
PrecompiledBn256Add/cdetrio13-Gas=150-6    13.7µs ± 1%           5.4µs ± 4%            -60.20%  (p=0.008 n=5+5)

However, the particular cases where it does make a difference, are the cases which are currently the slowest:

namely, these five:

name                                     old mgas/s    
PrecompiledBn256Add/chfast1-Gas=150-6        11.0 ± 1% 
PrecompiledBn256Add/chfast2-Gas=150-6        11.0 ± 2% 
PrecompiledBn256Add/cdetrio11-Gas=150-6      10.8 ± 1% 
PrecompiledBn256Add/cdetrio12-Gas=150-6      10.9 ± 1% 
PrecompiledBn256Add/cdetrio13-Gas=150-6      11.0 ± 1% 

Which are an order of magnitude worse than others:

PrecompiledBn256Add/cdetrio1-Gas=150-6        147 ± 1% 
PrecompiledBn256Add/cdetrio2-Gas=150-6        142 ± 1% 
PrecompiledBn256Add/cdetrio3-Gas=150-6        142 ± 2% 
PrecompiledBn256Add/cdetrio4-Gas=150-6        140 ± 0% 
PrecompiledBn256Add/cdetrio5-Gas=150-6        144 ± 3% 
PrecompiledBn256Add/cdetrio6-Gas=150-6        110 ± 2% 
PrecompiledBn256Add/cdetrio7-Gas=150-6        110 ± 2% 
PrecompiledBn256Add/cdetrio8-Gas=150-6        107 ± 2% 
PrecompiledBn256Add/cdetrio9-Gas=150-6        111 ± 2% 
PrecompiledBn256Add/cdetrio10-Gas=150-6       110 ± 7% 
PrecompiledBn256Add/cdetrio14-Gas=150-6      73.3 ± 2% 

As stated previously, constant time is not something we're bothered by. As long as this is mathematically correct, and doesn't suffer from a different set of worst-cases that we're as of yet unaware of, I'm all for merging this.

@bwesterb
Copy link

bwesterb commented Sep 9, 2020

Constant time must be the default.

@holiman
Copy link
Contributor

holiman commented Sep 10, 2020

We geth-maintainers aren't cryptographers, and it's difficult for us to validate the proposed changes, other than that we can definitely agree that for us, constant time at the cost of speed, is not desireable.

However, I'd highly appreciate any input from @bwesterb or @armfazh regarding:

  • Is the change correct, as in, assuming the implementation is correct, does the math hold?
  • Would this lead to a new (even worse) set of worst-cases?

I don't expect you to spend a lot of time auditing our code, but if you have already looked at it, perhaps you already have formed opinions about these questions.

Copy link

@armfazh armfazh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments

crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
crypto/bn256/cloudflare/gfp.go Outdated Show resolved Hide resolved
@MariusVanDerWijden
Copy link
Member

I created some fuzzers for this change here:
https://github.com/MariusVanDerWijden/go-ethereum/tree/var_time_fuzzer
Been running them for some time now and they haven't found a difference between the varTime and the constant time implementations.

@holiman
Copy link
Contributor

holiman commented Sep 24, 2020

@armfazh I think all your concerns has been addressed - none of the variable-time functions are on default paths. Do you think this can/will be merged upstream?

@armfazh
Copy link

armfazh commented Sep 24, 2020

@armfazh I think all your concerns has been addressed - none of the variable-time functions are on default paths. Do you think this can/will be merged upstream?

could anyone of you prepare a PR against upstream, so we can do a final review and merge it then,

@holiman
Copy link
Contributor

holiman commented Sep 28, 2020

I started prepping an upstream PR, but for some reason the test fails. @shamatar do you have any idea why? I just copy-pasted the changes... https://github.com/holiman/bn256/tree/binary_eea_inversion

=== RUN   TestBinaryEAA
    TestBinaryEAA: bn256_test.go:232: results of different inversion do not agree

@shamatar
Copy link
Author

I'll have a look at the morning, sure.

@shamatar
Copy link
Author

I've identified a problem: modulus size in the original repo is 256 bits, so the trick with addNocarry and then right shift by 1 bit will not work cause addNocarry actually produces carry. In Ethereum the curve is different and modulus is 254 bits. I can fix it in a free time, but not sure about the time yet.

@axic
Copy link
Member

axic commented Nov 26, 2021

@shamatar friendly ping, was this ever looked at?

@shamatar
Copy link
Author

Hey @axic! I didn't, but the solution is quite simple, so since you have reminded me I should make a direct PR to the Cloudflare's repo, but for Ethereum the BN curve is different than in the original CF repo, so the trick with addNocarry is perfectly safe for Ethereum curve (adding two 254 bit numbers can not overflow 256 bit buffer) and can be used as-is

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants