Reproducible builds #6393

Open
michaelsproul opened this issue Sep 13, 2024 · 10 comments
Labels
enhancement, good first issue, infra-ci, security

Comments

@michaelsproul
Member

michaelsproul commented Sep 13, 2024

Description

There is some desire to build Lighthouse in a reproducible way. Making Lighthouse reproducible means that two source builds of Lighthouse at the same commit and with the same toolchain would produce bit-identical binaries.

This issue is an umbrella issue to track progress towards this goal, and document cases where Lighthouse can already be built reproducibly.
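
As a minimal sketch of the check this definition implies, two build outputs can be compared byte for byte with `cmp`. The function name `same_binary` is hypothetical and is not part of the Lighthouse build scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two files are byte-for-byte identical.
same_binary() {
    # cmp -s prints nothing and reports the result through its exit status.
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

For example, `same_binary prog1 prog2` would be run on copies of the binary saved from two separate builds.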

Working Reproducible Builds

  • On my M1 Pro Mac, building Lighthouse twice in a row with make reproducible results in identical binaries.

Failing Reproducible Builds

  • On my x86_64 Linux desktop, building Lighthouse twice in a row under Cross (make build-x86_64) doesn't produce an identical binary.
  • On my x86_64 Linux desktop, building Lighthouse twice in a row on the host (make reproducible) doesn't produce an identical binary.
  • Building with Cross on our release builder also doesn't produce the same binary as one built locally with Cross.

Steps to resolve

Unclear.

I have no idea why the Cross builds are more variable than the macOS builds, given that one would naively expect the Docker image to provide a stable toolchain. Perhaps the default Linux linker is less deterministic than the macOS linker?

@jmcph4
Member

jmcph4 commented Sep 13, 2024

Unlikely to completely solve our issues but this might explain some of the Mac vs Linux differences:

packed - This is the default for Windows MSVC and macOS. The term "packed" here means that all the debug information is packed into a separate file from the main executable. On Windows MSVC this is a *.pdb file, on macOS this is a *.dSYM folder, and on other platforms this is a *.dwp file.

unpacked - This means that debug information will be found in separate files for each compilation unit (object file). This is not supported on Windows MSVC. On macOS this means the original object files will contain debug information. On other Unix platforms this means that *.dwo files will contain debug information.

-- https://doc.rust-lang.org/rustc/codegen-options/index.html#split-debuginfo

Furthermore, Cargo's default release profile (which I note your PR uses) has a platform-specific value for this flag.

@jmcph4
Member

jmcph4 commented Sep 13, 2024

Okay, the current state of things for me locally:

$ cargo clean
$ make reproducible
$ cp target/reproducible/lighthouse prog1
$ cargo clean
$ make reproducible
$ cp target/reproducible/lighthouse prog2
$ xxd prog1 > prog1.txt
$ xxd prog2 > prog2.txt
$ diff prog1.txt prog2.txt
59,60c59,60
< 000003a0: 229d eb95 a98c 099f edb8 5345 f0d0 22b6  ".........SE..".
< 000003b0: b94a 27dc 0400 0000 1000 0000 0100 0000  .J'.............
---
> 000003a0: 017f 73ed 31e1 6d74 2e72 a74d c8c2 b8ae  ..s.1.mt.r.M....
> 000003b0: 15ad b366 0400 0000 1000 0000 0100 0000  ...f............
3655022c3655022
< 037c56d0: 7020 3133 2030 333a 3539 3a32 3720 3230  p 13 03:59:27 20
---
> 037c56d0: 7020 3133 2030 343a 3131 3a33 3620 3230  p 13 04:11:36 20

Note that this uses a build profile called reproducible which was my attempt at trying the split-debuginfo idea:

diff --git a/Cargo.toml b/Cargo.toml
index d8c6f487c..55fa92c53 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -248,5 +248,13 @@ lto = "fat"
 codegen-units = 1
 incremental = false
 
+[profile.reproducible]
+inherits = "release"
+lto = "fat"
+codegen-units = 1
+incremental = false
+split-debuginfo = "unpacked"
+strip = "symbols"
+
 [patch.crates-io]
 quick-protobuf = { git = "https://github.com/sigp/quick-protobuf.git", rev = "681f413312404ab6e51f0b46f39b0075c6f4ebfd" }

Edit: Here's my system information:

$ uname -a
Linux [REDACTED] 6.1.0-25-rt-amd64 #1 SMP PREEMPT_RT Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux

@michaelsproul
Member Author

Nice!

Similar on my end but with -C split-debuginfo=packed added to RUSTFLAGS. I went the other way to you wrt the debug info, which maybe shows it doesn't matter too much?

My binary diff is:

59,60c59,60
< 000003a0: 2f20 d09e 0f13 5ee7 5c1b da73 a1eb 940b  / ....^.\..s....
< 000003b0: 1bf2 fd62 0400 0000 1000 0000 0100 0000  ...b............
---
> 000003a0: 62bd 4cc3 45d3 1ae7 e9b7 7b7d 8ae6 a950  b.L.E.....{}...P
> 000003b0: ad8b 4953 0400 0000 1000 0000 0100 0000  ..IS............
3278243c3278243
< 03205a20: 7020 3133 2030 343a 3130 3a32 3420 3230  p 13 04:10:24 20
---
> 03205a20: 7020 3133 2030 343a 3138 3a33 3920 3230  p 13 04:18:39 20

@jmcph4
Member

jmcph4 commented Sep 13, 2024

I went the other way to you wrt the debug info, which maybe shows it doesn't matter too much?

Yeah, I sense the split-debuginfo idea was a red herring. Good to know we're at the same diff output though!
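
Since both diffs come down to a handful of differing bytes, a quick way to compare results across machines is to list only the offsets that differ. The sketch below assumes a POSIX `cmp` and `awk`; `diff_offsets` is a hypothetical helper name. It prints zero-based hexadecimal offsets so that they line up with the left-hand column of `xxd`.

```shell
#!/bin/sh
# Hypothetical helper: print the offset of every differing byte in hex,
# one per line, using the same zero-based offsets that xxd displays.
diff_offsets() {
    # cmp -l lists differences as "<1-based decimal offset> <octal> <octal>".
    # cmp exits non-zero when the files differ; the pipeline's status is awk's.
    cmp -l "$1" "$2" | awk '{ printf "%08x\n", $1 - 1 }'
}
```

Running `diff_offsets prog1 prog2` on two builds gives a compact list that can be pasted into a comment instead of the full `xxd` diff.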

@michaelsproul
Member Author

The timestamp seems to be coming from OpenSSL:

grep -R "13 04:18:39 20" .
grep: ./release/deps/libopenssl_sys-7dfbcd6f42382466.rlib: binary file matches
grep: ./release/deps/libopenssl_sys-e9eecb4002815770.rlib: binary file matches
grep: ./release/deps/lighthouse-49ada88ee9a96b67: binary file matches
grep: ./release/build/openssl-sys-c41c34e22b70916b/out/openssl-build/install/lib/libcrypto.a: binary file matches
grep: ./release/build/deposit_contract-dd8b79780d1c97cb/build_script_build-dd8b79780d1c97cb: binary file matches
grep: ./release/build/deposit_contract-dd8b79780d1c97cb/build-script-build: binary file matches
grep: ./release/build/openssl-sys-0a87195eb601c108/out/openssl-build/install/lib/libcrypto.a: binary file matches
grep: ./release/lighthouse: binary file matches

@michaelsproul
Member Author

Compiling with SOURCE_DATE_EPOCH=1 pins the OpenSSL build timestamp to 1970:

59,60c59,60
< 000003a0: 62bd 4cc3 45d3 1ae7 e9b7 7b7d 8ae6 a950  b.L.E.....{}...P
< 000003b0: ad8b 4953 0400 0000 1000 0000 0100 0000  ..IS............
---
> 000003a0: b5bf 1986 dab0 e917 dc3b d710 d421 04dd  .........;...!..
> 000003b0: ff2f 4d24 0400 0000 1000 0000 0100 0000  ./M$............
3278242,3278244c3278242,3278244
< 03205a10: 6275 696c 7420 6f6e 3a20 4672 6920 5365  built on: Fri Se
< 03205a20: 7020 3133 2030 343a 3138 3a33 3920 3230  p 13 04:18:39 20
< 03205a30: 3234 2055 5443 0000 454e 4749 4e45 5344  24 UTC..ENGINESD
---
> 03205a10: 6275 696c 7420 6f6e 3a20 5468 7520 4a61  built on: Thu Ja
> 03205a20: 6e20 2031 2030 303a 3030 3a30 3120 3139  n  1 00:00:01 19
> 03205a30: 3730 2055 5443 0000 454e 4749 4e45 5344  70 UTC..ENGINESD
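
For reference, `SOURCE_DATE_EPOCH` is a count of seconds since the Unix epoch, which is why a value of 1 shows up as one second past midnight on 1 January 1970 in the string above. A small sketch of that conversion, assuming either GNU `date` (`-d @N`) or BSD/macOS `date` (`-r N`) is available; `epoch_to_utc` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: render a Unix timestamp as an ISO 8601 UTC string.
epoch_to_utc() {
    # GNU date accepts "-d @SECONDS"; BSD/macOS date accepts "-r SECONDS".
    # Try the GNU form first and fall back to the BSD form if it fails.
    date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
        date -u -r "$1" '+%Y-%m-%dT%H:%M:%SZ'
}
```

With this, `epoch_to_utc 1` should print `1970-01-01T00:00:01Z`, matching the timestamp embedded in the second build.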

@michaelsproul
Member Author

Ooh, that seemed to fix it completely once I accounted for an unclean working directory:

michael@geralt: ~/Programming/lighthouse > cp target/release/lighthouse lighthouse.2
michael@geralt: ~/Programming/lighthouse > xxd lighthouse.1 > prog1.txt
michael@geralt: ~/Programming/lighthouse > xxd lighthouse.2 > prog2.txt
michael@geralt: ~/Programming/lighthouse > diff prog1.txt prog2.txt
michael@geralt: ~/Programming/lighthouse > sha256sum lighthouse.1
0f595e8bc16dd97985ce35710750ffc40f8c1ea056b88f94f69d41c565081343  lighthouse.1
michael@geralt: ~/Programming/lighthouse > sha256sum lighthouse.2
0f595e8bc16dd97985ce35710750ffc40f8c1ea056b88f94f69d41c565081343  lighthouse.2
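
The manual steps above (build, copy, build again, compare) could be wrapped in a small harness so the check is easy to repeat on other machines. This is only a sketch: `check_reproducible` is a hypothetical name, and the build command is passed in by the caller rather than assumed.

```shell
#!/bin/sh
# Hypothetical harness: run a build command twice and compare the artifact.
# Usage: check_reproducible ARTIFACT BUILD_COMMAND [ARGS...]
# Returns 0 if both builds match, 1 if they differ, 2 if a build step fails.
check_reproducible() {
    artifact=$1
    shift
    first=$(mktemp) || return 2

    # First build: keep a copy of the artifact, then delete the original so
    # a stale file cannot be mistaken for the output of the second build.
    "$@" || { rm -f "$first"; return 2; }
    cp "$artifact" "$first" || { rm -f "$first"; return 2; }
    rm -f "$artifact"

    # Second build, using exactly the same command.
    "$@" || { rm -f "$first"; return 2; }

    if cmp -s "$first" "$artifact"; then
        status=0
    else
        status=1
    fi
    rm -f "$first"
    return "$status"
}
```

The build command should include its own cleaning step, since deleting the artifact alone does not force a full rebuild. An invocation might look like `check_reproducible target/release/lighthouse sh -c 'cargo clean && SOURCE_DATE_EPOCH=1 cargo build --release'`, where the exact build command is an assumption and may not match how Lighthouse is normally built.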

@metachris

One trick we've been using to set the SOURCE_DATE_EPOCH is using the latest commit date: https://github.com/paradigmxyz/reth/pull/10459/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52R64

SOURCE_DATE_EPOCH := $(shell git log -1 --pretty=%ct)
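
Outside of a Makefile, the same idea can be sketched in plain shell. The helper name `commit_epoch` and the fallback value are assumptions added for illustration; only the `git log -1 --pretty=%ct` call comes from the line above.

```shell
#!/bin/sh
# Hypothetical helper: print the committer timestamp of HEAD for the
# repository containing DIR, or FALLBACK if it cannot be determined
# (for example when building from a source tarball without git metadata).
# Usage: commit_epoch [DIR] [FALLBACK]
commit_epoch() {
    dir=${1:-.}
    fallback=${2:-1}
    epoch=$(git -C "$dir" log -1 --pretty=%ct 2>/dev/null) || epoch=
    if [ -n "$epoch" ]; then
        echo "$epoch"
    else
        echo "$fallback"
    fi
}
```

A build script could then run `SOURCE_DATE_EPOCH=$(commit_epoch . 1)` followed by `export SOURCE_DATE_EPOCH` before invoking the build, so that every build of a given commit sees the same timestamp.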

@michaelsproul
Member Author

@metachris We are putting this on pause for now as we don't have time to test it and get it ready for v6.0.0. If you or someone from Flashbots would like to update our build scripts with the changes you need, we can review a PR.

@michaelsproul
Member Author

My WIP PR is here:

The main issue is that building on different machines, including under Docker vs outside, results in different binaries.

@michaelsproul michaelsproul added the good first issue Good for newcomers label Oct 31, 2024