Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci(l2): automate flamegraph tests #1366

Open
wants to merge 84 commits into
base: main
Choose a base branch
from
Open

Conversation

dsocolobsky
Copy link
Contributor

@dsocolobsky dsocolobsky commented Dec 2, 2024

Motivation

We want to generate flamegraphs both for ethrex and for reth on each push to main, and then be able to view them in github pages or similar.

You can see the flamegraphs at https://lambdaclass.github.io/ethrex/

Description

  • Github workflow for generating the flamegraphs and deploying the .svg files to Github Pages
  • The ethrex_l2CLI was modified to be able to create a default config non-interactively with ethrex_l2 config create default --default since the interactive CLI was not working in Github CI.
  • ethrex_l2 test load was modified so that it now retries when it can't connect to the server instead of failing directly, this was needed to run the test in Github CI.
  • test_data/genesis-load-test.json was modified to add balance to a test account, since the tests were failing sometimes otherwise.
  • You can run make flamegraph locally to run the same scripts locally to generate the Ethrex flamegraphs (not the Reth ones).

Notes

  • I had to run the perf record and then inferno-collapse-perf plus inferno-flamegraph separately because of a bug in perf with Github CI where it would output garbage to the stdout instead of the actual perf data to a perf.data file.

Things that can be improved or we can check

  • Make sure the flamegraphs are being correctly generated, that the symbols are resolved properly. We saw some weird things where some symbols were duplicated.
  • We are also calling perf script with the --no-inline option, since it takes a very long time without it, maybe we're not getting the proper data because of that.
  • Check if we can cache Reth so we don't compile it at each time.
  • Maybe the web can be improved, We can generate .png files as preview since the .svg look cropped there.
  • scripts/flamegraph.sh and .github/scripts/flamegraph_watcher.sh and very similar and they could be merged into one, however I hardcoded some CI-paths in the latter one.

@dsocolobsky dsocolobsky force-pushed the automate-perf-tests branch 5 times, most recently from 4b9b05a to 45b74ad Compare December 2, 2024 14:03
@dsocolobsky dsocolobsky force-pushed the automate-perf-tests branch 3 times, most recently from 8df6f77 to 56dccda Compare December 2, 2024 15:49
@dsocolobsky dsocolobsky force-pushed the automate-perf-tests branch 17 times, most recently from d6742e6 to 2d9e8fd Compare December 2, 2024 20:29
Copy link
Contributor

@fborello-lambda fborello-lambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work 🚀. I think we have to use the script you've created for the Makefile located at the root of the project, apart from that, it's ready to merge in my opinion.

scripts/flamegraph.sh Show resolved Hide resolved
Comment on lines +256 to +259
while [ ! -x "./target/profiling/reth" ]; do
echo "Waiting for reth binary to be ready..."
sleep 10
done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is the fix for the absurd time measurements, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.

cargo flamegraph both compiles and runs the binary, since we run it with & (in the background) we would immediately start the load test and measuring time, which means we would measure compile time as well!

So now I wait until the binary is generated to start the load test and the time measurement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

L2: Automate flamegraph tests
3 participants