Continuous benchmarking #80

Open
bartvanerp opened this issue Feb 28, 2023 · 10 comments

@bartvanerp
Member

bartvanerp commented Feb 28, 2023

In the future it would be good to have some kind of benchmarking system in our CI, so that we become aware of how changes in our code impact performance. An example of such a system is provided by FluxBench.jl and its corresponding website.

@bartvanerp bartvanerp added the enhancement New feature or request label Feb 28, 2023
@albertpod albertpod moved this to 🤔 Ideas in RxInfer Feb 28, 2023
@bartvanerp bartvanerp added the Performance Improve code speed label Mar 1, 2023
@bvdmitri
Member

bvdmitri commented Oct 5, 2023

Before we commence the actual benchmarking process, it's crucial to conduct preliminary research to determine what tasks we can and should perform ourselves and what components we can potentially leverage from other packages or libraries. Here are the key aspects we need to investigate:

  • Benchmarking Methodology:
    • Research the available benchmarking tools, such as BenchmarkTools.jl and PkgBenchmark.jl, and determine which one is most suitable for our needs. Are there other options? (See the sketch after this list for what such a suite typically looks like.)
  • Benchmark Targets:
    • Define what specific aspects we need to benchmark; for example, the set of benchmarks will clearly differ between RxInfer and ExponentialFamily. Clearly outline the performance metrics or characteristics that need to be measured.
  • Reporting and Visualization:
    • Explore methods for reporting and visualizing benchmark results. Should we use graphical representations, tables, or a combination of both? What libraries can we use for that?
  • Results Storage:
    • Determine where to store the benchmark results to ensure easy access and future analysis.
  • Benchmark Execution:
    • Investigate the feasibility of executing benchmarks using our GitHub runner. Assess the setup process's complexity and determine if it's straightforward to configure.
  • As much research as possible is appreciated.
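
For reference, a minimal sketch of what a BenchmarkTools.jl-based suite could look like (the group and key names below are hypothetical placeholders, just to illustrate the shape of a `SUITE` that PkgBenchmark.jl could later pick up):

```julia
# Minimal BenchmarkTools.jl sketch; the group names and benchmarked expression
# are hypothetical placeholders, not actual RxInfer benchmarks.
using BenchmarkTools

const SUITE = BenchmarkGroup()

# A group for (hypothetical) inference-related benchmarks.
SUITE["inference"] = BenchmarkGroup(["example"])

# `@benchmarkable` defines a benchmark without running it immediately;
# `setup` runs before each sample and is excluded from the timing.
SUITE["inference"]["sum"] = @benchmarkable sum(x) setup=(x = randn(1000))

# Tune and run the whole suite locally.
tune!(SUITE)
results = run(SUITE; verbose = true)
```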

This task has been added to the milestone for tracking and prioritization.

@bvdmitri bvdmitri added this to the RxInfer update Nov 28th milestone Nov 14, 2023
@bvdmitri
Member

@bartvanerp
This task has been added to the milestone for tracking and prioritization.

@bartvanerp
Member Author

Just did some very extensive research:

Benchmarking Methodology:
I think PkgBenchmark.jl is the best option for creating the benchmark suite. I played around with it for RxSDE.jl a bit and really liked it. This package, however, only tests execution speed, but I think this metric is a good one to start with. Other metrics, once relevant and implemented, would likely require custom tests anyway.
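
As a quick illustration, running a package-level suite with PkgBenchmark.jl could look like the sketch below, assuming the package ships a `benchmark/benchmarks.jl` file defining `SUITE` (the package name is just the example mentioned above):

```julia
# Sketch of running a package's benchmark suite with PkgBenchmark.jl.
using PkgBenchmark

# Run the suite defined in benchmark/benchmarks.jl for the current package state.
results = benchmarkpkg("RxSDE")

# Write a human-readable summary that can be shared or archived.
export_markdown("benchmark_results.md", results)
```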

Benchmark Targets:
Let's start off with automatic execution speed as a performance metric. Later on we can extend it, if we have some relevant other metrics and appropriate tests. For me this is beyond the scope of this PR.

Reporting and Visualization:
PkgBenchmark.jl automatically generates a report (with differences) between two different commits. There is also BenchmarkCI.jl to run this on GitHub, but I don't think this will give us reliable performance metrics. FluxBench.jl depends on both, but is likely very much tailored towards Flux.jl, so I am not sure whether this is desirable. For now I propose to just generate the report and include it manually in a PR, which will be required before the PR can be approved. Other reporting/visualization tools would be nice, but we will probably have to implement those ourselves.
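
A sketch of the comparison workflow described above, using PkgBenchmark.jl (branch names are placeholders):

```julia
# Benchmark a feature branch against main and export the diff report
# so it can be pasted into the pull request.
using PkgBenchmark

comparison = judge("RxInfer", "my-feature-branch", "main")
export_markdown("benchmark_report.md", comparison)
```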

Results Storage:
If we manually copy them into the PR, then they are saved there. Ideally we want something similar to Codecov, which just runs on the PR and shows the difference report.

Benchmark Execution:
I think this post is a nice example of running benchmarks through CI on a GitHub-hosted runner: https://labs.quansight.org/blog/2021/08/github-actions-benchmarks. It is just not quite stable. Furthermore, it will burn through our GitHub minutes. We could hook up a Raspberry Pi (which is not fast, but perhaps that is actually a good thing, as we are targeting these devices) as a custom runner: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners.
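
Independent of which runner we end up with, the CI job could just invoke a small Julia entry point; a minimal sketch, assuming a hypothetical `benchmark/run_ci.jl` script and a `BENCHMARK_BASELINE` environment variable set by the workflow:

```julia
# Sketch of a CI entry point (e.g. benchmark/run_ci.jl) that a GitHub Actions
# job, on a hosted or self-hosted runner, could call via `julia benchmark/run_ci.jl`.
# The environment variable name and file paths are assumptions.
using PkgBenchmark

baseline = get(ENV, "BENCHMARK_BASELINE", "main")

# Compare the checked-out state of the package against the baseline ref.
comparison = judge("RxInfer", baseline)

# Persist the report so the workflow can upload it as an artifact
# or post it as a PR comment.
export_markdown("benchmark_report.md", comparison)
```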

@bvdmitri @albertpod let me know what you think.

@bartvanerp
Member Author

Aside from the real performance benchmarks, we can also already start building a test suite for allocations using AllocCheck.jl: https://github.com/JuliaLang/AllocCheck.jl.
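
A minimal sketch of what such an allocation test could look like with AllocCheck.jl; `my_inplace_update!` is a hypothetical in-place function used only for illustration, not part of any existing package:

```julia
using AllocCheck
using Test

# Hypothetical in-place function that should not allocate.
function my_inplace_update!(y::Vector{Float64}, x::Vector{Float64})
    @inbounds for i in eachindex(y, x)
        y[i] = 2.0 * x[i]
    end
    return y
end

# `check_allocs` statically analyses the compiled code for the given argument
# types and returns the potential allocation sites it finds.
@test isempty(check_allocs(my_inplace_update!, (Vector{Float64}, Vector{Float64})))
```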

@bartvanerp
Member Author

Today we discussed the issue together with @albertpod and @bvdmitri. We agree on the following plan:

All of our packages will need to be extended with a benchmark suite containing performance and memory (allocation) benchmarks. Alternative metrics can be added later once we have developed suitable methods for testing them. @bartvanerp will make a start with this for the FastCholesky.jl package to experiment with it.

Starting in January we will extend the benchmark suites to our other packages and will divide tasks.

For now we will ask everyone to run the benchmarks locally when filing a PR. The benchmarking diff/results will need to be uploaded with the PR. Future work will be to automate this using a custom GitHub runner (our Raspberry Pi), and to visualize results online.

@bartvanerp
Member Author

Made a start with the benchmarks for FastCholesky.jl at ReactiveBayes/FastCholesky.jl#8.

There is one point which I need to adjust in my above statements: let's skip the extra allocation benchmarks, as these are automatically included in PkgBenchmark.jl.

@bartvanerp
Member Author

bartvanerp commented Nov 28, 2023

Coming back to the memory benchmarking: I think it will still be good to create tests for in-place functions, which we assume to be non-allocating, to check whether they are indeed still non-allocating. Kind of like a test which checks @allocated foo() == 0. The AllocCheck package currently does not support this, but the TestNoAllocations package does. Nonetheless, AllocCheck has some PRs which will include this behaviour and supersede TestNoAllocations: JuliaLang/AllocCheck.jl#59, JuliaLang/AllocCheck.jl#55
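
A sketch of the `@allocated foo() == 0` style test described above, using only the standard Test library; `accumulate_into!` is a hypothetical in-place function used only for illustration:

```julia
using Test

# Hypothetical in-place function that we assume to be non-allocating.
function accumulate_into!(dest::Vector{Float64}, src::Vector{Float64})
    @inbounds for i in eachindex(dest, src)
        dest[i] += src[i]
    end
    return dest
end

dest, src = zeros(100), randn(100)

# Warm-up call so that compilation does not count towards the measurement.
accumulate_into!(dest, src)

@testset "accumulate_into! does not allocate" begin
    @test @allocated(accumulate_into!(dest, src)) == 0
end
```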

@bvdmitri
Member

I used AllocCheck here. The limitation is that it checks allocations statically, which limits its application to very small and type-stable functions. Still useful, though.

@albertpod albertpod removed this from RxInfer Jan 11, 2024
@albertpod albertpod moved this to 🤔 Ideas in RxInfer Jan 11, 2024
@wouterwln wouterwln modified the milestones: RxInfer update Nov 28th, RxInfer 3.0.0 release Mar 15, 2024
@wouterwln
Member

Let's make sure we have benchmark suite boilerplate set up before the 3.0.0 release, so that we can track performance from 3.0.0 onwards.

@wouterwln
Member

I'm moving this to 3.1.0 now, but I suggest we use https://github.com/MilesCranmer/AirspeedVelocity.jl for this. It works with the existing ecosystem of BenchmarkTools and PkgBenchmark. Let's investigate whether we can benchmark RMP and GraphPPL behaviour with this as well.

@wouterwln wouterwln modified the milestones: RxInfer 3.0.0 release, RxInfer 3.1.0 release Apr 12, 2024
Labels
enhancement New feature or request Performance Improve code speed
Projects
Status: 🤔 Ideas
4 participants