Improve Benchmark Selection (#217)

* Try slightly different benchmarks
* Try out different sum sizes
* Sum over sin of data
* reintroduce sum
* Fix typo
* Improve numbers reported
* Improve discussion in benchmark readme and CI
* Make link to benchmarking readme clickable
* Improve benchmark discussion
* Update .github/workflows/CI.yml

Co-authored-by: Hong Ge <3279477+yebai@users.noreply.github.com>

---------

Co-authored-by: Hong Ge <3279477+yebai@users.noreply.github.com>
compintell · Aug 8, 2024 · 55cb4cf
1 parent 1ccdafb
commit 55cb4cf
Showing 3 changed files with 13 additions and 5 deletions.
diff --git a/.github/workflows/CI.yml b/.github/workflows/CI.yml
@@ -122,6 +122,6 @@ jobs:
  uses: peter-evans/create-or-update-comment@v4
  with:
    issue-number: ${{ github.event.pull_request.number }}
-   body: "Performance Ratio:\nWarning: results are very approximate!\n```\n${{ steps.read-file.outputs.table }}\n```"
+   body: "Performance Ratio:\nRatio of time to compute gradient and time to compute function.\nWarning: results are very approximate! See [here](https://github.com/compintell/Tapir.jl/tree/main/bench#inter-framework-benchmarking) for more context.\n```\n${{ steps.read-file.outputs.table }}\n```"
    comment-id: ${{ steps.fc.outputs.comment-id }}
    edit-mode: replace
diff --git a/bench/README.md b/bench/README.md
@@ -43,6 +43,12 @@ plot_ratio_histogram!(df)
 ## Inter-framework Benchmarking
 
 This comprises a small suite of functions that we AD using `Tapir.jl`, `Zygote.jl`, `ReverseDiff.jl`, and `Enzyme.jl`.
+The primary purpose of this suite of benchmarks is to ensure that we're regularly comparing the performance of a range of reverse-mode ADs on a set of problems which are known to stretch them in various ways.
+For any given function in the suite, some frameworks might have rules for it, and some not.
+For example, `Zygote.jl` only achieves good performance on any of the test cases because it has many rules.
+For this reason, we include a hand-written version of `sum` and of `map`, on which `Zygote.jl` achieves poor performance.
+`ReverseDiff.jl` also has this property, although to a lesser extent than `Zygote.jl`.
+
 This suite of benchmarks is also run as part of CI, and the output is recorded in two ways:
 1. a table of results is posted as comment in a PR
 1. the table and a corresponding graph are stored as github actions artifacts, and can be retrieved by going to the "Checks" tab of your PR, and clicking on the artifact button.

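The README text above motivates a hand-written `sum` for which no framework has a rule. As a minimal sketch (not part of the commit), the snippet below reuses the `_sum(f, x)` shape from `bench/run_benchmarks.jl` and checks that it agrees with Base's `sum`. The input `x`, the tolerance `1e-8`, and the checks themselves are assumptions added for illustration.

```julia
# Hand-written sum with the same shape as `_sum` in bench/run_benchmarks.jl.
# Because it is a plain loop, AD frameworks must differentiate through it
# rather than dispatch to a pre-written rule for `sum`.
function _sum(f::F, x::AbstractArray{<:Real}) where {F}
    y = 0.0
    n = 0
    while n < length(x)
        n += 1
        y += f(x[n])
    end
    return y
end

# Illustrative input (an assumption; the benchmark suite uses randn(1_000) too).
x = randn(1_000)

# The loop and Base's `sum` may differ in summation order, so compare with a
# small absolute tolerance instead of exact equality.
@assert isapprox(_sum(identity, x), sum(x); atol=1e-8)
@assert isapprox(_sum(sin, x), sum(sin, x); atol=1e-8)
```

Both variants compute the same value, so any difference in the reported performance ratio between, say, `sum_sin_1000` and `_sum_sin_1000` reflects how a framework handles the loop, not a difference in the function being computed.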
diff --git a/bench/run_benchmarks.jl b/bench/run_benchmarks.jl
@@ -42,12 +42,12 @@ should_run_benchmark(args...) = true
 # Test out the performance of a hand-written sum function, so we can be confident that there
 # is no rule. Note that ReverseDiff has a (seemingly not fantastic) hand-written rule for
 # sum.
-function _sum(x::AbstractArray{<:Real})
+function _sum(f::F, x::AbstractArray{<:Real}) where {F}
     y = 0.0
     n = 0
     while n < length(x)
         n += 1
-        y += x[n]
+        y += f(x[n])
     end
     return y
 end
@@ -137,8 +137,10 @@ an array.
 """
 function generate_inter_framework_tests()
     return Any[
-        ("sum", (sum, randn(100))),
-        ("_sum", (_sum, randn(100))),
+        ("sum_1000", (sum, randn(1_000))),
+        ("_sum_1000", (x -> _sum(identity, x), randn(1_000))),
+        ("sum_sin_1000", (x -> sum(sin, x), randn(1_000))),
+        ("_sum_sin_1000", (x -> _sum(sin, x), randn(1_000))),
         ("kron_sum", (_kron_sum, randn(20, 20), randn(40, 40))),
         ("kron_view_sum", (_kron_view_sum, randn(40, 30), randn(40, 40))),
         ("naive_map_sin_cos_exp", (_naive_map_sin_cos_exp, randn(10, 10))),