
WIP: add an asv-based benchmarks on cirun [skip cirrus] #4751

Closed (wanted to merge 16 commits)

Conversation

@ev-br (Contributor) commented Jun 13, 2024

No description provided.

@ev-br commented Jun 15, 2024

The goal here is to run benchmarks on a Graviton (arm64) instance on AWS, via Cirun.

The demo setup (for x86_64 for now) is https://github.com/ev-br/ob_bench and https://github.com/ev-br/ob-bench-asv with web visualization at https://ev-br.github.io/ob-bench-asv/.
(The setup is not mine, shamelessly mirrored from https://sgkit-dev.github.io/sgkit-benchmarks-asv/)

In brief:

  • The nightly wheel is in fact weekly (it gets built every Thursday), which matches the weekly cron job we want on AWS.
  • In principle, an alternative to using the wheel would be to replicate its build from source, but there is little reason to.
  • This way, the CI job does not really need to rebuild OpenBLAS.
  • As such, it does not really need to live in the main OpenBLAS repo. What is more, having it live in the main repo is confusing (it clones the source, then never touches it).

So how about the following plan, @martin-frbg:

  • clone https://github.com/ev-br/ob_bench and https://github.com/ev-br/ob-bench-asv into the OpenBLAS org, and rename them if wanted (these are placeholder names)
  • either configure the publisher repo to serve gh-pages from the main branch, or temporarily give me enough permissions to configure it (I do not ask for, or need, any permissions outside of the publisher repo!)
  • I'll adapt the benchmark repo to run on AWS cirun instead of vanilla GHA runners.
  • we will need to generate/add a token so that the benchmark and publisher repos can talk to each other, but that comes after the benchmarks run on cirun.
  • (later) I'll look into deduplicating with the codspeed benchmarks.

@martin-frbg (Collaborator):
OK, so this is a somewhat different concept from the codspeed one, not meant to flag changes caused by a particular PR? In that case, I think we could even consider running it at larger-than-weekly intervals. Giving it its own home under the OpenMathLib umbrella should also be no problem, I guess. Not sure I understand your comment about deduplicating: are you intending this to replace the codspeed setup that you committed recently? (Different architecture, different frequency of runs, different purpose as far as I understood?)
Sorry if looking at your demo setup would already answer these - I'll try to do that later today or tomorrow.

@ev-br commented Jun 15, 2024

IIUC, the concept was to run benchmarks on a cron everywhere, and codspeed then happened to be easy and free to run on each PR. AWS via cirun is trickier and not free; not exorbitantly expensive, but a cost nonetheless.

The deduplication comment is about trivial implementation details: the Python side of the benchmarking is almost, but not completely, the same. Two reasons: 1) self-built OpenBLAS vs the wheel (prefixed names, scipy_daxpy_ vs daxpy_; the Meson detection of which library to use); 2) the benchmark runners are different, so while the core benchmarks are the same, the surrounding paraphernalia differs slightly. Compare https://github.com/OpenMathLib/OpenBLAS/pull/4751/files#diff-69617a6cd63a6737e2b271070f71b8e55d40a0abb8028283baef5b14f8c6ff71R59 and https://github.com/OpenMathLib/OpenBLAS/blob/develop/benchmark/pybench/benchmarks/bench_blas.py#L25
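
To illustrate point 1), here is a minimal Python sketch of papering over the symbol-prefix difference at load time. The library filename, the ctypes-based loading, and the LP64 (32-bit integer) calling convention are assumptions for illustration; neither repo necessarily does it exactly this way.

import ctypes

import numpy as np

def load_openblas(libpath="libscipy_openblas.so"):
    # The filename is an assumption: the nightly wheel ships its own shared
    # object, while a self-built OpenBLAS would be a plain libopenblas.so.
    return ctypes.CDLL(libpath)

def get_symbol(lib, name):
    # The wheel prefixes BLAS symbols (scipy_daxpy_); a self-built OpenBLAS
    # does not (daxpy_). Try both spellings.
    for candidate in ("scipy_" + name, name):
        try:
            return getattr(lib, candidate)
        except AttributeError:
            continue
    raise AttributeError("neither scipy_%s nor %s found" % (name, name))

lib = load_openblas()
daxpy = get_symbol(lib, "daxpy_")
daxpy.restype = None

# daxpy computes y <- a*x + y, with Fortran-style pass-by-reference arguments.
n, a = 1000, 2.0
x = np.ones(n)
y = np.zeros(n)
one = ctypes.c_int(1)
daxpy(ctypes.byref(ctypes.c_int(n)),
      ctypes.byref(ctypes.c_double(a)),
      x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
      ctypes.byref(one),
      y.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
      ctypes.byref(one))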

Merging them completely is not very easy, so I'd consider deduplicating to some degree once everything runs and we're not on a deadline. No need to worry about it right now, I'd say.

So the intended endgame is:

  • codspeed runs as is (or on a cron, up to you)
  • asv-based benchmarks run on a cron, on AWS via cirun, on a Graviton instance plus maybe a Skylake x86_64 one.

@ev-br commented Jun 15, 2024

codspeed runs as is (or on a cron, up to you)

Now I remember asking them: not easy, it cannot run on a cron; codspeed only supports pushes and pull requests.
One could work around this by having a bot push to a separate repo, but it is hardly worth it.

(Different architecture, different frequency of runs, different purpose as far as I understood?)

Yeah. Codspeed is nice but only supports x86_64.
So we trade some UI niceties and use asv on AWS. One price to pay is that the runners are different.
#4721 runs unmodified benchmarks which run on codspeed, but there is no web viewer.

@martin-frbg:
ok, so we keep the codspeed setup as a canary for performance regressions, and this here is basically creating codspeed-like performance-vs-commits graphs on arm64 at larger intervals (?) So far, so good - they could live in their own repository named something like BLAS-benchmarks. I wonder if it would make sense to create graphs of performance vs matrix size as well (like the older OpenBLAS benchmarks do), and - in light of #4744 - perhaps add baseline data for "competing" implementations too?

@ev-br commented Jun 16, 2024

ok, so we keep the codspeed setup as a canary for performance regressions, and this here is basically creating codspeed-like performance-vs-commits graphs on arm64 at larger intervals (?)

Exactly! Codspeed serves as a canary within a PR, and asv as a canary over a week's worth of PRs (or more than a week, whatever the interval ends up being).

I wonder if it would make sense to create graphs of performance vs matrix size as well (like the older OpenBLAS benchmarks do)

This exists: https://ev-br.github.io/ob-bench-asv/#benchmarks.Nrm2.time_dnrm2?x-axis=size
From the landing page, https://ev-br.github.io/ob-bench-asv/, click on a benchmark box, then in the left-side panel click on "size" under "x-axis".
Here the name "size" is set at each benchmark class: https://github.com/ev-br/ob_bench/blob/main/benchmarks/benchmarks.py#L62

and - in light of #4744 - perhaps add baseline data for "competing" implementations too?

Certainly doable. I need to sort out some plumbing, akin to what I mentioned under the "deduplicate" rubric above, to be able to link to it.

they could live in their own repository named something like BLAS-benchmarks

Great! I'll be able to start adapting them to cirun once the repository exists in the org.

EDIT: In addition to web graphs, asv produces text output, like this:

$ asv run -v

... build output snipped ...

[ 6.25%] ··· Running (benchmarks.DDot.time_ddot--)........
[56.25%] ··· benchmarks.DDot.time_ddot                                                                                                                                                                  ok
[56.25%] ··· ====== ===========
              size             
             ------ -----------
              100    413±0.8ns 
              1000    554±2ns  
             ====== ===========

[62.50%] ··· benchmarks.DSyrk.time_dsyrk                                                                                                                                                                ok
[62.50%] ··· ====== =============
              size               
             ------ -------------
              100     44.2±0.3μs 
              1000   22.7±0.06ms 
             ====== =============

[68.75%] ··· benchmarks.Daxpy.time_daxpy                                                                                                                                                                ok
[68.75%] ··· ====== ===========
              size             
             ------ -----------
              100    551±0.4ns 
              1000    747±4ns  
             ====== ===========

[75.00%] ··· benchmarks.Dgemm.time_dgemm                                                                                                                                                                ok
[75.00%] ··· ====== ============
              size              
             ------ ------------
              100    52.8±0.1μs 
              1000   42.0±0.1ms 
             ====== ============

[81.25%] ··· benchmarks.Dgesdd.time_dgesdd                                                                                                                                                              ok
[81.25%] ··· =========== =============
                (m, n)                
             ----------- -------------
                100, 5    11.4±0.01μs 
              1000, 222    29.0±0.2ms 
             =========== =============

[87.50%] ··· benchmarks.Dgesv.time_dgesv                                                                                                                                                                ok
[87.50%] ··· ====== =============
              size               
             ------ -------------
              100    70.4±0.07μs 
              1000   30.5±0.09ms 
             ====== =============

[93.75%] ··· benchmarks.Dsyev.time_dsyev                                                                                                                                                                ok
[93.75%] ··· ====== =============
              size               
             ------ -------------
               50     312±0.3μs  
              200    11.4±0.02ms 
             ====== =============

[100.00%] ··· benchmarks.Nrm2.time_dnrm2                                                                                                                                                                 ok
[100.00%] ··· ====== ===========
               size             
              ------ -----------
               100    642±0.6ns 
               1000    1.22±0μs 
              ====== ===========

This is on a c7g.large AWS instance, similar to what cirun uses. This sort of output will be available in the CI logs for each build.

@martin-frbg commented Jun 16, 2024

Good to know that "size" is doable (though having just 100 and 1000 and then drawing a line plot is a bit counterproductive).

@ev-br commented Jun 16, 2024

Of course. The benchmark functions and parameter combinations, both here and in codspeed, are proof-of-concept and are mostly thrown in to allow quick iteration.
So, what would be the most useful sizes? Also, I am currently only running the double-precision real workloads; do we want single precision and/or complex as well?

@martin-frbg commented Jun 16, 2024

So, what would be the most useful sizes?

That would very much depend on the BLAS function - most show some jitter that probably stems from cache misses at certain sizes if one does fine-grained benchmarks (like the ones in the benchmark folder, which default to checking all sizes between 1 and 1000 with a granularity of 1 - probably too expensive to do on AWS all the time).

Also, I am currently only running the double-precision real workloads; do we want single precision and/or complex as well?

Ideally yes, as in most cases each precision has its own dedicated kernel (certainly one for real and one for complex numbers). Probably all outside the scope of the milestone though...

@ev-br commented Jun 16, 2024

Well, life does not stop at a milestone :-). Let's take these to OpenMathLib/BLAS-Benchmarks#1 and OpenMathLib/BLAS-Benchmarks#2.
Luckily, tweaking benchmark parameters is super easy.
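
For instance (a sketch only: the concrete sizes are placeholders, and scipy.linalg.blas.get_blas_funcs stands in for the wheel-backed symbol lookup), covering more sizes and the s/d/c/z variants is just a wider parameter grid.

import numpy as np
from scipy.linalg.blas import get_blas_funcs  # stand-in for the wheel-backed lookup

class Nrm2:
    # Two parameter axes: problem size and the type/precision variant.
    params = ([100, 300, 1000, 3000, 10000],
              ["s", "d", "c", "z"])
    param_names = ["size", "variant"]

    def setup(self, size, variant):
        dtype = {"s": np.float32, "d": np.float64,
                 "c": np.complex64, "z": np.complex128}[variant]
        rng = np.random.default_rng(1234)
        x = rng.random(size)
        if np.issubdtype(dtype, np.complexfloating):
            x = x + 1j * rng.random(size)
        self.x = x.astype(dtype)
        # snrm2/dnrm2/scnrm2/dznrm2 naming differs per variant; let scipy
        # pick the right one from the array dtype.
        (self.nrm2,) = get_blas_funcs(("nrm2",), (self.x,))

    def time_nrm2(self, size, variant):
        self.nrm2(self.x)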

@ev-br commented Jun 21, 2024

Closing in favor of a cron job over at https://github.com/OpenMathLib/BLAS-Benchmarks/
