Continuous benchmark tracking

dergoegge commented at 2:00 pm on March 20, 2023: member

It would be beneficial to have continuous tracking of our benchmark tests, because regressions (or unexpected improvements) otherwise go undetected (at least for a while). AFAICT, the only current benefit of our benchmark tests is to evaluate changes as they are being proposed, but IMO that only gives us ~50% of the benefit that benchmarks can provide.

I am imagining this to be a separate service (maybe integrated with @DrahtBot) that regularly runs the benchmarks in an environment configured for benchmarking. The service could report regressions by opening issues or sending emails. Additionally, a website that presents the benchmark data with some pretty graphs would be nice (example from Firefox’s infra).

Setting this up in a way that is easy to replicate would be very beneficial.

maflcko added the label Brainstorming on Mar 20, 2023

maflcko added the label Tests on Mar 20, 2023

maflcko commented at 2:07 pm on March 20, 2023: member

I think @jamesob set something up at one point, but it had to be queried manually, as there were no notifications. Also, I am not sure if it is running at all. See https://codespeed.bitcoinperf.com/timeline/

maflcko added the label Resource usage on Mar 20, 2023

jonatack commented at 2:16 pm on March 20, 2023: member

Last week (Thurs/Fri/Sat) I suggested that @LarryRuane check in with @jamesob about picking up https://bitcoinperf.com/, check with @0xB10C about potential cross-fertilization with tracepoints and their dashboards, and potentially hook it up to the CI or DrahtBot. See also #26957 (comment).

jonatack commented at 2:21 pm on March 20, 2023: member

See also #26957 (comment) by @martinus for one nice way, with an example, to create and share detailed benchmark results.

dergoegge commented at 2:50 pm on March 20, 2023: member

Honestly, I think https://codespeed.bitcoinperf.com/ is pretty close to what we want here. It does seem like it hasn’t been running for a while? But getting it running again and adding some kind of notification system is probably all we need.

LarryRuane commented at 3:24 pm on March 20, 2023: contributor

Yes, this would be very valuable. I’d like to attempt to get this going; @dergoegge, would that be okay? I made a related comment last week before I was aware of these websites (which are definitely better than what I suggested).

dergoegge commented at 3:34 pm on March 20, 2023: member

@LarryRuane cool, please do!

epompeii commented at 12:42 pm on April 17, 2023: none

If using https://codespeed.bitcoinperf.com doesn’t work out, I have created a continuous benchmarking tool for doing exactly this, Bencher: https://github.com/bencherdev/bencher

Bencher tracks changes over time. It can easily be run in CI as a GitHub Action, and it has statistical thresholds to detect deviations.
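As a rough illustration of what a "statistical threshold" can mean here, the sketch below flags a new result that lies more than a chosen number of standard deviations from the mean of earlier results (a z-score test). It is a minimal sketch under stated assumptions, not Bencher's implementation; the function name `exceeds_threshold` and the default cutoff of 3.0 are hypothetical choices.

```python
import statistics


def exceeds_threshold(history: list[float], new_value: float, max_z: float = 3.0) -> bool:
    """Return True if new_value is more than max_z standard deviations away
    from the mean of the historical results (a simple z-score test).

    Both directions are flagged, so unexpected improvements are reported
    as well as regressions.
    """
    if len(history) < 2:
        # Not enough data points to estimate the spread yet.
        return False
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        # All historical results identical: any change counts as a deviation.
        return new_value != mean
    z = (new_value - mean) / stdev
    return abs(z) > max_z


if __name__ == "__main__":
    history_ns = [100.0, 101.0, 99.0, 100.0, 102.0]  # hypothetical ns/op for one benchmark
    print(exceeds_threshold(history_ns, 100.5))  # False
    print(exceeds_threshold(history_ns, 120.0))  # True
```

A z-score test assumes roughly normal noise; with only a handful of noisy samples, a wider cutoff or a percentage-based threshold may be more robust.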

aureleoules commented at 2:12 pm on December 12, 2023: contributor

I was not aware that this issue existed, but I’ve started working on monitoring benchmark results on pull requests on corecheck. For example: https://corecheck.dev/bitcoin/bitcoin/pulls/28674. It is still experimental and I am still working on reducing the noise between runs, but as of today I usually don’t see more than a 5-6% difference between identical bench runs.
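One way to quantify that noise is to run the same commit twice and compute, per benchmark, the relative difference between the two runs; the largest value gives a rough noise floor that any regression threshold should exceed. A minimal sketch, assuming each run is a mapping from benchmark name to time per operation; the function name and the numbers are hypothetical.

```python
def run_to_run_noise_pct(run_a: dict[str, float], run_b: dict[str, float]) -> dict[str, float]:
    """For benchmarks present in both runs of the *same* commit, return the
    absolute percentage difference of run_b relative to run_a."""
    noise = {}
    for name in run_a.keys() & run_b.keys():
        a, b = run_a[name], run_b[name]
        if a > 0:  # skip zero/invalid baselines to avoid division by zero
            noise[name] = abs(b - a) / a * 100.0
    return noise


if __name__ == "__main__":
    # Hypothetical ns/op values from two identical runs.
    first = {"AssembleBlock": 200.0, "Base58Encode": 50.0}
    second = {"AssembleBlock": 210.0, "Base58Encode": 49.0}
    noise = run_to_run_noise_pct(first, second)
    print(max(noise.values()))  # largest run-to-run difference, in percent
```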

epompeii commented at 2:18 pm on December 12, 2023: none

@aureleoules that looks really nice!

Would you be interested in plotting that data over time? If so, I can work on ingesting your results into Bencher, similar to how rustls is doing it: https://bencher.dev/perf/rustls-821705769

aureleoules commented at 2:23 pm on December 12, 2023: contributor

> Would you be interested in plotting that data over time?

Yes, I plan to display plots of the benchmarks and of the test coverage ratio of master over time on the homepage!

maflcko commented at 2:24 pm on December 12, 2023: member

Agree that a plot over time would be useful. They were on https://codespeed.bitcoinperf.com/timeline/ , but it hasn’t run for some years now.

epompeii commented at 2:29 pm on December 12, 2023: none

Sounds great!

If you want them to be live updating, you can embed Bencher plots. Just go to the Share button on the Perf Page and copy the Embed Perf Plot Link for the current plot. This is an example of what that could look like.

0xB10C commented at 10:25 am on April 8, 2024: contributor

> If using https://codespeed.bitcoinperf.com doesn’t work out, I have created a continuous benchmarking tool for doing exactly this, Bencher: https://github.com/bencherdev/bencher
>
> Bencher tracks changes over time. It can easily be run in CI as a GitHub Action, and it has statistical thresholds to detect deviations.

For Bitcoin Core, it would be useful to have an adapter for the nanobench JSON output. To track this, I’ve opened https://github.com/bencherdev/bencher/issues/361.

0xB10C commented at 12:42 pm on April 8, 2024: contributor

I just learned that nanobench is able to fill in an output format template. It might make sense to try that route first.

epompeii commented at 2:45 pm on April 8, 2024: none

> For Bitcoin Core, it would be useful to have an adapter for the nanobench JSON output. To track this, I’ve opened https://github.com/bencherdev/bencher/issues/361.

@0xB10C I would be more than happy to implement a nanobench JSON output adapter. It is going to take me a couple of weeks or so to get to it, though. So you could either:

1. Use the nanobench output format template to emit Bencher Metric Format
2. Implement the adapter in Bencher and open a PR
3. Wait a few weeks and I’ll take care of it 😃
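For reference, the conversion such an adapter performs mostly reshapes data. The sketch below does it as a standalone Python script rather than inside Bencher. It assumes the layout produced by nanobench's JSON template (a top-level `results` array whose entries carry `name` and `median(elapsed)`, the latter in seconds per operation) and the Bencher Metric Format shape `{benchmark: {measure: {"value": ...}}}` with the built-in `latency` measure in nanoseconds; both assumptions should be checked against the respective docs. The function name `nanobench_to_bmf` is hypothetical.

```python
import json
import sys


def nanobench_to_bmf(nanobench_json: dict) -> dict:
    """Convert nanobench JSON-template output into Bencher Metric Format.

    Assumes each entry in nanobench_json["results"] has a "name" and a
    "median(elapsed)" value in seconds per operation. Bencher's built-in
    latency measure is assumed to be in nanoseconds, so the value is
    scaled by 1e9.
    """
    bmf = {}
    for result in nanobench_json.get("results", []):
        name = result["name"]
        seconds_per_op = result["median(elapsed)"]
        bmf[name] = {"latency": {"value": seconds_per_op * 1e9}}
    return bmf


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python nanobench_to_bmf.py bench_output.json > bmf.json
    with open(sys.argv[1]) as f:
        print(json.dumps(nanobench_to_bmf(json.load(f)), indent=2))
```

The input file could come from the `-output-json` option of `bench_bitcoin`, assuming the build at hand provides it. Converting to nanoseconds keeps the built-in measure usable; the alternative is a custom measure in seconds, as described in the next comments.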

0xB10C commented at 8:18 am on April 10, 2024: contributor

I’ve been playing around with Bencher, running the bench_bitcoin benchmark binary in a GitHub Action as a PoC. A sample dashboard is here (however, it takes a while till it loads for me). While my branch needs a bit of cleanup, it works out of the box without modifications to nanobench, using a nanobench output template and a custom Bencher metric (seconds instead of the default nanoseconds).

epompeii commented at 1:29 pm on April 10, 2024: none

> A sample dashboard is here

For others who haven’t created an account yet, this is the public perf page. I have also created a tracking issue to make this sort of redirect the default behavior going forward: https://github.com/bencherdev/bencher/issues/364

> (however, it takes a while till it loads for me)

Yes, my apologies for the long load times. I’m still trying to figure out, design-wise, how I want to handle displaying reports with a lot of benchmarks 😃. This has prompted me to create a tracking issue for this as well: https://github.com/bencherdev/bencher/issues/363

0xB10C commented at 1:04 pm on April 29, 2024: contributor

I was made aware of https://bencher.dev/learn/engineering/sqlite-performance-tuning/ recently, and the dashboard seems to load nearly instantly now! Cool. I have this on my list to work on further (at some point). My WIP branch is here if someone else wants to give this a shot.

The next thing to look into is probably adding CPU instruction counts as a measurement. Nanobench supports this on Linux, but I’m not sure this is possible in our CI. Time might not be an ideal metric to track on a public GitHub runner, which might also be running other jobs in parallel and whose hardware may change over time. After that, probably setting up a master job for Statistical Continuous Benchmarking and a PR job for Relative Continuous Benchmarking à la https://bencher.dev/docs/how-to/track-benchmarks/.
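The relative approach can be sketched as measuring the base and PR builds on the same runner within the same job, so that differences in runner hardware between jobs cancel out. A minimal sketch, assuming the samples for one benchmark have already been collected (ideally by alternating base and head runs to spread background load evenly); the function name is hypothetical and this is not Bencher's implementation.

```python
import statistics


def relative_change_pct(base_samples: list[float], head_samples: list[float]) -> float:
    """Percentage change of the head median relative to the base median.

    For time or instruction count per operation, positive values mean the
    PR (head) is slower than the base commit.
    """
    base = statistics.median(base_samples)
    head = statistics.median(head_samples)
    return (head - base) / base * 100.0


if __name__ == "__main__":
    # Hypothetical ns/op samples measured on the same machine, interleaved.
    base = [100.0, 102.0, 98.0, 101.0, 99.0]
    head = [104.0, 106.0, 103.0, 105.0, 107.0]
    print(f"{relative_change_pct(base, head):+.1f}%")  # +5.0%
```

Medians are used rather than means so that a single sample disturbed by another job on the runner has limited influence.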

0xB10C commented at 8:57 am on September 12, 2024: contributor

I recently came across @aureleoules’s https://corecheck.dev/benchmarks which tracks execution time, CPU instructions, and CPU cycles of the Bitcoin Core benchmarks.

maflcko commented at 9:07 am on September 12, 2024: member

Yeah, corecheck is nice, but I think it went down after the switch to CMake. See also https://github.com/corecheck/corecheck/issues/10

maflcko commented at 9:16 am on September 12, 2024: member

Another metric to add (back) to the benchmarks would be compile memory usage, which is easy to collect with $(which time) -f "%M;%S;%U" ..., cf. https://github.com/chaincodelabs/bitcoinperf/blob/0a1d6f54263cca78e645544ab8247d106d5ff59c/runner/sh.py#L110
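Using `$(which time)` rather than the shell keyword selects the standalone GNU `time` binary, whose `-f` specifiers `%M`, `%S` and `%U` report the maximum resident set size (in KiB), system CPU seconds and user CPU seconds. A minimal sketch of collecting and parsing these values from Python, assuming GNU time is installed (BSD/macOS `time` does not accept `-f`); `ResourceUsage`, `parse_time_output` and `measure` are hypothetical names.

```python
import shutil
import subprocess
import tempfile
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    max_rss_kib: int        # %M: maximum resident set size, in KiB
    system_seconds: float   # %S: CPU seconds spent in kernel mode
    user_seconds: float     # %U: CPU seconds spent in user mode


def parse_time_output(line: str) -> ResourceUsage:
    """Parse one line produced by GNU time with -f "%M;%S;%U"."""
    max_rss, sys_s, user_s = line.strip().split(";")
    return ResourceUsage(int(max_rss), float(sys_s), float(user_s))


def measure(cmd: list[str]) -> ResourceUsage:
    """Run cmd under GNU time and return its resource usage.

    The report is written to a temporary file (-o) so that it does not get
    mixed with the command's own stderr output.
    """
    time_bin = shutil.which("time")  # equivalent of $(which time)
    if time_bin is None:
        raise RuntimeError("GNU time not found on PATH")
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".txt") as report:
        subprocess.run([time_bin, "-f", "%M;%S;%U", "-o", report.name, *cmd], check=True)
        report.seek(0)
        # The formatted line is the last line of the report.
        return parse_time_output(report.read().splitlines()[-1])
```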

maflcko commented at 2:05 pm on February 6, 2025: member

corecheck is back up and integrated into the DrahtBot comment, so is anything left to be done here?

dergoegge commented at 9:34 am on February 7, 2025: member

> corecheck is back up and integrated into the DrahtBot comment, so is anything left to be done here?

I think the only thing missing is an alert system, or even just a better overview of all benchmarks? It seems difficult to spot regressions here manually: https://corecheck.dev/benchmarks

maflcko commented at 10:07 am on February 7, 2025: member

Good point. I guess there could be a comment (or the summary comment could be edited) encouraging people to check the result link of a pull request if the difference is larger than some percentage.
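Such a check could be as simple as comparing each benchmark between the base commit and the pull request and listing those whose change exceeds a threshold set above the observed run-to-run noise (the 5-6% mentioned earlier in this thread suggests something larger). A minimal sketch; the 10% default, the function name and the message wording are all hypothetical.

```python
def flag_large_changes(base: dict[str, float], head: dict[str, float],
                       threshold_pct: float = 10.0) -> list[str]:
    """Return human-readable lines for benchmarks whose time per operation
    changed by more than threshold_pct between base and head."""
    lines = []
    for name in sorted(base.keys() & head.keys()):
        if base[name] <= 0:
            continue  # skip invalid baselines
        change = (head[name] - base[name]) / base[name] * 100.0
        if abs(change) > threshold_pct:
            kind = "slower" if change > 0 else "faster"
            lines.append(f"{name}: {change:+.1f}% ({kind})")
    return lines


if __name__ == "__main__":
    # Hypothetical ns/op values for the base commit and the pull request.
    base = {"AssembleBlock": 200.0, "Base58Encode": 50.0, "SipHash_32b": 40.0}
    head = {"AssembleBlock": 240.0, "Base58Encode": 51.0, "SipHash_32b": 30.0}
    for line in flag_large_changes(base, head):
        print(line)
```

The resulting lines could be appended to the summary comment only when the list is non-empty, so that pull requests without notable changes stay quiet.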

Continuous benchmark tracking #27284