doc: Demonstrate running benchmarks on each commit #35390

rustaceanrob commented at 5:55 AM on May 27, 2026: member

Comparing the before/after impact of a patch on the results of a benchmark or subset of benchmarks is useful information, especially to catch performance regressions or measure improvements before making a pull request. This is an adaption of the doc comment in productivity.md that explicitly shows how to run benches for each commit since diverging from master.

DrahtBot added the label Scripts and tools on May 27, 2026

DrahtBot commented at 5:55 AM on May 27, 2026: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35390.

Reviews

See the guideline for information on the review process. A summary of reviews will appear here.

DrahtBot added the label CI failed on May 27, 2026

rustaceanrob force-pushed on May 27, 2026

DrahtBot removed the label CI failed on May 27, 2026

stickies-v commented at 10:58 AM on May 27, 2026: contributor

This seems like a useful tool, but I'm worried it'll go stale soon if it's not part of some Bitcoin Core (e.g. CI, release process, ...) So I think it should either be integrated into the project (and this PR evaluated on those merits), or not be in the contrib/ directory to avoid that becoming a repository of stale code? This might warrant an RFC issue imho.

sedited commented at 11:10 AM on May 27, 2026: contributor

Not sure about adding a rust tool to our contrib scripts. This might as well be a dependency-less python script. Does typing really matter here?

in contrib/compare-benches/src/main.rs:87 in 9f537ff10f

  82 | +        .arg(build)
  83 | +        .arg("-S")
  84 | +        .arg(source)
  85 | +        .arg("-DCMAKE_BUILD_TYPE=Release")
  86 | +        .arg("-DBUILD_BENCH=ON")
  87 | +        .arg("-DENABLE_WALLET=OFF")

maflcko commented at 11:27 AM on May 27, 2026:

wallet benchmarks should be enabled?

maflcko commented at 11:27 AM on May 27, 2026: member

Seems fine to add, but I think the only thing this does is transpose the result table and do some pretty printing of the delta.

Also, it only works to compare two commits and wouldn't work in a pull request with more than one commit.

Personally, I'd just prefer a bash one-liner, similar to:

> /tmp/bench.txt && git rebase --interactive HEAD~4 --exec 'cmake --build ./bld -j$(nproc) && git log -1 >> /tmp/bench.txt && ./bld-cmake/bin/bench_bitcoin --filter=.*Hex.* >> /tmp/bench.txt'
cat /tmp/bench.txt

This also has the benefit of not having to hard-code the cmake args.

rustaceanrob commented at 12:30 PM on May 27, 2026: member

Okay, wrote something locally that serves my use case of catching obvious regressions with a shell script. It prints the benchmark result for each commit from HEAD down to a specified base (e.g. master).

The downside of course is that this is not OS-agnostic, but I'm not sure if that matters.

#!/usr/bin/env bash
# Walk every commit from <BASE>..HEAD, build bench_bitcoin, and print results.
# Usage: compare-benches.sh [FILTER] [BASE] [BUILD_DIR]

set -euo pipefail

FILTER="${1:-.*}"
BASE="${2:-master}"
BUILD_DIR="${3:-build}"

BRANCH=$(git symbolic-ref --short HEAD 2>/dev/null || git rev-parse HEAD)
MERGE_BASE=$(git merge-base "$BASE" HEAD)

cleanup() {
    git checkout --quiet "$BRANCH"
}
trap cleanup EXIT

for sha in $(git log --reverse --format="%H" "${MERGE_BASE}..HEAD"); do
    git checkout --quiet --detach "$sha"

    COMMIT=$(git log -1 --pretty="%h  %s")
    printf '\n==> %s\n' "$COMMIT"
    printf '    cmake: building bench_bitcoin...\n'
    cmake --build "$BUILD_DIR" -j"$(nproc)" --target bench_bitcoin &>/dev/null

    "$BUILD_DIR/bin/bench_bitcoin" -filter=".*${FILTER}.*"
done

Example use:

contrib/compare-benches/compare-benches.sh "(MemPool|Orphanage)"

Inspired by the comment of @maflcko above. Are Linux-only scripts useful for contrib? This runs a lower risk of becoming stale but accomplishes what I'm going for.

maflcko commented at 12:59 PM on May 27, 2026: member

Hmm, the Rust code is also hard-coding the bench name without exe, so it won't work on Windows either.

About the Bash script: I don't think we have any human Bash-script reviewer currently, so adding it is currently not possible. Also, it seems redundant with the already documented workflow in:

doc/productivity.md-To execute `cmake --build build && ctest --test-dir build` on every commit since last diverged from master, but without rebasing over an updated master, we can do the following:
doc/productivity.md-```sh
doc/productivity.md:git rebase -i --exec "cmake --build build && ctest --test-dir build" "$(git merge-base master HEAD)"
doc/productivity.md-```

Basically the only difference is that the bench tests are called instead of the unit tests. But the docs give just one example, which is extensible to call any tool, test, or utility (such as signing commits, etc).

Again, it seems fine to add, but it may be more flexible to just rely on the existing docs.

rustaceanrob commented at 1:49 PM on May 27, 2026: member

I would prefer having something explicit like:

git rebase -i --exec "cmake --build build && build/bin/bench_bitcoin -filter='.*MemPool.*'" "$(git merge-base master HEAD)"

mentioned in the benchmarking.md doc to make this more obvious.

I can update this PR with the doc addition and dump the contrib changes as this also does what I intend.

rustaceanrob commented at 1:52 PM on May 27, 2026: member

This seems like a useful tool, but I'm worried it'll go stale soon if it's not part of some Bitcoin Core (e.g. CI, release process, ...)

Tracking regressions or improvements on a regular cadence for each bench seems generally useful to me. Perhaps a start would be to add more pages to the benchcoin website or make a new one with a cheap-ish VPS.

This might warrant an RFC issue imho

Request for comment on scripts/automation related to benchmarking? Or what did you have in mind?

maflcko commented at 2:01 PM on May 27, 2026: member

Tracking regressions or improvements on a regular cadence for each bench seems generally useful to me. Perhaps a start would be to add more pages to the benchcoin website or make a new one with a cheap-ish VPS.

Have you seen the section about corecheck in #35390 (comment) ?

rustaceanrob commented at 2:05 PM on May 27, 2026: member

Neat, way ahead of me.

rustaceanrob force-pushed on May 27, 2026

DrahtBot added the label CI failed on May 27, 2026

rustaceanrob renamed this:
~~contrib: Add `compare-benches` tool~~
doc: Demonstrate running benchmarks on each commit
on May 27, 2026

rustaceanrob commented at 2:35 PM on May 27, 2026: member

Updated to:

    BASE=$(git merge-base master HEAD) && \
    git checkout --detach --quiet "$BASE" && \
    cmake --build build && \
    build/bin/bench_bitcoin -filter='.*MemPool.*' && \
    git checkout --quiet - && \
    git rebase -i --exec "cmake --build build && build/bin/bench_bitcoin -filter='.*MemPool.*'" "$BASE"

which includes the head of master in the benchmark results without inclusion in the rebase

doc: Demonstrate running benchmarks on each commit

Comparing the before/after impact of a patch on the results of a
benchmark or subset of benchmarks is useful information, especially to
catch performance regressions or measure improvements before making a
pull request. This is an adaption of the doc comment in
`productivity.md` that explicitly shows how to run benches for each
commit since diverging from `master`.

d06e0c6e4d

rustaceanrob force-pushed on May 27, 2026

DrahtBot removed the label CI failed on May 27, 2026

in doc/benchmarking.md:49 in d06e0c6e4d

  44 | +    BASE=$(git merge-base master HEAD) && \
  45 | +    git checkout --detach --quiet "$BASE" && \
  46 | +    cmake --build build && \
  47 | +    build/bin/bench_bitcoin -filter='.*MemPool.*' && \
  48 | +    git checkout --quiet - && \
  49 | +    git rebase -i --exec "cmake --build build && build/bin/bench_bitcoin -filter='.*MemPool.*'" "$BASE"

maflcko commented at 7:39 PM on May 28, 2026:

A bit shorter one-liner could be something like: for c in $(git rev-list --reverse $(git merge-base master HEAD)~..HEAD); do git checkout --quiet "$c" && cmake --build build && build/bin/bench_bitcoin -filter='.*MemPool.*' || break; done

However, I think regressions should be checked continuously and automated. Adding something to the docs to say that someone should check this likely won't change anything, because no one will do this manually on all pulls.

rustaceanrob commented at 12:26 PM on May 29, 2026:

How are you thinking this would differ from the report of DrahtBot per PR?

maflcko commented at 11:05 AM on June 2, 2026:

idk, maybe with a stronger integration when a difference is found (e.g. when a new benchmark is added, or an existing one is changed)?

maflcko commented at 3:56 PM on June 25, 2026:

Looks like there is already an issue thread to discuss this: #27284 (comment)

in doc/benchmarking.md:40 in d06e0c6e4d

  36 | @@ -37,6 +37,17 @@ The output will look similar to:
  37 |  ...
  38 |  ```
  39 |  
  40 | +In many cases, comparing benchmarks across commits is useful to see if any

l0rinc commented at 10:58 AM on June 2, 2026:

Not sure I understand why benchmarks are special, we're already expected to do this for tests: check out commit, run compiler/test/bench/linter, check out next commit, run compiler/test/bench/linter? We're already doing interactive rebases to modify commits and running tests on intermediate commits - how is running benchmarks different?

rustaceanrob commented at 11:19 AM on June 2, 2026:

One would have to read productivity.md, extrapolate they can apply the same rebase method with benches, and also run the benches help to find filter. Benchmarks are completely pointless without a comparison, why not demonstrate how to do so in the benchmarking document itself instead of assuming they will read productivity?

l0rinc commented at 11:30 AM on June 2, 2026:

I must be misunderstanding something. My assumption was that we either want to compare two commits in the same PR or the current PR against master. When a reviewer asks for a fix in one of the commits, we already do an interactive rebase to edit that selectively, where we're likely running some tests before continuing the rebase. I don't see how that's any different from running the benchmarks for the intermediate commit - it's what I usually do with optimization PRs, where each commit shows the relevant measurements, e.g. https://github.com/bitcoin/bitcoin/pull/33637/changes/8ef4ce2cb3ac80cbe58b8cf76e183f9eca6fafe4 and https://github.com/bitcoin/bitcoin/pull/33637/changes/703165e6d3b9fbd58b9c31ddce0e961047fc5e71 <img width="895" height="688" alt="image" src="https://github.com/user-attachments/assets/5472f579-0ade-45e6-98bd-90bd2ffed9fc" />

rustaceanrob commented at 2:48 PM on June 2, 2026:

Reviewers may want to run benchmarks on their machine and may not be frequent contributors such that they have done an interactive rebase with exec calls. Again, this may seem obvious to some, but I don't understand the push back in being explicit within the documentation. Right now the docs simply show how to run all the tests at once, which is rarely done in practice and compares to no baseline.

l0rinc commented at 8:36 PM on June 2, 2026:

There's no need to exec anything, just edit the commit like you usually do when amending any previous commit in a PR. Running something while interactive rebase is paused on one commit is no different than running it when you're on the latest commit.

rustaceanrob commented at 8:59 AM on June 3, 2026:

Oh, this command is used to avoid intervention at all. It just prints the two (or more) tables for the filtered benchmarks on each commit you are interested in.

l0rinc commented at 10:41 AM on June 3, 2026:

How is that different from running tests on each commit you are interested in?

rustaceanrob closed this on Jun 25, 2026