refactor: optimize block index comparisons (1.4-6.8x faster) #33637

pull l0rinc wants to merge 5 commits into bitcoin:master from l0rinc:l0rinc/block_index_comparators changing 9 files +186 −47
  1. l0rinc commented at 1:54 am on October 16, 2025: contributor

    Summary

    Profiling the performance regression in #33618 (comment) revealed that CBlockIndexWorkComparator and its underlying base_uint<256u>::CompareTo are hot paths during block validation, consuming ~4% of CPU time.

    Context

    The comparator is often called directly to compare two separate values and also defines the sorting order for setBlockIndexCandidates, a sorted tree set containing valid block headers where the comparator is invoked extensively.

    Testing

    To ensure the optimized implementations are both fast and correct, the first commit adds a dedicated benchmark to measure CBlockIndexWorkComparator performance, and the second commit adds randomized tests comparing the new implementation with the original one.

    Results

    GCC 15.0.1:

    • CBlockIndexWorkComparator: 100,772,511 cmp/s → 656,674,205 cmp/s = 6.51x faster
    • CheckBlockIndex: 9,091 ops/s → 14,765 ops/s = 1.62x faster

    Clang 22.0.0:

    • CBlockIndexWorkComparator: 100,451,893 cmp/s → 683,414,234 cmp/s = 6.8x faster
    • CheckBlockIndex: 10,322 ops/s → 14,376 ops/s = 1.39x faster

    Compiler: gcc 15.0.1 b60450fae8 bench: add benchmark to measure CBlockIndexWorkComparator performance

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    9.92 100,772,511.62 0.0% 63.98 35.64 1.795 14.17 1.9% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    109,996.46 9,091.20 0.2% 1,014,421.11 394,979.29 2.568 313,025.11 0.0% 5.50 CheckBlockIndex

    e2e02177ba refactor: inline arith_uint256 comparison operator

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    6.48 154,439,491.66 0.0% 31.01 23.25 1.334 7.16 3.8% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    105,754.86 9,455.83 0.1% 913,130.11 379,588.20 2.406 276,692.11 0.0% 5.50 CheckBlockIndex

    85b74b01de refactor: optimize arith_uint256 comparison with spaceship operator

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    6.37 156,990,488.06 0.0% 28.85 22.87 1.261 8.61 3.2% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    83,803.10 11,932.73 0.0% 743,565.09 300,824.84 2.472 232,646.08 0.0% 5.56 CheckBlockIndex

    deb58eea2f refactor: optimize CBlockIndexWorkComparator with std::tie

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    1.52 656,674,205.98 0.0% 13.18 5.47 2.410 3.06 0.1% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    67,726.84 14,765.19 0.2% 585,826.07 243,155.90 2.409 181,920.07 0.0% 5.54 CheckBlockIndex

    Compiler: clang 22.0.0

    b60450fae8 bench: add benchmark to measure CBlockIndexWorkComparator performance

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    9.96 100,451,893.46 0.0% 61.28 35.75 1.714 13.62 2.1% 5.52 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    96,878.70 10,322.19 0.1% 802,827.10 347,679.01 2.309 234,823.10 0.0% 5.50 CheckBlockIndex

    e2e02177ba refactor: inline arith_uint256 comparison operator

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    6.13 163,183,693.10 0.0% 25.74 22.01 1.170 6.10 4.6% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    86,307.94 11,586.42 1.2% 646,229.09 309,785.33 2.086 195,119.09 0.0% 5.54 CheckBlockIndex

    85b74b01de refactor: optimize arith_uint256 comparison with spaceship operator

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    6.36 157,330,900.16 1.0% 26.20 22.61 1.159 6.55 4.4% 5.53 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    75,031.66 13,327.71 1.0% 650,604.15 266,934.04 2.437 149,967.14 0.0% 5.54 CheckBlockIndex

    deb58eea2f refactor: optimize CBlockIndexWorkComparator with std::tie

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    1.46 683,414,234.14 0.0% 16.12 5.25 3.067 4.02 0.1% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    69,559.70 14,376.14 0.1% 559,208.07 249,654.28 2.240 132,342.07 0.0% 5.51 CheckBlockIndex

    Reproducer

    Run the equivalence tests with:

    0cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests
    

    Each commit shows how it changes the relevant benchmarks.

     0for compiler in gcc clang; do \
     1  if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion); \
     2  else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi && \
     3  echo "> Compiler: $compiler $COMP_VER" && \
     4  for commit in b60450fae83970daa9dc2da0706bf126a2f41515 e2e02177ba6f7fac34eda9696dad2e8ecd44e6cd 85b74b01de7e914d07630138eaa78f09a083b85b deb58eea2fdd2712a28aa6b81417087426b19f5b; do \
     5    git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && git log -1 --pretty='%h %s' && \
     6    rm -rf build && \
     7    cmake -B build -DBUILD_BENCH=ON -DENABLE_IPC=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 && \
     8    cmake --build build -j$(nproc) >/dev/null 2>&1 && \
     9    for i in 1; do \
    10      build/bin/bench_bitcoin -filter='CBlockIndexWorkComparator|CheckBlockIndex' -min-time=5000; \
    11    done; \
    12  done; \
    13done
    

    Note: something similar was already started in #33334, but this is a broader optimization that doesn’t use the same technique: added as coauthor regardless.

  2. DrahtBot added the label Refactoring on Oct 16, 2025
  3. DrahtBot commented at 1:54 am on October 16, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33637.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK mzumsande
    Approach ACK Raimo33
    Stale ACK laanwj

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #33300 (fuzz: compact block harness by Crypt-iQ)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • most total work (ascending) -> most total work (descending) [Contradiction: “most” implies descending; the parenthetical “ascending” makes the intent unclear]

    drahtbot_id_5_m

  4. Raimo33 commented at 9:16 am on October 17, 2025: contributor

    Approach ACK

    I have tested the new block index comparator but I’ll refrain from acking the added benchmarks/tests

  5. in src/bench/blockstorage.cpp:40 in dc7e70e9ac outdated
    35+
    36+        blocks.push_back(std::move(block));
    37+    }
    38+    std::ranges::shuffle(blocks, rng);
    39+
    40+    bench.run([&] {
    


    sipa commented at 3:00 pm on October 17, 2025:
    Mind using bench.batch(n * n).units("cmp") or so here, to put the output units in perspective?

    l0rinc commented at 5:55 am on October 18, 2025:
    Done, updated the commit messages and PR descriptions as well.
  6. l0rinc force-pushed on Oct 18, 2025
  7. l0rinc renamed this:
    refactor: optimize block index comparisons (1.4-7.7x faster)
    refactor: optimize block index comparisons (1.4-6.8x faster)
    on Oct 18, 2025
  8. Christewart commented at 9:34 pm on October 18, 2025: contributor

    I attempted to run the script, not really sure what these results indicate. Just pasting what the results were

    0Darwin Chriss-MacBook-Pro.local 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:55 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6031 arm64
    
    0Apple clang version 17.0.0 (clang-1700.3.19.1)
    1Target: arm64-apple-darwin24.6.0
    2Thread model: posix
    3InstalledDir: /Library/Developer/CommandLineTools/usr/bin
    
    0|              ns/cmp |               cmp/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                4.97 |      201,237,014.91 |    0.4% |      5.49 | `CBlockIndexWorkComparator`
    3
    4|               ns/op |                op/s |    err% |     total | benchmark
    5|--------------------:|--------------------:|--------:|----------:|:----------
    6|           43,777.21 |           22,842.94 |    0.3% |      5.41 | `CheckBlockIndex`
    
    0|              ns/cmp |               cmp/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                1.89 |      529,694,702.31 |    0.1% |      5.50 | `CBlockIndexWorkComparator`
    3
    4|               ns/op |                op/s |    err% |     total | benchmark
    5|--------------------:|--------------------:|--------:|----------:|:----------
    6|           38,303.87 |           26,107.02 |    0.1% |      5.35 | `CheckBlockIndex`
    
    0|              ns/cmp |               cmp/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                2.31 |      432,334,219.11 |    0.6% |      5.45 | `CBlockIndexWorkComparator`
    3
    4|               ns/op |                op/s |    err% |     total | benchmark
    5|--------------------:|--------------------:|--------:|----------:|:----------
    6|           33,598.60 |           29,763.15 |    0.2% |      5.32 | `CheckBlockIndex`
    
    0|              ns/cmp |               cmp/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                0.78 |    1,287,258,159.19 |    0.1% |      5.46 | `CBlockIndexWorkComparator`
    3
    4|               ns/op |                op/s |    err% |     total | benchmark
    5|--------------------:|--------------------:|--------:|----------:|:----------
    6|           31,055.32 |           32,200.61 |    0.1% |      5.30 | `CheckBlockIndex`
    
  9. l0rinc commented at 0:24 am on October 19, 2025: contributor
    Thanks for the measurements @Christewart, this is how your measurements compare to mine (but most importantly how it compares to master):
  10. laanwj approved
  11. laanwj commented at 8:44 am on October 22, 2025: member
    Code review ACK c15d839aefba017e64f14cd9cd2779655352f4b6 i did not run the benchmarks but the code changes look good, using <=> makes sense.
  12. DrahtBot added the label Needs rebase on Oct 27, 2025
  13. l0rinc force-pushed on Oct 28, 2025
  14. l0rinc commented at 11:19 pm on October 28, 2025: contributor
    Rebased after #29640 - only change was adjusting the comment
  15. DrahtBot removed the label Needs rebase on Oct 28, 2025
  16. in src/test/blockchain_tests.cpp:127 in 08d098947a outdated
    122+    auto old_comparator{[](const CBlockIndex* pa, const CBlockIndex* pb) -> bool {
    123+        // First sort by most total work, ...
    124+        if (pa->nChainWork > pb->nChainWork) return false;
    125+        if (pa->nChainWork < pb->nChainWork) return true;
    126+
    127+        // ... then by earliest time received, ...
    


    mzumsande commented at 8:41 pm on October 30, 2025:
    should take the latest comments from the old function after #29640

    l0rinc commented at 8:37 am on October 31, 2025:
    I have already done that, which parts do you think I’m missing? Edit: ah in the tests, good point, let me update it
  17. in src/test/arith_uint256_tests.cpp:281 in 08d098947a outdated
    274@@ -275,8 +275,43 @@ BOOST_AUTO_TEST_CASE( comparison ) // <= >= < >
    275         BOOST_CHECK(! (TmpL < R1L)); BOOST_CHECK(! (R1L > TmpL));
    276     }
    277 
    278-    BOOST_CHECK_LT(ZeroL,
    279-                   OneL);
    280+    BOOST_CHECK_LT(ZeroL, OneL);
    281+}
    282+
    283+BOOST_AUTO_TEST_CASE(comparison_equivalence)
    


    mzumsande commented at 9:13 pm on October 30, 2025:
    commit msg 08d098947aaeae67e3aeef262df1ecdbdbed744a: wway ->way

    l0rinc commented at 10:57 am on October 31, 2025:
    done
  18. in src/test/blockchain_tests.cpp:123 in 08d098947a outdated
    116@@ -117,6 +117,43 @@ BOOST_AUTO_TEST_CASE(num_chain_tx_max)
    117     BOOST_CHECK_EQUAL(block_index.m_chain_tx_count, std::numeric_limits<uint64_t>::max());
    118 }
    119 
    120+BOOST_AUTO_TEST_CASE(cblockindex_comparator_equivalence)
    


    mzumsande commented at 9:15 pm on October 30, 2025:
    this looks like a fuzz test, why not make it an actual fuzz target out of this?

    mzumsande commented at 9:18 pm on October 30, 2025:
    would suggest to add a bit more explanation, e.g. mentioning that old_comparator is a snapshot of an old implementation, so that people who look at this in years understand where it’s coming from.

    l0rinc commented at 10:56 am on October 31, 2025:
    I strongly dislike code comments, I prefer explaining with live code over dead comments - and if people disagree they can always do a blame which instantly reveals the purpose since I also over-explain in commit messages usually. Are the names old_comparator in a cblockindex_comparator_equivalence not enough to make it obvious that? I I have added a comment anyway, let me know if it helps.

    l0rinc commented at 10:56 am on October 31, 2025:
    I look at fuzz tests as exploratory tools that are hard to write and run and debug and do code coverage on - not trivial to maintain (see #33731), it’s definitely not my go-to way to test something. This is just a randomized property based unit test - easy to debug, easy to run and understand. But looking at the tests again, I can make it more deterministic so that it hits all important branches which allows me to reduce the iteration count - code coverage and debugging shows this regularly tests each branch now (green line on left side): Added you as coauthor.
  19. in src/node/blockstorage.h:90 in a5584716cb
    86@@ -87,12 +87,20 @@ static constexpr uint32_t UNDO_DATA_DISK_OVERHEAD{STORAGE_HEADER_BYTES + uint256
    87 using BlockMap = std::unordered_map<uint256, CBlockIndex, BlockHasher>;
    88 
    89 struct CBlockIndexWorkComparator {
    90-    bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const;
    91+    // First sort by most total work (descending), then by earliest activatable time (ascending), then by pointer value (ascending).
    


    mzumsande commented at 9:48 pm on October 30, 2025:

    I find “ascending/descending” confusing here, I would have expected the opposites: “descending work” means most work first, but the strict weak ordering of C++ comparators would place the one with the lower work first.

    Would suggest something like “First compare by work (less first), then by earliest activatable time (higher sequence first), then by pointer value (higher first).”


    l0rinc commented at 10:56 am on October 31, 2025:

    I wanted to use the SQL order-by terminology here

    Ascending order puts smaller values first, where “smaller” is defined in terms of the < operator. Similarly, descending order is determined with the > operator.

    But you’re right, I inverted the terminology, added a dedicated test to check for hard-coded order (first param is increasing, second is decreasing), updated the comments (this is why I find comments a lazy solution, it’s too easy to keep them up-to-date), added you as coauthor.

  20. mzumsande commented at 10:08 pm on October 30, 2025: contributor

    Concept ACK

    Not 100% sure yet about 2dd0f2ced35a268bcab661c671e7c70271cdd91f, seems like inlining gives most of the speedup, whereas the gain of using the spaceship operator (which comes at the cost of readability) is marginal.

  21. bench: add benchmark to measure `CBlockIndexWorkComparator` performance
    Profiling shows this comparator consumes a significant portion of `CheckBlockIndex()`:
    ... ChainstateManager::CheckBlockIndex()
        ... std::_Rb_tree<...>::find(...)
            ... node::CBlockIndexWorkComparator::operator()(...)
                ... base_uint<256u>::CompareTo(...) const
    
    This commit introduces a benchmark that performs pairwise comparisons on 1000 randomly generated block indices (with some duplicates) to establish baseline performance metrics before further optimization.
    
    |              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |                9.92 |      100,772,511.62 |    0.0% |           63.98 |           35.64 |  1.795 |          14.17 |    1.9% |      5.50 | `CBlockIndexWorkComparator`
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |          109,996.46 |            9,091.20 |    0.2% |    1,014,421.11 |      394,979.29 |  2.568 |     313,025.11 |    0.0% |      5.50 | `CheckBlockIndex`
    a46c801405
  22. test: add comparison equivalence tests for upcoming optimizations
    Add equivalence tests to verify behavioral compatibility when optimizing comparison operations.
    These are duplicating behavior for now, but this way the reviewers can validate that they behave the same way before the optimizations.
    
    The `arith_uint256` test verifies that the spaceship operator produces identical results to the original `CompareTo` method for all comparison operators (<, >, <=, >=, ==, !=).
    
    The `CBlockIndexWorkComparator` test captures the current comparison logic in a lambda and verifies that optimized versions maintain identical sorting behavior for chain work, sequence ID, and pointer tiebreaking.
    
    To make sure all paths are taken, we randomly assign previously encountered values.
    
    You can run the tests with:
    > cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests
    
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    2d03d94535
  23. refactor: inline `arith_uint256` comparison operator
    Remove the `CompareTo` method and inline its logic directly into `operator<=>`, updating related comments.
    This eliminates function call overhead in the hot path during block generation and chain selection.
    
    The comparison algorithm remains unchanged, iterating from most significant to least significant byte, but now returns `std::strong_ordering` directly, removing an extra conversion and enabling better inlining.
    
    |              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |                6.48 |      154,439,491.66 |    0.0% |           31.01 |           23.25 |  1.334 |           7.16 |    3.8% |      5.50 | `CBlockIndexWorkComparator`
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |          105,754.86 |            9,455.83 |    0.1% |      913,130.11 |      379,588.20 |  2.406 |     276,692.11 |    0.0% |      5.50 | `CheckBlockIndex`
    ef17dc368c
  24. l0rinc force-pushed on Oct 31, 2025
  25. refactor: optimize `arith_uint256` comparison with spaceship operator
    Replace multiple comparisons with a single C++20 spaceship operator call directly.
    
    |              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |                6.37 |      156,990,488.06 |    0.0% |           28.85 |           22.87 |  1.261 |           8.61 |    3.2% |      5.50 | `CBlockIndexWorkComparator`
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |           83,803.10 |           11,932.73 |    0.0% |      743,565.09 |      300,824.84 |  2.472 |     232,646.08 |    0.0% |      5.56 | `CheckBlockIndex`
    beba9dee73
  26. refactor: optimize `CBlockIndexWorkComparator` with std::tie
    Replace manual comparison branches with a single tuple comparison, allowing the compilers to generate more efficient comparison code.
    Also, moved it to the header to allow inlining.
    For symmetry, `CBlockIndexHeightOnlyComparator` was also moved to the header.
    
    |              ns/cmp |               cmp/s |    err% |         ins/cmp |         cyc/cmp |    IPC |        bra/cmp |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |                1.52 |      656,674,205.98 |    0.0% |           13.18 |            5.47 |  2.410 |           3.06 |    0.1% |      5.50 | `CBlockIndexWorkComparator`
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |           67,726.84 |           14,765.19 |    0.2% |      585,826.07 |      243,155.90 |  2.409 |     181,920.07 |    0.0% |      5.54 | `CheckBlockIndex`
    
    Co-authored-by: Raimo33 <claudio.raimondi@protonmail.com>
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    5cbb9a4087
  27. l0rinc force-pushed on Oct 31, 2025
  28. l0rinc commented at 1:13 pm on October 31, 2025: contributor

    Pushed, thanks for the review, you had a few good points, added you as coauthor.

    the gain of using the spaceship operator (which comes at the cost of readability) is marginal.

    the spaceship might not be very familiar to us yet, but it’s arguably simpler, having fewer moving parts, and some compilers prefer that over the manual comparisons.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-11-02 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me