refactor: optimize block index comparisons (1.4-6.8x faster) #33637

pull l0rinc wants to merge 7 commits into bitcoin:master from l0rinc:l0rinc/block_index_comparators changing 12 files +242 −45
  1. l0rinc commented at 1:54 am on October 16, 2025: contributor

    Context

    The comparator is often called either directly or for the sorting of setBlockIndexCandidates, a red-black tree-set containing valid block headers where the comparator is invoked extensively. Besided the sorted set usages, the comparator is used directly in Chainstate in FindMostWorkChain, PruneBlockIndexCandidates, ActivateBestChain and InvalidateBlock; and in ChainstateManager in LoadBlockIndex, CheckBlockIndex, ActivateSnapshot and PopulateAndValidateSnapshot.

    It was profiled during the investigation of #33618 (comment).

    Testing

    To ensure the optimized implementations are both fast and correct, the first commit adds a dedicated benchmark to measure CBlockIndexWorkComparator performance, and the second commit adds randomized tests comparing the new implementation with the original one.

    Results

    GCC 15.0.1:

    • CBlockIndexWorkComparator: 100,772,511 cmp/s → 656,674,205 cmp/s = 6.51x faster
    • CheckBlockIndex: 9,091 ops/s → 14,765 ops/s = 1.62x faster

    Clang 22.0.0:

    • CBlockIndexWorkComparator: 100,451,893 cmp/s → 683,414,234 cmp/s = 6.8x faster
    • CheckBlockIndex: 10,322 ops/s → 14,376 ops/s = 1.39x faster

    See https://corecheck.dev/bitcoin/bitcoin/pulls/33637 for the CI’s CheckBlockIndex performance.

    Compiler: gcc 15.0.1 b60450fae8 bench: add benchmark to measure CBlockIndexWorkComparator performance

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    9.92 100,772,511.62 0.0% 63.98 35.64 1.795 14.17 1.9% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    109,996.46 9,091.20 0.2% 1,014,421.11 394,979.29 2.568 313,025.11 0.0% 5.50 CheckBlockIndex

    e2e02177ba refactor: inline arith_uint256 comparison operator

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    6.48 154,439,491.66 0.0% 31.01 23.25 1.334 7.16 3.8% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    105,754.86 9,455.83 0.1% 913,130.11 379,588.20 2.406 276,692.11 0.0% 5.50 CheckBlockIndex

    85b74b01de refactor: optimize arith_uint256 comparison with spaceship operator

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    6.37 156,990,488.06 0.0% 28.85 22.87 1.261 8.61 3.2% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    83,803.10 11,932.73 0.0% 743,565.09 300,824.84 2.472 232,646.08 0.0% 5.56 CheckBlockIndex

    deb58eea2f refactor: optimize CBlockIndexWorkComparator with std::tie

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    1.52 656,674,205.98 0.0% 13.18 5.47 2.410 3.06 0.1% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    67,726.84 14,765.19 0.2% 585,826.07 243,155.90 2.409 181,920.07 0.0% 5.54 CheckBlockIndex

    Compiler: clang 22.0.0

    b60450fae8 bench: add benchmark to measure CBlockIndexWorkComparator performance

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    9.96 100,451,893.46 0.0% 61.28 35.75 1.714 13.62 2.1% 5.52 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    96,878.70 10,322.19 0.1% 802,827.10 347,679.01 2.309 234,823.10 0.0% 5.50 CheckBlockIndex

    e2e02177ba refactor: inline arith_uint256 comparison operator

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    6.13 163,183,693.10 0.0% 25.74 22.01 1.170 6.10 4.6% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    86,307.94 11,586.42 1.2% 646,229.09 309,785.33 2.086 195,119.09 0.0% 5.54 CheckBlockIndex

    85b74b01de refactor: optimize arith_uint256 comparison with spaceship operator

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    6.36 157,330,900.16 1.0% 26.20 22.61 1.159 6.55 4.4% 5.53 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    75,031.66 13,327.71 1.0% 650,604.15 266,934.04 2.437 149,967.14 0.0% 5.54 CheckBlockIndex

    deb58eea2f refactor: optimize CBlockIndexWorkComparator with std::tie

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    1.46 683,414,234.14 0.0% 16.12 5.25 3.067 4.02 0.1% 5.50 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    69,559.70 14,376.14 0.1% 559,208.07 249,654.28 2.240 132,342.07 0.0% 5.51 CheckBlockIndex

    Reproducer

    Run the equivalence tests with:

    0cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests
    

    Each commit shows how it changes the relevant benchmarks.

     0for compiler in gcc clang; do \
     1  if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion); \
     2  else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi && \
     3  echo "> Compiler: $compiler $COMP_VER" && \
     4  for commit in 70c6244b709fcf6a853ea44b91b9ed219609ac90 516e05cd7d8dbb61b05854c59f5393367e502284 d053b27c06d06cccbc6395b0aa7dc1c4193a6fc3 8ef4ce2cb3ac80cbe58b8cf76e183f9eca6fafe4 703165e6d3b9fbd58b9c31ddce0e961047fc5e71; do \
     5    git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && git log -1 --pretty='%h %s' && \
     6    rm -rf build && \
     7    cmake -B build -DBUILD_BENCH=ON -DENABLE_IPC=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 && \
     8    cmake --build build -j$(nproc) >/dev/null 2>&1 && \
     9    for i in 1; do \
    10      build/bin/bench_bitcoin -filter='CBlockIndexWorkComparator|CheckBlockIndex' -min-time=5000; \
    11    done; \
    12  done; \
    13done
    

    Note: something similar was already started in #33334, but this is a broader optimization that doesn’t use the same technique.

  2. DrahtBot added the label Refactoring on Oct 16, 2025
  3. DrahtBot commented at 1:54 am on October 16, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33637.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK mzumsande
    Approach ACK Raimo33
    Stale ACK laanwj, optout21

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34075 (fees: Introduce Mempool Based Fee Estimation to reduce overestimation by ismaelsadeeq)
    • #33854 (fix assumevalid is ignored during reindex by Eunovo)
    • #33324 (blocks: add -reobfuscate-blocks argument to enable (de)obfuscating existing blocks by l0rinc)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. Raimo33 commented at 9:16 am on October 17, 2025: contributor

    Approach ACK

    I have tested the new block index comparator but I’ll refrain from acking the added benchmarks/tests

  5. in src/bench/blockstorage.cpp:40 in dc7e70e9ac outdated
    35+
    36+        blocks.push_back(std::move(block));
    37+    }
    38+    std::ranges::shuffle(blocks, rng);
    39+
    40+    bench.run([&] {
    


    sipa commented at 3:00 pm on October 17, 2025:
    Mind using bench.batch(n * n).units("cmp") or so here, to put the output units in perspective?

    l0rinc commented at 5:55 am on October 18, 2025:
    Done, updated the commit messages and PR descriptions as well.
  6. l0rinc force-pushed on Oct 18, 2025
  7. l0rinc renamed this:
    refactor: optimize block index comparisons (1.4-7.7x faster)
    refactor: optimize block index comparisons (1.4-6.8x faster)
    on Oct 18, 2025
  8. Christewart commented at 9:34 pm on October 18, 2025: contributor

    I attempted to run the script, not really sure what these results indicate. Just pasting what the results were

    0Darwin Chriss-MacBook-Pro.local 24.6.0 Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:55 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6031 arm64
    
    0Apple clang version 17.0.0 (clang-1700.3.19.1)
    1Target: arm64-apple-darwin24.6.0
    2Thread model: posix
    3InstalledDir: /Library/Developer/CommandLineTools/usr/bin
    
    0|              ns/cmp |               cmp/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                4.97 |      201,237,014.91 |    0.4% |      5.49 | `CBlockIndexWorkComparator`
    3
    4|               ns/op |                op/s |    err% |     total | benchmark
    5|--------------------:|--------------------:|--------:|----------:|:----------
    6|           43,777.21 |           22,842.94 |    0.3% |      5.41 | `CheckBlockIndex`
    
    0|              ns/cmp |               cmp/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                1.89 |      529,694,702.31 |    0.1% |      5.50 | `CBlockIndexWorkComparator`
    3
    4|               ns/op |                op/s |    err% |     total | benchmark
    5|--------------------:|--------------------:|--------:|----------:|:----------
    6|           38,303.87 |           26,107.02 |    0.1% |      5.35 | `CheckBlockIndex`
    
    0|              ns/cmp |               cmp/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                2.31 |      432,334,219.11 |    0.6% |      5.45 | `CBlockIndexWorkComparator`
    3
    4|               ns/op |                op/s |    err% |     total | benchmark
    5|--------------------:|--------------------:|--------:|----------:|:----------
    6|           33,598.60 |           29,763.15 |    0.2% |      5.32 | `CheckBlockIndex`
    
    0|              ns/cmp |               cmp/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|                0.78 |    1,287,258,159.19 |    0.1% |      5.46 | `CBlockIndexWorkComparator`
    3
    4|               ns/op |                op/s |    err% |     total | benchmark
    5|--------------------:|--------------------:|--------:|----------:|:----------
    6|           31,055.32 |           32,200.61 |    0.1% |      5.30 | `CheckBlockIndex`
    
  9. l0rinc commented at 0:24 am on October 19, 2025: contributor
    Thanks for the measurements @Christewart, this is how your measurements compare to mine (but most importantly how it compares to master):
  10. laanwj approved
  11. laanwj commented at 8:44 am on October 22, 2025: member
    Code review ACK c15d839aefba017e64f14cd9cd2779655352f4b6 i did not run the benchmarks but the code changes look good, using <=> makes sense.
  12. DrahtBot added the label Needs rebase on Oct 27, 2025
  13. l0rinc force-pushed on Oct 28, 2025
  14. l0rinc commented at 11:19 pm on October 28, 2025: contributor
    Rebased after #29640 - only change was adjusting the comment
  15. DrahtBot removed the label Needs rebase on Oct 28, 2025
  16. in src/test/blockchain_tests.cpp:127 in 08d098947a outdated
    122+    auto old_comparator{[](const CBlockIndex* pa, const CBlockIndex* pb) -> bool {
    123+        // First sort by most total work, ...
    124+        if (pa->nChainWork > pb->nChainWork) return false;
    125+        if (pa->nChainWork < pb->nChainWork) return true;
    126+
    127+        // ... then by earliest time received, ...
    


    mzumsande commented at 8:41 pm on October 30, 2025:
    should take the latest comments from the old function after #29640

    l0rinc commented at 8:37 am on October 31, 2025:
    I have already done that, which parts do you think I’m missing? Edit: ah in the tests, good point, let me update it
  17. in src/test/arith_uint256_tests.cpp:281 in 08d098947a outdated
    274@@ -275,8 +275,43 @@ BOOST_AUTO_TEST_CASE( comparison ) // <= >= < >
    275         BOOST_CHECK(! (TmpL < R1L)); BOOST_CHECK(! (R1L > TmpL));
    276     }
    277 
    278-    BOOST_CHECK_LT(ZeroL,
    279-                   OneL);
    280+    BOOST_CHECK_LT(ZeroL, OneL);
    281+}
    282+
    283+BOOST_AUTO_TEST_CASE(comparison_equivalence)
    


    mzumsande commented at 9:13 pm on October 30, 2025:
    commit msg 08d098947aaeae67e3aeef262df1ecdbdbed744a: wway ->way

    l0rinc commented at 10:57 am on October 31, 2025:
    done
  18. in src/test/blockchain_tests.cpp:120 in 08d098947a outdated
    116@@ -117,6 +117,43 @@ BOOST_AUTO_TEST_CASE(num_chain_tx_max)
    117     BOOST_CHECK_EQUAL(block_index.m_chain_tx_count, std::numeric_limits<uint64_t>::max());
    118 }
    119 
    120+BOOST_AUTO_TEST_CASE(cblockindex_comparator_equivalence)
    


    mzumsande commented at 9:15 pm on October 30, 2025:
    this looks like a fuzz test, why not make it an actual fuzz target out of this?

    mzumsande commented at 9:18 pm on October 30, 2025:
    would suggest to add a bit more explanation, e.g. mentioning that old_comparator is a snapshot of an old implementation, so that people who look at this in years understand where it’s coming from.

    l0rinc commented at 10:56 am on October 31, 2025:
    I strongly dislike code comments, I prefer explaining with live code over dead comments - and if people disagree they can always do a blame which instantly reveals the purpose since I also over-explain in commit messages usually. Are the names old_comparator in a cblockindex_comparator_equivalence not enough to make it obvious that? I I have added a comment anyway, let me know if it helps.

    l0rinc commented at 10:56 am on October 31, 2025:
    I look at fuzz tests as exploratory tools that are hard to write and run and debug and do code coverage on - not trivial to maintain (see #33731), it’s definitely not my go-to way to test something. This is just a randomized property based unit test - easy to debug, easy to run and understand. But looking at the tests again, I can make it more deterministic so that it hits all important branches which allows me to reduce the iteration count - code coverage and debugging shows this regularly tests each branch now (green line on left side): Added you as coauthor.

    dergoegge commented at 12:43 pm on November 6, 2025:

    I look at fuzz tests as exploratory tools that are hard to write and run and debug and do code coverage … not trivial to maintain … not my go-to way to test something

    There is a learning curve to writing and using fuzz testing but it is 100% worth getting in to. The maintenance aspect only relates to running them using libFuzzer on macOS.

    On average, coverage guided fuzzing will be magnitudes better at finding bugs than this style of unit test. For something as small as the comparator your approach is probably good enough, although you could just have a fixed list of test cases that enumerate all possibilities (for the old comparator) instead? (maybe you do already, didn’t look)

  19. in src/node/blockstorage.h:90 in a5584716cb
    86@@ -87,12 +87,20 @@ static constexpr uint32_t UNDO_DATA_DISK_OVERHEAD{STORAGE_HEADER_BYTES + uint256
    87 using BlockMap = std::unordered_map<uint256, CBlockIndex, BlockHasher>;
    88 
    89 struct CBlockIndexWorkComparator {
    90-    bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const;
    91+    // First sort by most total work (descending), then by earliest activatable time (ascending), then by pointer value (ascending).
    


    mzumsande commented at 9:48 pm on October 30, 2025:

    I find “ascending/descending” confusing here, I would have expected the opposites: “descending work” means most work first, but the strict weak ordering of C++ comparators would place the one with the lower work first.

    Would suggest something like “First compare by work (less first), then by earliest activatable time (higher sequence first), then by pointer value (higher first).”


    l0rinc commented at 10:56 am on October 31, 2025:

    I wanted to use the SQL order-by terminology here

    Ascending order puts smaller values first, where “smaller” is defined in terms of the < operator. Similarly, descending order is determined with the > operator.

    But you’re right, I inverted the terminology, added a dedicated test to check for hard-coded order (first param is increasing, second is decreasing), updated the comments (this is why I find comments a lazy solution, it’s too easy to keep them up-to-date), added you as coauthor.

  20. mzumsande commented at 10:08 pm on October 30, 2025: contributor

    Concept ACK

    Not 100% sure yet about 2dd0f2ced35a268bcab661c671e7c70271cdd91f, seems like inlining gives most of the speedup, whereas the gain of using the spaceship operator (which comes at the cost of readability) is marginal.

  21. l0rinc force-pushed on Oct 31, 2025
  22. l0rinc force-pushed on Oct 31, 2025
  23. l0rinc commented at 1:13 pm on October 31, 2025: contributor

    Pushed, thanks for the review, you had a few good points, added you as coauthor.

    the gain of using the spaceship operator (which comes at the cost of readability) is marginal.

    the spaceship might not be very familiar to us yet, but it’s arguably simpler, having fewer moving parts, and some compilers prefer that over the manual comparisons.

  24. in src/test/arith_uint256_tests.cpp:286 in 2d03d94535 outdated
    283+BOOST_AUTO_TEST_CASE(comparison_equivalence)
    284+{
    285+    struct TestableArithUint256 : arith_uint256 {
    286+        static int compare_original(const TestableArithUint256& a, const TestableArithUint256& b) {
    287+            constexpr int WIDTH = 8; // 256 / 32
    288+            for (int i = WIDTH - 1; i >= 0; i--) {
    


    optout21 commented at 12:31 pm on November 3, 2025:
    nit: --i is preferred

    l0rinc commented at 12:43 pm on November 3, 2025:
    Generally yes, but here I wanted to copy the original code as closely as possible

    l0rinc commented at 12:48 pm on November 3, 2025:
    Pushed anyway, thanks
  25. in src/test/blockchain_tests.cpp:137 in 2d03d94535 outdated
    132+
    133+        // ... then by earliest activatable time, ...
    134+        if (pa->nSequenceId < pb->nSequenceId)
    135+            return false;
    136+        if (pa->nSequenceId > pb->nSequenceId)
    137+                return true;
    


    optout21 commented at 12:35 pm on November 3, 2025:
    nit: indentation seems off (extra indent)

    l0rinc commented at 12:48 pm on November 3, 2025:
    done, thanks
  26. optout21 commented at 12:44 pm on November 3, 2025: contributor

    utACK 5cbb9a40873203ea5a4dd0aa7127c9756da2f607

    An evident, localized micro-optimization in a hot-path. I haven’t run measurements.

  27. DrahtBot requested review from laanwj on Nov 3, 2025
  28. DrahtBot requested review from mzumsande on Nov 3, 2025
  29. l0rinc force-pushed on Nov 3, 2025
  30. optout21 commented at 12:52 pm on November 3, 2025: contributor

    reACK 80bfeaa240611401c1ffa41f6968b1f79b621938 Rebase.

    Prev: reACK 78b43065a1746c19f74fe6380d765d5922e5c541 Rebase, no content changes. reACK 6e9061f8b7ad77df4566990121f5bebc1f9e6e15 utACK 237eec0f7ca095bc60e40ee94ba1a160a7064753 Re-acked (after 7 mins :D ), only minor/formatting changes since last review

  31. maflcko commented at 3:34 pm on November 6, 2025: member

    Profiling the performance regression in #33618 (comment) revealed that CBlockIndexWorkComparator and its underlying base_uint<256u>::CompareTo are hot paths during block validation, consuming ~4% of CPU time.

    I don’t follow here. The issue is about the additional CWallet::WriteBestBlock overhead? CBlockIndexWorkComparator seems unrelated to the issue, at least I can’t see how it changed the result between 29.x and 30.x.

    Instead of mentioning an unrelated issue, I think it could be better to mention what real-world effect this improvement has. I think that is -checkblockindex (default in regtest) and possibly LoadExternalBlockFile/LoadBlockIndex?

  32. l0rinc commented at 4:22 pm on November 6, 2025: contributor

    CBlockIndexWorkComparator seems unrelated to the issue

    The original issue won’t be fixed since it’s not exactly a bug, but we could still speed up the regression by optimizing another bottleneck. Since you found the etymology confusing, I have removed the details from the description, only mentioned #33618 (comment) for the flame graph that show this being one of the bottlenecks. Not the biggest one, but the one we can easily optimize and clean up.

    I think that is -checkblockindex (default in regtest)

    Yes, this was mentioned in the benchmarking section.

    possibly LoadExternalBlockFile/LoadBlockIndex

    Yes, it’s used explicitly in Chainstate in FindMostWorkChain, PruneBlockIndexCandidates, ActivateBestChain and InvalidateBlock. And ChainstateManager in LoadBlockIndex, CheckBlockIndex, ActivateSnapshot and PopulateAndValidateSnapshot. But the setBlockIndexCandidates usages should already suffice as motivation, I’d say:

    0git grep 'setBlockIndexCandidates.' src/validation.cpp | wc -l 
    1   44
    
  33. in src/node/blockstorage.h:94 in 237eec0f7c outdated
    86@@ -87,12 +87,21 @@ static constexpr uint32_t UNDO_DATA_DISK_OVERHEAD{STORAGE_HEADER_BYTES + uint256
    87 using BlockMap = std::unordered_map<uint256, CBlockIndex, BlockHasher>;
    88 
    89 struct CBlockIndexWorkComparator {
    90-    bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const;
    91+    // First sort by most total work (ascending), then by earliest activatable time (descending), then by pointer value (descending).
    92+    // Pointer tiebreak should only happen with blocks loaded from disk, as those share the same id: 0 for blocks on the best chain, 1 for all others.
    93+    bool operator()(const CBlockIndex* pa, const CBlockIndex* pb) const noexcept
    94+    {
    95+        return std::tie(pa->nChainWork, pb->nSequenceId, pb)
    


    l0rinc commented at 10:50 pm on November 12, 2025:

    I thought of eliminating the worst case here (when pa == pb, where it has to do the comparison for every byte) with something like:

    0return pa != pb &&
    1       std::tie(pa->nChainWork, pb->nSequenceId, pb)
    2     < std::tie(pb->nChainWork, pa->nSequenceId, pa);
    

    But the benchmarks indicate that it’s not really faster.

     0// Copyright (c) 2025-present The Bitcoin Core developers
     1// Distributed under the MIT software license, see the accompanying
     2// file COPYING or http://www.opensource.org/licenses/mit-license.php.
     3
     4#include <arith_uint256.h>
     5#include <bench/bench.h>
     6#include <chain.h>
     7#include <node/blockstorage.h>
     8#include <random.h>
     9#include <uint256.h>
    10
    11#include <algorithm>
    12#include <memory>
    13#include <vector>
    14
    15static void CBlockIndexWorkComparator(benchmark::Bench& bench)
    16{
    17    FastRandomContext rng{/*fDeterministic=*/true};
    18
    19    constexpr size_t n{1'000};
    20    std::vector<std::shared_ptr<CBlockIndex>> blocks;
    21    blocks.reserve(n);
    22    for (size_t i{0}; i < n; ++i) {
    23        auto block{std::make_shared<CBlockIndex>()};
    24        block->nChainWork = UintToArith256(rng.rand256());
    25        block->nSequenceId = int32_t(rng.rand32());
    26        if (i % 10 == 1) {
    27            // Have some duplicates
    28            if (rng.randbool()) {
    29                block = blocks.back();
    30            } else {
    31                if (rng.randbool()) block->nChainWork = blocks.back()->nChainWork;
    32                if (rng.randbool()) block->nSequenceId = blocks.back()->nSequenceId;
    33            }
    34        }
    35        blocks.push_back(std::move(block));
    36    }
    37    std::ranges::shuffle(blocks, rng);
    38
    39    const node::CBlockIndexWorkComparator comparator;
    40    bench.batch(n * n).unit("cmp").run([&] {
    41        for (size_t i{0}; i < n; ++i) {
    42            const auto* lhs{blocks[i].get()};
    43            for (size_t j{0}; j < n; ++j) {
    44                const auto* rhs{blocks[j].get()};
    45                const bool result{comparator(lhs, rhs)};
    46                ankerl::nanobench::doNotOptimizeAway(result);
    47            }
    48        }
    49    });
    50}
    51
    52BENCHMARK(CBlockIndexWorkComparator, benchmark::PriorityLevel::HIGH);
    

    b17504b611 refactor: optimize CBlockIndexWorkComparator with std::tie

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    2.13 469,485,340.38 0.2% 15.07 5.09 2.959 4.02 0.1% 11.01 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    90,265.10 11,078.48 0.1% 637,536.00 215,946.18 2.952 178,844.00 0.0% 10.84 CheckBlockIndex

    2b77e567d8 refactor: short-circuit CBlockIndex comparator for equality

    ns/cmp cmp/s err% ins/cmp cyc/cmp IPC bra/cmp miss% total benchmark
    2.60 385,002,212.20 0.0% 18.00 6.22 2.894 5.00 0.0% 11.00 CBlockIndexWorkComparator
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    94,110.53 10,625.80 0.0% 682,454.03 225,242.05 3.030 186,486.01 0.0% 11.00 CheckBlockIndex
  34. DrahtBot added the label Needs rebase on Nov 25, 2025
  35. l0rinc force-pushed on Nov 25, 2025
  36. DrahtBot removed the label Needs rebase on Nov 25, 2025
  37. l0rinc referenced this in commit d42c922e92 on Nov 26, 2025
  38. l0rinc commented at 5:13 pm on January 15, 2026: contributor
    Rebased to fix silent merge conflict.
  39. l0rinc force-pushed on Jan 15, 2026
  40. DrahtBot added the label CI failed on Jan 15, 2026
  41. DrahtBot commented at 7:13 pm on January 15, 2026: contributor

    🚧 At least one of the CI tasks failed. Task iwyu: https://github.com/bitcoin/bitcoin/actions/runs/21039886920/job/60501264056 LLM reason (✨ experimental): IWYU reported a failure after adjusting includes in src/node/blockstorage.h, causing the CI to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  42. l0rinc force-pushed on Jan 15, 2026
  43. DrahtBot removed the label CI failed on Jan 15, 2026
  44. DrahtBot added the label Needs rebase on Feb 2, 2026
  45. l0rinc commented at 10:44 am on February 2, 2026: contributor
  46. l0rinc force-pushed on Feb 2, 2026
  47. DrahtBot removed the label Needs rebase on Feb 2, 2026
  48. sipa commented at 5:16 pm on February 2, 2026: member
    You should really add a fuzz test here to compare old and new behavior as suggested before; it’s the perfect use case for it, as it avoids the need for putting presumed knowledge about potential bugs in the unit tests you create.
  49. l0rinc force-pushed on Feb 2, 2026
  50. l0rinc commented at 9:46 pm on February 2, 2026: contributor

    You should really add a fuzz test here to compare old and new behavior

    Thanks for the hint, I’ve split the differential fuzzer and kept the simple unit tests to to lock in the expected ordering of both comparators. After a while (if fuzzers can’t find any difference) we should be able to remove the fuzzers while keeping only the unit tests.

  51. maflcko commented at 7:07 am on February 3, 2026: member

    As pointed out earlier, most of the speedup comes from a move-only inline (https://github.com/bitcoin/bitcoin/pull/33637#pullrequestreview-3401558839)?

    I think it would be better to just submit a trivial to review move-only change stand-alone, which wouldn’t need in-depth review and dedicated differential unit/fuzz tests. Or at least the move-only change should be the first change. This way, it is easier to see if the other changes are worth it to review/test at all.

  52. bench: add benchmark to measure `CBlockIndexWorkComparator` performance
    Profiling shows this comparator consumes a significant portion of `CheckBlockIndex()`:
    ... ChainstateManager::CheckBlockIndex()
        ... std::_Rb_tree<...>::find(...)
            ... node::CBlockIndexWorkComparator::operator()(...)
                ... base_uint<256u>::CompareTo(...) const
    
    This commit introduces a benchmark that performs pairwise comparisons on 1000 randomly generated block indices (with some duplicates) to establish baseline performance metrics before further optimization.
    
    |              ns/cmp |               cmp/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |                4.35 |      229,642,637.11 |    0.2% |      1.10 | `CBlockIndexWorkComparator`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |           38,213.19 |           26,168.97 |    0.5% |      1.10 | `CheckBlockIndex`
    70c6244b70
  53. test: add sorting tests for `CBlockIndexWorkComparator` and `arith_uint256`
    This test was added before other changes to create a baseline behavior.
    f682c2be38
  54. move-only: inline `CBlockIndex` comparators to header
    Move `CBlockIndexWorkComparator::operator()` and `CBlockIndexHeightOnlyComparator::operator()` to the header to facilitate inlining.
    
    |              ns/cmp |               cmp/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |                2.87 |      348,283,873.30 |    0.2% |      1.10 | `CBlockIndexWorkComparator`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |           36,534.37 |           27,371.49 |    0.3% |      1.10 | `CheckBlockIndex`
    516e05cd7d
  55. fuzz: add equivalence coverage for comparison refactors
    Add equivalence coverage to validate behavior while optimizing comparison operations.
    The originals were kept exactly as the previous implementations were done.
    
    The `arith_uint256` equivalence check is implemented as the `arith_uint256_comparison_equivalence` fuzz target.
    It compares the refactored `operator<=>` against a snapshot of the old word-by-word comparison for all operators (<, >, <=, >=, ==, !=).
    
    The `CBlockIndexWorkComparator` equivalence check is implemented as the `block_index_work_comparator` fuzz target.
    It compares the refactored comparator against a snapshot of the old comparator logic (chain work, sequence ID, pointer tiebreak).
    
    Add deterministic unit tests to lock in the expected ordering of both comparators.
    
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    70d3b99351
  56. refactor: optimize `CBlockIndexWorkComparator` with `std::tie`
    Replace manual comparison branches with a single tuple comparison, allowing the compilers to generate more efficient comparison code.
    
    Made them explicitly `noexcept`, which can enable small optimizer wins and aligns with expectations for hot-path comparators used in ordered containers.
    
    |              ns/cmp |               cmp/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |                2.00 |      499,008,850.73 |    0.2% |      1.10 | `CBlockIndexWorkComparator`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |           32,297.54 |           30,962.11 |    0.9% |      1.10 | `CheckBlockIndex`
    
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    d053b27c06
  57. refactor: inline `arith_uint256` comparison operator
    Remove the `CompareTo` method and inline its logic directly into `operator<=>`, updating related comments.
    This eliminates function call overhead in the hot path during block generation and chain selection.
    
    The comparison algorithm remains unchanged, iterating from most significant to least significant byte, but now returns `std::strong_ordering` directly, removing an extra conversion and enabling better inlining.
    
    |              ns/cmp |               cmp/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |                2.18 |      459,159,880.91 |    0.2% |      1.10 | `CBlockIndexWorkComparator`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |           31,780.27 |           31,466.07 |    0.5% |      1.10 | `CheckBlockIndex`
    8ef4ce2cb3
  58. refactor: optimize `arith_uint256` comparison with spaceship operator
    Replace multiple comparisons with a single C++20 spaceship operator call directly.
    
    |              ns/cmp |               cmp/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |                0.68 |    1,477,110,677.23 |    0.2% |      1.06 | `CBlockIndexWorkComparator`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |           27,528.11 |           36,326.51 |    0.2% |      1.10 | `CheckBlockIndex`
    703165e6d3
  59. l0rinc commented at 11:44 am on February 3, 2026: contributor

    As pointed out earlier, most of the speedup comes from a move-only inline

    That’s actually what enables the other ones to shine, but I don’t mind reordering the commits to make it obvious. The biggest optimization is the spaceship (with inline), it’s now the last commit.

      0for commit in 8b54cc2e71b015b651d7dcbfc3033c1f76c37e4e 9304b3b225f4dcc06b007d073035a586e1806951 0473240b4f8d29016778ebd73745c610c2b437e8 d64d9432a06879e57702f8188a764eb3fc8f2751 c2f69e78ba69ee4232c5190a41509e142b039e8b 7cb0c5f1fc70d57277689d114ab2da14af2a0696 df0153a846036f94e51ab70b7c91491b138d3569; do \
      1    git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && echo "" && git log -1 --pretty='%h %s' && \
      2    rm -rfd build >/dev/null 2>&1 && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && \
      3    cmake --build build -j$(nproc) >/dev/null 2>&1 && \
      4    for _ in $(seq 3); do \
      5      sleep 5; \
      6      sudo taskpolicy -t 5 -l 5 nice -n -20 ./build/bin/bench_bitcoin -filter='CBlockIndexWorkComparator|CheckBlockIndex' -min-time=1000; \
      7    done; \
      8done
      9
     10
     118b54cc2e71 bench: add benchmark to measure `CBlockIndexWorkComparator` performance
     12
     13|              ns/cmp |               cmp/s |    err% |     total | benchmark
     14|--------------------:|--------------------:|--------:|----------:|:----------
     15|                4.41 |      226,584,742.96 |    0.8% |      1.10 | `CBlockIndexWorkComparator`
     16
     17|               ns/op |                op/s |    err% |     total | benchmark
     18|--------------------:|--------------------:|--------:|----------:|:----------
     19|           38,253.37 |           26,141.49 |    0.9% |      1.09 | `CheckBlockIndex`
     20
     21|              ns/cmp |               cmp/s |    err% |     total | benchmark
     22|--------------------:|--------------------:|--------:|----------:|:----------
     23|                4.35 |      229,642,637.11 |    0.2% |      1.10 | `CBlockIndexWorkComparator`
     24
     25|               ns/op |                op/s |    err% |     total | benchmark
     26|--------------------:|--------------------:|--------:|----------:|:----------
     27|           38,213.19 |           26,168.97 |    0.5% |      1.10 | `CheckBlockIndex`
     28
     29|              ns/cmp |               cmp/s |    err% |     total | benchmark
     30|--------------------:|--------------------:|--------:|----------:|:----------
     31|                4.37 |      228,881,057.73 |    0.3% |      1.10 | `CBlockIndexWorkComparator`
     32
     33|               ns/op |                op/s |    err% |     total | benchmark
     34|--------------------:|--------------------:|--------:|----------:|:----------
     35|           38,287.78 |           26,117.99 |    0.4% |      1.10 | `CheckBlockIndex`
     36
     379304b3b225 test: add sorting tests for `CBlockIndexWorkComparator` and `arith_uint256`
     38
     39|              ns/cmp |               cmp/s |    err% |     total | benchmark
     40|--------------------:|--------------------:|--------:|----------:|:----------
     41|                4.39 |      227,961,262.13 |    0.5% |      1.10 | `CBlockIndexWorkComparator`
     42
     43|               ns/op |                op/s |    err% |     total | benchmark
     44|--------------------:|--------------------:|--------:|----------:|:----------
     45|           38,333.93 |           26,086.55 |    0.2% |      1.10 | `CheckBlockIndex`
     46
     47|              ns/cmp |               cmp/s |    err% |     total | benchmark
     48|--------------------:|--------------------:|--------:|----------:|:----------
     49|                4.39 |      227,728,690.97 |    0.7% |      1.11 | `CBlockIndexWorkComparator`
     50
     51|               ns/op |                op/s |    err% |     total | benchmark
     52|--------------------:|--------------------:|--------:|----------:|:----------
     53|           39,671.02 |           25,207.32 |    0.3% |      1.10 | `CheckBlockIndex`
     54
     55|              ns/cmp |               cmp/s |    err% |     total | benchmark
     56|--------------------:|--------------------:|--------:|----------:|:----------
     57|                4.40 |      227,294,352.21 |    0.5% |      1.09 | `CBlockIndexWorkComparator`
     58
     59|               ns/op |                op/s |    err% |     total | benchmark
     60|--------------------:|--------------------:|--------:|----------:|:----------
     61|           38,971.06 |           25,660.07 |    1.1% |      1.10 | `CheckBlockIndex`
     62
     630473240b4f move-only: inline `CBlockIndex` comparators to header
     64
     65|              ns/cmp |               cmp/s |    err% |     total | benchmark
     66|--------------------:|--------------------:|--------:|----------:|:----------
     67|                2.88 |      347,302,616.35 |    0.3% |      1.10 | `CBlockIndexWorkComparator`
     68
     69|               ns/op |                op/s |    err% |     total | benchmark
     70|--------------------:|--------------------:|--------:|----------:|:----------
     71|           36,701.33 |           27,246.97 |    0.3% |      1.10 | `CheckBlockIndex`
     72
     73|              ns/cmp |               cmp/s |    err% |     total | benchmark
     74|--------------------:|--------------------:|--------:|----------:|:----------
     75|                2.87 |      348,283,873.30 |    0.2% |      1.10 | `CBlockIndexWorkComparator`
     76
     77|               ns/op |                op/s |    err% |     total | benchmark
     78|--------------------:|--------------------:|--------:|----------:|:----------
     79|           36,790.73 |           27,180.76 |    0.6% |      1.10 | `CheckBlockIndex`
     80
     81|              ns/cmp |               cmp/s |    err% |     total | benchmark
     82|--------------------:|--------------------:|--------:|----------:|:----------
     83|                2.87 |      348,095,671.92 |    0.2% |      1.09 | `CBlockIndexWorkComparator`
     84
     85|               ns/op |                op/s |    err% |     total | benchmark
     86|--------------------:|--------------------:|--------:|----------:|:----------
     87|           36,534.37 |           27,371.49 |    0.3% |      1.10 | `CheckBlockIndex`
     88
     89d64d9432a0 test/fuzz: add equivalence coverage for comparison refactors
     90
     91|              ns/cmp |               cmp/s |    err% |     total | benchmark
     92|--------------------:|--------------------:|--------:|----------:|:----------
     93|                2.88 |      347,680,941.08 |    0.5% |      1.10 | `CBlockIndexWorkComparator`
     94
     95|               ns/op |                op/s |    err% |     total | benchmark
     96|--------------------:|--------------------:|--------:|----------:|:----------
     97|           37,044.83 |           26,994.32 |    1.1% |      1.08 | `CheckBlockIndex`
     98
     99|              ns/cmp |               cmp/s |    err% |     total | benchmark
    100|--------------------:|--------------------:|--------:|----------:|:----------
    101|                2.90 |      345,339,421.65 |    0.6% |      1.10 | `CBlockIndexWorkComparator`
    102
    103|               ns/op |                op/s |    err% |     total | benchmark
    104|--------------------:|--------------------:|--------:|----------:|:----------
    105|           37,174.31 |           26,900.30 |    1.4% |      1.09 | `CheckBlockIndex`
    106
    107|              ns/cmp |               cmp/s |    err% |     total | benchmark
    108|--------------------:|--------------------:|--------:|----------:|:----------
    109|                2.88 |      346,925,394.89 |    0.3% |      1.10 | `CBlockIndexWorkComparator`
    110
    111|               ns/op |                op/s |    err% |     total | benchmark
    112|--------------------:|--------------------:|--------:|----------:|:----------
    113|           37,287.05 |           26,818.96 |    1.6% |      1.09 | `CheckBlockIndex`
    114
    115c2f69e78ba refactor: optimize `CBlockIndexWorkComparator` with `std::tie`
    116
    117|              ns/cmp |               cmp/s |    err% |     total | benchmark
    118|--------------------:|--------------------:|--------:|----------:|:----------
    119|                2.00 |      499,008,850.73 |    0.2% |      1.10 | `CBlockIndexWorkComparator`
    120
    121|               ns/op |                op/s |    err% |     total | benchmark
    122|--------------------:|--------------------:|--------:|----------:|:----------
    123|           33,476.29 |           29,871.88 |    1.0% |      1.10 | `CheckBlockIndex`
    124
    125|              ns/cmp |               cmp/s |    err% |     total | benchmark
    126|--------------------:|--------------------:|--------:|----------:|:----------
    127|                2.03 |      493,360,414.53 |    0.5% |      1.10 | `CBlockIndexWorkComparator`
    128
    129|               ns/op |                op/s |    err% |     total | benchmark
    130|--------------------:|--------------------:|--------:|----------:|:----------
    131|           33,342.34 |           29,991.90 |    0.6% |      1.10 | `CheckBlockIndex`
    132
    133|              ns/cmp |               cmp/s |    err% |     total | benchmark
    134|--------------------:|--------------------:|--------:|----------:|:----------
    135|                2.01 |      498,514,389.19 |    0.5% |      1.10 | `CBlockIndexWorkComparator`
    136
    137|               ns/op |                op/s |    err% |     total | benchmark
    138|--------------------:|--------------------:|--------:|----------:|:----------
    139|           32,297.54 |           30,962.11 |    0.9% |      1.10 | `CheckBlockIndex`
    140
    1417cb0c5f1fc refactor: inline `arith_uint256` comparison operator
    142
    143|              ns/cmp |               cmp/s |    err% |     total | benchmark
    144|--------------------:|--------------------:|--------:|----------:|:----------
    145|                2.18 |      459,159,880.91 |    0.2% |      1.10 | `CBlockIndexWorkComparator`
    146
    147|               ns/op |                op/s |    err% |     total | benchmark
    148|--------------------:|--------------------:|--------:|----------:|:----------
    149|           32,006.20 |           31,243.95 |    0.7% |      1.10 | `CheckBlockIndex`
    150
    151|              ns/cmp |               cmp/s |    err% |     total | benchmark
    152|--------------------:|--------------------:|--------:|----------:|:----------
    153|                2.18 |      458,414,231.91 |    0.3% |      1.10 | `CBlockIndexWorkComparator`
    154
    155|               ns/op |                op/s |    err% |     total | benchmark
    156|--------------------:|--------------------:|--------:|----------:|:----------
    157|           31,780.27 |           31,466.07 |    0.5% |      1.10 | `CheckBlockIndex`
    158
    159|              ns/cmp |               cmp/s |    err% |     total | benchmark
    160|--------------------:|--------------------:|--------:|----------:|:----------
    161|                2.19 |      457,429,572.18 |    0.2% |      1.10 | `CBlockIndexWorkComparator`
    162
    163|               ns/op |                op/s |    err% |     total | benchmark
    164|--------------------:|--------------------:|--------:|----------:|:----------
    165|           32,536.14 |           30,735.06 |    1.2% |      1.09 | `CheckBlockIndex`
    166
    167df0153a846 refactor: optimize `arith_uint256` comparison with spaceship operator
    168
    169|              ns/cmp |               cmp/s |    err% |     total | benchmark
    170|--------------------:|--------------------:|--------:|----------:|:----------
    171|                0.68 |    1,473,011,136.06 |    0.5% |      1.06 | `CBlockIndexWorkComparator`
    172
    173|               ns/op |                op/s |    err% |     total | benchmark
    174|--------------------:|--------------------:|--------:|----------:|:----------
    175|           27,528.11 |           36,326.51 |    0.2% |      1.10 | `CheckBlockIndex`
    176
    177|              ns/cmp |               cmp/s |    err% |     total | benchmark
    178|--------------------:|--------------------:|--------:|----------:|:----------
    179|                0.69 |    1,451,170,702.42 |    1.0% |      1.07 | `CBlockIndexWorkComparator`
    180
    181|               ns/op |                op/s |    err% |     total | benchmark
    182|--------------------:|--------------------:|--------:|----------:|:----------
    183|           27,875.27 |           35,874.09 |    0.4% |      1.10 | `CheckBlockIndex`
    184
    185|              ns/cmp |               cmp/s |    err% |     total | benchmark
    186|--------------------:|--------------------:|--------:|----------:|:----------
    187|                0.68 |    1,477,110,677.23 |    0.2% |      1.06 | `CBlockIndexWorkComparator`
    188
    189|               ns/op |                op/s |    err% |     total | benchmark
    190|--------------------:|--------------------:|--------:|----------:|:----------
    191|           29,059.25 |           34,412.45 |    0.7% |      1.12 | `CheckBlockIndex`
    

    Besides moving the inlines earlier, I also added noexcept to operator<=>(const base_uint and split the general unit test (which we should have regardless of the upcoming changes, they set the baseline to document current behavior) and the fuzz tests which probe and compare against the old code. Added the best measurements of 3 from above to the non-test commit messages for reference.

  60. l0rinc force-pushed on Feb 3, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-03 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me