Summary
Profiling the performance regression in #33618 (comment) revealed that CBlockIndexWorkComparator and its underlying base_uint<256u>::CompareTo are hot paths during block validation, consuming ~4% of CPU time.
Context
The comparator is often called directly to compare two separate values and also defines the sorting order for setBlockIndexCandidates, a sorted tree set containing valid block headers where the comparator is invoked extensively.
Testing
To ensure the optimized implementations are both fast and correct, the first commit adds a dedicated benchmark to measure CBlockIndexWorkComparator performance, and the second commit adds randomized tests comparing the new implementation with the original one.
Results
GCC 15.0.1:
CBlockIndexWorkComparator: 100,772,511 cmp/s → 656,674,205 cmp/s = 6.51x fasterCheckBlockIndex: 9,091 ops/s → 14,765 ops/s = 1.62x faster
Clang 22.0.0:
CBlockIndexWorkComparator: 100,451,893 cmp/s → 683,414,234 cmp/s = 6.8x fasterCheckBlockIndex: 10,322 ops/s → 14,376 ops/s = 1.39x faster
Compiler: gcc 15.0.1 b60450fae8 bench: add benchmark to measure
CBlockIndexWorkComparatorperformance
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 9.92 | 100,772,511.62 | 0.0% | 63.98 | 35.64 | 1.795 | 14.17 | 1.9% | 5.50 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 109,996.46 | 9,091.20 | 0.2% | 1,014,421.11 | 394,979.29 | 2.568 | 313,025.11 | 0.0% | 5.50 | CheckBlockIndex |
e2e02177ba refactor: inline arith_uint256 comparison operator
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 6.48 | 154,439,491.66 | 0.0% | 31.01 | 23.25 | 1.334 | 7.16 | 3.8% | 5.50 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 105,754.86 | 9,455.83 | 0.1% | 913,130.11 | 379,588.20 | 2.406 | 276,692.11 | 0.0% | 5.50 | CheckBlockIndex |
85b74b01de refactor: optimize arith_uint256 comparison with spaceship operator
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 6.37 | 156,990,488.06 | 0.0% | 28.85 | 22.87 | 1.261 | 8.61 | 3.2% | 5.50 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 83,803.10 | 11,932.73 | 0.0% | 743,565.09 | 300,824.84 | 2.472 | 232,646.08 | 0.0% | 5.56 | CheckBlockIndex |
deb58eea2f refactor: optimize CBlockIndexWorkComparator with std::tie
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 1.52 | 656,674,205.98 | 0.0% | 13.18 | 5.47 | 2.410 | 3.06 | 0.1% | 5.50 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 67,726.84 | 14,765.19 | 0.2% | 585,826.07 | 243,155.90 | 2.409 | 181,920.07 | 0.0% | 5.54 | CheckBlockIndex |
Compiler: clang 22.0.0
b60450fae8 bench: add benchmark to measure CBlockIndexWorkComparator performance
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 9.96 | 100,451,893.46 | 0.0% | 61.28 | 35.75 | 1.714 | 13.62 | 2.1% | 5.52 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 96,878.70 | 10,322.19 | 0.1% | 802,827.10 | 347,679.01 | 2.309 | 234,823.10 | 0.0% | 5.50 | CheckBlockIndex |
e2e02177ba refactor: inline arith_uint256 comparison operator
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 6.13 | 163,183,693.10 | 0.0% | 25.74 | 22.01 | 1.170 | 6.10 | 4.6% | 5.50 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 86,307.94 | 11,586.42 | 1.2% | 646,229.09 | 309,785.33 | 2.086 | 195,119.09 | 0.0% | 5.54 | CheckBlockIndex |
85b74b01de refactor: optimize arith_uint256 comparison with spaceship operator
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 6.36 | 157,330,900.16 | 1.0% | 26.20 | 22.61 | 1.159 | 6.55 | 4.4% | 5.53 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 75,031.66 | 13,327.71 | 1.0% | 650,604.15 | 266,934.04 | 2.437 | 149,967.14 | 0.0% | 5.54 | CheckBlockIndex |
deb58eea2f refactor: optimize CBlockIndexWorkComparator with std::tie
| ns/cmp | cmp/s | err% | ins/cmp | cyc/cmp | IPC | bra/cmp | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 1.46 | 683,414,234.14 | 0.0% | 16.12 | 5.25 | 3.067 | 4.02 | 0.1% | 5.50 | CBlockIndexWorkComparator |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 69,559.70 | 14,376.14 | 0.1% | 559,208.07 | 249,654.28 | 2.240 | 132,342.07 | 0.0% | 5.51 | CheckBlockIndex |
Reproducer
Run the equivalence tests with:
0cmake -B build && cmake --build build && ./build/bin/test_bitcoin --run_test=arith_uint256_tests,blockchain_tests
Each commit shows how it changes the relevant benchmarks.
0for compiler in gcc clang; do \
1 if [ "$compiler" = "gcc" ]; then CC=gcc; CXX=g++; COMP_VER=$(gcc -dumpfullversion); \
2 else CC=clang; CXX=clang++; COMP_VER=$(clang -dumpversion); fi && \
3 echo "> Compiler: $compiler $COMP_VER" && \
4 for commit in b60450fae83970daa9dc2da0706bf126a2f41515 e2e02177ba6f7fac34eda9696dad2e8ecd44e6cd 85b74b01de7e914d07630138eaa78f09a083b85b deb58eea2fdd2712a28aa6b81417087426b19f5b; do \
5 git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && git log -1 --pretty='%h %s' && \
6 rm -rf build && \
7 cmake -B build -DBUILD_BENCH=ON -DENABLE_IPC=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=$CC -DCMAKE_CXX_COMPILER=$CXX >/dev/null 2>&1 && \
8 cmake --build build -j$(nproc) >/dev/null 2>&1 && \
9 for i in 1; do \
10 build/bin/bench_bitcoin -filter='CBlockIndexWorkComparator|CheckBlockIndex' -min-time=5000; \
11 done; \
12 done; \
13done
Note: something similar was already started in #33334, but this is a broader optimization that doesn’t use the same technique: added as coauthor regardless.