Replace cluster linearization algorithm with SFL #32545

pull sipa wants to merge 25 commits into bitcoin:master from sipa:202505_sfl changing 5 files +1347 −953
  1. sipa commented at 11:24 pm on May 17, 2025: member

    Part of cluster mempool: #30289. Based on #30605.

    This replaces the cluster linearization algorithm introduced in #30126 and #30286 (a combination of LIMO with candidate-set search), with a completely different algorithm: spanning-forest linearization, which appears to have much better performance for hard clusters. See this post for a comparison between various linearization algorithms, and this post for benchmarks comparing them. Replaying historical mempool data on it shows that it can effectively linearize every observed cluster up to 64 transactions optimally within tens of microseconds, though pathological examples can be created which take longer.

    The algorithm is effectively a very specialized version of the simplex algorithm to the problem of finding high-feerate topological subsets of clusters, but modified to find all consecutive such subsets concurrently rather than just the first one. See the post above for how it is related.

    It represents the cluster as partitioned into a set of chunks, each with a spanning tree of its internal dependencies connecting the transactions. Randomized improvements are made by selecting dependencies to add and remove to these spanning trees, merging and splitting chunks, until no more improvements are possible. Like simplex, it does not necessarily make progress in every step, and thus has no upper bound on its runtime, but randomization makes long runtimes very unlikely, and additionally makes it hard to adversarially construct clusters in which the algorithm reliably makes bad choices.

  2. DrahtBot commented at 11:24 pm on May 17, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32545.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK jonatack

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32263 (cluster mempool: add TxGraph work controls by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • In SetInfo::operator-= comment: “must be subset” -> “must be a subset” [missing article] No other typos were found.

    drahtbot_id_4_m

  3. sipa force-pushed on May 17, 2025
  4. DrahtBot added the label CI failed on May 17, 2025
  5. DrahtBot commented at 11:40 pm on May 17, 2025: contributor

    🚧 At least one of the CI tasks failed. Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/42417371062 LLM reason (✨ experimental): The CI failure is due to a build error during the compilation of txgraph.cpp.o.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. sipa force-pushed on May 18, 2025
  7. sipa force-pushed on May 18, 2025
  8. DrahtBot removed the label CI failed on May 18, 2025
  9. sipa force-pushed on May 18, 2025
  10. sipa force-pushed on May 19, 2025
  11. DrahtBot added the label CI failed on May 19, 2025
  12. DrahtBot commented at 3:39 am on May 19, 2025: contributor

    🚧 At least one of the CI tasks failed. Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/42447412610 LLM reason (✨ experimental): The CI failure is due to a failed CTest test: cluster_linearize_tests.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  13. sipa force-pushed on May 20, 2025
  14. sipa force-pushed on May 20, 2025
  15. DrahtBot removed the label CI failed on May 20, 2025
  16. sipa added the label Mempool on May 20, 2025
  17. in src/cluster_linearize.h:558 in 23072f2b0e outdated
    554@@ -539,492 +555,651 @@ class LinearizationChunking
    555     }
    556 };
    557 
    558-/** Class encapsulating the state needed to find the best remaining ancestor set.
    559+/** Class to represent the internal state of the spanning-forest linearization algorithm.
    


    jonatack commented at 4:09 pm on May 22, 2025:
    Appreciate the excellent doxygen documentation here.
  18. jonatack commented at 4:09 pm on May 22, 2025: member
    Concept ACK
  19. sipa force-pushed on May 23, 2025
  20. sipa force-pushed on May 23, 2025
  21. sipa force-pushed on May 24, 2025
  22. sipa force-pushed on May 25, 2025
  23. sipa force-pushed on May 25, 2025
  24. DrahtBot added the label CI failed on May 25, 2025
  25. DrahtBot commented at 4:44 pm on May 25, 2025: contributor

    🚧 At least one of the CI tasks failed. Task CentOS, depends, gui: https://github.com/bitcoin/bitcoin/runs/42858330826 LLM reason (✨ experimental): The CI failure is due to assertion failures within the cluster_linearize_tests and bench_sanity_check tests.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  26. DrahtBot removed the label CI failed on May 25, 2025
  27. sipa force-pushed on May 25, 2025
  28. sipa force-pushed on May 25, 2025
  29. sipa force-pushed on May 28, 2025
  30. sipa commented at 2:57 pm on May 28, 2025: member

    I’m interested in seeing benchmarks of this on various systems, especially high-end/modern ones (as their performance is likely most predictive of what future hardware will be like).

    0./build_dev_mode/bin/bench_bitcoin --filter="Linearize.*Example.*" -min-time=10000
    

    Here are some I have gathered:

    ns/cost cost/s err% total benchmark
    1.68 595,052,476.10 0.4% 11.01 LinearizeBoundedExample0
    1.92 521,264,873.78 0.1% 11.01 LinearizeBoundedExample1
    1.79 557,860,504.16 0.3% 10.58 LinearizeBoundedExample2
    1.89 528,599,597.23 0.1% 10.97 LinearizeBoundedExample3
    1.81 552,507,201.82 0.4% 10.81 LinearizeBoundedExample4
    1.97 508,721,677.18 1.0% 11.05 LinearizeBoundedExample5
    1.89 528,912,231.85 0.4% 11.06 LinearizeBoundedExample6
    1.79 557,221,394.89 0.8% 11.01 LinearizeBoundedExample7
    1.79 557,162,554.42 0.3% 11.05 LinearizeBoundedExample8
    1.84 543,573,897.10 0.3% 10.99 LinearizeBoundedExample9
    ns/op op/s err% total benchmark
    1,576.22 634,428.95 0.4% 10.95 LinearizeOptimallyExample00
    3,010.14 332,209.92 0.3% 10.99 LinearizeOptimallyExample01
    5,871.01 170,328.44 0.4% 11.09 LinearizeOptimallyExample02
    6,321.90 158,180.35 0.6% 10.55 LinearizeOptimallyExample03
    15,081.30 66,307.30 0.4% 10.99 LinearizeOptimallyExample04
    12,227.15 81,785.18 0.2% 11.01 LinearizeOptimallyExample05
    12,703.52 78,718.34 0.4% 11.02 LinearizeOptimallyExample06
    11,519.71 86,807.71 0.3% 11.01 LinearizeOptimallyExample07
    14,768.84 67,710.11 0.5% 10.93 LinearizeOptimallyExample08
    19,655.67 50,875.90 0.3% 11.03 LinearizeOptimallyExample09
    11,776.61 84,914.06 0.1% 10.97 LinearizeOptimallyExample10
    17,396.59 57,482.51 0.5% 10.97 LinearizeOptimallyExample11
    25,232.95 39,630.72 0.3% 11.04 LinearizeOptimallyExample12
    28,941.27 34,552.73 0.2% 10.98 LinearizeOptimallyExample13
    36,964.92 27,052.68 0.7% 11.00 LinearizeOptimallyExample14
    37,103.54 26,951.61 0.3% 11.02 LinearizeOptimallyExample15
    43,747.38 22,858.51 0.3% 10.98 LinearizeOptimallyExample16
    59,132.17 16,911.27 0.8% 11.04 LinearizeOptimallyExample17
    61,824.04 16,174.94 0.2% 10.54 LinearizeOptimallyExample18
    73,274.33 13,647.34 0.6% 10.65 LinearizeOptimallyExample19
    ns/cost cost/s err% total benchmark
    1.31 763,277,399.95 0.0% 10.99 LinearizeBoundedExample0
    1.53 652,463,095.02 0.1% 11.00 LinearizeBoundedExample1
    1.49 672,057,837.39 0.0% 11.00 LinearizeBoundedExample2
    1.66 602,362,799.74 0.0% 10.69 LinearizeBoundedExample3
    1.52 656,146,353.18 0.1% 10.71 LinearizeBoundedExample4
    1.65 607,795,039.45 0.0% 10.76 LinearizeBoundedExample5
    1.59 628,108,657.72 0.0% 10.82 LinearizeBoundedExample6
    1.50 668,384,115.59 0.1% 11.00 LinearizeBoundedExample7
    1.51 663,161,838.88 0.0% 11.01 LinearizeBoundedExample8
    1.53 654,682,223.87 0.1% 10.99 LinearizeBoundedExample9
    ns/op op/s err% total benchmark
    1,209.24 826,967.78 0.3% 10.98 LinearizeOptimallyExample00
    2,393.97 417,716.84 0.0% 11.01 LinearizeOptimallyExample01
    4,764.63 209,879.68 0.0% 11.01 LinearizeOptimallyExample02
    5,067.95 197,318.41 0.1% 11.01 LinearizeOptimallyExample03
    12,716.36 78,638.87 0.0% 11.01 LinearizeOptimallyExample04
    10,231.29 97,739.43 0.0% 11.01 LinearizeOptimallyExample05
    10,649.64 93,899.84 0.0% 11.01 LinearizeOptimallyExample06
    9,564.17 104,556.95 0.1% 11.01 LinearizeOptimallyExample07
    11,884.47 84,143.41 0.3% 11.00 LinearizeOptimallyExample08
    16,436.12 60,841.62 0.1% 11.00 LinearizeOptimallyExample09
    9,937.05 100,633.51 0.1% 11.00 LinearizeOptimallyExample10
    14,575.73 68,607.20 0.2% 11.02 LinearizeOptimallyExample11
    21,355.98 46,825.28 0.1% 11.01 LinearizeOptimallyExample12
    24,439.13 40,917.99 0.0% 11.00 LinearizeOptimallyExample13
    31,407.36 31,839.67 0.1% 11.00 LinearizeOptimallyExample14
    31,366.41 31,881.24 0.1% 11.01 LinearizeOptimallyExample15
    37,182.88 26,894.10 0.1% 11.00 LinearizeOptimallyExample16
    50,528.12 19,790.96 0.3% 11.02 LinearizeOptimallyExample17
    52,714.16 18,970.24 0.1% 11.00 LinearizeOptimallyExample18
    62,357.87 16,036.47 0.1% 10.57 LinearizeOptimallyExample19
    ns/cost cost/s err% total benchmark
    1.53 655,436,526.57 0.7% 1.10 LinearizeBoundedExample0
    1.81 552,194,379.23 0.2% 1.10 LinearizeBoundedExample1
    1.52 656,148,852.84 0.4% 1.09 LinearizeBoundedExample2
    1.75 572,294,606.97 0.7% 1.06 LinearizeBoundedExample3
    1.62 615,593,105.29 0.3% 1.07 LinearizeBoundedExample4
    1.78 561,989,024.83 1.2% 1.08 LinearizeBoundedExample5
    1.69 592,612,447.36 0.2% 1.10 LinearizeBoundedExample6
    1.58 633,450,827.45 0.3% 1.10 LinearizeBoundedExample7
    1.56 640,690,050.49 0.1% 1.10 LinearizeBoundedExample8
    1.59 628,148,004.87 0.2% 1.09 LinearizeBoundedExample9
    ns/op op/s err% total benchmark
    1,499.94 666,693.94 0.4% 1.09 LinearizeOptimallyExample00
    2,884.51 346,679.16 0.2% 1.10 LinearizeOptimallyExample01
    5,815.27 171,961.12 0.1% 1.10 LinearizeOptimallyExample02
    6,097.37 164,005.14 0.2% 1.05 LinearizeOptimallyExample03
    13,502.59 74,059.89 0.1% 1.10 LinearizeOptimallyExample04
    11,447.25 87,357.24 0.1% 1.10 LinearizeOptimallyExample05
    11,639.59 85,913.71 0.1% 1.10 LinearizeOptimallyExample06
    9,557.65 104,628.26 0.1% 1.10 LinearizeOptimallyExample07
    12,139.84 82,373.43 0.1% 1.10 LinearizeOptimallyExample08
    18,237.62 54,831.73 0.1% 1.10 LinearizeOptimallyExample09
    11,806.18 84,701.39 0.1% 1.10 LinearizeOptimallyExample10
    17,519.95 57,077.80 0.2% 1.09 LinearizeOptimallyExample11
    24,397.16 40,988.37 0.2% 1.10 LinearizeOptimallyExample12
    28,264.97 35,379.48 0.1% 1.10 LinearizeOptimallyExample13
    36,230.91 27,600.74 0.5% 1.10 LinearizeOptimallyExample14
    36,251.16 27,585.33 0.3% 1.10 LinearizeOptimallyExample15
    42,480.07 23,540.45 0.3% 1.10 LinearizeOptimallyExample16
    59,497.62 16,807.39 0.4% 1.10 LinearizeOptimallyExample17
    62,192.70 16,079.06 0.2% 1.06 LinearizeOptimallyExample18
    73,604.11 13,586.20 0.7% 1.06 LinearizeOptimallyExample19
    ns/cost cost/s err% total benchmark
    4.39 227,695,874.58 0.4% 10.99 LinearizeBoundedExample0
    5.32 187,925,655.80 2.3% 11.09 LinearizeBoundedExample1
    4.96 201,470,069.83 1.6% 11.13 LinearizeBoundedExample2
    5.47 182,878,034.13 1.9% 10.75 LinearizeBoundedExample3
    5.02 199,278,377.58 2.6% 10.73 LinearizeBoundedExample4
    5.69 175,678,783.21 2.7% 11.08 LinearizeBoundedExample5
    5.53 180,751,956.64 4.1% 11.00 LinearizeBoundedExample6
    5.80 172,338,240.84 2.0% 10.81 LinearizeBoundedExample7
    5.24 190,865,678.33 3.1% 10.63 LinearizeBoundedExample8
    5.51 181,564,438.26 2.9% 11.38 LinearizeBoundedExample9
    ns/op op/s err% total benchmark
    4,399.38 227,304.78 1.9% 11.18 LinearizeOptimallyExample00
    8,789.91 113,766.77 0.9% 10.84 LinearizeOptimallyExample01
    15,890.75 62,929.71 1.6% 11.07 LinearizeOptimallyExample02
    19,082.82 52,403.17 2.2% 11.00 LinearizeOptimallyExample03
    44,627.25 22,407.83 1.8% 11.20 LinearizeOptimallyExample04
    33,718.11 29,657.66 1.1% 11.03 LinearizeOptimallyExample05
    35,473.62 28,189.96 1.1% 11.05 LinearizeOptimallyExample06
    33,162.95 30,154.13 1.5% 10.88 LinearizeOptimallyExample07
    45,473.12 21,991.01 0.9% 11.12 LinearizeOptimallyExample08
    57,919.95 17,265.21 1.1% 11.04 LinearizeOptimallyExample09
    32,640.19 30,637.08 2.1% 11.16 LinearizeOptimallyExample10
    46,559.58 21,477.86 1.5% 10.98 LinearizeOptimallyExample11
    66,614.50 15,011.75 1.4% 10.55 LinearizeOptimallyExample12
    76,783.09 13,023.70 0.5% 10.74 LinearizeOptimallyExample13
    97,635.10 10,242.22 1.7% 11.00 LinearizeOptimallyExample14
    99,854.80 10,014.54 0.8% 11.00 LinearizeOptimallyExample15
    119,444.66 8,372.08 1.2% 11.11 LinearizeOptimallyExample16
    156,160.25 6,403.68 0.9% 10.98 LinearizeOptimallyExample17
    163,364.57 6,121.28 0.8% 10.90 LinearizeOptimallyExample18
    193,524.10 5,167.32 1.2% 11.14 LinearizeOptimallyExample19
    ns/cost cost/s err% total benchmark
    1.75 571,671,353.08 0.1% 10.99 LinearizeBoundedExample0
    2.05 487,547,934.48 0.1% 11.00 LinearizeBoundedExample1
    1.97 506,752,217.57 0.1% 10.64 LinearizeBoundedExample2
    2.24 447,162,026.45 0.1% 11.00 LinearizeBoundedExample3
    2.08 480,557,495.14 0.1% 11.01 LinearizeBoundedExample4
    2.30 434,505,839.40 0.1% 11.12 LinearizeBoundedExample5
    2.21 451,557,580.61 0.0% 11.00 LinearizeBoundedExample6
    2.25 444,374,372.72 0.0% 11.00 LinearizeBoundedExample7
    2.12 471,017,980.18 0.1% 10.93 LinearizeBoundedExample8
    2.22 450,256,469.71 1.1% 11.07 LinearizeBoundedExample9
    ns/op op/s err% total benchmark
    1,586.71 630,233.45 0.1% 10.98 LinearizeOptimallyExample00
    3,268.23 305,976.43 0.0% 11.00 LinearizeOptimallyExample01
    6,858.84 145,797.34 0.2% 10.54 LinearizeOptimallyExample02
    7,873.51 127,008.11 0.1% 10.73 LinearizeOptimallyExample03
    17,277.23 57,879.64 0.1% 11.00 LinearizeOptimallyExample04
    14,178.66 70,528.55 0.1% 11.00 LinearizeOptimallyExample05
    16,009.26 62,463.84 0.0% 10.99 LinearizeOptimallyExample06
    12,586.44 79,450.57 1.6% 10.87 LinearizeOptimallyExample07
    15,396.54 64,949.64 0.0% 11.00 LinearizeOptimallyExample08
    24,369.54 41,034.82 0.0% 11.00 LinearizeOptimallyExample09
    13,610.19 73,474.38 0.1% 10.99 LinearizeOptimallyExample10
    20,073.90 49,815.92 0.1% 11.01 LinearizeOptimallyExample11
    30,462.32 32,827.44 0.3% 11.12 LinearizeOptimallyExample12
    34,713.56 28,807.18 0.3% 10.93 LinearizeOptimallyExample13
    44,015.70 22,719.17 0.1% 11.00 LinearizeOptimallyExample14
    43,492.08 22,992.69 0.1% 11.00 LinearizeOptimallyExample15
    51,596.62 19,381.12 0.1% 11.01 LinearizeOptimallyExample16
    71,384.39 14,008.67 0.2% 10.69 LinearizeOptimallyExample17
    73,714.84 13,565.79 0.1% 10.69 LinearizeOptimallyExample18
    87,910.27 11,375.24 0.0% 10.82 LinearizeOptimallyExample19
  31. davidgumberg commented at 3:16 am on May 29, 2025: contributor

    (built with Clang)

    AMD Ryzen 9 9950X 16-Core Processor

    0./build/bin/bench_bitcoin --filter="Linearize.*Example.*" -min-time=10000
    
    ns/cost cost/s err% ins/cost cyc/cost IPC bra/cost miss% total benchmark
    1.19 836,855,166.72 0.4% 20.42 5.13 3.984 2.29 2.4% 11.05 LinearizeBoundedExample0
    1.44 695,776,928.37 0.1% 24.33 6.17 3.945 2.74 2.4% 10.99 LinearizeBoundedExample1
    1.17 852,566,861.36 0.2% 18.78 5.03 3.733 2.33 3.2% 11.04 LinearizeBoundedExample2
    1.36 735,968,415.25 0.1% 19.54 5.83 3.349 2.36 4.6% 10.57 LinearizeBoundedExample3
    1.24 805,653,703.91 0.0% 18.58 5.33 3.487 2.27 4.2% 10.56 LinearizeBoundedExample4
    1.34 745,442,570.61 0.1% 20.46 5.76 3.553 2.50 4.1% 10.62 LinearizeBoundedExample5
    1.27 784,639,270.59 0.0% 19.10 5.47 3.490 2.35 4.3% 10.65 LinearizeBoundedExample6
    1.18 847,347,367.90 0.2% 18.45 5.07 3.642 2.23 3.6% 10.70 LinearizeBoundedExample7
    1.19 841,752,012.32 0.2% 18.65 5.10 3.657 2.29 3.5% 10.70 LinearizeBoundedExample8
    1.20 833,486,862.05 0.1% 19.07 5.15 3.703 2.31 3.2% 10.79 LinearizeBoundedExample9
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    1,062.27 941,380.47 0.2% 23,154.64 4,561.62 5.076 2,732.15 1.1% 10.99 LinearizeOptimallyExample00
    2,199.32 454,686.52 0.1% 44,955.91 9,443.49 4.761 5,125.75 1.2% 11.01 LinearizeOptimallyExample01
    4,655.43 214,803.15 0.1% 76,020.42 19,989.80 3.803 8,376.75 2.7% 11.01 LinearizeOptimallyExample02
    4,687.57 213,330.01 0.2% 79,606.35 20,125.95 3.955 8,838.43 2.7% 10.99 LinearizeOptimallyExample03
    11,198.28 89,299.41 0.1% 180,121.05 48,080.45 3.746 21,333.25 2.9% 11.01 LinearizeOptimallyExample04
    8,806.12 113,557.33 0.0% 148,315.36 37,809.30 3.923 17,597.94 2.9% 10.83 LinearizeOptimallyExample05
    8,881.81 112,589.68 0.0% 129,414.56 38,127.56 3.394 15,799.78 4.3% 10.83 LinearizeOptimallyExample06
    7,396.23 135,204.04 0.0% 139,880.41 31,756.73 4.405 16,345.08 2.7% 10.67 LinearizeOptimallyExample07
    9,691.58 103,182.40 0.0% 201,859.77 41,612.38 4.851 23,957.84 1.1% 11.01 LinearizeOptimallyExample08
    13,912.45 71,878.08 0.1% 205,829.89 59,733.52 3.446 25,060.12 4.2% 11.00 LinearizeOptimallyExample09
    9,799.39 102,047.14 0.1% 131,704.66 42,074.45 3.130 13,959.94 5.7% 11.00 LinearizeOptimallyExample10
    14,178.88 70,527.42 0.1% 181,891.98 60,879.32 2.988 18,797.61 5.9% 11.02 LinearizeOptimallyExample11
    19,417.52 51,499.89 0.1% 231,655.76 83,370.63 2.779 26,582.04 6.6% 11.01 LinearizeOptimallyExample12
    21,875.47 45,713.31 0.1% 262,258.87 93,911.73 2.793 30,218.20 6.7% 11.00 LinearizeOptimallyExample13
    28,875.31 34,631.67 0.1% 336,971.73 123,981.98 2.718 38,520.06 6.5% 11.01 LinearizeOptimallyExample14
    27,860.30 35,893.37 0.1% 334,160.29 119,619.93 2.794 37,727.10 6.5% 11.00 LinearizeOptimallyExample15
    33,495.46 29,854.79 0.1% 395,850.86 143,808.78 2.753 45,419.38 6.8% 11.00 LinearizeOptimallyExample16
    46,820.27 21,358.27 0.2% 527,868.16 201,024.15 2.626 60,683.66 7.4% 11.01 LinearizeOptimallyExample17
    47,420.53 21,087.91 0.1% 549,255.39 203,589.67 2.698 62,108.84 7.0% 11.01 LinearizeOptimallyExample18
    56,870.07 17,583.94 0.2% 665,564.77 244,167.60 2.726 75,210.75 6.7% 10.98 LinearizeOptimallyExample19

    AMD Ryzen 9 7900X 12-Core Processor

    0./build/bin/bench_bitcoin --filter="Linearize.*Example.*" -min-time=10000
    
    ns/cost cost/s err% ins/cost cyc/cost IPC bra/cost miss% total benchmark
    1.36 737,410,406.03 0.2% 20.58 6.34 3.247 2.34 2.5% 11.02 LinearizeBoundedExample0
    1.65 606,854,042.04 0.1% 24.57 7.70 3.190 2.76 2.5% 10.99 LinearizeBoundedExample1
    1.31 760,547,140.52 0.1% 18.87 6.14 3.071 2.36 3.4% 11.02 LinearizeBoundedExample2
    1.52 657,532,700.47 0.0% 19.61 7.11 2.758 2.38 4.9% 10.63 LinearizeBoundedExample3
    1.39 718,444,894.17 0.1% 18.63 6.51 2.864 2.30 4.3% 10.63 LinearizeBoundedExample4
    1.52 656,965,976.43 0.1% 20.51 7.11 2.882 2.53 4.1% 10.71 LinearizeBoundedExample5
    1.44 694,175,863.25 0.1% 19.12 6.73 2.839 2.39 4.5% 10.72 LinearizeBoundedExample6
    1.34 745,455,006.56 0.0% 18.49 6.27 2.948 2.27 3.8% 10.81 LinearizeBoundedExample7
    1.35 740,869,763.95 0.2% 18.70 6.31 2.965 2.32 3.7% 10.83 LinearizeBoundedExample8
    1.36 733,610,430.06 0.0% 19.17 6.37 3.009 2.34 3.4% 11.00 LinearizeBoundedExample9
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    1,231.66 811,912.24 0.0% 23,357.46 5,757.88 4.057 2,788.34 1.0% 11.01 LinearizeOptimallyExample00
    2,608.32 383,388.20 0.1% 44,890.34 12,192.83 3.682 5,176.76 1.2% 11.02 LinearizeOptimallyExample01
    5,129.41 194,954.12 0.1% 76,112.07 23,977.96 3.174 8,574.18 2.6% 10.99 LinearizeOptimallyExample02
    5,283.32 189,274.88 0.1% 79,355.44 24,692.88 3.214 8,960.39 2.7% 11.01 LinearizeOptimallyExample03
    12,510.48 79,933.00 0.1% 178,410.73 58,475.18 3.051 21,948.73 3.0% 11.00 LinearizeOptimallyExample04
    9,699.21 103,101.15 0.1% 147,263.70 45,334.12 3.248 18,517.16 2.7% 10.99 LinearizeOptimallyExample05
    9,890.03 101,111.94 0.1% 128,779.01 46,228.98 2.786 16,175.51 4.5% 11.00 LinearizeOptimallyExample06
    8,853.35 112,951.56 0.1% 138,404.58 41,381.10 3.345 16,972.04 2.6% 10.83 LinearizeOptimallyExample07
    11,429.49 87,493.00 0.0% 199,933.47 53,426.08 3.742 24,782.93 1.0% 11.00 LinearizeOptimallyExample08
    15,278.31 65,452.27 0.1% 203,844.43 71,404.74 2.855 25,811.52 4.2% 10.98 LinearizeOptimallyExample09
    10,430.03 95,877.04 0.1% 130,835.87 48,749.71 2.684 14,322.22 5.7% 11.01 LinearizeOptimallyExample10
    14,955.33 66,865.78 0.1% 181,388.73 69,905.18 2.595 19,826.07 5.7% 11.01 LinearizeOptimallyExample11
    20,192.09 49,524.34 0.1% 229,811.63 94,374.34 2.435 27,200.27 6.6% 11.01 LinearizeOptimallyExample12
    23,265.71 42,981.71 0.3% 260,056.43 108,734.94 2.392 30,956.74 6.7% 10.99 LinearizeOptimallyExample13
    29,690.95 33,680.30 0.1% 334,363.41 138,778.90 2.409 39,510.12 6.4% 10.99 LinearizeOptimallyExample14
    29,188.83 34,259.68 0.1% 331,541.94 136,412.59 2.430 38,603.88 6.3% 10.99 LinearizeOptimallyExample15
    35,084.30 28,502.78 0.0% 391,658.97 163,986.36 2.388 46,505.21 6.6% 11.00 LinearizeOptimallyExample16
    47,904.99 20,874.65 0.1% 521,828.04 223,869.99 2.331 62,246.01 7.0% 11.01 LinearizeOptimallyExample17
    49,498.85 20,202.49 0.1% 543,459.92 231,331.22 2.349 63,630.07 6.7% 11.01 LinearizeOptimallyExample18
    59,173.83 16,899.36 0.1% 658,965.47 276,557.48 2.383 77,128.66 6.3% 11.00 LinearizeOptimallyExample19
  32. ismaelsadeeq commented at 1:51 pm on May 29, 2025: member

    I’m interested in seeing benchmarks of this on various systems, especially high-end/modern ones (as their performance is likely most predictive of what future hardware will be like).

    Below are benchmarks on MacBook M2 PRO 2023. Compiled using clang with only build bench cmake option.

    0cmake -B build -DBUILD_BENCH=ON && cmake --build build
    
    0abubakarismail@Abubakars-MacBook-Pro ~/D/W/b/bitcoin ((47bdf8fa))> ./build/bin/bench_bitcoin --filter="Linearize.*Example.*" -min-ti
    1me=1000
    
    ns/cost cost/s err% total benchmark
    1.85 539,881,102.87 6.1% 10.64 :wavy_dash: LinearizeBoundedExample0 (Unstable with ~171,993.8 iters. Increase minEpochIterations to e.g. 1719938)
    2.16 463,765,560.64 1.9% 11.52 LinearizeBoundedExample1
    1.89 529,370,444.85 3.4% 11.88 LinearizeBoundedExample2
    2.09 478,266,850.10 2.1% 10.69 LinearizeBoundedExample3
    2.00 500,459,378.92 3.4% 10.99 LinearizeBoundedExample4
    2.23 448,896,053.69 4.1% 12.62 LinearizeBoundedExample5
    2.21 453,201,053.94 6.2% 9.74 :wavy_dash: LinearizeBoundedExample6 (Unstable with ~70,645.3 iters. Increase minEpochIterations to e.g. 706453)
    1.88 531,282,530.06 1.7% 11.59 LinearizeBoundedExample7
    1.82 550,292,679.97 1.5% 11.85 LinearizeBoundedExample8
    1.81 553,506,180.26 1.1% 10.68 LinearizeBoundedExample9
    ns/op op/s err% total benchmark
    1,927.91 518,696.06 1.2% 11.39 LinearizeOptimallyExample00
    3,227.99 309,790.35 1.0% 10.99 LinearizeOptimallyExample01
    6,619.50 151,068.72 2.3% 10.17 LinearizeOptimallyExample02
    6,648.74 150,404.54 1.1% 10.67 LinearizeOptimallyExample03
    15,989.38 62,541.52 1.7% 11.11 LinearizeOptimallyExample04
    13,491.31 74,121.78 4.1% 12.15 LinearizeOptimallyExample05
    12,958.96 77,166.66 1.0% 10.63 LinearizeOptimallyExample06
    11,525.25 86,766.02 3.7% 11.77 LinearizeOptimallyExample07
    14,095.38 70,945.21 0.6% 10.80 LinearizeOptimallyExample08
    21,410.11 46,706.90 1.1% 11.46 LinearizeOptimallyExample09
    13,372.29 74,781.50 1.5% 11.06 LinearizeOptimallyExample10
    19,803.65 50,495.75 3.2% 10.79 LinearizeOptimallyExample11
    26,699.86 37,453.38 1.1% 11.15 LinearizeOptimallyExample12
    36,106.95 27,695.50 8.0% 11.03 :wavy_dash: LinearizeOptimallyExample13 (Unstable with ~24,184.5 iters. Increase minEpochIterations to e.g. 241845)
    41,346.30 24,185.96 3.3% 10.49 LinearizeOptimallyExample14
    41,673.85 23,995.86 1.6% 9.89 LinearizeOptimallyExample15
    47,425.14 21,085.87 1.1% 11.07 LinearizeOptimallyExample16
    66,540.78 15,028.38 2.4% 11.26 LinearizeOptimallyExample17
    66,934.73 14,939.93 1.4% 10.54 LinearizeOptimallyExample18
    81,828.99 12,220.61 1.8% 11.44 LinearizeOptimallyExample19

    Almost 10x speedup edit: bench cases changed, so not comparable with master.

  33. sipa commented at 1:54 pm on May 29, 2025: member
    @ismaelsadeeq Not an apples-to-apples comparison I’m afraid, because the benchmark cases are changed in this PR too. Can you run the LinearizeBoundedExample? benchmarks too?
  34. ismaelsadeeq commented at 2:35 pm on May 29, 2025: member

    @ismaelsadeeq Not an apples-to-apples comparison I’m afraid, because the benchmark cases are changed in this PR too

    Oops sorry haven’t take a look, I was hoping I can compare locally myself. I saw a link to the bench results comparing them in the description 👍🏾

    Can you run the LinearizeBoundedExample? benchmarks too?

    Yep I did and updated the bench results. previous one was not pointing to your latest push.

  35. instagibbs commented at 4:28 pm on May 29, 2025: member
    Ran apples to apples out of curiosity. One example ran marginally slower on this PR (but still very fast), many are many multiples faster and some see 500x+ improvements.
  36. gmaxwell commented at 6:47 pm on May 29, 2025: contributor

    @instagibbs I believe the only cases that should be slower with SFL are either very small examples (where their time is irrelevant because its very low per tx compared to bigger clusters) and ones with huge numbers of dependencies. Beyond huge dependencies being generally unrepresentative they’re also expensive to generate because every dependency needs a vin, they also tend to be so slow with both that I guess neither will run to completion in practice.

    A better way to compare across implementations might be some kind of optimization time vs amount of fee required to create a cluster with that dependency geometry (just assuming a 1s/vb feerate or something, not adding up the actual costs in the cluster because another cluster could exist with the same topology but different fees/size). like fee = 16*txn + 41*inputs_consumed + 9*outputs_consumed_within_cluster – so I’d hope to find that SFL always beats the current exponential algorithm on this kind of metric (except for very small inputs).

    The exponential algorithm gets faster with dependency rich clusters because the linearization essentially becomes topologically forced, so the current algorithm doesn’t have many options to consider. If it massive high dependency clusters became common on the network in the future it might make sense to bring back the exponential algorithm for them. :P but I doubt that would happen.

  37. fjahr commented at 8:26 pm on May 30, 2025: contributor

    I’m interested in seeing benchmarks of this on various systems

    $ build/bin/bench_bitcoin –filter=“Linearize.Example.” -min-time=10000

    ns/cost cost/s err% total benchmark
    1.12 893,382,447.85 0.5% 11.06 LinearizeBoundedExample0
    1.34 743,601,759.92 0.7% 11.07 LinearizeBoundedExample1
    1.14 880,692,144.98 0.3% 10.99 LinearizeBoundedExample2
    1.33 749,663,261.00 0.3% 10.98 LinearizeBoundedExample3
    1.24 807,545,779.18 0.5% 10.57 LinearizeBoundedExample4
    1.37 730,046,090.82 0.4% 10.64 LinearizeBoundedExample5
    1.28 779,862,831.19 0.4% 10.66 LinearizeBoundedExample6
    1.17 856,937,081.55 0.4% 10.71 LinearizeBoundedExample7
    1.19 840,624,085.43 0.2% 10.74 LinearizeBoundedExample8
    1.19 840,938,970.51 0.4% 10.74 LinearizeBoundedExample9
    ns/op op/s err% total benchmark
    1,155.37 865,524.35 0.2% 11.00 LinearizeOptimallyExample00
    2,170.00 460,828.59 0.2% 11.04 LinearizeOptimallyExample01
    4,507.21 221,866.90 0.2% 11.01 LinearizeOptimallyExample02
    4,435.85 225,435.87 0.3% 10.96 LinearizeOptimallyExample03
    10,189.93 98,136.12 0.2% 11.02 LinearizeOptimallyExample04
    8,854.57 112,936.09 0.4% 10.79 LinearizeOptimallyExample05
    8,685.08 115,140.05 0.1% 10.82 LinearizeOptimallyExample06
    7,162.17 139,622.57 0.0% 10.64 LinearizeOptimallyExample07
    9,120.88 109,638.49 0.0% 11.00 LinearizeOptimallyExample08
    12,780.73 78,242.79 0.1% 11.00 LinearizeOptimallyExample09
    9,242.70 108,193.53 0.1% 10.98 LinearizeOptimallyExample10
    13,653.81 73,239.62 0.3% 11.04 LinearizeOptimallyExample11
    19,211.80 52,051.34 0.2% 11.00 LinearizeOptimallyExample12
    22,243.75 44,956.45 0.1% 11.00 LinearizeOptimallyExample13
    28,409.81 35,199.11 0.2% 10.99 LinearizeOptimallyExample14
    28,265.23 35,379.16 0.1% 10.99 LinearizeOptimallyExample15
    33,767.58 29,614.20 0.3% 11.02 LinearizeOptimallyExample16
    46,851.80 21,343.90 0.4% 11.03 LinearizeOptimallyExample17
    48,470.35 20,631.17 0.1% 10.98 LinearizeOptimallyExample18
    57,668.81 17,340.40 0.4% 10.98 LinearizeOptimallyExample19
  38. sipa force-pushed on May 31, 2025
  39. sipa commented at 1:24 pm on May 31, 2025: member

    @instagibbs @ismaelsadeeq @gmaxwell FWIW, I didn’t include apples-to-apples comparison because the large-scale comparison benchmarks I did (especially, this one) were just overwhelmingly in favor of SFL over the existing one. The benchmarks were mostly done to decide between SFL and another replacement algorithm, GGT, but see the reasons for choosing SFL here.

    A fair apples-to-apples comparison I think needs to use a dataset large enough to encompass both good and bad cases for each of the algorithms, and not be specifically biased for one of them, which is hard in a benchmark I can include in the PR directly. If people are interested, I can try to clean up the branch I used to create the graphs above, however.

  40. sipa commented at 9:25 pm on May 31, 2025: member

    Thanks for all the benchmarks, here are some graphs created from them to show their relative performance:

    (code & data to generate them is here)

  41. sipa commented at 1:43 am on June 2, 2025: member

    If people are interested in a much larger scale comparison between the old and the new algorithm, checkout https://github.com/sipa/bitcoin/commits/bench_sfl_css, build its bin/bench_bitcoin, and:

    0wget https://bitcoin.sipa.be/clusters/clusters.tgz
    1tar -xzf clusters.tgz
    2for F in clusters_sim2023_large clusters_medium clusters_spanning clusters_bipartite; do time ./build/bin/bench_bitcoin --filter="LinearizeDataSet" <$F >$F.out; done
    3zstd -19 clusters_*.out
    

    And then PR the resulting *.out.zst files to https://github.com/sipa/lin-benches/tree/main/data.

  42. l0rinc commented at 10:21 am on June 4, 2025: contributor

    I’ve run the command on a Raspberry Pi 4 Model B - Cortex-A72 4-Core ARM with SanDisk SSD PLUS 1TB (USB 3.0) with to have data about lower-end hardware as well.

    0git remote add sipa https://github.com/sipa/bitcoin.git || true && git fetch sipa bench_sfl_css && git switch -C bench_sfl_css sipa/bench_sfl_css && \
    1git clean -fxd && git reset --hard && \
    2cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j"$(nproc)" && \
    3wget -q https://bitcoin.sipa.be/clusters/clusters.tgz && tar -xzf clusters.tgz && \
    4for F in clusters_sim2023_large clusters_medium clusters_spanning clusters_bipartite; do \
    5    time build/bin/bench_bitcoin --filter="LinearizeDataSet" <"$F" >"$F.out"; \
    6done && \
    7tar -czf clusters.out.tgz clusters_*.out
    

    It took considerably longer than expected (~13h):

    • clusters_sim2023_large: 11h 39m
    • clusters_medium: 26m 33s
    • clusters_spanning: 19m 37s
    • clusters_bipartite: 37m 17s

    Let’s see if I can attach it here: clusters.out.tgz

  43. sipa commented at 12:44 pm on June 4, 2025: member

    I’ve created a repo for benchmark data, and added my results plus @l0rinc’s: https://github.com/sipa/lin-benches/tree/main/data.

    Also, is this with 32-bit or 64-bit userspace?

  44. l0rinc commented at 1:22 pm on June 4, 2025: contributor

    Also, is this with 32-bit or 64-bit userspace?

    64-bit userspace (AArch64)

    0$ file bitcoind
    1bitcoind: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=7e059ec01f7460042910ca4ed15270382269c9d5, for GNU/Linux 3.7.0, with debug_info, not stripped
    
    0$ getconf LONG_BIT
    164
    2$ dpkg --print-architecture
    3arm64
    4$ uname -m
    5aarch64
    
  45. sipa force-pushed on Jun 12, 2025
  46. sipa commented at 5:28 pm on June 12, 2025: member
    Rebased on top of #30605.
  47. sipa force-pushed on Jun 12, 2025
  48. clusterlin tests: count SimpleCandidateFinder iterations better
    Only count the number of actual new subsets added. If the queue contains
    a work item that completely covers a component, no transaction can be added
    to it without creating a disconnected component. In this case, also don't
    count it as an iteration.
    
    With this, the number of iterations performed by SimpleCandidateFinder is
    bounded by the number of distinct connected topologically-valid subsets of
    the cluster.
    77a432ee70
  49. clusterlin tests: separate testing of Search- and SimpleCandidateFinder
    This separates the existing fuzz test into:
    
    * clusterlin_search_finder: establishes SearchCandidateFinder's correctness using the
                                simpler SimpleCandidateFinder.
    * clusterlin_simple_finder: establishes SimpleCandidateFinder's correctness using the
                                (even) simpler ExhaustiveCandidateFinder.
    
    rather than trying to do both at once.
    a38c38951e
  50. clusterlin tests: make SimpleCandidateFinder always find connected
    Make a small change to guarantee that SimpleCandidateFinder only ever returns
    connected solutions, even when non-optimal. Then test this property.
    10e90f7aef
  51. clusterlin tests: separate testing of SimpleLinearize and Linearize
    The separates the existing fuzz test into:
    
    * clusterlin_linearize: establishes the correctness of Linearize() using the
                            simpler SimpleLinearize() function.
    * clusterlin_simple_linearize: establishes the correctness of SimpleLinearize() by
                            comparing with all valid linearizations computed by
                            std::next_permutation.
    
    rather than combining the first two into a single fuzz test.
    98c1c88b6f
  52. clusterlin tests: optimize clusterlin_simple_linearize
    Whenever a non-topological permutation is encountered, fast forward to the
    last permutation with the same non-topological prefix, skipping over
    potentially many permutations that are non-topological for the same reason.
    
    With that, increase the checking of all permutations to clusters of size 8
    instead of 7.
    6e37824ac3
  53. clusterlin tests: compare with fuzz-provided topological sets 5f92ebee0d
  54. clusterlin tests: compare with fuzz-provided linearizations 94f3e17c33
  55. clusterlin tests: support non-empty ReadTopologicalSubset()
    In several call sites for ReadTopologicalSubset, a non-empty result is
    expected, necessitating a special case at the call site for empty results.
    
    Fix this by adding a bool non_empty argument, which does this special
    casing (more efficiently) inside ReadTopologicalSubset itself.
    da23ecef29
  56. clusterlin tests: verify that chunks are minimal 1fa55a64ed
  57. clusterlin: abstract try-permutations into ExhaustiveLinearize function
    Rather than this exhaustive linearization check happening inline inside
    clusterlin_simple_linearize, abstract it out into a Linearize()-like
    function for clarity.
    
    Note that this isn't exactly a refactor, because the old code would compare the
    found linearization against all (valid) permutations, while the new code instead
    first computes the best linearization from all valid permutations, and then
    compares it with the found one.
    b64e61d2de
  58. clusterlin: add big comment explaning the relation between tests d7fca5c171
  59. clusterlin: add known-correct optimal linearization tests (tests) 763b2a7f1f
  60. clusterlin: replace benchmarks with SFL-hard ones (bench) f1337395ff
  61. clusterlin: replace cluster linearization with SFL implementation (feature)
    This replaces the existing LIMO linearization algorithm (which internally uses
    ancestor set finding and candidate set finding) with the much more performant
    spanning-forest linearization algorithm.
    
    See https://delvingbitcoin.org/t/spanning-forest-cluster-linearization/1419
    2204e48acf
  62. clusterlin: remove unused {Ancestor,Search}CandidateFinder code (cleanup)
    This removes the candidate set finding classes, as well as related tests
    and benchmarks for them.
    2aa8318828
  63. clusterlin: keep track of the active parents of each transaction (optimization)
    This avoids the need for a loop over all parents of a transaction while walking
    a chunk, and removes the need to store the set of parent dependencies explicitly.
    4e713b26ff
  64. clusterlin: split chunks based on maximum gain (optimization)
    This introduces the notion of gain to the SFL algorithm. Given a chunk c,
    an active dependency d in it, and the chunks (t, b) that c would split into
    if d were deactivated, the gain is defined as either (they are equivalent):
    
      (feerate(t) - feerate(b)) * size(t) * size(b)
      fee(t) * size(b) - fee(b) * size(t)
    
    It happens to also be equal to these:
    
      (feerate(t) - feerate(c)) * size(t) * size(c)
      fee(t) * size(c) - fee(c) * size(t)
    
    Its relevance is that this metric is proportional to a lower bound on the area
    under the fee-size diagram which would be gained IF a deactivation of d does not
    result in a self-merge of t and b again.
    
    This commit adds logic to find, within each chunk, the dependency with the highest
    gain. In benchmarks, this appears to be a very good heuristic for deciding which
    splits are worth making.
    68ce630abd
  65. clusterlin: use pre-allocated array for dependencies instead of vector (optimization)
    This reduces the number of allocations required inside the SFL algorithm, and works
    because the number of dependencies per transaction is at most n-1.
    
    To minimize the memory usage from this pre-allocation (which might impact memory locality),
    change the data type of DepIdx from uint32_t to uint8_t or uint16_t when possible.
    20352d1abc
  66. clusterlin: keep child dependencies split into active and inactive (optimization)
    Within the per-transaction child dependency list, keep the active ones before all
    inactive ones. This improves the complexity over iterating over active dependencies
    from O(m) to O(n), as at most n-1 dependencies can be active within any given chunk
    at any given time.
    60228d9666
  67. clusterlin: keep FIFO queue of improvable chunks (optimization)
    This distributes the work over the various chunks fairly, and simultaneously
    avoids retrying chunks over and over again which are already known to be optimal.
    705638299d
  68. clusterlin: randomize merges/splits in SFL (feature) d760ad1ca9
  69. clusterlin: use random split in SFL on every 3rd attempt (feature)
    Out of an abundance of caution that adversarially-constructed clusters might
    reliably result in bad chunk split decisions with the maximum-gain strategy,
    make every third consecutive attempt to split the same chunk use a random
    strategy instead.
    48d392963d
  70. clusterlin: make dependency being active implicit (optimization)
    We do not need to actually keep track of whether a dependency is active
    or not; it is implied by whether or not it appears within the active
    prefix of its parent's child_deps, and its child's parent_deps.
    
    Just remove setting and checking it.
    e911189f20
  71. clusterlin: add more accurate cost modeling (feature)
    This adds a rough estimate of algorithm runtime, so it can be interrupted if no
    solution is found in time. Due to inherent differences between platforms, this
    will not be extremely accurate, but it is preferable over directly measuring
    time for consistency.
    800bd44d58
  72. clusterlin: minimize chunks (feature)
    After the normal optimization process finishes, and finds an optimal
    spanning forest, run a second process (while computation budget remains)
    to split chunks into minimal equal-feerate chunks.
    
    As a side-effect, this also guarantees that the optimal chunk order is
    deterministic.
    505bc96f11
  73. sipa force-pushed on Jun 14, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-19 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me