Cluster mempool: more accurate cost model for SFL #34138

sipa wants to merge 18 commits into bitcoin:master from sipa:202512_sfl_costs, changing 4 files (+582 −491)
  1. sipa commented at 2:57 pm on December 22, 2025: member

    Builds on top of #34023, part of #30289.

    This introduces a more accurate cost model for SFL, to control how much CPU time is spent inside the algorithm for clusters that cannot be linearized perfectly within a reasonable amount of time.

    The goal is to have a metric for the amount of work performed, so that txmempool can impose limits on that work: a lower bound of work that is always performed (unless optimality is reached before that point, of course), and an upper bound to limit the latency and total CPU time spent on this. There are conflicting design goals here:

    • On the one hand, it seems ideal if this metric is closely correlated to actual CPU time, because otherwise the limits become inaccurate.
    • On the other hand, it seems a nightmare to have the metric be platform/system dependent, as it makes network-wide reasoning nearly impossible. It’s expected that slower systems take longer to do the same thing; this holds for everything, and we don’t need to compensate for this.

    There are multiple solutions to this:

    • One extreme is just measuring the time. This is very accurate, but extremely platform dependent, and also non-deterministic due to random scheduling/cache effects.
    • The other extreme is using a very abstract metric, like counting how many times certain loops/functions inside the algorithm run. That is what is implemented as of #32545 and #34023: just counting the sum of the numbers of transactions updated across all UpdateChunks() calls. However, it necessarily fails to account for significant portions of runtime spent elsewhere, resulting in a rather wide range of “ns per cost” values.
    • This PR takes a middle ground, counting many function calls / branches / loops, with weights determined through benchmarking on a number of systems and averaging the results, as sketched below.
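
    To illustrate the difference between the abstract extreme and the middle ground, here is a minimal sketch (hypothetical names and placeholder weights only, in the spirit of the per-call hooks this PR adds to a CostModel class; not the actual interface):

     // Sketch only: hypothetical names, weights chosen for illustration.
     #include <cstdint>

     // The abstract extreme: a single counter of transactions touched in
     // UpdateChunks(), as in #32545/#34023. Cheap, but ignores work done
     // elsewhere, so "ns per cost" varies widely.
     struct IterationCountModel {
         uint64_t m_cost{0};
         void OnUpdateChunks(int num_txns) noexcept { m_cost += num_txns; }
     };

     // The middle ground: every relevant function/branch/loop contributes
     // with a benchmark-derived weight, so one cost unit tracks CPU time
     // more closely across operations.
     struct WeightedCostModel {
         uint64_t m_cost{0};
         void OnActivate(int num_deps) noexcept { m_cost += 9 * num_deps; }
         void OnDeactivate(int num_deps) noexcept { m_cost += 9 * num_deps + 6; }
         void OnMakeTopological(int num_chunks, int num_steps) noexcept { m_cost += 3 * num_chunks + 38 * num_steps; }
         // ... one hook per instrumented operation, weights fitted by regression.
     };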
  2. DrahtBot commented at 2:57 pm on December 22, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34138.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34023 (Optimized SFL cluster linearization by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa renamed this:
    202512 sfl costs
    Cluster mempool: more accurate cost model for SFL
    on Dec 22, 2025
  4. sipa force-pushed on Dec 22, 2025
  5. DrahtBot added the label CI failed on Dec 22, 2025
  6. sipa force-pushed on Dec 22, 2025
  7. DrahtBot removed the label CI failed on Dec 22, 2025
  8. sipa force-pushed on Dec 22, 2025
  9. DrahtBot added the label CI failed on Dec 22, 2025
  10. DrahtBot commented at 10:24 pm on December 22, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/actions/runs/20443977146/job/58743290866 LLM reason (✨ experimental): Lint check failed due to scripted_diff differences in the patch.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  11. sipa force-pushed on Dec 22, 2025
  12. DrahtBot removed the label CI failed on Dec 23, 2025
  13. gmaxwell commented at 7:47 pm on December 24, 2025: contributor

    My thoughts:

    It should use a metric like this, not time, because it would be weird and undesirable for the behavior to change a lot just based on what system you’re on or what you’re doing with it. People expect slower computers to have higher cpu utilization and the intended limits are low enough that I think they won’t be problematic on even quite slow computers.

    An operation-based metric based on the performance of any specific computer will likely be much more reflective of the real resources used on another kind of computer than some simple iteration count – so even if something profiled on a single computer doesn’t reflect yours, it beats keeping the iteration-based approach.

    One possibility would be to benchmark on several modern architectures like zen4, macpro, and graviton, normalize their coefficients for equal time on some corpus, and then average them. That is, the weights for the different measures are just the average across the systems, up to a per-system scaling factor reflecting the expected average performance difference. (Or, in the context of the regression used to fit this, fit a model on the coefficients with Archs-1 additional free scaling factors and then discard the scaling factors – e.g. minimize ((A*x+B*y+C*z+..)+Arch1*(A*x+B*y+C*z+..)+Arch2*(A*x+B*y+C*z+..)-(Arch0time+Arch1time+Arch2time+…))^2.) This wouldn’t reflect the cost on any specific system perfectly, but it might give a better estimate overall, across multiple systems. And my argument of “a wrong cost is still better than no cost model” still applies. The downside is that you’d need to measure multiple systems to calibrate it.
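
    Written out, one reading of that fit (notation introduced here just for illustration: w is the shared weight vector, x_c the vector of operation counts for cluster c, t_{a,c} the measured time on architecture a, and s_a per-architecture scale factors that are discarded after fitting):

        \min_{w,\,s} \sum_{a} \sum_{c} \bigl( s_a \, w^\top x_c - t_{a,c} \bigr)^2 \quad \text{subject to } s_0 = 1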

    An alternative might be to just use the coefficients for whatever popular modern hardware you think is most resource constrained – better to have the unexpectedly slow runs impact the fast computers more. And at least this approach only requires the extensive profiling on one system.

    As far as what transactions the calibration should be done against, a corpus of historical clusters is probably okay. It would probably make sense to exclude easy clusters from the corpus, e.g. ones with fewer than 10 transactions. You only care whether the metric is accurate on times near the limit. I don’t think using artificial cases is that helpful, as the metric will only be accurate ‘on average’ anyways.

    I don’t think the coefficients need to be that well maintained over time, except for future algorithmic optimizations, simply because doing anything here is better than just iteration counting. And since it’s desirable to behave mostly consistently (so when someone asks why some suboptimal cluster was picked, you can just look and see that finding it is usually over the limit, etc.), you wouldn’t want to update them often – probably only when advances in typical computer speeds also justify meaningfully increasing the limit overall, such that almost all transactions would get as much or more optimization permitted under the new limits.

  14. sipa force-pushed on Dec 26, 2025
  15. sipa commented at 6:45 pm on December 27, 2025: member

    An alternative might be to just use the coefficients for whatever popular modern hardware you think is most resource constrained – better to have the unexpectedly slow runs impact the fast computers more. And at least this approach only requires the extensive profiling on one system.

    Fair enough, but an argument can also be made in favor of using metrics tuned for a relatively high-end system, in that those are likely a better predictor for which machines on the network, in the short to medium term future, will be responsible for actual transaction and block propagation. The numbers are such that any node, even running very low-end hardware, will not have trouble keeping up with mempool/relay, but from a network perspective, it’s a well-connected subset of relatively fast systems that is responsible for relay.

    I don’t think using artificial cases is that helpful, as the metric will only be accurate ‘on average’ anyways.

    Hmm, that does risk situations where adversarially-constructed transactions are significantly (but still probably not more than a small factor, say 2x-3x) more expensive per cost. I think it makes sense to use as wide a variety of clusters as possible, real and synthetic, to calibrate.

  16. gmaxwell commented at 7:51 pm on December 27, 2025: contributor

    Yeah, I was trying to capture ‘reflective of the future’ in the word “modern”. Fair point on the relatively fast systems being the ones that count – while slow systems are just “also-rans” and it’s sufficient that they just aren’t broken. Fair enough.

    Hmm, that does risk situations where adversarially-constructed transactions are significantly (but still probably not more than a small factor, say 2x-3x) more expensive per cost.

    A concern I had with synthetics is that with artificially hard clusters the costs might end up distorted by ‘hard’ transactions tending to trigger more of an operation, inflating that operation’s cost in the regression even though the operation itself is actually fast. It might make sense to have a starting offset based on the number of dependencies, or to give some costs a multiplier of the number of dependencies in them.

  17. sipa commented at 0:13 am on December 30, 2025: member

    A concern I had with synthetics is that with artificially hard clusters the costs might end up distorted by ‘hard’ transactions tending to trigger more of an operation, inflating that operation’s cost in the regression even though the operation itself is actually fast.

    FWIW, the weights in the current (now outdated) version of this PR were determined by benchmarking small pieces of code (functions, sometimes individual loop executions) and running linear regression on those. I also tried a regression relating overall linearization time to various linear factors (the number of times each function/loop is executed), but it turned out to be far too noisy. The result ended up containing negative factors, etc.

    So I think that when #34023 is in, I will redo the numbers here with the same approach. For every function or non-trivial branch in the algorithm, benchmark it, and gather a histogram of its runtimes, broken down by loop iterations (e.g. in Activate broken down by #tx affected, in PickMergeChunk broken down by number of chunks visited, …). Then look at the 90th or 99th percentile of those runtimes, and perform linear regression with just one variable, that iteration count.
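
    A rough sketch of that per-operation fitting step (illustrative only; the actual digest_fit.py implementation may differ), assuming the per-call runtimes have already been grouped by iteration count:

     // Sketch: for one instrumented operation, take a high percentile of the
     // observed runtimes at each iteration count, then fit runtime ~ a*count + b.
     #include <algorithm>
     #include <cstddef>
     #include <utility>
     #include <vector>

     // 99th percentile of a non-empty set of runtime samples (nanoseconds).
     double Percentile99(std::vector<double> samples)
     {
         std::sort(samples.begin(), samples.end());
         return samples[static_cast<size_t>(0.99 * (samples.size() - 1))];
     }

     // Ordinary least squares for y ~ a*x + b over (iteration count, p99 runtime) pairs.
     std::pair<double, double> FitLine(const std::vector<std::pair<double, double>>& points)
     {
         double sx = 0, sy = 0, sxx = 0, sxy = 0;
         for (const auto& [x, y] : points) { sx += x; sy += y; sxx += x * x; sxy += x * y; }
         const double n = static_cast<double>(points.size());
         const double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
         return {a, (sy - a * sx) / n};
     }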

    The exact composition of the training data set shouldn’t matter really, except to the extent that it has some variety, and not say all clusters with deps=tx-1 or so.

  18. DrahtBot added the label Needs rebase on Jan 6, 2026
  19. sipa force-pushed on Jan 7, 2026
  20. sipa force-pushed on Jan 7, 2026
  21. DrahtBot added the label CI failed on Jan 7, 2026
  22. DrahtBot removed the label Needs rebase on Jan 7, 2026
  23. sipa force-pushed on Jan 12, 2026
  24. sipa force-pushed on Jan 13, 2026
  25. sipa force-pushed on Jan 13, 2026
  26. DrahtBot removed the label CI failed on Jan 13, 2026
  27. DrahtBot added the label Needs rebase on Jan 15, 2026
  28. sipa force-pushed on Jan 15, 2026
  29. DrahtBot removed the label Needs rebase on Jan 15, 2026
  30. sipa commented at 2:36 am on February 2, 2026: member

    I can use some help gathering benchmarks to construct a fair cost model here.

    If you’d like to help:

    • Check out and build https://github.com/sipa/bitcoin/commits/wip_costmodel
    • Run zstd -d <cluster-benchmarks.zst | ./build/bin/bitcoin-costmodel -iters=3000 run - >output.log. Feel free to modify the -iters= value, more is better; it should take 10-30 seconds per -iters= depending on hardware. It was ~14 hours on a Ryzen 5950X CPU with -iters=3000.
    • Run python3 digest_fit.py <output.log >result.txt, and post the result.txt output, using the script from https://gist.github.com/sipa/68109fe5f412f6c42f9b157420f2911b.
    • Optionally, upload the full output.log somewhere, but it may be (after compression) 30-50 MiB in size.

    I’m especially interested in weaker and arm64 hardware (the branch above only supports x86, x86_64, arm64).

  31. l0rinc commented at 7:38 pm on February 3, 2026: contributor

    I ran it on a few of my benchmarking servers (3 x86_64 and 3 arm64):

    Prerequisites

    • apt install pip zstd && pip install numpy --break-system-packages
    • sed -i '' 's/operator<(const Centroid& other) noexcept/operator<(const Centroid\& other) const noexcept/' src/util/tdigest.h
    • looks like it’s not working on Windows currently since bitcoin-costmodel.cpp includes common/args.h which pulls in arpa/inet.h
     git fetch --depth=1 https://github.com/sipa/bitcoin.git wip_costmodel && \
     git reset --hard 4cecae4469c70ebdb745ac065607bffdcc57132d && \
     cmake -B build -DCMAKE_BUILD_TYPE=Release && \
     cmake --build build --target bitcoin-costmodel -j$(nproc) && \
     curl -sSLO https://gist.githubusercontent.com/sipa/68109fe5f412f6c42f9b157420f2911b/raw/b4ab5a03c9e95f74ea7c379061d025b6c8b25b7a/digest_fit.py && \
     (echo "" && echo "$(date -I) | bitcoin-costmodel | ZSTD $(zstd --version 2>&1 | grep -oP 'v[\d.]+') | GCC $(gcc -dumpfullversion) | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T . | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source . | tail -1) 2>/dev/null | grep -q 0 && echo SSD || echo HDD)" && echo "") && \
     zstd -d <cluster-benchmarks.zst | ./build/bin/bitcoin-costmodel -iters=1000 run - >output.log && \
     python3 digest_fit.py output.log >result-$(hostname).txt && \
     cat result-$(hostname).txt
    
     0Processing -
     1Processed 385084 clusters
     2Overhead: 7.71
     3Activate: 5.65*p1
     4Deactivate: 5.69*p1
     5GetLinearization: 31.15*p1
     6Initialize: 10.93*p1
     7MakeTopological: 23.32*p2
     8MergeChunks1: 0.40*p1
     9MergeChunks2: 0.84*p1 + 1.66
    10MinimizeStep1: 4.21*p1 + 10.85
    11MinimizeStep2: 10.63*p1
    12PickChunkToOptimize: 0.23*p1
    13PickDependendencyToSplit: 4.09*p1
    14PickMergeCandidate: 4.69*p1
    15StartMinimizing: 7.22*p1 + 48.03
    16StartOptimizing: 5.34*p1 + 5.98
    
     0Processing -
     1Processed 385084 clusters
     2Overhead: 18.72
     3Activate: 9.11*p1
     4Deactivate: 8.92*p1 + 6.27
     5GetLinearization: 75.58*p1
     6Initialize: 36.04*p1
     7MakeTopological: 39.71*p2
     8MergeChunks1: 2.25*p1
     9MergeChunks2: 2.56*p1 + 7.41
    10MinimizeStep1: 11.30*p1 + 8.83
    11MinimizeStep2: 6.92*p1 + 16.64
    12PickChunkToOptimize: 2.38*p1 + 4.74
    13PickDependendencyToSplit: 8.95*p1 + 14.74
    14PickMergeCandidate: 6.18*p1 + 0.33
    15StartMinimizing: 17.12*p1 + 141.21
    16StartOptimizing: 11.54*p1 + 52.52
    
     0Processing -
     1Processed 385084 clusters
     2Overhead: 19.25
     3Activate: 9.25*p1
     4Deactivate: 10.28*p1
     5GetLinearization: 61.48*p1
     6Initialize: 34.17*p1 + 186.12
     7MakeTopological: 19.80*p1 + 25.32*p2 + 38.10
     8MergeChunks1: 2.23*p1
     9MergeChunks2: 2.54*p1 + 6.99
    10MinimizeStep1: 11.73*p1
    11MinimizeStep2: 26.10*p1
    12PickChunkToOptimize: 1.89*p1 + 6.61
    13PickDependendencyToSplit: 9.05*p1 + 12.30
    14PickMergeCandidate: 6.12*p1 + 1.28
    15StartMinimizing: 15.23*p1 + 103.94
    16StartOptimizing: 11.41*p1 + 20.37
    
     0Processing -
     1Processed 385084 clusters
     2Overhead: 17.98
     3Activate: 9.10*p1
     4Deactivate: 8.87*p1 + 7.09
     5GetLinearization: 74.10*p1
     6Initialize: 36.14*p1
     7MakeTopological: 39.55*p2
     8MergeChunks1: 2.27*p1
     9MergeChunks2: 2.56*p1 + 7.99
    10MinimizeStep1: 11.25*p1 + 9.95
    11MinimizeStep2: 6.81*p1 + 17.35
    12PickChunkToOptimize: 2.36*p1 + 5.28
    13PickDependendencyToSplit: 8.95*p1 + 15.29
    14PickMergeCandidate: 6.18*p1 + 1.07
    15StartMinimizing: 17.06*p1 + 142.00
    16StartOptimizing: 11.51*p1 + 56.15
    
     0Processing -
     1Processed 385084 clusters
     2Overhead: 23.59
     3Activate: 9.61*p1
     4Deactivate: 10.21*p1
     5GetLinearization: 77.78*p1
     6Initialize: 40.87*p1
     7MakeTopological: 48.42*p2
     8MergeChunks1: 1.57*p1
     9MergeChunks2: 2.54*p1 + 1.81
    10MinimizeStep1: 12.06*p1 + 1.54
    11MinimizeStep2: 26.05*p1
    12PickChunkToOptimize: 1.86*p1
    13PickDependendencyToSplit: 11.96*p1 + 1.31
    14PickMergeCandidate: 11.22*p1
    15StartMinimizing: 19.20*p1 + 82.47
    16StartOptimizing: 12.85*p1 + 36.29
    
     0Processing -
     1Processed 385084 clusters
     2Overhead: 25.26
     3Activate: 9.56*p1
     4Deactivate: 10.16*p1
     5GetLinearization: 77.78*p1
     6Initialize: 40.85*p1
     7MakeTopological: 50.16*p2
     8MergeChunks1: 1.53*p1
     9MergeChunks2: 2.54*p1 + 0.31
    10MinimizeStep1: 12.05*p1
    11MinimizeStep2: 25.52*p1
    12PickChunkToOptimize: 1.90*p1
    13PickDependendencyToSplit: 11.94*p1
    14PickMergeCandidate: 11.17*p1
    15StartMinimizing: 19.24*p1 + 80.55
    16StartOptimizing: 12.85*p1 + 34.45
    

    Let me know if you want to rerun any of these.

  32. l0rinc commented at 1:28 pm on February 4, 2026: contributor

    I reran the processing with

     hostname && rm digest_fit.py && curl -sSLO https://gist.githubusercontent.com/sipa/68109fe5f412f6c42f9b157420f2911b/raw/42bdcd4ec7346c749a6c754a128ba21ba6849639/digest_fit.py && \
     python3 digest_fit.py output.log
    

     0Overhead: 19.248
     1Activate: 9.476*p1 [2726.989s]
     2Deactivate: 10.470*p1 + 2.291 [2666.019s]
     3GetLinearization: 61.475*p1 [715.603s]
     4Initialize: 34.170*p1 + 186.119 [592.175s]
     5MakeTopological: 19.804*p1 + 25.317*p2 + 38.096 [264.399s]
     6MergeChunks1: 2.228*p1 [341.002s]
     7MergeChunks2: 2.529*p1 + 7.409 [317.839s]
     8MinimizeStep1: 11.734*p1 [2096.186s]
     9MinimizeStep2: 26.096*p1 [195.987s]
    10PickChunkToOptimize: 1.828*p1 + 8.069 [0.012s]
    11PickDependendencyToSplit: 9.036*p1 + 12.789 [1010.246s]
    12PickMergeCandidate: 6.091*p1 + 2.350 [493.177s]
    13StartMinimizing: 15.228*p1 + 103.938 [92.461s]
    14StartOptimizing: 11.401*p1 + 20.701 [42.878s]
    
     0Overhead: 17.983
     1Activate: 9.317*p1 [2733.568s]
     2Deactivate: 8.866*p1 + 16.132 [2390.161s]
     3GetLinearization: 74.096*p1 [854.892s]
     4Initialize: 36.138*p1 [550.421s]
     5MakeTopological: 39.547*p2 [260.227s]
     6MergeChunks1: 2.266*p1 [360.430s]
     7MergeChunks2: 2.552*p1 + 8.364 [322.145s]
     8MinimizeStep1: 11.241*p1 + 10.218 [2098.954s]
     9MinimizeStep2: 6.808*p1 + 17.346 [157.919s]
    10PickChunkToOptimize: 2.284*p1 + 6.890 [0.012s]
    11PickDependendencyToSplit: 8.936*p1 + 15.849 [1009.249s]
    12PickMergeCandidate: 6.154*p1 + 2.129 [501.109s]
    13StartMinimizing: 17.064*p1 + 142.004 [111.884s]
    14StartOptimizing: 11.501*p1 + 56.349 [52.416s]
    
     0Overhead: 18.715
     1Activate: 9.323*p1 [2729.082s]
     2Deactivate: 8.917*p1 + 15.380 [2395.672s]
     3GetLinearization: 75.581*p1 [862.685s]
     4Initialize: 36.040*p1 [548.866s]
     5MakeTopological: 39.709*p2 [261.569s]
     6MergeChunks1: 2.246*p1 [352.307s]
     7MergeChunks2: 2.549*p1 + 7.841 [315.487s]
     8MinimizeStep1: 11.292*p1 + 9.119 [2096.362s]
     9MinimizeStep2: 6.923*p1 + 16.639 [153.148s]
    10PickChunkToOptimize: 2.301*p1 + 6.534 [0.012s]
    11PickDependendencyToSplit: 8.934*p1 + 15.353 [1006.867s]
    12PickMergeCandidate: 6.151*p1 + 1.482 [492.462s]
    13StartMinimizing: 17.121*p1 + 141.209 [111.588s]
    14StartOptimizing: 11.534*p1 + 52.731 [51.534s]
    
     0Overhead: 23.588
     1Activate: 9.843*p1 [2820.254s]
     2Deactivate: 10.383*p1 + 2.969 [2648.347s]
     3GetLinearization: 77.781*p1 [890.546s]
     4Initialize: 40.875*p1 [606.170s]
     5MakeTopological: 48.420*p2 [314.757s]
     6MergeChunks1: 1.574*p1 [203.882s]
     7MergeChunks2: 2.516*p1 + 2.879 [269.665s]
     8MinimizeStep1: 12.048*p1 + 2.183 [2155.578s]
     9MinimizeStep2: 26.165*p1 [212.966s]
    10PickChunkToOptimize: 1.876*p1 [0.003s]
    11PickDependendencyToSplit: 11.942*p1 + 2.219 [1273.421s]
    12PickMergeCandidate: 11.218*p1 [768.275s]
    13StartMinimizing: 19.197*p1 + 82.468 [105.855s]
    14StartOptimizing: 12.839*p1 + 36.759 [51.647s]
    
     0Overhead: 25.257
     1Activate: 9.791*p1 [2792.813s]
     2Deactivate: 10.379*p1 + 0.986 [2628.386s]
     3GetLinearization: 77.780*p1 [890.172s]
     4Initialize: 40.855*p1 [605.782s]
     5MakeTopological: 50.159*p2 [325.778s]
     6MergeChunks1: 1.531*p1 [192.671s]
     7MergeChunks2: 2.510*p1 + 1.593 [256.651s]
     8MinimizeStep1: 12.050*p1 [2135.382s]
     9MinimizeStep2: 25.555*p1 [216.631s]
    10PickChunkToOptimize: 1.924*p1 [0.003s]
    11PickDependendencyToSplit: 11.936*p1 + 0.147 [1264.241s]
    12PickMergeCandidate: 11.166*p1 [747.816s]
    13StartMinimizing: 19.235*p1 + 80.547 [105.245s]
    14StartOptimizing: 12.837*p1 + 35.015 [51.086s]
    
     0Overhead: 7.708
     1Activate: 5.786*p1 [1662.408s]
     2Deactivate: 5.812*p1 + 0.467 [1477.593s]
     3GetLinearization: 31.149*p1 [354.169s]
     4Initialize: 10.925*p1 [164.259s]
     5MakeTopological: 23.315*p2 [151.937s]
     6MergeChunks1: 0.431*p1 + 1.960 [123.429s]
     7MergeChunks2: 0.744*p1 + 6.643 [175.632s]
     8MinimizeStep1: 4.171*p1 + 12.586 [864.344s]
     9MinimizeStep2: 14.948*p1 [149.725s]
    10PickChunkToOptimize: 0.311*p1 + 3.482 [25.053s]
    11PickDependendencyToSplit: 4.094*p1 [435.622s]
    12PickMergeCandidate: 4.696*p1 [409.024s]
    13StartMinimizing: 7.201*p1 + 48.716 [43.655s]
    14StartOptimizing: 5.303*p1 + 7.715 [19.819s]
    
  33. sipa force-pushed on Feb 5, 2026
  34. sipa marked this as ready for review on Feb 5, 2026
  35. sipa commented at 10:00 pm on February 5, 2026: member

    Thank you @l0rinc. I have combined your results with my own benchmarks in https://gist.github.com/sipa/68109fe5f412f6c42f9b157420f2911b, where they are normalized and merged into an “average” machine, and then approximated with a simple model:

     inline void InitializeEnd(int num_txns, int num_deps) noexcept { m_cost += 92 * num_txns; }
     inline void MakeTopologicalEnd(int num_chunks, int num_steps) noexcept { m_cost += 3 * num_chunks + 38 * num_steps; }
     inline void StartOptimizingEnd(int num_chunks) noexcept { m_cost += 11 * num_chunks; }
     inline void ActivateEnd(int num_deps) noexcept { m_cost += 9 * num_deps; }
     inline void DeactivateEnd(int num_deps) noexcept { m_cost += 9 * num_deps + 6; }
     inline void MergeChunksMid(int num_txns) noexcept { m_cost += 2 * num_txns; }
     inline void MergeChunksEnd(int num_steps) noexcept { m_cost += 2 * num_steps + 6; }
     inline void PickMergeCandidateEnd(int num_steps) noexcept { m_cost += 7 * num_steps + 1; }
     inline void PickChunkToOptimizeEnd(int num_steps) noexcept { m_cost += num_steps + 3; }
     inline void PickDependencyToSplitEnd(int num_txns) noexcept { m_cost += 8 * num_txns + 7; }
     inline void StartMinimizingEnd(int num_chunks) noexcept { m_cost += 15 * num_chunks; }
     inline void MinimizeStepMid(int num_txns) noexcept { m_cost += 9 * num_txns + 12; }
     inline void MinimizeStepEnd(bool split) noexcept { m_cost += 19 * split + 5; }
    

    where one unit of the total m_cost corresponds to roughly 0.6 ns on a Ryzen 5950X, 0.65 ns on an M4 Max, and 1.1 ns on an RPi 5.
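
    For intuition, a wall-clock budget then maps to a roughly platform-independent cost limit along these lines (illustrative numbers and names only, not the actual txmempool interface):

     // Sketch: translate a CPU-time budget into a cost limit using the observed
     // ns-per-cost range (~0.6 ns on a fast desktop, ~1.1 ns on an RPi 5).
     #include <cstdint>

     constexpr double NS_PER_COST_FAST = 0.6;  // e.g. Ryzen 5950X
     constexpr double NS_PER_COST_SLOW = 1.1;  // e.g. Raspberry Pi 5

     constexpr uint64_t CostLimitForBudgetMs(double budget_ms, double ns_per_cost)
     {
         return static_cast<uint64_t>(budget_ms * 1'000'000.0 / ns_per_cost);
     }

     // A 50 ms budget on the fast machine is ~83M cost units; spending that
     // full limit on the slow machine would take ~92 ms of CPU instead.
     static_assert(CostLimitForBudgetMs(50.0, NS_PER_COST_FAST) > 80'000'000);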

    It’s easy to re-run the modelling steps if more benchmarks get added, but I think this is already more than good enough.

  36. sipa force-pushed on Feb 5, 2026
  37. DrahtBot added the label CI failed on Feb 5, 2026
  38. DrahtBot commented at 10:59 pm on February 5, 2026: contributor

    🚧 At least one of the CI tasks failed. Task fuzzer,address,undefined,integer: https://github.com/bitcoin/bitcoin/actions/runs/21729773334/job/62681164680 LLM reason (✨ experimental): Fuzzing target crashed with a deadly signal during libFuzzer run, causing the CI build to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  39. DrahtBot removed the label CI failed on Feb 6, 2026
  40. DrahtBot added the label Needs rebase on Feb 11, 2026
  41. sipa force-pushed on Feb 12, 2026
  42. clusterlin: split tx/chunk dep counting (preparation)
    This splits the chunk_deps variable in LoadLinearization in two, one for
    tracking tx dependencies and one for chunk dependencies. This is a
    preparation for a later commit, where chunks won't be identified anymore
    by a representative transaction in them, but by a separate index. With
    that, it seems weird to keep them both in the same structure if they
    will be indexed in an incompatible way.
    
    Note that the changes in src/test/util/cluster_linearize.h to the table
    of worst observed iteration counts are due to switching to a different
    data set, and are unrelated to the changes in this commit.
    89c36576e2
  43. clusterlin: count chunk deps without loop (optimization)
    This small optimization avoids the need to loop over the parents of each
    transaction when initializing the dependency-counting structures inside
    GetLinearization().
    f34ec142af
  44. scripted-diff: rename _rep -> _idx in SFL
    This is a preparation for the next commit, where chunks will no longer
    be identified using a representative transaction, but using a set index.
    Reduce the load of line changes by doing this rename ahead of time.
    
    -BEGIN VERIFY SCRIPT-
    sed --in-place 's/_rep/_idx/g' src/cluster_linearize.h
    -END VERIFY SCRIPT-
    2c6fde72a4
  45. clusterlin: get rid of DepData, reuse sets (optimization)
    This significantly changes the data structures used in SFL, based on the
    observation that the DepData::top_setinfo fields are quite wasteful:
    there is one per dependency (up to n^2/4), but we only ever need one per
    active dependency (of which there at most n-1). In total, the number of
    chunks plus the number of active dependencies is always exactly equal to
    the number of transactions, so it makes sense to have a shared pool of
    SetInfos, which are used for both chunks and top sets.
    
    To that effect, introduce a separate m_set_info variable, which stores a
    SetInfo per transaction. Some of these are used for chunk sets, and some
    for active dependencies' top sets. Every activation transforms the
    parent's chunk into the top set for the new dependency. Every
    deactivation transforms the top set into the new parent chunk.
    
    With that, there is little need for DepData anymore. Use parent/child
    TxIdxs to refer to dependencies, and find their top set by having a
    child TxIdx-indexed vector in each TxData, rather than a list of
    dependencies. This makes code for iterating over dependencies more
    natural and simpler.
    
    With indexes into m_set_data (SetIdx) becoming bounded by the number of
    transactions, we can use a SetType to represent sets of SetIdxs.
    Specifically, an m_chunk_idxs is added which contains all SetIdx
    referring to chunks. This leads to a much more natural way of iterating
    over chunks.
    
    Also use this opportunity to normalize many variable names.
    9f7bcecbfd
  46. clusterlin: improve TxData::dep_top_idx type (optimization)
    The combined size of TxData::dep_top_idx can be 16 KiB with 64
    transactions and SetIdx = uint32_t. Use a smaller type where possible to
    reduce memory footprint and improve cache locality of m_tx_data.
    
    Also switch from an std::vector to an std::array, reducing allocation
    overhead and indirections.
    afcd239666
  47. clusterlin: abstract out functions from MergeStep (refactor)
    This is a simple refactor to make the code more readable.
    e3c6ae3236
  48. clusterlin: split up OptimizeStep (refactor) 03fa996e39
  49. clusterlin: simplify PickMergeCandidate (optimization)
    The current process consists of iterating over the transactions of the
    chunk one by one, and then for each figuring out which of its
    parents/children are in unprocessed chunks.
    
    Simplify this (and speed it up slightly) by splitting this process into
    two phases: first determine the union of all parents/children, and then
    find which chunks those belong to.
    878a60c566
  50. clusterlin: precompute reachable sets (optimization)
    Instead of computing the set of reachable transactions inside
    PickMergeCandidate, precompute the information, updating it in
    Activate (by merging the two chunks' reachable sets) and Deactivate (by
    recomputing).
    
    This is a small performance gain on itself, but also a preparation for
    future optimizations that rely on quickly testing whether dependencies
    between chunks exist.
    a9de59bade
  51. clusterlin: make MergeSequence take SetIdx (simplification)
    Future changes will rely on knowing the chunk indexes of the two created
    chunks after a split. It is natural to return this information from
    Deactivate, which also simplifies MergeSequence.
    b18e8e378e
  52. clusterlin: special-case self-merges (optimization)
    After a split, if the top part has a dependency on the bottom part, the
    first MergeSequence will always perform this merge and then stop. This
    is referred to as a self-merge.
    
    We can special case these by detecting self-merges early, and avoiding
    the overhead of a full MergeSequence which involves two
    PickMergeCandidate calls (a successful and an unsuccessful one).
    900f21c782
  53. clusterlin: keep track of active children (optimization)
    This means we can iterate over all active dependencies in a
    cluster/chunk in O(ntx) time rather than O(ndeps) (*), as the number of
    active dependencies in a set of ntx transactions is at most ntx-1.
    
    (*) Asymptotically, this is not actually true, as for large transaction
    counts, iterating over a BitSet still scales with ntx. In practice
    however, where BitSets are represented by a constant number of integers,
    it holds.
    b7deda368b
  54. clusterlin: track suboptimal chunks (optimization)
    This avoids adding them a second time to m_suboptimal_chunks when they
    happen to already be there.
    ee98782adf
  55. clusterlin: unidirectional MakeTopological initially (optimization)
    It suffices to initially only attempt one direction of merges in
    MakeTopological(), and only try both directions on chunks that are the
    result of other merges.
    2e709a07f8
  56. clusterlin: inline UpdateChunk into (De)Activate (optimization)
    The two calls to UpdateChunk, in Activate and Deactivate each, are subtly
    different: the top one needs to update the chunk_idx of iterated
    transactions, while the bottom one leaves it unchanged. To exploit this
    difference, inline the four function calls, getting rid of UpdateChunks.
    
    This is also a preparation for a future improvement that inlines the
    recomputation of reachable sets in the same loop in Deactivate.
    a79cb88f11
  57. clusterlin: inline GetReachable into Deactivate (optimization)
    Avoid two full iterations over all of a chunk's transactions to
    recompute the reachable sets, by inlining them into the
    dependency-updating loops.
    decb6d7e4a
  58. clusterlin: introduce CostModel class (preparation)
    This parametrizes the cost model for the SFL algorithm with another
    class. Right now, the behavior of that class matches the naive cost
    model so far, but it will be replaced with a more advanced one in a
    future commit.
    
    The reason for abstracting this out is that it makes benchmarking for
    creating such cost models easy, by instantiating the cost model class
    with one that tracks time.
    69114d3103
  59. clusterlin: adopt trained cost model (feature) f3861d8383
  60. sipa force-pushed on Feb 12, 2026
  61. DrahtBot added the label CI failed on Feb 12, 2026
  62. DrahtBot commented at 11:24 pm on February 12, 2026: contributor

    🚧 At least one of the CI tasks failed. Task fuzzer,address,undefined,integer: https://github.com/bitcoin/bitcoin/actions/runs/21967209221/job/63460021080 LLM reason (✨ experimental): Fuzz target crashed with a deadly signal (libFuzzer exit code 77) during clusterlin_linearize fuzzing.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  63. DrahtBot removed the label CI failed on Feb 13, 2026
  64. DrahtBot removed the label Needs rebase on Feb 13, 2026
  65. l0rinc commented at 6:51 pm on February 17, 2026: contributor

    Reran them with the latest version on a few different machines with both gcc and (apple)clang where possible:

     git fetch --depth=1 https://github.com/sipa/bitcoin.git wip_costmodel && \
     git reset --hard FETCH_HEAD && \
     rm -f digest_fit.py && \
     curl -sSLO https://gist.githubusercontent.com/sipa/68109fe5f412f6c42f9b157420f2911b/raw/4c0964d3e562f99d58f8cf9966a8f7b9eecde12e/digest_fit.py && \
     apt install zstd -y 2>/dev/null && pip3 install numpy --break-system-packages 2>/dev/null; \
     COMP_VER="AppleClang $(clang --version | head -1 | sed 's/.*version //;s/ .*//')" && \
     rm -rf build && \
     cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_IPC=OFF >/dev/null 2>&1 && \
     cmake --build build --target bitcoin-costmodel -j$(nproc) >/dev/null 2>&1 && \
     (echo "" && echo "$(date -I) | bitcoin-costmodel | ZSTD $(zstd --version 2>&1 | sed -n 's/.*\(v[0-9.]*\).*/\1/p') | $COMP_VER | $(hostname) | $(uname -m) | $(sysctl -n machdep.cpu.brand_string) | $(nproc) cores | $(($(sysctl -n hw.memsize) / 1073741824))Gi RAM | SSD" && echo "") && \
     zstd -d <cluster-benchmarks.zst | ./build/bin/bitcoin-costmodel -iters=3000 run - >output.log && \
     python3 digest_fit.py output.log >result-$(hostname)-clang.txt && \
     cat result-$(hostname)-clang.txt
    

    Apple M4 Max

     0Processing -
     1Overhead: 8.113
     2Activate: 5.493*p1 [tot=4681.417s,avg_p1=18.237]
     3Deactivate: 5.565*p1 + 3.078 [tot=4205.812s,avg_p1=23.952]
     4GetLinearization: 12.033*p1 + 1.051*p2 + 43.424 [tot=728.011s,avg_p1=36.092,avg_p2=142.895]
     5Initialize: 10.795*p1 [tot=486.133s,avg_p1=36.092]
     6MakeTopological: 19.873*p2 + 43.503 [tot=414.519s,avg_p1=27.350,avg_p2=38.774]
     7MergeChunks1: 0.426*p1 + 1.571 [tot=345.189s,avg_p1=10.535]
     8MergeChunks2: 0.681*p1 + 6.487 [tot=493.540s,avg_p1=6.160]
     9MinimizeStep1: 3.967*p1 + 12.534 [tot=2470.517s,avg_p1=11.163]
    10MinimizeStep2: 14.255*p1 [tot=428.675s,avg_p1=0.570]
    11PickChunkToOptimize: 0.292*p1 + 3.659 [tot=72.439s,avg_p1=1.004]
    12PickDependendencyToSplit: 3.414*p1 [tot=1045.626s,avg_p1=13.917]
    13PickMergeCandidate: 4.404*p1 [tot=1096.377s,avg_p1=3.437]
    14StartMinimizing: 7.007*p1 + 44.156 [tot=125.166s,avg_p1=10.705]
    15StartOptimizing: 5.145*p1 + 6.817 [tot=61.399s,avg_p1=9.406]
    

    umbrel

     0Processing -
     1Overhead: 39.025
     2Activate: 8.679*p1 [tot=7150.558s,avg_p1=18.237]
     3Deactivate: 9.382*p1 [tot=6773.964s,avg_p1=23.952]
     4GetLinearization: 4.317*p2 + 1238.936 [tot=1874.253s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 37.093*p1 + 109.122 [tot=1822.688s,avg_p1=36.093]
     6MakeTopological: 18.550*p1 + 24.938*p2 + 33.664 [tot=772.178s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 1.794*p1 [tot=616.411s,avg_p1=10.536]
     8MergeChunks2: 2.258*p1 [tot=483.507s,avg_p1=6.161]
     9MinimizeStep1: 10.955*p1 [tot=5601.314s,avg_p1=11.163]
    10MinimizeStep2: 2.518*p1 + 6.213 [tot=170.598s,avg_p1=0.570]
    11PickChunkToOptimize: 1.124*p1 [tot=0.004s,avg_p1=1.004]
    12PickDependendencyToSplit: 8.326*p1 [tot=2419.101s,avg_p1=13.918]
    13PickMergeCandidate: 5.908*p1 [tot=975.320s,avg_p1=3.437]
    14StartMinimizing: 15.022*p1 + 89.643 [tot=255.569s,avg_p1=10.706]
    15StartOptimizing: 11.191*p1 + 1.694 [tot=116.927s,avg_p1=9.406]
    

    i9

     0Processing -
     1Overhead: 19.061
     2Activate: 8.934*p1 + 0.625 [tot=7881.082s,avg_p1=18.237]
     3Deactivate: 8.911*p1 + 18.197 [tot=7079.020s,avg_p1=23.952]
     4GetLinearization: 6.287*p2 + 1912.554 [tot=2735.803s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 38.341*p1 [tot=1742.874s,avg_p1=36.093]
     6MakeTopological: 21.303*p1 + 19.838*p2 + 111.536 [tot=748.654s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 2.199*p1 [tot=994.682s,avg_p1=10.536]
     8MergeChunks2: 3.037*p1 + 6.376 [tot=1032.514s,avg_p1=6.161]
     9MinimizeStep1: 10.416*p1 + 7.441 [tot=5747.750s,avg_p1=11.163]
    10MinimizeStep2: 9.386*p1 + 14.998 [tot=453.866s,avg_p1=0.570]
    11PickChunkToOptimize: 2.231*p1 + 5.777 [tot=0.042s,avg_p1=1.004]
    12PickDependendencyToSplit: 7.263*p1 + 13.323 [tot=2314.559s,avg_p1=13.918]
    13PickMergeCandidate: 6.788*p1 [tot=1613.274s,avg_p1=3.437]
    14StartMinimizing: 16.948*p1 + 154.288 [tot=342.800s,avg_p1=10.706]
    15StartOptimizing: 11.449*p1 + 66.291 [tot=173.589s,avg_p1=9.406]
    
     0Processing -
     1Overhead: 19.397
     2Activate: 8.248*p1 + 5.239 [tot=7443.408s,avg_p1=18.237]
     3Deactivate: 10.383*p1 + 19.030 [tot=8191.962s,avg_p1=23.952]
     4GetLinearization: 6.510*p2 + 1708.430 [tot=2634.578s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 27.491*p1 [tot=1226.355s,avg_p1=36.093]
     6MakeTopological: 23.975*p1 + 21.505*p2 + 67.403 [tot=800.173s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 2.157*p1 [tot=979.437s,avg_p1=10.536]
     8MergeChunks2: 2.712*p1 + 5.767 [tot=945.242s,avg_p1=6.161]
     9MinimizeStep1: 10.412*p1 + 14.521 [tot=5873.765s,avg_p1=11.163]
    10MinimizeStep2: 4.257*p1 + 16.187 [tot=415.172s,avg_p1=0.570]
    11PickChunkToOptimize: 1.398*p1 + 6.665 [tot=0.024s,avg_p1=1.004]
    12PickDependendencyToSplit: 7.663*p1 + 18.988 [tot=2506.101s,avg_p1=13.918]
    13PickMergeCandidate: 5.900*p1 + 0.061 [tot=1366.082s,avg_p1=3.437]
    14StartMinimizing: 16.489*p1 + 105.275 [tot=301.716s,avg_p1=10.706]
    15StartOptimizing: 11.243*p1 + 56.239 [tot=162.562s,avg_p1=9.406]
    

    i7

     0Processing -
     1Overhead: 23.853
     2Activate: 8.836*p1 [tot=7688.627s,avg_p1=18.237]
     3Deactivate: 8.886*p1 + 13.518 [tot=6924.286s,avg_p1=23.952]
     4GetLinearization: 6.331*p2 + 1877.397 [tot=2709.918s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 37.938*p1 [tot=1723.057s,avg_p1=36.093]
     6MakeTopological: 21.073*p1 + 19.898*p2 + 118.314 [tot=747.340s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 2.087*p1 [tot=876.156s,avg_p1=10.536]
     8MergeChunks2: 3.029*p1 + 1.918 [tot=896.681s,avg_p1=6.161]
     9MinimizeStep1: 10.467*p1 + 2.380 [tot=5626.419s,avg_p1=11.163]
    10MinimizeStep2: 9.502*p1 + 10.096 [tot=346.019s,avg_p1=0.570]
    11PickChunkToOptimize: 2.228*p1 + 2.602 [tot=0.024s,avg_p1=1.004]
    12PickDependendencyToSplit: 7.277*p1 + 9.292 [tot=2270.037s,avg_p1=13.918]
    13PickMergeCandidate: 6.679*p1 [tot=1451.283s,avg_p1=3.437]
    14StartMinimizing: 16.891*p1 + 150.308 [tot=336.141s,avg_p1=10.706]
    15StartOptimizing: 11.478*p1 + 62.273 [tot=169.031s,avg_p1=9.406]
    
     0Processing -
     1Overhead: 23.856
     2Activate: 8.489*p1 + 2.262 [tot=7523.903s,avg_p1=18.237]
     3Deactivate: 10.306*p1 + 16.193 [tot=8051.937s,avg_p1=23.952]
     4GetLinearization: 5.972*p2 + 1767.682 [tot=2576.220s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 27.507*p1 [tot=1221.929s,avg_p1=36.093]
     6MakeTopological: 22.901*p1 + 22.083*p2 + 62.167 [tot=794.247s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 2.059*p1 [tot=873.354s,avg_p1=10.536]
     8MergeChunks2: 2.705*p1 + 1.313 [tot=798.897s,avg_p1=6.161]
     9MinimizeStep1: 9.443*p1 + 15.224 [tot=5340.243s,avg_p1=11.163]
    10MinimizeStep2: 3.660*p1 + 12.180 [tot=318.203s,avg_p1=0.570]
    11PickChunkToOptimize: 1.246*p1 + 3.809 [tot=0.015s,avg_p1=1.004]
    12PickDependendencyToSplit: 7.589*p1 + 14.021 [tot=2423.410s,avg_p1=13.918]
    13PickMergeCandidate: 5.767*p1 [tot=1217.726s,avg_p1=3.437]
    14StartMinimizing: 16.172*p1 + 98.991 [tot=290.328s,avg_p1=10.706]
    15StartOptimizing: 11.440*p1 + 44.331 [tot=155.069s,avg_p1=9.406]
    

    rpi5-16-3

     0Processing -
     1Overhead: 23.919
     2Activate: 9.580*p1 [tot=8210.929s,avg_p1=18.237]
     3Deactivate: 10.268*p1 + 4.082 [tot=7682.228s,avg_p1=23.952]
     4GetLinearization: 6.528*p2 + 1808.705 [tot=2785.122s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 43.177*p1 [tot=1916.934s,avg_p1=36.093]
     6MakeTopological: 16.760*p1 + 31.928*p2 + 58.769 [tot=899.547s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 1.612*p1 [tot=612.147s,avg_p1=10.536]
     8MergeChunks2: 2.508*p1 + 2.073 [tot=778.955s,avg_p1=6.161]
     9MinimizeStep1: 12.241*p1 [tot=6458.627s,avg_p1=11.163]
    10MinimizeStep2: 30.695*p1 [tot=714.221s,avg_p1=0.570]
    11PickChunkToOptimize: 1.823*p1 [tot=0.012s,avg_p1=1.004]
    12PickDependendencyToSplit: 10.617*p1 + 3.206 [tot=3198.383s,avg_p1=13.918]
    13PickMergeCandidate: 11.557*p1 [tot=2321.639s,avg_p1=3.437]
    14StartMinimizing: 19.516*p1 + 82.751 [tot=321.960s,avg_p1=10.706]
    15StartOptimizing: 13.401*p1 + 39.225 [tot=171.666s,avg_p1=9.406]
    
     0Processing -
     1Overhead: 43.043
     2Activate: 9.825*p1 [tot=8114.233s,avg_p1=18.237]
     3Deactivate: 11.167*p1 [tot=8091.244s,avg_p1=23.952]
     4GetLinearization: 6.221*p2 + 1775.453 [tot=2717.624s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 34.253*p1 [tot=1516.213s,avg_p1=36.093]
     6MakeTopological: 20.470*p1 + 32.651*p2 + 42.410 [tot=956.974s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 1.078*p1 [tot=311.138s,avg_p1=10.536]
     8MergeChunks2: 2.083*p1 [tot=478.650s,avg_p1=6.161]
     9MinimizeStep1: 12.305*p1 [tot=6297.812s,avg_p1=11.163]
    10MinimizeStep2: 11.714*p1 [tot=316.065s,avg_p1=0.570]
    11PickChunkToOptimize: 1.157*p1 [tot=0.004s,avg_p1=1.004]
    12PickDependendencyToSplit: 11.091*p1 [tot=3197.864s,avg_p1=13.918]
    13PickMergeCandidate: 10.869*p1 [tot=1797.324s,avg_p1=3.437]
    14StartMinimizing: 20.431*p1 + 60.394 [tot=306.907s,avg_p1=10.706]
    15StartOptimizing: 13.106*p1 + 14.754 [tot=147.286s,avg_p1=9.406]
    

    rpi5-16-2 (same as the one above, added for redundancy)

     0Processing -
     1Overhead: 24.210
     2Activate: 9.573*p1 [tot=8198.324s,avg_p1=18.237]
     3Deactivate: 10.268*p1 + 3.742 [tot=7672.973s,avg_p1=23.952]
     4GetLinearization: 6.526*p2 + 1801.887 [tot=2779.829s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 43.270*p1 [tot=1921.183s,avg_p1=36.093]
     6MakeTopological: 15.494*p1 + 32.013*p2 + 116.802 [tot=901.244s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 1.606*p1 [tot=607.285s,avg_p1=10.536]
     8MergeChunks2: 2.504*p1 + 1.877 [tot=772.738s,avg_p1=6.161]
     9MinimizeStep1: 12.236*p1 [tot=6451.386s,avg_p1=11.163]
    10MinimizeStep2: 30.574*p1 [tot=706.228s,avg_p1=0.570]
    11PickChunkToOptimize: 1.832*p1 [tot=0.012s,avg_p1=1.004]
    12PickDependendencyToSplit: 10.625*p1 + 2.806 [tot=3196.310s,avg_p1=13.918]
    13PickMergeCandidate: 11.549*p1 [tot=2314.115s,avg_p1=3.437]
    14StartMinimizing: 19.516*p1 + 82.092 [tot=321.303s,avg_p1=10.706]
    15StartOptimizing: 13.397*p1 + 38.649 [tot=171.148s,avg_p1=9.406]
    
     0Processing -
     1Overhead: 46.657
     2Activate: 9.752*p1 [tot=8000.401s,avg_p1=18.237]
     3Deactivate: 11.147*p1 [tot=8046.825s,avg_p1=23.952]
     4GetLinearization: 6.132*p2 + 1800.206 [tot=2724.267s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 34.093*p1 [tot=1509.241s,avg_p1=36.093]
     6MakeTopological: 20.162*p1 + 32.663*p2 + 51.993 [tot=954.183s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 0.998*p1 [tot=273.622s,avg_p1=10.536]
     8MergeChunks2: 1.985*p1 [tot=419.395s,avg_p1=6.161]
     9MinimizeStep1: 12.683*p1 [tot=6443.862s,avg_p1=11.163]
    10MinimizeStep2: 8.883*p1 [tot=244.891s,avg_p1=0.570]
    11PickChunkToOptimize: 1.133*p1 [tot=0.004s,avg_p1=1.004]
    12PickDependendencyToSplit: 11.001*p1 [tot=3164.260s,avg_p1=13.918]
    13PickMergeCandidate: 10.781*p1 [tot=1727.449s,avg_p1=3.437]
    14StartMinimizing: 20.344*p1 + 51.992 [tot=298.248s,avg_p1=10.706]
    15StartOptimizing: 13.363*p1 + 10.937 [tot=146.982s,avg_p1=9.406]
    

    There are a few pending ones that OOM-ed, rerunning those with sed -i.bak 's/16384\.0, 65535/1000.0, 4000/' src/bitcoin-costmodel.cpp


    Edit:

     0Processing -
     1Overhead: 23.503
     2Activate: 9.696*p1 [tot=8328.187s,avg_p1=18.237]
     3Deactivate: 10.384*p1 + 4.251 [tot=7772.549s,avg_p1=23.952]
     4GetLinearization: 6.578*p2 + 1800.094 [tot=2786.534s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 44.630*p1 [tot=1982.389s,avg_p1=36.093]
     6MakeTopological: 16.152*p1 + 31.819*p2 + 59.978 [tot=889.380s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 1.575*p1 [tot=599.944s,avg_p1=10.536]
     8MergeChunks2: 2.542*p1 + 2.236 [tot=794.670s,avg_p1=6.161]
     9MinimizeStep1: 12.450*p1 [tot=6596.160s,avg_p1=11.163]
    10MinimizeStep2: 27.982*p1 [tot=682.729s,avg_p1=0.570]
    11PickChunkToOptimize: 1.661*p1 [tot=0.011s,avg_p1=1.004]
    12PickDependendencyToSplit: 10.239*p1 + 3.956 [tot=3093.713s,avg_p1=13.918]
    13PickMergeCandidate: 11.289*p1 [tot=2284.851s,avg_p1=3.437]
    14StartMinimizing: 19.044*p1 + 87.124 [tot=319.951s,avg_p1=10.706]
    15StartOptimizing: 13.156*p1 + 34.495 [tot=165.185s,avg_p1=9.406]
    
     0Processing -
     1Overhead: 44.073
     2Activate: 10.076*p1 [tot=8334.172s,avg_p1=18.237]
     3Deactivate: 10.990*p1 [tot=7951.719s,avg_p1=23.952]
     4GetLinearization: 6.320*p2 + 1776.519 [tot=2741.772s,avg_p1=36.093,avg_p2=142.908]
     5Initialize: 36.820*p1 [tot=1633.318s,avg_p1=36.093]
     6MakeTopological: 20.920*p1 + 32.506*p2 + 53.411 [tot=964.090s,avg_p1=27.341,avg_p2=38.756]
     7MergeChunks1: 1.054*p1 [tot=300.165s,avg_p1=10.536]
     8MergeChunks2: 2.053*p1 [tot=447.727s,avg_p1=6.161]
     9MinimizeStep1: 12.609*p1 [tot=6396.861s,avg_p1=11.163]
    10MinimizeStep2: 10.775*p1 [tot=297.619s,avg_p1=0.570]
    11PickChunkToOptimize: 1.160*p1 [tot=0.004s,avg_p1=1.004]
    12PickDependendencyToSplit: 11.072*p1 [tot=3197.601s,avg_p1=13.918]
    13PickMergeCandidate: 10.683*p1 [tot=1772.882s,avg_p1=3.437]
    14StartMinimizing: 20.417*p1 + 59.857 [tot=305.730s,avg_p1=10.706]
    15StartOptimizing: 13.465*p1 + 14.249 [tot=151.251s,avg_p1=9.406]
    
  66. DrahtBot added the label Needs rebase on Feb 18, 2026
  67. DrahtBot commented at 10:07 am on February 18, 2026: contributor
    🐙 This pull request conflicts with the target branch and needs rebase.
  68. sipa commented at 1:46 pm on February 18, 2026: member

    Thank you for all the benchmark data, @l0rinc.

    Closing this, to reopen with an uncluttered version, after updating with new numbers and methodology.

  69. sipa closed this on Feb 18, 2026

  70. sipa commented at 5:32 pm on February 18, 2026: member
    Replaced by #34616.
