Cluster mempool: more accurate cost model for SFL #34138

Pull: sipa wants to merge 19 commits into bitcoin:master from sipa:202512_sfl_costs, changing 7 files (+729 −445)
  1. sipa commented at 2:57 pm on December 22, 2025: member

    Builds on top of #34023, part of #30289.

    This introduces a more accurate cost model for SFL, to control how much CPU time is spent inside the algorithm for clusters that cannot be linearized perfectly within a reasonable amount of time.

    The goal is to have a metric for the amount of work performed, so that txmempool can impose limits on that work: a lower bound of work that is always performed (unless optimality is reached before that point, of course), and an upper bound to limit the latency and total CPU time spent on this. There are conflicting design goals here:

    • On the one hand, it seems ideal if this metric is closely correlated to actual CPU time, because otherwise the limits become inaccurate.
    • On the other hand, it seems a nightmare to have the metric be platform/system dependent, as it makes network-wide reasoning nearly impossible. It’s expected that slower systems take longer to do the same thing; this holds for everything, and we don’t need to compensate for this.

    There are multiple solutions to this:

    • One extreme is just measuring the time. This is very accurate, but extremely platform dependent, and also non-deterministic due to random scheduling/cache effects.
    • The other extreme is using a very abstract metric, like counting how many times certain loops/functions inside the algorithm run. That is what is implemented as of #32545 and #34023: just counting the sum of the numbers of transactions updated across all UpdateChunks() calls. However, it necessarily fails to account for significant portions of the runtime spent elsewhere, resulting in a rather wide range of “ns per cost” values.
    • This PR takes a middle ground, counting many function calls / branches / loops, with weights that were determined through benchmarking on a reference system (Zen 4 based); a sketch of such a weighted counter follows below.

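    To illustrate the intended middle ground, here is a minimal sketch of a weighted operation counter (all names and weights below are illustrative, not this PR’s actual API):

      // Hypothetical sketch: accumulate benchmark-derived per-operation costs,
      // so the caller can stop optimizing once a budget is exhausted.
      #include <cstdint>

      struct CostModel {
          // Example weights in abstract cost units (~ns on the reference system).
          static constexpr uint64_t ACTIVATE_BASE = 15;       // fixed cost per Activate() call
          static constexpr uint64_t ACTIVATE_PER_TX = 4;      // per transaction updated
          static constexpr uint64_t PICK_MERGE_PER_CHUNK = 6; // per chunk visited

          uint64_t m_cost{0};

          void CountActivate(unsigned num_tx) { m_cost += ACTIVATE_BASE + ACTIVATE_PER_TX * num_tx; }
          void CountPickMerge(unsigned num_chunks) { m_cost += PICK_MERGE_PER_CHUNK * num_chunks; }
          bool OverBudget(uint64_t budget) const { return m_cost >= budget; }
      };
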
    This is draft because:

    • The weights used are outdated, as several non-trivial improvements were made to #32545 and #34023 since.
    • I don’t know if it is the right middle ground, and it seems good to have some conceptual discussion first:
      • Using benchmarks on a reference system inherently captures some platform specifics, making it possible that the metric is not very accurate on other systems.
      • A benchmark-based metric is nontrivial to maintain.
  2. DrahtBot commented at 2:57 pm on December 22, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34138.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34085 (cluster mempool: exploit SFL properties in txgraph by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

    • [m_nonminimal_chunks.emplace_back(chunk_idx, pivot_idx, flags ^ 3)] in src/cluster_linearize.h


  3. sipa renamed this:
    202512 sfl costs
    Cluster mempool: more accurate cost model for SFL
    on Dec 22, 2025
  4. sipa force-pushed on Dec 22, 2025
  5. DrahtBot added the label CI failed on Dec 22, 2025
  6. sipa force-pushed on Dec 22, 2025
  7. DrahtBot removed the label CI failed on Dec 22, 2025
  8. sipa force-pushed on Dec 22, 2025
  9. DrahtBot added the label CI failed on Dec 22, 2025
  10. DrahtBot commented at 10:24 pm on December 22, 2025: contributor

    🚧 At least one of the CI tasks failed.
    Task lint: https://github.com/bitcoin/bitcoin/actions/runs/20443977146/job/58743290866
    LLM reason (✨ experimental): Lint check failed due to scripted_diff differences in the patch.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  11. sipa force-pushed on Dec 22, 2025
  12. DrahtBot removed the label CI failed on Dec 23, 2025
  13. gmaxwell commented at 7:47 pm on December 24, 2025: contributor

    My thoughts:

    It should use a metric like this, not time, because it would be weird and undesirable for the behavior to change a lot based just on what system you’re on or what you’re doing with it. People expect slower computers to have higher CPU utilization, and the intended limits are low enough that I think they won’t be problematic even on quite slow computers.

    An operation-based metric tuned to the performance of any specific computer will likely be much more reflective of the real resources used on another kind of computer than some simple iteration count – so even if something profiled on a single computer doesn’t reflect yours, it beats keeping the iteration-based approach.

    One possibility would be to benchmark on several modern architectures like zen4, macpro, and graviton, normalize their coefficients for equal time on some corpus, and then average them. That is, the weights for the different measures are just the average across the systems, up to some per-system scaling factor reflecting the expected average performance difference. (Or, in the context of the regression used to fit this, fit a model on the coefficients with Archs-1 additional free scaling factors and then discard the scaling factors) – e.g. minimize ((A*x+B*y+C*z+..)+Arch1*(A*x+B*y+C*z+..)+Arch2*(A*x+B*y+C*z+..)-(Arch0time+Arch1time+Arch2time+…))^2. This wouldn’t reflect the cost on any specific system perfectly, but it might give a better estimate overall, across multiple systems. And my argument of “a wrong cost is still better than no cost model” still applies. The downside is that you’d need to measure multiple systems to calibrate it.
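    One way to read the proposed fit, with per-architecture scale factors s_a (s_0 fixed to 1) that are discarded after fitting (this is an interpretation, not necessarily the exact formulation intended):

      \min_{A,B,C,\dots,\,s_1,\dots,s_{k-1}} \; \sum_{a=0}^{k-1} \sum_{j} \Bigl( s_a \, (A x_j + B y_j + C z_j + \dots) - t_{a,j} \Bigr)^2, \qquad s_0 = 1

    where x_j, y_j, z_j, … are the operation counts for cluster j (identical across systems, the algorithm being deterministic) and t_{a,j} is the measured time for cluster j on architecture a.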

    An alternative might be to just use the coefficients for whatever popular modern hardware you think is most resource constrained – better to have the unexpectedly slow runs impact the fast computers more. And at least this approach only requires the extensive profiling on one system.

    As far as which transactions calibration should be done against, a corpus of historical clusters is probably okay. It would probably make sense to exclude easy clusters from the corpus, e.g. ones with fewer than 10 transactions: you only care whether the metric is accurate for times near the limit. I don’t think using artificial cases is that helpful, as the metric will only be accurate ‘on average’ anyways.

    I don’t think the coefficients need to be that well maintained over time, except after future algorithmic optimizations, simply because doing anything here is better than just iteration counting. And since it’s desirable to behave mostly consistently (so when someone asks why some suboptimal cluster was picked, you can just look and see that finding it was over the limit, etc.), you wouldn’t want to update them often – probably only when advances in typical computer speeds also justify meaningfully increasing the limit overall, such that almost all transactions would get as much or more optimization permitted under the new limits.

  14. clusterlin: split chunks based on maximum gain (optimization)
    This introduces the notion of gain to the SFL algorithm. Given a chunk c,
    an active dependency d in it, and the chunks (t, b) that c would split into
    if d were deactivated, the gain is defined as either (they are equivalent):
    
      (feerate(t) - feerate(b)) * size(t) * size(b)
      fee(t) * size(b) - fee(b) * size(t)
    
    It happens to also be equal to these:
    
      (feerate(t) - feerate(c)) * size(t) * size(c)
      fee(t) * size(c) - fee(c) * size(t)
    
    Its relevance is that this metric is proportional to a lower bound on the area
    under the fee-size diagram which would be gained IF a deactivation of d does not
    result in a self-merge of t and b again.
    
    This commit adds logic to find, within each chunk, the dependency with the highest
    gain. In benchmarks, this appears to be a very good heuristic for deciding which
    splits are worth making.
    bc27102438
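    A minimal sketch of the gain computation above, using the integer form fee(t)*size(b) - fee(b)*size(t) (the types and names here are illustrative, not the PR’s actual code):

      #include <cstdint>

      struct FeeSize { int64_t fee; int32_t size; };

      // Gain of deactivating the dependency that splits a chunk into top t and
      // bottom b; positive iff feerate(t) > feerate(b), i.e. the split helps.
      int64_t SplitGain(const FeeSize& t, const FeeSize& b) {
          return t.fee * int64_t{b.size} - b.fee * int64_t{t.size};
      }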
  15. clusterlin: split tx/chunk dep counting (preparation)
    This splits the chunk_deps variable in LoadLinearization in two, one for
    tracking tx dependencies and one for chunk dependencies. This is a
    preparation for a later commit, where chunks won't be identified anymore
    by a representative transaction in them, but by a separate index. With
    that, it seems weird to keep them both in the same structure if they
    will be indexed in an incompatible way.
    c5fee18a3f
  16. clusterlin: count chunk deps without loop (optimization)
    This small optimization avoids the need to loop over the parents of each
    transaction when initializing the dependency-counting structures inside
    GetLinearization().
    99736b7800
  17. scripted-diff: rename _rep -> _idx in SFL
    This is a preparation for the next commit, where chunks will no longer
    be identified using a representative transaction, but using a set index.
    Reduce the amount of line changes there by doing this rename ahead of time.
    
    -BEGIN VERIFY SCRIPT-
    sed --in-place 's/_rep/_idx/g' src/cluster_linearize.h
    -END VERIFY SCRIPT-
    0876d384e6
  18. clusterlin: get rid of DepData, reuse sets (optimization)
    This significantly changes the data structures used in SFL, based on the
    observation that the DepData::top_setinfo fields are quite wasteful:
    there is one per dependency (up to n^2/4), but we only ever need one per
    active dependency (of which there are at most n-1). In total, the number of
    chunks plus the number of active dependencies is always exactly equal to
    the number of transactions (the active dependencies form a spanning forest:
    a forest with n nodes and k trees has exactly n-k edges), so it makes sense
    to have a shared pool of SetInfos, which are used for both chunks and top
    sets.
    
    To that effect, introduce a separate m_set_info variable, which stores a
    SetInfo per transaction. Some of these are used for chunk sets, and some
    for active dependencies' top sets. Every activation transforms the
    parent's chunk into the top set for the new dependency. Every
    deactivation transforms the top set into the new parent chunk.
    
    With that, there is little need for DepData anymore. Use parent/child
    TxIdxs to refer to dependencies, and find their top set by having a
    child TxIdx-indexed vector in each TxData, rather than a list of
    dependencies. This makes code for iterating over dependencies more
    natural and simpler.
    
    With indexes into m_set_data (SetIdx) becoming bounded by the number of
    transactions, we can use a SetType to represent sets of SetIdxs.
    Specifically, an m_chunk_idxs set is added which contains all SetIdxs
    referring to chunks. This leads to a much more natural way of iterating
    over chunks.
    
    Also use this opportunity to normalize many variable names.
    a9d05cc3f7
  19. clusterlin: improve TxData::dep_top_idx type (optimization)
    The combined size of TxData::dep_top_idx can be 16 KiB with 64
    transactions and SetIdx = uint32_t. Use a smaller type where possible to
    reduce memory footprint and improve cache locality of m_tx_data.
    
    Also switch from an std::vector to an std::array, reducing allocation
    overhead and indirections.
    7726f67db2
  20. clusterlin: abstract out functions from MergeStep (refactor)
    This is a simple refactor to make the code more readable.
    f452e1b7e5
  21. clusterlin: simplify PickMergeCandidate (optimization)
    The current process consists of iterating over the transactions of the
    chunk one by one, and then for each figuring out which of its
    parents/children are in unprocessed chunks.
    
    Simplify this (and speed it up slightly) by splitting this process into
    two phases: first determine the union of all parents/children, and then
    find which chunks those belong to.
    e4d5ef982c
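    A rough sketch of the two-phase approach above, assuming a fixed-size bitset as the transaction-set type (all types and names are illustrative):

      #include <bitset>
      #include <cstddef>
      #include <set>
      #include <vector>

      using TxSet = std::bitset<64>; // illustrative transaction-set type

      // Phase 1: take the union of all parents of the chunk's transactions.
      // Phase 2: map the resulting transactions to the chunks they belong to.
      std::set<unsigned> ParentChunks(const TxSet& chunk,
                                      const std::vector<TxSet>& parents,
                                      const std::vector<unsigned>& chunk_of_tx)
      {
          TxSet all_parents;
          for (std::size_t i = 0; i < parents.size(); ++i) {
              if (chunk[i]) all_parents |= parents[i];
          }
          all_parents &= ~chunk; // only transactions outside the chunk matter
          std::set<unsigned> result;
          for (std::size_t i = 0; i < chunk_of_tx.size(); ++i) {
              if (all_parents[i]) result.insert(chunk_of_tx[i]);
          }
          return result;
      }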
  22. clusterlin: precompute reachable sets (optimization)
    Instead of computing the set of reachable transactions inside
    PickMergeCandidate, precompute the information, and update it in
    Activate (by merging the two chunks' reachable sets) and Deactivate (by
    recomputing).
    
    This is a small performance gain in itself, but also a preparation for
    future optimizations that rely on quickly testing whether dependencies
    between chunks exist.
    aaebb6c904
  23. clusterlin: make MergeSequence take SetIdx (simplification)
    Future changes will rely on knowing the chunk indexes of the two created
    chunks after a split. It is natural to return this information from
    Deactivate, which also simplifies MergeSequence.
    13942f6930
  24. clusterlin: special-case self-merges (optimization)
    After a split, if the top part has a dependency on the bottom part, the
    first MergeSequence will always perform this merge and then stop. This
    is referred to as a self-merge.
    
    We can special-case these by detecting self-merges early, avoiding
    the overhead of a full MergeSequence, which involves two
    PickMergeCandidate calls (a successful and an unsuccessful one).
    c7c076d6df
  25. clusterlin: keep track of active children (optimization)
    This means we can iterate over all active dependencies in a
    cluster/chunk in O(ntx) time rather than O(ndeps) (*), as the number of
    active dependencies in a set of transactions of size ntx is at most ntx-1.
    
    (*) Asymptotically, this is not actually true, as for large transaction
    counts, iterating over a BitSet still scales with ntx. In practice
    however, where BitSets are represented by a constant number of integers,
    it holds.
    ab0d2285ab
  26. clusterlin: track suboptimal chunks (optimization)
    This avoids adding chunks a second time to m_suboptimal_chunks when they
    happen to already be there.
    383cf262c8
  27. clusterlin: use random split in SFL on every 3rd attempt (feature)
    Out of an abundance of caution that adversarially-constructed clusters might
    reliably result in bad chunk split decisions with the maximum-gain strategy,
    make every third consecutive attempt to split the same chunk use a random
    split strategy instead.
    8a896005c1
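    A sketch of the attempt-scheduling logic described above (the counter name is hypothetical):

      // Attempts 0 and 1 on the same chunk use the maximum-gain split;
      // attempt 2 uses a random split; the pattern then repeats.
      bool UseRandomSplit(unsigned consecutive_attempts) {
          return consecutive_attempts % 3 == 2;
      }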
  28. clusterlin: unidirectional MakeTopological initially (optimization)
    It suffices to initially only attempt one direction of merges in
    MakeTopological(), and only try both directions on chunks that are the
    result of other merges.
    9342728274
  29. clusterlin: minimize chunks (feature)
    After the normal optimization process finishes, and finds an optimal
    spanning forest, run a second process (while computation budget remains)
    to split chunks into minimal equal-feerate chunks.
    c990f53dc4
  30. clusterlin: inline UpdateChunk into (De)Activate (optimization)
    The two calls to UpdateChunk, in each of Activate and Deactivate, are subtly
    different: the top one needs to update the chunk_idx of iterated
    transactions, while the bottom one leaves it unchanged. To exploit this
    difference, inline the four function calls, getting rid of UpdateChunk.
    
    This is also a preparation for a future improvement that inlines the
    recomputation of reachable sets in the same loop in Deactivate.
    cd20513ef6
  31. clusterlin: inline GetReachable into Deactivate (optimization)
    Avoid two full iterations over all of a chunk's transactions to
    recompute the reachable sets, by inlining them into the
    dependency-updating loops.
    cf0d8da590
  32. clusterlin: add more accurate cost modeling (feature)
    This adds a rough estimate of algorithm runtime, so it can be interrupted if no
    solution is found in time. Due to inherent differences between platforms, this
    will not be extremely accurate, but for consistency it is preferable over
    directly measuring time.
    
    The runtime corresponds to up to 2 ns/cost now on modern desktop systems, so
    increase ACCEPTABLE_ITERS correspondingly.
    04b9afa063
  33. sipa force-pushed on Dec 26, 2025
  34. sipa commented at 6:45 pm on December 27, 2025: member

    An alternative might be to just use the coefficients for whatever popular modern hardware you think is most resource constrained – better to have the unexpectedly slow runs impact the fast computers more. And at least this approach only requires the extensive profiling on one system.

    Fair enough, but an argument can also be made in favor of using metrics tuned for a relatively high-end system, in that those are likely a better match for the machines on the network that, in the short to medium term future, will be responsible for actual transaction and block propagation. The numbers are such that any node, even one running very low-end hardware, will not have trouble keeping up with mempool/relay, but from a network perspective, it’s a well-connected subset of relatively fast systems that is responsible for relay.

    I don’t think using artificial cases is that helpful, as the metric will only be accurate ‘on average’ anyways.

    Hmm, that does risk situations where adversarially-constructed transactions are significantly (but still probably not more than a small factor, say 2x-3x) more expensive per cost. I think it makes sense to use as wide a variety of clusters as possible, real and synthetic, to calibrate.

  35. gmaxwell commented at 7:51 pm on December 27, 2025: contributor

    Yeah, I was trying to capture ‘reflective of the future’ in the word “modern”. Fair point on the relatively fast systems being the ones that count – while slow systems are just ‘also-rans’, and it’s sufficient that they just aren’t broken. Fair enough.

    Hmm, that does risk situations where adversarially-constructed transactions are significantly (but still probably not more than a small factor, say 2x-3x) more expensive per cost.

    A concern I had with synthetics is that with artificially hard clusters the costs might end up distorted by ‘hard’ transactions tending to trigger more of an operation, and having that cost inflated in the regression even though that operation itself is actually fast. It might make sense to have a starting offset based on the number of dependencies, or to give some costs a multiplier based on the number of dependencies involved.

  36. sipa commented at 0:13 am on December 30, 2025: member

    A concern I had with synthetics is that with artificially hard clusters the costs might end up distorted by ‘hard’ transactions tending to trigger more of an operation, and having that cost inflated in the regression even though that operation itself is actually fast.

    FWIW, the weights in the current (though now outdated) PR were determined by benchmarking small pieces of code (functions, sometimes individual loop executions) and running linear regression on those. I also tried a regression relating overall linearization time to various linear factors (the number of times each function/loop is executed), but that turned out to be far too noisy; the result ended up containing negative factors, etc.

    So I think that when #34023 is in, I will redo the numbers here with the same approach. For every function or non-trivial branch in the algorithm, benchmark it, and gather a histogram of its runtimes, broken down by loop iterations (e.g. in Activate broken down by #tx affected, in PickMergeChunk broken down by number of chunks visited, …). Then look at the 90th or 99th percentile of those runtimes, and perform linear regression with just one variable, that iteration count.

    The exact composition of the training data set shouldn’t really matter, except to the extent that it has some variety, and doesn’t consist of, say, all clusters with deps=tx-1 or so.
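
    As a sketch of the described calibration, one could fit time ~= base + slope * iterations by least squares over (iteration count, percentile runtime) pairs per function/branch (all names here are illustrative):

      #include <utility>
      #include <vector>

      struct Fit { double base; double slope; };

      // Ordinary least squares over (iterations, runtime-percentile) samples.
      Fit FitCost(const std::vector<std::pair<double, double>>& samples) {
          double n = samples.size(), sx = 0, sy = 0, sxx = 0, sxy = 0;
          for (const auto& [x, y] : samples) {
              sx += x; sy += y; sxx += x * x; sxy += x * y;
          }
          const double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
          return {(sy - slope * sx) / n, slope};
      }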

  37. DrahtBot added the label Needs rebase on Jan 6, 2026
  38. DrahtBot commented at 5:16 pm on January 6, 2026: contributor
    🐙 This pull request conflicts with the target branch and needs rebase.
