Part of #30289, replaces earlier #34138.
This introduces a more accurate cost model for SFL (spanning-forest linearization), to control how much CPU time is spent inside the algorithm for clusters that cannot be linearized perfectly within a reasonable amount of time.
The goal is to have a metric for the amount of work performed, so that txmempool can impose limits on that work: a lower bound that is always performed (unless optimality is reached before that point, of course), and an upper bound to limit the latency and total CPU time spent on this. There are conflicting design goals here:
- On the one hand, it seems ideal if this metric is closely correlated to actual CPU time, because otherwise the limits become inaccurate.
- On the other hand, it seems a nightmare to have the metric be platform/system dependent, as it makes network-wide reasoning nearly impossible. It’s expected that slower systems take longer to do the same thing; this holds for everything, and we don’t need to compensate for this.
There are multiple solutions to this:
- One extreme is just measuring the time. This is very accurate, but extremely platform dependent, and also non-deterministic due to random scheduling/cache effects.
- The other extreme is using a very abstract metric, like counting how many times certain loops/functions inside the algorithm run. That is what is implemented in master right now: just counting the sum of the number of transactions updated across all `UpdateChunks()` calls. However, it necessarily fails to account for significant portions of runtime spent elsewhere, resulting in a rather wide range of “ns per cost” values.
- This PR takes a middle ground, counting many function calls / branches / loops, with weights that were determined through benchmarking, averaged over a number of systems.
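To illustrate the middle-ground approach, the metric boils down to incrementing a counter by a weighted amount at selected points in the algorithm. The following is a minimal sketch with hypothetical names and weights, not the actual interface introduced by this PR:

```cpp
// Hypothetical sketch of a weighted cost accumulator; the class name, method
// names, and weights are illustrative only, not this PR's coefficients.
#include <cstdint>

class CostTracker
{
    uint64_t m_cost{0}; //!< Abstract cost units accumulated so far.
    uint64_t m_limit;   //!< Budget after which the caller should stop.

public:
    explicit CostTracker(uint64_t limit) noexcept : m_limit{limit} {}

    /** Charge a fixed cost for entering some operation, plus a per-element
     *  cost proportional to its input size (e.g. number of transactions). */
    void Charge(uint64_t base_weight, uint64_t per_element_weight, uint64_t elements) noexcept
    {
        m_cost += base_weight + per_element_weight * elements;
    }

    bool Exhausted() const noexcept { return m_cost >= m_limit; }
    uint64_t Total() const noexcept { return m_cost; }
};

// Usage inside a (hypothetical) inner loop of the algorithm:
//   cost.Charge(/*base_weight=*/30, /*per_element_weight=*/5, /*elements=*/chunk_size);
//   if (cost.Exhausted()) break;
```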
Specifically, the cost model was obtained by:
- For a variety of machines:
  - Run a fixed collection of ~385000 clusters, found through random generation and fuzzing that optimized for difficulty of linearization.
  - Linearize each ~3000 times, with different random seeds; sometimes without an input linearization, sometimes with a bad one.
  - Gather cycle counts for each of the operations included in this cost model, broken down by their parameters.
  - Correct the data by subtracting the runtime of obtaining the cycle count itself.
  - Drop the top and bottom 5% of samples from each cycle count dataset, and compute the average of the remaining samples.
  - For each operation, fit a least-squares linear function through the samples (the trimming and fitting steps are sketched below).
- Rescale all machine expressions to make their total time match, as we only care about the relative cost of each operation.
- Take the per-operation average of operation expressions across all machines, to construct expressions for an average machine.
- Approximate the result with integer coefficients.
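For illustration, the per-machine data processing above boils down to a 5%-trimmed mean per parameter value, followed by an ordinary least-squares fit of cycles as a linear function of the operation's size parameter. A minimal standalone sketch of those two steps (not code from this PR, and assuming a single scalar parameter per operation):

```cpp
// Standalone sketch of the offline analysis steps; not part of the PR itself.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

/** Drop the top and bottom 5% of samples and average the rest. */
double TrimmedMean(std::vector<double> samples)
{
    std::sort(samples.begin(), samples.end());
    const size_t drop = samples.size() / 20; // 5% on each side
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = drop; i + drop < samples.size(); ++i) {
        sum += samples[i];
        ++count;
    }
    return count ? sum / count : 0.0;
}

/** Least-squares fit of cycles = a + b * size through (size, mean_cycles) points,
 *  returning {a, b}. */
std::pair<double, double> LinearFit(const std::vector<std::pair<double, double>>& points)
{
    if (points.empty()) return {0.0, 0.0};
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (const auto& [x, y] : points) {
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    const double n = points.size();
    const double denom = n * sxx - sx * sx;
    const double b = denom != 0.0 ? (n * sxy - sx * sy) / denom : 0.0;
    const double a = (sy - b * sx) / n;
    return {a, b};
}
```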
The benchmarks were performed by l0rinc <pap.lorinc@gmail.com> and myself, on AMD Ryzen 5950X, AMD Ryzen 7995WX, AMD Ryzen 9980X, Apple M4 Max, Intel Core i5-12500H, Intel Core Ultra 7 155H, Intel N150 (Umbrel), Intel Core i7-7700, Intel Core i9-9900K, Intel Haswell (VPS, virtualized), ARM Cortex-A76 (Raspberry Pi 5).
Based on final benchmarking, the “acceptable” iteration count (which is the minimum spent on every cluster) is set to 64000 units, which corresponds to roughly 50 μs on a Ryzen 5950X and similar modern desktop hardware.
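To illustrate how such a budget might be consumed by a caller (the struct, its placeholder improvement step, and the numbers below are hypothetical, not the actual txmempool integration): the acceptable budget is always granted, the caller may grant more, and work stops early once optimality is proven.

```cpp
// Hypothetical sketch of budget-limited linearization; names and the simulated
// improvement step are illustrative only.
#include <algorithm>
#include <cstdint>

static constexpr uint64_t ACCEPTABLE_COST = 64000; //!< Minimum budget per cluster (~50 us on a Ryzen 5950X).

struct Work {
    uint64_t cost{0};    //!< Abstract cost units consumed so far.
    bool optimal{false}; //!< Whether an optimal linearization has been proven.

    //! Placeholder for one improvement step of the algorithm: it would charge
    //! its weighted cost and may eventually prove optimality (simulated here).
    void Step() { cost += 100; if (cost >= 5000) optimal = true; }
};

//! Run a cluster with a caller-provided extra budget; the acceptable budget is
//! always granted, and work stops as soon as optimality is reached.
uint64_t RunCluster(uint64_t extra_budget)
{
    const uint64_t budget = std::max(ACCEPTABLE_COST, extra_budget);
    Work work;
    while (!work.optimal && work.cost < budget) work.Step();
    return work.cost;
}
```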