Optimized SFL cluster linearization

sipa commented at 8:21 pm on December 6, 2025: member

Follow-up to #32545, part of #30289.

This contains many of the optimizations that were originally part of #32545 itself.

Here is a list of commits and the geometric average of the benchmark timings. Note that these aren’t worst cases, but because of the nature of the optimizations here, I do expect them to apply roughly equally to all kinds of clusters. In other words, the relative improvement shown by these numbers should be representative:

commit title	ns per optimal linearization
clusterlin: split tx/chunk dep counting (preparation)	24,760.30
clusterlin: count chunk deps without loop (optimization)	24,677.64
scripted-diff: rename _rep -> _idx in SFL	24,640.08
clusterlin: get rid of DepData, reuse sets (optimization)	24,389.01
clusterlin: improve TxData::dep_top_idx type (optimization)	22,578.58
clusterlin: abstract out functions from MergeStep (refactor)	22,577.15
clusterlin: split up OptimizeStep (refactor)	22,981.11
clusterlin: simplify PickMergeCandidate (optimization)	22,018.63
clusterlin: precompute reachable sets (optimization)	21,194.91
clusterlin: make MergeSequence take SetIdx (simplification)	21,135.60
clusterlin: special-case self-merges (optimization)	20,588.13
clusterlin: keep track of active children (optimization)	13,911.22
clusterlin: track suboptimal chunks (optimization)	13,629.42
clusterlin: unidirectional MakeTopological initially (optimization)	12,796.57
clusterlin: inline UpdateChunk into (De)Activate (optimization)	12,706.35
clusterlin: inline GetReachable into Deactivate (optimization)	12,525.66

And to show that they apply to all clusters roughly similarly:

DrahtBot commented at 8:21 pm on December 6, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34023.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	instagibbs, ajtowns

If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#34138 (Cluster mempool: more accurate cost model for SFL by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

sipa force-pushed on Dec 6, 2025

DrahtBot added the label CI failed on Dec 6, 2025

DrahtBot commented at 8:52 pm on December 6, 2025: contributor

🚧 At least one of the CI tasks failed. Task 32 bit ARM: https://github.com/bitcoin/bitcoin/actions/runs/19993781856/job/57338323718 LLM reason (✨ experimental): Cluster linearize tests fail due to a size mismatch (linearization size != depgraph TxCount) triggering assertion failures and aborted tests.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Dec 6, 2025

sipa force-pushed on Dec 7, 2025

DrahtBot removed the label CI failed on Dec 7, 2025

sipa force-pushed on Dec 7, 2025

sipa force-pushed on Dec 9, 2025

sipa force-pushed on Dec 10, 2025

sipa force-pushed on Dec 11, 2025

DrahtBot added the label CI failed on Dec 12, 2025

sipa force-pushed on Dec 12, 2025

DrahtBot removed the label CI failed on Dec 12, 2025

sipa force-pushed on Dec 12, 2025

DrahtBot added the label CI failed on Dec 12, 2025

DrahtBot commented at 11:07 pm on December 12, 2025: contributor

🚧 At least one of the CI tasks failed. Task Windows-cross to x86_64, ucrt: https://github.com/bitcoin/bitcoin/actions/runs/20182154259/job/57944831923 LLM reason (✨ experimental): Compiler error: -Werror treats a range-for over a tuple as an error in cluster_linearize.h, causing the build to fail.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on Dec 13, 2025

sipa force-pushed on Dec 13, 2025

sipa force-pushed on Dec 14, 2025

DrahtBot added the label CI failed on Dec 14, 2025

DrahtBot removed the label CI failed on Dec 15, 2025

sipa force-pushed on Dec 15, 2025

sipa force-pushed on Dec 16, 2025

sipa commented at 9:23 pm on December 16, 2025: member

I have moved the changes to TxGraph here to a separate PR, #34085, as it’s not actually dependent on the optimizations here.

sipa force-pushed on Dec 17, 2025

sipa force-pushed on Dec 18, 2025

sipa force-pushed on Dec 19, 2025

sipa commented at 3:31 pm on December 19, 2025: member

Rebased after merge of #32545, but not marking as Ready for review yet, as I’m investigating another data structure layout. If that turns out to be fruitful, that may replace some of the commits here, so I don’t want to waste reviewers’ time on those yet.

sipa force-pushed on Dec 21, 2025

sipa marked this as ready for review on Dec 21, 2025

sipa commented at 10:41 pm on December 21, 2025: member

Ready for review.

I have made a substantial change here, mostly in the new “get rid of DepData, reuse sets (optimization)” commit. It’s cleaner code, but does a few things quite differently from the merged code. As a result, it touches a good portion of the SFL code due to data structure changes. I’ve used the opportunity to make variable names more consistent throughout.

Sorry for this - if I had considered this approach before #32545, I’d have used a data structure layout closer to this one from the start, avoiding such a big change. As is however, I don’t see a good way of splitting it up into smaller meaningful changes.

sipa force-pushed on Dec 22, 2025

DrahtBot added the label CI failed on Dec 22, 2025

DrahtBot removed the label CI failed on Dec 22, 2025

sipa force-pushed on Dec 22, 2025

DrahtBot added the label CI failed on Dec 22, 2025

DrahtBot commented at 10:29 pm on December 22, 2025: contributor

🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/actions/runs/20444032805/job/58743457093 LLM reason (✨ experimental): Lint check failed due to scripted-diffs detecting mismatches in code changes.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on Dec 23, 2025

sipa force-pushed on Dec 26, 2025

sipa commented at 8:37 pm on December 26, 2025: member

Made a number of improvements:

Moved some of the logic to new functions (PickRandomTx, GetReachable, PickMergeCandidate), hopefully making things more readable.
Added a precomputation for the set of reachable transactions (union of out-of-chunk parents/children within a chunk) rather than computing them on the fly at merging time in Activate() and Deactivate().
Added a special case for self-merges in Improve(), avoiding the need for MergeSequence() in that case.
Made chunk minimization use randomly-chosen pivot transactions, out of a concern that the earlier deterministic choice could be used to create reliably-slow-to-minimize clusters.
Optimized the chunk minimization logic to not retry certain cases that can be predicted to be useless: when a split of a chunk is found in its second phase (so with pivot down for example, after pivot up has been tried and failed already), then the created sub-chunk that holds the pivot can reuse the same pivot, direction, and phase, because we know no split with the pivot in the other direction is possible.

sipa force-pushed on Dec 26, 2025

DrahtBot added the label CI failed on Dec 26, 2025

DrahtBot removed the label CI failed on Dec 26, 2025

sipa commented at 6:17 pm on January 1, 2026: member

Ugh, this cluster takes 1.25 ms on my Ryzen 5950X machine to linearize optimally on average (most of the time is in optimizing, the time spent on minimizing chunks is negligible). For comparison, the worst existing benchmark on the same system takes 0.051 ms.

001020001020101020201020301020401020501020601020701020801020901020a01020b01020c01020d01020e01020f0102100102110102127b02137b02147b021501020001000000000100000000010001000000000e0102010000000000000000000001000000000000020f0102020001000101000000000000000000000000000e01020300000001010100000000000000000000000004010204000000010001000000000000000000000000000e010205000000000000000000010000000001000000000b0102060000000000000100000000000000000000010009010207000100000100000000000000000000000000000501020800000000000000000000000000000000010000001a01020a00000000000000000000000000000000000000001701020a00000000000000000000000000000000000100001401020b00000000010000000000000000000000000000001401020c00000000000000000000000000000001000000000801020d00000000000000000100000000000000000000000701020e00000000000000000100000000000000000000000601020f000100000000000000000000000000000000000006010210000000000000000000000001000000000000000003010211000000000001000000000000000000000000000001010212000000000000000000000000000000000000000000220102130000000000000000000000000000000000000000001a0102140000000000000000000000000000000000000000000c010215000000000000000000000000000000000000000000070102000000010000010000000000000000000000000029010201000000000001000000000000010000000000000c01020200000000000000000100000100000000000000090102030000000100010000000000000000000000000001010204000101000000000000000000000000000000000101020500000000000000000000000000000000010000002401020600000001000000000000000000000000000000001d01020700000000000000000000000000000001000000001c01020800000000000000000000000000000000010000001c01020900000000000000000000000000000000010000001501020a00000000000000000000000000000000000001000901020b00000000000000000000000000000000000100000101020c0000000000000000000000000000000000000000002e01020d0000000000000000000000000000000000000000002c01020e0000000000000000000000000000000000000000002b01020f0000000000000000000000000000000000000000001e0102100000000000000000000000000000000000000000000a01021100000000000000000000000000000000000000000006010212000000000000000000000000000000000000000000040102130000000000000000000000000000000000000000000000

Interestingly, it’s only 0.15 ms when using a splitting strategy consisting of first 2 random split attempts on a given chunk, and then all max-gain splits.

instagibbs commented at 6:50 pm on January 3, 2026: member

huh, I was playing around with your new worst case SFL example. I’m getting ~2ms per optimal solve which roughly matches your number.

But, if I modify the harness to do 1'700*5 iterations per Linearize, and loop until optimal passing in prior linearizations emitted (was curious how bad it would be with DoWork), I’m getting 245mics to optimal? (edit: looks like it’s taking just 2 iterations at this level)

As I add more to max iterations, the performance starts to progressively get worse until it solves it in one call.

If I drop it to down to 1,700 iterations only, it becomes 150mics average. Any lower and it starts to fail solving it ever since we’re presumably not making enough progress to overcome the fact we’re losing SFL state in between.

I guess this is randomness bailing us out once every N invocations rather than getting stuck in a loop deep somewhere?

sipa commented at 4:29 pm on January 5, 2026: member

@instagibbs Interesting, that’s arguably a more relevant metric in the grand scheme of things, though it doesn’t say much here, as the example isn’t specifically designed to need a high “minimum iterations to make progress” - it’s very well possible that examples exist that are far worse than this one in that regard.

These outlier long optimal linearization times seem very dependent on the particular splitting strategy used. The current PR here uses max-gain always, except every third split is done randomly (so random is used iff m_self_merges for a chunk is 2, 5, 8, 11, …). It seems like there may be much better (and much worse) choices here. I’m investigating this more.

sipa commented at 0:04 am on January 6, 2026: member

This is pretty unexpected. I’ve analyzed a number of split strategies, and gathered the maximum iteration count needed to reach optimal, under a large-ish (~380000 clusters) rather arbitrarily constructed set of input clusters (while providing them with bad or missing input linearization).

 0MMMMMMMMMMMM: ((m_self_merges[chunk_idx] + 0) % 1) == 0: 387722  # always max_gain
 1MrMrMrMrMrMr: ((m_self_merges[chunk_idx] + 0) % 2) == 0: 5614508
 2MrrMrrMrrMrr: ((m_self_merges[chunk_idx] + 0) % 3) == 0: 3354629
 3MrrrMrrrMrrr: ((m_self_merges[chunk_idx] + 0) % 4) == 0: 1648066
 4MrrrrMrrrrMr: ((m_self_merges[chunk_idx] + 0) % 5) == 0: 2139700
 5MrrrrrMrrrrr: ((m_self_merges[chunk_idx] + 0) % 6) == 0: 1021390
 6
 7rrrrrrrrrrrr: ((m_self_merges[chunk_idx] + 0) % 1) != 0: 100000000 # always random
 8rMrMrMrMrMrM: ((m_self_merges[chunk_idx] + 0) % 2) != 0: 100000000
 9rMMrMMrMMrMM: ((m_self_merges[chunk_idx] + 0) % 3) != 0: 100000000
10rMMMrMMMrMMM: ((m_self_merges[chunk_idx] + 0) % 4) != 0: 4479215
11rMMMMrMMMMrM: ((m_self_merges[chunk_idx] + 0) % 5) != 0: 861766
12rMMMMMrMMMMM: ((m_self_merges[chunk_idx] + 0) % 6) != 0: 459358
13
14MMMMMMMMMMMM: ((m_self_merges[chunk_idx] + 1) % 1) == 0: 387722    # always max_gain, same as ((m_self_merges[chunk_idx] + 0) % 1) == 0
15rMrMrMrMrMrM: ((m_self_merges[chunk_idx] + 1) % 2) == 0: 100000000 # same as ((m_self_merges[chunk_idx] + 0) % 2) != 0
16rrMrrMrrMrrM: ((m_self_merges[chunk_idx] + 1) % 3) == 0: 100000000
17rrrMrrrMrrrM: ((m_self_merges[chunk_idx] + 1) % 4) == 0: 100000000
18rrrrMrrrrMrr: ((m_self_merges[chunk_idx] + 1) % 5) == 0: 100000000
19rrrrrMrrrrrM: ((m_self_merges[chunk_idx] + 1) % 6) == 0: 100000000
20
21rrrrrrrrrrrr: ((m_self_merges[chunk_idx] + 1) % 1) != 0: 100000000 # always random, same as ((m_self_merges[chunk_idx] + 0) % 1) != 0
22MrMrMrMrMrMr: ((m_self_merges[chunk_idx] + 1) % 2) != 0: 5614508 # same as ((m_self_merges[chunk_idx] + 0) % 2) == 0
23MMrMMrMMrMrM: ((m_self_merges[chunk_idx] + 1) % 3) != 0: 3659198 # the current PR
24MMMrMMMrMMrM: ((m_self_merges[chunk_idx] + 1) % 4) != 0: 1083539
25MMMMrMMMMrMM: ((m_self_merges[chunk_idx] + 1) % 5) != 0: 650703
26MMMMMrMMMMMr: ((m_self_merges[chunk_idx] + 1) % 6) != 0: 668819
27
28MMMMMMMMMMMM: m_self_merges[chunk_idx] >= 0: 387722 # always max_gain, same as ((m_self_merges[chunk_idx] + 0) % 1) == 0
29rMMMMMMMMMMM: m_self_merges[chunk_idx] >= 1: 388318
30rrMMMMMMMMMM: m_self_merges[chunk_idx] >= 2: 594013
31rrrMMMMMMMMM: m_self_merges[chunk_idx] >= 3: 8543778
32rrrrMMMMMMMM: m_self_merges[chunk_idx] >= 4: 27802956
33rrrrrMMMMMMM: m_self_merges[chunk_idx] >= 5: 100000000
34
35rrrrrrrrrrrr: m_self_merges[chunk_idx] < 0: 100000000 # always random, same as ((m_self_merges[chunk_idx] + 0) % 1) != 0
36Mrrrrrrrrrrr: m_self_merges[chunk_idx] < 1: 2191100
37MMrrrrrrrrrr: m_self_merges[chunk_idx] < 2: 759915
38MMMrrrrrrrrr: m_self_merges[chunk_idx] < 3: 464325
39MMMMrrrrrrrr: m_self_merges[chunk_idx] < 4: 1042613
40MMMMMrrrrrrr: m_self_merges[chunk_idx] < 5: 523463

The maximum iteration count passed to Linearize was 100M, so when 100000000 appears it may well mean “indefinite”.

More max-gain seems to be pretty much always better. However, I’m very surprised that always-random apparently doesn’t reach optimal for some input clusters. Will investigate further, but with this, I’m inclined to take something like ((m_self_merges[chunk_idx] + 0) % 6) != 0 rather than the PR’s ((m_self_merges[chunk_idx] + 1) % 3) != 0.

EDIT: redid the numbers after fixing a bug that sometimes caused splits with zero gain.

 0MMMMMMMMMMMM: ((m_self_merges[chunk_idx] + 0) % 1) == 0: 387722  # always max_gain
 1MrMrMrMrMrMr: ((m_self_merges[chunk_idx] + 0) % 2) == 0: 6012644
 2MrrMrrMrrMrr: ((m_self_merges[chunk_idx] + 0) % 3) == 0: 3354629
 3MrrrMrrrMrrr: ((m_self_merges[chunk_idx] + 0) % 4) == 0: 1648066
 4MrrrrMrrrrMr: ((m_self_merges[chunk_idx] + 0) % 5) == 0: 2139700
 5MrrrrrMrrrrr: ((m_self_merges[chunk_idx] + 0) % 6) == 0: 1003802
 6
 7rrrrrrrrrrrr: ((m_self_merges[chunk_idx] + 0) % 1) != 0: 416225 # always random
 8rMrMrMrMrMrM: ((m_self_merges[chunk_idx] + 0) % 2) != 0: 4796320
 9rMMrMMrMMrMM: ((m_self_merges[chunk_idx] + 0) % 3) != 0: 1427591
10rMMMrMMMrMMM: ((m_self_merges[chunk_idx] + 0) % 4) != 0: 1095346
11rMMMMrMMMMrM: ((m_self_merges[chunk_idx] + 0) % 5) != 0: 570429
12rMMMMMrMMMMM: ((m_self_merges[chunk_idx] + 0) % 6) != 0: 426605
13
14MMMMMMMMMMMM: ((m_self_merges[chunk_idx] + 1) % 1) == 0: 387722 # always max_gain, same as ((m_self_merges[chunk_idx] + 0) % 1) == 0
15rMrMrMrMrMrM: ((m_self_merges[chunk_idx] + 1) % 2) == 0: 4796320 # same as ((m_self_merges[chunk_idx] + 0) % 2) != 0
16rrMrrMrrMrrM: ((m_self_merges[chunk_idx] + 1) % 3) == 0: 2473649
17rrrMrrrMrrrM: ((m_self_merges[chunk_idx] + 1) % 4) == 0: 1520079
18rrrrMrrrrMrr: ((m_self_merges[chunk_idx] + 1) % 5) == 0: 2107109
19rrrrrMrrrrrM: ((m_self_merges[chunk_idx] + 1) % 6) == 0: 882761
20
21rrrrrrrrrrrr: ((m_self_merges[chunk_idx] + 1) % 1) != 0: 416225 # always random, same as ((m_self_merges[chunk_idx] + 0) % 1) != 0
22MrMrMrMrMrMr: ((m_self_merges[chunk_idx] + 1) % 2) != 0: 6012644 # same as ((m_self_merges[chunk_idx] + 0) % 2) == 0
23MMrMMrMMrMrM: ((m_self_merges[chunk_idx] + 1) % 3) != 0: 2223704 # the current PR
24MMMrMMMrMMrM: ((m_self_merges[chunk_idx] + 1) % 4) != 0: 872725
25MMMMrMMMMrMM: ((m_self_merges[chunk_idx] + 1) % 5) != 0: 916415
26MMMMMrMMMMMr: ((m_self_merges[chunk_idx] + 1) % 6) != 0: 668819
27
28MMMMMMMMMMMM: m_self_merges[chunk_idx] >= 0: 387722 # always max_gain, same as ((m_self_merges[chunk_idx] + 0) % 1) == 0
29rMMMMMMMMMMM: m_self_merges[chunk_idx] >= 1: 471715
30rrMMMMMMMMMM: m_self_merges[chunk_idx] >= 2: 598471
31rrrMMMMMMMMM: m_self_merges[chunk_idx] >= 3: 270490
32rrrrMMMMMMMM: m_self_merges[chunk_idx] >= 4: 248124
33rrrrrMMMMMMM: m_self_merges[chunk_idx] >= 5: 291305
34
35rrrrrrrrrrrr: m_self_merges[chunk_idx] < 0: 416225 # always random, same as ((m_self_merges[chunk_idx] + 0) % 1) != 0
36Mrrrrrrrrrrr: m_self_merges[chunk_idx] < 1: 700377
37MMrrrrrrrrrr: m_self_merges[chunk_idx] < 2: 546130
38MMMrrrrrrrrr: m_self_merges[chunk_idx] < 3: 464325
39MMMMrrrrrrrr: m_self_merges[chunk_idx] < 4: 467645
40MMMMMrrrrrrr: m_self_merges[chunk_idx] < 5: 523463

As expected, generally more max-gain is better, but it does seem that alternating often between the two strategies is worse than sticking with one. Also - and this may be an artifact of a biased test set - it seems that doing a few random iterations first helps.

gmaxwell commented at 1:12 am on January 6, 2026: contributor

You could consider if there is some other progress metric you could gate randomness on– it really only needs to use random if its consistently not making progress under max_gain (or even ’less than expected progress’). Might also be worth checking if there isn’t some bug with random since IIRC in the initial test when you decided to MMrMMr didn’t have as much of negative effect over MMM as you’re seeing now.

sipa commented at 4:10 am on January 6, 2026: member

Certainly there seems to be something wrong. These clusters that need millions/billions of iterations only need a few 1000 iterations in my Python version of the algorithm, which ought to be roughly identical to the always-random strategy.

EDIT: found the bug, indeed inside the random strategy (it would under certain conditions split equal-feerate tops off).

sipa commented at 3:46 pm on January 6, 2026: member

See updated results above.

sipa force-pushed on Jan 6, 2026

sipa commented at 9:53 pm on January 6, 2026: member

Rebased, changed the approach here to “rMMMMMrMMMMM” (((m_self_merges[chunk_idx] + 0) % 6) != 0), and fixed the bug in the random strategy.

sipa force-pushed on Jan 7, 2026

in src/cluster_linearize.h:1102 in 4f6553139e outdated

1098@@ -1099,32 +1099,30 @@ class SpanningForestState
1099         /** A heap with all chunks (by representative) that can currently be included, sorted by
1100          *  chunk feerate and a random tie-breaker. */
1101         std::vector<std::pair<TxIdx, uint64_t>> ready_chunks;
1102-        /** Information about chunks:

instagibbs commented at 2:41 pm on January 9, 2026:

f29fde45764a51801c6574917bcc35e880a8acdf

nit: commit message has typo: “presentative”

sipa commented at 8:45 pm on February 16, 2026:

Fixed.

sipa force-pushed on Jan 12, 2026

DrahtBot added the label CI failed on Jan 12, 2026

DrahtBot commented at 6:21 am on January 12, 2026: contributor

🚧 At least one of the CI tasks failed. Task Windows native, fuzz, VS 2022: https://github.com/bitcoin/bitcoin/actions/runs/20908711728/job/60067268129 LLM reason (✨ experimental): Fuzz test txgraph failed due to an assertion violation (iters <= iters_for_optimal) causing the fuzz target to exit with code 1.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Jan 12, 2026

DrahtBot removed the label CI failed on Jan 12, 2026

sipa force-pushed on Jan 13, 2026

DrahtBot added the label CI failed on Jan 13, 2026

DrahtBot commented at 3:21 am on January 13, 2026: contributor

🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/actions/runs/20942872497/job/60179846256 LLM reason (✨ experimental): Lint check failed (scripted-diffs) causing the CI failure.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on Jan 13, 2026

fanquake referenced this in commit baa554f708 on Jan 15, 2026

DrahtBot added the label Needs rebase on Jan 15, 2026

sipa force-pushed on Jan 15, 2026

DrahtBot removed the label Needs rebase on Jan 15, 2026

fanquake added this to the milestone 31.0 on Feb 5, 2026

sipa force-pushed on Feb 5, 2026

sipa commented at 5:46 pm on February 11, 2026: member

Working to rebase this after the merge of #34257, and split some more things off.

DrahtBot added the label Needs rebase on Feb 11, 2026

sipa force-pushed on Feb 11, 2026

sipa commented at 10:07 pm on February 11, 2026: member

Rebased, and I have dropped the logic for deciding between maxgain / random split strategies from this PR (sticking with always-random like master). Those aren’t really an optimization as it doesn’t seem like an unconditional win, may deserve some separate discussion, and aren’t as crucial to get in the initial cluster mempool release (31.0).

EDIT: moved benchmark results into PR description

DrahtBot removed the label Needs rebase on Feb 11, 2026

sipa force-pushed on Feb 11, 2026

DrahtBot added the label CI failed on Feb 11, 2026

DrahtBot removed the label CI failed on Feb 12, 2026

sipa commented at 3:41 pm on February 12, 2026: member

EDIT: moved into PR description

sipa force-pushed on Feb 12, 2026

in src/cluster_linearize.h:1120 in a19a554e75 outdated

1128+        unsigned init_dir = m_rng.randbool();
1129+        /** Which chunks are the result of merging, and thus need merge attempts in both
1130+         *  directions. */
1131+        SetType merged_chunks;
1132+        // Mark chunks as suboptimal.
1133+        m_suboptimal_idxs = m_chunk_idxs;

ajtowns commented at 4:01 am on February 14, 2026:

Should this also have a m_suboptimal_chunks.clear() or Assume(m_suboptimal_chunks.empty()) ? Likewise for StartOptimizing below

sipa commented at 10:49 pm on February 14, 2026:

Done, in a separate commit.

ajtowns commented at 5:03 am on February 14, 2026: contributor

ACK a19a554e752cf44cbd15b5e57eb397b219fc1fd0

Ran fuzz tests for a little while, and the code for the fuzzers hasn’t been touched so that’s promising that the behaviour is consistent. Looked through each of the changes, and they make sense to me, but without rewriting “get rid of DepData” myself, hard to be 100% confident I didn’t miss something there. Changes all look good and logical to me.

sipa commented at 2:39 pm on February 14, 2026: member

@ajtowns Would it help if I split up the “get rid of DepData” commit into one that introduces m_set_info as a shared pool but not get rid of DepData first?

ajtowns commented at 8:55 pm on February 14, 2026: contributor

Would be happy to look at it, but I’m skeptical that it would be much clearer? It didn’t look to me like there was much to be done to improve it from what you already have: you could go to the trouble of abstracting it into an interface that makes sense both before and after so you’re just changing the interface implementation, but creating an abstraction like that that would almost certainly be actively going against the optimisations you’re making here, so also doesn’t seem like a net win. I think the fuzz tests are a pretty good assurance of correctness.

[EDIT: I pointed claude at trying the split, and what it came up with doesn’t seem an improvement]

sipa force-pushed on Feb 14, 2026

sipa commented at 10:19 pm on February 14, 2026: member

Pushed an update that splits the “get rid of DepData” commit in two:

The first one introduces the SetIdx and the SetInfo pool, but leaves DepData in place
The second one replaces DepData with (par, chl) indexing and a per-child dep_top_idx inside TxData.

sipa force-pushed on Feb 15, 2026

sipa commented at 3:05 pm on February 15, 2026: member

I cleaned the commit history a bit, to avoid changing the same line multiple times. Also added an extra commit that adds extra Assumes() and sanity check asserts.

Ready for review again; I won’t make further changes unless requested by reviewers.

in src/cluster_linearize.h:1089 in 4be5ced4f8

1093-                    std::swap(m_suboptimal_chunks.back(), m_suboptimal_chunks[j]);
1094-                }
1095+        for (auto chunk_idx : m_chunk_idxs) {
1096+            m_suboptimal_chunks.push_back(chunk_idx);
1097+            // Randomize the initial order of suboptimal chunks in the queue.
1098+            SetIdx j = m_rng.randrange<TxIdx>(m_suboptimal_chunks.size());

instagibbs commented at 3:32 pm on February 16, 2026:

nit

0            SetIdx j = m_rng.randrange<SetIdx>(m_suboptimal_chunks.size());

sipa commented at 8:45 pm on February 16, 2026:

Fixed.

in src/cluster_linearize.h:941 in 3f17976f46

940         // the parent, and a bottom containing the child. The top should have a higher feerate.
941-        Deactivate(dep_idx);
942+        Deactivate(parent_idx, child_idx);
943 
944         // At this point we have exactly two chunks which may violate topology constraints (the
945         // parent chunk and child chunk that were produced by deactivating dep_idx). We can fix

instagibbs commented at 3:44 pm on February 16, 2026:

3f17976f46f34dca23e9ca7bce0747a01b6af14b

stale comment re: dep_idx?

sipa commented at 8:45 pm on February 16, 2026:

Fixed.

in src/cluster_linearize.h:1475 in 3f17976f46

1484         //
1485-        for (const auto& [par_idx, chl_idx, dep_idx] : active_dependencies) {
1486-            const auto& dep_data = m_dep_data[dep_idx];
1487+        for (const auto& [par_idx, chl_idx] : active_dependencies) {
1488             // Verify the top set's transactions: it must contain the parent, and for every
1489             // active dependency, except dep_idx itself, if it contains the parent or child, it

instagibbs commented at 3:45 pm on February 16, 2026:

3f17976f46f34dca23e9ca7bce0747a01b6af14b

another stale comment re: dep_idx

sipa commented at 8:45 pm on February 16, 2026:

Fixed.

in src/cluster_linearize.h:854 in 87d74c17d9 outdated

851             }
852         }
853         // Compute the new sets of reachable transactions for each new chunk.
854-        m_reachable[child_chunk_idx] = GetReachable(bottom_info.transactions);
855-        m_reachable[parent_chunk_idx] = GetReachable(top_info.transactions);
856+        m_reachable[parent_chunk_idx].first = top_parents - top_info.transactions;

instagibbs commented at 5:05 pm on February 16, 2026:

87d74c17d9eccedca1caf83f4b08249f5a877379

Just noting GetReachable is only used in SanityCheck (which is fine)

sipa commented at 8:45 pm on February 16, 2026:

Added a comment.

instagibbs approved

instagibbs commented at 5:15 pm on February 16, 2026: member

ACK 87d74c17d9eccedca1caf83f4b08249f5a877379

Changes make sense, did not run fuzzer myself. Non-blocking nits only

in the chart’s x-asis, what’s the denominator?

DrahtBot requested review from ajtowns on Feb 16, 2026

sipa force-pushed on Feb 16, 2026

sipa commented at 8:45 pm on February 16, 2026: member

Addressed @instagibbs’ comments, added a commit to drop the DepGraph& argument to SpanningForestState::SanityCheck(), and added extra comments and assertions suggested by Claude (reviewed/applied/edited by me).

in the chart’s x-asis, what’s the denominator?

“benchmark count”, I guess. Each benchmark takes up equals x-axis space, sorted by master’s runtime.

EDIT: oh, in the label? Those are “ntx/ndeps” for the test.

in src/cluster_linearize.h:1378 in 7a0309ba9f

1396+            chunk_deps[chl_chunk_idx] += (chl_data.parents - chl_chunk_info.transactions).Count();
1397         }
1398         /** Function to compute the highest element of a chunk, by fallback_order. */
1399-        auto max_fallback_fn = [&](TxIdx chunk_rep) noexcept {
1400-            auto& chunk = m_tx_data[chunk_rep].chunk_setinfo.transactions;
1401+        auto max_fallback_fn = [&](TxIdx chunk_idx) noexcept {

ajtowns commented at 11:58 am on February 17, 2026:

This should be a SetIdx argument, I think.

sipa commented at 2:06 pm on February 17, 2026:

Indeed, fixed.

in src/cluster_linearize.h:878 in 7a0309ba9f

929+        Assume(m_chunk_idxs[top_idx]);
930+        Assume(m_chunk_idxs[bottom_idx]);
931+        auto& top_chunk_info = m_set_info[top_idx];
932+        auto& bottom_chunk_info = m_set_info[bottom_idx];
933         // Count the number of dependencies between bottom_chunk and top_chunk.
934         TxIdx num_deps{0};

ajtowns commented at 12:40 pm on February 17, 2026:

I don’t think num_deps should have a type of TxIdx?

sipa commented at 2:06 pm on February 17, 2026:

Fixed, separate first commit.

ajtowns commented at 1:28 pm on February 17, 2026: contributor

Pushed an update that splits the “get rid of DepData” commit in two:

Yeah, I’m still not smart enough to understand things unless they’re way more bite-size than that. I broke the first of those commits down into 12 commits before it really made sense. OTOH, I think that found a couple of bugs/typos, so yay?

I didn’t really see why you swapped the parent/child ordering in Activate (return child_chunk_idx), but I don’t see any harm in it either.

ACK 7a0309ba9fccf6f5d5ee428fee36cc6ecf3605c4 though probably should do the fix ups.

DrahtBot requested review from instagibbs on Feb 17, 2026

clusterlin: fix type to count dependencies 666b37970f

clusterlin: avoid depgraph argument in SanityCheck (cleanup)

Since the deterministic ordering change, SpanningForestState holds a
reference to the DepGraph it is linearizing. So this means we do not
need to pass it to SanityCheck() as an argument anymore.

900e459778

clusterlin: split tx/chunk dep counting (preparation)

This splits the chunk_deps variable in LoadLinearization in two, one for
tracking tx dependencies and one for chunk dependencies. This is a
preparation for a later commit, where chunks won't be identified anymore
by a representative transaction in them, but by a separate index. With
that, it seems weird to keep them both in the same structure if they
will be indexed in an unrelated way.

Note that the changes in src/test/util/cluster_linearize.h to the table
of worst observed iteration counts are due to switching to a different
data set, and are unrelated to the changes in this commit.

f66fa69ce0

clusterlin: count chunk deps without loop (optimization)

This small optimization avoids the need to loop over the parents of each
transaction when initializing the dependency-counting structures inside
GetLinearization().

d69c9f56ea

clusterlin: add more Assumes and sanity checks (tests) 268fcb6a53

scripted-diff: rename _rep -> _idx in SFL

This is a preparation for the next commit, where chunks will no longer
be identified using a representative transaction, but using a set index.
Reduce the load of line changes by doing this rename ahead of time.

-BEGIN VERIFY SCRIPT-
sed --in-place 's/_rep/_idx/g' src/cluster_linearize.h
-END VERIFY SCRIPT-

20e2f3e96d

clusterlin: pool SetInfos (preparation)

This significantly changes the data structures used in SFL, based on the
observation that the DepData::top_setinfo fields are quite wasteful:
there is one per dependency (up to n^2/4), but we only ever need one per
active dependency (of which there at most n-1). In total, the number of
chunks plus the number of active dependencies is always exactly equal to
the number of transactions, so it makes sense to have a shared pool of
SetInfos, which are used for both chunks and top sets.

To that effect, introduce a separate m_set_info variable, which stores a
SetInfo per transaction. Some of these are used for chunk sets, and some
for active dependencies' top sets. Every activation transforms the
parent's chunk into the top set for the new dependency. Every
deactivation transforms the top set into the new parent chunk.

With indexes into m_set_data (SetIdx) becoming bounded by the number of
transactions, we can use a SetType to represent sets of SetIdxs.
Specifically, an m_chunk_idxs is added which contains all SetIdx
referring to chunks. This leads to a much more natural way of iterating
over chunks.

Also use this opportunity to normalize many variable names.

7c6f63a8a9

clusterlin: get rid of DepData (optimization)

With the earlier change to pool SetInfo objects, there is little need
for DepData anymore. Use parent/child TxIdxs to refer to dependencies,
and find their top set by having a child TxIdx-indexed vector in each
TxData, rather than a list of dependencies. This makes code for
iterating over dependencies more natural and simpler.

73cbd15d45

clusterlin: improve TxData::dep_top_idx type (optimization)

The combined size of TxData::dep_top_idx can be 16 KiB with 64
transactions and SetIdx = uint32_t. Use a smaller type where possible to
reduce memory footprint and improve cache locality of m_tx_data.

Also switch from an std::vector to an std::array, reducing allocation
overhead and indirections.

b75574a653

clusterlin: abstract out functions from MergeStep (refactor)

This is a simple refactor to make the code more readable.

cbd684a471

clusterlin: split up OptimizeStep (refactor) dcf458ffb9

clusterlin: simplify PickMergeCandidate (optimization)

The current process consists of iterating over the transactions of the
chunk one by one, and then for each figuring out which of its
parents/children are in unprocessed chunks.

Simplify this (and speed it up slightly) by splitting this process into
two phases: first determine the union of all parents/children, and then
find which chunks those belong to.

6f898dbb8b

clusterlin: precompute reachable sets (optimization)

Instead of computing the set of reachable transactions inside
PickMergeCandidate, make the information precomputed, and updated in
Activate (by merging the two chunks' reachable sets) and Deactivate (by
recomputing).

This is a small performance gain on itself, but also a preparation for
future optimizations that rely on quickly testing whether dependencies
between chunks exist.

7194de3f7c

clusterlin: make MergeSequence take SetIdx (simplification)

Future changes will rely on knowing the chunk indexes of the two created
chunks after a split. It is natural to return this information from
Deactivate, which also simplifies MergeSequence.

3221f1a074

clusterlin: special-case self-merges (optimization)

After a split, if the top part has a dependency on the bottom part, the
first MergeSequence will always perform this merge and then stop. This
is referred to as a self-merge.

We can special case these by detecting self-merges early, and avoiding
the overhead of a full MergeSequence which involves two
PickMergeCandidate calls (a succesful and an unsuccesful one).

ae16485aa9

clusterlin: keep track of active children (optimization)

This means we can iterate over all active dependencies in a
cluster/chunk in O(ntx) time rather than O(ndeps) (*), as the number of
active dependencies in a set of transactions of size is at most ntx-1.

(*) Asymptotically, this is not actually true, as for large transaction
counts, iterating over a BitSet still scales with ntx. In practice
however, where BitSets are represented by a constant number of integers,
it holds.

63b06d5523

clusterlin: track suboptimal chunks (optimization)

This avoids adding them a second time to m_suboptimal_chunks when they
happen to already be there.

1daa600c1c

clusterlin: unidirectional MakeTopological initially (optimization)

It suffices to initially only attempt one direction of merges in
MakeTopological(), and only try both directions on chunks that are the
result of other merges.

b684f954bb

clusterlin: inline UpdateChunk into (De)Activate (optimization)

The two calls to UpdateChunk, in Activate and Deactive each, are subtly
different: the top one needs to update the chunk_idx of iterated
transactions, while the bottom one leaves it unchanged. To exploit this
difference, inline the four function calls, getting rid of UpdateChunks.

This is also a preparation for a future improvement that inlines the
recomputation of reachable sets in the same loop in Deactivate.

d90f98ab4a

clusterlin: inline GetReachable into Deactivate (optimization)

Avoid two full iterations over all of a chunks' transactions to
recompute the reachable sets, by inlining them into the
dependency-updating loops.

Note that there is no need to do the same for Activate, because the
reachable sets after merging can be computed directly from the input
chunks' reachable sets. Deactivate needs to recompute them, however.

c2fcf25069

sipa force-pushed on Feb 17, 2026

instagibbs commented at 2:14 pm on February 17, 2026: member

reACK c2fcf250697325636218225d578c3844ab9ca633

DrahtBot requested review from ajtowns on Feb 17, 2026

sipa commented at 2:14 pm on February 17, 2026: member

OTOH, I think that found a couple of bugs/typos, so yay?

Yay!

Do you want me to try to perform some of those splits here? I think the introduction/usage of m_chunk_idxs may be reasonable.

I didn’t really see why you swapped the parent/child ordering in Activate (return child_chunk_idx), but I don’t see any harm in it either.

The reason is that on Activation, the top chunk becomes the new dependency’s top set, and thus the other one - the bottom chunk - needs to become the merged chunk’s set (and thus returned by Activate). By reusing the top chunk as top set, we avoid the need for updating things. On Deactivation, the reverse is done (the initial chunk becomes the bottom chunk, the dependency top set becomes the top chunk).

ajtowns commented at 11:43 pm on February 17, 2026: contributor

OTOH, I think that found a couple of bugs/typos, so yay?

Yay!

Do you want me to try to perform some of those splits here? I think the introduction/usage of m_chunk_idxs may be reasonable.

I think it’s useful for understanding the changes incrementally, but I’m not sure anyone’s likely to want to do that (vs just understanding the latest implementation from scratch) outside of reviewing this PR, in which case they can look at my branch if they want?

I didn’t really see why you swapped the parent/child ordering in Activate (return child_chunk_idx), but I don’t see any harm in it either. By reusing the top chunk as top set, we avoid the need for updating things.

Ah, essentially saving swapping two (24 byte?) SetInfo’s I guess.

reACK c2fcf250697325636218225d578c3844ab9ca633

fanquake merged this on Feb 18, 2026

fanquake closed this on Feb 18, 2026

Optimized SFL cluster linearization #34023

Code Coverage & Benchmarks

Reviews

Conflicts