Optimized SFL cluster linearization #34023

sipa wants to merge 17 commits into bitcoin:master from sipa:202512_sfl_opt, changing 2 files +537 −470
  1. sipa commented at 8:21 pm on December 6, 2025: member

    Follow-up to #32545, part of #30289.

    This contains many of the optimizations that were originally part of #32545 itself.

  2. DrahtBot commented at 8:21 pm on December 6, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34023.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34257 (txgraph: deterministic optimal transaction order by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • // Verify it has a valid chunk index, and that chunk includes this -> // Verify it has a valid chunk index, and that chunk includes this transaction. [The sentence is truncated and missing the noun “transaction”, making the comment incomplete and less clear.]


  3. sipa force-pushed on Dec 6, 2025
  4. DrahtBot added the label CI failed on Dec 6, 2025
  5. DrahtBot commented at 8:52 pm on December 6, 2025: contributor

    🚧 At least one of the CI tasks failed. Task 32 bit ARM: https://github.com/bitcoin/bitcoin/actions/runs/19993781856/job/57338323718 LLM reason (✨ experimental): Cluster linearize tests fail due to a size mismatch (linearization size != depgraph TxCount) triggering assertion failures and aborted tests.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. sipa force-pushed on Dec 6, 2025
  7. sipa force-pushed on Dec 6, 2025
  8. sipa force-pushed on Dec 7, 2025
  9. sipa force-pushed on Dec 7, 2025
  10. DrahtBot removed the label CI failed on Dec 7, 2025
  11. sipa force-pushed on Dec 7, 2025
  12. sipa force-pushed on Dec 9, 2025
  13. sipa force-pushed on Dec 10, 2025
  14. sipa force-pushed on Dec 11, 2025
  15. DrahtBot added the label CI failed on Dec 12, 2025
  16. sipa force-pushed on Dec 12, 2025
  17. DrahtBot removed the label CI failed on Dec 12, 2025
  18. sipa force-pushed on Dec 12, 2025
  19. sipa force-pushed on Dec 12, 2025
  20. DrahtBot added the label CI failed on Dec 12, 2025
  21. DrahtBot commented at 11:07 pm on December 12, 2025: contributor

    🚧 At least one of the CI tasks failed. Task Windows-cross to x86_64, ucrt: https://github.com/bitcoin/bitcoin/actions/runs/20182154259/job/57944831923 LLM reason (✨ experimental): Compiler error: -Werror treats a range-for over a tuple as an error in cluster_linearize.h, causing the build to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  22. DrahtBot removed the label CI failed on Dec 13, 2025
  23. sipa force-pushed on Dec 13, 2025
  24. sipa force-pushed on Dec 14, 2025
  25. sipa force-pushed on Dec 14, 2025
  26. sipa force-pushed on Dec 14, 2025
  27. DrahtBot added the label CI failed on Dec 14, 2025
  28. DrahtBot removed the label CI failed on Dec 15, 2025
  29. sipa force-pushed on Dec 15, 2025
  30. sipa force-pushed on Dec 15, 2025
  31. sipa force-pushed on Dec 16, 2025
  32. sipa force-pushed on Dec 16, 2025
  33. sipa commented at 9:23 pm on December 16, 2025: member
    I have moved the changes to TxGraph here to a separate PR, #34085, as it’s not actually dependent on the optimizations here.
  34. sipa force-pushed on Dec 17, 2025
  35. sipa force-pushed on Dec 18, 2025
  36. sipa force-pushed on Dec 19, 2025
  37. sipa force-pushed on Dec 19, 2025
  38. sipa commented at 3:31 pm on December 19, 2025: member
    Rebased after merge of #32545, but not marking as Ready for review yet, as I’m investigating another data structure layout. If that turns out to be fruitful, that may replace some of the commits here, so I don’t want to waste reviewers’ time on those yet.
  39. sipa force-pushed on Dec 21, 2025
  40. sipa marked this as ready for review on Dec 21, 2025
  41. sipa commented at 10:41 pm on December 21, 2025: member

    Ready for review.

    I have made a substantial change here, mostly in the new “get rid of DepData, reuse sets (optimization)” commit. It’s cleaner code, but does a few things quite differently from the merged code. As a result, it touches a good portion of the SFL code due to data structure changes. I’ve used the opportunity to make variable names more consistent throughout.

    Sorry for this - if I had considered this approach before #32545, I’d have used a data structure layout closer to this one from the start, avoiding such a big change. As is however, I don’t see a good way of splitting it up into smaller meaningful changes.

  42. sipa force-pushed on Dec 22, 2025
  43. sipa force-pushed on Dec 22, 2025
  44. sipa force-pushed on Dec 22, 2025
  45. DrahtBot added the label CI failed on Dec 22, 2025
  46. DrahtBot removed the label CI failed on Dec 22, 2025
  47. sipa force-pushed on Dec 22, 2025
  48. sipa force-pushed on Dec 22, 2025
  49. DrahtBot added the label CI failed on Dec 22, 2025
  50. DrahtBot commented at 10:29 pm on December 22, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/actions/runs/20444032805/job/58743457093 LLM reason (✨ experimental): Lint check failed due to scripted-diffs detecting mismatches in code changes.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  51. DrahtBot removed the label CI failed on Dec 23, 2025
  52. sipa force-pushed on Dec 26, 2025
  53. sipa commented at 8:37 pm on December 26, 2025: member

    Made a number of improvements:

    • Moved some of the logic to new functions (PickRandomTx, GetReachable, PickMergeCandidate), hopefully making things more readable.
    • Added a precomputation for the set of reachable transactions (union of out-of-chunk parents/children within a chunk) rather than computing them on the fly at merging time in Activate() and Deactivate() (see the sketch after this list).
    • Added a special case for self-merges in Improve(), avoiding the need for MergeSequence() in that case.
    • Made chunk minimization use randomly-chosen pivot transactions, out of a concern that the earlier deterministic choice could be used to create reliably-slow-to-minimize clusters.
    • Optimized the chunk minimization logic to not retry certain cases that can be predicted to be useless: when a split of a chunk is found in its second phase (so with pivot down for example, after pivot up has been tried and failed already), then the created sub-chunk that holds the pivot can reuse the same pivot, direction, and phase, because we know no split with the pivot in the other direction is possible.
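
    The sketch referenced in the second bullet above: a minimal illustration of the reachable-set precomputation, using hypothetical names (ChunkInfo, MergeChunks) and a plain std::bitset rather than the PR's actual types.

      // Sketch only; not the PR's actual code.
      #include <bitset>

      using TxSet = std::bitset<64>;

      struct ChunkInfo {
          TxSet txs;        // transactions in this chunk
          TxSet reachable;  // out-of-chunk parents/children of those transactions
      };

      // On a merge (Activate), the precomputed sets can simply be combined,
      // instead of re-walking every transaction's parents/children:
      ChunkInfo MergeChunks(const ChunkInfo& a, const ChunkInfo& b)
      {
          ChunkInfo merged;
          merged.txs = a.txs | b.txs;
          merged.reachable = (a.reachable | b.reachable) & ~merged.txs;
          return merged;
      }

      // On a split (Deactivate), the two new chunks' reachable sets are
      // recomputed from per-transaction dependencies instead (not shown).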
  54. sipa force-pushed on Dec 26, 2025
  55. DrahtBot added the label CI failed on Dec 26, 2025
  56. DrahtBot removed the label CI failed on Dec 26, 2025
  57. sipa commented at 6:17 pm on January 1, 2026: member

    Ugh, this cluster takes 1.25 ms on my Ryzen 5950X machine to linearize optimally on average (most of the time is in optimizing, the time spent on minimizing chunks is negligible). For comparison, the worst existing benchmark on the same system takes 0.051 ms.

    001020001020101020201020301020401020501020601020701020801020901020a01020b01020c01020d01020e01020f0102100102110102127b02137b02147b021501020001000000000100000000010001000000000e0102010000000000000000000001000000000000020f0102020001000101000000000000000000000000000e01020300000001010100000000000000000000000004010204000000010001000000000000000000000000000e010205000000000000000000010000000001000000000b0102060000000000000100000000000000000000010009010207000100000100000000000000000000000000000501020800000000000000000000000000000000010000001a01020a00000000000000000000000000000000000000001701020a00000000000000000000000000000000000100001401020b00000000010000000000000000000000000000001401020c00000000000000000000000000000001000000000801020d00000000000000000100000000000000000000000701020e00000000000000000100000000000000000000000601020f000100000000000000000000000000000000000006010210000000000000000000000001000000000000000003010211000000000001000000000000000000000000000001010212000000000000000000000000000000000000000000220102130000000000000000000000000000000000000000001a0102140000000000000000000000000000000000000000000c010215000000000000000000000000000000000000000000070102000000010000010000000000000000000000000029010201000000000001000000000000010000000000000c01020200000000000000000100000100000000000000090102030000000100010000000000000000000000000001010204000101000000000000000000000000000000000101020500000000000000000000000000000000010000002401020600000001000000000000000000000000000000001d01020700000000000000000000000000000001000000001c01020800000000000000000000000000000000010000001c01020900000000000000000000000000000000010000001501020a00000000000000000000000000000000000001000901020b00000000000000000000000000000000000100000101020c0000000000000000000000000000000000000000002e01020d0000000000000000000000000000000000000000002c01020e0000000000000000000000000000000000000000002b01020f0000000000000000000000000000000000000000001e0102100000000000000000000000000000000000000000000a01021100000000000000000000000000000000000000000006010212000000000000000000000000000000000000000000040102130000000000000000000000000000000000000000000000
    

    Interestingly, it’s only 0.15 ms when using a splitting strategy consisting of first 2 random split attempts on a given chunk, and then all max-gain splits.

  58. instagibbs commented at 6:50 pm on January 3, 2026: member

    huh, I was playing around with your new worst case SFL example. I’m getting ~2ms per optimal solve which roughly matches your number.

    But, if I modify the harness to do 1'700*5 iterations per Linearize, and loop until optimal while passing in the prior linearizations emitted (was curious how bad it would be with DoWork), I’m getting 245mics to optimal? (edit: looks like it’s taking just 2 iterations at this level)

    As I increase max iterations, the performance progressively gets worse until it solves it in one call.

    If I drop it down to 1,700 iterations only, it becomes 150mics average. Any lower and it starts to fail to ever solve it, since we’re presumably not making enough progress to overcome the fact that we’re losing SFL state in between.

    I guess this is randomness bailing us out once every N invocations rather than getting stuck in a loop deep somewhere?

  59. sipa commented at 4:29 pm on January 5, 2026: member

    @instagibbs Interesting, that’s arguably a more relevant metric in the grand scheme of things, though it doesn’t say much here, as the example isn’t specifically designed to need a high “minimum iterations to make progress” - it’s very well possible that examples exist that are far worse than this one in that regard.

    These outlier long optimal linearization times seem very dependent on the particular splitting strategy used. The current PR here uses max-gain always, except every third split is done randomly (so random is used iff m_self_merges for a chunk is 2, 5, 8, 11, …). It seems like there may be much better (and much worse) choices here. I’m investigating this more.

  60. sipa commented at 0:04 am on January 6, 2026: member

    This is pretty unexpected. I’ve analyzed a number of split strategies, and gathered the maximum iteration count needed to reach optimal, under a large-ish (~380000 clusters) rather arbitrarily constructed set of input clusters (while providing them with bad or missing input linearization).

    MMMMMMMMMMMM: ((m_self_merges[chunk_idx] + 0) % 1) == 0: 387722  # always max_gain
    MrMrMrMrMrMr: ((m_self_merges[chunk_idx] + 0) % 2) == 0: 5614508
    MrrMrrMrrMrr: ((m_self_merges[chunk_idx] + 0) % 3) == 0: 3354629
    MrrrMrrrMrrr: ((m_self_merges[chunk_idx] + 0) % 4) == 0: 1648066
    MrrrrMrrrrMr: ((m_self_merges[chunk_idx] + 0) % 5) == 0: 2139700
    MrrrrrMrrrrr: ((m_self_merges[chunk_idx] + 0) % 6) == 0: 1021390

    rrrrrrrrrrrr: ((m_self_merges[chunk_idx] + 0) % 1) != 0: 100000000 # always random
    rMrMrMrMrMrM: ((m_self_merges[chunk_idx] + 0) % 2) != 0: 100000000
    rMMrMMrMMrMM: ((m_self_merges[chunk_idx] + 0) % 3) != 0: 100000000
    rMMMrMMMrMMM: ((m_self_merges[chunk_idx] + 0) % 4) != 0: 4479215
    rMMMMrMMMMrM: ((m_self_merges[chunk_idx] + 0) % 5) != 0: 861766
    rMMMMMrMMMMM: ((m_self_merges[chunk_idx] + 0) % 6) != 0: 459358

    MMMMMMMMMMMM: ((m_self_merges[chunk_idx] + 1) % 1) == 0: 387722    # always max_gain, same as ((m_self_merges[chunk_idx] + 0) % 1) == 0
    rMrMrMrMrMrM: ((m_self_merges[chunk_idx] + 1) % 2) == 0: 100000000 # same as ((m_self_merges[chunk_idx] + 0) % 2) != 0
    rrMrrMrrMrrM: ((m_self_merges[chunk_idx] + 1) % 3) == 0: 100000000
    rrrMrrrMrrrM: ((m_self_merges[chunk_idx] + 1) % 4) == 0: 100000000
    rrrrMrrrrMrr: ((m_self_merges[chunk_idx] + 1) % 5) == 0: 100000000
    rrrrrMrrrrrM: ((m_self_merges[chunk_idx] + 1) % 6) == 0: 100000000

    rrrrrrrrrrrr: ((m_self_merges[chunk_idx] + 1) % 1) != 0: 100000000 # always random, same as ((m_self_merges[chunk_idx] + 0) % 1) != 0
    MrMrMrMrMrMr: ((m_self_merges[chunk_idx] + 1) % 2) != 0: 5614508 # same as ((m_self_merges[chunk_idx] + 0) % 2) == 0
    MMrMMrMMrMMr: ((m_self_merges[chunk_idx] + 1) % 3) != 0: 3659198 # the current PR
    MMMrMMMrMMMr: ((m_self_merges[chunk_idx] + 1) % 4) != 0: 1083539
    MMMMrMMMMrMM: ((m_self_merges[chunk_idx] + 1) % 5) != 0: 650703
    MMMMMrMMMMMr: ((m_self_merges[chunk_idx] + 1) % 6) != 0: 668819

    MMMMMMMMMMMM: m_self_merges[chunk_idx] >= 0: 387722 # always max_gain, same as ((m_self_merges[chunk_idx] + 0) % 1) == 0
    rMMMMMMMMMMM: m_self_merges[chunk_idx] >= 1: 388318
    rrMMMMMMMMMM: m_self_merges[chunk_idx] >= 2: 594013
    rrrMMMMMMMMM: m_self_merges[chunk_idx] >= 3: 8543778
    rrrrMMMMMMMM: m_self_merges[chunk_idx] >= 4: 27802956
    rrrrrMMMMMMM: m_self_merges[chunk_idx] >= 5: 100000000

    rrrrrrrrrrrr: m_self_merges[chunk_idx] < 0: 100000000 # always random, same as ((m_self_merges[chunk_idx] + 0) % 1) != 0
    Mrrrrrrrrrrr: m_self_merges[chunk_idx] < 1: 2191100
    MMrrrrrrrrrr: m_self_merges[chunk_idx] < 2: 759915
    MMMrrrrrrrrr: m_self_merges[chunk_idx] < 3: 464325
    MMMMrrrrrrrr: m_self_merges[chunk_idx] < 4: 1042613
    MMMMMrrrrrrr: m_self_merges[chunk_idx] < 5: 523463
    

    The maximum iteration count passed to Linearize was 100M, so when 100000000 appears it may well mean “indefinite”.

    More max-gain seems to be pretty much always better. However, I’m very surprised that always-random apparently doesn’t reach optimal for some input clusters. Will investigate further, but with this, I’m inclined to take something like ((m_self_merges[chunk_idx] + 0) % 6) != 0 rather than the PR’s ((m_self_merges[chunk_idx] + 1) % 3) != 0.

    EDIT: redid the numbers after fixing a bug that sometimes caused splits with zero gain.

    MMMMMMMMMMMM: ((m_self_merges[chunk_idx] + 0) % 1) == 0: 387722  # always max_gain
    MrMrMrMrMrMr: ((m_self_merges[chunk_idx] + 0) % 2) == 0: 6012644
    MrrMrrMrrMrr: ((m_self_merges[chunk_idx] + 0) % 3) == 0: 3354629
    MrrrMrrrMrrr: ((m_self_merges[chunk_idx] + 0) % 4) == 0: 1648066
    MrrrrMrrrrMr: ((m_self_merges[chunk_idx] + 0) % 5) == 0: 2139700
    MrrrrrMrrrrr: ((m_self_merges[chunk_idx] + 0) % 6) == 0: 1003802

    rrrrrrrrrrrr: ((m_self_merges[chunk_idx] + 0) % 1) != 0: 416225 # always random
    rMrMrMrMrMrM: ((m_self_merges[chunk_idx] + 0) % 2) != 0: 4796320
    rMMrMMrMMrMM: ((m_self_merges[chunk_idx] + 0) % 3) != 0: 1427591
    rMMMrMMMrMMM: ((m_self_merges[chunk_idx] + 0) % 4) != 0: 1095346
    rMMMMrMMMMrM: ((m_self_merges[chunk_idx] + 0) % 5) != 0: 570429
    rMMMMMrMMMMM: ((m_self_merges[chunk_idx] + 0) % 6) != 0: 426605

    MMMMMMMMMMMM: ((m_self_merges[chunk_idx] + 1) % 1) == 0: 387722 # always max_gain, same as ((m_self_merges[chunk_idx] + 0) % 1) == 0
    rMrMrMrMrMrM: ((m_self_merges[chunk_idx] + 1) % 2) == 0: 4796320 # same as ((m_self_merges[chunk_idx] + 0) % 2) != 0
    rrMrrMrrMrrM: ((m_self_merges[chunk_idx] + 1) % 3) == 0: 2473649
    rrrMrrrMrrrM: ((m_self_merges[chunk_idx] + 1) % 4) == 0: 1520079
    rrrrMrrrrMrr: ((m_self_merges[chunk_idx] + 1) % 5) == 0: 2107109
    rrrrrMrrrrrM: ((m_self_merges[chunk_idx] + 1) % 6) == 0: 882761

    rrrrrrrrrrrr: ((m_self_merges[chunk_idx] + 1) % 1) != 0: 416225 # always random, same as ((m_self_merges[chunk_idx] + 0) % 1) != 0
    MrMrMrMrMrMr: ((m_self_merges[chunk_idx] + 1) % 2) != 0: 6012644 # same as ((m_self_merges[chunk_idx] + 0) % 2) == 0
    MMrMMrMMrMMr: ((m_self_merges[chunk_idx] + 1) % 3) != 0: 2223704 # the current PR
    MMMrMMMrMMMr: ((m_self_merges[chunk_idx] + 1) % 4) != 0: 872725
    MMMMrMMMMrMM: ((m_self_merges[chunk_idx] + 1) % 5) != 0: 916415
    MMMMMrMMMMMr: ((m_self_merges[chunk_idx] + 1) % 6) != 0: 668819

    MMMMMMMMMMMM: m_self_merges[chunk_idx] >= 0: 387722 # always max_gain, same as ((m_self_merges[chunk_idx] + 0) % 1) == 0
    rMMMMMMMMMMM: m_self_merges[chunk_idx] >= 1: 471715
    rrMMMMMMMMMM: m_self_merges[chunk_idx] >= 2: 598471
    rrrMMMMMMMMM: m_self_merges[chunk_idx] >= 3: 270490
    rrrrMMMMMMMM: m_self_merges[chunk_idx] >= 4: 248124
    rrrrrMMMMMMM: m_self_merges[chunk_idx] >= 5: 291305

    rrrrrrrrrrrr: m_self_merges[chunk_idx] < 0: 416225 # always random, same as ((m_self_merges[chunk_idx] + 0) % 1) != 0
    Mrrrrrrrrrrr: m_self_merges[chunk_idx] < 1: 700377
    MMrrrrrrrrrr: m_self_merges[chunk_idx] < 2: 546130
    MMMrrrrrrrrr: m_self_merges[chunk_idx] < 3: 464325
    MMMMrrrrrrrr: m_self_merges[chunk_idx] < 4: 467645
    MMMMMrrrrrrr: m_self_merges[chunk_idx] < 5: 523463
    

    As expected, generally more max-gain is better, but it does seem that alternating often between the two strategies is worse than sticking with one. Also - and this may be an artifact of a biased test set - it seems that doing a few random iterations first helps.

  61. gmaxwell commented at 1:12 am on January 6, 2026: contributor
    You could consider whether there is some other progress metric you could gate randomness on – it really only needs to use random if it’s consistently not making progress under max_gain (or even ’less than expected progress’). Might also be worth checking whether there isn’t some bug with random, since IIRC in the initial test when you settled on MMrMMr it didn’t have as much of a negative effect over MMM as you’re seeing now.
  62. sipa commented at 4:10 am on January 6, 2026: member

    Certainly there seems to be something wrong. These clusters that need millions/billions of iterations only need a few thousand iterations in my Python version of the algorithm, which ought to be roughly identical to the always-random strategy.

    EDIT: found the bug, indeed inside the random strategy (it would under certain conditions split equal-feerate tops off).

  63. sipa commented at 3:46 pm on January 6, 2026: member
    See updated results above.
  64. sipa force-pushed on Jan 6, 2026
  65. sipa commented at 9:53 pm on January 6, 2026: member
    Rebased, changed the approach here to “rMMMMMrMMMMM” (((m_self_merges[chunk_idx] + 0) % 6) != 0), and fixed the bug in the random strategy.
  66. sipa force-pushed on Jan 7, 2026
  67. sipa force-pushed on Jan 12, 2026
  68. DrahtBot added the label CI failed on Jan 12, 2026
  69. DrahtBot commented at 6:21 am on January 12, 2026: contributor

    🚧 At least one of the CI tasks failed. Task Windows native, fuzz, VS 2022: https://github.com/bitcoin/bitcoin/actions/runs/20908711728/job/60067268129 LLM reason (✨ experimental): Fuzz test txgraph failed due to an assertion violation (iters <= iters_for_optimal) causing the fuzz target to exit with code 1.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  70. sipa force-pushed on Jan 12, 2026
  71. DrahtBot removed the label CI failed on Jan 12, 2026
  72. sipa force-pushed on Jan 13, 2026
  73. sipa force-pushed on Jan 13, 2026
  74. DrahtBot added the label CI failed on Jan 13, 2026
  75. DrahtBot commented at 3:21 am on January 13, 2026: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/actions/runs/20942872497/job/60179846256 LLM reason (✨ experimental): Lint check failed (scripted-diffs) causing the CI failure.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  76. DrahtBot removed the label CI failed on Jan 13, 2026
  77. fanquake referenced this in commit baa554f708 on Jan 15, 2026
  78. DrahtBot added the label Needs rebase on Jan 15, 2026
  79. clusterlin: split chunks based on maximum gain (optimization)
    This introduces the notion of gain to the SFL algorithm. Given a chunk c,
    an active dependency d in it, and the chunks (t, b) that c would split into
    if d were deactivated, the gain is defined as either (they are equivalent):
    
      (feerate(t) - feerate(b)) * size(t) * size(b)
      fee(t) * size(b) - fee(b) * size(t)
    
    It happens to also be equal to these:
    
      (feerate(t) - feerate(c)) * size(t) * size(c)
      fee(t) * size(c) - fee(c) * size(t)
    
    Its relevance is that this metric is proportional to a lower bound on the area
    under the fee-size diagram which would be gained IF a deactivation of d does not
    result in a self-merge of t and b again.
    
    This commit adds logic to find, within each chunk, the dependency with the highest
    gain. In benchmarks, this appears to be a very good heuristic for deciding which
    splits are worth making.
    701dc90d4a
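
    A minimal sketch of the gain computation described in this commit message, using a hypothetical FeeSize struct rather than the PR's actual types; the comments also spell out why the two formulations are equivalent.

      // Sketch only; not the PR's actual code. __int128 is used for illustration
      // (assumes GCC/Clang); real code would need its own overflow handling.
      #include <cstdint>

      struct FeeSize { int64_t fee; int32_t size; };

      // gain = fee(t) * size(b) - fee(b) * size(t)
      //      = (fee(t)/size(t) - fee(b)/size(b)) * size(t) * size(b)
      //      = (feerate(t) - feerate(b)) * size(t) * size(b)
      __int128 SplitGain(const FeeSize& top, const FeeSize& bottom)
      {
          return (__int128)top.fee * bottom.size - (__int128)bottom.fee * top.size;
      }

      // Using the whole chunk c = t + b instead of b gives the same value:
      // fee(t)*size(c) - fee(c)*size(t)
      //   = fee(t)*(size(t)+size(b)) - (fee(t)+fee(b))*size(t)
      //   = fee(t)*size(b) - fee(b)*size(t)
      __int128 SplitGainFromChunk(const FeeSize& top, const FeeSize& chunk)
      {
          return (__int128)top.fee * chunk.size - (__int128)chunk.fee * top.size;
      }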
  80. clusterlin: split tx/chunk dep counting (preparation)
    This splits the chunk_deps variable in LoadLinearization in two, one for
    tracking tx dependencies and one for chunk dependencies. This is a
    preparation for a later commit, where chunks won't be identified anymore
    by a presentative transaction in them, but by a separate index. With
    that, it seems weird to keep them both in the same structure if they
    will be indexed in an incompatible way.
    7484639288
  81. clusterlin: count chunk deps without loop (optimization)
    This small optimization avoids the need to loop over the parents of each
    transaction when initializing the dependency-counting structures inside
    GetLinearization().
    9739f81d5c
  82. scripted-diff: rename _rep -> _idx in SFL
    This is a preparation for the next commit, where chunks will no longer
    be identified using a representative transaction, but using a set index.
    Reduce the load of line changes by doing this rename ahead of time.
    
    -BEGIN VERIFY SCRIPT-
    sed --in-place 's/_rep/_idx/g' src/cluster_linearize.h
    -END VERIFY SCRIPT-
    6eac922fff
  83. clusterlin: get rid of DepData, reuse sets (optimization)
    This significantly changes the data structures used in SFL, based on the
    observation that the DepData::top_setinfo fields are quite wasteful:
    there is one per dependency (up to n^2/4), but we only ever need one per
    active dependency (of which there at most n-1). In total, the number of
    chunks plus the number of active dependencies is always exactly equal to
    the number of transactions, so it makes sense to have a shared pool of
    SetInfos, which are used for both chunks and top sets.
    
    To that effect, introduce a separate m_set_info variable, which stores a
    SetInfo per transaction. Some of these are used for chunk sets, and some
    for active dependencies' top sets. Every activation transforms the
    parent's chunk into the top set for the new dependency. Every
    deactivation transforms the top set into the new parent chunk.
    
    With that, there is little need for DepData anymore. Use parent/child
    TxIdxs to refer to dependencies, and find their top set by having a
    child TxIdx-indexed vector in each TxData, rather than a list of
    dependencies. This makes code for iterating over dependencies more
    natural and simpler.
    
    With indexes into m_set_data (SetIdx) becoming bounded by the number of
    transactions, we can use a SetType to represent sets of SetIdxs.
    Specifically, an m_chunk_idxs set is added which contains all SetIdxs
    referring to chunks. This leads to a much more natural way of iterating
    over chunks.
    
    Also use this opportunity to normalize many variable names.
    112c152fc8
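
    A minimal sketch of the shared-pool idea described above, with hypothetical simplified types (SetInfo, State, TxSet); the PR's actual layout may differ.

      // Sketch only. Key invariant: #chunks + #active dependencies == #transactions,
      // so one SetInfo slot per transaction always suffices.
      #include <bitset>
      #include <cstdint>
      #include <vector>

      static constexpr unsigned MAX_TX = 64;
      using TxSet  = std::bitset<MAX_TX>;
      using SetIdx = uint32_t;

      struct SetInfo {
          TxSet txs;        // the chunk, or the top set of an active dependency
          int64_t fee{0};   // combined fee of txs
          int32_t size{0};  // combined size of txs
      };

      struct State {
          std::vector<SetInfo> m_set_info;   // one slot per transaction (shared pool)
          std::bitset<MAX_TX> m_chunk_idxs;  // which SetIdx slots currently hold chunks
          // Activating a dependency turns the parent's chunk slot into the new
          // dependency's top-set slot; deactivating turns a top-set slot back
          // into a chunk slot, preserving the invariant above.
      };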
  84. clusterlin: improve TxData::dep_top_idx type (optimization)
    The combined size of TxData::dep_top_idx can be 16 KiB with 64
    transactions and SetIdx = uint32_t. Use a smaller type where possible to
    reduce memory footprint and improve cache locality of m_tx_data.
    
    Also switch from an std::vector to an std::array, reducing allocation
    overhead and indirections.
    c38d3ca8cb
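
    A minimal sketch of the size argument, with hypothetical types (the PR's actual field layout may differ).

      // Sketch only. With at most 64 transactions, a SetIdx fits in one byte:
      // 64 TxData * 64 entries * 1 byte = 4 KiB, versus
      // 64 TxData * 64 entries * 4 bytes = 16 KiB with uint32_t, plus the
      // allocation overhead and indirection a std::vector would add.
      #include <array>
      #include <cstdint>

      static constexpr unsigned MAX_TX = 64;

      struct TxData {
          // dep_top_idx[child]: index of the top set of the active dependency
          // from this transaction to `child` (sentinel value if inactive).
          std::array<uint8_t, MAX_TX> dep_top_idx;
      };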
  85. clusterlin: abstract out functions from MergeStep (refactor)
    This is a simple refactor to make the code more readable.
    1c2e8be3bf
  86. clusterlin: simplify PickMergeCandidate (optimization)
    The current process consists of iterating over the transactions of the
    chunk one by one, and then for each figuring out which of its
    parents/children are in unprocessed chunks.
    
    Simplify this (and speed it up slightly) by splitting this process into
    two phases: first determine the union of all parents/children, and then
    find which chunks those belong to.
    9290af3889
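
    A minimal sketch of the two-phase idea, with hypothetical names (Tx, OutOfChunkDeps); not the PR's actual code.

      // Sketch only.
      #include <bitset>
      #include <vector>

      using TxSet = std::bitset<64>;

      struct Tx { TxSet parents, children; };

      // Phase 1: union of all out-of-chunk parents/children of the chunk's txs.
      TxSet OutOfChunkDeps(const std::vector<Tx>& txs, const TxSet& chunk)
      {
          TxSet out;
          for (unsigned i = 0; i < txs.size(); ++i) {
              if (chunk[i]) out |= txs[i].parents | txs[i].children;
          }
          return out & ~chunk;
      }

      // Phase 2 (not shown): map the transactions in the returned set to the
      // chunks they belong to, and pick a merge candidate among those chunks.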
  87. clusterlin: precompute reachable sets (optimization)
    Instead of computing the set of reachable transactions inside
    PickMergeCandidate, precompute the information and update it in
    Activate (by merging the two chunks' reachable sets) and Deactivate (by
    recomputing).
    
    This is a small performance gain in itself, but also a preparation for
    future optimizations that rely on quickly testing whether dependencies
    between chunks exist.
    820ea8cfc6
  88. clusterlin: make MergeSequence take SetIdx (simplification)
    Future changes will rely on knowing the chunk indexes of the two created
    chunks after a split. It is natural to return this information from
    Deactivate, which also simplifies MergeSequence.
    941b5fb558
  89. clusterlin: special-case self-merges (optimization)
    After a split, if the top part has a dependency on the bottom part, the
    first MergeSequence will always perform this merge and then stop. This
    is referred to as a self-merge.
    
    We can special-case these by detecting self-merges early, avoiding
    the overhead of a full MergeSequence, which involves two
    PickMergeCandidate calls (a successful and an unsuccessful one).
    a1f8064a2e
  90. clusterlin: keep track of active children (optimization)
    This means we can iterate over all active dependencies in a
    cluster/chunk in O(ntx) time rather than O(ndeps) (*), as the number of
    active dependencies in a set of transactions of size ntx is at most ntx-1.
    
    (*) Asymptotically, this is not actually true, as for large transaction
    counts, iterating over a BitSet still scales with ntx. In practice
    however, where BitSets are represented by a constant number of integers,
    it holds.
    1664b57eb9
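
    A minimal sketch of iterating active dependencies via per-transaction active-children sets, with hypothetical names; it carries the same caveat as the commit's footnote, since scanning a bitset still costs a fixed number of word operations per transaction.

      // Sketch only; not the PR's actual code.
      #include <bitset>
      #include <vector>

      static constexpr unsigned MAX_TX = 64;
      using TxSet = std::bitset<MAX_TX>;

      struct TxData { TxSet active_children; };

      // Visit every active dependency (parent, child) within `chunk`. At most
      // ntx-1 such pairs exist, versus up to ~ntx^2/4 dependencies overall.
      template <typename Fn>
      void ForEachActiveDep(const std::vector<TxData>& txs, const TxSet& chunk, Fn fn)
      {
          for (unsigned parent = 0; parent < txs.size() && parent < MAX_TX; ++parent) {
              if (!chunk[parent]) continue;
              for (unsigned child = 0; child < MAX_TX; ++child) {
                  if (txs[parent].active_children[child]) fn(parent, child);
              }
          }
      }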
  91. clusterlin: track suboptimal chunks (optimization)
    This avoids adding them a second time to m_suboptimal_chunks when they
    happen to already be there.
    41c44d95a2
  92. clusterlin: use random split in SFL on every 6th attempt (feature)
    Out of an abundance of caution that adversarially-constructed clusters
    might reliably result in bad chunk split decisions with the maximum-gain
    strategy, make the first and every sixth consecutive attempt to split
    the same chunk use a random split strategy instead.
    ebe4110479
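
    A minimal sketch, with hypothetical names (SplitStrategy, PickSplitStrategy), of the split-strategy selection this commit describes: random on the first and every sixth consecutive attempt to split the same chunk, max-gain otherwise.

      // Sketch only; not the PR's actual code.
      #include <cstdint>

      enum class SplitStrategy { MAX_GAIN, RANDOM };

      // self_merges counts how often this chunk has already been split and
      // self-merged. Values 0, 6, 12, ... pick the random strategy, giving
      // the "rMMMMMrMMMMM..." pattern discussed earlier in this thread.
      SplitStrategy PickSplitStrategy(uint64_t self_merges)
      {
          return (self_merges % 6) == 0 ? SplitStrategy::RANDOM
                                        : SplitStrategy::MAX_GAIN;
      }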
  93. clusterlin: unidirectional MakeTopological initially (optimization)
    It suffices to initially only attempt one direction of merges in
    MakeTopological(), and only try both directions on chunks that are the
    result of other merges.
    4d09d0c15d
  94. clusterlin: inline UpdateChunk into (De)Activate (optimization)
    The two calls to UpdateChunk, in Activate and Deactivate each, are subtly
    different: the top one needs to update the chunk_idx of iterated
    transactions, while the bottom one leaves it unchanged. To exploit this
    difference, inline the four function calls, getting rid of UpdateChunk.
    
    This is also a preparation for a future improvement that inlines the
    recomputation of reachable sets in the same loop in Deactivate.
    fde8f7ef87
  95. clusterlin: inline GetReachable into Deactivate (optimization)
    Avoid two full iterations over all of a chunk's transactions to
    recompute the reachable sets, by inlining them into the
    dependency-updating loops.
    0172b97531
  96. sipa force-pushed on Jan 15, 2026
  97. DrahtBot removed the label Needs rebase on Jan 15, 2026
