Optimized SFL cluster linearization #34023

pull sipa wants to merge 18 commits into bitcoin:master from sipa:202512_sfl_opt changing 5 files +702 −436
  1. sipa commented at 8:21 pm on December 6, 2025: member

    Follow-up to #32545, part of #30289.

    This contains many of the optimizations that were originally part of #32545 itself.

  2. DrahtBot commented at 8:21 pm on December 6, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34023.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    No conflicts as of last run.

  3. sipa force-pushed on Dec 6, 2025
  4. DrahtBot added the label CI failed on Dec 6, 2025
  5. DrahtBot commented at 8:52 pm on December 6, 2025: contributor

    🚧 At least one of the CI tasks failed. Task 32 bit ARM: https://github.com/bitcoin/bitcoin/actions/runs/19993781856/job/57338323718 LLM reason (✨ experimental): Cluster linearize tests fail due to a size mismatch (linearization size != depgraph TxCount) triggering assertion failures and aborted tests.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. sipa force-pushed on Dec 6, 2025
  7. sipa force-pushed on Dec 6, 2025
  8. sipa force-pushed on Dec 7, 2025
  9. sipa force-pushed on Dec 7, 2025
  10. DrahtBot removed the label CI failed on Dec 7, 2025
  11. sipa force-pushed on Dec 7, 2025
  12. sipa force-pushed on Dec 9, 2025
  13. sipa force-pushed on Dec 10, 2025
  14. sipa force-pushed on Dec 11, 2025
  15. DrahtBot added the label CI failed on Dec 12, 2025
  16. sipa force-pushed on Dec 12, 2025
  17. DrahtBot removed the label CI failed on Dec 12, 2025
  18. sipa force-pushed on Dec 12, 2025
  19. sipa force-pushed on Dec 12, 2025
  20. DrahtBot added the label CI failed on Dec 12, 2025
  21. DrahtBot commented at 11:07 pm on December 12, 2025: contributor

    🚧 At least one of the CI tasks failed. Task Windows-cross to x86_64, ucrt: https://github.com/bitcoin/bitcoin/actions/runs/20182154259/job/57944831923 LLM reason (✨ experimental): Compiler error: -Werror treats a range-for over a tuple as an error in cluster_linearize.h, causing the build to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  22. DrahtBot removed the label CI failed on Dec 13, 2025
  23. sipa force-pushed on Dec 13, 2025
  24. sipa force-pushed on Dec 14, 2025
  25. sipa force-pushed on Dec 14, 2025
  26. sipa force-pushed on Dec 14, 2025
  27. DrahtBot added the label CI failed on Dec 14, 2025
  28. DrahtBot removed the label CI failed on Dec 15, 2025
  29. sipa force-pushed on Dec 15, 2025
  30. sipa force-pushed on Dec 15, 2025
  31. sipa force-pushed on Dec 16, 2025
  32. sipa force-pushed on Dec 16, 2025
  33. sipa commented at 9:23 pm on December 16, 2025: member
    I have moved the changes to TxGraph here to a separate PR, #34085, as it’s not actually dependent on the optimizations here.
  34. sipa force-pushed on Dec 17, 2025
  35. sipa force-pushed on Dec 18, 2025
  36. sipa force-pushed on Dec 19, 2025
  37. sipa force-pushed on Dec 19, 2025
  38. sipa commented at 3:31 pm on December 19, 2025: member
    Rebased after merge of #32545, but not marking as Ready for review yet, as I’m investigating another data structure layout. If that turns out to be fruitful, that may replace some of the commits here, so I don’t want to waste reviewers’ time on those yet.
  39. sipa force-pushed on Dec 21, 2025
  40. sipa marked this as ready for review on Dec 21, 2025
  41. sipa commented at 10:41 pm on December 21, 2025: member

    Ready for review.

    I have made a substantial change here, mostly in the new “get rid of DepData, reuse sets (optimization)” commit. It’s cleaner code, but does a few things quite differently from the merged code. As a result, it touches a good portion of the SFL code due to data structure changes. I’ve used the opportunity to make variable names more consistent throughout.

    Sorry for this - if I had considered this approach before #32545, I’d have used a data structure layout closer to this one from the start, avoiding such a big change. As is however, I don’t see a good way of splitting it up into smaller meaningful changes.

  42. sipa force-pushed on Dec 22, 2025
  43. sipa force-pushed on Dec 22, 2025
  44. sipa force-pushed on Dec 22, 2025
  45. DrahtBot added the label CI failed on Dec 22, 2025
  46. DrahtBot removed the label CI failed on Dec 22, 2025
  47. sipa force-pushed on Dec 22, 2025
  48. sipa force-pushed on Dec 22, 2025
  49. DrahtBot added the label CI failed on Dec 22, 2025
  50. DrahtBot commented at 10:29 pm on December 22, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/actions/runs/20444032805/job/58743457093 LLM reason (✨ experimental): Lint check failed due to scripted-diffs detecting mismatches in code changes.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  51. DrahtBot removed the label CI failed on Dec 23, 2025
  52. clusterlin: split chunks based on maximum gain (optimization)
    This introduces the notion of gain to the SFL algorithm. Given a chunk c,
    an active dependency d in it, and the chunks (t, b) that c would split into
    if d were deactivated, the gain is defined as either (they are equivalent):
    
      (feerate(t) - feerate(b)) * size(t) * size(b)
      fee(t) * size(b) - fee(b) * size(t)
    
    It happens to also be equal to these:
    
      (feerate(t) - feerate(c)) * size(t) * size(c)
      fee(t) * size(c) - fee(c) * size(t)
    
    Its relevance is that this metric is proportional to a lower bound on the area
    under the fee-size diagram which would be gained IF a deactivation of d does not
    result in a self-merge of t and b again.
    
    This commit adds logic to find, within each chunk, the dependency with the highest
    gain. In benchmarks, this appears to be a very good heuristic for deciding which
    splits are worth making.
    bc27102438
  53. clusterlin: split tx/chunk dep counting (preparation)
    This splits the chunk_deps variable in LoadLinearization in two, one for
    tracking tx dependencies and one for chunk dependencies. This is a
    preparation for a later commit, where chunks won't be identified anymore
    by a presentative transaction in them, but by a separate index. With
    that, it seems weird to keep them both in the same structure if they
    will be indexed in an incompatible way.
    c5fee18a3f
  54. clusterlin: count chunk deps without loop (optimization)
    This small optimization avoids the need to loop over the parents of each
    transaction when initializing the dependency-counting structures inside
    GetLinearization().
    99736b7800
  55. scripted-diff: rename _rep -> _idx in SFL
    This is a preparation for the next commit, where chunks will no longer
    be identified using a representative transaction, but using a set index.
    Reduce the load of line changes by doing this rename ahead of time.
    
    -BEGIN VERIFY SCRIPT-
    sed --in-place 's/_rep/_idx/g' src/cluster_linearize.h
    -END VERIFY SCRIPT-
    0876d384e6
  56. clusterlin: get rid of DepData, reuse sets (optimization)
    This significantly changes the data structures used in SFL, based on the
    observation that the DepData::top_setinfo fields are quite wasteful:
    there is one per dependency (up to n^2/4), but we only ever need one per
    active dependency (of which there at most n-1). In total, the number of
    chunks plus the number of active dependencies is always exactly equal to
    the number of transactions, so it makes sense to have a shared pool of
    SetInfos, which are used for both chunks and top sets.
    
    To that effect, introduce a separate m_set_info variable, which stores a
    SetInfo per transaction. Some of these are used for chunk sets, and some
    for active dependencies' top sets. Every activation transforms the
    parent's chunk into the top set for the new dependency. Every
    deactivation transforms the top set into the new parent chunk.
    
    With that, there is little need for DepData anymore. Use parent/child
    TxIdxs to refer to dependencies, and find their top set by having a
    child TxIdx-indexed vector in each TxData, rather than a list of
    dependencies. This makes code for iterating over dependencies more
    natural and simpler.
    
    With indexes into m_set_data (SetIdx) becoming bounded by the number of
    transactions, we can use a SetType to represent sets of SetIdxs.
    Specifically, an m_chunk_idxs is added which contains all SetIdx
    referring to chunks. This leads to a much more natural way of iterating
    over chunks.
    
    Also use this opportunity to normalize many variable names.
    a9d05cc3f7
  57. clusterlin: improve TxData::dep_top_idx type (optimization)
    The combined size of TxData::dep_top_idx can be 16 KiB with 64
    transactions and SetIdx = uint32_t. Use a smaller type where possible to
    reduce memory footprint and improve cache locality of m_tx_data.
    
    Also switch from an std::vector to an std::array, reducing allocation
    overhead and indirections.
    7726f67db2
  58. clusterlin: abstract out functions from MergeStep (refactor)
    This is a simple refactor to make the code more readable.
    f452e1b7e5
  59. clusterlin: simplify PickMergeCandidate (optimization)
    The current process consists of iterating over the transactions of the
    chunk one by one, and then for each figuring out which of its
    parents/children are in unprocessed chunks.
    
    Simplify this (and speed it up slightly) by splitting this process into
    two phases: first determine the union of all parents/children, and then
    find which chunks those belong to.
    e4d5ef982c
  60. clusterlin: precompute reachable sets (optimization)
    Instead of computing the set of reachable transactions inside
    PickMergeCandidate, make the information precomputed, and updated in
    Activate (by merging the two chucks' reachable sets) and Deactive (by
    recomputing).
    
    This is a small performance gain on itself, but also a preparation for
    future optimizations that rely on quickly testing whether dependencies
    between chunks exist.
    aaebb6c904
  61. clusterlin: make MergeSequence take SetIdx (simplification)
    Future changes will rely on knowing the chunk indexes of the two created
    chunks after a split. It is natural to return this information from
    Deactivate, which also simplifies MergeSequence.
    13942f6930
  62. clusterlin: special-case self-merges (optimization)
    After a split, if the top part has a dependency on the bottom part, the
    first MergeSequence will always perform this merge and then stop. This
    is referred to as a self-merge.
    
    We can special case these by detecting self-merges early, and avoiding
    the overhead of a full MergeSequence which involves two
    PickMergeCandidate calls (a succesful and an unsuccesful one).
    c7c076d6df
  63. clusterlin: keep track of active children (optimization)
    This means we can iterate over all active dependencies in a
    cluster/chunk in O(ntx) time rather than O(ndeps) (*), as the number of
    active dependencies in a set of transactions of size is at most ntx-1.
    
    (*) Asymptotically, this is not actually true, as for large transaction
    counts, iterating over a BitSet still scales with ntx. In practice
    however, where BitSets are represented by a constant number of integers,
    it holds.
    ab0d2285ab
  64. sipa force-pushed on Dec 26, 2025
  65. sipa commented at 8:37 pm on December 26, 2025: member

    Made a number of improvements:

    • Moved some of the logic to new functions (PickRandomTx, GetReachable, PickMergeCandidate), hopefully making things more readable.
    • Added a precomputation for the set of reachable transactions (union of out-of-chunk parents/children within a chunk) rather than computing them on the fly at merging time in Activate() and Deactivate().
    • Added a special case for self-merges in Improve(), avoiding the need for MergeSequence() in that case.
    • Made chunk minimization use randomly-chosen pivot transactions, out of a concern that the earlier deterministic choice could be used to create reliably-slow-to-minimize clusters.
    • Optimized the chunk minimization logic to not retry certain cases that can be predicted to be useless: when a split of a chunk is found in its second phase (so with pivot down for example, after pivot up has been tried and failed already), then the created sub-chunk that holds the pivot can reuse the same pivot, direction, and phase, because we know no split with the pivot in the other direction is possible.
  66. clusterlin: track suboptimal chunks (optimization)
    This avoids adding them a second time to m_suboptimal_chunks when they
    happen to already be there.
    383cf262c8
  67. clusterlin: use random split in SFL on every 3rd attempt (feature)
    Out of an abundance of caution that adversarially-constructed clusters might
    reliably result in bad chunk split decisions with the maximum-gain strategy,
    make every third consecutive attempt to split the same chunk use a random
    split strategy instead.
    8a896005c1
  68. clusterlin: unidirectional MakeTopological initially (optimization)
    It suffices to initially only attempt one direction of merges in
    MakeTopological(), and only try both directions on chunks that are the
    result of other merges.
    9342728274
  69. clusterlin: minimize chunks (feature)
    After the normal optimization process finishes, and finds an optimal
    spanning forest, run a second process (while computation budget remains)
    to split chunks into minimal equal-feerate chunks.
    c990f53dc4
  70. clusterlin: inline UpdateChunk into (De)Activate (optimization)
    The two calls to UpdateChunk, in Activate and Deactive each, are subtly
    different: the top one needs to update the chunk_idx of iterated
    transactions, while the bottom one leaves it unchanged. To exploit this
    difference, inline the four function calls, getting rid of UpdateChunks.
    
    This is also a preparation for a future improvement that inlines the
    recomputation of reachable sets in the same loop in Deactivate.
    cd20513ef6
  71. clusterlin: inline GetReachable into Deactivate (optimization)
    Avoid two full iterations over all of a chunks' transactions to
    recompute the reachable sets, by inlining them into the
    dependency-updating loops.
    cf0d8da590
  72. sipa force-pushed on Dec 26, 2025
  73. DrahtBot added the label CI failed on Dec 26, 2025
  74. DrahtBot removed the label CI failed on Dec 26, 2025


sipa DrahtBot


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-30 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me