Find minimal chunks in SFL #34259

pull sipa wants to merge 1 commits into bitcoin:master from sipa:202601_sfl_minimize changing 5 files +235 −29
  1. sipa commented at 5:17 AM on January 12, 2026: member

    Part of #30289.

    This was split off from #34023, because it's not really an optimization but a feature. The feature existed pre-SFL, so this brings SFL to parity in terms of functionality with the old code.

    The idea is that while optimality - as achieved by SFL before this PR - guarantees a linearization whose feerate diagram is optimal, it may be possible to split chunks into smaller equal-feerate parts. This is desirable because even though it doesn't change the diagram, it provides more flexibility for optimization (binpacking is easier when the pieces are smaller).

    Thus, this PR introduces the stronger notion of "minimality": optimal chunks, which are also split into their smallest possible pieces. To accomplish that, an additional step in the SFL algorithm is added which aims to split chunks into minimal equal-feerate parts where possible, without introducing circular dependencies between them. It works based on the observation that if an (already otherwise optimal) chunk has a way of being split into two equal-feerate parts, and T is a given transaction in the chunk, then we can find the split in two steps:

    • One time, pretend T has $\epsilon$ higher feerate than it really has. If a split exists with T in the top part, this will find it.
    • The other time, pretend T has $\epsilon$ lower feerate than it really has. If a split exists with T in the bottom part, this will find it.

    So we try both on each found optimal chunk. If neither works, the chunk is minimal. If one works, recurse into the split chunks to split them further.

  2. DrahtBot commented at 5:17 AM on January 12, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34259.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK instagibbs, marcofleon

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34257 (txgraph: deterministic optimal transaction order by sipa)
    • #34138 (Cluster mempool: more accurate cost model for SFL by sipa)
    • #34023 (Optimized SFL cluster linearization by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  3. sipa force-pushed on Jan 12, 2026
  4. instagibbs commented at 2:17 PM on January 12, 2026: member

    thank you for splitting this out

  5. in src/cluster_linearize.h:1550 in 9f147194eb outdated
    1542 | @@ -1361,11 +1543,19 @@ std::tuple<std::vector<DepGraphIndex>, bool, uint64_t> Linearize(const DepGraph<
    1543 |      }
    1544 |      // Make improvement steps to it until we hit the max_iterations limit, or an optimal result
    1545 |      // is found.
    1546 | -    bool optimal = false;
    1547 |      if (forest.GetCost() < max_iterations) {
    1548 |          forest.StartOptimizing();
    1549 |          do {
    1550 | -            if (!forest.OptimizeStep()) {
    1551 | +            if (!forest.OptimizeStep()) break;
    


    instagibbs commented at 3:00 PM on January 12, 2026:

    Linearize() description of the return tuples "optimal" value should pedantically say the result is minimal as well


    sipa commented at 6:36 PM on January 12, 2026:

    Done.

  6. sipa force-pushed on Jan 12, 2026
  7. sipa commented at 3:38 PM on January 12, 2026: member

    I've made this the "what to review next" in #30289, since I plan to rebase #34257 on top of this as well (several tests in it are only possible when we have minimal chunks).

  8. in src/cluster_linearize.h:1162 in 9697bc36c1
    1157 | +        bool move_pivot_down = flags & 1;
    1158 | +        /** Whether this is already the second stage. */
    1159 | +        bool second_stage = flags & 2;
    1160 | +
    1161 | +        // Find a random dependency whose top and bottom set feerates are equal, and which has
    1162 | +        // pivot as bottom (if move_pivot_down) or as top (if !move_pivot_down).
    


    instagibbs commented at 4:47 PM on January 12, 2026:

    micro-nit

            // pivot in bottom set (if move_pivot_down) or in top set (if !move_pivot_down).
    

    sipa commented at 6:36 PM on January 12, 2026:

    Done.

  9. in src/cluster_linearize.h:1124 in 9697bc36c1
    1120 | @@ -1068,6 +1121,119 @@ class SpanningForestState
    1121 |          return false;
    1122 |      }
    1123 |  
    1124 | +    /** Initialize data structure for minimizing the chunks. Step() cannot be called anymore
    


    instagibbs commented at 5:03 PM on January 12, 2026:
        /** Initialize data structure for minimizing the chunks. OptimizeStep() cannot be called anymore
    

    instagibbs commented at 5:04 PM on January 12, 2026:

    we should also say this shouldn't be called unless we are known-optimal, since I think that' assumed in the minimize step code


    sipa commented at 6:36 PM on January 12, 2026:

    Done, done.

  10. in src/cluster_linearize.h:1505 in 9697bc36c1 outdated
    1497 | @@ -1328,6 +1498,18 @@ class SpanningForestState
    1498 |              auto tx_idx = m_suboptimal_chunks[i];
    1499 |              assert(m_transaction_idxs[tx_idx]);
    1500 |          }
    1501 | +
    1502 | +        //
    1503 | +        // Verify m_nonminimal_chunks.
    1504 | +        //
    1505 | +        SetType nonminimal_reps;
    


    instagibbs commented at 5:40 PM on January 12, 2026:

    nit

            assert(m_nonminimal_chunks.size() <= depgraph.TxCount());
            SetType nonminimal_reps;
    

    sipa commented at 6:35 PM on January 12, 2026:

    Done, but with an even stronger condition: assert(m_nonminimal_chunks.IsSubsetOf(m_transaction_txns));.

  11. instagibbs approved
  12. instagibbs commented at 6:22 PM on January 12, 2026: member

    ACK 9697bc36c1387909e8a3384f09cd9121d8788d86

  13. sipa force-pushed on Jan 12, 2026
  14. in src/cluster_linearize.h:701 in ae716cd6ff
     696 | +            if (pos == 0) return tx_idx;
     697 | +            --pos;
     698 | +        }
     699 | +        Assume(false);
     700 | +        return TxIdx(-1);
     701 | +   }
    


    instagibbs commented at 10:27 PM on January 12, 2026:

    game breaking

        }
    

    sipa commented at 10:38 PM on January 12, 2026:

    Fixed.

  15. clusterlin: minimize chunks (feature)
    After the normal optimization process finishes, and finds an optimal
    spanning forest, run a second process (while computation budget remains)
    to split chunks into minimal equal-feerate chunks.
    da56ef239b
  16. sipa force-pushed on Jan 12, 2026
  17. fanquake added this to the milestone 31.0 on Jan 13, 2026
  18. marcofleon commented at 5:50 PM on January 14, 2026: contributor

    crACK da56ef239b12786e3a177afda14352dda4a70bc6

    Also left the clusterlin_linearize and clusterlin_sfl fuzz tests running for a while, and no issues.

    From what I can see, having the SFL internal state be minimal provides the advantage of making the optimal value more precise (it's now both optimal and minimal). It's a stronger condition for testing. Is there something else I'm missing, maybe wrt some future optimization, as to why having minimal SFL chunks is good?

  19. sipa commented at 6:52 PM on January 14, 2026: member

    @marcofleon All else being equal, smaller chunks are better simply because they are more flexible.

    A big chunk may not fit in a block, requiring a need to skip it, and use a lower-feerate option. If the big chunk can be split in smaller equal-feerate chunks, more of it may fit.

    Even with a hypothetical backtracking binpacking approach, more smaller chunks just means more possible ways to pack, so likely better.

    I've updated the PR description to explain this a bit.

  20. marcofleon commented at 7:55 PM on January 14, 2026: contributor

    Right, makes sense. I realized where I got confused. I thought ChunkLinearizationInfo() was already playing the part of "split up equal fee rate chunks", but of course the linearization order can change from having smaller SFL chunks. This would result in potentially more chunks being read in ChunkLinearizationInfo().

    Your explanation and PR description make more sense to me now, thanks.

  21. sipa commented at 8:03 PM on January 14, 2026: member

    Exactly. It is actually possible that accidentally the linearization order that comes out of SFL, when fed into ChunkLinearization(Info), results in splitting of chunks into equal-feerate parts, but this is not guaranteed. The PostLinearize() call after Linearize() also has a chance of improving the linearization in such a way that chunking splits things up further (and in practice, I suspect this will often be enough). However, even that is not guaranteed to always give the minimal resulting chunks. Building the splitting into SFL itself however can guarantee it.

  22. in src/txgraph.cpp:2065 in da56ef239b
    2062 | @@ -2063,8 +2063,7 @@ std::pair<uint64_t, bool> GenericClusterImpl::Relinearize(TxGraphImpl& graph, in
    2063 |      uint64_t rng_seed = graph.m_rng.rand64();
    2064 |      auto [linearization, optimal, cost] = Linearize(m_depgraph, max_iters, rng_seed, m_linearization, /*is_topological=*/IsTopological());
    2065 |      // Postlinearize to undo some of the non-determinism caused by randomizing the linearization.
    


    instagibbs commented at 8:08 PM on January 14, 2026:

    random Q: what's specific non-determinism is being removed by PostLin?


    sipa commented at 10:55 PM on January 14, 2026:

    For example:

    graph BT
      T1["T1: 2/3 "];
      T2["T2: 3/4 "];
      T3["T3:  7/10"];
      T4["T4: 100/1"];
      T4-->T1; T4-->T2; T4-->T3;
    

    In master today (after #32545, before #34257), the order of T1,T2,T3 is random. PostLinearize will always output T2,T3,T1,T4.

  23. fanquake merged this on Jan 15, 2026
  24. fanquake closed this on Jan 15, 2026


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-19 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me