cluster mempool: add TxGraph reorg functionality #31553

pull sipa wants to merge 13 commits into bitcoin:master from sipa:202412_txgraph_trim changing 3 files +1164 −60
  1. sipa commented at 7:06 pm on December 22, 2024: member

    Part of cluster mempool (#30289). Builds on top of #31444.

    During reorganisations, it is possible that dependencies get added which would result in clusters that violate policy limits (cluster count, cluster weight), when linking the new from-block transactions to the old from-mempool transactions. Unlike RBF scenarios, we cannot simply reject the changes when they are due to received blocks. To accommodate this, add a TxGraph::Trim(), which removes some subset of transactions (including descendants) in order to make all resulting clusters satisfy the limits.

    Conceptually, the way this is done is by defining a rudimentary linearization for the entire would-be too-large cluster, iterating it from beginning to end, and reasoning about the counts and weights of the clusters that would be reached using transactions up to that point. If a transaction is encountered whose addition would violate the limit, it is removed, together with all its descendants.

    This rudimentary linearization is like a merge sort of the chunks of the clusters being combined, but respecting topology. More specifically, it is continuously picking the highest-chunk-feerate remaining transaction among those which have no unmet dependencies left. For efficiency, this rudimentary linearization is computed lazily, by putting all viable transactions in a heap, sorted by chunk feerate, and adding new transactions to it as they become viable.

    The Trim() function is rather unusual compared to the TxGraph functionality added in previous PRs, in that Trim() makes it own decisions about what the resulting graph contents will be, without good specification of how it makes that decision - it is just a best-effort attempt (which is improved in the last commit). All other TxGraph mutators are simply to inform the graph about changes the calling mempool code decided on; this one lets the decision be made by txgraph.

    As part of this, the “oversized” property is expanded to also encompass a configurable cluster weight limit (in addition to cluster count limit).

  2. DrahtBot commented at 7:06 pm on December 22, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31553.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    No conflicts as of last run.

  3. glozow added the label Mempool on Jan 2, 2025
  4. sipa force-pushed on Jan 8, 2025
  5. DrahtBot added the label CI failed on Jan 9, 2025
  6. DrahtBot commented at 0:52 am on January 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35343429418

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. sipa force-pushed on Jan 9, 2025
  8. DrahtBot removed the label CI failed on Jan 9, 2025
  9. sipa force-pushed on Jan 9, 2025
  10. in src/txgraph.cpp:1691 in 0c8dc2323e outdated
    948+    Ref ret;
    949+    // Construct a new Entry, and link it with the Ref.
    950+    auto idx = m_entries.size();
    951+    m_entries.emplace_back();
    952+    auto& entry = m_entries.back();
    953+    entry.m_ref = &ret;
    


    theuni commented at 6:43 pm on January 9, 2025:
    I’m not sure how this is intended to be used, but storing a stack address seems like a problem? RVO may help but that seems brittle. I imagine the caller should be passing in their own Ref instead?

    sipa commented at 8:19 pm on January 9, 2025:

    I believe it is safe, both with NRVO and without.

    With NRVO, ret is constructed directly in the caller’s target destination, so this isn’t a pointer to local stack space.

    Without NRVO, the Ref(Ref&&) move constructor is invoked by return ret;, which will update the pointer to the caller’s destination.


    theuni commented at 10:15 pm on January 9, 2025:
    Ah, right, I missed that the move ctor would handle the update. Thanks for explaining.
  11. in src/txgraph.cpp:1172 in 0c8dc2323e outdated
    1167+    }
    1168+}
    1169+
    1170+TxGraph::Ref& TxGraph::Ref::operator=(Ref&& other) noexcept
    1171+{
    1172+    // Inform both TxGraphs about the Refs being swapped.
    


    theuni commented at 6:48 pm on January 9, 2025:
    Why is this doing an effective swap? I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index. Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

    sipa commented at 8:38 pm on January 9, 2025:

    Why is this doing an effective swap?

    I think this is quite common, that move-construction is effectively performing a swap.

    I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index

    That’s possible too, and slightly more efficient I guess.

    Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

    Indeed. I don’t think that’s a problem.


    sipa commented at 9:34 pm on January 9, 2025:
    Anyway, done!

    theuni commented at 9:56 pm on January 9, 2025:

    Why is this doing an effective swap?

    I think this is quite common, that move-construction is effectively performing a swap.

    I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index

    That’s possible too, and slightly more efficient I guess.

    Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

    Indeed. I don’t think that’s a problem.

    Afaik the move/swap idiom is only safe if the swapped-to value’s dtor doesn’t have any interesting ordering requirements or side-effects.

    As a contrived example, a user may do something like:

    0std::vector<TxGraph::Ref> vec;
    1
    2vec.push_back(txgraph->AddTransaction(fee));
    3auto ref = txgraph->AddTransaction(fee2);
    4...
    5ref = std::move(vec.back());
    

    The vector now holds the old ref and UnlinkRef will not be called until that element is removed. I realize it’s allowed to be a “valid but unspecified state”, but I wouldn’t expect a ref to be hanging around.

  12. in src/txgraph.cpp:188 in 0c8dc2323e outdated
    183+    /** A class of objects held internally in TxGraphImpl, with information about a single
    184+     *  transaction. */
    185+    struct Entry
    186+    {
    187+        /** Pointer to the corresponding Ref object, if any. */
    188+        Ref* m_ref;
    


    theuni commented at 6:51 pm on January 9, 2025:
    m_ref{nullptr};

    sipa commented at 9:34 pm on January 9, 2025:
    Done.
  13. in src/txgraph.cpp:297 in 0c8dc2323e outdated
    292+    LinearizationIndex lin_idx{0};
    293+    // Iterate over the chunks.
    294+    for (unsigned chunk_idx = 0; chunk_idx < chunking.NumChunksLeft(); ++chunk_idx) {
    295+        auto chunk = chunking.GetChunk(chunk_idx);
    296+        // Iterate over the transactions in the linearization, which must match those in chunk.
    297+        while (true) {
    


    theuni commented at 7:02 pm on January 9, 2025:

    Trying to convince myself this is guaranteed to terminate…

    do{} while (!chunk.transactions.None()) rather than the break for readability? Or just while() if we need to guard against an empty linearization (presumably not?)


    sipa commented at 9:39 pm on January 9, 2025:

    It terminates because:

    • Every chunk contains at least one element (added an Assume for that)
    • In the inner loop, one element from that chunk is Reset() (added an Assume that it indeed resets a bit that was previously set).

    I’ve changed it to a do {} while(chunk.transactions.Any()); loop in the first commits, though it reverts back to a while (true) { ... } loop later, when the loop becomes a bit more complex.

  14. in src/txgraph.cpp:805 in 0c8dc2323e outdated
    306+}
    307+
    308+void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove) noexcept
    309+{
    310+    // Iterate over the prefix of to_remove that applies to this cluster.
    311+    SetType todo;
    


    theuni commented at 7:17 pm on January 9, 2025:
    Assume !to_remove.empty() or early return if it’s allowed?

    sipa commented at 9:40 pm on January 9, 2025:
    Done. I’ve also added a comment to the Cluster::ApplyRemovals() function definition stating that at least one element from the front of to_remove must belong to this Cluster (which is really why that requirement exists).
  15. in src/txgraph.cpp:1662 in 0c8dc2323e outdated
    1002+    // Make sure the transaction isn't scheduled for removal.
    1003+    ApplyRemovals();
    1004+    return m_entries[GetRefIndex(arg)].m_locator.IsPresent();
    1005+}
    1006+
    1007+std::vector<TxGraph::Ref*> Cluster::GetAncestorRefs(const TxGraphImpl& graph, ClusterIndex idx) noexcept
    


    theuni commented at 7:42 pm on January 9, 2025:
    Looks like these 3 functions could reserve() for their ret vectors.

    sipa commented at 9:40 pm on January 9, 2025:
    Done. The third one disappears in a later commit, though.
  16. theuni commented at 8:03 pm on January 9, 2025: member
    Very quick and shallow pass through the initial impl commit. This PR is a lot to get through :)
  17. sipa force-pushed on Jan 9, 2025
  18. DrahtBot added the label CI failed on Jan 9, 2025
  19. DrahtBot commented at 11:22 pm on January 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35398283653

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  20. sipa force-pushed on Jan 10, 2025
  21. DrahtBot removed the label CI failed on Jan 10, 2025
  22. sipa force-pushed on Jan 10, 2025
  23. sipa force-pushed on Jan 16, 2025
  24. sipa force-pushed on Jan 22, 2025
  25. in src/txgraph.cpp:1548 in 6cb99b067c outdated
    1464@@ -1449,6 +1465,7 @@ Cluster::Cluster(TxGraphImpl& graph, const FeeFrac& feerate, GraphIndex graph_in
    1465 TxGraph::Ref TxGraphImpl::AddTransaction(const FeeFrac& feerate) noexcept
    1466 {
    1467     Assume(m_chunkindex_observers == 0 || m_clustersets.size() > 1);
    1468+    Assume(feerate.size > 0 && uint64_t(feerate.size) <= m_max_cluster_size);
    


    sdaftuar commented at 2:10 pm on January 24, 2025:
    FYI – in my rebase of #28676, I’m seeing tx_pool fuzz test failures due to this line. Not clear to me whether we should require the caller to enforce the policy requirement that a single tx be below the cluster size limit, or just let the caller discover a changeset is oversized and then reject?

    sipa commented at 2:34 pm on January 24, 2025:

    Right. That rule exists because the alternative requires existing clusters to be oversized as AddTransaction constructs a singleton cluster instantly. All other forms of oversizedness happen as a result of applying dependencies, which are done lazily.

    I’ll think about relaxing this.


    sipa commented at 8:45 pm on January 26, 2025:
    Done, it is now allowed to have individually oversized transactions.
  26. sipa force-pushed on Jan 24, 2025
  27. sipa commented at 10:14 pm on January 24, 2025: member

    Some changes:

    • As a result of dropping Cleanup in the base PR, Trim now reports which transactions it removed, as it becomes the caller’s responsibility of destroying Refs.
  28. DrahtBot added the label CI failed on Jan 24, 2025
  29. DrahtBot commented at 11:23 pm on January 24, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36148923463

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  30. sipa force-pushed on Jan 26, 2025
  31. sipa commented at 4:42 am on January 26, 2025: member
    • Add support for calling AddTransaction with a feerate whose size already violates the cluster size limit.
  32. DrahtBot removed the label CI failed on Jan 26, 2025
  33. sipa force-pushed on Jan 26, 2025
  34. sipa force-pushed on Jan 30, 2025
  35. DrahtBot commented at 1:01 am on January 31, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36451226624

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  36. DrahtBot added the label CI failed on Jan 31, 2025
  37. sipa force-pushed on Jan 31, 2025
  38. DrahtBot removed the label CI failed on Jan 31, 2025
  39. sipa force-pushed on Jan 31, 2025
  40. sipa force-pushed on Feb 1, 2025
  41. sipa force-pushed on Feb 4, 2025
  42. DrahtBot added the label CI failed on Feb 4, 2025
  43. DrahtBot removed the label CI failed on Feb 4, 2025
  44. sipa force-pushed on Feb 6, 2025
  45. sipa force-pushed on Feb 11, 2025
  46. sipa force-pushed on Feb 12, 2025
  47. DrahtBot commented at 11:49 pm on February 12, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37128455883

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  48. DrahtBot added the label CI failed on Feb 12, 2025
  49. sipa force-pushed on Feb 13, 2025
  50. DrahtBot removed the label CI failed on Feb 13, 2025
  51. sipa force-pushed on Feb 14, 2025
  52. sipa force-pushed on Feb 20, 2025
  53. sipa force-pushed on Feb 21, 2025
  54. in src/cluster_linearize.h:1381 in e43f6ca3b8 outdated
    1364+        // Figure out which elements need to be moved before elem.
    1365+        SetType place_before = done & depgraph.Ancestors(elem);
    1366+        // Find which position to place elem in (updating j), continuously moving the elements
    1367+        // in between forward.
    1368+        while (place_before.Any()) {
    1369+            // j cannot be 0 here; if it was, then there was necessarily nothing earlier which
    


    yancyribbens commented at 10:20 pm on February 24, 2025:
    0            // j cannot be 0 or less; if it was, then there was necessarily nothing earlier which
    

    yancyribbens commented at 10:23 pm on February 24, 2025:
    0if it was, then there was necessarily nothing earlier which elem needs to be place before anymore, and place_before would be empty.
    

    This comment seems jumbled and hard to understand. Is it possible to word this better?

  55. in src/txgraph.cpp:647 in e43f6ca3b8 outdated
    615+
    616+void TxGraphImpl::CreateChunkData(GraphIndex idx, LinearizationIndex chunk_count) noexcept
    617+{
    618+    auto& entry = m_entries[idx];
    619+    if (!m_chunkindex_discarded.empty()) {
    620+        // Reuse an discarded node handle.
    


    yancyribbens commented at 10:48 pm on February 24, 2025:
    0        // Reuse a discarded node handle.
    
  56. sipa force-pushed on Mar 3, 2025
  57. sipa force-pushed on Mar 19, 2025
  58. sipa force-pushed on Mar 19, 2025
  59. DrahtBot added the label CI failed on Mar 19, 2025
  60. DrahtBot commented at 9:06 pm on March 19, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39067440822

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  61. sipa force-pushed on Mar 19, 2025
  62. DrahtBot removed the label CI failed on Mar 20, 2025
  63. DrahtBot added the label Needs rebase on Mar 20, 2025
  64. sipa force-pushed on Mar 20, 2025
  65. DrahtBot added the label CI failed on Mar 20, 2025
  66. DrahtBot commented at 1:42 pm on March 20, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39111492257

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  67. sipa force-pushed on Mar 20, 2025
  68. DrahtBot removed the label Needs rebase on Mar 20, 2025
  69. DrahtBot removed the label CI failed on Mar 20, 2025
  70. sipa force-pushed on Mar 22, 2025
  71. DrahtBot added the label CI failed on Mar 22, 2025
  72. DrahtBot commented at 1:23 am on March 22, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39212785621

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  73. sipa force-pushed on Mar 22, 2025
  74. DrahtBot removed the label CI failed on Mar 22, 2025
  75. sipa force-pushed on Mar 23, 2025
  76. sipa force-pushed on Mar 24, 2025
  77. glozow referenced this in commit f1d129d963 on Mar 26, 2025
  78. sipa force-pushed on Mar 27, 2025
  79. sipa commented at 3:49 am on March 27, 2025: member
    Rebased after merge of #31363, and on top of #32151.
  80. sipa force-pushed on Mar 27, 2025
  81. sipa force-pushed on Mar 27, 2025
  82. sipa force-pushed on Mar 28, 2025
  83. sipa force-pushed on Mar 28, 2025
  84. txgraph: Add GetMainStagingDiagrams function (feature)
    This allows determining whether the changes in a staging diagram unambiguously improve
    the graph, through CompareChunks().
    dac4d915a4
  85. txgraph: Maintain chunk index (preparation)
    This is preparation for exposing mining and eviction functionality in
    TxGraph.
    1a9fb7fd5a
  86. txgraph: Introduce TxGraphImpl observer tracking (preparation)
    This is preparation for a next commit which will introduce a class whose
    objects hold references to internals in TxGraphImpl, which disallows
    modifications to the graph while such objects exist.
    9bfd68d667
  87. txgraph: Generalize GetClusterRefs to support subsections (preparation)
    This is preparation for a next commit which will need a way to extract Refs
    for just individual chunks from a cluster.
    e610a1df3a
  88. txgraph: Introduce BlockBuilder interface (feature)
    This interface lets one iterate efficiently over the chunks of the main
    graph in a TxGraph, in the same order as CompareMainOrder. Each chunk
    can be marked as "included" or "skipped" (and in the latter case,
    dependent chunks will be skipped).
    f01ce7f2c9
  89. txgraph: Introduce TxGraph::GetWorstMainChunk (feature)
    It returns the last chunk that would be suggested for mining by BlockBuilder
    objects. This is intended for eviction.
    eb10f3a370
  90. txgraph: Reuse discarded chunkindex entries (optimization) 33f97b568f
  91. txgraph: Skipping end of cluster has no impact (optimization) 74bb8e3e21
  92. txgraph: Special-case singletons in chunk index (optimization) d8e4b92fb4
  93. txgraph: Add ability to configure maximum cluster size/weight (feature)
    This is integrated with the oversized property: the graph is oversized when
    any connected component within it contains more than the cluster count limit
    many transactions, or when their combined size/weight exceeds the cluster size
    limit.
    
    It becomes disallowed to call AddTransaction with a size larger than this limit.
    In addition, SetTransactionFeeRate becomes SetTransactionFee, so that we do not
    need to deal with the case that a call to this function might affect the
    oversizedness.
    f095a0016c
  94. txgraph: Permit transactions that exceed cluster size limit (feature) bc5d37f252
  95. txgraph: Add ability to trim oversized clusters (feature)
    During reorganisations, it is possible that dependencies get add which
    result in clusters that violate limits (count, size), when linking the
    new from-block transactions to the old from-mempool transactions.
    
    Unlike RBF scenarios, we cannot simply reject these policy violations
    when they are due to received blocks. To accomodate this, add a Trim()
    function to TxGraph, which removes transactions (including descendants)
    in order to make all resulting clusters satisfy the limits.
    b0131940ba
  96. txgraph: Track multiple potential would-be clusters in Trim (improvement)
    In a Trim function, for any given would-be group of clusters, a (rudimentary)
    linearization for the would-be cluster is constructed on the fly by adding
    eligible transactions to a heap. This continues until the total count or
    size of the transaction exists a configured limit. Any transactions which
    appear later in this linearization are discarded.
    
    However, given that transactions at the end are discarded, it is possible that
    the would-be cluster splits apart into multiple clusters. And those clusters
    may well permit far more transactions before their limits are reached.
    
    Take this into account by using a union-find structure inside TrimTxData to
    keep track of the count/size of all would-be clusters that would be formed
    at any point.
    
    This is not an optimization in terms of CPU usage or memory; it just
    improves the quality of the transactions removed by Trim().
    deaa870b2c
  97. sipa force-pushed on Mar 28, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-31 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me