cluster mempool: add TxGraph reorg functionality #31553

pull sipa wants to merge 34 commits into bitcoin:master from sipa:202412_txgraph_trim changing 12 files +4312 −151
  1. sipa commented at 7:06 pm on December 22, 2024: member

    Part of cluster mempool (#30289). Builds on top of #31444.

    During reorganisations, it is possible that dependencies get added which would result in clusters that violate policy limits (cluster count, cluster weight), when linking the new from-block transactions to the old from-mempool transactions. Unlike RBF scenarios, we cannot simply reject the changes when they are due to received blocks. To accommodate this, add a TxGraph::Trim(), which removes some subset of transactions (including descendants) in order to make all resulting clusters satisfy the limits.

    Conceptually, the way this is done is by defining a rudimentary linearization for the entire would-be too-large cluster, iterating it from beginning to end, and reasoning about the counts and weights of the clusters that would be reached using transactions up to that point. If a transaction is encountered whose addition would violate the limit, it is removed, together with all its descendants.

    This rudimentary linearization is like a merge sort of the chunks of the clusters being combined, but respecting topology. More specifically, it is continuously picking the highest-chunk-feerate remaining transaction among those which have no unmet dependencies left. For efficiency, this rudimentary linearization is computed lazily, by putting all viable transactions in a heap, sorted by chunk feerate, and adding new transactions to it as they become viable.
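
The heap-driven selection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: it assumes each transaction already carries a precomputed chunk feerate (a plain `double` here), treats the whole input as one would-be cluster with a single running count and size, and uses hypothetical names (`SketchTx`, `TrimSketch`) that do not exist in txgraph.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical, simplified transaction record (not a TxGraph type).
struct SketchTx {
    double chunk_feerate;        // assumed to be precomputed by the caller
    uint64_t size;               // size or weight of the transaction
    std::vector<size_t> parents; // indices of in-graph parents
};

// Returns the indices of the transactions that are kept, in selection order.
std::vector<size_t> TrimSketch(const std::vector<SketchTx>& txs, size_t max_count, uint64_t max_size)
{
    std::vector<size_t> unmet(txs.size());
    std::vector<std::vector<size_t>> children(txs.size());
    for (size_t i = 0; i < txs.size(); ++i) {
        unmet[i] = txs[i].parents.size();
        for (size_t p : txs[i].parents) children[p].push_back(i);
    }
    // Max-heap on chunk feerate, holding only transactions with no unmet dependencies.
    std::priority_queue<std::pair<double, size_t>> heap;
    for (size_t i = 0; i < txs.size(); ++i) {
        if (unmet[i] == 0) heap.emplace(txs[i].chunk_feerate, i);
    }
    std::vector<size_t> kept;
    uint64_t total_size = 0;
    while (!heap.empty()) {
        const size_t idx = heap.top().second;
        heap.pop();
        if (kept.size() + 1 > max_count || total_size + txs[idx].size > max_size) {
            // Dropped. Its children never become viable, so all descendants are dropped too.
            continue;
        }
        kept.push_back(idx);
        total_size += txs[idx].size;
        for (size_t c : children[idx]) {
            assert(unmet[c] > 0);
            // A child becomes viable once its last parent has been kept.
            if (--unmet[c] == 0) heap.emplace(txs[c].chunk_feerate, c);
        }
    }
    return kept;
}
```

The linearization is never materialized: a transaction enters the heap only when all of its parents have been kept, which is what makes dropping one transaction implicitly drop its descendants.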

    The Trim() function is rather unusual compared to the TxGraph functionality added in previous PRs, in that Trim() makes its own decisions about what the resulting graph contents will be, without a good specification of how it makes that decision; it is just a best-effort attempt (which is improved in the last commit). All other TxGraph mutators simply inform the graph about changes the calling mempool code decided on; this one lets the decision be made by txgraph.

    As part of this, the “oversized” property is expanded to also encompass a configurable cluster weight limit (in addition to cluster count limit).

  2. DrahtBot commented at 7:06 pm on December 22, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31553.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31519 (refactor: Use std::span over Span by maflcko)
    • #30605 (Cluster linearization: separate tests from tests-of-tests by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. glozow added the label Mempool on Jan 2, 2025
  4. sipa force-pushed on Jan 8, 2025
  5. DrahtBot added the label CI failed on Jan 9, 2025
  6. DrahtBot commented at 0:52 am on January 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35343429418

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. sipa force-pushed on Jan 9, 2025
  8. DrahtBot removed the label CI failed on Jan 9, 2025
  9. sipa force-pushed on Jan 9, 2025
  10. in src/txgraph.cpp:1653 in 0c8dc2323e outdated
    +    Ref ret;
    +    // Construct a new Entry, and link it with the Ref.
    +    auto idx = m_entries.size();
    +    m_entries.emplace_back();
    +    auto& entry = m_entries.back();
    +    entry.m_ref = &ret;
    


    theuni commented at 6:43 pm on January 9, 2025:
    I’m not sure how this is intended to be used, but storing a stack address seems like a problem? RVO may help but that seems brittle. I imagine the caller should be passing in their own Ref instead?

    sipa commented at 8:19 pm on January 9, 2025:

    I believe it is safe, both with NRVO and without.

    With NRVO, ret is constructed directly in the caller’s target destination, so this isn’t a pointer to local stack space.

    Without NRVO, the Ref(Ref&&) move constructor is invoked by return ret;, which will update the pointer to the caller’s destination.


    theuni commented at 10:15 pm on January 9, 2025:
    Ah, right, I missed that the move ctor would handle the update. Thanks for explaining.
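
The argument in this thread can be illustrated with a small stand-alone sketch. It is not the TxGraph code: `SketchSlot`, `SketchHandle` and `MakeLinked` are hypothetical names, and the only point is that a move constructor which repoints the owner's back-pointer makes "link a local, then return it by value" safe whether or not NRVO is applied.

```cpp
#include <cassert>
#include <utility>

struct SketchHandle;

// Hypothetical owner-side slot holding a back-pointer to its handle.
struct SketchSlot {
    SketchHandle* handle{nullptr};
};

struct SketchHandle {
    SketchSlot* slot{nullptr};

    SketchHandle() noexcept = default;
    SketchHandle(SketchHandle&& other) noexcept : slot(other.slot)
    {
        // Repoint the slot at the new location and detach the moved-from handle.
        if (slot) slot->handle = this;
        other.slot = nullptr;
    }
    SketchHandle(const SketchHandle&) = delete;
    SketchHandle& operator=(const SketchHandle&) = delete;
    SketchHandle& operator=(SketchHandle&&) = delete;
    ~SketchHandle()
    {
        // Only a handle that is still linked clears the back-pointer.
        if (slot) slot->handle = nullptr;
    }
};

// Mirrors the pattern under discussion: link a local to the slot, then return it by value.
SketchHandle MakeLinked(SketchSlot& slot) noexcept
{
    SketchHandle ret;
    ret.slot = &slot;
    slot.handle = &ret;
    return ret; // NRVO: ret already lives in the caller; otherwise the move constructor repoints
}
```

In both cases the slot ends up pointing at the caller's object, and the destructor of the moved-from local does nothing because it was detached.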
  11. in src/txgraph.cpp:1172 in 0c8dc2323e outdated
    +    }
    +}
    +
    +TxGraph::Ref& TxGraph::Ref::operator=(Ref&& other) noexcept
    +{
    +    // Inform both TxGraphs about the Refs being swapped.
    


    theuni commented at 6:48 pm on January 9, 2025:
    Why is this doing an effective swap? I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index. Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

    sipa commented at 8:38 pm on January 9, 2025:

    Why is this doing an effective swap?

    I think this is quite common, that move-construction is effectively performing a swap.

    I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index

    That’s possible too, and slightly more efficient I guess.

    Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

    Indeed. I don’t think that’s a problem.


    sipa commented at 9:34 pm on January 9, 2025:
    Anyway, done!

    theuni commented at 9:56 pm on January 9, 2025:

    Why is this doing an effective swap?

    I think this is quite common, that move-construction is effectively performing a swap.

    I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index

    That’s possible too, and slightly more efficient I guess.

    Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

    Indeed. I don’t think that’s a problem.

    Afaik the move/swap idiom is only safe if the swapped-to value’s dtor doesn’t have any interesting ordering requirements or side-effects.

    As a contrived example, a user may do something like:

    std::vector<TxGraph::Ref> vec;

    vec.push_back(txgraph->AddTransaction(fee));
    auto ref = txgraph->AddTransaction(fee2);
    // ...
    ref = std::move(vec.back());
    

    The vector now holds the old ref and UnlinkRef will not be called until that element is removed. I realize it’s allowed to be a “valid but unspecified state”, but I wouldn’t expect a ref to be hanging around.
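
To make the difference concrete, here is a minimal sketch of the non-swapping variant requested above. It is an illustration under assumptions, not the PR's code: `SketchRegistry` stands in for the graph and merely counts linked handles, and `SketchRef` is a hypothetical handle whose move assignment releases its old target right away and leaves the moved-from handle empty.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the graph: it only counts linked handles.
struct SketchRegistry {
    int linked{0};
};

class SketchRef {
    SketchRegistry* m_reg{nullptr};

    void Unlink() noexcept
    {
        if (m_reg) {
            --m_reg->linked;
            m_reg = nullptr;
        }
    }

public:
    SketchRef() noexcept = default;
    explicit SketchRef(SketchRegistry& reg) noexcept : m_reg(&reg) { ++reg.linked; }
    SketchRef(SketchRef&& other) noexcept : m_reg(std::exchange(other.m_reg, nullptr)) {}
    SketchRef& operator=(SketchRef&& other) noexcept
    {
        if (this != &other) {
            Unlink();                                    // release our old target immediately
            m_reg = std::exchange(other.m_reg, nullptr); // take over the link; other becomes empty
        }
        return *this;
    }
    SketchRef(const SketchRef&) = delete;
    SketchRef& operator=(const SketchRef&) = delete;
    ~SketchRef() { Unlink(); }

    bool IsLinked() const noexcept { return m_reg != nullptr; }
};
```

With a swap-based assignment, the old target of the assigned-to handle would instead stay linked inside the moved-from object until that object is destroyed, which is the lingering state described in the comment above.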

  12. in src/txgraph.cpp:188 in 0c8dc2323e outdated
    +    /** A class of objects held internally in TxGraphImpl, with information about a single
    +     *  transaction. */
    +    struct Entry
    +    {
    +        /** Pointer to the corresponding Ref object, if any. */
    +        Ref* m_ref;
    


    theuni commented at 6:51 pm on January 9, 2025:
    m_ref{nullptr};

    sipa commented at 9:34 pm on January 9, 2025:
    Done.
  13. in src/txgraph.cpp:297 in 0c8dc2323e outdated
    +    LinearizationIndex lin_idx{0};
    +    // Iterate over the chunks.
    +    for (unsigned chunk_idx = 0; chunk_idx < chunking.NumChunksLeft(); ++chunk_idx) {
    +        auto chunk = chunking.GetChunk(chunk_idx);
    +        // Iterate over the transactions in the linearization, which must match those in chunk.
    +        while (true) {
    


    theuni commented at 7:02 pm on January 9, 2025:

    Trying to convince myself this is guaranteed to terminate…

    do{} while (!chunk.transactions.None()) rather than the break for readability? Or just while() if we need to guard against an empty linearization (presumably not?)


    sipa commented at 9:39 pm on January 9, 2025:

    It terminates because:

    • Every chunk contains at least one element (added an Assume for that)
    • In the inner loop, one element from that chunk is Reset() (added an Assume that it indeed resets a bit that was previously set).

    I’ve changed it to a do {} while(chunk.transactions.Any()); loop in the first commits, though it reverts back to a while (true) { ... } loop later, when the loop becomes a bit more complex.
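
The termination argument can be sketched in isolation. This is an assumption-laden illustration rather than the code under review: chunks are modeled as `std::bitset<64>` sets of transaction indices, the function name `WalkChunks` is hypothetical, and `assert` stands in for `Assume`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Walks a linearization chunk by chunk and returns how many transactions were visited.
// `chunks` are hypothetical sets of transaction indices (all below 64), assumed to
// partition the linearization in order.
size_t WalkChunks(const std::vector<size_t>& linearization, std::vector<std::bitset<64>> chunks)
{
    size_t lin_idx = 0;
    for (auto& chunk : chunks) {
        assert(chunk.any()); // every chunk holds at least one transaction
        do {
            assert(lin_idx < linearization.size());
            const size_t tx = linearization[lin_idx++];
            assert(chunk.test(tx)); // the next linearized transaction must be in this chunk
            chunk.reset(tx);        // clears a bit that was set, so the chunk strictly shrinks
        } while (chunk.any());
    }
    return lin_idx;
}
```

Each inner iteration clears exactly one previously set bit, so the inner loop runs at most as many times as the chunk has elements.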

  14. in src/txgraph.cpp:778 in 0c8dc2323e outdated
    +}
    +
    +void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove) noexcept
    +{
    +    // Iterate over the prefix of to_remove that applies to this cluster.
    +    SetType todo;
    


    theuni commented at 7:17 pm on January 9, 2025:
    Assume !to_remove.empty() or early return if it’s allowed?

    sipa commented at 9:40 pm on January 9, 2025:
    Done. I’ve also added a comment to the Cluster::ApplyRemovals() function definition stating that at least one element from the front of to_remove must belong to this Cluster (which is really why that requirement exists).
  15. in src/txgraph.cpp:1662 in 0c8dc2323e outdated
    +    // Make sure the transaction isn't scheduled for removal.
    +    ApplyRemovals();
    +    return m_entries[GetRefIndex(arg)].m_locator.IsPresent();
    +}
    +
    +std::vector<TxGraph::Ref*> Cluster::GetAncestorRefs(const TxGraphImpl& graph, ClusterIndex idx) noexcept
    


    theuni commented at 7:42 pm on January 9, 2025:
    Looks like these 3 functions could reserve() for their ret vectors.

    sipa commented at 9:40 pm on January 9, 2025:
    Done. The third one disappears in a later commit, though.
  16. theuni commented at 8:03 pm on January 9, 2025: member
    Very quick and shallow pass through the initial impl commit. This PR is a lot to get through :)
  17. sipa force-pushed on Jan 9, 2025
  18. DrahtBot added the label CI failed on Jan 9, 2025
  19. DrahtBot commented at 11:22 pm on January 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35398283653


  20. sipa force-pushed on Jan 10, 2025
  21. DrahtBot removed the label CI failed on Jan 10, 2025
  22. sipa force-pushed on Jan 10, 2025
  23. sipa force-pushed on Jan 16, 2025
  24. sipa force-pushed on Jan 22, 2025
  25. in src/txgraph.cpp:1548 in 6cb99b067c outdated
    @@ -1449,6 +1465,7 @@ Cluster::Cluster(TxGraphImpl& graph, const FeeFrac& feerate, GraphIndex graph_in
     TxGraph::Ref TxGraphImpl::AddTransaction(const FeeFrac& feerate) noexcept
     {
         Assume(m_chunkindex_observers == 0 || m_clustersets.size() > 1);
    +    Assume(feerate.size > 0 && uint64_t(feerate.size) <= m_max_cluster_size);
    


    sdaftuar commented at 2:10 pm on January 24, 2025:
    FYI – in my rebase of #28676, I’m seeing tx_pool fuzz test failures due to this line. Not clear to me whether we should require the caller to enforce the policy requirement that a single tx be below the cluster size limit, or just let the caller discover a changeset is oversized and then reject?

    sipa commented at 2:34 pm on January 24, 2025:

    Right. That rule exists because the alternative requires supporting already-existing clusters that are oversized, as AddTransaction constructs a singleton cluster immediately. All other forms of oversizedness happen as a result of applying dependencies, which is done lazily.

    I’ll think about relaxing this.


    sipa commented at 8:45 pm on January 26, 2025:
    Done, it is now allowed to have individually oversized transactions.
  26. sipa force-pushed on Jan 24, 2025
  27. sipa commented at 10:14 pm on January 24, 2025: member

    Some changes:

    • As a result of dropping Cleanup in the base PR, Trim now reports which transactions it removed, as it becomes the caller’s responsibility to destroy the corresponding Refs.
  28. DrahtBot added the label CI failed on Jan 24, 2025
  29. DrahtBot commented at 11:23 pm on January 24, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36148923463


  30. sipa force-pushed on Jan 26, 2025
  31. sipa commented at 4:42 am on January 26, 2025: member
    • Add support for calling AddTransaction with a feerate whose size already violates the cluster size limit.
  32. DrahtBot removed the label CI failed on Jan 26, 2025
  33. sipa force-pushed on Jan 26, 2025
  34. sipa force-pushed on Jan 30, 2025
  35. DrahtBot commented at 1:01 am on January 31, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36451226624


  36. DrahtBot added the label CI failed on Jan 31, 2025
  37. sipa force-pushed on Jan 31, 2025
  38. DrahtBot removed the label CI failed on Jan 31, 2025
  39. sipa force-pushed on Jan 31, 2025
  40. sipa force-pushed on Feb 1, 2025
  41. sipa force-pushed on Feb 4, 2025
  42. DrahtBot added the label CI failed on Feb 4, 2025
  43. DrahtBot removed the label CI failed on Feb 4, 2025
  44. sipa force-pushed on Feb 6, 2025
  45. sipa force-pushed on Feb 11, 2025
  46. sipa force-pushed on Feb 12, 2025
  47. DrahtBot commented at 11:49 pm on February 12, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37128455883


  48. DrahtBot added the label CI failed on Feb 12, 2025
  49. sipa force-pushed on Feb 13, 2025
  50. DrahtBot removed the label CI failed on Feb 13, 2025
  51. sipa force-pushed on Feb 14, 2025
  52. sipa force-pushed on Feb 20, 2025
  53. clusterlin: add FixLinearization function + fuzz test
    This function takes an existing ordering for transactions in a DepGraph, and
    makes it a valid linearization for it (i.e., topological). Any topological
    prefix of the input remains untouched.
    b6a7238ec3
  54. clusterlin: make IsAcyclic() a DepGraph member function
    ... instead of being a separate test-only function.
    
    Also add a fuzz test for it returning false.
    87fed74a9a
  55. clusterlin: (refactor) ClusterIndex -> DepGraphIndex
    Since cluster_linearize.h does not actually have a Cluster type anymore, it is more
    appropriate to rename the index type to DepGraphIndex.
    9363105246
  56. feefrac: introduce tagged wrappers to distinguish vsize/WU rates 2637908a18
  57. txgraph: (feature) add initial version
    This adds an initial version of the txgraph module, with the TxGraph class.
    It encapsulates knowledge about the fees, sizes, and dependencies between all
    mempool transactions, but nothing else.
    
    In particular, it lacks knowledge about txids, inputs, outputs, CTransactions,
    ... and so on. Instead, it exposes a generic TxGraph::Ref type to reference
    nodes in the TxGraph, which can be passed around and stored by layers on top.
    cba16ead84
  58. txgraph: (tests) add simulation fuzz test
    This adds a simulation fuzz test for txgraph, by comparing with a naive
    reimplementation that models the entire graph as a single DepGraph, and
    clusters in TxGraph as connected components within that DepGraph.
    98ab5afb84
  59. txgraph: (tests) add internal sanity check function
    To make testing more powerful, expose a function to perform an internal sanity
    check on the state of a TxGraph. This is especially important as TxGraphImpl
    contains many redundantly represented pieces of information:
    
    * graph contains clusters, which refer to entries, but the entries refer back
    * graph maintains pointers to Ref objects, which point back to the graph.
    
    This lets us make sure they are always in sync.
    8813222308
  60. txgraph: (optimization) avoid per-group vectors for clusters & dependencies
    Instead construct a single vector with the list of all clusters in all groups,
    and then store per-group offset/range in that list.
    
    For dependencies, reuse m_deps_to_add, and store offset/range into that.
    43d746b27e
  61. txgraph: (feature) make max cluster count configurable and "oversize" state
    Instead of leaving the responsibility on higher layers to guarantee that
    no connected component within TxGraph (a barely exposed concept, except through
    GetCluster()) exceeds the cluster count limit, move this responsibility to
    TxGraph itself:
    * TxGraph retains a cluster count limit, but it becomes configurable at construction
      time (this primarily helps with testing that it is properly enforced).
    * It is always allowed to perform mutators on TxGraph, even if they would cause the
      cluster count limit to be exceeded. Instead, TxGraph exposes an IsOversized()
      function, which queries whether it is in a special "oversize" state.
    * During oversize state, many inspectors are unavailable, but mutators remain valid,
      so the higher layer can "fix" the oversize state before continuing.
    acc230d2f5
  62. txgraph: (optimization) avoid representative lookup for each dependency
    The m_deps_to_add vector is sorted by child Cluster*, which matches the
    order of an_clusters. This means we can walk through m_deps_to_add while
    doing the representative lookups for an_clusters, and reuse them.
    b54fd58008
  63. txgraph: (optimization) avoid looking up the same child cluster repeatedly
    Since m_deps_to_add has been sorted by child Cluster* already, all dependencies
    with the same child will be processed consecutively. Take advantage of this by
    remembering the last partition merged with, and reusing that if applicable.
    fa9aff881d
  64. txgraph: (optimization) delay chunking while sub-acceptable
    Chunk-based information (primarily, chunk feerates) is never accessed without
    first bringing the relevant Clusters to an "acceptable" quality level. Thus,
    while operations are ongoing and Clusters are not acceptable, we can omit
    computing the chunkings and chunk feerates for Clusters.
    d63cee9251
  65. txgraph: (optimization) special-case removal of tail of cluster
    When transactions are removed from the tail of a cluster, we know the existing
    linearization remains acceptable/optimal (if it already was), but may just need
    splitting, so special case these into separate quality levels.
    849317dd55
  66. txgraph: (refactor) group per-graph data in ClusterSet
    This is a preparation for a next commit where a TxGraph will start representing
    potentially two distinct graphs (a main one, and a staging one with proposed
    changes).
    b4885a2d12
  67. txgraph: (refactor) abstract out ClearLocator
    Move a number of related modifications to TxGraphImpl into a separate
    function for removal of transactions. This is preparation for a later
    commit where this will be useful in more than one place.
    7fa108d67b
  68. txgraph: (feature) add staging support
    In order to make it easy to evaluate proposed changes to a TxGraph, introduce a
    "staging" mode, where mutators (AddTransaction, AddDependency, RemoveTransaction)
    do not modify the actual graph, but just a staging version of it. That staging
    graph can then be committed (replacing the main one with it), or aborted (discarding
    the staging).
    7017e89ab8
  69. txgraph: (optimization) cache oversizedness of graphs def212b423
  70. txgraph: (feature) destroying Ref means removing transaction
    Before this commit, if a TxGraph::Ref object is destroyed, it becomes impossible
    to refer to, but the actual corresponding transaction node in the TxGraph remains,
    and remains indefinitely as there is no way to remove it.
    
    Fix this by making the destruction of TxGraph::Ref trigger immediate removal of
    the corresponding transaction in TxGraph, both in main and staging if it exists.
    d885ae1a94
  71. txgraph: (feature) expose ability to compare transactions
    In order to make it possible for higher layers to compare transaction quality
    (ordering within the implicit total ordering on the mempool), expose a comparison
    function and test it.
    11442cfc33
  72. txgraph: (feature) Add DoWork function
    This can be called when the caller has time to spend now, and wants future operations
    to be fast.
    fc98be3add
  73. txgraph: (feature) Add CountDistinctClusters function 0e4b3943f3
  74. txgraph: (preparation) multiple inputs to Get{Ancestors,Descendant}Refs
    This is a preparation for the next commit, which adds a feature to request
    the Refs to multiple ancestors/descendants at once.
    225a360797
  75. txgraph: (feature) Get{Ancestors,Descendants}Union d5abb86439
  76. txgraph: (feature) Add GetMainStagingDiagrams function
    This allows determining whether the changes in a staging diagram unambiguously improve
    the graph, through CompareChunks().
    cdaf4f8519
  77. txgraph: (preparation) maintain chunk index
    This is preparation for exposing mining and eviction functionality in
    TxGraph.
    369448552a
  78. txgraph: (feature) introduce BlockBuilder interface
    This interface lets one iterate efficiently over the chunks of the main
    graph in a TxGraph, in the same order as CompareMainOrder. Each chunk
    can be marked as "included" or "skipped" (and in the latter case,
    dependent chunks will be skipped).
    6c275c62b2
  79. txgraph: (feature) introduce TxGraph::GetWorstMainChunk
    It returns the last chunk that would be suggested for mining by BlockBuilder
    objects. This is intended for eviction.
    67b69301c9
  80. txgraph: (optimization) reuse discarded chunkindex entries 80ab8682b0
  81. txgraph: (optimization) skipping end of cluster has no impact 31be1466a8
  82. txgraph: (optimization) special-case singletons in chunk index efad48b239
  83. txgraph: (feature) Add ability to configure maximum cluster size (weight)
    This is integrated with the oversized property: the graph is oversized when
    any connected component within it contains more than the cluster count limit
    many transactions, or when their combined size/weight exceeds the cluster size
    limit.
    
    It becomes disallowed to call AddTransaction with a size larger than this limit.
    In addition, SetTransactionFeeRate becomes SetTransactionFee, so that we do not
    need to deal with the case that a call to this function might affect the
    oversizedness.
    01f8623ee1
  84. txgraph: (feature) permit transactions that exceed cluster size limit 4ab8eea1c6
  85. txgraph: (feature) Add ability to trim oversized clusters
    During reorganisations, it is possible that dependencies get added which
    result in clusters that violate limits (count, size), when linking the
    new from-block transactions to the old from-mempool transactions.
    
    Unlike RBF scenarios, we cannot simply reject these policy violations
    when they are due to received blocks. To accommodate this, add a Trim()
    function to TxGraph, which removes transactions (including descendants)
    in order to make all resulting clusters satisfy the limits.
    dd90502650
  86. txgraph: (improvement) track multiple potential would-be clusters in Trim
    In the Trim function, for any given would-be group of clusters, a (rudimentary)
    linearization for the would-be cluster is constructed on the fly by adding
    eligible transactions to a heap. This continues until the total count or
    size of the transactions exceeds a configured limit. Any transactions which
    appear later in this linearization are discarded.
    
    However, given that transactions at the end are discarded, it is possible that
    the would-be cluster splits apart into multiple clusters. And those clusters
    may well permit far more transactions before their limits are reached.
    
    Take this into account by using a union-find structure inside TrimTxData to
    keep track of the count/size of all would-be clusters that would be formed
    at any point.
    
    This is not an optimization in terms of CPU usage or memory; it just
    improves the quality of the transactions removed by Trim().
    e43f6ca3b8
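
The union-find bookkeeping mentioned in this last commit message can be sketched as follows. This is a generic illustration under assumptions, not the contents of TrimTxData: `SketchPartitions` is a hypothetical class that tracks, per would-be cluster, the number of transactions and their combined size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical union-find over transaction indices, tracking per-set totals.
class SketchPartitions {
    std::vector<size_t> m_parent;
    std::vector<size_t> m_count;  // valid at representatives only
    std::vector<uint64_t> m_size; // valid at representatives only

public:
    explicit SketchPartitions(const std::vector<uint64_t>& tx_sizes)
        : m_parent(tx_sizes.size()), m_count(tx_sizes.size(), 1), m_size(tx_sizes)
    {
        // Initially every transaction is its own singleton partition.
        std::iota(m_parent.begin(), m_parent.end(), size_t{0});
    }

    size_t Find(size_t x) noexcept
    {
        while (m_parent[x] != x) {
            m_parent[x] = m_parent[m_parent[x]]; // path halving
            x = m_parent[x];
        }
        return x;
    }

    // Merge the partitions containing a and b; returns the resulting representative.
    size_t Union(size_t a, size_t b) noexcept
    {
        a = Find(a);
        b = Find(b);
        if (a == b) return a;
        if (m_count[a] < m_count[b]) std::swap(a, b); // attach the smaller set to the larger
        m_parent[b] = a;
        m_count[a] += m_count[b];
        m_size[a] += m_size[b];
        return a;
    }

    size_t Count(size_t x) noexcept { return m_count[Find(x)]; }
    uint64_t Size(size_t x) noexcept { return m_size[Find(x)]; }
};
```

A trimming loop could, before keeping a transaction, add up the totals of the distinct partitions of its already-kept parents plus the transaction itself, keep it and call `Union` only if that sum stays within the limits, and drop it otherwise.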
  87. sipa force-pushed on Feb 21, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-22 15:12 UTC
