cluster mempool: add TxGraph reorg functionality #31553

pull sipa wants to merge 25 commits into bitcoin:master from sipa:202412_txgraph_trim changing 8 files +3838 −14
  1. sipa commented at 7:06 pm on December 22, 2024: member

    Part of cluster mempool (#30289). Builds on top of #31444.

    During reorganisations, it is possible that dependencies get added which would result in clusters that violate policy limits (cluster count, cluster weight), when linking the new from-block transactions to the old from-mempool transactions. Unlike RBF scenarios, we cannot simply reject the changes when they are due to received blocks. To accommodate this, add a TxGraph::Trim(), which removes some subset of transactions (including descendants) in order to make all resulting clusters satisfy the limits.

    Conceptually, the way this is done is by defining a rudimentary linearization for the entire would-be too-large cluster, iterating it from beginning to end, and reasoning about the counts and weights of the clusters that would be reached using transactions up to that point. If a transaction is encountered whose addition would violate the limit, it is removed, together with all its descendants.

    This rudimentary linearization is like a merge sort of the chunks of the clusters being combined, but respecting topology. More specifically, it is continuously picking the highest-chunk-feerate remaining transaction among those which have no unmet dependencies left. For efficiency, this rudimentary linearization is computed lazily, by putting all viable transactions in a heap, sorted by chunk feerate, and adding new transactions to it as they become viable.

    As part of this, the “oversized” property is expanded to also encompass a configurable cluster weight limit (in addition to cluster count limit).

  2. DrahtBot commented at 7:06 pm on December 22, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31553.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    No conflicts as of last run.

  3. glozow added the label Mempool on Jan 2, 2025
  4. sipa force-pushed on Jan 8, 2025
  5. DrahtBot added the label CI failed on Jan 9, 2025
  6. DrahtBot commented at 0:52 am on January 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35343429418

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. sipa force-pushed on Jan 9, 2025
  8. DrahtBot removed the label CI failed on Jan 9, 2025
  9. sipa force-pushed on Jan 9, 2025
  10. in src/txgraph.cpp:1546 in 0c8dc2323e outdated
    948+    Ref ret;
    949+    // Construct a new Entry, and link it with the Ref.
    950+    auto idx = m_entries.size();
    951+    m_entries.emplace_back();
    952+    auto& entry = m_entries.back();
    953+    entry.m_ref = &ret;
    


    theuni commented at 6:43 pm on January 9, 2025:
    I’m not sure how this is intended to be used, but storing a stack address seems like a problem? RVO may help but that seems brittle. I imagine the caller should be passing in their own Ref instead?

    sipa commented at 8:19 pm on January 9, 2025:

    I believe it is safe, both with NRVO and without.

    With NRVO, ret is constructed directly in the caller’s target destination, so this isn’t a pointer to local stack space.

    Without NRVO, the Ref(Ref&&) move constructor is invoked by return ret;, which will update the pointer to the caller’s destination.


    theuni commented at 10:15 pm on January 9, 2025:
    Ah, right, I missed that the move ctor would handle the update. Thanks for explaining.
  11. in src/txgraph.cpp:1172 in 0c8dc2323e outdated
    1167+    }
    1168+}
    1169+
    1170+TxGraph::Ref& TxGraph::Ref::operator=(Ref&& other) noexcept
    1171+{
    1172+    // Inform both TxGraphs about the Refs being swapped.
    


    theuni commented at 6:48 pm on January 9, 2025:
    Why is this doing an effective swap? I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index. Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

    sipa commented at 8:38 pm on January 9, 2025:

    Why is this doing an effective swap?

    I think this is quite common, that move-construction is effectively performing a swap.

    I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index

    That’s possible too, and slightly more efficient I guess.

    Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

    Indeed. I don’t think that’s a problem.


    sipa commented at 9:34 pm on January 9, 2025:
    Anyway, done!

    theuni commented at 9:56 pm on January 9, 2025:

    Why is this doing an effective swap?

    I think this is quite common, that move-construction is effectively performing a swap.

    I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index

    That’s possible too, and slightly more efficient I guess.

    Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

    Indeed. I don’t think that’s a problem.

    Afaik the move/swap idiom is only safe if the swapped-to value’s dtor doesn’t have any interesting ordering requirements or side-effects.

    As a contrived example, a user may do something like:

    0std::vector<TxGraph::Ref> vec;
    1
    2vec.push_back(txgraph->AddTransaction(fee));
    3auto ref = txgraph->AddTransaction(fee2);
    4...
    5ref = std::move(vec.back());
    

    The vector now holds the old ref and UnlinkRef will not be called until that element is removed. I realize it’s allowed to be a “valid but unspecified state”, but I wouldn’t expect a ref to be hanging around.

  12. in src/txgraph.cpp:188 in 0c8dc2323e outdated
    183+    /** A class of objects held internally in TxGraphImpl, with information about a single
    184+     *  transaction. */
    185+    struct Entry
    186+    {
    187+        /** Pointer to the corresponding Ref object, if any. */
    188+        Ref* m_ref;
    


    theuni commented at 6:51 pm on January 9, 2025:
    m_ref{nullptr};

    sipa commented at 9:34 pm on January 9, 2025:
    Done.
  13. in src/txgraph.cpp:297 in 0c8dc2323e outdated
    292+    LinearizationIndex lin_idx{0};
    293+    // Iterate over the chunks.
    294+    for (unsigned chunk_idx = 0; chunk_idx < chunking.NumChunksLeft(); ++chunk_idx) {
    295+        auto chunk = chunking.GetChunk(chunk_idx);
    296+        // Iterate over the transactions in the linearization, which must match those in chunk.
    297+        while (true) {
    


    theuni commented at 7:02 pm on January 9, 2025:

    Trying to convince myself this is guaranteed to terminate…

    do{} while (!chunk.transactions.None()) rather than the break for readability? Or just while() if we need to guard against an empty linearization (presumably not?)


    sipa commented at 9:39 pm on January 9, 2025:

    It terminates because:

    • Every chunk contains at least one element (added an Assume for that)
    • In the inner loop, one element from that chunk is Reset() (added an Assume that it indeed resets a bit that was previously set).

    I’ve changed it to a do {} while(chunk.transactions.Any()); loop in the first commits, though it reverts back to a while (true) { ... } loop later, when the loop becomes a bit more complex.

  14. in src/txgraph.cpp:687 in 0c8dc2323e outdated
    306+}
    307+
    308+void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove) noexcept
    309+{
    310+    // Iterate over the prefix of to_remove that applies to this cluster.
    311+    SetType todo;
    


    theuni commented at 7:17 pm on January 9, 2025:
    Assume !to_remove.empty() or early return if it’s allowed?

    sipa commented at 9:40 pm on January 9, 2025:
    Done. I’ve also added a comment to the Cluster::ApplyRemovals() function definition stating that at least one element from the front of to_remove must belong to this Cluster (which is really why that requirement exists).
  15. in src/txgraph.cpp:1605 in 0c8dc2323e outdated
    1002+    // Make sure the transaction isn't scheduled for removal.
    1003+    ApplyRemovals();
    1004+    return m_entries[GetRefIndex(arg)].m_locator.IsPresent();
    1005+}
    1006+
    1007+std::vector<TxGraph::Ref*> Cluster::GetAncestorRefs(const TxGraphImpl& graph, ClusterIndex idx) noexcept
    


    theuni commented at 7:42 pm on January 9, 2025:
    Looks like these 3 functions could reserve() for their ret vectors.

    sipa commented at 9:40 pm on January 9, 2025:
    Done. The third one disappears in a later commit, though.
  16. theuni commented at 8:03 pm on January 9, 2025: member
    Very quick and shallow pass through the initial impl commit. This PR is a lot to get through :)
  17. sipa force-pushed on Jan 9, 2025
  18. DrahtBot added the label CI failed on Jan 9, 2025
  19. DrahtBot commented at 11:22 pm on January 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35398283653

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  20. sipa force-pushed on Jan 10, 2025
  21. DrahtBot removed the label CI failed on Jan 10, 2025
  22. sipa force-pushed on Jan 10, 2025
  23. clusterlin: add FixLinearization function + fuzz test
    This function takes an existing ordering for transactions in a DepGraph, and
    makes it a valid linearization for it (i.e., topological). Any topological
    prefix of the input remains untouched.
    30beeff0ae
  24. clusterlin: make IsAcyclic() a DepGraph member function
    ... instead of being a separate test-only function.
    91fcc73089
  25. txgraph: (feature) add initial version
    This adds an initial version of the txgraph module, with the TxGraph class.
    It encapsulates knowledge about the fees, sizes, and dependencies between all
    mempool transactions, but nothing else.
    
    In particular, it lacks knowledge about txids, inputs, outputs, CTransactions,
    ... and so for. Instead, it exposes a generic TxGraph::Ref type to reference
    nodes in the TxGraph, which can be passed around and stored by layers on top.
    0eb039bcf3
  26. txgraph: (tests) add simulation fuzz test
    This adds a simulation fuzz test for txgraph, by comparing with a naive
    reimplementation that models the entire graph as a single DepGraph, and
    clusters in TxGraph as connected components within that DepGraph.
    18c8eafcd7
  27. txgraph: (tests) add internal sanity check function
    To make testing more powerful, expose a function to perform an internal sanity
    check on the state of a TxGraph. This is especially important as TxGraphImpl
    contains many redundantly represented pieces of information:
    
    * graph contains clusters, which refer to entries, but the entries refer back
    * graph maintains pointers to Ref objects, which point back to the graph.
    
    This lets us make sure they are always in sync.
    ca9e539c9e
  28. txgraph: (feature) make max cluster count configurable and "oversize" state
    Instead of leaving the responsibility on higher layers to guarantee that
    no connected component within TxGraph (a barely exposed concept, except through
    GetCluster()) exceeds the cluster count limit, move this responsibility to
    TxGraph itself:
    * TxGraph retains a cluster count limit, but it becomes configurable at construction
      time (this primarily helps with testing that it is properly enforced).
    * It is always allowed to perform mutators on TxGraph, even if they would cause the
      cluster count limit to be exceeded. Instead, TxGraph exposes an IsOversized()
      function, which queries whether it is in a special "oversize" state.
    * During oversize state, many inspectors are unavailable, but mutators remain valid,
      so the higher layer can "fix" the oversize state before continuing.
    652d5aad6e
  29. txgraph: (optimization) delay chunking while sub-acceptable
    Chunk-based information (primarily, chunk feerates) are never accessed without
    first bringing the relevant Clusters to an "acceptable" quality level. Thus,
    while operations are ongoing and Clusters are not acceptable, we can omit
    computing the chunkings and chunk feerates for Clusters.
    2734dcb344
  30. txgraph: (optimization) special-case removal of tail of cluster
    When transactions are removed from the tail of a cluster, we know the existing
    linearization remains acceptable/optimal (if it already was), but may just need
    splitting, so special case these into separate quality levels.
    a5a41bbd61
  31. txgraph: (refactor) group per-graph data in ClusterSet
    This is a preparation for a next commit where a TxGraph will start representing
    potentially two distinct graphs (a main one, and a staging one with proposed
    changes).
    b2b17e7a10
  32. txgraph: (refactor) abstract out ClearLocator
    Move a number of related modifications to TxGraphImpl into a separate
    function for removal of transactions. This is preparation for a later
    commit where this will be useful in more than one place.
    eb05507849
  33. txgraph: (feature) add staging support
    In order to make it easy to evaluate proposed changes to a TxGraph, introduce a
    "staging" mode, where mutators (AddTransaction, AddDependency, RemoveTransaction)
    do not modify the actual graph, but just a staging version of it. That staging
    graph can then be commited (replacing the main one with it), or aborted (discarding
    the staging).
    7b106d7ac0
  34. txgraph: (feature) destroying Ref means removing transaction
    Before this commit, if a TxGraph::Ref object is destroyed, it becomes impossible
    to refer to, but the actual corresponding transaction node in the TxGraph remains,
    and remains indefinitely as there is no way to remove it.
    
    Fix this by making the destruction of TxGraph::Ref trigger immediate removal of
    the corresponding transaction in TxGraph, both in main and staging if it exists.
    05b35a0f4f
  35. txgraph: (feature) expose ability to compare transactions
    In order to make it possible for higher layers to compare transaction quality
    (ordering within the implicit total ordering on the mempool), expose a comparison
    function and test it.
    4005a901af
  36. txgraph: (feature) Add DoWork function
    This can be called when the caller has time to spend now, and wants future operations
    to be fast.
    114cd31eb0
  37. txgraph: (feature) Add CountDistinctClusters function 0da43c2217
  38. txgraph: (feature) Add GetMainStagingDiagrams function
    This allows determining whether the changes in a staging diagram unambiguously improve
    the graph, through CompareChunks().
    d02e9eca3c
  39. txgraph: (preparation) maintain chunk index
    This is preparation for exposing mining and eviction functionality in
    TxGraph.
    e5d44f827a
  40. txgraph: (feature) introduce BlockBuilder interface
    This interface lets one iterate efficiently over the chunks of the main
    graph in a TxGraph, in the same order as CompareMainOrder. Each chunk
    can be marked as "included" or "skipped" (and in the latter case,
    dependent chunks will be skipped).
    2a60dabfd9
  41. txgraph: (feature) introduce TxGraph::GetWorstMainChunk
    It returns the last chunk that would be suggested for mining by BlockBuilder
    objects. This is intended for eviction.
    7bd675734a
  42. txgraph: (optimization) reuse discarded chunkindex entries 55d998a2df
  43. txgraph: (optimization) skipping end of cluster has no impact 645642932d
  44. txgraph: (optimization) special-case singletons in chunk index a17fae3c9d
  45. txgraph: (feature) Add ability to configure maximum cluster size (weight)
    This is integrated with the oversized property: the graph is oversized when
    any connected component within it contains more than the cluster count limit
    many transactions, or when their combined size/weight exceeds the cluster size
    limit.
    
    It becomes disallowed to call AddTransaction with a size larger than this limit.
    In addition, SetTransactionFeeRate becomes SetTransactionFee, so that we do not
    need to deal with the case that a call to this function might affect the
    oversizedness.
    6875953e73
  46. txgraph: (feature) Add ability to trim oversized clusters
    During reorganisations, it is possible that dependencies get add which
    result in clusters that violate limits (count, size), when linking the
    new from-block transactions to the old from-mempool transactions.
    
    Unlike RBF scenarios, we cannot simply reject these policy violations
    when they are due to received blocks. To accomodate this, add a Trim()
    function to TxGraph, which removes transactions (including descendants)
    in order to make all resulting clusters satisfy the limits.
    5f451f89b4
  47. txgraph: (optimization) track multiple potential would-be clusters in Trim
    In a Trim function, for any given would-be group of clusters, a (rudimentary)
    linearization for the would-be cluster is constructed on the fly by adding
    eligible transactions to a heap. This continues until the total count or
    size of the transaction exists a configured limit. Any transactions which
    appear later in this linearization are discarded.
    
    However, given that transactions at the end are discarded, it is possible that
    the would-be cluster splits apart into multiple clusters. And those clusters
    may well permit far more transactions before their limits are reached.
    
    Take this into account by using a union-find structure inside TrimTxData to
    keep track of the count/size of all would-be clusters that would be formed
    at any point.
    9046cbf9ed
  48. sipa force-pushed on Jan 16, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me