cluster mempool: introduce TxGraph #31363

pull sipa wants to merge 13 commits into bitcoin:master from sipa:202411_txgraph changing 8 files +2576 −14
  1. sipa commented at 3:59 pm on November 24, 2024: member

    Part of cluster mempool: #30289.

    1. Overview

    This introduces the TxGraph class, which encapsulates knowledge about the fees, sizes, and dependencies between all mempool transactions, but nothing else. In particular, it lacks knowledge about CTransaction, inputs, outputs, txids, wtxids, prioritization, validatity, policy rules, and a lot more. Being restricted to just those aspects of the mempool makes the behavior very easy to fully specify (ignoring the actual linearizations produced), and write simulation-based tests for (which are included in this PR).

    2. Interface

    The interface can be largely categorized into:

    • Mutation functions:
      • AddTransaction (add a new transaction with specified feerate, and get a Ref object back to identify it).
      • RemoveTransaction (given a Ref object, remove the transaction).
      • AddDependency (given two Ref objects, add a dependency between them).
      • SetTransactionFeerate (modify the feerate associated with a Ref object).
    • Inspector functions:
      • GetAncestors (get the ancestor set in the form of Ref* pointers)
      • GetDescendants (get the descendant set in the form of Ref* pointers)
      • GetCluster (get the connected component set in the form of Ref* pointers, in the order they would be mined).
      • GetIndividualFeerate (get the feerate of a tranaction)
      • GetChunkFeerate (get the mining score of a transaction)
    • Staging functions:
      • StartStaging (make all future mutations operate on a proposed transaction graph)
      • CommitStaging (apply all the changes that are staged)
      • AbortStaging (discard all the changes that are staged)

    3. Implementation

    Internally the graph data is kept in clustered form (partitioned into connected components), for which linearizations are maintained and updated as needed using the cluster_linearize.h algorithms under the hood, but this is hidden from the users of this class. Implementation-wise, mutations are generally applied lazily, appending to queues of to-be-removed transactions and to-be-added dependencies, so they can be batched for higher performance. Inspectors will generally only evaluate as much as is needed to answer queries, with roughly 5 levels of processing to go to fully instantiated and acceptable cluster linearizations, in order:

    1. ApplyRemovals (take batches of to-be-removed transactions and translate them to “holes” in the corresponding Clusters/DepGraphs).
    2. SplitAll (creating holes in Clusters may cause them to break apart into smaller connected components, so make turn them into separate Clusters/linearizations).
    3. GroupClusters (figure out which Clusters will need to be combined in order to add requested to-be-added dependencies, as these may span clusters).
    4. ApplyDependencies (actually merge Clusters as precomputed by GroupClusters, and add the dependencies between them).
    5. MakeAcceptable (perform the LIMO linearization algorithm on Clusters to make sure their linearizations are acceptable).

    4. Future work

    This is only an initial version of TxGraph, and some functionality is missing before #28676 can be rebased on top of it:

    • The ability to get comparative feerate diagrams before/after for the set of staged changes (to evaluate RBF incentive-compatibility).
    • Mining interface (ability to iterate transactions quickly in mining score order).
    • Eviction interface (reverse of mining order, plus memory usage accounting).
    • Ability to inspect the would-be-constructed clusters (so they can be “fixed” if they violate policy rules, as opposed to just accepted/rejected, before applying - this is needed for reorgs which are not optional).
    • Interface for controlling how much effort is spent on LIMO. In this PR it is hardcoded.

    Then there are further improvements possible which would not block other work:

    • Making Cluster a virtual class with different implementations based on transaction count (which could dramatically reduce memory usage, as most Clusters are just a single transaction, for which the current implementation is overkill).
    • The ability to have background thread(s) for improving cluster linearizations.
  2. DrahtBot commented at 3:59 pm on November 24, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31363.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28676 ([WIP] Cluster mempool implementation by sdaftuar)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa force-pushed on Nov 24, 2024
  4. DrahtBot commented at 4:20 pm on November 24, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33443336095

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  5. DrahtBot added the label CI failed on Nov 24, 2024
  6. sipa force-pushed on Nov 24, 2024
  7. sipa force-pushed on Nov 24, 2024
  8. DrahtBot removed the label CI failed on Nov 24, 2024
  9. sipa force-pushed on Nov 25, 2024
  10. sipa force-pushed on Nov 25, 2024
  11. sipa force-pushed on Nov 25, 2024
  12. glozow added the label Mempool on Nov 28, 2024
  13. clusterlin: add FixLinearization function + fuzz test
    This function takes an existing ordering for transactions in a DepGraph, and
    makes it a valid linearization for it (i.e., topological). Any topological
    prefix of the input remains untouched.
    48330d8916
  14. clusterlin: make IsAcyclic() a DepGraph member function
    ... instead of being a separate test-only function.
    f1c7072165
  15. txgraph: (feature) add initial version
    This adds an initial version of the txgraph module, with the TxGraph class.
    It encapsulates knowledge about the fees, sizes, and dependencies between all
    mempool transactions, but nothing else.
    
    In particular, it lacks knowledge about txids, inputs, outputs, CTransactions,
    ... and so for. Instead, it exposes a generic TxGraph::Ref type to reference
    nodes in the TxGraph, which can be passed around and stored by layers on top.
    6dedddd0e2
  16. txgraph: (tests) add simulation fuzz test
    This adds a simulation fuzz test for txgraph, by comparing with a naive
    reimplementation that models the entire graph as a single DepGraph, and
    clusters in TxGraph as connected components within that DepGraph.
    0e59bf818d
  17. txgraph: (tests) add internal sanity check function
    To make testing more powerful, expose a function to perform an internal sanity
    check on the state of a TxGraph. This is especially important as TxGraphImpl
    contains many redundantly represented pieces of information:
    
    * graph contains clusters, which refer to entries, but the entries refer back
    * graph maintains pointers to Ref objects, which point back to the graph.
    
    This lets us make sure they are always in sync.
    f0f6d19889
  18. txgraph: (feature) make max cluster count configurable and "oversize" state
    Instead of leaving the responsibility on higher layers to guarantee that
    no connected component within TxGraph (a barely exposed concept, except through
    GetCluster()) exceeds the cluster count limit, move this responsibility to
    TxGraph itself:
    * TxGraph retains a cluster count limit, but it becomes configurable at construction
      time (this primarily helps with testing that it is properly enforced).
    * It is always allowed to perform mutators on TxGraph, even if they would cause the
      cluster count limit to be exceeded. Instead, TxGraph exposes an IsOversized()
      function, which queries whether it is in a special "oversize" state.
    * During oversize state, many inspectors are unavailable, but mutators remain valid,
      so the higher layer can "fix" the oversize state before continuing.
    e927b829a9
  19. txgraph: (optimization) delay chunking while sub-acceptable
    Chunk-based information (primarily, chunk feerates) are never accessed without
    first bringing the relevant Clusters to an "acceptable" quality level. Thus,
    while operations are ongoing and Clusters are not acceptable, we can omit
    computing the chunkings and chunk feerates for Clusters.
    69e171a590
  20. txgraph: (optimization) special-case removal of tail of cluster
    When transactions are removed from the tail of a cluster, we know the existing
    linearization remains acceptable/optimal (if it already was), but may just need
    splitting, so special case these into separate quality levels.
    500f4cd22d
  21. txgraph: (refactor) group per-graph data in ClusterSet
    This is a preparation for a next commit where a TxGraph will start representing
    potentially two distinct graphs (a main one, and a staging one with proposed
    changes).
    58344670f4
  22. txgraph: (refactor) abstract out ClearLocator
    Move a number of related modifications to TxGraphImpl into a separate
    function for removal of transactions. This is preparation for a later
    commit where this will be useful in more than one place.
    823357392a
  23. txgraph: (feature) add staging support
    In order to make it easy to evaluate proposed changes to a TxGraph, introduce a
    "staging" mode, where mutators (AddTransaction, AddDependency, RemoveTransaction)
    do not modify the actual graph, but just a staging version of it. That staging
    graph can then be commited (replacing the main one with it), or aborted (discarding
    the staging).
    bbda77f3c2
  24. txgraph: (feature) destroying Ref means removing transaction
    Before this commit, if a TxGraph::Ref object is destroyed, it becomes impossible
    to refer to, but the actual corresponding transaction node in the TxGraph remains,
    and remains indefinitely as there is no way to remove it.
    
    Fix this by making the destruction of TxGraph::Ref trigger immediate removal of
    the corresponding transaction in TxGraph, both in main and staging if it exists.
    0ab07bfd2e
  25. txgraph: (feature) expose ability to compare transactions
    In order to make it possible for higher layers to compare transaction quality
    (ordering within the implicit total ordering on the mempool), expose a comparison
    function and test it.
    c4b4e1f358
  26. sipa force-pushed on Dec 2, 2024


sipa DrahtBot

Labels
Mempool


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-03 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me