txgraph: randomize order of same-feerate distinct-cluster transactions #33335

pull sipa wants to merge 12 commits into bitcoin:master from sipa:202509_txgraph_rand changing 5 files +824 −268
  1. sipa commented at 11:43 pm on September 7, 2025: member

    Part of #30289.

    Before, the implicit ordering of transactions in a TxGraph’s main graph is sorted by:

    • feerate, high to low
    • Cluster, low to high m_sequence
    • linearization order within clusters

    However, since m_sequence values are assigned deterministically and predictably, the ordering of same-feerate distinct-Cluster transactions can reveal something about the order they were inserted into TxGraph.

    In #28676, this ordering (through TxGraph::CompareMainOrder) is used to decide the order of to-be-announced transactions. These should not expose information about insertion order.

    To deal with this, do not compare the m_sequence values directly, but hash them with a fixed but random key first. In case this yields a collision (which needs ~192000 clusters to have a probability of 10^-9 of existing), fall back to comparing the m_sequence values directly.

    Alternatives: we could instead use the pre-cluster-mempool approach of sorting by wtxid in case of ties, which is equally non-revealing, and deterministic. I find it cleaner to have a randomized internal total ordering rather than needing a partial ordering within TxGraph that needs to be complemented by tie-breaking logic inside the mempool, however, and I don’t think the determinism is that valuable.

    For context: #28676 (review)

  2. DrahtBot commented at 11:43 pm on September 7, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33335.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Stale ACK instagibbs

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28676 (Cluster mempool implementation by sdaftuar)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa force-pushed on Sep 9, 2025
  4. in src/txgraph.cpp:490 in 593d418137 outdated
    350+        // If neither pointer is nullptr, compare the Clusters' sequence numbers' hashes.
    351         Assume(a == b || a->m_sequence != b->m_sequence);
    352+        auto hash_a = ClusterSequenceHash(a->m_sequence);
    353+        auto hash_b = ClusterSequenceHash(b->m_sequence);
    354+        if (hash_a != hash_b) return hash_a <=> hash_b;
    355+        // As a final tiebreak, compare the sequence numbers themselves.
    


    instagibbs commented at 1:11 pm on September 11, 2025:
    double checking this will happen IFF there’s as siphash collision, i.e., on average every ~2^32 sequence comparisons?

    sipa commented at 3:52 pm on September 11, 2025:

    Eh. When seen individual comparisons in isolation, independently, each has a probability of exactly 2^-64 of being a collision.

    But within a group of N clusters, the probability that at least one pair exists whose hashes will collide, is approximately 1-exp(-N*(N-1)/2^65). You can be even more precise, as only pairs of clusters which have chunks with equal feerates matter, but this is a sufficiently low bound that I don’t think we care.


    instagibbs commented at 1:42 pm on September 12, 2025:
    I didn’t really mean to dive into the exact numbers, but yeah, for “rare collisions”.
  5. instagibbs commented at 1:17 pm on September 11, 2025: member
    concept ACK, I also think it removes the kinda weird behavior where same CFR clusters get mined/evicted based on the underlying cluster’s original sequence value
  6. instagibbs approved
  7. instagibbs commented at 1:57 pm on September 12, 2025: member

    ACK 593d418137e4802bbe229707dcda5796522e2e5e

    double-checked Trim benchmark just in case, looks unchanged

  8. txgraph: Make level of Cluster implicit (optimization)
    This reduces per-Cluster memory usage by making Clusters not aware of their
    own level. Instead, track it either in calling code, or infer it based on
    the transactions in them.
    fdcb5f17bd
  9. txgraph: move some sanity checks from Cluster to TxGraphImpl (refactor) 4281438bc3
  10. txgraph: avoid holes in DepGraph positions (mem optimization) b6cf300096
  11. depgraph: add memory usage control (feature) f8761d4dcd
  12. txgraph: keep data structures compact (mem optimization) 64e5c6979c
  13. txgraph: keep track of Cluster memory usage (preparation) 9c84cc34b9
  14. txgraph: expose memory usage estimate function (feature) a940e46bb0
  15. txgraph: make Cluster an abstract class (refactor) 752ce6b1ae
  16. txgraph: add Cluster functions to avoid downcasts in Merge/Split (refactor)
    This adds 4 functions to Cluster to help implement Merge() and Split() without
    explicit downcasts using static_cast (removing the assumption that the Clusters
    being merged/split are also GenericClusterImpls).
    89b7997e74
  17. txgraph: abstract out creation of empty Clusters (refactor) a2899bcd93
  18. txgraph: add SingletonClusterImpl (mem optimization)
    This adds a specialized Cluster implementation for singleton clusters, saving
    a significant amount of memory by avoiding the need for m_depgraph, m_mapping,
    and m_linearization, and their overheads.
    66401148dd
  19. txgraph: randomize order of same-feerate distinct-cluster transactions
    Before, the implicit ordering of transactions in a TxGraph's main graph is
    sorted by:
    - feerate, high to low
    - Cluster, low to high m_sequence
    - linearization order within clusters
    
    However, since m_sequence values are assigned deterministically and predictably,
    the ordering of same-feerate distinct-Cluster transactions can reveal something
    about the order they were inserted into TxGraph.
    
    In the full cluster mempool PR, this ordering (through TxGraph::CompareMainOrder)
    is used to decide the order of to-be-announced transactions. These should not
    expose information about insertion order.
    
    To deal with this, do not compare the m_sequence values directly, but hash them
    with a fixed but random key first. In case this yields a collision (which needs
    ~192000 clusters to have a probability of 10^-9 of existing), fall back to
    comparing the m_sequence values directly.
    45341bab33
  20. sipa force-pushed on Sep 17, 2025
  21. sipa commented at 3:04 pm on September 17, 2025: member
    Rebased on top of #33157, as there are some conflicts, and I expect the other one will go in first.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-18 18:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me