txgraph: randomize order of same-feerate distinct-cluster transactions #33335

pull sipa wants to merge 1 commits into bitcoin:master from sipa:202509_txgraph_rand changing 1 files +21 −6
  1. sipa commented at 11:43 pm on September 7, 2025: member

    Part of #30289.

    Before, the implicit ordering of transactions in a TxGraph’s main graph is sorted by:

    • feerate, high to low
    • Cluster, low to high m_sequence
    • linearization order within clusters

    However, since m_sequence values are assigned deterministically and predictably, the ordering of same-feerate distinct-Cluster transactions can reveal something about the order they were inserted into TxGraph.

    In #28676, this ordering (through TxGraph::CompareMainOrder) is used to decide the order of to-be-announced transactions. These should not expose information about insertion order.

    To deal with this, do not compare the m_sequence values directly, but hash them with a fixed but random key first. In case this yields a collision (which needs ~192000 clusters to have a probability of 10^-9 of existing), fall back to comparing the m_sequence values directly.

    Alternatives: we could instead use the pre-cluster-mempool approach of sorting by wtxid in case of ties, which is equally non-revealing, and deterministic. I find it cleaner to have a randomized internal total ordering rather than needing a partial ordering within TxGraph that needs to be complemented by tie-breaking logic inside the mempool, however, and I don’t think the determinism is that valuable.

    For context: #28676 (review)

  2. DrahtBot commented at 11:43 pm on September 7, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33335.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK instagibbs

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    No conflicts as of last run.

  3. sipa force-pushed on Sep 9, 2025
  4. in src/txgraph.cpp:485 in 593d418137 outdated
    350+        // If neither pointer is nullptr, compare the Clusters' sequence numbers' hashes.
    351         Assume(a == b || a->m_sequence != b->m_sequence);
    352+        auto hash_a = ClusterSequenceHash(a->m_sequence);
    353+        auto hash_b = ClusterSequenceHash(b->m_sequence);
    354+        if (hash_a != hash_b) return hash_a <=> hash_b;
    355+        // As a final tiebreak, compare the sequence numbers themselves.
    


    instagibbs commented at 1:11 pm on September 11, 2025:
    double checking this will happen IFF there’s as siphash collision, i.e., on average every ~2^32 sequence comparisons?

    sipa commented at 3:52 pm on September 11, 2025:

    Eh. When seen individual comparisons in isolation, independently, each has a probability of exactly 2^-64 of being a collision.

    But within a group of N clusters, the probability that at least one pair exists whose hashes will collide, is approximately 1-exp(-N*(N-1)/2^65). You can be even more precise, as only pairs of clusters which have chunks with equal feerates matter, but this is a sufficiently low bound that I don’t think we care.


    instagibbs commented at 1:42 pm on September 12, 2025:
    I didn’t really mean to dive into the exact numbers, but yeah, for “rare collisions”.
  5. instagibbs commented at 1:17 pm on September 11, 2025: member
    concept ACK, I also think it removes the kinda weird behavior where same CFR clusters get mined/evicted based on the underlying cluster’s original sequence value
  6. instagibbs approved
  7. instagibbs commented at 1:57 pm on September 12, 2025: member

    ACK 593d418137e4802bbe229707dcda5796522e2e5e

    double-checked Trim benchmark just in case, looks unchanged

  8. sipa force-pushed on Sep 17, 2025
  9. sipa commented at 3:04 pm on September 17, 2025: member
    Rebased on top of #33157, as there are some conflicts, and I expect the other one will go in first.
  10. sipa force-pushed on Oct 12, 2025
  11. sipa commented at 1:02 pm on October 12, 2025: member
    Rebased.
  12. sipa force-pushed on Oct 14, 2025
  13. in src/txgraph.cpp:467 in 949353ac63
    462@@ -460,15 +463,26 @@ class TxGraphImpl final : public TxGraph
    463             m_graph_index{graph_index}, m_chunk_count{chunk_count} {}
    464     };
    465 
    466+    /** Compute the hash of a Cluster sequence number. This is used for randomizing the order of
    467+     *  equal-feerate transaction in distinct clusters, to avoid leaking insertion order. */
    


    instagibbs commented at 4:51 pm on October 14, 2025:

    seems this got lost in rebase?

    0     *  equal-feerate transactions in distinct clusters, to avoid leaking insertion order. */
    

    sipa commented at 6:08 pm on October 14, 2025:
    Fixed (again).
  14. instagibbs commented at 4:57 pm on October 14, 2025: member

    Thinking a bit more on the fact that this randomizes eviction ordering in addition to INV gossiping, but rebase was straight forward

    git range-diff master 593d418137e4802bbe229707dcda5796522e2e5e 949353ac63249bf51fce34f50ac27f6ec51d0172

  15. txgraph: randomize order of same-feerate distinct-cluster transactions
    Before, the implicit ordering of transactions in a TxGraph's main graph is
    sorted by:
    - feerate, high to low
    - Cluster, low to high m_sequence
    - linearization order within clusters
    
    However, since m_sequence values are assigned deterministically and predictably,
    the ordering of same-feerate distinct-Cluster transactions can reveal something
    about the order they were inserted into TxGraph.
    
    In the full cluster mempool PR, this ordering (through TxGraph::CompareMainOrder)
    is used to decide the order of to-be-announced transactions. These should not
    expose information about insertion order.
    
    To deal with this, do not compare the m_sequence values directly, but hash them
    with a fixed but random key first. In case this yields a collision (which needs
    ~192000 clusters to have a probability of 10^-9 of existing), fall back to
    comparing the m_sequence values directly.
    e325c3aff6
  16. sipa force-pushed on Oct 14, 2025
  17. glozow added the label Mempool on Oct 27, 2025
  18. instagibbs commented at 8:13 pm on December 15, 2025: member
    f0699bd5718eb099e242900d2b4494c171d70825 does something similar, with making chunk ordering within the same cluster at same chunk feerate randomized
  19. sipa commented at 8:52 pm on December 15, 2025: member
    @instagibbs Indeed, and with the same rationale.
  20. instagibbs commented at 5:08 pm on December 18, 2025: member
    Should TxGraph::m_rng be instrumented to allow it being deterministic to increase fuzz stability?
  21. sipa commented at 6:13 pm on December 18, 2025: member

    The txgraph tests sets the RNG to deterministic mode before every run, see the SeedRandomStateForTest(SeedRand::ZEROS); call at the start.

    Without that, there is even logic to show an error if the global RNG is used. See CheckGlobalsImpl::~CheckGlobalsImpl in src/test/fuzz/util/check_globals.cpp.

  22. instagibbs commented at 6:29 pm on December 18, 2025: member
    ACK e325c3aff6b2bcff6a409a1b4c2ac1857c725fc6
  23. sipa commented at 6:37 pm on December 27, 2025: member

    After discussion with @gmaxwell, there is a possibility to do something more deterministic here, though I’m not sure it’s worth it.

    It would be possible to sort by this metric:

    • First by chunk feerate
    • Then by cumulative equal-feerate chunk size: the sum of the sizes within all chunks of the same feerate in the same cluster, up to and including the one that holds the transaction under consideration.
    • Then by salted hash of cluster id.
    • Then by position within the chunk (or cluster).

    Instead of this “sum of sizes of transactions in equal-feerate chunks up to current chunk”, it can be any function of the cluster that is monotonically increasing within equal-feerate transaction groups of the same cluster. It can even be multiple functions, e.g. first sum of sizes, and then sum of hashes of topologies of chunks or so.

    This would make things more deterministic, but in an arbitrary way - making certain transactions preferred over others, without them being fundamentally more desirable for miners. Size at least has some bearing on the block building logic, but it’s not clear to me that smallest-first is necessarily best: if the block boundary falls near the end of a group of equal-feerate chunks, you’d want the smallest ones at the end. Using a hash of topology or so is even more arbitrary.

  24. instagibbs commented at 3:00 pm on December 29, 2025: member

    Tend to agree I don’t think the additional complexity is worth it since:

    1. you’re not going to get rid of lots of non-determinism in practice (see brc20 patterns of same-size-same-feerate-same-topo chains)
    2. may encourage weird games by transaction makers to “grind” it in general case to come in “on top”, if you consider the more general/arbitrary functions like topology hash.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-30 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me