txgraph: randomize order of same-feerate distinct-cluster transactions #33335

pull sipa wants to merge 1 commits into bitcoin:master from sipa:202509_txgraph_rand changing 1 files +21 −6
  1. sipa commented at 11:43 pm on September 7, 2025: member

    Part of #30289.

    Before, the implicit ordering of transactions in a TxGraph’s main graph is sorted by:

    • feerate, high to low
    • Cluster, low to high m_sequence
    • linearization order within clusters

    However, since m_sequence values are assigned deterministically and predictably, the ordering of same-feerate distinct-Cluster transactions can reveal something about the order they were inserted into TxGraph.

    In #28676, this ordering (through TxGraph::CompareMainOrder) is used to decide the order of to-be-announced transactions. These should not expose information about insertion order.

    To deal with this, do not compare the m_sequence values directly, but hash them with a fixed but random key first. In case this yields a collision (which needs ~192000 clusters to have a probability of 10^-9 of existing), fall back to comparing the m_sequence values directly.

    Alternatives: we could instead use the pre-cluster-mempool approach of sorting by wtxid in case of ties, which is equally non-revealing, and deterministic. I find it cleaner to have a randomized internal total ordering rather than needing a partial ordering within TxGraph that needs to be complemented by tie-breaking logic inside the mempool, however, and I don’t think the determinism is that valuable.

    For context: #28676 (review)

  2. DrahtBot commented at 11:43 pm on September 7, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33335.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Stale ACK instagibbs

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. sipa force-pushed on Sep 9, 2025
  4. in src/txgraph.cpp:485 in 593d418137 outdated
    350+        // If neither pointer is nullptr, compare the Clusters' sequence numbers' hashes.
    351         Assume(a == b || a->m_sequence != b->m_sequence);
    352+        auto hash_a = ClusterSequenceHash(a->m_sequence);
    353+        auto hash_b = ClusterSequenceHash(b->m_sequence);
    354+        if (hash_a != hash_b) return hash_a <=> hash_b;
    355+        // As a final tiebreak, compare the sequence numbers themselves.
    


    instagibbs commented at 1:11 pm on September 11, 2025:
    double checking this will happen IFF there’s as siphash collision, i.e., on average every ~2^32 sequence comparisons?

    sipa commented at 3:52 pm on September 11, 2025:

    Eh. When seen individual comparisons in isolation, independently, each has a probability of exactly 2^-64 of being a collision.

    But within a group of N clusters, the probability that at least one pair exists whose hashes will collide, is approximately 1-exp(-N*(N-1)/2^65). You can be even more precise, as only pairs of clusters which have chunks with equal feerates matter, but this is a sufficiently low bound that I don’t think we care.


    instagibbs commented at 1:42 pm on September 12, 2025:
    I didn’t really mean to dive into the exact numbers, but yeah, for “rare collisions”.
  5. instagibbs commented at 1:17 pm on September 11, 2025: member
    concept ACK, I also think it removes the kinda weird behavior where same CFR clusters get mined/evicted based on the underlying cluster’s original sequence value
  6. instagibbs approved
  7. instagibbs commented at 1:57 pm on September 12, 2025: member

    ACK 593d418137e4802bbe229707dcda5796522e2e5e

    double-checked Trim benchmark just in case, looks unchanged

  8. sipa force-pushed on Sep 17, 2025
  9. sipa commented at 3:04 pm on September 17, 2025: member
    Rebased on top of #33157, as there are some conflicts, and I expect the other one will go in first.
  10. sipa force-pushed on Oct 12, 2025
  11. sipa commented at 1:02 pm on October 12, 2025: member
    Rebased.
  12. sipa force-pushed on Oct 14, 2025
  13. in src/txgraph.cpp:467 in 949353ac63
    462@@ -460,15 +463,26 @@ class TxGraphImpl final : public TxGraph
    463             m_graph_index{graph_index}, m_chunk_count{chunk_count} {}
    464     };
    465 
    466+    /** Compute the hash of a Cluster sequence number. This is used for randomizing the order of
    467+     *  equal-feerate transaction in distinct clusters, to avoid leaking insertion order. */
    


    instagibbs commented at 4:51 pm on October 14, 2025:

    seems this got lost in rebase?

    0     *  equal-feerate transactions in distinct clusters, to avoid leaking insertion order. */
    

    sipa commented at 6:08 pm on October 14, 2025:
    Fixed (again).
  14. instagibbs commented at 4:57 pm on October 14, 2025: member

    Thinking a bit more on the fact that this randomizes eviction ordering in addition to INV gossiping, but rebase was straight forward

    git range-diff master 593d418137e4802bbe229707dcda5796522e2e5e 949353ac63249bf51fce34f50ac27f6ec51d0172

  15. txgraph: randomize order of same-feerate distinct-cluster transactions
    Before, the implicit ordering of transactions in a TxGraph's main graph is
    sorted by:
    - feerate, high to low
    - Cluster, low to high m_sequence
    - linearization order within clusters
    
    However, since m_sequence values are assigned deterministically and predictably,
    the ordering of same-feerate distinct-Cluster transactions can reveal something
    about the order they were inserted into TxGraph.
    
    In the full cluster mempool PR, this ordering (through TxGraph::CompareMainOrder)
    is used to decide the order of to-be-announced transactions. These should not
    expose information about insertion order.
    
    To deal with this, do not compare the m_sequence values directly, but hash them
    with a fixed but random key first. In case this yields a collision (which needs
    ~192000 clusters to have a probability of 10^-9 of existing), fall back to
    comparing the m_sequence values directly.
    e325c3aff6
  16. sipa force-pushed on Oct 14, 2025
  17. glozow added the label Mempool on Oct 27, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-11-20 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me