txgraph: randomize order of same-feerate distinct-cluster transactions #33335

sipa wants to merge 1 commit into bitcoin:master from sipa:202509_txgraph_rand, changing 1 file (+21 −6)
  1. sipa commented at 11:43 pm on September 7, 2025: member

    Part of #30289.

    Before this change, transactions in a TxGraph’s main graph are implicitly ordered by:

    • feerate, high to low
    • Cluster, low to high m_sequence
    • linearization order within clusters

    However, since m_sequence values are assigned deterministically and predictably, the ordering of same-feerate distinct-Cluster transactions can reveal something about the order they were inserted into TxGraph.

    In #28676, this ordering (through TxGraph::CompareMainOrder) is used to decide the order of to-be-announced transactions. These should not expose information about insertion order.

    To deal with this, do not compare the m_sequence values directly, but hash them with a fixed but random key first. In case this yields a collision (it takes ~192000 clusters for the probability of any collision existing to reach 10^-9), fall back to comparing the m_sequence values directly.
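
    (A minimal sketch of the idea, for illustration only; this is not the PR’s code, and the class/member names are made up, though CSipHasher and FastRandomContext are existing Bitcoin Core utilities:)

        // Sketch: compare cluster sequence numbers through a salted SipHash.
        // The key is drawn once per TxGraph instance, so the resulting order is
        // stable within a node but unpredictable to outside observers.
        #include <crypto/siphash.h>
        #include <random.h>
        #include <compare>
        #include <cstdint>

        class SequenceOrderSketch
        {
            const uint64_t m_k0{FastRandomContext().rand64()};
            const uint64_t m_k1{FastRandomContext().rand64()};

            uint64_t ClusterSequenceHash(uint64_t sequence) const noexcept
            {
                return CSipHasher(m_k0, m_k1).Write(sequence).Finalize();
            }

        public:
            std::strong_ordering Compare(uint64_t seq_a, uint64_t seq_b) const noexcept
            {
                // Compare the salted hashes first; they hide insertion order.
                auto hash_a = ClusterSequenceHash(seq_a);
                auto hash_b = ClusterSequenceHash(seq_b);
                if (hash_a != hash_b) return hash_a <=> hash_b;
                // Extremely rare 64-bit collision: fall back to the (unique)
                // sequence numbers themselves, so the ordering remains total.
                return seq_a <=> seq_b;
            }
        };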

    Alternatives: we could instead use the pre-cluster-mempool approach of sorting by wtxid in case of ties, which is equally non-revealing, and deterministic. However, I find it cleaner to have a randomized internal total ordering than a partial ordering within TxGraph that needs to be complemented by tie-breaking logic inside the mempool, and I don’t think the determinism is that valuable.

    For context: #28676 (review)

  2. DrahtBot commented at 11:43 pm on September 7, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33335.

    Reviews

    See the guideline for information on the review process.

    Type | Reviewers
    ACK  | instagibbs

    If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34257 (txgraph: deterministic optimal transaction order by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa force-pushed on Sep 9, 2025
  4. in src/txgraph.cpp:485 in 593d418137 outdated
    350+        // If neither pointer is nullptr, compare the Clusters' sequence numbers' hashes.
    351         Assume(a == b || a->m_sequence != b->m_sequence);
    352+        auto hash_a = ClusterSequenceHash(a->m_sequence);
    353+        auto hash_b = ClusterSequenceHash(b->m_sequence);
    354+        if (hash_a != hash_b) return hash_a <=> hash_b;
    355+        // As a final tiebreak, compare the sequence numbers themselves.
    


    instagibbs commented at 1:11 pm on September 11, 2025:
    double checking this will happen IFF there’s a siphash collision, i.e., on average every ~2^32 sequence comparisons?

    sipa commented at 3:52 pm on September 11, 2025:

    Eh. When individual comparisons are seen in isolation, each independently has a probability of exactly 2^-64 of being a collision.

    But within a group of N clusters, the probability that at least one pair exists whose hashes collide is approximately 1-exp(-N*(N-1)/2^65). You can be even more precise, as only pairs of clusters which have chunks with equal feerates matter, but this is a sufficiently low bound that I don’t think we care.
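
    (For concreteness, this bound is where the ~192000 figure in the PR description comes from; a back-of-the-envelope check, not text from the comment:)

        P_{\mathrm{collision}} \approx 1 - \exp\!\left(-\frac{N(N-1)}{2^{65}}\right) \approx \frac{N^2}{2^{65}},
        \qquad
        P_{\mathrm{collision}} = 10^{-9} \;\Rightarrow\; N \approx \sqrt{2^{65}\cdot 10^{-9}} \approx 1.9\times 10^{5}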


    instagibbs commented at 1:42 pm on September 12, 2025:
    I didn’t really mean to dive into the exact numbers, but yeah, for “rare collisions”.
  5. instagibbs commented at 1:17 pm on September 11, 2025: member
    concept ACK, I also think it removes the kinda weird behavior where same CFR clusters get mined/evicted based on the underlying cluster’s original sequence value
  6. instagibbs approved
  7. instagibbs commented at 1:57 pm on September 12, 2025: member

    ACK 593d418137e4802bbe229707dcda5796522e2e5e

    double-checked Trim benchmark just in case, looks unchanged

  8. sipa force-pushed on Sep 17, 2025
  9. sipa commented at 3:04 pm on September 17, 2025: member
    Rebased on top of #33157, as there are some conflicts, and I expect the other one will go in first.
  10. sipa force-pushed on Oct 12, 2025
  11. sipa commented at 1:02 pm on October 12, 2025: member
    Rebased.
  12. sipa force-pushed on Oct 14, 2025
  13. in src/txgraph.cpp:467 in 949353ac63
    462@@ -460,15 +463,26 @@ class TxGraphImpl final : public TxGraph
    463             m_graph_index{graph_index}, m_chunk_count{chunk_count} {}
    464     };
    465 
    466+    /** Compute the hash of a Cluster sequence number. This is used for randomizing the order of
    467+     *  equal-feerate transaction in distinct clusters, to avoid leaking insertion order. */
    


    instagibbs commented at 4:51 pm on October 14, 2025:

    seems this got lost in rebase?

    0     *  equal-feerate transactions in distinct clusters, to avoid leaking insertion order. */
    

    sipa commented at 6:08 pm on October 14, 2025:
    Fixed (again).
  14. instagibbs commented at 4:57 pm on October 14, 2025: member

    Thinking a bit more on the fact that this randomizes eviction ordering in addition to INV gossiping, but the rebase was straightforward

    git range-diff master 593d418137e4802bbe229707dcda5796522e2e5e 949353ac63249bf51fce34f50ac27f6ec51d0172

  15. txgraph: randomize order of same-feerate distinct-cluster transactions
    Before, the implicit ordering of transactions in a TxGraph's main graph is
    sorted by:
    - feerate, high to low
    - Cluster, low to high m_sequence
    - linearization order within clusters
    
    However, since m_sequence values are assigned deterministically and predictably,
    the ordering of same-feerate distinct-Cluster transactions can reveal something
    about the order they were inserted into TxGraph.
    
    In the full cluster mempool PR, this ordering (through TxGraph::CompareMainOrder)
    is used to decide the order of to-be-announced transactions. These should not
    expose information about insertion order.
    
    To deal with this, do not compare the m_sequence values directly, but hash them
    with a fixed but random key first. In case this yields a collision (which needs
    ~192000 clusters to have a probability of 10^-9 of existing), fall back to
    comparing the m_sequence values directly.
    e325c3aff6
  16. sipa force-pushed on Oct 14, 2025
  17. glozow added the label Mempool on Oct 27, 2025
  18. instagibbs commented at 8:13 pm on December 15, 2025: member
    f0699bd5718eb099e242900d2b4494c171d70825 does something similar, by making the ordering of same-chunk-feerate chunks within the same cluster randomized
  19. sipa commented at 8:52 pm on December 15, 2025: member
    @instagibbs Indeed, and with the same rationale.
  20. instagibbs commented at 5:08 pm on December 18, 2025: member
    Should TxGraph::m_rng be instrumented so it can be made deterministic, to increase fuzz stability?
  21. sipa commented at 6:13 pm on December 18, 2025: member

    The txgraph tests set the RNG to deterministic mode before every run; see the SeedRandomStateForTest(SeedRand::ZEROS); call at the start.

    Without that, there is even logic to show an error if the global RNG is used. See CheckGlobalsImpl::~CheckGlobalsImpl in src/test/fuzz/util/check_globals.cpp.
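
    (A minimal sketch of that pattern; the fuzz-target body is illustrative, while SeedRandomStateForTest and SeedRand::ZEROS are the existing test helpers referred to above:)

        // Illustrative fuzz-target skeleton: force the global test RNG into a
        // deterministic state before each run, so that TxGraph's internal
        // randomness is reproducible for a given fuzz input.
        #include <test/fuzz/fuzz.h>
        #include <test/util/random.h>

        FUZZ_TARGET(txgraph_sketch)
        {
            SeedRandomStateForTest(SeedRand::ZEROS);
            // ... construct a TxGraph and exercise it using the fuzz input ...
        }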

  22. instagibbs commented at 6:29 pm on December 18, 2025: member
    ACK e325c3aff6b2bcff6a409a1b4c2ac1857c725fc6
  23. sipa commented at 6:37 pm on December 27, 2025: member

    After discussion with @gmaxwell, there is a possibility to do something more deterministic here, though I’m not sure it’s worth it.

    It would be possible to sort by this metric:

    • First by chunk feerate
    • Then by cumulative equal-feerate chunk size: the sum of the sizes within all chunks of the same feerate in the same cluster, up to and including the one that holds the transaction under consideration.
    • Then by salted hash of cluster id.
    • Then by position within the chunk (or cluster).

    Instead of this “sum of sizes of transactions in equal-feerate chunks up to current chunk”, it can be any function of the cluster that is monotonically increasing within equal-feerate transaction groups of the same cluster. It can even be multiple functions, e.g. first sum of sizes, and then sum of hashes of topologies of chunks or so.

    This would make things more deterministic, but in an arbitrary way - making certain transactions preferred over others, without them being fundamentally more desirable for miners. Size at least has some bearing on the block building logic, but it’s not clear to me that smallest-first is necessarily best: if the block boundary falls near the end of a group of equal-feerate chunks, you’d want the smallest ones at the end. Using a hash of topology or so is even more arbitrary.
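
    (Written out as a lexicographic sort key, a paraphrase of the metric above rather than text from the comment; C is the transaction’s cluster, c its chunk, H_k a salted hash, and c' ranges over the chunks of C up to and including c in linearization order:)

        \mathrm{key}(tx) = \Bigl(\, -\mathrm{feerate}(c),\;
            \sum_{\substack{c' \le c \\ \mathrm{feerate}(c') = \mathrm{feerate}(c)}} \mathrm{size}(c'),\;
            H_k(\mathrm{id}(C)),\;
            \mathrm{pos}(tx, c) \,\Bigr)
        \quad\text{sorted ascending (feerate negated so higher feerates come first)}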

  24. instagibbs commented at 3:00 pm on December 29, 2025: member

    Tend to agree I don’t think the additional complexity is worth it since:

    1. you’re not going to get rid of lots of non-determinism in practice (see brc20 patterns of same-size-same-feerate-same-topo chains)
    2. may encourage weird games by transaction makers to “grind” it in general case to come in “on top”, if you consider the more general/arbitrary functions like topology hash.
  25. gmaxwell commented at 4:49 pm on December 31, 2025: contributor

    hm? you can absolutely eliminate all non-determinism – e.g. with the old approach of using the wtxid as a tiebreaker, if nothing else. I think you’ll probably have to eliminate all non-determinism one way or another eventually if this sort order is used for efficient block relay. It may be desirable to eliminate it now so it doesn’t have to be done later, and so e.g. metrics that determine if miners are censoring transactions get less noise from randomness here, or even compact block reconstruction rates are less harmed by differences in eviction choices if the network starts evicting. They’re fringe benefits (at least now) but I’m not aware of wtxid grinding to get a slight tiebreak advantage ever having been a concern. In any case, for the reasons that determinism is better (or worse), any marginal increase in determinism is good/bad.

    My discussion with sipa was basically just on ‘can’t you prefer smaller first’, under the idea that it would be better for block packing, but he convinced me that the block boundary could fall anywhere in the cluster and so that wasn’t an obvious win. But being able to sort on cumulative size is more or less orthogonal to making the sort more deterministic. @0xB10C any thoughts about block content predictability here?

  26. instagibbs commented at 4:52 pm on December 31, 2025: member

    e.g. with the old use the wtxid as a tiebreaker, if nothing else

    Right, I was speaking of the exact proposed methods (not wtxid!); we previously used time as a tiebreaker iirc (pre cluster mempool)

    edit: guess I didn’t publish my other time related comment

    I’m not aware of wtxid grinding to get a slight tiebreak advantage ever having been a concern

    Well, I don’t think we used wtxid as a tiebreak previously, so it wouldn’t have been a concern? In practice I don’t think it would affect network resources, so I’m not particularly concerned; it’s just something new people might do.

  27. gmaxwell commented at 5:12 pm on December 31, 2025: contributor
    Ah, I guess it was the relay sort that tie-broke on wtxid then. In any case I think it’s good to tiebreak first on things we want people to do, but after all that the choice is something like wtxid or random (I agree arrival order is right out). A final arbitrary criterion risks that people game it to get a slight preference, but the alternative of random has the downside of making block inclusion/order less consistent between nodes.
  28. sipa commented at 7:22 pm on December 31, 2025: member

    There are several places where tie-break decisions need to be made:

    1. The order of equal-feerate chunks in distinct clusters: currently lowest cluster_id first, while this PR makes it uniformly random.
    2. The order of equal-feerate chunks within the same cluster (without dependency constraints between them): since SFL, smallest chunk first and then uniformly random.
    3. The order of transactions within the same chunk: since SFL, random but followed by a PostLinearize() which in practice often undoes the randomization.

    So even with this PR, the current situation is arguably inconsistent (we prefer smaller chunks, but don’t tie-break based on size elsewhere). I feel like we should make a decision for what properties are desirable, and then implement it consistently. I think we definitely want highest-feerate chunk first (that’s what cluster mempool is), and probably want PostLinearize() within chunks, as they are objective (even though PostLinearize()’s quality improvement isn’t actually observable without sub-chunk block template building). But besides those, we could decide:

    • Preference for smaller size or not? While smallest doesn’t directly mean better, it’s at least objectively relevant to block building, and if grinding is somehow an outcome, grinding for smaller size is probably not terrible.
      • (s) Preference for size.
        • For (1), prefer smaller clusters first (h) or smaller equal-feerate-chunks-before (c)
        • For (2), prefer smaller chunks like now.
        • For (3), prefer smaller transactions within chunks first. Optionally (f) or not (m) it could involve preferring higher-feerate transactions within a chunk first too, before preferring smaller size.
      • (u) Do not discriminate based on size at all (including no preference for smaller chunks).
    • How to tie-break the rest:
      • (r) Randomly.
      • (t) By txid.
        • For (1), pick lower sum of txids (as bigint) of the whole cluster (h) or sum of txids of equal-feerate-chunks-before (c).
        • For (2), pick chunk with smallest sum of txids first.
        • For (3), pick lowest txid transaction first.
      • (w) Like (t), but use wtxids rather than txids. I think this is (very slightly) worse as it might encourage witness grinding which is slightly easier than full txid grinding, and doesn’t have any benefits as far as I can see.
      • (d) Like (t), but use a salted hash of the txids rather than the txids directly. This avoids a global bias, and thus avoids grinding, but unlike (r) it is per-host deterministic (and thus won’t randomly change any time a cluster is modified).

    I think all of these are doable for implementation. For any of the versions with dependency on wtxid (d/t/w) I think it would involve handing TxGraph a callback function which computes a score based on mempool data (as TxGraph doesn’t have access to wtxids directly).
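
    (As a rough illustration of that callback idea, with entirely hypothetical names rather than actual TxGraph API:)

        // Hypothetical sketch for the (d)/(t) variants: the mempool hands TxGraph
        // a callback mapping a transaction to a tie-break score (here a salted
        // hash of its txid), which TxGraph could then aggregate per chunk/cluster.
        #include <crypto/siphash.h>
        #include <uint256.h>
        #include <cstdint>
        #include <functional>

        using GraphIndexSketch = uint32_t;  // illustrative stand-in for TxGraph's GraphIndex
        using TieBreakScoreFn = std::function<uint64_t(GraphIndexSketch)>;

        TieBreakScoreFn MakeSaltedTxidScoreFn(uint64_t k0, uint64_t k1,
                                              std::function<uint256(GraphIndexSketch)> txid_of)
        {
            return [=](GraphIndexSketch idx) {
                // Salted hash of the txid: per-host deterministic, not grindable globally.
                return SipHashUint256(k0, k1, txid_of(idx));
            };
        }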

    • The easiest solution is (ur). Everything random, including no preference for smallest chunk first.
    • If size preference is a good discriminator, I think (scfr) makes sense, and probably gets rid of most non-determinism.
    • If we want fully deterministic, (scft), but it comes at the cost of some implementation complexity.

    Marking as draft until decided what approach we want.

  29. sipa marked this as a draft on Dec 31, 2025
  30. gmaxwell commented at 8:35 pm on December 31, 2025: contributor

    Other preferences like “creates fewest outputs” could also be used before breaking on size, but perhaps it would be rare that this would be the tiebreaker? Like size it has an advantage that if someone is “gaming” it that’s a good-ish thing. In any case the more criteria you stick in front of a grindable tiebreaker the less the grinding matters. (Or, if you do go with a local pseudorandom choice – which I think you shouldn’t – the more criteria in front of it, the less unpredictable the behavior will be.)

    You could prefer the lowest maximum confirmed input height as a tie breaker before TXID: it’s a deterministic, naturally scarce resource – and favoring the coins that have moved less recently in the event of a tie seems healthy.

    Does your comparison need to obey the triangle inequality – I assume not, if you could decide ‘randomly’? H(sorted(txid_a,txid_b))^txid_a < H(sorted(txid_a,txid_b))^txid_b as the tiebreaker? This can’t usefully be ground.

  31. sipa commented at 8:56 pm on December 31, 2025: member

    Other preferences like “creates fewest outputs” could also be used before breaking on size, but perhaps it would be rare that this would be the tiebreaker? Like size it has an advantage that if someone is “gaming” it that’s a good-ish thing.

    Indeed, but I don’t think it’s particularly useful. It should be pretty rare to have a different number of outputs if the size is already the same. And similarly, after grinding for smaller size (which probably already means fewer outputs) you can’t really make the number of outputs smaller while keeping the size the same. There is also some CPU/memory cost to having more criteria, especially if it somehow turns out that they need to be cached on a per-chunk basis (which may be the case for (c)).

    You could prefer the lowest maximum confirmed input height as a tie breaker before TXID: It’s a deterministic, naturally scarce resource– and favoring the coins that have moved less recently in the event of a tie seems healthy.

    Ha, tie-break by anti-priority.

    Does your comparison need to obey the triangle inequality – I assume not, if you could decide ‘randomly’? H(sorted(txid_a,txid_b))^txid_a < H(sorted(txid_a,txid_b))^txid_b as the tiebreaker? This can’t usefully be ground.

    Yes, it needs to be consistent. In the current implementation, that’s achieved by mapping everything to a score function and comparing the scores. For random, the score isn’t actually a function, just a randomly-generated value assigned to each chunk/transaction.

    For (2) and (3), any scoring function works (even if the result is some complicated tuple of uint256s and feerates), really, because it’s internal to the linearization algorithm. Compute it once for each chunk/txn and throw it into the decision-making heaps.

    For (1), it’s more complicated, because it either needs to be cached per chunk/cluster, or recomputed any time two equal-feerate chunks are sorted. Furthermore, it must respect chunk ordering within clusters. So we need something that can be implemented as a function of (cluster, chunk within that cluster), in such a way that f(cluster, chunk_pos1) <= f(cluster, chunk_pos2) if chunk_pos1 <= chunk_pos2 and the two chunks have equal feerate. That’s why I suggest sums of values (sizes, or hashes) of the whole cluster, or of equal-feerate chunk prefixes; those are naturally monotonically increasing with chunk position. Anything that’s just a function of cluster satisfies this, however.

    So it’s possible for both (1) and (2) to just use SHA256(concat(sorted(txids))) or so (for all txids within cluster, or all txids within chunk). But I’m not sure that is desirable, because it means any new transaction could completely change the existing orderings.

  32. sipa commented at 9:14 pm on January 4, 2026: member

    What about the following fully-deterministic ordering:

    First, prefer smaller sizes.

    • For sorting equal-feerate chunks in distinct clusters: sort by the sum of tx weights across all chunks in the same cluster of the same feerate, up to and including the chunk itself.
    • For sorting equal-feerate chunks within the same cluster: sort by chunk weight.
    • For sorting transactions within a chunk: sort by tx weight (and PostLinearize afterwards).

    Then, tie-break by “maximum txid”:

    • For sorting equal-feerate chunks (both within and across clusters), sort by the maximum txid within the chunk.
    • For sorting transactions within a chunk, use transaction txid (and PostLinearize afterwards).

    I think this can be done with a very small impact on CPU and memory (8 bytes per transaction: 4 to store the equal-feerate-chunk-prefix-weight, and 4 to store the max txid GraphIndex).
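
    (A rough sketch of the per-chunk key this implies, based on my reading of the proposal; the ChunkInfoSketch plumbing is hypothetical, not existing TxGraph state:)

        #include <cstdint>
        #include <tuple>
        #include <vector>

        // Hypothetical per-chunk summary.
        struct ChunkInfoSketch {
            int64_t fee;              // total fee of the chunk
            int32_t weight;           // total weight of the chunk
            uint32_t max_txid_index;  // GraphIndex of the largest txid in the chunk
        };

        // For each chunk of one cluster (in linearization order), compute
        // (equal-feerate prefix weight, max-txid index). Among equal-feerate
        // chunks across clusters, the lower key would sort earlier.
        std::vector<std::tuple<int32_t, uint32_t>>
        ChunkTieBreakKeys(const std::vector<ChunkInfoSketch>& chunks)
        {
            std::vector<std::tuple<int32_t, uint32_t>> keys;
            keys.reserve(chunks.size());
            for (size_t i = 0; i < chunks.size(); ++i) {
                // Sum the weights of the preceding same-feerate chunks in this
                // cluster plus chunk i itself. Equal-feerate chunks are contiguous
                // in a linearization, so walk backwards until the feerate differs;
                // feerates are compared by cross-multiplying fee and weight.
                int32_t prefix_weight = chunks[i].weight;
                for (size_t j = i; j-- > 0; ) {
                    if (chunks[i].fee * chunks[j].weight != chunks[j].fee * chunks[i].weight) break;
                    prefix_weight += chunks[j].weight;
                }
                keys.emplace_back(prefix_weight, chunks[i].max_txid_index);
            }
            return keys;
        }

    (This seems consistent with the 4+4 bytes per transaction mentioned above, if each transaction caches its chunk’s pair.)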

  33. sipa commented at 5:25 am on January 15, 2026: member
    Closing in favor of #34257.
  34. sipa closed this on Jan 15, 2026

  35. ajtowns commented at 10:42 pm on January 15, 2026: contributor

    First, prefer smaller sizes.

    Wouldn’t it be better to prefer larger sizes? If you have two equal feerate chunks, A and B, and you’re filling a block then either:

    • both A and B fit, in which case ordering doesn’t matter
    • neither of A and B fit, in which case ordering doesn’t matter
    • either but not both of A and B fit, in which case you’ll add one of them, and then fill the rest of the block with lower feerate txs, so you would prefer to include the larger of A and B to maximise total fee for the block
  36. sipa commented at 10:47 pm on January 15, 2026: member

    @ajtowns There is also the possibility that A is smaller and fits, and B is bigger and doesn’t fit (regardless of whether A has been included). In this case, you want A first. I believe this will exactly compensate for the probability that one but not both fits, where you prefer the bigger one first. (EDIT: this only matters within a cluster - across clusters they’re independent)

    Relevant IRC discussion today:

     11:42:57 < darosior> sipa: the discussion in #33335 is convincing that a deterministic tie breaker is preferable. But i find it quite unintuitive that between two equal-feerate transactions you would choose to pick the one that pays a miner less. Miners optimize for maximum fees, which often (though not always) means maximizing feerate first, but also just absolute
     11:42:57 < darosior> fees? Now i understand miners optimize for what actually ends up paying them, which is what they can actually pack in a template, and that sometimes smaller transactions can lead to higher-paying templates, but is that the common case?
     ...
     11:54:28 < sipa> darosior: it's important to note that this is just choosing the order of chunks within a group of equal-feerate ones - and that group has fixed size, and fixed total fee
     11:56:17 < sipa> so it's just a question of where to put the boundaries between chunks in it, and it's really the distance between where the block boundary falls and the last chunk boundary before it that matters
     12:03:27 < darosior> And more often than not you'd want to have the smaller transaction first because the larger (higher-fee) one would not fit and therefore if you didn't order this way you'd give up both their fees?
     12:03:35  * darosior spells out loud to make sure he understands
     ...
     13:14:51 < sipa> darosior: no, i don't think it matters either way - the only reason to prefer smaller is because people deliberately grinding for smaller size feels "not really bad"
     13:15:02 < sipa> we just can't predict where the boundaries will fall
     13:15:51 < sipa> maybe in a world where there is a permanent transaction backlog, and many transactions that just never get mined, smaller first is actually better because you can have clusters whose ends never make it
     13:16:23 < sipa> but i don't think this is an important point... the real "quality" aspect is covered by picking higher feerate first, and splitting up chunks in minimal parts
     13:16:32 < sipa> the rest is just convention to get determinism
     13:41:53 < darosior> On the other hand choosing on the contrary the highest-fee-paying one may directly solve the "grinding" since it would mean just paying for more block space which someone can already do if they want to. But yeah it's probably never going to matter either way.
     13:45:46 < sipa> yeah, fair point - i can see an argument for largest size first too
    
  37. ajtowns commented at 2:11 am on January 16, 2026: contributor

    There is also the possibility that A is smaller and fits, and B is bigger and doesn’t fit (regardless of whether A has been included).

    Right – when they’re in different clusters, considering B first is fine; it’ll be skipped and A taken; but when they’re in the same cluster, skipping B would also prevent anything else from the cluster being included.

    So within a cluster, consider smaller chunks first; across clusters, consider the bigger chunk first? Maybe that’s a bit too much complexity for too little value though, especially since it’s not stable between cluster merges/splits?

    FWIW, I don’t find the grinding argument very persuasive: if you can find a way to make your tx smaller, then you’re already encouraged to do that by fee rate prioritisation. Making a larger tx and also paying more fees for it just to keep the fee rate constant isn’t attractive, as far as I can see.

