txgraph: randomize order of same-feerate distinct-cluster transactions #33335

sipa wants to merge 1 commit into bitcoin:master from sipa:202509_txgraph_rand, changing 1 file (+21 −6)
  1. sipa commented at 11:43 pm on September 7, 2025: member

    Part of #30289.

    Before this change, transactions in a TxGraph’s main graph are implicitly ordered by:

    • feerate, high to low
    • Cluster, low to high m_sequence
    • linearization order within clusters

    However, since m_sequence values are assigned deterministically and predictably, the ordering of same-feerate distinct-Cluster transactions can reveal something about the order they were inserted into TxGraph.

    In #28676, this ordering (through TxGraph::CompareMainOrder) is used to decide the order of to-be-announced transactions. These should not expose information about insertion order.

    To deal with this, do not compare the m_sequence values directly, but hash them with a fixed but random key first. In case this yields a collision (it takes ~192000 clusters for the probability of any collision existing to reach 10^-9), fall back to comparing the m_sequence values directly.
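
    (A minimal sketch of the idea, for illustration only; this is not the PR’s code, and the class/member names are made up, though CSipHasher and FastRandomContext are existing Bitcoin Core utilities:)

        // Sketch: compare cluster sequence numbers through a salted SipHash.
        // The key is drawn once per TxGraph instance, so the resulting order is
        // stable within a node but unpredictable to outside observers.
        #include <crypto/siphash.h>
        #include <random.h>
        #include <compare>
        #include <cstdint>

        class SequenceOrderSketch
        {
            const uint64_t m_k0{FastRandomContext().rand64()};
            const uint64_t m_k1{FastRandomContext().rand64()};

            uint64_t ClusterSequenceHash(uint64_t sequence) const noexcept
            {
                return CSipHasher(m_k0, m_k1).Write(sequence).Finalize();
            }

        public:
            std::strong_ordering Compare(uint64_t seq_a, uint64_t seq_b) const noexcept
            {
                // Compare the salted hashes first; they hide insertion order.
                auto hash_a = ClusterSequenceHash(seq_a);
                auto hash_b = ClusterSequenceHash(seq_b);
                if (hash_a != hash_b) return hash_a <=> hash_b;
                // Extremely rare 64-bit collision: fall back to the (unique)
                // sequence numbers themselves, so the ordering remains total.
                return seq_a <=> seq_b;
            }
        };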

    Alternatives: we could instead use the pre-cluster-mempool approach of sorting by wtxid in case of ties, which is equally non-revealing, and deterministic. However, I find it cleaner to have a randomized internal total ordering than a partial ordering within TxGraph that needs to be complemented by tie-breaking logic inside the mempool, and I don’t think the determinism is that valuable.

    For context: #28676 (review)

  2. DrahtBot commented at 11:43 pm on September 7, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33335.

    Reviews

    See the guideline for information on the review process.

    Type | Reviewers
    ACK  | instagibbs

    If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34257 (txgraph: deterministic optimal transaction order by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa force-pushed on Sep 9, 2025
  4. in src/txgraph.cpp:485 in 593d418137 outdated
    350+        // If neither pointer is nullptr, compare the Clusters' sequence numbers' hashes.
    351         Assume(a == b || a->m_sequence != b->m_sequence);
    352+        auto hash_a = ClusterSequenceHash(a->m_sequence);
    353+        auto hash_b = ClusterSequenceHash(b->m_sequence);
    354+        if (hash_a != hash_b) return hash_a <=> hash_b;
    355+        // As a final tiebreak, compare the sequence numbers themselves.
    


    instagibbs commented at 1:11 pm on September 11, 2025:
    double checking this will happen IFF there’s a siphash collision, i.e., on average every ~2^32 sequence comparisons?

    sipa commented at 3:52 pm on September 11, 2025:

    Eh. When individual comparisons are seen in isolation, each independently has a probability of exactly 2^-64 of being a collision.

    But within a group of N clusters, the probability that at least one pair exists whose hashes collide is approximately 1-exp(-N*(N-1)/2^65). You can be even more precise, as only pairs of clusters which have chunks with equal feerates matter, but this is a sufficiently low bound that I don’t think we care.
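
    (For concreteness, this bound is where the ~192000 figure in the PR description comes from; a back-of-the-envelope check, not text from the comment:)

        P_{\mathrm{collision}} \approx 1 - \exp\!\left(-\frac{N(N-1)}{2^{65}}\right) \approx \frac{N^2}{2^{65}},
        \qquad
        P_{\mathrm{collision}} = 10^{-9} \;\Rightarrow\; N \approx \sqrt{2^{65}\cdot 10^{-9}} \approx 1.9\times 10^{5}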


    instagibbs commented at 1:42 pm on September 12, 2025:
    I didn’t really mean to dive into the exact numbers, but yeah, for “rare collisions”.
  5. instagibbs commented at 1:17 pm on September 11, 2025: member
    concept ACK, I also think it removes the kinda weird behavior where same CFR clusters get mined/evicted based on the underlying cluster’s original sequence value
  6. instagibbs approved
  7. instagibbs commented at 1:57 pm on September 12, 2025: member

    ACK 593d418137e4802bbe229707dcda5796522e2e5e

    double-checked Trim benchmark just in case, looks unchanged

  8. sipa force-pushed on Sep 17, 2025
  9. sipa commented at 3:04 pm on September 17, 2025: member
    Rebased on top of #33157, as there are some conflicts, and I expect the other one will go in first.
  10. sipa force-pushed on Oct 12, 2025
  11. sipa commented at 1:02 pm on October 12, 2025: member
    Rebased.
  12. sipa force-pushed on Oct 14, 2025
  13. in src/txgraph.cpp:467 in 949353ac63
    462@@ -460,15 +463,26 @@ class TxGraphImpl final : public TxGraph
    463             m_graph_index{graph_index}, m_chunk_count{chunk_count} {}
    464     };
    465 
    466+    /** Compute the hash of a Cluster sequence number. This is used for randomizing the order of
    467+     *  equal-feerate transaction in distinct clusters, to avoid leaking insertion order. */
    


    instagibbs commented at 4:51 pm on October 14, 2025:

    seems this got lost in rebase?

    0     *  equal-feerate transactions in distinct clusters, to avoid leaking insertion order. */
    

    sipa commented at 6:08 pm on October 14, 2025:
    Fixed (again).
  14. instagibbs commented at 4:57 pm on October 14, 2025: member

    Thinking a bit more on the fact that this randomizes eviction ordering in addition to INV gossiping, but the rebase was straightforward

    git range-diff master 593d418137e4802bbe229707dcda5796522e2e5e 949353ac63249bf51fce34f50ac27f6ec51d0172

  15. txgraph: randomize order of same-feerate distinct-cluster transactions
    Before, the implicit ordering of transactions in a TxGraph's main graph is
    sorted by:
    - feerate, high to low
    - Cluster, low to high m_sequence
    - linearization order within clusters
    
    However, since m_sequence values are assigned deterministically and predictably,
    the ordering of same-feerate distinct-Cluster transactions can reveal something
    about the order they were inserted into TxGraph.
    
    In the full cluster mempool PR, this ordering (through TxGraph::CompareMainOrder)
    is used to decide the order of to-be-announced transactions. These should not
    expose information about insertion order.
    
    To deal with this, do not compare the m_sequence values directly, but hash them
    with a fixed but random key first. In case this yields a collision (which needs
    ~192000 clusters to have a probability of 10^-9 of existing), fall back to
    comparing the m_sequence values directly.
    e325c3aff6
  16. sipa force-pushed on Oct 14, 2025
  17. glozow added the label Mempool on Oct 27, 2025
  18. instagibbs commented at 8:13 pm on December 15, 2025: member
    f0699bd5718eb099e242900d2b4494c171d70825 does something similar, by making the ordering of same-chunk-feerate chunks within the same cluster randomized
  19. sipa commented at 8:52 pm on December 15, 2025: member
    @instagibbs Indeed, and with the same rationale.
  20. instagibbs commented at 5:08 pm on December 18, 2025: member
    Should TxGraph::m_rng be instrumented so it can be made deterministic, to increase fuzz stability?
  21. sipa commented at 6:13 pm on December 18, 2025: member

    The txgraph tests set the RNG to deterministic mode before every run; see the SeedRandomStateForTest(SeedRand::ZEROS); call at the start.

    Without that, there is even logic to show an error if the global RNG is used. See CheckGlobalsImpl::~CheckGlobalsImpl in src/test/fuzz/util/check_globals.cpp.
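
    (A minimal sketch of that pattern; the fuzz-target body is illustrative, while SeedRandomStateForTest and SeedRand::ZEROS are the existing test helpers referred to above:)

        // Illustrative fuzz-target skeleton: force the global test RNG into a
        // deterministic state before each run, so that TxGraph's internal
        // randomness is reproducible for a given fuzz input.
        #include <test/fuzz/fuzz.h>
        #include <test/util/random.h>

        FUZZ_TARGET(txgraph_sketch)
        {
            SeedRandomStateForTest(SeedRand::ZEROS);
            // ... construct a TxGraph and exercise it using the fuzz input ...
        }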

  22. instagibbs commented at 6:29 pm on December 18, 2025: member
    ACK e325c3aff6b2bcff6a409a1b4c2ac1857c725fc6
  23. sipa commented at 6:37 pm on December 27, 2025: member

    After discussion with @gmaxwell, there is a possibility to do something more deterministic here, though I’m not sure it’s worth it.

    It would be possible to sort by this metric:

    • First by chunk feerate
    • Then by cumulative equal-feerate chunk size: the sum of the sizes within all chunks of the same feerate in the same cluster, up to and including the one that holds the transaction under consideration.
    • Then by salted hash of cluster id.
    • Then by position within the chunk (or cluster).

    Instead of this “sum of sizes of transactions in equal-feerate chunks up to current chunk”, it can be any function of the cluster that is monotonically increasing within equal-feerate transaction groups of the same cluster. It can even be multiple functions, e.g. first sum of sizes, and then sum of hashes of topologies of chunks or so.

    This would make things more deterministic, but in an arbitrary way - making certain transactions preferred over others, without them being fundamentally more desirable for miners. Size at least has some bearing on the block building logic, but it’s not clear to me that smallest-first is necessarily best: if the block boundary falls near the end of a group of equal-feerate chunks, you’d want the smallest ones at the end. Using a hash of topology or so is even more arbitrary.
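
    (Written out as a lexicographic sort key, a paraphrase of the metric above rather than text from the comment; C is the transaction’s cluster, c its chunk, H_k a salted hash, and c' ranges over the chunks of C up to and including c in linearization order:)

        \mathrm{key}(tx) = \Bigl(\, -\mathrm{feerate}(c),\;
            \sum_{\substack{c' \le c \\ \mathrm{feerate}(c') = \mathrm{feerate}(c)}} \mathrm{size}(c'),\;
            H_k(\mathrm{id}(C)),\;
            \mathrm{pos}(tx, c) \,\Bigr)
        \quad\text{sorted ascending (feerate negated so higher feerates come first)}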

  24. instagibbs commented at 3:00 pm on December 29, 2025: member

    Tend to agree I don’t think the additional complexity is worth it since:

    1. you’re not going to get rid of lots of non-determinism in practice (see brc20 patterns of same-size-same-feerate-same-topo chains)
    2. may encourage weird games by transaction makers to “grind” it in general case to come in “on top”, if you consider the more general/arbitrary functions like topology hash.
  25. gmaxwell commented at 4:49 pm on December 31, 2025: contributor

    hm? you can absolutely eliminate all non-determinism – e.g. with the old approach of using the wtxid as a tiebreaker, if nothing else. I think you’ll probably have to eliminate all non-determinism one way or another eventually if this sort order is used for efficient block relay. It may be desirable to eliminate it now so it doesn’t have to be done later, and so e.g. metrics that determine if miners are censoring transactions get less noise from randomness here, or even compact block reconstruction rates are less harmed by differences in eviction choices if the network starts evicting. They’re fringe benefits (at least now) but I’m not aware of wtxid grinding to get a slight tiebreak advantage ever having been a concern. In any case, for the reasons that determinism is better (or worse), any marginal increase in determinism is good/bad.

    My discussion with sipa was basically just on ‘can’t you prefer smaller first’, under the idea that it would be better for block packing, but he convinced me that the block boundary could fall anywhere in the cluster and so that wasn’t an obvious win. But being able to sort on cumulative size is more or less orthogonal to making the sort more deterministic. @0xB10C any thoughts about block content predictability here?

  26. instagibbs commented at 4:52 pm on December 31, 2025: member

    e.g. with the old use the wtxid as a tiebreaker, if nothing else

    Right, I was speaking of the exact proposed methods (not wtxid!); we previously used time as a tiebreaker iirc (pre cluster mempool)

    edit: guess I didn’t publish my other time related comment

    I’m not aware of wtxid grinding to get a slight tiebreak advantage ever having been a concern

    Well, I don’t think we used wtxid as a tiebreak previously, so it wouldn’t have been a concern? In practice I don’t think it would affect network resources, so I’m not particularly concerned; it’s just something new people might do.

  27. gmaxwell commented at 5:12 pm on December 31, 2025: contributor
    Ah, I guess it was the relay sort that tie-broke on wtxid then. In any case I think it’s good to tiebreak first on things we want people to do, but after all that the choice is something like wtxid or random (I agree arrival order is right out). A final arbitrary criterion risks that people game it to get a slight preference, but the alternative of random has the downside of making block inclusion/order less consistent between nodes.
  28. sipa commented at 7:22 pm on December 31, 2025: member

    There are several places where tie-break decisions need to be made:

    1. The order of equal-feerate chunks in distinct clusters: currently lowest cluster_id first, while this PR makes it uniformly random.
    2. The order of equal-feerate chunks within the same cluster (without dependency constraints between them): since SFL, smallest chunk first and then uniformly random.
    3. The order of transactions within the same chunk: since SFL, random but followed by a PostLinearize() which in practice often undoes the randomization.

    So even with this PR, the current situation is arguably inconsistent (we prefer smaller chunks, but don’t tie-break based on size elsewhere). I feel like we should make a decision for what properties are desirable, and then implement it consistently. I think we definitely want highest-feerate chunk first (that’s what cluster mempool is), and probably want PostLinearize() within chunks, as they are objective (even though PostLinearize()’s quality improvement isn’t actually observable without sub-chunk block template building). But besides those, we could decide:

    • Preference for smaller size or not? While smallest doesn’t directly mean better, it’s at least objectively relevant to block building, and if grinding is somehow an outcome, grinding for smaller size is probably not terrible.
      • (s) Preference for size.
        • For (1), prefer smaller clusters first (h) or smaller equal-feerate-chunks-before (c)
        • For (2), prefer smaller chunks like now.
        • For (3), prefer smaller transactions within chunks first. Optionally (f) or not (m) it could involve preferring higher-feerate transactions within a chunk first too, before preferring smaller size.
      • (u) Do not discriminate based on size at all (including no preference for smaller chunks).
    • How to tie-break the rest:
      • (r) Randomly.
      • (t) By txid.
        • For (1), pick lower sum of txids (as bigint) of the whole cluster (h) or sum of txids of equal-feerate-chunks-before (c).
        • For (2), pick chunk with smallest sum of txids first.
        • For (3), pick lowest txid transaction first.
      • (w) Like (t), but use wtxids rather than txids. I think this is (very slightly) worse as it might encourage witness grinding which is slightly easier than full txid grinding, and doesn’t have any benefits as far as I can see.
      • (d) Like (t), but use a salted hash of the txids rather than the txids directly. This avoids a global bias, and thus avoids grinding, but unlike (r) it is per-host deterministic (and thus won’t randomly change any time a cluster is modified).

    I think all of these are doable for implementation. For any of the versions with dependency on wtxid (d/t/w) I think it would involve handing TxGraph a callback function which computes a score based on mempool data (as TxGraph doesn’t have access to wtxids directly).
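
    (As a rough illustration of that callback idea, with entirely hypothetical names rather than actual TxGraph API:)

        // Hypothetical sketch for the (d)/(t) variants: the mempool hands TxGraph
        // a callback mapping a transaction to a tie-break score (here a salted
        // hash of its txid), which TxGraph could then aggregate per chunk/cluster.
        #include <crypto/siphash.h>
        #include <uint256.h>
        #include <cstdint>
        #include <functional>

        using GraphIndexSketch = uint32_t;  // illustrative stand-in for TxGraph's GraphIndex
        using TieBreakScoreFn = std::function<uint64_t(GraphIndexSketch)>;

        TieBreakScoreFn MakeSaltedTxidScoreFn(uint64_t k0, uint64_t k1,
                                              std::function<uint256(GraphIndexSketch)> txid_of)
        {
            return [=](GraphIndexSketch idx) {
                // Salted hash of the txid: per-host deterministic, not grindable globally.
                return SipHashUint256(k0, k1, txid_of(idx));
            };
        }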

    • The easiest solution is (ur). Everything random, including no preference for smallest chunk first.
    • If size preference is a good discriminator, I think (scfr) makes sense, and probably gets rid of most non-determinism.
    • If we want fully deterministic, (scft), but it comes at the cost of some implementation complexity.

    Marking as draft until decided what approach we want.

  29. sipa marked this as a draft on Dec 31, 2025
  30. gmaxwell commented at 8:35 pm on December 31, 2025: contributor

    Other preferences like “creates fewest outputs” could also be used before breaking on size, but perhaps it would be rare that this would be the tiebreaker? Like size it has an advantage that if someone is “gaming” it that’s a good-ish thing. In any case the more criteria you stick in front of a grindable tiebreaker the less the grinding matters. (Or, if you do go with a local pseudorandom choice – which I think you shouldn’t – the more criteria in front of it, the less unpredictable the behavior will be.)

    You could prefer the lowest maximum confirmed input height as a tie breaker before TXID: it’s a deterministic, naturally scarce resource – and favoring the coins that have moved less recently in the event of a tie seems healthy.

    Does your comparison need to obey the triangle inequality – I assume not, if you could decide ‘randomly’? H(sorted(txid_a,txid_b))^txid_a < H(sorted(txid_a,txid_b))^txid_b as the tiebreaker? This can’t usefully be ground.

  31. sipa commented at 8:56 pm on December 31, 2025: member

    Other preferences like “creates fewest outputs” could also be used before breaking on size, but perhaps it would be rare that this would be the tiebreaker? Like size it has an advantage that if someone is “gaming” it that’s a good-ish thing.

    Indeed, but I don’t think it’s particularly useful. It should be pretty rare to have a different number of outputs if the size is already the same. And similarly, after grinding for smaller size (which probably already means fewer outputs) you can’t really make the number of outputs smaller while keeping the size the same. There is also some CPU/memory cost to having more criteria, especially if it somehow turns out that they need to be cached on a per-chunk basis (which may be the case for (c)).

    You could prefer the lowest maximum confirmed input height as a tie breaker before TXID: It’s a deterministic, naturally scarce resource– and favoring the coins that have moved less recently in the event of a tie seems healthy.

    Ha, tie-break by anti-priority.

    Does your comparison need to obey the triangle inequality – I assume not, if you could decide ‘randomly’? H(sorted(txid_a,txid_b))^txid_a < H(sorted(txid_a,txid_b))^txid_b as the tiebreaker? This can’t usefully be ground.

    Yes, it needs to be consistent. In the current implementation, that’s achieved by mapping everything to a score function and comparing the scores. For random, the score isn’t actually a function, just a randomly-generated value assigned to each chunk/transaction.

    For (2) and (3), any scoring function works (even if the result is some complicated tuple of uint256s and feerates), really, because it’s internal to the linearization algorithm. Compute it once for each chunk/txn and throw it into the decision-making heaps.

    For (1), it’s more complicated, because it either needs to be cached per chunk/cluster, or recomputed any time two equal-feerate chunks are sorted. Furthermore, it must respect chunk ordering within clusters. So we need something that can be implemented as a function of (cluster, chunk within that cluster), in such a way that f(cluster, chunk_pos1) <= f(cluster, chunk_pos2) if chunk_pos1 <= chunk_pos2 and the two chunks have equal feerate. That’s why I suggest sums of values (sizes, or hashes) of the whole cluster, or of equal-feerate chunk prefixes; those are naturally monotonically increasing with chunk position. Anything that’s just a function of cluster satisfies this, however.

    So it’s possible for both (1) and (2) to just use SHA256(concat(sorted(txids))) or so (for all txids within cluster, or all txids within chunk). But I’m not sure that is desirable, because it means any new transaction could completely change the existing orderings.

  32. sipa commented at 9:14 pm on January 4, 2026: member

    What about the following fully-deterministic ordering:

    First, prefer smaller sizes.

    • For sorting equal-feerate chunks in distinct clusters: sort by the sum of tx weights across all chunks in the same cluster of the same feerate, up to and including the chunk itself.
    • For sorting equal-feerate chunks within the same cluster: sort by chunk weight.
    • For sorting transactions within a chunk: sort by tx weight (and PostLinearize afterwards).

    Then, tie-break by “maximum txid”:

    • For sorting equal-feerate chunks (both within and across clusters), sort by the maximum txid within the chunk.
    • For sorting transactions within a chunk, use transaction txid (and PostLinearize afterwards).

    I think this can be done with a very small impact on CPU and memory (8 bytes per transaction: 4 to store the equal-feerate-chunk-prefix-weight, and 4 to store the max txid GraphIndex).
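
    (A rough sketch of the per-chunk key this implies, based on my reading of the proposal; the ChunkInfoSketch plumbing is hypothetical, not existing TxGraph state:)

        #include <cstdint>
        #include <tuple>
        #include <vector>

        // Hypothetical per-chunk summary.
        struct ChunkInfoSketch {
            int64_t fee;              // total fee of the chunk
            int32_t weight;           // total weight of the chunk
            uint32_t max_txid_index;  // GraphIndex of the largest txid in the chunk
        };

        // For each chunk of one cluster (in linearization order), compute
        // (equal-feerate prefix weight, max-txid index). Among equal-feerate
        // chunks across clusters, the lower key would sort earlier.
        std::vector<std::tuple<int32_t, uint32_t>>
        ChunkTieBreakKeys(const std::vector<ChunkInfoSketch>& chunks)
        {
            std::vector<std::tuple<int32_t, uint32_t>> keys;
            keys.reserve(chunks.size());
            for (size_t i = 0; i < chunks.size(); ++i) {
                // Sum the weights of the preceding same-feerate chunks in this
                // cluster plus chunk i itself. Equal-feerate chunks are contiguous
                // in a linearization, so walk backwards until the feerate differs;
                // feerates are compared by cross-multiplying fee and weight.
                int32_t prefix_weight = chunks[i].weight;
                for (size_t j = i; j-- > 0; ) {
                    if (chunks[i].fee * chunks[j].weight != chunks[j].fee * chunks[i].weight) break;
                    prefix_weight += chunks[j].weight;
                }
                keys.emplace_back(prefix_weight, chunks[i].max_txid_index);
            }
            return keys;
        }

    (This seems consistent with the 4+4 bytes per transaction mentioned above, if each transaction caches its chunk’s pair.)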

  33. sipa commented at 5:25 am on January 15, 2026: member
    Closing in favor of #34257.
  34. sipa closed this on Jan 15, 2026

  35. ajtowns commented at 10:42 pm on January 15, 2026: contributor

    First, prefer smaller sizes.

    Wouldn’t it be better to prefer larger sizes? If you have two equal feerate chunks, A and B, and you’re filling a block then either:

    • both A and B fit, in which case ordering doesn’t matter
    • neither of A and B fit, in which case ordering doesn’t matter
    • either but not both of A and B fit, in which case you’ll add one of them, and then fill the rest of the block with lower feerate txs, so you would prefer to include the larger of A and B to maximise total fee for the block
  36. sipa commented at 10:47 pm on January 15, 2026: member

    @ajtowns There is also the possibility that A is smaller and fits, and B is bigger and doesn’t fit (regardless of whether A has been included). In this case, you want A first. I believe this will exactly compensate for the probability that one but not both fits, where you prefer the bigger one first. (EDIT: this only matters within a cluster - across clusters they’re independent)

    Relevant IRC discussion today:

     11:42:57 < darosior> sipa: the discussion in #33335 is convincing that a deterministic tie breaker is preferable. But i find it quite unintuitive that between two equal-feerate transactions you would choose to pick the one that pays a miner less. Miners optimize for maximum fees, which often (though not always) means maximizing feerate first, but also just absolute
     11:42:57 < darosior> fees? Now i understand miners optimize for what actually ends up paying them, which is what they can actually pack in a template, and that sometimes smaller transactions can lead to higher-paying templates, but is that the common case?
     ...
     11:54:28 < sipa> darosior: it's important to note that this is just choosing the order of chunks within a group of equal-feerate ones - and that group has fixed size, and fixed total fee
     11:56:17 < sipa> so it's just a question of where to put the boundaries between chunks in it, and it's really the distance between where the block boundary falls and the last chunk boundary before it that matters
     12:03:27 < darosior> And more often than not you'd want to have the smaller transaction first because the larger (higher-fee) one would not fit and therefore if you didn't order this way you'd give up both their fees?
     12:03:35  * darosior spells out loud to make sure he understands
     ...
     13:14:51 < sipa> darosior: no, i don't think it matters either way - the only reason to prefer smaller is because people deliberately grinding for smaller size feels "not really bad"
     13:15:02 < sipa> we just can't predict where the boundaries will fall
     13:15:51 < sipa> maybe in a world where there is a permanent transaction backlog, and many transactions that just never get mined, smaller first is actually better because you can have clusters whose ends never make it
     13:16:23 < sipa> but i don't think this is an important point... the real "quality" aspect is covered by picking higher feerate first, and splitting up chunks in minimal parts
     13:16:32 < sipa> the rest is just convention to get determinism
     13:41:53 < darosior> On the other hand choosing on the contrary the highest-fee-paying one may directly solve the "grinding" since it would mean just paying for more block space which someone can already do if they want to. But yeah it's probably never going to matter either way.
     13:45:46 < sipa> yeah, fair point - i can see an argument for largest size first too
    
  37. ajtowns commented at 2:11 am on January 16, 2026: contributor

    There is also the possibility that A is smaller and fits, and B is bigger and doesn’t fit (regardless of whether A has been included).

    Right – when they’re in different clusters, considering B first is fine; it’ll be skipped and A taken; but when they’re in the same cluster, skipping B would also prevent anything else from the cluster being included.

    So within a cluster, consider smaller chunks first; across clusters, consider the bigger chunk first? Maybe that’s a bit too much complexity for too little value though, especially since it’s not stable between cluster merges/splits?

    FWIW, I don’t find the grinding argument very persuasive: if you can find a way to make your tx smaller, then you’re already encouraged to do that by fee rate prioritisation. Making a larger tx and also paying more fees for it just to keep the fee rate constant isn’t attractive, as far as I can see.

