p2p: Replace per-peer transaction rate-limiting with global rate limits #34628

ajtowns commented at 3:09 AM on February 20, 2026: contributor

Per-peer m_tx_inventory_to_send queues have CPU and memory costs that scale with both queue size and peer count. Under high transaction volume, this has previously caused severe issues (May 2023 disclosure) and still can cause measurable delays (Feb 2026 Runestone surge, with the msghand thread observed hitting 100% CPU and queue memory reaching ~95MB).

This PR replaces the per-peer rate limiting with a global queue using dual token buckets (limiting transactions by both count and serialized size). Transactions that arrive within the bucket capacity still relay nearly immediately, but excess transactions queue in a global backlog and drain as the token buckets refill.

Key parameters:

Count bucket: 14 tx/s, 420 capacity (30s buffer)
Size bucket: 20 kB/s (~12 MB/600s), 50 MB capacity
Outbound peers refill faster by a factor of 2.5
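Here is a minimal, hypothetical C++ sketch of how the two buckets interact, using the parameters above. `SimpleBucket` and `TryRelay` are illustrative names, not the PR's `TokenBucket` API. Time is modelled as plain seconds, and the PR's ability to run the balance slightly negative is omitted. A transaction is admitted only if both the count bucket and the size bucket can pay for it. Anything not admitted would go onto the backlog.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical simplified token bucket (not the PR's TokenBucket API):
// tokens accrue at `rate` per second and are capped at `cap`.
struct SimpleBucket {
    double rate;  // tokens added per second
    double cap;   // maximum balance
    double value; // current balance

    // Add tokens for `elapsed_s` seconds, never exceeding the cap.
    void Refill(double elapsed_s) { value = std::min(cap, value + rate * elapsed_s); }
};

// Dual-bucket admission: relay now only if the count bucket has a token and
// the size bucket covers the serialized size. Both buckets are charged on
// success; neither is charged on failure (the caller would queue the tx).
bool TryRelay(SimpleBucket& count, SimpleBucket& size, double tx_bytes)
{
    assert(tx_bytes >= 0);
    if (count.value < 1.0 || size.value < tx_bytes) return false;
    count.value -= 1.0;
    size.value -= tx_bytes;
    return true;
}
```

For example, an empty count bucket refills to its 420-token capacity after 30 s at 14 tx/s. The size bucket accrues 12 MB over 600 s at 20 kB/s, which matches the figures in the parameter list.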

Per-peer queues are retained solely for privacy batching and are always fully emptied, removing the old INVENTORY_BROADCAST_MAX cap.

When the queuing logic is engaged during transaction spikes, this reduces the memory and CPU burden from O(queue * peers) to O(queue), as queued transactions no longer need to be retained or re-sorted per peer.

Design discussion: https://gist.github.com/ajtowns/d61bea974a07190fa6c6c8eaef3638b9

DrahtBot added the label P2P on Feb 20, 2026

DrahtBot commented at 3:10 AM on February 20, 2026: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34628.

Reviews

See the guideline and AI policy for information on the review process.

Type	Reviewers
ACK	sipa, instagibbs
Concept ACK	0xB10C, polespinasa, naiyoma

If your review is incorrectly listed, please copy-paste <code></code> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#35642 (headersync: do parameter search at runtime by sipa)
#35591 ([DO NOT MERGE] Erlay: bandwidth-efficient transaction relay protocol (Full implementation) by sr-gi)
#35522 (refactor: Extract per-message helpers from SendMessages() (move-only) by pablomartin4btc)
#35513 (rpc: help metadata fixes by RuslanProgrammer)
#35511 (RFC: consensus: Make CAmount a class by hodlinator)
#35502 (refactor: extract per-message helpers from ProcessMessage (move-only) by w0xlt)
#35474 (node: move index ownership to NodeContext by w0xlt)
#35315 (refactor: Use NodeClock::time_point in more places by maflcko)
#35016 (net: deduplicate private broadcast state and snapshot types by kenji-yamam0to)
#34824 (net: encapsulate TxRelay state and replace recursive mutexes by w0xlt)
#31260 (scripted-diff: Type-safe settings retrieval by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label CI failed on Feb 20, 2026

ajtowns force-pushed on Feb 20, 2026

ajtowns commented at 12:43 PM on February 20, 2026: contributor

CI failure is presumably either #34631 or #34387

DrahtBot removed the label CI failed on Feb 24, 2026

in src/util/tokenbucket.h:57 in 869a1ae012 outdated

  52 | +    }
  53 | +
  54 | +    /** Consume n tokens. Returns false if the balance dropped below m_max_debt. */
  55 | +    bool decrement(double n = 1.0)
  56 | +    {
  57 | +        m_value -= n;

chriszeng1010 commented at 5:33 PM on March 2, 2026:

Decrement can still go below m_max_debt before checking is complete.

ajtowns commented at 12:57 PM on March 4, 2026:

decrement() can always go below m_max_debt, it only reports when it has done so -- it leaves it up to the caller to not go further into debt.

DrahtBot added the label Needs rebase on Mar 11, 2026

ajtowns force-pushed on Mar 12, 2026

DrahtBot removed the label Needs rebase on Mar 12, 2026

0xB10C commented at 9:45 AM on March 12, 2026: contributor

Concept ACK!

I've been running this for a few days now and have written down a few observations on a small mass-broadcast event that happened a few hours ago: https://bnoc.xyz/t/increased-b-msghand-thread-utilization-due-to-runestone-transactions-on-2026-02-17/81/11

The node with this patch was significantly less affected than the others running a recent master.

I haven't set up any monitoring for the newly added getnetworkinfo fields yet.

DrahtBot added the label CI failed on Mar 12, 2026

DrahtBot closed this on Mar 12, 2026

DrahtBot reopened this on Mar 12, 2026

ajtowns added this to the milestone 32.0 on Mar 12, 2026

DrahtBot removed the label CI failed on Mar 12, 2026

in src/txmempool.cpp:541 in 344de4b8dd outdated

 537 | @@ -538,6 +538,55 @@ void CTxMemPool::check(const CCoinsViewCache& active_coins_tip, int64_t spendhei
 538 |          for (const auto& input: tx.vin) mempoolDuplicate.SpendCoin(input.prevout);
 539 |          AddCoins(mempoolDuplicate, tx, std::numeric_limits<int>::max());
 540 |      }
 541 | +

sipa commented at 6:22 PM on March 20, 2026:

In commit "txmempool: Add SortMiningScoreWithTopology"

This feels more like something for a fuzz or unit test. CTxMemPool::check is for internal consistency checks in the CTxMemPool representation, I feel.

ajtowns commented at 1:20 AM on March 21, 2026:

.... I might have spent too much time vibecoding and caught hallucinations? I could have sworn I was replacing existing code here. EDIT: Dropped this code.

in src/txmempool.cpp:605 in 344de4b8dd outdated

 601 | @@ -553,6 +602,27 @@ void CTxMemPool::check(const CCoinsViewCache& active_coins_tip, int64_t spendhei
 602 |      assert(innerUsage == cachedInnerUsage);
 603 |  }
 604 |  
 605 | +std::vector<CTxMemPool::txiter> CTxMemPool::SortMiningScoreWithTopology(std::span<const Wtxid> wtxids, size_t n) const

sipa commented at 6:46 PM on March 20, 2026:

It looks like both eventual production call sites of this function (BumpInvVecForProcessing and PeerManagerImpl::SendMessages) do a deduplication pass on the results.

Would it make sense to do this on the fly inside this function? It can't use std::partial_sort anymore, but it can use std::make_heap and friends to implement partial sorting, with a dynamic end point until n distinct elements have been found? Something like

std::vector<CTxMemPool::txiter> CTxMemPool::SortMiningScoreWithTopology(std::span<const Wtxid> wtxids, size_t n) const
{
    auto cmp = [&](const auto& a, const auto& b) EXCLUSIVE_LOCKS_REQUIRED(cs) noexcept { return m_txgraph->CompareMainOrder(*a, *b) > 0; };

    std::vector<txiter> res;

    n = std::min(wtxids.size(), n);
    if (n > 0) {
        // Construct a heap with txiters for all wtxids that exist in the mempool.
        std::vector<txiter> heap;
        heap.reserve(wtxids.size());
        for (auto& wtxid : wtxids) {
            if (auto i{GetIter(wtxid)}; i.has_value()) {
                heap.push_back(i.value());
            }
        }
        std::ranges::make_heap(heap, cmp);

        // Pop transactions until n distinct ones in res have been found.
        res.reserve(heap.size());
        while (res.size() < n && !heap.empty()) {
            std::ranges::pop_heap(heap, cmp);
            if (res.empty() || heap.back() != res.back()) {
                res.push_back(heap.back());
            }
            heap.pop_back();
        }

        // Copy the remainder over, without sorting or deduplication.
        res.insert(res.end(), heap.begin(), heap.end());
    }

    return res;
}

With even more low-level code the duplicate vector can be avoided, I think. Tests don't pass with this, I haven't investigated why.
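The heap-based idea can be shown in isolation. The sketch uses plain `int`s (smaller is better) in place of `txiter`s and `CompareMainOrder`. It also uses the iterator-based `std::make_heap`/`std::pop_heap` rather than the `std::ranges` versions. `ExtractBestDistinct` is a hypothetical name. This demonstrates the technique only. It is not a corrected version of the snippet above and makes no claim about why that snippet's tests failed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Return up to `n` distinct best (smallest) values, best first, followed by
// the entries that were never popped, in unspecified order.
// Cost: O(k) to build the heap over all k inputs, plus O(log k) per pop.
std::vector<int> ExtractBestDistinct(std::vector<int> heap, std::size_t n)
{
    // The standard heap algorithms build a max-heap with respect to the
    // comparator; std::greater flips it so heap.front() is the smallest.
    const std::greater<int> cmp{};
    std::make_heap(heap.begin(), heap.end(), cmp);

    std::vector<int> res;
    res.reserve(heap.size());
    while (res.size() < n && !heap.empty()) {
        std::pop_heap(heap.begin(), heap.end(), cmp); // smallest -> heap.back()
        // Equal values are popped consecutively, so comparing against the
        // last kept value is enough to drop duplicates.
        if (res.empty() || heap.back() != res.back()) res.push_back(heap.back());
        heap.pop_back();
    }
    // Copy the remainder over, without sorting or deduplication.
    res.insert(res.end(), heap.begin(), heap.end());
    return res;
}
```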

ajtowns commented at 1:14 AM on March 21, 2026:

Without having looked, is the comparison backwards?

My understanding is partial_sort has two benefits:

it only makes a heap out of the target size, so iterates through the source array once and then does log(k) work for each element, with better locality
when updating the k elements in the heap with a new element from the source, it does the sift-down algorithm, which is more efficient than push_heap()/pop_heap(), but isn't exposed via the STL, so it would mean writing your own heap implementation

Deduping the fairly small output list as you pass through it, when duplicates are rare anyway, seemed fine to me?

sipa commented at 2:20 PM on March 21, 2026:

Without having looked, is the comparison backwards?

I don't think so. It's a max heap, but I want to pop the "lowest" elements off first, so I needed to swap the comparator I think.

My understanding is partial_sort has two benefits:

Interesting. So it has complexity O(n log m) (with n = number of elements, m = sorted prefix size), while the approach I have in mind is O(n + m log n) (O(n) to construct the heap of all elements, and then m operations of O(log n) each to extract the best m elements). Complexity-wise, my approach seems better, since n > m, except it has worse memory locality. This makes me wonder if I'm missing something, since the cppreference.com documentation seems to imply std::partial_sort is intended for low m values.

Deduping the fairly small output list as you pass through it, when duplicates are rare anyway, seemed fine to me?

Yeah, it probably doesn't matter that much. It just looked like the deduplication is something that CTxMemPool::SortMiningScoreWithTopology could do internally since both call sites need it anyway. And then it seemed possible to have the count be dynamic, but only when using a different approach than what std::partial_sort seems to enable.

ajtowns commented at 12:58 AM on March 22, 2026:

Complexity-wise, my approach seems better, since n > m, except it has worse memory locality.

Yeah. I think the ratio between the number of swaps each approach performs in the worst case is log2(m) : 2. If you give the input in exactly the wrong order, each element goes to the top of the heap in log(m) steps, whereas for the full heap it adds up to 2. So for m~=100, that's 3.3x more swaps, but the swaps are contained to a set of 100 elements, which might get you more than a 3.3x speedup per swap due to locality? (In the average case, for most elements you'll just compare to the top-of-heap element, find it's worse and do 0 swaps. Overall it reduces from O(n log(m) + m log(m)) to O(n + m log(m)), which is better than the O(n + m log(n)) from the full heap.)
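The estimates above can be collected in one place. Here n is the number of inputs and m is the size of the sorted prefix. The average-case line assumes, as described, that most inputs are rejected after one comparison with the top of the heap.

```latex
\begin{aligned}
\text{worst-case swaps (bounded heap : full heap)} &\approx \log_2 m : 2,
  \qquad m = 100 \;\Rightarrow\; \tfrac{\log_2 100}{2} \approx \tfrac{6.64}{2} \approx 3.3 \\
\text{bounded heap (partial\_sort), worst case} &: O(n \log m + m \log m) \\
\text{bounded heap, average case} &: O(n + m \log m) \\
\text{full heap (make\_heap + pop\_heap)} &: O(n + m \log n)
\end{aligned}
```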

Oh, hmm; in the per-peer thread we're always taking everything, so I think there we should be explicitly using std::sort then and not any of this partial business anyway (pushed). That should ensure that the m sizes we're using in practice are always about 70 (-txsendrate times inbound broadcast interval).

in src/txmempool.cpp:483 in ea647debfd

 480 | @@ -481,10 +481,10 @@ void CTxMemPool::check(const CCoinsViewCache& active_coins_tip, int64_t spendhei
 481 |          const CTransaction& tx = it->GetTx();
 482 |  
 483 |          // CompareMiningScoreWithTopology should agree with GetSortedScoreWithTopology()

sipa commented at 7:46 PM on March 20, 2026:

In commit "txmempool: Drop CompareMiningScoreWithTopology"

Comment is outdated now.

in src/util/tokenbucket.h:16 in 859e8eb020

  11 | +
  12 | +/** A token bucket rate limiter.
  13 | + *
  14 | + * Tokens are added at a steady rate (m_rate per second) up to a capacity
  15 | + * cap (m_cap). Tokens are removed by calling decrement(). The balance
  16 | + * may go negative down to m_max_debt; decrement() returns false when

sipa commented at 7:54 PM on March 20, 2026:

In commit "util/tokenbucket.h: Provide a generic TokenBucket class"

Is it useful to support debt? I believe it can be avoided by a transformation that raises both m_value and m_cap by -m_max_debt.

ajtowns commented at 4:04 AM on March 21, 2026:

The distinction is in InvToSendBucket::avail() which says "you can start doing stuff as long as the size_bucket's value is >=0", which would have to get incremented as well to be equivalent.

The main effect is that when the size bucket is under pressure, you get a chance to relay at least 50kB each iteration, rather than the avail test passing as soon as you can relay 1B, then the loop ending immediately after you relay the first tx.

I think it can be simplified a bit by moving the max_debt value to just being a parameter of decrement() though. Will update.

ajtowns force-pushed on Mar 21, 2026

ajtowns force-pushed on Mar 22, 2026

DrahtBot added the label CI failed on Mar 22, 2026

DrahtBot removed the label CI failed on Mar 22, 2026

in src/net_processing.cpp:6190 in ba3a81d036 outdated

6187 | @@ -6028,63 +6188,56 @@ bool PeerManagerImpl::SendMessages(CNode& node)
6188 |                  // Determine transactions to relay
6189 |                  if (fSendTrickle) {
6190 |                      // Produce a vector with all candidates for sending

xyzconstant commented at 3:04 AM on April 15, 2026:

In commit: "net_processing: Change m_tx_inventory_to_send from set to vector" (2998d73f692059f149fa2d5a4108b172b21c8cac)

nit: Forgot to remove this comment as well?

ajtowns commented at 4:11 AM on April 15, 2026:

No? inv_tx is a vector with all candidates for sending in the new code.

xyzconstant commented at 4:33 AM on April 15, 2026:

Yeah, you're right, but now the filterrate definition sits in between:

    // Produce a vector with all candidates for sending
    const CFeeRate filterrate{tx_relay->m_fee_filter_received.load()};

    // Topologically and fee-rate sort the inventory we send for privacy and priority reasons.
    // sorted from lowest priority to highest, skipping low fee
    auto inv_tx = [&]() EXCLUSIVE_LOCKS_REQUIRED(tx_relay->m_tx_inventory_mutex) {

ajtowns commented at 4:32 AM on April 26, 2026:

Tweaked the comments a bit

DrahtBot added the label Needs rebase on Apr 23, 2026

ajtowns force-pushed on Apr 23, 2026

DrahtBot removed the label Needs rebase on Apr 23, 2026

in src/node/transaction.cpp:64 in cafa97202e outdated

  59 | @@ -60,15 +60,15 @@ TransactionError BroadcastTransaction(NodeContext& node,
  60 |              if (!existingCoin.IsSpent()) return TransactionError::ALREADY_IN_UTXO_SET;
  61 |          }
  62 |  
  63 | -        if (auto mempool_tx = node.mempool->get(txid); mempool_tx) {
  64 | +        mempool_tx = node.mempool->get(txid);
  65 | +        if (mempool_tx) {

polespinasa commented at 11:38 AM on April 24, 2026:

in cafa97202e95df278153369ccda9c1bb61880a5e I don't think we should have an if statement without code logic inside. Why not just if(!mempool_tx) and do what is inside the else code block?

ajtowns commented at 3:05 AM on April 26, 2026:

The if block provides a place for the detailed comments on what happens in the case where the tx is already in the mempool.

in src/net_processing.cpp:174 in 466dccc89c

 175 | -static constexpr unsigned int INVENTORY_BROADCAST_TARGET = INVENTORY_BROADCAST_PER_SECOND * count_seconds(INBOUND_INVENTORY_BROADCAST_INTERVAL);
 176 | -/** Maximum number of inventory items to send per transmission. */
 177 | -static constexpr unsigned int INVENTORY_BROADCAST_MAX = 1000;
 178 | -static_assert(INVENTORY_BROADCAST_MAX >= INVENTORY_BROADCAST_TARGET, "INVENTORY_BROADCAST_MAX too low");
 179 | -static_assert(INVENTORY_BROADCAST_MAX <= node::MAX_PEER_TX_ANNOUNCEMENTS, "INVENTORY_BROADCAST_MAX too high");
 180 | +// static constexpr unsigned int INVENTORY_BROADCAST_TARGET = INVENTORY_BROADCAST_PER_SECOND * count_seconds(INBOUND_INVENTORY_BROADCAST_INTERVAL);

polespinasa commented at 3:49 PM on April 24, 2026:

In 466dccc89c5748c7f4ccfb10b2079c2eb54589c5 you can delete these two lines that are commented and then deleted in 0491df8e7876b9b2b86e25190823f3ff632311c6.

ajtowns commented at 3:10 AM on April 26, 2026:

They're commented in the earlier commit then uncommented (not deleted) in the later commit. Commenting them makes it easy to see how they're (not) changed, while still allowing the in-between commits to compile without hitting "unused variable" warnings/errors.

in src/txmempool.cpp:541 in 798db790a8

 537 | @@ -538,6 +538,7 @@ void CTxMemPool::check(const CCoinsViewCache& active_coins_tip, int64_t spendhei
 538 |          for (const auto& input: tx.vin) mempoolDuplicate.SpendCoin(input.prevout);
 539 |          AddCoins(mempoolDuplicate, tx, std::numeric_limits<int>::max());
 540 |      }
 541 | +

polespinasa commented at 3:57 PM on April 24, 2026:

in 798db790a8a5b9c6b11b3e8599cb88f9d6015d87 nit: random empty line added here

in src/txmempool.cpp:567 in 798db790a8 outdated

 562 | +
 563 | +    n = std::min(wtxids.size(), n);
 564 | +    if (n > 0) {
 565 | +        res.reserve(wtxids.size());
 566 | +        for (auto& wtxid : wtxids) {
 567 | +            if (auto i{GetIter(wtxid)}; i.has_value()) {

polespinasa commented at 4:07 PM on April 24, 2026:

in 798db790a8a5b9c6b11b3e8599cb88f9d6015d87

nit: I think this can be simplified: if (auto i = GetIter(wtxid)) res.push_back(*i);

ajtowns commented at 3:13 AM on April 26, 2026:

The i.has_value() check is there to catch any txs that have been removed from the mempool and to ensure that all the returned txiters are valid. Having that check be implicit in the if is probably fine, but it seems worse to me than writing it explicitly.

in src/txmempool.cpp:572 in 798db790a8 outdated

 567 | +            if (auto i{GetIter(wtxid)}; i.has_value()) {
 568 | +                res.push_back(i.value());
 569 | +            }
 570 | +        }
 571 | +
 572 | +        if (n >= res.size()) {

polespinasa commented at 4:47 PM on April 24, 2026:

in 798db790a8a5b9c6b11b3e8599cb88f9d6015d87

I think std::partial_sort gives the same result as std::sort if the mid iterator reaches the end. So changing the whole if ... else... block for:

std::partial_sort(res.rbegin(),
                  res.rbegin() + std::min(n, res.size()),
                  res.rend(),
                  cmp);

I think it should work.

ajtowns commented at 5:14 AM on April 25, 2026:

std::partial_sort is less efficient than std::sort when sorting the entire container, eg https://stackoverflow.com/questions/45455345/performance-of-stdpartial-sort-versus-stdsort-when-sorting-the-whole-ran
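The following hypothetical helper combines both points. Sorting through reverse iterators leaves the best `n` elements at the back of the vector, ready to be consumed with `back()`/`pop_back()`. When the whole vector is being ordered, `std::sort` is used instead. `SortBestToBack` is an illustrative name, and `std::less` on `int` stands in for the mempool's mining-score ordering (smaller = better).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Arrange `v` so that its last min(n, v.size()) elements are the best ones
// under `better`, with the very best at v.back(). Elements in front of that
// suffix are left in unspecified order.
template <typename T, typename Better = std::less<T>>
void SortBestToBack(std::vector<T>& v, std::size_t n, Better better = Better{})
{
    if (n >= v.size()) {
        // Ordering everything: a full std::sort is typically faster than a
        // partial_sort whose middle iterator is the end of the range.
        std::sort(v.rbegin(), v.rend(), better);
    } else {
        // Only the best n are sorted; they land at the back of `v`.
        std::partial_sort(v.rbegin(), v.rbegin() + static_cast<std::ptrdiff_t>(n), v.rend(), better);
    }
}
```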

polespinasa commented at 4:51 PM on April 24, 2026: member

Concept ACK

Code reviewed till 798db790a8a5b9c6b11b3e8599cb88f9d6015d87 will continue soon :)

Left small comments and suggestions but nothing important, feel free to ignore.

DrahtBot added the label Needs rebase on Apr 24, 2026

ajtowns force-pushed on Apr 26, 2026

DrahtBot removed the label Needs rebase on Apr 26, 2026

ajtowns force-pushed on Apr 26, 2026

DrahtBot added the label CI failed on Apr 26, 2026

DrahtBot removed the label CI failed on Apr 26, 2026

ajtowns commented at 5:28 AM on April 26, 2026: contributor

Rebased past #35097, addressed review comments

in src/util/tokenbucket.h:43 in f632171cca

  38 | +    /** Refill tokens based on elapsed time since last call. No refill
  39 | +     *  occurs on the first call (establishes the time baseline). */
  40 | +    void increment(const time_point& now)
  41 | +    {
  42 | +        if (now > m_last_updated) {
  43 | +            if (m_value < m_cap && m_last_updated.time_since_epoch().count() > 0) {

polespinasa commented at 2:53 PM on April 27, 2026:

in f632171 I think this is a clearer way to say "this is not the first call", since it checks whether m_last_updated has been initialized or not.

if (m_value < m_cap && m_last_updated != time_point{}) {...}

ajtowns commented at 6:55 AM on April 28, 2026:

Yeah that's nicer. Introduce a MIN_TIME to compare against instead.
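Here is a minimal sketch of the refill guard with an explicit "never updated" sentinel. It uses a default-constructed `time_point`, as suggested above. The PR itself is described as introducing a `MIN_TIME` constant instead, whose definition is not shown in this thread. `ChronoBucket` is a hypothetical name, and the rest is simplified.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical chrono-based bucket: refills at `rate` tokens per second up to
// `cap`. The first call to increment() only records the time baseline.
struct ChronoBucket {
    using clock = std::chrono::steady_clock;
    using time_point = clock::time_point;

    double rate;
    double cap;
    double value{0.0};
    time_point last_updated{}; // default-constructed: "never updated"

    void increment(time_point now)
    {
        if (now <= last_updated) return; // ignore non-advancing clocks
        if (value < cap && last_updated != time_point{}) {
            const std::chrono::duration<double> elapsed = now - last_updated;
            value = std::min(cap, value + rate * elapsed.count());
        }
        last_updated = now;
    }
};
```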

in src/net_processing.cpp:544 in 2a9ab8062b outdated

 539 | +        count_bucket.increment(now);
 540 | +    }
 541 | +
 542 | +    bool decrement(double size)
 543 | +    {
 544 | +        bool x = size_bucket.decrement(size, /*floor=*/-50e3);

polespinasa commented at 3:00 PM on April 27, 2026:

in 2a9ab8062bcadb5e8128572671bba762101efaa6 why negative floor?

ajtowns commented at 6:40 AM on April 28, 2026:

Because the comparison is m_value > floor not m_value > -floor

polespinasa commented at 11:13 AM on April 28, 2026:

Sorry, maybe the question was not clear. I mean why negative? Why can it go below 0 in the first place?

ajtowns commented at 3:23 PM on April 28, 2026:

So that if/when outgoing txs are bandwidth constrained (ie your bucket is often empty) you send ~50kB of tx data in each burst, rather than just the single highest priority tx in the backlog (since you'll immediately process the backlog when your size bucket goes above zero, even by just one byte).
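This small, hypothetical example illustrates that effect. It runs a single drain pass that starts with the size bucket barely above zero (1 byte), over a backlog of 1 kB transactions. Draining only begins when the balance is non-negative. Each transaction is charged in full, and the pass stops as soon as the balance is no longer above the floor, so the check only reports the overdraft and does not refuse it. `TxsPerDrain` and its parameters are illustrative, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical single drain pass over `backlog` equally sized transactions.
// Draining starts only if `balance` is non-negative; each transaction is
// charged in full, and the pass stops once the balance is no longer above
// `floor` (the check reports the overdraft, it does not prevent it).
std::size_t TxsPerDrain(double balance, double floor, double tx_bytes, std::size_t backlog)
{
    assert(tx_bytes > 0);
    if (balance < 0) return 0;
    std::size_t sent = 0;
    while (sent < backlog) {
        balance -= tx_bytes;
        ++sent;
        if (!(balance > floor)) break;
    }
    return sent;
}
```

With a floor of 0, the pass sends one transaction. With a floor of -50e3, it sends 51 (about 51 kB) before stopping.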

polespinasa commented at 3:12 PM on April 27, 2026: member

code reviewed 54bbc5649c65395eadbff7f237359ad33a6862e5

Probably this needs a release note for -txsendrate and the new info in getnetworkinfo.

Left a small comment and a question

fanquake added the label Needs release note on Apr 27, 2026

ajtowns force-pushed on Apr 28, 2026

ajtowns commented at 6:57 AM on April 28, 2026: contributor

Added a release note. Are we meant to remove the "Needs release note" label when there's a release note included in the PR, or is it more like an "I need to breathe, and I am breathing" arrangement where the latter doesn't negate the former?

polespinasa commented at 7:14 AM on April 28, 2026: member

Added a release note. Are we meant to remove the "Needs release note" label when there's a release note included in the PR, or is it more like an "I need to breathe, and I am breathing" arrangement where the latter doesn't negate the former?

I think there's no policy on that; I've seen both cases in the past: #31278 got it and then removed it, while #32138 got it and never removed the label even though the note was there.

IMHO it's good to keep it as a reminder in case the release note is dropped by mistake at some point, so reviewers can realize that something is missing.

maflcko removed the label Needs release note on Apr 28, 2026

polespinasa commented at 11:18 AM on April 28, 2026: member

How can I help test this? @0xB10C, are you using a patch to measure it, or are you just catching inv_to_send by enabling debug logging with the net flag?

0xB10C commented at 3:59 PM on April 29, 2026: contributor

@0xB10C, are you using a patch to measure it, or are you just catching inv_to_send by enabling debug logging with the net flag?

No, I've been running this PR. The measurements described on https://bnoc.xyz/t/increased-b-msghand-thread-utilization-due-to-runestone-transactions-on-2026-02-17/81/11 were done collecting data from a few different interfaces with peer-observer:

inv-to-send set sizes across multiple nodes via the getpeerinfo RPC
time spent in b-msghand thread via a prometheus process-exporter
the localhost ping-pong duration with a custom p2p client that measures the time it takes for the node to respond. This measures message backlog
size of the INVs the node sends us, also measured with a custom P2P client on localhost that listens for INVs from the node.

Not sure if this helps much.

instagibbs commented at 8:11 AM on May 5, 2026: member

concept ACK, will review

in src/node/transaction.cpp:133 in f940743fac

 129 | @@ -130,7 +130,7 @@ TransactionError BroadcastTransaction(NodeContext& node,
 130 |      case TxBroadcast::MEMPOOL_NO_BROADCAST:
 131 |          break;
 132 |      case TxBroadcast::MEMPOOL_AND_BROADCAST_TO_ALL:
 133 | -        node.peerman->InitiateTxBroadcastToAll(txid, wtxid);
 134 | +        node.peerman->InitiateTxBroadcastToAll(mempool_tx ? mempool_tx : tx);

instagibbs commented at 2:26 PM on May 12, 2026:

f940743fac03f27d8cf3c9f2d8a0dd5ba36209bd

Why not just tx unconditionally?

polespinasa commented at 7:29 PM on May 14, 2026:

I think it's because we might have a tx in the mempool with the same txid but a different witness. Then we would be announcing a tx that we don't have in our mempool, because it would conflict with our version of the tx.

instagibbs commented at 8:26 PM on May 14, 2026:

this is extremely non-obvious and should be documented if so

polespinasa commented at 8:34 PM on May 14, 2026:

It is :)

See my other comment: https://github.com/bitcoin/bitcoin/pull/34628/changes/BASE..f940743fac03f27d8cf3c9f2d8a0dd5ba36209bd#r3137421379

Maybe the comment could be moved, removing the empty if?

instagibbs commented at 8:37 PM on May 14, 2026:

resolving, didn't notice the unmoved/unchanged comment in the diff

edit: GitHub doesn't want to let me

in src/net_processing.cpp:173 in b47c81af3b

 168 | @@ -169,13 +169,9 @@ static constexpr auto INBOUND_INVENTORY_BROADCAST_INTERVAL{5s};
 169 |  static constexpr auto OUTBOUND_INVENTORY_BROADCAST_INTERVAL{2s};
 170 |  /** Maximum rate of inventory items to send per second.
 171 |   *  Limits the impact of low-fee transaction floods. */
 172 | -static constexpr unsigned int INVENTORY_BROADCAST_PER_SECOND{14};
 173 | +// static constexpr unsigned int INVENTORY_BROADCAST_PER_SECOND{14};
 174 |  /** Target number of tx inventory items to send per transmission. */

instagibbs commented at 2:36 PM on May 12, 2026:

b47c81af3b3e7ae59bc09a3f621ecdf8f3dc62da

unrelated to PR: appears to also be the target for blocks, not just tx?

ajtowns commented at 8:24 PM on May 14, 2026:

No: we do reserve that much space (INVENTORY_BROADCAST_TARGET) for the vector before announcing blocks by inv, but we'll just spam all the blocks we have queued (splitting into new messages as needed).
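As an illustration of "splitting into new messages as needed", here is a hypothetical sketch that chunks a queue of inventory entries into batches no larger than a per-message limit. The limit is a parameter here; in Bitcoin Core the relevant constant is `MAX_INV_SZ`. `ChunkInv` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: split `items` into consecutive batches of at most
// `max_per_msg` entries, preserving order; each batch would become one
// `inv` message.
template <typename T>
std::vector<std::vector<T>> ChunkInv(const std::vector<T>& items, std::size_t max_per_msg)
{
    assert(max_per_msg > 0);
    std::vector<std::vector<T>> msgs;
    for (std::size_t i = 0; i < items.size(); i += max_per_msg) {
        const std::size_t end = std::min(items.size(), i + max_per_msg);
        msgs.emplace_back(items.begin() + static_cast<std::ptrdiff_t>(i),
                          items.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return msgs;
}
```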

in src/txmempool.cpp:556 in 3a72379141

 552 | @@ -553,6 +553,31 @@ void CTxMemPool::check(const CCoinsViewCache& active_coins_tip, int64_t spendhei
 553 |      assert(innerUsage == cachedInnerUsage);
 554 |  }
 555 |  
 556 | +std::vector<CTxMemPool::txiter> CTxMemPool::SortMiningScoreWithTopology(std::span<const Wtxid> wtxids, size_t n) const

instagibbs commented at 2:43 PM on May 12, 2026:

3a7237914178c9aa8018f86ae5e446f263e0c843

n is overly terse imo, and the help isn't clear to me either. n_best?

in src/net_processing.cpp:304 in 1ace17fc1b outdated

 300 | @@ -301,7 +301,7 @@ struct Peer {
 301 |           *  we retrieve the txid from the corresponding mempool transaction when
 302 |           *  constructing the `inv` message. We use the mempool to sort transactions
 303 |           *  in dependency order before relay, so this does not have to be sorted. */
 304 | -        std::set<Wtxid> m_tx_inventory_to_send GUARDED_BY(m_tx_inventory_mutex);
 305 | +        std::vector<Wtxid> m_tx_inventory_to_send GUARDED_BY(m_tx_inventory_mutex);

instagibbs commented at 2:46 PM on May 12, 2026:

1ace17fc1bbf265bcb1100e8b025f279223c9da7

Still being called a set in the help

in src/net_processing.cpp:6049 in 1ace17fc1b outdated

6068 | +                    }();
6069 | +                    tx_relay->m_tx_inventory_to_send.clear();
6070 | +
6071 | +                    LOCK(tx_relay->m_bloom_filter_mutex);
6072 | +                    vInv.reserve(std::min<size_t>(MAX_INV_SZ, vInv.size() + inv_tx.size()));
6073 | +                    while (!inv_tx.empty()) {

instagibbs commented at 3:01 PM on May 12, 2026:

1ace17fc1bbf265bcb1100e8b025f279223c9da7

Feel like just reverse ranging it or similar, then not popping anything, is faster?

  for (auto it = inv_tx.rbegin(); it != inv_tx.rend(); ++it) {
      const auto& tx = *it;
      ...
  }

ajtowns commented at 8:23 PM on May 14, 2026:

"faster"? Iterating over inv_tx a second time when destructing, to decrement the CTxRefs, would probably be slower I would have thought, but it seems likely to be basically unmeasurable either way?

instagibbs commented at 8:25 PM on May 14, 2026:

ok, "less indirect"?

ajtowns commented at 6:30 AM on May 30, 2026:

I prefer the while (!empty) { x = back(); pop(); .... } approach here, so leaving as-is.
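For reference, the two loop shapes under discussion visit elements in the same order, last element first. The pop-based form releases each reference as it goes. The reverse-iteration form keeps every reference alive until the vector is cleared or destroyed. This self-contained sketch uses `std::shared_ptr<const std::string>` as a stand-in for `CTransactionRef`, and the function names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using TxRef = std::shared_ptr<const std::string>; // stand-in for CTransactionRef

// Pop-based: each reference is released as soon as it has been handled.
std::vector<std::string> DrainByPop(std::vector<TxRef>& inv)
{
    std::vector<std::string> order;
    while (!inv.empty()) {
        TxRef tx = std::move(inv.back());
        inv.pop_back();
        order.push_back(*tx);
    }
    return order;
}

// Reverse iteration: same visiting order, but `inv` keeps every reference
// alive until the caller clears or destroys it.
std::vector<std::string> DrainByReverseIter(const std::vector<TxRef>& inv)
{
    std::vector<std::string> order;
    for (auto it = inv.rbegin(); it != inv.rend(); ++it) order.push_back(**it);
    return order;
}
```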

instagibbs commented at 6:08 PM on May 14, 2026: member

Some initial comments while I still work through the approach.

To be honest I'm finding it a little difficult to follow the lifetime of invs.

In this branch https://github.com/instagibbs/bitcoin/commit/3f87d24eea279fb6b68f5f9af9579cd8b8909db3 , I considered forcing all announcements through the backlog, and then draining this every tick if:

avail() is large enough to drain entire backlog (replacement for immediate path)
same as before in this PR, for when avail() batch gets "big enough" to cost a partial sort

I also am finding it difficult to understand the negative budgeting. InvToSendBucket::decrement return value is never checked and in my branch is deleted anyways. Does /*floor=*/-50e3 even do anything in the PR?

This change would mean in the immediate path we wouldn't be checking m_tx_inventory_known_filter and deduped later, fwiw.

Probably other issues with divergence in your attempt, but I can't make heads or tails right now.

ajtowns commented at 8:38 PM on May 14, 2026: contributor

I also am finding it difficult to understand the negative budgeting. InvToSendBucket::decrement return value is never checked and in my branch is deleted anyways. Does /*floor=*/-50e3 even do anything in the PR?

BumpInvVecForProcessing calls if (!inv_bucket.size_bucket.decrement(itervec[i]->GetTx().ComputeTotalSize())) which is where the -50e3 param should be having an effect (allowing a larger batch of txs when the size limit is in effect), but is missing.

naiyoma commented at 7:23 PM on May 29, 2026: contributor

Concept ACK.

I've attempted to test these changes using a Warnet scenario:

Setup:

4-node network,
38 additional silent listener peers attached to tank-0001 (they accept INVs but never reply; I was trying to create a worst case for the per-peer sort path)
~10,000 transactions injected into tank-0000, then relayed to tank-0001
mempool size and test conditions consistent across both runs

I then compared the behavior before(on master) and after the changes in this PR.

The graphs below show per-peer inv_to_send queue sizes and ping times for each run, covering the period from initial connection. The first graph shows the queue size for each peer. The second graph shows the ping time for the nodes (excluding silent peers).

On master, per-peer queues climbed to ~6,300 entries and honest-peer max pingtime crossed 200ms.

With this PR, per-peer queues stayed under ~440 and honest-peer max pingtime stayed under ~50ms.

I also measured b-msghand thread CPU on tank-0001 (same scenario, same network size, sampled per second from /proc/<pid>/task/<tid>/stat):

On master, the thread crossed 50% CPU 13 times, with peaks near 90%.

With this PR, the same thread crossed 50% only twice.
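For anyone reproducing this measurement, here is a minimal sketch of deriving per-thread CPU from `/proc/<pid>/task/<tid>/stat`. Per proc(5), fields 14 and 15 (`utime`, `stime`) are cumulative clock ticks. The CPU share over an interval is therefore the tick delta divided by the clock tick rate (`sysconf(_SC_CLK_TCK)`) and by the interval length. The helper names are hypothetical, and this is not necessarily how the numbers above were collected. Parsing starts after the last `)` because the thread name field may contain spaces.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Parse utime + stime (fields 14 and 15 of a proc(5) stat line).
// Returns -1 on malformed input.
std::int64_t CpuTicksFromStat(const std::string& line)
{
    const auto close = line.rfind(')');
    if (close == std::string::npos) return -1;
    std::istringstream in(line.substr(close + 1));
    std::string field;
    // After ")" come field 3 (state) through field 13 (cmajflt): skip 11.
    for (int i = 3; i <= 13; ++i) {
        if (!(in >> field)) return -1;
    }
    std::int64_t utime = 0, stime = 0;
    if (!(in >> utime >> stime)) return -1;
    return utime + stime;
}

// CPU percentage over an interval from two tick samples; `clk_tck` would be
// obtained with sysconf(_SC_CLK_TCK) on Linux.
double CpuPercent(std::int64_t ticks_before, std::int64_t ticks_after, double clk_tck, double interval_s)
{
    return 100.0 * static_cast<double>(ticks_after - ticks_before) / clk_tck / interval_s;
}
```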

ajtowns force-pushed on May 30, 2026

ajtowns commented at 6:34 AM on May 30, 2026: contributor

Reworked a bunch; now everything goes through the backlog which means there's just one code path, and simplifies a few things. It still immediately attempts to drain the backlog when receiving a new tx, so there's not much behaviour change (not doing this seems to require messing about with functional test assumptions). The size bucket decrement() floor now takes effect, and the count bucket now also uses a decrement() floor for the "try to do 70 transactions at once" logic.
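Here is a rough, hypothetical sketch of the single-path structure described here. Every new announcement is appended to one global backlog. Each drain attempt either takes the whole backlog as-is (when the budget covers it) or selects and sorts only the best entries. `GlobalBacklog`, `Add` and `Drain` are illustrative names. `int` priorities (larger = better) stand in for mempool ordering, and a plain count budget stands in for the PR's two token buckets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical global backlog shared by all peers.
class GlobalBacklog {
public:
    void Add(int priority) { m_backlog.push_back(priority); }

    // If `budget` covers the whole backlog, take everything as-is.
    // Otherwise select the best `budget` entries, sorted best first,
    // and leave the rest queued for a later drain.
    std::vector<int> Drain(std::size_t budget)
    {
        std::vector<int> out;
        if (budget >= m_backlog.size()) {
            out.swap(m_backlog); // backlog becomes empty
        } else {
            auto mid = m_backlog.begin() + static_cast<std::ptrdiff_t>(budget);
            std::partial_sort(m_backlog.begin(), mid, m_backlog.end(), std::greater<int>{});
            out.assign(m_backlog.begin(), mid);
            m_backlog.erase(m_backlog.begin(), mid); // remainder stays queued
        }
        return out;
    }

    std::size_t size() const { return m_backlog.size(); }

private:
    std::vector<int> m_backlog;
};
```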

in src/net_processing.cpp:6044 in 08528cd9bc

6043 | @@ -6046,13 +6044,9 @@ bool PeerManagerImpl::SendMessages(CNode& node)
6044 |                      // A heap is used so that not all items need sorting if only a few are being sent.

instagibbs commented at 3:09 PM on June 1, 2026:

08528cd9bc1ac60538cc3fa9223c3ae7d9cfc286

This is probably nuked later, but this comment is stale as of this commit

in src/net_processing.cpp:2338 in 01b28af3ed

2349 | +        // save the remaining section as-is (probably mostly unsorted)
2350 | +        for (size_t j = 0; j < i; ++j) {
2351 | +            backlog.push_back(itervec[j]->GetTx().GetWitnessHash());
2352 | +        }
2353 | +        if (backlog.empty()) {
2354 | +            std::vector<Wtxid>{}.swap(backlog); // free memory associated with vec

instagibbs commented at 6:40 PM on June 1, 2026:

01b28af3ed7b794d214a3d1ab751605a2555d05f

I think this is a no-op? We already swap out backlog earlier in the function when we detect that we're going to take all the entries, which leaves an empty backlog, so this would just be a second no-op swap?

ajtowns commented at 6:12 AM on June 2, 2026:

Good catch
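For readers less familiar with the idiom in this snippet: clear() destroys a vector's elements but keeps its allocation, while swapping with a default-constructed temporary hands the old buffer to the temporary, which frees it when destroyed. The helper below is a hypothetical sketch, loosely modelled on the MAX_INV_BACKLOG_RESERVE_CAPACITY check that appears later in the thread. Its guard also shows why a second swap on an already-released vector does nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: free the buffer of an empty vector whose capacity
// exceeds max_reserve. Returns true if memory was released.
template <typename T>
bool ReleaseIfOversized(std::vector<T>& v, std::size_t max_reserve)
{
    if (!v.empty() || v.capacity() <= max_reserve) return false;
    // The temporary takes ownership of v's old buffer and frees it at the
    // end of the full expression; v receives the temporary's empty state.
    std::vector<T>{}.swap(v);
    assert(v.empty());
    return true;
}
```

Calling it a second time on the same vector returns false, since the capacity is already back to that of a default-constructed vector.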

instagibbs commented at 7:14 PM on June 1, 2026: member

Significantly clearer to me now, thanks.

I have another cleanup suggestion which makes it a bit easier for me to track what's happening: https://github.com/instagibbs/bitcoin/commit/5b26aa31ed42e2aba576f6837917c8d690e115bb

Like usual I may have missed a key detail.

ajtowns commented at 1:26 AM on June 2, 2026: contributor

I have another cleanup suggestion which makes it a bit easier for me to track what's happening: instagibbs@5b26aa3

I believe deduping first loses track of the distinction between the sorted elements and the unsorted ones, so if you sort n+1 elements, then remove 5 duplicates, then relay the top n elements, you'll relay 4 elements with random fee rates.

ajtowns force-pushed on Jun 2, 2026

ajtowns commented at 9:24 AM on June 2, 2026: contributor

Addressed feedback; txmempool func renamed to ExtractBestByMiningScoreWithTopology which now returns a fully sorted vector of txiters from best to worst, and updates the input vector with any remaining wtxids, minimising the work callers need to do.

ajtowns force-pushed on Jun 2, 2026

DrahtBot added the label CI failed on Jun 2, 2026

DrahtBot removed the label CI failed on Jun 2, 2026

in src/txmempool.cpp:556 in a983ac1163 outdated

 552 | @@ -553,6 +553,61 @@ void CTxMemPool::check(const CCoinsViewCache& active_coins_tip, int64_t spendhei
 553 |      assert(innerUsage == cachedInnerUsage);
 554 |  }
 555 |  
 556 | +std::vector<CTxMemPool::txiter> CTxMemPool::ExtractBestByMiningScoreWithTopology(std::vector<Wtxid>& wtxids, size_t n_to_sort) const

instagibbs commented at 12:56 PM on June 2, 2026:

a983ac1163e6638c4f9e0271982dbe9f8c864525

Take or leave suggestion: Could we return the new wtxids vector instead of editing in place? Could make things even clearer and still avoid performance hit.

ajtowns commented at 6:15 PM on June 5, 2026:

I feel like the "Extract" naming is pretty clear, and returning pairs seems ugly to me, so going to leave for now at least.

in src/txmempool.cpp:582 in a983ac1163 outdated

 577 | +            bool extra = false;
 578 | +            if (n_to_sort + 1 >= res.size()) {
 579 | +                // use regular sort when taking everything
 580 | +                std::sort(begin, end, cmp);
 581 | +            } else {
 582 | +                // when doing the partial sort we include an element

instagibbs commented at 2:49 PM on June 2, 2026:

a983ac1163e6638c4f9e0271982dbe9f8c864525

Offline you said the invariant we're trying to achieve is "you're always eliminating N elements from the backlog and you're only sending elements that don't have a repeat in the backlog to peer queues".

One curious side-effect of this behavior is if we somehow have lots of high feerate, repeat duplicates (e.g., [A, A, A, A, ..., A]), we may emit nothing because of this cleanup/deferring behavior. Granted, this would be very odd / hard to achieve in practice, and we would quickly be clearing out the backlog every mempool inclusion and INVENTORY_BUCKET_CHECK_DELAY. (IIUC, the token bucket would not be draining for backlog clearing, only for things returned by ExtractBestByMiningScoreWithTopology)

The only sane alternative I think would be to drop this boundary checking entirely, relying on bloom filters to handle these repeats, with some loss in the bucket accounting exactness. I'd rather not special case this corner case either way.

As-is just including your explanation as the goal in the comment would help a ton.

ajtowns commented at 7:49 PM on June 5, 2026:

Added some comments.

I believe you can get arbitrarily many duplicates on regtest by setting mocktime to pause processing, and then using sendrawtransaction to resubmit the same package to the mempool multiple times. It'll only enter the mempool once, but will be re-added to the inv backlog each time.
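The invariant quoted above (each call removes exactly n entries from the backlog, and only entries with no remaining copy in the backlog are emitted) can be modelled with plain integers as a sanity check on the duplicate behaviour discussed here. This is a simplified, hypothetical sketch of the vector-based version: ExtractBestSketch is not ExtractBestByMiningScoreWithTopology, it ignores the extra boundary element the real partial sort includes, and a larger int stands in for a better mining score.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model: remove the best min(n, size) entries from `backlog`
// (duplicates allowed) and return, best first, each distinct id among them
// that has no remaining copy in the backlog. n == 0 is a no-op.
std::vector<int> ExtractBestSketch(std::vector<int>& backlog, std::size_t n)
{
    std::vector<int> out;
    if (n == 0) return out;
    const std::size_t before = backlog.size();
    const std::size_t take = std::min(n, before);
    // Bring the best `take` entries to the front, in best-to-worst order.
    std::partial_sort(backlog.begin(), backlog.begin() + take, backlog.end(), std::greater<>{});
    const std::vector<int> taken(backlog.begin(), backlog.begin() + take);
    backlog.erase(backlog.begin(), backlog.begin() + take);
    assert(backlog.size() == before - take); // always eliminates `take` entries
    for (int id : taken) {
        const bool already_out = std::find(out.begin(), out.end(), id) != out.end();
        const bool still_queued = std::find(backlog.begin(), backlog.end(), id) != backlog.end();
        // A copy still in the backlog will be emitted by a later call instead.
        if (!already_out && !still_queued) out.push_back(id);
    }
    return out;
}
```

With five copies of one id and n = 2, the first two calls emit nothing and the third emits the id once, which matches case 5 of the mempool unit tests suggested further down.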

in src/txmempool.cpp:572 in a983ac1163

 567 | +                res.push_back(i.value());
 568 | +            }
 569 | +        }
 570 | +        if (res.empty()) {
 571 | +            // nothing remaining in mempool
 572 | +            wtxids.clear();

instagibbs commented at 2:56 PM on June 2, 2026:

a983ac1163e6638c4f9e0271982dbe9f8c864525

Unconditionally wiping wtxids above this seems to be fine and flows better to me.

in src/net_processing.cpp:6033 in 1f243296a1

6052 | +                        auto itervec = m_mempool.ExtractBestByMiningScoreWithTopology(vec, vec.size());
6053 | +                        std::vector<CTransactionRef> res;
6054 | +                        res.reserve(itervec.size());
6055 | +                        for (auto txiter : itervec) {
6056 | +                            if (txiter->GetFee() < filterrate.GetFee(txiter->GetTxSize())) {
6057 | +                                continue;

instagibbs commented at 3:00 PM on June 2, 2026:

1f243296a1ea6b285ddcb079a5d5414addcb7473

Reminding myself and other reviewers: we continue rather than break here because this is the individual feerate, not the chunk feerate.

ajtowns commented at 7:50 PM on June 5, 2026:

Added a comment here.
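A small worked case shows why break would be wrong. A parent paying 0.1 sat/vB (100 sat, 1000 vB) with a child paying 25 sat/vB (5000 sat, 200 vB) forms a chunk at 5100/1200 = 4.25 sat/vB, which sorts ahead of an unrelated 2 sat/vB transaction. Against a 1 sat/vB filter, break would stop at the parent and drop both the child and the unrelated transaction. The sketch below is hypothetical (TxSketch and FilterByIndividualFeeRate are illustrative names); it cross-multiplies integers instead of calling CFeeRate::GetFee, so it sidesteps that function's rounding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a mempool entry, listed in chunk (mining score) order.
struct TxSketch {
    int id;
    std::int64_t fee_sat;
    std::int64_t vsize;
};

// Keep the txs whose *individual* feerate meets the peer's filter (sat/kvB).
std::vector<int> FilterByIndividualFeeRate(const std::vector<TxSketch>& chunk_order,
                                           std::int64_t filter_sat_per_kvb)
{
    std::vector<int> out;
    for (const auto& tx : chunk_order) {
        assert(tx.vsize > 0);
        // Individual feerates are not monotonic in chunk order, so skip this
        // tx but keep scanning (continue, not break).
        if (tx.fee_sat * 1000 < filter_sat_per_kvb * tx.vsize) continue;
        out.push_back(tx.id);
    }
    return out;
}
```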

in src/txmempool.cpp:461 in be6d3190e6 outdated

 457 | @@ -458,7 +458,7 @@ void CTxMemPool::check(const CCoinsViewCache& active_coins_tip, int64_t spendhei
 458 |      assert(diagram.size() <= score_with_topo.size() + 1);
 459 |      assert(diagram.size() >= 1);
 460 |  
 461 | -    std::optional<Wtxid> last_wtxid = std::nullopt;
 462 | +    std::optional<txiter> last_iter = std::nullopt;

instagibbs commented at 3:06 PM on June 2, 2026:

be6d3190e631ebac2e7bd70750ea6f6246e0d6a6

Old function name SortMiningScoreWithTopology is still used in the commit message.

in src/net_processing.cpp:520 in e7777f1c8b

 515 | +     * Size bucket: Fills at 12MB every 600s, times mult so expected to be 6 times
 516 | +     *   the rate at which blocks can confirm transactions, but at least 3 times that in
 517 | +     *   the worst case. High limit to avoid triggering even with large spikes, but a
 518 | +     *   modest initial value to ensure that frequent node restarts don't raise the limit
 519 | +     *   too much.
 520 | +     * Count floor: In order to avoid resorting the global backlog too often, we ensure

instagibbs commented at 5:06 PM on June 2, 2026:

e7777f1c8bd929e4a6f6a90f61534cdb79274476

s/resorting/re-sorting/

in src/net_processing.cpp:2304 in e7777f1c8b outdated

2300 | @@ -2239,27 +2301,101 @@ void PeerManagerImpl::SendPings()
2301 |      for(auto& it : m_peer_map) it.second->m_ping_queued = true;
2302 |  }
2303 |  
2304 | -void PeerManagerImpl::InitiateTxBroadcastToAll(const Txid& txid, const Wtxid& wtxid)
2305 | +std::vector<Wtxid> InvToSendBucket::TakeForProcessing(CTxMemPool& mempool)

instagibbs commented at 5:12 PM on June 2, 2026:

e7777f1c8bd929e4a6f6a90f61534cdb79274476

I think this is logically equivalent and easier to read, since avail() is a superset of the condition for calling ExtractBestByMiningScoreWithTopology, and we call it beforehand everywhere. I left the initial call in place to avoid taking the mempool lock, but folding it in is also valid.

diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 05ff4b70a6..5e719c4229 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -2291,86 +2291,90 @@ void PeerManagerImpl::BlockChecked(const std::shared_ptr<const CBlock>& block, c
     }
     if (it != mapBlockSource.end())
         mapBlockSource.erase(it);
 }
 
 //////////////////////////////////////////////////////////////////////////////
 //
 // Messages
 //
 
 bool PeerManagerImpl::AlreadyHaveBlock(const uint256& block_hash)
 {
     return m_chainman.m_blockman.LookupBlockIndex(block_hash) != nullptr;
 }
 
 void PeerManagerImpl::SendPings()
 {
     LOCK(m_peer_mutex);
     for(auto& it : m_peer_map) it.second->m_ping_queued = true;
 }
 
 std::vector<Wtxid> InvToSendBucket::TakeForProcessing(CTxMemPool& mempool)
 {
     AssertLockHeld(mempool.cs);
 
+    // We only kick off if we have count and size available and things waiting
+    Assume(avail());
+
     size_t n_to_take = static_cast<size_t>(std::max<double>(count_bucket.value() - count_floor, 0));
 
     std::vector<Wtxid> best;
 
-    if (n_to_take > 0 && !backlog.empty()) {
-        auto itervec = mempool.ExtractBestByMiningScoreWithTopology(backlog, n_to_take);
-        bool tokens_left = true;
-        for (auto txiter : itervec) {
-            auto& wtxid = txiter->GetTx().GetWitnessHash();
-            if (tokens_left) {
-                best.push_back(wtxid);
-                if (!decrement(txiter->GetTx().ComputeTotalSize())) {
-                    tokens_left = false;
-                }
-            } else {
-                backlog.push_back(wtxid);
+    auto itervec = mempool.ExtractBestByMiningScoreWithTopology(backlog, n_to_take);
+    bool tokens_left = true;
+    for (auto txiter : itervec) {
+        auto& wtxid = txiter->GetTx().GetWitnessHash();
+        if (tokens_left) {
+            best.push_back(wtxid);
+            // May go in "debt" through floor esp based on size
+            if (!decrement(txiter->GetTx().ComputeTotalSize())) {
+                tokens_left = false;
             }
+        } else {
+            backlog.push_back(wtxid);
         }
-        if (backlog.empty() && backlog.capacity() > MAX_INV_BACKLOG_RESERVE_CAPACITY) {
-            /* if backlog grew very large, free it */
-            std::vector<Wtxid>{}.swap(backlog);
-        }
+    }
+    if (backlog.empty() && backlog.capacity() > MAX_INV_BACKLOG_RESERVE_CAPACITY) {
+        /* if backlog grew very large, free it */
+        std::vector<Wtxid>{}.swap(backlog);
     }
     return best;
 }
 
 void PeerManagerImpl::ProcessInvBacklog(NodeClock::time_point now, bool backlog_bumped)
 {
     // Don't run the body of this function unless it's been a little
     // while since the last run, or we just added a new tx to the backlog.
     if (!backlog_bumped && now <= m_next_inv_bucket_check.load()) return;
     m_next_inv_bucket_check = now + INVENTORY_BUCKET_CHECK_DELAY;
 
     LOCK(m_inv_to_send_mutex);
     m_inbound_inv_bucket.increment(now);
     m_outbound_inv_bucket.increment(now);
+ 
+   // Early check to avoid taking mempool lock
     bool in_avail = m_inbound_inv_bucket.avail();
     bool out_avail = m_outbound_inv_bucket.avail();
     if (!in_avail && !out_avail) return;
 
     std::vector<Wtxid> for_inbound;
     std::vector<Wtxid> for_outbound;
 
     {
         LOCK(m_mempool.cs);
         if (in_avail) for_inbound = m_inbound_inv_bucket.TakeForProcessing(m_mempool);
         if (out_avail) for_outbound = m_outbound_inv_bucket.TakeForProcessing(m_mempool);
     }
 
     if (!for_inbound.empty() || !for_outbound.empty()) {
         bool any_inbound_connected = false;
         bool any_outbound_connected = false;
         LOCK(m_peer_mutex);
         for (auto& it : m_peer_map) {
             Peer& peer = *it.second;
             auto tx_relay = peer.GetTxRelay();
             if (!tx_relay) continue;
 
             LOCK(tx_relay->m_tx_inventory_mutex);
             // Only queue transactions for announcement once the version handshake
             // is completed. The time of arrival for these transactions is

ajtowns commented at 7:51 PM on June 5, 2026:

Didn't include the Assume: I believe the function should behave correctly when called with insufficient tokens or an empty backlog.
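The loop in the diff above sends every transaction up to and including the one that exhausts a bucket, then pushes the rest back onto the backlog. A hypothetical, lock-free model of that flow follows. QueuedTx, DualBucketSketch, NToTake and TakeForProcessingSketch are illustrative names, and treating the size bucket's floor as zero is an assumption. With no spare count tokens, NToTake is 0, so nothing is selected, which is consistent with the function being safe to call without the Assume.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical queued entry: an id plus its serialized size in bytes.
struct QueuedTx {
    int id;
    std::size_t size;
};

// Hypothetical pair of buckets: one token per tx, one token per byte.
struct DualBucketSketch {
    double count_tok;
    double size_tok;
    double count_floor; // count tokens held back so the backlog isn't re-sorted too often

    // Always charges both buckets; returns false once either is exhausted
    // (count at/below its floor, or size at/below an assumed floor of 0).
    bool decrement(std::size_t tx_size)
    {
        count_tok -= 1;
        size_tok -= static_cast<double>(tx_size);
        return count_tok > count_floor && size_tok > 0;
    }
};

// Mirrors max(count_bucket.value() - count_floor, 0) from the diff.
std::size_t NToTake(const DualBucketSketch& b)
{
    return static_cast<std::size_t>(std::max(b.count_tok - b.count_floor, 0.0));
}

// `best_first` plays the role of the sorted result of the mempool extraction.
std::vector<int> TakeForProcessingSketch(DualBucketSketch& b, const std::vector<QueuedTx>& best_first,
                                         std::vector<QueuedTx>& backlog)
{
    assert(best_first.size() <= NToTake(b));
    std::vector<int> to_send;
    bool tokens_left{true};
    for (const auto& tx : best_first) {
        if (tokens_left) {
            to_send.push_back(tx.id);
            // The tx that exhausts a bucket is still sent, leaving it in "debt".
            if (!b.decrement(tx.size)) tokens_left = false;
        } else {
            backlog.push_back(tx); // wait for the buckets to refill
        }
    }
    return to_send;
}
```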

instagibbs commented at 5:16 PM on June 2, 2026: member

Getting easier and easier to understand with each pass, thanks for taking all this feedback.

Some tasteful debug logging would probably be good too, to at least see when we are building a backlog maybe? I'll also probably start running this soon.

ajtowns force-pushed on Jun 5, 2026

ajtowns commented at 8:17 PM on June 5, 2026: contributor

Addressed feedback, added some "tasteful" logging.

in src/util/tokenbucket.h:53 in 69c92aa592

  48 | +            }
  49 | +        }
  50 | +        m_last_updated = now;
  51 | +    }
  52 | +
  53 | +    /** Consume n tokens. Returns false if the balance dropped to the given floor. */

instagibbs commented at 4:14 PM on June 8, 2026:

69c92aa5927c89071543acb9fbfdb2df33f4a928

Should explicitly document that the floor can be breached
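To make the "floor can be breached" behaviour concrete, here is a self-contained token bucket in the spirit of util::TokenBucket. It is a hypothetical sketch: TokenBucketSketch is not the PR's class, and the exact return semantics of the real decrement() are assumed. Here decrement() always subtracts the full amount and only reports whether the balance is still above the floor, so the balance can go into debt. It also keeps the refill baseline in a std::optional rather than a sentinel time_point, which sidesteps the MIN_TIME issue raised in the next comment.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical token bucket sketch; not the PR's util::TokenBucket.
class TokenBucketSketch
{
public:
    using Clock = std::chrono::steady_clock;

    TokenBucketSketch(double rate_per_s, double value, double cap)
        : m_rate{rate_per_s}, m_value{value}, m_cap{cap}
    {
        assert(rate_per_s >= 0 && cap >= 0);
    }

    /** Refill for the time elapsed since the previous call. The first call only
     *  establishes the baseline, and time spent at the cap is not banked. */
    void increment(Clock::time_point now)
    {
        if (m_last_updated && now <= *m_last_updated) return; // clock did not advance
        if (m_last_updated && m_value < m_cap) {
            const std::chrono::duration<double> elapsed{now - *m_last_updated};
            m_value = std::min(m_cap, m_value + m_rate * elapsed.count());
        }
        m_last_updated = now;
    }

    /** Consume n tokens unconditionally. The balance may end up below `floor`
     *  (even negative); the return value only reports whether it is still
     *  strictly above `floor` afterwards. */
    bool decrement(double n, double floor = 0.0)
    {
        m_value -= n;
        return m_value > floor;
    }

    double value() const { return m_value; }

private:
    double m_rate;
    double m_value;
    double m_cap;
    std::optional<Clock::time_point> m_last_updated; // nullopt: no baseline yet
};
```

The assertions used to check it mirror the debt and refill cases in the token bucket unit tests proposed below.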

in src/util/tokenbucket.h:64 in 69c92aa592

  59 | +
  60 | +    /** Current token balance. */
  61 | +    double value() const { return m_value; }
  62 | +
  63 | +private:
  64 | +    static constexpr time_point MIN_TIME{std::numeric_limits<duration>::min()};

instagibbs commented at 4:15 PM on June 8, 2026:

69c92aa5927c89071543acb9fbfdb2df33f4a928

Bot find: std::numeric_limits has no specialization for std::chrono::duration, so std::numeric_limits<duration>::min() returns a zero duration rather than a negative one. Bot suggestion:

static constexpr time_point MIN_TIME{time_point::min()};
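The pitfall can be checked at compile time. The snippet below is illustrative only (Seconds, TimePoint and LIMITS_MIN are local names, not the PR's types): because std::numeric_limits is not specialized for std::chrono::duration, the primary template's min() returns a value-initialized duration, so a time_point built from it is simply the clock's epoch.

```cpp
#include <chrono>
#include <limits>

using Seconds = std::chrono::seconds;
using TimePoint = std::chrono::time_point<std::chrono::system_clock, Seconds>;

// No specialization exists for durations, so the primary template is used:
// is_specialized is false and min() returns Seconds(), i.e. zero.
static_assert(!std::numeric_limits<Seconds>::is_specialized);
constexpr Seconds LIMITS_MIN{std::numeric_limits<Seconds>::min()};
static_assert(LIMITS_MIN == Seconds::zero());

// A time_point built from it is just the epoch, so a "never updated" sentinel
// based on it cannot be told apart from a genuine update at time 0.
static_assert(TimePoint{LIMITS_MIN} == TimePoint{});

// The duration/time_point members give the most negative representable value.
static_assert(Seconds::min() < Seconds::zero());
static_assert(TimePoint::min() < TimePoint{});
```

This is why an increment() landing exactly on the epoch (for example under a mocked clock starting at 0s) is worth covering in the token bucket unit tests below.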

instagibbs commented at 4:37 PM on June 8, 2026:

69c92aa5927c89071543acb9fbfdb2df33f4a928

Some bucket unit tests to fill out coverage a bit (includes the other suggested fix for 0s)

diff --git a/src/test/util_tests.cpp b/src/test/util_tests.cpp
index add992b764..96221d4dd2 100644
--- a/src/test/util_tests.cpp
+++ b/src/test/util_tests.cpp
@@ -2007,3 +2007,58 @@ BOOST_AUTO_TEST_CASE(token_bucket_drain_and_refill)
 }
 
+BOOST_AUTO_TEST_CASE(token_bucket_first_increment_at_epoch)
+{
+    // The first increment establishes the baseline (no refill) even when it
+    // lands exactly on the clock epoch; later increments then refill normally.
+    util::TokenBucket<NodeClock> b(/*rate=*/100, /*value=*/0, /*cap=*/1000);
+    b.increment(NodeClock::time_point{0s});
+    BOOST_CHECK_EQUAL(b.value(), 0);
+    b.increment(NodeClock::time_point{5s});
+    BOOST_CHECK_EQUAL(b.value(), 500); // 100/s * 5s
+}
+
+BOOST_AUTO_TEST_CASE(token_bucket_at_cap_advances_baseline)
+{
+    util::TokenBucket<NodeClock> b(/*rate=*/10, /*value=*/100, /*cap=*/100);
+    BOOST_CHECK_EQUAL(b.value(), 100); // already at cap
+    b.increment(NodeClock::time_point{1s});   // baseline established at 1s
+    b.increment(NodeClock::time_point{100s}); // 99s spent at the cap; baseline -> 100s
+    BOOST_CHECK_EQUAL(b.value(), 100);
+
+    b.decrement(100); // drain to 0
+    BOOST_CHECK_EQUAL(b.value(), 0);
+
+    // refill doesn't "bank" the extra 99s we were at cap
+    b.increment(NodeClock::time_point{101s});
+    BOOST_CHECK_EQUAL(b.value(), 10);
+
+    // And when real time genuinely elapses, a single increment refills straight
+    // back to the cap immediately.
+    b.increment(NodeClock::time_point{200s}); // 99s elapsed -> +990, clamped to cap
+    BOOST_CHECK_EQUAL(b.value(), 100);
+}
+
+BOOST_AUTO_TEST_CASE(token_bucket_fractional_refill)
+{
+    // Sub-second elapsed time accumulates fractional tokens via double math.
+    util::TokenBucket<NodeClock> b(/*rate=*/10, /*value=*/0, /*cap=*/100);
+    b.increment(NodeClock::time_point{1s});
+    b.increment(NodeClock::time_point{1250ms}); // 10/s * 0.25s = 2.5
+    BOOST_CHECK_EQUAL(b.value(), 2.5);
+}
+
+BOOST_AUTO_TEST_CASE(token_bucket_refill_from_debt)
+{
+    // Refilling from a negative (debt) balance accrues normally and still
+    // clamps to the cap rather than to debt + increment.
+    util::TokenBucket<NodeClock> b(/*rate=*/10, /*value=*/0, /*cap=*/100);
+    BOOST_CHECK(!b.decrement(50)); // -> -50, below floor 0
+    BOOST_CHECK_EQUAL(b.value(), -50);
+    b.increment(NodeClock::time_point{1s});   // baseline
+    b.increment(NodeClock::time_point{4s});   // +30 -> -20
+    BOOST_CHECK_EQUAL(b.value(), -20);
+    b.increment(NodeClock::time_point{100s}); // +960 but clamped to cap
+    BOOST_CHECK_EQUAL(b.value(), 100);
+}
+
 BOOST_AUTO_TEST_SUITE_END()

in src/txmempool.h:348 in ec6c3ce92d

 343 | +     * unspecified order.
 344 | +     *
 345 | +     * Note that the returned `txiter` values may become invalidated once
 346 | +     * mempool.cs is released.
 347 | +     */
 348 | +    std::vector<txiter> ExtractBestByMiningScoreWithTopology(std::vector<Wtxid>& wtxids, size_t n_to_sort) const EXCLUSIVE_LOCKS_REQUIRED(cs);

instagibbs commented at 5:21 PM on June 8, 2026:

ec6c3ce92db81aa74ca509af499f203d278d681b

Some suggested unit tests, seems like a good target for some tests. Probably would be easy to adapt to a fuzz target?

diff --git a/src/test/mempool_tests.cpp b/src/test/mempool_tests.cpp
index 77024c3edc..8069bf3540 100644
--- a/src/test/mempool_tests.cpp
+++ b/src/test/mempool_tests.cpp
@@ -13,4 +13,5 @@
 
 #include <boost/test/unit_test.hpp>
+#include <algorithm>
 #include <vector>
 
@@ -501,3 +502,151 @@ BOOST_AUTO_TEST_CASE(MempoolAncestryTestsDiamond)
 }
 
+// Create static size tx with no ancestors
+static CTransactionRef MakeRelayTx(uint32_t nonce)
+{
+    CMutableTransaction tx;
+    tx.vin.resize(1);
+    tx.vin[0].prevout = COutPoint(Txid::FromUint256(uint256{1}), nonce);
+    tx.vin[0].scriptSig = CScript() << OP_11;
+    tx.vout.resize(1);
+    tx.vout[0].scriptPubKey = CScript() << OP_11 << OP_EQUAL;
+    tx.vout[0].nValue = 10 * COIN;
+    return MakeTransactionRef(tx);
+}
+
+// Add tx to the pool with the given total fee
+static Wtxid AddRelayTx(CTxMemPool& pool, TestMemPoolEntryHelper& entry, uint32_t nonce, CAmount fee)
+{
+    const auto tx_ref{MakeRelayTx(nonce)};
+    TryAddToMempool(pool, entry.Fee(fee).FromTx(tx_ref));
+    return tx_ref->GetWitnessHash();
+}
+
+// Run ExtractBestByMiningScoreWithTopology and translate the returned iterators
+// back into wtxids
+static std::vector<Wtxid> ExtractBest(CTxMemPool& pool, std::vector<Wtxid>& wtxids, size_t n_to_sort)
+{
+    LOCK(pool.cs);
+    std::vector<Wtxid> out;
+    for (const auto it : pool.ExtractBestByMiningScoreWithTopology(wtxids, n_to_sort)) {
+        out.push_back(it->GetTx().GetWitnessHash());
+    }
+    return out;
+}
+
+BOOST_AUTO_TEST_CASE(MempoolExtractBestByMiningScore)
+{
+    CTxMemPool& pool = *Assert(m_node.mempool);
+    TestMemPoolEntryHelper entry;
+
+    // Build 6 independent txs with strictly distinct, descending feerates.
+    std::vector<Wtxid> best;
+    for (uint32_t i = 0; i < 6; ++i) {
+        best.push_back(AddRelayTx(pool, entry, i + 1, (10 - i) * 1000));
+    }
+
+    // 1) Full sort: n_to_sort >= size returns everything, best-to-worst, and
+    //    fully drains the input vector.
+    {
+        std::vector<Wtxid> in{best};
+        std::shuffle(in.begin(), in.end(), m_rng); // order does not matter
+        auto res = ExtractBest(pool, in, in.size());
+        BOOST_CHECK(in.empty());
+        BOOST_CHECK(res == best);
+    }
+
+    // 2) wtxids not in the mempool are silently dropped (and don't appear in
+    //    the result), while the rest still sort correctly.
+    {
+        std::vector<Wtxid> in{best[2], MakeRelayTx(99991)->GetWitnessHash(), best[0],
+                              MakeRelayTx(99992)->GetWitnessHash(), best[4]};
+        std::shuffle(in.begin(), in.end(), m_rng); // order does not matter
+        // *All* out-of-mempool deletions happen if n_to_sort is non-0
+        auto res = ExtractBest(pool, in, 3);
+        BOOST_CHECK(in.empty());
+        const std::vector<Wtxid> expected{best[0], best[2], best[4]};
+        BOOST_CHECK(res == expected);
+    }
+
+    // 3) All out of mempool txs are wiped, regardless of n_to_sort argument,
+    //    as long as non-0
+    {
+        std::vector<Wtxid> in{MakeRelayTx(99991)->GetWitnessHash(),
+                              MakeRelayTx(99992)->GetWitnessHash(),
+                              MakeRelayTx(99993)->GetWitnessHash(),
+                              MakeRelayTx(99994)->GetWitnessHash(),
+                              MakeRelayTx(99995)->GetWitnessHash()};
+        std::shuffle(in.begin(), in.end(), m_rng); // order does not matter
+        auto res = ExtractBest(pool, in, /*n_to_sort=*/1);
+        BOOST_CHECK(in.empty());
+        BOOST_CHECK(res.empty());
+    }
+
+
+    // 4) Duplicate wtxids are deduplicated in a single (full-sort) pass.
+    {
+        std::vector<Wtxid> in{best[3], best[1], best[3], best[1], best[3]};
+        std::shuffle(in.begin(), in.end(), m_rng); // order does not matter
+        auto res = ExtractBest(pool, in, 100);
+        BOOST_CHECK(in.empty());
+        const std::vector<Wtxid> expected{best[1], best[3]};
+        BOOST_CHECK(res == expected);
+    }
+
+    // 5) The docstring example: with 2*n_to_sort + 1 copies of a single tx and
+    //    n_to_sort == 2, the first two passes return nothing (each removing 2
+    //    duplicates), and only the third pass yields the tx exactly once.
+    {
+        std::vector<Wtxid> in(5, best[0]);
+        auto p1 = ExtractBest(pool, in, 2);
+        BOOST_CHECK(p1.empty());
+        BOOST_CHECK_EQUAL(in.size(), 3U);
+        auto p2 = ExtractBest(pool, in, 2);
+        BOOST_CHECK(p2.empty());
+        BOOST_CHECK_EQUAL(in.size(), 1U);
+        auto p3 = ExtractBest(pool, in, 2);
+        BOOST_CHECK(in.empty());
+        const std::vector<Wtxid> expected{best[0]};
+        BOOST_CHECK(p3 == expected);
+    }
+
+    // 6) Draining in small batches (partial-sort path) across many passes,
+    //    with duplicates mixed in, yields every distinct tx exactly once, in
+    //    global best-to-worst order, and never returns something still queued.
+    {
+        std::vector<Wtxid> in;
+        for (const auto& w : best) { in.push_back(w); in.push_back(w); } // each twice
+        std::shuffle(in.begin(), in.end(), m_rng); // order does not matter
+        std::vector<Wtxid> drained;
+        const uint32_t n_to_sort{2};
+        size_t max_passes{in.size()};
+        while (!in.empty()) {
+            BOOST_REQUIRE(max_passes-- > 0);
+            const auto in_size_pre{in.size()};
+            auto res = ExtractBest(pool, in, n_to_sort);
+            if (in_size_pre >= n_to_sort) {
+                BOOST_CHECK_EQUAL(in.size(), in_size_pre - n_to_sort);
+            } else {
+                BOOST_CHECK(in.empty());
+            }
+
+            // Nothing returned this pass may still be sitting in the queue.
+            for (const auto& r : res) {
+                BOOST_CHECK(std::find(in.begin(), in.end(), r) == in.end());
+            }
+            drained.insert(drained.end(), res.begin(), res.end());
+        }
+        BOOST_CHECK(drained == best); // every tx once, global order preserved
+    }
+
+    // 7) n_to_sort == 0 is a no-op: nothing is returned and the input vector is
+    //    left untouched
+    {
+        std::vector<Wtxid> in{best[0], best[1]};
+        auto res = ExtractBest(pool, in, 0);
+        BOOST_CHECK(res.empty());
+        BOOST_CHECK_EQUAL(in.size(), 2U);
+    }
+}
+
 BOOST_AUTO_TEST_SUITE_END()

in test/functional/p2p_tx_relay_rate_limit.py:27 in a389e0cb2e outdated

  22 | +SEND_RATE = 2                 # -txsendrate value
  23 | +BUCKET_CAP = SEND_RATE * 30   # count bucket capacity (60)
  24 | +NUM_TXS = 80                  # total transactions to submit
  25 | +
  26 | +
  27 | +class TxRelayRateLimitTest(BitcoinTestFramework):

instagibbs commented at 5:39 PM on June 8, 2026:

a389e0cb2ed88be61213a9515ecad762ec40eb1c

Some suggested assertions on the inbound and outbound count/size token values reported by getnetworkinfo; take them as you feel led.

diff --git a/test/functional/p2p_tx_relay_rate_limit.py b/test/functional/p2p_tx_relay_rate_limit.py
index c4451f6eeb..3b9bada4a1 100755
--- a/test/functional/p2p_tx_relay_rate_limit.py
+++ b/test/functional/p2p_tx_relay_rate_limit.py
@@ -17,7 +17,14 @@ from test_framework.blocktools import COINBASE_MATURITY
 from test_framework.p2p import P2PTxInvStore
 from test_framework.test_framework import BitcoinTestFramework
-from test_framework.util import assert_equal
+from test_framework.util import (
+    assert_equal,
+    assert_greater_than,
+    assert_greater_than_or_equal,
+)
 from test_framework.wallet import MiniWallet
 
+SIZE_BUCKET_INITIAL = 12_000_000  # initial size bucket value (12MB), see InvToSendBucket
+SIZE_BUCKET_CAP = 50_000_000      # size bucket capacity (50MB)
+
 SEND_RATE = 2                 # -txsendrate value
 BUCKET_CAP = SEND_RATE * 30   # count bucket capacity (60)
@@ -33,4 +40,23 @@ class TxRelayRateLimitTest(BitcoinTestFramework):
         return node.getnetworkinfo()['inv_buckets']['inbound']['backlog']
 
+    def inv_buckets(self, node):
+        return node.getnetworkinfo()['inv_buckets']
+
+    def assert_fresh_buckets(self, node):
+        """Before any relay, both buckets are empty and full of tokens."""
+        info = node.getnetworkinfo()
+        assert_equal(info['tx_send_rate'], SEND_RATE)
+        buckets = info['inv_buckets']
+        assert_equal(set(buckets.keys()), {'inbound', 'outbound'})
+        for direction in ('inbound', 'outbound'):
+            b = buckets[direction]
+            assert_equal(set(b.keys()), {'backlog', 'count_tok', 'size_tok'})
+            assert_equal(b['backlog'], 0)
+            # count bucket starts at (and is capped at) capacity
+            assert_equal(b['count_tok'], BUCKET_CAP)
+            # size bucket starts at 12MB and can only grow without sends, up to 50MB
+            assert_greater_than_or_equal(b['size_tok'], SIZE_BUCKET_INITIAL)
+            assert_greater_than_or_equal(SIZE_BUCKET_CAP, b['size_tok'])
+
     def run_test(self):
         node = self.nodes[0]
@@ -50,6 +76,6 @@ class TxRelayRateLimitTest(BitcoinTestFramework):
         assert_equal(len(peer.get_invs()), 0)
 
-        # Verify the configured send rate
-        assert_equal(node.getnetworkinfo()['tx_send_rate'], SEND_RATE)
+        # Sanity-check the getnetworkinfo bucket fields before any relay.
+        self.assert_fresh_buckets(node)
 
         self.test_rate_limit_and_rbf(node, wallet, peer)
@@ -81,4 +107,11 @@ class TxRelayRateLimitTest(BitcoinTestFramework):
         assert_equal(len(peer.get_invs()), 0)
 
+        # Both count buckets are drained (even without outbounds),
+        # the size buckets are not; backlog reported
+        for direction in ('inbound', 'outbound'):
+            b = self.inv_buckets(node)[direction]
+            assert_equal(b['count_tok'], 0)
+            assert_greater_than(b['size_tok'], SIZE_BUCKET_INITIAL - 1_000_000)
+
         # RBF the backlogged original while time is still frozen, so the
         # replacement also queues in the backlog (the bucket is exhausted). The
@@ -108,4 +141,15 @@ class TxRelayRateLimitTest(BitcoinTestFramework):
         assert int(tx_rbf_repl['wtxid'], 16) in announced
 
+        # With the backlog cleared, advancing time well past the refill window
+        # tops both count buckets back up to their capacity and leaves no
+        # residual backlog in either direction.
+        self.log.info("Verifying buckets refill to capacity once idle")
+        node.bumpmocktime(BUCKET_CAP)
+        peer.sync_with_ping()
+        for direction in ('inbound', 'outbound'):
+            b = self.inv_buckets(node)[direction]
+            assert_equal(b['count_tok'], BUCKET_CAP)
+            assert_greater_than_or_equal(SIZE_BUCKET_CAP, b['size_tok'])
+
         self.log.info("Rate limiting and RBF backlog cleanup test passed")

instagibbs commented at 5:45 PM on June 8, 2026: member

ee38f78365dc45c2bff8cf495a26be905bcabbb5

Last round of comments from me I think. Will be running for manual testing

in src/init.cpp:693 in 747783c839 outdated

 688 | @@ -689,6 +689,10 @@ void SetupServerArgs(ArgsManager& argsman, bool can_listen_ipc)
 689 |                     OptionsCategory::NODE_RELAY);
 690 |      argsman.AddArg("-minrelaytxfee=<amt>", strprintf("Fees (in %s/kvB) smaller than this are considered zero fee for relaying, mining and transaction creation (default: %s)",
 691 |          CURRENCY_UNIT, FormatMoney(DEFAULT_MIN_RELAY_TX_FEE)), ArgsManager::ALLOW_ANY, OptionsCategory::NODE_RELAY);
 692 | +    argsman.AddArg("-txsendrate=<n>",
 693 | +                   strprintf("Set the maximum ongoing rate for sending transactions to (inbound) peers (default: %u tx/s)",

instagibbs commented at 5:50 PM on June 8, 2026:

747783c839a69065a6438839cafd1bf3d99a87af

Should we mention the 2.5x multiplier for outbounds here, or in getnetworkinfo?
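For reference, here are the numbers behind the question. The PR description gives a 30 s count buffer (420 tx at the default 14 tx/s), a size refill of about 12 MB per 600 s (20 kB/s), a 50 MB size capacity, and a 2.5x refill factor for outbound peers. The constexpr helper below is a hypothetical sketch of that arithmetic. It assumes, without confirmation from the code, that the multiplier scales both refill rates while capacities are shared between directions, which is what the suggested functional test's assert_fresh_buckets expects for the count bucket.

```cpp
// Hypothetical per-direction bucket parameters derived from -txsendrate.
struct BucketParams {
    double count_rate; // tx per second
    double count_cap;  // tx
    double size_rate;  // bytes per second
    double size_cap;   // bytes
};

constexpr double OUTBOUND_MULT{2.5};           // from the PR description
constexpr double COUNT_BUFFER_S{30.0};         // count capacity = 30 s at the base rate
constexpr double SIZE_PER_600S{12'000'000.0};  // ~12 MB per 600 s
constexpr double SIZE_CAP{50'000'000.0};       // 50 MB

constexpr BucketParams ParamsFor(double txsendrate, bool outbound)
{
    const double mult{outbound ? OUTBOUND_MULT : 1.0};
    return BucketParams{
        /*count_rate=*/txsendrate * mult,
        /*count_cap=*/txsendrate * COUNT_BUFFER_S, // assumed not scaled by mult
        /*size_rate=*/SIZE_PER_600S / 600.0 * mult,
        /*size_cap=*/SIZE_CAP,
    };
}
```

Under these assumptions, the default rate gives 35 tx/s and 50 kB/s for outbound peers, so an empty outbound count bucket would refill in 12 s versus 30 s for inbound.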

in src/net_processing.cpp:2334 in 05cc940590 outdated

2345 | +    }
2346 | +
2347 | +    return best;
2348 | +}
2349 | +
2350 | +void PeerManagerImpl::ProcessInvBacklog(NodeClock::time_point now, bool backlog_bumped)

instagibbs commented at 6:36 PM on June 8, 2026:

05cc940590aaf4e6cb322a854564ce9494034150

While running this PR and intentionally causing a backlog, I noticed that the inbound bucket tokens were being used, the backlog sorted, etc., even when no one was connected.

Wondering if this could be a cheap check to avoid doing work, fluctuating token counts, for no reason?

edit: Would also have to adapt the backlog freeing call in TakeForProcessing...

diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 084dbc99c7..e14791becc 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -2382,4 +2382,31 @@ void PeerManagerImpl::ProcessInvBacklog(NodeClock::time_point now, bool backlog_
     if (!in_avail && !out_avail) return;
 
+    // If relevant peers don't exist, clear the backlog and
+    // avoid doing any additional work, and avoid using tokens 
+    bool any_inbound_connected = false;
+    bool any_outbound_connected = false;
+    {
+        LOCK(m_peer_mutex);
+        for (auto& it : m_peer_map) {
+            Peer& peer = *it.second;
+            // Quick and dirty filter; we don't check
+            // if tx relay was selected until second time through
+            if (peer.m_is_inbound) {
+                any_inbound_connected = true;
+            } else {
+                any_outbound_connected = true;
+            }
+        }
+    }
+    if (!any_inbound_connected) {
+        std::vector<Wtxid>{}.swap(m_inbound_inv_bucket.backlog);
+        in_avail = false;
+    }
+    if (!any_outbound_connected) {
+        std::vector<Wtxid>{}.swap(m_outbound_inv_bucket.backlog);
+        out_avail = false;
+    }
+    if (!in_avail && !out_avail) return;
+
     std::vector<Wtxid> for_inbound;
     std::vector<Wtxid> for_outbound;
@@ -2392,6 +2419,4 @@ void PeerManagerImpl::ProcessInvBacklog(NodeClock::time_point now, bool backlog_
 
     if (!for_inbound.empty() || !for_outbound.empty()) {
-        bool any_inbound_connected = false;
-        bool any_outbound_connected = false;
         LOCK(m_peer_mutex);
         for (auto& it : m_peer_map) {
@@ -2407,19 +2432,8 @@ void PeerManagerImpl::ProcessInvBacklog(NodeClock::time_point now, bool backlog_
             // in the announcement.
             if (tx_relay->m_next_inv_send_time == 0s) continue;
-            if (peer.m_is_inbound) {
-                any_inbound_connected = true;
-            } else {
-                any_outbound_connected = true;
-            }
             for (auto& i : (peer.m_is_inbound ? for_inbound : for_outbound)) {
                 tx_relay->m_tx_inventory_to_send.emplace_back(i);
             }
         }
-
-        // if the node has no in/outbound connections, clear the corresponding backlog entirely
-        // this reduces wasted memory, and avoids having the bucket artificially empty for when
-        // future peers do connect.
-        if (!any_inbound_connected) std::vector<Wtxid>{}.swap(m_inbound_inv_bucket.backlog);
-        if (!any_outbound_connected) std::vector<Wtxid>{}.swap(m_outbound_inv_bucket.backlog);
     }
 }

ajtowns commented at 7:14 AM on June 26, 2026:

Wondering if this could be a cheap check to avoid doing work, fluctuating token counts, for no reason?

I don't really think there's much work to be saved -- if you have no inbounds, then you'll empty the inbound backlog at least as fast as you empty the outbound backlog, so won't do any additional loops due to inbound processing normally being slower. I don't think avoiding token count fluctuations is worth adding an extra loop over your peers every time you get a tx.

instagibbs commented at 8:05 PM on June 8, 2026: member

Hm, forcerelay peers can maybe OOM crash you by sending you the same tx over and over since the entries are not de-duplicated just in time anymore in this PR.

DrahtBot added the label Needs rebase on Jun 8, 2026

instagibbs commented at 12:35 PM on June 10, 2026: member

Running with an intentional backlog via txsendrate=1 with a debug build, getting a very reasonable ~5% CPU usage with over 20k items in two backlogs consistently. This aligns with my understanding of the design: if we only have a tiny budget, it degrades to O(n) linear scans over the backlog rather than a full sort of all queues over all peers. @0xB10C might be worthwhile running a long running node like this as test infra, along with a standard config node with the same code

0xB10C commented at 1:34 PM on June 10, 2026: contributor

@0xB10C might be worthwhile running a long running node like this as test infra, along with a standard config node with the same code

Been doing this since March with -txsendrate=4 (https://bnoc.xyz/t/increased-b-msghand-thread-utilization-due-to-runestone-transactions-on-2026-02-17/81/14?u=b10c), but I should probably update the nodes to a more recent version of this PR.

ajtowns force-pushed on Jun 26, 2026

DrahtBot added the label CI failed on Jun 26, 2026

ajtowns commented at 7:25 AM on June 26, 2026: contributor

Hm, forcerelay peers can maybe OOM crash you by sending you the same tx over and over since the entries are not de-duplicated just in time anymore in this PR.

Converted the std::vector backlog/inv_to_send back to std::set so it automatically dedupes, and simplified the mempool code correspondingly; it also now erases from the set (rather than reconstructing the vector in place from scratch). Uses a bit more memory and has a bit less memory locality, but I don't think there should be a measurable performance difference.

DrahtBot removed the label Needs rebase on Jun 26, 2026

DrahtBot removed the label CI failed on Jun 26, 2026

instagibbs commented at 3:17 PM on June 26, 2026: member

This is a lot easier to reason about, at a cost I think we can swallow.

Did you consider unordered_set + SaltedWtxidHasher? Should be ~3.4x faster inserts, ~3.7x faster scanning, and a bit less (~9%?) memory. We don't care about order since we're linear scanning, then sorting top N results.

edit: Something like ~5s inbound and ~2s outbound maximum time where the backlog could theoretically be larger than the underlying mempool, but that requires RBFs/evicted txs paying their way in the first place via incremental fee. Seems more than ok!

Also noting that in the case where no peers are connected (meaning we get no new txs), the backlog stays frozen. Not sure that's an issue, just peculiar and something I'll think about. I believe it just means during a long peer-less stretch you hold onto backlog you'd never serve to any future users.

in src/net_processing.cpp:6270 in 3c070a93bc

6293 | -                        if (!txinfo.tx) {
6294 | -                            continue;
6295 | +                    // (sorted from higher priority to lowest, skipping low fee)
6296 | +                    const CFeeRate filterrate{tx_relay->m_fee_filter_received.load()};
6297 | +                    auto inv_tx = [&]() EXCLUSIVE_LOCKS_REQUIRED(tx_relay->m_tx_inventory_mutex) {
6298 | +                        auto& vec = tx_relay->m_tx_inventory_to_send;

instagibbs commented at 3:29 PM on June 26, 2026:

This is a set now, not a vec (could also be an unordered_set?)

The comment describing the field needs changing too.

ajtowns force-pushed on Jun 27, 2026

ajtowns commented at 10:38 AM on June 27, 2026: contributor

Did you consider unordered_set + SaltedWtxidHasher? Should be ~3.4x faster inserts, ~3.7x faster scanning, and a bit less (~9%?) memory. We don't care about order since we're linear scanning, then sorting top N results.

Where are you getting those numbers from? Claude says "Smells like LLM-generated numbers presented as facts." ;) I would expect them to be dominated by the mempool lookups in either case. I did try that first, but using an unordered_set means you ought to deal with the capacity, both to avoid wasting too much time rehashing as the set grows, and to avoid permanently wasting memory as it shrinks. Other than declarations, the differences I had were just:

//TakeForProcessing
+    /* if backlog is empty, reset it to a default size */
+    if (backlog.empty()) backlog.reserve(1500);

//SendMessages
                     tx_relay->m_tx_inventory_to_send.clear();
+                    tx_relay->m_tx_inventory_to_send.reserve(150);

Picking those base sizes seemed pretty arbitrary, and since it's already using std::set prior to this PR, changing to unordered_set just seemed gratuitous to me.

edit: Something like ~5s inbound and ~2s outbound maximum time where the backlog could theoretically be larger than the underlying mempool, but that requires RBFs/evicted txs paying their way in the first place via incremental fee. Seems more than ok!

At 56 bytes per backlog entry (std::set overhead plus 32B wtxid), 1MB of backlog data would be ~18k txs; if each tx has a signature and those ~18k txs have to be accepted within the ~5s window above, that's 3600 sigs/second, or 0.277 ms per signature verification, which is faster than the 0.36 ms per sig figure from here with 500 sigs being batch verified. So I think getting more than a few meg of stale wtxids in the backlog should be fairly difficult even in the laboratory, just due to processing tx acceptance sequentially.
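The back-of-envelope figures in this comment can be checked with a few compile-time constants. This is only a sketch of the arithmetic: the 56-byte entry size, the ~5s drain window and the 0.36 ms/sig comparison point are taken from the comments above rather than measured, and the constant and function names are made up for illustration.

```cpp
// Figures quoted in the review comments (assumptions, not measurements).
constexpr double BYTES_PER_ENTRY = 56.0;      // std::set node overhead + 32-byte wtxid
constexpr double BACKLOG_BYTES = 1'000'000.0; // 1 MB of backlog
constexpr double DRAIN_WINDOW_S = 5.0;        // ~5s window for the inbound backlog
constexpr double BATCH_MS_PER_SIG = 0.36;     // cited batch-verification cost per sig

// How many backlog entries fit in BACKLOG_BYTES (~17.9k, i.e. ~18k).
constexpr double EntriesPerMB() { return BACKLOG_BYTES / BYTES_PER_ENTRY; }

// Signature verifications per second needed to accept `txs` transactions
// (one signature each) within the drain window.
constexpr double SigsPerSecond(double txs) { return txs / DRAIN_WINDOW_S; }

// Time budget per signature, in milliseconds, at that rate.
constexpr double MsPerSig(double txs) { return 1000.0 / SigsPerSecond(txs); }

// True if filling the backlog would need faster-than-batch verification.
constexpr bool NeedsFasterThanBatch(double txs) { return MsPerSig(txs) < BATCH_MS_PER_SIG; }
```

For ~18k txs this gives 3600 sigs/s and roughly 0.28 ms per signature, below the 0.36 ms batch figure, which is the basis for the conclusion that a multi-megabyte stale backlog is hard to produce.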

Also noting that in the case where no peers are connected (meaning we get no new txs), the backlog stays frozen.

The backlog gets processed both on a new tx (so calling sendrawtransaction with an entry from the mempool would clear it), and via SendMessages (so just having a peer connected, despite not getting any new txs), but otherwise, yeah. Something to refactor if/when we move ThreadMessageHandler from CConnman to PeerManager.

instagibbs commented at 10:47 AM on June 27, 2026: member

Where are you getting those numbers from? Claude says "Smells like LLM-generated numbers presented as facts." ;) I would expect them to be dominated by the mempool lookups in either case.

Bespoke, unrepresentative benchmarks; I'll have my bot MCP your bot. Obviously these costs do not dominate vs the other operations, and your memory management comment is compelling. Just wanted to make sure you had considered it.

So I think getting more than a few meg of stale wtxids in the backlog should be fairly difficult even in the laboratory just due to processing tx acceptance sequentially.

Agreed; it would have been "nice" to have closed-form reasoning about the total size, but the cure would be worse than the disease.

instagibbs approved

instagibbs commented at 3:07 PM on June 29, 2026: member

ACK e26f3f0ea8f747cd88d6602c67a677c7795743e1

DrahtBot requested review from naiyoma on Jun 29, 2026

DrahtBot requested review from 0xB10C on Jun 29, 2026

DrahtBot requested review from polespinasa on Jun 29, 2026

in src/net_processing.cpp:5997 in e26f3f0ea8

5993 | @@ -5850,6 +5994,8 @@ bool PeerManagerImpl::SendMessages(CNode& node)
5994 |  
5995 |      MaybeSendSendHeaders(node, peer);
5996 |  
5997 | +    ProcessInvBacklog(NodeClock::now());

instagibbs commented at 5:30 PM on July 6, 2026:

munit:

    ProcessInvBacklog(now);

in src/txmempool.cpp:595 in e26f3f0ea8 outdated

 601 | +            if (n_to_sort >= res.size()) {
 602 | +                // use regular sort when sorting everything
 603 | +                std::sort(begin, end, cmp);
 604 | +            } else {
 605 | +                middle = begin + n_to_sort;
 606 | +                std::partial_sort(begin, middle, end, cmp);

instagibbs commented at 5:50 PM on July 6, 2026:

bot find: This line has no test coverage (at all?), as the functional test is right on the edge of having an unsorted backlog.

This function is a good target for some unit/fuzz tests?
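Since unit/fuzz tests are suggested here, one option is a randomised property check: whichever branch is taken, the sorted prefix must equal the first `n` elements of a full sort. In the sketch below, `SortTopN` is a hypothetical helper that copies the `std::sort`/`std::partial_sort` branch from the snippet, with plain `int` scores standing in for mempool ordering; it is not the PR's `ExtractBestByMiningScoreWithTopology` or its test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical stand-in for the branch in the snippet: fully sort when every
// element is wanted, otherwise only order the best n_to_sort elements.
// Returns the sorted prefix (highest score first).
std::vector<int> SortTopN(std::vector<int> res, std::size_t n_to_sort)
{
    const auto cmp = std::greater<int>{};
    const auto begin = res.begin();
    const auto end = res.end();
    auto middle = end;
    if (n_to_sort >= res.size()) {
        // use regular sort when sorting everything
        std::sort(begin, end, cmp);
    } else {
        middle = begin + static_cast<std::ptrdiff_t>(n_to_sort);
        std::partial_sort(begin, middle, end, cmp);
    }
    assert(std::is_sorted(begin, middle, cmp));
    return std::vector<int>(begin, middle);
}

// Randomised check against a full sort. Small sizes and a small value range
// make both branches (and ties) common.
bool CheckSortTopN(unsigned seed, int iterations)
{
    std::mt19937 rng{seed};
    for (int i = 0; i < iterations; ++i) {
        std::vector<int> v(rng() % 20);                      // 0..19 entries
        for (auto& x : v) x = static_cast<int>(rng() % 50);  // many duplicate scores
        const std::size_t n = rng() % 25;                    // sometimes exceeds v.size()

        std::vector<int> expected = v;
        std::sort(expected.begin(), expected.end(), std::greater<int>{});
        expected.resize(std::min(n, expected.size()));

        if (SortTopN(v, n) != expected) return false;
    }
    return true;
}
```

A real fuzz target would compare mempool-ordered wtxids rather than integers, but the invariant being checked is the same.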

in test/functional/p2p_tx_relay_rate_limit.py:97 in e26f3f0ea8

  92 | +        # the backlog is empty and every surviving tx has trickled out.
  93 | +        self.log.info("Advancing time to drain the backlog")
  94 | +        for _ in range(30):
  95 | +            if self.inbound_backlog(node) == 0 and len(peer.get_invs()) == NUM_TXS:
  96 | +                break
  97 | +            node.bumpmocktime(5)

instagibbs commented at 5:58 PM on July 6, 2026:

This change results in a backlog that isn't totally sorted (exercising the partial sort path), though we probably want more directed coverage?

            node.bumpmocktime(4)

in src/net_processing.cpp:1164 in e26f3f0ea8

1159 | @@ -1105,6 +1160,14 @@ class PeerManagerImpl final : public PeerManager
1160 |  
1161 |      /// The transactions to be broadcast privately.
1162 |      PrivateBroadcast m_tx_for_private_broadcast;
1163 | +
1164 | +    mutable Mutex m_inv_to_send_mutex;

instagibbs commented at 6:03 PM on July 6, 2026:

To imitate m_tx_download_mutex?

    mutable Mutex m_inv_to_send_mutex ACQUIRED_BEFORE(m_mempool.cs);

ajtowns commented at 1:19 PM on July 8, 2026:

Done this. Requires clang-22 or -Wthread-safety-beta to have an effect though, so I think it's untested.

fanquake commented at 1:21 PM on July 8, 2026:

Our TSAN CI job uses Clang 22.1.8.

DrahtBot added the label Needs rebase on Jul 7, 2026

ajtowns force-pushed on Jul 8, 2026

DrahtBot removed the label Needs rebase on Jul 8, 2026

instagibbs commented at 5:39 PM on July 8, 2026: member

reACK b9ab06ac3321f8b089cd4789fb43515b4ce07fd4

rebase with additional annotation + NodeClock value reuse + bumpmocktime modification that adds partial sort coverage of backlog (still would like some direct coverage ideally)

in src/net_processing.cpp:520 in 9a49162d3d outdated

 515 | +     *   too much.
 516 | +     * Count floor: In order to avoid sorting the global backlog too often, we ensure
 517 | +     *   that we always remove at least an average INV message's number of transactions
 518 | +     *   each time we do work. (Or 50kB if the size bucket is the limiting factor)
 519 | +     */
 520 | +    InvToSendBucket(unsigned int rate, double mult)

sipa commented at 7:49 PM on July 10, 2026:

In commit "net_processing: add a global delay queue for sending txs"

Adding some named constants for these numbers would be nice.

ajtowns commented at 11:33 PM on July 11, 2026:

Added named constants within the struct
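As an illustration of what such constants could look like, the sketch below derives the bucket parameters quoted in the PR description and commit message (14 tx/s with a 30 s buffer, 12 MB per 600 s, 50 MB capacity, 2.5x rate for outbound). The struct and constant names are hypothetical rather than the ones added in the PR, decimal (SI) units are assumed, and the count floor is left out because its value is not stated in this thread.

```cpp
// Hypothetical names; values taken from the PR description / commit message.
struct InvBucketParams {
    static constexpr double COUNT_RATE_TX_PER_S = 14.0;  // default -txsendrate (inbound)
    static constexpr double COUNT_BUFFER_S = 30.0;       // seconds of burst allowance
    static constexpr double COUNT_CAPACITY = COUNT_RATE_TX_PER_S * COUNT_BUFFER_S;  // 420 tx
    static constexpr double SIZE_RATE_BYTES_PER_S = 12'000'000.0 / 600.0;          // 20 kB/s
    static constexpr double SIZE_CAPACITY_BYTES = 50'000'000.0;                    // 50 MB
    static constexpr double OUTBOUND_MULTIPLIER = 2.5;   // outbound buckets refill faster
};

// Refill rates for a bucket pair, given a multiplier (1.0 for inbound).
constexpr double CountRate(double mult) { return InvBucketParams::COUNT_RATE_TX_PER_S * mult; }
constexpr double SizeRate(double mult) { return InvBucketParams::SIZE_RATE_BYTES_PER_S * mult; }
```

Deriving the capacity from rate times buffer, rather than writing 420 directly, keeps the "30 s of burst" intent visible if the rate is changed.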

in src/net_processing.cpp:2333 in 9a49162d3d outdated

2334 | -        Peer& peer{*peer_ref};
2335 | +    AssertLockHeld(mempool.cs);
2336 |  
2337 | -        auto tx_relay = peer.GetTxRelay();
2338 | -        if (!tx_relay) continue;
2339 | +    size_t n_to_take = static_cast<size_t>(std::max<double>(count_bucket.value() - count_floor, 0));

sipa commented at 8:00 PM on July 10, 2026:

The rationale for just looking at the count_floor here is that we pretty much always expect that to be the limiting bucket?

ajtowns commented at 2:25 AM on July 11, 2026:

Mostly that ExtractBestByMiningScoreWithTopology only takes a count and doesn't try to do any size calculations.

in src/net_processing.cpp:504 in 9a49162d3d

 498 | @@ -494,6 +499,52 @@ struct CNodeState {
 499 |      int64_t m_last_block_announcement{0};
 500 |  };
 501 |  
 502 | +struct InvToSendBucket {
 503 | +    const double count_floor{0};
 504 | +    std::set<Wtxid> backlog;

sipa commented at 8:15 PM on July 10, 2026:

Would it make sense to use an std::vector<Wtxid> here instead?

ExtractBestByMiningScoreWithTopology does need to go through the whole thing anyway, so it might as well do a sort + dedup step beforehand. Unless very significant duplication is expected, the improved memory locality and lower memory usage should make it a win over the std::set on-the-fly deduplication.

LLM claims the sources of duplications are (a) forcerelay peers (b) sendrawtransaction and (c) ReattemptInitialBroadcast, which all sound fine to me if correct.

instagibbs commented at 12:43 AM on July 11, 2026:

It was done that way previously: #34628 (comment)

:sweat_smile:

ajtowns commented at 2:39 AM on July 11, 2026:

Man, things would be pretty bad if ReattemptInitialBroadcast was causing enough duplicates to be any sort of problem.

If you have a sustained backlog of 50k txs with ~0 duplicates, with 70 in and 70 out every 5s (in order for the backlog to be sustained), then sorting would be 50k*15 operations every 5 seconds, but std::set should just be 70*15 ops every 5 seconds? (15 being the log2(50k) factor.) I guess sorting by wtxid first would make looking up the mempool entry by wtxid faster, so it's probably pretty easy to argue that the ~~O(N * log(M))~~ mempool lookups will happily absorb an O(N log(N)) backlog sort. EDIT: except the mempool lookups are O(N)-ish because they're hash based. But still...

:scream:

ajtowns commented at 11:34 PM on July 11, 2026:

Okay, it's a vector that gets fully sorted each time ExtractBestBMSWT is called, and deduped just before the mempool lookups occur
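The approach settled on here (a plain vector backlog, sorted and deduplicated in one batch per extraction) corresponds to the usual sort/unique/erase idiom. A minimal sketch, with a hypothetical `FakeWtxid` (32 raw bytes) standing in for `Wtxid`; the function names are illustrative and the PR's code may differ:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Wtxid: 32 raw bytes, compared lexicographically.
using FakeWtxid = std::array<std::uint8_t, 32>;

// Appending is O(1) amortised and may introduce duplicates (e.g. the same tx
// queued again via forcerelay or sendrawtransaction).
void QueueForRelay(std::vector<FakeWtxid>& backlog, const FakeWtxid& wtxid)
{
    backlog.push_back(wtxid);
}

// Sort the whole backlog, then drop adjacent duplicates in a single pass.
// Afterwards each wtxid appears at most once, ready for mempool lookups.
void SortAndDedup(std::vector<FakeWtxid>& backlog)
{
    std::sort(backlog.begin(), backlog.end());
    backlog.erase(std::unique(backlog.begin(), backlog.end()), backlog.end());
    assert(std::adjacent_find(backlog.begin(), backlog.end()) == backlog.end());
}

// Helper for examples: a wtxid whose first byte is `b` and the rest zero.
FakeWtxid MakeWtxid(std::uint8_t b)
{
    FakeWtxid id{};
    id[0] = b;
    return id;
}
```

Relative to `std::set`, this moves the O(log N) per-insert cost into one O(N log N) sort per extraction, the trade-off weighed in the comments above, in exchange for 32 bytes per entry and contiguous memory.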

net_processing: bump last_inv_sequence for bip35 messages explicitly

This avoids relying on the bump that occurs through the normal INV
process, which is confusing.

46c8c471dc

net_processing: Remove per-peer rate-limiting

Per-peer rate limiting introduces storage and compute costs proportional
to the number of peers. This has caused severe bugs in the past, and
continues to be a risk in the event of periods of extremely high rates
of transaction submission. Avoid these problems by always completely
emptying the m_tx_inventory_to_send queue when processing it.

Note that this increases the potential size of INV messages we send
for normal tx relay from ~1000 (limited by INVENTORY_BROADCAST_MAX)
to potentially 50000 (limited by MAX_INV_SZ).

026f70e05f

txmempool: Add ExtractBestByMiningScoreWithTopology

Add a method for (partially) sorting a batch of transactions (specified as
a std::vector of wtxids) per mempool order, designed for transaction relay.

6cfc65d210

net_processing: Replace CompareInvMempoolOrder

Remove CompareInvMempoolOrder, replacing it with the new
ExtractBestByMiningScoreWithTopology. The trickle send code is reworked
accordingly.

e1b7490fbc

txmempool: Drop CompareMiningScoreWithTopology

Now unused; replaced by ExtractBestByMiningScoreWithTopology.

749bb447f8

util/tokenbucket.h: Provide a generic TokenBucket class

This is a simple token bucket parameterized on clock type, used in the
following commit.

7927650e56
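The commit above describes a simple token bucket parameterized on clock type. As a rough illustration only (this is not the contents of `util/tokenbucket.h`; the class name `SketchTokenBucket`, its members, starting full, and continuous fractional refill are all assumptions), such a class might look like:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// A token bucket refilling continuously at `rate` tokens/second, capped at
// `capacity`. Clock only needs to provide a time_point type (e.g.
// std::chrono::steady_clock); callers pass the current time explicitly.
template <typename Clock>
class SketchTokenBucket
{
public:
    using time_point = typename Clock::time_point;

    SketchTokenBucket(double rate, double capacity, time_point now)
        : m_rate{rate}, m_capacity{capacity}, m_tokens{capacity}, m_last{now}
    {
        assert(rate > 0 && capacity > 0);
    }

    // Add tokens for the time elapsed since the last update, capped at capacity.
    void Refill(time_point now)
    {
        if (now <= m_last) return;  // no time passed (or the clock went backwards)
        const std::chrono::duration<double> elapsed = now - m_last;
        m_tokens = std::min(m_capacity, m_tokens + elapsed.count() * m_rate);
        m_last = now;
    }

    // Consume `n` tokens if available; returns whether it succeeded.
    bool TryConsume(double n, time_point now)
    {
        Refill(now);
        if (m_tokens < n) return false;
        m_tokens -= n;
        return true;
    }

    double Value() const { return m_tokens; }

private:
    double m_rate;
    double m_capacity;
    double m_tokens;
    time_point m_last;
};
```

Passing `now` in rather than calling the clock inside the bucket is a choice made for this sketch; it lets tests drive the bucket with fixed time points, much as the functional tests drive the node with mock time.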

net_processing: add a global delay queue for sending txs

Without the per-peer rate limiting, nodes can act as an amplifier for
transaction spam -- receiving many transactions from one node, but
relaying each of them to over 100 other nodes. Limit the impact of this
by providing a global rate limit.

This is implemented using dual token buckets, one that consumes a
token for every transaction, and one that consumes a token for every
serialized byte. This rate limits both per-tx resource usage (eg INV
messages) and overall relay bandwidth.

Main bucket parameters:
 * Count: 14tx/s rate, 420tx (30s) capacity
 * Size: 12MB/600s rate (roughly 4-6 blocks' worth of data per target block interval), 50MB capacity

The size bucket is expected to be large enough to almost never have an
impact in normal usage, even during transaction storms, and is primarily
intended to mitigate attack-like scenarios.

Outbound connections get a separate pair of buckets, with rates boosted
by a 2.5x multiplier.

This avoids the excessive memory and CPU usage due to the 100x multiplier
from the queues being per-peer.

Note that this also reduces the size of INV messages we send for general
tx relay back to a more reasonable level of under 600 txs in 99.999%
of cases.

df31ee57aa
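The dual-bucket rule in this commit can be sketched as follows: a transaction is released from the backlog only if both the count bucket and the size bucket can cover it, and both are charged together. The names `Bucket`, `DualBucket` and `TryRelease` are hypothetical, time is passed in as plain seconds for simplicity, and the PR's actual accounting (count floor, ordering by mining score, whether buckets may dip below zero) may well differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Minimal bucket: refills at `rate` per second, up to `capacity`.
struct Bucket {
    double rate;
    double capacity;
    double tokens;
    double last_s;

    void Refill(double now_s)
    {
        if (now_s > last_s) {
            tokens = std::min(capacity, tokens + (now_s - last_s) * rate);
            last_s = now_s;
        }
    }
};

// One bucket charged per transaction, one charged per serialized byte.
struct DualBucket {
    Bucket count;
    Bucket size;

    // Release a tx of `tx_bytes` for announcement only if both buckets have
    // enough tokens; otherwise it stays in the backlog for a later attempt.
    bool TryRelease(std::size_t tx_bytes, double now_s)
    {
        count.Refill(now_s);
        size.Refill(now_s);
        const double bytes = static_cast<double>(tx_bytes);
        if (count.tokens < 1.0 || size.tokens < bytes) return false;
        count.tokens -= 1.0;
        size.tokens -= bytes;
        assert(count.tokens >= 0.0 && size.tokens >= 0.0);
        return true;
    }
};
```

With the real parameters the byte bucket's 50 MB capacity is large enough that the count bucket is normally what binds, matching the commit message's expectation that the size bucket mainly guards against attack-like scenarios.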

net_processing: Provide a 30bpm heartbeat log while inv backlog is in use

6307bd034b

init: add -txsendrate configuration parameter

Adds a debug-only configuration option to set the target
transaction/second rate for relay to inbound connections. This is mostly
intended to be set to artificially low values to aid in testing behaviour
when a backlog occurs, but is also available in case the default 14tx/s
target is somehow too low in practice.

74a47a5207

rpc: report -txsendrate and bucket info via getnetworkinfo

Add `tx_send_rate` and `inv_buckets` fields to getnetworkinfo. The
`inv_buckets` field has separate `inbound` and `outbound` entries,
reporting backlog count, count tokens, and size tokens. Useful for
monitoring relay behavior.

4842903ac1

tests: basic functional test for tx rate limiting

5cde66341a

doc: Add release note for -txsendrate etc

12b0dc33c4

net_processing: Drop unnecessary txid arg from InitiateTxBroadcastToAll

349c72ee00

ajtowns force-pushed on Jul 11, 2026

sipa commented at 2:15 PM on July 12, 2026: member

Code review ACK 349c72ee00a06581aa8cddaced6377c49a81d511. I haven't tested it myself yet (though switched my well-connected node to it now), but the posted benchmarks and analyses look convincing.

I like the design of separating the concerns of rate-limiting (global) and privacy-trickling (per peer, though with a shared clock for inbounds). The mempool itself functions as a bound on the data structures, because (over time) only mempool transactions can appear in the queues, and only once in each.

(I was also fine with the std::set<Wtxid> queue design in case there are concerns about duplicates still - but I like the compactness of the data structures here more)

sipa closed this on Jul 12, 2026

sipa reopened this on Jul 12, 2026

in src/net_processing.cpp:2458 in 349c72ee00

2469 | +        }
2470 | +
2471 | +        // if the node has no in/outbound connections, clear the corresponding backlog entirely
2472 | +        // this reduces wasted memory, and avoids having the bucket artificially empty for when
2473 | +        // future peers do connect.
2474 | +        if (!any_inbound_connected) m_inbound_inv_bucket.backlog.clear();

instagibbs commented at 9:19 PM on July 12, 2026:

mu-nit: could manage the capacities like in other spots

could think about a common routine to ClearBacklog()

ajtowns commented at 3:13 PM on July 14, 2026:

If it was over capacity, it'll get cleared on the next call to TakeForProcessing that actually empties the backlog. Usually that should be when the next tx comes in; the unusual case is when there are continually more new txs to relay than the rate limit bucket has capacity for, but then the only downside is that the backlog vector will have extra space, so it doesn't have to reallocate while the flood is ongoing. The current mempool is 100k entries at ~256MB, and 100k entries in the backlog is ~3.2MB of allocation here (potentially doubled to 6.4MB if you have no peers and are flooding your node with txs via sendrawtransaction, I guess?)
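The capacity argument rests on a standard `std::vector` property: `clear()` destroys the elements but leaves `capacity()` unchanged, while swapping with a freshly constructed vector (the `std::vector<Wtxid>{}.swap(...)` idiom in the diff at the top) hands the allocation to a temporary that frees it. A minimal illustration, with a hypothetical 32-byte `FakeWtxid` standing in for `Wtxid` and illustrative function names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using FakeWtxid = std::array<std::uint8_t, 32>;  // hypothetical stand-in for Wtxid

// Heap bytes reserved for elements (ignores allocator bookkeeping).
std::size_t AllocatedBytes(const std::vector<FakeWtxid>& v)
{
    return v.capacity() * sizeof(FakeWtxid);
}

// clear(): elements destroyed, allocation kept for reuse during a flood.
void ClearKeepCapacity(std::vector<FakeWtxid>& v) { v.clear(); }

// Swap with an empty temporary: the temporary takes the old buffer and frees it.
void ClearReleaseCapacity(std::vector<FakeWtxid>& v)
{
    std::vector<FakeWtxid>{}.swap(v);
    assert(v.empty());
}
```

At 100k entries this is 3.2 MB of element storage, in line with the estimate in the comment; how far a growing vector overshoots that depends on the standard library's growth factor.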

instagibbs approved

instagibbs commented at 9:28 PM on July 12, 2026: member

reACK 349c72ee00a06581aa8cddaced6377c49a81d511

via git range-diff master b9ab06ac3321f8b089cd4789fb43515b4ce07fd4 349c72ee00a06581aa8cddaced6377c49a81d511

Backlog wtxids being deduplicated as a batch keeps things easy to reason about, no +1 bonus logic etc.

in src/rpc/net.cpp:662 in 4842903ac1

 658 | @@ -659,6 +659,16 @@ static RPCMethod getnetworkinfo()
 659 |                          }},
 660 |                          {RPCResult::Type::BOOL, "localrelay", "true if transaction relay is requested from peers"},
 661 |                          {RPCResult::Type::NUM, "timeoffset", "the time offset"},
 662 | +                        {RPCResult::Type::NUM, "tx_send_rate", "configured target for maximum number of transactions per second to send to inbound peers"},

mzumsande commented at 12:08 AM on July 14, 2026:

Concept ACK

Finished a first reading through the code, will dig in deeper in the next days.

One thing I noticed is that in many spots throughout the PR (tx_send_rate etc.) it talks about restricting "sending transactions" or "transaction relay", while all changes refer to "announcing" transactions.

The limit doesn't apply to peers proactively asking us via GETDATA for transactions that we didn't INV, and not all transactions that we announce will actually be requested. Should this distinction be made clearer?

ajtowns commented at 5:33 AM on July 14, 2026:

One thing I noticed is that in many spots throughout the PR (tx_send_rate etc.) it talks about restricting "sending transactions" or "transaction relay", while all changes refer to "announcing" transactions.

That's more or less the way the code currently in master talks about INV announcements:

                // Determine transactions to relay
                if (fSendTrickle) {
                    // Produce a vector with all candidates for sending
                        // Remove it from the to-be-sent set
                        // Not in the mempool anymore? don't bother sending it.
                        // Peer told you to not send transactions at that feerate? Don't bother sending it.
                        // Send

For what it's worth, the way I look at it is that we've "sent" the tx as soon as we announce it; after that, it's our peer's problem to do any requests etc. That's where all the policy/decision making goes; the "GETDATA/TX" part of the sending side is just a dumb lookup/reply (modulo the "have we even announced this" seq number stuff).