p2p: Release m_peer_mutex early in InitiateTxBroadcastToAll #35297

maflcko commented at 9:40 AM on May 15, 2026: member

The InitiateTxBroadcastToAll method holds the m_peer_mutex while updating the bloom filters for all peers. This is perfectly fine, because updating the bloom filters is fast. Though, from a style-perspective, the lock does not need to be held for the whole function. Also, holding the lock longer, may confuse Tsan into a lock-order inversion false-positive (ref: #19303 (comment)).

So "fix" both issues in this style-refactor.

p2p: Release m_peer_mutex early in InitiateTxBroadcastToAll fa2afba28b

DrahtBot added the label P2P on May 15, 2026

DrahtBot commented at 9:40 AM on May 15, 2026: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35297.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	xyzconstant, shuv-amp, danielabrozzoni, sedited

If your review is incorrectly listed, please copy-paste <code></code> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#34628 (p2p: Replace per-peer transaction rate-limiting with global rate limits by ajtowns)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

maflcko marked this as a draft on May 15, 2026

maflcko marked this as ready for review on May 15, 2026

xyzconstant commented at 5:24 AM on May 19, 2026: contributor

Code review ACK fa2afba28b5776a710386a3bacd5cb308149d671

While reviewing #34628 noticed this minor issue with m_peer_mutex being held for too long. The refactor looks clean to me.

shuv-amp commented at 9:44 AM on May 19, 2026: none

ACK fa2afba28b5776a710386a3bacd5cb308149d671

Reviewed the locking change and tested locally.

Copying the PeerRef values while holding m_peer_mutex keeps the Peer objects alive after the map lock is released. The tx inventory update remains protected by TxRelay::m_tx_inventory_mutex, so this removes the m_peer_mutex -> per-peer tx-relay lock nesting from this path.

Built locally and ran:

cmake --build build --target bitcoind bitcoin-cli -j"$(sysctl -n hw.ncpu)"
test/functional/p2p_filter.py --configfile=$PWD/build/test/config.ini --timeout-factor=4
test/functional/p2p_segwit.py --configfile=$PWD/build/test/config.ini --timeout-factor=4

danielabrozzoni commented at 5:12 PM on June 3, 2026: member

Code Review ACK fa2afba28b5776a710386a3bacd5cb308149d671

Code looks good to me, I didn't try reproducing the old tsan warning locally, but I inspected the code and saw the previous lock order cycle:

PeerManagerImpl::InitiateTxBroadcastToAll would:
1. LOCK m_peer_mutex
2. LOCK tx_relay->m_tx_inventory_mutex
PeerManagerImpl::SendMessages would:
1. LOCK tx_relay->m_tx_inventory_mutex
2. LOCK m_mempool.cs
ActivateBestChain and callbacks would:
1. LOCK m_mempool.cs
2. LOCK m_peer_mutex (in GetPeerRef)

in src/net_processing.cpp:2265 in fa2afba28b

2263 | +    for (const PeerRef& peer_ref : GetAllPeers()) {
2264 | +        if (!peer_ref) continue;
2265 | +        Peer& peer{*peer_ref};
2266 | +
2267 |          auto tx_relay = peer.GetTxRelay();
2268 |          if (!tx_relay) continue;

sedited commented at 8:21 PM on June 8, 2026:

I guess technically this could be racy, but in practice it is not, because InitiateTxBroadcastToAll is not called from a context where the tx_relay status might change?

maflcko commented at 6:52 AM on June 9, 2026:

I don't think this refactor changes anything about races:

If you are referring to a memory-safety race, where tx_relay could be set to nullptr: There is no code currently doing that, and the bug would be unchanged by this pull request, because m_peer_mutex does not protect m_tx_relay.
If you are referring to a logic race, where tx_relay may still be nullptr: This certainly does happen in practise, because calling this function on the RPC/wallet thread will race against the p2p threads creating new peers, creating tx_relay structs and setting m_next_inv_send_time. However, this pull request does not change anything about that. If there was a RPC thread that waited for a peer or a peer's tx_relay, or a peer's tx_relay.m_tx_inventory_mutex to be initialized, they are unaffected by this refactor, because they will correctly wait for the status and then correctly observe the intended behavior.
If you are referring to a race, where the RPC/wallet thread doesn't wait and "blindly" calls this function, then the behavior before and after this refactor will be that the broadcast is initiated on a best-effort basis to all peers which have m_next_inv_send_time currently set.

Of course, it is permitted to lock m_peer_mutex while locking a peer-specific mutex, but I don't see a use-case for that (unless one would want to micro-optimize to avoid taking a copy of a shared pointer). So I think it is better to avoid holding m_peer_mutex longer than needed and only hold it longer when there is a strong rationale.

If m_peer_mutex was used to prevent peer-internal state races, that seems like a confusing layer violation. If this was used, at a minimum it would require a strong rationale, and proper lock annotation (or code comment).

sedited commented at 8:15 AM on June 9, 2026:

Thanks, this was helpful to further trace some of its usage.

maflcko commented at 11:20 AM on June 9, 2026:

Of course, it is permitted to lock m_peer_mutex while locking a peer-specific mutex, but I don't see a use-case for that (unless one would want to micro-optimize to avoid taking a copy of a shared pointer). So I think it is better to avoid holding m_peer_mutex longer than needed and only hold it longer when there is a strong rationale.

To extend on this: Another place that would be safer by taking the GetAllPeers snapshot and releasing the m_peer_mutex early is RelayAddress/PushAddress. With the current code in master, the following (arguably absurd) diff would compile fine, but lead to runtime-UB/crash later on:

diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 71cc6ff7ba..5a8f6105f0 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -1139,2 +1139,3 @@ void PeerManagerImpl::PushAddress(Peer& peer, const CAddress& addr)
         } else {
+         LOCK(m_peer_mutex);
             peer.m_addrs_to_send.push_back(addr);

I think copying shared pointers here is fine and beneficial to reduce the (recursive) scope of the lock. Though, this style-cleanup can be done in the future when that code is touched the next time.

edit: nvm, this is caught by clang:

src/net_processing.cpp:1140:10: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_peer_mutex' [-Wthread-safety-analysis]
 1140 |          LOCK(m_peer_mutex);
      |          ^

sedited approved

sedited commented at 8:16 AM on June 9, 2026: contributor

ACK fa2afba28b5776a710386a3bacd5cb308149d671

fanquake merged this on Jun 9, 2026

fanquake closed this on Jun 9, 2026

maflcko deleted the branch on Jun 9, 2026