net: encapsulate TxRelay state and replace recursive mutexes #34824

pull w0xlt wants to merge 8 commits into bitcoin:master from w0xlt:refactor/replace-txrelay-recursive-mutexes changing 7 files +766 −204
  1. w0xlt commented at 8:07 AM on March 14, 2026: contributor

    Part of #19303.

    This PR refactors per-peer transaction relay state so TxRelay owns its guarded data and lock boundaries before replacing its RecursiveMutex members with plain Mutex.

    Before this change, Peer::TxRelay was mostly a collection of public state inside net_processing.cpp. Callers directly locked its mutexes, inspected its fields, and mutated its containers. That made the mutex type change hard to review: replacing RecursiveMutex with Mutex is only clearly safe if outside code cannot accidentally recurse, take locks in the wrong order, or bypass the intended state transitions.

    The main changes are:

    • Move Peer::TxRelay into node::TxRelay.
    • Route bloom-filter and transaction-inventory state changes through named TxRelay methods.
    • Remove direct TxRelay field and mutex access from net_processing.cpp.
    • Snapshot queued transaction inventory into a TxInventoryBatch in SendMessages(), process it without holding the TxRelay mutex, and merge unsent entries back afterward.
    • Hide the batch backing storage behind a small semantic API.
    • Add unit tests and a fuzz target for the extracted TxRelay behavior.
    • Replace the two RecursiveMutex members with Mutex and annotate the inventory-before-bloom lock order.

    No P2P behavior change is intended. The purpose is to move from "external code locks and mutates TxRelay internals" to "TxRelay owns its state transitions and locking contract."

    The most important commit to review is the SendMessages() batch snapshot refactor, because that is where the lock ownership model changes. The final mutex replacement is intentionally small after the state has been encapsulated.

  2. DrahtBot added the label P2P on Mar 14, 2026
  3. DrahtBot commented at 8:08 AM on March 14, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34824.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK hebasto, theuni
    Stale ACK sedited, pablomartin4btc

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #35522 (refactor: Extract per-message helpers from SendMessages() (move-only) by pablomartin4btc)
    • #35511 (RFC: consensus: Make CAmount a class by hodlinator)
    • #35502 (refactor: extract per-message helpers from ProcessMessage (move-only) by w0xlt)
    • #35315 (refactor: Use NodeClock::time_point in more places by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

    LLM Linter (✨ experimental)

    Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

    • CBloomFilter{10, 0.001, 0, BLOOM_UPDATE_NONE} in src/test/txrelay_tests.cpp

    <sup>2026-06-17 17:25:34</sup>

  4. sedited commented at 9:09 AM on March 14, 2026: contributor

    Concept ACK

    I looked this over and also didn't find any cases where locking recursion would be required, but the change seems brittle to me. Not sure if just changing these is the best way to handle this.

  5. w0xlt force-pushed on Mar 15, 2026
  6. w0xlt commented at 7:39 PM on March 15, 2026: contributor

    Thanks @sedited for the feedback.

    The original PR intentionally included only the first commit to avoid non-trivial refactoring and focus on the RecursiveMutex conversion. After your concerns that this minimal approach might be brittle, I took a second pass.

    A broader refactor seems justified: both TxRelay and net_processing benefit from clearer separation of concerns and improved modularization. The main changes are:

    • TxRelay is now a dedicated component in src/node/txrelay.h, instead of being embedded in src/net_processing.cpp.

    • SendMessages() and related call sites now use intent-named operations (ConsumeSendMempool, SetNextInvSendTime, TxInventoryKnownInsert, etc.) rather than mutating fields directly, which reduces local complexity.

    • Locking contracts are now explicit at API boundaries (EXCLUSIVE_LOCKS_REQUIRED, LOCK_RETURNED), improving static thread-safety checking and reviewability.

    • State transitions/invariants are centralized in TxRelay methods (PushInventory, VerifyInventoryPristine, ClearTxInventoryToSendIfNoRelayTxs), reducing the risk of missed field updates.

    • Lock-order reasoning is clearer because lock acquisition sites are now structured and annotated rather than scattered raw member-lock uses.

    • Future changes to TxRelay internals (containers/fields) should require fewer net-processing edits, since call sites depend more on behavior methods than storage details.

  7. hebasto commented at 4:12 PM on March 16, 2026: member

    Replace m_bloom_filter_mutex and m_tx_inventory_mutex in Peer::TxRelay from RecursiveMutex to Mutex.

    Concept ACK.

  8. theuni commented at 10:58 PM on March 18, 2026: member

    Concept ACK.

    Lock-order reasoning is clearer because lock acquisition sites are now structured and annotated rather than scattered raw member-lock uses.

    See also the ACQUIRED_BEFORE/ACQUIRED_AFTER annotations which may be useful when there's an established ordering that should be followed.

    (Note that unless -Wthread-safety-beta is used, clang 22 is required to actually see the warnings: https://github.com/llvm/llvm-project/pull/152853)

  9. fanquake commented at 6:26 AM on March 19, 2026: member

    (Note that unless -Wthread-safety-beta is used, clang 22 is required to actually see the warnings: https://github.com/llvm/llvm-project/pull/152853)

    (also note that we currently use Clang 22.1.2 in the CI)

  10. in src/node/txrelay.h:27 in 8ed9eb0b24
      22 | +/**
      23 | + * Per-peer state for transaction relay. Manages bloom filter and
      24 | + * transaction inventory data, each protected by its own mutex.
      25 | + */
      26 | +struct TxRelay {
      27 | +    mutable Mutex m_bloom_filter_mutex;
    


    sedited commented at 12:11 PM on March 20, 2026:

    Can we make these private now? m_tx_inventory_mutex is still annotated outside this class, but I think that annotation can be dropped now?


    w0xlt commented at 9:41 PM on March 24, 2026:

    Done. Thanks.

  11. sedited commented at 12:58 PM on March 20, 2026: contributor

    Approach ACK

    I think I prefer this over a bare bones mutex type change. The changes in the first four commits are pretty straight forward and make sense to me. The last commit is a bit more involved, but git color coding the moved bits does help a bit. I'd be curious to see what others think.

  12. w0xlt force-pushed on Mar 24, 2026
  13. w0xlt commented at 9:42 PM on March 24, 2026: contributor

    ACQUIRED_BEFORE/ACQUIRED_AFTER annotations added. Tested with clang 18 and -Wthread-safety-beta.

  14. in src/net_processing.cpp:38 in 7ccd50834a
      34 | @@ -35,6 +35,7 @@
      35 |  #include <node/protocol_version.h>
      36 |  #include <node/timeoffsets.h>
      37 |  #include <node/txdownloadman.h>
      38 | +#include <node/txrelay.h>
    


    sedited commented at 12:13 PM on May 3, 2026:

    Nit (include order): This should go between txreconciliation and warnings.


    w0xlt commented at 7:01 PM on May 17, 2026:

    Done. Thanks.

  15. in src/node/txrelay.h:219 in 475366b55a outdated
     214 | +    /** Queue a transaction for relay if the version handshake is complete
     215 | +     *  and the peer doesn't already know about it. */
     216 | +    void PushInventory(const uint256& hash, const Wtxid& wtxid) EXCLUSIVE_LOCKS_REQUIRED(!m_tx_inventory_mutex)
     217 | +    {
     218 | +        LOCK(m_tx_inventory_mutex);
     219 | +        if (m_next_inv_send_time == 0s) return;
    


    sedited commented at 12:23 PM on May 3, 2026:

    I'm getting a clang tidy warning here (and for the other one):

    /home/user/bitcoin/src/node/txrelay.h:123:38: error: no matching literal operator for call to 'operator""s' with argument of type 'unsigned long long' or 'const char *', and no matching literal operator template [clang-diagnostic-error]
      123 |         if (m_next_inv_send_time == 0s) return;
          |                                      ^
    

    Can you do using namespace std::chrono_literals; like we do in random.h?


    w0xlt commented at 7:01 PM on May 17, 2026:

    Done. Thanks.

  16. in src/node/txrelay.h:80 in 475366b55a outdated
      77 | @@ -77,10 +78,16 @@ struct TxRelay {
      78 |          return true;
      79 |      }
      80 |  
    


    sedited commented at 12:47 PM on May 3, 2026:

    In commit 475366b55a9d8010ab81e82d7c5f4d0e95803bbf

    Can we make all the data fields private then?


    w0xlt commented at 7:01 PM on May 17, 2026:

    Done. Thanks.

  17. sedited approved
  18. sedited commented at 1:26 PM on May 3, 2026: contributor

    ACK 475366b55a9d8010ab81e82d7c5f4d0e95803bbf

    I'm not sure about the last commit. It is a bit noisy, but I also like the additional assurances and encapsulation.

  19. DrahtBot requested review from hebasto on May 3, 2026
  20. DrahtBot requested review from theuni on May 3, 2026
  21. w0xlt force-pushed on May 17, 2026
  22. sedited approved
  23. sedited commented at 10:28 AM on May 17, 2026: contributor

    re-ACK 67527c70975e1c3bb4918ba0ebeb4313d8c741be

  24. pablomartin4btc commented at 5:47 PM on June 7, 2026: member

    Concept ACK. The first commits look straightforward (as said earlier), and commit 5 seems to preserve the existing lock order while moving the remaining TxRelay state access in SendMessages() behind annotated helpers.

    Non-blocking/ follow-up idea: after this PR, it may be worth considering a small refactor of the tx-relay section of SendMessages(). The code could maybe be split into a few conceptual helpers for periodic relay scheduling, BIP35 mempool responses, and regular tx-inventory relay selection. I would not suggest perhaps one helper per comment/ filter condition, but grouping those areas into conceptual helpers could make SendMessages() cleaner and easier to follow, while keeping the change behaviour-preserving.

  25. pablomartin4btc commented at 5:17 AM on June 12, 2026: member

    ACK https://github.com/bitcoin/bitcoin/commit/67527c70975e1c3bb4918ba0ebeb4313d8c741be.

    Checked both m_bloom_filter_mutex and m_tx_inventory_mutex in Peer::TxRelay aren't acquired recursively, makes sense to move them into non-recursive mutexes (the 18 lock sites mentioned in the description are enumerated in the first commit body for reference).

    The PR introduces ACQUIRED_BEFORE(m_tx_inventory_mutex) on m_bloom_filter_mutex (and ACQUIRED_AFTER on the other side) - as suggested. This establishes a lock ordering that didn't exist before, that ordering annotation becomes load-bearing for deadlock prevention, not just documentation.

    Also, reviewed the replacement of direct member access to TxRelay internals with accessor methods. Members are now private — the only access path is through accessors that either self-lock (negative capability !mutex) or require the caller to hold the lock, with Clang TSA enforcing both at compile time.

  26. in src/net_processing.cpp:928 in 05b3bbcb0e outdated
     924 | @@ -925,7 +925,7 @@ class PeerManagerImpl final : public PeerManager
     925 |  
     926 |      /** Determine whether or not a peer can request a transaction, and return it (or nullptr if not found or not allowed). */
     927 |      CTransactionRef FindTxForGetData(const Peer::TxRelay& tx_relay, const GenTxid& gtxid)
     928 | -        EXCLUSIVE_LOCKS_REQUIRED(!m_most_recent_block_mutex, !tx_relay.m_tx_inventory_mutex);
     929 | +        EXCLUSIVE_LOCKS_REQUIRED(!m_most_recent_block_mutex, !tx_relay.GetTxInventoryMutex());
    


    maflcko commented at 11:19 AM on June 12, 2026:

    05b3bbcb0e2b7d66a8107c26b837cb68f4eb5dc0: idk, what is the point of this pull request?

    Just changing a RecursiveMutex to a Mutex without proper TSA doesn't make sense, imo. This just risks unsafe code down the line in the future.

    This pull is now moving locking to happen in different files/classes, and still exposes the mutex field globally, just like before.

    I don't see how this makes anything safer/easier.

    Either, this should be done properly or not at all. I know this isn't easy, which is probably why it hasn't been done for such a long time, but that shouldn't be an excuse to do it improperly.


    w0xlt commented at 8:56 AM on June 13, 2026:

    Updated. Direct TxRelay data access is gone and state changes now go through TSA-annotated helpers. The remaining mutex accessors are kept for the existing SendMessages() batching/lock-order annotations.

  27. in src/node/txrelay.h:78 in 05b3bbcb0e
      73 | +    template <typename Callable>
      74 | +    bool WithBloomFilterIfSet(Callable&& callable) EXCLUSIVE_LOCKS_REQUIRED(!m_bloom_filter_mutex)
      75 | +    {
      76 | +        LOCK(m_bloom_filter_mutex);
      77 | +        if (!m_bloom_filter) return false;
      78 | +        callable(*m_bloom_filter);
    


    maflcko commented at 11:20 AM on June 12, 2026:

    05b3bbcb0e2b7d66a8107c26b837cb68f4eb5dc0: Is this pattern even safe?

    Does clang TSA detect if callable were to lock the mutex and thus lead to UB?


    w0xlt commented at 8:56 AM on June 13, 2026:

    Done. Removed this callback pattern. The filtered-block path now uses TxRelay::MakeMerkleBlock() instead.


    maflcko commented at 10:30 AM on June 19, 2026:

    I think the pattern could work, but only if the mutex getter is not defined and only exists to provide compile-time annotations: #19303 (comment)


    w0xlt commented at 7:29 AM on June 20, 2026:

    Thanks, that makes sense. The current version no longer has the public mutex getter or the callback-under-lock pattern. The filtered block path now uses TxRelay::MakeMerkleBlock(), and the mutexes are not exposed, so I think this concern is now addressed.

  28. maflcko commented at 11:21 AM on June 12, 2026: member
  29. w0xlt force-pushed on Jun 13, 2026
  30. w0xlt commented at 8:56 AM on June 13, 2026: contributor

    Update after the latest changes:

    The specific callback/TSA concern raised above has been addressed: WithBloomFilterIfSet() was removed, and the filtered-block path now uses TxRelay::MakeMerkleBlock(), so CMerkleBlock construction happens directly under TxRelay's bloom-filter mutex without passing an arbitrary callable while the mutex is held.

    The broader goal here is not just “remove RecursiveMutex” for its own sake. The intent is to make the locking logic easier to follow. Compared to the original minimal mutex-type change, the current branch now:

    • moves TxRelay into its own component,
    • makes the guarded data private,
    • routes state transitions through named, TSA-annotated methods,
    • documents/enforces the m_tx_inventory_mutex -> m_bloom_filter_mutex lock order,
    • adds unit coverage and a fuzz target for the extracted TxRelay behavior.

    There is still some intentional mutex exposure through GetBloomFilterMutex() / GetTxInventoryMutex() because SendMessages() currently batches work while holding locks across multiple relay-policy steps.

    I think keeping that structure avoids a larger behavioral refactor in this PR, while still making the existing locking more explicit and reviewable.

    This should make future follow-ups easier, since TxRelay state transitions are now centralized, annotated, and independently tested.

  31. w0xlt commented at 1:24 AM on June 14, 2026: contributor

    Regarding the new txrelay fuzz target, it exercises the public TxRelay API while tracking the expected relay state and checking TxRelay against it after each operation.

    It covers bloom-filter lifecycle, merkle-block creation, known-inventory handling, queued announcement drain/erase paths, BIP35 mempool one-shot handling, fee-filter, and sequence bookkeeping while accounting for rolling-bloom false positives.

  32. maflcko commented at 7:55 AM on June 15, 2026: member

    The broader goal here is not just “remove RecursiveMutex” for its own sake. The intent is to make the locking logic easier to follow. Compared to the original minimal mutex-type change, the current branch now:

    * moves `TxRelay` into its own component,
    
    * makes the guarded data private,
    
    * routes state transitions through named, TSA-annotated methods,
    
    * documents/enforces the `m_tx_inventory_mutex -> m_bloom_filter_mutex` lock order,
    
    * adds unit coverage and a fuzz target for the extracted `TxRelay` behavior.

    There is still some intentional mutex exposure through GetBloomFilterMutex() / GetTxInventoryMutex() because SendMessages() currently batches work while holding locks across multiple relay-policy steps.

    Yeah, and those aren't check with Clang TSA, are they? I am just not convinced that moving to a non-recursive mutex is useful, when it is less safe and requires manual review to avoid UB.

    All of the changes you list have nothing to do with the mutex change. It is possible to do all of the changes independently while keeping the recursive mutex.

    Some of the changes are required to make the mutex non-recursive, but I don't think they are sufficient. IIUC making the data and the mutex private (and not exposing references to them) are required changes. However, by exposing references to them in a public getter, those changes are basically undone.

    I know this pattern may not have been followed in the past (in #19303 related pulls and elsewhere), but I think there is value in first defining a set of possible safe code patterns, and then apply them consistently and consequently. IIUC the pattern is that non-recursive mutexes can be used for fully private fields and that recursive mutexes should be used for globals/public fields? If so, then I think mixing the patterns does not make sense.

  33. w0xlt force-pushed on Jun 16, 2026
  34. w0xlt renamed this:
    net: refactor: replace Peer::TxRelay RecursiveMutex instances with Mutex
    net: encapsulate TxRelay state and replace recursive mutexes
    on Jun 16, 2026
  35. DrahtBot added the label CI failed on Jun 16, 2026
  36. DrahtBot commented at 7:58 PM on June 16, 2026: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task test ancestor commits: https://github.com/bitcoin/bitcoin/actions/runs/27640198801/job/81737890148</sub> <sub>LLM reason (✨ experimental): CI failed at compile time because src/test/txrelay_tests.cpp references private members of node::TxRelay (e.g., m_tx_inventory_mutex, m_next_inv_send_time, m_send_mempool), causing clang++ errors.</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  37. refactor: extract node TxRelay helpers
    Move the per-peer transaction relay state and helper methods out of
    net_processing.cpp into a new node::TxRelay type.
    
    Keep the existing locking model and behavior intact while giving the
    transaction relay code a single home. Net processing still owns
    the relay flow. This commit only moves the state, bloom-filter
    helpers, and inventory helpers so later commits can tighten the
    interface without mixing in behavior changes.
    fea38e8583
  38. test: add node TxRelay unit tests
    Add focused unit coverage for node::TxRelay's public behavior:
    bloom-filter lifecycle, inventory gating before the first send schedule,
    known-inventory deduplication, state accessors, and BIP37 merkle block
    creation.
    
    These tests document behavior preserved by the extraction and provide
    guardrails for later private-state and locking changes.
    29e266efe9
  39. fuzz: add node TxRelay target
    Add a fuzz target that exercises TxRelay's public operations in random
    order: relay flag changes, bloom-filter updates, known transaction
    inserts, send scheduling, inventory queueing, and fee-filter state.
    
    Seed the test random context because CRollingBloomFilter uses global
    random state. This keeps the target deterministic for a given fuzz
    input.
    66d60245ba
  40. refactor: snapshot TxRelay inventory in SendMessages
    Move the remaining SendMessages transaction-inventory access behind
    
    TxRelay methods.
    
    StartTxInventoryBatch() snapshots queued inventory while holding the
    
    TxRelay mutex, lets SendMessages drain the batch without holding that
    
    mutex, and ReturnTxInventory() merges unsent entries back into the live
    
    queue.
    
    This removes the remaining direct TxRelay state access from
    
    net_processing.cpp and makes the guarded TxRelay data private without
    
    changing intended relay behavior. Update the existing TxRelay unit tests
    
    to use the new public interface at the same point in history.
    d4abaf576b
  41. test: cover TxRelay inventory batches
    Update TxRelay tests and fuzz coverage for StartTxInventoryBatch() and
    
    ReturnTxInventory().
    
    The tests cover send-time scheduling, disabled-relay clearing, moving
    
    queued inventory into a batch, and returning unprocessed entries to the
    
    live queue.
    a50712d8dd
  42. refactor: hide TxRelay inventory batch storage
    Make TxInventoryBatch move-only and hide its backing std::set from
    net_processing.cpp.
    
    Expose semantic operations for batch state, queued candidate iteration,
    and candidate erasure. ReturnTxInventory() now takes the batch object
    itself so callers cannot depend on or move out its storage.
    0aa10d85aa
  43. test: cover opaque TxRelay inventory batches
    Update the fuzz target and unit tests to use the opaque TxInventoryBatch
    API.
    
    Add conservation checks for repeated snapshot/drain rounds and
    concurrent PushInventory() while a batch is being processed.
    50692ebcf3
  44. refactor: use Mutex in TxRelay
    Replace TxRelay's RecursiveMutex members with plain Mutex now that
    callers no longer take those locks directly.
    
    Annotate the bloom-filter mutex as acquired after the inventory mutex.
    This matches the only nested locking path in StartTxInventoryBatch(),
    makes the lock order explicit, and prevents accidental recursive locking
    from creeping back in.
    d88df07a82
  45. w0xlt force-pushed on Jun 17, 2026
  46. w0xlt commented at 8:16 PM on June 17, 2026: contributor

    Thanks for the feedback. The concern is valid.

    The original goal of this PR was fairly small, just replacing those variables with Mutex, and I was trying to keep the change scoped and avoid a larger TxRelay refactor.

    To address the mutex exposure issue, I reworked the branch so that encapsulation comes first, with the mutex replacement as the final step.

    The public mutex accessors have been removed, the guarded state is now private, and net_processing.cpp no longer locks or mutates TxRelay internals directly.

    The SendMessages() path now uses an opaque TxInventoryBatch to snapshot and retrieve queued inventory from TxRelay.

    I updated the PR title and description to better reflect the current scope. The PR has evolved beyond a simple mutex replacement and now introduces a new networking component.

  47. w0xlt commented at 8:16 PM on June 17, 2026: contributor

    Unrelated CI error

  48. maflcko removed the label CI failed on Jun 19, 2026
  49. DrahtBot added the label CI failed on Jun 19, 2026
  50. DrahtBot commented at 10:56 AM on June 19, 2026: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task test ancestor commits: https://github.com/bitcoin/bitcoin/actions/runs/27707284284/job/82331997608</sub> <sub>LLM reason (✨ experimental): CI failed to build the fuzz target because src/test/fuzz/txrelay.cpp tries to access private node::TxRelay members (m_tx_inventory_mutex, m_send_mempool, m_next_inv_send_time), producing compilation errors under -Werror.</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-21 02:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me