net: make m_nodes_mutex non-recursive #32394

pull vasild wants to merge 2 commits into bitcoin:master from vasild:m_nodes_mutex changing 2 files +69 −49
  1. vasild commented at 12:04 pm on May 1, 2025: contributor

    The only case of a recursive lock was a nested ForNode() call to trim the size of lNodesAnnouncingHeaderAndIDs to <= 3. This need not be nested, so take it out.

    Before:

    0fornode(newnode)
    1    if (size >= 3)
    2        fornode(front) handle removal of front
    3        pop front
    4    push back newnode
    

    After:

    0fornode(newnode)
    1    push back newnode
    2if (size > 3)
    3    fornode(front) handle removal of front
    4    pop front
    

    lNodesAnnouncingHeaderAndIDs is protected by cs_main which is locked during the entire operation.

    Partially resolves: #19303


    This PR included #32326 (first 3 commits in this PR). That PR was merged first, so the size of this was reduced.

  2. DrahtBot commented at 12:04 pm on May 1, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32394.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK hodlinator
    Concept ACK hebasto

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32983 (rpc: refactor: use string_view in Arg/MaybeArg by stickies-v)
    • #32278 (doc: better document NetEventsInterface and the deletion of “CNode"s by vasild)
    • #32065 (i2p: make a time gap between creating transient sessions and using them by vasild)
    • #32015 (net: replace manual reference counting of CNode with shared_ptr by vasild)
    • #30988 (Split CConnman by vasild)
    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
    • #28463 (p2p: Increase inbound capacity for block-relay only connections by mzumsande)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on May 1, 2025
  4. Bonobirho99 commented at 1:21 pm on May 1, 2025: none
    .
  5. hebasto commented at 2:18 pm on May 1, 2025: member

    Partially resolves: #19303

    Concept ACK on that.

  6. DrahtBot added the label Needs rebase on May 20, 2025
  7. vasild force-pushed on May 21, 2025
  8. vasild commented at 11:33 am on May 21, 2025: contributor
    98bba932bf...22d4c57a99: rebase due to conflicts
  9. DrahtBot removed the label Needs rebase on May 21, 2025
  10. DrahtBot added the label CI failed on Jul 30, 2025
  11. DrahtBot removed the label CI failed on Jul 30, 2025
  12. DrahtBot added the label Needs rebase on Sep 17, 2025
  13. vasild force-pushed on Sep 29, 2025
  14. vasild commented at 1:00 pm on September 29, 2025: contributor
    22d4c57a99...9a49877f26: rebase due to conflicts
  15. DrahtBot removed the label Needs rebase on Sep 29, 2025
  16. DrahtBot added the label Needs rebase on Oct 1, 2025
  17. hebasto commented at 10:53 am on October 2, 2025: member
    Time to rebase once more?
  18. vasild force-pushed on Oct 2, 2025
  19. vasild commented at 10:59 am on October 2, 2025: contributor
    9a49877f26...aaa75c1a41: rebase due to conflicts, also remove the first 3 commits from where because they were merged via #32326. Now this PR is just one commit. Also clarify a bit the commit message and the PR description.
  20. DrahtBot removed the label Needs rebase on Oct 2, 2025
  21. DrahtBot added the label Needs rebase on Oct 6, 2025
  22. vasild force-pushed on Oct 7, 2025
  23. vasild commented at 9:08 am on October 7, 2025: contributor
    aaa75c1a41...492be23a18: rebase due to conflicts
  24. DrahtBot removed the label Needs rebase on Oct 7, 2025
  25. in src/net_processing.cpp:1288 in 492be23a18 outdated
    1284@@ -1285,23 +1285,23 @@ void PeerManagerImpl::MaybeSetPeerAsAnnouncingHeaderAndIDs(NodeId nodeid)
    1285     }
    1286     m_connman.ForNode(nodeid, [this](CNode* pfrom) EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
    1287         AssertLockHeld(::cs_main);
    1288-        if (lNodesAnnouncingHeaderAndIDs.size() >= 3) {
    


    l0rinc commented at 6:03 pm on October 9, 2025:
    there are a lot of changes in a single commit that seem not strictly related. Could you maybe extract low-risk refactorings to separate commits so that the more riskier ones are as simple as possible? I prefer the commits telling me a story for why we’re doing the change, but here I see a lot of changes that aren’t guiding me.

    hodlinator commented at 7:31 pm on October 14, 2025:

    Tried cherrypicking

     0diff --git a/src/net.h b/src/net.h
     1index d806b42a1e..f997bbe820 100644
     2--- a/src/net.h
     3+++ b/src/net.h
     4@@ -1519,7 +1519,7 @@ private:
     5     mutable Mutex m_added_nodes_mutex;
     6     std::vector<CNode*> m_nodes GUARDED_BY(m_nodes_mutex);
     7     std::list<CNode*> m_nodes_disconnected;
     8-    mutable RecursiveMutex m_nodes_mutex;
     9+    mutable Mutex m_nodes_mutex;
    10     std::atomic<NodeId> nLastNodeId{0};
    11     unsigned int nPrevNodeCount{0};
    12 
    13diff --git a/src/net_processing.cpp b/src/net_processing.cpp
    14index 3777832215..20dcca2ca7 100644
    15--- a/src/net_processing.cpp
    16+++ b/src/net_processing.cpp
    17@@ -1285,23 +1285,23 @@ void PeerManagerImpl::MaybeSetPeerAsAnnouncingHeaderAndIDs(NodeId nodeid)
    18     }
    19     m_connman.ForNode(nodeid, [this](CNode* pfrom) EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
    20         AssertLockHeld(::cs_main);
    21-        if (lNodesAnnouncingHeaderAndIDs.size() >= 3) {
    22-            // As per BIP152, we only get 3 of our peers to announce
    23-            // blocks using compact encodings.
    24-            m_connman.ForNode(lNodesAnnouncingHeaderAndIDs.front(), [this](CNode* pnodeStop){
    25-                MakeAndPushMessage(*pnodeStop, NetMsgType::SENDCMPCT, /*high_bandwidth=*/false, /*version=*/CMPCTBLOCKS_VERSION);
    26-                // save BIP152 bandwidth state: we select peer to be low-bandwidth
    27-                pnodeStop->m_bip152_highbandwidth_to = false;
    28-                return true;
    29-            });
    30-            lNodesAnnouncingHeaderAndIDs.pop_front();
    31-        }
    32         MakeAndPushMessage(*pfrom, NetMsgType::SENDCMPCT, /*high_bandwidth=*/true, /*version=*/CMPCTBLOCKS_VERSION);
    33         // save BIP152 bandwidth state: we select peer to be high-bandwidth
    34         pfrom->m_bip152_highbandwidth_to = true;
    35         lNodesAnnouncingHeaderAndIDs.push_back(pfrom->GetId());
    36         return true;
    37     });
    38+    if (lNodesAnnouncingHeaderAndIDs.size() > 3) {
    39+        // As per BIP152, we only get 3 of our peers to announce
    40+        // blocks using compact encodings.
    41+        m_connman.ForNode(lNodesAnnouncingHeaderAndIDs.front(), [this](CNode* pnodeStop){
    42+            MakeAndPushMessage(*pnodeStop, NetMsgType::SENDCMPCT, /*high_bandwidth=*/false, /*version=*/CMPCTBLOCKS_VERSION);
    43+            // save BIP152 bandwidth state: we select peer to be low-bandwidth
    44+            pnodeStop->m_bip152_highbandwidth_to = false;
    45+            return true;
    46+        });
    47+        lNodesAnnouncingHeaderAndIDs.pop_front();
    48+    }
    49 }
    50 
    51 bool PeerManagerImpl::TipMayBeStale()
    

    …and building with Clang 19.1.7 on Linux results in a lot of warnings spam:

     0In file included from ../src/node/peerman_args.cpp:5:
     1In file included from ../src/node/peerman_args.h:8:
     2In file included from ../src/net_processing.h:10:
     3../src/net.h:1186:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
     4 1186 |         LOCK(m_nodes_mutex);
     5      |         ^
     6../src/sync.h:259:56: note: expanded from macro 'LOCK'
     7  259 | #define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
     8      |                                                        ^
     9In file included from ../src/node/peerman_args.cpp:5:
    10In file included from ../src/node/peerman_args.h:8:
    11In file included from ../src/net_processing.h:10:
    12../src/net.h:1195:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
    13 1195 |         LOCK(m_nodes_mutex);
    14      |         ^
    15../src/sync.h:259:56: note: expanded from macro 'LOCK'
    16  259 | #define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
    17      |                                                        ^
    182 warnings generated.
    19[3/26] Building CXX object src/CMakeFiles/bitcoin_node.dir/node/txreconciliation.cpp.o
    20In file included from ../src/node/txreconciliation.cpp:5:
    21In file included from ../src/node/txreconciliation.h:8:
    22../src/net.h:1186:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
    23 1186 |         LOCK(m_nodes_mutex);
    24      |         ^
    25../src/sync.h:259:56: note: expanded from macro 'LOCK'
    26  259 | #define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
    27      |                                                        ^
    28In file included from ../src/node/txreconciliation.cpp:5:
    29In file included from ../src/node/txreconciliation.h:8:
    30../src/net.h:1195:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
    31 1195 |         LOCK(m_nodes_mutex);
    32      |         ^
    33../src/sync.h:259:56: note: expanded from macro 'LOCK'
    34  259 | #define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
    35      |                                                        ^
    362 warnings generated.
    37[4/26] Building CXX object src/CMakeFiles/bitcoin_node.dir/mapport.cpp.o
    38In file included from ../src/mapport.cpp:12:
    39../src/net.h:1186:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
    40 1186 |         LOCK(m_nodes_mutex);
    41      |         ^
    42../src/sync.h:259:56: note: expanded from macro 'LOCK'
    43  259 | #define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
    44      |                                                        ^
    45In file included from ../src/mapport.cpp:12:
    46../src/net.h:1195:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
    47 1195 |         LOCK(m_nodes_mutex);
    48      |         ^
    49../src/sync.h:259:56: note: expanded from macro 'LOCK'
    50  259 | #define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
    51      |                                                        ^
    522 warnings generated.
    53[5/26] Building CXX object src/CMakeFiles/bitcoin_node.dir/headerssync.cpp.o
    54In file included from ../src/headerssync.cpp:5:
    55In file included from ../src/headerssync.h:11:
    56../src/net.h:1186:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
    57 1186 |         LOCK(m_nodes_mutex);
    58      |         ^
    59../src/sync.h:259:56: note: expanded from macro 'LOCK'
    60  259 | #define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
    61      |                                                        ^
    62In file included from ../src/headerssync.cpp:5:
    63In file included from ../src/headerssync.h:11:
    64../src/net.h:1195:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
    65 1195 |         LOCK(m_nodes_mutex);
    66      |         ^
    67../src/sync.h:259:56: note: expanded from macro 'LOCK'
    68  259 | #define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
    69      |                                                        ^
    702 warnings generated.
    71[6/26] Building CXX object src/CMakeFiles/bitcoin_node.dir/node/context.cpp.o
    72In file included from ../src/node/context.cpp:13:
    73../src/net.h:1186:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
    74 1186 |         LOCK(m_nodes_mutex);
    75      |         ^
    76../src/sync.h:259:56: note: expanded from macro 'LOCK'
    77  259 | #define LOCK(cs) UniqueLock UNIQUE_NAME(criticalblock)(MaybeCheckNotHeld(cs), #cs, __FILE__, __LINE__)
    78      |                                                        ^
    79In file included from ../src/node/context.cpp:13:
    80../src/net.h:1195:9: warning: calling function 'MaybeCheckNotHeld' requires negative capability '!m_nodes_mutex' [-Wthread-safety-analysis]
    81 1195 |         LOCK(m_nodes_mutex);
    82      |         ^
    83...
    

    So I think it’s quite natural to want to add all the annotations to silence the warnings within the same commit.


    vasild commented at 2:06 pm on October 15, 2025:

    This PR includes:

    1. a change around lNodesAnnouncingHeaderAndIDs to remove the only recursive usage of the mutex
    2. change of the type from RecursiveMutex to Mutex
    3. a pile of annotations to keep the compiler happy after 2.

    Extracted 1. into a separate commit and kept 2. and 3. together in a single commit: “it’s quite natural to want to add all the annotations to silence the warnings within the same commit”

  26. hodlinator commented at 7:39 pm on October 14, 2025: contributor
    Concept ACK 492be23a1873d7165e6cc75d1282f4feb63f87b9
  27. net: drop the only recursive usage of CConnman::m_nodes_mutex
    The only recursive usage of `CConnman::m_nodes_mutex` is from
    `PeerManagerImpl::MaybeSetPeerAsAnnouncingHeaderAndIDs()` which uses
    nested calls to `CConnman::ForNode()` to trim the size of
    `lNodesAnnouncingHeaderAndIDs` to `<= 3`. This need not be nested, so
    take it out.
    
    Before:
    ```
    fornode(newnode)
        if (size >= 3)
            fornode(front) handle removal of front
            pop front
        push back newnode
    ```
    
    After:
    ```
    fornode(newnode)
        push back newnode
    if (size > 3)
        fornode(front) handle removal of front
        pop front
    ```
    
    `lNodesAnnouncingHeaderAndIDs` is protected by `cs_main` which is locked
    during the entire operation.
    de2065b965
  28. vasild force-pushed on Oct 15, 2025
  29. vasild commented at 2:03 pm on October 15, 2025: contributor
    492be23a18...4aad3714d6: split in two commits: #32394 (review)
  30. in src/net_processing.cpp:1297 in 4aad3714d6 outdated
    1292-                MakeAndPushMessage(*pnodeStop, NetMsgType::SENDCMPCT, /*high_bandwidth=*/false, /*version=*/CMPCTBLOCKS_VERSION);
    1293-                // save BIP152 bandwidth state: we select peer to be low-bandwidth
    1294-                pnodeStop->m_bip152_highbandwidth_to = false;
    1295-                return true;
    1296-            });
    1297-            lNodesAnnouncingHeaderAndIDs.pop_front();
    


    hodlinator commented at 3:41 pm on October 15, 2025:

    nit: On master, we will be holding on to the m_nodes_mutex as we start the lookup and removal for this older node at lNodesAnnouncingHeaderAndIDs.front(). This older node can be removed from CConnman::m_nodes just before we execute an outer ForNode (while not holding m_nodes_mutex).

    Don’t think there is any risk to how it works on master or in this PR, we .pop_front() the old entry regardless of whether we find it or not. But might still be nice to add

    0    Assume(lNodesAnnouncingHeaderAndIDs.size() <= 3);
    

    as a post-condition for the function to document/communicate expectations.


    vasild commented at 8:36 am on October 16, 2025:

    Such an Assume() would make for stricter requirements, even in master, or more complex reasoning to confirm that it does not make stricter requirements. I mean, can lNodesAnnouncingHeaderAndIDs have e.g. 5 elements when this code snippet starts? It is not obvious by just looking at this code, so one has to assume that it can or study the code elsewhere to confirm that it cannot have 4 or more.

    So, both in master and in this PR such an Assume() would be triggered if lNodesAnnouncingHeaderAndIDs has e.g. 5 elements.

    The logic in master is - add one object and remove one object if they are too many. This PR keeps it that way.


    hodlinator commented at 8:52 pm on October 18, 2025:

    Agree that one cannot know unless one greps the entire codebase for insertions to lNodesAnnouncingHeaderAndIDs. Currently this is the only function. But worst case, some parallel pull request could introduce another function that inserts more elements than 3. Using a dev-only Assume instead of assert as I suggested would make it less of a hazard.

    Maybe it should be a precondition as well. Anyways, just a minor suggestion to attempt making existing tests against number 3 less magical.

  31. in src/net.h:1736 in 4aad3714d6


    hodlinator commented at 6:41 pm on October 15, 2025:

    Shouldn’t we also annotate this one with

    0EXCLUSIVE_LOCKS_REQUIRED(!connman.m_nodes_mutex)
    

    ? …which leads to also annotating ThreadMessageHandler.


    vasild commented at 8:42 am on October 16, 2025:
    Yes, good catch! Done.
  32. hodlinator commented at 6:51 pm on October 15, 2025: contributor

    Reviewed 4aad3714d66cc80c08cb1c827fb49fba7dcf8d50

    All annotations make sense, but left Q regarding 1-2 potentially missing ones.

  33. net: make CConnman::m_nodes_mutex non-recursive
    This change includes `s/RecursiveMutex/Mutex/` and a pile of
    annotations to keep the compiler happy after the type change.
    
    Partially resolves: https://github.com/bitcoin/bitcoin/issues/19303
    4b3a2c2360
  34. vasild force-pushed on Oct 16, 2025
  35. vasild commented at 8:42 am on October 16, 2025: contributor
    4aad3714d6...4b3a2c2360: do #32394 (review)
  36. hodlinator approved
  37. hodlinator commented at 9:00 pm on October 18, 2025: contributor

    ACK 4b3a2c23608e709a63b9d5bd69c3eb16cf08e462

    Manually went through all functions using CConnman::m_nodes_mutex and only remaining issue I could find is now fixed (https://github.com/bitcoin/bitcoin/pull/32394/files#r2433600255).

    Also checked out first commit and only cherrypicked mutex type change to re-confirm Clang spews warnings about missing annotations (and no warnings at HEAD of PR).

  38. DrahtBot requested review from hebasto on Oct 18, 2025
  39. DrahtBot added the label Needs rebase on Oct 24, 2025
  40. DrahtBot commented at 7:06 pm on October 24, 2025: contributor
    🐙 This pull request conflicts with the target branch and needs rebase.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-10-31 09:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me