net: replace manual reference counting of CNode with shared_ptr #32015

pull vasild wants to merge 1 commits into bitcoin:master from vasild:cnode_shared_ptr changing 11 files +191 −258
  1. vasild commented at 9:53 am on March 7, 2025: contributor

    Before this change the code used to count references to CNode objects manually via CNode::nRefCount. Unneeded CNodes were scheduled for deletion by putting them in CConnman::m_nodes_disconnected and were deleted after their reference count reached zero. Deleting consists of calling PeerManager::FinalizeNode() and destroying the CNode object.

    Replace this scheme with std::shared_ptr. This simplifies the code and removes: CNode::nRefCount CNode::GetRefCount() CNode::AddRef() CNode::Release() CConnman::m_nodes_disconnected CConnman::NodesSnapshot

    Now creating a snapshot of CConnman::m_nodes is done by simply copying it (under the mutex).

    Call PeerManager::FinalizeNode() from the destructor of CNode, which is called when the reference count reaches 0.


    This has some history in #28222 and #10738. See below #32015 (comment) and #32015 (comment).

  2. DrahtBot commented at 9:53 am on March 7, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32015.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK hodlinator

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32073 (net: Block v2->v1 transport downgrade if !fNetworkActive by hodlinator)
    • #32065 (i2p: make a time gap between creating transient sessions and using them by vasild)
    • #32061 ([draft] Replace libevent with our own HTTP and socket-handling implementation by pinheadmz)
    • #31672 (rpc: add cpu_load to getpeerinfo by vasild)
    • #30988 (Split CConnman by vasild)
    • #30951 (net: option to disallow v1 connection on ipv4 and ipv6 peers by stratospher)
    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
    • #28584 (Fuzz: extend CConnman tests by vasild)
    • #28463 (p2p: Increase inbound capacity for block-relay only connections by mzumsande)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Mar 7, 2025
  4. vasild commented at 9:53 am on March 7, 2025: contributor
  5. vasild commented at 9:55 am on March 7, 2025: contributor
    cc @hodlinator, you might be interested in this as well
  6. fanquake commented at 10:13 am on March 7, 2025: member

    https://github.com/bitcoin/bitcoin/actions/runs/13718139296/job/38367326902?pr=32015#step:7:1603:

    0Run p2p_headers_presync with args ['/Users/runner/work/bitcoin/bitcoin/ci/scratch/build-aarch64-apple-darwin23.6.0/src/test/fuzz/fuzz', PosixPath('/Users/runner/work/bitcoin/bitcoin/ci/scratch/qa-assets/fuzz_corpora/p2p_headers_presync')]net_processing.cpp:295 SetTxRelay: Assertion `!m_tx_relay' failed.
    1Error processing input "/Users/runner/work/bitcoin/bitcoin/ci/scratch/qa-assets/fuzz_corpora/p2p_headers_presync/c12f87158cda99b4fb2e1fe7312af392a1f4634a"
    2
    3net_processing.cpp:295 SetTxRelay: Assertion `!m_tx_relay' failed.
    4Error processing input "/Users/runner/work/bitcoin/bitcoin/ci/scratch/qa-assets/fuzz_corpora/p2p_headers_presync/c12f87158cda99b4fb2e1fe7312af392a1f4634a"
    
  7. in src/net.cpp:1935 in 2747f135be outdated
    1941+
    1942+                // Keep a reference to this CNode object to delay its destruction until
    1943+                // after m_nodes_mutex has been released. Destructing a node involves
    1944+                // calling m_msgproc->FinalizeNode() which acquires cs_main. Lock order
    1945+                // should be cs_main, m_nodes_mutex.
    1946+                disconnected_nodes.push_back(node);
    


    dergoegge commented at 11:03 am on March 7, 2025:

    Now that ~CNode is responsible for calling FinalizeNode, the guarantees around which thread does the actual cleanup are lost. It’s basically just whichever thread ends up holding the last instance of a CNode. This seems bug prone.

    E.g. there are no guarantees (afaict) that disconnected_nodes will actually be the last owner.


    vasild commented at 1:29 pm on March 7, 2025:

    The idea is that it does not matter which thread will destroy the CNode objects.

    there are no guarantees (afaict) that disconnected_nodes will actually be the last owner.

    That is right, but there is no need for such a guarantee. disconnected_nodes is to guarantee that the objects will not be destroyed while holding m_nodes_mutex.


    hodlinator commented at 11:06 pm on March 7, 2025:

    In thread #32015 (review):

    Maybe one could have an additional vector<shared_ptr<CNode>> member in CConnman beyond m_nodes that always keeps the shared_ptr::use_count() above 0. Then use removal from that vector (when use_count() == 1) for controlled deletion with enforced lock order, more like how the code was before?


    vasild commented at 8:55 am on March 11, 2025:

    I do not think that is necessary. I mean - if we would assume that the code is going to be modified in a careless way to break the logic, in other words to destroy CNode objects while holding m_nodes_mutex, then we might as well assume that the code could be modified in a way that breaks the manual reference counting. IMO the manual reference counting is more fragile and error prone.

    Further, if the code is changed to destroy CNode objects while holding m_nodes_mutex, then 83 functional tests fail. I think it is safe to assume that such a thing will not go unnoticed.


    hodlinator commented at 8:18 pm on March 11, 2025:

    In thread #32015 (review):

    Started proving out my suggestion and landed on something closer to master.

    • Resurrect m_nodes_disconnected (now holding shared_ptr) along with clearly defined lifetimes.
    • Resurrect DeleteNode(), but make it take an r-value shared_ptr and assert that use_count() == 1.
    • Avoid adding ~CNode() and CNode::m_destruct_cb along with sending it in everywhere in the constructor. This means we don’t need to touch 5 of the test files.

    https://github.com/bitcoin/bitcoin/compare/master...hodlinator:bitcoin:pr/32015_suggestion

    Passes unit tests and functional tests.

  8. DrahtBot added the label CI failed on Mar 7, 2025
  9. DrahtBot commented at 11:14 am on March 7, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/38367328595

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  10. vasild force-pushed on Mar 7, 2025
  11. vasild commented at 1:23 pm on March 7, 2025: contributor
    2747f135be...e35a382388: fix the CI failure, thanks!
  12. vasild force-pushed on Mar 7, 2025
  13. DrahtBot removed the label CI failed on Mar 7, 2025
  14. in src/net.cpp:1888 in 36d932cebb outdated
    1883@@ -1884,8 +1884,15 @@ bool CConnman::AddConnection(const std::string& address, ConnectionType conn_typ
    1884     } // no default case, so the compiler can warn about missing cases
    1885 
    1886     // Count existing connections
    1887-    int existing_connections = WITH_LOCK(m_nodes_mutex,
    1888-                                         return std::count_if(m_nodes.begin(), m_nodes.end(), [conn_type](CNode* node) { return node->m_conn_type == conn_type; }););
    1889+    const int existing_connections = WITH_LOCK(m_nodes_mutex,
    1890+        return std::count_if(
    


    hodlinator commented at 10:24 pm on March 7, 2025:
    nit: Might as well use std::ranges::count_if here and in GetFullOutboundConnCount.

    vasild commented at 4:06 pm on March 10, 2025:
    Indeed, done, thanks!
  15. hodlinator commented at 11:13 pm on March 7, 2025: contributor
    Concept ACK 36d932cebbb235a79e90575960646e86a23d39a8
  16. vasild force-pushed on Mar 10, 2025
  17. vasild commented at 4:07 pm on March 10, 2025: contributor
    36d932cebb...af622d00ba: address #32015 (review)
  18. purpleKarrot commented at 6:24 pm on March 10, 2025: contributor
    Instead of passing around shared pointers to mutable data (aka global variables in disguise), would it be possible to increase the level of abstraction and pass around values with regular semantics? Every C++ programmer should rewatch https://youtu.be/W2tWOdzgXHA from time to time.
  19. hodlinator commented at 7:51 pm on March 10, 2025: contributor

    Every C++ programmer should rewatch https://youtu.be/W2tWOdzgXHA from time to time.

    I remembered that as “the seminal std::rotate()-talk”, but it also has “No Raw Pointers” from 48:45. Probably stoked my hatred of smart pointers which led to #31713. To be fair, in the talk he also says that we used to do intrusive ref-counting… so this is at least one step away from that.

    Instead of passing around shared pointers to mutable data (aka global variables in disguise), would it be possible to increase the level of abstraction and pass around values with regular semantics?

    Guess one could have an InternalCNode which is the actual meat, and make the public CNode instances hold shared_ptr<InternalCNode>. Not sure how well that plays out. Maybe you @purpleKarrot could take a stab at something along the lines of what you pointed to? Keep in mind that we try to minimize the number of lines touched in a change unless necessary.

  20. purpleKarrot commented at 9:08 pm on March 10, 2025: contributor
    After reviewing the code line by line, I realized that the only places where AddRef was ever called, was directly after construction (plus one place in the fuzz tests, not sure why that is needed). It looks like the CNode instances are not actually shared (good news). Could it be passed around by std::unique_ptr instead? This would greatly simplify future refactoring.
  21. in src/net.cpp:1717 in af622d00ba outdated
    1713@@ -1716,7 +1714,7 @@ bool CConnman::AttemptToEvictConnection()
    1714         return false;
    1715     }
    1716     LOCK(m_nodes_mutex);
    1717-    for (CNode* pnode : m_nodes) {
    1718+    for (const auto& pnode : m_nodes) {
    


    hodlinator commented at 10:01 pm on March 10, 2025:

    Not sure about adding non-transitive const here since we set pnode->fDisconnect inside the loop.

    Similar case in DisconnectNodes()-loop.


    vasild commented at 11:55 am on March 11, 2025:
    Removed the const.
  22. dergoegge commented at 10:15 pm on March 10, 2025: member

    realized that the only places where AddRef was ever called, was directly after construction

    It is also called in the RAII helper NodesSnapshot which is used in different threads (i.e. “net” and “msghand”), so unique_ptr won’t work unfortunately.

    plus one place in the fuzz tests, not sure why that is needed

    It’s probably just there to have it “covered”.

  23. in src/test/util/net.h:57 in af622d00ba outdated
    53@@ -54,20 +54,19 @@ struct ConnmanTestMsg : public CConnman {
    54     void AddTestNode(CNode& node)
    55     {
    56         LOCK(m_nodes_mutex);
    57-        m_nodes.push_back(&node);
    58+        m_nodes.push_back(std::shared_ptr<CNode>(&node));
    


    purpleKarrot commented at 10:41 pm on March 10, 2025:

    This looks suspicious. No custom deleter is passed to the shared_ptr constructor, so it defaults to default_delete which will delete ptr; eventually. Is the argument CNode& node guaranteed to be heap allocated? Otherwise:

    0m_nodes.push_back(std::shared_ptr<CNode>(&node, [](CNode*){}));
    

    vasild commented at 10:25 am on March 11, 2025:

    Yes, the argument CNode& node is always heap allocated. Not sure about “guaranteed”. If not heap allocated, then this will be a gross error, resulting in an immediate crash or an error report by Valgrind or a memory sanitizer.

    Note that the semantic of this piece of code is not changed by this PR. It was the same before as well - it would eventually delete the object.

  24. in src/net.cpp:438 in af622d00ba outdated
    435@@ -438,8 +436,7 @@ CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCo
    436                 // It is possible that we already have a connection to the IP/port pszDest resolved to.
    437                 // In that case, drop the connection that was just created.
    438                 LOCK(m_nodes_mutex);
    


    hodlinator commented at 10:52 pm on March 10, 2025:
    Since we aren’t doing anything with pnode anymore, we should be able to remove this LOCK() too, avoiding re-entrant locking inside FindNode().

    vasild commented at 12:00 pm on March 11, 2025:
    Right! Removed.
  25. in src/net.cpp:335 in af622d00ba outdated
    331@@ -332,32 +332,32 @@ bool IsLocal(const CService& addr)
    332     return mapLocalHost.count(addr) > 0;
    333 }
    334 
    335-CNode* CConnman::FindNode(const CNetAddr& ip)
    


    hodlinator commented at 10:55 pm on March 10, 2025:
    It is notable how unsafe the design was for the FindNode()-methods before switching to shared_ptr. Returning a raw pointer as we let go of the mutex protecting it.

    purpleKarrot commented at 11:15 pm on March 10, 2025:

    This is not a raw pointer, but an observer (“Raw pointer” is coined by Sean Parent as “anything that implies ownership”, which explicitly includes shared_ptr. Stricly speaking, changing CNode* to std::shared_ptr<CNode> turns an observer into a raw pointer).

    I actually think that returning an observer is fine here, even after the change.


    hodlinator commented at 8:02 am on March 11, 2025:

    If that is the case I think Parent goes one step too far in redefining terms.

    Returning an raw C pointer is not an okay design, as another thread could potentially come along and delete whatever the returned pointer is referencing, resulting in use-after-free. This is mitigated by FindNode being private and node deletion being done in a controlled fashion on master. But it’s a loaded foot-gun IMO.

    Adding EXCLUSIVE_LOCKS_REQUIRED() would have been a different fix, forcing the caller to protect the returned value with the mutex.

    0CNode* FindNode(...) EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex)
    

    Introducing shared_ptr solves the use-after-free design issue. (Although I agree it would be great to get rid of shared_ptr).


    vasild commented at 12:10 pm on March 11, 2025:
    I agree with @hodlinator that this has lifetime issue in master and in this PR it does not. That issue was also discussed in another PR. Good to resolve it.
  26. purpleKarrot commented at 11:18 pm on March 10, 2025: contributor

    unique_ptr won’t work unfortunately.

    Got it. m_nodes needs to be able to be snapshotted. But that seems to be the only place where copies are made. FindNode should return non-owning observers rather than extend the ownership.

  27. vasild force-pushed on Mar 11, 2025
  28. vasild commented at 11:55 am on March 11, 2025: contributor
    af622d00ba...d233c7e999: address #32015 (review)
  29. vasild force-pushed on Mar 11, 2025
  30. vasild commented at 12:00 pm on March 11, 2025: contributor
    d233c7e999...4492abbf0a: rebase and address #32015 (review)
  31. in src/net.h:1361 in 4492abbf0a outdated
    1354@@ -1363,7 +1355,7 @@ class CConnman
    1355     bool AlreadyConnectedToAddress(const CAddress& addr);
    1356 
    1357     bool AttemptToEvictConnection();
    1358-    CNode* ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type, bool use_v2transport) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
    1359+    std::shared_ptr<CNode> ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type, bool use_v2transport) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
    1360     void AddWhitelistPermissionFlags(NetPermissionFlags& flags, const CNetAddr &addr, const std::vector<NetWhitelistPermissions>& ranges) const;
    1361 
    1362     void DeleteNode(CNode* pnode);
    


    hodlinator commented at 1:01 pm on March 11, 2025:
    Function body was removed.

    vasild commented at 5:07 am on March 12, 2025:
    Removed the declaration as well, thanks!
  32. in src/net.cpp:3478 in 4492abbf0a outdated
    3463     WITH_LOCK(m_nodes_mutex, nodes.swap(m_nodes));
    3464-    for (CNode* pnode : nodes) {
    3465+    for (auto& pnode : nodes) {
    3466         LogDebug(BCLog::NET, "Stopping node, %s", pnode->DisconnectMsg(fLogIPs));
    3467         pnode->CloseSocketDisconnect();
    3468-        DeleteNode(pnode);
    


    hodlinator commented at 1:14 pm on March 11, 2025:
    Seems brittle on master that we just delete the node here without checking the refcount. From what I see we only call StopNodes() from tests and CConnman::Stop(), and we only create NodesSnapshots inside threads we’ve already stopped earlier in Stop().
  33. vasild force-pushed on Mar 12, 2025
  34. vasild commented at 5:06 am on March 12, 2025: contributor
    4492abbf0a...8c0ce1ca1c: address #32015 (review)
  35. Sjors commented at 10:24 am on March 12, 2025: member

    This seems like a nice simplification. It would be good to do before nobody remembers what the old code was trying to do (and prevent). Since @theUni tried something similar before in #10738 it would be great if he can look at this again.

    In particular the 2017 PR claimed “special care must be taken” as to which thread destroys the object, but @vasild believes this doesn’t matter (here): #32015 (review)

    The end result of that PR seems a lot more complicated than what this PR does, and perhaps that’s part of why it didn’t get a huge amount of review and was closed eventually? @willcl-ark then tried again in 2023 in #28222 (initially not knowing about the earlier attempt). Similar concerns about threading were raised: #28222#pullrequestreview-1565029167

    It seems that was never really settled and the PR died.

    It would be good to be address this in the PR description to prevent going in circles.

  36. vasild commented at 1:18 pm on March 12, 2025: contributor

    Some comments from the other PRs:

    from #10738#issue-240283546:

    Special care must be taken, though, to only delete CNodes from a single thread, and to control which thread that is.

    In master deletion of CNode objects could happen in the net thread or in the main thread. In other words, some CNode object might be destroyed in the net thread and another CNode object might be destroyed in the main thread. I do not see any problem with that. Of course we do not destroy the same object from more than one thread, in master or in this PR. With this PR deletion might as well happen in the msghand thread (when destroying the temporary copy of m_nodes in ThreadMessageHandler()). That is fine as well, correct me if I am wrong.

    from #28222#pullrequestreview-1565029167:

    which thread actually deletes the CNode.

    The reason why that is important is because we only want to call FinalizeNode once and we want there to be no other references to that CNode at that time.

    This PR achieves that by calling FinalizeNode() from ~CNode(). Just to clarify the obvious: the destructor is called only once and at the time the destructor is called shared_ptr has ensured that there are no other references to the object. This is simpler and easier to reason about than the code in master which waits manually for the number of references to drop.

  37. vasild commented at 3:48 pm on March 14, 2025: contributor

    Further idea around the second sentence of this:

    #10738#issue-240283546:

    Special care must be taken, though, to only delete CNodes from a single thread, and to control which thread that is. Eventually, this should move to the message processing thread, so that it’s not possible to hit cs_main from the main network thread.

    It is possible to decouple CNode deletion from the FinalizeNode() call and do the latter in the msghand thread regardless of when CNode is deleted. FinalizeNode() only needs 3 properties from CNode, so they can be extracted out from the CNode object and saved to a place which later is looked up by the msghand thread which calls FinalizeNode() on that data.

    Here is a patch that does that:

      0diff --git i/src/net_processing.cpp w/src/net_processing.cpp
      1index 1da3ec9d21..1124036a2a 100644
      2--- i/src/net_processing.cpp
      3+++ w/src/net_processing.cpp
      4@@ -487,13 +487,13 @@ public:
      5         EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex);
      6     void NewPoWValidBlock(const CBlockIndex *pindex, const std::shared_ptr<const CBlock>& pblock) override
      7         EXCLUSIVE_LOCKS_REQUIRED(!m_most_recent_block_mutex);
      8 
      9     /** Implement NetEventsInterface */
     10     void InitializeNode(const CNode& node, ServiceFlags our_services) override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_tx_download_mutex);
     11-    void FinalizeNode(const CNode& node) override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_headers_presync_mutex, !m_tx_download_mutex);
     12+    void FinalizeNode(const NodeFinalizationData& node) override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_headers_presync_mutex, !m_tx_download_mutex);
     13     bool HasAllDesirableServiceFlags(ServiceFlags services) const override;
     14     bool ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt) override
     15         EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_most_recent_block_mutex, !m_headers_presync_mutex, g_msgproc_mutex, !m_tx_download_mutex);
     16     bool SendMessages(CNode* pto) override
     17         EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_most_recent_block_mutex, g_msgproc_mutex, !m_tx_download_mutex);
     18 
     19@@ -1560,80 +1560,78 @@ void PeerManagerImpl::ReattemptInitialBroadcast(CScheduler& scheduler)
     20     // Schedule next run for 10-15 minutes in the future.
     21     // We add randomness on every cycle to avoid the possibility of P2P fingerprinting.
     22     const auto delta = 10min + FastRandomContext().randrange<std::chrono::milliseconds>(5min);
     23     scheduler.scheduleFromNow([&] { ReattemptInitialBroadcast(scheduler); }, delta);
     24 }
     25 
     26-void PeerManagerImpl::FinalizeNode(const CNode& node)
     27+void PeerManagerImpl::FinalizeNode(const NodeFinalizationData& node)
     28 {
     29-    NodeId nodeid = node.GetId();
     30     {
     31     LOCK(cs_main);
     32     {
     33         // We remove the PeerRef from g_peer_map here, but we don't always
     34         // destruct the Peer. Sometimes another thread is still holding a
     35         // PeerRef, so the refcount is >= 1. Be careful not to do any
     36         // processing here that assumes Peer won't be changed before it's
     37         // destructed.
     38-        PeerRef peer = RemovePeer(nodeid);
     39+        PeerRef peer = RemovePeer(node.id);
     40         assert(peer != nullptr);
     41         m_wtxid_relay_peers -= peer->m_wtxid_relay;
     42         assert(m_wtxid_relay_peers >= 0);
     43     }
     44-    CNodeState *state = State(nodeid);
     45+    CNodeState *state = State(node.id);
     46     assert(state != nullptr);
     47 
     48     if (state->fSyncStarted)
     49         nSyncStarted--;
     50 
     51     for (const QueuedBlock& entry : state->vBlocksInFlight) {
     52         auto range = mapBlocksInFlight.equal_range(entry.pindex->GetBlockHash());
     53         while (range.first != range.second) {
     54             auto [node_id, list_it] = range.first->second;
     55-            if (node_id != nodeid) {
     56+            if (node_id != node.id) {
     57                 range.first++;
     58             } else {
     59                 range.first = mapBlocksInFlight.erase(range.first);
     60             }
     61         }
     62     }
     63     {
     64         LOCK(m_tx_download_mutex);
     65-        m_txdownloadman.DisconnectedPeer(nodeid);
     66+        m_txdownloadman.DisconnectedPeer(node.id);
     67     }
     68-    if (m_txreconciliation) m_txreconciliation->ForgetPeer(nodeid);
     69+    if (m_txreconciliation) m_txreconciliation->ForgetPeer(node.id);
     70     m_num_preferred_download_peers -= state->fPreferredDownload;
     71     m_peers_downloading_from -= (!state->vBlocksInFlight.empty());
     72     assert(m_peers_downloading_from >= 0);
     73     m_outbound_peers_with_protect_from_disconnect -= state->m_chain_sync.m_protect;
     74     assert(m_outbound_peers_with_protect_from_disconnect >= 0);
     75 
     76-    m_node_states.erase(nodeid);
     77+    m_node_states.erase(node.id);
     78 
     79     if (m_node_states.empty()) {
     80         // Do a consistency check after the last peer is removed.
     81         assert(mapBlocksInFlight.empty());
     82         assert(m_num_preferred_download_peers == 0);
     83         assert(m_peers_downloading_from == 0);
     84         assert(m_outbound_peers_with_protect_from_disconnect == 0);
     85         assert(m_wtxid_relay_peers == 0);
     86         WITH_LOCK(m_tx_download_mutex, m_txdownloadman.CheckIsEmpty());
     87     }
     88     } // cs_main
     89-    if (node.fSuccessfullyConnected &&
     90-        !node.IsBlockOnlyConn() && !node.IsInboundConn()) {
     91+    if (node.was_connected_full_outbound) {
     92         // Only change visible addrman state for full outbound peers.  We don't
     93         // call Connected() for feeler connections since they don't have
     94         // fSuccessfullyConnected set.
     95         m_addrman.Connected(node.addr);
     96     }
     97     {
     98         LOCK(m_headers_presync_mutex);
     99-        m_headers_presync_stats.erase(nodeid);
    100+        m_headers_presync_stats.erase(node.id);
    101     }
    102-    LogDebug(BCLog::NET, "Cleared nodestate for peer=%d\n", nodeid);
    103+    LogDebug(BCLog::NET, "Cleared nodestate for peer=%d\n", node.id);
    104 }
    105 
    106 bool PeerManagerImpl::HasAllDesirableServiceFlags(ServiceFlags services) const
    107 {
    108     // Shortcut for (services & GetDesirableServiceFlags(services)) == GetDesirableServiceFlags(services)
    109     return !(GetDesirableServiceFlags(services) & (~services));
    110diff --git i/src/net.h w/src/net.h
    111index ff83ecde4d..cbe044f27e 100644
    112--- i/src/net.h
    113+++ w/src/net.h
    114@@ -1000,14 +1000,29 @@ public:
    115     /** Mutex for anything that is only accessed via the msg processing thread */
    116     static Mutex g_msgproc_mutex;
    117 
    118     /** Initialize a peer (setup state) */
    119     virtual void InitializeNode(const CNode& node, ServiceFlags our_services) = 0;
    120 
    121+    /** Data needed for `FinalizeNode()`. */
    122+    struct NodeFinalizationData {
    123+        NodeFinalizationData(const CNode& node)
    124+            : id{node.GetId()},
    125+              addr{node.addr},
    126+              was_connected_full_outbound{node.fSuccessfullyConnected &&
    127+                                          !node.IsBlockOnlyConn() &&
    128+                                          !node.IsInboundConn()}
    129+        {}
    130+
    131+        const NodeId id;
    132+        const CService addr;
    133+        const bool was_connected_full_outbound;
    134+    };
    135+
    136     /** Handle removal of a peer (clear state) */
    137-    virtual void FinalizeNode(const CNode& node) = 0;
    138+    virtual void FinalizeNode(const NodeFinalizationData& node_finalization_data) = 0;
    139 
    140     /**
    141      * Callback to determine whether the given set of service flags are sufficient
    142      * for a peer to be "relevant".
    143      */
    144     virtual bool HasAllDesirableServiceFlags(ServiceFlags services) const = 0;
    145@@ -1112,15 +1127,17 @@ public:
    146 
    147     ~CConnman();
    148 
    149     bool Start(CScheduler& scheduler, const Options& options) EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !m_added_nodes_mutex, !m_addr_fetches_mutex, !mutexMsgProc);
    150 
    151     void StopThreads();
    152-    void StopNodes();
    153-    void Stop()
    154+    void StopNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_for_finalization_mutex);
    155+    void Stop() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_for_finalization_mutex)
    156     {
    157+        AssertLockNotHeld(m_nodes_for_finalization_mutex);
    158+
    159         StopThreads();
    160         StopNodes();
    161     };
    162 
    163     void Interrupt() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
    164     bool GetNetworkActive() const { return fNetworkActive; };
    165@@ -1261,12 +1278,15 @@ public:
    166 
    167     /** Return true if we should disconnect the peer for failing an inactivity check. */
    168     bool ShouldRunInactivityChecks(const CNode& node, std::chrono::seconds now) const;
    169 
    170     bool MultipleManualOrFullOutboundConns(Network net) const EXCLUSIVE_LOCKS_REQUIRED(m_nodes_mutex);
    171 
    172+    /** Remember some of the node's properties for calling `m_msgproc->FinalizeNode()` later. */
    173+    void RememberForFinalization(const CNode& node) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_for_finalization_mutex);
    174+
    175 private:
    176     struct ListenSocket {
    177     public:
    178         std::shared_ptr<Sock> sock;
    179         inline void AddSocketPermissionFlags(NetPermissionFlags& flags) const { NetPermissions::AddFlag(flags, m_permissions); }
    180         ListenSocket(std::shared_ptr<Sock> sock_, NetPermissionFlags permissions_)
    181@@ -1287,13 +1307,13 @@ private:
    182     bool InitBinds(const Options& options);
    183 
    184     void ThreadOpenAddedConnections() EXCLUSIVE_LOCKS_REQUIRED(!m_added_nodes_mutex, !m_unused_i2p_sessions_mutex, !m_reconnections_mutex);
    185     void AddAddrFetch(const std::string& strDest) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex);
    186     void ProcessAddrFetch() EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_unused_i2p_sessions_mutex);
    187     void ThreadOpenConnections(std::vector<std::string> connect, Span<const std::string> seed_nodes) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_added_nodes_mutex, !m_nodes_mutex, !m_unused_i2p_sessions_mutex, !m_reconnections_mutex);
    188-    void ThreadMessageHandler() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
    189+    void ThreadMessageHandler() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc, !m_nodes_for_finalization_mutex);
    190     void ThreadI2PAcceptIncoming();
    191     void AcceptConnection(const ListenSocket& hListenSocket);
    192 
    193     /**
    194      * Create a `CNode` object from a socket that has just been accepted and add the node to
    195      * the `m_nodes` member.
    196@@ -1305,12 +1325,16 @@ private:
    197     void CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    198                                       NetPermissionFlags permission_flags,
    199                                       const CService& addr_bind,
    200                                       const CService& addr);
    201 
    202     void DisconnectNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_reconnections_mutex, !m_nodes_mutex);
    203+
    204+    /** Call `m_msgproc->FinalizeNode()` on all entries from `m_nodes_for_finalization`. */
    205+    void FinalizeNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_for_finalization_mutex);
    206+
    207     void NotifyNumConnectionsChanged();
    208     /** Return true if the peer is inactive and should be disconnected. */
    209     bool InactivityCheck(const CNode& node) const;
    210 
    211     /**
    212      * Generate a collection of sockets to check for IO readiness.
    213@@ -1395,12 +1419,17 @@ private:
    214     // Whether the node should be passed out in ForEach* callbacks
    215     static bool NodeFullyConnected(const CNode& node);
    216 
    217     uint16_t GetDefaultPort(Network net) const;
    218     uint16_t GetDefaultPort(const std::string& addr) const;
    219 
    220+    Mutex m_nodes_for_finalization_mutex;
    221+
    222+    /** When a node is destroyed some of its properties are saved here for calling m_msgproc->FinalizeNode() later. */
    223+    std::queue<std::unique_ptr<NetEventsInterface::NodeFinalizationData>> m_nodes_for_finalization GUARDED_BY(m_nodes_for_finalization_mutex);
    224+
    225     // Network usage totals
    226     mutable Mutex m_total_bytes_sent_mutex;
    227     std::atomic<uint64_t> nTotalBytesRecv{0};
    228     uint64_t nTotalBytesSent GUARDED_BY(m_total_bytes_sent_mutex) {0};
    229 
    230     // outbound limit & stats
    231diff --git i/src/net.cpp w/src/net.cpp
    232index d641579f6b..3a6ba226d0 100644
    233--- i/src/net.cpp
    234+++ w/src/net.cpp
    235@@ -534,13 +534,13 @@ std::shared_ptr<CNode> CConnman::ConnectNode(CAddress addrConnect, const char *p
    236             CalculateKeyedNetGroup(target_addr),
    237             nonce,
    238             addr_bind,
    239             pszDest ? pszDest : "",
    240             conn_type,
    241             /*inbound_onion=*/false,
    242-            [this](CNode& node) { m_msgproc->FinalizeNode(node); },
    243+            [this](CNode& node) { RememberForFinalization(node); },
    244             CNodeOptions{
    245                 .permission_flags = permission_flags,
    246                 .i2p_sam_session = std::move(i2p_transient_session),
    247                 .recv_flood_size = nReceiveFloodSize,
    248                 .use_v2transport = use_v2transport,
    249             });
    250@@ -1832,13 +1832,13 @@ void CConnman::CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    251         CalculateKeyedNetGroup(addr),
    252         nonce,
    253         addr_bind,
    254         /*addrNameIn=*/"",
    255         ConnectionType::INBOUND,
    256         inbound_onion,
    257-        [this](CNode& node) { m_msgproc->FinalizeNode(node); },
    258+        [this](CNode& node) { RememberForFinalization(node); },
    259         CNodeOptions{
    260             .permission_flags = permission_flags,
    261             .prefer_evict = discouraged,
    262             .recv_flood_size = nReceiveFloodSize,
    263             .use_v2transport = use_v2transport,
    264         });
    265@@ -1904,14 +1904,12 @@ void CConnman::DisconnectNodes()
    266     AssertLockNotHeld(m_reconnections_mutex);
    267 
    268     // Use a temporary variable to accumulate desired reconnections, so we don't need
    269     // m_reconnections_mutex while holding m_nodes_mutex.
    270     decltype(m_reconnections) reconnections_to_add;
    271 
    272-    std::vector<std::shared_ptr<CNode>> disconnected_nodes;
    273-
    274     {
    275         LOCK(m_nodes_mutex);
    276 
    277         if (!fNetworkActive) {
    278             // Disconnect any connected nodes
    279             for (auto& pnode : m_nodes) {
    280@@ -1925,18 +1923,12 @@ void CConnman::DisconnectNodes()
    281         // Disconnect unused nodes
    282         for (auto it = m_nodes.begin(); it != m_nodes.end();) {
    283             auto node = *it;
    284             if (node->fDisconnect) {
    285                 it = m_nodes.erase(it);
    286 
    287-                // Keep a reference to this CNode object to delay its destruction until
    288-                // after m_nodes_mutex has been released. Destructing a node involves
    289-                // calling m_msgproc->FinalizeNode() which acquires cs_main. Lock order
    290-                // should be cs_main, m_nodes_mutex.
    291-                disconnected_nodes.push_back(node);
    292-
    293                 // Add to reconnection list if appropriate. We don't reconnect right here, because
    294                 // the creation of a connection is a blocking operation (up to several seconds),
    295                 // and we don't want to hold up the socket handler thread for that long.
    296                 if (node->m_transport->ShouldReconnectV1()) {
    297                     reconnections_to_add.push_back({
    298                         .addr_connect = node->addr,
    299@@ -1964,12 +1956,38 @@ void CConnman::DisconnectNodes()
    300         // Move entries from reconnections_to_add to m_reconnections.
    301         LOCK(m_reconnections_mutex);
    302         m_reconnections.splice(m_reconnections.end(), std::move(reconnections_to_add));
    303     }
    304 }
    305 
    306+void CConnman::RememberForFinalization(const CNode& node)
    307+{
    308+    LOCK(m_nodes_for_finalization_mutex);
    309+    m_nodes_for_finalization.emplace(std::make_unique<NetEventsInterface::NodeFinalizationData>(node));
    310+}
    311+
    312+void CConnman::FinalizeNodes()
    313+{
    314+    AssertLockNotHeld(m_nodes_for_finalization_mutex);
    315+
    316+    for (;;) {
    317+        std::unique_ptr<NetEventsInterface::NodeFinalizationData> node_finalization_data;
    318+
    319+        {
    320+            LOCK(m_nodes_for_finalization_mutex);
    321+            if (m_nodes_for_finalization.empty()) {
    322+                break;
    323+            }
    324+            node_finalization_data.swap(m_nodes_for_finalization.front());
    325+            m_nodes_for_finalization.pop();
    326+        }
    327+
    328+        m_msgproc->FinalizeNode(*node_finalization_data);
    329+    }
    330+}
    331+
    332 void CConnman::NotifyNumConnectionsChanged()
    333 {
    334     size_t nodes_size;
    335     {
    336         LOCK(m_nodes_mutex);
    337         nodes_size = m_nodes.size();
    338@@ -3050,12 +3068,14 @@ void CConnman::ThreadMessageHandler()
    339 
    340         WAIT_LOCK(mutexMsgProc, lock);
    341         if (!fMoreWork) {
    342             condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this]() EXCLUSIVE_LOCKS_REQUIRED(mutexMsgProc) { return fMsgProcWake; });
    343         }
    344         fMsgProcWake = false;
    345+
    346+        FinalizeNodes();
    347     }
    348 }
    349 
    350 void CConnman::ThreadI2PAcceptIncoming()
    351 {
    352     static constexpr auto err_wait_begin = 1s;
    353@@ -3440,12 +3460,14 @@ void CConnman::StopThreads()
    354     if (threadSocketHandler.joinable())
    355         threadSocketHandler.join();
    356 }
    357 
    358 void CConnman::StopNodes()
    359 {
    360+    AssertLockNotHeld(m_nodes_for_finalization_mutex);
    361+
    362     if (fAddressesInitialized) {
    363         DumpAddresses();
    364         fAddressesInitialized = false;
    365 
    366         if (m_use_addrman_outgoing) {
    367             // Anchor connections are only dumped during clean shutdown.
    368@@ -3466,12 +3488,14 @@ void CConnman::StopNodes()
    369     }
    370     nodes.clear();
    371 
    372     vhListenSocket.clear();
    373     semOutbound.reset();
    374     semAddnode.reset();
    375+
    376+    FinalizeNodes(); // Finalize any leftovers which were added after the msghand thread was interrupted.
    377 }
    378 
    379 CConnman::~CConnman()
    380 {
    381     Interrupt();
    382     Stop();
    383diff --git i/src/test/fuzz/p2p_handshake.cpp w/src/test/fuzz/p2p_handshake.cpp
    384index d881cb35ba..99b2d70706 100644
    385--- i/src/test/fuzz/p2p_handshake.cpp
    386+++ w/src/test/fuzz/p2p_handshake.cpp
    387@@ -61,14 +61,14 @@ FUZZ_TARGET(p2p_handshake, .init = ::initialize)
    388 
    389     LOCK(NetEventsInterface::g_msgproc_mutex);
    390 
    391     std::vector<CNode*> peers;
    392     const auto num_peers_to_add = fuzzed_data_provider.ConsumeIntegralInRange(1, 3);
    393     for (int i = 0; i < num_peers_to_add; ++i) {
    394-        peers.push_back(ConsumeNodeAsUniquePtr(fuzzed_data_provider, i, [&peerman](CNode& node) {
    395-                            peerman->FinalizeNode(node);
    396+        peers.push_back(ConsumeNodeAsUniquePtr(fuzzed_data_provider, i, [&connman](CNode& node) {
    397+                            connman.RememberForFinalization(node);
    398                         }).release());
    399         connman.AddTestNode(*peers.back());
    400         peerman->InitializeNode(
    401             *peers.back(),
    402             static_cast<ServiceFlags>(fuzzed_data_provider.ConsumeIntegral<uint64_t>()));
    403     }
    404diff --git i/src/test/fuzz/p2p_headers_presync.cpp w/src/test/fuzz/p2p_headers_presync.cpp
    405index 4c470dab1c..d6159efaee 100644
    406--- i/src/test/fuzz/p2p_headers_presync.cpp
    407+++ w/src/test/fuzz/p2p_headers_presync.cpp
    408@@ -58,13 +58,13 @@ void HeadersSyncSetup::ResetAndInitialize()
    409         ConnectionType::INBOUND
    410     };
    411 
    412     for (auto conn_type : conn_types) {
    413         CAddress addr{};
    414         m_connections.push_back(new CNode(id++, nullptr, addr, 0, 0, addr, "", conn_type, false,
    415-                                          [this](CNode& node) { m_node.peerman->FinalizeNode(node); }));
    416+                                          [this](CNode& node) { m_node.connman->RememberForFinalization(node); }));
    417         CNode& p2p_node = *m_connections.back();
    418 
    419         connman.Handshake(
    420             /*node=*/p2p_node,
    421             /*successfully_connected=*/true,
    422             /*remote_services=*/ServiceFlags(NODE_NETWORK | NODE_WITNESS),
    423diff --git i/src/test/fuzz/process_message.cpp w/src/test/fuzz/process_message.cpp
    424index b2031a60cc..a7b6913584 100644
    425--- i/src/test/fuzz/process_message.cpp
    426+++ w/src/test/fuzz/process_message.cpp
    427@@ -64,13 +64,13 @@ FUZZ_TARGET(process_message, .init = initialize_process_message)
    428 
    429     const std::string random_message_type{fuzzed_data_provider.ConsumeBytesAsString(CMessageHeader::MESSAGE_TYPE_SIZE).c_str()};
    430     if (!LIMIT_TO_MESSAGE_TYPE.empty() && random_message_type != LIMIT_TO_MESSAGE_TYPE) {
    431         return;
    432     }
    433     CNode& p2p_node = *ConsumeNodeAsUniquePtr(fuzzed_data_provider, std::nullopt, [](CNode& node) {
    434-                           g_setup->m_node.peerman->FinalizeNode(node);
    435+                           g_setup->m_node.connman->RememberForFinalization(node);
    436                        }).release();
    437 
    438     connman.AddTestNode(p2p_node);
    439     FillNode(fuzzed_data_provider, connman, p2p_node);
    440 
    441     const auto mock_time = ConsumeTime(fuzzed_data_provider);
    442diff --git i/src/test/fuzz/process_messages.cpp w/src/test/fuzz/process_messages.cpp
    443index 89584e3d59..9377327f8e 100644
    444--- i/src/test/fuzz/process_messages.cpp
    445+++ w/src/test/fuzz/process_messages.cpp
    446@@ -53,13 +53,13 @@ FUZZ_TARGET(process_messages, .init = initialize_process_messages)
    447     LOCK(NetEventsInterface::g_msgproc_mutex);
    448 
    449     std::vector<CNode*> peers;
    450     const auto num_peers_to_add = fuzzed_data_provider.ConsumeIntegralInRange(1, 3);
    451     for (int i = 0; i < num_peers_to_add; ++i) {
    452         peers.push_back(ConsumeNodeAsUniquePtr(fuzzed_data_provider, i, [&connman](CNode& node) {
    453-                            connman.MsgProc()->FinalizeNode(node);
    454+                            connman.RememberForFinalization(node);
    455                         }).release());
    456         CNode& p2p_node = *peers.back();
    457 
    458         FillNode(fuzzed_data_provider, connman, p2p_node);
    459 
    460         connman.AddTestNode(p2p_node);
    

    Maybe worth exploring in a followup if people think that makes sense. Calling FinalizeNode() on the msghand thread to avoid locking cs_main in the net thread looks out of the scope of this PR which is about removing the manual reference counting of CNode.

  38. DrahtBot added the label Needs rebase on Mar 20, 2025
  39. net: replace manual reference counting of CNode with shared_ptr
    Before this change the code used to count references to `CNode` objects
    manually via `CNode::nRefCount`. Unneeded `CNode`s were scheduled for
    deletion by putting them in `CConnman::m_nodes_disconnected` and were
    deleted after their reference count reached zero. Deleting consists of
    calling `PeerManager::FinalizeNode()` and destroying the `CNode` object.
    
    Replace this scheme with `std::shared_ptr`. This simplifies the code and
    removes:
    `CNode::nRefCount`
    `CNode::GetRefCount()`
    `CNode::AddRef()`
    `CNode::Release()`
    `CConnman::m_nodes_disconnected`
    `CConnman::NodesSnapshot`
    
    Now creating a snapshot of `CConnman::m_nodes` is done by simply copying
    it (under the mutex).
    
    Call `PeerManager::FinalizeNode()` from the destructor of `CNode`, which
    is called when the reference count reaches 0.
    dcb0be18ac
  40. vasild force-pushed on Mar 20, 2025
  41. vasild commented at 5:45 pm on March 20, 2025: contributor
    8c0ce1ca1c...dcb0be18ac: rebase due to conflicts
  42. DrahtBot removed the label Needs rebase on Mar 20, 2025
  43. hodlinator commented at 8:59 am on March 24, 2025: contributor

    unique_ptr

    Here is a branch implementing another possible variant, using unique_ptr as suggested by @purpleKarrot in #32015 (comment): https://github.com/bitcoin/bitcoin/compare/master...hodlinator:bitcoin:pr/32015_unique_ptr

    Pros

    • It keeps the CNode ownership/lifetime under control of CConnman::m_nodes/m_nodes_disconnected. This can be considered an advancement over the unclear ownership/lifetime of shared_ptr (“global variables in disguise”).

    • It avoids the need for introducing ~CNode() and m_destruct_cb together with passing a callback into CNode() everywhere.

    Cons

    • It requires more ongoing awareness of CNode lifetime than when using shared_ptr. However, once we grok CNode lifetimes, they appear fairly simple (unless I’m missing something):
     0CConnman::ThreadSocketHandler()
     1    while (!interruptNet)
     2        DisconnectNodes()
     3            for m_nodes
     4                move disconnected nodes to m_nodes_disconnected
     5            for m_nodes_disconnected
     6                // Not referenced from other threads.
     7                // And no new snapshots possible since we are in m_nodes_disconnected.
     8                if refcount <= 0
     9                    DeleteNode()
    10        CConnman::SocketHandler()
    11            NodesSnapshot
    12
    13CConnman::ThreadMessageHandler()
    14    while (!flagInterruptMsgProc)
    15        NodesSnapshot
    16
    17Shutdown() (init.cpp)
    18    CConnman::Stop()
    19        StopThreads()
    20            threadMessageHandler.join()
    21            threadSocketHandler.join()
    22        StopNodes()
    23            for m_nodes
    24                DeleteNode()
    25            for m_nodes_disconnected
    26                DeleteNode()
    
    • It retains CNode::nRefCount (but documents its purpose as tracking NodesSnapshot references).
  44. DrahtBot commented at 10:31 pm on March 24, 2025: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  45. DrahtBot added the label Needs rebase on Mar 24, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-29 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me