net: disconnect inside AttemptToEvictConnection #27912

pull willcl-ark wants to merge 3 commits into bitcoin:master from willcl-ark:27843-i2p-thread changing 2 files +61 −31
  1. willcl-ark commented at 10:26 am on June 19, 2023: member

    Fixes #27843

    To avoid overflowing -maxconnections disconnect nodes marked for eviction directly inside of AttemptToEvictConnection(). This has the end result that new connections will only be accepted after an existing connection is dropped, otherwise the new connection is dropped.

    Previously the number of connected nodes could overflow nMaxInbound as (multiple) new connections could be accepted from ThreadI2PAccept – each marking an existing connection to drop (in the future) – before ThreadSocketHandler looped through to DisconnectNodes() and took care of the disconnections.

    Node disconnection and deletion are broken out into individual functions which handle a single node so they can be called both from DisconnectNodes() and AttemptToEvictConnection. This will result in more un/locking operations to perform mass disconnections, but as this only really happens when the network becomes inactive it should not be a problem.

  2. DrahtBot commented at 10:26 am on June 19, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28222 (Use shared_ptr for CNode inside CConnman by willcl-ark)
    • #27213 (p2p: Diversify automatic outbound connections with respect to networks by amitiuttarwar)
    • #26938 ([WIP] p2p: asmap, avoid inbound connections from a specific AS by brunoerg)
    • #25832 (tracing: network connection tracepoints by 0xB10C)
    • #25572 (refactor: Introduce EvictionManager and use it for the inbound eviction logic by dergoegge)
    • #25268 (refactor: Introduce EvictionManager by dergoegge)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Jun 19, 2023
  4. in src/net.cpp:1152 in 1d28dfb751 outdated
    1160         std::list<CNode*> nodes_disconnected_copy = m_nodes_disconnected;
    1161         for (CNode* pnode : nodes_disconnected_copy)
    1162         {
    1163             // Destroy the object only after other threads have stopped using it.
    1164-            if (pnode->GetRefCount() <= 0) {
    1165+            // Prevent two threads trying to delete the same node: set nRefCount to -1 first
    


    dergoegge commented at 12:15 pm on June 19, 2023:
    I think m_disconnect_mutex already prevents this from happening

    willcl-ark commented at 1:22 pm on June 19, 2023:

    Hmm you may be right, I might have gone too far with that. What I was trying to protect against was checking the number of references in one instruction, and permitting another thread to increase the refCount in the meantime before we remove the node in the subsequent lines. But, it looks like we don’t have any functions which would increase the refCount during normal operation (only incremented on new connections), so probably unnecessary here.

    If I’m honest, I kind of prefer this belt-and-suspenders way myself, but would be easily persuaded that it’s superfluous or worse than current behaviour (useless extra locking).

    I felt a bit nevous about nesting m_disconnect_mutex inside of m_nodes_mutex, but it seemed like it was slightly preferable to just locking m_nodes_mutex for the whole of DisconnectNodes() as that’s used in many other operations. I don’t think there is any way StopNodes() and DisconnectNodes() can lock each other here though. Would you agree?


    dergoegge commented at 1:50 pm on June 19, 2023:

    The CNode reference counting is a total nightmare and we should (and have in the past: https://github.com/bitcoin/bitcoin/pull/10738/) try to get rid of it at some point.

    permitting another thread to increase the refCount in the meantime before we remove the node in the subsequent lines.

    This is actually possible with and without your patch and the way we protect against this is by not calling CNode::AddRef after nodes where moved to the disconnection queue.

    0if (pnode->nRefCount.compare_exchange_strong(expectedRefCount, -1)) {
    1    // Theoretically, nothing stops a different thread from calling `AddRef()` while we are here
    2    m_nodes_disconnected.remove(pnode);
    3    DeleteNode(pnode);
    4}
    

    But, it looks like we don’t have any functions which would increase the refCount during normal operation (only incremented on new connections), so probably unnecessary here.

    Our RAII helper NodesSnapshot does call AddRef during normal operation (e.g. before processing messages) but NodesSnapshot doesn’t take a snapshot of m_nodes_disconnected so it’s fine.

    I don’t think there is any way StopNodes() and DisconnectNodes() can lock each other here though.

    Agreed, as long as you always lock in the same order (which afaict you do).


    Crypt-iQ commented at 2:28 pm on June 19, 2023:
    I agree with @dergoegge. Don’t think StopNodes() can race with DisconnectNodes() since StopThreads() is called first
  5. in src/net.cpp:1126 in 2d33e62f62 outdated
    1126         {
    1127-            if (pnode->fDisconnect)
    1128-            {
    1129-                // remove from m_nodes
    1130-                m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
    1131+            LOCK(m_disconnect_mutex);
    


    Crypt-iQ commented at 1:56 pm on June 19, 2023:
    Is it better to lock m_disconnect_mutex for the duration of the node_copy loop or to move it right before m_nodes_disconnected? Not sure about the performance hit of repeatedly locking+unlocking in the latter case

    willcl-ark commented at 12:10 pm on June 20, 2023:

    I think that’s as currently implemented (perhap GitHub diff throwing you again)?

    https://github.com/bitcoin/bitcoin/blob/2d33e62f6260e9bdde76a31df3b6ffbd20d17db6/src/net.cpp#L1125-L1143


    Crypt-iQ commented at 8:11 am on June 21, 2023:
    I meant like rightttt before m_nodes_disconnect.push_back(pnode). But don’t think it matters so feel free to ignore
  6. in src/net.cpp:1169 in 2d33e62f62 outdated
    1169-    }
    1170-    if(nodes_size != nPrevNodeCount) {
    1171-        nPrevNodeCount = nodes_size;
    1172-        if (m_client_interface) {
    1173-            m_client_interface->NotifyNumConnectionsChanged(nodes_size);
    1174+        if(nodes_size != nPrevNodeCount) {
    


    Crypt-iQ commented at 2:07 pm on June 19, 2023:
    nPrevNodeCount needs a mutex now since it’s called from ThreadI2PSocketHandler also

    Crypt-iQ commented at 11:23 am on June 20, 2023:
    ignore this, I was confused by github’s indentation
  7. in src/net.cpp:1175 in 2d33e62f62 outdated
    1175+            nPrevNodeCount = nodes_size;
    1176+            should_notify = true;
    1177         }
    1178     }
    1179+    if (should_notify && m_client_interface) {
    1180+        m_client_interface->NotifyNumConnectionsChanged(nodes_size);
    


    Crypt-iQ commented at 2:14 pm on June 19, 2023:
    Not familiar with the GUI code, but could this call to m_client_interface->NotifyNumConnectionsChanged race?

    willcl-ark commented at 11:05 am on June 20, 2023:

    I am also not familiar with the GUI code, but IIUC I don’t think this should race as the lines above lock the mutex and update nPrevNodeCount inside it, so even if two threads called this function, t1 would lock first and update the nPrevNodeCount``, and when it unlocked (and notified) t2 would find that nodes_size()was equal tonPrevNodeCountand so would exit without callingNotifyNumConnectionsChanged`.

    But this should still send a notification per connection count change on either thread.

    That said, it’s easy enough to move this block inside the m_nodes_mutex block to be safer, and I may do just that..

    Could change the function to:

     0void CConnman::NotifyNumConnectionsChanged()
     1{
     2    size_t nodes_size;
     3    bool should_notify{false};
     4    {
     5        LOCK(m_nodes_mutex);
     6        nodes_size = m_nodes.size();
     7        if(nodes_size != nPrevNodeCount) {
     8            nPrevNodeCount = nodes_size;
     9            should_notify = true;
    10        }
    11        if (should_notify && m_client_interface) {
    12            m_client_interface->NotifyNumConnectionsChanged(nodes_size);
    13        }
    14    }
    15}
    

    Crypt-iQ commented at 11:42 am on June 20, 2023:

    I think it’s possible that:

    • t1 locks m_nodes_mutex, checks nodes_size and updates nPrevNodeCount & should_notify
    • t1 unlocks the mutex and something else gets scheduled by the CPU
    • the connection count is updated again
    • t2 locks m_nodes_mutex and sets should_notify
    • t2 unlocks the mutex and calls NotifyNumConnectionsChanged
    • t1 gets scheduled and calls NotifyNumConnectionsChanged

    My concern here was whether it’s ok if NotifyNumConnectionsChanged is called simultaneously from 2 threads? I think it’d be good to have nPrevNodeCount to have a GUARDED_BY(m_nodes_mutex) annotation

  8. in src/net.cpp:1406 in 2d33e62f62 outdated
    1401+    AssertLockNotHeld(m_total_bytes_sent_mutex);
    1402+
    1403+    SetSyscallSandboxPolicy(SyscallSandboxPolicy::NET);
    1404+    while (!interruptNet) {
    1405+        DisconnectNodes();
    1406+        NotifyNumConnectionsChanged();
    


    Crypt-iQ commented at 2:28 pm on June 19, 2023:
    I don’t think NotifyNumConnectionsChanged is necessary here, but I could understand the consistency argument
  9. in src/net.cpp:1399 in 2d33e62f62 outdated
    1395@@ -1390,6 +1396,18 @@ void CConnman::ThreadSocketHandler()
    1396     }
    1397 }
    1398 
    1399+void CConnman::ThreadI2PSocketHandler()
    


    Crypt-iQ commented at 2:29 pm on June 19, 2023:
    github won’t let me comment on untouched lines, but minor nit: threadI2PAcceptIncoming could be renamed to threadI2PSocketHandler?
  10. in src/net.cpp:2124 in 2d33e62f62 outdated
    2139-            continue;
    2140-        }
    2141+        return;
    2142+    }
    2143+
    2144+    if (!advertising_listen_addr) {
    


    Crypt-iQ commented at 2:56 pm on June 19, 2023:
    Setting advertising_listen_addr to false each time at the top of the function causes each I2P connection to log an AddLocal(...) line. I think this could be used to perform a disk-fill attack (my debug.log grew several MB in several minutes) . Prior to this patch, it was set once before the while(!interrupt) so it wouldn’t trigger every time on a new connection
  11. Crypt-iQ commented at 3:00 pm on June 19, 2023: contributor
    Tested it and this patch does the fix the issue
  12. willcl-ark force-pushed on Jun 20, 2023
  13. willcl-ark commented at 12:18 pm on June 20, 2023: member

    Thanks @Crypt-iQ for the review. I pushed 21dae96 which includes most of the suggested changes:

    • NotifyNumConnectionsChanged now inside m_nodes_mutex block
    • moved advertising_listen_addr to a class member on CConnman initialised to false
    • Took the suggestion to rename the thread while touching
    • guarded nPrevNodeCount with m_nodes_mutex
  14. vasild commented at 3:34 pm on June 26, 2023: contributor

    I acknowledge the problem described in #27843: ThreadI2PAcceptIncoming() calls CreateNodeFromAcceptedSocket(). The latter checks nInbound >= nMaxInbound and calls AttemptToEvictConnection() if that is true, but the eviction is done asynchronously - AttemptToEvictConnection() only marks it for disconnect and leaves the actual disconnection and socket closure to another thread for a later time (ThreadSocketHandler()). Until that other thread actually does the disconnect, ThreadI2PAcceptIncoming() may call CreateNodeFromAcceptedSocket() way too many times, ending up with N excess connections above 125 (the new I2P ones) and N other connections marked for disconnect, but not disconnected until ThreadSocketHandler() calls DisconnectNodes().

    I intended ThreadI2PAcceptIncoming() to be just low level socket-accept function, an alternative to AcceptConnection() (which may be better named as AcceptConnectionTCP()). And then all the higher level logic (e.g. capping the number of connections) to happen in CreateNodeFromAcceptedSocket(). It indeed checks nMaxInbound but does the disconnect asynchronously. It works for AcceptConnection() because the async happens to be in the same thread. But it will not work if that is decoupled or if a new one is added later to accept connections by other means (other than TCP and I2P/SAM).

    I did not look deeply in the code in this PR, but I gather it drags some high level logic (cap number of connections) to ThreadI2PAcceptIncoming(). This seems suboptimal because now the higher level logic is in two places. If a new accept method is added, then it would have to be replicated there too. And ThreadI2PAcceptIncoming() becomes not just low level socket-accept function.

    What about changing AttemptToEvictConnection() to disconnect synchronously? It is called from just one place. Something like:

     0--- i/src/net.cpp
     1+++ w/src/net.cpp
     2@@ -927,12 +927,13 @@ bool CConnman::AttemptToEvictConnection()
     3     }
     4     LOCK(m_nodes_mutex);
     5     for (CNode* pnode : m_nodes) {
     6         if (pnode->GetId() == *node_id_to_evict) {
     7             LogPrint(BCLog::NET, "selected %s connection for eviction peer=%d; disconnecting\n", pnode->ConnectionType
     8             pnode->fDisconnect = true;
     9+            DisconnectNodes();
    10             return true;
    11         }
    12     }
    13     return false;
    14 }
    

    (modulo trying to avoid the recursive lock of m_nodes_mutex and maybe some optimization because DisconnectNodes() iterates on all nodes and in this case we know which node is marked for disconnect).

  15. willcl-ark commented at 12:37 pm on June 29, 2023: member

    Yes I agree and like this approach better.

    I had created a ThreadI2PSocketHandler() thread which essentially had the same loop as ThreadSocketHandler(). The repetition of DisconnectNodes() and NotifyNumConnectionsChanged() in two threads seemed annoying (and needed a few new locks) and I was considering whether a new seperate thread should just handle disconnecting nodes marked for disconnection and notifications, but I think doing it as part of CreateNodeFromAcceptedSocket() (in AttemptToEvictConnection()) makes more sense so will work on this.

  16. DrahtBot added the label CI failed on Jun 30, 2023
  17. willcl-ark force-pushed on Jul 4, 2023
  18. willcl-ark force-pushed on Jul 4, 2023
  19. willcl-ark renamed this:
    net: run disconnect in I2P thread
    net: disconnect inside AttemptToEvictConnection
    on Jul 4, 2023
  20. willcl-ark commented at 3:38 pm on July 4, 2023: member
    OK I’ve pushed a new set of changes which now disconnects nodes synchronously inside of AttemptToEvictConnection. @Crypt-iQ I’d be curious if you still see these new changes as resolving the issue in #27843? I havent’ gotten your test patch working to my satisfaction yet (or at least, I don’t see positive eviction candidate selection during it so it wouldn’t overflow nMaxInbound even without these changes).
  21. DrahtBot removed the label CI failed on Jul 7, 2023
  22. in src/net.h:1048 in a3903cc186 outdated
    1044@@ -1043,7 +1045,8 @@ class CConnman
    1045     std::vector<std::string> m_added_nodes GUARDED_BY(m_added_nodes_mutex);
    1046     mutable Mutex m_added_nodes_mutex;
    1047     std::vector<CNode*> m_nodes GUARDED_BY(m_nodes_mutex);
    1048-    std::list<CNode*> m_nodes_disconnected;
    1049+    mutable RecursiveMutex m_nodes_disconnected_mutex;
    


    vasild commented at 1:02 pm on July 11, 2023:

    This need not be RecursiveMutex?

    0    mutable Mutex m_nodes_disconnected_mutex;
    

    willcl-ark commented at 2:17 pm on July 17, 2023:
    Ok I’ve used a GlobalMutex for this now. I never Recursive behaviour but was getting clang static analysis errors without. Looking at sync.h it seems that GlobalMutex should be the correct type for this I think.

    vasild commented at 3:22 pm on July 18, 2023:

    GlobalMutex silences some of the thread safety analysis and was introduced to overcome some limitations on those, is supposed to be used only for mutexes that are defined globally. I agree it is confusing. I don’t like it and have a plan to remove GlobalMutex but it is stuck at #25390.

      0diff --git i/src/net.cpp w/src/net.cpp
      1index c2160e945f..c8ee95a3d0 100644
      2--- i/src/net.cpp
      3+++ w/src/net.cpp
      4@@ -886,12 +886,14 @@ size_t CConnman::SocketSendData(CNode& node) const
      5  *   to forge.  In order to partition a node the attacker must be
      6  *   simultaneously better at all of them than honest peers.
      7  *   If we find a candidate perform the eviction.
      8  */
      9 bool CConnman::AttemptToEvictConnection()
     10 {
     11+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     12+
     13     std::vector<NodeEvictionCandidate> vEvictionCandidates;
     14     {
     15 
     16         LOCK(m_nodes_mutex);
     17         for (const CNode* node : m_nodes) {
     18             if (node->fDisconnect)
     19@@ -937,12 +939,14 @@ bool CConnman::AttemptToEvictConnection()
     20         return true;
     21     }
     22     return false;
     23 }
     24 
     25 void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
     26+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     27+
     28     struct sockaddr_storage sockaddr;
     29     socklen_t len = sizeof(sockaddr);
     30     auto sock = hListenSocket.sock->Accept((struct sockaddr*)&sockaddr, &len);
     31     CAddress addr;
     32 
     33     if (!sock) {
     34@@ -1107,23 +1111,27 @@ bool CConnman::AddConnection(const std::string& address, ConnectionType conn_typ
     35     OpenNetworkConnection(CAddress(), false, &grant, address.c_str(), conn_type);
     36     return true;
     37 }
     38 
     39 void CConnman::DeleteDisconnectedNode(CNode* pnode)
     40 {
     41+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     42+
     43     // Destroy the object only after other threads have stopped using it.
     44     // Prevent double free by setting nRefCount to -1 before delete.
     45     int expectedRefCount = 0;
     46     if (pnode->nRefCount.compare_exchange_strong(expectedRefCount, -1)) {
     47         WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.remove(pnode));
     48         DeleteNode(pnode);
     49     }
     50 }
     51 
     52 void CConnman::DisconnectAndReleaseNode(CNode* pnode)
     53 {
     54+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     55+
     56     LOCK(m_nodes_mutex);
     57     if (std::find(m_nodes.begin(), m_nodes.end(), pnode) != m_nodes.end()) {
     58 
     59         // remove from m_nodes
     60         m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
     61 
     62@@ -1138,12 +1146,14 @@ void CConnman::DisconnectAndReleaseNode(CNode* pnode)
     63         WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.push_back(pnode));
     64     }
     65 }
     66 
     67 void CConnman::DisconnectNodes()
     68 {
     69+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     70+
     71     {
     72         LOCK(m_nodes_mutex);
     73 
     74         if (!fNetworkActive) {
     75             // Disconnect any connected nodes
     76             for (CNode* pnode : m_nodes) {
     77@@ -1270,12 +1280,13 @@ Sock::EventsPerSock CConnman::GenerateWaitSockets(Span<CNode* const> nodes)
     78 
     79     return events_per_sock;
     80 }
     81 
     82 void CConnman::SocketHandler()
     83 {
     84+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     85     AssertLockNotHeld(m_total_bytes_sent_mutex);
     86 
     87     Sock::EventsPerSock events_per_sock;
     88 
     89     {
     90         const NodesSnapshot snap{*this, /*shuffle=*/false};
     91@@ -1381,12 +1392,14 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
     92         if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
     93     }
     94 }
     95 
     96 void CConnman::SocketHandlerListening(const Sock::EventsPerSock& events_per_sock)
     97 {
     98+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     99+
    100     for (const ListenSocket& listen_socket : vhListenSocket) {
    101         if (interruptNet) {
    102             return;
    103         }
    104         const auto it = events_per_sock.find(listen_socket.sock);
    105         if (it != events_per_sock.end() && it->second.occurred & Sock::RECV) {
    106@@ -1394,12 +1407,13 @@ void CConnman::SocketHandlerListening(const Sock::EventsPerSock& events_per_sock
    107         }
    108     }
    109 }
    110 
    111 void CConnman::ThreadSocketHandler()
    112 {
    113+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    114     AssertLockNotHeld(m_total_bytes_sent_mutex);
    115 
    116     while (!interruptNet)
    117     {
    118         DisconnectNodes();
    119         NotifyNumConnectionsChanged();
    120@@ -2093,12 +2107,14 @@ void CConnman::ThreadMessageHandler()
    121         fMsgProcWake = false;
    122     }
    123 }
    124 
    125 void CConnman::ThreadI2PAcceptIncoming()
    126 {
    127+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    128+
    129     static constexpr auto err_wait_begin = 1s;
    130     static constexpr auto err_wait_cap = 5min;
    131     auto err_wait = err_wait_begin;
    132 
    133     bool advertising_listen_addr = false;
    134     i2p::Connection conn;
    135@@ -2481,12 +2497,14 @@ void CConnman::StopThreads()
    136     if (threadSocketHandler.joinable())
    137         threadSocketHandler.join();
    138 }
    139 
    140 void CConnman::StopNodes()
    141 {
    142+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    143+
    144     if (fAddressesInitialized) {
    145         DumpAddresses();
    146         fAddressesInitialized = false;
    147 
    148         if (m_use_addrman_outgoing) {
    149             // Anchor connections are only dumped during clean shutdown.
    150diff --git i/src/net.h w/src/net.h
    151index d5833d7e2d..d45f3c606f 100644
    152--- i/src/net.h
    153+++ w/src/net.h
    154@@ -760,15 +760,17 @@ public:
    155 
    156     ~CConnman();
    157 
    158     bool Start(CScheduler& scheduler, const Options& options) EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !m_added_nodes_mutex, !m_addr_fetches_mutex, !mutexMsgProc);
    159 
    160     void StopThreads();
    161-    void StopNodes();
    162-    void Stop()
    163+    void StopNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    164+    void Stop() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex)
    165     {
    166+        AssertLockNotHeld(m_nodes_disconnected_mutex);
    167+
    168         StopThreads();
    169         StopNodes();
    170     };
    171 
    172     void Interrupt() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
    173     bool GetNetworkActive() const { return fNetworkActive; };
    174@@ -917,31 +919,31 @@ private:
    175 
    176     void ThreadOpenAddedConnections() EXCLUSIVE_LOCKS_REQUIRED(!m_added_nodes_mutex, !m_unused_i2p_sessions_mutex);
    177     void AddAddrFetch(const std::string& strDest) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex);
    178     void ProcessAddrFetch() EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_unused_i2p_sessions_mutex);
    179     void ThreadOpenConnections(std::vector<std::string> connect) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_added_nodes_mutex, !m_nodes_mutex, !m_unused_i2p_sessions_mutex);
    180     void ThreadMessageHandler() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
    181-    void ThreadI2PAcceptIncoming();
    182-    void AcceptConnection(const ListenSocket& hListenSocket);
    183+    void ThreadI2PAcceptIncoming() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    184+    void AcceptConnection(const ListenSocket& hListenSocket) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    185 
    186     /**
    187      * Create a `CNode` object from a socket that has just been accepted and add the node to
    188      * the `m_nodes` member.
    189      * [@param](/bitcoin-bitcoin/contributor/param/)[in] sock Connected socket to communicate with the peer.
    190      * [@param](/bitcoin-bitcoin/contributor/param/)[in] permission_flags The peer's permissions.
    191      * [@param](/bitcoin-bitcoin/contributor/param/)[in] addr_bind The address and port at our side of the connection.
    192      * [@param](/bitcoin-bitcoin/contributor/param/)[in] addr The address and port at the peer's side of the connection.
    193      */
    194     void CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    195                                       NetPermissionFlags permission_flags,
    196                                       const CAddress& addr_bind,
    197-                                      const CAddress& addr);
    198+                                      const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    199 
    200-    void DeleteDisconnectedNode(CNode* pnode);
    201-    void DisconnectAndReleaseNode(CNode* pnode);
    202-    void DisconnectNodes();
    203+    void DeleteDisconnectedNode(CNode* pnode) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    204+    void DisconnectAndReleaseNode(CNode* pnode) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    205+    void DisconnectNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    206     void NotifyNumConnectionsChanged();
    207     /** Return true if the peer is inactive and should be disconnected. */
    208     bool InactivityCheck(const CNode& node) const;
    209 
    210     /**
    211      * Generate a collection of sockets to check for IO readiness.
    212@@ -950,13 +952,13 @@ private:
    213      */
    214     Sock::EventsPerSock GenerateWaitSockets(Span<CNode* const> nodes);
    215 
    216     /**
    217      * Check connected and listening sockets for IO readiness and process them accordingly.
    218      */
    219-    void SocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    220+    void SocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex, !m_total_bytes_sent_mutex, !mutexMsgProc);
    221 
    222     /**
    223      * Do the read/write for connected sockets that are ready for IO.
    224      * [@param](/bitcoin-bitcoin/contributor/param/)[in] nodes Nodes to process. The socket of each node is checked against `what`.
    225      * [@param](/bitcoin-bitcoin/contributor/param/)[in] events_per_sock Sockets that are ready for IO.
    226      */
    227@@ -965,15 +967,15 @@ private:
    228         EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    229 
    230     /**
    231      * Accept incoming connections, one from each read-ready listening socket.
    232      * [@param](/bitcoin-bitcoin/contributor/param/)[in] events_per_sock Sockets that are ready for IO.
    233      */
    234-    void SocketHandlerListening(const Sock::EventsPerSock& events_per_sock);
    235+    void SocketHandlerListening(const Sock::EventsPerSock& events_per_sock) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    236 
    237-    void ThreadSocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    238+    void ThreadSocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex, !m_total_bytes_sent_mutex, !mutexMsgProc);
    239     void ThreadDNSAddressSeed() EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_nodes_mutex);
    240 
    241     uint64_t CalculateKeyedNetGroup(const CAddress& ad) const;
    242 
    243     CNode* FindNode(const CNetAddr& ip);
    244     CNode* FindNode(const CSubNet& subNet);
    245@@ -983,13 +985,13 @@ private:
    246     /**
    247      * Determine whether we're already connected to a given address, in order to
    248      * avoid initiating duplicate connections.
    249      */
    250     bool AlreadyConnectedToAddress(const CAddress& addr);
    251 
    252-    bool AttemptToEvictConnection();
    253+    bool AttemptToEvictConnection() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    254     CNode* ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
    255     void AddWhitelistPermissionFlags(NetPermissionFlags& flags, const CNetAddr &addr) const;
    256 
    257     void DeleteNode(CNode* pnode);
    258 
    259     NodeId GetNewNodeId();
    260@@ -1042,13 +1044,13 @@ private:
    261     const NetGroupManager& m_netgroupman;
    262     std::deque<std::string> m_addr_fetches GUARDED_BY(m_addr_fetches_mutex);
    263     Mutex m_addr_fetches_mutex;
    264     std::vector<std::string> m_added_nodes GUARDED_BY(m_added_nodes_mutex);
    265     mutable Mutex m_added_nodes_mutex;
    266     std::vector<CNode*> m_nodes GUARDED_BY(m_nodes_mutex);
    267-    GlobalMutex m_nodes_disconnected_mutex;
    268+    Mutex m_nodes_disconnected_mutex;
    269     std::list<CNode*> m_nodes_disconnected GUARDED_BY(m_nodes_disconnected_mutex);
    270     mutable RecursiveMutex m_nodes_mutex;
    271     std::atomic<NodeId> nLastNodeId{0};
    272     unsigned int nPrevNodeCount{0};
    273 
    274     /**
    
  23. in src/net.cpp:1173 in a3903cc186 outdated
    1187-                m_nodes_disconnected.remove(pnode);
    1188-                DeleteNode(pnode);
    1189-            }
    1190-        }
    1191+        DeleteDisconnectedNode(pnode);
    1192     }
    


    vasild commented at 1:08 pm on July 11, 2023:

    This can end up with a double-free if two threads concurrently execute it.

    1. Thread1 makes a copy of m_nodes_disconnected and releases m_nodes_disconnected_mutex
    2. Thread2 does the same
    3. Thread1 starts iterating on its own copy and calls DeleteDisconnectedNode() on the first element which calls DeleteNode() which calls delete pnode;
    4. Thread2 does the same on its own copy, a second delete on the same CNode object.

    willcl-ark commented at 7:37 am on July 12, 2023:
    Right, this is what I tried to prevent against in an earlier version, but in that version I think it was superfluous, however here it would be appropriate to use so that only one thread would actually perform the deletion ever.

    willcl-ark commented at 2:19 pm on July 17, 2023:
    DeleteNode() now gated behind if (pnode->nRefCount.compare_exchange_strong(expectedRefCount, -1)) so it can only be called once (by a single thread).

    vasild commented at 4:48 am on July 19, 2023:

    I love this lock free pattern!

    The variable used as a guard (nRefCount) has to be outside of the object being protected. Otherwise if two threads execute it concurrently the first one may swap from 0 to -1 and delete the object. Then the second thread will end up reading freed memory when it tries to use nRefCount.

    This should be safe:

     0    // Delete disconnected nodes. Call DeleteNode() without holding m_nodes_mutex or m_nodes_disconnected_mutex.
     1    std::vector<CNode*> to_delete;
     2    { 
     3        LOCK(m_nodes_disconnected_mutex);
     4        for (auto it = m_nodes_disconnected.begin(); it != m_nodes_disconnected.end();) { 
     5            CNode* node = *it;
     6            if (node->GetRefCount() == 0) { 
     7                it = m_nodes_disconnected.erase(it);
     8                to_delete.push_back(node);
     9            } else { 
    10                ++it;
    11            } 
    12        } 
    13    } 
    14    for (CNode* node : to_delete) { 
    15        DeleteNode(node);
    16    } 
    
  24. in src/net.cpp:2514 in a3903cc186 outdated
    2508+    WITH_LOCK(m_nodes_disconnected_mutex, disconnected_nodes = m_nodes_disconnected);
    2509+    for (CNode* pnode : disconnected_nodes) {
    2510         DeleteNode(pnode);
    2511     }
    2512-    m_nodes_disconnected.clear();
    2513+    WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.clear());
    


    vasild commented at 1:26 pm on July 11, 2023:
    Same double-free as above (even though when this code in StopNodes() is executed, then the other threads that could access m_nodes_disconnected should have been exited by StopThreads() already, but better not rely on that).

    vasild commented at 3:47 pm on July 18, 2023:
    0with lock: copy x to temp
    1process temp without lock
    2with lock: clear x <-- some element may have been inserted between the locks and it will be removed without processing
    

    better do:

    0{
    1    LOCK(m_nodes_disconnected_mutex);
    2    disconnected_nodes = m_nodes_disconnected;
    3    m_nodes_disconnected.clear();
    4}
    5
    6for (CNode* pnode : disconnected_nodes) {
    7    DeleteNode(pnode);
    8}
    

    this will avoid the double-free as well

  25. in src/net.cpp:936 in a3903cc186 outdated
    937+            }
    938         }
    939     }
    940+    if (evicted_node) {
    941+        DisconnectAndReleaseNode(evicted_node);
    942+        DeleteDisconnectedNode(evicted_node);
    


    vasild commented at 3:19 pm on July 11, 2023:
    This could be called concurrently by two threads for the same CNode.

    vasild commented at 4:55 am on July 19, 2023:
    DisconnectAndReleaseNode() is safe because its body is protected by m_nodes_mutex. But DeleteDisconnectedNode() is not - two threads could race to call delete on the CNode object. It looks ok to iterate the whole m_disconnected_nodes here to avoid that, like in the other place.
  26. vasild commented at 4:24 pm on July 11, 2023: contributor

    I am not sure yet what would be the best approach to resolve the issues below.

    One way would be to hold m_nodes_disconnected_mutex for the entire iteration of the m_nodes_disconnected list. But this would mean to call DeleteNode() and thus PeerManagerImpl::FinalizeNode() under that mutex. The latter locks cs_main :-(

  27. willcl-ark force-pushed on Jul 17, 2023
  28. net: add mutex on m_disconnect_nodes
    This will permit safely removing nodes from multiple threads
    88213536ac
  29. net: add disconnect node helper functions
    Add DeleteDisconnectedNode() and DisconnectAndReleaseNode() functions.
    
    Permit disconnection and deletion of a single node.
    dd059160bf
  30. net: disconnect nodes in AttemptToEvictConnection
    Disconnect nodes using new helpers both in DisconnectNodes() and as part of
    AttemptToEvictConnection().
    
    Previously it was possible for multiple new connections to be accepted in
    one thread (e.g. ThreadI2PAccept) and evicted nodes to not be processed
    and removed atomically, instead waiting on ThreadSocketHandler() to run
    DisconnectNodes(), allowing overflow of maxconnections in the interim.
    
    Under the new behvaiour, as part of accepting a new connection we
    disconnect and delete the node inside of AttemptToEvictConnection(),
    dropping the new connection if this doesn't succeed. This ensures we
    don't exceed maxconnections.
    912d8e4914
  31. willcl-ark force-pushed on Jul 17, 2023
  32. Crypt-iQ commented at 2:49 pm on July 17, 2023: contributor

    OK I’ve pushed a new set of changes which now disconnects nodes synchronously inside of AttemptToEvictConnection.

    @Crypt-iQ I’d be curious if you still see these new changes as resolving the issue in #27843? I havent’ gotten your test patch working to my satisfaction yet (or at least, I don’t see positive eviction candidate selection during it so it wouldn’t overflow nMaxInbound even without these changes).

    Sorry I have been a bit short on time recently, but let me know when to test it and I’ll do it

  33. vasild commented at 10:09 am on July 19, 2023: contributor

    Here is a patch on top of this PR that should address all concerns above about thread safetyness.

      0diff --git i/src/net.cpp w/src/net.cpp
      1index c2160e945f..058f615dd5 100644
      2--- i/src/net.cpp
      3+++ w/src/net.cpp
      4@@ -886,12 +886,14 @@ size_t CConnman::SocketSendData(CNode& node) const
      5  *   to forge.  In order to partition a node the attacker must be
      6  *   simultaneously better at all of them than honest peers.
      7  *   If we find a candidate perform the eviction.
      8  */
      9 bool CConnman::AttemptToEvictConnection()
     10 {
     11+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     12+
     13     std::vector<NodeEvictionCandidate> vEvictionCandidates;
     14     {
     15 
     16         LOCK(m_nodes_mutex);
     17         for (const CNode* node : m_nodes) {
     18             if (node->fDisconnect)
     19@@ -916,33 +918,37 @@ bool CConnman::AttemptToEvictConnection()
     20         }
     21     }
     22     const std::optional<NodeId> node_id_to_evict = SelectNodeToEvict(std::move(vEvictionCandidates));
     23     if (!node_id_to_evict) {
     24         return false;
     25     }
     26-    CNode* evicted_node{nullptr};
     27+    bool disconnected{false};
     28     {
     29         LOCK(m_nodes_mutex);
     30-        for (CNode* pnode : m_nodes) {
     31-            if (pnode->GetId() == *node_id_to_evict) {
     32-                LogPrint(BCLog::NET, "selected %s connection for eviction peer=%d; disconnecting\n", pnode->ConnectionTypeAsString(), pnode->GetId());
     33-                pnode->fDisconnect = true;
     34-                evicted_node = pnode;
     35+        for (auto it = m_nodes.begin(); it != m_nodes.end(); ++it) {
     36+            CNode* node = *it;
     37+            if (node->GetId() == *node_id_to_evict) {
     38+                LogPrint(BCLog::NET, "selected %s connection for eviction peer=%d; disconnecting\n", node->ConnectionTypeAsString(), node->GetId());
     39+                node->fDisconnect = true;
     40+                DisconnectAndReleaseNode(node);
     41+                disconnected = true;
     42+                m_nodes.erase(it);
     43                 break;
     44             }
     45         }
     46     }
     47-    if (evicted_node) {
     48-        DisconnectAndReleaseNode(evicted_node);
     49-        DeleteDisconnectedNode(evicted_node);
     50+    if (disconnected) {
     51+        DeleteDisconnectedNodes();
     52         return true;
     53     }
     54     return false;
     55 }
     56 
     57 void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
     58+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     59+
     60     struct sockaddr_storage sockaddr;
     61     socklen_t len = sizeof(sockaddr);
     62     auto sock = hListenSocket.sock->Accept((struct sockaddr*)&sockaddr, &len);
     63     CAddress addr;
     64 
     65     if (!sock) {
     66@@ -1105,45 +1111,56 @@ bool CConnman::AddConnection(const std::string& address, ConnectionType conn_typ
     67     if (!grant) return false;
     68 
     69     OpenNetworkConnection(CAddress(), false, &grant, address.c_str(), conn_type);
     70     return true;
     71 }
     72 
     73-void CConnman::DeleteDisconnectedNode(CNode* pnode)
     74+void CConnman::DeleteDisconnectedNodes()
     75 {
     76-    // Destroy the object only after other threads have stopped using it.
     77-    // Prevent double free by setting nRefCount to -1 before delete.
     78-    int expectedRefCount = 0;
     79-    if (pnode->nRefCount.compare_exchange_strong(expectedRefCount, -1)) {
     80-        WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.remove(pnode));
     81-        DeleteNode(pnode);
     82+    AssertLockNotHeld(m_nodes_disconnected_mutex);
     83+
     84+    // Delete disconnected nodes. Call DeleteNode() without holding m_nodes_mutex or m_nodes_disconnected_mutex.
     85+    std::vector<CNode*> to_delete;
     86+
     87+    {
     88+        LOCK(m_nodes_disconnected_mutex);
     89+        for (auto it = m_nodes_disconnected.begin(); it != m_nodes_disconnected.end();) {
     90+            CNode* node = *it;
     91+            if (node->GetRefCount() == 0) {
     92+                it = m_nodes_disconnected.erase(it);
     93+                to_delete.push_back(node);
     94+            } else {
     95+                ++it;
     96+            }
     97+        }
     98+    }
     99+
    100+    for (CNode* node : to_delete) {
    101+        DeleteNode(node);
    102     }
    103 }
    104 
    105-void CConnman::DisconnectAndReleaseNode(CNode* pnode)
    106+void CConnman::DisconnectAndReleaseNode(CNode* node)
    107 {
    108-    LOCK(m_nodes_mutex);
    109-    if (std::find(m_nodes.begin(), m_nodes.end(), pnode) != m_nodes.end()) {
    110-
    111-        // remove from m_nodes
    112-        m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
    113+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    114 
    115-        // release outbound grant (if any)
    116-        pnode->grantOutbound.Release();
    117+    // release outbound grant (if any)
    118+    node->grantOutbound.Release();
    119 
    120-        // close socket and cleanup
    121-        pnode->CloseSocketDisconnect();
    122+    // close socket and cleanup
    123+    node->CloseSocketDisconnect();
    124 
    125-        // hold in disconnected pool until all refs are released
    126-        pnode->Release();
    127-        WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.push_back(pnode));
    128-    }
    129+    // hold in disconnected pool until all refs are released
    130+    node->Release();
    131+    WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.push_back(node));
    132 }
    133 
    134 void CConnman::DisconnectNodes()
    135 {
    136+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    137+
    138     {
    139         LOCK(m_nodes_mutex);
    140 
    141         if (!fNetworkActive) {
    142             // Disconnect any connected nodes
    143             for (CNode* pnode : m_nodes) {
    144@@ -1152,28 +1169,25 @@ void CConnman::DisconnectNodes()
    145                     pnode->fDisconnect = true;
    146                 }
    147             }
    148         }
    149 
    150         // Disconnect unused nodes
    151-        std::vector<CNode*> nodes_copy = m_nodes;
    152-        for (CNode* pnode : nodes_copy)
    153-        {
    154-            if (pnode->fDisconnect)
    155-            {
    156-                DisconnectAndReleaseNode(pnode);
    157-            }
    158-        }
    159-    }
    160-    // Delete disconnected nodes
    161-    std::list<CNode*> disconnected_nodes_copy{};
    162-    WITH_LOCK(m_nodes_disconnected_mutex, disconnected_nodes_copy = m_nodes_disconnected);
    163-    for (CNode* pnode : disconnected_nodes_copy)
    164-    {
    165-        DeleteDisconnectedNode(pnode);
    166+        m_nodes.erase(std::remove_if(m_nodes.begin(),
    167+                                     m_nodes.end(),
    168+                                     [this](CNode* node) {
    169+                                         if (node->fDisconnect) {
    170+                                             DisconnectAndReleaseNode(node);
    171+                                             return true;
    172+                                         }
    173+                                         return false;
    174+                                     }),
    175+                      m_nodes.end());
    176     }
    177+
    178+    DeleteDisconnectedNodes();
    179 }
    180 
    181 void CConnman::NotifyNumConnectionsChanged()
    182 {
    183     size_t nodes_size;
    184     {
    185@@ -1270,12 +1284,13 @@ Sock::EventsPerSock CConnman::GenerateWaitSockets(Span<CNode* const> nodes)
    186 
    187     return events_per_sock;
    188 }
    189 
    190 void CConnman::SocketHandler()
    191 {
    192+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    193     AssertLockNotHeld(m_total_bytes_sent_mutex);
    194 
    195     Sock::EventsPerSock events_per_sock;
    196 
    197     {
    198         const NodesSnapshot snap{*this, /*shuffle=*/false};
    199@@ -1381,12 +1396,14 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
    200         if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
    201     }
    202 }
    203 
    204 void CConnman::SocketHandlerListening(const Sock::EventsPerSock& events_per_sock)
    205 {
    206+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    207+
    208     for (const ListenSocket& listen_socket : vhListenSocket) {
    209         if (interruptNet) {
    210             return;
    211         }
    212         const auto it = events_per_sock.find(listen_socket.sock);
    213         if (it != events_per_sock.end() && it->second.occurred & Sock::RECV) {
    214@@ -1394,12 +1411,13 @@ void CConnman::SocketHandlerListening(const Sock::EventsPerSock& events_per_sock
    215         }
    216     }
    217 }
    218 
    219 void CConnman::ThreadSocketHandler()
    220 {
    221+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    222     AssertLockNotHeld(m_total_bytes_sent_mutex);
    223 
    224     while (!interruptNet)
    225     {
    226         DisconnectNodes();
    227         NotifyNumConnectionsChanged();
    228@@ -2093,12 +2111,14 @@ void CConnman::ThreadMessageHandler()
    229         fMsgProcWake = false;
    230     }
    231 }
    232 
    233 void CConnman::ThreadI2PAcceptIncoming()
    234 {
    235+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    236+
    237     static constexpr auto err_wait_begin = 1s;
    238     static constexpr auto err_wait_cap = 5min;
    239     auto err_wait = err_wait_begin;
    240 
    241     bool advertising_listen_addr = false;
    242     i2p::Connection conn;
    243@@ -2481,12 +2501,14 @@ void CConnman::StopThreads()
    244     if (threadSocketHandler.joinable())
    245         threadSocketHandler.join();
    246 }
    247 
    248 void CConnman::StopNodes()
    249 {
    250+    AssertLockNotHeld(m_nodes_disconnected_mutex);
    251+
    252     if (fAddressesInitialized) {
    253         DumpAddresses();
    254         fAddressesInitialized = false;
    255 
    256         if (m_use_addrman_outgoing) {
    257             // Anchor connections are only dumped during clean shutdown.
    258diff --git i/src/net.h w/src/net.h
    259index d5833d7e2d..6cdf8cb462 100644
    260--- i/src/net.h
    261+++ w/src/net.h
    262@@ -760,15 +760,17 @@ public:
    263 
    264     ~CConnman();
    265 
    266     bool Start(CScheduler& scheduler, const Options& options) EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !m_added_nodes_mutex, !m_addr_fetches_mutex, !mutexMsgProc);
    267 
    268     void StopThreads();
    269-    void StopNodes();
    270-    void Stop()
    271+    void StopNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    272+    void Stop() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex)
    273     {
    274+        AssertLockNotHeld(m_nodes_disconnected_mutex);
    275+
    276         StopThreads();
    277         StopNodes();
    278     };
    279 
    280     void Interrupt() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
    281     bool GetNetworkActive() const { return fNetworkActive; };
    282@@ -917,31 +919,31 @@ private:
    283 
    284     void ThreadOpenAddedConnections() EXCLUSIVE_LOCKS_REQUIRED(!m_added_nodes_mutex, !m_unused_i2p_sessions_mutex);
    285     void AddAddrFetch(const std::string& strDest) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex);
    286     void ProcessAddrFetch() EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_unused_i2p_sessions_mutex);
    287     void ThreadOpenConnections(std::vector<std::string> connect) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_added_nodes_mutex, !m_nodes_mutex, !m_unused_i2p_sessions_mutex);
    288     void ThreadMessageHandler() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
    289-    void ThreadI2PAcceptIncoming();
    290-    void AcceptConnection(const ListenSocket& hListenSocket);
    291+    void ThreadI2PAcceptIncoming() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    292+    void AcceptConnection(const ListenSocket& hListenSocket) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    293 
    294     /**
    295      * Create a `CNode` object from a socket that has just been accepted and add the node to
    296      * the `m_nodes` member.
    297      * [@param](/bitcoin-bitcoin/contributor/param/)[in] sock Connected socket to communicate with the peer.
    298      * [@param](/bitcoin-bitcoin/contributor/param/)[in] permission_flags The peer's permissions.
    299      * [@param](/bitcoin-bitcoin/contributor/param/)[in] addr_bind The address and port at our side of the connection.
    300      * [@param](/bitcoin-bitcoin/contributor/param/)[in] addr The address and port at the peer's side of the connection.
    301      */
    302     void CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    303                                       NetPermissionFlags permission_flags,
    304                                       const CAddress& addr_bind,
    305-                                      const CAddress& addr);
    306+                                      const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    307 
    308-    void DeleteDisconnectedNode(CNode* pnode);
    309-    void DisconnectAndReleaseNode(CNode* pnode);
    310-    void DisconnectNodes();
    311+    void DisconnectAndReleaseNode(CNode* node) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    312+    void DeleteDisconnectedNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    313+    void DisconnectNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    314     void NotifyNumConnectionsChanged();
    315     /** Return true if the peer is inactive and should be disconnected. */
    316     bool InactivityCheck(const CNode& node) const;
    317 
    318     /**
    319      * Generate a collection of sockets to check for IO readiness.
    320@@ -950,13 +952,13 @@ private:
    321      */
    322     Sock::EventsPerSock GenerateWaitSockets(Span<CNode* const> nodes);
    323 
    324     /**
    325      * Check connected and listening sockets for IO readiness and process them accordingly.
    326      */
    327-    void SocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    328+    void SocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex, !m_total_bytes_sent_mutex, !mutexMsgProc);
    329 
    330     /**
    331      * Do the read/write for connected sockets that are ready for IO.
    332      * [@param](/bitcoin-bitcoin/contributor/param/)[in] nodes Nodes to process. The socket of each node is checked against `what`.
    333      * [@param](/bitcoin-bitcoin/contributor/param/)[in] events_per_sock Sockets that are ready for IO.
    334      */
    335@@ -965,15 +967,15 @@ private:
    336         EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    337 
    338     /**
    339      * Accept incoming connections, one from each read-ready listening socket.
    340      * [@param](/bitcoin-bitcoin/contributor/param/)[in] events_per_sock Sockets that are ready for IO.
    341      */
    342-    void SocketHandlerListening(const Sock::EventsPerSock& events_per_sock);
    343+    void SocketHandlerListening(const Sock::EventsPerSock& events_per_sock) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    344 
    345-    void ThreadSocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    346+    void ThreadSocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex, !m_total_bytes_sent_mutex, !mutexMsgProc);
    347     void ThreadDNSAddressSeed() EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_nodes_mutex);
    348 
    349     uint64_t CalculateKeyedNetGroup(const CAddress& ad) const;
    350 
    351     CNode* FindNode(const CNetAddr& ip);
    352     CNode* FindNode(const CSubNet& subNet);
    353@@ -983,13 +985,13 @@ private:
    354     /**
    355      * Determine whether we're already connected to a given address, in order to
    356      * avoid initiating duplicate connections.
    357      */
    358     bool AlreadyConnectedToAddress(const CAddress& addr);
    359 
    360-    bool AttemptToEvictConnection();
    361+    bool AttemptToEvictConnection() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    362     CNode* ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
    363     void AddWhitelistPermissionFlags(NetPermissionFlags& flags, const CNetAddr &addr) const;
    364 
    365     void DeleteNode(CNode* pnode);
    366 
    367     NodeId GetNewNodeId();
    368@@ -1042,13 +1044,13 @@ private:
    369     const NetGroupManager& m_netgroupman;
    370     std::deque<std::string> m_addr_fetches GUARDED_BY(m_addr_fetches_mutex);
    371     Mutex m_addr_fetches_mutex;
    372     std::vector<std::string> m_added_nodes GUARDED_BY(m_added_nodes_mutex);
    373     mutable Mutex m_added_nodes_mutex;
    374     std::vector<CNode*> m_nodes GUARDED_BY(m_nodes_mutex);
    375-    GlobalMutex m_nodes_disconnected_mutex;
    376+    Mutex m_nodes_disconnected_mutex;
    377     std::list<CNode*> m_nodes_disconnected GUARDED_BY(m_nodes_disconnected_mutex);
    378     mutable RecursiveMutex m_nodes_mutex;
    379     std::atomic<NodeId> nLastNodeId{0};
    380     unsigned int nPrevNodeCount{0};
    381 
    382     /**
    
  34. willcl-ark commented at 3:37 pm on August 4, 2023: member

    Thanks for the review and patch @vasild, I will try to review it over the weekend and keep this moving forward.

    I have in the mean time been considering a totally different approach to thread-safe deletion of nodes from CConnman, but it’s somewhat more invasive (in terms of LOC)… I was experimenting with whether m_nodes would work better as a vector of share_ptr, meaning they could be added and removed from m_nodes (and m_nodes_disconnected) much more safely from the perspective of other threads…

    If you are also interested in this approach, I’d be curious to know what you thought of it. Althought “cleaner” in the end I think the change is too expensive for too-little benefit…

    https://github.com/willcl-ark/bitcoin/tree/2023-07_shared-ptr-cnode

  35. vasild commented at 4:03 pm on August 4, 2023: contributor

    Yes! I was thinking about the same! We are doing some manual refcounting here which is creating all kinds of headaches and is prone to bugs. There is a neat solution already for this std::shared_ptr. Long term, this is a must have IMO.

    To minimize the size of the initial PR, I guess you can just change CConnman::m_nodes and CConnman::m_nodes_disconnected from std::vector<CNode*> to std::vector<CNodeRef> and the needed changes to get that to compile and work.

    Later I think we don’t even need m_nodes_disconnected because we can do the FinalizeNode() from the CNode destructor. For example - give the CNode constructor a callback function that it will invoke from its destructor. Then just remove the shared_ptr objects from m_nodes or elsewhere and when their refcount drops to zero the destructor will be called and it will call FinalizeNode().

    PS 2023-07_shared-ptr-cnode does not look too bad.

  36. DrahtBot added the label Needs rebase on Aug 6, 2023
  37. DrahtBot commented at 5:47 pm on August 6, 2023: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  38. DrahtBot commented at 0:43 am on November 4, 2023: contributor

    There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  39. DrahtBot commented at 1:24 am on February 2, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  40. DrahtBot commented at 0:54 am on May 1, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  41. maflcko commented at 5:17 am on May 1, 2024: member
  42. willcl-ark commented at 8:44 am on May 1, 2024: member
    Going to close this for now as I don’t have time to finish it.
  43. willcl-ark closed this on May 1, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 07:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me