net: disconnect inside AttemptToEvictConnection #27912

willcl-ark wants to merge 3 commits into bitcoin:master from willcl-ark:27843-i2p-thread (2 files changed, +61 −31)
  1. willcl-ark commented at 10:26 AM on June 19, 2023: member

    Fixes #27843

    To avoid overflowing -maxconnections, disconnect nodes marked for eviction directly inside AttemptToEvictConnection(). The end result is that a new connection is only accepted after an existing connection has been dropped; otherwise the new connection itself is dropped.

    Previously the number of connected nodes could overflow nMaxInbound, as (multiple) new connections could be accepted from ThreadI2PAcceptIncoming -- each marking an existing connection to drop (in the future) -- before ThreadSocketHandler looped through to DisconnectNodes() and took care of the disconnections.

    Node disconnection and deletion are broken out into individual functions which handle a single node, so that they can be called both from DisconnectNodes() and AttemptToEvictConnection(). This will result in more un/locking operations when performing mass disconnections, but as that only really happens when the network becomes inactive, it should not be a problem.
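    The before/after behaviour can be illustrated with a toy model (a sketch with hypothetical names, not the real CConnman code): marking a peer for later disconnection admits the new connection immediately, so repeated accepts overflow the cap, while completing the eviction inside the accept path keeps the count bounded.

```cpp
#include <cstddef>
#include <list>

// Toy model, not the real CConnman: each list element stands for one
// inbound connection; `true` means it is marked fDisconnect.
struct ToyConnman {
    std::list<bool> conns;
    std::size_t max_inbound{3};

    // Old behaviour: mark a peer for eviction but admit the new connection
    // immediately; the actual disconnect happens later on another thread.
    void accept_async() {
        if (conns.size() >= max_inbound) conns.front() = true; // mark only
        conns.push_back(false); // admitted regardless, count can overflow
    }

    // New behaviour: complete the eviction inside the accept path, so the
    // count can never exceed max_inbound.
    void accept_sync() {
        if (conns.size() >= max_inbound) conns.pop_front(); // evict now
        conns.push_back(false);
    }
};

std::size_t count_after(bool sync, int accepts) {
    ToyConnman c;
    for (int i = 0; i < accepts; ++i) {
        if (sync) c.accept_sync(); else c.accept_async();
    }
    return c.conns.size();
}
```

    With 10 accepts, the async variant ends up holding 10 connections, while the sync variant never exceeds 3.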

  2. DrahtBot commented at 10:26 AM on June 19, 2023: contributor


    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.


    Code Coverage

    For detailed information about the code coverage, see the test coverage report.


    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.


    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28222 (Use shared_ptr for CNode inside CConnman by willcl-ark)
    • #27213 (p2p: Diversify automatic outbound connections with respect to networks by amitiuttarwar)
    • #26938 ([WIP] p2p: asmap, avoid inbound connections from a specific AS by brunoerg)
    • #25832 (tracing: network connection tracepoints by 0xB10C)
    • #25572 (refactor: Introduce EvictionManager and use it for the inbound eviction logic by dergoegge)
    • #25268 (refactor: Introduce EvictionManager by dergoegge)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Jun 19, 2023
  4. in src/net.cpp:1152 in 1d28dfb751 outdated
    1160 |          std::list<CNode*> nodes_disconnected_copy = m_nodes_disconnected;
    1161 |          for (CNode* pnode : nodes_disconnected_copy)
    1162 |          {
    1163 |              // Destroy the object only after other threads have stopped using it.
    1164 | -            if (pnode->GetRefCount() <= 0) {
    1165 | +            // Prevent two threads trying to delete the same node: set nRefCount to -1 first
    


    dergoegge commented at 12:15 PM on June 19, 2023:

    I think m_disconnect_mutex already prevents this from happening


    willcl-ark commented at 1:22 PM on June 19, 2023:

    Hmm, you may be right; I might have gone too far with that. What I was trying to protect against was checking the number of references in one instruction and then another thread increasing the refCount before we remove the node in the subsequent lines. But it looks like we don't have any functions which would increase the refCount during normal operation (it's only incremented on new connections), so it's probably unnecessary here.

    If I'm honest, I kind of prefer this belt-and-suspenders way myself, but would be easily persuaded that it's superfluous or worse than current behaviour (useless extra locking).

    I felt a bit nervous about nesting m_disconnect_mutex inside of m_nodes_mutex, but it seemed slightly preferable to locking m_nodes_mutex for the whole of DisconnectNodes(), as that's used in many other operations. I don't think there is any way StopNodes() and DisconnectNodes() can deadlock each other here though. Would you agree?


    dergoegge commented at 1:50 PM on June 19, 2023:

    The CNode reference counting is a total nightmare and we should (and have in the past: https://github.com/bitcoin/bitcoin/pull/10738/) try to get rid of it at some point.

    permitting another thread to increase the refCount in the meantime before we remove the node in the subsequent lines.

    This is actually possible with and without your patch, and the way we protect against it is by not calling CNode::AddRef after nodes were moved to the disconnection queue.

    if (pnode->nRefCount.compare_exchange_strong(expectedRefCount, -1)) {
        // Theoretically, nothing stops a different thread from calling `AddRef()` while we are here
        m_nodes_disconnected.remove(pnode);
        DeleteNode(pnode);
    }
    

    But, it looks like we don't have any functions which would increase the refCount during normal operation (only incremented on new connections), so probably unnecessary here.

    Our RAII helper NodesSnapshot does call AddRef during normal operation (e.g. before processing messages) but NodesSnapshot doesn't take a snapshot of m_nodes_disconnected so it's fine.

    I don't think there is any way StopNodes() and DisconnectNodes() can lock each other here though.

    Agreed, as long as you always lock in the same order (which afaict you do).


    Crypt-iQ commented at 2:28 PM on June 19, 2023:

    I agree with @dergoegge. Don't think StopNodes() can race with DisconnectNodes() since StopThreads() is called first
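    The consistent-lock-order rule can be sketched in isolation (toy code with hypothetical mutex names, not the real net.cpp): as long as every path that needs both mutexes takes them in the same order, the nesting cannot deadlock.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Simplified sketch with hypothetical names: every path that needs both
// mutexes takes g_nodes_mutex before g_disconnect_mutex, so two such
// paths can never wait on each other's held lock.
std::mutex g_nodes_mutex;
std::mutex g_disconnect_mutex;
int g_disconnected = 0;

void DisconnectOne() {
    std::lock_guard<std::mutex> outer(g_nodes_mutex);      // always first
    std::lock_guard<std::mutex> inner(g_disconnect_mutex); // always second
    ++g_disconnected; // protected by both mutexes
}

int RunThreads(int n_threads, int per_thread) {
    g_disconnected = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) DisconnectOne();
        });
    }
    for (auto& th : threads) th.join();
    return g_disconnected;
}
```

    If one path instead took the mutexes in the opposite order, two threads could each hold one mutex while waiting for the other, which is the classic deadlock this rule prevents.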

  5. in src/net.cpp:1126 in 2d33e62f62 outdated
    1126 |          {
    1127 | -            if (pnode->fDisconnect)
    1128 | -            {
    1129 | -                // remove from m_nodes
    1130 | -                m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
    1131 | +            LOCK(m_disconnect_mutex);
    


    Crypt-iQ commented at 1:56 PM on June 19, 2023:

    Is it better to lock m_disconnect_mutex for the duration of the node_copy loop or to move it right before m_nodes_disconnected? Not sure about the performance hit of repeatedly locking+unlocking in the latter case


    willcl-ark commented at 12:10 PM on June 20, 2023:

    I think that's as currently implemented (perhaps GitHub's diff is throwing you off again)?

    https://github.com/bitcoin/bitcoin/blob/2d33e62f6260e9bdde76a31df3b6ffbd20d17db6/src/net.cpp#L1125-L1143


    Crypt-iQ commented at 8:11 AM on June 21, 2023:

    I meant like rightttt before m_nodes_disconnected.push_back(pnode). But I don't think it matters so feel free to ignore

  6. in src/net.cpp:1169 in 2d33e62f62 outdated
    1169 | -    }
    1170 | -    if(nodes_size != nPrevNodeCount) {
    1171 | -        nPrevNodeCount = nodes_size;
    1172 | -        if (m_client_interface) {
    1173 | -            m_client_interface->NotifyNumConnectionsChanged(nodes_size);
    1174 | +        if(nodes_size != nPrevNodeCount) {
    


    Crypt-iQ commented at 2:07 PM on June 19, 2023:

    nPrevNodeCount needs a mutex now since it's called from ThreadI2PSocketHandler also


    Crypt-iQ commented at 11:23 AM on June 20, 2023:

    ignore this, I was confused by github's indentation

  7. in src/net.cpp:1175 in 2d33e62f62 outdated
    1175 | +            nPrevNodeCount = nodes_size;
    1176 | +            should_notify = true;
    1177 |          }
    1178 |      }
    1179 | +    if (should_notify && m_client_interface) {
    1180 | +        m_client_interface->NotifyNumConnectionsChanged(nodes_size);
    


    Crypt-iQ commented at 2:14 PM on June 19, 2023:

    Not familiar with the GUI code, but could this call to m_client_interface->NotifyNumConnectionsChanged race?


    willcl-ark commented at 11:05 AM on June 20, 2023:

    I am also not familiar with the GUI code, but IIUC I don't think this should race, as the lines above lock the mutex and update nPrevNodeCount inside it. So even if two threads called this function, t1 would lock first and update `nPrevNodeCount`, and when it unlocked (and notified), t2 would find that `nodes_size` was equal to `nPrevNodeCount` and so would exit without calling `NotifyNumConnectionsChanged`.

    But this should still send a notification per connection count change on either thread.

    That said, it's easy enough to move this block inside the m_nodes_mutex block to be safer, and I may do just that.

    Could change the function to:

    void CConnman::NotifyNumConnectionsChanged()
    {
        size_t nodes_size;
        bool should_notify{false};
        {
            LOCK(m_nodes_mutex);
            nodes_size = m_nodes.size();
            if(nodes_size != nPrevNodeCount) {
                nPrevNodeCount = nodes_size;
                should_notify = true;
            }
            if (should_notify && m_client_interface) {
                m_client_interface->NotifyNumConnectionsChanged(nodes_size);
            }
        }
    }
    

    Crypt-iQ commented at 11:42 AM on June 20, 2023:

    I think it's possible that:

    • t1 locks m_nodes_mutex, checks nodes_size and updates nPrevNodeCount & should_notify
    • t1 unlocks the mutex and something else gets scheduled by the CPU
    • the connection count is updated again
    • t2 locks m_nodes_mutex and sets should_notify
    • t2 unlocks the mutex and calls NotifyNumConnectionsChanged
    • t1 gets scheduled and calls NotifyNumConnectionsChanged

    My concern here was whether it's OK if NotifyNumConnectionsChanged is called simultaneously from 2 threads. I think it'd be good for nPrevNodeCount to have a GUARDED_BY(m_nodes_mutex) annotation
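    A toy version of that suggestion (hypothetical names, not the real code): with the check, the update of the previous count, and the notification all under one mutex, two threads can never notify concurrently, and each distinct count change notifies exactly once.

```cpp
#include <mutex>

// Toy model with hypothetical names: notify only when the count changes,
// with the comparison, the update and the notification all serialized
// under a single mutex (the role GUARDED_BY would document).
std::mutex g_count_mutex;
unsigned int g_prev_count = 0;
int g_notifications = 0;

void NotifyIfChanged(unsigned int current_count) {
    std::lock_guard<std::mutex> lock(g_count_mutex);
    if (current_count != g_prev_count) {
        g_prev_count = current_count;
        ++g_notifications; // stands in for NotifyNumConnectionsChanged()
    }
}
```

    Feeding the counts 1, 1, 2, 2, 3 produces three notifications, one per distinct change, regardless of which thread makes each call.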

  8. in src/net.cpp:1406 in 2d33e62f62 outdated
    1401 | +    AssertLockNotHeld(m_total_bytes_sent_mutex);
    1402 | +
    1403 | +    SetSyscallSandboxPolicy(SyscallSandboxPolicy::NET);
    1404 | +    while (!interruptNet) {
    1405 | +        DisconnectNodes();
    1406 | +        NotifyNumConnectionsChanged();
    


    Crypt-iQ commented at 2:28 PM on June 19, 2023:

    I don't think NotifyNumConnectionsChanged is necessary here, but I could understand the consistency argument

  9. in src/net.cpp:1399 in 2d33e62f62 outdated
    1395 | @@ -1390,6 +1396,18 @@ void CConnman::ThreadSocketHandler()
    1396 |      }
    1397 |  }
    1398 |  
    1399 | +void CConnman::ThreadI2PSocketHandler()
    


    Crypt-iQ commented at 2:29 PM on June 19, 2023:

    github won't let me comment on untouched lines, but minor nit: threadI2PAcceptIncoming could be renamed to threadI2PSocketHandler?

  10. in src/net.cpp:2124 in 2d33e62f62 outdated
    2139 | -            continue;
    2140 | -        }
    2141 | +        return;
    2142 | +    }
    2143 | +
    2144 | +    if (!advertising_listen_addr) {
    


    Crypt-iQ commented at 2:56 PM on June 19, 2023:

    Setting advertising_listen_addr to false each time at the top of the function causes each I2P connection to log an AddLocal(...) line. I think this could be used to perform a disk-fill attack (my debug.log grew several MB in several minutes). Prior to this patch, it was set once before the while(!interrupt) loop, so it wouldn't trigger on every new connection
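    The logging concern can be reproduced with a toy model (hypothetical helper name, not the real ThreadI2PAcceptIncoming()): resetting the flag on every pass through the accept loop logs once per accepted connection, while a flag that survives across accepts logs only once.

```cpp
// Toy model of the logging concern, not the real accept loop: count how
// many "AddLocal(...)" log lines N accepted connections would produce,
// depending on whether the advertising flag is reset on every accept.
int LogLinesAfterAccepts(bool reset_flag_each_accept, int accepts) {
    int log_lines = 0;
    bool advertising_listen_addr = false;
    for (int i = 0; i < accepts; ++i) {
        if (reset_flag_each_accept) advertising_listen_addr = false;
        if (!advertising_listen_addr) {
            ++log_lines; // stands in for the AddLocal(...) log line
            advertising_listen_addr = true;
        }
    }
    return log_lines;
}
```

    With the flag reset each time, 1000 accepted connections write 1000 log lines; with the flag hoisted out of the loop (or onto the class, as was later done), they write one.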

  11. Crypt-iQ commented at 3:00 PM on June 19, 2023: contributor

    Tested it, and this patch does fix the issue

  12. willcl-ark force-pushed on Jun 20, 2023
  13. willcl-ark commented at 12:18 PM on June 20, 2023: member

    Thanks @Crypt-iQ for the review. I pushed 21dae96 which includes most of the suggested changes:

    • NotifyNumConnectionsChanged now inside m_nodes_mutex block
    • moved advertising_listen_addr to a class member on CConnman initialised to false
    • Took the suggestion to rename the thread while touching it
    • guarded nPrevNodeCount with m_nodes_mutex
  14. vasild commented at 3:34 PM on June 26, 2023: contributor

    I acknowledge the problem described in #27843: ThreadI2PAcceptIncoming() calls CreateNodeFromAcceptedSocket(). The latter checks nInbound >= nMaxInbound and calls AttemptToEvictConnection() if that is true, but the eviction is done asynchronously - AttemptToEvictConnection() only marks it for disconnect and leaves the actual disconnection and socket closure to another thread for a later time (ThreadSocketHandler()). Until that other thread actually does the disconnect, ThreadI2PAcceptIncoming() may call CreateNodeFromAcceptedSocket() way too many times, ending up with N excess connections above 125 (the new I2P ones) and N other connections marked for disconnect, but not disconnected until ThreadSocketHandler() calls DisconnectNodes().

    I intended ThreadI2PAcceptIncoming() to be just a low-level socket-accept function, an alternative to AcceptConnection() (which might be better named AcceptConnectionTCP()), with all the higher-level logic (e.g. capping the number of connections) happening in CreateNodeFromAcceptedSocket(). The latter indeed checks nMaxInbound but does the disconnect asynchronously. That works for AcceptConnection() because the async part happens to run in the same thread, but it will not work if that is decoupled, or if a new way to accept connections (other than TCP and I2P/SAM) is added later.

    I did not look deeply into the code in this PR, but I gather it drags some high-level logic (capping the number of connections) into ThreadI2PAcceptIncoming(). This seems suboptimal because the higher-level logic is now in two places; if a new accept method is added, it would have to be replicated there too, and ThreadI2PAcceptIncoming() is no longer just a low-level socket-accept function.

    What about changing AttemptToEvictConnection() to disconnect synchronously? It is called from just one place. Something like:

    --- i/src/net.cpp
    +++ w/src/net.cpp
    @@ -927,12 +927,13 @@ bool CConnman::AttemptToEvictConnection()
         }
         LOCK(m_nodes_mutex);
         for (CNode* pnode : m_nodes) {
             if (pnode->GetId() == *node_id_to_evict) {
                 LogPrint(BCLog::NET, "selected %s connection for eviction peer=%d; disconnecting\n", pnode->ConnectionType
                 pnode->fDisconnect = true;
    +            DisconnectNodes();
                 return true;
             }
         }
         return false;
     }
    

    (modulo trying to avoid the recursive lock of m_nodes_mutex and maybe some optimization because DisconnectNodes() iterates on all nodes and in this case we know which node is marked for disconnect).

  15. willcl-ark commented at 12:37 PM on June 29, 2023: member

    Yes I agree and like this approach better.

    I had created a ThreadI2PSocketHandler() thread which essentially had the same loop as ThreadSocketHandler(). The repetition of DisconnectNodes() and NotifyNumConnectionsChanged() in two threads seemed annoying (and needed a few new locks), and I was considering whether a new separate thread should just handle disconnecting marked nodes and notifications, but I think doing it as part of CreateNodeFromAcceptedSocket() (in AttemptToEvictConnection()) makes more sense, so I will work on this.

  16. DrahtBot added the label CI failed on Jun 30, 2023
  17. willcl-ark force-pushed on Jul 4, 2023
  18. willcl-ark force-pushed on Jul 4, 2023
  19. willcl-ark renamed this:
    net: run disconnect in I2P thread
    net: disconnect inside AttemptToEvictConnection
    on Jul 4, 2023
  20. willcl-ark commented at 3:38 PM on July 4, 2023: member

    OK, I've pushed a new set of changes which now disconnect nodes synchronously inside of AttemptToEvictConnection. @Crypt-iQ I'd be curious whether you still see these new changes as resolving the issue in #27843? I haven't gotten your test patch working to my satisfaction yet (or at least, I don't see positive eviction candidate selection during it, so it wouldn't overflow nMaxInbound even without these changes).

  21. DrahtBot removed the label CI failed on Jul 7, 2023
  22. in src/net.h:1048 in a3903cc186 outdated
    1044 | @@ -1043,7 +1045,8 @@ class CConnman
    1045 |      std::vector<std::string> m_added_nodes GUARDED_BY(m_added_nodes_mutex);
    1046 |      mutable Mutex m_added_nodes_mutex;
    1047 |      std::vector<CNode*> m_nodes GUARDED_BY(m_nodes_mutex);
    1048 | -    std::list<CNode*> m_nodes_disconnected;
    1049 | +    mutable RecursiveMutex m_nodes_disconnected_mutex;
    


    vasild commented at 1:02 PM on July 11, 2023:

    This need not be RecursiveMutex?

        mutable Mutex m_nodes_disconnected_mutex;
    

    willcl-ark commented at 2:17 PM on July 17, 2023:

    OK, I've used a GlobalMutex for this now. I never needed the recursive behaviour but was getting clang static analysis errors without it. Looking at sync.h, it seems that GlobalMutex should be the correct type for this, I think.


    vasild commented at 3:22 PM on July 18, 2023:

    GlobalMutex silences some of the thread safety analysis; it was introduced to overcome some limitations of the analysis and is supposed to be used only for mutexes that are defined globally. I agree it is confusing. I don't like it and have a plan to remove GlobalMutex, but that is stuck at #25390.

    <details> <summary>[patch] change to Mutex</summary>

    diff --git i/src/net.cpp w/src/net.cpp
    index c2160e945f..c8ee95a3d0 100644
    --- i/src/net.cpp
    +++ w/src/net.cpp
    @@ -886,12 +886,14 @@ size_t CConnman::SocketSendData(CNode& node) const
      *   to forge.  In order to partition a node the attacker must be
      *   simultaneously better at all of them than honest peers.
      *   If we find a candidate perform the eviction.
      */
     bool CConnman::AttemptToEvictConnection()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         std::vector<NodeEvictionCandidate> vEvictionCandidates;
         {
     
             LOCK(m_nodes_mutex);
             for (const CNode* node : m_nodes) {
                 if (node->fDisconnect)
    @@ -937,12 +939,14 @@ bool CConnman::AttemptToEvictConnection()
             return true;
         }
         return false;
     }
     
     void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         struct sockaddr_storage sockaddr;
         socklen_t len = sizeof(sockaddr);
         auto sock = hListenSocket.sock->Accept((struct sockaddr*)&sockaddr, &len);
         CAddress addr;
     
         if (!sock) {
    @@ -1107,23 +1111,27 @@ bool CConnman::AddConnection(const std::string& address, ConnectionType conn_typ
         OpenNetworkConnection(CAddress(), false, &grant, address.c_str(), conn_type);
         return true;
     }
     
     void CConnman::DeleteDisconnectedNode(CNode* pnode)
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         // Destroy the object only after other threads have stopped using it.
         // Prevent double free by setting nRefCount to -1 before delete.
         int expectedRefCount = 0;
         if (pnode->nRefCount.compare_exchange_strong(expectedRefCount, -1)) {
             WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.remove(pnode));
             DeleteNode(pnode);
         }
     }
     
     void CConnman::DisconnectAndReleaseNode(CNode* pnode)
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         LOCK(m_nodes_mutex);
         if (std::find(m_nodes.begin(), m_nodes.end(), pnode) != m_nodes.end()) {
     
             // remove from m_nodes
             m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
     
    @@ -1138,12 +1146,14 @@ void CConnman::DisconnectAndReleaseNode(CNode* pnode)
             WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.push_back(pnode));
         }
     }
     
     void CConnman::DisconnectNodes()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         {
             LOCK(m_nodes_mutex);
     
             if (!fNetworkActive) {
                 // Disconnect any connected nodes
                 for (CNode* pnode : m_nodes) {
    @@ -1270,12 +1280,13 @@ Sock::EventsPerSock CConnman::GenerateWaitSockets(Span<CNode* const> nodes)
     
         return events_per_sock;
     }
     
     void CConnman::SocketHandler()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
         AssertLockNotHeld(m_total_bytes_sent_mutex);
     
         Sock::EventsPerSock events_per_sock;
     
         {
             const NodesSnapshot snap{*this, /*shuffle=*/false};
    @@ -1381,12 +1392,14 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
             if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
         }
     }
     
     void CConnman::SocketHandlerListening(const Sock::EventsPerSock& events_per_sock)
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         for (const ListenSocket& listen_socket : vhListenSocket) {
             if (interruptNet) {
                 return;
             }
             const auto it = events_per_sock.find(listen_socket.sock);
             if (it != events_per_sock.end() && it->second.occurred & Sock::RECV) {
    @@ -1394,12 +1407,13 @@ void CConnman::SocketHandlerListening(const Sock::EventsPerSock& events_per_sock
             }
         }
     }
     
     void CConnman::ThreadSocketHandler()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
         AssertLockNotHeld(m_total_bytes_sent_mutex);
     
         while (!interruptNet)
         {
             DisconnectNodes();
             NotifyNumConnectionsChanged();
    @@ -2093,12 +2107,14 @@ void CConnman::ThreadMessageHandler()
             fMsgProcWake = false;
         }
     }
     
     void CConnman::ThreadI2PAcceptIncoming()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         static constexpr auto err_wait_begin = 1s;
         static constexpr auto err_wait_cap = 5min;
         auto err_wait = err_wait_begin;
     
         bool advertising_listen_addr = false;
         i2p::Connection conn;
    @@ -2481,12 +2497,14 @@ void CConnman::StopThreads()
         if (threadSocketHandler.joinable())
             threadSocketHandler.join();
     }
     
     void CConnman::StopNodes()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         if (fAddressesInitialized) {
             DumpAddresses();
             fAddressesInitialized = false;
     
             if (m_use_addrman_outgoing) {
                 // Anchor connections are only dumped during clean shutdown.
    diff --git i/src/net.h w/src/net.h
    index d5833d7e2d..d45f3c606f 100644
    --- i/src/net.h
    +++ w/src/net.h
    @@ -760,15 +760,17 @@ public:
     
         ~CConnman();
     
         bool Start(CScheduler& scheduler, const Options& options) EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !m_added_nodes_mutex, !m_addr_fetches_mutex, !mutexMsgProc);
     
         void StopThreads();
    -    void StopNodes();
    -    void Stop()
    +    void StopNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    +    void Stop() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex)
         {
    +        AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
             StopThreads();
             StopNodes();
         };
     
         void Interrupt() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
         bool GetNetworkActive() const { return fNetworkActive; };
    @@ -917,31 +919,31 @@ private:
     
         void ThreadOpenAddedConnections() EXCLUSIVE_LOCKS_REQUIRED(!m_added_nodes_mutex, !m_unused_i2p_sessions_mutex);
         void AddAddrFetch(const std::string& strDest) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex);
         void ProcessAddrFetch() EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_unused_i2p_sessions_mutex);
         void ThreadOpenConnections(std::vector<std::string> connect) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_added_nodes_mutex, !m_nodes_mutex, !m_unused_i2p_sessions_mutex);
         void ThreadMessageHandler() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
    -    void ThreadI2PAcceptIncoming();
    -    void AcceptConnection(const ListenSocket& hListenSocket);
    +    void ThreadI2PAcceptIncoming() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    +    void AcceptConnection(const ListenSocket& hListenSocket) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
     
         /**
          * Create a `CNode` object from a socket that has just been accepted and add the node to
          * the `m_nodes` member.
     * @param[in] sock Connected socket to communicate with the peer.
     * @param[in] permission_flags The peer's permissions.
     * @param[in] addr_bind The address and port at our side of the connection.
     * @param[in] addr The address and port at the peer's side of the connection.
          */
         void CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
                                           NetPermissionFlags permission_flags,
                                           const CAddress& addr_bind,
    -                                      const CAddress& addr);
    +                                      const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
     
    -    void DeleteDisconnectedNode(CNode* pnode);
    -    void DisconnectAndReleaseNode(CNode* pnode);
    -    void DisconnectNodes();
    +    void DeleteDisconnectedNode(CNode* pnode) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    +    void DisconnectAndReleaseNode(CNode* pnode) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    +    void DisconnectNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
         void NotifyNumConnectionsChanged();
         /** Return true if the peer is inactive and should be disconnected. */
         bool InactivityCheck(const CNode& node) const;
     
         /**
          * Generate a collection of sockets to check for IO readiness.
    @@ -950,13 +952,13 @@ private:
          */
         Sock::EventsPerSock GenerateWaitSockets(Span<CNode* const> nodes);
     
         /**
          * Check connected and listening sockets for IO readiness and process them accordingly.
          */
    -    void SocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    +    void SocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex, !m_total_bytes_sent_mutex, !mutexMsgProc);
     
         /**
          * Do the read/write for connected sockets that are ready for IO.
     * @param[in] nodes Nodes to process. The socket of each node is checked against `what`.
     * @param[in] events_per_sock Sockets that are ready for IO.
          */
    @@ -965,15 +967,15 @@ private:
             EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
     
         /**
          * Accept incoming connections, one from each read-ready listening socket.
     * @param[in] events_per_sock Sockets that are ready for IO.
          */
    -    void SocketHandlerListening(const Sock::EventsPerSock& events_per_sock);
    +    void SocketHandlerListening(const Sock::EventsPerSock& events_per_sock) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
     
    -    void ThreadSocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    +    void ThreadSocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex, !m_total_bytes_sent_mutex, !mutexMsgProc);
         void ThreadDNSAddressSeed() EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_nodes_mutex);
     
         uint64_t CalculateKeyedNetGroup(const CAddress& ad) const;
     
         CNode* FindNode(const CNetAddr& ip);
         CNode* FindNode(const CSubNet& subNet);
    @@ -983,13 +985,13 @@ private:
         /**
          * Determine whether we're already connected to a given address, in order to
          * avoid initiating duplicate connections.
          */
         bool AlreadyConnectedToAddress(const CAddress& addr);
     
    -    bool AttemptToEvictConnection();
    +    bool AttemptToEvictConnection() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
         CNode* ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
         void AddWhitelistPermissionFlags(NetPermissionFlags& flags, const CNetAddr &addr) const;
     
         void DeleteNode(CNode* pnode);
     
         NodeId GetNewNodeId();
    @@ -1042,13 +1044,13 @@ private:
         const NetGroupManager& m_netgroupman;
         std::deque<std::string> m_addr_fetches GUARDED_BY(m_addr_fetches_mutex);
         Mutex m_addr_fetches_mutex;
         std::vector<std::string> m_added_nodes GUARDED_BY(m_added_nodes_mutex);
         mutable Mutex m_added_nodes_mutex;
         std::vector<CNode*> m_nodes GUARDED_BY(m_nodes_mutex);
    -    GlobalMutex m_nodes_disconnected_mutex;
    +    Mutex m_nodes_disconnected_mutex;
         std::list<CNode*> m_nodes_disconnected GUARDED_BY(m_nodes_disconnected_mutex);
         mutable RecursiveMutex m_nodes_mutex;
         std::atomic<NodeId> nLastNodeId{0};
         unsigned int nPrevNodeCount{0};
     
         /**
    
    

    </details>

  23. in src/net.cpp:1173 in a3903cc186 outdated
    1187 | -                m_nodes_disconnected.remove(pnode);
    1188 | -                DeleteNode(pnode);
    1189 | -            }
    1190 | -        }
    1191 | +        DeleteDisconnectedNode(pnode);
    1192 |      }
    


    vasild commented at 1:08 PM on July 11, 2023:

    This can end up with a double-free if two threads concurrently execute it.

    1. Thread1 makes a copy of m_nodes_disconnected and releases m_nodes_disconnected_mutex
    2. Thread2 does the same
    3. Thread1 starts iterating on its own copy and calls DeleteDisconnectedNode() on the first element which calls DeleteNode() which calls delete pnode;
    4. Thread2 does the same on its own copy, a second delete on the same CNode object.

    willcl-ark commented at 7:37 AM on July 12, 2023:

    Right, this is what I tried to protect against in an earlier version. In that version I think it was superfluous, but here it would be appropriate, so that only one thread ever actually performs the deletion.


    willcl-ark commented at 2:19 PM on July 17, 2023:

    DeleteNode() now gated behind if (pnode->nRefCount.compare_exchange_strong(expectedRefCount, -1)) so it can only be called once (by a single thread).


    vasild commented at 4:48 AM on July 19, 2023:

    I love this lock free pattern!

    The variable used as a guard (nRefCount) has to be outside of the object being protected. Otherwise if two threads execute it concurrently the first one may swap from 0 to -1 and delete the object. Then the second thread will end up reading freed memory when it tries to use nRefCount.

    This should be safe:

        // Delete disconnected nodes. Call DeleteNode() without holding m_nodes_mutex or m_nodes_disconnected_mutex.
        std::vector<CNode*> to_delete;
        { 
            LOCK(m_nodes_disconnected_mutex);
            for (auto it = m_nodes_disconnected.begin(); it != m_nodes_disconnected.end();) { 
                CNode* node = *it;
                if (node->GetRefCount() == 0) { 
                    it = m_nodes_disconnected.erase(it);
                    to_delete.push_back(node);
                } else { 
                    ++it;
                } 
            } 
        } 
        for (CNode* node : to_delete) { 
            DeleteNode(node);
        } 
    
  24. in src/net.cpp:2514 in a3903cc186 outdated
    2508 | +    WITH_LOCK(m_nodes_disconnected_mutex, disconnected_nodes = m_nodes_disconnected);
    2509 | +    for (CNode* pnode : disconnected_nodes) {
    2510 |          DeleteNode(pnode);
    2511 |      }
    2512 | -    m_nodes_disconnected.clear();
    2513 | +    WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.clear());
    


    vasild commented at 1:26 PM on July 11, 2023:

    Same double-free as above (even though when this code in StopNodes() is executed, then the other threads that could access m_nodes_disconnected should have been exited by StopThreads() already, but better not rely on that).


    vasild commented at 3:47 PM on July 18, 2023:
    with lock: copy x to temp
    process temp without lock
    with lock: clear x <-- some element may have been inserted between the locks and it will be removed without processing
    

    better do:

    {
        LOCK(m_nodes_disconnected_mutex);
        disconnected_nodes = m_nodes_disconnected;
        m_nodes_disconnected.clear();
    }
    
    for (CNode* pnode : disconnected_nodes) {
        DeleteNode(pnode);
    }
    

    this will avoid the double-free as well

  25. in src/net.cpp:936 in a3903cc186 outdated
     937 | +            }
     938 |          }
     939 |      }
     940 | +    if (evicted_node) {
     941 | +        DisconnectAndReleaseNode(evicted_node);
     942 | +        DeleteDisconnectedNode(evicted_node);
    


    vasild commented at 3:19 PM on July 11, 2023:

    This could be called concurrently by two threads for the same CNode.


    vasild commented at 4:55 AM on July 19, 2023:

    DisconnectAndReleaseNode() is safe because its body is protected by m_nodes_mutex. But DeleteDisconnectedNode() is not - two threads could race to call delete on the same CNode object. It looks ok to iterate the whole m_nodes_disconnected here to avoid that, like in the other place.

  26. vasild commented at 4:24 PM on July 11, 2023: contributor

    I am not sure yet what would be the best approach to resolve the issues below.

    One way would be to hold m_nodes_disconnected_mutex for the entire iteration of the m_nodes_disconnected list. But this would mean calling DeleteNode(), and thus PeerManagerImpl::FinalizeNode(), under that mutex. The latter locks cs_main :-(

  27. willcl-ark force-pushed on Jul 17, 2023
  28. net: add mutex on m_disconnect_nodes
    This will permit safely removing nodes from multiple threads
    88213536ac
  29. net: add disconnect node helper functions
    Add DeleteDisconnectedNode() and DisconnectAndReleaseNode() functions.
    
    Permit disconnection and deletion of a single node.
    dd059160bf
  30. net: disconnect nodes in AttemptToEvictConnection
    Disconnect nodes using new helpers both in DisconnectNodes() and as part of
    AttemptToEvictConnection().
    
    Previously it was possible for multiple new connections to be accepted in
    one thread (e.g. ThreadI2PAccept) and evicted nodes to not be processed
    and removed atomically, instead waiting on ThreadSocketHandler() to run
    DisconnectNodes(), allowing overflow of maxconnections in the interim.
    
    Under the new behaviour, as part of accepting a new connection we
    disconnect and delete the node inside of AttemptToEvictConnection(),
    dropping the new connection if this doesn't succeed. This ensures we
    don't exceed maxconnections.
    912d8e4914
  31. willcl-ark force-pushed on Jul 17, 2023
  32. Crypt-iQ commented at 2:49 PM on July 17, 2023: contributor

    OK I've pushed a new set of changes which now disconnects nodes synchronously inside of AttemptToEvictConnection.

    @Crypt-iQ I'd be curious if you still see these new changes as resolving the issue in #27843? I haven't gotten your test patch working to my satisfaction yet (or at least, I don't see positive eviction candidate selection during it, so it wouldn't overflow nMaxInbound even without these changes).

    Sorry I have been a bit short on time recently, but let me know when to test it and I'll do it

  33. vasild commented at 10:09 AM on July 19, 2023: contributor

    Here is a patch on top of this PR that should address all of the thread-safety concerns above.

    <details> <summary>[patch] thread safe</summary>

    diff --git i/src/net.cpp w/src/net.cpp
    index c2160e945f..058f615dd5 100644
    --- i/src/net.cpp
    +++ w/src/net.cpp
    @@ -886,12 +886,14 @@ size_t CConnman::SocketSendData(CNode& node) const
      *   to forge.  In order to partition a node the attacker must be
      *   simultaneously better at all of them than honest peers.
      *   If we find a candidate perform the eviction.
      */
     bool CConnman::AttemptToEvictConnection()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         std::vector<NodeEvictionCandidate> vEvictionCandidates;
         {
     
             LOCK(m_nodes_mutex);
             for (const CNode* node : m_nodes) {
                 if (node->fDisconnect)
    @@ -916,33 +918,37 @@ bool CConnman::AttemptToEvictConnection()
             }
         }
         const std::optional<NodeId> node_id_to_evict = SelectNodeToEvict(std::move(vEvictionCandidates));
         if (!node_id_to_evict) {
             return false;
         }
    -    CNode* evicted_node{nullptr};
    +    bool disconnected{false};
         {
             LOCK(m_nodes_mutex);
    -        for (CNode* pnode : m_nodes) {
    -            if (pnode->GetId() == *node_id_to_evict) {
    -                LogPrint(BCLog::NET, "selected %s connection for eviction peer=%d; disconnecting\n", pnode->ConnectionTypeAsString(), pnode->GetId());
    -                pnode->fDisconnect = true;
    -                evicted_node = pnode;
    +        for (auto it = m_nodes.begin(); it != m_nodes.end(); ++it) {
    +            CNode* node = *it;
    +            if (node->GetId() == *node_id_to_evict) {
    +                LogPrint(BCLog::NET, "selected %s connection for eviction peer=%d; disconnecting\n", node->ConnectionTypeAsString(), node->GetId());
    +                node->fDisconnect = true;
    +                DisconnectAndReleaseNode(node);
    +                disconnected = true;
    +                m_nodes.erase(it);
                     break;
                 }
             }
         }
    -    if (evicted_node) {
    -        DisconnectAndReleaseNode(evicted_node);
    -        DeleteDisconnectedNode(evicted_node);
    +    if (disconnected) {
    +        DeleteDisconnectedNodes();
             return true;
         }
         return false;
     }
     
     void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         struct sockaddr_storage sockaddr;
         socklen_t len = sizeof(sockaddr);
         auto sock = hListenSocket.sock->Accept((struct sockaddr*)&sockaddr, &len);
         CAddress addr;
     
         if (!sock) {
    @@ -1105,45 +1111,56 @@ bool CConnman::AddConnection(const std::string& address, ConnectionType conn_typ
         if (!grant) return false;
     
         OpenNetworkConnection(CAddress(), false, &grant, address.c_str(), conn_type);
         return true;
     }
     
    -void CConnman::DeleteDisconnectedNode(CNode* pnode)
    +void CConnman::DeleteDisconnectedNodes()
     {
    -    // Destroy the object only after other threads have stopped using it.
    -    // Prevent double free by setting nRefCount to -1 before delete.
    -    int expectedRefCount = 0;
    -    if (pnode->nRefCount.compare_exchange_strong(expectedRefCount, -1)) {
    -        WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.remove(pnode));
    -        DeleteNode(pnode);
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
    +    // Delete disconnected nodes. Call DeleteNode() without holding m_nodes_mutex or m_nodes_disconnected_mutex.
    +    std::vector<CNode*> to_delete;
    +
    +    {
    +        LOCK(m_nodes_disconnected_mutex);
    +        for (auto it = m_nodes_disconnected.begin(); it != m_nodes_disconnected.end();) {
    +            CNode* node = *it;
    +            if (node->GetRefCount() == 0) {
    +                it = m_nodes_disconnected.erase(it);
    +                to_delete.push_back(node);
    +            } else {
    +                ++it;
    +            }
    +        }
    +    }
    +
    +    for (CNode* node : to_delete) {
    +        DeleteNode(node);
         }
     }
     
    -void CConnman::DisconnectAndReleaseNode(CNode* pnode)
    +void CConnman::DisconnectAndReleaseNode(CNode* node)
     {
    -    LOCK(m_nodes_mutex);
    -    if (std::find(m_nodes.begin(), m_nodes.end(), pnode) != m_nodes.end()) {
    -
    -        // remove from m_nodes
    -        m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
     
    -        // release outbound grant (if any)
    -        pnode->grantOutbound.Release();
    +    // release outbound grant (if any)
    +    node->grantOutbound.Release();
     
    -        // close socket and cleanup
    -        pnode->CloseSocketDisconnect();
    +    // close socket and cleanup
    +    node->CloseSocketDisconnect();
     
    -        // hold in disconnected pool until all refs are released
    -        pnode->Release();
    -        WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.push_back(pnode));
    -    }
    +    // hold in disconnected pool until all refs are released
    +    node->Release();
    +    WITH_LOCK(m_nodes_disconnected_mutex, m_nodes_disconnected.push_back(node));
     }
     
     void CConnman::DisconnectNodes()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         {
             LOCK(m_nodes_mutex);
     
             if (!fNetworkActive) {
                 // Disconnect any connected nodes
                 for (CNode* pnode : m_nodes) {
    @@ -1152,28 +1169,25 @@ void CConnman::DisconnectNodes()
                         pnode->fDisconnect = true;
                     }
                 }
             }
     
             // Disconnect unused nodes
    -        std::vector<CNode*> nodes_copy = m_nodes;
    -        for (CNode* pnode : nodes_copy)
    -        {
    -            if (pnode->fDisconnect)
    -            {
    -                DisconnectAndReleaseNode(pnode);
    -            }
    -        }
    -    }
    -    // Delete disconnected nodes
    -    std::list<CNode*> disconnected_nodes_copy{};
    -    WITH_LOCK(m_nodes_disconnected_mutex, disconnected_nodes_copy = m_nodes_disconnected);
    -    for (CNode* pnode : disconnected_nodes_copy)
    -    {
    -        DeleteDisconnectedNode(pnode);
    +        m_nodes.erase(std::remove_if(m_nodes.begin(),
    +                                     m_nodes.end(),
    +                                     [this](CNode* node) {
    +                                         if (node->fDisconnect) {
    +                                             DisconnectAndReleaseNode(node);
    +                                             return true;
    +                                         }
    +                                         return false;
    +                                     }),
    +                      m_nodes.end());
         }
    +
    +    DeleteDisconnectedNodes();
     }
     
     void CConnman::NotifyNumConnectionsChanged()
     {
         size_t nodes_size;
         {
    @@ -1270,12 +1284,13 @@ Sock::EventsPerSock CConnman::GenerateWaitSockets(Span<CNode* const> nodes)
     
         return events_per_sock;
     }
     
     void CConnman::SocketHandler()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
         AssertLockNotHeld(m_total_bytes_sent_mutex);
     
         Sock::EventsPerSock events_per_sock;
     
         {
             const NodesSnapshot snap{*this, /*shuffle=*/false};
    @@ -1381,12 +1396,14 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
             if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
         }
     }
     
     void CConnman::SocketHandlerListening(const Sock::EventsPerSock& events_per_sock)
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         for (const ListenSocket& listen_socket : vhListenSocket) {
             if (interruptNet) {
                 return;
             }
             const auto it = events_per_sock.find(listen_socket.sock);
             if (it != events_per_sock.end() && it->second.occurred & Sock::RECV) {
    @@ -1394,12 +1411,13 @@ void CConnman::SocketHandlerListening(const Sock::EventsPerSock& events_per_sock
             }
         }
     }
     
     void CConnman::ThreadSocketHandler()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
         AssertLockNotHeld(m_total_bytes_sent_mutex);
     
         while (!interruptNet)
         {
             DisconnectNodes();
             NotifyNumConnectionsChanged();
    @@ -2093,12 +2111,14 @@ void CConnman::ThreadMessageHandler()
             fMsgProcWake = false;
         }
     }
     
     void CConnman::ThreadI2PAcceptIncoming()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         static constexpr auto err_wait_begin = 1s;
         static constexpr auto err_wait_cap = 5min;
         auto err_wait = err_wait_begin;
     
         bool advertising_listen_addr = false;
         i2p::Connection conn;
    @@ -2481,12 +2501,14 @@ void CConnman::StopThreads()
         if (threadSocketHandler.joinable())
             threadSocketHandler.join();
     }
     
     void CConnman::StopNodes()
     {
    +    AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
         if (fAddressesInitialized) {
             DumpAddresses();
             fAddressesInitialized = false;
     
             if (m_use_addrman_outgoing) {
                 // Anchor connections are only dumped during clean shutdown.
    diff --git i/src/net.h w/src/net.h
    index d5833d7e2d..6cdf8cb462 100644
    --- i/src/net.h
    +++ w/src/net.h
    @@ -760,15 +760,17 @@ public:
     
         ~CConnman();
     
         bool Start(CScheduler& scheduler, const Options& options) EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !m_added_nodes_mutex, !m_addr_fetches_mutex, !mutexMsgProc);
     
         void StopThreads();
    -    void StopNodes();
    -    void Stop()
    +    void StopNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    +    void Stop() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex)
         {
    +        AssertLockNotHeld(m_nodes_disconnected_mutex);
    +
             StopThreads();
             StopNodes();
         };
     
         void Interrupt() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
         bool GetNetworkActive() const { return fNetworkActive; };
    @@ -917,31 +919,31 @@ private:
     
         void ThreadOpenAddedConnections() EXCLUSIVE_LOCKS_REQUIRED(!m_added_nodes_mutex, !m_unused_i2p_sessions_mutex);
         void AddAddrFetch(const std::string& strDest) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex);
         void ProcessAddrFetch() EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_unused_i2p_sessions_mutex);
         void ThreadOpenConnections(std::vector<std::string> connect) EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_added_nodes_mutex, !m_nodes_mutex, !m_unused_i2p_sessions_mutex);
         void ThreadMessageHandler() EXCLUSIVE_LOCKS_REQUIRED(!mutexMsgProc);
    -    void ThreadI2PAcceptIncoming();
    -    void AcceptConnection(const ListenSocket& hListenSocket);
    +    void ThreadI2PAcceptIncoming() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    +    void AcceptConnection(const ListenSocket& hListenSocket) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
     
         /**
          * Create a `CNode` object from a socket that has just been accepted and add the node to
          * the `m_nodes` member.
          * [@param](/bitcoin-bitcoin/contributor/param/)[in] sock Connected socket to communicate with the peer.
          * [@param](/bitcoin-bitcoin/contributor/param/)[in] permission_flags The peer's permissions.
          * [@param](/bitcoin-bitcoin/contributor/param/)[in] addr_bind The address and port at our side of the connection.
          * [@param](/bitcoin-bitcoin/contributor/param/)[in] addr The address and port at the peer's side of the connection.
          */
         void CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
                                           NetPermissionFlags permission_flags,
                                           const CAddress& addr_bind,
    -                                      const CAddress& addr);
    +                                      const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
     
    -    void DeleteDisconnectedNode(CNode* pnode);
    -    void DisconnectAndReleaseNode(CNode* pnode);
    -    void DisconnectNodes();
    +    void DisconnectAndReleaseNode(CNode* node) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    +    void DeleteDisconnectedNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
    +    void DisconnectNodes() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
         void NotifyNumConnectionsChanged();
         /** Return true if the peer is inactive and should be disconnected. */
         bool InactivityCheck(const CNode& node) const;
     
         /**
          * Generate a collection of sockets to check for IO readiness.
    @@ -950,13 +952,13 @@ private:
          */
         Sock::EventsPerSock GenerateWaitSockets(Span<CNode* const> nodes);
     
         /**
          * Check connected and listening sockets for IO readiness and process them accordingly.
          */
    -    void SocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    +    void SocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex, !m_total_bytes_sent_mutex, !mutexMsgProc);
     
         /**
          * Do the read/write for connected sockets that are ready for IO.
          * [@param](/bitcoin-bitcoin/contributor/param/)[in] nodes Nodes to process. The socket of each node is checked against `what`.
          * [@param](/bitcoin-bitcoin/contributor/param/)[in] events_per_sock Sockets that are ready for IO.
          */
    @@ -965,15 +967,15 @@ private:
             EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
     
         /**
          * Accept incoming connections, one from each read-ready listening socket.
          * [@param](/bitcoin-bitcoin/contributor/param/)[in] events_per_sock Sockets that are ready for IO.
          */
    -    void SocketHandlerListening(const Sock::EventsPerSock& events_per_sock);
    +    void SocketHandlerListening(const Sock::EventsPerSock& events_per_sock) EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
     
    -    void ThreadSocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex, !mutexMsgProc);
    +    void ThreadSocketHandler() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex, !m_total_bytes_sent_mutex, !mutexMsgProc);
         void ThreadDNSAddressSeed() EXCLUSIVE_LOCKS_REQUIRED(!m_addr_fetches_mutex, !m_nodes_mutex);
     
         uint64_t CalculateKeyedNetGroup(const CAddress& ad) const;
     
         CNode* FindNode(const CNetAddr& ip);
         CNode* FindNode(const CSubNet& subNet);
    @@ -983,13 +985,13 @@ private:
         /**
          * Determine whether we're already connected to a given address, in order to
          * avoid initiating duplicate connections.
          */
         bool AlreadyConnectedToAddress(const CAddress& addr);
     
    -    bool AttemptToEvictConnection();
    +    bool AttemptToEvictConnection() EXCLUSIVE_LOCKS_REQUIRED(!m_nodes_disconnected_mutex);
         CNode* ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
         void AddWhitelistPermissionFlags(NetPermissionFlags& flags, const CNetAddr &addr) const;
     
         void DeleteNode(CNode* pnode);
     
         NodeId GetNewNodeId();
    @@ -1042,13 +1044,13 @@ private:
         const NetGroupManager& m_netgroupman;
         std::deque<std::string> m_addr_fetches GUARDED_BY(m_addr_fetches_mutex);
         Mutex m_addr_fetches_mutex;
         std::vector<std::string> m_added_nodes GUARDED_BY(m_added_nodes_mutex);
         mutable Mutex m_added_nodes_mutex;
         std::vector<CNode*> m_nodes GUARDED_BY(m_nodes_mutex);
    -    GlobalMutex m_nodes_disconnected_mutex;
    +    Mutex m_nodes_disconnected_mutex;
         std::list<CNode*> m_nodes_disconnected GUARDED_BY(m_nodes_disconnected_mutex);
         mutable RecursiveMutex m_nodes_mutex;
         std::atomic<NodeId> nLastNodeId{0};
         unsigned int nPrevNodeCount{0};
     
         /**
    

    </details>

  34. willcl-ark commented at 3:37 PM on August 4, 2023: member

    Thanks for the review and patch @vasild, I will try to review it over the weekend and keep this moving forward.

    I have in the meantime been considering a totally different approach to thread-safe deletion of nodes from CConnman, but it's somewhat more invasive (in terms of LOC)... I was experimenting with whether m_nodes would work better as a vector of shared_ptr, meaning nodes could be added to and removed from m_nodes (and m_nodes_disconnected) much more safely from the perspective of other threads...

    If you are also interested in this approach, I'd be curious to know what you think of it. Although "cleaner", in the end I think the change is too expensive for too little benefit...

    https://github.com/willcl-ark/bitcoin/tree/2023-07_shared-ptr-cnode

  35. vasild commented at 4:03 PM on August 4, 2023: contributor

    Yes! I was thinking about the same! We are doing some manual refcounting here which is creating all kinds of headaches and is prone to bugs. There is already a neat solution for this: std::shared_ptr. Long term, this is a must-have IMO.

    To minimize the size of the initial PR, I guess you can just change CConnman::m_nodes and CConnman::m_nodes_disconnected from std::vector<CNode*> to std::vector<CNodeRef>, plus whatever changes are needed to get that to compile and work.

    Later I think we don't even need m_nodes_disconnected because we can do the FinalizeNode() from the CNode destructor. For example - give the CNode constructor a callback function that it will invoke from its destructor. Then just remove the shared_ptr objects from m_nodes or elsewhere and when their refcount drops to zero the destructor will be called and it will call FinalizeNode().

    PS 2023-07_shared-ptr-cnode does not look too bad.

  36. DrahtBot added the label Needs rebase on Aug 6, 2023
  37. DrahtBot commented at 5:47 PM on August 6, 2023: contributor

    <!--cf906140f33d8803c4a75a2196329ecb-->

    🐙 This pull request conflicts with the target branch and needs rebase.

  38. DrahtBot commented at 12:43 AM on November 4, 2023: contributor

    <!--13523179cfe9479db18ec6c5d236f789-->

    There hasn't been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.
  39. DrahtBot commented at 1:24 AM on February 2, 2024: contributor

    <!--13523179cfe9479db18ec6c5d236f789-->

    ⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.
  40. DrahtBot commented at 12:54 AM on May 1, 2024: contributor

    <!--13523179cfe9479db18ec6c5d236f789-->

    ⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.
  41. maflcko commented at 5:17 AM on May 1, 2024: member
  42. willcl-ark commented at 8:44 AM on May 1, 2024: member

    Going to close this for now as I don't have time to finish it.

  43. willcl-ark closed this on May 1, 2024

  44. bitcoin locked this on May 1, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-02 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me