net, net_processing: additional and consistent disconnect logging #28521

pull Sjors wants to merge 4 commits into bitcoin:master from Sjors:2023/09/more_net_log changing 9 files +123 −75
  1. Sjors commented at 2:22 pm on September 23, 2023: member

    While debugging unexpected disconnections, possibly related to #28331, I found some additional [net] logging to be useful.

    All cases where we disconnect now come with a log message that has the word disconnecting:

    • all calls to CloseSocketDisconnect() log disconnecting peer=…
    • wherever we set pnode->fDisconnect = true;
    • for all InactivityCheck cases (which in turn sets fDisconnect)
    • replaces “dropping” with “disconnecting” in Network not active, dropping peer=…

    A few exceptions are listed here: #28521 (review)

    I changed CloseSocketDisconnect() to no longer log disconnecting, and instead have all the call sites do so.

    This PR introduces two helper functions on CNode: DisconnectMsg and LogIP. The second and third commit use these helpers in net_processing.cpp so these disconnect messages are more consistent now (e.g. some didn’t log the IP). No new messages are added there though.

    The LogIP() helper is rarely used outside of a disconnect event, but it’s available for future use.

    Any LogPrint this PR touches is replaced with LogDebug (superseded by #30750), and every LogPrintf with LogInfo.

  2. DrahtBot commented at 2:22 pm on September 23, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28521.

    Reviews

    See the guideline for information on the review process.

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30988 (Split CConnman by vasild)
    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Sep 23, 2023
  4. ajtowns commented at 11:17 am on September 24, 2023: contributor
    Concept ACK – always makes sense to debug log disconnection reasons IMO
  5. 0xB10C commented at 12:57 pm on September 25, 2023: contributor
    Code review ACK 320978662db6b17e61afbf77a6584e33181c12b9
  6. naumenkogs commented at 8:50 am on September 26, 2023: member
    ACK 320978662db6b17e61afbf77a6584e33181c12b9
  7. in src/net.cpp:2143 in 320978662d outdated
    2138@@ -2138,7 +2139,10 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
    2139             }
    2140         }
    2141 
    2142-        if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
    2143+        if (InactivityCheck(*pnode)) {
    2144+            LogPrint(BCLog::NET, "disconnecting due to inactivity peer=%d\n", pnode->GetId());
    


    ajtowns commented at 9:14 am on September 26, 2023:

    InactivityCheck() already logs a reason for any case where it returns true, eg:

    02023-09-26T07:21:14.013289Z [net] socket receive timeout: 1201s peer=745   <==== this one
    12023-09-26T07:21:14.014945Z [net] disconnecting peer=745
    22023-09-26T07:21:14.016616Z [net] Cleared nodestate for peer=745
    3
    42023-09-26T08:33:15.016157Z [net] version handshake timeout peer=9862    <=== and this one
    52023-09-26T08:33:15.016539Z [net] disconnecting peer=9862
    62023-09-26T08:33:15.017321Z [net] Cleared nodestate for peer=9862
    

    Does an extra log entry here actually help? Would adding the word “disconnecting” to the log entries in InactivityCheck() be better?


    Sjors commented at 3:46 pm on September 27, 2023:
    Done, and also made sure the keyword disconnecting appears in all cases where we disconnect.
  8. Sjors force-pushed on Sep 27, 2023
  9. Sjors force-pushed on Sep 27, 2023
  10. RandyMcMillan commented at 10:42 pm on September 27, 2023: contributor
    ACK 61a208a1d8d970558ffcf582b5573f3f3219caa9
  11. DrahtBot requested review from 0xB10C on Sep 27, 2023
  12. DrahtBot requested review from naumenkogs on Sep 27, 2023
  13. in src/net.cpp:2124 in 61a208a1d8 outdated
    2107@@ -2108,6 +2108,7 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
    2108             {
    2109                 bool notify = false;
    2110                 if (!pnode->ReceiveMsgBytes({pchBuf, (size_t)nBytes}, notify)) {
    2111+                    LogPrint(BCLog::NET, "receiving message bytes failed for peer=%d\n", pnode->GetId());
    


    ajtowns commented at 8:52 am on September 28, 2023:
    This one didn’t get “disconnecting” added?

    Sjors commented at 9:33 am on September 28, 2023:
    No, because CloseSocketDisconnect already uses that term. I updated the PR description (yesterday) to explain the reasoning here.

    hodlinator commented at 9:12 pm on December 18, 2024:

    (In thread #28521 (review): This is currently being resolved by adding a call to DisconnectMsg() unless I’ve missed something).

    https://github.com/bitcoin/bitcoin/blob/06443b8f28bcec4315cec262eb03343dee5465a6/src/net.cpp#L2146-L2153

  14. DrahtBot added the label Needs rebase on Nov 28, 2023
  15. Sjors commented at 1:15 pm on November 29, 2023: member
    Rebased.
  16. DrahtBot requested review from ajtowns on Nov 29, 2023
  17. Sjors force-pushed on Nov 29, 2023
  18. DrahtBot removed the label Needs rebase on Nov 29, 2023
  19. jonatack commented at 8:25 pm on November 30, 2023: member
    Concept ACK. I’d find it helpful to also log the addrport if fLogIPs is true, along with the network (for instance, because IPv6 addresses look the same as CJDNS ones), e.g. something like net=..., peeraddr=...,
  20. Sjors commented at 8:41 am on December 1, 2023: member
    I added peeraddr= to each of them. This is indeed quite useful, because we can’t look up the IP after disconnection. Adding net= also makes sense, but let’s do that everywhere in another PR (probably using a helper function).
  21. Sjors force-pushed on Dec 1, 2023
  22. maflcko commented at 9:14 am on December 1, 2023: member

    because we can’t look up the IP after disconnection.

    Is the IP not logged on connection, when IP logging is enabled?

  23. Sjors commented at 10:08 am on December 1, 2023: member
    @maflcko indeed it is, though for inbound it depends on the exact log settings. I think right now you won’t see these disconnect messages without -debug=net and with it you’ll see all the connect messages. But if we increase the relative priority of disconnect messages then you might not. So it still seems useful to add.
  24. sr-gi commented at 9:03 pm on January 5, 2024: member
    Concept ACK. Just a single consideration: wouldn’t it also make sense to test this with -logips=1?
  25. Sjors commented at 8:16 am on January 10, 2024: member
    @sr-gi I could, but I prefer to punt that until we have a helper function that can be tested in just one place: #28521 (comment)
  26. DrahtBot added the label CI failed on Jan 17, 2024
  27. Sjors force-pushed on Feb 13, 2024
  28. DrahtBot removed the label CI failed on Feb 13, 2024
  29. DrahtBot added the label Needs rebase on Feb 27, 2024
  30. Sjors force-pushed on Feb 27, 2024
  31. DrahtBot removed the label Needs rebase on Feb 27, 2024
  32. davidgumberg commented at 7:14 pm on May 4, 2024: contributor

    I may be mistaken in understanding you to be saying something else, but it seems to me that any time pnode->fDisconnect is set to true, this will result in a call to CloseSocketDisconnect() and a disconnect log will happen.

    Maybe this doesn’t affect the purpose of this PR, as the connection close may come some time later than pnode->fDisconnect and using ‘disconnect’ consistently makes it easier to connect the message from CloseSocketDisconnect() with the reason that is logged above.

    In CConnman::DisconnectNodes() :

     0void CConnman::DisconnectNodes()
     1{
     2    // [...]
     3    {
     4        // [...]
     5        // Disconnect unused nodes
     6        std::vector<CNode*> nodes_copy = m_nodes;
     7        for (CNode* pnode : nodes_copy)
     8        {
     9            if (pnode->fDisconnect)
    10            {
    11                // remove from m_nodes
    12                m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
    13                // [...]
    14                // close socket and cleanup
    15                pnode->CloseSocketDisconnect();
    16                // [...]
    17            }
    18        }
    19    }
    20    // [...]
    21}
    

    For example, I tested setting fNetworkActive with the setnetworkactive rpc and got the following logs on this branch:

    02024-05-04T19:06:56Z SetNetworkActive: false
    12024-05-04T19:06:56Z [net] Network not active, disconnecting peer=0
    22024-05-04T19:06:56Z [net] Network not active, disconnecting peer=1
    32024-05-04T19:06:56Z [net] Network not active, disconnecting peer=2
    42024-05-04T19:06:56Z [net] disconnecting peer=0
    52024-05-04T19:06:56Z [net] disconnecting peer=1
    62024-05-04T19:06:56Z [net] disconnecting peer=2
    

    This should probably also use the #28318 LogDebug interface over LogPrint.

  33. DrahtBot added the label Needs rebase on Jul 10, 2024
  34. Sjors force-pushed on Jul 11, 2024
  35. Sjors commented at 8:40 am on July 11, 2024: member

    Rebased after #29431 @davidgumberg this PR makes it easier to do something like cat ~/.bitcoin/debug.log | grep disconnect, and avoid the need for e.g. grep -n2 (or more) to get additional context.

    I switched to LogDebug in the lines touched by this PR.

  36. Sjors force-pushed on Jul 11, 2024
  37. DrahtBot added the label CI failed on Jul 11, 2024
  38. DrahtBot removed the label Needs rebase on Jul 11, 2024
  39. DrahtBot removed the label CI failed on Jul 11, 2024
  40. DrahtBot added the label Needs rebase on Jul 16, 2024
  41. Sjors force-pushed on Jul 16, 2024
  42. Sjors commented at 9:46 am on July 16, 2024: member
    Rebased after #30420.
  43. DrahtBot removed the label Needs rebase on Jul 16, 2024
  44. DrahtBot added the label Needs rebase on Jul 17, 2024
  45. Sjors force-pushed on Jul 18, 2024
  46. DrahtBot removed the label Needs rebase on Jul 18, 2024
  47. davidgumberg commented at 1:49 am on July 23, 2024: contributor

    Re-reviewed and manually tested ACK https://github.com/bitcoin/bitcoin/commit/45958d595b70222c89c25768f357a2ebbafac2b1

    It should be easy to grep for the reason that a peer got disconnected, and logging the peer’s address when fLogIPs seems helpful.

    Tested by running:

    0./bitcoind -datadir=/tmp/trash -debug=net -logips
    

    made fNetworkActive == false by:

    0./bitcoin-cli -datadir=/tmp/trash setnetworkactive false
    
     02024-07-23T01:08:08Z SetNetworkActive: false
     12024-07-23T01:08:08Z [net] Network not active, disconnecting peer=0 peeraddr=[REDACTEDIP]:8333
     22024-07-23T01:08:08Z [net] Network not active, disconnecting peer=1 peeraddr=[REDACTEDIP]:8333
     32024-07-23T01:08:08Z [net] Network not active, disconnecting peer=2 peeraddr=[REDACTEDIP]:8333
     42024-07-23T01:08:08Z [net] Network not active, disconnecting peer=4 peeraddr=[REDACTEDIP]:8333
     52024-07-23T01:08:08Z [net] Network not active, disconnecting peer=5 peeraddr=[REDACTEDIP]:8333
     62024-07-23T01:08:08Z [net] Network not active, disconnecting peer=6 peeraddr=[REDACTEDIP]:8333
     72024-07-23T01:08:08Z [net] disconnecting peer=0
     82024-07-23T01:08:08Z [net] disconnecting peer=1
     92024-07-23T01:08:08Z [net] disconnecting peer=2
    102024-07-23T01:08:08Z [net] disconnecting peer=4
    112024-07-23T01:08:08Z [net] disconnecting peer=5
    122024-07-23T01:08:08Z [net] disconnecting peer=6
    132024-07-23T01:08:08Z [net] disconnecting peer=8
    142024-07-23T01:08:08Z [net] disconnecting peer=10
    152024-07-23T01:08:08Z [net] disconnecting peer=11
    

    Micro-nit: if the purpose is to reduce the need to do grep -n2 , then it seems to me we should add the word ‘disconnect’ to all instances where the reason appears in the line above, and if we were to always log the reason for disconnect when it happens, it seems to me the extra generic log in CloseSocketDisconnect should be dropped.

  48. DrahtBot requested review from sr-gi on Jul 23, 2024
  49. DrahtBot requested review from jonatack on Jul 23, 2024
  50. in src/net.cpp:2005 in 45958d595b outdated
    2003+        );
    2004         return true;
    2005     }
    2006 
    2007     if (!node.fSuccessfullyConnected) {
    2008+        const std::string disconnect_msg{strprintf(", disconnecting peer=%d%s",
    


    davidgumberg commented at 2:26 am on July 23, 2024:

    nit: Maybe this should move to the top of the function and be used in the other three disconnect logs here

    0 if (!ShouldRunInactivityChecks(node, now)) return false;
    1+const std::string disconnect_msg{...};
    
  51. Sjors force-pushed on Jul 23, 2024
  52. Sjors commented at 11:03 am on July 23, 2024: member

    Thanks @davidgumberg. I added a helper function CNode::DisconnectMsg, and it’s now also used for:

    • “receiving message bytes failed”
    • “socket send error for”
    • “socket recv error”
    • “socket closed”

    In order to get rid of the redundant generic “disconnecting peer” in CNode::CloseSocketDisconnect(), I added the log statement to the two remaining call sites.

    Since it’s a method on CNode I can use it in net_processing as well, for which I added a separate commit. It has the effect of slightly modifying some of the existing log messages, and adds IP address logging in a few more places.

    I did not apply this to:

    • “connected to self”
    • some messages that would require a more complicated helper:
      • “disconnecting but not discouraging”
      • “disconnecting extra block-relay-only peer=%d”
      • “disconnecting extra outbound peer”

    While I was on a roll, I also added a LogIP() helper (as suggested earlier). There’s only a small number of places where we print the IP address that’s not in the context of a disconnect - but now it’s easier to expand that. I skipped the “send version message” message log here.

  53. Sjors renamed this:
    net: additional disconnect logging
    net, net_processing: additional and consistent disconnect logging
    on Jul 23, 2024
  54. Sjors force-pushed on Jul 23, 2024
  55. DrahtBot added the label CI failed on Jul 23, 2024
  56. DrahtBot commented at 12:31 pm on July 23, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/27799245327

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  57. Sjors commented at 12:31 pm on July 23, 2024: member
    Pushed an update to consistently replace any LogPrint we touch with LogDebug, and every LogPrintf with LogInfo. Also fixed some missing \n characters as found by tidy.
  58. DrahtBot removed the label CI failed on Jul 23, 2024
  59. in src/net.cpp:702 in 5b718dc785 outdated
    689@@ -690,11 +690,16 @@ bool CNode::ReceiveMsgBytes(Span<const uint8_t> msg_bytes, bool& complete)
    690     return true;
    691 }
    692 
    693+std::string CNode::LogIP(bool log_ip) const
    694+{
    695+    return log_ip ? strprintf(" peeraddr=%s", addr.ToStringAddrPort()) : "";
    696+}
    697+
    


    davidgumberg commented at 7:44 pm on July 31, 2024:

    Why not:

    0std::string CNode::LogIP() const
    1{
    2    return fLogIPs ? strprintf(" peeraddr=%s", addr.ToStringAddrPort()) : "";
    3}
    

    Sjors commented at 8:42 am on August 9, 2024:
    I think we should avoid accessing the global variable fLogIPs. A future refactor could add m_log_ips to PeerManagerImpl and/or CConnman, initialize it from gArgs and then drop fLogIPs.
  60. DrahtBot added the label Needs rebase on Sep 2, 2024
  61. Sjors force-pushed on Sep 3, 2024
  62. Sjors commented at 8:06 am on September 3, 2024: member
    Rebased after #30750. It made the diff slightly smaller, since I had made the same changes on every line touched here.
  63. DrahtBot removed the label Needs rebase on Sep 3, 2024
  64. naumenkogs commented at 8:18 am on September 11, 2024: member

    ACK 08cd29f5020995294a4f6274e72afab58bf69aa2

    The first two commits make logging cleaner and more uniform. I’m not sure the last commit is useful, but it doesn’t hurt either.

  65. DrahtBot requested review from davidgumberg on Sep 11, 2024
  66. in src/net_processing.cpp:6246 in 0bbeabd82e outdated
    6242@@ -6240,7 +6243,7 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
    6243             // Stalling only triggers when the block download window cannot move. During normal steady state,
    6244             // the download window should be much larger than the to-be-downloaded set of blocks, so disconnection
    6245             // should only happen during initial block download.
    6246-            LogPrintf("Peer=%d%s is stalling block download, disconnecting\n", pto->GetId(), fLogIPs ? strprintf(" peeraddr=%s", pto->addr.ToStringAddrPort()) : "");
    6247+            LogInfo("Peer is stalling block download%s\n", pto->DisconnectMsg(fLogIPs));
    


    davidgumberg commented at 11:50 pm on September 17, 2024:

    nit:

    0            LogInfo("Peer is stalling block download, %s\n", pto->DisconnectMsg(fLogIPs));
    

    Sjors commented at 8:04 am on September 18, 2024:
    Fixed.
  67. DrahtBot requested review from davidgumberg on Sep 17, 2024
  68. in src/net.cpp:1911 in 73d87f62d1 outdated
    1907@@ -1902,6 +1908,7 @@ void CConnman::DisconnectNodes()
    1908                 pnode->grantOutbound.Release();
    1909 
    1910                 // close socket and cleanup
    1911+                LogDebug(BCLog::NET, "%s\n", pnode->DisconnectMsg(fLogIPs));
    


    davidgumberg commented at 0:45 am on September 18, 2024:

    I believe your PR makes sure that DisconnectMsg gets printed any time fDisconnect == true so this line results in a duplicate log in each of those instances.

    For example when doing bitcoin-cli setnetworkactive false:

     02024-09-18T00:32:49Z SetNetworkActive: false
     12024-09-18T00:32:49Z [net] Network not active, disconnecting peer=0 peeraddr=xx.xx.xx.xx:8333
     22024-09-18T00:32:49Z [net] Network not active, disconnecting peer=1 peeraddr=xx.xx.xx.xx:8333
     32024-09-18T00:32:49Z [net] Network not active, disconnecting peer=3 peeraddr=xx.xx.xx.xx:8333
     42024-09-18T00:32:49Z [net] Network not active, disconnecting peer=4 peeraddr=xx.xx.xx.xx:8333
     52024-09-18T00:32:49Z [net] Network not active, disconnecting peer=5 peeraddr=xx.xx.xx.xx:8333
     62024-09-18T00:32:49Z [net] Network not active, disconnecting peer=6 peeraddr=xx.xx.xx.xx:8333
     72024-09-18T00:32:49Z [net] Network not active, disconnecting peer=7 peeraddr=xx.xx.xx.xx:8333
     82024-09-18T00:32:49Z [net] disconnecting peer=0 peeraddr=xx.xx.xx.xx:8333
     92024-09-18T00:32:49Z [net] disconnecting peer=1 peeraddr=xx.xx.xx.xx:8333
    102024-09-18T00:32:49Z [net] disconnecting peer=3 peeraddr=xx.xx.xx.xx:8333
    112024-09-18T00:32:49Z [net] disconnecting peer=4 peeraddr=xx.xx.xx.xx:8333
    122024-09-18T00:32:49Z [net] disconnecting peer=5 peeraddr=xx.xx.xx.xx:8333
    132024-09-18T00:32:49Z [net] disconnecting peer=6 peeraddr=xx.xx.xx.xx:8333
    142024-09-18T00:32:49Z [net] disconnecting peer=7 peeraddr=xx.xx.xx.xx:8333
    

    Sjors commented at 8:04 am on September 18, 2024:
    You’re right. I dropped this line and checked again that anytime fDisconnect is set to true emits a log message.
  69. DrahtBot requested review from davidgumberg on Sep 18, 2024
  70. Sjors force-pushed on Sep 18, 2024
  71. Sjors commented at 8:04 am on September 18, 2024: member
    Rebased and dropped duplicate message in CConnman::DisconnectNodes.
  72. davidgumberg commented at 6:04 pm on September 20, 2024: contributor

    reACK https://github.com/bitcoin/bitcoin/commit/9967e8feb62a0c30612e437b70d1d64a2e67b9e0

    Thanks for putting up with the review churn on this, it looks great! This PR makes peer disconnect message logging more consistent, and makes us better respect the -logips argument.

    I reviewed the range-diff:

    0git range-diff a74bdeea..08cd29f5 6fc46927..9967e8fe
    

    Starting a node and then doing bitcoin-cli setnetworkactive false and got the following:

    02024-09-20T17:47:15Z SetNetworkActive: false
    12024-09-20T17:47:15Z [net] Network not active, disconnecting peer=0
    22024-09-20T17:47:15Z [net] Network not active, disconnecting peer=2
    32024-09-20T17:47:15Z [net] Network not active, disconnecting peer=4
    42024-09-20T17:47:15Z [net] Network not active, disconnecting peer=5
    52024-09-20T17:47:15Z [net] Network not active, disconnecting peer=6
    62024-09-20T17:47:15Z [net] Network not active, disconnecting peer=8
    72024-09-20T17:47:15Z [net] Network not active, disconnecting peer=9
    

    Here is the disconnect logging that happens when a socket closes:

    02024-09-20T17:58:09Z [net] Added connection peer=1
    12024-09-20T17:58:09Z [net] start sending v2 handshake to peer=1
    22024-09-20T17:58:09Z [net] sending version (103 bytes) peer=1
    32024-09-20T17:58:09Z [net] send version message: version 70016, blocks=0, txrelay=1, peer=1
    42024-09-20T17:58:09Z [net] socket closed, disconnecting peer=1
    52024-09-20T17:58:09Z [net] retrying with v1 transport protocol for peer=1
    62024-09-20T17:58:09Z [net] Cleared nodestate for peer=1
    72024-09-20T17:58:09Z [net] trying v1 connection xx.xx.xx.xx:8333 lastseen=27.2hrs
    

    invalid transaction in an INV:

    02024-09-20T18:01:10Z [net] received: pong (8 bytes) peer=14
    12024-09-20T18:01:10Z [net] received: inv (253 bytes) peer=14
    22024-09-20T18:01:10Z [net] transaction (d9db66309820a33f00b0cc8c843cd8683fdc03357dbe1959a9de18afc04ff675) i
    3nv sent in violation of protocol, disconnecting peer=14
    42024-09-20T18:01:10Z [net] Cleared nodestate for peer=14
    

    service flags:

    02024-09-20T18:01:17Z [net] Added connection peer=15
    12024-09-20T18:01:17Z [net] sending version (103 bytes) peer=15
    22024-09-20T18:01:17Z [net] send version message: version 70016, blocks=0, txrelay=0, peer=15
    32024-09-20T18:01:17Z [net] received: version (102 bytes) peer=15
    42024-09-20T18:01:17Z [net] peer does not offer the expected services (0000040c offered, 00000009 expected),
    5 disconnecting peer=15
    62024-09-20T18:01:17Z [net] Cleared nodestate for peer=15
    

    shutdown:

     02024-09-20T18:03:00Z [net] disconnecting peer=0
     12024-09-20T18:03:00Z [net] Cleared nodestate for peer=0
     22024-09-20T18:03:00Z [net] disconnecting peer=3
     32024-09-20T18:03:00Z [net] Cleared nodestate for peer=3
     42024-09-20T18:03:00Z [net] disconnecting peer=4
     52024-09-20T18:03:00Z [net] Cleared nodestate for peer=4
     62024-09-20T18:03:00Z [net] disconnecting peer=5
     72024-09-20T18:03:00Z [net] Cleared nodestate for peer=5
     82024-09-20T18:03:00Z [net] disconnecting peer=7
     92024-09-20T18:03:00Z [net] Cleared nodestate for peer=7
    102024-09-20T18:03:00Z [net] disconnecting peer=8
    112024-09-20T18:03:00Z [net] Cleared nodestate for peer=8
    122024-09-20T18:03:00Z [net] disconnecting peer=9
    132024-09-20T18:03:00Z [net] Cleared nodestate for peer=9
    142024-09-20T18:03:00Z [net] disconnecting peer=10
    152024-09-20T18:03:00Z [net] Cleared nodestate for peer=10
    162024-09-20T18:03:00Z [net] disconnecting peer=13
    172024-09-20T18:03:00Z [net] Cleared nodestate for peer=13
    182024-09-20T18:03:00Z [net] disconnecting peer=17
    192024-09-20T18:03:00Z [net] Cleared nodestate for peer=17
    
  73. in src/net.cpp:2012 in 9967e8feb6 outdated
    2000 
    2001     if (now > last_recv + TIMEOUT_INTERVAL) {
    2002-        LogDebug(BCLog::NET, "socket receive timeout: %is peer=%d\n", count_seconds(now - last_recv), node.GetId());
    2003+        LogDebug(BCLog::NET,
    2004+            "socket receive timeout: %is%, s\n", count_seconds(now - last_recv),
    2005+            node.DisconnectMsg(fLogIPs)
    


    vasild commented at 12:13 pm on October 8, 2024:
    0        LogDebug(BCLog::NET,
    1            "socket receive timeout: %is, %s\n", count_seconds(now - last_recv),
    2            node.DisconnectMsg(fLogIPs)
    

    Sjors commented at 12:24 pm on November 26, 2024:
    Fixed
  74. in src/net.cpp:1996 in 9967e8feb6 outdated
    1982+        LogDebug(BCLog::NET,
    1983+            "socket no message in first %i seconds, %d %d, %s\n",
    1984+            count_seconds(m_peer_connect_timeout),
    1985+            last_recv.count() != 0,
    1986+            last_send.count() != 0,
    1987+            node.DisconnectMsg(fLogIPs)
    


    vasild commented at 12:15 pm on October 8, 2024:

    This %d %d, last_recv.count() != 0, last_send.count() != 0 is super obscure. It was like that before, so it is fine wrt this PR.

    But would be nice to fix it, since this is changing the same log message already. I am not sure how though. Maybe change that %d to something like “has never received from the peer” + “has never sent to the peer”.


    Sjors commented at 12:23 pm on November 26, 2024:
    I added a commit to improve this message.
  75. in src/net_processing.cpp:3786 in 9967e8feb6 outdated
    4114 
    4115     if (!pfrom.fSuccessfullyConnected) {
    4116-        LogDebug(BCLog::NET, "Unsupported message \"%s\" prior to verack from peer=%d\n", SanitizeString(msg_type), pfrom.GetId());
    4117+        LogDebug(BCLog::NET, "Unsupported message \"%s\" prior to verack, %s\n", SanitizeString(msg_type), pfrom.DisconnectMsg(fLogIPs));
    4118         return;
    4119     }
    


    vasild commented at 12:27 pm on October 8, 2024:
    This log shouldn’t contain “disconnecting” because we do not disconnect.

    Sjors commented at 12:22 pm on November 26, 2024:
    Reverted this change (and its corresponding test).
  76. in src/net.cpp:1890 in 9967e8feb6 outdated
    1884@@ -1874,7 +1885,7 @@ void CConnman::DisconnectNodes()
    1885             // Disconnect any connected nodes
    


    vasild commented at 12:37 pm on October 8, 2024:

    (suggestion at random place in the code)

    Consider to also adjust the CConnman::DisconnectNode() functions.


    Sjors commented at 12:22 pm on November 26, 2024:
    Tempted to leave the PR as is for now as it’s already quite large.

    hodlinator commented at 8:48 pm on December 18, 2024:

    I agree that CConnman::DisconnectNode() overloads should also be changed for increased uniformity. Although they have slight variants on how they log the identifiers of the node, so I see why you’d want to defer that. It is good that they already contain “disconnecting”.

    https://github.com/bitcoin/bitcoin/blob/06443b8f28bcec4315cec262eb03343dee5465a6/src/net.cpp#L3606-L3641

  77. vasild commented at 12:42 pm on October 8, 2024: contributor

    Approach ACK 9967e8feb6

    I noticed the repeating pattern fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToStringAddrPort()) : "" while working on #29415 and I repeated it a lot of times in the new messages I added :disappointed:. It has been on my TODO to introduce some function to deduplicate that. Thanks for the CNode::LogIP() method!

  78. naumenkogs commented at 1:38 pm on October 8, 2024: member
    Would be great to address @vasild feedbacks :)
  79. Sjors commented at 3:36 pm on October 21, 2024: member

    Seems like I need to look into and probably fix at least #28521 (review)

    Will look into the other comments too.

  80. net: additional disconnection logging
    Use the word "disconnecting" everywhere for easier grep.
    ad224429f8
  81. net_processing: use CNode::DisconnectMsg helper
    This is not a pure refactor:
    1. It slightly changes the log messages, as reflected in the test changes
    2. It adds the IP address to all disconnect logging (when fLogIPs is set)
    937ef9eb40
  82. net: add LogIP() helper, use in net_processing 1d01ad4d73
  83. Sjors force-pushed on Nov 26, 2024
  84. Sjors commented at 12:24 pm on November 26, 2024: member
    Rebased and addressed feedback.
  85. Sjors force-pushed on Nov 26, 2024
  86. DrahtBot added the label CI failed on Nov 26, 2024
  87. DrahtBot commented at 12:59 pm on November 26, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33538144560

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  88. Sjors commented at 12:59 pm on November 26, 2024: member
  89. net: clarify if we ever sent or received from peer 06443b8f28
  90. Sjors force-pushed on Nov 26, 2024
  91. DrahtBot removed the label CI failed on Nov 26, 2024
  92. vasild approved
  93. vasild commented at 4:56 pm on December 4, 2024: contributor
    ACK 06443b8f28bcec4315cec262eb03343dee5465a6
  94. Sjors commented at 7:25 am on December 16, 2024: member
    @0xB10C @naumenkogs @davidgumberg one more review round pls?
  95. danielabrozzoni approved
  96. danielabrozzoni commented at 0:15 am on December 18, 2024: contributor

    ACK 06443b8f28bcec4315cec262eb03343dee5465a6

    Thanks for working on this! I’ve found myself looking through logs and struggling to understand why some connections were closed, this will come in handy :)

  97. davidgumberg commented at 2:36 am on December 18, 2024: contributor

    reACK https://github.com/bitcoin/bitcoin/commit/06443b8f28bcec4315cec262eb03343dee5465a6

    Retested using the same procedure as above, InactivityCheck() logging is now much less opaque, and this PR makes disconnect logging consistent and better respect the -logips argument

  98. in src/net.cpp:3446 in 06443b8f28
    3442@@ -3411,6 +3443,7 @@ void CConnman::StopNodes()
    3443     std::vector<CNode*> nodes;
    3444     WITH_LOCK(m_nodes_mutex, nodes.swap(m_nodes));
    3445     for (CNode* pnode : nodes) {
    3446+        LogDebug(BCLog::NET, "%s\n", pnode->DisconnectMsg(fLogIPs));
    


    hodlinator commented at 8:10 pm on December 18, 2024:

    Would be nice to have some context on the same line as the disconnect decision in case -logsourcelocations was not enabled when the log was recorded.

    0        LogDebug(BCLog::NET, "Stopping node, %s\n", pnode->DisconnectMsg(fLogIPs));
    
  99. in src/net.cpp:561 in 06443b8f28
    557@@ -558,7 +558,6 @@ void CNode::CloseSocketDisconnect()
    558     fDisconnect = true;
    559     LOCK(m_sock_mutex);
    560     if (m_sock) {
    561-        LogDebug(BCLog::NET, "disconnecting peer=%d\n", id);
    


    hodlinator commented at 8:24 pm on December 18, 2024:

    nit: Would prefer to keep a message like the following in case new calls of CloseSocketDisconnect() creep in without calling your DisconnectMsg function.

    0        LogDebug(BCLog::NET, "Resetting socket for peer=%d\n", id);
    

    (Possibly just as a LogTrace).

  100. in src/net_processing.cpp:4131 in 06443b8f28
    4124@@ -4126,7 +4125,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    4125         vRecv >> locator >> hashStop;
    4126 
    4127         if (locator.vHave.size() > MAX_LOCATOR_SZ) {
    4128-            LogDebug(BCLog::NET, "getheaders locator size %lld > %d, disconnect peer=%d\n", locator.vHave.size(), MAX_LOCATOR_SZ, pfrom.GetId());
    4129+            LogDebug(BCLog::NET, "getheaders locator size %lld > %d, %s\n", locator.vHave.size(), MAX_LOCATOR_SZ, pfrom.DisconnectMsg(fLogIPs));
    4130             pfrom.fDisconnect = true;
    4131             return;
    4132         }
    


  101. hodlinator commented at 9:43 pm on December 18, 2024: contributor

    Concept ACK 06443b8f28bcec4315cec262eb03343dee5465a6

    Having done a similar change to facilitate disconnection issue debugging in a different networked system, I fully support this initiative.

    Excellent approach to enforce uniform wording through a log helper function.

    An alternative but more disruptive approach would have been to make CNode::fDisconnect private and add CNode::Disconnect(std::string_view reason).

    Sorry I didn’t see this PR earlier. I completely understand if you’d rather defer further changes until other PRs.

  102. Sjors commented at 1:40 am on December 19, 2024: member

    @hodlinator thanks for the review. I’d rather not expand the scope of this PR even more, because it’s painful to keep rebasing (as I increasingly forget what I did in the first place).

    If I need to make other more substantial changes, I’ll look into adding a few more log statements based on your feedback, but otherwise let’s leave it to a followup.

  103. hodlinator approved
  104. hodlinator commented at 10:13 pm on December 20, 2024: contributor

    ACK 06443b8f28bcec4315cec262eb03343dee5465a6

    Agree that follow-ups should be handled in other PRs at this point.


     02024-12-20T21:34:36Z [net] got inv: wtx 3e9f4e49ddfb5cfd1fd847de22836de941a7245054db0629bb5c0cdfdd2466bf  have peer=0
     12024-12-20T21:34:36Z [net] received: tx (248 bytes) peer=7
     22024-12-20T21:34:37Z [net] sending inv (73 bytes) peer=5
     32024-12-20T21:34:37Z [net] sending inv (37 bytes) peer=4
     42024-12-20T21:34:37Z [net] sending inv (73 bytes) peer=3
     52024-12-20T21:34:37Z [net] sending inv (73 bytes) peer=1
     62024-12-20T21:34:37Z [net] sending inv (73 bytes) peer=6
     72024-12-20T21:34:38Z [net] sending ping (8 bytes) peer=4
     82024-12-20T21:34:38Z [net] sending inv (37 bytes) peer=0
     92024-12-20T21:34:41Z [net] sending inv (73 bytes) peer=2
    102024-12-20T21:35:03Z [net] sending addrv2 (16 bytes) peer=4
    112024-12-20T21:35:19Z [net] sending addrv2 (28 bytes) peer=5
    122024-12-20T21:35:22Z [net] sending addrv2 (28 bytes) peer=7
    132024-12-20T21:35:28Z [net] sending ping (8 bytes) peer=5
    142024-12-20T21:36:17Z [net] sending ping (8 bytes) peer=6
    152024-12-20T21:36:17Z [net] sending ping (8 bytes) peer=7
    162024-12-20T21:36:18Z [net] sending ping (8 bytes) peer=0
    172024-12-20T21:36:23Z [net] sending ping (8 bytes) peer=8
    182024-12-20T21:36:24Z [net] sending ping (8 bytes) peer=9
    192024-12-20T21:36:25Z [net] sending ping (8 bytes) peer=1
    202024-12-20T21:36:31Z [net] sending ping (8 bytes) peer=2
    212024-12-20T21:36:32Z [net] sending ping (8 bytes) peer=3
    222024-12-20T21:38:53Z [net] Making feeler connection to 18.190.155.122:38333
    232024-12-20T21:38:53Z [net] trying v1 connection 18.190.155.122:38333 lastseen=12.6hrs
    242024-12-20T21:38:53Z [net] connect() to 18.190.155.122:38333 failed: Network is unreachable (101)
    252024-12-20T21:40:27Z [net] Making feeler connection to 102.97.13.193:38333
    26...
    272024-12-20T21:50:53Z [net] socket recv error, disconnecting peer=0: Connection timed out (110)
    28...
    292024-12-20T21:51:05Z [net] socket recv error, disconnecting peer=4: Connection timed out (110)
    30...
    312024-12-20T21:51:18Z [net] socket recv error, disconnecting peer=1: Network is unreachable (101)
    32...
    332024-12-20T21:51:22Z [net] socket recv error, disconnecting peer=2: Connection timed out (110)
    34...
    352024-12-20T21:52:48Z [net] socket recv error, disconnecting peer=9: Connection timed out (110)
    36...
    372024-12-20T21:52:52Z [net] socket recv error, disconnecting peer=8: Network is unreachable (101)
    38...
    392024-12-20T21:52:56Z [net] socket recv error, disconnecting peer=5: Connection timed out (110)
    40...
    412024-12-20T21:53:00Z [net] socket recv error, disconnecting peer=3: Connection timed out (110)
    42...
    432024-12-20T21:53:21Z [net] socket recv error, disconnecting peer=6: Connection timed out (110)
    44...
    452024-12-20T21:53:49Z [net] socket recv error, disconnecting peer=7: Connection timed out (110)
    

    Maybe the PR summary could be touched up slightly:

    All cases where we disconnect now come with a log message that has the word disconnecting:

    • CloseSocketDisconnect() logs disconnecting peer=…

    This is describing the state on master I guess, but all the other points describe what is changed on this PR.

    • whereever we set pnode->fDisconnect = true;

    Except 2 cases, a & b here: #28521 (review)

    Typo: Should be wherever

  105. Sjors commented at 2:36 am on December 21, 2024: member

    Fixed the typo in the description.

    This is describing the state on master I guess, but all the other points describe what is changed on this PR.

    I added “all calls to” to the first item.

    Added a link to your comment for the list of exceptions.

  106. ryanofsky merged this on Dec 27, 2024
  107. ryanofsky closed this on Dec 27, 2024

  108. Sjors deleted the branch on Dec 27, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me