net, net_processing: additional and consistent disconnect logging #28521

pull Sjors wants to merge 3 commits into bitcoin:master from Sjors:2023/09/more_net_log changing 9 files +119 −76
  1. Sjors commented at 2:22 pm on September 23, 2023: member

    While debugging unexpected disconnections, possibly related to #28331, I found some additional [net] logging to be useful.

    All cases where we disconnect now come with a log message that has the word disconnecting:

    • CloseSocketDisconnect() logs disconnecting peer=…
    • whereever we set pnode->fDisconnect = true;
    • for all InactivityCheck cases (which in turn sets fDisconnect)
    • replaces “dropping” with “disconnecting” in Network not active, dropping peer=…

    I changed CloseSocketDisconnect() to no longer log disconnecting, and instead have all the call sites do so.

    This PR introduces two helper functions on CNode: DisconnectMsg and LogIP. The second and third commit use these helpers in net_processing.cpp so these disconnect messages are more consistent now (e.g. some didn’t log the IP). No new messages are added there though.

    The LogIP() helper is rarely used outside of a disconnect event, but it’s available for future use.

    Any LogPrint this PR touches is replaced with LogDebug (superseded by #30750), and every LogPrintf with LogInfo.

  2. DrahtBot commented at 2:22 pm on September 23, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK ajtowns, jonatack, sr-gi
    Stale ACK 0xB10C, RandyMcMillan, davidgumberg, naumenkogs

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
    • #29418 (rpc: provide per message stats for global traffic via new RPC ‘getnetmsgstats’ by vasild)
    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Sep 23, 2023
  4. ajtowns commented at 11:17 am on September 24, 2023: contributor
    Concept ACK – always makes sense to debug log disconnection reasons IMO
  5. 0xB10C commented at 12:57 pm on September 25, 2023: contributor
    Code review ACK 320978662db6b17e61afbf77a6584e33181c12b9
  6. naumenkogs commented at 8:50 am on September 26, 2023: member
    ACK 320978662db6b17e61afbf77a6584e33181c12b9
  7. in src/net.cpp:2143 in 320978662d outdated
    2138@@ -2138,7 +2139,10 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
    2139             }
    2140         }
    2141 
    2142-        if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
    2143+        if (InactivityCheck(*pnode)) {
    2144+            LogPrint(BCLog::NET, "disconnecting due to inactivity peer=%d\n", pnode->GetId());
    


    ajtowns commented at 9:14 am on September 26, 2023:

    InactivityCheck() already logs a reason for any case where it returns true, eg:

    02023-09-26T07:21:14.013289Z [net] socket receive timeout: 1201s peer=745   <==== this one
    12023-09-26T07:21:14.014945Z [net] disconnecting peer=745
    22023-09-26T07:21:14.016616Z [net] Cleared nodestate for peer=745
    3
    42023-09-26T08:33:15.016157Z [net] version handshake timeout peer=9862    <=== and this one
    52023-09-26T08:33:15.016539Z [net] disconnecting peer=9862
    62023-09-26T08:33:15.017321Z [net] Cleared nodestate for peer=9862
    

    Does an extra log entry here actually help? Would adding the word “disconnecting” to the log entries in InactivityCheck() be better?


    Sjors commented at 3:46 pm on September 27, 2023:
    Done, and also made sure the keyword disconnecting appears in all cases where we disconnect.
  8. Sjors force-pushed on Sep 27, 2023
  9. Sjors force-pushed on Sep 27, 2023
  10. RandyMcMillan commented at 10:42 pm on September 27, 2023: contributor
    ACK 61a208a1d8d970558ffcf582b5573f3f3219caa9
  11. DrahtBot requested review from 0xB10C on Sep 27, 2023
  12. DrahtBot requested review from naumenkogs on Sep 27, 2023
  13. in src/net.cpp:2124 in 61a208a1d8 outdated
    2107@@ -2108,6 +2108,7 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
    2108             {
    2109                 bool notify = false;
    2110                 if (!pnode->ReceiveMsgBytes({pchBuf, (size_t)nBytes}, notify)) {
    2111+                    LogPrint(BCLog::NET, "receiving message bytes failed for peer=%d\n", pnode->GetId());
    


    ajtowns commented at 8:52 am on September 28, 2023:
    This one didn’t get “disconnecting” added?

    Sjors commented at 9:33 am on September 28, 2023:
    No, because CloseSocketDisconnect already uses that term. I updated the PR description (yesterday) to explain the reasoning here.
  14. DrahtBot added the label Needs rebase on Nov 28, 2023
  15. Sjors commented at 1:15 pm on November 29, 2023: member
    Rebased.
  16. DrahtBot requested review from ajtowns on Nov 29, 2023
  17. Sjors force-pushed on Nov 29, 2023
  18. DrahtBot removed the label Needs rebase on Nov 29, 2023
  19. jonatack commented at 8:25 pm on November 30, 2023: member
    Concept ACK. I’d find it helpful to also log the addrport if fLogIPs is true, along with the network (for instance, because IPv6 addresses look the same as CJDNS ones), e.g. something like net=..., peeraddr=...,
  20. Sjors commented at 8:41 am on December 1, 2023: member
    I added peeraddr= to each of them. This is indeed quite useful, because we can’t look up the IP after disconnection. Adding net= also makes sense, but let’s do that everywhere in another PR (probably using a helper function).
  21. Sjors force-pushed on Dec 1, 2023
  22. maflcko commented at 9:14 am on December 1, 2023: member

    because we can’t look up the IP after disconnection.

    Is the IP not logged on connection, when IP logging is enabled?

  23. Sjors commented at 10:08 am on December 1, 2023: member
    @maflcko indeed it is, though for inbound it depends on the exact log settings. I think right now you won’t see these disconnect messages without -debug=net and with it you’ll see all the connect messages. But if we increase the relative priority of disconnect messages then you might not. So it still seems useful to add.
  24. sr-gi commented at 9:03 pm on January 5, 2024: member
    Concept ACK. Just a single consideration: wouldn’t it also make sense to test this with -logips=1?
  25. Sjors commented at 8:16 am on January 10, 2024: member
    @sr-gi I could, but I prefer to punt that until we have a helper function that can be tested in just one place: #28521 (comment)
  26. DrahtBot added the label CI failed on Jan 17, 2024
  27. Sjors force-pushed on Feb 13, 2024
  28. DrahtBot removed the label CI failed on Feb 13, 2024
  29. DrahtBot added the label Needs rebase on Feb 27, 2024
  30. Sjors force-pushed on Feb 27, 2024
  31. DrahtBot removed the label Needs rebase on Feb 27, 2024
  32. davidgumberg commented at 7:14 pm on May 4, 2024: contributor

    I may be mistaken in understanding you to be saying something else, but it seems to me that any time pnode->fDisconnect is set to true, this will result in a call to CloseSocketDisconnect() and a disconnect log will happen.

    Maybe this doesn’t affect the purpose of this PR, as the connection close may come some time later than pnode->fDisconnect and using ‘disconnect’ consistently makes it easier to connect the message from CloseSocketDisconnect() with the reason that is logged above.

    In CConnman::DisconnectNodes() :

     0void CConnman::DisconnectNodes()
     1{
     2    // [...]
     3    {
     4        // [...]
     5        // Disconnect unused nodes
     6        std::vector<CNode*> nodes_copy = m_nodes;
     7        for (CNode* pnode : nodes_copy)
     8        {
     9            if (pnode->fDisconnect)
    10            {
    11                // remove from m_nodes
    12                m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
    13                // [...]
    14                // close socket and cleanup
    15                pnode->CloseSocketDisconnect();
    16                // [...]
    17            }
    18        }
    19    }
    20    // [...]
    21}
    

    For example, I tested setting fNetworkActive with the setnetworkactive rpc and got the following logs on this branch:

    02024-05-04T19:06:56Z SetNetworkActive: false
    12024-05-04T19:06:56Z [net] Network not active, disconnecting peer=0
    22024-05-04T19:06:56Z [net] Network not active, disconnecting peer=1
    32024-05-04T19:06:56Z [net] Network not active, disconnecting peer=2
    42024-05-04T19:06:56Z [net] disconnecting peer=0
    52024-05-04T19:06:56Z [net] disconnecting peer=1
    62024-05-04T19:06:56Z [net] disconnecting peer=2
    

    This should probably also use the #28318 LogDebug interface over LogPrint.

  33. DrahtBot added the label Needs rebase on Jul 10, 2024
  34. Sjors force-pushed on Jul 11, 2024
  35. Sjors commented at 8:40 am on July 11, 2024: member

    Rebased after #29431 @davidgumberg this PR makes it easier to do something like cat ~/.bitcoin/debug.log | grep disconnect, and avoid the need for e.g. grep -n2 (or more) to get additional context.

    I switched to LogDebug in the lines touched by this PR.

  36. Sjors force-pushed on Jul 11, 2024
  37. DrahtBot added the label CI failed on Jul 11, 2024
  38. DrahtBot removed the label Needs rebase on Jul 11, 2024
  39. DrahtBot removed the label CI failed on Jul 11, 2024
  40. DrahtBot added the label Needs rebase on Jul 16, 2024
  41. Sjors force-pushed on Jul 16, 2024
  42. Sjors commented at 9:46 am on July 16, 2024: member
    Rebased after #30420.
  43. DrahtBot removed the label Needs rebase on Jul 16, 2024
  44. DrahtBot added the label Needs rebase on Jul 17, 2024
  45. Sjors force-pushed on Jul 18, 2024
  46. DrahtBot removed the label Needs rebase on Jul 18, 2024
  47. davidgumberg commented at 1:49 am on July 23, 2024: contributor

    Re-reviewed and manually tested ACK https://github.com/bitcoin/bitcoin/commit/45958d595b70222c89c25768f357a2ebbafac2b1

    It should be easy to grep for the reason that a peer got disconnected, and logging the peer’s address when fLogIPs seems helpful.

    Tested by running:

    0./bitcoind -datadir=/tmp/trash -debug=net -logips
    

    made fNetworkActive == false by:

    0./bitcoin-cli -datadir=/tmp/trash setnetworkactive false
    
     02024-07-23T01:08:08Z SetNetworkActive: false
     12024-07-23T01:08:08Z [net] Network not active, disconnecting peer=0 peeraddr=[REDACTEDIP]:8333
     22024-07-23T01:08:08Z [net] Network not active, disconnecting peer=1 peeraddr=[REDACTEDIP]:8333
     32024-07-23T01:08:08Z [net] Network not active, disconnecting peer=2 peeraddr=[REDACTEDIP]:8333
     42024-07-23T01:08:08Z [net] Network not active, disconnecting peer=4 peeraddr=[REDACTEDIP]:8333
     52024-07-23T01:08:08Z [net] Network not active, disconnecting peer=5 peeraddr=[REDACTEDIP]:8333
     62024-07-23T01:08:08Z [net] Network not active, disconnecting peer=6 peeraddr=[REDACTEDIP]:8333
     72024-07-23T01:08:08Z [net] disconnecting peer=0
     82024-07-23T01:08:08Z [net] disconnecting peer=1
     92024-07-23T01:08:08Z [net] disconnecting peer=2
    102024-07-23T01:08:08Z [net] disconnecting peer=4
    112024-07-23T01:08:08Z [net] disconnecting peer=5
    122024-07-23T01:08:08Z [net] disconnecting peer=6
    132024-07-23T01:08:08Z [net] disconnecting peer=8
    142024-07-23T01:08:08Z [net] disconnecting peer=10
    152024-07-23T01:08:08Z [net] disconnecting peer=11
    

    Micro-nit: if the purpose is to reduce the need to do grep -n2 , then it seems to me we should add the word ‘disconnect’ to all instances where the reason appears in the line above, and if we were to always log the reason for disconnect when it happens, it seems to me the extra generic log in CloseSocketDisconnect should be dropped.

  48. DrahtBot requested review from sr-gi on Jul 23, 2024
  49. DrahtBot requested review from jonatack on Jul 23, 2024
  50. in src/net.cpp:2005 in 45958d595b outdated
    2003+        );
    2004         return true;
    2005     }
    2006 
    2007     if (!node.fSuccessfullyConnected) {
    2008+        const std::string disconnect_msg{strprintf(", disconnecting peer=%d%s",
    


    davidgumberg commented at 2:26 am on July 23, 2024:

    nit: Maybe this should move to the top of the function and be used in the other three disconnect logs here

    0 if (!ShouldRunInactivityChecks(node, now)) return false;
    1+const std::string disconnect_msg{...};
    
  51. Sjors force-pushed on Jul 23, 2024
  52. Sjors commented at 11:03 am on July 23, 2024: member

    Thanks @davidgumberg. I added a helper function CNode::DisconnectMsg, and it’s now also used for:

    • “receiving message bytes failed”
    • “socket send error for”
    • “socket recv error”
    • “socket closed”

    In order to get rid of the redundant generic “disconnecting peer” in CNode::CloseSocketDisconnect(), I added the log statement to the two remaining call sites.

    Since it’s a method on CNode I can use it in net_processing as well, for which I added a separate commit. It has the effect of slightly modifying some of the existing log messages, and adds IP address logging in a few more places.

    I did not apply this to:

    • “connected to self”
    • some messages that would require a more complicated helper:
      • “disconnecting but not discouraging”
      • “disconnecting extra block-relay-only peer=%d”
      • “disconnecting extra outbound peer”

    While I was on a roll, I also added a LogIP() helper (as suggested earlier). There’s only a small number of places where we print the IP address that’s not in the context of a disconnect - but now it’s easier to expand that. I skipped the “send version message” message log here.

  53. Sjors renamed this:
    net: additional disconnect logging
    net, net_processing: additional and consistent disconnect logging
    on Jul 23, 2024
  54. Sjors force-pushed on Jul 23, 2024
  55. DrahtBot added the label CI failed on Jul 23, 2024
  56. DrahtBot commented at 12:31 pm on July 23, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/27799245327

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  57. Sjors commented at 12:31 pm on July 23, 2024: member
    Pushed an update to consistently replace any LogPrint we touch with LogDebug, and every LogPrintf with LogInfo. Also fixed some missing \n characters as found by tidy.
  58. DrahtBot removed the label CI failed on Jul 23, 2024
  59. in src/net.cpp:697 in 5b718dc785 outdated
    689@@ -690,11 +690,16 @@ bool CNode::ReceiveMsgBytes(Span<const uint8_t> msg_bytes, bool& complete)
    690     return true;
    691 }
    692 
    693+std::string CNode::LogIP(bool log_ip) const
    694+{
    695+    return log_ip ? strprintf(" peeraddr=%s", addr.ToStringAddrPort()) : "";
    696+}
    697+
    


    davidgumberg commented at 7:44 pm on July 31, 2024:

    Why not:

    0std::string CNode::LogIP() const
    1{
    2    return fLogIPs ? strprintf(" peeraddr=%s", addr.ToStringAddrPort()) : "";
    3}
    

    Sjors commented at 8:42 am on August 9, 2024:
    I think we should avoid accessing the global variable fLogIPs. A future refactor could add m_log_ips to PeerManagerImpl and/or CConnman, initialize it from gArgs and then drop fLogIPs.
  60. DrahtBot added the label Needs rebase on Sep 2, 2024
  61. Sjors force-pushed on Sep 3, 2024
  62. Sjors commented at 8:06 am on September 3, 2024: member
    Rebased after #30750. It made the diff slightly smaller, since I had made the same changes on every line touched here.
  63. DrahtBot removed the label Needs rebase on Sep 3, 2024
  64. naumenkogs commented at 8:18 am on September 11, 2024: member

    ACK 08cd29f5020995294a4f6274e72afab58bf69aa2

    The first two commits make logging cleaner and more uniform. I’m not sure the last commit is useful, but it doesn’t hurt either.

  65. DrahtBot requested review from davidgumberg on Sep 11, 2024
  66. in src/net_processing.cpp:6246 in 0bbeabd82e outdated
    6242@@ -6240,7 +6243,7 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
    6243             // Stalling only triggers when the block download window cannot move. During normal steady state,
    6244             // the download window should be much larger than the to-be-downloaded set of blocks, so disconnection
    6245             // should only happen during initial block download.
    6246-            LogPrintf("Peer=%d%s is stalling block download, disconnecting\n", pto->GetId(), fLogIPs ? strprintf(" peeraddr=%s", pto->addr.ToStringAddrPort()) : "");
    6247+            LogInfo("Peer is stalling block download%s\n", pto->DisconnectMsg(fLogIPs));
    


    davidgumberg commented at 11:50 pm on September 17, 2024:

    nit:

    0            LogInfo("Peer is stalling block download, %s\n", pto->DisconnectMsg(fLogIPs));
    

    Sjors commented at 8:04 am on September 18, 2024:
    Fixed.
  67. DrahtBot requested review from davidgumberg on Sep 17, 2024
  68. in src/net.cpp:1911 in 73d87f62d1 outdated
    1907@@ -1902,6 +1908,7 @@ void CConnman::DisconnectNodes()
    1908                 pnode->grantOutbound.Release();
    1909 
    1910                 // close socket and cleanup
    1911+                LogDebug(BCLog::NET, "%s\n", pnode->DisconnectMsg(fLogIPs));
    


    davidgumberg commented at 0:45 am on September 18, 2024:

    I believe your PR makes sure that DisconnectMsg gets printed any time fDisconnect == true so this line results in a duplicate log in each of those instances.

    For example when doing bitcoin-cli setnetworkactive false:

     02024-09-18T00:32:49Z SetNetworkActive: false
     12024-09-18T00:32:49Z [net] Network not active, disconnecting peer=0 peeraddr=xx.xx.xx.xx:8333
     22024-09-18T00:32:49Z [net] Network not active, disconnecting peer=1 peeraddr=xx.xx.xx.xx:8333
     32024-09-18T00:32:49Z [net] Network not active, disconnecting peer=3 peeraddr=xx.xx.xx.xx:8333
     42024-09-18T00:32:49Z [net] Network not active, disconnecting peer=4 peeraddr=xx.xx.xx.xx:8333
     52024-09-18T00:32:49Z [net] Network not active, disconnecting peer=5 peeraddr=xx.xx.xx.xx:8333
     62024-09-18T00:32:49Z [net] Network not active, disconnecting peer=6 peeraddr=xx.xx.xx.xx:8333
     72024-09-18T00:32:49Z [net] Network not active, disconnecting peer=7 peeraddr=xx.xx.xx.xx:8333
     82024-09-18T00:32:49Z [net] disconnecting peer=0 peeraddr=xx.xx.xx.xx:8333
     92024-09-18T00:32:49Z [net] disconnecting peer=1 peeraddr=xx.xx.xx.xx:8333
    102024-09-18T00:32:49Z [net] disconnecting peer=3 peeraddr=xx.xx.xx.xx:8333
    112024-09-18T00:32:49Z [net] disconnecting peer=4 peeraddr=xx.xx.xx.xx:8333
    122024-09-18T00:32:49Z [net] disconnecting peer=5 peeraddr=xx.xx.xx.xx:8333
    132024-09-18T00:32:49Z [net] disconnecting peer=6 peeraddr=xx.xx.xx.xx:8333
    142024-09-18T00:32:49Z [net] disconnecting peer=7 peeraddr=xx.xx.xx.xx:8333
    

    Sjors commented at 8:04 am on September 18, 2024:
    You’re right. I dropped this line and checked again that anytime fDisconnect is set to true emits a log message.
  69. DrahtBot requested review from davidgumberg on Sep 18, 2024
  70. net: additional disconnection logging
    Use the word "disconnecting" everywhere for easier grep.
    e4552e2cd0
  71. net_processing: use CNode::DisconnectMsg helper
    This is not a pure refactor:
    1. It slightly changes the log messages, as reflected in the test changes
    2. It adds the IP address to all disconnect logging (when fLogIPs is set)
    fddfc47131
  72. net: add LogIP() helper, use in net_processing 9967e8feb6
  73. Sjors force-pushed on Sep 18, 2024
  74. Sjors commented at 8:04 am on September 18, 2024: member
    Rebased and dropped duplicate message in CConnman::DisconnectNodes.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-19 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me