Sjors
commented at 2:22 pm on September 23, 2023:
member
While debugging unexpected disconnections, possibly related to #28331, I found some additional [net] logging to be useful.
All cases where we disconnect now come with a log message that has the word disconnecting:
CloseSocketDisconnect() logs disconnecting peer=…
whereever we set pnode->fDisconnect = true;
for all InactivityCheck cases (which in turn sets fDisconnect)
replaces “dropping” with “disconnecting” in Network not active, dropping peer=…
I changed CloseSocketDisconnect() to no longer log disconnecting, and instead have all the call sites do so.
This PR introduces two helper functions on CNode: DisconnectMsg and LogIP. The second and third commit use these helpers in net_processing.cpp so these disconnect messages are more consistent now (e.g. some didn’t log the IP). No new messages are added there though.
The LogIP() helper is rarely used outside of a disconnect event, but it’s available for future use.
Any LogPrint this PR touches is replaced with LogDebug (superseded by #30750), and every LogPrintf with LogInfo.
DrahtBot
commented at 2:22 pm on September 23, 2023:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
#29418 (rpc: provide per message stats for global traffic via new RPC ‘getnetmsgstats’ by vasild)
#29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
DrahtBot added the label
P2P
on Sep 23, 2023
ajtowns
commented at 11:17 am on September 24, 2023:
contributor
Concept ACK – always makes sense to debug log disconnection reasons IMO
0xB10C
commented at 12:57 pm on September 25, 2023:
contributor
naumenkogs
commented at 8:50 am on September 26, 2023:
member
ACK320978662db6b17e61afbf77a6584e33181c12b9
in
src/net.cpp:2143
in
320978662doutdated
2138@@ -2138,7 +2139,10 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
2139 }
2140 }
21412142- if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
2143+ if (InactivityCheck(*pnode)) {
2144+ LogPrint(BCLog::NET, "disconnecting due to inactivity peer=%d\n", pnode->GetId());
ajtowns
commented at 9:14 am on September 26, 2023:
InactivityCheck() already logs a reason for any case where it returns true, eg:
02023-09-26T07:21:14.013289Z [net] socket receive timeout: 1201s peer=745 <==== this one
12023-09-26T07:21:14.014945Z [net] disconnecting peer=745
22023-09-26T07:21:14.016616Z [net] Cleared nodestate for peer=745
342023-09-26T08:33:15.016157Z [net] version handshake timeout peer=9862 <=== and this one
52023-09-26T08:33:15.016539Z [net] disconnecting peer=9862
62023-09-26T08:33:15.017321Z [net] Cleared nodestate for peer=9862
Does an extra log entry here actually help? Would adding the word “disconnecting” to the log entries in InactivityCheck() be better?
No, because CloseSocketDisconnect already uses that term. I updated the PR description (yesterday) to explain the reasoning here.
DrahtBot added the label
Needs rebase
on Nov 28, 2023
Sjors
commented at 1:15 pm on November 29, 2023:
member
Rebased.
DrahtBot requested review from ajtowns
on Nov 29, 2023
Sjors force-pushed
on Nov 29, 2023
DrahtBot removed the label
Needs rebase
on Nov 29, 2023
jonatack
commented at 8:25 pm on November 30, 2023:
member
Concept ACK. I’d find it helpful to also log the addrport if fLogIPs is true, along with the network (for instance, because IPv6 addresses look the same as CJDNS ones), e.g. something like net=..., peeraddr=...,
Sjors
commented at 8:41 am on December 1, 2023:
member
I added peeraddr= to each of them. This is indeed quite useful, because we can’t look up the IP after disconnection. Adding net= also makes sense, but let’s do that everywhere in another PR (probably using a helper function).
Sjors force-pushed
on Dec 1, 2023
maflcko
commented at 9:14 am on December 1, 2023:
member
because we can’t look up the IP after disconnection.
Is the IP not logged on connection, when IP logging is enabled?
Sjors
commented at 10:08 am on December 1, 2023:
member
@maflcko indeed it is, though for inbound it depends on the exact log settings. I think right now you won’t see these disconnect messages without -debug=net and with it you’ll see all the connect messages. But if we increase the relative priority of disconnect messages then you might not. So it still seems useful to add.
sr-gi
commented at 9:03 pm on January 5, 2024:
member
Concept ACK. Just a single consideration: wouldn’t it also make sense to test this with -logips=1?
Sjors
commented at 8:16 am on January 10, 2024:
member
@sr-gi I could, but I prefer to punt that until we have a helper function that can be tested in just one place: #28521 (comment)
DrahtBot added the label
CI failed
on Jan 17, 2024
Sjors force-pushed
on Feb 13, 2024
DrahtBot removed the label
CI failed
on Feb 13, 2024
DrahtBot added the label
Needs rebase
on Feb 27, 2024
Sjors force-pushed
on Feb 27, 2024
DrahtBot removed the label
Needs rebase
on Feb 27, 2024
davidgumberg
commented at 7:14 pm on May 4, 2024:
contributor
I may be mistaken in understanding you to be saying something else, but it seems to me that any time pnode->fDisconnect is set to true, this will result in a call to CloseSocketDisconnect() and a disconnect log will happen.
Maybe this doesn’t affect the purpose of this PR, as the connection close may come some time later than pnode->fDisconnect and using ‘disconnect’ consistently makes it easier to connect the message from CloseSocketDisconnect() with the reason that is logged above.
This should probably also use the #28318LogDebug interface over LogPrint.
DrahtBot added the label
Needs rebase
on Jul 10, 2024
Sjors force-pushed
on Jul 11, 2024
Sjors
commented at 8:40 am on July 11, 2024:
member
Rebased after #29431@davidgumberg this PR makes it easier to do something like cat ~/.bitcoin/debug.log | grep disconnect, and avoid the need for e.g. grep -n2 (or more) to get additional context.
I switched to LogDebug in the lines touched by this PR.
Sjors force-pushed
on Jul 11, 2024
DrahtBot added the label
CI failed
on Jul 11, 2024
DrahtBot removed the label
Needs rebase
on Jul 11, 2024
DrahtBot removed the label
CI failed
on Jul 11, 2024
DrahtBot added the label
Needs rebase
on Jul 16, 2024
Sjors force-pushed
on Jul 16, 2024
Sjors
commented at 9:46 am on July 16, 2024:
member
Micro-nit: if the purpose is to reduce the need to do grep -n2 , then it seems to me we should add the word ‘disconnect’ to allinstanceswhere the reason appears in the line above, and if we were to always log the reason for disconnect when it happens, it seems to me the extra generic log in CloseSocketDisconnect should be dropped.
DrahtBot requested review from sr-gi
on Jul 23, 2024
DrahtBot requested review from jonatack
on Jul 23, 2024
davidgumberg
commented at 2:26 am on July 23, 2024:
nit: Maybe this should move to the top of the function and be used in the other three disconnect logs here
0 if (!ShouldRunInactivityChecks(node, now)) return false;
1+const std::string disconnect_msg{...};
Sjors force-pushed
on Jul 23, 2024
Sjors
commented at 11:03 am on July 23, 2024:
member
Thanks @davidgumberg. I added a helper function CNode::DisconnectMsg, and it’s now also used for:
“receiving message bytes failed”
“socket send error for”
“socket recv error”
“socket closed”
In order to get rid of the redundant generic “disconnecting peer” in CNode::CloseSocketDisconnect(), I added the log statement to the two remaining call sites.
Since it’s a method on CNode I can use it in net_processing as well, for which I added a separate commit. It has the effect of slightly modifying some of the existing log messages, and adds IP address logging in a few more places.
I did not apply this to:
“connected to self”
some messages that would require a more complicated helper:
“disconnecting but not discouraging”
“disconnecting extra block-relay-only peer=%d”
“disconnecting extra outbound peer”
While I was on a roll, I also added a LogIP() helper (as suggested earlier). There’s only a small number of places where we print the IP address that’s not in the context of a disconnect - but now it’s easier to expand that. I skipped the “send version message” message log here.
Sjors renamed this:
net: additional disconnect logging
net, net_processing: additional and consistent disconnect logging
on Jul 23, 2024
Sjors force-pushed
on Jul 23, 2024
DrahtBot added the label
CI failed
on Jul 23, 2024
DrahtBot
commented at 12:31 pm on July 23, 2024:
contributor
Make sure to run all tests locally, according to the documentation.
The failure may happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
Sjors
commented at 12:31 pm on July 23, 2024:
member
Pushed an update to consistently replace any LogPrint we touch with LogDebug, and every LogPrintf with LogInfo. Also fixed some missing \n characters as found by tidy.
DrahtBot removed the label
CI failed
on Jul 23, 2024
I think we should avoid accessing the global variable fLogIPs. A future refactor could add m_log_ips to PeerManagerImpl and/or CConnman, initialize it from gArgs and then drop fLogIPs.
DrahtBot added the label
Needs rebase
on Sep 2, 2024
Sjors force-pushed
on Sep 3, 2024
Sjors
commented at 8:06 am on September 3, 2024:
member
Rebased after #30750. It made the diff slightly smaller, since I had made the same changes on every line touched here.
DrahtBot removed the label
Needs rebase
on Sep 3, 2024
naumenkogs
commented at 8:18 am on September 11, 2024:
member
ACK08cd29f5020995294a4f6274e72afab58bf69aa2
The first two commits make logging cleaner and more uniform.
I’m not sure the last commit is useful, but it doesn’t hurt either.
DrahtBot requested review from davidgumberg
on Sep 11, 2024
in
src/net_processing.cpp:6246
in
0bbeabd82eoutdated
6242@@ -6240,7 +6243,7 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
6243 // Stalling only triggers when the block download window cannot move. During normal steady state,
6244 // the download window should be much larger than the to-be-downloaded set of blocks, so disconnection
6245 // should only happen during initial block download.
6246- LogPrintf("Peer=%d%s is stalling block download, disconnecting\n", pto->GetId(), fLogIPs ? strprintf(" peeraddr=%s", pto->addr.ToStringAddrPort()) : "");
6247+ LogInfo("Peer is stalling block download%s\n", pto->DisconnectMsg(fLogIPs));
davidgumberg
commented at 11:50 pm on September 17, 2024:
nit:
0 LogInfo("Peer is stalling block download, %s\n", pto->DisconnectMsg(fLogIPs));
DrahtBot requested review from davidgumberg
on Sep 17, 2024
in
src/net.cpp:1911
in
73d87f62d1outdated
1907@@ -1902,6 +1908,7 @@ void CConnman::DisconnectNodes()
1908 pnode->grantOutbound.Release();
19091910 // close socket and cleanup
1911+ LogDebug(BCLog::NET, "%s\n", pnode->DisconnectMsg(fLogIPs));
davidgumberg
commented at 0:45 am on September 18, 2024:
I believe your PR makes sure that DisconnectMsg gets printed any time fDisconnect == true so this line results in a duplicate log in each of those instances.
For example when doing bitcoin-cli setnetworkactive false:
Thanks for putting up with the review churn on this, it looks great! This PR makes peer disconnect message logging more consistent, and makes us better respect the -logips argument.
This %d %d, last_recv.count() != 0, last_send.count() != 0 is super obscure. It was like that before, so it is fine wrt this PR.
But would be nice to fix it, since this is changing the same log message already. I am not sure how though. Maybe change that %d to something like “has never received from the peer” + “has never sent to the peer”.
Tempted to leave the PR as is for now as it’s already quite large.
vasild
commented at 12:42 pm on October 8, 2024:
contributor
Approach ACK9967e8feb6
I noticed the repeating pattern fLogIPs ? strprintf(", peeraddr=%s", pfrom.addr.ToStringAddrPort()) : "" while working on #29415 and I repeated it a lot of times in the new messages I added :disappointed:. It has been on my TODO to introduce some function to deduplicate that. Thanks for the CNode::LogIP() method!
naumenkogs
commented at 1:38 pm on October 8, 2024:
member
Sjors
commented at 3:36 pm on October 21, 2024:
member
Seems like I need to look into and probably fix at least #28521 (review)
Will look into the other comments too.
net: additional disconnection logging
Use the word "disconnecting" everywhere for easier grep.
ad224429f8
net_processing: use CNode::DisconnectMsg helper
This is not a pure refactor:
1. It slightly changes the log messages, as reflected in the test changes
2. It adds the IP address to all disconnect logging (when fLogIPs is set)
937ef9eb40
net: add LogIP() helper, use in net_processing1d01ad4d73
Sjors force-pushed
on Nov 26, 2024
Sjors
commented at 12:24 pm on November 26, 2024:
member
Rebased and addressed feedback.
Sjors force-pushed
on Nov 26, 2024
DrahtBot added the label
CI failed
on Nov 26, 2024
DrahtBot
commented at 12:59 pm on November 26, 2024:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
Sjors
commented at 12:59 pm on November 26, 2024:
member
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2024-12-11 06:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me