It’s too noisy:
0$ cat debug.log | wc -l
128529
2$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
310177
It’s too noisy:
0$ cat debug.log | wc -l
128529
2$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
310177
NET
log category is much too noisy for this message and I appreciate being able to see it unconditionally logged.
0$ cat debug.log | wc -l 128529 2$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l 310177
These figures are very far from representative? I have 154230 and 116, e.g. less than one line in one thousand, and with no banning, just one seednode and a few addnodes in the config file.
0$ cat debug.log | wc -l 128529 2$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l 310177
These figures are very far from representative? I have 154230 and 116.
Interesting, what factors affect this?
FWIW, that is the node the log was grepped from: https://bitnodes.io/nodes/leaderboard/?q=95.164.65.194
I think it should stay unconditionally logged. If one-third of a node’s debug log is this message, ISTM the node has issues and this should be visibly logged.
What kind of the node’s issues could force it to discourage and disconnect peers?
net_processing.cpp::Misbehaving
.
I think it should stay unconditionally logged. If one-third of a node’s debug log is this message, ISTM the node has issues and this should be visibly logged.
Misbehaving peers; look at the callers of
net_processing.cpp::Misbehaving
.
I mean “peers do misbehave”, not “the node has issues”.
@jonatack What if the peer is doing this deliberately to try to fill our logs with crap? Either to cause disk usage, or to bury warnings about potential real issues (such as “Potential stale tip detected …”) by increasing signal to noise.
Generally I don’t think misbehaving peers should be able to cause us to write to disk unless the user is explicitly opting in to debug logging.
NET
logging dedicated to signals of healthy node operation, but ISTM that is what the unconditional log is for.)
if the case is truly as extreme as described at the top, then alarms should be going off […]
If so then I think it is better is we log when the “alarm should go off” condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are “more normal than normally” and that he/she draws the conclusion that something could be wrong.
I don’t think that is good usability: evidently also long term Core contributors don’t follow the described pattern as proven by this PR :)
Suggestion: we move forward with @hebasto’s suggestion and a follow-up PR can make sure the alarm goes off in a clear and unambiguous way when there is indication that something really is broken in a way that may require end-user attention/action :)
3827@@ -3828,7 +3828,7 @@ bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)
3828 }
3829
3830 // Normal case: Disconnect the peer and discourage all nodes sharing the address
3831- LogPrintf("Disconnecting and discouraging peer %d!\n", peer_id);
If so then I think it is better is we log when the “alarm should go off” condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are “more normal than normally” and that he/she draws the conclusion that something could be wrong.
I don’t think that is good usability: evidently also long term Core contributors don’t follow the described pattern as proven by this PR :)
This line was written by @sipa in b691f2d and last touched by @jnewbery in 655b195.
My point was simply that Core contributor hebasto who by definition is a power-user didn’t interpret the “alarm should go off” logging as such in this case (and TBH I’m not certain he should have). I think it is unreasonable to expect that ordinary users should do that interpretation even if we assume this would be a “alarm should go off” situation.
If this is something we want to warn about we should do it in a more clear and unambiguous way :)
Ok. It’s hard to know–I think I’d be more worried about network topology here than a peer causing lots of logging, but I’m not sure if that POV is correct or shared.
I’m thinking of looking at the NET
logging to see if we should consider creating a “high level” net logging category (or a “user level” one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I’m not aware of, though.
Logging here was added way back in https://github.com/bitcoin/bitcoin/commit/b2864d2fb35fa85e32c76e111f8900598e0bb69d, shortly after logging categories were added (#3009). That commit also added unconditional logging to Misbehaving()
, which has now moved to the NET
category. I don’t have a very strong opinion, but I think that these logs should probably also go in the NET
category.
noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category @MarcoFalke: I don’t understand. Can you explain why they’re different?
I’m thinking of looking at the NET logging to see if we should consider creating a “high level” net logging category (or a “user level” one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I’m not aware of, though. @jonatack: No objections. I’d take it ever further though, and add an (optional) logging severity (DEBUG/INFO/WARNING/ERROR or similar) that can be added to all log messages. The user can then either choose what severity logs they want for each category (eg =
-debug=net:warning,tor:debug
etc), or have a logging post-processor that can filter by severity/category.
NET
category :)
@MarcoFalke: I don’t understand. Can you explain why they’re different?
I assume that noban peers are “trusted” peers hand-picked by the node operator, so they should be assumed to not misbehave. If they do and fill your disk, then it is user error and not Bitcoin Core error.
Generally, it should be possible to run Bitcoin Core with well-defined resource usage in an unsupervised setting. Ideally, the resource usage depends on command line settings and user behaviour, not on remote peer (mis)behaviour. And with “unsupervised” I mean that the users doesn’t check the debug log unless there is a local software malfunction or other issue that needs manual debugging.
Power users that understand internal Bitcoin Core logic and wish to parse the log for peers to ban, can enable the debug category and use grep
to parse the log.
I understand that the net
category is heavily dominated by tx-relay logs, but maybe a different pull can split up finer-grained categories from net
?
There are 17 other LogPrintf
lines in the same file (net_processing.cpp
) alone, many of which are also related to peer connection and disconnection. And just as many more in net.cpp
.
I’d propose to have a global policy on connection/disconnection logging rather than PRing changes to each of them one-by-one.
If peer connection messages are displayed unconditionally, disconnections arguably should be as well–and many are.
It seems there would be consensus that connections/disconnections be either logged unconditionally, or that it should be possible to separate the logging of these higher-level, lower-frequency actions from the lower-level, high-frequency messages (inv, getdata, tx, etc.)
There are 17 other LogPrintf lines in the same file (net_processing.cpp) alone, many of which are also related to peer connection and disconnection. And just as many more in net.cpp.
If you know of any scenario where a peer connecting to my node can cause my node to write to the default log (cheaply) then I’d be very interested to know about them so that I can fix them.
Assume a node running bitcoind
with default settings (-debug
not set).
Generally I don’t like the idea of untrusted parties being able to cause unwanted resource usage on my machine :)
src/banman.h::L36-55
. If an attacker is wasting our resources in this way, which #19219 aimed to reduce, a line in the log per disconnection seems the least of our worries. This message ends with an exclamation mark for good reason–it’s signal, not noise.
I think @MarcoFalke summarized the current policy quite nicely:
Generally, it should be possible to run Bitcoin Core with well-defined resource usage in an unsupervised setting. Ideally, the resource usage depends on command line settings and user behaviour, not on remote peer (mis)behaviour. And with “unsupervised” I mean that the users doesn’t check the debug log unless there is a local software malfunction or other issue that needs manual debugging.
If the suggestion is to change the policy to something else we should probably open a separate issue for that to get more participation :)
The change in this PR is perfectly in line with the current policy AFAICT.
On my nodes these logs are less than one line in a thousand, and I’m not banning anything.
Are you accepting inbound connections or hidden behind tor or something? That seems really low. FWIW, I’m seeing ~50% of log lines being “Disconnecting and discouraging” without NET enabled (11.8k out of 22.4k); though that node is a chunk behind current master.
0$ ~/.bitcoin$ cat debug.log | wc -l
19156971
2$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
32
4$ cat debug.log | grep "discouraging" | wc -l
52
6$ grep -i "discourag" ~/.bitcoin/debug.log
72020-10-08T19:45:15Z Disconnecting and discouraging peer 604!
82020-10-09T17:23:27Z Disconnecting and discouraging peer 288!
Yeah, down to less than one in a million now, but it’s a pointless stat, since a couple months been running a tor v3 hidden service with almost no inbounds. 50% seems like a lot though.
I’m thinking of looking at the NET logging to see if we should consider creating a “high level” net logging category (or a “user level” one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I’m not aware of, though.
@jonatack: No objections. I’d take it ever further though, and add an (optional) logging severity (DEBUG/INFO/WARNING/ERROR or similar) that can be added to all log messages. The user can then either choose what severity logs they want for each category (eg = -debug=net:warning,tor:debug etc), or have a logging post-processor that can filter by severity/category.
I’d still like this, may propose something soon :tm:
Judging from the discussions in this PR is seems to me that the consensus opinion is that “Disconnecting and discouraging peer” is too noisy. Consider re-opening this PR :)
I suspect that re-proposing to hide the symptoms for all users every time one person has a situation with their node might not be the optimal way to address this. “Let’s sweep it under the rug.”
That said, since this discussion will go in circles until there is a real solution, I’ll unsubscribe here and concentrate at some point on proposing a solution for node operators who would like to know about these situations without running and grepping the current NET category logging which is far, far more noisy.
Judging from the discussions in this PR is seems to me that the consensus opinion is that “Disconnecting and discouraging peer” is too noisy. Consider re-opening this PR :)
Re-opened.
ACK 1816327e533d359c237c53eb6440b2f3a7cbf4fa for the reasons MarcoFalke gave above.
It shouldn’t be easy for an attacker to add noise to our logs. -debug
can be used in troubleshooting scenarios.
Having left the restarted node running for a couple of days now, I’ve broken my logs up by how frequently each message appears.
Standard “everything’s going fine” logs:
091 x Bitcoin Core version v0.20.99.0-e21b82438 (release build) [introductory logs]
1387 x UpdateTip: new best=....
249 x New outbound peer connected: ...
Blockchain warnings about people sending me BCH headers and blocks sometimes being slow:
093 x ERROR: AcceptBlockHeader: Consensus::ContextualCheckBlockHeader: ..., bad-diffbits, incorrect proof of work
113 x Potential stale tip detected, will try using extra outbound peer (last tip update: ... seconds ago)
Info about disk allocations/pruning:
0100 x Pre-allocating up to position ... in ...
14 x Prune: UnlinkPrunedFiles deleted blk/rev (...)
28 x FlushStateToDisk: write coins cache to disk ... started / completed (...)
34 x Leaving block file ...: CBlockFileInfo(...)
Connection info, pretty much none of which is actionable with the information given:
01114 x Disconnecting and discouraging peer ...!
119 x ping timeout: 1200.[...]
216 x socket receive timeout: 1201s
311 x socket send error Broken pipe (32)
45 x Socks5() connect to ....onion:8333 failed: connection refused
55 x socket send error Connection reset by peer (104)
63 x Socks5() connect to ....onion:8333 failed: host unreachable
73 x socket send error Connection timed out (110)
ACK 1816327e533d359c237c53eb6440b2f3a7cbf4fa
@
mention, which would be included in the merge commit