p2p: Put disconnecting logs into BCLog::NET category

hebasto commented at 2:24 pm on August 29, 2020: member

It’s too noisy:

0$ cat debug.log | wc -l
128529
2$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
310177

p2p: Put disconnecting logs into BCLog::NET category 1816327e53

MarcoFalke commented at 2:28 pm on August 29, 2020: member

noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category, so cr-ACK 1816327e533d359c237c53eb6440b2f3a7cbf4fa

DrahtBot added the label P2P on Aug 29, 2020

jonatack commented at 4:30 pm on August 29, 2020: member

Tend to NACK, the NET log category is much too noisy for this message and I appreciate being able to see it unconditionally logged.

jonatack commented at 4:33 pm on August 29, 2020: member

0$ cat debug.log | wc -l
128529
2$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
310177

These figures are very far from representative? I have 154230 and 116, e.g. less than one line in one thousand, and with no banning, just one seednode and a few addnodes in the config file.

hebasto commented at 4:35 pm on August 29, 2020: member

0$ cat debug.log | wc -l
128529
2$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
310177
These figures are very far from representative? I have 154230 and 116.

Interesting, what factors affect this?

FWIW, that is the node the log was grepped from: https://bitnodes.io/nodes/leaderboard/?q=95.164.65.194

hebasto commented at 4:38 pm on August 29, 2020: member

@jonatack

Tend to NACK, the NET log category is much too noisy for this message and I appreciate being able to see it unconditionally logged.

This PR does not change the logging output amount with the debug=net option.

jonatack commented at 4:41 pm on August 29, 2020: member

I think it should stay unconditionally logged. If one-third of a node’s debug log is this message, ISTM the node has issues and this should be visibly logged.

hebasto commented at 4:44 pm on August 29, 2020: member

I think it should stay unconditionally logged. If one-third of a node’s debug log is this message, ISTM the node has issues and this should be visibly logged.

What kind of the node’s issues could force it to discourage and disconnect peers?

jonatack commented at 5:10 pm on August 29, 2020: member

Misbehaving peers; look at the callers of net_processing.cpp::Misbehaving.

hebasto commented at 5:20 pm on August 29, 2020: member

I think it should stay unconditionally logged. If one-third of a node’s debug log is this message, ISTM the node has issues and this should be visibly logged.

Misbehaving peers; look at the callers of net_processing.cpp::Misbehaving.

I mean “peers do misbehave”, not “the node has issues”.

jonatack commented at 5:28 pm on August 29, 2020: member

The node having issues because it’s seeing a pathological amount of misbehavior. Best to fix the issue, not hide the logs.

practicalswift commented at 5:57 pm on August 30, 2020: contributor

@jonatack What if the peer is doing this deliberately to try to fill our logs with crap? Either to cause disk usage, or to bury warnings about potential real issues (such as “Potential stale tip detected …”) by increasing signal to noise.

Generally I don’t think misbehaving peers should be able to cause us to write to disk unless the user is explicitly opting in to debug logging.

jonatack commented at 6:23 pm on August 30, 2020: member

I’d argue that this is a real issue. One-third of all logging for misbehavior should not be happening, and if the case is truly as extreme as described at the top, then alarms should be going off; better to know about it ASAP and take useful measures (setban, addnode, seednode, etc). The net logging is far too noisy to be useful unless you are specifically searching it for this or have some kind of custom, up-to-date-on-master monitoring system in place. I don’t think it’s reasonable to change the logging for everyone based on an unusually pathological case.

jonatack commented at 6:29 pm on August 30, 2020: member

(If people don’t want to see peer misbehavior unconditionally logged, maybe we should add a distinct category of NET logging dedicated to signals of healthy node operation, but ISTM that is what the unconditional log is for.)

practicalswift commented at 7:29 pm on August 30, 2020: contributor

@jonatack

if the case is truly as extreme as described at the top, then alarms should be going off […]

If so then I think it is better is we log when the “alarm should go off” condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are “more normal than normally” and that he/she draws the conclusion that something could be wrong.

I don’t think that is good usability: evidently also long term Core contributors don’t follow the described pattern as proven by this PR :)

Suggestion: we move forward with @hebasto’s suggestion and a follow-up PR can make sure the alarm goes off in a clear and unambiguous way when there is indication that something really is broken in a way that may require end-user attention/action :)

jonatack commented at 7:45 pm on August 30, 2020: member

This PR is made based on only only one data point, and afaict an extremely unusual one. Is anyone else really seeing this much misbehavior logging?

in src/net_processing.cpp:3831 in 1816327e53

3827@@ -3828,7 +3828,7 @@ bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)
3828     }
3829 
3830     // Normal case: Disconnect the peer and discourage all nodes sharing the address
3831-    LogPrintf("Disconnecting and discouraging peer %d!\n", peer_id);

jonatack commented at 8:04 pm on August 30, 2020:

I don’t think that is good usability: evidently also long term Core contributors don’t follow the described pattern as proven by this PR :)

This line was written by @sipa in b691f2df5f7d443c0c9ee056ab94aa0fc19566d5 and last touched by @jnewbery in 655b1957470c39b.

practicalswift commented at 8:28 am on August 31, 2020:

If so then I think it is better is we log when the “alarm should go off” condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are “more normal than normally” and that he/she draws the conclusion that something could be wrong.

I don’t think that is good usability: evidently also long term Core contributors don’t follow the described pattern as proven by this PR :)

This line was written by @sipa in b691f2d and last touched by @jnewbery in 655b195.

My point was simply that Core contributor hebasto who by definition is a power-user didn’t interpret the “alarm should go off” logging as such in this case (and TBH I’m not certain he should have). I think it is unreasonable to expect that ordinary users should do that interpretation even if we assume this would be a “alarm should go off” situation.

If this is something we want to warn about we should do it in a more clear and unambiguous way :)

jonatack commented at 8:41 am on August 31, 2020:

Ok. It’s hard to know–I think I’d be more worried about network topology here than a peer causing lots of logging, but I’m not sure if that POV is correct or shared.

I’m thinking of looking at the NET logging to see if we should consider creating a “high level” net logging category (or a “user level” one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I’m not aware of, though.

jnewbery commented at 9:20 am on August 31, 2020: member

Logging here was added way back in https://github.com/bitcoin/bitcoin/commit/b2864d2fb35fa85e32c76e111f8900598e0bb69d, shortly after logging categories were added (#3009). That commit also added unconditional logging to Misbehaving(), which has now moved to the NET category. I don’t have a very strong opinion, but I think that these logs should probably also go in the NET category.

noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category @MarcoFalke: I don’t understand. Can you explain why they’re different?

I’m thinking of looking at the NET logging to see if we should consider creating a “high level” net logging category (or a “user level” one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I’m not aware of, though. @jonatack: No objections. I’d take it ever further though, and add an (optional) logging severity (DEBUG/INFO/WARNING/ERROR or similar) that can be added to all log messages. The user can then either choose what severity logs they want for each category (eg = -debug=net:warning,tor:debug etc), or have a logging post-processor that can filter by severity/category.

practicalswift commented at 9:34 am on August 31, 2020: contributor

ACK 1816327e533d359c237c53eb6440b2f3a7cbf4fa – I think these logs belong in the NET category :)

jonatack commented at 9:44 am on August 31, 2020: member

I think there is something dramatically wrong with a node that has this message as one-third of its logging. It shouldn’t be a motivation for this change.

jonatack commented at 9:46 am on August 31, 2020: member

On my nodes these logs are less than one line in a thousand, and I’m not banning anything.

MarcoFalke commented at 1:51 pm on August 31, 2020: member

@MarcoFalke: I don’t understand. Can you explain why they’re different?

I assume that noban peers are “trusted” peers hand-picked by the node operator, so they should be assumed to not misbehave. If they do and fill your disk, then it is user error and not Bitcoin Core error.

MarcoFalke commented at 3:25 pm on August 31, 2020: member

Generally, it should be possible to run Bitcoin Core with well-defined resource usage in an unsupervised setting. Ideally, the resource usage depends on command line settings and user behaviour, not on remote peer (mis)behaviour. And with “unsupervised” I mean that the users doesn’t check the debug log unless there is a local software malfunction or other issue that needs manual debugging.

Power users that understand internal Bitcoin Core logic and wish to parse the log for peers to ban, can enable the debug category and use grep to parse the log.

I understand that the net category is heavily dominated by tx-relay logs, but maybe a different pull can split up finer-grained categories from net?

jonatack commented at 8:06 am on September 2, 2020: member

There are 17 other LogPrintf lines in the same file (net_processing.cpp) alone, many of which are also related to peer connection and disconnection. And just as many more in net.cpp.

I’d propose to have a global policy on connection/disconnection logging rather than PRing changes to each of them one-by-one.

If peer connection messages are displayed unconditionally, disconnections arguably should be as well–and many are.

It seems there would be consensus that connections/disconnections be either logged unconditionally, or that it should be possible to separate the logging of these higher-level, lower-frequency actions from the lower-level, high-frequency messages (inv, getdata, tx, etc.)

practicalswift commented at 8:19 am on September 2, 2020: contributor

@jonatack

There are 17 other LogPrintf lines in the same file (net_processing.cpp) alone, many of which are also related to peer connection and disconnection. And just as many more in net.cpp.

If you know of any scenario where a peer connecting to my node can cause my node to write to the default log (cheaply) then I’d be very interested to know about them so that I can fix them.

Assume a node running bitcoind with default settings (-debug not set).

Generally I don’t like the idea of untrusted parties being able to cause unwanted resource usage on my machine :)

jonatack commented at 9:13 am on September 2, 2020: member

@practicalswift I understand your concern. Reconnecting from different IP addresses and re-misbehaving may be trivial, but misbehavior leading to discouragement does not automatically lead to immediate disconnection; see src/banman.h::L36-55. If an attacker is wasting our resources in this way, which #19219 aimed to reduce, a line in the log per disconnection seems the least of our worries. This message ends with an exclamation mark for good reason–it’s signal, not noise.

practicalswift commented at 1:34 pm on September 2, 2020: contributor

@jonatack

I think @MarcoFalke summarized the current policy quite nicely:

Generally, it should be possible to run Bitcoin Core with well-defined resource usage in an unsupervised setting. Ideally, the resource usage depends on command line settings and user behaviour, not on remote peer (mis)behaviour. And with “unsupervised” I mean that the users doesn’t check the debug log unless there is a local software malfunction or other issue that needs manual debugging.

If the suggestion is to change the policy to something else we should probably open a separate issue for that to get more participation :)

The change in this PR is perfectly in line with the current policy AFAICT.

jonatack commented at 1:54 pm on September 2, 2020: member

@practicalswift ISTM better solutions have been proposed in this discussion already, this unconditional logging is no oversight, and this can be closed.

hebasto commented at 12:09 pm on November 26, 2020: member

Closing for now, as ill behavior is no longer observed, at least for me :)

hebasto closed this on Nov 26, 2020

ajtowns commented at 2:00 am on December 4, 2020: member

On my nodes these logs are less than one line in a thousand, and I’m not banning anything.

Are you accepting inbound connections or hidden behind tor or something? That seems really low. FWIW, I’m seeing ~50% of log lines being “Disconnecting and discouraging” without NET enabled (11.8k out of 22.4k); though that node is a chunk behind current master.

jonatack commented at 10:23 pm on December 4, 2020: member

0$ ~/.bitcoin$ cat debug.log | wc -l
19156971
2$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
32
4$ cat debug.log | grep "discouraging" | wc -l
52
6$ grep -i "discourag" ~/.bitcoin/debug.log 
72020-10-08T19:45:15Z Disconnecting and discouraging peer 604!
82020-10-09T17:23:27Z Disconnecting and discouraging peer 288!

Yeah, down to less than one in a million now, but it’s a pointless stat, since a couple months been running a tor v3 hidden service with almost no inbounds. 50% seems like a lot though.

I’m thinking of looking at the NET logging to see if we should consider creating a “high level” net logging category (or a “user level” one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I’m not aware of, though.

@jonatack: No objections. I’d take it ever further though, and add an (optional) logging severity (DEBUG/INFO/WARNING/ERROR or similar) that can be added to all log messages. The user can then either choose what severity logs they want for each category (eg = -debug=net:warning,tor:debug etc), or have a logging post-processor that can filter by severity/category.

I’d still like this, may propose something soon :tm:

ajtowns commented at 1:35 am on December 5, 2020: member

Yeah; it is a lot – but it’s triggerable by any badly written fake node software that does something stupid and also reconnects to your node sometime after being disconnected. I restarted that node after updating to current master and am back up to 378 lines of debug.log with 161 of them being Disconnecting and discouraging. (no debug logging enabled, which also means I can’t tell what IPs those peer ids are related to)

practicalswift commented at 10:03 am on December 5, 2020: contributor

Judging from the discussions in this PR is seems to me that the consensus opinion is that “Disconnecting and discouraging peer” is too noisy. Consider re-opening this PR :)

hebasto reopened this on Dec 5, 2020

jonatack commented at 10:18 am on December 5, 2020: member

Judging from the discussions in this PR is seems to me that the consensus opinion is that “Disconnecting and discouraging peer” is too noisy. Consider re-opening this PR :)

I suspect that re-proposing to hide the symptoms for all users every time one person has a situation with their node might not be the optimal way to address this. “Let’s sweep it under the rug.”

That said, since this discussion will go in circles until there is a real solution, I’ll unsubscribe here and concentrate at some point on proposing a solution for node operators who would like to know about these situations without running and grepping the current NET category logging which is far, far more noisy.

hebasto commented at 10:18 am on December 5, 2020: member

Judging from the discussions in this PR is seems to me that the consensus opinion is that “Disconnecting and discouraging peer” is too noisy. Consider re-opening this PR :)

Re-opened.

practicalswift commented at 3:52 pm on December 5, 2020: contributor

ACK 1816327e533d359c237c53eb6440b2f3a7cbf4fa for the reasons MarcoFalke gave above.

It shouldn’t be easy for an attacker to add noise to our logs. -debug can be used in troubleshooting scenarios.

ajtowns commented at 7:39 am on December 7, 2020: member

Having left the restarted node running for a couple of days now, I’ve broken my logs up by how frequently each message appears.

Standard “everything’s going fine” logs:

091 x Bitcoin Core version v0.20.99.0-e21b82438 (release build)  [introductory logs]
1387 x UpdateTip: new best=....
249 x New outbound peer connected: ...

Blockchain warnings about people sending me BCH headers and blocks sometimes being slow:

093 x ERROR: AcceptBlockHeader: Consensus::ContextualCheckBlockHeader: ..., bad-diffbits, incorrect proof of work
113 x Potential stale tip detected, will try using extra outbound peer (last tip update: ... seconds ago)

Info about disk allocations/pruning:

0100 x Pre-allocating up to position ... in ...
14 x Prune: UnlinkPrunedFiles deleted blk/rev (...)
28 x FlushStateToDisk: write coins cache to disk ... started / completed (...)
34 x Leaving block file ...: CBlockFileInfo(...)

Connection info, pretty much none of which is actionable with the information given:

01114 x  Disconnecting and discouraging peer ...! 
119 x ping timeout: 1200.[...]
216 x socket receive timeout: 1201s
311 x socket send error Broken pipe (32)
45 x Socks5() connect to ....onion:8333 failed: connection refused
55 x socket send error Connection reset by peer (104)
63 x Socks5() connect to ....onion:8333 failed: host unreachable
73 x socket send error Connection timed out (110)

ACK 1816327e533d359c237c53eb6440b2f3a7cbf4fa

MarcoFalke commented at 8:07 am on December 7, 2020: member

I’ve edited one review (https://github.com/bitcoin/bitcoin/pull/19832#issuecomment-739313243) to remove an @ mention, which would be included in the merge commit

MarcoFalke merged this on Dec 7, 2020

MarcoFalke closed this on Dec 7, 2020

hebasto deleted the branch on Dec 7, 2020

sidhujag referenced this in commit 6ea6605232 on Dec 7, 2020

DrahtBot locked this on Aug 16, 2022

p2p: Put disconnecting logs into BCLog::NET category #19832