This picks up #19995 in an attempt to solve #21559.
The goal of this PR is to mitigate disk fill attacks by temporarily rate limiting `LogPrintf(…)`.
A disk fill attack is an attack where an untrusted party (such as a peer) is able to cheaply make your node log to disk excessively. The excessive logging may fill your disk and thus crash your node, either cleanly (best case, if the disk fills relatively slowly) or uncleanly (worst case, if the disk fills relatively fast).
Approach
The hourly logging quota is set per source location. Every source location (say `net_processing.cpp:418`) gets a quota of 1 MiB of logging per hour. Logging is only dropped from source locations that exceed the quota.
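To make the accounting concrete, here is a minimal sketch of what such a per-location quota could look like. Everything in it (`SourceLocation`, `LogRateLimiter`, `Consume`, the window handling) is my own illustration and need not match the PR's actual code:

```cpp
// Sketch of per-source-location quota accounting; hypothetical names.
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using SourceLocation = std::pair<std::string, int>; // e.g. {"net_processing.cpp", 418}

class LogRateLimiter
{
public:
    static constexpr uint64_t MAX_BYTES_PER_HOUR{1024 * 1024}; // 1 MiB quota per location

    // Returns true if a message of msg_bytes from loc may be written to disk.
    bool Consume(const SourceLocation& loc, size_t msg_bytes)
    {
        const auto now{std::chrono::steady_clock::now()};
        if (now - m_window_start >= std::chrono::hours{1}) {
            m_bytes_used.clear(); // a fresh quota window for every location
            m_window_start = now;
        }
        uint64_t& used{m_bytes_used[loc]};
        used += msg_bytes;
        // Only this location is suppressed once over quota; others keep logging.
        return used <= MAX_BYTES_PER_HOUR;
    }

private:
    std::map<SourceLocation, uint64_t> m_bytes_used;
    std::chrono::steady_clock::time_point m_window_start{std::chrono::steady_clock::now()};
};
```

Keying the counters by source location is what lets one noisy call site be silenced while the rest of the log stays intact.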
- Only logging to disk is rate limited. Logging to console is not rate limited.
- Only `LogPrintf(…)` is rate limited. `LogPrint(category, …)` (`-debug`) is not rate limited.
- `UpdateTip: new best=[…]` is logged using `LogPrintfWithoutRateLimiting(…)` to avoid rate limiting. High log volume is expected for that source location during IBD.
- When logging is restarted, a tally of how many messages/bytes were dropped is printed.
- Log rate limiting can be disabled with the `-ratelimitlogging` config option.
- All logs will be prefixed with `[*]` if there is at least one source location that is currently being suppressed (see the sketch after this list).
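As a rough illustration of the restart tally and the `[*]` prefix, a sketch with hypothetical names (`DroppedTally`, `LinePrefix`, `ResumeMessage`) and illustrative message wording, not the PR's actual code:

```cpp
// Sketch of the "[*]" prefix decision and the dropped-message tally.
#include <cstdint>
#include <string>

struct DroppedTally {
    uint64_t messages{0};
    uint64_t bytes{0};
};

// Every line written to disk gets a "[*] " prefix while at least one
// source location is being suppressed, flagging the log as incomplete.
std::string LinePrefix(bool any_location_suppressed)
{
    return any_location_suppressed ? "[*] " : "";
}

// When a suppressed location's quota window resets, report what was dropped
// before resuming normal logging (illustrative wording only).
std::string ResumeMessage(const DroppedTally& tally)
{
    return "Restarting logging for this location: " +
           std::to_string(tally.messages) + " messages (" +
           std::to_string(tally.bytes) + " bytes) were dropped.";
}
```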
Alternatives
In a recent PR Review Club meeting we discussed this PR and evaluated alternative approaches.
Global rate limiting
There was some discussion around an alternative approach: globally rate limiting `LogPrintf(…)`. This approach was implemented in a competing PR, #21706. The main point [1] in favor of global rate limiting is that it could be confusing to have incomplete logs when logs from one source location are dropped but not from others. The main point against global rate limiting is that it opens another attack vector: an attacker could trigger rate limiting and then execute a second attack which would then not be documented in the logs at all. Given the attack vector opened by the global approach and the overall reviews the two approaches have received, I have chosen to continue with the approach in this PR.
(To address the concern of [1], I have chosen to prefix all logs with `[*]` if there is at least one source location that is currently being suppressed.)
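For contrast, a sketch of the global variant (my own illustration, not code from #21706), which shows why a single shared budget lets one noisy source silence every other log line:

```cpp
// Sketch of global rate limiting: one shared counter for all call sites.
#include <cstddef>
#include <cstdint>

class GlobalLogRateLimiter
{
public:
    static constexpr uint64_t MAX_BYTES_PER_HOUR{1024 * 1024};

    bool Consume(size_t msg_bytes)
    {
        m_bytes_used += msg_bytes;
        // Once any caller exhausts the shared budget, all subsequent messages
        // are dropped; a later, unrelated attack would then leave no trace in
        // the log. This is the attack vector described above.
        return m_bytes_used <= MAX_BYTES_PER_HOUR;
    }

private:
    uint64_t m_bytes_used{0}; // shared across every source location
};
```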
logrotate
LarryRuane brought up logrotate, which can be used on Linux to monitor log files and perform actions on them (delete, compress, send somewhere, etc.) when a certain size or point in time is reached. `logrotate` could be used to rate limit logging, but since we would like log rate limiting to be enabled by default, it is not the best solution. Also, an alternative would have to be used on Windows.
Testing
I have written a unit test which is contained in this PR. (It is not the greatest test; it currently does not work on Windows thanks to `\r\n` line endings. If someone has an idea how to improve the test, I would appreciate it.)
Furthermore, I made an RPC here that can log excessive amounts of "a"s from different source locations, which I have found useful during my own testing.
bitcoin-cli excessivelog <location (1-5)> <num_bytes>
⚠️ One thing to note with that RPC is that the rate limiting logic still prints the last message that triggered the limiting to disk, so something like `bitcoin-cli excessivelog 1 536870912` would still log ~512 MiB to disk. Logging to console is also never suppressed (unless `-printtoconsole=0`). ⚠️
A simple example of using the RPC:
bitcoin-cli -regtest setmocktime 1
bitcoin-cli -regtest excessivelog 1 1048500 # log just under 1MiB
bitcoin-cli -regtest excessivelog 1 100     # this should get the total amount logged above 1MiB
                                            # and the rate limiting logic should kick in
bitcoin-cli -regtest setmocktime 3602       # let 1 hour pass (only works in regtest)
bitcoin-cli -regtest excessivelog 1 100     # this should trigger logging to resume