log: Mitigate disk filling attacks by rate limiting LogPrintf #21603

pull dergoegge wants to merge 11 commits into bitcoin:master from dergoegge:log_ratelimiting changing 7 files +348 −58
  1. dergoegge commented at 12:47 pm on April 5, 2021: member

    This picks up #19995 in an attempt to solve #21559.

    The goal of this PR is to mitigate disk filling attacks by temporarily rate limiting LogPrintf(…).

    A disk fill attack is an attack where an untrusted party (such as a peer) is able to cheaply make your node log to disk excessively. The excessive logging may fill your disk and thus make your node crash either cleanly (best case: if disk fill rate is relatively slow) or uncleanly (worst case: if disk fill rate is relatively fast).

    Approach

    The hourly logging quota is set per source location. Every single source location (say net_processing.cpp:418) gets a quota of 1 MB of logging per hour. Logging is only dropped from source locations that exceed the quota.

    • Only logging to disk is rate limited. Logging to console is not rate limited.
    • Only LogPrintf(…) is rate limited. LogPrint(category, …) (-debug) is not rate limited.
    • UpdateTip: new best=[…] is logged using LogPrintfWithoutRateLimiting(…) to avoid rate limiting. High log volume is expected for that source location during IBD.
    • When logging is restarted a tally of how many messages/bytes were dropped is printed.
    • Log rate limiting can be disabled with the -ratelimitlogging config option.
    • All logs will be prefixed with [*] if there is at least one source location that is currently being suppressed.

    Alternatives

    In a recent PR Review Club meeting we discussed this PR and evaluated alternative approaches.

    Global rate limiting

    There was some discussion around an alternative approach which would be to globally rate limit LogPrintf(…). This approach was implemented in a competing PR #21706. The main point [1] in favor of global rate limiting is that it could be confusing to have incomplete logs when logs from one source location are dropped but not from others. The main point against global rate limiting is that it opens another attack vector where an attacker could trigger rate limiting and then execute a 2. attack which would then not be document in the logs at all. With regard to the attack vector on the global approach and the overall reviews the two approaches have gotten I have chosen to continue with the approach in this PR. (To address the concern of [1] i have chosen to prefix all logs with [*] if there is at least one source location that is currently being suppressed.)

    logrotate

    LarryRuane brought up logrotate which can be used on linux to monitor log files and perform actions (delete, compress, send somewhere, etc.) on them when a certain size or point in time is reached. logrotate could be used to rate limit log but since we would like log rate limiting to be enabled by default it is not the best solution. Also on windows an alternative would have to be used.

    Testing

    I have a written a unit test which is contained in this PR. (Not the greatest test, it currently does not work on windows thanks to \r\n. If someone has an idea how to improve the test i would appreciate it)

    Further more I made a RPC here that can log excessive amounts of “a"s from different locations which i have found useful during my own testing. bitcoin-cli excessivelog <location (1-5)> <num_bytes>

    ⚠️One thing to note with that rpc is that the rate limiting logic still prints the last message that triggered the limiting to disk, so something like bitcoin-cli excessivelog 1 536870912 would still log ~512MiB to disk. Logging to console is also never suppressed (unless -printtoconsole=0) ⚠️

    A simple example to use the rpc:

    0bitcoin-cli -regtest setmocktime 1
    1bitcoin-cli -regtest excessivelog 1 1048500 # log just under 1MiB
    2bitcoin-cli -regtest excessivelog 1 100 # this should get the total amount logged above 1MiB
    3                                        # and the rate limiting logic should kick in
    4bitcoin-cli -regtest setmocktime 3602 # let 1 hour pass (only works in regtest)
    5bitcoin-cli -regtest excessivelog 1 100 # this should trigger logging to resume
    
  2. fanquake added the label Utils/log/libs on Apr 5, 2021
  3. dergoegge commented at 12:49 pm on April 5, 2021: member

    In response to @jnewbery’s comment

    I think I’d prefer log-shedding to be done globally, with a special log at the end of a log suppression period that tallies up which log locations have been suppressed.

    I disagree on the global schedding, i think the source location based schedding is over all the better approach. I agree with your point that it could be confusing to drop some logs in the context of user issues but schedding should only kick in if there is an attack or something is horribly wrong, so i would think that almost all logs provided by users would be complete. We can also always tell if rate limiting is taking place since we log when it starts/stops and in those cases a partial log is better then no log.

    (This is obviously mostly an opinion and I am totally willing to be convinced of global log-schedding if enough reviewers favor it)

  4. DrahtBot commented at 3:08 pm on April 5, 2021: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #25306 (logging: add LogPrintfCategory to log unconditionally with category by jonatack)
    • #25287 (logging: threshold log level by klementtan)
    • #25203 (logging: update to severity-based logging by jonatack)
    • #24950 (Add config option to set max debug log size by tehelsper)
    • #23443 (p2p: Erlay support signaling by naumenkogs)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. dergoegge force-pushed on Apr 5, 2021
  6. jnewbery commented at 7:39 am on April 6, 2021: contributor

    schedding should only kick in if there is an attack or something is horribly wrong, so i would think that almost all logs provided by users would be complete.

    Right, and those cases are exactly where we need the best possible logs.

    We can also always tell if rate limiting is taking place since we log when it starts/stops and in those cases a partial log is better then no log.

    I guarantee you that not everyone will know this and most people will miss a single log line saying “rate-limiting log x”.

    I can only speak from my own experience of supporting network equipment for several years - partial logs are often worse than no logs at all since they’re so misleading.

  7. practicalswift commented at 7:55 am on April 6, 2021: contributor

    Strongest possible Concept ACK: this mitigation will kill an entire bug class (the “disk fill via logging” bug class).

    Thanks for taking up this work @dergoegge.

    Will test and review.

  8. practicalswift commented at 8:27 am on April 6, 2021: contributor

    @jnewbery

    I think I’d prefer log-shedding to be done globally, with a special log at the end of a log suppression period that tallies up which log locations have been suppressed.

    Clarifying questions to fully understand what is suggested:

    Let’s assume that log location A is a misplaced LogPrintf in a code path that the attacker can make us take.

    In the case of an attacker using log location A to fill our disk by making us log from there repeatedly, is the suggestion then that all log locations should be suppressed during the suppression period (instead of suppressing only the “attacker controlled” log location A)?

    To make your suggestion clear: could you exemplify what the special log entry at the end of the log suppression period would look like? (A patch or even a separate PR would be even better, but an example log entry would probably clarify enough!)

  9. in src/test/logging_tests.cpp:46 in fa114ff2d4 outdated
    41+}
    42+
    43+template <typename... Args>
    44+void LogFromFixedLocation(const char* fmt, const Args&... args)
    45+{
    46+    LogPrintf(fmt, args...);
    


    adamjonas commented at 10:08 pm on April 6, 2021:

    this line is tripping up the linter:

    0src/test/logging_tests.cpp: Expected 0 argument(s) after format string but found 1 argument(s): LogPrintf(fmt, args...)
    
  10. dergoegge force-pushed on Apr 6, 2021
  11. dergoegge force-pushed on Apr 6, 2021
  12. in src/test/logging_tests.cpp:138 in 1653167de6 outdated
    126+    // Log 1 MiB, this should be allowed since the usage was reset.
    127+    for (int i = 0; i < 1024; ++i) {
    128+        LogFromFixedLocation(s1023);
    129+    }
    130+    GetLogFileSize(curr_log_file_size);
    131+    BOOST_CHECK(curr_log_file_size - prev_log_file_size == 1024 * 1024);
    


    adamjonas commented at 2:26 pm on April 14, 2021:
    Both windows CI builds are failing on this line. See cirrus.
  13. dergoegge force-pushed on Apr 15, 2021
  14. practicalswift commented at 4:09 am on April 16, 2021: contributor

    @dergoegge

    Thanks for your work on this PR!

    Would you be willing to implement also @jnewbery’s suggestions as a separate PR?

    I tried to summarise his suggestion in #21603 (comment). I’m not certain I got it right though - @jnewbery, feel free to chime in :)

    I think it makes sense to do it as separate PR since it deviates in important ways from the original suggestion. By having two separate PRs we would see which of the two approaches to address this type of attack that have consensus support.

    Personally I think I could live with both approaches: as long as we kill the disk-fill-via-logging bug class sooner rather than later I’m happy :)

  15. dergoegge commented at 12:25 pm on April 16, 2021: member

    Would you be willing to implement also jnewbery’s suggestions as a separate PR? @practicalswift Sure, thats a good idea :)

  16. dergoegge force-pushed on Apr 16, 2021
  17. rebroad commented at 9:45 am on April 17, 2021: contributor
    concept ACK (not that I’m sure if my opinion is relevant) and I think this makes more sense than the global alternative.
  18. DrahtBot added the label Needs rebase on Apr 23, 2021
  19. DrahtBot commented at 9:32 am on May 3, 2021: contributor

    🕵️ @sipa has been requested to review this pull request as specified in the REVIEWERS file.

  20. dergoegge force-pushed on May 13, 2021
  21. DrahtBot removed the label Needs rebase on May 13, 2021
  22. DrahtBot added the label Needs rebase on Jun 12, 2021
  23. practicalswift commented at 8:04 am on June 12, 2021: contributor
    @dergoegge Thanks for working on this. Would you mind rebasing? I would like to review the updated version :)
  24. dergoegge force-pushed on Jun 12, 2021
  25. DrahtBot removed the label Needs rebase on Jun 12, 2021
  26. dergoegge commented at 11:02 am on June 12, 2021: member

    @practicalswift Rebased :)

    For anyone reviewing this: I made a rpc here that can log excessive amounts of “a"s from different locations which might be useful for testing. bitcoin-cli excessivelog <location (1-5)> <num_bytes>

    ⚠️One thing to note with that rpc is that the rate limiting logic still prints the last message that triggered the limiting to disk, so something like bitcoin-cli excessivelog 1 536870912 would still log ~512MiB to disk. Logging to console is also never suppressed (unless -printtoconsole=0) ⚠️

    A simple example to use the rpc:

    0bitcoin-cli excessivelog 1 1048500 # log just under 1MiB
    1bitcoin-cli excessivelog 1 100 # this should get the total amount logged above 1MiB and the rate limiting logic should kick in
    
  27. Emzy commented at 5:01 pm on June 16, 2021: contributor
    concept ACK I think this PR makes more sense than the global alternative, because global rate limiting would open another attack vector. An attacker could trigger the global limit in one category and conceal another attack from the log.
  28. dergoegge force-pushed on Jun 17, 2021
  29. dergoegge commented at 7:07 pm on June 17, 2021: member

    I moved the rate limiting logic into its own method. log_ratelimiting.0 -> log_ratelimiting.1

      01:  76c59a1cc ! 1:  a3a2cfd64 log: Mitigate disk filling attacks by rate limiting LogPrintf
      1    @@ Metadata
      2      ## Commit message ##
      3         log: Mitigate disk filling attacks by rate limiting LogPrintf
      4     
      5    +    Co-authored-by: dergoegge <n.goeggi@gmail.com>
      6    +
      7      ## src/init/common.cpp ##
      8     @@ src/init/common.cpp: void AddLoggingArgs(ArgsManager& argsman)
      9          argsman.AddArg("-logtimemicros", strprintf("Add microsecond precision to debug timestamps (default: %u)", DEFAULT_LOGTIMEMICROS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
     10    @@ src/logging.cpp: namespace BCLog {
     11     -void BCLog::Logger::LogPrintStr(const std::string& str, const std::string& logging_function, const std::string& source_file, const int source_line)
     12     +static constexpr uint64_t HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION{1024 * 1024};
     13     +
     14    ++bool BCLog::Logger::RateLimit(std::string& str, const std::string& logging_function, const SourceLocation& source_location)
     15    ++{
     16    ++    if (!m_rate_limiting) {
     17    ++        // Rate limiting is disabled.
     18    ++        return false;
     19    ++    }
     20    ++
     21    ++    const std::chrono::seconds now = GetTime<std::chrono::seconds>();
     22    ++    QuotaUsage& quota_usage = m_quota_usage_per_source_location[source_location];
     23    ++    // Is the quota exceeded before this log call?
     24    ++    bool quota_exceeded_before = quota_usage.m_bytes_logged > HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION;
     25    ++
     26    ++    bool dont_skip = false;
     27    ++    // Every hour the quota usage for a source location is reset.
     28    ++    if ((now - quota_usage.m_last_reset) > std::chrono::hours{1}) {
     29    ++        // Should logging to disk continue to be disabled?
     30    ++        bool quota_still_exceeded = quota_usage.m_bytes_dropped > HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION;
     31    ++
     32    ++        if (quota_still_exceeded) {
     33    ++            str = LogTimestampStr(strprintf(
     34    ++                "Not restarting logging from %s:%d (%s): "
     35    ++                "because %d messages (%d MiB) were dropped during the last hour which still exceeds the limit of %d MiB.\n",
     36    ++                source_location.first, source_location.second, logging_function, quota_usage.m_messages_dropped,
     37    ++                quota_usage.m_bytes_dropped / (1024 * 1024), HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION / (1024 * 1024)));
     38    ++        } else if (quota_exceeded_before) {
     39    ++            str = LogTimestampStr(strprintf(
     40    ++                "Restarting logging from %s:%d (%s): "
     41    ++                "%d messages (%d MiB) were dropped during the last hour.\n"
     42    ++                "%s",
     43    ++                source_location.first, source_location.second, logging_function, quota_usage.m_messages_dropped,
     44    ++                quota_usage.m_bytes_dropped / (1024 * 1024), str));
     45    ++            --m_rate_limited_locations;
     46    ++        }
     47    ++
     48    ++        // Dont skip the reset logs.
     49    ++        dont_skip = quota_exceeded_before || quota_still_exceeded;
     50    ++
     51    ++        // Logging to disk is only re-enabled if the number of dropped bytes did not exceed the limit.
     52    ++        if (!quota_still_exceeded) quota_usage.m_bytes_logged = 0;
     53    ++        quota_usage.m_messages_dropped = 0;
     54    ++        quota_usage.m_bytes_dropped = 0;
     55    ++        quota_usage.m_last_reset = now;
     56    ++    }
     57    ++
     58    ++    if (!quota_exceeded_before) quota_usage.m_bytes_logged += str.size();
     59    ++
     60    ++    bool quota_exceeded_after = quota_usage.m_bytes_logged > HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION;
     61    ++    if (!quota_exceeded_after) {
     62    ++        // The limits were not exceeded and the message should not be dropped.
     63    ++        return false;
     64    ++    }
     65    ++
     66    ++    if (!quota_exceeded_before) {
     67    ++        str = LogTimestampStr(strprintf(
     68    ++            "Excessive logging detected from %s:%d (%s): "
     69    ++            ">%d MiB logged during the last hour. "
     70    ++            "Suppressing logging to disk from this source location for up to one hour. "
     71    ++            "Console logging unaffected. Last log entry: %s",
     72    ++            source_location.first, source_location.second, logging_function,
     73    ++            HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION / (1024 * 1024), str));
     74    ++        ++m_rate_limited_locations;
     75    ++    } else if (!dont_skip) {
     76    ++        // The log message should be dropped.
     77    ++        quota_usage.m_messages_dropped++;
     78    ++        quota_usage.m_bytes_dropped += str.size();
     79    ++        return true;
     80    ++    }
     81    ++
     82    ++    return false;
     83    ++}
     84    ++
     85     +void BCLog::Logger::LogPrintStr(const std::string& str, const std::string& logging_function, const SourceLocation& source_location, const bool skip_disk_usage_rate_limiting)
     86      {
     87          StdLockGuard scoped_lock(m_cs);
     88    @@ src/logging.cpp: void BCLog::Logger::LogPrintStr(const std::string& str, const s
     89      
     90     +    // Rate limit logging to disk to avoid disk filling attacks.
     91     +    bool skip_writing_to_disk_due_to_rate_limiting{false};
     92    -+    if (!skip_disk_usage_rate_limiting && m_rate_limiting) {
     93    -+        const std::chrono::seconds now = GetTime<std::chrono::seconds>();
     94    -+        QuotaUsage& quota_usage = m_quota_usage_per_source_location[source_location];
     95    -+        // Is the quota exceeded before this log call?
     96    -+        bool quota_exceeded_before = quota_usage.m_bytes_logged > HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION;
     97    -+
     98    -+        bool dont_skip = false;
     99    -+        // Every hour the quota usage for a source location is reset.
    100    -+        if ((now - quota_usage.m_last_reset) > std::chrono::hours{1}) {
    101    -+            // Should logging to disk continue to be disabled?
    102    -+            bool quota_still_exceeded = quota_usage.m_bytes_dropped > HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION;
    103    -+
    104    -+            if (quota_still_exceeded) {
    105    -+                str_prefixed = LogTimestampStr(strprintf("Not restarting logging from %s:%d (%s): because %d messages (%d MiB) were dropped during the last hour which still exceeds the limit of %d MiB.\n", source_location.first, source_location.second, logging_function, quota_usage.m_messages_dropped, quota_usage.m_bytes_dropped / (1024 * 1024), HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION / (1024 * 1024)));
    106    -+            } else if (quota_exceeded_before) {
    107    -+                str_prefixed = LogTimestampStr(strprintf("Restarting logging from %s:%d (%s): %d messages (%d MiB) were dropped during the last hour.\n", source_location.first, source_location.second, logging_function, quota_usage.m_messages_dropped, quota_usage.m_bytes_dropped / (1024 * 1024)));
    108    -+            }
    109    -+
    110    -+            // Dont skip the reset logs.
    111    -+            dont_skip = quota_exceeded_before || quota_still_exceeded;
    112    -+
    113    -+            // Logging to disk is only re-enabled if the number of dropped bytes did not exceed the limit.
    114    -+            if (!quota_still_exceeded) quota_usage.m_bytes_logged = 0;
    115    -+            quota_usage.m_messages_dropped = 0;
    116    -+            quota_usage.m_bytes_dropped = 0;
    117    -+            quota_usage.m_last_reset = now;
    118    -+        }
    119    ++    if (!skip_disk_usage_rate_limiting) {
    120    ++        skip_writing_to_disk_due_to_rate_limiting = RateLimit(str_prefixed, logging_function, source_location);
    121    ++    }
    122     +
    123    -+        if (!quota_exceeded_before) quota_usage.m_bytes_logged += str_prefixed.size();
    124    -+
    125    -+        bool quota_exceeded_after = quota_usage.m_bytes_logged > HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION;
    126    -+        if (quota_exceeded_after) {
    127    -+            if (!quota_exceeded_before) {
    128    -+                str_prefixed = LogTimestampStr(strprintf("Excessive logging detected from %s:%d (%s): >%d MiB logged during the last hour. Suppressing logging to disk from this source location for up to one hour. Console logging unaffected. Last log entry: %s", source_location.first, source_location.second, logging_function, HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION / (1024 * 1024), str_prefixed));
    129    -+            } else if (!dont_skip) {
    130    -+                skip_writing_to_disk_due_to_rate_limiting = !dont_skip;
    131    -+                quota_usage.m_messages_dropped++;
    132    -+                quota_usage.m_bytes_dropped += str_prefixed.size();
    133    -+            }
    134    -+        }
    135    ++    if (m_rate_limited_locations > 0) {
    136    ++        str_prefixed.insert(0, "[*] ");
    137     +    }
    138     +
    139          m_started_new_line = !str.empty() && str[str.size()-1] == '\n';
    140    @@ src/logging.h: namespace BCLog {
    141              std::list<std::string> m_msgs_before_open GUARDED_BY(m_cs);
    142              bool m_buffering GUARDED_BY(m_cs) = true; //!< Buffer messages before logging can be started.
    143     +        std::unordered_map<SourceLocation, QuotaUsage, SourceLocationHasher> m_quota_usage_per_source_location GUARDED_BY(m_cs);
    144    ++        std::atomic_int m_rate_limited_locations;
    145      
    146              /**
    147               * m_started_new_line is a state variable that will suppress printing of
    148    @@ src/logging.h: namespace BCLog {
    149              fs::path m_file_path;
    150              std::atomic<bool> m_reopen_file{false};
    151      
    152    ++        /** Evaluate if a log message should not be printed to disk.
    153    ++         *
    154    ++         * [@param](/bitcoin-bitcoin/contributor/param/)[in,out] The log message.
    155    ++         *                The rate limiting logic message may change the message
    156    ++         *                to signal the start or end of a supression period.
    157    ++         * [@param](/bitcoin-bitcoin/contributor/param/)[in]     The function from which the log message originated.
    158    ++         * [@param](/bitcoin-bitcoin/contributor/param/)[in]     The source location from which the log message originated.
    159    ++         */
    160    ++        bool RateLimit(std::string& str, const std::string& logging_function, const SourceLocation& source_location) EXCLUSIVE_LOCKS_REQUIRED(m_cs);
    161    ++
    162              /** Send a string to the log output */
    163     -        void LogPrintStr(const std::string& str, const std::string& logging_function, const std::string& source_file, const int source_line);
    164     +        void LogPrintStr(const std::string& str, const std::string& logging_function, const SourceLocation& source_location, const bool skip_disk_usage_rate_limiting);
    1652:  855d05e43 = 2:  5b46965aa test: Add logging test for rate limiting
    

    I also decided to move forward with the approach in this PR and will close #21706. See the updated PR description for reasoning.

  30. dergoegge force-pushed on Jun 17, 2021
  31. Thoragh commented at 12:48 pm on June 19, 2021: contributor

    Concept ACK

    Maybe it would be good to set m_last_reset to the current time when it start limiting so it takes 1 hour until the next reset?

  32. dergoegge commented at 8:33 pm on June 20, 2021: member

    Maybe it would be good to set m_last_reset to the current time when it start limiting so it takes 1 hour until the next reset? @Thoragh Thanks for the suggestion, Im gonna test that out. It might make sense to do but it does make the approach a bit more conservative because a source location might have to wait longer before it can log again (at least 1hour).

  33. in src/init/common.cpp:79 in a2916ae504 outdated
    75@@ -76,6 +76,7 @@ void AddLoggingArgs(ArgsManager& argsman)
    76     argsman.AddArg("-logtimemicros", strprintf("Add microsecond precision to debug timestamps (default: %u)", DEFAULT_LOGTIMEMICROS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    77     argsman.AddArg("-printtoconsole", "Send trace/debug info to console (default: 1 when no -daemon. To disable logging to file, set -nodebuglogfile)", ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
    78     argsman.AddArg("-shrinkdebugfile", "Shrink debug.log file on client startup (default: 1 when no -debug)", ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
    79+    argsman.AddArg("-logratelimiting", strprintf("Rate limit debug logging to disk (default: %u)", DEFAULT_LOGRATELIMITING), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    


    LarryRuane commented at 1:17 am on June 22, 2021:
    0    argsman.AddArg("-logratelimiting", strprintf("Rate limit debug logging to disk (default: %u)", DEFAULT_LOGRATELIMITING), ArgsManager::ALLOW_BOOL, OptionsCategory::OPTIONS);
    

    Optional change; most of the other boolean arguments do specify ALLOW_ANY, I don’t know why. But since ALLOW_BOOL exists, may as well use it – if nothing else, it documents that the argument is a flag rather than some kind of rate threshold (integer value).


    jonatack commented at 7:04 am on June 22, 2021:
    Good point. I think the more specific ones were added later and the idea is to prefer using them for new code.
  34. in src/test/logging_tests.cpp:219 in a2916ae504 outdated
    58+    LogInstance().m_log_timestamps = false;
    59+    bool prev_log_sourcelocations = LogInstance().m_log_sourcelocations;
    60+    LogInstance().m_log_sourcelocations = false;
    61+    bool prev_log_threadnames = LogInstance().m_log_threadnames;
    62+    LogInstance().m_log_threadnames = false;
    63+
    


    LarryRuane commented at 4:25 am on June 22, 2021:
    0
    1    // Log 1024-character lines (1023 plus newline) to make the math simple.
    
  35. in src/test/logging_tests.cpp:33 in a2916ae504 outdated
    29@@ -30,4 +30,115 @@ BOOST_AUTO_TEST_CASE(logging_timer)
    30     BOOST_CHECK_EQUAL(micro_timer.LogMsg("test micros"), "tests: test micros (1000000.00μs)");
    31 }
    32 
    33+void GetLogFileSize(size_t& size)
    


    LarryRuane commented at 5:24 am on June 22, 2021:
    0size_t GetLogFileSize()
    
  36. in src/logging.h:42 in a2916ae504 outdated
    37@@ -32,6 +38,23 @@ struct LogCategory {
    38     bool active;
    39 };
    40 
    41+// Replace with std::source_location when switching to C++20.
    42+using SourceLocation = std::pair<const char*, int>;
    


    LarryRuane commented at 5:33 am on June 22, 2021:
    I would prefer struct here to std::pair. I think pair should be used in situations where two items are grouped together for some deep, fundamental reason, for example, a map key and its value. If you just happen to want to combine two things, but later it could be (say) three, then I think a struct is better. Also, it’s easier to read, location.file and location.line.
  37. in src/logging.h:97 in a2916ae504 outdated
    92@@ -70,6 +93,8 @@ namespace BCLog {
    93         FILE* m_fileout GUARDED_BY(m_cs) = nullptr;
    94         std::list<std::string> m_msgs_before_open GUARDED_BY(m_cs);
    95         bool m_buffering GUARDED_BY(m_cs) = true; //!< Buffer messages before logging can be started.
    96+        std::unordered_map<SourceLocation, QuotaUsage, SourceLocationHasher> m_quota_usage_per_source_location GUARDED_BY(m_cs);
    97+        std::atomic_int m_rate_limited_locations;
    


    LarryRuane commented at 5:38 am on June 22, 2021:
    It seems like this doesn’t need to be atomic.
  38. in src/logging.cpp:413 in a2916ae504 outdated
    332+    if (!skip_disk_usage_rate_limiting) {
    333+        skip_writing_to_disk_due_to_rate_limiting = RateLimit(str_prefixed, logging_function, source_location);
    334+    }
    335+
    336+    if (m_rate_limited_locations > 0) {
    337+        str_prefixed.insert(0, "[*] ");
    


    LarryRuane commented at 6:08 am on June 22, 2021:
    What’s this for? Maybe add a comment.
  39. in src/logging.cpp:249 in a2916ae504 outdated
    238@@ -239,13 +239,86 @@ namespace BCLog {
    239     }
    240 }
    241 
    242-void BCLog::Logger::LogPrintStr(const std::string& str, const std::string& logging_function, const std::string& source_file, const int source_line)
    243+static constexpr uint64_t HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION{1024 * 1024};
    244+
    


    LarryRuane commented at 6:14 am on June 22, 2021:
    Comment needed describing what this function does at a high level and documenting the return value.

    dergoegge commented at 3:36 pm on June 24, 2021:
    There is a comment in src/logging.h.
  40. in src/logging.cpp:270 in a2916ae504 outdated
    326@@ -254,6 +327,16 @@ void BCLog::Logger::LogPrintStr(const std::string& str, const std::string& loggi
    327 
    328     str_prefixed = LogTimestampStr(str_prefixed);
    329 
    330+    // Rate limit logging to disk to avoid disk filling attacks.
    331+    bool skip_writing_to_disk_due_to_rate_limiting{false};
    332+    if (!skip_disk_usage_rate_limiting) {
    333+        skip_writing_to_disk_due_to_rate_limiting = RateLimit(str_prefixed, logging_function, source_location);
    334+    }
    


    LarryRuane commented at 6:26 am on June 22, 2021:
    Consider moving this down to just before it’s needed.
  41. in src/logging.cpp:356 in a2916ae504 outdated
    352@@ -270,7 +353,7 @@ void BCLog::Logger::LogPrintStr(const std::string& str, const std::string& loggi
    353     for (const auto& cb : m_print_callbacks) {
    354         cb(str_prefixed);
    355     }
    356-    if (m_print_to_file) {
    357+    if (m_print_to_file && !skip_writing_to_disk_due_to_rate_limiting) {
    


    LarryRuane commented at 6:32 am on June 22, 2021:

    I’m not sure it’s worth creating this variable skip_writing_to_disk_due_to_rate_limiting, maybe:

    0if (m_print_to_file && (skip_disk_usage_rate_limiting || !RateLimit(str_prefixed, logging_function, source_location))) {
    

    dergoegge commented at 10:50 pm on June 23, 2021:
    We want the rate limiting messages like “restarting logging from…” to also be printed to the console, so we have to call RateLimit before that and store the result. I am still gonna change the name of skip_writing_to_disk_due_to_rate_limiting because its a bit long.
  42. in src/logging.cpp:262 in 01ba3285d4 outdated
    252+    const std::chrono::seconds now = GetTime<std::chrono::seconds>();
    253+    QuotaUsage& quota_usage = m_quota_usage_per_source_location[source_location];
    254+    // Is the quota exceeded before this log call?
    255+    bool quota_exceeded_before = quota_usage.m_bytes_logged > HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION;
    256+
    257+    bool dont_skip = false;
    


    Thoragh commented at 3:08 pm on June 22, 2021:
    If i understand the code correctly, dont_skip is used to make sure the hourly reset messages are printed. Maybe the code could be simplified by removing dont_skip and calling LogPrintf instead for the hourly messages?

    dergoegge commented at 11:10 pm on June 23, 2021:

    That is correct! But this does unfortunately not work because we hold a lock here https://github.com/bitcoin/bitcoin/blob/a2916ae5043436d3fd22acb44c5663227fa7eb90/src/logging.cpp#L317

    A recursive call to LogPrintf also feels a bit dangerous to me.

  43. dergoegge force-pushed on Jun 24, 2021
  44. dergoegge commented at 3:34 pm on June 24, 2021: member

    log_ratelimiting.2 -> log_ratelimiting.3

      01:  01ba3285d ! 1:  2cab018d9 log: Mitigate disk filling attacks by rate limiting LogPrintf
      1    @@ src/init/common.cpp: void AddLoggingArgs(ArgsManager& argsman)
      2          argsman.AddArg("-logtimemicros", strprintf("Add microsecond precision to debug timestamps (default: %u)", DEFAULT_LOGTIMEMICROS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
      3          argsman.AddArg("-printtoconsole", "Send trace/debug info to console (default: 1 when no -daemon. To disable logging to file, set -nodebuglogfile)", ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
      4          argsman.AddArg("-shrinkdebugfile", "Shrink debug.log file on client startup (default: 1 when no -debug)", ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
      5    -+    argsman.AddArg("-logratelimiting", strprintf("Rate limit debug logging to disk (default: %u)", DEFAULT_LOGRATELIMITING), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
      6    ++    argsman.AddArg("-logratelimiting", strprintf("Rate limit debug logging to disk (default: %u)", DEFAULT_LOGRATELIMITING), ArgsManager::ALLOW_BOOL, OptionsCategory::OPTIONS);
      7      }
      8      
      9      void SetLoggingOptions(const ArgsManager& args)
     10    @@ src/logging.cpp: namespace BCLog {
     11     +            str = LogTimestampStr(strprintf(
     12     +                "Not restarting logging from %s:%d (%s): "
     13     +                "because %d messages (%d MiB) were dropped during the last hour which still exceeds the limit of %d MiB.\n",
     14    -+                source_location.first, source_location.second, logging_function, quota_usage.m_messages_dropped,
     15    ++                source_location.m_file, source_location.m_line, logging_function, quota_usage.m_messages_dropped,
     16     +                quota_usage.m_bytes_dropped / (1024 * 1024), HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION / (1024 * 1024)));
     17     +        } else if (quota_exceeded_before) {
     18     +            str = LogTimestampStr(strprintf(
     19     +                "Restarting logging from %s:%d (%s): "
     20     +                "%d messages (%d MiB) were dropped during the last hour.\n"
     21     +                "%s",
     22    -+                source_location.first, source_location.second, logging_function, quota_usage.m_messages_dropped,
     23    ++                source_location.m_file, source_location.m_line, logging_function, quota_usage.m_messages_dropped,
     24     +                quota_usage.m_bytes_dropped / (1024 * 1024), str));
     25     +            --m_rate_limited_locations;
     26     +        }
     27    @@ src/logging.cpp: namespace BCLog {
     28     +            ">%d MiB logged during the last hour. "
     29     +            "Suppressing logging to disk from this source location for up to one hour. "
     30     +            "Console logging unaffected. Last log entry: %s",
     31    -+            source_location.first, source_location.second, logging_function,
     32    ++            source_location.m_file, source_location.m_line, logging_function,
     33     +            HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION / (1024 * 1024), str));
     34     +        ++m_rate_limited_locations;
     35     +    } else if (!dont_skip) {
     36    @@ src/logging.cpp: namespace BCLog {
     37      
     38          if (m_log_sourcelocations && m_started_new_line) {
     39     -        str_prefixed.insert(0, "[" + RemovePrefix(source_file, "./") + ":" + ToString(source_line) + "] [" + logging_function + "] ");
     40    -+        str_prefixed.insert(0, "[" + RemovePrefix(source_location.first, "./") + ":" + ToString(source_location.second) + "] [" + logging_function + "] ");
     41    ++        str_prefixed.insert(0, "[" + RemovePrefix(source_location.m_file, "./") + ":" + ToString(source_location.m_line) + "] [" + logging_function + "] ");
     42          }
     43      
     44          if (m_log_threadnames && m_started_new_line) {
     45    @@ src/logging.cpp: void BCLog::Logger::LogPrintStr(const std::string& str, const s
     46          str_prefixed = LogTimestampStr(str_prefixed);
     47      
     48     +    // Rate limit logging to disk to avoid disk filling attacks.
     49    -+    bool skip_writing_to_disk_due_to_rate_limiting{false};
     50    -+    if (!skip_disk_usage_rate_limiting) {
     51    -+        skip_writing_to_disk_due_to_rate_limiting = RateLimit(str_prefixed, logging_function, source_location);
     52    -+    }
     53    ++    bool log_quota_exceeded = !skip_disk_usage_rate_limiting &&
     54    ++                              RateLimit(str_prefixed, logging_function, source_location);
     55     +
     56    ++    // To avoid confusion caused by dropped log messages when debugging an issue, we prefix log lines with "[*]"
     57    ++    // when there are any supressed source locations.
     58     +    if (m_rate_limited_locations > 0) {
     59     +        str_prefixed.insert(0, "[*] ");
     60     +    }
     61    @@ src/logging.cpp: void BCLog::Logger::LogPrintStr(const std::string& str, const s
     62              cb(str_prefixed);
     63          }
     64     -    if (m_print_to_file) {
     65    -+    if (m_print_to_file && !skip_writing_to_disk_due_to_rate_limiting) {
     66    ++    if (m_print_to_file && !log_quota_exceeded) {
     67              assert(m_fileout != nullptr);
     68      
     69              // reopen the log file, if requested
     70    @@ src/logging.h: struct LogCategory {
     71      };
     72      
     73     +// Replace with std::source_location when switching to C++20.
     74    -+using SourceLocation = std::pair<const char*, int>;
     75    ++struct SourceLocation {
     76    ++    const char* m_file{nullptr};
     77    ++    int m_line{0};
     78    ++
     79    ++    bool operator==(const SourceLocation& other) const
     80    ++    {
     81    ++        return m_file && other.m_file &&
     82    ++               std::strcmp(m_file, other.m_file) == 0 &&
     83    ++               m_line == other.m_line;
     84    ++    }
     85    ++};
     86    ++
     87     +struct SourceLocationHasher {
     88     +    size_t operator()(const SourceLocation& source_location) const noexcept
     89     +    {
     90     +        // Use CSipHasher(0, 0) as a simple way to get uniform distribution.
     91    -+        return static_cast<size_t>(CSipHasher(0, 0).Write(std::hash<const char*>{}(source_location.first)).Write(std::hash<int>{}(source_location.second)).Finalize());
     92    ++        return static_cast<size_t>(CSipHasher(0, 0).Write(std::hash<const char*>{}(source_location.m_file)).Write(std::hash<int>{}(source_location.m_line)).Finalize());
     93     +    }
     94     +};
     95     +
     96    @@ src/logging.h: namespace BCLog {
     97              std::list<std::string> m_msgs_before_open GUARDED_BY(m_cs);
     98              bool m_buffering GUARDED_BY(m_cs) = true; //!< Buffer messages before logging can be started.
     99     +        std::unordered_map<SourceLocation, QuotaUsage, SourceLocationHasher> m_quota_usage_per_source_location GUARDED_BY(m_cs);
    100    -+        std::atomic_int m_rate_limited_locations;
    101    ++        int m_rate_limited_locations GUARDED_BY(m_cs) = 0;
    102      
    103              /**
    104               * m_started_new_line is a state variable that will suppress printing of
    105    @@ src/logging.h: namespace BCLog {
    106      
    107     +        /** Evaluate if a log message should not be printed to disk.
    108     +         *
    109    -+         * [@param](/bitcoin-bitcoin/contributor/param/)[in,out] str                 The log message.
    110    -+         *                                    The rate limiting logic message may change the message
    111    -+         *                                    to signal the start or end of a supression period.
    112    -+         * [@param](/bitcoin-bitcoin/contributor/param/)[in]     logging_function    The function name from which the log message originated.
    113    -+         * [@param](/bitcoin-bitcoin/contributor/param/)[in]     source_location     The source location from which the log message originated.
    114    ++         * [@param](/bitcoin-bitcoin/contributor/param/)[in,out] str              The log message.
    115    ++         *                                 The rate limiting logic message may change the message
    116    ++         *                                 to signal the start or end of a supression period.
    117    ++         * [@param](/bitcoin-bitcoin/contributor/param/)[in]     logging_function The function name from which the log message originated.
    118    ++         * [@param](/bitcoin-bitcoin/contributor/param/)[in]     source_location  The source location from which the log message originated.
    119    ++         *
    120    ++         * [@return](/bitcoin-bitcoin/contributor/return/) true if the message should not be printed to disk, false otherwise.
    121     +         */
    122     +        bool RateLimit(std::string& str, const std::string& logging_function, const SourceLocation& source_location) EXCLUSIVE_LOCKS_REQUIRED(m_cs);
    123     +
    124    @@ src/logging.h: static inline void LogPrintf_(const std::string& logging_function
    125              }
    126     -        LogInstance().LogPrintStr(log_msg, logging_function, source_file, source_line);
    127     +
    128    -+        const SourceLocation source_location = std::make_pair(source_file, source_line);
    129    ++        const SourceLocation source_location{source_file, source_line};
    130     +        LogInstance().LogPrintStr(log_msg, logging_function, source_location, skip_disk_usage_rate_limiting);
    131          }
    132      }
    1332:  a2916ae50 ! 2:  e51bd0bdc test: Add logging test for rate limiting
    134    @@ src/test/logging_tests.cpp: BOOST_AUTO_TEST_CASE(logging_timer)
    135          BOOST_CHECK_EQUAL(micro_timer.LogMsg("test micros"), "tests: test micros (1000000.00μs)");
    136      }
    137      
    138    -+void GetLogFileSize(size_t& size)
    139    ++size_t GetLogFileSize()
    140     +{
    141     +    boost::system::error_code ec;
    142    -+    size = fs::file_size(LogInstance().m_file_path, ec);
    143    ++    size_t size = fs::file_size(LogInstance().m_file_path, ec);
    144     +    if (ec) LogPrintf("%s: %s %s\n", __func__, ec.message(), LogInstance().m_file_path);
    145     +    BOOST_CHECK(!ec);
    146    ++    return size;
    147     +}
    148     +
    149     +void LogFromFixedLocation(const std::string& str)
    150    @@ src/test/logging_tests.cpp: BOOST_AUTO_TEST_CASE(logging_timer)
    151     +    bool prev_log_threadnames = LogInstance().m_log_threadnames;
    152     +    LogInstance().m_log_threadnames = false;
    153     +
    154    ++    // Log 1024-character lines (1023 plus newline) to make the math simple.
    155     +    std::string log_message(1023, 'a');
    156     +
    157     +    SetMockTime(std::chrono::seconds{1});
    158     +
    159     +    size_t prev_log_file_size, curr_log_file_size;
    160    -+    GetLogFileSize(prev_log_file_size);
    161    ++    prev_log_file_size = GetLogFileSize();
    162     +
    163     +    // Log 1 MiB, this should be allowed.
    164     +    for (int i = 0; i < 1024; ++i) {
    165     +        LogFromFixedLocation(log_message);
    166     +    }
    167    -+    GetLogFileSize(curr_log_file_size);
    168    ++    curr_log_file_size = GetLogFileSize();
    169     +    BOOST_CHECK(curr_log_file_size - prev_log_file_size == 1024 * 1024);
    170     +
    171     +    LogFromFixedLocation("This should trigger rate limiting");
    172    -+    GetLogFileSize(prev_log_file_size);
    173    ++    prev_log_file_size = GetLogFileSize();
    174     +
    175     +    // Log 0.5 MiB, this should not be allowed and all messages should be dropped.
    176     +    for (int i = 0; i < 512; ++i) {
    177     +        LogFromFixedLocation(log_message);
    178     +    }
    179    -+    GetLogFileSize(curr_log_file_size);
    180    ++    curr_log_file_size = GetLogFileSize();
    181     +    BOOST_CHECK(curr_log_file_size - prev_log_file_size == 0);
    182     +
    183     +    // Let one hour pass.
    184     +    SetMockTime(std::chrono::seconds{60 * 60 + 1});
    185     +    LogFromFixedLocation("This should trigger the quota usage reset");
    186    -+    GetLogFileSize(prev_log_file_size);
    187    ++    prev_log_file_size = GetLogFileSize();
    188     +
    189     +    // Log 1 MiB, this should be allowed since the usage was reset.
    190     +    for (int i = 0; i < 1024; ++i) {
    191     +        LogFromFixedLocation(log_message);
    192     +    }
    193    -+    GetLogFileSize(curr_log_file_size);
    194    ++    curr_log_file_size = GetLogFileSize();
    195     +    BOOST_CHECK(curr_log_file_size - prev_log_file_size == 1024 * 1024);
    196     +
    197     +    LogFromFixedLocation("This should trigger rate limiting");
    198    -+    GetLogFileSize(prev_log_file_size);
    199    ++    prev_log_file_size = GetLogFileSize();
    200     +
    201     +    // Log 1 MiB, this should not be allowed and all messages should be dropped.
    202     +    for (int i = 0; i < 1024; ++i) {
    203     +        LogFromFixedLocation(log_message);
    204     +    }
    205    -+    GetLogFileSize(curr_log_file_size);
    206    ++    curr_log_file_size = GetLogFileSize();
    207     +    BOOST_CHECK(curr_log_file_size - prev_log_file_size == 0);
    208     +
    209     +    LogFromFixedLocation("This should also be dropped and bring the total of dropped bytes above 1 MiB.");
    210    -+    GetLogFileSize(curr_log_file_size);
    211    ++    curr_log_file_size = GetLogFileSize();
    212     +    BOOST_CHECK(curr_log_file_size - prev_log_file_size == 0);
    213     +
    214     +    // Let another hour pass
    215     +    SetMockTime(std::chrono::seconds{2 * (60 * 60 + 1)});
    216     +    LogFromFixedLocation("Normally this would reset the usage but it does not because we dropped more than 1 MiB in the previous hour.");
    217    -+    GetLogFileSize(prev_log_file_size);
    218    ++    prev_log_file_size = GetLogFileSize();
    219     +
    220     +    // Log 1 MiB, this should not be allowed and all messages should be dropped.
    221     +    for (int i = 0; i < 1024; ++i) {
    222     +        LogFromFixedLocation(log_message);
    223     +    }
    224    -+    GetLogFileSize(curr_log_file_size);
    225    ++    curr_log_file_size = GetLogFileSize();
    226     +    BOOST_CHECK(curr_log_file_size - prev_log_file_size == 0);
    227     +
    228     +    // Let another hour pass
    229     +    SetMockTime(std::chrono::seconds{3 * (60 * 60 + 1)});
    230     +    LogFromFixedLocation("This should trigger the quota usage reset");
    231    -+    GetLogFileSize(prev_log_file_size);
    232    ++    prev_log_file_size = GetLogFileSize();
    233     +
    234     +    // Log 1 MiB, this should be allowed since the usage was reset.
    235     +    for (int i = 0; i < 1024; ++i) {
    236     +        LogFromFixedLocation(log_message);
    237     +    }
    238    -+    GetLogFileSize(curr_log_file_size);
    239    ++    curr_log_file_size = GetLogFileSize();
    240     +    BOOST_CHECK(curr_log_file_size - prev_log_file_size == 1024 * 1024);
    241     +
    242     +    LogInstance().m_log_timestamps = prev_log_timestamps;
    
  45. LarryRuane commented at 9:31 pm on June 27, 2021: contributor
    Reviewers, please take a look at this alternate approach: #22350.
  46. narula commented at 5:33 pm on July 14, 2021: contributor
    fwiw prefer the more traditional log rotation approach in #22350; i find it cognitively easier to think in terms of stale logs rather than source suppression.
  47. DrahtBot added the label Needs rebase on Sep 23, 2021
  48. practicalswift commented at 4:11 pm on October 5, 2021: contributor

    @dergoegge Would you mind rebasing? I would like to re-review it after rebase :)

    FWIW I prefer this solution to the more complex log rotation approach taken in #22350. I find the simpler approach taken in this PR much easier to reason about. This PR solves the issue at hand in the simplest possible way IMO.

  49. dergoegge force-pushed on Oct 5, 2021
  50. dergoegge commented at 6:49 pm on October 5, 2021: member

    @practicalswift Thanks for the nudge! :)

    Rebased on latest master.

  51. DrahtBot removed the label Needs rebase on Oct 5, 2021
  52. dergoegge force-pushed on Feb 17, 2022
  53. dergoegge commented at 5:59 pm on February 17, 2022: member
    In an attempt to make this more reviewable, i split the changes into multiple commits. @practicalswift pinging you, since you were previously interested in reviewing
  54. in src/logging.h:98 in 6f7fd4be58 outdated
    93@@ -66,6 +94,49 @@ namespace BCLog {
    94         ALL         = ~(uint32_t)0,
    95     };
    96 
    97+    //! Fixed window rate limiter for logging.
    98+    class LogRatelimiter
    


    jnewbery commented at 9:49 am on February 24, 2022:

    Limit is a separate word, so perhaps:

    0    class LogRateLimiter
    
  55. in src/logging.h:104 in 6f7fd4be58 outdated
     99+    {
    100+    private:
    101+        //! Timestamp of the last window reset.
    102+        std::chrono::seconds m_last_reset;
    103+        //! Interval after which the window is reset.
    104+        std::chrono::seconds m_reset_interval;
    


    jnewbery commented at 9:49 am on February 24, 2022:
    0        const std::chrono::seconds m_reset_interval;
    
  56. in src/logging.h:109 in 6f7fd4be58 outdated
    104+        std::chrono::seconds m_reset_interval;
    105+        //! Remaining bytes in the current window interval.
    106+        uint64_t m_available_bytes;
    107+        //! Number of bytes that were not consumed within the current window.
    108+        uint64_t m_dropped_bytes{0};
    109+        //! The maximum number of bytes that can be logged withon one window.
    


    jnewbery commented at 9:51 am on February 24, 2022:
    0        //! The maximum number of bytes that can be logged within one window.
    
  57. in src/logging.cpp:268 in 6f7fd4be58 outdated
    268+    if (m_ratelimiters.find(source_location) == m_ratelimiters.end()) {
    269+        // We create a new rate limit window for this source location on its first log attempt.
    270+        m_ratelimiters.emplace(source_location, LogRatelimiter{RATELIMIT_WINDOW_SIZE, RATELIMIT_WINDOW_MAX_BYTES});
    271+    }
    272+
    273+    uint64_t dropped_bytes = m_ratelimiters[source_location].GetDroppedBytes();
    


    jnewbery commented at 10:14 am on February 24, 2022:

    You’ve done existence checking above (“We create a new rate limit window…”), so you can use std::unordered_map::at() instead of the [] operator. [] will insert a new item in the map if the key doesn’t exist, whereas at() will throw. Using [] therefore requires defining a default constructor for the value object.

    If you make that replacement here and two lines below, you can remove the default constructor for LotRatelimiter, and then make the const members const:

     0diff --git a/src/logging.cpp b/src/logging.cpp
     1index c96483366d..6df6ffcf80 100644
     2--- a/src/logging.cpp
     3+++ b/src/logging.cpp
     4@@ -269,10 +269,10 @@ void BCLog::Logger::LogPrintStr(const std::string& str, const std::string& loggi
     5         m_ratelimiters.emplace(source_location, LogRatelimiter{RATELIMIT_WINDOW_SIZE, RATELIMIT_WINDOW_MAX_BYTES});
     6     }
     7 
     8-    uint64_t dropped_bytes = m_ratelimiters[source_location].GetDroppedBytes();
     9+    uint64_t dropped_bytes = m_ratelimiters.at(source_location).GetDroppedBytes();
    10     bool was_ratelimited = m_supressed_locations.find(source_location) != m_supressed_locations.end();
    11     bool is_ratelimited = !skip_ratelimiting && m_ratelimit &&
    12-                          !m_ratelimiters[source_location].Consume(str_prefixed.size());
    13+                          !m_ratelimiters.at(source_location).Consume(str_prefixed.size());
    14     if (!is_ratelimited && was_ratelimited) {
    15         // Logging will restart for this source location.
    16         m_supressed_locations.erase(source_location);
    17diff --git a/src/logging.h b/src/logging.h
    18index e3df9b45d0..7fa811ef1e 100644
    19--- a/src/logging.h
    20+++ b/src/logging.h
    21@@ -101,25 +101,26 @@ namespace BCLog {
    22         //! Timestamp of the last window reset.
    23         std::chrono::seconds m_last_reset;
    24         //! Interval after which the window is reset.
    25-        std::chrono::seconds m_reset_interval;
    26+        const std::chrono::seconds m_reset_interval;
    27         //! Remaining bytes in the current window interval.
    28         uint64_t m_available_bytes;
    29         //! Number of bytes that were not consumed within the current window.
    30         uint64_t m_dropped_bytes{0};
    31         //! The maximum number of bytes that can be logged withon one window.
    32-        uint64_t m_max_bytes;
    33+        const uint64_t m_max_bytes;
    34 
    35         //! Reset the window if the window interval has passed since the last reset.
    36         void MaybeReset();
    37 
    38     public:
    39-        LogRatelimiter() {}
    40+        LogRatelimiter() = delete;
    41 
    42         LogRatelimiter(std::chrono::seconds interval, uint64_t max_bytes)
    

    dergoegge commented at 3:33 pm on March 11, 2022:
    I dropped the non-default constructor since we only ever constructed with RATELIMIT_WINDOW_SIZE and RATELIMIT_WINDOW_MAX_BYTES. So now a new LogRateLimiter will be constructed using the default constructor when [] is used the first time.
  58. in src/logging.h:110 in 6f7fd4be58 outdated
    105+        //! Remaining bytes in the current window interval.
    106+        uint64_t m_available_bytes;
    107+        //! Number of bytes that were not consumed within the current window.
    108+        uint64_t m_dropped_bytes{0};
    109+        //! The maximum number of bytes that can be logged withon one window.
    110+        uint64_t m_max_bytes;
    


    jnewbery commented at 10:14 am on February 24, 2022:
    0        const uint64_t m_max_bytes;
    
  59. in src/logging.h:122 in 6f7fd4be58 outdated
    117+
    118+        LogRatelimiter(std::chrono::seconds interval, uint64_t max_bytes)
    119+            : m_reset_interval{interval}, m_available_bytes{max_bytes}, m_max_bytes{max_bytes}
    120+        {
    121+            m_last_reset = GetTime<std::chrono::seconds>();
    122+        }
    


    jnewbery commented at 11:11 am on February 24, 2022:
    Do the m_reset_interval and m_max_bytes members need to be parametrized and provided in the constructor? The LogRatelimiter object is only ever constructed with RATELIMIT_WINDOW_SIZE and RATELIMIT_WINDOW_MAX_BYTES except in the unit test, which could easily be modified to use the same value. The log Excessive logging detected from %s:%d (%s): >%d MiB logged during the last hour... also implicitly assumes that the objects have been constructed with these constants.

    dergoegge commented at 3:33 pm on March 11, 2022:
  60. in src/logging.h:121 in 6f7fd4be58 outdated
    116+        LogRatelimiter() {}
    117+
    118+        LogRatelimiter(std::chrono::seconds interval, uint64_t max_bytes)
    119+            : m_reset_interval{interval}, m_available_bytes{max_bytes}, m_max_bytes{max_bytes}
    120+        {
    121+            m_last_reset = GetTime<std::chrono::seconds>();
    


    jnewbery commented at 11:11 am on February 24, 2022:

    m_last_reset can be set in the initializer list:

    0        LogRatelimiter(std::chrono::seconds interval, uint64_t max_bytes)
    1            : m_last_reset{GetTime<std::chrono::seconds>()}
    2            , m_reset_interval{interval}
    3            , m_available_bytes{max_bytes}
    4            , m_max_bytes{max_bytes}
    5        {}
    
  61. in src/logging.h:31 in 6f7fd4be58 outdated
    26 static const bool DEFAULT_LOGTIMEMICROS = false;
    27 static const bool DEFAULT_LOGIPS        = false;
    28 static const bool DEFAULT_LOGTIMESTAMPS = true;
    29 static const bool DEFAULT_LOGTHREADNAMES = false;
    30 static const bool DEFAULT_LOGSOURCELOCATIONS = false;
    31+static const bool DEFAULT_RATELIMITLOGGING = true;
    


    jnewbery commented at 11:43 am on February 24, 2022:

    I know you’ve just copied the other consts here, but current style is to prefer:

    0static constexpr bool DEFAULT_RATELIMITLOGGING{true};
    
  62. in src/logging.h:177 in 6f7fd4be58 outdated
    173@@ -98,12 +174,13 @@ namespace BCLog {
    174         bool m_log_time_micros = DEFAULT_LOGTIMEMICROS;
    175         bool m_log_threadnames = DEFAULT_LOGTHREADNAMES;
    176         bool m_log_sourcelocations = DEFAULT_LOGSOURCELOCATIONS;
    177+        bool m_ratelimit = DEFAULT_RATELIMITLOGGING;
    


    jnewbery commented at 11:43 am on February 24, 2022:
    0        bool m_ratelimit{DEFAULT_RATELIMITLOGGING};
    
  63. in src/test/logging_tests.cpp:225 in 6f7fd4be58 outdated
    100+    // Log 1024-character lines (1023 plus newline) to make the math simple.
    101+    std::string log_message(1023, 'a');
    102+
    103+    SetMockTime(std::chrono::hours{1});
    104+
    105+    size_t log_file_size = std::filesystem::file_size(LogInstance().m_file_path);
    


    jnewbery commented at 11:58 am on February 24, 2022:
    I’d be tempted to remove all the log file size checks. Don’t the LogFromLocationAndExpect() checks already show that the expected messages are being written to the log?

    dergoegge commented at 11:58 am on March 7, 2022:

    I think i would like to keep them, as LogFromLocationAndExpect uses the log callbacks which fire regardless of whether or not the message is actually printed to the console or disk.

    I think these checks make sense just to be really sure that the file sizes change as expected. I have had bugs while writing this where the messages from the rate limiting logic where printed at the wrong time, so simply relying on those is not enough IMO.

  64. in src/test/logging_tests.cpp:81 in 6f7fd4be58 outdated
    76+        break;
    77+    case 3:
    78+        LogPrint(BCLog::ALL, "%s\n", message);
    79+        break;
    80+    default:
    81+        BOOST_CHECK_MESSAGE(false, "Unkown log location");
    


    jnewbery commented at 12:01 pm on February 24, 2022:

    This shouldn’t be a BOOST_CHECK, which is used to check program logic. This is to catch errors in the unit test code. I think just remove it and let the compiler warn.

    0    // no default case, so the compiler can warn about missing cases
    
  65. in src/logging.h:273 in 6f7fd4be58 outdated
    270+
    271+// Unconditional logging WITHOUT rate limiting. Use only for log messages that
    272+// MUST NOT be rate limited no matter how often they are logged. That requirement
    273+// should be extremely rare, so please use with care. Prefer LogPrintf(...) if
    274+// possible.
    275+#define LogPrintfWithoutRateLimiting(...) LogPrintf_(__func__, __FILE__, __LINE__, /* skip_ratelimiting */ true, __VA_ARGS__)
    


    jnewbery commented at 12:02 pm on February 24, 2022:

    I think it’d be nice to unify the various logprint macros by introducing new categories:

    • UNCONDITIONAL_RATE_LIMITED replaces LogPrintf
    • UNCONDITIONAL_ALWAYS replaces LogPrintfWithoutRateLimiting

    or similar.


    dergoegge commented at 5:17 pm on March 7, 2022:

    Are you suggesting to get rid of LogPrintf completely or to define it as the following?

    0#define LogPrintf(...) LogPrint(UNCONDITIONAL_RATE_LIMITED, __VA_ARGS__)
    

    I would not be a fan of replacing every usage of LogPrintf in this PR.


    jnewbery commented at 1:12 pm on March 11, 2022:
    I think it’d be good to get rid of LogPrintf completely in the future, but I’m not suggesting you do that in this PR. I’d just prefer that his PR doesn’t add a new log macro.
  66. in src/logging.cpp:252 in 6f7fd4be58 outdated
    248@@ -247,13 +249,13 @@ namespace BCLog {
    249     }
    250 } // namespace BCLog
    251 
    252-void BCLog::Logger::LogPrintStr(const std::string& str, const std::string& logging_function, const std::string& source_file, const int source_line)
    253+void BCLog::Logger::LogPrintStr(const std::string& str, const std::string& logging_function, const SourceLocation& source_location, bool skip_ratelimiting)
    


    jnewbery commented at 12:03 pm on February 24, 2022:
    Do we have any idea if the new logic in this function impacts performance?

    dergoegge commented at 10:09 pm on March 10, 2022:

    Here are some benchmarks using the code from #18815:

    merge base:

    0|               ns/op |                op/s |    err% |          ins/op |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|---------------:|--------:|----------:|:----------
    2|                3.63 |      275,623,892.70 |    0.5% |           54.00 |          13.00 |    0.0% |      0.01 | `LoggingNoCategory`
    3|               40.65 |       24,600,010.03 |    0.9% |          659.00 |         155.00 |    0.0% |      0.01 | `LoggingNoFile`
    4|            2,165.69 |          461,746.80 |    0.2% |       19,809.05 |       3,709.01 |    0.1% |      0.01 | `LoggingNoThreadNames`
    5|            2,189.45 |          456,736.09 |    0.8% |       19,842.05 |       3,717.01 |    0.2% |      0.01 | `LoggingYoCategory`
    6|            2,224.15 |          449,609.33 |    0.9% |       20,373.05 |       3,839.01 |    0.1% |      0.01 | `LoggingYoThreadNames`
    

    pull request:

    0|               ns/op |                op/s |    err% |          ins/op |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|---------------:|--------:|----------:|:----------
    2|                3.88 |      257,848,736.66 |    1.2% |           54.00 |          13.00 |    0.0% |      0.01 | `LoggingNoCategory`
    3|               40.29 |       24,820,332.51 |    2.5% |          663.00 |         155.00 |    0.0% |      0.01 | `LoggingNoFile`
    4|            2,440.31 |          409,783.92 |    1.5% |       22,024.06 |       4,017.01 |    0.1% |      0.01 | `LoggingNoThreadNames`
    5|            2,486.87 |          402,111.61 |    4.6% |       21,513.06 |       3,951.01 |    0.1% |      0.01 | `LoggingYoCategory`
    6|            2,571.68 |          388,850.52 |    3.3% |       22,584.06 |       4,147.01 |    0.1% |      0.01 | `LoggingYoThreadNames`
    

    I would say that the additional 200 - 300 nano seconds per LogPrintf call are negligible.

  67. dergoegge force-pushed on Mar 11, 2022
  68. dergoegge force-pushed on Mar 11, 2022
  69. DrahtBot added the label Needs rebase on Apr 8, 2022
  70. dergoegge force-pushed on Apr 14, 2022
  71. DrahtBot removed the label Needs rebase on Apr 14, 2022
  72. in src/test/logging_tests.cpp:117 in 3eaafe7cc8 outdated
    112+    }
    113+    BOOST_CHECK_MESSAGE(log_file_size == std::filesystem::file_size(LogInstance().m_file_path), "all further logs from location 0 should be dropped");
    114+
    115     BOOST_CHECK_THROW(
    116         LogFromLocationAndExpect(1, log_message, "Excessive logging detected"), std::runtime_error);
    117+    BOOST_CHECK_MESSAGE(log_file_size < std::filesystem::file_size(LogInstance().m_file_path), "locaiton 1 should be unaffected by other locations");
    


    mzumsande commented at 4:27 pm on April 14, 2022:
    typo: location (here and in two other spots)
  73. in src/logging.cpp:270 in 7654c17625 outdated
    264@@ -265,7 +265,38 @@ void BCLog::Logger::LogPrintStr(const std::string& str, const std::string& loggi
    265 
    266     str_prefixed = LogTimestampStr(str_prefixed);
    267 
    268-    m_started_new_line = !str.empty() && str[str.size()-1] == '\n';
    269+    uint64_t dropped_bytes = m_ratelimiters[source_location].GetDroppedBytes();
    270+    bool was_ratelimited = m_supressed_locations.find(source_location) != m_supressed_locations.end();
    271+    bool is_ratelimited = !skip_rate_limiting && m_ratelimit &&
    


    mzumsande commented at 4:34 pm on April 14, 2022:
    All these checks seem unnecessary for locations that are not subject to rate limiting (e.g. special category logs, or with rate limiting disabled). Wouldn’t it be better to make this entire block conditional on !skip_rate_limiting && m_ratelimit so that we don’t needlessly add locations to m_ratelimiters and perform other operations for these locations?
  74. in src/init/common.cpp:79 in 8ff8a856e7 outdated
    75@@ -76,6 +76,7 @@ void AddLoggingArgs(ArgsManager& argsman)
    76     argsman.AddArg("-logtimemicros", strprintf("Add microsecond precision to debug timestamps (default: %u)", DEFAULT_LOGTIMEMICROS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    77     argsman.AddArg("-printtoconsole", "Send trace/debug info to console (default: 1 when no -daemon. To disable logging to file, set -nodebuglogfile)", ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
    78     argsman.AddArg("-shrinkdebugfile", "Shrink debug.log file on client startup (default: 1 when no -debug)", ArgsManager::ALLOW_ANY, OptionsCategory::DEBUG_TEST);
    79+    argsman.AddArg("-ratelimitlogging", strprintf("Rate limit debug logging to disk (default: %u)", DEFAULT_RATELIMITLOGGING), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    


    mzumsande commented at 4:43 pm on April 14, 2022:
    The naming might be confusing, maybe drop the “debug” in it because “debug logging” could be understood as the specialised logs (-debug), which are not subject to the rate limiting.

    dergoegge commented at 9:02 am on April 15, 2022:
    Replaced “debug” with “unconditional”.
  75. in src/logging.cpp:168 in f1cc42c04e outdated
    163@@ -165,6 +164,8 @@ const CLogCategoryDesc LogCategories[] =
    164 #endif
    165     {BCLog::UTIL, "util"},
    166     {BCLog::BLOCKSTORE, "blockstorage"},
    167+    {BCLog::UNCONDITIONAL_RATE_LIMITED, "uncond_rate_limited"},
    168+    {BCLog::UNCONDITIONAL_ALWAYS, "uncond_always"},
    


    glozow commented at 5:51 pm on April 14, 2022:
    Just to clarify - all the other categories (QT, REINDEX, BENCH, etc.) are not rate-limited?

    dergoegge commented at 9:07 am on April 15, 2022:
    Yes, only uses of LogPrintf (LogPrint(BCLog::UNCONDITIONAL_RATE_LIMITED, ...)) are rate limited.
  76. glozow commented at 5:58 pm on April 14, 2022: member

    Concept ACK

    Re: source-based vs global rate-limiting or rotation: from what I understand, the disk-filling attack we’re trying to prevent is an attacker figuring out some unconditional logging they can trigger over and over again, so a source-based rate-limiting approach seems to be the correct solution. I assume there aren’t other attacks we’re looking to prevent?

  77. dergoegge force-pushed on Apr 15, 2022
  78. dergoegge force-pushed on Apr 15, 2022
  79. dergoegge commented at 9:26 am on April 15, 2022: member

    Thanks for the review @mzumsande @glozow!

    […] the disk-filling attack we’re trying to prevent is an attacker figuring out some unconditional logging they can trigger over and over again […] I assume there aren’t other attacks we’re looking to prevent?

    None that I can think of, this PR is aimed at exactly what you described.

  80. in src/logging.h:161 in 092e023d83 outdated
    157@@ -85,7 +158,7 @@ namespace BCLog {
    158         std::atomic_bool m_started_new_line{true};
    159 
    160         /** Log categories bitfield. */
    161-        std::atomic<uint32_t> m_categories{0};
    162+        std::atomic<uint32_t> m_categories{BCLog::UNCONDITIONAL_RATE_LIMITED | BCLog::UNCONDITIONAL_ALWAYS};
    


    jnewbery commented at 5:14 pm on May 23, 2022:
    Does setting the default to BCLog::UNCONDITIONAL_RATE_LIMITED | BCLog::UNCONDITIONAL_ALWAYS break the logic in DefaultShrinkDebugFile()?

    dergoegge commented at 10:41 am on May 25, 2022:
    Good catch, fixed!
  81. in src/logging.cpp:500 in 092e023d83 outdated
    383+        m_last_reset = now;
    384+        m_dropped_bytes = 0;
    385+    }
    386+}
    387+
    388+bool BCLog::LogRateLimiter::Consume(uint64_t bytes)
    


    jnewbery commented at 5:24 pm on May 23, 2022:

    Is it possible for this to return false followed by true, even if the rate limiter doesn’t reset, eg:

    • m_available_bytes is 100
    • Consume(110) is called and returns false
    • Consume(90) is called and returns true.

    Which would then cause the “Restarting logging from …” and “Excessive logging detected from…” logs to be called before the LogRateLimiter has been reset.

    If so, I think this is the fix:

    0    if (bytes > m_available_bytes) {
    1        m_dropped_bytes += bytes;
    2+       m_available_bytes = 0;
    3        return false;
    4    }
    

    dergoegge commented at 10:43 am on May 25, 2022:
    Yes, trying to consume more bytes than are available should result in no available bytes afterwards.
  82. in src/logging.cpp:301 in 092e023d83 outdated
    297+
    298+        // To avoid confusion caused by dropped log messages when debugging an issue,
    299+        // we prefix log lines with "[*]" when there are any supressed source locations.
    300+        if (m_supressed_locations.size() > 0) {
    301+            str_prefixed.insert(0, "[*] ");
    302+        }
    


    jnewbery commented at 5:28 pm on May 23, 2022:

    Should this be outside the skip_rate_limiting conditional, so it is prefixed to all lines?

     0                 source_location.m_file, source_location.m_line, logging_function,
     1                 LogRateLimiter::WINDOW_MAX_BYTES / (1024 * 1024), str_prefixed));
     2         }
     3+    }
     4 
     5-        // To avoid confusion caused by dropped log messages when debugging an issue,
     6-        // we prefix log lines with "[*]" when there are any supressed source locations.
     7-        if (m_supressed_locations.size() > 0) {
     8-            str_prefixed.insert(0, "[*] ");
     9-        }
    10+    // To avoid confusion caused by dropped log messages when debugging an issue,
    11+    // we prefix log lines with "[*]" when there are any supressed source locations.
    12+    if (m_supressed_locations.size() > 0) {
    13+        str_prefixed.insert(0, "[*] ");
    14     }
    
  83. DrahtBot added the label Needs rebase on May 24, 2022
  84. dergoegge force-pushed on May 25, 2022
  85. DrahtBot removed the label Needs rebase on May 25, 2022
  86. in src/test/logging_tests.cpp:161 in 9d84cb2cd2 outdated
    156+{
    157+    SetMockTime(std::chrono::hours{1});
    158+    BCLog::LogRateLimiter window;
    159+
    160+    // Check that window gets initialised correctly.
    161+    BOOST_CHECK_EQUAL(window.GetAvailableBytes(), 1024ull * 1024ull);
    


    jnewbery commented at 4:25 pm on May 27, 2022:

    Would it be better to use WINDOW_MAX_BYTES here to reduce maintenance burden?

    0    BOOST_CHECK_EQUAL(window.GetAvailableBytes(), BCLog::LogRateLimiter::WINDOW_MAX_BYTES);
    
  87. in src/test/logging_tests.cpp:165 in 9d84cb2cd2 outdated
    160+    // Check that window gets initialised correctly.
    161+    BOOST_CHECK_EQUAL(window.GetAvailableBytes(), 1024ull * 1024ull);
    162+    BOOST_CHECK_EQUAL(window.GetDroppedBytes(), 0ull);
    163+
    164+    BOOST_CHECK(window.Consume(512ull * 1024ull));
    165+    BOOST_CHECK_EQUAL(window.GetAvailableBytes(), 512ull * 1024ull);
    


    jnewbery commented at 4:27 pm on May 27, 2022:

    Perhaps:

    0
    1    const uint64_t MESSAGE_SIZE{512 * 1024};
    2
    3    BOOST_CHECK(window.Consume(MESSAGE_SIZE));
    4    BOOST_CHECK_EQUAL(window.GetAvailableBytes(), WINDOW_MAX_BYTES - MESSAGE_SIZE);
    
  88. in src/logging.h:192 in 9d84cb2cd2 outdated
    188         fs::path m_file_path;
    189         std::atomic<bool> m_reopen_file{false};
    190 
    191         /** Send a string to the log output */
    192-        void LogPrintStr(const std::string& str, const std::string& logging_function, const std::string& source_file, const int source_line, const BCLog::LogFlags category, const BCLog::Level level);
    193+        void LogPrintStr(const std::string& str, const std::string& logging_function, const SourceLocation& source_location, const BCLog::LogFlags category, const BCLog::Level level);
    


    jnewbery commented at 4:29 pm on May 27, 2022:

    Can I convince you to split this line to make it a bit more readable?

    0        void LogPrintStr(const std::string& str, const std::string& logging_function,
    1                         const SourceLocation& source_location, const BCLog::LogFlags category,
    2                         const BCLog::Level level);
    
  89. in src/logging.cpp:349 in 9d84cb2cd2 outdated
    344 {
    345     StdLockGuard scoped_lock(m_cs);
    346     std::string str_prefixed = LogEscapeMessage(str);
    347 
    348-    if ((category != LogFlags::NONE || level != Level::None) && m_started_new_line) {
    349+    const bool print_category{category != LogFlags::NONE && category != LogFlags::UNCONDITIONAL_ALWAYS && category != LogFlags::UNCONDITIONAL_RATE_LIMITED};
    


    jnewbery commented at 5:06 pm on May 27, 2022:
    Is the category != LogFlags::NONE necessary here? I don’t think anything logs with category NONE now.

    dergoegge commented at 10:39 am on May 30, 2022:

    There are still tests that log using NONE: https://github.com/bitcoin/bitcoin/blob/a6377c5fe4f8d8fabbae527df888afdd9c6f771a/src/test/logging_tests.cpp#L80-L81

    What do you think of removing the NONE category entirely?


    jnewbery commented at 11:10 am on May 30, 2022:

    I think those should be replaced with LogPrintf_(..., BCLog::LogFlags::UNCONDITIONAL_ALWAYS, ...), since it’s impossible for LogPrintf_(..., BCLog::LogFlags::NONE, ...) to be called in the produce code.

    It seems fine to remove BCLog::LogFlags::NONE after that, but I don’t think it’s necessary in this PR.

  90. dergoegge force-pushed on May 30, 2022
  91. jnewbery commented at 11:11 am on May 30, 2022: contributor
    utACK a6377c5fe4f8d8fabbae527df888afdd9c6f771a
  92. in src/logging.cpp:10 in a6377c5fe4 outdated
     6@@ -7,7 +7,6 @@
     7 #include <logging.h>
     8 #include <util/threadnames.h>
     9 #include <util/string.h>
    10-#include <util/time.h>
    


    maflcko commented at 11:17 am on May 30, 2022:
    Why are you removing this include when it is needed?

    dergoegge commented at 1:51 pm on May 30, 2022:
    I moved the include to logging.h where it is needed for GetTime() etc.

    maflcko commented at 1:58 pm on May 30, 2022:
    According to the dev notes and iwyu it will need to be included in both files if both files use it.

    dergoegge commented at 2:01 pm on May 30, 2022:
    Ah I wasn’t aware, thanks for pointing this out. Will amend.
  93. in src/logging.h:131 in a6377c5fe4 outdated
    126+        //! The maximum number of bytes that can be logged within one window.
    127+        static constexpr uint64_t WINDOW_MAX_BYTES{1024 * 1024};
    128+
    129+        LogRateLimiter()
    130+            : m_last_reset{GetTime<std::chrono::seconds>()},
    131+              m_available_bytes{WINDOW_MAX_BYTES} {}
    


    maflcko commented at 11:22 am on May 30, 2022:
    Could drop this and use C++11 member init?
  94. jonatack commented at 11:24 am on May 30, 2022: contributor
    * Only `LogPrintf(…)` is rate limited. `LogPrint(category, …)` (`-debug`) is not rate limited.
    

    It looks like many of the LogPrintf messages will be going away with the update to severity-based logging and the added complexity here may be worth reevaluating (and potentially avoiding) after that.

  95. in src/logging.h:115 in a6377c5fe4 outdated
    110+    //! Fixed window rate limiter for logging.
    111+    class LogRateLimiter
    112+    {
    113+    private:
    114+        //! Timestamp of the last window reset.
    115+        std::chrono::seconds m_last_reset;
    


    maflcko commented at 11:24 am on May 30, 2022:
    0        std::chrono::steady_clock::time_point m_last_reset{std::chrono::steady_clock::now()};
    

    Looks like you are not mocking this in the tests, so could use steady clock?

    If not, new code should probably use NodeClock::now() over GetTime().


    dergoegge commented at 1:52 pm on May 30, 2022:
    Changed it to use NodeClock.
  96. maflcko commented at 11:33 am on May 30, 2022: member
    left some nits (feel free to ignore for now)
  97. [log] Introduce log rate limiter class
    The LogRatelimiter class implements a fixed window rate limiter. The
    rate limiter allows a fixed amount of bytes to be consumed within a
    fixed time window.
    ca0942b92d
  98. [log] Introduce source location type
    The SourceLocation type stores the filename and line of a source code
    location.
    In a later commit we use this type as the key type in an unordered map
    and set to keep track of rate limters for each location.
    528dec8323
  99. [config] Add -ratelimitlogging config option
    The -ratelimitlogging can be used to enable/disable the rate limiting to
    disk. Rate limiting is enabled by default.
    ade34ada57
  100. [log] Add two new categories for unconditional logging
    We create two new categories `UNCONDITIONAL_ALWAYS` and
    `UNCONDITIONAL_RATE_LIMITED` that are always enabled by default.
    
    LogPrintf now logs using the `UNCONDITIONAL_RATE_LIMITED` category which
    will start to apply rate limiting in a later commit.
    
    For some log locations it might be safe to allow more frequent logging
    without rate limiting. These locations should use the
    `UNCONDITIONAL_ALWAYS` category.
    45a086bf58
  101. [validation] Exempt UpdateTipLog from rate limiting
    UpdateTipLog logs everytime a new tip is activated. This occurs at an
    increased frequency during IBD and should therefore be exempt from rate
    limiting.
    95cbb40a86
  102. [log] Add rate limiting to LogPrintf
    To mitigate disk filling attacks caused by unsafe usages of LogPrintf,
    we rate limit LogPrintf by using the fixed window rate limiter
    (BCLog::LogRatelimiter) introduced in an earlier commit.
    
    The rate limiting logic is applied per source location instead of
    globally. A source location is allowed to log up to 1 MiB per hour.
    Source locations that violate the limit will have their logs supressed
    for up to one hour.
    95978a1663
  103. [test util] Mark ~DebugLogHelper as noexcept(false)
    We mark ~DebugLogHelper as noexcept(false) to be able to catch the
    exception it throws. This lets us use it in test in combination with
    BOOST_CHECK_THROW and BOOST_CHECK_NO_THROW to check that certain log
    messages are (not) logged.
    02ad12c719
  104. [test] Check for expected log rate limiting messages 270202597a
  105. [test] Test for expected file size changes when rate limiting is enabled 79f7091fef
  106. [test] Check that log rate limiting is disabled for exempt source locations b9261a8bbe
  107. [test] Check that rate limiting can be disabled c86d5ff02a
  108. dergoegge force-pushed on May 30, 2022
  109. dergoegge commented at 2:13 pm on May 30, 2022: member

    It looks like many of the LogPrintf messages will be going away with the update to severity-based logging and the added complexity here may be worth reevaluating (and potentially avoiding) after that. @jonatack IMO removing some of the LogPrintf locations (or assigning them a severity level which is not logged by default) does not really address the issue at its root like this PR is trying to do. Any remaining default logging could still cause issues as well as any logging that is introduced in the future.

  110. jonatack commented at 2:21 pm on May 30, 2022: contributor
    What I am working on is that no logging on by default would be externally provokable (and potentially removing LogPrintf). That along with reducing default logging in general has appeared to be the desired direction for some time now AFAICS. Of course, this could still be merged in the interim and then removed if no longer needed, but I’m targeting the next release.
  111. jnewbery commented at 2:36 pm on May 30, 2022: contributor
    reACK c86d5ff02a31322612555db559e1d97ef329e27f
  112. mzumsande commented at 3:29 pm on May 31, 2022: contributor

    What I am working on is that no logging on by default would be externally provokable (and potentially removing LogPrintf). That along with reducing default logging in general has appeared to be the desired direction for some time now AFAICS. Of course, this could still be merged in the interim and then removed if no longer needed, but I’m targeting the next release.

    With https://github.com/bitcoin/bitcoin/pull/25202/commits/bd971bffb02c7b06aac9a479f7e5ed8f71de2bec, category-based logging with a severity >= BCLog::Level::Warning is equivalent to LogPrintf from the viewpoint of disk filling attacks. I think that it would be really good to have this general countermeasure in place, especially with probable future PRs that convert more existing log messages to the new format, because we’ll now have to be careful about not adding category-based warnings or errors that could be triggered externally. The old rule, which is no longer valid but probably still in many contributors’ heads was that one only needs to be careful about LogPrintf, while category-base logging is not so critical.

  113. jonatack commented at 4:51 pm on May 31, 2022: contributor

    convert more existing log messages to the new format, because we’ll now have to be careful about not adding category-based warnings or errors that could be triggered externally

    Yes, doing precisely this right now for all of them. Though, assuming people are in favor of that, maybe reviewers might prefer extending this rate limiting to non-default logging or otherwise it can always still be removed.

  114. maflcko commented at 5:04 pm on May 31, 2022: member
    Is it possible to convert all of them? I think currently we don’t have a way to express a log category but unconditional log severity. Unless you want to use Warning/Error for “unconditional”.
  115. jonatack commented at 5:08 pm on May 31, 2022: contributor
    @MarcoFalke added that in #25203.
  116. mzumsande commented at 6:43 pm on May 31, 2022: contributor

    Though, assuming people are in favor of that, maybe reviewers might prefer extending this rate limiting to non-default logging.

    Good point, I think that this would definitely be useful (here or as a follow-up) - having a rate limit only for LogPrintf but not for unconditionally logged category-based warnings and errors would be confusing.

  117. jonatack commented at 6:52 pm on May 31, 2022: contributor

    Good point, I think that this would definitely be useful (here or as a follow-up) - having a rate limit only for LogPrintf but not for unconditionally logged category-based warnings and errors would be confusing.

    I agree; though as an example just updated the (copious) net processing logging in #25203 and there are a only a few remaining unconditional messages (user-controlled or unusual):

    0src/net_processing.cpp:2305:                LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Large reorg, won't direct fetch to %s (%d)\n",
    1src/net_processing.cpp:2702:            LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Connected to self at %s, disconnecting\n", pfrom.addr.ToString());
    2src/net_processing.cpp:3425:                    LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Not relaying non-mempool transaction %s from forcerelay peer=%d\n", tx.GetHash().ToString(), pfrom.GetId());
    3src/net_processing.cpp:4190:        LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Not punishing misbehaving peer %d because it has noban permission set!\n", peer.m_id);
    4src/net_processing.cpp:4196:        LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Not punishing misbehaving peer %d because it was manually connected!\n", peer.m_id);
    5src/net_processing.cpp:4873:                        LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Announcing block %s not on main chain (tip=%s)\n",
    
  118. dergoegge commented at 12:35 pm on June 6, 2022: member

    having a rate limit only for LogPrintf but not for unconditionally logged category-based warnings and errors would be confusing.

    Also agree with this and i want to add that to this PR.

    The two new categories I introduce here (UNCONDITIONAL_ALWAYS and UNCONDITIONAL_RATE_LIMITED) don’t really make sense anymore, given that we now also have category based unconditional logging. Whether a log location should be rate limited could purely be based on the severity level. I was thinking of the following: Adding an extra severity Always which is logged unconditionally without rate limiting (e.g. for this location) and rate limiting logging to disk if severity == None || (severity >= Warning && severity < Always). Additionally, categories specified by -debug=cat should still be logged unconditionally without rate limiting. @mzumsande @jonatack @MarcoFalke what do y’all think of that?

  119. DrahtBot commented at 11:03 am on June 14, 2022: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  120. DrahtBot added the label Needs rebase on Jun 14, 2022
  121. mzumsande commented at 3:01 pm on June 22, 2022: contributor

    I was thinking of the following: Adding an extra severity Always which is logged unconditionally without rate limiting (e.g. for this location) and rate limiting logging to disk if severity == None || (severity >= Warning && severity < Always). Additionally, categories specified by -debug=cat should still be logged unconditionally without rate limiting. @mzumsande @jonatack @MarcoFalke what do y’all think of that?

    So, it seems to me that with the recent changes, we currently have

    1. Unconditional logging with no level and no category (LogPrintf)
    2. Conditional logging with category but no level (LogPrint) - not relevant to this PR
    3. Unconditional logging with category but no level (LogPrintfCategory)
    4. Logging with category and level, which may be unconditional depending on the level (LogPrintLevel)

    The “missing” combination of logging by level but without a category is currently not possible as far as I can see.

    If there would be a use case for having “Always” logged messages also to have a severity like “Warning” or “Error” we’d need another dimension - but I think that at least for the current uses this is not the case, so your suggestion makes sense to me.

    However, I’m really not sure how stable the current state of the changes to the logging framework is, it seems very much in flow and I’m a little bit confused about what direction it is heading:

    • The severity threshold mentioned in 4) was added in #25202, but after reading this comment by @laanwj I’m not sure it will stay.
    • Is the final goal for each logging message to have both a category and a level so that 1) and 2) and 3) would vanish over time? That would probably reduce the complexity of the rate-limiting considerably.
  122. maflcko commented at 3:34 pm on June 22, 2022: member
    If there is a log category with Info, which is also the default level, you wouldn’t need an Always level?
  123. mzumsande commented at 1:11 am on June 23, 2022: contributor

    If there is a log category with Info, which is also the default level, you wouldn’t need an Always level?

    I don’t see how - the “Always” level is meant for noisy unconditional logs that can hit the rate limit, but are so important that they should be exempt from it - basically only UpdateTipLog during IBD.

  124. jonatack commented at 6:14 pm on June 23, 2022: contributor
    I’m about to push an update to #25203 to incorporate #25287 and the changes discussed in #25306, and the current approach is that the None severity level is only used internally, i.e. in src/logging.{h.cpp}, and moved to the end of the enum to permit iterating through it similar to GetNetworkNames() / NET_MAX.
  125. jonatack commented at 6:18 pm on June 23, 2022: contributor

    The logging you might be worried about limiting would be, with the current direction, IIRC:

    • LogPrintf
    • LogPrintfWithCategory
    • LogPrintLevel with level Info/Warning/Error (these levels are planned to be logged unconditionally, while Debug and Trace would not)
  126. dergoegge commented at 10:18 am on October 7, 2022: member
    Haven’t had the time to maintain this. I still think some general mitigation for disk-filling would be nice, so I might pick this up again later unless someone else beats me to it. Closing for now.
  127. dergoegge closed this on Oct 7, 2022

  128. bitcoin locked this on Oct 7, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me