util: Filter control characters out of log messages #17095

pull laanwj wants to merge 1 commits into bitcoin:master from laanwj:2019_10_log_control_chars changing 2 files +41 −1
  1. laanwj commented at 11:26 am on October 10, 2019: member

    Belts and suspenders: make sure outgoing log messages don’t contain potentially suspicious characters, such as terminal control codes.

    This escapes control characters except newline (’\n’) in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings (it is a debug log, after all).

    (more checks could be added such as UTF-8 validity and unicode code-point range checking—this is substantially more involved and would need to keep track of state between characters and even LogPrint calls as they could end up split up—but escape codes seem to be the most common attack vector for terminals.)

  2. laanwj added the label Utils/log/libs on Oct 10, 2019
  3. laanwj force-pushed on Oct 10, 2019
  4. laanwj force-pushed on Oct 10, 2019
  5. laanwj commented at 11:35 am on October 10, 2019: member

    To test, apply this and run with logging to stdout, without this patch it will make the entire log turn yellow:

     0diff --git a/src/init.cpp b/src/init.cpp
     1index 035725b0908ebddae882583d489bdd8ad82c4857..26f4c7b906be959aa3ebaa1a2a657028c95b0f49 100644
     2--- a/src/init.cpp
     3+++ b/src/init.cpp
     4@@ -1231,6 +1231,7 @@ bool AppInitMain(InitInterfaces& interfaces)
     5         LogPrintf("Startup time: %s\n", FormatISO8601DateTime(GetTime()));
     6     LogPrintf("Default data directory %s\n", GetDefaultDataDir().string());
     7     LogPrintf("Using data directory %s\n", GetDataDir().string());
     8+    LogPrintf("\x1b[1;33mIn yellowwwww\n");
     9 
    10     // Only log conf file usage message if conf file actually exists.
    11     fs::path config_file_path = GetConfigFile(gArgs.GetArg("-conf", BITCOIN_CONF_FILENAME));
    

    with patch, it will just print:

    02019-10-10T11:37:40Z \x1b[1;33mIn yellowwwww
    
  6. in src/logging.cpp:242 in f2f1365805 outdated
    237+        for (char ch_in: str) {
    238+            uint8_t ch = (uint8_t)ch_in;
    239+            if (ch >= 32 || ch == '\n') {
    240+                ret += ch_in;
    241+            } else {
    242+                ret += strprintf("\\x%02x", ch);
    


    luke-jr commented at 11:45 am on October 10, 2019:

    It seems impossible to distinguish between a literal “\x00” and “\x00” after this transformation.

    Suggest using SanitizeString perhaps with a new rule.


    laanwj commented at 11:46 am on October 10, 2019:
    I don’t think that matters much. \x00 (as in, the code-point 0) is not supposed to end up in a log string at all, if it does that’s an error (not serious enough to stop the program, though). Could add some more apparent warning I guess?

    laanwj commented at 11:48 am on October 10, 2019:
    I’m very much against using SanitizeString’s methodology here. I don’t want to make any characters entirely disappear. If anything this should make them very visible.

    luke-jr commented at 11:52 am on October 10, 2019:
    Cherry-pick 7cbc72b93740a5c7e14d1fb1a9f45fd78ff24700?

    laanwj commented at 11:56 am on October 10, 2019:
    I prefer to keep this simple code specific and self-contained, TBH. Also, escaping with ‘%%’ seems to have exactly the same “issue”? (if you regard it as an issue)

    luke-jr commented at 3:20 pm on October 10, 2019:
    The escaping code there always escapes ‘%’

    MarcoFalke commented at 6:07 pm on October 10, 2019:
    Could add a DCHECK(false) (#16136) to the else branch, maybe only run once?

    laanwj commented at 6:14 pm on October 10, 2019:
    I think crashing the executable on an assertion error is a bad idea here. After all, if the character was injected somehow, that changes a log injection to a live DoS.

    MarcoFalke commented at 6:20 pm on October 10, 2019:
    I assumed it was only enabled by --enable-debug, which also turns potential lock issues into crashes.

    laanwj commented at 6:24 pm on October 10, 2019:
    Only running the else branch once is fine with me though, not sure. It’s a compromise between precise troubleshooting and the fear that this would generate very long strings.
  7. luke-jr changes_requested
  8. in src/logging.cpp:237 in f2f1365805 outdated
    232+     * It escapes instead of removes them to still allow for troubleshooting
    233+     * issues where they accidentally end up in strings.
    234+     */
    235+    std::string LogEscapeMessage(const std::string& str) {
    236+        std::string ret;
    237+        for (char ch_in: str) {
    


    laanwj commented at 12:22 pm on October 10, 2019:
    Note to self: add a space before :
  9. practicalswift commented at 1:44 pm on October 10, 2019: contributor

    ACK f2f1365805c7a0c3828129a714aac726cc4f46b8 – diff looks correct and I’ve tested it lightly with expected results

    We can tighten the allowed range (ch >= 32 || ch == '\n') later on, but this is a good start.

    (The only (theoretical) issue I can think of with this fix is that the size of the log payload is increased by a factor of four when escaping. An attacker trying to fill the disk of a node by repeated log injection can then fill the disk four times as fast by using escape characters in the the injected log payload. I still think this change is worth doing as suggested though: rate limiting logging and/or limiting the length of log messages can be handled in a follow-up PR.)

  10. DrahtBot commented at 4:12 pm on October 10, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #16889 (Add some general std::vector utility functions by sipa)
    • #16673 (Relog configuration args on debug.log rotation by LarryRuane)
    • #12833 ([qt] move QSettings to bitcoin_rw.conf where possible by Sjors)
    • #11082 (Add new bitcoin_rw.conf file that is used for settings modified by this software itself by luke-jr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  11. laanwj commented at 5:45 pm on October 10, 2019: member

    We can tighten the allowed range (ch >= 32 || ch == ‘\n’) later on, but this is a good start.

    I don’t think we should do that without parsing UTF-8. It definitely should be possible to log Russian and Chinese wallet names, or even emoji.

    Edit : OK I guess \x7f is, strictly, a control code too, and we could easily exclude that.

    The only (theoretical) issue I can think of with this fix is that the size of the log payload is increased by a factor of four when escaping

    Yes, true, to be honest I’m not worried about disk filling attacks here.

  12. laanwj added the label Needs backport (0.19) on Oct 10, 2019
  13. laanwj added the label Needs backport (0.18) on Oct 10, 2019
  14. in src/logging.cpp:236 in 51dbfcf7c2 outdated
    231+     * This escapes control characters except newline ('\n') in C syntax.
    232+     * It escapes instead of removes them to still allow for troubleshooting
    233+     * issues where they accidentally end up in strings.
    234+     */
    235+    std::string LogEscapeMessage(const std::string& str) {
    236+        std::string ret;
    


    laanwj commented at 6:25 pm on October 10, 2019:
    BTW: I wasn’t sure what the fashion of the day is for calling .reserve or not, here. The expected (99.9%) outcome here is that the output is exactly the same length as the input.
  15. jonasschnelli commented at 3:31 pm on October 11, 2019: contributor
    utACK 8c45e33c07783828ea78489edf680a67bc491377
  16. MarcoFalke added this to the milestone 0.19.0 on Oct 12, 2019
  17. in src/logging.cpp:239 in 8c45e33c07 outdated
    234+     */
    235+    std::string LogEscapeMessage(const std::string& str) {
    236+        std::string ret;
    237+        for (char ch_in : str) {
    238+            uint8_t ch = (uint8_t)ch_in;
    239+            if ((ch >= 32 || ch == '\n') && ch != '\x7f') {
    


    kristapsk commented at 2:17 pm on October 13, 2019:
    if (!iscntrl(ch) || ch == '\n') {?

    practicalswift commented at 7:06 pm on October 13, 2019:

    iscntrl is locale dependent and we try to avoid using locale dependent functions.

    See the following section in the developer notes:

    • Avoid using locale dependent functions if possible. You can use the provided lint-locale-dependence.sh to check for accidental use of locale dependent functions.

      • Rationale: Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix.

      • These functions are known to be locale dependent: alphasort, asctime, asprintf, atof, atoi, atol, atoll, atoq, btowc, ctime, dprintf, fgetwc, fgetws, fprintf, fputwc, fputws, fscanf, fwprintf, getdate, getwc, getwchar, isalnum, isalpha, isblank, iscntrl, isdigit, isgraph, islower, isprint, ispunct, isspace, isupper, iswalnum, iswalpha, iswblank, iswcntrl, iswctype, iswdigit, iswgraph, iswlower, iswprint, iswpunct, iswspace, iswupper, iswxdigit, isxdigit, mblen, mbrlen, mbrtowc, mbsinit, mbsnrtowcs, mbsrtowcs, mbstowcs, mbtowc, mktime, putwc, putwchar, scanf, snprintf, sprintf, sscanf, stoi, stol, stoll, strcasecmp, strcasestr, strcoll, strfmon, strftime, strncasecmp, strptime, strtod, strtof, strtoimax, strtol, strtold, strtoll, strtoq, strtoul, strtoull, strtoumax, strtouq, strxfrm, swprintf, tolower, toupper, towctrans, towlower, towupper, ungetwc, vasprintf, vdprintf, versionsort, vfprintf, vfscanf, vfwprintf, vprintf, vscanf, vsnprintf, vsprintf, vsscanf, vswprintf, vwprintf, wcrtomb, wcscasecmp, wcscoll, wcsftime, wcsncasecmp, wcsnrtombs, wcsrtombs, wcstod, wcstof, wcstoimax, wcstol, wcstold, wcstoll, wcstombs, wcstoul, wcstoull, wcstoumax, wcswidth, wcsxfrm, wctob, wctomb, wctrans, wctype, wcwidth, wprintf


    laanwj commented at 8:11 am on October 14, 2019:
    Right, I don’t want to rely on any C functions here which may or may not have the same view of the character set. The log is file always regarded as UTF-8 anyhow.
  18. kristapsk commented at 2:27 pm on October 13, 2019: contributor
    Maybe it’s possible to add some functional test? (not sure we can generate such log messages without changing source code, so just an idea)
  19. laanwj commented at 8:09 am on October 14, 2019: member

    Maybe it’s possible to add some functional test?

    Well—the idea behind preventing log injection is that it’s not possible to inject arbitrary log messages through the network. I don’t think we’d want to add any code to encourage it.

    Could add a unit test for the filter function, though.

  20. util: Filter control characters out of log messages
    Belts and suspenders: make sure outgoing log messages don't contain
    potentially suspicious characters, such as terminal control codes.
    
    This escapes control characters except newline ('\n') in C syntax.
    It escapes instead of removes them to still allow for troubleshooting
    issues where they accidentally end up in strings.
    d7820a1250
  21. laanwj commented at 8:53 am on October 15, 2019: member
    Added unit test. Had to rebase to do this (conflict with the spans test), so also squashed the fixups. Should be ready for merge.
  22. laanwj force-pushed on Oct 15, 2019
  23. practicalswift commented at 3:16 pm on October 15, 2019: contributor
    ACK d7820a1250070f3640246ae497e049bee0b3516f - tested and works as expected :)
  24. laanwj referenced this in commit c34b88620d on Oct 16, 2019
  25. laanwj merged this on Oct 16, 2019
  26. laanwj closed this on Oct 16, 2019

  27. sidhujag referenced this in commit 63a0d41cf6 on Oct 16, 2019
  28. fanquake referenced this in commit dc0fe7ae1f on Oct 19, 2019
  29. fanquake removed the label Needs backport (0.19) on Oct 19, 2019
  30. fanquake referenced this in commit 0b18ea6f57 on Oct 19, 2019
  31. fanquake removed the label Needs backport (0.18) on Oct 19, 2019
  32. laanwj referenced this in commit 5b68d1654f on Oct 21, 2019
  33. MarcoFalke referenced this in commit 4c1090c882 on Oct 28, 2019
  34. HashUnlimited referenced this in commit 50ad32105c on Nov 17, 2019
  35. fxtc referenced this in commit a4706d781f on Nov 25, 2019
  36. fxtc referenced this in commit 3f4b0a02ec on Nov 25, 2019
  37. laanwj referenced this in commit 29d70264fb on Nov 25, 2019
  38. jasonbcox referenced this in commit 14dccf6b47 on Sep 22, 2020
  39. kittywhiskers referenced this in commit 73092e0f94 on Nov 1, 2021
  40. kittywhiskers referenced this in commit 9818f3533d on Nov 1, 2021
  41. kittywhiskers referenced this in commit 13e26024c4 on Nov 1, 2021
  42. pravblockc referenced this in commit f82ce924f7 on Nov 18, 2021
  43. Munkybooty referenced this in commit 6ce7e306b1 on Dec 9, 2021
  44. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me