build: Add linter checking for accidental introduction of locale dependence #13041

pull practicalswift wants to merge 2 commits into bitcoin:master from practicalswift:lint-locale-dependence changing 2 files +258 −1
  1. practicalswift commented at 12:20 PM on April 20, 2018: contributor

    This linter will check for code accidentally introducing locale dependencies.

    Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible.

    Context: #12881 (comment)

    Example output:

    $ contrib/devtools/lint-locale-dependence.sh
    The locale dependent function tolower(...) appears to be used:
    src/init.cpp:    if (s[0] == '0' && std::tolower(s[1]) == 'x') {
    
    Unnecessary locale dependence can cause bugs that are very
    tricky to isolate and fix. Please avoid using locale dependent
    functions if possible.
    
    Advice not applicable in this specific case? Add an exception
    by updating the ignore list in contrib/devtools/lint-locale-dependence.sh
    

    Note to reviewers: What is the most appropriate LOCALE_DEPENDENT_FUNCTIONS function list? What should be added or removed?

  2. fanquake added the label Scripts and tools on Apr 20, 2018
  3. practicalswift commented at 9:36 AM on April 22, 2018: contributor

    @laanwj Do you think the LOCALE_DEPENDENT_FUNCTIONS list is correct? :-)

  4. laanwj commented at 1:55 PM on April 23, 2018: member

    Concept ACK. I think this is a good idea.

    @laanwj Do you think the LOCALE_DEPENDENT_FUNCTIONS list is correct? :-)

    It's a good start. Please add strftime too, that's the one that caused recent discussion in #12973.

  5. laanwj assigned laanwj on Apr 23, 2018
  6. practicalswift force-pushed on Apr 23, 2018
  7. practicalswift commented at 2:04 PM on April 23, 2018: contributor

    @laanwj strftime and strptime added. Please re-review :-)

  8. in contrib/devtools/lint-locale-dependence.sh:11 in 4ba8cc19a7 outdated
       6 | +    iswgraph iswlower iswprint iswpunct iswspace iswupper iswxdigit isxdigit
       7 | +    stoi stol stoll strcoll strftime strptime strtod strtol strtoll tolower
       8 | +    toupper towlower towupper
       9 | +)
      10 | +REGEXP_IGNORE_EXTERNAL="src/(crypto/ctaes/|leveldb/|secp256k1/|tinyformat.h|univalue/)"
      11 | +REGEXP_IGNORE_KNOWN_VIOLATIONS="^(src/base58.cpp:.*isspace|src/qt/rpcconsole.cpp:.*isdigit|src/rest.cpp:.*strtol|src/torcontrol.cpp:.*strtol|src/uint256.cpp:.*isspace|src/uint256.cpp:.*tolower|src/util.cpp:.*tolower|src/utilmoneystr.cpp:.*isdigit|src/utilmoneystr.cpp:.*isspace|src/utilstrencodings.cpp:.*isspace|src/utilstrencodings.cpp:.*strtoll|src/utilstrencodings.cpp:.*strtol)"
    


    laanwj commented at 2:10 PM on April 23, 2018:

    That's a shocking number of known violations. I didn't know e.g. strtol is also locale dependent, otherwise I wouldn't have used it in torcontrol for example.


    laanwj commented at 2:11 PM on April 23, 2018:

    Is it possible to split this list over multiple lines, so that it's possible to remove an exception by removing a line?

  9. practicalswift force-pushed on Apr 23, 2018
  10. practicalswift commented at 3:06 PM on April 23, 2018: contributor

    @laanwj Updated version with the known violations list split over multiple lines. Please re-review :-)

  11. practicalswift force-pushed on Apr 23, 2018
  12. practicalswift commented at 3:24 PM on April 23, 2018: contributor

    Candidates for inclusion in the list of locale dependent functions:

    $ nm /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so | grep -E '_l$' | \
          cut -f3 -d' ' | grep -E '^[a-z]' | sort | sed 's/_l$//g'
    isalnum
    isalpha
    isblank
    iscntrl
    isdigit
    isgraph
    islower
    isprint
    ispunct
    isspace
    isupper
    iswalnum
    iswalpha
    iswblank
    iswcntrl
    iswctype
    iswdigit
    iswgraph
    iswlower
    iswprint
    iswpunct
    iswspace
    iswupper
    iswxdigit
    isxdigit
    nl_langinfo
    strcasecmp
    strcoll
    strerror
    strfmon
    strftime
    strncasecmp
    strptime
    strtod
    strtof
    strtold
    strtol
    strtoll
    strtoul
    strtoull
    strxfrm
    tolower
    toupper
    towctrans
    towlower
    towupper
    wcscasecmp
    wcscoll
    wcsftime
    wcsncasecmp
    wcstod
    wcstof
    wcstold
    wcstol
    wcstoll
    wcstoul
    wcstoull
    wcsxfrm
    wctrans
    wctype
    
  13. practicalswift commented at 3:31 PM on April 23, 2018: contributor

    The use of strtoul(...) and strtoull(...) might be problematic too?

  14. practicalswift force-pushed on Apr 23, 2018
  15. practicalswift commented at 3:40 PM on April 23, 2018: contributor

    Updated: Added more functions to LOCALE_DEPENDENT_FUNCTIONS. Added two known violations for strtoul(...) and strtoull(...) in utilstrencodings.cpp.

    Please re-review :-)

  16. laanwj commented at 6:13 AM on April 24, 2018: member

    utACK 29b73b509f1aa537a835f9e75878622c3a18335f

  17. practicalswift commented at 9:28 AM on April 24, 2018: contributor

    @theuni @sipa Your language expertise is needed :-)

    If you have time to review this PR – does LOCALE_DEPENDENT_FUNCTIONS look reasonable? :-)

  18. practicalswift force-pushed on Apr 24, 2018
  19. practicalswift commented at 12:26 PM on April 24, 2018: contributor

    Updated:

    • Added more functions to LOCALE_DEPENDENT_FUNCTIONS.
    • Added known atoi(…) violations for bitcoin-tx.cpp, init.cpp, qt/rpcconsole.cpp, torcontrol.cpp, util.cpp, utilstrencodings.cpp and utilstrencodings.h.
    • Added known *printf(…) violations for dbwrapper.cpp, test/dbwrapper_tests.cpp and util.cpp.
    • Now invoking git grep only once in order to minimize run-time.

    Please re-review :-)

  20. practicalswift force-pushed on Apr 24, 2018
  21. practicalswift force-pushed on Apr 24, 2018
  22. practicalswift force-pushed on Apr 24, 2018
  23. practicalswift force-pushed on Apr 24, 2018
  24. practicalswift force-pushed on Apr 24, 2018
  25. practicalswift force-pushed on Apr 24, 2018
  26. laanwj commented at 12:16 PM on April 25, 2018: member

    I think printf and fprintf(stderr, ... are fine. They are only used to print error messages, it's not important if they are formatted slightly different according to the locale. (not so much for sprintf vsnprintf etc which should be replaced with use of tinyformat, these end up in strings and might end up on the RPC interface or in files meant to be portable)

  27. practicalswift commented at 12:23 PM on April 25, 2018: contributor

    @laanwj You saw that fprintf(stderr, … is explicitly filtered out via grep -vE 'fprintf\(.*(stdout|stderr)' in the script? :-)

    Should I keep fprintf (with the exclusion above) and remove printf? Makes sense?

  28. laanwj commented at 12:54 PM on April 25, 2018: member

    @laanwj You saw that fprintf(stderr, … is explicitly filtered out via grep -vE 'fprintf(.*(stdout|stderr)' in the script? :-)

    Oh that's good! No, hadn't noticed that yet.

    Should I keep fprintf (with the exclusion above) and remove printf? Makes sense?

    Yes I think so!

  29. practicalswift force-pushed on Apr 25, 2018
  30. practicalswift commented at 1:01 PM on April 25, 2018: contributor

    @laanwj Updated version with printf removed! Please re-review :-)

  31. laanwj commented at 10:57 AM on April 26, 2018: member

    Thinking of it, it would make sense to add this list to the developer notes as well (next to the mention of the locale risks) , so it doesn't come out of the blue.

  32. practicalswift force-pushed on Apr 26, 2018
  33. practicalswift force-pushed on Apr 26, 2018
  34. practicalswift commented at 7:16 PM on April 26, 2018: contributor

    @laanwj Good point! Added a reference to the script and included the list in the developer notes. Looks good? :-)

  35. practicalswift force-pushed on Apr 26, 2018
  36. practicalswift force-pushed on May 9, 2018
  37. practicalswift force-pushed on May 22, 2018
  38. practicalswift force-pushed on May 22, 2018
  39. practicalswift force-pushed on May 22, 2018
  40. practicalswift commented at 5:56 AM on May 22, 2018: contributor

    Updated: Added some locale dependent Boost functions and the corresponding KNOWN_VIOLATIONS entries.

    Added known violations:

    • is_digit is used in src/core_read.cpp
    • to_lower is used in src/netbase.cpp
    • to_upper is used in src/rpc/server.cpp
    • trim is used in src/core_read.cpp
    • trim is used in src/httprpc.cpp
    • trim_right is used in src/bitcoin-tx.cpp

    Added Boost functions:

    fold_case
    is_digit
    is_space
    normalize
    to_lower
    to_title
    to_upper
    trim
    trim_left
    trim_right
    

    @laanwj, is this PR getting ready for merge? :-) When it is merged my plan is to start replacing calls to locale dependent functions with their non-locale dependent counterparts.

  41. practicalswift commented at 11:59 AM on May 24, 2018: contributor

    @Empact Would you mind reviewing this PR? :-)

  42. in contrib/devtools/lint-locale-dependence.sh:102 in f947ef02f9 outdated
      97 | +    mbsrtowcs
      98 | +    mbstowcs     # LC_CTYPE
      99 | +    mbtowc       # LC_CTYPE
     100 | +    mktime
     101 | +    normalize    # boost::locale::normalize
     102 | +#   printf       # LC_NUMERIC
    


    Empact commented at 12:16 AM on May 25, 2018:

    Would be helpful to add an explanatory comment as to why each of these are disabled. E.g. I don't see a printf call outside of crypto/secp256k1/tinyformat.h so it's not clear to me why this is disabled.

  43. in contrib/devtools/lint-locale-dependence.sh:220 in f947ef02f9 outdated
     215 | +        echo "The locale dependent function ${LOCALE_DEPENDENT_FUNCTION}(...) appears to be used:"
     216 | +        echo "${MATCHES}"
     217 | +        echo
     218 | +        EXIT_CODE=1
     219 | +    fi
     220 | +done
    


    Empact commented at 12:47 AM on May 25, 2018:

    Is it possible to simplify this by processing them at once? I'm skeptical grouping by function will be helpful.


    practicalswift commented at 5:58 AM on May 25, 2018:

    @Empact For performance or usability reasons?

    If you mean for performance reasons, then please note that the costly operation git grep is run only once.

    From a usability perspective I think it makes sense to group the output per function – try running the script with an empty KNOWN_VIOLATIONS and you'll see what I mean :-) But I'm open to alternative outputs, so please provide a diff and I'll check out your suggestion :-)


    Empact commented at 9:05 AM on May 25, 2018:

    Speaking from a maintainability perspective - seems in the general case KNOWN_VIOLATIONS will be populated so the list will be relatively short and only introduced by they who open the PR, in which case the grouping isn't very beneficial IMO. If not much usability benefit is gained, then if the code can be simplified by removing it I think it's worthwhile to do so.


    practicalswift commented at 5:50 PM on May 25, 2018:

    @Empact Do you have any specific output formatting in mind? Please give an example on how it would look :-)

  44. laanwj commented at 6:14 PM on June 5, 2018: member

    utACK f947ef02f9c8eee5957bb02e12cee4edd636c9fa

  45. cnavigato commented at 6:33 PM on June 5, 2018: none

    On Jun 5, 2018 11:15 AM, "Wladimir J. van der Laan" < notifications@github.com> wrote:

    utACK f947ef0 https://github.com/bitcoin/bitcoin/commit/f947ef02f9c8eee5957bb02e12cee4edd636c9fa

    — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/pull/13041#issuecomment-394808844, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfpiNwEhc18PlReCb8tP5iuXoUiYLn6ks5t5sqtgaJpZM4TdXns .

  46. ken2812221 commented at 6:35 PM on June 5, 2018: contributor

    Need to move contrib/devtools/lint-locale-dependence.sh to test/lint/lint-locale-dependence.sh

  47. build: Add linter for checking accidental locale dependence 0a4ea2f458
  48. docs: Mention lint-locale-dependence.sh in developer-notes.md 698cfd0811
  49. practicalswift force-pushed on Jun 6, 2018
  50. practicalswift commented at 6:10 AM on June 6, 2018: contributor

    @ken2812221 Thanks! Moved! @laanwj @cnavigato Please re-review :-)

  51. ken2812221 approved
  52. ken2812221 commented at 8:55 AM on June 6, 2018: contributor

    utACK 698cfd081144845f6246171b8a2a0cfa8daaecdb

  53. laanwj merged this on Jun 7, 2018
  54. laanwj closed this on Jun 7, 2018

  55. laanwj referenced this in commit 5779dc4f76 on Jun 7, 2018
  56. MarcoFalke referenced this in commit f0a6a922fe on Sep 13, 2018
  57. PastaPastaPasta referenced this in commit 41e71ce854 on Jun 17, 2020
  58. PastaPastaPasta referenced this in commit 28b02f42e1 on Jul 2, 2020
  59. MarcoFalke referenced this in commit 7027c67cac on Jul 2, 2020
  60. zkbot referenced this in commit 43ac2062f9 on Oct 28, 2020
  61. zkbot referenced this in commit 84a5830aaa on Nov 9, 2020
  62. practicalswift deleted the branch on Apr 10, 2021
  63. PastaPastaPasta referenced this in commit 0c1f5b5934 on Jun 27, 2021
  64. PastaPastaPasta referenced this in commit 59d65a9da9 on Jun 28, 2021
  65. PastaPastaPasta referenced this in commit 746ae504ca on Jun 29, 2021
  66. PastaPastaPasta referenced this in commit f4956266df on Jul 1, 2021
  67. vijaydasmp referenced this in commit dfa262c93f on Oct 4, 2021
  68. gades referenced this in commit 6985870658 on Mar 8, 2022
  69. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-16 15:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me