qt: Kill the locale dependency bug class by not allowing Qt to mess with LC_NUMERIC #18147

pull practicalswift wants to merge 1 commits into bitcoin:master from practicalswift:kill-locale-dependency-bug-class changing 2 files +32 −0
  1. practicalswift commented at 3:33 pm on February 14, 2020: contributor

    Kill the locale dependency bug class by not allowing Qt to mess with LC_NUMERIC.

    Context: #18124 (Recommended reading if you are unsure about how C and C++ locales work in C++)

    Guaranteed locale settings before this PR:

    Program C locale category LC_NUMERIC Global C++ locale (std::locale)
    bitcoind C (classic) C (classic)
    bitcoin-qt No guarantee - user-specified C (classic)

    Guaranteed locale settings after this PR:

    Program C locale category LC_NUMERIC Global C++ locale (std::locale)
    bitcoind C (classic) C (classic)
    bitcoin-qt C (classic) C (classic)

    Background:

    Perhaps surprisingly Qt runs setlocale(LC_ALL, "") on initialization. This installs the locale specified by the user’s LC_ALL (or LC_*) environment variable as the new C locale.

    In contrast bitcoind does not opt-in to localization – no call to setlocale(LC_ALL, "") (or std::locale::global(std::locale(""))) is made and the environment variables LC_* are thus ignored.

    This results in an unfortunate situation where the bitcoind is guaranteed to be running with the classic locale ("C") whereas the locale of bitcoin-qt will vary depending on the user’s environment variables.

    An example: Assuming the environment variable LC_ALL=de_DE then the call std::to_string(1.23) will return "1.230000" in bitcoind but "1,230000" in bitcoin-qt.

    From the Qt documentation:

    “On Unix/Linux Qt is configured to use the system locale settings by default. This can cause a conflict when using POSIX functions, for instance, when converting between data types such as floats and strings, since the notation may differ between locales. To get around this problem, call the POSIX function setlocale(LC_NUMERIC,"C") right after initializing QApplication, QGuiApplication or QCoreApplication to reset the locale that is used for number formatting to “C”-locale.”


    C and C++ locales 101:

     0#include <clocale>
     1#include <cstdlib>
     2#include <iostream>
     3#include <locale>
     4#include <sstream>
     5#include <string>
     6
     7int main(void)
     8{
     9    std::cout << "C locale at beginning of main: " << std::string{setlocale(LC_ALL, nullptr)} << "\n";
    10    std::cout << "C++ locale at beginning of main: " << std::locale{}.name() << "\n\n";
    11#if OPT_IN_TO_LOCALIZATION_USING_SET_LOCALE
    12    setlocale(LC_ALL, "");
    13#endif
    14#if OPT_IN_TO_LOCALIZATION_USING_STD_LOCALE_GLOBAL
    15    std::locale::global(std::locale(""));
    16#endif
    17    std::ostringstream oss;
    18    oss << 1000000;
    19    std::cout << "std::ostringstream oss; oss << 1000000; oss.str() equals " << oss.str() << "\n";
    20    std::cout << "std::to_string(1.23) equals " << std::to_string(1.23) << "\n\n";
    21    std::cout << "C locale at end of main: " << std::string{setlocale(LC_ALL, nullptr)} << "\n";
    22    std::cout << "C++ locale at end of main: " << std::locale{}.name() << "\n\n";
    23    std::cout << "setlocale(LC_COLLATE,  nullptr) (read-only operation) = " << std::string{setlocale(LC_COLLATE, nullptr)} << "\n";
    24    std::cout << "setlocale(LC_CTYPE,    nullptr) (read-only operation) = " << std::string{setlocale(LC_CTYPE, nullptr)} << "\n";
    25    std::cout << "setlocale(LC_MESSAGES, nullptr) (read-only operation) = " << std::string{setlocale(LC_MESSAGES, nullptr)} << "\n";
    26    std::cout << "setlocale(LC_MONETARY, nullptr) (read-only operation) = " << std::string{setlocale(LC_MONETARY, nullptr)} << "\n";
    27    std::cout << "setlocale(LC_NUMERIC,  nullptr) (read-only operation) = " << std::string{setlocale(LC_NUMERIC, nullptr)} << "\n";
    28    std::cout << "setlocale(LC_TIME,     nullptr) (read-only operation) = " << std::string{setlocale(LC_TIME, nullptr)} << "\n";
    29}
    

    Let’s try with LC_ALL=de_DE ./l8n without any opt-in to locale dependence:

     0$ clang++ -o l8n l8n.cpp
     1$ LC_ALL=de_DE ./l8n
     2C locale at beginning of main: C
     3C++ locale at beginning of main: C
     4
     5std::ostringstream oss; oss << 1000000; oss.str() equals 1000000
     6std::to_string(1.23) equals 1.230000
     7
     8C locale at end of main: C
     9C++ locale at end of main: C
    10
    11setlocale(LC_COLLATE,  nullptr) (read-only operation) = C
    12setlocale(LC_CTYPE,    nullptr) (read-only operation) = C
    13setlocale(LC_MESSAGES, nullptr) (read-only operation) = C
    14setlocale(LC_MONETARY, nullptr) (read-only operation) = C
    15setlocale(LC_NUMERIC,  nullptr) (read-only operation) = C
    16setlocale(LC_TIME,     nullptr) (read-only operation) = C
    

    Let’s try with LC_NUMERIC=de_DE ./l8n without any opt-in to locale dependence:

     0$ clang++ -o l8n l8n.cpp
     1$ LC_NUMERIC=de_DE ./l8n
     2C locale at beginning of main: C
     3C++ locale at beginning of main: C
     4
     5std::ostringstream oss; oss << 1000000; oss.str() equals 1000000
     6std::to_string(1.23) equals 1.230000
     7
     8C locale at end of main: C
     9C++ locale at end of main: C
    10
    11setlocale(LC_COLLATE,  nullptr) (read-only operation) = C
    12setlocale(LC_CTYPE,    nullptr) (read-only operation) = C
    13setlocale(LC_MESSAGES, nullptr) (read-only operation) = C
    14setlocale(LC_MONETARY, nullptr) (read-only operation) = C
    15setlocale(LC_NUMERIC,  nullptr) (read-only operation) = C
    16setlocale(LC_TIME,     nullptr) (read-only operation) = C
    

    Let’s try LC_ALL=de_DE ./l8n with opt-in to locale dependence using set_locale:

     0$ clang++ -DOPT_IN_TO_LOCALIZATION_USING_SET_LOCALE -o l8n l8n.cpp
     1$ LC_ALL=de_DE ./l8n
     2C locale at beginning of main: C
     3C++ locale at beginning of main: C
     4
     5std::ostringstream oss; oss << 1000000; oss.str() equals 1000000
     6std::to_string(1.23) equals 1,230000
     7
     8C locale at end of main: de_DE
     9C++ locale at end of main: C
    10
    11setlocale(LC_COLLATE,  nullptr) (read-only operation) = de_DE
    12setlocale(LC_CTYPE,    nullptr) (read-only operation) = de_DE
    13setlocale(LC_MESSAGES, nullptr) (read-only operation) = de_DE
    14setlocale(LC_MONETARY, nullptr) (read-only operation) = de_DE
    15setlocale(LC_NUMERIC,  nullptr) (read-only operation) = de_DE
    16setlocale(LC_TIME,     nullptr) (read-only operation) = de_DE
    

    Let’s try LC_ALL=de_DE ./l8n with opt-in to locale dependence using std::locale::global:

     0$ clang++ -DOPT_IN_TO_LOCALIZATION_USING_STD_LOCALE_GLOBAL -o l8n l8n.cpp
     1$ LC_ALL=de_DE ./l8n
     2C locale at beginning of main: C
     3C++ locale at beginning of main: C
     4
     5std::ostringstream oss; oss << 1000000; oss.str() equals 1.000.000
     6std::to_string(1.23) equals 1,230000
     7
     8C locale at end of main: de_DE
     9C++ locale at end of main: de_DE
    10
    11setlocale(LC_COLLATE,  nullptr) (read-only operation) = de_DE
    12setlocale(LC_CTYPE,    nullptr) (read-only operation) = de_DE
    13setlocale(LC_MESSAGES, nullptr) (read-only operation) = de_DE
    14setlocale(LC_MONETARY, nullptr) (read-only operation) = de_DE
    15setlocale(LC_NUMERIC,  nullptr) (read-only operation) = de_DE
    16setlocale(LC_TIME,     nullptr) (read-only operation) = de_DE
    
  2. DrahtBot added the label GUI on Feb 14, 2020
  3. DrahtBot added the label Tests on Feb 14, 2020
  4. luke-jr commented at 5:16 pm on February 14, 2020: member

    Concept NACK.

    Locale-enabled functions should be using the user’s locale. If we don’t want that, we should use locale-independent equivalents.

  5. practicalswift commented at 4:30 pm on February 15, 2020: contributor

    @luke-jr

    I’m unable to parse your message.

    In bitcoin-qt we have no less than three(!) locales to consider:

    • The C locale (std::string{setlocale(LC_ALL, nullptr)})
    • The global C++ locale (std::locale{}.name())
    • The Qt locale (QLocale::system().name().toStdString())

    Which of these are you suggesting should be set according to what the user has specified in his/hers LC_* environment variables (which I assume is what you mean by “should be using the user’s locale”)?

  6. luke-jr commented at 5:15 pm on February 15, 2020: member
    @practicalswift The user-specified locale should affect all locale code, so all 3 should be set correctly…
  7. practicalswift commented at 5:17 pm on February 15, 2020: contributor
    @luke-jr Are you aware that it does not work like that today?
  8. practicalswift commented at 10:58 am on February 23, 2020: contributor
    @luke-jr Are you aware that we never set the global C++ locale to the user-specified locale in any of the programs we produce?
  9. practicalswift commented at 3:28 pm on March 6, 2020: contributor
    @luke-jr Friendly ping. Could you clarify? :) I’m trying to make sense of the feedback.
  10. DrahtBot commented at 10:38 pm on March 9, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  11. DrahtBot added the label Needs rebase on Mar 25, 2020
  12. laanwj commented at 3:38 pm on March 26, 2020: member

    Locale-enabled functions should be using the user’s locale. If we don’t want that, we should use locale-independent equivalents.

    I agree with Luke-Jr for a change.

    We should move to locale-independent functions for locale-independent formatting and functionality (RPC and command line handling and logging and such), not add more and more layers of these kind of increasingly obscure global workarounds to coerce the functions into hopefully being locale-independent.

  13. practicalswift commented at 4:04 pm on March 26, 2020: contributor

    @laanwj

    I agree that we should use locale-independent functions, but I think you’re misunderstanding what I am suggesting.

    I think that the implicit setlocale(LC_NUMERIC, "") call we’re doing in the current code via Qt is purely unintentional.

    AFAICT we rely on QLocale everywhere we want locale dependence, so there is no reason to do setlocale(LC_NUMERIC, "").

    From what I can tell this implicit setlocale(LC_NUMERIC, "") via Qt has given us years of locale problems without anyone really noticing that this call is the root cause.

    Are you following my reasoning?

    If not, can you think of a counter-example where we want the setlocale(LC_NUMERIC, "") call? I.e. where we are not relying on QLocale for the locale dependence we want.

    My suggestion is thus that:

    • we should continue to remove calls to locale dependent functions (the whack-a-mole game), and
    • we should not opt-in to locale dependence we don’t want (such as setlocale(LC_NUMERIC, "")).

    I think we should do both - not either-or :)

    What would the downsides be from such an approach? :)

  14. qt: Don't allow Qt to mess with LC_NUMERIC 26be181397
  15. practicalswift force-pushed on Mar 26, 2020
  16. practicalswift commented at 9:28 pm on March 26, 2020: contributor
    Rebased :)
  17. sipa commented at 10:50 pm on March 26, 2020: member
    @practicalswift My preference (and I suspect that’s what @laanwj means as well) is that in the parts of the codebase that are supposed to be locale-independent, simply no locale-related operations are present (including locale-dependent formatting, or checks of, or setting of, any global locale). That would make those pieces of code maximally reusable.
  18. DrahtBot removed the label Needs rebase on Mar 26, 2020
  19. practicalswift commented at 3:30 pm on March 27, 2020: contributor

    @sipa I understand that and I agree with that: I’ve worked extensively with getting rid of the use of locale-dependent functions in our code base :)

    What I still don’t understand is how we would be better off by opting in to locale dependence that we do not want.

    I don’t see how a.) getting rid of usage of locale independent functions and b.) not opting in to locale dependence we do not want would in any way be mutually exclusive :)

    To make this more concrete:

    Is there a single place in the code base where we want the locale dependency we’re now opting in to by the implicit setlocale(LC_NUMERIC, "") call we are making accidentally and indirectly via Qt?

    Are we intentionally opting in to locale dependency for the POSIX functions? To me it looks like this is purely by accident. Please correct me if I’m wrong :)

    From the Qt documentation:

    “On Unix/Linux Qt is configured to use the system locale settings by default. This can cause a conflict when using POSIX functions, for instance, when converting between data types such as floats and strings, since the notation may differ between locales. To get around this problem, call the POSIX function setlocale(LC_NUMERIC,"C") right after initializing QApplication, QGuiApplication or QCoreApplication to reset the locale that is used for number formatting to “C”-locale.”

  20. practicalswift commented at 3:39 pm on March 27, 2020: contributor

    I guess this is like mitigations in general – an example:

    • a. We want to avoid buffer overflows in our code. We do that by auditing code and fix it as we discover issues.
    • b. If we one day discovered that we were accidentally and globally opted in to a certain subset of buffer overflows via the logic in a third party dependency then we would like to correct that by restoring the default of not opting in to that.

    a.) and b.) are clearly not mutually exclusive: we would do b.) immediately and over time do our best to achieve a.) as well.

    I fail to see how this case is different :)

    This case:

    • a. We want to avoid locale dependency bugs in our code. We do that by auditing code and fix it as we discover issues.
    • b. If we one day discovered that we were accidentally and globally opted in to a certain subset of locale dependency bugs via the logic in a third party dependency then we would like to correct that by restoring the default of not opting in to that.

    Help me understand what I’m missing :)

  21. practicalswift commented at 2:40 pm on April 29, 2020: contributor

    Closing this issue. Seems like I was unable to gather consensus support for permanently killing off the locale dependency bug class by fixing the root cause (unintended Qt interference with POSIX LC_NUMERIC during Qt initialisation) :)

    I guess we’ll have to live with locale issues also in the future, but now the root cause is documented at least and this PR can be referenced when we hit locale issues going forward :)

  22. practicalswift closed this on Apr 29, 2020

  23. practicalswift deleted the branch on Apr 10, 2021
  24. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me