[bugfix] Fix encoding issue for Windows #13426

pull ken2812221 wants to merge 13 commits into bitcoin:master from ken2812221:u8path_u8string changing 37 files +603 −195
  1. ken2812221 commented at 4:23 pm on June 10, 2018: contributor

    Fix #13103 Fix #13754

    From #13107 (comment), @laanwj suggested that we should keep our internal strings to be always utf-8 encoded.

    This PR adds two C++17 compatible macros: u8string and u8path. The reason that I use macros is that developers might not want to pass second parameter utf8 everytime when they call string or create path objects.

    I tried to find all string methods and convert them to u8string except for external api calls. Also tried to convert explicit or implicit path to u8path. Tell me if you find something that I missed.

    Required: bitcoin-core/leveldb#18

    See #13787 for travis run result

    Caution: The user must change their config file to be utf-8 encoded if the file contains any non-ascii characters in it. Before 0.18, they are read as ansi-encoded characters on Windows.

  2. ken2812221 commented at 0:20 am on June 11, 2018: contributor

    Can we use path.imbued.locale instead for this?

    We still need to use ansi string when we call bdb and leveldb api, so I think that adding a new function is needed. Also, it needs to add boost::locale dependency in order to use boost locale generator.

  3. ken2812221 force-pushed on Jun 11, 2018
  4. ken2812221 force-pushed on Jun 11, 2018
  5. ken2812221 force-pushed on Jun 11, 2018
  6. laanwj added the label Bug on Jun 11, 2018
  7. ken2812221 force-pushed on Jun 12, 2018
  8. ken2812221 force-pushed on Jun 12, 2018
  9. ken2812221 force-pushed on Jun 12, 2018
  10. ken2812221 force-pushed on Jun 12, 2018
  11. ken2812221 renamed this:
    [WIP, bugfix] Add u8path and u8string to boost to fix #13103
    [bugfix] Add u8path and u8string to boost to fix #13103
    on Jun 12, 2018
  12. fanquake requested review from theuni on Jun 12, 2018
  13. ken2812221 force-pushed on Jun 12, 2018
  14. theuni commented at 8:27 pm on June 12, 2018: member

    I’m not really comfortable patching boost to make this work. Firstly because it would mean that only depends-builds can build for Windows, but also because I’m just not familiar with it.

    Maybe start by upstreaming it and see where the discussion goes?

  15. ken2812221 commented at 8:32 pm on June 12, 2018: contributor

    Firstly because it would mean that only depends-builds can build for Windows

    Is there any way to build Bitcoin Core for Windows without depends-builds? Also, I created a PR https://github.com/boostorg/filesystem/pull/75

  16. achow101 commented at 11:24 pm on June 12, 2018: member

    Is there any way to build Bitcoin Core for Windows without depends-builds?

    Supposedly there is a way to build with MSVC and Visual Studio which doesn’t use the depends system.

  17. ken2812221 force-pushed on Jun 13, 2018
  18. ken2812221 force-pushed on Jun 13, 2018
  19. ken2812221 force-pushed on Jun 13, 2018
  20. ken2812221 commented at 0:22 am on June 13, 2018: contributor
    I found a way not to patch boost library, it should work with msvc now.
  21. ken2812221 force-pushed on Jun 13, 2018
  22. ken2812221 force-pushed on Jun 13, 2018
  23. ken2812221 force-pushed on Jun 13, 2018
  24. ken2812221 renamed this:
    [bugfix] Add u8path and u8string to boost to fix #13103
    [bugfix] Add u8path and u8string to fix #13103
    on Jun 13, 2018
  25. ken2812221 force-pushed on Jun 13, 2018
  26. DrahtBot commented at 8:25 pm on June 14, 2018: member
    • #13878 (utils: Add fstream wrapper to allow to pass unicode filename on Windows by ken2812221)
    • #13877 (utils: Make fs::path::string() always return utf-8 string on Windows by ken2812221)
    • #13866 (utils: Use _wfopen and _wreopen on Windows by ken2812221)
    • #13864 (validation: Document where we are intentionally ignoring bool return values from validation related functions by practicalswift)
    • #13862 (utils: drop boost::interprocess::file_lock by ken2812221)
    • #13846 (Move src/tinyformat.h to src/tinyformat/tinyformat.h by Empact)
    • #13845 (Include tinyformat as a subtree by Empact)
    • #13815 (util: Add [[nodiscard]] to all {Decode,Parse} functions returning bool by practicalswift)
    • #13761 (Docs: Create default bitcoin.conf file on startup by leishman)
    • #13756 (wallet: -avoidreuse feature for improved privacy by kallewoof)
    • #13751 (Utils and libraries: Drops the boost/algorithm/string/split.hpp dependency by 251Labs)
    • #13746 (-masterdatadir for datadir bootstrapping by kallewoof)
    • #13734 (gui: Drop boost::scoped_array and use wchar_t API explicitly on Windows by ken2812221)
    • #13723 (PSBT key path cleanups by sipa)
    • #13716 (bitcoin-cli: -stdinwalletpassphrase and non-echo stdin passwords by kallewoof)
    • #13671 (Remove the boost/algorithm/string/case_conv.hpp dependency by 251Labs)
    • #13667 (wallet: Fix backupwallet for multiwallets by domob1812)
    • #13639 ([refactor] Fix the chainparamsbase -> util circular dependency by Empact)
    • #13621 (Check for datadir after the config files were read by Flowdalic)
    • #13389 (Utils and libraries: Fix #13371 - move umask operation earlier in AppInit() by n2yen)
    • #13249 (Make objects in range declarations immutable by default. Avoid unnecessary copying of objects in range declarations. by practicalswift)
    • #13159 (Don’t close old debug log file handle prematurely when trying to re-open (on SIGHUP) by practicalswift)
    • #13100 (gui: Add dynamic wallets support by promag)
    • #13088 (Log early messages with -printtoconsole by ajtowns)
    • #12783 (macOS: Disable AppNap by krab)
    • #11911 (Free CDBEnv instances when not in use by ryanofsky)
    • #11625 (WIP: Add BitcoinApplication & RPCConsole tests by ryanofsky)
    • #10973 (Refactor: separate wallet from node by ryanofsky)
    • #10102 ([experimental] Multiprocess bitcoin by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  27. in src/util.cpp:87 in 5dbd2489a6 outdated
    84@@ -85,6 +85,8 @@ const int64_t nStartupTime = GetTime();
    85 const char * const BITCOIN_CONF_FILENAME = "bitcoin.conf";
    86 const char * const BITCOIN_PID_FILENAME = "bitcoind.pid";
    87 
    88+fs::detail::utf8_codecvt_facet g_utf8;
    


    Empact commented at 9:49 pm on June 15, 2018:
    static?

    ken2812221 commented at 10:10 pm on June 15, 2018:
    g_utf8 should be accessible globally.

    Empact commented at 1:51 am on June 16, 2018:
    Looks like the only use is in util.h, if you implement those as methods in util.cpp, declared in util.h, you can limit variable access to here util.cpp. I may be wrong though! Reviewing on a cellphone. :P
  28. in src/util.h:56 in 5dbd2489a6 outdated
    49@@ -50,6 +50,11 @@ extern CTranslationInterface translationInterface;
    50 extern const char * const BITCOIN_CONF_FILENAME;
    51 extern const char * const BITCOIN_PID_FILENAME;
    52 
    53+extern fs::detail::utf8_codecvt_facet g_utf8;
    54+
    55+#define u8string() string(g_utf8)
    56+#define u8path(str) path(str, g_utf8)
    


    Empact commented at 9:51 pm on June 15, 2018:
    Is it better to do these as methods in the namespace? Preprocessor use should be minimized IMO.

    ken2812221 commented at 10:10 pm on June 15, 2018:
    I just thought that pass the second parameter at every .string() and path() is pretty annoying and easy to forget. If most developers don’t want to use macro function, I could do a scripted-diff.

    Empact commented at 1:35 am on June 16, 2018:
    I mean define these macros as functions in the fs namespace that call through as defined here.

    ken2812221 commented at 4:10 am on June 16, 2018:
    We could include u8path into fs namespace, however for u8string we couldn’t since it’s a member function.

    Empact commented at 10:02 pm on June 16, 2018:
    I’ll defer to others as to whether this is worthwhile, but absent other options, I would make a function ala u8path for string as well, and hide g_utf8
  29. ken2812221 force-pushed on Jun 21, 2018
  30. ken2812221 commented at 3:11 pm on June 21, 2018: contributor
    Rebased
  31. unknown approved
  32. DrahtBot added the label Needs rebase on Jul 12, 2018
  33. ken2812221 force-pushed on Jul 12, 2018
  34. in src/bitcoin-cli.cpp:445 in 835f9b1418 outdated
    421@@ -422,6 +422,10 @@ static int CommandLineRPC(int argc, char *argv[])
    422             gArgs.ForceSetArg("-rpcpassword", rpcPass);
    423         }
    424         std::vector<std::string> args = std::vector<std::string>(&argv[1], &argv[argc]);
    425+#ifdef WIN32
    426+        for (std::string& arg : args)
    427+            arg = AnsiToUtf8(arg);
    428+#endif
    


    ken2812221 commented at 1:36 pm on July 12, 2018:

    In order to solve this problem, there are three ways to do.

    • The first one is above, which might get ? if the user enter any character that can’t present by user’s code page.
    • The second one is using GetCommandLineW and CommandLineToArgvW to get command line options in type wchar_t, which can present more Unicode characters.
    • The third one is adding wmain to get command line in type wchar_t directly by “(w)main” function.

    Which one is prefered?


    ken2812221 commented at 2:51 am on July 26, 2018:
    I choose the second way
  35. DrahtBot removed the label Needs rebase on Jul 12, 2018
  36. ken2812221 renamed this:
    [bugfix] Add u8path and u8string to fix #13103
    [bugfix] Add u8path and u8string to fix encoding issue for Windows
    on Jul 17, 2018
  37. laanwj added this to the milestone 0.17.0 on Jul 19, 2018
  38. DrahtBot added the label Needs rebase on Jul 20, 2018
  39. ken2812221 force-pushed on Jul 20, 2018
  40. fanquake removed the label Needs rebase on Jul 21, 2018
  41. fanquake commented at 4:57 am on July 21, 2018: member

    @theuni We probably want your thoughts here again.

    For reference nothing has happened at the upstream PR, however that is less relevant now that we aren’t patching Boost.

  42. Sjors commented at 9:56 am on July 25, 2018: member

    Compared to master, this PR helps in at least the following scenario:

    • Windows 10, system locale set to Simplified Chinese (see #13754 for how)
    • launch QT from search bar or from command prompt: bitcoin-qt -wallet=你好
    • open console and do getwalletinfo

    Before:

    After:

  43. DrahtBot added the label Needs rebase on Jul 25, 2018
  44. Add u8string and u8path function macros cacd92112e
  45. ken2812221 force-pushed on Jul 26, 2018
  46. ken2812221 commented at 2:45 am on July 26, 2018: contributor
    @Sjors Now you can use any walletname you want even if you are using English language setting.
  47. DrahtBot removed the label Needs rebase on Jul 26, 2018
  48. ken2812221 renamed this:
    [bugfix] Add u8path and u8string to fix encoding issue for Windows
    [bugfix] Fix encoding issue for Windows
    on Jul 26, 2018
  49. jonasschnelli commented at 11:11 am on July 26, 2018: contributor
  50. Sjors commented at 12:15 pm on July 26, 2018: member

    Now you can use any walletname you want even if you are using English language setting

    Thanks, that worked.

    Can you change some of the functional tests to use unicode characters? I think wallet_multiwallet.py and wallet_labels.py would be good examples, since they use RPC arguments as well as filenames. I renamed label e in labels tests which worked fine on macOS. The multiwallet test was less happy when I renamed w3: UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-17: ordinal not in range(128). @jnewbery thoughts?

    I also tested on macOS and that (still) works fine.

  51. ken2812221 force-pushed on Jul 26, 2018
  52. ken2812221 commented at 10:15 pm on July 26, 2018: contributor
    Update: Dropped wmain, Use GetCommandLineW and CommandLineToArgvW instead.
  53. Convert command line string to utf8 c7ced5b87a
  54. Use u8string at proper place 30618a83c8
  55. Use u8path at proper place 9f7078eca6
  56. ui: Use u8string and u8path to convert between path and QString ec774c5b97
  57. Make bdb compiled in Unicode mode 023691127f
  58. Pass utf-8 encoded string to bdb api 4a222f6787
  59. ken2812221 force-pushed on Jul 26, 2018
  60. jnewbery commented at 3:28 pm on July 27, 2018: member

    Can you change some of the functional tests to use unicode characters? … @jnewbery thoughts?

    Yes, functional tests should definitely include wallets & labels with unicode characters. We already have a test that covers unicode rpc username/password: https://github.com/bitcoin/bitcoin/blob/89a116dc0b446de0d18a981699a279eeaf6c9ea9/test/functional/rpc_users.py#L32

    wallet_multiwallet.py seems like a good place to add the test.

  61. ken2812221 force-pushed on Jul 28, 2018
  62. ken2812221 force-pushed on Jul 28, 2018
  63. ken2812221 commented at 6:08 pm on July 28, 2018: contributor
    Update: Now we can specify any characters in -datadir.
  64. ken2812221 closed this on Jul 28, 2018

  65. ken2812221 reopened this on Jul 28, 2018

  66. Sjors commented at 1:26 pm on July 30, 2018: member
    Some Travis builds are unhappy:
  67. ken2812221 commented at 2:17 pm on July 30, 2018: contributor
    @Sjors Those needs upstream changes bitcoin-core/leveldb#18, you can see travis result on #13787.
  68. ken2812221 force-pushed on Jul 31, 2018
  69. Make FileLock support utf8 for Windows c3e0340da8
  70. Use _wfopen and _wreopen on Windows 2d86754743
  71. Make leveldb works with utf8 087792e3c0
  72. Use Unicode version of MoveFileEx 4723aed698
  73. Add fsbridge::i/ofstream class 014aa3e61a
  74. Change fs/std::i/ofstream to fsbridge::i/ofstream 2fa7d421d1
  75. ken2812221 force-pushed on Jul 31, 2018
  76. ken2812221 force-pushed on Jul 31, 2018
  77. laanwj removed this from the milestone 0.17.0 on Aug 2, 2018
  78. laanwj added this to the milestone 0.18.0 on Aug 2, 2018
  79. laanwj added the label Windows on Aug 2, 2018
  80. Empact commented at 4:43 am on August 3, 2018: member
    If you call CommandLineToArgvW within gArgs.ParseParameters instead, you can avoid touching src/bitcoind.cpp and src/bench/bench_bitcoin.cpp, at least. There may be other opportunities for minimization.
  81. ken2812221 commented at 7:17 am on August 3, 2018: contributor

    If you call CommandLineToArgvW within gArgs.ParseParameters

    It would break the ParseParameters in tests if we do that.

  82. ken2812221 force-pushed on Aug 4, 2018
  83. ken2812221 commented at 3:27 pm on August 5, 2018: contributor
    I’ll seperate these into many PRs.
  84. ken2812221 closed this on Aug 5, 2018

  85. ken2812221 deleted the branch on Aug 6, 2018
  86. laanwj referenced this in commit 2ddce35abc on Aug 29, 2018
  87. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-22 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me