Modernize use of UTF-8 in Windows code #32380

pull hebasto wants to merge 5 commits into bitcoin:master from hebasto:250429-windows-utf8 changing 16 files +27 −80
  1. hebasto commented at 4:55 pm on April 29, 2025: member

    The main goal is to remove deprecated code (removed in C++26).

    This PR employs Microsoft’s modern approach to handling UTF-8:

    Until recently, Windows has emphasized “Unicode” -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, then -A APIs typically operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes.

    TODO:

    • Handle application manifests properly when building with MSVC.
    • Bump the minimum supported Windows version to 1903 (May 2019 Update).
    • Remove all remaining use cases of the deprecated std:wstring_convert.
      • The instance in subprocess.h will be addressed in a follow-up PR, as additional tests are likely needed.
      • The usage in common/system.cpp is handled in #32566.

    Resolves partially #32361.

  2. hebasto added the label Windows on Apr 29, 2025
  3. DrahtBot commented at 4:55 pm on April 29, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32380.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Stale ACK hodlinator

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #33247 (build: Remove CMAKE_SKIP_BUILD_RPATH and SKIP_BUILD_RPATH settings by 151henry151)
    • #33229 (multiprocess: Don’t require bitcoin -m argument when IPC options are used by ryanofsky)
    • #32951 (build: Explicitly set Qt’s AUTO{MOC,RCC,UIC} property per target by hebasto)
    • #31723 (node: Add –debug_runs/-waitfordebugger [DRAFT, rework incoming] by hodlinator)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • // Set the default input/output charset is utf-8 -> // Set the default input/output charset to UTF-8 [fixes ungrammatical phrasing; “to UTF-8” correctly expresses assignment and uppercase “UTF-8” is conventional]

    drahtbot_id_5_m

  4. hebasto force-pushed on Apr 30, 2025
  5. hebasto commented at 9:45 am on April 30, 2025: member
    Rebased on #32383.
  6. in src/common/args.cpp:878 in 942cea7728 outdated
    890 
    891-std::pair<int, char**> WinCmdLineArgs::get()
    892+WindowsScopedCodePage::~WindowsScopedCodePage()
    893 {
    894-    return std::make_pair(argc, argv);
    895+    SetConsoleOutputCP(m_original_code_page);
    


    laanwj commented at 1:16 pm on April 30, 2025:

    Is this necessary with the manifest?

    Also we already have

    0    SetConsoleCP(CP_UTF8);
    1    SetConsoleOutputCP(CP_UTF8);
    

    in void SetupEnvironment(). It’s not necessary to scope this as we want it for the entire run of the application.


    hebasto commented at 1:43 pm on April 30, 2025:

    Is this necessary with the manifest?

    The manifest is responsible for the active code page, which differs from console input and output code pages. It would be easy to compare outputs of GetACP(), GetConsoleCP() and GetConsoleOutputCP().

    Also we already have

    0    SetConsoleCP(CP_UTF8);
    1    SetConsoleOutputCP(CP_UTF8);
    

    in void SetupEnvironment().

    Thanks! I overlooked those calls. However, with the manifest, setting only the console output code page is sufficient.

    It’s not necessary to scope this as we want it for the entire run of the application.

    The scoped implementation is necessary to restore the console code pages to their state prior to the application’s execution.


    laanwj commented at 1:49 pm on April 30, 2025:

    The manifest is responsible for the active code page, which differs from console input and output code pages. It would be easy to compare outputs of GetACP(), GetConsoleCP() and GetConsoleOutputCP().

    Thanks.

    Thanks! I overlooked those calls. However, with the manifest, setting only the console output code page is sufficient.

    OK.

    The scoped implementation is necessary to restore the console code pages to their state prior to the application’s execution.

    We haven’t been doing this ever. I think it was nice to simply roll this into SetupEnvironment, and not do it in every individual tool. But okay.


    laanwj commented at 1:54 pm on April 30, 2025:
    If we really need to, what if we make SetupEnvironment scoped instead? It’s a more general solution, it doesn’t belong in “common/args”, and would work also if there’s future environment modifications that need reverting before exit.
  7. hebasto force-pushed on Apr 30, 2025
  8. laanwj commented at 7:04 am on May 1, 2025: member

    After this we should revert our custom leveldb unicode patch. This would make the env_windows backend go back to using the A functions, which then take UTF-8 as-is.

    Edit: After this, i could find the following uses of Windows -W WCHAR functions remaining:

    • LevelDB (see above)
    • subprocess: CreateProcessW (upstream issue, i guess)
    • src/random.cpp: CryptAcquireContextW - not actually passing strings, so no problem. Mind this function is deprecated for out-of-scope reasons: #32391
    • src/util/fs.cpp: CreateFileW - converts using file.wstring().c_str()
    • src/util/fs_helpers.cpp: SHGetSpecialFolderPathW - converts using fs::path(WCHAR*)
    • src/qt/guiutil.cpp: GetModuleFileNameW - result passed directly into ISHellLinkW interface, might be better to leave this code (SetStartOnSystemStartup) as-is
    • src/qt/guiutil.cpp: PathRemoveFileSpecW - idem
  9. hebasto force-pushed on May 1, 2025
  10. hebasto commented at 3:30 pm on May 1, 2025: member
    Please review #32396 first.
  11. hebasto force-pushed on May 1, 2025
  12. hebasto commented at 3:36 pm on May 1, 2025: member
    Rebased on #32396.
  13. maflcko added the label DrahtBot Guix build requested on May 4, 2025
  14. DrahtBot commented at 9:00 am on May 5, 2025: contributor

    Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

    File commit eba5f9c4b63fe46261fbb3e71b9a94832d105b23(master) commit d039c6a5588ed563753e7dbc7598091623a6b72f(pull/32380/merge)
    *-aarch64-linux-gnu-debug.tar.gz dd18c4706ae0a247... 86637bd208268bb1...
    *-aarch64-linux-gnu.tar.gz 4216933d26250c15... e5bbf8a01575dffd...
    *-arm-linux-gnueabihf-debug.tar.gz f972e0bd2981ac67... 8afd5f526dc8f4cf...
    *-arm-linux-gnueabihf.tar.gz 3ca06111b5a591e8... d4c0fd8d5ce3ba86...
    *-arm64-apple-darwin-codesigning.tar.gz 8a9c416ffe2e3d36... 0697f6515271ae7d...
    *-arm64-apple-darwin-unsigned.tar.gz 216c1a1f68356c92... 9bca33ae11db549e...
    *-arm64-apple-darwin-unsigned.zip 1f4d27d16e6c3eea... 4a745d1d4ebdf725...
    *-powerpc64-linux-gnu-debug.tar.gz 8900b88ab21f9f48... 1eb495cd89d88ecd...
    *-powerpc64-linux-gnu.tar.gz c7ed6137d21d3e52... 92b05ec45b154612...
    *-riscv64-linux-gnu-debug.tar.gz aad7744d23e2a136... 6f5ce4a909e4ea0c...
    *-riscv64-linux-gnu.tar.gz d49be6063aa6a124... 0af7c1a58a6167f9...
    *-x86_64-apple-darwin-codesigning.tar.gz c8973dacfc9e519c... 8f70bddec78726b2...
    *-x86_64-apple-darwin-unsigned.tar.gz 9ca6afb860f0b246... 6525d1b7cde740ba...
    *-x86_64-apple-darwin-unsigned.zip a364bd58765172cc... 174395eb35a55891...
    *-x86_64-linux-gnu-debug.tar.gz c11030172812a7c1... 3f1a15b4c2c184f4...
    *-x86_64-linux-gnu.tar.gz f288901ecf194ab8... aa602ac4c259b746...
    *.tar.gz 82d1cd8e8b352d6b... 26375d232dfe0d5d...
    SHA256SUMS.part a0c3bbbfdd25fd25... fffa601b9ae5c99f...
    guix_build.log d4d91f9e2f775124... 21db7fa459cab888...
    guix_build.log.diff 2dadd820439d138b...
  15. DrahtBot removed the label DrahtBot Guix build requested on May 5, 2025
  16. hebasto force-pushed on May 13, 2025
  17. fanquake referenced this in commit e230affaa3 on May 16, 2025
  18. DrahtBot added the label Needs rebase on May 16, 2025
  19. hebasto force-pushed on May 16, 2025
  20. hebasto force-pushed on May 16, 2025
  21. hebasto renamed this:
    [PoC] Modernize use of UTF-8 in Windows code
    Modernize use of UTF-8 in Windows code
    on May 16, 2025
  22. DrahtBot removed the label Needs rebase on May 16, 2025
  23. hebasto force-pushed on May 16, 2025
  24. hebasto commented at 4:34 pm on May 16, 2025: member
    Rebased on #32537.
  25. DrahtBot added the label CI failed on May 16, 2025
  26. hebasto force-pushed on May 18, 2025
  27. DrahtBot removed the label CI failed on May 18, 2025
  28. hebasto force-pushed on May 19, 2025
  29. hebasto marked this as ready for review on May 19, 2025
  30. in doc/release-notes-empty-template.md:39 in b158b729fc outdated
    34@@ -35,8 +35,8 @@ wallet versions of Bitcoin Core are generally supported.
    35 Compatibility
    36 ==============
    37 
    38-Bitcoin Core is supported and tested on operating systems using the
    39-Linux Kernel 3.17+, macOS 13+, and Windows 10+. Bitcoin
    40+Bitcoin Core is supported and tested on the following operating systems or newer:
    41+Linux Kernel 3.17, macOS 13, and Windows 10 (version 1903). Bitcoin
    


    fanquake commented at 3:48 pm on May 19, 2025:
    Is anything enforcing this, or is the assumption that any Windows 10 system is at least this version? What happens if it isn’t?

    hebasto commented at 3:53 pm on May 19, 2025:

    Is anything enforcing this, or is the assumption that any Windows 10 system is at least this version?

    The latter. My attempt to enforce it was based on a wrong assumption.

    What happens if it isn’t?

    My guess is that the application won’t properly handle any symbols outside its default code page.


    laanwj commented at 7:59 am on May 20, 2025:
    Can we maybe check the codepage early at startup (e.g. SetupEnvironment()) and fail if it isn’t correct? This seems more thorough than any version check.

    hebasto commented at 11:42 am on May 20, 2025:

    Can we maybe check the codepage early at startup (e.g. SetupEnvironment()) and fail if it isn’t correct? This seems more thorough than any version check.

    Thanks! Done.

  31. DrahtBot added the label Needs rebase on May 20, 2025
  32. hebasto force-pushed on May 20, 2025
  33. DrahtBot removed the label Needs rebase on May 20, 2025
  34. hebasto force-pushed on May 20, 2025
  35. hebasto commented at 2:06 pm on May 20, 2025: member
    Changes overlapping with #32566 have been dropped.
  36. in cmake/windows-app.manifest.in:8 in bd43cf8c19 outdated
    0@@ -1,10 +1,15 @@
    1 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    2-<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
    3+<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
    4   <assemblyIdentity
    5       type="win32"
    6       name="org.bitcoincore.${target}"
    7       version="${CLIENT_VERSION_MAJOR}.${CLIENT_VERSION_MINOR}.${CLIENT_VERSION_BUILD}.0"
    8   />
    9+  <asmv3:application>
    


    fanquake commented at 11:38 am on May 21, 2025:
    In 1e826bf91932e2d76969ae40a0c0ef85a990cf2f: Can the commit be updated to better explain why this is needed. Looking at mingw-w64, GetACP has returned CP_UTF8 since it was introduced. I’m guessing the same is happening with MSVC (although that’s less relevant given it doesn’t effect releases)? Given that, and the fact that according to https://github.com/bitcoin/bitcoin/pull/32380/files#r2096044476, we are just going to assume that a user is running a new enough version of Windows, and throw an assertion if the codepage isn’t CP_UTF8, what is the app-manifest actually doing? Given the version setting from #32537 also didn’t work, it’s not clear why any of this is needed.

    laanwj commented at 1:46 pm on May 21, 2025:

    That’s not how it’s supposed to be. The default application codepage is something that the user can configure system-wide (there should be a checkbox: https://exploratory.io/note/exploratory/Enabling-UTF-8-on-Windows-hYc3yWL0).

    The purpose of the application manifest is to override it so it is always UTF-8 for the application. Exactly what the microsoft documentation says here: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page

    This hasn’t changed with newer windows versions, as far as i know, it’s just that there is no usable UTF-8 codepage in earlier ones, so it cannot be set there.

    I’m guessing the same is happening with MSVC (although that’s less relevant given it doesn’t effect releases)?

    It’s absolutely relevant to MSVC-build versions, even if it doesn’t affect the release. If the code assumes the file path locale is UTF-8 and it isn’t all kinds of ugly filename breakage will happen.

    If GetACP always returns CP_UTF8 then i agree the check is meaningless, and we should find a meaningful check for “path codepage is UTF-8”.


    hebasto commented at 3:39 pm on May 21, 2025:

    In 1e826bf: Can the commit be updated to better explain why this is needed. Looking at mingw-w64, GetACP has returned CP_UTF8 since it was introduced. I’m guessing the same is happening with MSVC (although that’s less relevant given it doesn’t effect releases)?

    GetACP being compiled on Windows returns 1252, which is not the UTF-8 code page. And, basically, it would depend on the OS language settings.


    hebasto commented at 3:49 pm on May 21, 2025:

    Looking at mingw-w64, GetACP has returned CP_UTF8 since it was introduced.

    I see it differently. I’ve cross-compiled this code:

    0#include <iostream>
    1#include <windows.h>
    2
    3int main()
    4{
    5    std::cout << GetACP() << "\n";
    6}
    

    and it prints 1252 on Windows.


    laanwj commented at 7:46 pm on May 21, 2025:

    fwiw: GetACP is a windows system function implemented in kernel32.dll, defined in the winnls.h header. There shouldn’t be a mingw-specific definition.

    CP_UTF8 is also defined in that header and has the following value:

    0#define CP_UTF8 65001
    

    1252 is codepage windows-1252, a legacy encoding.


    hodlinator commented at 8:36 am on September 22, 2025:

    Modified bitcoind I installed from a guix-generated version of this PR in an admin-prompt:

    0C:\Program Files\Bitcoin\daemon>mt -nologo -manifest C:\Users\hodlinator\Documents\cp1252.manifest -outputresource:"bitcoind.exe"
    
     0 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
     1 <assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
     2   <assemblyIdentity type="win32" name="org.bitcoincore.bitcoind" version="30.99.0.0"></assemblyIdentity>
     3   <asmv3:application>
     4     <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">
     5-      <activeCodePage>UTF-8</activeCodePage>
     6+      <activeCodePage>Windows-1252</activeCodePage>
     7     </asmv3:windowsSettings>
     8   </asmv3:application>
     9   <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
    10     <security>
    11       <requestedPrivileges>
    12         <requestedExecutionLevel level="asInvoker" uiAccess="false"></requestedExecutionLevel>
    13       </requestedPrivileges>
    14     </security>
    15   </trustInfo>
    16</assembly>
    
    0C:\Program Files\Bitcoin\daemon>bitcoind
    1Assertion failed: GetACP() == CP_UTF8, file ./common/system.cpp, line 87
    

    That further disproves GetACP() returning a hardcoded value.

  37. hebasto commented at 7:47 pm on September 16, 2025: member
    Rebased to refresh the CI.
  38. hebasto force-pushed on Sep 16, 2025
  39. DrahtBot added the label Needs rebase on Sep 18, 2025
  40. hebasto force-pushed on Sep 18, 2025
  41. hebasto commented at 7:25 pm on September 18, 2025: member
    Rebased to resolve conflict with #33333.
  42. DrahtBot removed the label Needs rebase on Sep 18, 2025
  43. in src/qt/bitcoin.cpp:483 in aa830373e8 outdated
    483-    std::tie(argc, argv) = winArgs.get();
    484-#endif
    485-
    486     std::unique_ptr<interfaces::Init> init = interfaces::MakeGuiInit(argc, argv);
    487 
    488     SetupEnvironment();
    


    hodlinator commented at 8:37 am on September 22, 2025:

    (Inline comment in unrelated file)

    Seems like SetupEnvironment is missing from the new bitcoin.cpp

     0diff --git a/src/bitcoin.cpp b/src/bitcoin.cpp
     1index c327827415..766f7bad36 100644
     2--- a/src/bitcoin.cpp
     3+++ b/src/bitcoin.cpp
     4@@ -59,6 +59,8 @@ static void ExecCommand(const std::vector<const char*>& args, std::string_view a
     5
     6 int main(int argc, char* argv[])
     7 {
     8+    SetupEnvironment();
     9+
    10     try {
    11         CommandLine cmd{ParseCommandLine(argc, argv)};
    12         if (cmd.show_version) {
    

    hebasto commented at 11:15 am on September 22, 2025:
    Thanks! Fixed.
  44. hodlinator commented at 8:41 am on September 22, 2025: contributor

    Reviewed aa830373e87325f08b647d41ac3ab5f5196e46c8

    Built a Guix build and tested on Windows. Verified all installed binaries worked. Also verified natively built bitcoind worked.

     0C:\Program Files\Bitcoin\daemon>mt -inputresource:bitcoind.exe;#1 -out:con
     1Microsoft (R) Manifest Tool
     2Copyright (c) Microsoft Corporation.
     3All rights reserved.
     4<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
     5<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
     6  <assemblyIdentity type="win32" name="org.bitcoincore.bitcoind" version="30.99.0.0"></assemblyIdentity>
     7  <asmv3:application>
     8    <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">
     9      <activeCodePage>UTF-8</activeCodePage>
    10    </asmv3:windowsSettings>
    11  </asmv3:application>
    12  <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
    13    <security>
    14      <requestedPrivileges>
    15        <requestedExecutionLevel level="asInvoker" uiAccess="false"></requestedExecutionLevel>
    16      </requestedPrivileges>
    17    </security>
    18  </trustInfo>
    19</assembly>
    20
    21C:\Program Files\Bitcoin>mt -inputresource:C:\Users\hodlinator\bitcoin\build\bin\Release\fuzz.exe;#1 -out:con
    22Microsoft (R) Manifest Tool
    23Copyright (c) Microsoft Corporation.
    24All rights reserved.
    25<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    26<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
    27  <assemblyIdentity type="win32" name="org.bitcoincore.fuzz" version="30.99.0.0"></assemblyIdentity>
    28  <asmv3:application>
    29    <asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">
    30      <activeCodePage>UTF-8</activeCodePage>
    31    </asmv3:windowsSettings>
    32  </asmv3:application>
    33  <trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
    34    <security>
    35      <requestedPrivileges>
    36        <requestedExecutionLevel level="asInvoker" uiAccess="false"></requestedExecutionLevel>
    37      </requestedPrivileges>
    38    </security>
    39  </trustInfo>
    40</assembly>
    

    Checked out master (5aec516b2ce327227ad478ef798828ca3a3db81a) and cherry-picked system.cpp change to add (+ added fprintf):

    0    fprintf(stderr, "GetACP(): %d\n", GetACP());
    1    assert(GetACP() == CP_UTF8);
    

    Ran natively built bitcoind:

    0$ ./build/bin/Release/bitcoind
    1GetACP(): 1252
    2Assertion failed: GetACP() == CP_UTF8, file C:\Users\hodlinator\bitcoin\src\common\system.cpp, line 87
    
  45. hebasto force-pushed on Sep 22, 2025
  46. hebasto commented at 11:17 am on September 22, 2025: member
    The feedback from @hodlinator has been addressed.
  47. hodlinator approved
  48. hodlinator commented at 7:41 pm on September 22, 2025: contributor

    ACK b74a92cdb2c4d9871865a67162f47f97fb7e642c

    Latest push adds missing/previously avoided call to SetupEnvironment() (discovered in #32380 (review)), this entails also adding a dependency from bitcoin.exe on bitcoin_common.

    Tested Windows-native build, running bitcoin node.

  49. maflcko added the label DrahtBot Guix build requested on Sep 24, 2025
  50. DrahtBot added the label Needs rebase on Sep 25, 2025
  51. Set minimum supported Windows version to 1903 (May 2019 Update)
    This version is the minimum required to support the UTF-8 code page
    (CP_UTF8).
    90b5795982
  52. cmake: Set process code page to UTF-8 on Windows
    Additionally, this change adds app manifests to targets that were
    previously missing them.
    e232d0c278
  53. Remove no longer necessary `WinCmdLineArgs` class
    This change removes one use case of `std::wstring_convert`, which is
    deprecated in C++17 and removed in C++26. Other uses remain for now.
    78c9cde5b1
  54. Switch to ANSI Windows API in `Win32ErrorString()` function 2d97ba375e
  55. Switch to ANSI Windows API in `fsbridge::fopen()` function 63f3ea2b04
  56. hebasto force-pushed on Sep 26, 2025
  57. hebasto commented at 10:06 am on September 26, 2025: member
    Rebased to resolve a conflict with the merged #33229.
  58. DrahtBot removed the label Needs rebase on Sep 26, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-26 15:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me