[PoC] Modernize use of UTF-8 in Windows code #32380

pull hebasto wants to merge 7 commits into bitcoin:master from hebasto:250429-windows-utf8 changing 16 files +73 −75
  1. hebasto commented at 4:55 pm on April 29, 2025: member

    The main goal is to remove deprecated code (removed in C++26).

    This PR demonstrates Microsoft’s modern approach to handling UTF-8:

    Until recently, Windows has emphasized “Unicode” -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, then -A APIs typically operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes.

    TODO:

    • Handle application manifests properly when building with MSVC.
    • Bump the minimum supported Windows version to 1903 (May 2019 Update).
    • Remove all remaining use cases of the deprecated std:wstring_convert.

    Based on #32396.

  2. hebasto added the label Windows on Apr 29, 2025
  3. DrahtBot commented at 4:55 pm on April 29, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32380.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    No conflicts as of last run.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    Using microsoft/setup-msbuild → Using Microsoft/setup-msbuild

  4. hebasto force-pushed on Apr 30, 2025
  5. hebasto commented at 9:45 am on April 30, 2025: member
    Rebased on #32383.
  6. in src/common/args.cpp:878 in 942cea7728 outdated
    890 
    891-std::pair<int, char**> WinCmdLineArgs::get()
    892+WindowsScopedCodePage::~WindowsScopedCodePage()
    893 {
    894-    return std::make_pair(argc, argv);
    895+    SetConsoleOutputCP(m_original_code_page);
    


    laanwj commented at 1:16 pm on April 30, 2025:

    Is this necessary with the manifest?

    Also we already have

    0    SetConsoleCP(CP_UTF8);
    1    SetConsoleOutputCP(CP_UTF8);
    

    in void SetupEnvironment(). It’s not necessary to scope this as we want it for the entire run of the application.


    hebasto commented at 1:43 pm on April 30, 2025:

    Is this necessary with the manifest?

    The manifest is responsible for the active code page, which differs from console input and output code pages. It would be easy to compare outputs of GetACP(), GetConsoleCP() and GetConsoleOutputCP().

    Also we already have

    0    SetConsoleCP(CP_UTF8);
    1    SetConsoleOutputCP(CP_UTF8);
    

    in void SetupEnvironment().

    Thanks! I overlooked those calls. However, with the manifest, setting only the console output code page is sufficient.

    It’s not necessary to scope this as we want it for the entire run of the application.

    The scoped implementation is necessary to restore the console code pages to their state prior to the application’s execution.


    laanwj commented at 1:49 pm on April 30, 2025:

    The manifest is responsible for the active code page, which differs from console input and output code pages. It would be easy to compare outputs of GetACP(), GetConsoleCP() and GetConsoleOutputCP().

    Thanks.

    Thanks! I overlooked those calls. However, with the manifest, setting only the console output code page is sufficient.

    OK.

    The scoped implementation is necessary to restore the console code pages to their state prior to the application’s execution.

    We haven’t been doing this ever. I think it was nice to simply roll this into SetupEnvironment, and not do it in every individual tool. But okay.


    laanwj commented at 1:54 pm on April 30, 2025:
    If we really need to, what if we make SetupEnvironment scoped instead? It’s a more general solution, it doesn’t belong in “common/args”, and would work also if there’s future environment modifications that need reverting before exit.
  7. hebasto force-pushed on Apr 30, 2025
  8. laanwj commented at 7:04 am on May 1, 2025: member

    After this we should revert our custom leveldb unicode patch. This would make the env_windows backend go back to using the A functions, which then take UTF-8 as-is.

    Edit: After this, i could find the following uses of Windows -W WCHAR functions remaining:

    • LevelDB (see above)
    • subprocess: CreateProcessW (upstream issue, i guess)
    • src/random.cpp: CryptAcquireContextW - not actually passing strings, so no problem. Mind this function is deprecated for out-of-scope reasons: #32391
    • src/util/fs.cpp: CreateFileW - converts using file.wstring().c_str()
    • src/util/fs_helpers.cpp: SHGetSpecialFolderPathW - converts using fs::path(WCHAR*)
    • src/qt/guiutil.cpp: GetModuleFileNameW - result passed directly into ISHellLinkW interface, might be better to leave this code (SetStartOnSystemStartup) as-is
    • src/qt/guiutil.cpp: PathRemoveFileSpecW - idem
  9. hebasto force-pushed on May 1, 2025
  10. cmake: Add application manifests when cross-compiling for Windows
    Windows application manifests provide several benefits. However, on the
    master branch, the linker generates and embeds manifests only when
    building with MSVC.
    
    This change unifies manifest embedding for both native and
    cross-compilation.
    aae78531fe
  11. ci: Add "Get bitcoind manifest" steps to Windows CI jobs
    This change makes it easy to verify any changes in the application
    manifests.
    665b2ba999
  12. hebasto commented at 3:30 pm on May 1, 2025: member
    Please review #32396 first.
  13. cmake: Set process code page to UTF-8 on Windows 4075d8c1a0
  14. Remove no longer necessary `WinCmdLineArgs` class
    This change removes one use case of `std::wstring_convert`, which is
    deprecated in C++17 and removed in C++26. Other uses remain for now.
    4b3b50daf6
  15. Switch to ANSI Windows API in `Win32ErrorString()` function
    TODO: Improve test coverage.
    05b43cdac9
  16. Switch to ANSI Windows API in `runCommand()` function d4a1c9cabc
  17. Switch to ANSI Windows API in `fsbridge::fopen()` function
    TODO: Consider switching to `fopen_s()`.
    7bc30de984
  18. hebasto force-pushed on May 1, 2025
  19. hebasto commented at 3:36 pm on May 1, 2025: member
    Rebased on #32396.
  20. maflcko added the label DrahtBot Guix build requested on May 4, 2025
  21. DrahtBot commented at 9:00 am on May 5, 2025: contributor

    Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

    File commit eba5f9c4b63fe46261fbb3e71b9a94832d105b23(master) commit d039c6a5588ed563753e7dbc7598091623a6b72f(pull/32380/merge)
    *-aarch64-linux-gnu-debug.tar.gz dd18c4706ae0a247... 86637bd208268bb1...
    *-aarch64-linux-gnu.tar.gz 4216933d26250c15... e5bbf8a01575dffd...
    *-arm-linux-gnueabihf-debug.tar.gz f972e0bd2981ac67... 8afd5f526dc8f4cf...
    *-arm-linux-gnueabihf.tar.gz 3ca06111b5a591e8... d4c0fd8d5ce3ba86...
    *-arm64-apple-darwin-codesigning.tar.gz 8a9c416ffe2e3d36... 0697f6515271ae7d...
    *-arm64-apple-darwin-unsigned.tar.gz 216c1a1f68356c92... 9bca33ae11db549e...
    *-arm64-apple-darwin-unsigned.zip 1f4d27d16e6c3eea... 4a745d1d4ebdf725...
    *-powerpc64-linux-gnu-debug.tar.gz 8900b88ab21f9f48... 1eb495cd89d88ecd...
    *-powerpc64-linux-gnu.tar.gz c7ed6137d21d3e52... 92b05ec45b154612...
    *-riscv64-linux-gnu-debug.tar.gz aad7744d23e2a136... 6f5ce4a909e4ea0c...
    *-riscv64-linux-gnu.tar.gz d49be6063aa6a124... 0af7c1a58a6167f9...
    *-x86_64-apple-darwin-codesigning.tar.gz c8973dacfc9e519c... 8f70bddec78726b2...
    *-x86_64-apple-darwin-unsigned.tar.gz 9ca6afb860f0b246... 6525d1b7cde740ba...
    *-x86_64-apple-darwin-unsigned.zip a364bd58765172cc... 174395eb35a55891...
    *-x86_64-linux-gnu-debug.tar.gz c11030172812a7c1... 3f1a15b4c2c184f4...
    *-x86_64-linux-gnu.tar.gz f288901ecf194ab8... aa602ac4c259b746...
    *.tar.gz 82d1cd8e8b352d6b... 26375d232dfe0d5d...
    SHA256SUMS.part a0c3bbbfdd25fd25... fffa601b9ae5c99f...
    guix_build.log d4d91f9e2f775124... 21db7fa459cab888...
    guix_build.log.diff 2dadd820439d138b...
  22. DrahtBot removed the label DrahtBot Guix build requested on May 5, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-05 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me