Modernize use of UTF-8 in Windows code #32380

pull hebasto wants to merge 5 commits into bitcoin:master from hebasto:250429-windows-utf8 changing 15 files +24 −80
  1. hebasto commented at 4:55 pm on April 29, 2025: member

    The main goal is to remove deprecated code (removed in C++26).

    This PR employs Microsoft’s modern approach to handling UTF-8:

    Until recently, Windows has emphasized “Unicode” -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, then -A APIs typically operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes.

    TODO:

    • Handle application manifests properly when building with MSVC.
    • Bump the minimum supported Windows version to 1903 (May 2019 Update).
    • Remove all remaining use cases of the deprecated std:wstring_convert.
      • The instance in subprocess.h will be addressed in a follow-up PR, as additional tests are likely needed.
      • The usage in common/system.cpp is handled in #32566.
  2. hebasto added the label Windows on Apr 29, 2025
  3. DrahtBot commented at 4:55 pm on April 29, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32380.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    No conflicts as of last run.

  4. hebasto force-pushed on Apr 30, 2025
  5. hebasto commented at 9:45 am on April 30, 2025: member
    Rebased on #32383.
  6. in src/common/args.cpp:878 in 942cea7728 outdated
    890 
    891-std::pair<int, char**> WinCmdLineArgs::get()
    892+WindowsScopedCodePage::~WindowsScopedCodePage()
    893 {
    894-    return std::make_pair(argc, argv);
    895+    SetConsoleOutputCP(m_original_code_page);
    


    laanwj commented at 1:16 pm on April 30, 2025:

    Is this necessary with the manifest?

    Also we already have

    0    SetConsoleCP(CP_UTF8);
    1    SetConsoleOutputCP(CP_UTF8);
    

    in void SetupEnvironment(). It’s not necessary to scope this as we want it for the entire run of the application.


    hebasto commented at 1:43 pm on April 30, 2025:

    Is this necessary with the manifest?

    The manifest is responsible for the active code page, which differs from console input and output code pages. It would be easy to compare outputs of GetACP(), GetConsoleCP() and GetConsoleOutputCP().

    Also we already have

    0    SetConsoleCP(CP_UTF8);
    1    SetConsoleOutputCP(CP_UTF8);
    

    in void SetupEnvironment().

    Thanks! I overlooked those calls. However, with the manifest, setting only the console output code page is sufficient.

    It’s not necessary to scope this as we want it for the entire run of the application.

    The scoped implementation is necessary to restore the console code pages to their state prior to the application’s execution.


    laanwj commented at 1:49 pm on April 30, 2025:

    The manifest is responsible for the active code page, which differs from console input and output code pages. It would be easy to compare outputs of GetACP(), GetConsoleCP() and GetConsoleOutputCP().

    Thanks.

    Thanks! I overlooked those calls. However, with the manifest, setting only the console output code page is sufficient.

    OK.

    The scoped implementation is necessary to restore the console code pages to their state prior to the application’s execution.

    We haven’t been doing this ever. I think it was nice to simply roll this into SetupEnvironment, and not do it in every individual tool. But okay.


    laanwj commented at 1:54 pm on April 30, 2025:
    If we really need to, what if we make SetupEnvironment scoped instead? It’s a more general solution, it doesn’t belong in “common/args”, and would work also if there’s future environment modifications that need reverting before exit.
  7. hebasto force-pushed on Apr 30, 2025
  8. laanwj commented at 7:04 am on May 1, 2025: member

    After this we should revert our custom leveldb unicode patch. This would make the env_windows backend go back to using the A functions, which then take UTF-8 as-is.

    Edit: After this, i could find the following uses of Windows -W WCHAR functions remaining:

    • LevelDB (see above)
    • subprocess: CreateProcessW (upstream issue, i guess)
    • src/random.cpp: CryptAcquireContextW - not actually passing strings, so no problem. Mind this function is deprecated for out-of-scope reasons: #32391
    • src/util/fs.cpp: CreateFileW - converts using file.wstring().c_str()
    • src/util/fs_helpers.cpp: SHGetSpecialFolderPathW - converts using fs::path(WCHAR*)
    • src/qt/guiutil.cpp: GetModuleFileNameW - result passed directly into ISHellLinkW interface, might be better to leave this code (SetStartOnSystemStartup) as-is
    • src/qt/guiutil.cpp: PathRemoveFileSpecW - idem
  9. hebasto force-pushed on May 1, 2025
  10. hebasto commented at 3:30 pm on May 1, 2025: member
    Please review #32396 first.
  11. hebasto force-pushed on May 1, 2025
  12. hebasto commented at 3:36 pm on May 1, 2025: member
    Rebased on #32396.
  13. maflcko added the label DrahtBot Guix build requested on May 4, 2025
  14. DrahtBot commented at 9:00 am on May 5, 2025: contributor

    Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

    File commit eba5f9c4b63fe46261fbb3e71b9a94832d105b23(master) commit d039c6a5588ed563753e7dbc7598091623a6b72f(pull/32380/merge)
    *-aarch64-linux-gnu-debug.tar.gz dd18c4706ae0a247... 86637bd208268bb1...
    *-aarch64-linux-gnu.tar.gz 4216933d26250c15... e5bbf8a01575dffd...
    *-arm-linux-gnueabihf-debug.tar.gz f972e0bd2981ac67... 8afd5f526dc8f4cf...
    *-arm-linux-gnueabihf.tar.gz 3ca06111b5a591e8... d4c0fd8d5ce3ba86...
    *-arm64-apple-darwin-codesigning.tar.gz 8a9c416ffe2e3d36... 0697f6515271ae7d...
    *-arm64-apple-darwin-unsigned.tar.gz 216c1a1f68356c92... 9bca33ae11db549e...
    *-arm64-apple-darwin-unsigned.zip 1f4d27d16e6c3eea... 4a745d1d4ebdf725...
    *-powerpc64-linux-gnu-debug.tar.gz 8900b88ab21f9f48... 1eb495cd89d88ecd...
    *-powerpc64-linux-gnu.tar.gz c7ed6137d21d3e52... 92b05ec45b154612...
    *-riscv64-linux-gnu-debug.tar.gz aad7744d23e2a136... 6f5ce4a909e4ea0c...
    *-riscv64-linux-gnu.tar.gz d49be6063aa6a124... 0af7c1a58a6167f9...
    *-x86_64-apple-darwin-codesigning.tar.gz c8973dacfc9e519c... 8f70bddec78726b2...
    *-x86_64-apple-darwin-unsigned.tar.gz 9ca6afb860f0b246... 6525d1b7cde740ba...
    *-x86_64-apple-darwin-unsigned.zip a364bd58765172cc... 174395eb35a55891...
    *-x86_64-linux-gnu-debug.tar.gz c11030172812a7c1... 3f1a15b4c2c184f4...
    *-x86_64-linux-gnu.tar.gz f288901ecf194ab8... aa602ac4c259b746...
    *.tar.gz 82d1cd8e8b352d6b... 26375d232dfe0d5d...
    SHA256SUMS.part a0c3bbbfdd25fd25... fffa601b9ae5c99f...
    guix_build.log d4d91f9e2f775124... 21db7fa459cab888...
    guix_build.log.diff 2dadd820439d138b...
  15. DrahtBot removed the label DrahtBot Guix build requested on May 5, 2025
  16. hebasto force-pushed on May 13, 2025
  17. fanquake referenced this in commit e230affaa3 on May 16, 2025
  18. DrahtBot added the label Needs rebase on May 16, 2025
  19. hebasto force-pushed on May 16, 2025
  20. hebasto force-pushed on May 16, 2025
  21. hebasto renamed this:
    [PoC] Modernize use of UTF-8 in Windows code
    Modernize use of UTF-8 in Windows code
    on May 16, 2025
  22. DrahtBot removed the label Needs rebase on May 16, 2025
  23. hebasto force-pushed on May 16, 2025
  24. hebasto commented at 4:34 pm on May 16, 2025: member
    Rebased on #32537.
  25. DrahtBot added the label CI failed on May 16, 2025
  26. hebasto force-pushed on May 18, 2025
  27. DrahtBot removed the label CI failed on May 18, 2025
  28. hebasto force-pushed on May 19, 2025
  29. hebasto marked this as ready for review on May 19, 2025
  30. in doc/release-notes-empty-template.md:39 in b158b729fc outdated
    34@@ -35,8 +35,8 @@ wallet versions of Bitcoin Core are generally supported.
    35 Compatibility
    36 ==============
    37 
    38-Bitcoin Core is supported and tested on operating systems using the
    39-Linux Kernel 3.17+, macOS 13+, and Windows 10+. Bitcoin
    40+Bitcoin Core is supported and tested on the following operating systems or newer:
    41+Linux Kernel 3.17, macOS 13, and Windows 10 (version 1903). Bitcoin
    


    fanquake commented at 3:48 pm on May 19, 2025:
    Is anything enforcing this, or is the assumption that any Windows 10 system is at least this version? What happens if it isn’t?

    hebasto commented at 3:53 pm on May 19, 2025:

    Is anything enforcing this, or is the assumption that any Windows 10 system is at least this version?

    The latter. My attempt to enforce it was based on a wrong assumption.

    What happens if it isn’t?

    My guess is that the application won’t properly handle any symbols outside its default code page.


    laanwj commented at 7:59 am on May 20, 2025:
    Can we maybe check the codepage early at startup (e.g. SetupEnvironment()) and fail if it isn’t correct? This seems more thorough than any version check.

    hebasto commented at 11:42 am on May 20, 2025:

    Can we maybe check the codepage early at startup (e.g. SetupEnvironment()) and fail if it isn’t correct? This seems more thorough than any version check.

    Thanks! Done.

  31. DrahtBot added the label Needs rebase on May 20, 2025
  32. Set minimum supported Windows version to 1903 (May 2019 Update)
    This version is the minimum required to support the UTF-8 code page
    (CP_UTF8).
    4caf6617d9
  33. cmake: Set process code page to UTF-8 on Windows
    Additionally, this change adds app manifests to targets that were
    previously missing them.
    1e826bf919
  34. Remove no longer necessary `WinCmdLineArgs` class
    This change removes one use case of `std::wstring_convert`, which is
    deprecated in C++17 and removed in C++26. Other uses remain for now.
    57d5e54e73
  35. Switch to ANSI Windows API in `Win32ErrorString()` function 2b1d22e8f9
  36. hebasto force-pushed on May 20, 2025
  37. DrahtBot removed the label Needs rebase on May 20, 2025
  38. Switch to ANSI Windows API in `fsbridge::fopen()` function bd43cf8c19
  39. hebasto force-pushed on May 20, 2025
  40. hebasto commented at 2:06 pm on May 20, 2025: member
    Changes overlapping with #32566 have been dropped.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-23 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me