From #13107 (comment), @laanwj suggested that we should keep our internal strings to be always utf-8 encoded.
This PR adds two C++17 compatible macros: u8string and u8path. The reason that I use macros is that developers might not want to pass second parameter utf8 everytime when they call string or create path objects.
I tried to find all string methods and convert them to u8string except for external api calls. Also tried to convert explicit or implicit path to u8path. Tell me if you find something that I missed.
Caution: The user must change their config file to be utf-8 encoded if the file contains any non-ascii characters in it. Before 0.18, they are read as ansi-encoded characters on Windows.
Empact
commented at 11:24 pm on June 10, 2018:
member
ken2812221
commented at 0:20 am on June 11, 2018:
contributor
Can we use path.imbued.locale instead for this?
We still need to use ansi string when we call bdb and leveldb api, so I think that adding a new function is needed. Also, it needs to add boost::locale dependency in order to use boost locale generator.
ken2812221 force-pushed
on Jun 11, 2018
ken2812221 force-pushed
on Jun 11, 2018
ken2812221 force-pushed
on Jun 11, 2018
laanwj added the label
Bug
on Jun 11, 2018
ken2812221 force-pushed
on Jun 12, 2018
ken2812221 force-pushed
on Jun 12, 2018
ken2812221 force-pushed
on Jun 12, 2018
ken2812221 force-pushed
on Jun 12, 2018
ken2812221 renamed this:
[WIP, bugfix] Add u8path and u8string to boost to fix #13103
[bugfix] Add u8path and u8string to boost to fix #13103
on Jun 12, 2018
fanquake requested review from theuni
on Jun 12, 2018
ken2812221 force-pushed
on Jun 12, 2018
theuni
commented at 8:27 pm on June 12, 2018:
member
I’m not really comfortable patching boost to make this work. Firstly because it would mean that only depends-builds can build for Windows, but also because I’m just not familiar with it.
Maybe start by upstreaming it and see where the discussion goes?
ken2812221
commented at 8:32 pm on June 12, 2018:
contributor
Firstly because it would mean that only depends-builds can build for Windows
#11911 (Free CDBEnv instances when not in use by ryanofsky)
#11625 (WIP: Add BitcoinApplication & RPCConsole tests by ryanofsky)
#10973 (Refactor: separate wallet from node by ryanofsky)
#10102 ([experimental] Multiprocess bitcoin by ryanofsky)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Looks like the only use is in util.h, if you implement those as methods in util.cpp, declared in util.h, you can limit variable access to here util.cpp. I may be wrong though! Reviewing on a cellphone. :P
Is it better to do these as methods in the namespace? Preprocessor use should be minimized IMO.
ken2812221
commented at 10:10 pm on June 15, 2018:
I just thought that pass the second parameter at every .string() and path() is pretty annoying and easy to forget. If most developers don’t want to use macro function, I could do a scripted-diff.
I’ll defer to others as to whether this is worthwhile, but absent other options, I would make a function ala u8path for string as well, and hide g_utf8
ken2812221 force-pushed
on Jun 21, 2018
ken2812221
commented at 3:11 pm on June 21, 2018:
contributor
Rebased
unknown approved
DrahtBot added the label
Needs rebase
on Jul 12, 2018
DrahtBot removed the label
Needs rebase
on Jul 12, 2018
ken2812221 renamed this:
[bugfix] Add u8path and u8string to fix #13103
[bugfix] Add u8path and u8string to fix encoding issue for Windows
on Jul 17, 2018
laanwj added this to the milestone 0.17.0
on Jul 19, 2018
DrahtBot added the label
Needs rebase
on Jul 20, 2018
ken2812221 force-pushed
on Jul 20, 2018
fanquake removed the label
Needs rebase
on Jul 21, 2018
fanquake
commented at 4:57 am on July 21, 2018:
member
@theuni We probably want your thoughts here again.
For reference nothing has happened at the upstream PR, however that is less relevant now that we aren’t patching Boost.
Sjors
commented at 9:56 am on July 25, 2018:
member
Compared to master, this PR helps in at least the following scenario:
Windows 10, system locale set to Simplified Chinese (see #13754 for how)
launch QT from search bar or from command prompt: bitcoin-qt -wallet=你好
open console and do getwalletinfo
Before:
After:
DrahtBot added the label
Needs rebase
on Jul 25, 2018
Add u8string and u8path function macroscacd92112e
ken2812221 force-pushed
on Jul 26, 2018
ken2812221
commented at 2:45 am on July 26, 2018:
contributor
@Sjors Now you can use any walletname you want even if you are using English language setting.
DrahtBot removed the label
Needs rebase
on Jul 26, 2018
ken2812221 renamed this:
[bugfix] Add u8path and u8string to fix encoding issue for Windows
[bugfix] Fix encoding issue for Windows
on Jul 26, 2018
jonasschnelli
commented at 11:11 am on July 26, 2018:
contributor
Sjors
commented at 12:15 pm on July 26, 2018:
member
Now you can use any walletname you want even if you are using English language setting
Thanks, that worked.
Can you change some of the functional tests to use unicode characters? I think wallet_multiwallet.py and wallet_labels.py would be good examples, since they use RPC arguments as well as filenames. I renamed label e in labels tests which worked fine on macOS. The multiwallet test was less happy when I renamed w3: UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-17: ordinal not in range(128). @jnewbery thoughts?
I also tested on macOS and that (still) works fine.
ken2812221 force-pushed
on Jul 26, 2018
ken2812221
commented at 10:15 pm on July 26, 2018:
contributor
Update: Dropped wmain, Use GetCommandLineW and CommandLineToArgvW instead.
Convert command line string to utf8c7ced5b87a
Use u8string at proper place30618a83c8
Use u8path at proper place9f7078eca6
ui: Use u8string and u8path to convert between path and QStringec774c5b97
Make bdb compiled in Unicode mode023691127f
Pass utf-8 encoded string to bdb api4a222f6787
ken2812221 force-pushed
on Jul 26, 2018
jnewbery
commented at 3:28 pm on July 27, 2018:
member
Can you change some of the functional tests to use unicode characters? … @jnewbery thoughts?
wallet_multiwallet.py seems like a good place to add the test.
ken2812221 force-pushed
on Jul 28, 2018
ken2812221 force-pushed
on Jul 28, 2018
ken2812221
commented at 6:08 pm on July 28, 2018:
contributor
Update: Now we can specify any characters in -datadir.
ken2812221 closed this
on Jul 28, 2018
ken2812221 reopened this
on Jul 28, 2018
Sjors
commented at 1:26 pm on July 30, 2018:
member
Some Travis builds are unhappy:
ken2812221
commented at 2:17 pm on July 30, 2018:
contributor
@Sjors Those needs upstream changes bitcoin-core/leveldb#18, you can see travis result on #13787.
ken2812221 force-pushed
on Jul 31, 2018
Make FileLock support utf8 for Windowsc3e0340da8
Use _wfopen and _wreopen on Windows2d86754743
Make leveldb works with utf8087792e3c0
Use Unicode version of MoveFileEx4723aed698
Add fsbridge::i/ofstream class014aa3e61a
Change fs/std::i/ofstream to fsbridge::i/ofstream2fa7d421d1
ken2812221 force-pushed
on Jul 31, 2018
ken2812221 force-pushed
on Jul 31, 2018
laanwj removed this from the milestone 0.17.0
on Aug 2, 2018
laanwj added this to the milestone 0.18.0
on Aug 2, 2018
laanwj added the label
Windows
on Aug 2, 2018
Empact
commented at 4:43 am on August 3, 2018:
member
If you call CommandLineToArgvW within gArgs.ParseParameters instead, you can avoid touching src/bitcoind.cpp and src/bench/bench_bitcoin.cpp, at least. There may be other opportunities for minimization.
ken2812221
commented at 7:17 am on August 3, 2018:
contributor
If you call CommandLineToArgvW within gArgs.ParseParameters
It would break the ParseParameters in tests if we do that.
ken2812221 force-pushed
on Aug 4, 2018
ken2812221
commented at 3:27 pm on August 5, 2018:
contributor
I’ll seperate these into many PRs.
ken2812221 closed this
on Aug 5, 2018
ken2812221 deleted the branch
on Aug 6, 2018
laanwj referenced this in commit
2ddce35abc
on Aug 29, 2018
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-01-22 06:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me