fix WIN32 boost::filesystem::path issues when using special chars for datadir path #6093
pull jonasschnelli wants to merge 2 commits into bitcoin:master from jonasschnelli:2015/05/fix_win_boost_path changing 1 files +5 −3-
jonasschnelli commented at 11:32 am on May 1, 2015: contributorfixes #6078 Meant for 0.11 branch.
-
don't imbue boost::filesystem::path with locale "C" on windows
fixes https://github.com/bitcoin/bitcoin/issues/6078
-
laanwj commented at 12:15 pm on May 1, 2015: memberGood catch. I already fixed this once before, people really shouldn’t be touching this ‘magic’ code. Maybe add a comment referring to the issue.
-
laanwj added the label Bug on May 1, 2015
-
laanwj added the label Windows on May 1, 2015
-
laanwj added this to the milestone 0.10.0 on May 1, 2015
-
jonasschnelli commented at 1:03 pm on May 1, 2015: contributor
Right. This is not the right approach. Was trying to go down the rabbit hole of win/unicode handling… Can’t find a solution. It seems that it could also be a compiler wchar issue. It looks like that the default locale is
C
?I could also guess that it might be connected to https://github.com/bitcoin/bitcoin/blob/master/src/util.cpp#L672
SHGetSpecialFolderPathA
. The A stands for ANSI. There is also a SHGetSpecialFolderPathW equivalent which supports WCHAR. -
laanwj commented at 1:48 pm on May 1, 2015: memberIn windows the file system locale depends on the user locale. I think it only works if boost uses the same locale as windows. But I’m not sure anymore how this was determined. I got it to work once, hence it worked in 0.10… (using the W* functions is out of the question unless you use them everywhere and that’d involve a lot of platform specific code as well as widechar to UTF-8 conversion)
-
dexX7 commented at 1:54 pm on May 1, 2015: contributor
@jonasschnelli: using the “C” locale to imbue
boost::filesystem::path
indeed triggers the failure on Windows in this context.There is more than one problem tackled by
SetupEnvironment()
:- On POSIX systems, messed up environment settings can cause a failure (Python test).
- During the deinitialization, an internal facet pointer of
boost::path
is deleted, and ifboost::path
was not initialized by the main thread, it causes failures such as #3136 (and related).
The current situation:
- As default, the “C” locale is used.
- On POSIX systems, the environment locale is used (via
std::locale("")
). - On POSIX systems, to detect bad environment settings, an exception thrown by
std::locale("")
is caught. - On POSIX systems, and if there are bad environment settings, the “C” locale is used as fallback.
- As workaround, to explicitly force the initialization of the internal facet pointer by the main thread,
boost::filesystem::path::imbue()
is called. - Bad initialization of
boost::filesystem::boost::path
can result in exceptions, which are currently not caught. - On Windows systems, the “C” locale appearingly does not cover the whole character set, which can result in (6), as shown by #6078.
In practise, the current 0.10.1 can result in a failure on Russian (and other) Windows, and earlier versions are affected by the deinitialization issue, resulting in faulty Boost test executions, as well as other failures during the shutdown, such as the crash reported in the context of rebuilding the block database.
-
laanwj commented at 3:47 pm on May 1, 2015: member
Arguably the new problem is worse, as it prevents people from running the client at all with some characters in their data directories, instead of a rare shutdown race. What about:
0 std::locale loc("C"); 1 // On most POSIX systems (e.g. Linux, but not BSD) the environment's locale 2 // may be invalid, in which case the "C" locale is used as fallback. 3 try { 4 loc = std::locale(""); // Raises a runtime error if current locale is invalid 5 } catch (const std::runtime_error&) { 6#if !defined(WIN32) && !defined(MAC_OSX) && !defined(__FreeBSD__) && !defined(__OpenBSD__) 7 setenv("LC_ALL", "C", 1); 8#endif 9 } 10 // The path locale is lazy initialized and to avoid deinitialization errors 11 // in multithreading environments, it is set explicitly by the main thread. 12 boost::filesystem::path::imbue(loc);
This would imbue the system locale on all platforms and should avoid the race condition, as well as the data directory problems. Needs to be tested though..
-
jonasschnelli commented at 4:33 pm on May 1, 2015: contributor
@laanwj i tried you proposed code and it does not fix the problem. Maybe it would be worth to revert #5877 and find a proper solution.
What really surprises me: At least since 0.9.2 (didn’t try further down), bitcoind and Bitcoin-Qt does not run with a datadir-path containing special Chinese or other Asian chars (Windows only).
-
dexX7 commented at 5:45 pm on May 1, 2015: contributor
boost::filesystem::path::imbue() returns the “previously used locale”, which should be the “right one”, if called in a pristine environment, or to put it another way: if imbue() wouldn’t be called at all, then this appears to be the one fs::path would use.
In the following patch, a dummy locale is used to extract the internal locale, to use it explicitly, which seems to be a preferable choice, because the goal is not to modify the locale at all, but to prevent deinitialization issues, and the bad-environment-fallback also kicks in earlier.
0diff --git a/src/util.cpp b/src/util.cpp 1index 4fea18b..68fb326 100644 2--- a/src/util.cpp 3+++ b/src/util.cpp 4@@ -713,18 +713,20 @@ void RenameThread(const char* name) 5 6 void SetupEnvironment() 7 { 8- std::locale loc("C"); 9 // On most POSIX systems (e.g. Linux, but not BSD) the environment's locale 10 // may be invalid, in which case the "C" locale is used as fallback. 11 #if !defined(WIN32) && !defined(MAC_OSX) && !defined(__FreeBSD__) && !defined(__OpenBSD__) 12 try { 13- loc = std::locale(""); // Raises a runtime error if current locale is invalid 14+ std::locale(""); // Raises a runtime error if current locale is invalid 15 } catch (const std::runtime_error&) { 16 setenv("LC_ALL", "C", 1); 17 } 18 #endif 19 // The path locale is lazy initialized and to avoid deinitialization errors 20 // in multithreading environments, it is set explicitly by the main thread. 21+ // A dummy locale is used to extract the internal default locale, used by 22+ // boost::filesystem::path, which is then used to explicitly imbue the path. 23+ std::locale loc = boost::filesystem::path::imbue(std::locale::classic()); 24 boost::filesystem::path::imbue(loc); 25 }
I tested 0.9.3, 0.10.1, another patch and this one, and it’s behavior appears to be similar to 0.9.3 (i.e. it can handle paths, which 0.10.1 can’t handle).
Example paths and test results:
- https://gist.github.com/dexX7/7394a947b29c7d84c710#file-results_bitcoind-log
- https://gist.github.com/dexX7/7394a947b29c7d84c710#file-results_bitcoin-qt-log
Neither the newly patched version, 0.9.3, or 0.10.0, are able to use
"E:\LocaleTests\юзза"
as datadir.When choosing
"E:\LocaleTests\юзза"
as datadir via bitcoin-qt 0.9.3, then the error message is in German, while the patched version, as well as bitcoin-qt 0.10.0, shows an English message, even though the rest of the UI is actually translated. -
theuni commented at 8:23 pm on May 1, 2015: member
I think that’s a bit circuitous. It works, but not for obvious reasons.
The returned loc from boost is the classic locale plus custom boost facet for windows. Otherwise, you could just replace the first call with std::locale().
The below is greatly simplified and I think it does what we want. Note that std::locale::global("") does not work as intended on windows, presumably because we’re statically linking libstdc++. The basic C setlocale() does though.
0void SetupEnvironment() 1{ 2 if (!setlocale(LC_ALL, "")) 3 { 4 setenv("LC_ALL", "C", 1); 5 setlocale(LC_ALL, "C"); 6 } 7 boost::filesystem::path::imbue(std::locale()); 8}
All we’re really trying to accomplish here is to teach boost how to handle locale-specific char conversions for paths, right? Afaik we don’t mess with c++ locales elsewhere.
The above accomplishes that, won’t throw exceptions, and I believe it’s portable. Tested and verified with a Chinese profile/username/locale on win7, and on Linux with a busted locale via
LC_LANG=bad ./bitcoind
-
dexX7 commented at 9:30 pm on May 1, 2015: contributor
@theuni: when building on Ubuntu 14.04, with
HOST=x86_64-w64-mingw32
, and the dependency packages provided viadepends
, I had to wrapsetenv
with#if !defined(WIN32) ...
, because it would otherwise cause a build error.Back on Windows 8.1 x64, when using bitcoin-qt.exe, the tricky paths, such as
..\œ
and..\€@ä®
, are accepted and work very well, however, when using..\юзза
, a runtime error is raised.I was not able to use that path with any version or patch, though 0.9.3, and the build with the dummy-workaround, respond with
"Error: Specified data directory "E:\LocaleTests\????" does not exist."
instead.The behavior of bitcoind.exe 0.9.3, the dummy-workaround patch, and your simple patch appears to be similar.
-
jonasschnelli commented at 8:45 am on May 2, 2015: contributor
Updated this PR with @theunis solution (fixed: added a
#if !defined(WIN32)
forsetenv()
). Built through gitian: https://builds.jonasschnelli.ch/pulls/6093/Won’t run on win7 with a username containing Chinese chars (as it was with 0.9.3, 0.10.0, probably it was always like this). But won’t also run on win7 with a username containing Russian chars (where 0.9.3 and 0.10.0 was running okay).
Short term we need a solution to fix #6078 without reintroducing #3136. Long term we need a solution who can handle all possible windows charsets.
-
theuni commented at 2:43 pm on May 2, 2015: member
Ok, I’ve done more debugging here. tl;dr: After poking some more with even more variables, I think @dexX7’s change is the right one. @jonasschnelli It works fine for me with a Chinese username. Turns out there’s another variable: selected system “Format” in system location settings. Looks like that’s what influences the default locale at runtime rather than the system locale.
Some testing using
printf("locale %s:"setlocale(LC_ALL,NULL))
: With a Chinese format selected:locale: Chinese (Simplified)_People's Republic of China.936
With English selected:English_United States.1252
So.. makes sense. Locale is per-user.
When the Format is set to Chinese, a Chinese username works. When the format is left (default?) in English, the Chinese username doesn’t work.
This is the case because when we use the user’s locale (Format), paths for that language should just work.
However.
Using @dexX7’s trick, Chinese username with English Format does work. This is because of boost’s usage of its own codecvt, which queries the Win API for the codepage type: https://github.com/boostorg/filesystem/blob/boost-1.55.0/src/windows_file_codecvt.cpp#L39
So it looks like that’s actually the way to go.
-
dexX7 commented at 4:14 pm on May 2, 2015: contributor
@theuni: my idea was basically to get access and capture the whole following block, which uses different facets, depending on the OS and build: boost-1.55.0/src/path.cpp#L792-L873, and then use it to properly “initialize” Boost path by the main thread with the correct locale, after taking care of bad environment settings.
Nevertheless, it would probably make sense to catch the exception, or handle failures related to the datadir. In your simpler solution I can trigger a runtime error, if I use the path
"E:\LocaleTests\юзза"
(which is a legit path) on a German localized Windows, while 0.9.3 and my patch results in"Error: Specified data directory "E:\LocaleTests\????" does not exist."
. I’m not entirely sure, where this is coming from, but ideally there should be a more expressive user faced message, with clear instructions to use a “compatible” path, if possibleFWIW: https://github.com/dexX7/bitcoin/commit/c9a37843bc58bce12ece6bcf896339eeadc58ccd @laanwj’s words are still ringing in my ear.. :/
This more and more looks like just some occult incantation to ward off the evil spirits of bad locales :)
-
sipa commented at 4:16 pm on May 2, 2015: memberCan’t we all just agree to go back to EBCDIC?
-
laanwj commented at 8:49 am on May 4, 2015: member
I preferred PETSCII :)
Question: at least for the 0.10 branch, can you code this so that it doesn’t affect behavior on non-Windows? I’d hate to release a ‘fix’ then break the Linux issue again, where an invalid locale prevents the application from starting.
Using @dexX7’s trick, Chinese username with English Format does work. This is because of boost’s usage of its own codecvt, which queries the Win API for the codepage type: https://github.com/boostorg/filesystem/blob/boost-1.55.0/src/windows_file_codecvt.cpp#L39
Phew – that’s suble. Thanks for shedding some light on this dark matter. I agree that @dexX7’s swap-the-locale trick, though circuitous. seems to be the best way forward.
-
in src/util.cpp: in 426ee48f86 outdated
737+ { 738+#if !defined(WIN32) 739 setenv("LC_ALL", "C", 1); 740- } 741 #endif 742- // The path locale is lazy initialized and to avoid deinitialization errors
laanwj commented at 8:52 am on May 4, 2015:Please correct instead of remove the comments. Esoteric knowledge about locales is needed to understand this on reading, heck, we will have forgotten about the rationale for this in a few months.jonasschnelli force-pushed on May 4, 2015jonasschnelli force-pushed on May 4, 2015jonasschnelli force-pushed on May 4, 2015jonasschnelli force-pushed on May 4, 2015jonasschnelli force-pushed on May 4, 2015jonasschnelli force-pushed on May 4, 2015jonasschnelli force-pushed on May 4, 2015jonasschnelli renamed this:
don't imbue boost::filesystem::path with locale "C" on windows
fix WIN32 boost::filesystem::path issues when using special chars for datadir path
on May 4, 2015jonasschnelli commented at 12:10 pm on May 4, 2015: contributorUpdated to make use of @dexX7 solution (https://github.com/bitcoin/bitcoin/pull/6093#issuecomment-98185779). I think this is okay for 0.11.
If there are no complains, i’ll open a PR for the 0.10 branch with the same solution but only touching Win32 behavior (some ifdefs).
theuni commented at 1:58 pm on May 4, 2015: member@dexX7 Thanks for sticking with this and explaining the issue well. It looks like it took me a little while to figure out exactly what you already had, I just had to work through it myself for it to make sense. Nice detective work! @laanwj Agreed on making it windows only for 0.10, and on adding comments.
Additionally, we’re depending on internal boost workings here that may change in the future, I wonder if it’d be worthwhile to add a quick test to use ‘./bitcoind.exe -datadir=c:\something\non-latin’. Sadly, I don’t think we could trust wine to faithfully reproduce the issue, but at least we’d have a set of tests that could be run manually.
If that’s possible, we’d probably want to bake it into the test_bitcoin.exe binary to rule out python/shell middle-man issues.
Edit: thinking on it a little more, using -datadir for a test is overkill. We could just add a quick unit-test to add a subdir inside the current profile and touch a file in there.
dexX7 commented at 3:52 pm on May 4, 2015: contributor@laanwj: Question: at least for the 0.10 branch, can you code this so that it doesn’t affect behavior on non-Windows?
Is this the case? Travis is happy with the change applied on top of 0.10.1, and the bad-environment test passes as well. It does not bring back #3136, and at least to my understanding, it should restore the behavior of pre-0.10, but without the multithreading-deinitialization issue. But please don’t give my understanding much weight here.
All this is based on the assumption that path::codecvt() is called via
boost::filesystem::path
at some point, which then undergoes a similar path aspath::imbue()
does, to return the original locale.@theuni: Thanks for sticking with this and explaining the issue well.
It’s the least I can do, given that my fix for #3136 (deinitialization issue), and the follow up #5950 (bad environment handling), ultimately led in this situation.
We could just add a quick unit-test to add a subdir inside the current profile and touch a file in there.
Having more tests for this would be incredibly useful.
There might be a twist though: the unit tests (via TestingSetup) appear to use
pathTemp = GetTempPath() / ...
for the temporary test datadir, and this probably throws, even before any tests are executed, ifpath
can’t handle paths inside the current profile, becauseGetTempPath()
returns\Users\{username}\AppData\Local\Temp
as per default on Windows Vista+.Just brainstorming on this: what I tested manually might be wrapped into a Python test, which starts and stops nodes, using a bunch of datadirs with “exotic” characters. Unfortunally I’m not sure, which test data could be used for it, given that, appearingly, the system’s localization plays a significant role here (e.g.
Ϫۨ
passes on my system, whileюзза
, which seems to work fine for the original poster of #6078, doesn’t).Tackling this from another perspective: ideally there would also be some form of mitigation in the context of
GetDataDir()
([create datadir] and related), to a) catch potential exceptions, and b) provide some form of guidance for the user, or more telling messages, if it actually happens.Edit: where I’m getting: there are probably other related issues in the context of the filesystem, which are not related to localization. For example, using a datadir without permission, results in a segfault.
[squashme] simplify SetupEnvironment() (by dexX7) 3da7849007in src/util.cpp: in fda4da8025 outdated
736 } catch (const std::runtime_error&) { 737 setenv("LC_ALL", "C", 1); 738 } 739 #endif 740- // The path locale is lazy initialized and to avoid deinitialization errors 741+ // The path locale is lazy initialized and to avoid deinitialization errors
laanwj commented at 3:16 pm on May 8, 2015:Duplicated comment :)
jonasschnelli commented at 8:07 am on May 10, 2015:Fixed.jonasschnelli force-pushed on May 10, 2015jonasschnelli commented at 8:12 am on May 10, 2015: contributorI lost track of this PR. How we proceed here? @theuni @dexX7? Is this ready?
My tests still tell me that there are problems with special char datadir on windows (as it always was the case). Maybe this PR should concentrate on fixing #6078 without reintroducing #3136 (solved by #5877). There will still be cases where bitcoind/qt won’t start up. As example: if you use a Chinese username on a english windows.
dexX7 commented at 9:30 am on May 10, 2015: contributor@jonasschnelli: As example: if you use a Chinese username on a english windows.
It’s the best solution I’ve found, though this is still very unfortunate. As far as I can see, it restores the behavior of 0.9.3, without reintroducing #3136.
jonasschnelli commented at 9:38 am on May 10, 2015: contributorlaanwj merged this on May 10, 2015laanwj closed this on May 10, 2015
laanwj referenced this in commit 23254131a3 on May 10, 2015laanwj commented at 12:27 pm on May 10, 2015: memberCherry-picked to 0.10 as 424ae66 (the change is now subtle enough that no special path for WIN32 is necessary)theuni commented at 5:00 pm on May 10, 2015: memberpost-merge ACK.laanwj referenced this in commit 424ae6629b on May 11, 2015mart2038 commented at 8:33 pm on May 29, 2015: nonepost-merge comments. Kudos to you guys for sticking with a win32 issue, but… Why? ‘They’ say that coding for *nix is too human resource hungry due to intra-distribution incompatibilities… hmmmmm, Oooookay, if ’they’ say so. @sipa @laanwj EBCDIC, PETSCII You must have been rich! “When I were a lad” (olde Englysh comedy) we programmed our 4004 directly in binary using mechanical switches. Our output device was four LED’s. Encoding? Who needs it? ;-)reddink referenced this in commit 200b5b8c93 on May 27, 2020MarcoFalke locked this on Sep 8, 2021
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-22 00:12 UTC
More mirrored repositories can be found on mirror.b10c.me