refactor: Forbid calling unsafe fs::path(std::string) constructor and fs::path::string() method #22937

pull ryanofsky wants to merge 2 commits into bitcoin:master from ryanofsky:pr/u8path changing 44 files +346 −195
  1. ryanofsky commented at 8:35 am on September 10, 2021: contributor

    The fs::path class has a std::string constructor which will implicitly convert from strings. Implicit conversions like this are not great in general because they can hide complexity and inefficiencies in the code, but this case is especially bad, because after the transition from boost::filesystem to std::filesystem in #20744 the behavior of this constructor on windows will be more complicated and can mangle path strings. The fs::path class also has a .string() method which is inverse of the constructor and has the same problems.

    Fix this by replacing the unsafe method calls with PathToString and PathFromString function calls, and by forbidding unsafe method calls in the future.

  2. in src/addrdb.cpp:117 in 1e07139279 outdated
    113@@ -114,16 +114,16 @@ bool DeserializeFileDB(const fs::path& path, Data& data, int version)
    114     FILE* file = fsbridge::fopen(path, "rb");
    115     CAutoFile filein(file, SER_DISK, version);
    116     if (filein.IsNull()) {
    117-        LogPrintf("Missing or invalid file %s\n", path.string());
    118+        LogPrintf("Missing or invalid file %s\n", fs::PathToString(path));
    


    MarcoFalke commented at 8:56 am on September 10, 2021:
    Wasn’t path.string() already explicit enough? Also, when should u8string be used instead? Finally, strprintf("%s", path) without any helpers still compiles? When should/shouldn’t that be used?

    ryanofsky commented at 10:21 am on September 10, 2021:

    Wasn’t path.string() already explicit enough?

    The problem with path.string() isn’t the explicitness, it’s the unsafe behavior with the new path class on windows, where it returns a string in an arbitrary encoding based on the local code page. In preparation for this, this PR is disabling the path.string() now so it isn’t misused later, and adding PathToString as a replacement which preserves the behavior we need and is complementary to PathFromStrong.

    Also, when should u8string be used instead?

    u8string/u8path functions should be used for converting strings/paths where the strings must be valid UTF-8 strings. PathToString/PathFromString functions should be used otherwise.

    U8 functions are the right functions to call in JSON-RPC code and Qt string code because Qt strings and JSON requests/responses require valid UTF-8. But otherwise PathToString/PathFromString functions which pass along strings verbatim and avoid making encoding assumptions are more appropriate. PathToString/PathFromString functions are the right functions to call for parsing argv paths, parsing environment variable paths, parsing config paths, writing to log files, and calling POSIX APIs that expect and return native 8-bit path strings. In all these places UTF-8 encoding is typical and nice to have, but not something that should be enforced or required. If someone has their home directory set to a non-utf8 path, PathToString/PathFromString won’t mangle the string or care about the encoding.

    Everything above is actually documented in the followup commit 3e88c9f6b4e84ab5da78f4862266b5ff13bb8fc0 (tag) posted #20744 (comment), but I threw together the documentation really quickly while trying to converge on a solution, so it could use some cleanup and reshuffling, either in this PR or the std::filesystem PR later when these functions stop being identical stubs.

    Finally, strprintf("%s", path) without any helpers still compiles? When should/shouldn’t that be used?

    I can look into that because I’m not too familiar with tinyformat, but I would suspect that this always compiled and never worked at runtime maybe? I’m not sure how it could have worked unless tinyformat knew about boost types.


    MarcoFalke commented at 10:25 am on September 10, 2021:
    I think tinyformat just “emulates” our ::ToString(path), which should also compile (didn’t try)

    ryanofsky commented at 10:51 am on September 10, 2021:

    I think tinyformat just “emulates” our ::ToString(path), which should also compile (didn’t try)

    Apparently tinyformat is using operator«, which boost path and std path both provide. I found I could disable it with

    0template<class CharT, class Traits>
    1std::basic_ostream<CharT,Traits>& operator<<(std::basic_ostream<CharT,Traits>& os, const path& p)
    2{
    3    static_assert(sizeof(CharT) == -1, "Implicit path to string conversion is error-prone and not supported, please use explicit PathToString call");
    4    return os;
    5}
    

    But I found that this also broke some boost tests, and also I’m a little reluctant to make this change anyway since formatting support seems convenient for debugging. I’m leaving this alone for now but happy to do whatever people prefer here.


    MarcoFalke commented at 11:15 am on September 10, 2021:

    PathToString/PathFromString functions should be used otherwise U8 functions are the right functions to call in JSON-RPC code and Qt string code …

    I am not sure how this can be achieved, without something like bilingual_str.

    For example, InitErrors are passed to Qt, as well as stderr, as well as the debug log. If Qt requires u8string and the others require PathToString, then with the current approach here this is impossible to achieve, no?


    ryanofsky commented at 12:56 pm on September 11, 2021:

    Nothing we want to achieve should be impossible to achieve or require a different string type. If your home directory is "/home/\xf0" (invalid utf-8), and Qt needs to display an error message about peers.dat, it should just display it with a replacement character like “Error reading /home/�/.bitcoin/peers.dat”. If we want to return the same file path as JSON-RPC, we might choose a different escaping mechanism like python’s lone surrogates "/home/\udcf0/.bitcoin/peers.dat".

    At the boundaries between our application and external programs and libraries like JSON-RPC and Qt, different ways of representing paths as strings will be necessary. That is why standard encoding functions like u8path and u8string are useful, and why we will choose different encoding functions in different places we are interacting with external code. But internally, any path can always be represented as a path object, and any path can always be represented as a string of bytes, as long as we are consistent about how we interpret those bytes. The PathToString and PathFromString function provide a consistent way to represent path objects as strings of bytes, now that we will no longer be able to rely on the boost::filesystem::path class’s string methods. (The new std::filesystem::path class has similar methods, but unlike boost’s methods their behavior cannot be controlled with an imbue function, so they are unsafe and expose complicated operating system specific behavior and will mangle strings, which is why this PR avoids and prevents calling them.)

  3. DrahtBot commented at 9:40 am on September 10, 2021: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #23280 (init: Coalesce Chainstate loading sequence between {,non-}unittest codepaths by dongcarl)
    • #23203 ([POC] build: static musl libc based bitcoind (with LTO) by fanquake)
    • #22910 ([RFC] Encapsulate asmap in NetGroupManager by jnewbery)
    • #22663 (dbwrapper: wrap options in unique_ptr and use DeleteOptions by Crypt-iQ)
    • #22350 (Rotate debug.log file by LarryRuane)
    • #20974 (test: add test for corrupt wallet bdb logs by ryanofsky)
    • #20744 (Use std::filesystem. Remove Boost Filesystem & System by fanquake)
    • #20435 (util: use stronger-guarantee rename method by vasild)
    • #20331 (allow -loadblock blocks to be unsorted by LarryRuane)
    • #19461 (multiprocess: Add bitcoin-gui -ipcconnect option by ryanofsky)
    • #19460 (multiprocess: Add bitcoin-wallet -ipcconnect option by ryanofsky)
    • #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)
    • #10102 ([experimental] Multiprocess bitcoin by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. DrahtBot added the label Block storage on Sep 10, 2021
  5. DrahtBot added the label GUI on Sep 10, 2021
  6. DrahtBot added the label P2P on Sep 10, 2021
  7. DrahtBot added the label Refactoring on Sep 10, 2021
  8. DrahtBot added the label RPC/REST/ZMQ on Sep 10, 2021
  9. DrahtBot added the label TX fees and policy on Sep 10, 2021
  10. DrahtBot added the label Utils/log/libs on Sep 10, 2021
  11. DrahtBot added the label UTXO Db and Indexes on Sep 10, 2021
  12. DrahtBot added the label Wallet on Sep 10, 2021
  13. DrahtBot added the label Needs rebase on Sep 10, 2021
  14. MarcoFalke removed the label GUI on Sep 10, 2021
  15. MarcoFalke removed the label Wallet on Sep 10, 2021
  16. MarcoFalke removed the label TX fees and policy on Sep 10, 2021
  17. MarcoFalke removed the label UTXO Db and Indexes on Sep 10, 2021
  18. MarcoFalke removed the label RPC/REST/ZMQ on Sep 10, 2021
  19. MarcoFalke removed the label P2P on Sep 10, 2021
  20. MarcoFalke removed the label Block storage on Sep 10, 2021
  21. MarcoFalke removed the label Utils/log/libs on Sep 10, 2021
  22. hebasto commented at 10:54 am on September 10, 2021: member

    mingw complains:

     0wallet/rpcdump.cpp: In lambda function:
     1wallet/rpcdump.cpp:553:76: error: use of deleted function fs::path::path(const string&)
     2  553 |         file.open(request.params[0].get_str(), std::ios::in | std::ios::ate);
     3      |                                                                            ^
     4In file included from ./flatfile.h:11,
     5                 from ./chain.h:11,
     6                 from wallet/rpcdump.cpp:5:
     7./fs.h:40:5: note: declared here
     8   40 |     path(const std::string& string) = delete;
     9      |     ^~~~
    10make[2]: *** [Makefile:10361: wallet/libbitcoin_wallet_a-rpcdump.o] Error 1
    
  23. ryanofsky force-pushed on Sep 10, 2021
  24. ryanofsky commented at 10:57 am on September 10, 2021: contributor
    Rebased 1e071392799def86e1cb6d93e83b985df025f0f8 -> ddbfc51002d3ef3238bc796b1aec159f361828b9 (pr/u8path.1 -> pr/u8path.2, compare) due to conflict with #21222, also fixing win64 build error https://cirrus-ci.com/task/5157617598201856
  25. ryanofsky commented at 11:15 am on September 10, 2021: contributor

    re: #22937 (comment)

    mingw complains:

    Thanks for headsup, and this should be addressed in the last push

  26. ryanofsky commented at 11:19 am on September 10, 2021: contributor

    This PR is basically ready but I still am looking for feedback about about whether best solution to path.string() method and path(string) constructor brokenness on windows is to make these methods uncallable and replace with them with PathToString PathFromString functions like this PR is doing. Or to just override these methods to keep current boost path behavior, and actually do what we want, but not follow the standard path class behavior. Or to take an alternate approach like using the new (since mid-2019) windows build option to force a UTF-8 code page, and not worry about older windows versions.


    Other remaining todos here: improve fs.h documentation and split commit into two parts so fs.h functions are added in the first commit, and code is switched over to call the new functions in a second commit.

  27. MarcoFalke commented at 11:35 am on September 10, 2021: member

    Or to take an alternate approach like using the new (since mid-2019) windows build option to force a UTF-8 code page

    If this fixes all our issues and avoids developer headaches when writing code (which conversion function to use), that seems most preferable.

  28. ryanofsky commented at 12:06 pm on September 10, 2021: contributor

    Or to take an alternate approach like using the new (since mid-2019) windows build option to force a UTF-8 code page

    If this fixes all our issues and avoids developer headaches when writing code (which conversion function to use), that seems most preferable.

    Sorry, I thought I had posted more details/drawbacks about this alternate approach previously, but I don’t think I ever did. Details can be found at https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#set-a-process-code-page-to-utf-8

    The only impact I believe this alternate approach would have for future developers (as opposed to current reviewers) compared to the current approach is that future developers will be able to write p.string() instead of PathToString(p) and write fs::path(s) instead of fs::PathFromString(s). And I think this difference is actually a drawback, not a benefit of that approach, because the fs::path constructor is not explicit, and the implicit conversions can lead to unintended bugs that are less likely to be cause by reviewer. Also I think it is better to have our own PathToString and PathFromString functions because these give a way to document more obviously which functions should be used for our project, instead of leaving developers looking at abstract cppreference pages that do not give concrete information about what the functions actually do on different platforms, or give any guidance on whether it is appropriate to chose .string() or .u8string() or .native() conversion methods in the context of writing bitcoin application code.

    Another drawback of the UTF-8 windows code page build option is that is silently ignored by older versions of windows.

    Another drawback of the UTF-8 windows code page build option is that it is not yet unimplemented, and will require someone with more knowledge than me of both the MSVC and Mingw builds to implement, while this PR is a straightforward code change that is basically ready to go.


    Updated ddbfc51002d3ef3238bc796b1aec159f361828b9 -> d312e529ce904c0dc8244f29424b8cba9c7da9cd (pr/u8path.2 -> pr/u8path.3, compare) to fix https://cirrus-ci.com/task/6285153786920960

  29. DrahtBot removed the label Needs rebase on Sep 10, 2021
  30. ryanofsky force-pushed on Sep 10, 2021
  31. hebasto commented at 12:51 pm on September 10, 2021: member
    Out of curiosity, what is the leveldb approach to handle paths on Windows?
  32. hebasto commented at 1:26 pm on September 10, 2021: member
    friendly ping @sipsorcery
  33. ryanofsky commented at 4:19 pm on September 10, 2021: contributor

    friendly ping @sipsorcery

    :facepalm: To be clear I think that the build hack would be inferior to this PR in every way! I think think the build hack is not worth implementing or even discussing seriously. The only reason I brought it up was for completeness and to illustrate how fragile and unpredictable the std::filesystem::path(std::string) implicit constructor and std::filesystem::path::string() method are, and why I think we should forbid them in favor of explicit PathFromString/PathToString functions that we can call straightforwardly and explain straightforwardly, to preserve correct behavior of boost::filesystem::path, and avoid a small pitfall in std::filesystem::path!

    Out of curiosity, what is the leveldb approach to handle paths on Windows?

    leveldb just uses string paths, not boost::filessystem::path or std::filesystem::path. Handling windows paths is really not that complicated. All you have to do is avoid older code page APIs and use newer APIs that ignore the windows code page. In the new std::filesystem there are two methods that are using the windows code page, instead of ignoring it like we want, and like the boost::filesystem equivalents did. So this PR is just switching code to avoid these two particular methods, and prevent them from being called in the future.

  34. hebasto commented at 5:43 pm on September 10, 2021: member

    Concept ACK on adding own explicit path <-> string conversion functions.

    To be clear I think that the build hack would be inferior to this PR in every way! I think think the build hack is not worth implementing or even discussing seriously.

    Sorry Russ, but I don’t follow what “the build hack” do you mean.

    Out of curiosity, what is the leveldb approach to handle paths on Windows?

    leveldb just uses string paths, not boost::filessystem::path or std::filesystem::path. Handling windows paths is really not that complicated. All you have to do is avoid older code page APIs and use newer APIs that ignore the windows code page. In the new std::filesystem there are two methods that are using the windows code page, instead of ignoring it like we want, and like the boost::filesystem equivalents did. So this PR is just switching code to avoid these two particular methods, and prevent them from being called in the future.

    Thanks for explanation!

  35. ryanofsky commented at 6:00 pm on September 10, 2021: contributor

    Sorry Russ, but I don’t follow what “the build hack” do you mean.

    Sorry, my fault for making this thread really confusing, but I wasn’t sure the reason you were pinging windows build guru sipsorcery and I thought the reason was that you wanted sipsorcery to implement the “build option” discussed here #22937 (comment) which I do not think is a good idea, and I’m sorry I brought up!

  36. hebasto commented at 6:14 pm on September 10, 2021: member
    Do I understand correctly that all changes (except for fs.h and fs.cpp) are “forced” by a compiler due to https://github.com/bitcoin/bitcoin/blob/d312e529ce904c0dc8244f29424b8cba9c7da9cd/src/fs.h#L40-L42?
  37. ryanofsky commented at 6:29 pm on September 10, 2021: contributor

    Do I understand correctly that all changes (except for fs.h and fs.cpp) are “forced” by a compiler

    Yes that is exactly right. I’m working on splitting up the PR into commits and rewriting the description and documentation now that various test bugs have been fixed, but this PR is just forbidding calls to fs::path methods that work ok on unix, but are confusing, badly documented, and can mangle strings on windows. It is replacing those method calls with PathToString/PathFromString calls and u8string/u8path calls which all do the same thing in the boost::filesystem implemenation, but will avoid complexity and bugs after thestd::filesystem switch.

  38. sipsorcery commented at 6:39 pm on September 10, 2021: member

    @hebasto this stuff is a bit above my pay grade I’m afraid. I’m ok with the Win32 API’s, such as CreateFileA for ANSI, CreateFileW for Unicode (UTF-16) etc. but when it comes to the file paths in the standard library std::filesystem and boost I’m out of my depth.

    It’s a pity @ken2812221 isn’t still around, he seemed to have a good grip on this stuff. I’ve always wondered where he got to?

    https://github.com/bitcoin/bitcoin/pulls?q=is%3Apr+author%3Aken2812221+unicode

  39. ryanofsky commented at 6:53 pm on September 10, 2021: contributor

    I’m working on improving documentation so this will be easier to understand, but the idea is just that we need to call functions that convert strings predictably and ignore the current windows “code page”. We need to avoid calling functions that convert strings unpredictably and use the the windows code page.

    The fs::path class happens to have two methods which we are calling which ignore the windows code page currently in the boost::filesystem implementation but unfortunately will start using the windows code page after #20744 which switches to the std::filesystem implementation.

    So this PR is just forbidding calls to those two methods and replacing them with other calls.

  40. in src/fs.h:41 in d312e529ce outdated
    37+{
    38+public:
    39+    using boost::filesystem::path::path;
    40+    path(boost::filesystem::path path) : boost::filesystem::path::path(std::move(path)) {}
    41+    path(const std::string& string) = delete;
    42+    path& operator=(std::string&) = delete;
    


    hebasto commented at 7:14 pm on September 10, 2021:

    I think we should also add

    0    path& operator=(std::string&) = delete;
    1    path& operator/=(const std::string&) = delete;
    2    path& append(const std::string&) = delete;
    

    ryanofsky commented at 5:50 pm on September 13, 2021:

    re: #22937 (review)

    I think we should also add

    Great suggestion! Added these. Trying to block every method that might result in a bad string conversion is probably not worth it, but these methods are pretty frequently used, and blocking these did turn up two more possible bugs.

  41. MarcoFalke commented at 7:18 am on September 11, 2021: member
    @ryanofsky Have you seen my previous comment #22937 (review) ?
  42. in src/fs.cpp:254 in d312e529ce outdated
    248@@ -249,9 +249,9 @@ void ofstream::close()
    249 #else // __GLIBCXX__
    250 
    251 #if BOOST_VERSION >= 107700
    252-static_assert(sizeof(*BOOST_FILESYSTEM_C_STR(fs::path())) == sizeof(wchar_t),
    253+static_assert(sizeof(*BOOST_FILESYSTEM_C_STR(boost::filesystem::path())) == sizeof(wchar_t),
    254 #else
    255-static_assert(sizeof(*fs::path().BOOST_FILESYSTEM_C_STR) == sizeof(wchar_t),
    256+static_assert(sizeof(*boost::filesystem::path().BOOST_FILESYSTEM_C_STR) == sizeof(wchar_t),
    


    MarcoFalke commented at 10:44 am on September 13, 2021:
    Can probably remove all of this code, because BOOST_FILESYSTEM_C_STR is no longer used? If yes, maybe even split up this change

    fanquake commented at 12:48 pm on September 13, 2021:
    I’m happy to base #20744 on this change, if this is the direction we are taking, and save splitting the Boost removal over several PRs.

    MarcoFalke commented at 12:51 pm on September 13, 2021:
    If we want to move forward with removing boost::fs, it could even make sense to split up the #20744 build system changes, so that other pull requests can start using the other C++17 features.

    ryanofsky commented at 12:58 pm on September 13, 2021:

    I’m happy to base #20744 on this change, if this is the direction we are taking, and save splitting the Boost removal over several PRs.

    Yeah, sorry, I’ve been a slow iterating over the documentation and variations on the idea, but I’ll push an updated version of this and rebased version of #20744 today


    ryanofsky commented at 5:50 pm on September 13, 2021:

    re: #22937 (review)

    Can probably remove all of this code, because BOOST_FILESYSTEM_C_STR is no longer used? If yes, maybe even split up this change

    It would be wrong to remove the static asserts without removing the fsbridge::fopen code, which is still used. Followup PR #20744 removes all of this, though. The static asserts here are verifying that the fsbridge::fopen function is making valid assumptions and will work correctly at runtime. The first static assert is for newer versions of boost where that should always be the true, and the second static assert is for older versions of boost where it is only true on some platforms.

    Probably there is some way to write these static asserts without using BOOST_FILESYSTEM_C_STR, but it seems better to leave code unchanged except for adding namespace prefix, which is the same change made other places.

  43. in src/fs.h:59 in d312e529ce outdated
    39+    using boost::filesystem::path::path;
    40+    path(boost::filesystem::path path) : boost::filesystem::path::path(std::move(path)) {}
    41+    path(const std::string& string) = delete;
    42+    path& operator=(std::string&) = delete;
    43+    std::string string() const = delete;
    44+    std::string u8string() const { return boost::filesystem::path::string(); }
    


    MarcoFalke commented at 10:46 am on September 13, 2021:
    Would it make sense to make operator<<(a) call a<<PathToString(*this)?

    ryanofsky commented at 6:00 pm on September 13, 2021:

    re: #22937 (review)

    Would it make sense to make operator<<(a) call a<<PathToString(*this)?

    I’m more comfortable blocking methods that are unsafe than overriding their behavior so they no longer behave as described by the c++ standard. Also PathToString() might be the right string representation for some cases, and the wrong string representation for other cases. For writing to console output on windows for example, the standard behavior encoding to the current code page is better behavior than encoding to UTF-8.

    I would just leave this alone so print debugging works well and tests don’t have to change, unless there is argument that allowing this is really dangerous.


    ken2812221 commented at 8:18 pm on October 11, 2021:

    I prefer to overwrite the path constructor function and .string() to make it compatible with UTF-8 and remain the implicit conversion function. Because u8string() would return std::u8string in C++20 and I don’t think it is a nice idea to handle another type of string. There is no simple way to convert between std::string and std::u8string.

    If we want to avoid std::u8string showing in the code when we bump to C++20, the C++ standard behavior must change anyway. Isn’t it be simpler to modify the path constructor and .string() function? Also, we don’t have to change that many codes.


    ryanofsky commented at 0:41 am on October 12, 2021:

    re: #22937 (review)

    I prefer to overwrite the path constructor function and .string() to make it compatible with UTF-8 and remain the implicit conversion function.

    I think allowing implicit conversion will probably be a good idea a few years in the future when we can require the UTF-8 code page on windows, but right now in October 2021 implicit conversion is unsafe. The fact that it took 5 developers 1 month and 93 comments on this PR to discover all the unsafe implicit conversions that would happen after dropping imbue in #20744 is good evidence that implicit conversions are not safe right now. I think the history of all your past fixes in #15468, #14192, #14128, #13888, #13886, #13883, #13878, #13877, #13866, #13787, and #13734 are another lesson on why we should not silently convert unicode objects to and from strings using the windows code page.

    This PR does not get in the way of enabling implicit conversions in the future. The C++ standard authors obviously think that implicit conversions can be safe, and I’d agree with them as long as implicit conversions use some variant of UTF-8 instead of the current windows code page. The boost developers also thought implicit conversions were safe, but unlike the C++ standard path class, the boost path class actually provided a way to control the implicit conversions using imbue.

    We could also go the opposite direction on implicit conversions the future and limit them even more, see #22937 (review). But this PR is just trying to take a middle path and fix existing bugs that would happen with imbue dropped, while providing a good margin of safety to prevent new bugs being introduced.

    Because u8string() would return std::u8string in C++20 and I don’t think it is a nice idea to handle another type of string. There is no simple way to convert between std::string and std::u8string.

    If we want to avoid std::u8string showing in the code when we bump to C++20, the C++ standard behavior must change anyway. Isn’t it be simpler to modify the path constructor and .string() function? Also, we don’t have to change that many codes.

    I don’t think these comments about the u8string() method are related to whether implicit conversion should be allowed. This PR is just defining the u8string() method temporarily because boost path class doesn’t provide one, and this will be dropped in #20744 because the standard path class does provides one.

    This PR does add 7 calls to the u8string() method (only in Qt and RPC code) and these calls should be able to take advantage of the new std::u8string type whenever that update happens.

  44. ryanofsky force-pushed on Sep 13, 2021
  45. ryanofsky renamed this:
    refactor: Add fs::PathToString, fs::PathFromString, u8string, u8path functions
    refactor: Forbid calling unsafe fs::path(std::string) constructor and fs::path::string() method
    on Sep 13, 2021
  46. ryanofsky commented at 7:27 pm on September 13, 2021: contributor
    Updated d312e529ce904c0dc8244f29424b8cba9c7da9cd -> f735d6031d827e8a1bea5dcb00a11262be9c8a20 (pr/u8path.3 -> pr/u8path.4, compare) with suggested changes, also splitting commits and improving documentation
  47. in src/fs.h:37 in f735d6031d outdated
    33+class path : public boost::filesystem::path
    34+{
    35+public:
    36+    using boost::filesystem::path::path;
    37+
    38+    // Allow path objects arguments for comptability.
    


    hebasto commented at 8:23 pm on September 13, 2021:

    typo:

    0    // Allow path objects arguments for compatibility,.
    
  48. hebasto commented at 8:25 pm on September 13, 2021: member

    Looks great!

    Functional tests still fail on Windows for the combined branch.

  49. ryanofsky force-pushed on Sep 14, 2021
  50. ryanofsky commented at 2:38 pm on September 14, 2021: contributor
    Updated f735d6031d827e8a1bea5dcb00a11262be9c8a20 -> 650470227f9312b627b4bee3f85540f2e00f8bd7 (pr/u8path.4 -> pr/u8path.5, compare) fixing review comment about bad spelling, and fixing error in multiprocess build https://cirrus-ci.com/task/4610801555210240?logs=ci#L3269
  51. hebasto approved
  52. hebasto commented at 7:30 pm on September 14, 2021: member

    ACK 650470227f9312b627b4bee3f85540f2e00f8bd7

    I tested this PR by building #20744 on top of this one. In the combined branch functional test on Windows (MSVC build) fail due to the wrong encoding. They could be fixed by adding std::setlocale into the SetupEnvironment function (just a demo, not a suggestion for the real code). ~But I’m sure these encoding issues are out of this PR scope, and they belong to the fsbridge namespace (see #13866).~ See #22937 (comment)

    UPDATE I continued testing of the combined branch (#20744 on top of this PR), and found another function with unwanted implicit conversion which must be handled:

     0--- a/src/fs.h
     1+++ b/src/fs.h
     2@@ -59,6 +59,11 @@ static inline path operator+(path p1, path p2)
     3     return p1;
     4 }
     5 
     6+static inline path absolute(const path& p)
     7+{
     8+    return std::filesystem::absolute(p);
     9+}
    10+
    11 /**
    12  * Convert path object to byte string. On POSIX, paths natively are byte
    13  * strings so this is trivial. On Windows, paths natively are Unicode, so an
    14diff --git a/src/util/system.cpp b/src/util/system.cpp
    15index d6afe3b0f..cfacb2061 100644
    16--- a/src/util/system.cpp
    17+++ b/src/util/system.cpp
    18@@ -392,7 +392,7 @@ const fs::path& ArgsManager::GetBlocksDirPath() const
    19     if (!path.empty()) return path;
    20 
    21     if (IsArgSet("-blocksdir")) {
    22-        path = fs::absolute(GetArg("-blocksdir", ""));
    23+        path = fs::absolute(fs::PathFromString(GetArg("-blocksdir", "")));
    24         if (!fs::is_directory(path)) {
    25             path = "";
    26             return path;
    27@@ -419,7 +419,7 @@ const fs::path& ArgsManager::GetDataDir(bool net_specific) const
    28 
    29     std::string datadir = GetArg("-datadir", "");
    30     if (!datadir.empty()) {
    31-        path = fs::absolute(datadir);
    32+        path = fs::absolute(fs::PathFromString(datadir));
    33         if (!fs::is_directory(path)) {
    34             path = "";
    35             return path;
    36@@ -799,7 +799,7 @@ fs::path GetDefaultDataDir()
    37 bool CheckDataDirOption()
    38 {
    39     std::string datadir = gArgs.GetArg("-datadir", "");
    40-    return datadir.empty() || fs::is_directory(fs::absolute(datadir));
    41+    return datadir.empty() || fs::is_directory(fs::absolute(fs::PathFromString(datadir)));
    42 }
    43 
    44 fs::path GetConfigFile(const std::string& confPath)
    

    But I’m not sure whether the diff above should belong to this PR.

  53. hebasto commented at 0:43 am on September 15, 2021: member

    I’ve managed to make functional tests pass on Windows (MSVC build) in the combined branch.

    And the following patch was required:

     0--- a/src/addrdb.cpp
     1+++ b/src/addrdb.cpp
     2@@ -194,7 +194,7 @@ std::optional<bilingual_str> LoadAddrman(const std::vector<bool>& asmap, const A
     3     } catch (const DbNotFoundError&) {
     4         // Addrman can be in an inconsistent state after failure, reset it
     5         addrman = std::make_unique<CAddrMan>(asmap, /* deterministic */ false, /* consistency_check_ratio */ check_addrman);
     6-        LogPrintf("Creating peers.dat because the file was not found (%s)\n", path_addr);
     7+        LogPrintf("Creating peers.dat because the file was not found (%s)\n", fs::PathToString(path_addr));
     8         DumpPeerAddresses(args, *addrman);
     9     } catch (const std::exception& e) {
    10         addrman = nullptr;
    

    Could such changes be forced by a compiler? (some discussion above)

  54. laanwj commented at 8:46 am on September 15, 2021: member
    It’s so incredibly stupid that basic, straightforward, conversion functions like these are unsafe and we need to write our own replacements. Sorry, more of a C++ rant than anything else… along with the locale handling disaster it seems like an intentional minefield sometimes.
  55. ryanofsky force-pushed on Sep 16, 2021
  56. ryanofsky commented at 3:56 pm on September 16, 2021: contributor

    Thank you @hebasto for debugging and fixing this! 🙇

    Updated 650470227f9312b627b4bee3f85540f2e00f8bd7 -> 127cce3ac9d0aab6401eafbe8d4f0d2400f1ccf4 (pr/u8path.5 -> pr/u8path.6, compare) with these fixes

    re: #22937 (comment)

    I’ve managed to make functional tests pass on Windows (MSVC build) in the combined branch. … Could such changes be forced by a compiler?

    Yes, added tinyformat override so there will be link errors.

  57. ryanofsky force-pushed on Sep 16, 2021
  58. ryanofsky commented at 4:34 pm on September 16, 2021: contributor
    Updated 127cce3ac9d0aab6401eafbe8d4f0d2400f1ccf4 -> 552e87e342a670808a227653a52134661400afcc (pr/u8path.6 -> pr/u8path.7, compare) fixing unintended formatting changes caused by adding PathToString calls, which made some functional tests fail
  59. kiminuo commented at 3:03 pm on September 19, 2021: contributor

    I did try to search for .string() and I got these occurrences:

     019 results - 11 files
     1
     2src\dbwrapper.cpp:
     3  117  CDBWrapper::CDBWrapper(const fs::path& path, size_t nCacheSize, bool fMemory, bool fWipe, bool obfuscate)
     4  118:     : m_name{path.stem().string()}
     5  119  {
     6
     7src\init.cpp:
     8  1097                    "also be data loss if bitcoin is started while in a temporary directory.\n",
     9  1098:                   args.GetArg("-datadir", ""), fs::current_path().string());
    10  1099      }
    11
    12src\node\blockstorage.cpp:
    13  70          if (fs::is_regular_file(*it) &&
    14  71:             it->path().filename().string().length() == 12 &&
    15  72:             it->path().filename().string().substr(8,4) == ".dat")
    16  73          {
    17  74:             if (it->path().filename().string().substr(0, 3) == "blk") {
    18  75:                 mapBlockFiles[it->path().filename().string().substr(3, 5)] = it->path();
    19  76:             } else if (it->path().filename().string().substr(0, 3) == "rev") {
    20  77                  remove(it->path());
    21
    22src\test\validation_chainstatemanager_tests.cpp:
    23  186      BOOST_TEST_MESSAGE(
    24  187:         "Wrote UTXO snapshot to " << snapshot_path.make_preferred().string() << ": " << result.write());
    25  188  
    26
    27src\util\system.cpp:
    28  121      LOCK(cs_dir_locks);
    29  122:     dir_locks.erase((directory / lockfile_name).string());
    30  123  }
    31
    32  246      auto result = path;
    33  247:     while (result.filename().string() == ".") {
    34  248          result = result.parent_path();
    35
    36src\wallet\bdb.cpp:
    37  831          LOCK(cs_db); // Lock env.m_databases until insert in BerkeleyDatabase constructor
    38  832:         std::string data_filename = data_file.filename().string();
    39  833          std::shared_ptr<BerkeleyEnvironment> env = GetBerkeleyEnv(data_file.parent_path());
    40  834          if (env->m_databases.count(data_filename)) {
    41  835:             error = Untranslated(strprintf("Refusing to load database. Data file '%s' is already loaded.", (env->Directory() / data_filename).string()));
    42  836              status = DatabaseStatus::FAILED_ALREADY_LOADED;
    43
    44src\wallet\bdb.h:
    45  143      /** Return path to main database filename */
    46  144:     std::string Filename() override { return (env->Directory() / strFile).string(); }
    47  145  
    48
    49src\wallet\db.cpp:
    50  22                  it.no_push();
    51  23:                 LogPrintf("%s: %s %s -- skipping.\n", __func__, ec.message(), it->path().string());
    52  24              } else {
    53  25:                 LogPrintf("%s: %s %s\n", __func__, ec.message(), it->path().string());
    54  26              }
    55
    56  53          } catch (const std::exception& e) {
    57  54:             LogPrintf("%s: Error scanning %s: %s\n", __func__, it->path().string(), e.what());
    58  55              it.no_push();
    59
    60src\wallet\test\db_tests.cpp:
    61  18      fs::path data_file = BDBDataFile(path);
    62  19:     database_filename = data_file.filename().string();
    63  20      return GetBerkeleyEnv(data_file.parent_path());
    64
    65src\wallet\test\wallet_tests.cpp:
    66  260  
    67  261:     std::string backup_file = (gArgs.GetDataDirNet() / "wallet.backup").string();
    68  262  
    

    … and I wonder whether cases like the following ones (and probably others) should use the new safe API functions (i.e. fs::PathToString):

    Please correct me if I’m wrong but my understanding is that fs::PathToString (and other helper functions) should be used as much as possible (and if not then it should be probably commented). Am I missing something?

    Thanks for an explanation!

  60. ryanofsky force-pushed on Sep 19, 2021
  61. ryanofsky commented at 7:12 pm on September 19, 2021: contributor

    I did try to search for .string() and I got these occurrences:

    Thanks! Great idea.

    Please correct me if I’m wrong but my understanding is that fs::PathToString (and other helper functions) should be used as much as possible (and if not then it should be probably commented). Am I missing something?

    No that’s right. I just didn’t think to search for these, and the tests that add special characters to directory names might not have caught some of these instances because they were only encoding filenames.

    Thanks for an explanation!

    Thank you! Updated 552e87e342a670808a227653a52134661400afcc -> 8026743822c52b723c7ce1dd322fd32dfc23a79b (pr/u8path.7 -> pr/u8path.8, compare) with replacements


    Updated 8026743822c52b723c7ce1dd322fd32dfc23a79b -> 6ed81b11783d5d71ebd009338fceafa0eb5ddfab (pr/u8path.8 -> pr/u8path.9, compare) to try to fix quoting error https://cirrus-ci.com/task/4573928086568960?logs=functional_tests#L39 Updated 6ed81b11783d5d71ebd009338fceafa0eb5ddfab -> 5cb67e7d8be0c65afba1e4fdfb2073616a9f4fa9 (pr/u8path.9 -> pr/u8path.10, compare) to fix quoting error https://cirrus-ci.com/task/6291803805581312?logs=functional_tests#L39 Updated 5cb67e7d8be0c65afba1e4fdfb2073616a9f4fa9 -> a1b9c0f2a557dce39ae83f1738b7668c586f376e (pr/u8path.10 -> pr/u8path.11, compare) to fix quoting segfault https://cirrus-ci.com/task/6281822167367680?logs=functional_tests#L39

  62. fanquake referenced this in commit 8f022a59b8 on Sep 21, 2021
  63. ryanofsky force-pushed on Sep 22, 2021
  64. ryanofsky closed this on Sep 22, 2021

  65. ryanofsky reopened this on Sep 22, 2021

  66. ryanofsky force-pushed on Sep 22, 2021
  67. ryanofsky force-pushed on Sep 23, 2021
  68. DrahtBot added the label Needs rebase on Sep 23, 2021
  69. ryanofsky force-pushed on Sep 24, 2021
  70. ryanofsky commented at 9:36 pm on September 24, 2021: contributor
    Rebased a1b9c0f2a557dce39ae83f1738b7668c586f376e -> 8a2f979ea78c116dc586da9d18ef1697b297ef01 (pr/u8path.11 -> pr/u8path.12, compare) due to conflict with #19806
  71. DrahtBot removed the label Needs rebase on Sep 24, 2021
  72. in src/fs.h:85 in 8a2f979ea7 outdated
    73+}
    74+
    75+// Allow explicit quoted stream I/O.
    76+static inline auto quoted(const std::string& s)
    77+{
    78+    return boost::io::quoted(s, '&');
    


    hebasto commented at 12:05 pm on September 26, 2021:

    8a2f979ea78c116dc586da9d18ef1697b297ef01

    It is unfortunate that we have to do this.

    Anyway, using of boost::io::quoted requires to include headers as follows:

     0diff --git a/src/fs.h b/src/fs.h
     1index e15dc3051..56934ea29 100644
     2--- a/src/fs.h
     3+++ b/src/fs.h
     4@@ -14,6 +14,13 @@
     5 #include <boost/filesystem.hpp>
     6 #include <boost/filesystem/fstream.hpp>
     7 
     8+#include <boost/version.hpp>
     9+#if BOOST_VERSION >= 107300
    10+#include <boost/io/quoted.hpp>
    11+#else
    12+#include <boost/io/detail/quoted_manip.hpp>
    13+#endif
    14+
    15 /** Filesystem operations and types */
    16 namespace fs {
    17 
    18diff --git a/test/lint/lint-includes.sh b/test/lint/lint-includes.sh
    19index bf7aeb5b4..974da45b9 100755
    20--- a/test/lint/lint-includes.sh
    21+++ b/test/lint/lint-includes.sh
    22@@ -56,6 +56,8 @@ EXPECTED_BOOST_INCLUDES=(
    23     boost/date_time/posix_time/posix_time.hpp
    24     boost/filesystem.hpp
    25     boost/filesystem/fstream.hpp
    26+    boost/io/detail/quoted_manip.hpp
    27+    boost/io/quoted.hpp
    28     boost/multi_index/hashed_index.hpp
    29     boost/multi_index/ordered_index.hpp
    30     boost/multi_index/sequenced_index.hpp
    31@@ -65,6 +67,7 @@ EXPECTED_BOOST_INCLUDES=(
    32     boost/signals2/optional_last_value.hpp
    33     boost/signals2/signal.hpp
    34     boost/test/unit_test.hpp
    35+    boost/version.hpp
    36 )
    37 
    38 for BOOST_INCLUDE in $(git grep '^#include <boost/' -- "*.cpp" "*.h" | cut -f2 -d: | cut -f2 -d'<' | cut -f1 -d'>' | sort -u); do
    

    ryanofsky commented at 11:50 am on September 27, 2021:

    re: #22937 (review)

    It is unfortunate that we have to do this.

    What’s unfortunate that we have to do? If you want to disallow the quoted function (by deleting this), or use a different kind of escaping (by changing this), these things would be possible in a followup. I also think boost’s choice of quoting behavior is a little weird, but this PR is just acting as a pure refactoring not changing what the code is doing.

    Anyway, using of boost::io::quoted requires to include headers as follows:

    I don’t think this is true, since we are already including boost path headers, and just using the same functionality we know they use. The only potential problem I could see here is that it is making assumptions about the boost implementation. But the whole PR is doing that necessarily for a seamless transition from boost::filesystem to std::filesystem in #20744. So I don’t see a practical issue, and certainly no concern after #20744.

    If there is an actual include problem here, I could call std::quoted instead of boost::io::quoted, but it seemed better leave the transition from boost to std for the followup PR and keep the refactoring in this PR more straightforward.


    hebasto commented at 12:38 pm on September 27, 2021:

    Anyway, using of boost::io::quoted requires to include headers as follows:

    I don’t think this is true…

    It was required to compile the combined branch – this PR + #20744.


    ryanofsky commented at 12:47 pm on September 27, 2021:

    It was required to compile the combined branch – this PR + #20744.

    Maybe we can move the discussion to #20744 if there are continuing issues there but I think #20744 should not be calling boost::io::quoted but should switch to std::quoted instead. We can also choose to get rid of this fs::quoted function entirely in a new PR between this PR and #20744 if we are willing to changing logging behavior a little bit.


    hebasto commented at 12:55 pm on September 27, 2021:
    Agree.
  73. hebasto commented at 3:29 pm on September 26, 2021: member

    I tested the combined branch (this PR + #20744) on top of the master (16ccb3a1cd9125eb24a5b45a98099ff98660767a), and found another implicit type conversion case which could be fixed with the following patch:

     0diff --git a/src/fs.h b/src/fs.h
     1index e15dc3051..9d611f44f 100644
     2--- a/src/fs.h
     3+++ b/src/fs.h
     4@@ -71,6 +71,11 @@ static inline path system_complete(const path& p)
     5     return boost::filesystem::system_complete(p);
     6 }
     7 
     8+static inline bool exists(const path& p)
     9+{
    10+    return boost::filesystem::exists(p);
    11+}
    12+
    13 // Allow explicit quoted stream I/O.
    14 static inline auto quoted(const std::string& s)
    15 {
    16diff --git a/src/wallet/rpcwallet.cpp b/src/wallet/rpcwallet.cpp
    17index 4e92889aa..c255a950c 100644
    18--- a/src/wallet/rpcwallet.cpp
    19+++ b/src/wallet/rpcwallet.cpp
    20@@ -2842,7 +2842,7 @@ static RPCHelpMan restorewallet()
    21 
    22     std::string backup_file = request.params[1].get_str();
    23 
    24-    if (!fs::exists(backup_file)) {
    25+    if (!fs::exists(fs::PathFromString(backup_file))) {
    26         throw JSONRPCError(RPC_INVALID_PARAMETER, "Backup file does not exist");
    27     }
    28 
    

    UPDATE:

    fs::copy_file arguments also must be protected from implicit type conversion.

    The updated patch for src/wallet/rpcwallet.cpp looks as following:

     0--- a/src/wallet/rpcwallet.cpp
     1+++ b/src/wallet/rpcwallet.cpp
     2@@ -2840,7 +2840,7 @@ static RPCHelpMan restorewallet()
     3 
     4     WalletContext& context = EnsureWalletContext(request.context);
     5 
     6-    std::string backup_file = request.params[1].get_str();
     7+    auto backup_file = fs::PathFromString(request.params[1].get_str());
     8 
     9     if (!fs::exists(backup_file)) {
    10         throw JSONRPCError(RPC_INVALID_PARAMETER, "Backup file does not exist");
    
  74. ryanofsky force-pushed on Sep 27, 2021
  75. ryanofsky commented at 12:28 pm on September 27, 2021: contributor

    I tested the combined branch (this PR + #20744) on top of the master (16ccb3a), and found another implicit type conversion case which could be fixed with the following patch:

    Another string bites the dust. Thanks!

    Updated 8a2f979ea78c116dc586da9d18ef1697b297ef01 -> 371b06d77a53cd1e9ac10c9f2b0ff9c56a2d8d68 (pr/u8path.12 -> pr/u8path.13, compare) with suggested backup file fix and fs::exists string protection.

  76. hebasto commented at 1:16 pm on September 27, 2021: member
    The CI error “[WinError 10054] An existing connection was forcibly closed by the remote host” seems unrelated to this PR changes. Job restarted.
  77. hebasto approved
  78. hebasto commented at 2:04 pm on September 27, 2021: member
    ACK 371b06d77a53cd1e9ac10c9f2b0ff9c56a2d8d68
  79. kiminuo commented at 5:33 pm on September 27, 2021: contributor

    @ryanofsky I was looking around and I have found this function in system.cpp:

    0bool RenameOver(fs::path src, fs::path dest)
    1{
    2#ifdef WIN32
    3    return MoveFileExW(src.wstring().c_str(), dest.wstring().c_str(),
    4                       MOVEFILE_REPLACE_EXISTING) != 0;
    5#else
    6    int rc = std::rename(src.c_str(), dest.c_str());
    7    return (rc == 0);
    8#endif /* WIN32 */
    9}
    

    (https://github.com/ryanofsky/bitcoin/blob/371b06d77a53cd1e9ac10c9f2b0ff9c56a2d8d68/src/util/system.cpp#L1064-L1073)

    Is .wstring().c_str() correct? I would expect your fs::PathToString fuction to be used here too … or maybe I would expect std::filesystem::rename here to avoid #ifdef completely. (A lot of guessing on my side.)

    Edit: should the function be moved to fs.h potentially?

    Edit: If this concern is valid, then it is useful to search for .wstring().c_str().

  80. ryanofsky commented at 5:53 pm on September 27, 2021: contributor

    Is .wstring().c_str() correct? I would expect your fs::PathToString fuction to be used here too

    This is actually correct and calling PathToString would be wrong here. The idea behind this code is to avoid any conversion to narrow strings. The path object on windows natively uses a wide string, wstring() returns it, and MoveFileExW expects it. So there is no need for this code to change, even though it could be replaced by std::rename anytime I think if someone wanted to do that in a different PR

  81. kiminuo commented at 6:23 pm on September 27, 2021: contributor
  82. ryanofsky force-pushed on Sep 27, 2021
  83. ryanofsky commented at 11:11 pm on September 27, 2021: contributor

    https://github.com/ryanofsky/bitcoin/blob/371b06d77a53cd1e9ac10c9f2b0ff9c56a2d8d68/src/i2p.cpp#L331 - Should fs::PathToString be used here?

    Wow, great find! This should have triggered a build error, and it was a nightmare to figure out what was allowing this to succeed, but it turned out I was overloading the tinyformat formatValue function in an imperfect way that caused the overload to be ignored if fs.h was included after tinyformat.h, and only used when fs.h was included before tinyformat.h. This worked most places, but was broken in the i2p source file. I fixed this by switching formatValue declarations to be template specializations instead of overloads to prevent this fragility and make the check actually reliable.


    Updated 371b06d77a53cd1e9ac10c9f2b0ff9c56a2d8d68 -> b95db7a0393a4da8954f0a63371a673231c36b6e (pr/u8path.13 -> pr/u8path.14, compare) to prevent a path with a locale-specific encoding from being formatted in an I2P error message on windows after #20744, and to fix a valueFormat overload bug that allowed this bad formatting to go undetected.

  84. in src/fs.h:234 in b95db7a039 outdated
    225@@ -103,4 +226,11 @@ namespace fsbridge {
    226 #endif // WIN32 && __GLIBCXX__
    227 };
    228 
    229+// Disallow implicit std::string conversion for tinyformat to avoid
    230+// locale-dependent encoding on windows.
    231+namespace tinyformat {
    232+template<> inline void formatValue(std::ostream&, const char*, const char*, int, const boost::filesystem::path&);
    233+template<> inline void formatValue(std::ostream&, const char*, const char*, int, const fs::path&);
    234+} // namespace tinyformat
    


    hebasto commented at 10:10 am on September 28, 2021:
    A note (not an issue). If code contains a formatValue call which assumes implicit std::string conversion, an error will be generated by a linker, not a compiler.

    hebasto commented at 11:12 am on September 28, 2021:
    0// Disallow implicit std::string conversion for tinyformat to avoid
    1// locale-dependent encoding on windows.
    2namespace tinyformat {
    3template<> inline void formatValue(std::ostream&, const char*, const char*, int, const fs::path&) = delete;
    4} // namespace tinyformat
    

    to force a compiler to report errors?

    Also the explicit template specialization with the boost::filesystem::path type seems redundant, no?


    ryanofsky commented at 1:31 pm on September 28, 2021:

    to force a compiler to report errors?

    Wow, I had no idea you could delete nonmember functions. It is much nicer to have this caught by the compiler instead of the linker. Will update.

    Also the explicit template specialization with the boost::filesystem::path type seems redundant, no?

    This is just to catch cases like strformat("%s", path1 / path2) or whatever else might return a std::filesystem::path instead of an fs::path


    hebasto commented at 1:47 pm on September 28, 2021:

    This is just to catch cases like strformat("%s", path1 / path2) or whatever else might return a std::filesystem::path instead of an fs::path

    We do not use the same assumption for other functions. Why just here?


    ryanofsky commented at 1:57 pm on September 28, 2021:

    We do not use the same assumption for other functions. Why just here?

    If there are other places where it’s easy to catch more encoding problems, I would be happy to add overloads. This case does work a little differently from other cases because it is template specialization not a function overload. A function overload will match when a subclass instance is passed to a parent class parameter, while template specialization will not match because type deduction won’t deduce a parent type from a child type.


    hebasto commented at 1:59 pm on September 28, 2021:

    Thanks for an explanation!

    Maybe add a comment to the code?


    ryanofsky commented at 2:47 pm on September 28, 2021:
    If we did want to be more vigilant about possible encoding bugs, we could have a followup PR that drops public inheritance class path : public boost::filesystem::path and wraps a private member instead, or uses private inheritance, to make the class a whitelist of allowed functions instead of a blacklist of disallowed functions. The changes required would be a superset of changes already in this PR, and it’s unclear whether any bugs this might catch would be severe enough to justify the extra maintenance burden, so I didn’t try this here. But it is something that could be done later.
  85. hebasto approved
  86. hebasto commented at 10:11 am on September 28, 2021: member

    re-ACK b95db7a0393a4da8954f0a63371a673231c36b6e

    I grepped git grep -n -e 'strprintf.*fs::', and wonder why do we use fs::quoted to wrap fs::PathToString only in certain cases?

  87. ryanofsky commented at 1:37 pm on September 28, 2021: contributor

    I grepped git grep -n -e 'strprintf.*fs::', and wonder why do we use fs::quoted to wrap fs::PathToString only in certain cases?

    I’m just preserving previous behavior in all cases. We previously sometimes printed paths with quotes, sometimes printed paths without quotes, and this PR is keeping it all the same so nothing breaks (tests, monitoring tools, etc)

  88. ryanofsky force-pushed on Sep 28, 2021
  89. ryanofsky commented at 1:58 pm on September 28, 2021: contributor
    Updated b95db7a0393a4da8954f0a63371a673231c36b6e -> bf9197c563cc62847a331df38c0a612433ea8beb (pr/u8path.14 -> pr/u8path.15, compare) taking hebasto’s suggestion and making the tinyformat link errors into compile errors
  90. hebasto approved
  91. hebasto commented at 2:10 pm on September 28, 2021: member
    re-ACK bf9197c563cc62847a331df38c0a612433ea8beb, only suggested changes since my previous review.
  92. in src/test/fs_tests.cpp:24 in bf9197c563 outdated
    19+    BOOST_CHECK_EQUAL(fs::PathFromString(u8_str).u8string(), u8_str);
    20+    BOOST_CHECK_EQUAL(fs::PathToString(fs::u8path(u8_str)), u8_str);
    21+#ifndef WIN32
    22+    // On non-windows systems, verify that non-UTF8 strings can be round
    23+    // tripped successfully. On windows, file paths are natively represented as
    24+    // unicode strings, so an invalid UTF-8 sequence that does not map to any
    


    kiminuo commented at 9:08 pm on October 4, 2021:

    nit: I find the sentence “On windows, file paths are natively represented as unicode strings, [..]” slightly confusing because of that “unicode strings”. My immediate understanding is something like “a sequence of bytes in either UTF-8, UTF-16, etc. encoding” which makes me guess for no good reason :)

    According to https://docs.microsoft.com/en-us/cpp/standard-library/filesystem?view=msvc-160#syntax (and your comment too), it holds that Windows uses a null-terminated sequence of wchar_t, encoded as UTF-16 (one or more elements for each character).

    So I would expect something like “On Windows, file paths are natively represented by wchar_t sequences, encoded as UTF-16 strings, [..]”

    Hopefully, I get it right.


    ryanofsky commented at 7:06 am on October 5, 2021:

    nit: I find the sentence “On windows, file paths are natively represented as unicode strings, [..]” slightly confusing because of that “unicode strings”.

    Thanks rewrote this and replaced “unicode strings” with “unicode”. The character types and encodings that windows uses internally are not really relevant here. The point is just that paths on windows are unicode and have to be encoded as bytes, while paths on posix are already bytes and do not get encoded.

    My immediate understanding is something like “a sequence of bytes in either UTF-8, UTF-16, etc. encoding” which makes me guess for no good reason :)

    This isn’t actually true, see https://github.com/rust-lang/rust/issues/12056 as a starting point if you are really curious about this. But this comment is just explaining why the test is skipped on windows.

    According to https://docs.microsoft.com/en-us/cpp/standard-library/filesystem?view=msvc-160#syntax (and your comment too), it holds that Windows uses a null-terminated sequence of wchar_t, encoded as UTF-16 (one or more elements for each character).

    This seems to be a simplification, they are wchar_t, but not strictly valid utf-16. Not too relevant to the API in this PR, though, which is just trying to preserve current behavior and use a safe string encoding.

  93. in src/test/fs_tests.cpp:17 in bf9197c563 outdated
    10@@ -11,6 +11,26 @@
    11 
    12 BOOST_FIXTURE_TEST_SUITE(fs_tests, BasicTestingSetup)
    13 
    14+BOOST_AUTO_TEST_CASE(fsbridge_pathtostring)
    15+{
    16+    std::string u8_str = "fs_tests_₿_🏃";
    17+    BOOST_CHECK_EQUAL(fs::PathToString(fs::PathFromString(u8_str)), u8_str);
    


    kiminuo commented at 9:37 pm on October 4, 2021:

    I wonder if it is worth adding a test for .stem() method too. We use it in dbwrapper.cpp.

    The test can be simply:

    0BOOST_CHECK_EQUAL(fs::PathToString(fs::PathFromString(u8_str).stem()), u8_str);
    

    or something like

    0std::string test_filename = "fs_tests_₿_🏃.dat";
    1std::string expected_stem = "fs_tests_₿_🏃";
    2BOOST_CHECK_EQUAL(fs::PathToString(fs::PathFromString(u8_filename).stem()), expected_stem);
    

    The reasoning behind this is to have a test that tests: string -> path -> modified path -> string. So we would be more sure that there is no std::filesystem implementation that behaves in a surprising way (I’m thinking about Android std::filesystem implementation that seems to be quite new.)

    Anyway, it’s just an idea from a category “maybe nice to have” and not “must have”. Feel free to ignore.

    edit: or .filename() instead of .stem().


    ryanofsky commented at 6:48 am on October 5, 2021:

    I wonder if it is worth adding a test for .stem() method too.

    Thanks, added suggested test

  94. ryanofsky force-pushed on Oct 5, 2021
  95. ryanofsky commented at 7:36 am on October 5, 2021: contributor
    Rebased bf9197c563cc62847a331df38c0a612433ea8beb -> 2f4a115f4aa394323118e0b9d3569a089e280360 (pr/u8path.15 -> pr/u8path.16, compare) after #23136 to include fix for wallet_basic.py Fee of 0.00000255 BTC too low! error https://cirrus-ci.com/task/4686259315539968?logs=ci#L30, and also including new review suggestions
  96. DrahtBot added the label Needs rebase on Oct 5, 2021
  97. refactor: Add fs::PathToString, fs::PathFromString, u8string, u8path functions
    There is no change in behavior. This just helps prepare for the
    transition from the boost::filesystem to the std::filesystem path
    implementation.
    
    Co-authored-by: Kiminuo <kiminuo@protonmail.com>
    b39a477ec6
  98. refactor: Block unsafe fs::path std::string conversion calls
    There is no change in behavior. This just helps prepare for the
    transition from boost::filesystem to std::filesystem by avoiding calls
    to methods which will be unsafe after the transaction to std::filesystem
    to due lack of a boost::filesystem::path::imbue equivalent and inability
    to set a predictable locale.
    
    Co-authored-by: Hennadii Stepanov <32963518+hebasto@users.noreply.github.com>
    Co-authored-by: Kiminuo <kiminuo@protonmail.com>
    Co-authored-by: MarcoFalke <falke.marco@gmail.com>
    6544ea5035
  99. ryanofsky force-pushed on Oct 5, 2021
  100. ryanofsky commented at 4:10 pm on October 5, 2021: contributor
    Rebased 2f4a115f4aa394323118e0b9d3569a089e280360 -> 6544ea5035268025207d2402db2f7d90fde947a6 (pr/u8path.16 -> pr/u8path.17, compare) due to conflict with #22950
  101. DrahtBot removed the label Needs rebase on Oct 5, 2021
  102. hebasto approved
  103. hebasto commented at 9:30 pm on October 6, 2021: member

    re-ACK 6544ea5035268025207d2402db2f7d90fde947a6, only added fsbridge_stem test case, updated comment, and rebased since my previous review. Verified with the following command:

    0$ git range-diff master bf9197c563cc62847a331df38c0a612433ea8beb 6544ea5035268025207d2402db2f7d90fde947a6
    
  104. kiminuo commented at 8:01 am on October 9, 2021: contributor

    @hebasto I compiled on my Windows 10 machine using instructions here: https://github.com/ryanofsky/bitcoin/blob/pr/u8path/build_msvc/README.md#building

    Note that bitcoin source files are located in folder C:\dev\fs_tests_₿_🏃 on my machine.

    Now when I attempt to run bitcoind.exe with a unicode datadir (using PowerShell 7), it just stops for me saying that the directory does not exist:

     0PS C:\dev\fs_tests_₿_🏃\bitcoin-msvs\build_msvc\x64\Release> .\bitcoind.exe -testnet -datadir="C:\dev\fs_tests_₿_🏃\datadir"
     1Error: Specified data directory "C:\dev\fs_tests_₿_🏃\datadir" does not exist.
     2
     3PS C:\dev\fs_tests_₿_🏃\bitcoin-msvs\build_msvc\x64\Release> mkdir "C:\dev\fs_tests_₿_🏃\datadir"
     4
     5    Directory: C:\dev\fs_tests_₿_🏃
     6
     7Mode                 LastWriteTime         Length Name
     8----                 -------------         ------ ----
     9d----          09.10.2021     9:51                datadir
    10
    11PS C:\dev\fs_tests_₿_🏃\bitcoin-msvs\build_msvc\x64\Release> .\bitcoind.exe -testnet -datadir="C:\dev\fs_tests_₿_🏃\datadir"
    12Error: Specified data directory "C:\dev\fs_tests_₿_🏃\datadir" does not exist.
    

    I’m not sure if I have not made an error somewhere but it would be great to know if this is a bug or not (and whose). And if it is, then if it is replicable on other machines.

  105. hebasto commented at 8:30 am on October 9, 2021: member

    @hebasto I compiled on my Windows 10 machine using instructions here: https://github.com/ryanofsky/bitcoin/blob/pr/u8path/build_msvc/README.md#building

    Why not using the updated build instructions?

    In the Command Prompt everything works as expected:

     0>git fetch origin pull/22937/merge:pr22937m-1009
     1>git switch pr22937m-1009
     2>cd .\build_msvc
     3>python msvc-autogen.py
     4>msbuild -property:Configuration=Release -maxCpuCount -verbosity:minimal bitcoin.sln
     5>cd .\x64\Release
     6>.\bitcoind.exe -testnet -datadir="C:\dev\fs_tests_₿_🏃\datadir"
     7Error: Specified data directory "C:\dev\fs_tests_₿_🏃\datadir" does not exist.
     8
     9
    10>mkdir "C:\dev\fs_tests_₿_🏃\datadir"
    11
    12>.\bitcoind.exe -testnet -datadir="C:\dev\fs_tests_₿_🏃\datadir"
    132021-10-09T08:22:03Z Bitcoin Core version v22.99.0-unk (release build)
    142021-10-09T08:22:03Z Assuming ancestors of block 0000000000004ae2f3896ca8ecd41c460a35bf6184e145d91558cece1c688a76 have valid signatures.
    152021-10-09T08:22:03Z Setting nMinimumChainWork=0000000000000000000000000000000000000000000005180c3bd8290da33a1a
    162021-10-09T08:22:03Z Using the 'standard' SHA256 implementation
    172021-10-09T08:22:03Z Default data directory C:\Users\hebasto\AppData\Roaming\Bitcoin
    182021-10-09T08:22:03Z Using data directory C:\dev\fs_tests_₿_🏃\datadir\testnet3
    19...
    
  106. kiminuo commented at 9:56 am on October 9, 2021: contributor

    Why not using the updated build instructions?

    I forgot they have changed. Thanks for letting me know.

    Anyway, I have re-compiled the source code (with merged master) and now it works for me with C:\dev\fs_tests_₿_🏃\datadir path. So it looks good. (I’m not entirely sure what was wrong the first time. Sorry for confusion.)

  107. kiminuo commented at 9:57 am on October 9, 2021: contributor
    ACK 6544ea5035268025207d2402db2f7d90fde947a6
  108. MarcoFalke added the label DrahtBot Guix build requested on Oct 11, 2021
  109. DrahtBot commented at 2:56 pm on October 14, 2021: contributor

    Guix builds

    File commit 5b7210c8745d9572fe94620f848d4ee1304c91a7(master) commit 143235bb9e8bdae1f06c73bb40432a5dcef7fed5(master and this pull)
    SHA256SUMS.part c79a21dc3da6e71b... b983ebfb72b583d8...
    *-aarch64-linux-gnu-debug.tar.gz 108b2ef989bd0d35... 3898376d7605da5d...
    *-aarch64-linux-gnu.tar.gz a5bacdca50e2c3f0... 8a95a295c0788b6a...
    *-arm-linux-gnueabihf-debug.tar.gz b21f70b81f75b1f6... 2a6531d4ec7722d0...
    *-arm-linux-gnueabihf.tar.gz 6c79c07aa1d2df22... 08265d99eab44622...
    *-osx-unsigned.dmg e15275439c514402... fe44836857a69d76...
    *-osx-unsigned.tar.gz 23080798bd4790d0... 54e7d3c2230ef7bd...
    *-osx64.tar.gz f4afcac2e217616d... 9cbeb8b52d026dee...
    *-powerpc64-linux-gnu-debug.tar.gz 7bb92bb01f337a75... 7995f70142a230c2...
    *-powerpc64-linux-gnu.tar.gz 085787f59f3c1a58... dad8aef36d96bec4...
    *-powerpc64le-linux-gnu-debug.tar.gz 2732fa036e1ba621... 1cae331b80b8d374...
    *-powerpc64le-linux-gnu.tar.gz 495e1c10d8950c1a... 30347fbdb5a1f1de...
    *-riscv64-linux-gnu-debug.tar.gz 82c896f2e942879b... 39b532b2edc99221...
    *-riscv64-linux-gnu.tar.gz ebc9d158e7fe286e... 8f76c93958777a8d...
    *-win-unsigned.tar.gz 4a871e218e240ccb... 70f7493a26b2abd0...
    *-win64-debug.zip 0ac6d7beea9e7a67... f108c4ac98fb6df1...
    *-win64-setup-unsigned.exe 73d5ba8460a67851... be36ccc82ed58c0b...
    *-win64.zip 104adfb4cd021131... 1e316e8ba80be236...
    *-x86_64-linux-gnu-debug.tar.gz 896c0ecaefff84eb... baab942104ce73f7...
    *-x86_64-linux-gnu.tar.gz 9cc9de13c910b2a4... ad1c11a6bd713e48...
    *.tar.gz 0d51a99bd81f38ec... eba1905c58e6ac72...
    guix_build.log 463c2d4bbaa53638... 3df07f1a2625978b...
    guix_build.log.diff e06b14caa048bdb8...
  110. DrahtBot removed the label DrahtBot Guix build requested on Oct 14, 2021
  111. laanwj commented at 8:01 am on October 15, 2021: member
    Code review ACK 6544ea5035268025207d2402db2f7d90fde947a6
  112. laanwj merged this on Oct 15, 2021
  113. laanwj closed this on Oct 15, 2021

  114. Fabcien referenced this in commit c68977858e on Jan 11, 2022
  115. Fabcien referenced this in commit 21a1f1ba22 on Jan 11, 2022
  116. Fabcien referenced this in commit f2a46b8255 on Jan 11, 2022
  117. Fabcien referenced this in commit 4bed870fb6 on Jan 11, 2022
  118. Fabcien referenced this in commit 2665ae4c45 on Jan 11, 2022
  119. Fabcien referenced this in commit e5645bc7fa on Jan 11, 2022
  120. Fabcien referenced this in commit 3758d64b90 on Jan 11, 2022
  121. Fabcien referenced this in commit feb2a01c35 on Jan 11, 2022
  122. dekm referenced this in commit fe2293e6c1 on Oct 27, 2022
  123. delta1 referenced this in commit 2b8eef42dd on May 11, 2023
  124. bitcoin locked this on Aug 9, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me