refactor: Remove pre-C++20 code, fs::path cleanup #29040

pull maflcko wants to merge 6 commits into bitcoin:master from maflcko:2312-fs-cleanup- changing 13 files +48 −44
  1. maflcko commented at 12:19 pm on December 9, 2023: member

    This:

    • Removes dead code.
    • Avoids unused copies in some places.
    • Adds copies in other places for safety.
  2. DrahtBot commented at 12:19 pm on December 9, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky, stickies-v, achow101
    Concept ACK hebasto
    Stale ACK vasild

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot renamed this:
    refactor: Remove pre-C++20 code, fs::path cleanup
    refactor: Remove pre-C++20 code, fs::path cleanup
    on Dec 9, 2023
  4. DrahtBot added the label Refactoring on Dec 9, 2023
  5. maflcko force-pushed on Dec 9, 2023
  6. DrahtBot added the label CI failed on Dec 9, 2023
  7. DrahtBot removed the label CI failed on Dec 9, 2023
  8. hebasto commented at 2:17 pm on December 9, 2023: member

    Concept ACK.

    This silences the following (clang 18):

    0common/args.cpp:288:31: error: returning variable 'm_cached_blocks_path' by reference requires holding mutex 'cs_args' [-Werror,-Wthread-safety-reference-return]
    1 288 |     if (!path.empty()) return path;
    2     |                               ^
    

    Does it happen on the master branch?

  9. maflcko force-pushed on Dec 9, 2023
  10. vasild commented at 9:45 am on December 11, 2023: contributor

    Does it happen on the master branch? @hebasto, yes, see https://github.com/bitcoin/bitcoin/pull/28774

  11. in src/util/fs.h:18 in facc08096d outdated
    13@@ -14,6 +14,8 @@
    14 #include <ios>
    15 #include <ostream>
    16 #include <string>
    17+#include <system_error>
    18+#include <type_traits>
    


    vasild commented at 9:59 am on December 11, 2023:

    nit: the commit message has comments:

    0 #include <system_error>  // for error_code
    1 #include <type_traits>   // for is_same
    

    personally I like them and find them useful.

  12. in src/util/fs.h:62 in 0d66eea93f outdated
    61-        // (C++20). Convert both to std::string. This method can be removed
    62-        // after switching to C++20.
    63+        const std::u8string& utf8_str{std::filesystem::path::u8string()};
    64+        // Convert to std::string as a convenience for use in RPC code.
    65         return std::string{utf8_str.begin(), utf8_str.end()};
    66     }
    


    vasild commented at 10:12 am on December 11, 2023:

    What about the comment that says “This method can be removed after switching to C++20”? I think indeed this method should be removed entirely.

    In C++17 its prototype, according to the spec is std::string u8string() const which matches ours and is ok. But in C++20 it is: std::u8string u8string() const and it is confusing to have a mismatch. I would better drop this method like the comment says and then fix problems (if any) that arise due to that.


    maflcko commented at 3:43 pm on December 11, 2023:

    “This method can be removed after switching to C++20” was written by me and I think it no longer makes sense.

    But in C++20 it is: std::u8string u8string() const and it is confusing to have a mismatch.

    I think this is fine. fs::path is not an exact API-copy of std::filesystem::path. Some methods are different, or deleted, so this mismatch should be fine.

    If a u8string was returned here, UniValue would have to be updated to take it as well, in which case the conversion would need to happen there. Other parts of the codebase may have to be adjusted as well. I don’t think forcing conversions all over the codebase adds any value, but I am happy to change my view, if there are any good reasons.


    vasild commented at 4:52 pm on December 11, 2023:

    I need to understand this better. What I grasp now is that in C++17 it was ok to store UTF-8 strings in std::string and in C++20 it is not and one has to use std::u8string instead?

    This smells because if callers need std::string why do they call u8string() and not string()!? Is it because it was ok to do that in C++17 because in C++17 std::filesystem::path::string() would return std::string itself and on top of this it was needed because of some complication on Windows?

    On Windows filesystem::path uses wchar_t and it has to be somehow converted to std::string<char>, I guess:

    1. filesystem::path::string() which does that conversion is not ok
    2. filesystem::path::u8string() converts wchar_t to 2.1. C++17: char from which we convert it explicitly to char which is a noop 2.2. C++20: char8_t from which we convert it explicitly to char. Is that ok? Why C++20 has std::u8string if we can store UTF-8 strings in std::string?

    What would be an example in C++20 where it is broken by calling string() (1.) and ok by converting wchar_t->char8_t->char (2.2)?


    maflcko commented at 4:56 pm on December 11, 2023:

    I need to understand this better. What I grasp now is that in C++17 it was ok to store UTF-8 strings in std::string and in C++20 it is not and one has to use std::u8string instead?

    The underlying data structure is just a container to hold bytes. If the bytes are identical, which they are (in this function), it only matters for readers of the code what the data structure is called.

    The important part to getting the right bytes is to call the right function, which is called u8string in all versions of C++ and in this file.


    maflcko commented at 5:10 pm on December 11, 2023:

    2.2. C++20: char8_t from which we convert it explicitly to char. Is that ok? Why C++20 has std::u8string if we can store UTF-8 strings in std::string?

    Yes, conversion should be fine. See https://stackoverflow.com/a/57453713


    maflcko commented at 5:11 pm on December 11, 2023:

    What would be an example in C++20 where it is broken by calling string() (1.) and ok by converting wchar_t->char8_t->char (2.2)?

    This has nothing to do with C++20. If you call the wrong function string() vs u8string(), you will also get the wrong result with C++17.


    vasild commented at 8:51 am on December 12, 2023:

    This has nothing to do with C++20

    ok, rephrase (I mentioned C++20 because C++17 does not have char8_t):

    What would be an example where it is broken by calling string() (1.) and ok by converting wchar_t->char8_t->char (2.2)?


    maflcko commented at 11:51 am on December 12, 2023:

    What would be an example where it is broken by calling string()

    Any place, which correctly calls u8string(), and instead incorrectly calls string() is broken. For example, the code may be broken on systems whose native encoding is not UTF-8.

    See also

     0diff --git a/src/test/fs_tests.cpp b/src/test/fs_tests.cpp
     1index 7cfecb2b22..34168a551b 100644
     2--- a/src/test/fs_tests.cpp
     3+++ b/src/test/fs_tests.cpp
     4@@ -19,7 +19,7 @@ BOOST_AUTO_TEST_CASE(fsbridge_pathtostring)
     5 {
     6     std::string u8_str = "fs_tests_₿_🏃";
     7     BOOST_CHECK_EQUAL(fs::PathToString(fs::PathFromString(u8_str)), u8_str);
     8-    BOOST_CHECK_EQUAL(fs::u8path(u8_str).u8string(), u8_str);
     9+    BOOST_CHECK_EQUAL(std::filesystem::path{fs::u8path(u8_str)}.string(), u8_str);
    10     BOOST_CHECK_EQUAL(fs::PathFromString(u8_str).u8string(), u8_str);
    11     BOOST_CHECK_EQUAL(fs::PathToString(fs::u8path(u8_str)), u8_str);
    12 #ifndef WIN32
    
    0unknown location(0): fatal error: in "fs_tests/fsbridge_pathtostring": class std::system_error: No mapping for the Unicode character exists in the target multi-byte code page.
    

    https://github.com/maflcko/bitcoin-core-with-ci/actions/runs/7180069987/job/19553642111#step:23:477


    ryanofsky commented at 7:31 pm on December 13, 2023:

    In commit “refactor: Remove pre-C++20 fs code” (fa3d9304e80c214c8b073f12a7f4b08c5a94af04)

    re: #29040 (review)

    If a u8string was returned here, UniValue would have to be updated to take it as well, in which case the conversion would need to happen there.

    In theory, no conversions are actually required, right? In theory UniValue could use u8string instead of string internally, and other parts of the codebase could too, so nothing would need to be converted?

    In any case, I think going forward it will be confusing to have a u8string method that doesn’t actually return a u8string. The method isn’t called very many places, so maybe we could follow this up with a scripted-diff renaming u8string to utf8sting or something like that, to avoid confusing it with the standard method.

    Renaming might also be good since it should make the standard u8string method easier to call and allow getting the utf8 paths more efficiently without {utf8_str.begin(), utf8_str.end()} copies.


    vasild commented at 9:18 am on December 14, 2023:

    confusing to have a u8string method that doesn’t actually return a u8string

    My concern exactly!


    maflcko commented at 11:47 am on December 14, 2023:

    In theory, no conversions are actually required, right? In theory UniValue could use u8string instead of string internally, and other parts of the codebase could too, so nothing would need to be converted?

    Not sure if it makes sense to update the whole codebase from std::string to std::u8string, just so that UniValue can accept an std::u8string. I think it would be fine for UniValue to accept both string types, though, then one would have to be “converted”/copied to the other (I don’t think reinterpret_cast+std::move is guaranteed to work from char strings to char8_t strings, is it?)


    ryanofsky commented at 2:26 pm on December 14, 2023:

    re: #29040 (review)

    Not sure if it makes sense to update the whole codebase from std::string to std::u8string, just so that UniValue can accept an std::u8string. I think it would be fine for UniValue to accept both string types, though, then one would have to be “converted”/copied to the other (I don’t think reinterpret_cast+std::move is guaranteed to work from char strings to char8_t strings, is it?)

    Yeah I don’t think it makes sense to make an effort to convert code to std::u8string. To explain “In theory UniValue could use u8string”, I mean the UniValue class could switch to std::u8string members internally, and it could accept u8string values without converting, and it could return both string_view and u8string_view values without converting (just casting). The downside is it would need a conversion step for accepting std::string values, which would not be great right now because the rest of our code base is using std::string. But maybe later if we have new code using u8string it might make sense to switch UniValue then. It could also make sense to switch UniValue to use std::u8string if it turns out there an efficient way to move a std::string value into a std::u8string object, but I think you are right that the standard doesn’t provide a way to do this now and the reinterpret_cast + move option would technically be undefined behavior even if it works in practice.


    vasild commented at 3:15 pm on December 14, 2023:
    Or replace UniValue with https://github.com/nlohmann/json
  13. vasild commented at 10:13 am on December 11, 2023: contributor
    Almost ACK 0d66eea93f1e115b2e9e00ee2e89cd967f826d22, modulo the comment below about the u8string() method.
  14. DrahtBot requested review from hebasto on Dec 11, 2023
  15. refactor: Use C++20 std::chrono::days faea30227b
  16. refactor: Avoid copy/move in fs.h
    The operator accepts a const& reference, so no copy or move is needed.
    See https://en.cppreference.com/w/cpp/filesystem/path/append
    fa2bac08c2
  17. Add tests for C++20 std::u8string
    Also, add missing includes:
    
     #include <system_error>  // for error_code
     #include <type_traits>   // for is_same
    
     #include <cerrno>        // for errno
    fa00098e1a
  18. refactor: Remove pre-C++20 fs code
    Treating std::string as UTF-8 is deprecated in std::filesystem::path
    since C++20.
    
    However, it makes this codebase easier to read and maintain to retain
    the ability for std::string to hold UTF-8.
    fa3d9304e8
  19. ArgsManager: return path by value from GetBlocksDirPath()
    `ArgsManager::m_cached_blocks_path` is protected by
    `ArgsManager::cs_args` and returning a reference to it after releasing
    the mutex is unsafe.
    
    To resolve this, return a copy of the path. This has some performance
    penalty which is presumably ok, given that paths are a few 100s bytes
    at most and `GetBlocksDirPath()` is not called often.
    
    This silences the following (clang 18):
    
    ```
    common/args.cpp:288:31: error: returning variable 'm_cached_blocks_path' by reference requires holding mutex 'cs_args' [-Werror,-Wthread-safety-reference-return]
      288 |     if (!path.empty()) return path;
          |                               ^
    ```
    
    Do the same with
    `ArgsManager::GetDataDir()`,
    `ArgsManager::GetDataDirBase()` and
    `ArgsManager::GetDataDirNet()`.
    856c88776f
  20. maflcko force-pushed on Dec 11, 2023
  21. vasild approved
  22. vasild commented at 4:55 pm on December 11, 2023: contributor

    ACK 856c88776f8486446602476a1c9e133ac0cff510

    I realized that this discussion #29040 (review) is not a blocker for this PR because it merely removes a comment and changes const auto to const u8string which is noop, just a style thing.

  23. fanquake requested review from ryanofsky on Dec 13, 2023
  24. stickies-v approved
  25. stickies-v commented at 1:04 pm on December 13, 2023: contributor
    ACK 856c88776f8486446602476a1c9e133ac0cff510
  26. in src/common/args.cpp:280 in 856c88776f outdated
    276@@ -277,7 +277,7 @@ fs::path ArgsManager::GetPathArg(std::string arg, const fs::path& default_value)
    277     return result.has_filename() ? result : result.parent_path();
    278 }
    279 
    280-const fs::path& ArgsManager::GetBlocksDirPath() const
    281+fs::path ArgsManager::GetBlocksDirPath() const
    


    ryanofsky commented at 8:13 pm on December 13, 2023:

    In commit “ArgsManager: return path by value from GetBlocksDirPath()” (856c88776f8486446602476a1c9e133ac0cff510)

    This change looks ok, but it seems like the clang-18 warning is a false alarm. We know know m_cached_blocks_path will never change after it is written, and we always acquire the cs_args lock before reading m_cached_blocks_path so we can be sure that by the time we do read it we are necessarily seeing the latest value. I don’t see how there could be a race condition here, because it shouldn’t matter how long reader threads hold the lock, as long as they are acquire the lock before reading and we know the value never changes.

    Maybe declaring the fs::path& path const, and writing to m_cached_blocks_path directly could make the warning go away?


    vasild commented at 9:27 am on December 14, 2023:

    The warning tells us that the code is potentially unsafe. Not that it has a certain bug now. It is similar to having a variable declared with GUARDED_BY(), then accessing it without the mutex in such a way that it is safe.

    Maybe declaring the fs::path& path const, and writing to m_cached_blocks_path directly could make the warning go away?

    Would that be like tricking the compiler, going around the warning without actually removing the underlying potential problem?


    ryanofsky commented at 2:19 pm on December 14, 2023:

    re: #29040 (review)

    Maybe declaring the fs::path& path const, and writing to m_cached_blocks_path directly could make the warning go away?

    Would that be like tricking the compiler, going around the warning without actually removing the underlying potential problem?

    I’m actually not sure the change I’m suggesting would be enough to silence the compiler warning, so it may be a moot point. But the compiler just has incomplete information. It has no way of knowing that m_cached_blocks_path string is not written again after the first access. So I think suppressing the warning or declaring the reference const would not be tricks really, just ways of giving the compiler instructions or information.

    I do think this change is fine as is. I was mostly just curious if the warning was actually a false-positive, since it does seem like one.

  27. ryanofsky approved
  28. ryanofsky commented at 8:20 pm on December 13, 2023: contributor

    Code review ACK 856c88776f8486446602476a1c9e133ac0cff510

    I suggested a scripted diff followup, but all these changes seem good, and the followup could happen later. I am also a little disturbed that 856c88776f8486446602476a1c9e133ac0cff510 doesn’t begin with fa but I think I can get over it.

  29. stickies-v commented at 11:52 am on December 14, 2023: contributor

    but I think I can get over it.

    🎅 🎄

  30. maflcko commented at 12:22 pm on December 14, 2023: member

    but I think I can get over it.

    🎅 🎄

    It wasn’t written by me, so I not setting the prefix. This makes it easier for me to (fuzzily) recall which commits were authored by me and whose weren’t, by just looking at the prefix.

  31. maflcko commented at 12:23 pm on December 14, 2023: member
    I pushed a commit, but it is not a scripted-diff.
  32. in src/util/fs.h:63 in facdafaea8 outdated
    53@@ -54,7 +54,7 @@ class path : public std::filesystem::path
    54     // Disallow std::string conversion method to avoid locale-dependent encoding on windows.
    55     std::string string() const = delete;
    56 
    57-    std::string u8string() const
    58+    std::string utf8string() const
    


    ryanofsky commented at 2:50 pm on December 14, 2023:

    In commit “refactor: Rename fs::path::u8string() to fs::path::utf8string()” (facdafaea8cfc2a37e55173d0500794c80b28f7f) ~In commit “ArgsManager: return path by value from GetBlocksDirPath()” (856c88776f8486446602476a1c9e133ac0cff510)~

    Might be good to add documention for this method. Maybe “Return a UTF-8 representation of the path as a std::string, for compatibility with code using std::string. For code using the newer std::u8string type, it is more efficient to call the inherited std::filesystem::path::u8string method instead.” to explain why this exists and also point out that there is an alternative.


    maflcko commented at 3:23 pm on December 14, 2023:

    In commit “ArgsManager: return path by value from GetBlocksDirPath()” (https://github.com/bitcoin/bitcoin/commit/856c88776f8486446602476a1c9e133ac0cff510)

    Wrong commit? :)

    documention

    Thanks, copy-pasted this and removed my one-line comment.

  33. ryanofsky approved
  34. ryanofsky commented at 3:05 pm on December 14, 2023: contributor

    Code review ACK facdafaea8cfc2a37e55173d0500794c80b28f7f.

    Only change since last review is a new commit renaming the fs::path::u8string method.

  35. DrahtBot requested review from stickies-v on Dec 14, 2023
  36. DrahtBot requested review from vasild on Dec 14, 2023
  37. vasild approved
  38. vasild commented at 3:12 pm on December 14, 2023: contributor

    ACK facdafaea8cfc2a37e55173d0500794c80b28f7f

    To verify that nothing has been forgotten in the last commit which renames fs::path::u8string() to fs::path::utf8string() I added std::u8string u8string() const = delete;. The only call to that function is in BOOST_CHECK(fs::path(str8).u8string() == str8); which I guess is intentional.

  39. refactor: Rename fs::path::u8string() to fs::path::utf8string() 6666713041
  40. maflcko force-pushed on Dec 14, 2023
  41. maflcko commented at 3:25 pm on December 14, 2023: member

    To verify that nothing has been forgotten in the last commit which renames fs::path::u8string() to fs::path::utf8string() I added std::u8string u8string() const = delete;

    This should not be needed, because the two functions return different and incompatible types, so any oversight should result in a compile failure as-is.

  42. sipa commented at 3:26 pm on December 14, 2023: member
    @vasild FWIW, Bitcoin Core used to use json-spirit before the introduction of UniValue. The motivating argument was being able to access the string representation of numeric value (because when amounts are passed as JSON “numbers”, we don’t want to roundtrip through double before converting to CAmount).
  43. ryanofsky approved
  44. ryanofsky commented at 3:46 pm on December 14, 2023: contributor

    Code review ACK 66667130416b86208e01a0eb5541a15ea805ac26. Just documentation change since last review.

    To clarify and clean things up more, we could also document the fs::path::utf8string() method and fs::u8path() function as being inverses of each other, and maybe rename fs::u8path to fs::utf8path for consistency. I don’t think either of these things are necessary though, just thoughts for improvement.

  45. DrahtBot requested review from vasild on Dec 14, 2023
  46. maflcko commented at 6:23 pm on December 14, 2023: member

    maybe rename fs::u8path to fs::utf8path for consistency

    I think it is nice that it shadows the deprecated “constructor”, but I am happy to review a follow-up. (Can be a scripted-diff, because no use of the deprecated u8path remain in the code)

  47. stickies-v approved
  48. stickies-v commented at 8:34 pm on December 14, 2023: contributor
    re-ACK 66667130416b86208e01a0eb5541a15ea805ac26
  49. achow101 commented at 9:36 pm on December 14, 2023: member
    ACK 66667130416b86208e01a0eb5541a15ea805ac26
  50. achow101 merged this on Dec 14, 2023
  51. achow101 closed this on Dec 14, 2023

  52. maflcko deleted the branch on Dec 14, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 07:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me