Improve fs::PathToString documentation #23522

pull ryanofsky wants to merge 1 commits into bitcoin:master from ryanofsky:pr/fsd changing 3 files +34 −21
  1. ryanofsky commented at 1:46 am on November 16, 2021: member

    Add a developer note about avoiding fs::PathToString in RPCs, and improve some other fs::PathToString comments.

    Developer note might have been useful in two recent review comments:

  2. Improve fs::PathToString documentation 9b575f1c73
  3. DrahtBot added the label UTXO Db and Indexes on Nov 16, 2021
  4. unknown approved
  5. MarcoFalke added the label Docs on Nov 16, 2021
  6. MarcoFalke removed the label UTXO Db and Indexes on Nov 16, 2021
  7. MarcoFalke commented at 7:35 am on November 16, 2021: member
    Is there any way to enforce this without looking at the code with human eyes? Are the tests going to fail if we switch to std filesystem and have an incorrect string conversion?
  8. laanwj commented at 9:52 am on November 16, 2021: member

    Documentation review ACK 9b575f1c734c052b695ce921fb6412b22c18fdb4

    Are the tests going to fail if we switch to std filesystem and have an incorrect string conversion?

    Yes, especially the Windows tests will fail, as the situation is most dire there. Both the functional and unit tests, as they both have some tests that use unicode characters in paths (that are not in any of the “narrow” codepages).

  9. hebasto approved
  10. hebasto commented at 11:05 am on November 16, 2021: member

    ACK 9b575f1c734c052b695ce921fb6412b22c18fdb4

    Thanks for improving documentation on that topic!

  11. in src/fs.h:96 in 9b575f1c73
    93@@ -94,31 +94,34 @@ static inline path operator+(path p1, path p2)
    94 
    95 /**
    96  * Convert path object to byte string. On POSIX, paths natively are byte
    


    shaavan commented at 1:19 pm on November 16, 2021:
    0 * Convert path object to a byte string. On POSIX, paths natively are byte
    

    fanquake commented at 10:05 am on January 11, 2022:
    Done in #20744.
  12. in src/fs.h:100 in 9b575f1c73
     93@@ -94,31 +94,34 @@ static inline path operator+(path p1, path p2)
     94 
     95 /**
     96  * Convert path object to byte string. On POSIX, paths natively are byte
     97- * strings so this is trivial. On Windows, paths natively are Unicode, so an
     98- * encoding step is necessary.
     99+ * strings, so this is trivial. On Windows, paths natively are Unicode, so an
    100+ * encoding step is necessary. The inverse of \ref PathToString is \ref
    101+ * PathFromString. The strings returned and parsed by these functions can be
    102+ * used to call POSIX APIs, and for roundtrip conversion, logging, and
    


    shaavan commented at 1:19 pm on November 16, 2021:
    0 * used to call POSIX APIs and for roundtrip conversion, logging, and
    

    fanquake commented at 10:05 am on January 11, 2022:
    I don’t think this is more correct, have left out of #20744.
  13. in src/fs.h:108 in 9b575f1c73
    124+ * Because \ref PathToString and \ref PathFromString functions don't specify an
    125+ * encoding, they are meant to be used internally, not externally. They are not
    126+ * appropriate to use in applications requiring UTF-8, where
    127+ * fs::path::u8string() and fs::u8path() methods should be used instead. Other
    128+ * applications could require still different encodings. For example, JSON, XML,
    129+ * or URI applications might prefer to use higher level escapes (\uXXXX or
    


    shaavan commented at 1:19 pm on November 16, 2021:
    0 * or URI applications might prefer to use higher-level escapes (\uXXXX or
    

    fanquake commented at 10:05 am on January 11, 2022:
    Done in #20744.
  14. shaavan approved
  15. shaavan commented at 1:21 pm on November 16, 2021: contributor

    ACK 9b575f1c734c052b695ce921fb6412b22c18fdb4

    This PR intends to add documentation for the PathToString functions. This PR:

    • Clarifies where it is safe to use this command and where it used to be replaced with other functions.
    • Rewords the comments in the fs.h file detailing pathToString usage to make them more logically and verbally sound.
    • Move the Windows and POSIX implementation notes under the PathToString function

    The documentation is nicely worded and easy to understand. There is no critical grammar error in the wording. However, I found some minor grammatical corrections that you might consider if you update the PR for some other crucial reason.

  16. fanquake merged this on Nov 17, 2021
  17. fanquake closed this on Nov 17, 2021

  18. ryanofsky commented at 3:28 am on November 17, 2021: member

    re: #23522 (comment)

    Is there any way to enforce this without looking at the code with human eyes? Are the tests going to fail if we switch to std filesystem and have an incorrect string conversion?

    I think ideally PathToString and PathFromString functions would be eliminated. I wrote these functions to allow switching from boost::filesystem to std::filesystem while preserving current behavior, but current behavior is actually a little broken.

    PathToString is mostly used for logging, but this is not ideal because the result is usually not quoted, and paths can contain spaces, quotes, punctuation and text that make resulting log messages ambiguous. They could also contain arbitrary characters including newlines, invalid UTF-8 sequences, invalid multibyte sequences that break log parsing. I think ideally these PathToString calls would be replaced with calls to a new PathToLogString function, and that function would add surrounding quotes around paths, replace internal quotes with \", replace special ASCII characters like tabs and newlines with C \t \n escapes, replace non-latin unicode characters with \uXXXX escapes on windows, and replace non-latin bytes with \xXX escapes on POSIX.

    PathFromString is mostly used to parse command line and configuration arguments. I think ideally these calls would be removed and replaced with calls to a new ArgsManager::GetPathArg method. The method would treat command line arguments and configuration file paths as bytes on POSIX with no encoding or decoding. On windows, it would decode configuration file paths as UTF-8, and it would either decode command line arguments according to the current code page, or it would bypass the code page encoding and decoding done by the windows C library by overriding wmain and getting the original unicode arguments passed by the parent process in wchar_t *argv[]

    So ideally we could get rid of PathToString to PathFromString entirely. But more narrowly if we just wanted to try minimize new encoding bugs and avoid PathToString results being passed to the UniValue constructor, we would need to use distinct string types like std::string for byte strings and std::u8string for unicode strings and make it an error to construct a UniValue from a byte string. We could also try to renaming the PathToString function to discourage use. Maybe to something like PathToByteString, PathToInternalString, PathToSystemString, LegacyPathToString

  19. ryanofsky commented at 3:38 am on November 17, 2021: member
    Thanks for all reviews and thanks @shaavan for suggested edits. I didn’t get to apply them here, but they could be used in another followup or maybe #20744
  20. sidhujag referenced this in commit 867659eaf6 on Nov 17, 2021
  21. DrahtBot locked this on Jan 11, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me