Make (Read/Write)BinaryFile work with char vector, use AutoFile #29229

pull Sjors wants to merge 1 commits into bitcoin:master from Sjors:2024/01/rw-binary-file changing 7 files +138 −93
  1. Sjors commented at 3:10 pm on January 11, 2024: member

    ReadBinaryFile and WriteBinaryFile current work with std::string. This PR adds support for std::vector<unsigned char>>.

    It also uses AutoFile now.

    This is [update: probably not] used in #28983 to store the static key for the Template Provider, in a manner very similar to how we store the Tor v3 and i2p key. However it made no sense to me to store a CKey as plain text. See commit “Persist static key for Template Provider” for how it’s used.

    It uses a template and leverages the fact that both std::string and std::vector<unsigned char>> have an insert() method that can take a char array.

    The unsigned char support is not used in this PR, but tests do cover it.

  2. DrahtBot commented at 3:11 pm on January 11, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. Sjors commented at 3:15 pm on January 11, 2024: member
    cc @vasild
  4. in src/util/readwritefile.h:26 in 634f29f2ef outdated
    27+ *                 is larger than this, truncated data (with len > maxsize) will be returned.
    28+ * @return true if successful, false if an error occured.
    29  */
    30-std::pair<bool,std::string> ReadBinaryFile(const fs::path &filename, size_t maxsize=std::numeric_limits<size_t>::max());
    31+template <typename T>
    32+bool ReadBinaryFile(const fs::path& filename, T& output, size_t maxsize=std::numeric_limits<size_t>::max());
    


    maflcko commented at 3:26 pm on January 11, 2024:
    If you change the return type, why not use std::optional? Otherwise, it seems better to keep it untouched?

    Sjors commented at 3:29 pm on January 11, 2024:

    I had to change it because AFAIK you can’t differentiate functions only by their return type. It’s also more readable imo, no more ->first->second.

    std::optional<T> won’t work, unless someone can explain how…


    Sjors commented at 3:32 pm on January 11, 2024:

    However, functions can not be overloaded if they differ only in the return type.

    https://www.geeksforgeeks.org/function-overloading-and-return-type-in-cpp/


    maflcko commented at 3:41 pm on January 11, 2024:

    std::optional<T> won’t work, unless someone can explain how…

    What about this:

     0#include <vector>
     1#include <optional>
     2#include <string>
     3#include <cstdint>
     4
     5template<class T>
     6std::optional<T> ReadBinaryFile()
     7{
     8    return {};
     9}
    10 
    11int main()
    12{
    13    auto a{ReadBinaryFile<std::string>()};
    14    auto b{ReadBinaryFile<std::vector<uint8_t>>()};
    15}
    

    Sjors commented at 3:52 pm on January 11, 2024:
    I’ll give that a try tomorrow.
  5. Sjors force-pushed on Jan 11, 2024
  6. Sjors commented at 3:51 pm on January 11, 2024: member

    Added extern template ... to the header to prevent “implicit instantiation of undefined template” error on the “no wallet, libbitcoinkernel” CI (https://cirrus-ci.com/task/6254856449556480)

    That error doesn’t happen on my machine running macOS clang 15, nor on Ubuntu gcc 13.2 - maybe a specific configure warning flag?

    Update: that didn’t work, still not sure how to reproduce.

    0 util/readwritefile.cpp:23:84: error: implicit instantiation of undefined template 'std::vector<unsigned char>'
    1        const size_t n = fread(buffer, 1, std::min(sizeof(buffer), maxsize - output.size()), f);
    
  7. DrahtBot added the label CI failed on Jan 11, 2024
  8. DrahtBot commented at 3:51 pm on January 11, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/20390186031

  9. Sjors force-pushed on Jan 12, 2024
  10. Sjors force-pushed on Jan 12, 2024
  11. Sjors commented at 8:26 am on January 12, 2024: member
    I switched to using std::optional as the return type. It happily compiles on my end, we’ll see what CI thinks…
  12. Sjors force-pushed on Jan 12, 2024
  13. Sjors commented at 8:47 am on January 12, 2024: member
    Had to #include <vector> to please clang13 (should have included it anyway since it’s used).
  14. DrahtBot removed the label CI failed on Jan 12, 2024
  15. DrahtBot added the label CI failed on Jan 15, 2024
  16. Sjors force-pushed on Jan 15, 2024
  17. DrahtBot removed the label CI failed on Jan 15, 2024
  18. in src/Makefile.test.include:129 in ac5a83b91f outdated
    125@@ -126,6 +126,7 @@ BITCOIN_TESTS =\
    126   test/raii_event_tests.cpp \
    127   test/random_tests.cpp \
    128   test/rbf_tests.cpp \
    129+  test/readwritefile_tests.cpp \
    


    vasild commented at 1:09 pm on January 18, 2024:

    nit: in the commit message:

    ReadBinaryFile and WriteBinaryFile current work with std::string

    s/current/currently/

    Would be nice to wrap lines in the commit message at 72 columns.

  19. in src/util/readwritefile.h:23 in ac5a83b91f outdated
    23- * @param maxsize Puts a maximum size limit on the file that is read. If the file is larger than this, truncated data
    24- *         (with len > maxsize) will be returned.
    25+ * @param[in] filename Filename. Returns false it doesn't exist.
    26+ * @param[in] maxsize  Puts a maximum size limit on the file that is read. If the file
    27+ *                 is larger than this, truncated data (with len > maxsize) will be returned.
    28+ * @return result successful, {} otherwise
    


    vasild commented at 1:20 pm on January 18, 2024:

    nit:

    0 * [@return](/bitcoin-bitcoin/contributor/return/) result if successful, std::nullopt otherwise
    
  20. vasild commented at 3:12 pm on January 18, 2024: contributor

    However it made no sense to me to store a CKey as plain text

    I guess that by “plain text” here you mean std::string, right? std::string can store arbitrary non-ASCII characters, including '\0', so it is technically ok to use it for binary data.

    More relevant in this case is that CKey stores sensitive data and takes care to wipe it from memory when freed. In #28983 Read/WriteBinaryData() is used in a way that defeats that - the sensitive data will be copied to a temporary variable (ordinary std::vector) for IO. Can we change Read/WriteBinaryData() to operate directly on CKey in such a way that data goes directly from CKey to the disk?

  21. maflcko commented at 3:19 pm on January 18, 2024: member

    Can we change Read/WriteBinaryData() to operate directly on CKey in such a way that data goes directly from CKey to the disk?

    I haven’t looked in detail, but writing bytes to a file can be achieved with one line of code:

    0CAutoFile{...} << Span{data};
    
  22. Sjors commented at 9:11 am on January 22, 2024: member

    I like the idea of operating on CKey directly. I’ll try to add that, though not sure how to implement that securely.

    The keys used in #28983 are not as important as wallet keys, but if we add a generic method to store a CKey on disk, then it should do so securely - so that future devs using that function don’t shoot themselves in the foot. At least the current std::vector<unsigned char>> does not pretend to be secure.

  23. Sjors commented at 9:15 am on January 22, 2024: member

    I guess that by “plain text” here you mean std::string, right? std::string can store arbitrary non-ASCII characters, including '\0', so it is technically ok to use it for binary data.

    I think the way WriteBinaryFile was initially used with Tor was that we parse the JSON returned by the Tor daemon and store the PrivateKey key field (UTF8 encoced?) in a file.

  24. Sjors commented at 9:20 am on January 22, 2024: member

    I haven’t looked in detail, but writing bytes to a file can be achieved with one line of code

    That would be a nice simplification. Almost (?) to the point of not needing these helper functions.

  25. maflcko commented at 1:20 pm on January 22, 2024: member

    I haven’t looked in detail, but writing bytes to a file can be achieved with one line of code

    That would be a nice simplification. Almost (?) to the point of not needing these helper functions.

    WriteBinaryFile is unused right now either way outside of tests, so I guess this could be removed regardless, as future code can just use the in-line one-liner?

  26. Sjors commented at 8:38 am on January 23, 2024: member

    @maflcko WriteBinaryFile is used by Tor and I2P to cache the service private key.

    But I might still close this PR if all that’s needed is one-liner.

  27. Sjors force-pushed on Jan 23, 2024
  28. Sjors marked this as a draft on Jan 23, 2024
  29. Sjors commented at 9:53 am on January 23, 2024: member

    It’s not quite a one-liner because you still need to open a close a FILE and deal with various errors. So instead I modified [Write/Write]BinaryFile to use AutoFile instead of fwrite / fread.

    However, AutoFile{f} >> output only returns a fraction of the file in the test…

  30. maflcko commented at 10:03 am on January 23, 2024: member

    However, AutoFile{f} >> output only returns a fraction of the file in the test…

    (De)serialization of vectors or strings assumes the run-time length to be encoded first. Only arrays and spans assume no length, because it is assumed to be known at compile-time.

  31. Sjors force-pushed on Jan 23, 2024
  32. DrahtBot added the label CI failed on Jan 23, 2024
  33. DrahtBot commented at 10:32 am on January 23, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/20764871422

  34. Sjors commented at 10:37 am on January 23, 2024: member

    Ok, fixed that issue by first getting the file size and then resizing the std::vector / std::string.

    That doesn’t generalise nicely to known-size things like CKey. The maxsize argument also makes no sense for things with known size. So I might make a separate method for that. Which in turn makes this PR just a refactor with unused, but tested, support for std::vector<unsigned char>.

    (this is ready for review even without CKey support, which I’ll add in a separate commit and/or PR when I get to it)

  35. Sjors marked this as ready for review on Jan 23, 2024
  36. Sjors renamed this:
    Make (Read/Write)BinaryFile work with char vector
    Make (Read/Write)BinaryFile work with char vector, use AutoFile
    on Jan 23, 2024
  37. Sjors force-pushed on Jan 23, 2024
  38. Sjors commented at 11:01 am on January 23, 2024: member

    I tried to find some existing code that could use the std::vector<unsigned char> variant. Didn’t find it at first glance. It seems we almost always know what size to expect (per object).

    So that might be a good reason to kill that variant and only support loading an std::string of unknown size for now.

  39. in src/util/readwritefile.cpp:43 in e4d0e3decd outdated
    65-    }
    66-    if (fclose(f) != 0) {
    67+    std::FILE *f = fsbridge::fopen(filename, "wb");
    68+    if (f == nullptr) return false;
    69+    try {
    70+        AutoFile{f} << Span{data};
    


    maflcko commented at 11:33 am on January 23, 2024:
    0    try {
    1        AutoFile{fsbridge::fopen(filename, "wb")} << Span{data};
    

    nit: Can be written shorter


    Sjors commented at 12:14 pm on January 23, 2024:
    Took me a while to wrap my head around the various calls involved, but I guess the nullptr check is handled in read(), which is called by the various ser_read... functions in seralize.h, which is called by the Unserialize implementations, which is called by <<. (presumably the same for writing)

    Sjors commented at 12:21 pm on January 23, 2024:
    On the read side I’m keeping the explicit nullptr check for now, so I don’t have to catch fs::file_size failure.

    Sjors commented at 12:23 pm on January 23, 2024:
    At the point WriteBinaryFile is so small you might as well have the called do the try - catch. But might as well keep if around as long as ReadBinaryFile can’t be made smaller.
  40. DrahtBot removed the label CI failed on Jan 23, 2024
  41. Make (Read/Write)BinaryFile work with char vector
    ReadBinaryFile and WriteBinaryFile currently work with std::string.
    This commit adds support for std::vector<unsigned char>>.
    
    It uses a template and leverages the fact that both std::string and
    std::vector<unsigned char>> have an insert() method that can take
    a char array.
    
    Also switch to using AutoFile.
    f9b134e2b2
  42. Sjors force-pushed on Jan 23, 2024
  43. Sjors commented at 1:52 pm on January 23, 2024: member

    I opened #29295 for CKey.

    I also refactored #28983 to use AutoFile directly, see https://github.com/Sjors/bitcoin/pull/32. So I no longer need this PR myself, but happy to continue it.

  44. in src/util/readwritefile.cpp:27 in f9b134e2b2
    40+    if (f == nullptr) return {};
    41+    T output{};
    42+    size_t file_size = fs::file_size(filename);
    43+    output.resize(std::min(file_size, maxsize));
    44+    try {
    45+        AutoFile{f} >> Span{output};
    


    vasild commented at 3:40 pm on January 23, 2024:
    I liked more the previous version which called fread(3) here. It was simple stupid. This >> is now hard to follow, especially given that it depends on T. For vector it ends up calling AutoFile::detail_fread(). It does not check whether ferror(3) occurred.

    maflcko commented at 4:40 pm on January 23, 2024:

    For vector it ends up calling AutoFile::detail_fread(). It does not check whether ferror(3) occurred.

    It does. If the return value of detail_fread is not output.size(), operator>> will fail.


    vasild commented at 10:35 am on January 24, 2024:

    It does not check whether ferror(3) occurred.

    It does.

    Where? There is no ferror(3) call. From the man page: “The function fread() does not distinguish between end-of-file and error, and callers must use feof(3) and ferror(3) to determine which occurred.”

    If the return value of detail_fread is not output.size(), operator>> will fail.

    Yes, but it can be equal to the desired size under two conditions: eof, or error. The previous code distinguished between both.


    maflcko commented at 11:05 am on January 24, 2024:

    If the return value of detail_fread is not output.size(), operator>> will fail.

    Yes, but it can be equal to the desired size under two conditions: eof, or error.

    No, the eof-error would only be raised if read past the desired size, not to it. Unless I am missing something?

    I am asking, because if there was a bug, it should be fixed, or at least an issue should be filed.


    vasild commented at 11:27 am on January 24, 2024:
    That “must” in the man page is pretty clear: “callers must use feof(3) and ferror(3) to determine which occurred”. A ferror(3) check can’t hurt and it is better to have an extra check that always returns “no error” than a missing check, failing to detect an IO error. The previous code was doing that - a dumb fread(3) followed by an unconditional ferror(3).

    maflcko commented at 11:33 am on January 24, 2024:

    Yes, it is called to determine which error occurred, see https://github.com/bitcoin/bitcoin/blob/e69796c79c0aa202087a13ba62d9fbcc1c8754d4/src/streams.cpp#L27

    Again, if there is a bug in the current code in master, it should be fixed.


    vasild commented at 12:08 pm on January 24, 2024:

    I would extend the check to:

    0 void AutoFile::read(Span<std::byte> dst)
    1 {
    2-    if (detail_fread(dst) != dst.size()) {
    3+    if (detail_fread(dst) != dst.size() || std::ferror(m_file)) {
    4         throw std::ios_base::failure(feof() ? "AutoFile::read: end of file" : "AutoFile::read: fread failed");
    

    vasild commented at 2:37 pm on January 24, 2024:

    if there is a bug in the current code in master, it should be fixed.

    Done in https://github.com/bitcoin/bitcoin/pull/29307


    Sjors commented at 10:27 am on January 25, 2024:
    Glad something useful came out of this PR :-)
  45. in src/util/readwritefile.cpp:41 in f9b134e2b2
    63-        fclose(f);
    64-        return false;
    65-    }
    66-    if (fclose(f) != 0) {
    67+    try {
    68+        AutoFile{fsbridge::fopen(filename, "wb")} << Span{data};
    


    vasild commented at 3:48 pm on January 23, 2024:

    Same as above, the dumb version was easier to follow. The new one does not check whether fclose(3) failed, but it should. I think that is a serious deficiency in AutoFile itself.

    fwrite(3) may succeed, but if a subsequent fclose(3) fails we should consider the data did not make it safely to disk and that the file is corrupted (fclose(3) writes any buffered data to disk using fflush(3), so a failure at fclose(3) is as bad as failure at fwrite(3)).


    vasild commented at 2:36 pm on January 24, 2024:

    I think that is a serious deficiency in AutoFile itself

    Logged as https://github.com/bitcoin/bitcoin/pull/29307


    Sjors commented at 10:28 am on January 25, 2024:
    @vasild I would also prefer to fix things in AutoFile, but perhaps add a few comments to explain what happens under the hood?

    vasild commented at 11:10 am on January 25, 2024:

    This could happen:

    1. fwrite() succeeds, returns ok the caller, some of the data is buffered in the OS and not yet on disk
    2. fclose() is called, it tries to fflush() the buffered data to disk but fails due to IO error. The caller ignores the error returned by fclose()
    3. the program continues with the wrong assumption that the data is safely on disk

    Sjors commented at 12:01 pm on January 25, 2024:
    That’s not good. I guess we also don’t want to sync to disk, and block for that to complete, for every field that’s >>’d to a file though.

    vasild commented at 12:47 pm on January 25, 2024:
    Right. That’s why I don’t see a good solution. It is a design issue with AutoFile to flush/close from the destructor which can’t signal the failure to the caller.
  46. willcl-ark added the label Utils/log/libs on Jan 24, 2024
  47. achow101 commented at 3:29 pm on April 9, 2024: member

    The PR didn’t seem to attract much attention in the past. Also, the issue seems not important enough right now to keep it sitting around idle in the list of open PRs.

    Closing due to lack of interest.

  48. achow101 closed this on Apr 9, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me