util: explicitly close all AutoFiles that have been written #29307

pull vasild wants to merge 1 commits into bitcoin:master from vasild:AutoFile_error_check changing 18 files +99 −19
  1. vasild commented at 2:35 pm on January 24, 2024: contributor

    fclose(3) may fail to flush the previously written data to disk, thus a failing fclose(3) is as serious as a failing fwrite(3).

    Previously the code ignored fclose(3) failures. This PR improves that by changing all users of AutoFile that use it to write data to explicitly close the file and handle a possible error.


    Other alternatives are:

    1. fflush(3) after each write to the file (and throw if it fails from the AutoFile::write() method) and hope that fclose(3) will then always succeed. Assert that it succeeds from the destructor :roll_eyes:. Will hurt performance.
    2. Throw nevertheless from the destructor. Exception within the exception in C++ I think results in terminating the program without a useful message.
    3. (this is implemented in the latest incarnation of this PR) Redesign AutoFile so that its destructor cannot fail. Adjust all its users :sob:. For example, if the file has been written to, then require the callers to explicitly call the AutoFile::fclose() method before the object goes out of scope. In the destructor, as a sanity check, assume/assert that this is indeed the case. Defeats the purpose of a RAII wrapper for FILE* which automatically closes the file when it goes out of scope and there are a lot of users of AutoFile.
  2. DrahtBot commented at 2:35 pm on January 24, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK ryanofsky
    Stale ACK maflcko

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30987 (Don’t zero-after-free DataStream: Faster IBD on some configurations by davidgumberg)
    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
    • #28792 (Embed default ASMap as binary dump header file by fjahr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Utils/log/libs on Jan 24, 2024
  4. vasild force-pushed on Jan 24, 2024
  5. DrahtBot added the label CI failed on Jan 24, 2024
  6. DrahtBot commented at 2:58 pm on January 24, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/20819752683

  7. in src/streams.cpp:18 in 7c58f6ea4f outdated
    14     if (!m_file) throw std::ios_base::failure("AutoFile::read: file handle is nullptr");
    15     if (m_xor.empty()) {
    16-        return std::fread(dst.data(), 1, dst.size(), m_file);
    17+        const auto n = std::fread(dst.data(), 1, dst.size(), m_file);
    18+        if (ferror(m_file)) {
    19+            throw std::ios_base::failure(strprintf("AutoFile::detail_fread: %s", SysErrorString(errno)));
    


    maflcko commented at 3:59 pm on January 24, 2024:

    not sure. detail_fread is an implementation detail and thin wrapper, with the goal to simply return the return value of std::fread. It should be left for the caller to decide about the error.

    In any case, if you want to do this, you’d have to apply it to both branches of the if. The error message shouldn’t differ just because a file is xor’d or not.

    Finally, it would be good to explain whether this is a bugfix or a refactor/belt-and-suspenders. I don’t think an error/eof could have occurred if the full dst.size() was read. So I don’t think this is needed, but I could be wrong. Ideally, there is a unit test showing why the change is needed.


    vasild commented at 8:40 am on January 25, 2024:

    Alright, moved to the caller.

    IMO this is a bugfix, but I am not 100% sure either. Might as well be viewed as “belt-and-suspenders”. What is certain is that it can’t hurt even if there is never an error on any platform if detail_fread(dst) == dst.size(). A unit test would require simulating an IO error under the fread(3) call and identically behaving fread(3) on all platforms.

  8. in src/test/fuzz/autofile.cpp:50 in 7c58f6ea4f outdated
    46@@ -47,7 +47,7 @@ FUZZ_TARGET(autofile)
    47                 }
    48             },
    49             [&] {
    50-                auto_file.fclose();
    51+                assert(auto_file.fclose() == 0);
    


    vasild commented at 5:18 pm on January 24, 2024:

    This assert actually failed in CI (wow!):

    https://cirrus-ci.com/task/5586893237125120

     0Run autofile with args ['/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz', '-runs=1', PosixPath('/ci_container_base/ci/scratch/qa-assets/fuzz_seed_corpus/autofile')]INFO: Running with entropic power schedule (0xFF, 100).
     1INFO: Seed: 3765617163
     2INFO: Loaded 1 modules   (547059 inline 8-bit counters): 547059 [0x55cd54dd29b8, 0x55cd54e582ab), 
     3INFO: Loaded 1 PC tables (547059 PCs): 547059 [0x55cd54e582b0,0x55cd556b11e0), 
     4INFO:      535 files found in /ci_container_base/ci/scratch/qa-assets/fuzz_seed_corpus/autofile
     5INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1041296 bytes
     6INFO: seed corpus: files: 535 min: 1b max: 1041296b total: 17489928b rss: 106Mb
     7fuzz: test/fuzz/autofile.cpp:50: auto autofile_fuzz_target(FuzzBufferType)::(anonymous class)::operator()() const: Assertion `auto_file.fclose() == 0' failed.
     8==18378== ERROR: libFuzzer: deadly signal
     9    [#0](/bitcoin-bitcoin/0/) 0x55cd5193ab55 in __sanitizer_print_stack_trace (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1a5fb55) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    10    [#1](/bitcoin-bitcoin/1/) 0x55cd518922fc in fuzzer::PrintStackTrace() (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x19b72fc) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    11    [#2](/bitcoin-bitcoin/2/) 0x55cd51878367 in fuzzer::Fuzzer::CrashCallback() (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x199d367) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    12    [#3](/bitcoin-bitcoin/3/) 0x7fb162f1d8ff  (/lib/x86_64-linux-gnu/libc.so.6+0x428ff) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    13    [#4](/bitcoin-bitcoin/4/) 0x7fb162f7498a in pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x9998a) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    14    [#5](/bitcoin-bitcoin/5/) 0x7fb162f1d855 in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42855) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    15    [#6](/bitcoin-bitcoin/6/) 0x7fb162f018b6 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x268b6) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    16    [#7](/bitcoin-bitcoin/7/) 0x7fb162f017da  (/lib/x86_64-linux-gnu/libc.so.6+0x267da) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    17    [#8](/bitcoin-bitcoin/8/) 0x7fb162f14185 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x39185) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    18    [#9](/bitcoin-bitcoin/9/) 0x55cd51accacc in autofile_fuzz_target(Span<unsigned char const>)::$_3::operator()() const src/test/fuzz/autofile.cpp:50:17
    19    [#10](/bitcoin-bitcoin/10/) 0x55cd51accacc in unsigned long CallOneOf<autofile_fuzz_target(Span<unsigned char const>)::$_0, autofile_fuzz_target(Span<unsigned char const>)::$_1, autofile_fuzz_target(Span<unsigned char const>)::$_2, autofile_fuzz_target(Span<unsigned char const>)::$_3, autofile_fuzz_target(Span<unsigned char const>)::$_4, autofile_fuzz_target(Span<unsigned char const>)::$_5>(FuzzedDataProvider&, autofile_fuzz_target(Span<unsigned char const>)::$_0, autofile_fuzz_target(Span<unsigned char const>)::$_1, autofile_fuzz_target(Span<unsigned char const>)::$_2, autofile_fuzz_target(Span<unsigned char const>)::$_3, autofile_fuzz_target(Span<unsigned char const>)::$_4, autofile_fuzz_target(Span<unsigned char const>)::$_5) src/./test/fuzz/util.h:42:27
    20    [#11](/bitcoin-bitcoin/11/) 0x55cd51accacc in autofile_fuzz_target(Span<unsigned char const>) src/test/fuzz/autofile.cpp:27:9
    21    [#12](/bitcoin-bitcoin/12/) 0x55cd5204a52c in std::function<void (Span<unsigned char const>)>::operator()(Span<unsigned char const>) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
    22    [#13](/bitcoin-bitcoin/13/) 0x55cd5204a52c in LLVMFuzzerTestOneInput src/test/fuzz/fuzz.cpp:178:5
    23    [#14](/bitcoin-bitcoin/14/) 0x55cd51879814 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x199e814) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    24    [#15](/bitcoin-bitcoin/15/) 0x55cd51878f09 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x199df09) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    25    [#16](/bitcoin-bitcoin/16/) 0x55cd5187aaf6 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x199faf6) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    26    [#17](/bitcoin-bitcoin/17/) 0x55cd5187b017 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x19a0017) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    27    [#18](/bitcoin-bitcoin/18/) 0x55cd5186875b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x198d75b) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    28    [#19](/bitcoin-bitcoin/19/) 0x55cd51892ce6 in main (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x19b7ce6) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    29    [#20](/bitcoin-bitcoin/20/) 0x7fb162f030cf  (/lib/x86_64-linux-gnu/libc.so.6+0x280cf) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    30    [#21](/bitcoin-bitcoin/21/) 0x7fb162f03188 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x28188) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    31    [#22](/bitcoin-bitcoin/22/) 0x55cd5185d624 in _start (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1982624) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    32NOTE: libFuzzer has rudimentary signal handlers.
    33      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
    34SUMMARY: libFuzzer: deadly signal
    35MS: 0 ; base unit: 0000000000000000000000000000000000000000
    36artifact_prefix='./'; Test unit written to ./crash-276f4f9d3e6ba98f897bd1082aec7214ca8e4f5d
    37INFO: Running with entropic power schedule (0xFF, 100).
    38INFO: Seed: 3765617163
    39INFO: Loaded 1 modules   (547059 inline 8-bit counters): 547059 [0x55cd54dd29b8, 0x55cd54e582ab), 
    40INFO: Loaded 1 PC tables (547059 PCs): 547059 [0x55cd54e582b0,0x55cd556b11e0), 
    41INFO:      535 files found in /ci_container_base/ci/scratch/qa-assets/fuzz_seed_corpus/autofile
    42INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1041296 bytes
    43INFO: seed corpus: files: 535 min: 1b max: 1041296b total: 17489928b rss: 106Mb
    44fuzz: test/fuzz/autofile.cpp:50: auto autofile_fuzz_target(FuzzBufferType)::(anonymous class)::operator()() const: Assertion `auto_file.fclose() == 0' failed.
    45==18378== ERROR: libFuzzer: deadly signal
    46    [#0](/bitcoin-bitcoin/0/) 0x55cd5193ab55 in __sanitizer_print_stack_trace (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1a5fb55) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    47    [#1](/bitcoin-bitcoin/1/) 0x55cd518922fc in fuzzer::PrintStackTrace() (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x19b72fc) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    48    [#2](/bitcoin-bitcoin/2/) 0x55cd51878367 in fuzzer::Fuzzer::CrashCallback() (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x199d367) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    49    [#3](/bitcoin-bitcoin/3/) 0x7fb162f1d8ff  (/lib/x86_64-linux-gnu/libc.so.6+0x428ff) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    50    [#4](/bitcoin-bitcoin/4/) 0x7fb162f7498a in pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x9998a) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    51    [#5](/bitcoin-bitcoin/5/) 0x7fb162f1d855 in gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42855) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    52    [#6](/bitcoin-bitcoin/6/) 0x7fb162f018b6 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x268b6) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    53    [#7](/bitcoin-bitcoin/7/) 0x7fb162f017da  (/lib/x86_64-linux-gnu/libc.so.6+0x267da) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    54    [#8](/bitcoin-bitcoin/8/) 0x7fb162f14185 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x39185) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    55    [#9](/bitcoin-bitcoin/9/) 0x55cd51accacc in autofile_fuzz_target(Span<unsigned char const>)::$_3::operator()() const src/test/fuzz/autofile.cpp:50:17
    56    [#10](/bitcoin-bitcoin/10/) 0x55cd51accacc in unsigned long CallOneOf<autofile_fuzz_target(Span<unsigned char const>)::$_0, autofile_fuzz_target(Span<unsigned char const>)::$_1, autofile_fuzz_target(Span<unsigned char const>)::$_2, autofile_fuzz_target(Span<unsigned char const>)::$_3, autofile_fuzz_target(Span<unsigned char const>)::$_4, autofile_fuzz_target(Span<unsigned char const>)::$_5>(FuzzedDataProvider&, autofile_fuzz_target(Span<unsigned char const>)::$_0, autofile_fuzz_target(Span<unsigned char const>)::$_1, autofile_fuzz_target(Span<unsigned char const>)::$_2, autofile_fuzz_target(Span<unsigned char const>)::$_3, autofile_fuzz_target(Span<unsigned char const>)::$_4, autofile_fuzz_target(Span<unsigned char const>)::$_5) src/./test/fuzz/util.h:42:27
    57    [#11](/bitcoin-bitcoin/11/) 0x55cd51accacc in autofile_fuzz_target(Span<unsigned char const>) src/test/fuzz/autofile.cpp:27:9
    58    [#12](/bitcoin-bitcoin/12/) 0x55cd5204a52c in std::function<void (Span<unsigned char const>)>::operator()(Span<unsigned char const>) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
    59    [#13](/bitcoin-bitcoin/13/) 0x55cd5204a52c in LLVMFuzzerTestOneInput src/test/fuzz/fuzz.cpp:178:5
    60    [#14](/bitcoin-bitcoin/14/) 0x55cd51879814 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x199e814) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    61    [#15](/bitcoin-bitcoin/15/) 0x55cd51878f09 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x199df09) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    62    [#16](/bitcoin-bitcoin/16/) 0x55cd5187aaf6 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x199faf6) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    63    [#17](/bitcoin-bitcoin/17/) 0x55cd5187b017 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x19a0017) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    64    [#18](/bitcoin-bitcoin/18/) 0x55cd5186875b in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x198d75b) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    65    [#19](/bitcoin-bitcoin/19/) 0x55cd51892ce6 in main (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x19b7ce6) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    66    [#20](/bitcoin-bitcoin/20/) 0x7fb162f030cf  (/lib/x86_64-linux-gnu/libc.so.6+0x280cf) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    67    [#21](/bitcoin-bitcoin/21/) 0x7fb162f03188 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x28188) (BuildId: f0b834daa3d05a80967e9ec2f990a1ea71c958fa)
    68    [#22](/bitcoin-bitcoin/22/) 0x55cd5185d624 in _start (/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1982624) (BuildId: 34c8e9fbac2706ae466ade7d359adebec5d82bcf)
    69NOTE: libFuzzer has rudimentary signal handlers.
    70      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
    71SUMMARY: libFuzzer: deadly signal
    72MS: 0 ; base unit: 0000000000000000000000000000000000000000
    73artifact_prefix='./'; Test unit written to ./crash-276f4f9d3e6ba98f897bd1082aec7214ca8e4f5d
    74Target ['/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/fuzz/fuzz', '-runs=1', PosixPath('/ci_container_base/ci/scratch/qa-assets/fuzz_seed_corpus/autofile')] failed with exit code 77
    

    No base64 of the unit in the output, I guess I cannot access ./crash-276f4f9d3e6ba98f897bd1082aec7214ca8e4f5d and I cannot reproduce by running FUZZ=autofile ./src/test/fuzz/fuzz -runs=1000000 .../qa-assets/fuzz_seed_corpus/autofile locally. Maybe the CI machine ran out of disk space?


    vasild commented at 8:22 am on January 25, 2024:

    I realized that auto_file used in this fuzz test is not associated with a regular file on the filesystem, but with a “cookie” returned by fuzzed_file_provider.open() returned by fopencookie(3) which uses a custom close function FuzzedFileProvider::close() which can arbitrarily return -1:

    https://github.com/bitcoin/bitcoin/blob/207220ce8b767d8efdb5abf042ecf23d846ded73/src/test/fuzz/util.cpp#L352

    changed this to ignore the return value of auto_file.fclose().

  9. vasild force-pushed on Jan 25, 2024
  10. DrahtBot removed the label CI failed on Jan 25, 2024
  11. maflcko commented at 11:32 am on January 29, 2024: member

    I don’t think any care close-check needs to be done when reading from a file.

    So what about removing the write method from AutoFile, and introduce a new derived class to add it back. This class could Assume that the file was flushed/closed before the destructor is called?

  12. vasild commented at 10:48 am on February 4, 2024: contributor

    I don’t think any care close-check needs to be done when reading from a file.

    That is my understanding too.

    So what about removing the write method from AutoFile, and introduce a new derived class to add it back. This class could Assume that the file was flushed/closed before the destructor is called?

    Assume is more for code correctness, not for external errors (like IO error). If it does not fail during testing and on CI, that does not mean IO errors are absent and will not occur on the users’ machines. Given that Assume() is removed from production builds, it still means that file corruption could remain unnoticed on live environments, like now on master.

  13. maflcko commented at 9:45 am on February 5, 2024: member

    This class could Assume that the file was flushed/closed before the destructor is called?

    Assume is more for code correctness, not for external errors (like IO error).

    The assumption would be that the close was called before the destructor is called. This is a code correctness question and seems like the perfect fit for Assume. That is, if the Assume fails, the code was missing a close, and the code must be changed to fix the internal bug. The Assume does not in any way check for IO errors.

  14. vasild commented at 3:17 pm on February 5, 2024: contributor

    Then I misunderstood this:

    This class could Assume that the file was flushed/closed before the destructor is called

    How?

  15. maflcko commented at 3:34 pm on February 5, 2024: member
    AutoFile holds a file, which will be nullptr when it is closed. So if the file is nullptr, then it can be assumed to be flushed.
  16. vasild commented at 4:40 pm on February 5, 2024: contributor
    How/when would the file be closed?
  17. maflcko commented at 8:00 am on February 6, 2024: member

    How/when would the file be closed?

    An AutoFile can be closed by calling the fclose() method in the code.

  18. vasild commented at 9:45 am on February 6, 2024: contributor
    Ok, the users of the class then have to explicitly close. That is 1.4. from the OP, I elaborated it with more words.
  19. DrahtBot commented at 3:42 pm on March 12, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/20850328376

  20. DrahtBot added the label CI failed on Mar 12, 2024
  21. Retropex referenced this in commit f54c172ffe on Mar 28, 2024
  22. Retropex referenced this in commit 70403c5bb7 on Mar 28, 2024
  23. achow101 commented at 3:43 pm on April 9, 2024: member
    Seems like there is some discussion in the description that is a better fit for a brainstorming issue.
  24. DrahtBot marked this as a draft on Apr 23, 2024
  25. DrahtBot commented at 6:50 am on April 23, 2024: contributor
    Converted to draft due to failing CI and inactivity
  26. vasild force-pushed on May 10, 2024
  27. vasild commented at 10:28 am on May 10, 2024: contributor

    5543990321...1fa61d9edc: rebase and log a message from ~AutoFile() if closing fails and terminate the program.

    Ready for review. I think this would be a strict improvement over what we have now in master - a silent ignoring of a possible file corruption.

  28. vasild marked this as ready for review on May 10, 2024
  29. vasild force-pushed on May 10, 2024
  30. Sjors commented at 4:59 pm on May 10, 2024: member
    @vasild CI seems unhappy.
  31. vasild force-pushed on May 13, 2024
  32. vasild marked this as a draft on May 13, 2024
  33. vasild force-pushed on May 13, 2024
  34. DrahtBot removed the label CI failed on May 13, 2024
  35. vasild marked this as ready for review on May 14, 2024
  36. vasild commented at 9:51 am on May 14, 2024: contributor
    Fixed the CI!
  37. in src/streams.cpp:28 in de23848eed outdated
    24@@ -23,8 +25,9 @@ std::size_t AutoFile::detail_fread(Span<std::byte> dst)
    25 
    26 void AutoFile::read(Span<std::byte> dst)
    27 {
    28-    if (detail_fread(dst) != dst.size()) {
    29-        throw std::ios_base::failure(feof() ? "AutoFile::read: end of file" : "AutoFile::read: fread failed");
    30+    if (detail_fread(dst) != dst.size() || ferror(m_file)) {
    


    maflcko commented at 10:57 am on May 14, 2024:

    de23848eed5437f5e4dc6ccdc38b40cc58738ead: I still don’t understand this. Why would fread succeed to read the requested number of bytes, but then mark the stream with an error? If the goal is to somehow detect errors after the read succeeded, I am not sure this should be a goal. Otherwise, it would be good to explain what real user-facing bug this fixes or introduces.

    Either this is a refactor, in which case it seems pointless, or it is a behavior change, in which case it needs to explain why it beneficial and correct.


    vasild commented at 12:39 pm on May 14, 2024:

    This was somewhat answered in #29307 (review) above but let me elaborate - the thing that prompted me to do this change is this text from the fread(3) manual page:

    The function fread() does not distinguish between end-of-file and error, and callers must use feof(3) and ferror(3) to determine which occurred.

    I think checking for ferror(3) after fread(3) is a kind of harmless “belt-and-suspenders”. If you are sure that if detail_fread(dst) == dst.size() then ferror(m_file) will always be false or feel strongly against this then I could drop the commit de23848eed util: check for error after reading from a file.


    maflcko commented at 5:32 pm on May 23, 2024:

    If you are sure that if detail_fread(dst) == dst.size() then ferror(m_file) will always be false or feel strongly against this then I could drop the commit https://github.com/bitcoin/bitcoin/commit/de23848eed5437f5e4dc6ccdc38b40cc58738ead util: check for error after reading from a file.

    I don’t feel strongly in one way or another. I am just saying that with the currently available information, it is not possible to confidently review this and know it is a good change. At a minimum, it should be clear whether this is a refactor or a behavior change, and what user-facing bug this could possibly prevent or fix. (I can’t think of one, but maybe I am missing something?)


    vasild commented at 7:00 am on June 20, 2024:

    Well, it is pretty much obvious that checking for an error after read is either:

    • a noop, if the read could have never failed because it managed to read as many bytes as requested or
    • a bug fix where a read error was ignored before the fix

    so this is a safe change (it does not break anything), possibly fixing a bug. Given that you keep pushing against this I will drop this commit in the hopes to move this PR forward.


    maflcko commented at 7:08 am on June 20, 2024:
    No, I don’t understand why a “read error was ignored before the fix” and why “this is a safe change (it does not break anything)”. For example, if this was hit on a filesystem that stored a block file, which was fully and successfully read, but then return an error (for an unknown reason), this would mean that for a user where the program previously worked fine without any issues, is now unusable (for an unknown reason).

    vasild commented at 8:14 am on June 20, 2024:

    I just found a similar code to this proposed change:

    https://github.com/bitcoin/bitcoin/blob/2d21060af831fa8f8f1791b4464961d5f28a88cb/src/node/utxo_snapshot.cpp#L74-L80

    After reading from a file and getting everything that is needed from it, the code checks if there is an extra data and if that read fails it reports an error.


    maflcko commented at 8:30 am on June 20, 2024:
    The code you linked to is primarily checking for unexpected trailing data and only printing a warning. It is not making the program possibly unusable.

    vasild commented at 10:05 am on June 20, 2024:

    I think I see your point, you prefer 1. over 2., assuming we got all the bytes we asked for from a read, then:

    1. Ignore possible IO error by not checking for it
    2. Check for an error and behave as if the read failed

    I dropped this change.

  38. vasild force-pushed on May 14, 2024
  39. in src/kernel/mempool_persist.cpp:221 in f9cf6b516b outdated
    217@@ -213,6 +218,7 @@ bool DumpMempool(const CTxMemPool& pool, const fs::path& dump_path, FopenFn mock
    218                   fs::file_size(dump_path));
    219     } catch (const std::exception& e) {
    220         LogInfo("Failed to dump mempool: %s. Continuing anyway.\n", e.what());
    221+        (void)file.fclose();
    


    maflcko commented at 12:54 pm on May 15, 2024:

    Not sure I understand this change either. This seems to be mixing the third alternative in the pull request description with the approach you wanted to take?

    Defeats the purpose of a RAII wrapper for FILE* which automatically closes the file when it goes out of scope and there are a lot of users of AutoFile.


    vasild commented at 1:11 pm on May 23, 2024:

    It is best if all users of AutoFile explicitly call the fclose() method and check that it succeeds (option 3. from PR description). This is not feasible because there are many users of AutoFile. However, some of those users already do call the fclose() method explicitly. For those users it is easy and I added checks whether fclose() succeeds. DumpMempool() is such a user and the change is:

    0             throw std::runtime_error("FileCommit failed");
    1-        file.fclose();           
    2+        if (file.fclose() != 0) {    
    3+            throw std::runtime_error(
    4+                strprintf("Error closing %s: %s", fs::PathToString(file_fspath), SysErrorString(errno)));
    5+        }
    6         if (!RenameOver(dump_path + ".new", dump_path)) {
    

    For consistency in DumpMempool() I changed all code paths in that function to explicitly call fclose(). There are two codepaths and this added (void)file.fclose(); is the other one. I ignore the return value because this is already failure handling and the function is going to return false. I didn’t want to do something like “while handling error1, error2 occurred”.


    maflcko commented at 5:25 pm on May 23, 2024:

    Oh, now I see this is required, because otherwise the node would crash on this error instead of continuing with the current code, if this was missing?

    Not sure this is desirable, see also https://github.com/bitcoin/bitcoin/pull/29307/files#r1612065514


    vasild commented at 10:09 am on June 20, 2024:
    Resolving this discussion. Changed the semantic of AutoFile to require all write users to call fclose().
  40. DrahtBot added the label Needs rebase on May 23, 2024
  41. in src/streams.h:413 in de23848eed outdated
    410-    ~AutoFile() { fclose(); }
    411+    ~AutoFile()
    412+    {
    413+        if (fclose() != 0 && m_was_written) {
    414+            LogPrintLevel(BCLog::ALL, BCLog::Level::Error, "Error closing file: %s\n", SysErrorString(errno));
    415+            std::abort();
    


    maflcko commented at 5:22 pm on May 23, 2024:

    not sure about unconditionally crashing the node on any failure here. This may or may not be an issue, so the call site will have to decide about this.

    For example, If an RPC user dumps the mempool to an external (corrupt) USB thumb drive, a failure shouldn’t crash the node, but notify the user over the RPC.

    Also, I am not sure about involving logging in this low-level code. Logging may not be set up, so this may never be printed before the crash.

    Also, I am not sure about the log message. It just says that there was an error, but not whose fault it was or what should be done about it.

    A better solution overall may be to just call if (m_was_written) Assume(IsNull());. This should highlight all violating call sites during CI, because the test should be deterministic. (If there is a code path that is not deterministically tested, all hopes are lost anyway, and I don’t think the solution should be to crash the users node completely and without considering the context?)


    vasild commented at 10:07 am on June 20, 2024:

    if (m_was_written) Assume(IsNull());

    Done.

  42. luke-jr referenced this in commit ec8ef5e51c on Jun 13, 2024
  43. luke-jr referenced this in commit 17f1f24993 on Jun 13, 2024
  44. vasild force-pushed on Jun 20, 2024
  45. vasild renamed this:
    util: check for errors after close and read in AutoFile
    util: explicitly close all AutoFiles that have been written
    on Jun 20, 2024
  46. DrahtBot removed the label Needs rebase on Jun 20, 2024
  47. maflcko commented at 12:04 pm on June 20, 2024: member

    lgtm ACK 11be9f4103ea219f801a1f0fe1385f66ca70ad22

    Will do some testing later

  48. luke-jr referenced this in commit d2ea6ab60b on Jun 22, 2024
  49. DrahtBot added the label Needs rebase on Jul 2, 2024
  50. vasild force-pushed on Jul 2, 2024
  51. vasild commented at 12:41 pm on July 2, 2024: contributor
    11be9f4103...ea89a6e368: rebase due to conflicts
  52. DrahtBot removed the label Needs rebase on Jul 2, 2024
  53. DrahtBot added the label Needs rebase on Jul 9, 2024
  54. vasild force-pushed on Jul 22, 2024
  55. vasild commented at 10:33 am on July 22, 2024: contributor
    ea89a6e368...4533445fd7: rebase due to conflicts
  56. DrahtBot added the label CI failed on Jul 22, 2024
  57. DrahtBot commented at 12:13 pm on July 22, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/27742621741

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  58. DrahtBot removed the label Needs rebase on Jul 22, 2024
  59. vasild commented at 12:56 pm on July 22, 2024: contributor
    CI failure seems unrelated.
  60. DrahtBot removed the label CI failed on Jul 22, 2024
  61. ryanofsky commented at 9:26 pm on July 22, 2024: contributor

    Concept ACK, approach ~0

    The PR description is a bit vague about how it is changing the AutoFile API.

    Previously, the autofile API allowed but did not require AutoFile callers to check for failing fclose calls after writing. Now it requires that callers call the fclose method manually and explicitly use or discard its return value.

    Checking for failing fclose calls is important because if a call fails, it probably indicates that there is a serious problem that should trigger a log message or an alert. Specifically if the file was used for writing it could mean that data could not successfully be saved to disk. So it would be bad for fclose errors to be discarded without being handled at all or reported somewhere.

    The PR makes AutoFile require callers to handle failing fclose calls, and it enforces this in two ways:

    1. It adds [[nodiscard]] annotation to the fclose method so its return value cannot be ignored. The return value is 0 if successful or EOF (-1) if unsuccessful.
    2. It tries to enforce that callers actually invoke fclose by adding an Assume statement to the destructor that terminates the program with a fatal error if the file was written to and fclose was not called and the binary was built in a debug mode.

    It seems like this approach works, but it is maybe a little fragile because it doesn’t actually enforce that caller handles the fclose result in every code path. So for example if an early return statement is added after a caller opens an autofile object but before it calls the fclose method, and the early return code path is not normally executed, it can introduce a latent bug that occasionally will crash debug builds, while providing no indication that anything is wrong in release builds.

    I think if the goal is to require AutoFile callers to handle failing fclose calls, a more reliable way to do that would be to change the constructor signature to

    0explicit AutoFile(std::FILE* file, std::function<void()> on_error);
    

    And require the caller to provide an on_error callback that does error handing like

    0[&] {
    1        remove(pathTmp);
    2        LogError("%s: Failed to close file %s: %s\n", __func__, fs::PathToString(pathTmp), SysErrorString(errno));
    3}
    

    I think the current approach is ok too, but it seems more fragile because it doesn’t detect all programming errors, and when it does it detects them with a runtime crash instead of a compile time error.

    I am coming to this late, so if you are really attached to the current approach I can just review and ack it, but I wanted to suggest an alternative.

  62. vasild commented at 1:13 pm on July 23, 2024: contributor

    @ryanofsky, I am not so attached to the current approach :) if it can be improved, then lets improve it.

    For the on_error callback - that would be called on fclose() error from destructor only, right? Not on any error? E.g. the write() method throws an exception on error.

    Not all callers have the file name when creating AutoFile which means we would have to log some generic message without a file name. Or go an extra mile and make sure we have the file name, but some callers don’t use files at all, IIRC the fuzzer uses some cookie “files” that are not on the filesystem.

    Then the callback can be omitted and the message can be logged from the destructor:

     0     ~AutoFile()
     1     {   
     2         if (m_was_written) {
     3             // Callers that wrote to the file must have closed it explicitly
     4             // with the fclose() method and checked that the close succeeded.
     5             // This is because here from the destructor we have no way to signal
     6             // error due to close which, after write, could mean the file is
     7             // corrupted and must be handled properly at the call site.
     8             Assume(IsNull());
     9         }
    10                        
    11-        (void)fclose();
    12+        if (fclose() != 0) {
    13+            LogError("Failed to close file: %s\n", SysErrorString(errno));
    14+        }
    15     }
    
  63. DrahtBot added the label CI failed on Aug 7, 2024
  64. DrahtBot added the label Needs rebase on Aug 21, 2024
  65. vasild force-pushed on Aug 22, 2024
  66. vasild commented at 9:27 am on August 22, 2024: contributor
    4533445fd7...1dcd626fe1: rebase due to conflicts
  67. DrahtBot removed the label Needs rebase on Aug 22, 2024
  68. vasild force-pushed on Aug 23, 2024
  69. vasild commented at 9:25 am on August 23, 2024: contributor
    1dcd626fe1...587d996f07: rebase, get the extra printout from #29307 (comment) and resolve a silent merge conflict with https://github.com/bitcoin/bitcoin/pull/28052
  70. DrahtBot removed the label CI failed on Aug 23, 2024
  71. vasild force-pushed on Sep 2, 2024
  72. vasild commented at 2:54 pm on September 2, 2024: contributor
    587d996f07...489c2477e7: rebase to pick CMake
  73. DrahtBot added the label CI failed on Sep 5, 2024
  74. DrahtBot removed the label CI failed on Sep 15, 2024
  75. DrahtBot added the label Needs rebase on Sep 17, 2024
  76. util: explicitly close all AutoFiles that have been written
    There is no way to report a close error from `AutoFile` destructor.
    Such an error could be serious if the file has been written to because
    it may mean the file is now corrupted (same as if write fails).
    
    So, change all users of `AutoFile` that use it to write data to
    explicitly close the file and handle a possible error.
    dba7835386
  77. vasild force-pushed on Sep 28, 2024
  78. vasild commented at 5:20 am on September 28, 2024: contributor
    489c2477e7...dba7835386: rebase due to conflicts
  79. DrahtBot removed the label Needs rebase on Sep 28, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-03 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me