Add logging and error handling for file syncing #13039

pull laanwj wants to merge 1 commits into bitcoin:master from laanwj:2018_04_fsync_noignore changing 4 files +36 −13
  1. laanwj commented at 10:16 AM on April 20, 2018: member

    Add logging and error handling inside, and outside of FileCommit. Functions such as fsync, fdatasync will return error in case of hardware I/O errors, and ignoring this means it can silently continue through data corruption. (c.f. https://lwn.net/SubscriberLink/752063/12b232ab5039efbe/)

    EINVAL is handled specially to avoid crashing out on (network, fuse) filesystems that don't handle f[data]sync.

    I checked that the syncing inside leveldb is already generating an I/O error as appropriate.

  2. laanwj added the label Utils/log/libs on Apr 20, 2018
  3. laanwj force-pushed on Apr 20, 2018
  4. laanwj requested review from theuni on Apr 20, 2018
  5. laanwj requested review from sipa on Apr 20, 2018
  6. practicalswift commented at 12:39 PM on April 20, 2018: contributor

    Strong concept ACK! Nice!

  7. in src/util.cpp:799 in 5657cc5162 outdated
     796 | +        LogPrintf("%s: fflush failed: %d\n", __func__, errno);
     797 | +        return false;
     798 | +    }
     799 |  #ifdef WIN32
     800 |      HANDLE hFile = (HANDLE)_get_osfhandle(_fileno(file));
     801 |      FlushFileBuffers(hFile);
    


    theuni commented at 2:40 PM on April 20, 2018:

    laanwj commented at 3:47 PM on April 20, 2018:

    Ok

  8. in src/util.cpp:806 in 5657cc5162 outdated
     801 |      FlushFileBuffers(hFile);
     802 |  #else
     803 |      #if defined(__linux__) || defined(__NetBSD__)
     804 | -    fdatasync(fileno(file));
     805 | +    if (fdatasync(fileno(file)) != 0 && errno != EINVAL) { // Ignore EINVAL for filesystems that don't support sync
     806 | +        LogPrintf("%s: fdatasync failed: %d\n", __func__, errno);
    


    theuni commented at 2:46 PM on April 20, 2018:

    I can't come up with a specific reason other than "gut feeling", but potentially logging to the same disk that just failed to sync feels icky.


    laanwj commented at 3:35 PM on April 20, 2018:

    That's not necessarily the case, e.g. they might be logging to stdout, or piping the log to something else over the network. Logging the error is extremely important for troubleshooting in any case, even if it might fail.

  9. in src/validation.cpp:1632 in 5657cc5162 outdated
    1629 |      fileOld = OpenUndoFile(posOld);
    1630 |      if (fileOld) {
    1631 |          if (fFinalize)
    1632 |              TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nUndoSize);
    1633 | -        FileCommit(fileOld);
    1634 | +        flush_failed = FileCommit(fileOld);
    


    theuni commented at 2:50 PM on April 20, 2018:

    |=


    theuni commented at 3:07 PM on April 20, 2018:

    Whoops, and the logic for these is backwards. They should be:

     flush_failed |= !FileCommit(fileOld);
    

    laanwj commented at 3:35 PM on April 20, 2018:

    Ah yes this should be flush_ok

  10. in src/validation.cpp:1631 in 5657cc5162 outdated
    1627 |      }
    1628 |  
    1629 |      fileOld = OpenUndoFile(posOld);
    1630 |      if (fileOld) {
    1631 |          if (fFinalize)
    1632 |              TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nUndoSize);
    


    theuni commented at 2:51 PM on April 20, 2018:

    Wouldn't hurt to check the return value here too, but I guess it's not crucial since any error would just propagate to the sync.


    laanwj commented at 3:45 PM on April 20, 2018:

    Will do

  11. in src/addrdb.cpp:54 in 5657cc5162 outdated
      48 | @@ -49,7 +49,8 @@ bool SerializeFileDB(const std::string& prefix, const fs::path& path, const Data
      49 |  
      50 |      // Serialize
      51 |      if (!SerializeDB(fileout, data)) return false;
      52 | -    FileCommit(fileout.Get());
      53 | +    if (!FileCommit(fileout.Get()))
      54 | +        return error("%s: Failed to flush file %s", __func__, pathTmp.string());
      55 |      fileout.fclose();
    


    theuni commented at 2:58 PM on April 20, 2018:

    Should this attempt to close before returning? I assume we're in bad shape already if we get here, but I'm afraid that some one-off sync failure could lead to multiple copies open and weird fd behavior.


    laanwj commented at 3:41 PM on April 20, 2018:

    Doesn't CAutofile destructor take care of this?


    theuni commented at 3:50 PM on April 20, 2018:

    yep, nevermind missed that this was a CAutofile.

  12. in src/validation.cpp:1624 in 5657cc5162 outdated
    1620 |      FILE *fileOld = OpenBlockFile(posOld);
    1621 |      if (fileOld) {
    1622 |          if (fFinalize)
    1623 |              TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nSize);
    1624 | -        FileCommit(fileOld);
    1625 | +        flush_failed = FileCommit(fileOld);
    


    theuni commented at 2:59 PM on April 20, 2018:

    See fclose comment above.

  13. in src/validation.cpp:4770 in 5657cc5162 outdated
    4764 | @@ -4760,7 +4765,8 @@ bool DumpMempool(void)
    4765 |          }
    4766 |  
    4767 |          file << mapDeltas;
    4768 | -        FileCommit(file.Get());
    4769 | +        if (!FileCommit(file.Get()))
    4770 | +            throw std::runtime_error("FileCommit failed");
    4771 |          file.fclose();
    


    theuni commented at 3:00 PM on April 20, 2018:

    aaand here :)

  14. theuni commented at 3:00 PM on April 20, 2018: member

    concept ACK

  15. in src/util.cpp:805 in 5657cc5162 outdated
     800 |      HANDLE hFile = (HANDLE)_get_osfhandle(_fileno(file));
     801 |      FlushFileBuffers(hFile);
     802 |  #else
     803 |      #if defined(__linux__) || defined(__NetBSD__)
     804 | -    fdatasync(fileno(file));
     805 | +    if (fdatasync(fileno(file)) != 0 && errno != EINVAL) { // Ignore EINVAL for filesystems that don't support sync
    


    theuni commented at 3:32 PM on April 20, 2018:

    Interestingly, leveldb does not compare the result to EINVAL. Which makes sense for a database, as if writes can't be synchronized at any point, the db can't really ever be trusted.

    I think we might want to operate the same way. If a FlushBlockFile() fails due to a disk/filesystem that can't sync, do we really want to continue with updating the index db?


    laanwj commented at 3:38 PM on April 20, 2018:

    leveldb does compare against EINVAL, we had this issue a long time ago where leveldb didn't work with certain filesystems (e.g. NTFS mounted from Linux). This was eventually fixed by adding the compare against EINVAL. We certainly don't want to repeat this issue with the block files!

    $ git grep fsync
    util/env_posix.cc:        if (fsync(fd) < 0 && errno != EINVAL) {
    ``
    

    theuni commented at 3:48 PM on April 20, 2018:

    Hmm. leveldb checks the fsync() in SyncDirIfManifest() against EINVAL, but not the fdatasync() in Sync().

    I have no knowledge of the past issue though, so I'll defer to you on that.


    laanwj commented at 3:51 PM on April 20, 2018:

    as if writes can't be synchronized at any point, the db can't really ever be trusted.

    I don't agree. When shutting down bitcoind and the OS cleanly, f(data)sync is a no-op. It's only an issue if there are sudden crashes or power outages. In which case the user will just have to accept potential corruption on crashes when choosing to use a filesystem that doesn't support these things. Certainly for the block files, which are huge and more commonly hosted externally.

    My intent here is not to break any current usecases.

  16. theuni approved
  17. theuni commented at 4:05 PM on April 20, 2018: member

    utACK after squash.

  18. jonasschnelli commented at 11:52 AM on April 23, 2018: contributor

    utACK 36424a4f09de38e1ab54381aa7844781688f5c75 (squash recommended).

  19. Add logging and error handling for file syncing
    Add logging and error handling inside, and outside of FileCommit.
    Functions such as fsync, fdatasync will return error in case of hardware
    I/O errors, and ignoring this means it can silently continue through
    data corruption.  (c.f.
    https://lwn.net/SubscriberLink/752063/12b232ab5039efbe/)
    cf0277928f
  20. laanwj commented at 12:25 PM on April 23, 2018: member

    squashed 36424a4cf0277928fa8d955d75f661021845789194dfff7

  21. laanwj force-pushed on Apr 23, 2018
  22. laanwj merged this on Apr 23, 2018
  23. laanwj closed this on Apr 23, 2018

  24. laanwj referenced this in commit 8b4081a889 on Apr 23, 2018
  25. PastaPastaPasta referenced this in commit 74847ceb09 on Nov 10, 2020
  26. PastaPastaPasta referenced this in commit 9cc2bb8fb2 on Nov 12, 2020
  27. PastaPastaPasta referenced this in commit d36a4a4db3 on Nov 17, 2020
  28. random-zebra referenced this in commit 116bb50765 on Apr 20, 2021
  29. gades referenced this in commit ce94318d8b on Jun 26, 2021
  30. MarcoFalke locked this on Sep 8, 2021


sipa


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me