Fix possible data race when committing block files #12696

pull eklitzke wants to merge 1 commits into bitcoin:master from eklitzke:fsync changing 4 files +27 −6
  1. eklitzke commented at 2:21 pm on March 15, 2018: contributor

    It was recently pointed out to me that calling fsync() or fdatasync() on a new file is not sufficient to ensure it’s persisted to disk, as the existence of the file itself is stored in the directory inode. This means that ensuring that a new file is actually committed also requires an fsync() on the parent directory. There are lots of discussions about this topic online, e.g. here. This only applies to new files, calling fsync() on an old file is always fine.

    In theory this means that the block index can race block persistence, as a poorly timed power outage could cause us to commit data to the block index that gets lost in the filesystem. The change here ensures that we call fsync() on the blocks directory after committing new block files. I checked the LevelDB source code and they already do this when updating their writeahead log. In theory this could happen at the same time as a chain split and that could cause you to come back online and then miss the block you had committed to the index, which would put you permanently out of sync between the two. This seems pretty far fetched, but we should handle this case correctly anyway.

    I’m using a new autoconf macro as well, AC_CHECK_FUNCS(). It checks that a function is available and then defines a HAVE_* macro if it is, analogous to AC_CHECK_HEADERS. Right now autoscan complains a lot about the fact that we’re not using this, so I figured I might as well start now while I was in this part of the code.

    Apparently Windows doesn’t have an similar method of syncing filesystem metadata—I’m not an expert on that though.

    Also not strictly related to this change, but I have been working on a lot of platform-specific PRs recently and want to refactor util.h and util.cpp so the platform-specific bits are isolated from the generic util stuff. I intend to create an issue later today to describe how I think that should be done so I can get feedback before starting that work.

  2. laanwj added the label Block storage on Mar 15, 2018
  3. laanwj added the label Data corruption on Mar 15, 2018
  4. eklitzke commented at 4:45 pm on March 15, 2018: contributor
    Apparently macOS erroneously reports that it has fdatasync(), discussion here: https://github.com/haskell/unix/issues/37
  5. Fix possible data race when committing block files
    It was recently pointed out to me that calling fsync() or fdatasync() on a new
    file is not sufficient to ensure it's persisted to disk, a the existence of the
    file itself is stored in the directory inode. This means that ensuring that a
    new file is actually committed also requires an fsync() on the parent directory.
    
    This change ensures that we call fsync() on the blocks directory after
    committing new block files.
    4894e368fa
  6. eklitzke force-pushed on Mar 15, 2018
  7. in src/util.h:182 in 4894e368fa
    177+
    178+/**
    179+ * Sync directory contents. This is required on some environments to ensure that
    180+ * newly created files are committed to disk.
    181+ */
    182+void DirectoryCommit(const fs::path &dirname);
    


    promag commented at 9:02 pm on March 15, 2018:
    nit, fs:path& dirname. The same in the implementation.
  8. promag commented at 9:03 pm on March 15, 2018: member
    utACK 4894e36.
  9. donaloconnor commented at 10:13 pm on March 15, 2018: contributor
    Just in case (I’m not very familiar with this part of Bitcoin): is there a reason this isn’t done for the undo data also? If it makes sense there then maybe it’s okay to do it once at the end of the function instead of in both?
  10. eklitzke commented at 8:58 am on March 16, 2018: contributor

    While I was looking at this again (originally to commit the undo data as @donaloconnor suggested) I decided it would be nice if we could only sync the parent directory if we know there’s actually a new block file on disk. The block files are 128 MB, so most of the time we’re flushing we don’t need to sync the parent directory.

    The full logic for all of the paths that can cause a file to be created is really confusing. First I tried adding a global variable that tracks if a new file has been opened, so that we can only sync the directory when that’s set. Then I was trying to understand the fFinalize flag so I could add a doxygen doc string to FlushBlockToDisk(). The full code path for how this can get called is pretty confusing and the methods aren’t well documented, but it seemed like it was true in the same cases that there would be a new file, since we “finalize” (i.e. truncate) the file when the is not “known to already reside on disk”. But why exactly we would need to truncate a file that doesn’t exist yet is unclear. I think it’s to support reorgs, but it was kind of hard to wrap my head around.

    Since this is pretty important logic I feel like some of these code paths at least need better comments. I am going to take another look through the code tomorrow and see if I can work it out. Maybe we end up always syncing the parent directory anyway since that’s always safe, but I want to make sure I have a better grasp of all of the logic in this file.

  11. in src/validation.cpp:1617 in 4894e368fa
    1613@@ -1614,6 +1614,7 @@ void static FlushBlockFile(bool fFinalize = false)
    1614         if (fFinalize)
    1615             TruncateFile(fileOld, vinfoBlockFile[nLastBlockFile].nSize);
    1616         FileCommit(fileOld);
    1617+        DirectoryCommit(GetDataDir() / "blocks");
    


    ryanofsky commented at 2:38 pm on March 22, 2018:
    Realize you are currently working on this, but if you decide to go with this code as is, you should definitely comment here that call this isn’t needed in the case where new files aren’t being added, in case somebody wants to try to optimize this later.

    luke-jr commented at 4:30 pm on June 12, 2018:
    Won’t it be a reasonably fast noop if the directory hasn’t changed?
  12. in src/util.cpp:740 in 4894e368fa
    740     fcntl(fileno(file), F_FULLFSYNC, 0);
    741-    #else
    742+#elif HAVE_FDATASYNC
    743+    fdatasync(fileno(file));
    744+#else
    745+    fsync(fileno(file));
    


    ryanofsky commented at 2:43 pm on March 22, 2018:
    You aren’t changing this, but it would be nice if these functions returned errors so failures could be logged.
  13. ryanofsky commented at 2:47 pm on March 22, 2018: member
    Realise you are still working on this but, utACK 4894e368fa61cbae1370a8b83581533b0b223f45 for the current version of the code, which does look good. PR description is a little out of date, though. Part about AC_CHECK_FUNCS should probably be removed.
  14. MarcoFalke added the label Needs rebase on Jun 6, 2018
  15. in src/util.cpp:748 in 4894e368fa
    748+
    749+void DirectoryCommit(const fs::path &dirname)
    750+{
    751+#ifndef WIN32
    752+    FILE* file = fsbridge::fopen(dirname, "r");
    753     fsync(fileno(file));
    


    luke-jr commented at 4:29 pm on June 12, 2018:
    Is fsync really enough here, or should we be calling FileCommit on it?
  16. fanquake commented at 3:01 pm on July 25, 2018: member
    Closing with "Up for grabs".
  17. fanquake closed this on Jul 25, 2018

  18. laanwj removed the label Needs rebase on Oct 24, 2019
  19. laanwj referenced this in commit 86a8b35f32 on Jan 7, 2021
  20. sidhujag referenced this in commit c01fae845f on Jan 7, 2021
  21. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-26 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me