use shared mutex to guard against block files being removed before read #26316

pull andrewtoth wants to merge 2 commits into bitcoin:master from andrewtoth:block-read-shared-mutex changing 3 files +12 −6
  1. andrewtoth commented at 0:35 am on October 15, 2022: contributor

    Potentially fixed #17161.

    It seems there’s a race in ReadBlockFromDisk and UndoReadFromDisk where after the block/undo file position is retrieved the actual block/undo file could be removed during prune. This could just cause the read to fail, or it could prevent the file from being removed if the file is already open depending on the platform (https://en.cppreference.com/w/cpp/io/c/remove). This PR uses a shared mutex to lock the removal of files until any reads are completed. This way it doesn’t block cs_main while reading so ReadBlockFromDisk usage can be moved outside of cs_main in some cases like #26308.

    The first commit also updates the function signature of ReadRawBlockFromDisk to take a block index instead of the file position, and then the file position is retrieved while locking cs_main inside the function. That way it unifies it with ReadBlockFromDisk and can be moved outside cs_main scope inside net_processing in a follow up PR like #26326.

    Also see #25232.

  2. andrewtoth force-pushed on Oct 16, 2022
  3. pass block index to ReadRawBlockFromDisk and lock inside a3c7da95f8
  4. andrewtoth force-pushed on Oct 17, 2022
  5. use shared mutex to guard against files being removed before read ad67092f92
  6. andrewtoth force-pushed on Oct 17, 2022
  7. DrahtBot commented at 6:29 pm on October 17, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK dergoegge

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26415 (rpc,rest: faster getblock and rest_block by reading raw block by andrewtoth)
    • #26288 (Enable -Wstring-concatenation and -Wstring-conversion on clang builds by Empact)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  8. in src/node/blockstorage.cpp:33 in ad67092f92
    29@@ -29,6 +30,7 @@ std::atomic_bool fImporting(false);
    30 std::atomic_bool fReindex(false);
    31 bool fPruneMode = false;
    32 uint64_t nPruneTarget = 0;
    33+std::shared_mutex cs_UnlinkFiles;
    


    dergoegge commented at 10:05 am on October 19, 2022:
    0std::shared_mutex g_unlinkfiles_mutex;
    

    Would be more inline with our naming conventions.

    But generally I think we should be removing globals instead of adding more. In this case, we could make ReadBlockFromDisk, ReadRawBlockFromDisk, UndoReadFromDisk, UnlinkPrunedFiles methods of BlockManager and have the mutex owned by that as well. I have not scoped out how big of a refactor that would be but I think that would be preferable.


    andrewtoth commented at 5:59 pm on October 20, 2022:
    All places have access to the BlockManager fairly easily so it should be straightforward refactor, with the exception of CZMQPublishRawBlockNotifier::NotifyBlock. I have no idea how I will get a handle to the BlockManager from there :/.

    andrewtoth commented at 6:33 pm on October 24, 2022:
    I created #26375 to remove needing to read a block from CZMQPublishRawBlockNotifier::NotifyBlock entirely. After that or something like it the refactor can be done much more easily. Marking as draft until that is merged.

    luke-jr commented at 10:49 pm on November 3, 2022:
    The refactor should be after and separate from the fix.
  9. dergoegge commented at 10:51 am on October 19, 2022: member

    Concept ACK

    I am wondering if the current approach actually fixes the race mentioned in #17161. Reading the block file will fail if (a) the file was removed or (b) if CBlockIndex::nStatus does not have the BLOCK_HAVE_DATA bit set at the time of calling GetBlockPos() in ReadBlockFromDisk. In this PR you cover (a) but not (b), is that right? BlockManager::PruneOneBlockFile unsets the bit during pruning before UnlinkPrunedFiles is called, so it seems to like the race could still happen.

  10. andrewtoth commented at 2:11 pm on October 19, 2022: contributor

    Reading the block file will fail if (a) the file was removed or (b) if CBlockIndex::nStatus does not have the BLOCK_HAVE_DATA bit set at the time of calling GetBlockPos() in ReadBlockFromDisk. In this PR you cover (a) but not (b), is that right? BlockManager::PruneOneBlockFile unsets the bit during pruning before UnlinkPrunedFiles is called, so it seems to like the race could still happen.

    Yes, that sounds right to me. However, after this change #17161 will not really be a problem. Since getting the block position inside ReadBlockFromDisk will return an empty position if nStatus does not have BLOCK_HAVE_DATA, the read will fail. Having the read fail should be fine as long as the error can be handled gracefully (like RPC/REST/ZMQ/net_processing) but in validation code cs_main should remain locked so it can’t happen.

    The real issue this solves is on Windows where if after pindex->GetBlockPos() is called and the file is opened then the files are unlinked. This will cause the file to not be removed. From https://en.cppreference.com/w/cpp/io/c/remove

    If the file is currently open by the current or another process, the behavior of this function is implementation-defined

    I think we should just avoid that happening altogether since “implementation-defined” sounds scary. This was suggested in #11913#pullrequestreview-88290878 and mentioned in #13903 (comment) as well.

    Another strategy could be to not unset CBlockIndex::nFile and CBlockIndex::nDataPos when calling BlockManager::PruneOneBlockFile and then not check for BLOCK_HAVE_DATA in CBlockIndex::GetBlockPos(). That way a read will still succeed even if the block is set to be pruned but the file has not been removed. Then nFile and nDataPos can be zeroed in UnlinkPrunedFiles. But that seems like a bigger change for not much more benefit.

  11. andrewtoth marked this as a draft on Oct 24, 2022
  12. luke-jr commented at 10:54 pm on November 3, 2022: member

    This seems like only a partial fix, since we should be checking BLOCK_HAVE_DATA within the lock. Otherwise, there’s still a race after we check it and the block is read, where the block could be pruned.

    OTOH, maybe we should just handle missing block files more gracefully than a failed assertion…

  13. andrewtoth commented at 4:07 pm on November 18, 2022: contributor
    Closing in favor of #26533.
  14. andrewtoth closed this on Nov 18, 2022

  15. andrewtoth deleted the branch on Aug 17, 2023
  16. bitcoin locked this on Aug 16, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-06 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me