CBlockIndex::nStatus race when pruning #17161

issue maflcko openend this issue on October 16, 2019
  1. maflcko commented at 5:21 pm on October 16, 2019: member

    Disclaimer: I couldn’t reproduce this issue in practice on ext4 and ntfs partitions on a x86 CPU, but I think that this issue should be solved in the long-term.

    Basically, CBlockIndex::nStatus is used to hold a bunch of flags (e.g. whether the block data is currently on the hard drive, or whether the block data was verified at some point in the past). Most of this access is from validation, which happens to hold the cs_main lock. However, some of the access (e.g. background index threads, or RPC calls) does not take the cs_main lock, so the access to nStatus is not guarded for them. In theory, this could turn out to be racy in combination with pruning.

    I am not sure how to solve this. Some ideas:

    • Put nStatus under the cs_main lock. –> meh, this will affect validation performance when polling on RPC
    • Make a “read-write-lock” for nStatus, similar to ::mempool.cs. I.e. when writing nStatus, the cs_main lock and the read lock is taken and when reading nStatus, only the read lock is taken. Maybe use a std::shared_mutex?
    • Any other ideas … !?
  2. maflcko added the label Brainstorming on Oct 16, 2019
  3. maflcko added the label Bug on Oct 16, 2019
  4. maflcko added the label Validation on Oct 16, 2019
  5. maflcko added the label Block storage on Oct 16, 2019
  6. promag commented at 10:04 am on October 17, 2019: member

    this will affect validation performance when polling on RPC

    Where? From a quick look only getrawtransaction reads nStatus without cs_main (I may be missing other cases). For this case we could do somthing like:

    0    uint32_t block_status;
    1    {
    2        LOCK(cs_main);
    3        blockindex = LookupBlockIndexAndStatus(blockhash, block_status);
    
  7. maflcko commented at 12:30 pm on October 17, 2019: member
    As pointed out by @ryanofsky in #13903 (comment), reading the status under a lock is not sufficient, as the block might be deleted as soon as the lock is releases.
  8. fanquake referenced this in commit 4daadce36c on Oct 17, 2019
  9. PastaPastaPasta referenced this in commit 1d2f2b0cc8 on Jun 27, 2021
  10. PastaPastaPasta referenced this in commit 26822f5e68 on Jun 28, 2021
  11. PastaPastaPasta referenced this in commit 328a7fd7be on Jun 29, 2021
  12. PastaPastaPasta referenced this in commit a9d7580bc7 on Jul 1, 2021
  13. PastaPastaPasta referenced this in commit fae0bfdfa8 on Jul 1, 2021
  14. PastaPastaPasta referenced this in commit d577147f4f on Jul 12, 2021
  15. PastaPastaPasta referenced this in commit bf8935119c on Jul 13, 2021
  16. PastaPastaPasta referenced this in commit b485210d68 on Jul 13, 2021
  17. vasild commented at 1:50 pm on November 26, 2021: contributor

    What exactly is the problem with nStatus?

    1. Torn reads? (e.g. somebody reads it while somebody else is writing it, resulting in read returning garbage - a value that was not in memory before the write, nor after the write)

    2. Torn writes? (e.g. two threads write to it concurrently, resulting in garbage being stored in it - a value that was not written by either thread)

    3. The access to nStatus is not synced with access to another variable?

  18. maflcko commented at 1:53 pm on November 26, 2021: member
    All of the above. nStatus is both written to and read from from several threads.
  19. vasild commented at 2:17 pm on November 26, 2021: contributor
    Which are the other variables (for 3.)? Because 1. and 2. alone could be solved by using std::atomic_uint32_t.
  20. maflcko commented at 2:27 pm on November 26, 2021: member
    atomic can’t solve this, because HAVE_DATA needs to be valid until right after you read the block, not the status of it. See my previous comment, which links to #13903 (comment)
  21. vasild commented at 3:02 pm on November 26, 2021: contributor

    Are you saying that even #22932 does not solve the problem because even with that PR the code is:

    https://github.com/bitcoin/bitcoin/blob/b1c859abb790a1bbc9c31e3a535e36d1c59085b9/src/node/blockstorage.cpp#L400-L404

    and the block file may be removed from disk after GetBlockPos() completes (under cs_main) but before the if? In this case ReadBlockFromDisk() would return false.

    cc @jonatack

  22. jonatack commented at 12:46 pm on December 6, 2021: contributor
    Thanks, having a look/updating.
  23. laanwj closed this on Jan 27, 2022

  24. fanquake commented at 10:11 am on January 27, 2022: member
    Not sure this should have been closed given #22932 doesn’t say it actually fixed this, and at least one of the review comments said the same.
  25. fanquake reopened this on Jan 27, 2022

  26. andrewtoth commented at 0:38 am on October 15, 2022: contributor
    If all that is left is the ReadBlockFromDisk race, then that could be fixed with just a shared mutex guarding the reading of blocks and the removal of blocks no? This way cs_main is not blocked and blocks can still be read in parallel via shared locks.
  27. maflcko commented at 3:49 pm on December 7, 2022: member
    Closing for now, as this is a theoretical issue that isn’t (hopefully) exposed to users.
  28. maflcko closed this on Dec 7, 2022

  29. Fabcien referenced this in commit dcac626d18 on Jan 24, 2023
  30. bitcoin locked this on Dec 7, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 19:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me