index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability #26215

pull ryanofsky wants to merge 1 commits into bitcoin:master from ryanofsky:pr/untilsync changing 1 files +12 −1
  1. ryanofsky commented at 1:21 pm on September 30, 2022: contributor

    Since commit f08c9fb0c6a799e3cb75ca5f763a746471625beb from PR #21726, index BlockUntilSyncedToCurrentChain behavior has been less reliable, and there has also been a race condition in the coinstatsindex_initial_sync unit test.

    It seems better for BlockUntilSyncedToCurrentChain to actually wait for the last connected block to be fully processed, than to be able to return before prune locks are set, so this switches the order of m_best_block_index = block; and UpdatePruneLock statements in SetBestBlockIndex to make it more reliable.

    Also since commit f08c9fb0c6a799e3cb75ca5f763a746471625beb, there has been a race condition in the coinstatsindex_initial_sync test. Before that commit, the atomic index best block pointer m_best_block_index was updated as the last step of BaseIndex::BlockConnected, so BlockUntilSyncedToCurrentChain could safely be used in tests to wait for the last BlockConnected notification to be finished before stopping and destroying the index. But after that commit, calling BlockUntilSyncedToCurrentChain is no longer sufficient, and there is a race between the test shutdown code which destroys the index object and the new code introduced in that commit calling AllowPrune() and GetName() on the index object. Reproducibility instructions for this are in #25365 (comment)

    This commit fixes the coinstatsindex_initial_sync race condition, even though it will require an additional change to silence TSAN false positives, #26188, after it is fixed. So this partially addresses but does not resolve the bug reporting TSAN errors #25365.

    There is no known race condition outside of test code currently, because the bitcoind Shutdown function calls FlushBackgroundCallbacks not BlockUntilSyncedToCurrentChain to safely shut down.

    Co-authored-by: vasild Co-authored-by: MarcoFalke

  2. ryanofsky marked this as ready for review on Sep 30, 2022
  3. maflcko added this to the milestone 24.0 on Sep 30, 2022
  4. DrahtBot added the label UTXO Db and Indexes on Sep 30, 2022
  5. maflcko commented at 1:41 pm on September 30, 2022: member

    Concept ACK dd2ef55a86b85a9f1dc8cd1a7a4a0fc7ed2d7da4

    The description makes sense, but I have not reviewed in detail that bitcoind behaviour remains unchanged.

    Tagged for backport, but happy to drop again if this seems too risky.

  6. ryanofsky commented at 1:43 pm on September 30, 2022: contributor
    Might be possible to write a unit test that ensures BlockUntilSyncedToCurrentChain doesn’t return until after prune locks are updated. But it seems a little tricky so this PR does not have a test for now.
  7. ryanofsky commented at 1:53 pm on September 30, 2022: contributor

    Tagged for backport, but happy to drop again if this seems too risky.

    Thanks, I think it’s probably good to backport. SInce it’s just doing an atomic assignment one step later it seems safe and it’s hard to think of ways it could cause a problem.

  8. DrahtBot commented at 10:34 pm on September 30, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  9. maflcko requested review from fjahr on Oct 4, 2022
  10. fanquake commented at 2:35 pm on October 4, 2022: member
    @mzumsande care to take a look here?
  11. in src/index/base.cpp:427 in dd2ef55a86 outdated
    419         node::PruneLockInfo prune_lock;
    420         prune_lock.height_first = block->nHeight;
    421         WITH_LOCK(::cs_main, m_chainstate->m_blockman.UpdatePruneLock(GetName(), prune_lock));
    422     }
    423+
    424+    // Intentionally set m_best_block_index as the last step in this function,
    


    mzumsande commented at 8:33 pm on October 4, 2022:
    Maybe it could be helpful to additionally add a similar comment at the end of BaseIndex::BlockConnected, so that no one attempts to add references to *this after the call to SetBestBlockIndex there either, for the same reason.

    ryanofsky commented at 2:57 pm on October 5, 2022:

    re: #26215 (review)

    Agree that would be helpful. Added comment

  12. mzumsande commented at 8:50 pm on October 4, 2022: contributor

    ACK dd2ef55a86b85a9f1dc8cd1a7a4a0fc7ed2d7da4

    Took me a while to catch up with the entire discussion around #25365, but this makes sense to me. Aside from the issue with BlockUntilSyncedToCurrentChain that is fixed here, the order between updating prune locks and the best index shouldn’t matter, since the validation code which performs the pruning does not interact with m_best_index- so I think this is safe.

  13. index: Improve BaseIndex::BlockUntilSyncedToCurrentChain reliability
    Since commit f08c9fb0c6a799e3cb75ca5f763a746471625beb from PR
    https://github.com/bitcoin/bitcoin/pull/21726, index
    `BlockUntilSyncedToCurrentChain` behavior has been less reliable, and there has
    also been a race condition in the `coinstatsindex_initial_sync` unit test.
    
    It seems better for `BlockUntilSyncedToCurrentChain` to actually wait for the
    last connected block to be fully processed, than to be able to return before
    prune locks are set, so this switches the order of `m_best_block_index =
    block;` and `UpdatePruneLock` statements in `SetBestBlockIndex` to make it more
    reliable.
    
    Also since commit f08c9fb0c6a799e3cb75ca5f763a746471625beb, there has been a
    race condition in the `coinstatsindex_initial_sync` test. Before that commit,
    the atomic index best block pointer `m_best_block_index` was updated as the
    last step of `BaseIndex::BlockConnected`, so `BlockUntilSyncedToCurrentChain`
    could safely be used in tests to wait for the last `BlockConnected`
    notification to be finished before stopping and destroying the index. But
    after that commit, calling `BlockUntilSyncedToCurrentChain` is no longer
    sufficient, and there is a race between the test shutdown code which destroys
    the index object and the new code introduced in that commit calling
    `AllowPrune()` and `GetName()` on the index object. Reproducibility
    instructions for this are in
    https://github.com/bitcoin/bitcoin/issues/25365#issuecomment-1259744133
    
    This commit fixes the `coinstatsindex_initial_sync` race condition, even though
    it will require an additional change to silence TSAN false positives,
    https://github.com/bitcoin/bitcoin/pull/26188, after it is fixed. So this
    partially addresses but does not resolve the bug reporting TSAN errors
    https://github.com/bitcoin/bitcoin/issues/25365.
    
    There is no known race condition outside of test code currently, because the
    bitcoind `Shutdown` function calls `FlushBackgroundCallbacks` not
    `BlockUntilSyncedToCurrentChain` to safely shut down.
    
    Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
    Co-authored-by: MacroFake <falke.marco@gmail.com>
    8891949bdc
  14. ryanofsky force-pushed on Oct 5, 2022
  15. ryanofsky commented at 3:09 pm on October 5, 2022: contributor

    Thanks for the review!

    Updated dd2ef55a86b85a9f1dc8cd1a7a4a0fc7ed2d7da4 -> 8891949bdcb25093d3a6703ae8228c3c3687d3a4 (pr/untilsync.1 -> pr/untilsync.2, compare) adding suggested comment.

    re: #26215#pullrequestreview-1130577545

    Took me a while to catch up with the entire discussion around #25365

    I hope the PR description was clear enough explaining the change and what motivated it, so it wasn’t actually neccessary to read the old discussion thread, but you could let me know if I missed anything important in the explanation.

  16. mzumsande commented at 6:23 pm on October 5, 2022: contributor

    re-ACK 8891949bdcb25093d3a6703ae8228c3c3687d3a4

    I hope the PR description was clear enough explaining the change and what motivated it, so it wasn’t actually neccessary to read the old discussion thread, but you could let me know if I missed anything important in the explanation.

    The PR is clear, I don’t think anything is missing, I just decided I wanted to follow the discussion chronologically.

  17. fanquake merged this on Oct 10, 2022
  18. fanquake closed this on Oct 10, 2022

  19. fanquake added the label Needs backport (24.x) on Oct 10, 2022
  20. sidhujag referenced this in commit f4572f9ae4 on Oct 10, 2022
  21. fanquake referenced this in commit 5ad82a09b4 on Oct 11, 2022
  22. fanquake removed the label Needs backport (24.x) on Oct 11, 2022
  23. fanquake commented at 1:21 am on October 11, 2022: member
    Added to #26133 for backport to 24.x.
  24. jonatack commented at 4:36 pm on October 11, 2022: contributor
    Post-merge ACK 8891949bdcb25093d3a6703ae8228c3c3687d3a4
  25. achow101 referenced this in commit 885366c67a on Oct 13, 2022
  26. bitcoin locked this on Oct 13, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 19:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me