Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) #29640

pull sr-gi wants to merge 4 commits into bitcoin:master from sr-gi:202403-block-tiebreak changing 6 files +179 −10
  1. sr-gi commented at 8:11 pm on March 12, 2024: member

    This PR grabs some interesting bits from #29284 and fixes some edge cases in how block tiebreaks are dealt with.

    Regarding #29284

    The main functionality from the PR was dropped given it was not an issue anymore, however, reviewers pointed out some comments were outdated #29284 (review) (which to my understanding may have led to thinking that there was still an issue) it also added test coverage for the aforementioned case which was already passing on master and is useful to keep.

    New functionality

    While reviewing the superseded PR, it was noticed that blocks that are loaded from disk may face a similar issue (check #29284 (comment) for more context).

    The issue comes from how tiebreaks for equal work blocks are handled: if two blocks have the same amount of work, the one that is activatable first wins, that is, the one for which we have all its data (and all of its ancestors’). The variable that keeps track of this, within CBlockIndex is nSequenceId, which is not persisted over restarts. This means that when a node is restarted, all blocks loaded from disk are defaulted the same nSequenceId: 0. Now, when trying to decide what chain is best on loading blocks from disk, the previous tiebreaker rule is not decisive anymore, so the CBlockIndexWorkComparator has to default to its last rule: whatever block is loaded first (has a smaller memory address).

    This means that if multiple same work tip candidates were available before restarting the node, it could be the case that the selected chain tip after restarting does not match the one before.

    Therefore, the way nSequenceId is initialized is changed to:

    • 0 for blocks that belong to the previously known best chain
    • 1 to all other blocks loaded from disk
  2. DrahtBot commented at 8:11 pm on March 12, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sipa, mzumsande

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. in src/validation.cpp:4238 in 813961f057 outdated
    4188@@ -4189,7 +4189,7 @@ bool ChainstateManager::ProcessNewBlockHeaders(const std::vector<CBlockHeader>&
    4189         if (IsInitialBlockDownload() && ppindex && *ppindex) {
    4190             const CBlockIndex& last_accepted{**ppindex};
    4191             const int64_t blocks_left{(GetTime() - last_accepted.GetBlockTime()) / GetConsensus().nPowTargetSpacing};
    4192-            const double progress{100.0 * last_accepted.nHeight / (last_accepted.nHeight + blocks_left)};
    4193+            const double progress{100.0 * (last_accepted.nHeight + 1) / (last_accepted.nHeight + 1 + blocks_left)};
    


    maflcko commented at 10:16 am on March 13, 2024:

    I don’t understand this commit. Generally, it would be good to motivate code changes by explaining them.

    PNBH isn’t called for the genesis block, so it would be good to add steps to reproduce.


    sipa commented at 1:08 pm on March 13, 2024:
    A sanitizer hit an error in CI without this change, if I recall correctly, but I don’t remember the details. May be worth trying to remove the commit and seeing if and where it still fails.

    maflcko commented at 1:10 pm on March 13, 2024:
    Ah, I guess blocks_left is negative?

    maflcko commented at 1:29 pm on March 13, 2024:

    Yeah,

     0diff --git a/test/functional/wallet_disable.py b/test/functional/wallet_disable.py
     1index 9c73f7dead..e0deab80ef 100755
     2--- a/test/functional/wallet_disable.py
     3+++ b/test/functional/wallet_disable.py
     4@@ -14,11 +14,14 @@ from test_framework.util import assert_raises_rpc_error
     5 class DisableWalletTest (BitcoinTestFramework):
     6     def set_test_params(self):
     7         self.setup_clean_chain = True
     8-        self.num_nodes = 1
     9-        self.extra_args = [["-disablewallet"]]
    10+        self.num_nodes = 2
    11+        self.extra_args = [["-disablewallet"]]*2
    12         self.wallet_names = []
    13 
    14     def run_test (self):
    15+        with self.nodes[1].assert_debug_log(["sadf"]):
    16+         import time;self.nodes[1].setmocktime(int(time.time()-10*60))
    17+         self.generate(self.nodes[0], 1)
    18         # Make sure wallet is really disabled
    19         assert_raises_rpc_error(-32601, 'Method not found', self.nodes[0].getwalletinfo)
    20         x = self.nodes[0].validateaddress('3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy')
    

    SUMMARY: UndefinedBehaviorSanitizer: float-divide-by-zero validation.cpp:4192:65 in


    maflcko commented at 1:30 pm on March 13, 2024:

    In that case, the correct fix would be to not allow negative values.

    Otherwise, you are just making the problem go away for one specific value, and appear for another specific value.


    maflcko commented at 2:07 pm on March 13, 2024:
    Fixed in #29647, along with chrono and log cleanups.

    sr-gi commented at 6:58 pm on March 14, 2024:
    I’ll remove it from here and rebase on top of that once merged
  4. sr-gi renamed this:
    Adds missing test to chain ties (CBlockIndexWorkComparator)
    Fixes tiebreak when loading blocks from disk and adds missing test to chain ties (CBlockIndexWorkComparator)
    on Mar 14, 2024
  5. sr-gi renamed this:
    Fixes tiebreak when loading blocks from disk and adds missing test to chain ties (CBlockIndexWorkComparator)
    Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties)
    on Mar 14, 2024
  6. sr-gi force-pushed on Mar 14, 2024
  7. DrahtBot added the label CI failed on Mar 14, 2024
  8. DrahtBot commented at 8:52 pm on March 14, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/22675498262

  9. sr-gi force-pushed on Mar 14, 2024
  10. sr-gi force-pushed on Mar 15, 2024
  11. sr-gi force-pushed on Mar 15, 2024
  12. DrahtBot removed the label CI failed on Mar 15, 2024
  13. sr-gi force-pushed on Mar 22, 2024
  14. sr-gi force-pushed on Mar 22, 2024
  15. DrahtBot added the label Needs rebase on Mar 22, 2024
  16. sr-gi force-pushed on Mar 25, 2024
  17. sr-gi commented at 7:45 am on March 25, 2024: member
    Rebased to drop the custom log fix in favor of a more generic solution (https://github.com/bitcoin/bitcoin/pull/29640#discussion_r1523327586)
  18. DrahtBot removed the label Needs rebase on Mar 25, 2024
  19. in src/validation.cpp:4513 in c7e663dfc1 outdated
    4508+    while (target) {
    4509+        target->nSequenceId = 0;
    4510+        target = target->pprev;
    4511+    }
    4512+
    4513+    // Block index candidates are loaded before the chain tip, so we need to replae this entry
    


    mzumsande commented at 8:41 pm on May 31, 2024:
    typo: replace
  20. in src/validation.cpp:4514 in c7e663dfc1 outdated
    4509+        target->nSequenceId = 0;
    4510+        target = target->pprev;
    4511+    }
    4512+
    4513+    // Block index candidates are loaded before the chain tip, so we need to replae this entry
    4514+    // Otherwise the scoring will be based on the memory address location instrad of the nSequenceId
    


    mzumsande commented at 8:41 pm on May 31, 2024:
    typo: instead
  21. in src/validation.cpp:4517 in c7e663dfc1 outdated
    4512+
    4513+    // Block index candidates are loaded before the chain tip, so we need to replae this entry
    4514+    // Otherwise the scoring will be based on the memory address location instrad of the nSequenceId
    4515+    setBlockIndexCandidates.erase(tip);
    4516+    TryAddBlockIndexCandidate(tip);
    4517+    // We also need to re-prune blocks that now score potentially worse than before
    


    mzumsande commented at 3:06 am on June 1, 2024:
    Why call PruneBlockIndexCandidates twice instead of moving it here?

    sr-gi commented at 6:06 pm on June 10, 2024:

    Umm, I really cannot see why I wanted to do this in two steps. It really doesn’t seem necessary.

    I’ve also re-arranged the code a bit to minimize the diff

  22. in src/node/blockstorage.cpp:166 in c7e663dfc1 outdated
    162@@ -163,7 +163,7 @@ bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIn
    163     if (pa->nSequenceId > pb->nSequenceId) return true;
    164 
    165     // Use pointer address as tie breaker (should only happen with blocks
    166-    // loaded from disk, as those all have id 0).
    167+    // loaded from disk, as those all have id 1).
    


    luke-jr commented at 10:17 pm on June 7, 2024:
    0 or 1

    luke-jr commented at 10:18 pm on June 7, 2024:
    Actually, I guess the 0s should be unique so this just needs rephrasing some other way

    sr-gi commented at 6:13 pm on June 10, 2024:

    Updated to:

    (should only happen with blocks loaded from disk, as those share the same nSequenceId: 0 for blocks on the best chain, 1 for all others)

  23. sr-gi force-pushed on Jun 10, 2024
  24. DrahtBot added the label CI failed on Jul 23, 2024
  25. DrahtBot commented at 8:51 pm on July 23, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/26038919395

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  26. Updates CBlockIndexWorkComparator outdated comment 3be1e099f8
  27. test: add functional test for complex reorgs c76f4c6dab
  28. Set the same best tip on restart if two candidates have the same work
    Before this, if we had two (or more) same work tip candidates and restarted our node,
    it could be the case that the block set as tip after bootstrap didn't match the one
    before stopping. That's because the work and `nSequenceId` of both block will be the same
    (the latter is only kept in memory), so the active chain after restart would have depended
    on what tip candidate was loaded first.
    
    This makes sure that we are consistent over reboots.
    6756af191f
  29. test: Adds block tiebreak over restarts tests
    Adds tests to make sure we are consistent on activating the same chain over
    a node restart if two or more candidates have the same work when the node is shutdown
    c6ca2a17ea
  30. sr-gi force-pushed on Jul 24, 2024
  31. sr-gi commented at 3:11 pm on July 24, 2024: member

    Rebased to deal with CI failing.

    c76f4c6dab267b8f83c75e44939f77d237251c1a has been amended to include __file__ in the ChainTiebreaksTest constructor, as required by any subclass of BitcoinTestFramework since #30463

  32. DrahtBot removed the label CI failed on Jul 25, 2024
  33. sipa commented at 5:04 pm on July 25, 2024: member
    utACK c6ca2a17ea4d76f21d7702f70ba892a9494576bf
  34. in test/functional/feature_chain_tiebreaks.py:91 in c76f4c6dab outdated
    86+        peer.send_blocks_and_test([blocks[10]], node, success=False)
    87+        # B2 is still the active chain, as B7-B10 have missing parents.
    88+        assert_equal(node.getbestblockhash(), blocks[2].hash)
    89+
    90+        self.log.info('Send parents B3-B4 of B8-B10 in reverse order')
    91+        peer.send_blocks_and_test([blocks[4]], node, success=False, force_send=True)
    


    mzumsande commented at 4:18 pm on September 25, 2024:
    nit (not really related, but I noticed it asking myself why success=False here): The doc “if success is False: assert that the node’s tip doesn’t advance” in send_blocks_and_test is wrong. It it is only checked that the node’s tip doesn’t advance to the last block provided, however it can advance to another block, as it happens here.

    sr-gi commented at 9:06 pm on October 3, 2024:
    I guess it may be worth creating an issue / opening a separate PR to fix this, given it is unrelated
  35. mzumsande commented at 10:28 pm on September 25, 2024: contributor
    Code Review ACK c6ca2a17ea4d76f21d7702f70ba892a9494576bf I agree that it is more consistent to keep the same tip over a restart in case of multiple candidates (although I don’t think this is much of a problem in practice).

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me