Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) #29640

pull sr-gi wants to merge 5 commits into bitcoin:master from sr-gi:202403-block-tiebreak changing 6 files +185 −10
  1. sr-gi commented at 8:11 pm on March 12, 2024: member

    This PR grabs some interesting bits from #29284 and fixes some edge cases in how block tiebreaks are dealt with.

    Regarding #29284

    The main functionality from the PR was dropped given it was not an issue anymore, however, reviewers pointed out some comments were outdated #29284 (review) (which to my understanding may have led to thinking that there was still an issue) it also added test coverage for the aforementioned case which was already passing on master and is useful to keep.

    New functionality

    While reviewing the superseded PR, it was noticed that blocks that are loaded from disk may face a similar issue (check #29284 (comment) for more context).

    The issue comes from how tiebreaks for equal work blocks are handled: if two blocks have the same amount of work, the one that is activatable first wins, that is, the one for which we have all its data (and all of its ancestors’). The variable that keeps track of this, within CBlockIndex is nSequenceId, which is not persisted over restarts. This means that when a node is restarted, all blocks loaded from disk are defaulted the same nSequenceId: 0. Now, when trying to decide what chain is best on loading blocks from disk, the previous tiebreaker rule is not decisive anymore, so the CBlockIndexWorkComparator has to default to its last rule: whatever block is loaded first (has a smaller memory address).

    This means that if multiple same work tip candidates were available before restarting the node, it could be the case that the selected chain tip after restarting does not match the one before.

    Therefore, the way nSequenceId is initialized is changed to:

    • 0 for blocks that belong to the previously known best chain
    • 1 to all other blocks loaded from disk
  2. DrahtBot commented at 8:11 pm on March 12, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29640.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK furszy
    Stale ACK sipa, mzumsande

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. in src/validation.cpp:4238 in 813961f057 outdated
    4188@@ -4189,7 +4189,7 @@ bool ChainstateManager::ProcessNewBlockHeaders(const std::vector<CBlockHeader>&
    4189         if (IsInitialBlockDownload() && ppindex && *ppindex) {
    4190             const CBlockIndex& last_accepted{**ppindex};
    4191             const int64_t blocks_left{(GetTime() - last_accepted.GetBlockTime()) / GetConsensus().nPowTargetSpacing};
    4192-            const double progress{100.0 * last_accepted.nHeight / (last_accepted.nHeight + blocks_left)};
    4193+            const double progress{100.0 * (last_accepted.nHeight + 1) / (last_accepted.nHeight + 1 + blocks_left)};
    


    maflcko commented at 10:16 am on March 13, 2024:

    I don’t understand this commit. Generally, it would be good to motivate code changes by explaining them.

    PNBH isn’t called for the genesis block, so it would be good to add steps to reproduce.


    sipa commented at 1:08 pm on March 13, 2024:
    A sanitizer hit an error in CI without this change, if I recall correctly, but I don’t remember the details. May be worth trying to remove the commit and seeing if and where it still fails.

    maflcko commented at 1:10 pm on March 13, 2024:
    Ah, I guess blocks_left is negative?

    maflcko commented at 1:29 pm on March 13, 2024:

    Yeah,

     0diff --git a/test/functional/wallet_disable.py b/test/functional/wallet_disable.py
     1index 9c73f7dead..e0deab80ef 100755
     2--- a/test/functional/wallet_disable.py
     3+++ b/test/functional/wallet_disable.py
     4@@ -14,11 +14,14 @@ from test_framework.util import assert_raises_rpc_error
     5 class DisableWalletTest (BitcoinTestFramework):
     6     def set_test_params(self):
     7         self.setup_clean_chain = True
     8-        self.num_nodes = 1
     9-        self.extra_args = [["-disablewallet"]]
    10+        self.num_nodes = 2
    11+        self.extra_args = [["-disablewallet"]]*2
    12         self.wallet_names = []
    13 
    14     def run_test (self):
    15+        with self.nodes[1].assert_debug_log(["sadf"]):
    16+         import time;self.nodes[1].setmocktime(int(time.time()-10*60))
    17+         self.generate(self.nodes[0], 1)
    18         # Make sure wallet is really disabled
    19         assert_raises_rpc_error(-32601, 'Method not found', self.nodes[0].getwalletinfo)
    20         x = self.nodes[0].validateaddress('3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy')
    

    SUMMARY: UndefinedBehaviorSanitizer: float-divide-by-zero validation.cpp:4192:65 in


    maflcko commented at 1:30 pm on March 13, 2024:

    In that case, the correct fix would be to not allow negative values.

    Otherwise, you are just making the problem go away for one specific value, and appear for another specific value.


    maflcko commented at 2:07 pm on March 13, 2024:
    Fixed in #29647, along with chrono and log cleanups.

    sr-gi commented at 6:58 pm on March 14, 2024:
    I’ll remove it from here and rebase on top of that once merged
  4. sr-gi renamed this:
    Adds missing test to chain ties (CBlockIndexWorkComparator)
    Fixes tiebreak when loading blocks from disk and adds missing test to chain ties (CBlockIndexWorkComparator)
    on Mar 14, 2024
  5. sr-gi renamed this:
    Fixes tiebreak when loading blocks from disk and adds missing test to chain ties (CBlockIndexWorkComparator)
    Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties)
    on Mar 14, 2024
  6. sr-gi force-pushed on Mar 14, 2024
  7. DrahtBot added the label CI failed on Mar 14, 2024
  8. DrahtBot commented at 8:52 pm on March 14, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/22675498262

  9. sr-gi force-pushed on Mar 14, 2024
  10. sr-gi force-pushed on Mar 15, 2024
  11. sr-gi force-pushed on Mar 15, 2024
  12. DrahtBot removed the label CI failed on Mar 15, 2024
  13. sr-gi force-pushed on Mar 22, 2024
  14. sr-gi force-pushed on Mar 22, 2024
  15. DrahtBot added the label Needs rebase on Mar 22, 2024
  16. sr-gi force-pushed on Mar 25, 2024
  17. sr-gi commented at 7:45 am on March 25, 2024: member
    Rebased to drop the custom log fix in favor of a more generic solution (https://github.com/bitcoin/bitcoin/pull/29640#discussion_r1523327586)
  18. DrahtBot removed the label Needs rebase on Mar 25, 2024
  19. in src/validation.cpp:4513 in c7e663dfc1 outdated
    4508+    while (target) {
    4509+        target->nSequenceId = 0;
    4510+        target = target->pprev;
    4511+    }
    4512+
    4513+    // Block index candidates are loaded before the chain tip, so we need to replae this entry
    


    mzumsande commented at 8:41 pm on May 31, 2024:
    typo: replace
  20. in src/validation.cpp:4514 in c7e663dfc1 outdated
    4509+        target->nSequenceId = 0;
    4510+        target = target->pprev;
    4511+    }
    4512+
    4513+    // Block index candidates are loaded before the chain tip, so we need to replae this entry
    4514+    // Otherwise the scoring will be based on the memory address location instrad of the nSequenceId
    


    mzumsande commented at 8:41 pm on May 31, 2024:
    typo: instead
  21. in src/validation.cpp:4517 in c7e663dfc1 outdated
    4512+
    4513+    // Block index candidates are loaded before the chain tip, so we need to replae this entry
    4514+    // Otherwise the scoring will be based on the memory address location instrad of the nSequenceId
    4515+    setBlockIndexCandidates.erase(tip);
    4516+    TryAddBlockIndexCandidate(tip);
    4517+    // We also need to re-prune blocks that now score potentially worse than before
    


    mzumsande commented at 3:06 am on June 1, 2024:
    Why call PruneBlockIndexCandidates twice instead of moving it here?

    sr-gi commented at 6:06 pm on June 10, 2024:

    Umm, I really cannot see why I wanted to do this in two steps. It really doesn’t seem necessary.

    I’ve also re-arranged the code a bit to minimize the diff

  22. in src/node/blockstorage.cpp:166 in c7e663dfc1 outdated
    162@@ -163,7 +163,7 @@ bool CBlockIndexWorkComparator::operator()(const CBlockIndex* pa, const CBlockIn
    163     if (pa->nSequenceId > pb->nSequenceId) return true;
    164 
    165     // Use pointer address as tie breaker (should only happen with blocks
    166-    // loaded from disk, as those all have id 0).
    167+    // loaded from disk, as those all have id 1).
    


    luke-jr commented at 10:17 pm on June 7, 2024:
    0 or 1

    luke-jr commented at 10:18 pm on June 7, 2024:
    Actually, I guess the 0s should be unique so this just needs rephrasing some other way

    sr-gi commented at 6:13 pm on June 10, 2024:

    Updated to:

    (should only happen with blocks loaded from disk, as those share the same nSequenceId: 0 for blocks on the best chain, 1 for all others)

  23. sr-gi force-pushed on Jun 10, 2024
  24. DrahtBot added the label CI failed on Jul 23, 2024
  25. DrahtBot commented at 8:51 pm on July 23, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/26038919395

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  26. sr-gi force-pushed on Jul 24, 2024
  27. sr-gi commented at 3:11 pm on July 24, 2024: member

    Rebased to deal with CI failing.

    c76f4c6dab267b8f83c75e44939f77d237251c1a has been amended to include __file__ in the ChainTiebreaksTest constructor, as required by any subclass of BitcoinTestFramework since #30463

  28. DrahtBot removed the label CI failed on Jul 25, 2024
  29. sipa commented at 5:04 pm on July 25, 2024: member
    utACK c6ca2a17ea4d76f21d7702f70ba892a9494576bf
  30. in test/functional/feature_chain_tiebreaks.py:91 in c76f4c6dab outdated
    86+        peer.send_blocks_and_test([blocks[10]], node, success=False)
    87+        # B2 is still the active chain, as B7-B10 have missing parents.
    88+        assert_equal(node.getbestblockhash(), blocks[2].hash)
    89+
    90+        self.log.info('Send parents B3-B4 of B8-B10 in reverse order')
    91+        peer.send_blocks_and_test([blocks[4]], node, success=False, force_send=True)
    


    mzumsande commented at 4:18 pm on September 25, 2024:
    nit (not really related, but I noticed it asking myself why success=False here): The doc “if success is False: assert that the node’s tip doesn’t advance” in send_blocks_and_test is wrong. It it is only checked that the node’s tip doesn’t advance to the last block provided, however it can advance to another block, as it happens here.

    sr-gi commented at 9:06 pm on October 3, 2024:
    I guess it may be worth creating an issue / opening a separate PR to fix this, given it is unrelated
  31. mzumsande commented at 10:28 pm on September 25, 2024: contributor
    Code Review ACK c6ca2a17ea4d76f21d7702f70ba892a9494576bf I agree that it is more consistent to keep the same tip over a restart in case of multiple candidates (although I don’t think this is much of a problem in practice).
  32. in test/functional/feature_chain_tiebreaks.py:143 in c6ca2a17ea outdated
    138+
    139+        self.log.info('Restart the node and check that the best tip before restarting matched the ones afterwards')
    140+        # Restart and check enough times to this to eventually fail if the logic is broken
    141+        for _ in range(10):
    142+            self.restart_node(0)
    143+            assert_equal(blocks[0].hash, node.getbestblockhash())
    


    achow101 commented at 8:12 pm on December 4, 2024:

    In c6ca2a17ea4d76f21d7702f70ba892a9494576bf “test: Adds block tiebreak over restarts tests”

    This does not seem to fail on master, but it does on this branch with the fix commit reverted. Is it possible that this was fixed by a different PR?


    mzumsande commented at 9:23 pm on December 4, 2024:
    for me, it also fails on master - but not always, intermittently.

    sr-gi commented at 8:06 pm on December 5, 2024:

    This is due to how this failure is supposed to happen, which is not ideal. The obtained hash depends on the memory address where the competing hashes are loaded, so “enough” here is hard to measure.

    Happy to use an alternative approach is you can think of any :/

  33. in src/chain.h:196 in 6756af191f outdated
    191@@ -192,7 +192,9 @@ class CBlockIndex
    192     uint32_t nNonce{0};
    193 
    194     //! (memory only) Sequential id assigned to distinguish order in which blocks are received.
    195-    int32_t nSequenceId{0};
    196+    //! Initialized to 1 when loading blocks from disk, except for blocks belonging to the best chain
    197+    //! which overwrite it to 0.
    198+    int32_t nSequenceId{1};
    


    furszy commented at 9:20 pm on January 27, 2025:
    In 6756af191f0: nit: It might be more maintainable to define two constants, such as ID_DISK-ONLY_BEST_CHAIN=0 and ID_DISK-ONLY_FORK=1 (or something similar).
  34. furszy commented at 9:28 pm on January 27, 2025: member

    Concept ACK.

    Just started reviewing. It needs to be rebased on master to include CMake support. Also, left a small nit along the way.

  35. Updates CBlockIndexWorkComparator outdated comment a06e5a84f1
  36. test: add functional test for complex reorgs 05fb4f8aee
  37. Set the same best tip on restart if two candidates have the same work
    Before this, if we had two (or more) same work tip candidates and restarted our node,
    it could be the case that the block set as tip after bootstrap didn't match the one
    before stopping. That's because the work and `nSequenceId` of both block will be the same
    (the latter is only kept in memory), so the active chain after restart would have depended
    on what tip candidate was loaded first.
    
    This makes sure that we are consistent over reboots.
    c8f5e6234f
  38. sr-gi force-pushed on Jan 28, 2025
  39. sr-gi commented at 4:52 pm on January 28, 2025: member
    Thanks @furszy. I’ve added two constants for the values loaded from disk and rebased over master.
  40. furszy commented at 5:10 pm on January 28, 2025: member

    I’ve added two constants for the values loaded from disk

    Missed to push them?

  41. sr-gi force-pushed on Jan 28, 2025
  42. sr-gi commented at 5:14 pm on January 28, 2025: member

    I’ve added two constants for the values loaded from disk

    Missed to push them?

    Indeed. Should be fixed now

  43. Make nSequenceId init value constants
    Make it easier to follow what the values come without having to go
    over the comments, plus easier to maintain
    c7f9061d55
  44. test: Adds block tiebreak over restarts tests
    Adds tests to make sure we are consistent on activating the same chain over
    a node restart if two or more candidates have the same work when the node is shutdown
    177d07f659
  45. sr-gi force-pushed on Feb 11, 2025
  46. sr-gi commented at 4:20 pm on February 11, 2025: member
    The last commit had a small typo, force-pushing to fix it
  47. in src/validation.cpp:4724 in c8f5e6234f outdated
    4717@@ -4718,9 +4718,22 @@ bool Chainstate::LoadChainTip()
    4718         return false;
    4719     }
    4720     m_chain.SetTip(*pindex);
    4721+    tip = m_chain.Tip();
    4722+
    4723+    // Make sure our chain tip before shutting down scores better than any other candidate
    4724+    // to maintain a consistent best tip over reboots
    


    furszy commented at 3:01 pm on February 12, 2025:

    tiny nit:

    0    // Make sure our chain tip before shutting down scores better than any other candidate
    1    // to maintain a consistent best tip over reboots in case of a tie with another chain.
    
  48. in src/validation.cpp:4729 in c8f5e6234f outdated
    4724+    // to maintain a consistent best tip over reboots
    4725+    auto target = tip;
    4726+    while (target) {
    4727+        target->nSequenceId = 0;
    4728+        target = target->pprev;
    4729+    }
    


    furszy commented at 3:48 pm on February 12, 2025:

    q: cannot just set the tip’s sequence id to 0?

    We only compare the tip inside TryAddBlockIndexCandidate and not the entire chain (if not, then a test might be missing because setting only the tip to 0 still passes all tests).


    furszy commented at 4:13 pm on February 12, 2025:
    Answered offline by mzumsande. It’s for an edge case: the user invalidates the tip after startup and before receiving any blocks from the network. The node then might find out that there are two competing blocks for the tip’s ancestor. A test wouldn’t hurt to document this behavior but it is not entirely needed.

    mzumsande commented at 4:20 pm on February 12, 2025:
    I don’t think that this is an edge case we’d necessarily need to support though - it was just the only case I could come up with where it would matter.
  49. furszy commented at 4:31 pm on February 12, 2025: member
    Code review ACK 177d07f
  50. DrahtBot requested review from mzumsande on Feb 12, 2025
  51. DrahtBot requested review from sipa on Feb 12, 2025
  52. ryanofsky commented at 4:47 pm on March 23, 2025: contributor
    Might be ready for merge if @mzumsande and @sipa can re-ack

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-03 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me