validation: fix UB in LoadChainTip #34521

pull marcofleon wants to merge 2 commits into bitcoin:master from marcofleon:2026/02/fix-ub-LoadChainTip changing 6 files +75 −40
  1. marcofleon commented at 6:23 pm on February 5, 2026: contributor

    Addresses #34503. See this issue for more details as well.

    Fixes a bug where, under certain conditions, setBlockIndexCandidates had blocks in it that were worse than the tip. Removing the tip from setBlockIndexCandidates before modifying its nSequenceId avoids the undefined behavior that results from modification of the sequence id field while the tip is still in the set.

  2. DrahtBot added the label Validation on Feb 5, 2026
  3. DrahtBot commented at 6:23 pm on February 5, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK Jhackman2019

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32950 (validation: remove BLOCK_FAILED_CHILD by stratospher)
    • #30342 (kernel, logging: Pass Logger instances to kernel objects by ryanofsky)
    • #29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. marcofleon commented at 6:32 pm on February 5, 2026: contributor

    note: Technically we still have UB with this fix because we alter the sequence ids of the whole chain of blocks back from the tip, and all of those blocks are in setBlockIndexCandidate at the time. I guess we could just fully clear the candidate set, change the ids, and then only add the tip and anything with more work? That might be a technically more complete fix. Although its weird to me that we populate the set with the whole block tree just to clear/prune it later.

    I was trying to move the initial population of the candidate set from LoadBlockIndex to LoadChainTip in this branch but a few functional tests were failing.

    I have a branch that I believe addresses both the UB summarized above and the deeper one mentioned in #34521#pullrequestreview-3761920357.

    I’m open to whatever reviewers think is better.

  5. maflcko added this to the milestone 31.0 on Feb 6, 2026
  6. mzumsande commented at 9:46 am on February 6, 2026: contributor

    I think that there are two types of UB remaining:

    1. For the one mentionend in #34521 (comment) I think we could simply move the PruneBlockIndexCandidates(); up too, before erasing/readding the tip. That way all ancestors of the tip would have been removed from setBlockIndexCandidates before they are adjusted.

    2. However, the second type of UB goes deeper: With assumeutxo, we can have two Chainstates with a setBlockIndexCandidates each. That means, even if we remove/readd the tip of CS1 while calling LoadChainTip for it, this would still be UB because the same index would not have been removed from CS2’s setBlockIndexCandidates . We would have to argue that the setBlockIndexCandidates blocks from each chainstate don’t overlap in a way that leads to UB, which seems very brittle.

    (extra explanation that may be helpful for others: During normal operation setBlockIndexCandidates has all connectable blocks with at least as much work as the current tip. However, during startup, we first add all connectable indexes to the set because we don’t know the tip yet, then load the tip, and then call PruneBlockIndexCandidates(); to remove almost all of them again.)

    FYI @sipa @TheCharlatan @furszy @sr-gi who authored/acked the original PR - any idea how to best fix this?

  7. Jhackman2019 commented at 5:30 am on February 7, 2026: none

    Tested on ARM64 and it builds clean. Ran the startup and validation tests and all passed.

    The alternative branch that clears and repopulates the candidate set seems like it would be a more complete fix.

    Concept ACK

  8. in src/validation.cpp:4624 in d97d221376


    l0rinc commented at 8:00 pm on February 7, 2026:

    Since LoadChainTip() also mutates nSequenceId for ancestors, should we remove/reinsert all mutated entries (not only tip) before changing sequence IDs? But more generally, can we avoid mutating CBlockIndex::nSequenceId while any setBlockIndexCandidates (across all chainstates) contains those indices?

    Also, could we add a targeted test that stresses the startup tip selection under tie conditions, and asserts the candidate set contains no entries worse than tip?


    l0rinc commented at 10:51 am on February 8, 2026:

    Also, could we add a targeted test that stresses

    The simplest test I found adds two equal-work siblings, resets block-sequence state, seeds CoinsTip with the worst-sorting candidate, calls LoadBlockIndex && LoadChainTip, then verifies every setBlockIndexCandidates entry is not worse than the tip and that set iteration order is comparator-consistent:

     0diff --git a/src/test/validation_chainstatemanager_tests.cpp b/src/test/validation_chainstatemanager_tests.cpp
     1--- a/src/test/validation_chainstatemanager_tests.cpp	(revision b2805eec35ce9fdb69892e864556be4ae057ce6a)
     2+++ b/src/test/validation_chainstatemanager_tests.cpp	(date 1770547026792)
     3@@ -24,6 +24,9 @@
     4 
     5 #include <tinyformat.h>
     6 
     7+#include <algorithm>
     8+#include <array>
     9+#include <ranges>
    10 #include <vector>
    11 
    12 #include <boost/test/unit_test.hpp>
    13@@ -591,6 +594,47 @@
    14     BOOST_CHECK_EQUAL(cs2.setBlockIndexCandidates.size(), num_indexes - last_assumed_valid_idx + 1);
    15 }
    16 
    17+//! Regression test for candidate pruning when LoadChainTip rewrites nSequenceId.
    18+BOOST_FIXTURE_TEST_CASE(chainstatemanager_loadchaintip_prunes_worse_candidates, TestChain100Setup)
    19+{
    20+    constexpr auto comparator{node::CBlockIndexWorkComparator{}};
    21+    auto& chainman{*Assert(m_node.chainman)};
    22+    auto& chainstate{chainman.ActiveChainstate()};
    23+    auto* active_tip{WITH_LOCK(chainman.GetMutex(), return chainman.ActiveTip())};
    24+    BOOST_REQUIRE(active_tip && active_tip->pprev);
    25+
    26+    // Add two equal-work siblings and use the worse one as CoinsTip best block.
    27+    LOCK(cs_main);
    28+    auto header{active_tip->GetBlockHeader()};
    29+    auto make_sibling{[&]() EXCLUSIVE_LOCKS_REQUIRED(cs_main) {
    30+        ++header.nNonce;
    31+        auto* sibling{chainman.m_blockman.AddToBlockIndex(header, chainman.m_best_header)};
    32+        sibling->RaiseValidity(BLOCK_VALID_TRANSACTIONS);
    33+        sibling->m_chain_tx_count = active_tip->m_chain_tx_count;
    34+        return sibling;
    35+    }};
    36+    std::array candidates{active_tip, make_sibling(), make_sibling()};
    37+    chainman.ResetBlockSequenceCounters();
    38+    for (auto& index : chainman.m_blockman.m_block_index | std::views::values) {
    39+        index.nSequenceId = SEQ_ID_INIT_FROM_DISK;
    40+    }
    41+    std::ranges::sort(candidates, comparator);
    42+
    43+    chainstate.m_chain.SetTip(*active_tip);
    44+    chainstate.CoinsTip().SetBestBlock(candidates.front()->GetBlockHash());
    45+    chainstate.ClearBlockIndexCandidates();
    46+
    47+    BOOST_REQUIRE(chainman.LoadBlockIndex() && chainstate.LoadChainTip());
    48+    const CBlockIndex* prev{nullptr};
    49+    for (auto* pindex : chainstate.setBlockIndexCandidates) {
    50+        BOOST_CHECK_MESSAGE(!comparator(pindex, chainstate.m_chain.Tip()), "Candidate sorts worse than the tip");
    51+        if (prev) {
    52+            BOOST_CHECK_MESSAGE(!comparator(pindex, prev), "Wrong candidate set iteration order");
    53+        }
    54+        prev = pindex;
    55+    }
    56+}
    57+
    58 //! Ensure that snapshot chainstate can be loaded when found on disk after a
    59 //! restart, and that new blocks can be connected to both chainstates.
    60 BOOST_FIXTURE_TEST_CASE(chainstatemanager_snapshot_init, SnapshotTestSetup)
    

    Before the fix this fails with:

    test/validation_chainstatemanager_tests.cpp:630: error: in “validation_chainstatemanager_tests/chainstatemanager_loadchaintip_prunes_worse_candidates”: Candidate sorts worse than the tip
    test/validation_chainstatemanager_tests.cpp:632: error: in “validation_chainstatemanager_tests/chainstatemanager_loadchaintip_prunes_worse_candidates”: Wrong candidate set iteration order

    And passes after with the erase in the loop:

     0diff --git a/src/validation.cpp b/src/validation.cpp
     1--- a/src/validation.cpp	(revision b2805eec35ce9fdb69892e864556be4ae057ce6a)
     2+++ b/src/validation.cpp	(date 1770547212278)
     3@@ -4614,15 +4614,15 @@
     4 
     5     // Make sure our chain tip before shutting down scores better than any other candidate
     6     // to maintain a consistent best tip over reboots in case of a tie.
     7+    // Erase each block from candidates before changing its nSequenceId. The set
     8+    // uses nSequenceId for ordering, so modifying it in-place is undefined behavior.
     9     auto target = tip;
    10     while (target) {
    11+        setBlockIndexCandidates.erase(target);
    12         target->nSequenceId = SEQ_ID_BEST_CHAIN_FROM_DISK;
    13         target = target->pprev;
    14     }
    15 
    16-    // Block index candidates are loaded before the chain tip, so we need to replace this entry
    17-    // Otherwise the scoring will be based on the memory address location instead of the nSequenceId
    18-    setBlockIndexCandidates.erase(tip);
    19     TryAddBlockIndexCandidate(tip);
    20     PruneBlockIndexCandidates();
    

    (though I couldn’t write a test that fails for your solution only (but passes for the looped erase), so the two fixes are likely equivalent - though conceptually mutating any key still in a std::set is UB so we should likely delete (or not insert) all of them).

  9. l0rinc changes_requested
  10. l0rinc commented at 8:28 pm on February 7, 2026: contributor
    I think I understand the race, but deleting/reinserting here feels brittle. Would it be possible to avoid pre-populating candidate inserts until after sequence IDs are finalized?
  11. Jhackman2019 commented at 1:37 am on February 8, 2026: none
    Agreed, this lines up with what mzumsande pointed out regarding the dual chainstate case too. I believe marcofleon’s alt branch already takes that approach.
  12. mzumsande commented at 8:55 am on February 9, 2026: contributor
    I also think that the fix in https://github.com/marcofleon/bitcoin/tree/2026/02/fix-setBlockIndexCandidate-ub looks better and would suggest to switch to that approach. Irrespective of the bug being fixed here I think that the current way of adding the entire chain and then removing all of it again from the set is not very elegant, so it seems good to change that anyway.
  13. marcofleon force-pushed on Feb 10, 2026
  14. marcofleon force-pushed on Feb 10, 2026
  15. DrahtBot added the label CI failed on Feb 10, 2026
  16. DrahtBot commented at 5:16 pm on February 10, 2026: contributor

    🚧 At least one of the CI tasks failed. Task macOS native: https://github.com/bitcoin/bitcoin/actions/runs/21874154480/job/63137979156 LLM reason (✨ experimental): validation_chainstatemanager_tests failed (subprocess aborted) causing the CTest run to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  17. marcofleon commented at 6:11 pm on February 10, 2026: contributor
    I’ve switched the approach to only populating setBlockIndexCandidates after we call LoadChainTip and included an assert that all candidate sets (if we have two chainstates) are empty right before we change all the sequence ids. This should fix both types of undefined behavior outlined in #34521#pullrequestreview-3761920357.
  18. DrahtBot removed the label CI failed on Feb 10, 2026
  19. l0rinc commented at 8:10 pm on February 10, 2026: contributor
    @marcofleon, could you please add a test as a first commit that documents the current behavior and the fix in the next commit that that adjust the test to help reviewers assess the behavior change (i.e. documented and debuggable before & after behavior)? Or if that’s too involved, just a test that we can run without the fix to assess its correctness (we can’t do that currently since it contains PopulateBlockIndexCandidates)? Please see my suggestion in https://github.com/bitcoin/bitcoin/pull/34521/changes#r2779130922
  20. validation: fix UB in `LoadChainTip`
    The removal of the chain tip from setBlockIndexCandidates was happening
    after nSequenceId was modified. Since the set uses nSequenceId for
    ordering via the comparator, modifying it while the element is in the
    set is undefined behavior, which would sometimes cause the erase to fail.
    
    With assumeutxo, a second form of UB exists: two chainstates each have
    their own candidate set, but share the same CBlockIndex objects. Calling
    LoadChainTip on one chainstate mutates nSequenceIds that are also in the
    other chainstate's set.
    
    Fix by populating setBlockIndexCandidates after all changes to
    nSequenceId.
    a8cfd1d5d0
  21. Extend functional test for setBlockIndexCandidates UB 72bd4c915e
  22. marcofleon force-pushed on Feb 13, 2026
  23. marcofleon commented at 2:59 pm on February 13, 2026: contributor

    @l0rinc I extended a functional test that restarts the node after a block race. If you remove the first commit and run the test, it should crash, although it might take a few runs. The undefined behavior doesn’t always have an effect on the candidate set. It depends on the initial ordering of the blocks in the set, which on master, comes down to the pointer addresses of the competing blocks.

    You can see that the conditions that cause the assert to fail are pretty specific. If you think the test is useful, I can keep the it there. Or I can take the commit out once you’re confident that the UB is fixed. I don’t mind either way.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-02-17 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me