validation: fix UB in LoadChainTip #34521

pull marcofleon wants to merge 3 commits into bitcoin:master from marcofleon:2026/02/fix-ub-LoadChainTip changing 6 files +76 −43
  1. marcofleon commented at 6:23 pm on February 5, 2026: contributor

    Addresses #34503. See this issue for more details as well.

    Fixes a bug where, under certain conditions, setBlockIndexCandidates had blocks in it that were worse than the tip. The block index candidate set uses nSequenceId as a sort key, so modifying this field while blocks are in the set results in undefined behavior. This PR populates setBlockIndexCandidates after the nSequenceId modifications, avoiding the UB.

  2. DrahtBot added the label Validation on Feb 5, 2026
  3. DrahtBot commented at 6:23 pm on February 5, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sedited, sipa, achow101
    Concept ACK Jhackman2019, stickies-v, mzumsande, stringintech
    Stale ACK dergoegge, fjahr

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30342 (kernel, logging: Pass Logger instances to kernel objects by ryanofsky)
    • #29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. marcofleon commented at 6:32 pm on February 5, 2026: contributor

    note: Technically we still have UB with this fix because we alter the sequence ids of the whole chain of blocks back from the tip, and all of those blocks are in setBlockIndexCandidate at the time. I guess we could just fully clear the candidate set, change the ids, and then only add the tip and anything with more work? That might be a technically more complete fix. Although its weird to me that we populate the set with the whole block tree just to clear/prune it later.

    I was trying to move the initial population of the candidate set from LoadBlockIndex to LoadChainTip in this branch but a few functional tests were failing.

    I have a branch that I believe addresses both the UB summarized above and the deeper one mentioned in #34521#pullrequestreview-3761920357.

    I’m open to whatever reviewers think is better.

  5. maflcko added this to the milestone 31.0 on Feb 6, 2026
  6. mzumsande commented at 9:46 am on February 6, 2026: contributor

    I think that there are two types of UB remaining:

    1. For the one mentionend in #34521 (comment) I think we could simply move the PruneBlockIndexCandidates(); up too, before erasing/readding the tip. That way all ancestors of the tip would have been removed from setBlockIndexCandidates before they are adjusted.

    2. However, the second type of UB goes deeper: With assumeutxo, we can have two Chainstates with a setBlockIndexCandidates each. That means, even if we remove/readd the tip of CS1 while calling LoadChainTip for it, this would still be UB because the same index would not have been removed from CS2’s setBlockIndexCandidates . We would have to argue that the setBlockIndexCandidates blocks from each chainstate don’t overlap in a way that leads to UB, which seems very brittle.

    (extra explanation that may be helpful for others: During normal operation setBlockIndexCandidates has all connectable blocks with at least as much work as the current tip. However, during startup, we first add all connectable indexes to the set because we don’t know the tip yet, then load the tip, and then call PruneBlockIndexCandidates(); to remove almost all of them again.)

    FYI @sipa @TheCharlatan @furszy @sr-gi who authored/acked the original PR - any idea how to best fix this?

  7. Jhackman2019 commented at 5:30 am on February 7, 2026: none

    Tested on ARM64 and it builds clean. Ran the startup and validation tests and all passed.

    The alternative branch that clears and repopulates the candidate set seems like it would be a more complete fix.

    Concept ACK

  8. in src/validation.cpp:4624 in d97d221376 outdated


    l0rinc commented at 8:00 pm on February 7, 2026:

    Since LoadChainTip() also mutates nSequenceId for ancestors, should we remove/reinsert all mutated entries (not only tip) before changing sequence IDs? But more generally, can we avoid mutating CBlockIndex::nSequenceId while any setBlockIndexCandidates (across all chainstates) contains those indices?

    Also, could we add a targeted test that stresses the startup tip selection under tie conditions, and asserts the candidate set contains no entries worse than tip?


    l0rinc commented at 10:51 am on February 8, 2026:

    Also, could we add a targeted test that stresses

    The simplest test I found adds two equal-work siblings, resets block-sequence state, seeds CoinsTip with the worst-sorting candidate, calls LoadBlockIndex && LoadChainTip, then verifies every setBlockIndexCandidates entry is not worse than the tip and that set iteration order is comparator-consistent:

     0diff --git a/src/test/validation_chainstatemanager_tests.cpp b/src/test/validation_chainstatemanager_tests.cpp
     1--- a/src/test/validation_chainstatemanager_tests.cpp	(revision b2805eec35ce9fdb69892e864556be4ae057ce6a)
     2+++ b/src/test/validation_chainstatemanager_tests.cpp	(date 1770547026792)
     3@@ -24,6 +24,9 @@
     4 
     5 #include <tinyformat.h>
     6 
     7+#include <algorithm>
     8+#include <array>
     9+#include <ranges>
    10 #include <vector>
    11 
    12 #include <boost/test/unit_test.hpp>
    13@@ -591,6 +594,47 @@
    14     BOOST_CHECK_EQUAL(cs2.setBlockIndexCandidates.size(), num_indexes - last_assumed_valid_idx + 1);
    15 }
    16 
    17+//! Regression test for candidate pruning when LoadChainTip rewrites nSequenceId.
    18+BOOST_FIXTURE_TEST_CASE(chainstatemanager_loadchaintip_prunes_worse_candidates, TestChain100Setup)
    19+{
    20+    constexpr auto comparator{node::CBlockIndexWorkComparator{}};
    21+    auto& chainman{*Assert(m_node.chainman)};
    22+    auto& chainstate{chainman.ActiveChainstate()};
    23+    auto* active_tip{WITH_LOCK(chainman.GetMutex(), return chainman.ActiveTip())};
    24+    BOOST_REQUIRE(active_tip && active_tip->pprev);
    25+
    26+    // Add two equal-work siblings and use the worse one as CoinsTip best block.
    27+    LOCK(cs_main);
    28+    auto header{active_tip->GetBlockHeader()};
    29+    auto make_sibling{[&]() EXCLUSIVE_LOCKS_REQUIRED(cs_main) {
    30+        ++header.nNonce;
    31+        auto* sibling{chainman.m_blockman.AddToBlockIndex(header, chainman.m_best_header)};
    32+        sibling->RaiseValidity(BLOCK_VALID_TRANSACTIONS);
    33+        sibling->m_chain_tx_count = active_tip->m_chain_tx_count;
    34+        return sibling;
    35+    }};
    36+    std::array candidates{active_tip, make_sibling(), make_sibling()};
    37+    chainman.ResetBlockSequenceCounters();
    38+    for (auto& index : chainman.m_blockman.m_block_index | std::views::values) {
    39+        index.nSequenceId = SEQ_ID_INIT_FROM_DISK;
    40+    }
    41+    std::ranges::sort(candidates, comparator);
    42+
    43+    chainstate.m_chain.SetTip(*active_tip);
    44+    chainstate.CoinsTip().SetBestBlock(candidates.front()->GetBlockHash());
    45+    chainstate.ClearBlockIndexCandidates();
    46+
    47+    BOOST_REQUIRE(chainman.LoadBlockIndex() && chainstate.LoadChainTip());
    48+    const CBlockIndex* prev{nullptr};
    49+    for (auto* pindex : chainstate.setBlockIndexCandidates) {
    50+        BOOST_CHECK_MESSAGE(!comparator(pindex, chainstate.m_chain.Tip()), "Candidate sorts worse than the tip");
    51+        if (prev) {
    52+            BOOST_CHECK_MESSAGE(!comparator(pindex, prev), "Wrong candidate set iteration order");
    53+        }
    54+        prev = pindex;
    55+    }
    56+}
    57+
    58 //! Ensure that snapshot chainstate can be loaded when found on disk after a
    59 //! restart, and that new blocks can be connected to both chainstates.
    60 BOOST_FIXTURE_TEST_CASE(chainstatemanager_snapshot_init, SnapshotTestSetup)
    

    Before the fix this fails with:

    test/validation_chainstatemanager_tests.cpp:630: error: in “validation_chainstatemanager_tests/chainstatemanager_loadchaintip_prunes_worse_candidates”: Candidate sorts worse than the tip
    test/validation_chainstatemanager_tests.cpp:632: error: in “validation_chainstatemanager_tests/chainstatemanager_loadchaintip_prunes_worse_candidates”: Wrong candidate set iteration order

    And passes after with the erase in the loop:

     0diff --git a/src/validation.cpp b/src/validation.cpp
     1--- a/src/validation.cpp	(revision b2805eec35ce9fdb69892e864556be4ae057ce6a)
     2+++ b/src/validation.cpp	(date 1770547212278)
     3@@ -4614,15 +4614,15 @@
     4 
     5     // Make sure our chain tip before shutting down scores better than any other candidate
     6     // to maintain a consistent best tip over reboots in case of a tie.
     7+    // Erase each block from candidates before changing its nSequenceId. The set
     8+    // uses nSequenceId for ordering, so modifying it in-place is undefined behavior.
     9     auto target = tip;
    10     while (target) {
    11+        setBlockIndexCandidates.erase(target);
    12         target->nSequenceId = SEQ_ID_BEST_CHAIN_FROM_DISK;
    13         target = target->pprev;
    14     }
    15 
    16-    // Block index candidates are loaded before the chain tip, so we need to replace this entry
    17-    // Otherwise the scoring will be based on the memory address location instead of the nSequenceId
    18-    setBlockIndexCandidates.erase(tip);
    19     TryAddBlockIndexCandidate(tip);
    20     PruneBlockIndexCandidates();
    

    (though I couldn’t write a test that fails for your solution only (but passes for the looped erase), so the two fixes are likely equivalent - though conceptually mutating any key still in a std::set is UB so we should likely delete (or not insert) all of them).


    mzumsande commented at 10:34 am on March 4, 2026:
    I’d prefer this removal to be a separate commit at the beginning - it’s independent from the later changes, and you could explain in the commit message the not-so-ovbious reason why it can be safely removed (it’d always return early).

    marcofleon commented at 8:24 pm on March 4, 2026:
    Done
  9. l0rinc changes_requested
  10. l0rinc commented at 8:28 pm on February 7, 2026: contributor
    I think I understand the race, but deleting/reinserting here feels brittle. Would it be possible to avoid pre-populating candidate inserts until after sequence IDs are finalized?
  11. Jhackman2019 commented at 1:37 am on February 8, 2026: none
    Agreed, this lines up with what mzumsande pointed out regarding the dual chainstate case too. I believe marcofleon’s alt branch already takes that approach.
  12. mzumsande commented at 8:55 am on February 9, 2026: contributor
    I also think that the fix in https://github.com/marcofleon/bitcoin/tree/2026/02/fix-setBlockIndexCandidate-ub looks better and would suggest to switch to that approach. Irrespective of the bug being fixed here I think that the current way of adding the entire chain and then removing all of it again from the set is not very elegant, so it seems good to change that anyway.
  13. marcofleon force-pushed on Feb 10, 2026
  14. marcofleon force-pushed on Feb 10, 2026
  15. DrahtBot added the label CI failed on Feb 10, 2026
  16. DrahtBot commented at 5:16 pm on February 10, 2026: contributor

    🚧 At least one of the CI tasks failed. Task macOS native: https://github.com/bitcoin/bitcoin/actions/runs/21874154480/job/63137979156 LLM reason (✨ experimental): validation_chainstatemanager_tests failed (subprocess aborted) causing the CTest run to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  17. marcofleon commented at 6:11 pm on February 10, 2026: contributor
    I’ve switched the approach to only populating setBlockIndexCandidates after we call LoadChainTip and included an assert that all candidate sets (if we have two chainstates) are empty right before we change all the sequence ids. This should fix both types of undefined behavior outlined in #34521#pullrequestreview-3761920357.
  18. DrahtBot removed the label CI failed on Feb 10, 2026
  19. l0rinc commented at 8:10 pm on February 10, 2026: contributor
    @marcofleon, could you please add a test as a first commit that documents the current behavior and the fix in the next commit that that adjust the test to help reviewers assess the behavior change (i.e. documented and debuggable before & after behavior)? Or if that’s too involved, just a test that we can run without the fix to assess its correctness (we can’t do that currently since it contains PopulateBlockIndexCandidates)? Please see my suggestion in https://github.com/bitcoin/bitcoin/pull/34521/changes#r2779130922
  20. marcofleon force-pushed on Feb 13, 2026
  21. marcofleon commented at 2:59 pm on February 13, 2026: contributor

    @l0rinc I extended a functional test that restarts the node after a block race. If you remove the first commit and run the test, it should crash, although it might take a few runs. The undefined behavior doesn’t always have an effect on the candidate set. It depends on the initial ordering of the blocks in the set, which on master, comes down to the pointer addresses of the competing blocks.

    You can see that the conditions that cause the assert to fail are pretty specific. If you think the test is useful, I can keep the it there. Or I can take the commit out once you’re confident that the UB is fixed. I don’t mind either way.

  22. dergoegge approved
  23. dergoegge commented at 2:48 pm on February 17, 2026: member

    tACK 72bd4c915ecaca7141d3c25e8d990e92b0551f67

    Tested with the same test that found the issue originally. High confidence that this does not break anything.

  24. in src/validation.cpp:4615 in 72bd4c915e
    4611@@ -4612,6 +4612,11 @@ bool Chainstate::LoadChainTip()
    4612     m_chainman.UpdateIBDStatus();
    4613     tip = m_chain.Tip();
    4614 
    4615+    // nSequenceId is about to be modified, so all candidate sets should be empty to avoid UB.
    


    stickies-v commented at 8:35 am on February 18, 2026:

    The referenced UB is not really obvious from just this sentence, future readers will thank you for elaborating. For example:

    0    // nSequenceId is one of the keys used to sort setBlockIndexCandidates. Ensure all
    1    // candidate sets are empty to avoid UB as nSequenceId is about to be modified.
    

    marcofleon commented at 6:28 am on February 19, 2026:
    Better comment, thanks. Taken.
  25. in src/validation.h:820 in 72bd4c915e outdated
    816@@ -817,6 +817,8 @@ class Chainstate
    817 
    818     void ClearBlockIndexCandidates() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    819 
    820+    void PopulateBlockIndexCandidates() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    


    stickies-v commented at 8:37 am on February 18, 2026:
    Would be nice to add a docstring for this public method. To avoid documenting the “wrong” function, this might mean also adding one to TryAddBlockIndexCandidate and referencing that here.

    marcofleon commented at 6:29 am on February 19, 2026:
    Done, added a docstring to both methods.
  26. in src/validation.cpp:5763 in 72bd4c915e outdated
    5756@@ -5753,6 +5757,11 @@ util::Result<CBlockIndex*> ChainstateManager::ActivateSnapshot(
    5757     assert(chaintip_loaded);
    5758     m_blockman.m_snapshot_height = Assert(chainstate.SnapshotBase())->nHeight;
    5759 
    5760+    // Due to the early return in LoadChainTip(), no nSequenceIds were
    5761+    // modified, so the historical chainstate's candidate set remains correctly
    5762+    // ordered. Only populate the snapshot chainstate's candidate set.
    5763+    chainstate.PopulateBlockIndexCandidates();
    


    stickies-v commented at 3:36 pm on February 18, 2026:

    This comment is a bit scary. It references (and assumes?) implementation details in LoadChainTip(), and is not clear about which early return (and/or why that is hit). This stuff confuses me more than it gives me clarity.

    I don’t have any suggestions on how to improve it yet, but wanted to flag it already.


    marcofleon commented at 6:30 am on February 19, 2026:
    Agreed it’s not the best of comments. I was having trouble trying to decide how in detail I should go here or whether to put a comment at all. I’ll leave it for now until I have a better idea of how to improve it, or if you or other reviewers have suggestions.

    mzumsande commented at 11:08 am on February 24, 2026:

    I got confused by this too. First I thought that during Snapshot activation the added ( assert(cs->setBlockIndexCandidates.empty()) should fail because the historical chainstate should always have candidates in its set at the point in time where we activate a snapshot. But as the comment says, we always return before we get to that spot, so nothing fails.

    But then this leads to the question why we call LoadChainTip() from ActivateSnapshot() in the first place here? In PopulateAndValidateSnapshot(), we set both SetBestBlock() and SetTip() so we should always return here in the ActivateSnapshot branch. Wouldn’t it be simpler to remove both the call to LoadChainTip() and the comment?


    marcofleon commented at 7:47 pm on February 25, 2026:

    Wouldn’t it be simpler to remove both the call to LoadChainTip() and the comment?

    Yeah this makes more sense. I appreciate the suggestion (and explanation).

  27. stickies-v commented at 3:37 pm on February 18, 2026: contributor
    Concept ACK
  28. DrahtBot added the label Needs rebase on Feb 18, 2026
  29. marcofleon force-pushed on Feb 19, 2026
  30. DrahtBot removed the label Needs rebase on Feb 19, 2026
  31. luke-jr commented at 4:24 am on February 20, 2026: member
    Should SEQ_ID_BEST_CHAIN_FROM_DISK be used for the best chain in all chainstates, or only the tip chainstate? (I can’t think of a scenario where they would contradict, but felt it might be good to ask in case anyone else thinks of such a case)
  32. achow101 commented at 10:06 pm on February 23, 2026: member
    ACK 9b07bd85064d52d3f194eb421bf0a5abfce4f9bd
  33. DrahtBot requested review from stickies-v on Feb 23, 2026
  34. DrahtBot requested review from dergoegge on Feb 23, 2026
  35. mzumsande commented at 11:22 am on February 24, 2026: contributor
    Concept ACK
  36. in src/validation.cpp:4936 in acf5339157
    4931+    // may not have BLOCK_VALID_TRANSACTIONS (e.g. if we haven't yet downloaded
    4932+    // the block), so we special-case it here.
    4933+    const CBlockIndex* snap_base{m_chainman.CurrentChainstate().SnapshotBase()};
    4934+
    4935+    for (CBlockIndex* pindex : m_blockman.GetAllBlockIndices()) {
    4936+        if (pindex == snap_base ||
    


    sedited commented at 4:39 pm on February 24, 2026:

    Now that this is moved to the Chainstate class, might it better to do the following check?

    0diff --git a/src/validation.cpp b/src/validation.cpp
    1index 8c02e94f94..b2b9a1e548 100644
    2--- a/src/validation.cpp
    3+++ b/src/validation.cpp
    4@@ -4933 +4932,0 @@ void Chainstate::PopulateBlockIndexCandidates()
    5-    const CBlockIndex* snap_base{m_chainman.CurrentChainstate().SnapshotBase()};
    6@@ -4936 +4935 @@ void Chainstate::PopulateBlockIndexCandidates()
    7-        if (pindex == snap_base ||
    8+        if (pindex == TargetBlock() || pindex == SnapshotBase() ||
    

    marcofleon commented at 7:54 pm on February 25, 2026:
    Taken, thanks. It’s cleaner to use these directly now that it’s a method on Chainstate.
  37. in src/validation.cpp:4928 in 9b07bd8506
    4921@@ -4923,6 +4922,25 @@ void Chainstate::ClearBlockIndexCandidates()
    4922     setBlockIndexCandidates.clear();
    4923 }
    4924 
    4925+void Chainstate::PopulateBlockIndexCandidates()
    4926+{
    4927+    AssertLockHeld(::cs_main);
    4928+    setBlockIndexCandidates.clear();
    


    stringintech commented at 7:54 pm on February 24, 2026:

    Could we have an assertion here instead?

    0    assert(setBlockIndexCandidates.empty());
    

    sedited commented at 9:00 pm on February 24, 2026:
    At this point in time, can the candidate set contain anything besides nothing, or in some cases the genesis block? If so, could it be useful to assert that? I’m also not sure why the candidate set needs to be cleared here.

    marcofleon commented at 7:59 pm on February 25, 2026:

    I think I was just clearing it to make sure it’s empty beforehand. But agreed, it’s better to assert the state the set should be in by the time this is called.

    I added the assertion for empty or only the genesis block in the set. I found out that when starting a shiny new node for the first time, the genesis block will be in this set at this point.

    Let me know if there’s something else you had in mind for the assertion.


    marcofleon commented at 8:04 pm on February 25, 2026:
    Thanks, I agree an assertion here is better. Added this, but allowing for the genesis block to be in the set as well, which happens when we start up a node for the first time.
  38. in src/validation.cpp:4941 in 9b07bd8506
    4936+        if (pindex == snap_base ||
    4937+                (pindex->IsValid(BLOCK_VALID_TRANSACTIONS) &&
    4938+                 (pindex->HaveNumChainTxs() || pindex->pprev == nullptr))) {
    4939+            TryAddBlockIndexCandidate(pindex);
    4940+        }
    4941+    }
    


    stringintech commented at 8:02 pm on February 24, 2026:
    It is not clear to me why the snapshot base has to be added as candidate for the historical chainstate as well 🤔 (I understand this behavior already existed before).

    stringintech commented at 9:22 am on February 25, 2026:
    Seems like adding it only to the snapshot chainstate candidates would break the chainstatemanager_loadblockindex unit test but then again the comment here also suggests that adding it to the historical chainstate candidates is not necessary. So maybe we can adapt both PopulateBlockIndexCandidates() and failing unit tests?

    marcofleon commented at 8:11 pm on February 25, 2026:
    Good point, I thought about this as well. I couldn’t find a good reason the snapshot base is added as a candidate for both, but maybe we’re missing something. Either way, this feels tangential to this PR and afaict adding to both is harmless, so I’ll keep as is.
  39. sedited commented at 9:00 pm on February 24, 2026: contributor
    Concept ACK
  40. stringintech commented at 9:07 am on February 25, 2026: contributor
    Concept ACK.
  41. marcofleon force-pushed on Feb 25, 2026
  42. achow101 commented at 9:31 pm on February 26, 2026: member
    ACK cd2d6529f49dfef98ecc89c0d07b2ba0ac55a550
  43. DrahtBot requested review from sedited on Feb 26, 2026
  44. DrahtBot requested review from stringintech on Feb 26, 2026
  45. DrahtBot requested review from mzumsande on Feb 26, 2026
  46. in src/validation.h:821 in cd2d6529f4 outdated
    816 
    817     void PruneBlockIndexCandidates();
    818 
    819     void ClearBlockIndexCandidates() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    820 
    821+    /** Populate the candidate set by calling TryAddBlockIndexCandidate on all valid block indices. */
    


    stickies-v commented at 5:55 am on February 27, 2026:

    nit: usage could be documented more completely:

    0    /** Populate the candidate set by calling TryAddBlockIndexCandidate on all valid block indices.
    1     * This function may only be called once.
    2     *
    3     * [@pre](/bitcoin-bitcoin/contributor/pre/) setBlockIndexCandidates must be empty (except for genesis block).
    

    stickies-v commented at 6:18 am on February 27, 2026:

    (continued from #34521 (review))

    Actually, I don’t think we need any assertions or clearing here. This feels like leaking testing of the current behaviour into the function implementation. It makes sense to add assertions for preconditions that can e.g. lead to UB, but that doesn’t seem to be the case here. Inserting twice may be useless, and callsites may want to prevent it from happening, but it’s not an inherent requirement for populating the candidate sets.

    The tests pass fine with the below diff, which removes the assert and duplicates the PopulateBlockIndexCandidates() calls:

     0diff --git a/src/node/chainstate.cpp b/src/node/chainstate.cpp
     1index ff3de29822..69c3848881 100644
     2--- a/src/node/chainstate.cpp
     3+++ b/src/node/chainstate.cpp
     4@@ -131,6 +131,7 @@ static ChainstateLoadResult CompleteChainstateInitialization(
     5     // set's comparator, changing it while blocks are in the set would be UB.
     6     for (const auto& chainstate : chainman.m_chainstates) {
     7         chainstate->PopulateBlockIndexCandidates();
     8+        chainstate->PopulateBlockIndexCandidates();
     9     }
    10 
    11     const auto& chainstates{chainman.m_chainstates};
    12diff --git a/src/test/util/chainstate.h b/src/test/util/chainstate.h
    13index 3ceed5697e..b8076f6b49 100644
    14--- a/src/test/util/chainstate.h
    15+++ b/src/test/util/chainstate.h
    16@@ -106,6 +106,7 @@ CreateAndActivateUTXOSnapshot(
    17                 pindex = pindex->pprev;
    18             }
    19             chain.PopulateBlockIndexCandidates();
    20+            chain.PopulateBlockIndexCandidates();
    21         }
    22         BlockValidationState state;
    23         if (!node.chainman->ActiveChainstate().ActivateBestChain(state)) {
    24diff --git a/src/test/validation_chainstatemanager_tests.cpp b/src/test/validation_chainstatemanager_tests.cpp
    25index 31740f8d98..738e8caed7 100644
    26--- a/src/test/validation_chainstatemanager_tests.cpp
    27+++ b/src/test/validation_chainstatemanager_tests.cpp
    28@@ -82,6 +82,8 @@ BOOST_FIXTURE_TEST_CASE(chainstatemanager, TestChain100Setup)
    29         c2.LoadChainTip();
    30         for (const auto& cs : manager.m_chainstates) {
    31             cs->PopulateBlockIndexCandidates();
    32+            cs->PopulateBlockIndexCandidates();
    33+            
    34         }
    35     }
    36     BlockValidationState _;
    37@@ -499,6 +501,7 @@ BOOST_FIXTURE_TEST_CASE(chainstatemanager_loadblockindex, TestChain100Setup)
    38         chainman.LoadBlockIndex();
    39         for (const auto& cs : chainman.m_chainstates) {
    40             cs->PopulateBlockIndexCandidates();
    41+            cs->PopulateBlockIndexCandidates();
    42         }
    43     };
    44 
    45diff --git a/src/validation.cpp b/src/validation.cpp
    46index 1a55a789af..e2d1365694 100644
    47--- a/src/validation.cpp
    48+++ b/src/validation.cpp
    49@@ -4925,10 +4925,6 @@ void Chainstate::ClearBlockIndexCandidates()
    50 void Chainstate::PopulateBlockIndexCandidates()
    51 {
    52     AssertLockHeld(::cs_main);
    53-    // It's possible that the genesis block is already in the candidate set.
    54-    assert(setBlockIndexCandidates.empty() ||
    55-           (setBlockIndexCandidates.size() == 1 &&
    56-            (*setBlockIndexCandidates.begin())->GetBlockHash() == m_chainman.GetConsensus().hashGenesisBlock));
    57 
    58     for (CBlockIndex* pindex : m_blockman.GetAllBlockIndices()) {
    59         // With assumeutxo, the snapshot block is a candidate for the tip, but it
    60@@ -5749,6 +5745,7 @@ util::Result<CBlockIndex*> ChainstateManager::ActivateSnapshot(
    61     Chainstate& chainstate{AddChainstate(std::move(snapshot_chainstate))};
    62     m_blockman.m_snapshot_height = Assert(chainstate.SnapshotBase())->nHeight;
    63 
    64+    chainstate.PopulateBlockIndexCandidates();
    65     chainstate.PopulateBlockIndexCandidates();
    66 
    67     LogInfo("[snapshot] successfully activated snapshot %s", base_blockhash.ToString());
    

    I think calling clear() at the beginning (as it was in a previous version) would be acceptable too if better/required, but then it should be documented and Repopulate is probably a better name than Populate?


    marcofleon commented at 2:50 pm on February 27, 2026:

    True, it is a set so adding duplicates will be fine. I viewed the assertion as more of a check for unexpected state. In that case, an Assume could make more sense.

    What do you think of an Assume vs clearing it? Or the other option is just not having anything there. I don’t have a strong preference. I’ll expand the documentation to match whichever one we end up doing.


    stickies-v commented at 4:21 pm on February 27, 2026:

    I viewed the assertion as more of a check for unexpected state. In that case, an Assume could make more sense.

    I don’t think it does. Similar to assert, Assume just makes using and maintaining PopulateBlockIndexCandidates() unnecessarily confusing. PopulateBlockIndexCandidates’s responsibility is to populate block index candidates, not to verify higher level program behaviour. Higher levels can add asserts or assumes where applicable.


    sedited commented at 4:26 pm on February 27, 2026:
    I think I agree with stickies-v after all here. After posting my initial comment, I was less sure if the asserts would be useful: They don’t actually guard against abuse or the bug in question here.

    marcofleon commented at 4:51 pm on February 27, 2026:
    I’ll remove everything then and keep the documentation as is, because there’s nothing to expand on.
  47. DrahtBot requested review from stickies-v on Feb 27, 2026
  48. fjahr commented at 3:03 pm on February 27, 2026: contributor

    ACK cd2d6529f49dfef98ecc89c0d07b2ba0ac55a550

    Reviewed the code and confirmed that the edited functional test fails about 20% of the time on current master and the code changes here address this.

  49. marcofleon force-pushed on Feb 27, 2026
  50. sedited approved
  51. sedited commented at 7:39 pm on February 27, 2026: contributor
    ACK 1f7109af24570e786a236c9271794d32af2f16aa
  52. DrahtBot requested review from fjahr on Feb 27, 2026
  53. DrahtBot requested review from achow101 on Feb 27, 2026
  54. stickies-v commented at 9:00 am on February 28, 2026: contributor

    I think the current approach is a reasonable way to address the stated problem. However, after spending a lot of time on this PR I keep finding it difficult to build sufficient confidence that the reshuffled initialization order doesn’t introduce any (subtle) new logic bugs.

    The current approach patches, but doesn’t really fix the underlying issue. The fact that setBlockIndexCandidates is sorted by a public, mutable field is inherently brittle, and might easily cause similar bugs in the future.

    Do we really need setBlockIndexCandidates to be pow-sorted at all times? edit: turns out we do It seems to me like we don’t - we only need the ordering in FindMostWorkChain. So, my suggested alternative is to keep the set default-sorted, and do the pow scanning in FindMostWorkChain. It seems like this can be implemented in a much smaller diff than the current approach, too: https://github.com/bitcoin/bitcoin/compare/master...stickies-v:bitcoin:2026-02/candidates-without-sequenceid

    edit: I suspect we might be able to make further performance improvements by e.g. making setBlockIndexCandidates a multimap with nChainwork as key, and further encapsulation improvements by having a separate class BlockIndexCandidates, but it doesn’t seem necessary for this minimal fix.

  55. sedited commented at 9:35 am on February 28, 2026: contributor

    Do we really need setBlockIndexCandidates to be pow-sorted at all times? It seems to me like we don’t - we only need the ordering in FindMostWorkChain. So, my suggested alternative is to keep the set default-sorted, and do the pow scanning in FindMostWorkChain. It seems like this can be implemented in a much smaller diff than the current approach, too: https://github.com/bitcoin/bitcoin/compare/master...stickies-v:bitcoin:2026-02/candidates-without-sequenceid

    I tried doing a quick signet reindex-chainstate and your branch was an order of magnitude slower than current master. Did you check how this impacts performance?

  56. stickies-v commented at 11:48 am on February 28, 2026: contributor

    I tried doing a quick signet reindex-chainstate and your branch was an order of magnitude slower than current master. Did you check how this impacts performance?

    Thanks for the quick sanity check, you’re right, reindex-chainstate performance is horrible. Fixed in this slightly more involved branch that still sorts on nChainwork, but not nSequenceId: https://github.com/bitcoin/bitcoin/compare/master...stickies-v:bitcoin:2026-02/candidates-chainwork. Seems just as quick as master for me. Idea remains the same.

  57. stickies-v commented at 1:12 am on March 1, 2026: contributor
    Since the current approach seems feasible and has had a good amount of review already, I’ll open my approach as a separate PR tomorrow so they can be evaluated side-by-side rather than having to flip back and forth.
  58. in src/validation.cpp:4933 in 77604ff526 outdated
    4928+
    4929+    for (CBlockIndex* pindex : m_blockman.GetAllBlockIndices()) {
    4930+        // With assumeutxo, the snapshot block is a candidate for the tip, but it
    4931+        // may not have BLOCK_VALID_TRANSACTIONS (e.g. if we haven't yet downloaded
    4932+        // the block), so we special-case it here.
    4933+        if (pindex == SnapshotBase() || pindex == TargetBlock() ||
    


    mzumsande commented at 10:32 am on March 4, 2026:
    Adding the pindex == TargetBlock() condition (without which a unit test would fail) makes a quirk of the existing AssumeUtxo logic more visible - from a high-level point of view it seems unnecessary: Why do we need an exception for the historical chainstate? This chainstate starts at genesis, and when catching up, it will download, and connect the snapshot block normally, so it doesn’t need any special logic as far as I can see. Currently, we have an unconnectable block lingering in setBlockIndexCandidates, which makes no sense. It is even documented in the unit tests https://github.com/bitcoin/bitcoin/blob/2eaf701bc0d5cfbb6bc458f467141451f2175cdf/src/test/validation_chainstatemanager_tests.cpp#L557-L561 I am planning to open a follow-up PR (not targeted for v31) that will 1) Remove this and 2) tighten the logic in CheckBlockIndex to check if there are no unconnectable block in setBlockIndexCandidates.

    mzumsande commented at 1:50 pm on March 4, 2026:
    oh, I just saw that the same point was raised above by @stringintech - I do think this should be fixed, but not in this PR.
  59. in test/functional/feature_chain_tiebreaks.py:115 in 1f7109af24
    115-        #   G
    116-        #     \
    117-        #      A2
    118+        #            /- A1
    119+        #           /
    120+        #   G -- B1 --- A2
    


    mzumsande commented at 11:17 am on March 4, 2026:
    Was this subtest supposed to use the second node (which currently seems completely unused)? The diagram suggests this, currently the blocks are not build on Genesis but on the messy endstate of the first subtest. Maybe use the second node for that, and never connect the two (or keep it as is and remove the second node).

    marcofleon commented at 8:18 pm on March 4, 2026:
    Good catch, thanks. Switched to using the second node.
  60. in test/functional/feature_chain_tiebreaks.py:126 in 1f7109af24 outdated
    130-        prev_time = genesis_block["time"]
    131+        tip_block = node.getblock(node.getbestblockhash())
    132+        prev_time = tip_block["time"]
    133 
    134-        for i in range(0, 2):
    135+        for i in range(0, 3):
    


    mzumsande commented at 11:25 am on March 4, 2026:
    Could you explain the test change a bit more in the commit message and/or the code? Why do we send a third block A3 now? Why do we no longer restart multiple times in a loop, but send a new block after the restart?

    marcofleon commented at 8:23 pm on March 4, 2026:

    Done. I found out more about why two blocks doesn’t cause the erase failure on my mac vs three blocks.

    https://godbolt.org/z/covTcYT1c

    Using libstdc++ (the standard library GCC uses I believe) would cause the two block race to fail. But as you can see, using clang’s libc++ there are only failures with a three block race.

    Maybe different tree structure shapes? Would be interesting to pinpoint exactly why.

  61. DrahtBot requested review from mzumsande on Mar 4, 2026
  62. mzumsande commented at 12:03 pm on March 4, 2026: contributor

    I think the current approach is a reasonable way to address the stated problem. However, after spending a lot of time on this PR I keep finding it difficult to build sufficient confidence that the reshuffled initialization order doesn’t introduce any (subtle) new logic bugs.

    Fixed in this slightly more involved branch

    I feel that this would be too risky as a last-minute change. I would prefer to either go with this PR or alternatively revert #29640 for v31 (the problem that PR solved was of a pretty theoretical nature at all - forks at the tip are rare, and changing the tip after a restart is certainly a bit weird but not really a practical problem for most nodes).

  63. marcofleon commented at 2:09 pm on March 4, 2026: contributor

    I feel that this would be too risky as a last-minute change

    I did a first pass of stickies branch and I think it probably is a more robust long-term solution, although I would need to review more to be sure. We also ran the original antithesis test on the branch for about a day and it passed.

    If we want to do a temporary (less complex) fix for now and then revisit approaches later on, that sounds fine to me. I’ll address review comments today in case we end up deciding that this PR is the way to move forward for v31.

  64. sipa commented at 4:00 pm on March 4, 2026: member
    Is it possible the same issue still exists inside Chainstate::PreciousBlock? It modifies pindex->nSequenceId after erasing it from its own setBlockIndexCandidates, but doesn’t erase it from the potentially existing other Chainstate.
  65. mzumsande commented at 4:32 pm on March 4, 2026: contributor

    Is it possible the same issue still exists inside Chainstate::PreciousBlock? It modifies pindex->nSequenceId after erasing it from its own setBlockIndexCandidates, but doesn’t erase it from the potentially existing other Chainstate.

    I thought about this question last week but came to the conclusion that it’s probably not a problem: PreciousBlock operates on the Active Chainstate, and will abort if the block is not on its tip. These blocks should never be in the setBlockIndexCandidates of the historical chainstate because they are not ancestors of the target block (see this criterion)

  66. validation: remove LoadChainTip call from ActivateSnapshot
    This call is a no-op. PopulateAndValidateSnapshot already sets both
    the chain tip and the coins cache best block to the snapshot block,
    so LoadChainTip always hits the early return when it finds that the
    two match (tip->GetBlockHash() == coins_cache.GetBestBlock()).
    9249e6089e
  67. validation: fix UB in LoadChainTip
    The removal of the chain tip from setBlockIndexCandidates was
    happening after nSequenceId was modified. Since the set uses
    nSequenceId as a sort key, modifying it while the element is in the
    set is undefined behavior, which can cause the erase to fail.
    
    With assumeutxo, a second form of UB exists: two chainstates each
    have their own candidate set, but share the same CBlockIndex
    objects. Calling LoadChainTip on one chainstate mutates nSequenceIds
    that are also in the other chainstate's set.
    
    Fix by populating setBlockIndexCandidates after all changes to
    nSequenceId.
    854a6d5a9a
  68. marcofleon force-pushed on Mar 4, 2026
  69. Extend functional test for setBlockIndexCandidates UB
    Fix the from-disk subtest to use a separate node so it builds on a
    clean genesis block, rather than the leftover chain from the
    in-memory subtest.
    
    Change from a two-way to a three-way block race. The UB in the old
    LoadChainTip (mutating nSequenceId, a sort key, while the block is
    in setBlockIndexCandidates) corrupts the internal tree structure,
    resulting in a failed erase that leaves stale blocks in the set
    alongside the tip. With only two competing blocks, this is caught
    by libstdc++ but not by libc++. A three-way split triggers the bug
    on both implementations.
    
    To trigger CheckBlockIndex (where the crashing assertion is), replace
    the restart loop with sending a new block after a single restart.
    20ae9b98ea
  70. marcofleon force-pushed on Mar 4, 2026
  71. DrahtBot added the label CI failed on Mar 4, 2026
  72. DrahtBot removed the label CI failed on Mar 4, 2026
  73. sedited approved
  74. sedited commented at 2:58 pm on March 5, 2026: contributor
    Re-ACK 20ae9b98eab20117344cf31f7cde39cadd70ca22
  75. sipa commented at 6:32 pm on March 5, 2026: member
    Code review ACK 20ae9b98eab20117344cf31f7cde39cadd70ca22
  76. achow101 commented at 8:04 pm on March 5, 2026: member
    ACK 20ae9b98eab20117344cf31f7cde39cadd70ca22
  77. achow101 merged this on Mar 6, 2026
  78. achow101 closed this on Mar 6, 2026


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-09 15:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me