validation: Improve warnings in case of chain corruption #33553

pull mzumsande wants to merge 2 commits into bitcoin:master from mzumsande:202510_headersync_loop changing 2 files +10 −5
  1. mzumsande commented at 9:49 pm on October 6, 2025: contributor

    In case of corruption that leads to a block being marked as invalid that is seen as valid by the rest of the network, the user currently doesn’t receive good error messages, but will often be stuck in an endless headers-sync loop with no explanation (#26391).

    This PR improves warnings in two ways:

    Fixes #26391 (We’ll still do the endless looping, trying to find a peer with a headers that we can use, but will now repeatedly log warnings while doing so).

  2. DrahtBot added the label Validation on Oct 6, 2025
  3. DrahtBot commented at 9:49 pm on October 6, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33553.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK janb84, hodlinator
    Approach ACK stickies-v

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. in src/validation.cpp:2035 in d55830cd50
    2035         return;
    2036     }
    2037 
    2038     if (m_chainman.m_best_invalid && m_chainman.m_best_invalid->nChainWork > m_chain.Tip()->nChainWork + (GetBlockProof(*m_chain.Tip()) * 6)) {
    2039-        LogPrintf("%s: Warning: Found invalid chain at least ~6 blocks longer than our best chain.\nChain state database corruption likely.\n", __func__);
    2040+        LogPrintf("%s: Warning: Found invalid chain at least ~6 blocks longer than our best chain. Chain state database corruption likely.\n", __func__);
    


    maflcko commented at 7:52 am on October 7, 2025:
    0        LogWarning("Found invalid chain at least ~6 blocks longer than our best chain. Chain state database corruption likely.");
    

    nit: while touching this, could add the correct log severity?


    mzumsande commented at 2:43 pm on October 7, 2025:
    done
  5. in src/validation.cpp:2029 in d55830cd50 outdated
    2025@@ -2026,15 +2026,13 @@ void Chainstate::CheckForkWarningConditions()
    2026 {
    2027     AssertLockHeld(cs_main);
    2028 
    2029-    // Before we get past initial download, we cannot reliably alert about forks
    2030-    // (we assume we don't get stuck on a fork before finishing our initial sync)
    2031-    // Also not applicable to the background chainstate
    2032-    if (m_chainman.IsInitialBlockDownload() || this->GetRole() == ChainstateRole::BACKGROUND) {
    2033+    // Not applicable to the background chainstate
    


    stickies-v commented at 11:05 am on October 7, 2025:
    nit: trivial docstring, would prefer removing (or elaborating)

    mzumsande commented at 4:03 pm on October 8, 2025:
    removed it. I think it’s obvious enough why it’s not applicable.
  6. mzumsande force-pushed on Oct 7, 2025
  7. janb84 commented at 2:56 pm on October 7, 2025: contributor

    Concept ACK

    This PR introduces a informative warning for the user in the case that the users has an invalid block (header) due to corruption or otherwise.

    NIT: why keep looping ? is there a path where the node can recover from this ?

    partial code review ✅

    Gives warning in case of chain corruption

    02025-10-07T14:24:52Z Pre-synchronizing blockheaders, height: 266146 (~97.53%)
    12025-10-07T14:24:55Z [warning] Received header for a block previously marked as invalid from peer=0. If this happens with all peers, database corruption is likely and -reindex may be necessary to recover.
    22025-10-07T14:24:56Z Pre-synchronizing blockheaders, height: 4146 (~1.54%)
    

    Loops without warning:

     02025-10-07T14:17:05Z Pre-synchronizing blockheaders, height: 258146 (~94.63%)
     12025-10-07T14:17:05Z Pre-synchronizing blockheaders, height: 262146 (~96.10%)
     22025-10-07T14:17:05Z Pre-synchronizing blockheaders, height: 266146 (~97.53%)
     32025-10-07T14:17:07Z Pre-synchronizing blockheaders, height: 4146 (~1.54%)
     42025-10-07T14:17:07Z Pre-synchronizing blockheaders, height: 16146 (~6.00%)
     52025-10-07T14:17:08Z Pre-synchronizing blockheaders, height: 28146 (~10.45%)
     62025-10-07T14:17:08Z Pre-synchronizing blockheaders, height: 42146 (~15.66%)
     72025-10-07T14:17:08Z Pre-synchronizing blockheaders, height: 54146 (~20.12%)
     82025-10-07T14:17:08Z Pre-synchronizing blockheaders, height: 64146 (~23.84%)
     92025-10-07T14:17:09Z Pre-synchronizing blockheaders, height: 78146 (~29.03%)
    102025-10-07T14:17:09Z Pre-synchronizing blockheaders, height: 92146 (~34.24%)
    
  8. mzumsande commented at 3:23 pm on October 7, 2025: contributor

    NIT: why keep looping ? is there a path where the node can recover from this ?

    Not if we are actually corrupted, but we can’t be 100% sure. Maybe we are right, and the next peer would extend the chain that we see as valid.

    There is the possibility of an an attack where a miner creates an invalid block with valid PoW (which is costly), and has some sybil nodes feed that chain to others - it wouldn’t be good if this could be used to shutdown nodes. Or there could be some kind of chaotic chain split / hard fork scenario. This is not very likely to happen (hopefully), but I don’t think shutting down the node automatically would be a good idea.

  9. in src/net_processing.cpp:2960 in 6ae9af6b98
    2955@@ -2956,6 +2956,10 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, Peer& peer,
    2956                                                            state, &pindexLast)};
    2957     if (!processed) {
    2958         if (state.IsInvalid()) {
    2959+            if (state.GetResult() == BlockValidationResult::BLOCK_CACHED_INVALID) {
    2960+                LogWarning("Received header for a block previously marked as invalid from peer=%d. "
    


    stickies-v commented at 6:09 pm on October 7, 2025:

    nit to respect line-length, clang-format and grammatically make things clearer that the peer in question did not send the previous version of the header:

     0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
     1index 073a434335..4574c6cf19 100644
     2--- a/src/net_processing.cpp
     3+++ b/src/net_processing.cpp
     4@@ -2957,8 +2957,10 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, Peer& peer,
     5     if (!processed) {
     6         if (state.IsInvalid()) {
     7             if (state.GetResult() == BlockValidationResult::BLOCK_CACHED_INVALID) {
     8-                LogWarning("Received header for a block previously marked as invalid from peer=%d. "
     9-                           "If this happens with all peers, database corruption is likely and -reindex may be necessary to recover.", pfrom.GetId());
    10+                LogWarning("Header received from peer=%d was previously marked as invalid. If this "
    11+                           "happens with all peers, database corruption is likely and restarting "
    12+                           "with -reindex may be necessary to recover.",
    13+                           pfrom.GetId());
    14             }
    15             MaybePunishNodeForBlock(pfrom.GetId(), state, via_compact_block, "invalid header received");
    16             return;
    

    mzumsande commented at 4:04 pm on October 8, 2025:
    done.
  10. stickies-v commented at 6:12 pm on October 7, 2025: contributor
    Approach ACK, agreed that something non-invasive like logging is the way to go here.
  11. p2p: Add warning message when receiving headers for blocks cached as invalid
    Currently, if database corruption leads to a block being marked as
    invalid incorrectly, we can get stuck in an infinite headerssync
    loop with no indication what went wrong or how to fix it.
    With the added log message, users will receive an explicit warning after each
    failed headerssync attempt with an outbound peer.
    419ff388b9
  12. validation: call CheckForkWarningConditions during IBD, and at startup
    The existing IBD check was added at a time when CheckForkWarningConditions
    did also sophisticated fork detection that could lead to false positives
    during IBD (55ed3f14751206fc87f0cbf8cb4e223efacef338).
    
    The fork detection logic doesn't exist anymore
    (since fa62304c9760f0de9838e56150008816e7a9bacb), so the IBD check is no
    longer necessary.
    
    Displaying the log at startup will help node operators diagnose the
    problem better.
    04b943b63c
  13. in src/net_processing.cpp:2959 in 6ae9af6b98
    2955@@ -2956,6 +2956,10 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, Peer& peer,
    2956                                                            state, &pindexLast)};
    2957     if (!processed) {
    2958         if (state.IsInvalid()) {
    2959+            if (state.GetResult() == BlockValidationResult::BLOCK_CACHED_INVALID) {
    


    stickies-v commented at 11:38 am on October 8, 2025:

    I think there is potential for unconditional logging abuse here. If an inbound peer (which doesn’t get disconnected for sending BLOCK_CACHED_INVALID headers) knows of (or spends the PoW to construct) an invalid block with a valid header, they can resend the same header and spam the log. The impacts should be limited since we now have disk filling mitigation in place, but it’s still annoying at the very least, and it’s very cheap to execute on the entire network when e.g. a miner accidentally constructs such a block.

    Minimal test to showcase:

     0diff --git a/test/functional/p2p_invalid_block.py b/test/functional/p2p_invalid_block.py
     1index 88bdbe22a8..c5d22c3382 100755
     2--- a/test/functional/p2p_invalid_block.py
     3+++ b/test/functional/p2p_invalid_block.py
     4@@ -104,6 +104,8 @@ class InvalidBlockRequestTest(BitcoinTestFramework):
     5         block3.solve()
     6 
     7         peer.send_blocks_and_test([block3], node, success=False, reject_reason='bad-cb-amount')
     8+        for i in range(100):
     9+            peer.send_blocks_and_test([block3], node, success=False, reject_reason='Received header for a block previously marked')
    10 
    11 
    12         # Complete testing of CVE-2012-2459 by sending the original block.
    

    mzumsande commented at 4:15 pm on October 8, 2025:

    good point - I restricted the log to outbound peers, which makes sense anyway, because it’s consistent with the fact that we only disconnect those.

    (Besides, I’m curious what the current way to think about unconditional logging should be - be more liberal / care less about spamming? Or keep on being careful about log levels just as before when the rate limiting didn’t exist? A bit offtopic here, but maybe something to discuss at CoreDev…)

  14. mzumsande force-pushed on Oct 8, 2025
  15. in src/net_processing.cpp:2965 in 04b943b63c
    2960+                LogWarning("Header received from peer=%i was previously marked as invalid. "
    2961+                           "If this happens with all peers, database corruption is likely and "
    2962+                           "-reindex may be necessary to recover.",
    2963+                           pfrom.GetId());
    2964+            }
    2965             MaybePunishNodeForBlock(pfrom.GetId(), state, via_compact_block, "invalid header received");
    


    hodlinator commented at 9:05 pm on October 8, 2025:

    (Inline comment in random place): Heads up that I’ve started working on a functional test for this at https://github.com/bitcoin/bitcoin/compare/04b943b63ca4c27aa857b03e5de82ed8cae7b17d...hodlinator:bitcoin:pr/33553_suggestions

    It’s super-slow (partially due to HeadersSyncState only being created when receiving the very hardcoded 2000 headers message), and still of questionable usefulness.

  16. hodlinator commented at 9:06 pm on October 8, 2025: contributor

    Concept ACK 04b943b63ca4c27aa857b03e5de82ed8cae7b17d

    Good to see progress on this issue!


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-10-10 15:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me