Stuck in Endless Pre-Syncing Headers Loop #26391

da2ce7 opened this issue on October 26, 2022
  1. da2ce7 commented at 8:26 am on October 26, 2022: none

    Expected behavior

    Pre-synchronizing block headers completes normally.

    Actual behavior

    2022-10-26T07:34:14Z Pre-synchronizing blockheaders, height: 748853 (~98.55%)
    2022-10-26T07:34:16Z Pre-synchronizing blockheaders, height: 226853 (~31.00%)
    

    Unaffected by restarting the program:

    2022-10-26T07:34:14Z Pre-synchronizing blockheaders, height: 748853 (~98.55%)
    2022-10-26T07:34:16Z Pre-synchronizing blockheaders, height: 226853 (~31.00%)
    

    To reproduce

    I think that this is a one-off sort of error:

    Here is a backup of my .bitcoin folder. https://drive.proton.me/urls/V04QAGG998#GlCnfHpkWW7F

    The files before blk00047.dat and rev00047.dat are omitted and need to be copied in from another source. As sipa notes below, the blk files are not deterministic. - I will upload the full .bitcoin folder so people can reproduce…

    Here is the 4.6 GB full backup of my .bitcoin folder: https://drive.proton.me/urls/JA11NDEA14#GeG83qrpmvtt

    System information

    Fedora 37 Silverblue, running in Toolbox.

    Bitcoin Core: 28cf75697186ea8e473e120a643994bdf8237d6c

  2. da2ce7 added the label Bug on Oct 26, 2022
  3. maflcko commented at 8:34 am on October 26, 2022: member
    This may happen when one of the blocks in the main chain is marked invalid (for example due to corruption)
  4. fanquake renamed this:
    Stuck in Endless Pre-Syncing Headders Loop
    Stuck in Endless Pre-Syncing Headers Loop
    on Oct 26, 2022
  5. maflcko added the label P2P on Oct 26, 2022
  6. kouloumos commented at 9:13 am on October 26, 2022: contributor

    Your debug.log shows that before seeing this behavior you were doing IBD up until height=224854, where this happened, which I think matches what MarcoFalke said.

    2022-10-26T07:24:30Z UpdateTip: new best=00000000000000cd7d1c3d5137423c00e6a221d5492ace06d8fb9d990f2d7c96 height=224854 version=0x00000002 log2_work=69.513761 tx=14063312 date='2013-03-08T15:46:54Z' progress=0.018147 cache=356.2MiB(2733513txo)
    2022-10-26T07:24:30Z ERROR: ConnectBlock: Consensus::CheckTxInputs: 878d6685666400b75a1947ccfc676249ecdf52678b2dc0d83e0328f8c24a951a, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
    2022-10-26T07:24:30Z InvalidChainFound: invalid block=000000000000032021a6d18011d202df36cf07822a657b47390ab90568bb14e2  height=224855  log2_work=69.513793  date=2013-03-08T15:58:52Z
    2022-10-26T07:24:30Z InvalidChainFound:  current best=00000000000000cd7d1c3d5137423c00e6a221d5492ace06d8fb9d990f2d7c96  height=224854  log2_work=69.513761  date=2013-03-08T15:46:54Z
    2022-10-26T07:24:30Z ERROR: ConnectTip: ConnectBlock 000000000000032021a6d18011d202df36cf07822a657b47390ab90568bb14e2 failed, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
    

    On the next run it started the pre-sync phase from that height, and that’s the point at which it restarts every time.

    I’ve tried to reproduce using the backup of your directory, but I couldn’t. Probably because using it requires a -reindex.

  7. maflcko commented at 9:35 am on October 26, 2022: member

    Steps to reproduce (with a diff that inverts the CheckTxInputs condition, so every connected block is marked invalid, forcing the corruption scenario):

    diff --git a/src/validation.cpp b/src/validation.cpp
    index 37e68cfe4a..811ff2f9eb 100644
    --- a/src/validation.cpp
    +++ b/src/validation.cpp
    @@ -2201,7 +2201,7 @@ bool Chainstate::ConnectBlock(const CBlock& block, BlockValidationState& state,
             {
                 CAmount txfee = 0;
                 TxValidationState tx_state;
    -            if (!Consensus::CheckTxInputs(tx, tx_state, view, pindex->nHeight, txfee)) {
    +            if (Consensus::CheckTxInputs(tx, tx_state, view, pindex->nHeight, txfee)) {
                     // Any transaction validation failure in ConnectBlock is a block consensus failure
                     state.Invalid(BlockValidationResult::BLOCK_CONSENSUS,
                                 tx_state.GetRejectReason(), tx_state.GetDebugMessage());
    

    then call ./src/qt/bitcoin-qt -datadir=/tmp -signet -printtoconsole=1

    or even with a reduced min chain work, to speed up the retry-loop:

    ./bld-cmake/bin/bitcoin-qt -datadir=/tmp -signet -printtoconsole=1 -minimumchainwork=11c4ff1d99 -debug=validation

  8. fanquake commented at 10:16 am on October 26, 2022: member
  9. da2ce7 commented at 4:32 pm on October 26, 2022: none

    @kouloumos

    I’ve tried to reproduce using the backup of your directory, but I couldn’t. Probably because using it requires a -reindex.

    The files before blk00047.dat and rev00047.dat are omitted, and need to be copied in from another source.

  10. sipa commented at 4:49 pm on October 26, 2022: member
    @da2ce7 The contents of those files are not deterministic, as they depend on the order in which you received blocks. It’s not necessarily possible for someone to reconstruct your state without those files (as some blocks may fall before/after the cut-off differently).
  11. sipa commented at 5:22 pm on October 26, 2022: member

    Discussed this a bit with @sdaftuar.

    What’s going on here is actually expected: your node believes that the chain other nodes are offering it is invalid, so it’s correct behavior that it doesn’t manage to synchronize and accept that chain. This invalidity is only detected during the headers-sync phase, not during the new pre-sync phase that precedes it. The result is that your node goes through peers one by one, attempting to synchronize headers with them by performing a full pre-sync phase, and only after that completes does it notice they’re serving a known-invalid chain.

    A question is whether we could detect this during the pre-sync phase instead, which wouldn’t fix the lack of progress, but would avoid the bandwidth wasted on repeated presyncs with everyone. The answer is yes: it wouldn’t be hard to check for known-invalid headers in the presync phase as well. However, I don’t think we want to do that, for fingerprinting reasons: it would permit an attacker to feed you an invalid, low-PoW block during IBD, and then later follow you around the network by claiming to have a chain that extends this invalid block. If you stop fetching immediately, they know you’re the same node as the one they gave the invalid block to earlier.

    This fingerprinting is partially solvable: by keeping track of how much work was built on top of an invalid block, and, if that work meets our anti-DoS threshold, permitting a reaction to known-invalid blocks when they’re fed to us during presync. This is however a fair bit of complexity, and I’m not sure it’s worth it for somewhat improving the situation for essentially broken nodes that will never recover anyway.
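
    A minimal sketch of that bookkeeping idea (all names hypothetical; this is not actual Bitcoin Core code, just an illustration of the approach, assuming presync can tell us the claimed total work on top of a known-invalid block):

     // Hypothetical sketch, reusing Bitcoin Core's existing uint256 and
     // arith_uint256 types: remember how much header-chain work peers have
     // demonstrated on top of each known-invalid block, and only react to
     // that block during presync once the work crosses the anti-DoS threshold.
     #include <arith_uint256.h>
     #include <uint256.h>

     #include <map>

     class InvalidAncestorTracker
     {
         //! Invalid block hash -> most work seen claimed on top of it.
         std::map<uint256, arith_uint256> m_work_on_invalid;

     public:
         //! Record that a peer presented a chain with total work claimed_work
         //! extending the known-invalid block invalid_hash.
         void Observe(const uint256& invalid_hash, const arith_uint256& claimed_work)
         {
             arith_uint256& best{m_work_on_invalid[invalid_hash]};
             if (claimed_work > best) best = claimed_work;
         }

         //! Only once enough work has been demonstrated on top of the invalid
         //! block would it be safe, fingerprinting-wise, to abort presync early.
         bool SafeToRejectEarly(const uint256& invalid_hash, const arith_uint256& anti_dos_threshold) const
         {
             const auto it{m_work_on_invalid.find(invalid_hash)};
             return it != m_work_on_invalid.end() && it->second >= anti_dos_threshold;
         }
     };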

    Perhaps time is better spent on better reporting to the user, in the form of targeted warnings in logs (or even failure to start) when there appears to be a long invalid high-PoW chain out there.

  12. maflcko removed the label Bug on Oct 26, 2022
  13. maflcko added the label Feature on Oct 26, 2022
  14. sipa commented at 9:49 pm on October 26, 2022: member
    Arguably, the fact that this results in a corrupted node wasting bandwidth on redownloading headers multiple times is a 24.0 regression. But I don’t know if it’s worth fixing, as it involves some complexity and would only benefit already-broken nodes anyway.
  15. maflcko added the label Brainstorming on Oct 27, 2022
  16. Shekelme commented at 3:43 am on May 16, 2023: none
    Same bug here on v24.0.1. Endless_Pre-synchronizing.txt
  17. aleks-mariusz commented at 11:15 am on June 20, 2023: none

    I’m seeing this with v25.0.0 as well :-/

    2023-06-20T11:09:46Z Pre-synchronizing blockheaders, height: 782560 (~98.44%)
    2023-06-20T11:09:49Z Pre-synchronizing blockheaders, height: 738560 (~93.00%)
    

    What are the recommendations on fixing this?

  18. maflcko commented at 12:01 pm on June 20, 2023: member

    What are the recommendations on fixing this?

    Bitcoin Core makes heavy use of CPU, RAM and disk IO. Hardware defects might only become visible when running Bitcoin Core. You might want to check your hardware for defects.

    • memtest86 to check your RAM
    • to check the CPU behaviour under load, use linpack or Prime95
    • to test your storage device use smartctl or CrystalDiskInfo

    Source: https://bitcoin.stackexchange.com/a/12206

    If your hardware doesn’t have any faults, you can do a -reindex to wipe the corrupt block file from the storage.
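
    For example, a minimal invocation (assuming bitcoind is on the PATH and the default datadir is used; adjust to your setup):

    bitcoind -reindex

    This wipes the block index and chainstate and rebuilds them from the blk*.dat files already on disk, so expect it to take a while.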

  19. tansanDOTeth commented at 10:48 am on June 24, 2023: none

    Is this normal? I’m looking at the logs and it looks like it happens quite often.

    2023-06-24T10:32:33Z Pre-synchronizing blockheaders, height: 775060 (~97.47%)
    2023-06-24T10:32:33Z Pre-synchronizing blockheaders, height: 777060 (~97.71%)
    2023-06-24T10:32:33Z Pre-synchronizing blockheaders, height: 779060 (~97.95%)
    2023-06-24T10:32:33Z Pre-synchronizing blockheaders, height: 781060 (~98.19%)
    2023-06-24T10:32:34Z Pre-synchronizing blockheaders, height: 783060 (~98.43%)
    2023-06-24T10:32:39Z New outbound peer connected: version: 70016, blocks=795702, peer=408 (outbound-full-relay)
    2023-06-24T10:32:40Z New outbound peer connected: version: 70016, blocks=795702, peer=409 (outbound-full-relay)
    2023-06-24T10:32:46Z Pre-synchronizing blockheaders, height: 335060 (~42.81%)
    2023-06-24T10:32:46Z New outbound peer connected: version: 70016, blocks=795702, peer=410 (outbound-full-relay)
    2023-06-24T10:32:49Z New outbound peer connected: version: 70015, blocks=795702, peer=411 (outbound-full-relay)
    2023-06-24T10:32:51Z Pre-synchronizing blockheaders, height: 337060 (~43.06%)
    2023-06-24T10:32:58Z Pre-synchronizing blockheaders, height: 339060 (~43.31%)
    2023-06-24T10:33:06Z Pre-synchronizing blockheaders, height: 341060 (~43.57%)
    2023-06-24T10:33:12Z Pre-synchronizing blockheaders, height: 343060 (~43.82%)
    2023-06-24T10:33:20Z Pre-synchronizing blockheaders, height: 345060 (~44.07%)
    2023-06-24T10:33:29Z Pre-synchronizing blockheaders, height: 347060 (~44.32%)
    2023-06-24T10:33:34Z Pre-synchronizing blockheaders, height: 349060 (~44.58%)
    2023-06-24T10:33:44Z Pre-synchronizing blockheaders, height: 351060 (~44.83%)
    2023-06-24T10:33:49Z Pre-synchronizing blockheaders, height: 353060 (~45.09%)
    
  20. aleks-mariusz commented at 10:53 am on June 24, 2023: none

    What are the recommendations on fixing this?

    If your hardware doesn’t have any faults, you can do a -reindex to wipe the corrupt block file from the storage.

    This helped, re-indexing, but it throws away all the progress that had been made and starts over at 0% :-/ It took 3+ days to get back to the current state, sadly, with my hardware/network connection.

  21. tansanDOTeth commented at 10:55 am on June 24, 2023: none

    What are the recommendations on fixing this?

    If your hardware doesn’t have any faults, you can do a -reindex to wipe the corrupt block file from the storage.

    This helped, re-indexing, but it throws away all the progress that had been made and starts over at 0% :-/ It took 3+ days to get back to the current state, sadly, with my hardware/network connection.

    Is there a way to do this from the GUI?

    Edit: For Mac users:

    /Applications/Bitcoin-Qt.app/Contents/MacOS/Bitcoin-Qt -reindex
    
  22. willcl-ark commented at 8:11 am on July 1, 2024: member

    Do we want to keep this open to address this comment?

    Perhaps time is better spent on better reporting to the user, in the form of targeted warnings in logs (or even failure to start) when there appears to be a long invalid high-PoW chain out there.

    Otherwise I think we can probably close this as stale.

  23. jstefanop commented at 3:17 pm on December 13, 2024: none

    FYI, we are seeing this across a large number of our nodes (FutureBit Apollos; we have tens of thousands of nodes from our users on the network).

    A re-index usually fixes it, but there should be a way for Core to detect this corruption and self-fix/reindex from the point of corruption. Being stuck in an endless pre-sync loop is not great (or at least shut down the node with an error?).

  24. maflcko commented at 2:41 pm on December 16, 2024: member

    detect this corruption and self-fix/reindex from the point of corruption.

    I am not sure this will be an improvement, because:

    • It will turn an endless pre-sync loop into an endless reindex loop (at least for some).
    • If there is an issue with the hardware, it would be better to diagnose and fix it, instead of continuing. Otherwise the issue will possibly be silently ignored and re-appear in the future.

    My recommendation would be to check which configs of your hardware run into this problem and then try to determine and fix the root cause.

  25. eshutov commented at 12:47 pm on January 12, 2025: none

    I have faced the same issue:

    2025-01-12T12:01:56Z Pre-synchronizing blockheaders, height: 852216 (~97.02%)
    2025-01-12T12:02:00Z Pre-synchronizing blockheaders, height: 854216 (~97.23%)
    2025-01-12T12:02:02Z Pre-synchronizing blockheaders, height: 856216 (~97.47%)
    2025-01-12T12:02:29Z Pre-synchronizing blockheaders, height: 712215 (~81.31%)
    2025-01-12T12:02:33Z Pre-synchronizing blockheaders, height: 714215 (~81.52%)
    2025-01-12T12:02:37Z Pre-synchronizing blockheaders, height: 716215 (~81.75%)
    

    Pre-synchronizing repeats in an infinite loop and consumes network bandwidth. I can confirm this happened for me due to a hardware issue with my PC (the GPU’s extra power cord wasn’t connected originally, and this caused kernel backtraces in dmesg). As was mentioned above, I tried -reindex, but this didn’t help; I still see messages like the above. Possibly the best option for those who ran into this is redownloading the entire blockchain.

    version: v28.0.0

  26. mzumsande commented at 9:03 pm on February 12, 2025: contributor

    A question is whether we could detect this during the pre-sync phase instead, which wouldn’t fix the lack of progress, but would avoid the bandwidth wasted on repeated presyncs with everyone. The answer is yes: it wouldn’t be hard to check for known-invalid headers in the presync phase as well. However, I don’t think we want to do that, for fingerprinting reasons: it would permit an attacker to feed you an invalid, low-PoW block during IBD, and then later follow you around the network by claiming to have a chain that extends this invalid block. If you stop fetching immediately, they know you’re the same node as the one they gave the invalid block to earlier.

    I don’t think that fingerprinting is a concern here. An attacker cannot feed us an invalid, low-PoW block during IBD to fingerprint us - if they could somehow get us to accept such a header into our block tree db, that would mean that the anti-DoS headers-sync algorithm had failed, and we’d probably have way bigger problems than fingerprinting.

    So I don’t think there is a conceptual problem with detecting this in the pre-sync phase.

    But the point remains that this is just a symptom of the underlying corruption - we’d still endlessly cycle through peers that we would now disconnect immediately for sending us an “invalid” header, so I’m not sure how that would actually be more helpful for users?

  27. sipa commented at 9:06 pm on February 12, 2025: member

    @mzumsande Agreed, reading back what I wrote, I don’t see why I thought fingerprinting would be an issue.

    I think the most useful course of action here is detecting the presence of a high-PoW header-invalid chain, and reporting it to the user as a sign of likely corruption. Unsure what that would mean for anything other than the GUI, though.

  28. mzumsande commented at 10:04 pm on February 12, 2025: contributor

    I think the most useful course of action here is detecting the presence of a high-PoW header-invalid chain, and reporting it to the user as a sign of likely corruption. Unsure what that would mean for anything other than the GUI, though.

    We actually have this kind of check (CheckForkWarningConditions) but have it disabled during IBD for no good reason I can see (at least after #19905). I’ll work on a PR suggesting to use this check in the pre-sync header loop situation (and also rework it, e.g. I don’t think it needs to be called in ActivateBestChainStep()).
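
    For reference, the existing check is roughly of the following shape (a paraphrase of the CheckForkWarningConditions() idea, not the exact code, which differs across versions; it assumes Core’s CBlockIndex, GetBlockProof() and LogPrintf()):

     // Paraphrased sketch: warn when a known-invalid chain has noticeably more
     // accumulated work than our active tip. The real function returns early
     // during IBD, which is why the warning never fires in this issue's scenario.
     void CheckForkWarningConditionsSketch(const CBlockIndex* tip, const CBlockIndex* best_invalid)
     {
         if (tip == nullptr || best_invalid == nullptr) return;
         // Allow a margin of roughly six blocks' worth of proof on top of the tip.
         const arith_uint256 threshold{tip->nChainWork + (GetBlockProof(*tip) * 6)};
         if (best_invalid->nChainWork > threshold) {
             LogPrintf("Warning: found an invalid chain with more work than our best chain; "
                       "this may indicate local corruption or a consensus incompatibility.\n");
         }
     }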

  29. jblachly commented at 12:34 pm on October 6, 2025: none

    Still happening in 29.1

    2025-10-06T12:27:14Z Pre-synchronizing blockheaders, height: 846996 (~92.38%)
    2025-10-06T12:27:15Z Pre-synchronizing blockheaders, height: 852996 (~93.04%)
    2025-10-06T12:27:15Z Pre-synchronizing blockheaders, height: 856996 (~93.46%)
    2025-10-06T12:27:15Z Pre-synchronizing blockheaders, height: 860996 (~93.89%)
    2025-10-06T12:27:15Z Pre-synchronizing blockheaders, height: 866996 (~94.54%)
    2025-10-06T12:27:16Z Pre-synchronizing blockheaders, height: 874996 (~95.38%)
    2025-10-06T12:27:16Z Pre-synchronizing blockheaders, height: 880996 (~96.04%)
    2025-10-06T12:27:18Z Pre-synchronizing blockheaders, height: 368995 (~40.84%)
    2025-10-06T12:27:22Z Pre-synchronizing blockheaders, height: 370995 (~41.06%)
    2025-10-06T12:27:26Z Pre-synchronizing blockheaders, height: 372995 (~41.28%)
    2025-10-06T12:27:26Z Pre-synchronizing blockheaders, height: 374995 (~41.50%)
    2025-10-06T12:27:28Z Pre-synchronizing blockheaders, height: 376995 (~41.71%)
    2025-10-06T12:27:32Z Pre-synchronizing blockheaders, height: 378995 (~41.94%)
    

    The reported approximate percentage at which it “resets” is typically within a couple of percent of 96.04%, occasionally the exact same value (96.04%), but has been as low as 91%.

  30. mzumsande commented at 9:52 pm on October 6, 2025: contributor

    I’ll work on a PR suggesting to use this check in the pre-sync header loop situation

    Forgot about this completely, but opened #33553 now to improve the logging (we’ll still cycle through peers trying to get headers that we don’t view as invalid, but will now repeatedly log a warning suggesting a reindex if the problem persists)

  31. hodlinator referenced this in commit ff2fd5ca89 on Nov 19, 2025
  32. hodlinator referenced this in commit f0d4bd06c0 on Nov 19, 2025
  33. glozow closed this on Dec 9, 2025

  34. glozow referenced this in commit 2c44c41984 on Dec 9, 2025
  35. da2ce7 commented at 8:39 pm on December 9, 2025: none

    @glozow thank you for your work towards resolving this issue; however, the software still gets stuck in an endless loop :( - just with better logging :)

    I do not think that this issue should be closed (as completed) until the client has some sort of resolution (for example, triggering a full recheck of the data and fixing the corruption, or aborting with a fatal error)…

  36. mzumsande commented at 9:10 pm on December 9, 2025: contributor

    I’m not sure there really is a better solution - there is a judgement call involved by the node operator on whether there is local corruption, or whether the peers are out of consensus and we are correct in viewing the block as invalid (e.g. in the case of a catastrophic consensus split). I don’t think we would want to automate making that call. We wouldn’t want to reindex or shut down automatically, because this could be turned into an (expensive) attack by miners, who could deliberately create an invalid block with valid PoW.

    Strictly speaking, there is no endless loop. We connect to one peer, sync headers until we get to the invalid block, then disconnect and search for another peer to download headers from, and so on. The loop is only “endless” if all peers in the network have a different view of consensus than we have, and we have no way of knowing for sure whether this is the case.
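
    In pseudocode, the observed behavior is roughly the following (illustrative names only, not actual code):

     // Each peer passes the full presync phase, but the subsequent headers
     // sync reaches the block we have marked invalid, so we disconnect and
     // move on to the next peer.
     while (!HeadersChainSynced()) {
         Peer peer{SelectNewOutboundPeer()};
         PresyncHeadersWith(peer);         // completes; invalidity is not checked here
         if (!SyncAndStoreHeaders(peer)) { // fails at the known-invalid header
             Disconnect(peer);             // try the next peer
         }
     }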

  37. roqqit commented at 9:51 pm on December 9, 2025: none

    This is very fragile on testnets that fork a lot. For example, on dogecoin (yeah, not bitcoin, but hey, it’s a software fork), there were a lot of testnet forks for whatever reason, and presyncing spun in an endless loop (I let it run for over 24 hours). I found a way around this by setting minimumchainwork=0. This “fix” works, but it also signals that presyncing is broken.
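
    (For Bitcoin Core itself, the equivalent override would presumably be something like bitcoind -testnet -minimumchainwork=0 - note that this weakens the headers anti-DoS protection, so it is a diagnostic workaround rather than a fix.)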

    I do not have a clear path forward, but it does appear that presyncing is too aggressive and should instead work on a best-effort basis. As a user, my expectation would be for headers to start processing after a few hours even if presyncing failed. That way I could at least take action and start invalidating blocks or whatever. But when presyncing gets stuck, there is nothing obvious to do, since I am stuck at genesis.

  38. sipa commented at 0:04 am on December 10, 2025: member
    @roqqit That sounds like an unrelated issue, which may be specific to your fork. This issue is about the node not detecting that its database is corrupted, and thus incorrectly rejecting the valid chain when it is offered.
  39. hodlinator commented at 1:46 pm on December 12, 2025: contributor

    Agree with #26391 (comment) that only adding warnings feels unsatisfying. My initial suggestion was also to halt the node. But I now also agree with #26391 (comment) that we cannot make a definite classification between corruption and consensus forks.

    The original issue at the top here had a log where the node was still syncing blocks that are ancestors of the assumevalid block. Within the same run, outputs were (or should have been) added to the chainstate DB in one block, and validation of a later block then failed to find them when searching for inputs (my digging into the data: #33553 (comment); I was still suggesting more drastic measures there). I think this kind of IBD case might merit at least more certain language:

    block %s was previously marked invalid (received from peer=%i).
    It's an ancestor to the assume valid block though! Unless we are running with
    a malicious -assumevalid setting or are experiencing a very unlikely fork in
    consensus, this strongly indicates database corruption (that -reindex may fix).
    
    --- a/src/net_processing.cpp
    +++ b/src/net_processing.cpp
    @@ -2957,10 +2957,25 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, Peer& peer,
             if (state.IsInvalid()) {
                 if (!pfrom.IsInboundConn() && state.GetResult() == BlockValidationResult::BLOCK_CACHED_INVALID) {
                     // Warn user if outgoing peers send us headers of blocks that we previously marked as invalid.
    -                LogWarning("%s (received from peer=%i). "
    -                           "If this happens with all peers, consider database corruption (that -reindex may fix) "
    -                           "or a potential consensus incompatibility.",
    -                           state.GetDebugMessage(), pfrom.GetId());
    +                bool assume_valid_ancestor{false};
    +                if (!m_chainman.AssumedValidBlock().IsNull()) {
    +                    LOCK(cs_main);
    +                    node::BlockMap::const_iterator it{m_chainman.m_blockman.m_block_index.find(m_chainman.AssumedValidBlock())};
    +                    assume_valid_ancestor = it != m_chainman.m_blockman.m_block_index.end() &&
    +                                            it->second.GetAncestor(pindexLast->nHeight) == pindexLast;
    +                }
    +                if (assume_valid_ancestor) {
    +                    LogWarning("%s (received from peer=%i). "
    +                               "It's an ancestor to the assume valid block though! Unless we are running with "
    +                               "a malicious -assumevalid setting or are experiencing a very unlikely fork in "
    +                               "consensus, this strongly indicates database corruption (that -reindex may fix).",
    +                               state.GetDebugMessage(), pfrom.GetId());
    +                } else {
    +                    LogWarning("%s (received from peer=%i). "
    +                               "If this happens with all peers, consider database corruption (that -reindex may fix) "
    +                               "or a potential consensus incompatibility.",
    +                               state.GetDebugMessage(), pfrom.GetId());
    +                }
                 }
                 MaybePunishNodeForBlock(pfrom.GetId(), state, via_compact_block, "invalid header received");
                 return;
    

    Since we only do this for outbound connections thanks to #33553 (review), the ancestor lookup shouldn’t be too much of a DoS vector.

    Would there be any interest in such a follow-up?

