Stuck in Endless Pre-Syncing Headers Loop #26391

issue da2ce7 opened this issue on October 26, 2022
  1. da2ce7 commented at 8:26 am on October 26, 2022: none

    Expected behavior Pre-Sync Headers Like Normal.

    Actual behavior

    2022-10-26T07:34:14Z Pre-synchronizing blockheaders, height: 748853 (~98.55%)
    2022-10-26T07:34:16Z Pre-synchronizing blockheaders, height: 226853 (~31.00%)
    

    Unaffected by restarting the program.

    2022-10-26T07:34:14Z Pre-synchronizing blockheaders, height: 748853 (~98.55%)
    2022-10-26T07:34:16Z Pre-synchronizing blockheaders, height: 226853 (~31.00%)
    

    To reproduce

    I think that this is a one-off sort of error:

    Here is a backup of my .bitcoin folder. https://drive.proton.me/urls/V04QAGG998#GlCnfHpkWW7F

    The files before blk00047.dat and rev00047.dat are omitted, and need to be copied in from another source. As sipa says, the blk files are not deterministic. - Will upload the full .bitcoin folder so people can reproduce…

    Here is the 4.6 GB full backup of my .bitcoin folder: https://drive.proton.me/urls/JA11NDEA14#GeG83qrpmvtt

    System information

    Fedora 37 Silverblue Running in Toolbox.

    Bitcoin Core: 28cf75697186ea8e473e120a643994bdf8237d6c

  2. da2ce7 added the label Bug on Oct 26, 2022
  3. maflcko commented at 8:34 am on October 26, 2022: member
    This may happen when one of the blocks in the main chain is marked invalid (for example due to corruption)
  4. fanquake renamed this:
    Stuck in Endless Pre-Syncing Headders Loop
    Stuck in Endless Pre-Syncing Headers Loop
    on Oct 26, 2022
  5. maflcko added the label P2P on Oct 26, 2022
  6. kouloumos commented at 9:13 am on October 26, 2022: contributor

    Your debug.log shows that before seeing this behavior you were doing IBD up to height=224854, where this happened, which I think matches what MarcoFalke said.

    2022-10-26T07:24:30Z UpdateTip: new best=00000000000000cd7d1c3d5137423c00e6a221d5492ace06d8fb9d990f2d7c96 height=224854 version=0x00000002 log2_work=69.513761 tx=14063312 date='2013-03-08T15:46:54Z' progress=0.018147 cache=356.2MiB(2733513txo)
    2022-10-26T07:24:30Z ERROR: ConnectBlock: Consensus::CheckTxInputs: 878d6685666400b75a1947ccfc676249ecdf52678b2dc0d83e0328f8c24a951a, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
    2022-10-26T07:24:30Z InvalidChainFound: invalid block=000000000000032021a6d18011d202df36cf07822a657b47390ab90568bb14e2  height=224855  log2_work=69.513793  date=2013-03-08T15:58:52Z
    2022-10-26T07:24:30Z InvalidChainFound:  current best=00000000000000cd7d1c3d5137423c00e6a221d5492ace06d8fb9d990f2d7c96  height=224854  log2_work=69.513761  date=2013-03-08T15:46:54Z
    2022-10-26T07:24:30Z ERROR: ConnectTip: ConnectBlock 000000000000032021a6d18011d202df36cf07822a657b47390ab90568bb14e2 failed, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
    

    On the next run it started the pre-sync phase from that height, and that’s the point it restarts every time.
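A rough way to spot this pattern in a debug.log is to look for a drop in the pre-sync height together with an earlier InvalidChainFound entry. The following is only a sketch in Python: the function name `scan_debug_log` is made up for illustration, and the regular expressions assume the log line shapes quoted in this thread, which may differ between versions.

```python
import re

# Assumption: these patterns mirror the log lines quoted in this thread.
PRESYNC_RE = re.compile(r"Pre-synchronizing blockheaders, height: (\d+)")
INVALID_RE = re.compile(r"InvalidChainFound: invalid block=([0-9a-f]+)\s+height=(\d+)")


def scan_debug_log(lines):
    """Return (restarts, invalid_blocks) found in an iterable of log lines.

    restarts: (previous_height, new_height) pairs where the pre-sync height dropped.
    invalid_blocks: (block_hash, height) pairs taken from InvalidChainFound lines.
    """
    restarts = []
    invalid_blocks = []
    last_height = None
    for line in lines:
        match = PRESYNC_RE.search(line)
        if match:
            height = int(match.group(1))
            # A lower height than the previous report means pre-sync started over.
            if last_height is not None and height < last_height:
                restarts.append((last_height, height))
            last_height = height
            continue
        match = INVALID_RE.search(line)
        if match:
            invalid_blocks.append((match.group(1), int(match.group(2))))
    return restarts, invalid_blocks


sample = [
    "2022-10-26T07:24:30Z InvalidChainFound: invalid block=000000000000032021a6d18011d202df36cf07822a657b47390ab90568bb14e2  height=224855  log2_work=69.513793  date=2013-03-08T15:58:52Z",
    "2022-10-26T07:34:14Z Pre-synchronizing blockheaders, height: 748853 (~98.55%)",
    "2022-10-26T07:34:16Z Pre-synchronizing blockheaders, height: 226853 (~31.00%)",
]
restarts, invalid_blocks = scan_debug_log(sample)
print(restarts)
print(invalid_blocks)
```

To run it against a real log, pass an open file object instead of `sample`, since the function only needs an iterable of lines.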

    I’ve tried to reproduce using the backup of your directory, but I couldn’t. Probably because using it requires a -reindex.

  7. maflcko commented at 9:35 am on October 26, 2022: member

    Steps to reproduce (with a diff to force corruption):

    diff --git a/src/validation.cpp b/src/validation.cpp
    index 37e68cfe4a..811ff2f9eb 100644
    --- a/src/validation.cpp
    +++ b/src/validation.cpp
    @@ -2201,7 +2201,7 @@ bool Chainstate::ConnectBlock(const CBlock& block, BlockValidationState& state,
             {
                 CAmount txfee = 0;
                 TxValidationState tx_state;
    -            if (!Consensus::CheckTxInputs(tx, tx_state, view, pindex->nHeight, txfee)) {
    +            if (Consensus::CheckTxInputs(tx, tx_state, view, pindex->nHeight, txfee)) {
                     // Any transaction validation failure in ConnectBlock is a block consensus failure
                     state.Invalid(BlockValidationResult::BLOCK_CONSENSUS,
                                 tx_state.GetRejectReason(), tx_state.GetDebugMessage());
    

    then call ./src/qt/bitcoin-qt -datadir=/tmp -signet -printtoconsole=1

  8. fanquake commented at 10:16 am on October 26, 2022: member
  9. da2ce7 commented at 4:32 pm on October 26, 2022: none

    @kouloumos

    I’ve tried to reproduce using the backup of your directory, but I couldn’t. Probably because using it requires a -reindex.

    The files before blk00047.dat and rev00047.dat are omitted, and need to be copied in from another source.

  10. sipa commented at 4:49 pm on October 26, 2022: member
    @da2ce7 The contents of those files are not deterministic, as they depend on the order you received blocks in. It’s not necessarily possible for someone to reconstruct your state without those files (as some blocks may fall on a different side of the cutoff).
  11. sipa commented at 5:22 pm on October 26, 2022: member

    Discussed this a bit with @sdaftuar.

    What’s going on here is actually expected: your node believes that the chain other nodes are offering it is invalid, thus it’s correct behavior that it doesn’t actually manage to synchronize and accept that chain. This invalidity is only detected during the headers sync phase, and not during the new pre-sync phase that precedes it. The result is that your node goes through peers one by one, attempting to synchronize headers with them, by performing a full pre-sync phase, and only after that completes noticing they’re giving us a known invalid chain.
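To make the sequence concrete, here is a toy Python model of the behavior described above. It is not Bitcoin Core's code: `Header`, `presync`, `headers_sync` and `try_peers` are hypothetical names, work is counted in made-up units, and the only point is that the known-invalid check happens in the second phase, after a full pre-sync has already completed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    height: int
    block_hash: str
    work: int  # toy unit of proof-of-work contributed by this header


def presync(headers, min_work):
    """Toy pre-sync: only checks that the peer's chain has enough total work."""
    return sum(h.work for h in headers) >= min_work


def headers_sync(headers, known_invalid):
    """Toy second phase: rejects the chain at the first known-invalid header."""
    for h in headers:
        if h.block_hash in known_invalid:
            return False, h.height
    return True, None


def try_peers(peer_chains, known_invalid, min_work):
    """Go through peers one by one and report what happened with each."""
    report = []
    for peer_id, chain in peer_chains.items():
        if not presync(chain, min_work):
            report.append((peer_id, "presync-failed", None))
            continue
        ok, bad_height = headers_sync(chain, known_invalid)
        status = "accepted" if ok else "rejected-after-presync"
        report.append((peer_id, status, bad_height))
    return report


honest_chain = [Header(h, f"hash{h}", 1) for h in range(1, 11)]
peers = {"peer-a": honest_chain, "peer-b": honest_chain}
# The corrupted node wrongly believes the block at height 5 is invalid.
report = try_peers(peers, known_invalid={"hash5"}, min_work=8)
for entry in report:
    print(entry)
```

Every peer passes the toy pre-sync and is then rejected at the same height, which corresponds to the repeated restarts seen in the logs.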

    A question is whether we could detect this during the pre-sync phase instead, which wouldn’t stop the lack of progress, but would avoid the bandwidth waste on repeated presyncs with everyone. The answer is yes - it wouldn’t be hard to check for known-invalid headers in the presync phase as well, however, I don’t think we want to do that because of fingerprinting reasons: it would permit an attacker to feed you an invalid, low-PoW, block during IBD, and then later follow you around the network by claiming to have a chain that extends this invalid block. If you stop fetching immediately, they know you’re the same node as the one they gave the invalid block to earlier.

    This fingerprinting is partially solvable: by keeping track of how much work was built on top of an invalid block, and if that work meets our anti-DoS threshold, permit reacting on known-invalid blocks when they’re fed to us during presync. This is however a fair bit of complexity and I’m not sure it’s worth it for just somewhat improving the situation for essentially broken nodes which will never recover anyway.
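One possible reading of that bookkeeping, sketched in Python under loud assumptions: the class `InvalidBlockWorkTracker`, its methods, and the integer work units are all hypothetical and do not exist in Bitcoin Core. The idea shown is only that a known-invalid block may be acted on during presync once enough work has been observed on top of it.

```python
class InvalidBlockWorkTracker:
    """Toy bookkeeping: work observed on top of each known-invalid block."""

    def __init__(self, anti_dos_threshold):
        self.anti_dos_threshold = anti_dos_threshold
        self.work_on_top = {}  # invalid block hash -> most work seen built on it

    def mark_invalid(self, block_hash):
        """Start tracking a block that validation has marked invalid."""
        self.work_on_top.setdefault(block_hash, 0)

    def record_descendant_work(self, block_hash, work):
        """Remember the largest amount of work seen on top of an invalid block."""
        if block_hash in self.work_on_top:
            self.work_on_top[block_hash] = max(self.work_on_top[block_hash], work)

    def may_abort_presync(self, block_hash):
        """True only when enough work has been seen on top of the invalid block."""
        if block_hash not in self.work_on_top:
            return False
        return self.work_on_top[block_hash] >= self.anti_dos_threshold


tracker = InvalidBlockWorkTracker(anti_dos_threshold=100)
tracker.mark_invalid("cheap-attacker-block")
tracker.mark_invalid("corrupted-mainchain-block")
tracker.record_descendant_work("corrupted-mainchain-block", 500)
cheap = tracker.may_abort_presync("cheap-attacker-block")
heavy = tracker.may_abort_presync("corrupted-mainchain-block")
print(cheap, heavy)
```

In this sketch the low-PoW block an attacker could hand out cheaply never triggers an early abort, so it cannot be used as a marker, while a block with a lot of work built on it can.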

    Perhaps time is better spent on better reporting to the user, in the form of targeted warnings in logs (or even failure to start) when there appears to be a long invalid high-PoW chain out there.

  12. maflcko removed the label Bug on Oct 26, 2022
  13. maflcko added the label Feature on Oct 26, 2022
  14. sipa commented at 9:49 pm on October 26, 2022: member
    Arguably the fact that this results in a corrupted node wasting bandwidth on redownloading headers multiple times is a 24.0 regression. But I don’t know if it’s worth fixing as it involves some complexity, and would only benefit already broken nodes anyway.
  15. maflcko added the label Brainstorming on Oct 27, 2022
  16. Shekelme commented at 3:43 am on May 16, 2023: none
    Same bug here. v24.0.1 Endless_Pre-synchronizing.txt
  17. aleks-mariusz commented at 11:15 am on June 20, 2023: none

    I’m seeing this with v25.0.0 as well :-/

    2023-06-20T11:09:46Z Pre-synchronizing blockheaders, height: 782560 (~98.44%)
    2023-06-20T11:09:49Z Pre-synchronizing blockheaders, height: 738560 (~93.00%)
    

    What are the recommendations on fixing this?

  18. maflcko commented at 12:01 pm on June 20, 2023: member

    What are the recommendations on fixing this?

    Bitcoin Core makes heavy use of CPU, RAM and disk IO. Hardware defects might only become visible when running Bitcoin Core. You might want to check your hardware for defects.

    • memtest86 to check your RAM
    • to check the CPU behaviour under load, use linpack or Prime95
    • to test your storage device use smartctl or CrystalDiskInfo

    Source: https://bitcoin.stackexchange.com/a/12206

    If your hardware doesn’t have any faults, you can do a -reindex to wipe the corrupt block file from the storage.

  19. tansanDOTeth commented at 10:48 am on June 24, 2023: none

    Is this normal? I’m looking at the logs and it looks like it happens quite often.

    2023-06-24T10:32:33Z Pre-synchronizing blockheaders, height: 775060 (~97.47%)
    2023-06-24T10:32:33Z Pre-synchronizing blockheaders, height: 777060 (~97.71%)
    2023-06-24T10:32:33Z Pre-synchronizing blockheaders, height: 779060 (~97.95%)
    2023-06-24T10:32:33Z Pre-synchronizing blockheaders, height: 781060 (~98.19%)
    2023-06-24T10:32:34Z Pre-synchronizing blockheaders, height: 783060 (~98.43%)
    2023-06-24T10:32:39Z New outbound peer connected: version: 70016, blocks=795702, peer=408 (outbound-full-relay)
    2023-06-24T10:32:40Z New outbound peer connected: version: 70016, blocks=795702, peer=409 (outbound-full-relay)
    2023-06-24T10:32:46Z Pre-synchronizing blockheaders, height: 335060 (~42.81%)
    2023-06-24T10:32:46Z New outbound peer connected: version: 70016, blocks=795702, peer=410 (outbound-full-relay)
    2023-06-24T10:32:49Z New outbound peer connected: version: 70015, blocks=795702, peer=411 (outbound-full-relay)
    2023-06-24T10:32:51Z Pre-synchronizing blockheaders, height: 337060 (~43.06%)
    2023-06-24T10:32:58Z Pre-synchronizing blockheaders, height: 339060 (~43.31%)
    2023-06-24T10:33:06Z Pre-synchronizing blockheaders, height: 341060 (~43.57%)
    2023-06-24T10:33:12Z Pre-synchronizing blockheaders, height: 343060 (~43.82%)
    2023-06-24T10:33:20Z Pre-synchronizing blockheaders, height: 345060 (~44.07%)
    2023-06-24T10:33:29Z Pre-synchronizing blockheaders, height: 347060 (~44.32%)
    2023-06-24T10:33:34Z Pre-synchronizing blockheaders, height: 349060 (~44.58%)
    2023-06-24T10:33:44Z Pre-synchronizing blockheaders, height: 351060 (~44.83%)
    2023-06-24T10:33:49Z Pre-synchronizing blockheaders, height: 353060 (~45.09%)
    
  20. aleks-mariusz commented at 10:53 am on June 24, 2023: none

    What are the recommendations on fixing this?

    If your hardware doesn’t have any faults, you can do a -reindex to wipe the corrupt block file from the storage.

    Re-indexing helped, but it throws away all the progress that had been made and starts over at 0% :-/ Sadly, it took 3+ days to get back to the current state with my hardware/network connection.

  21. tansanDOTeth commented at 10:55 am on June 24, 2023: none

    What are the recommendations on fixing this?

    If your hardware doesn’t have any faults, you can do a -reindex to wipe the corrupt block file from the storage.

    Re-indexing helped, but it throws away all the progress that had been made and starts over at 0% :-/ Sadly, it took 3+ days to get back to the current state with my hardware/network connection.

    Is there a way to do this from the GUI?

    Edit: For Mac users:

    /Applications/Bitcoin-Qt.app/Contents/MacOS/Bitcoin-Qt -reindex
    
  22. willcl-ark commented at 8:11 am on July 1, 2024: member

    Do we want to keep this open to address this comment?:

    Perhaps time is better spent on better reporting to the user, in the form of targeted warnings in logs (or even failure to start) when there appears to be a long invalid high-PoW chain out there.

    Otherwise I think we can probably close this as stale.

  23. jstefanop commented at 3:17 pm on December 13, 2024: none

    FYI, we are seeing this across a large number of our nodes (FutureBit Apollos; our users run tens of thousands of these nodes on the network).

    A reindex usually fixes it, but there should be a way for Core to detect this corruption and self fix/reindex from the point of corruption, or at least shut down the node with an error. Being stuck in an endless pre-sync loop is not great.

  24. maflcko commented at 2:41 pm on December 16, 2024: member

    detect this corruption and self fix/reindex from the point of corruption.

    I am not sure this will be an improvement, because:

    • It will turn an endless pre-sync loop into an endless reindex loop (at least for some).
    • If there is an issue with the hardware, it would be better to diagnose and fix it, instead of continuing. Otherwise the issue will possibly be silently ignored and re-appear in the future.

    My recommendation would be to check which configs of your hardware run into this problem and then try to determine and fix the root cause.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC
