bitcoin 0.10.0rc1 stopped accepting blocks at height 336370 #5588

issue jhoenicke openend this issue on January 1, 2015
  1. jhoenicke commented at 9:34 am on January 1, 2015: none

    I have the release candidate 1 running and noticed today that it is still at height 336370. The block 336371 is not in the database, but I think it is still putting new blocks beyond that into the database.

    I have the bitcoind still running if someone has suggestions how to debug this, further. I don’t depend on it and haven’t tried yet if restarting solves the problem.

    I have the relay node client from Matt Corallo running, which feeds the local bitcoind with blocks and transactions from the relay network.

    Update: I noticed that the block 336370 that was accepted is the orphaned block 000000000000000000929bbefb29a690c8913b3c066946d998bfd7c5426fb5e7.

  2. laanwj added the label Bug on Jan 1, 2015
  3. laanwj commented at 11:39 am on January 1, 2015: member

    Weird. Is there anything in the tail of the debug.log about errors accepting a block? Or e.g. leveldb checksum errors? Was this during the initial sync, or a node that has been running for a long time? Was it upgraded from 0.9?

    I wonder if A) it ever received the correct block 336370 (hash 00000000000000000c3ac3a00701d773648fa304a324a7a11eb1e4cf71c8c972) and rejected it, or B) that it isn’t requesting that for some reason, and therefore stuck on the wrong side. Or the node requested it but hasn’t received it…

  4. laanwj commented at 11:45 am on January 1, 2015: member
    Make sure you save the debug.log before restarting the node (as that may rotate it and lose information). For troubleshooting it would help a lot if you mail it (gzipped/optionally pgp encrypted) to laanwj@gmail.com or upload it somewhere.
  5. laanwj commented at 12:20 pm on January 1, 2015: member
    Also: can you post the output of the getchaintips RPC?
  6. jhoenicke commented at 12:31 pm on January 1, 2015: none

    Can it be a problem with compatibility to the relay network client. I noticed that 336372 came directly after 336371. debug.log contains the lines

    02014-12-28 21:03:08 ERROR: AcceptBlockHeader : prev block not found
    12014-12-28 21:03:08 ERROR: ProcessNewBlock : AcceptBlock FAILED
    22014-12-28 21:03:08 Misbehaving: 127.0.0.1:40312 (0 -> 10)
    

    Maybe the blocks were relayed in the wrong order.

    Update: block 336372 was sent before 336371 (both came from the same miner). Update 2: I should check the facts more carefully. 336372 and 336373 came from the same miner with reversed time stamps. 336371 came from a different miner a minute earlier (according to time stamp). Of course, the time stamps are not accurate enough to say anything about the order they were sent through the network.

    I restarted the server and it recovered from the problem.

  7. laanwj commented at 11:25 am on January 2, 2015: member

    That makes sense - it looks like a subtle bug, it rejected the header because it could not find the previous block, then just never requested it again. None of these messages mentions which block is concerned, but we could assume it is 336371.

    Possibly, interactions with @bluematt’s relay client may break an assumption in the P2P code.

  8. laanwj added this to the milestone 0.10.0 on Jan 2, 2015
  9. laanwj added the label Priority High on Jan 2, 2015
  10. laanwj added the label P2P on Jan 2, 2015
  11. laanwj commented at 11:38 am on January 2, 2015: member

    I’ve augmented your blk overview with the block numbers, and they tell a similar story

    0[main 336368] Block: 000000000000000007113bc080f3442d74d11e02ca156e39c92290a78d70403a parent: 000000000000000007edde8a1649ca5f967c95142920bb20408cfc1b9f1a2a66
    1[main 336369] Block: 0000000000000000189923f03da1a3130919808dc354074267ead5988b87ad19 parent: 000000000000000007113bc080f3442d74d11e02ca156e39c92290a78d70403a
    2[stale 336370] Block: 000000000000000000929bbefb29a690c8913b3c066946d998bfd7c5426fb5e7 parent: 0000000000000000189923f03da1a3130919808dc354074267ead5988b87ad19
    3[main 336372] Block: 0000000000000000038c09063f665c6a3b906d0d6f0969572dc4a9d119f3b6ea parent: 00000000000000000b858f3f63fa38dd01051abaa5974aec68c3c2712602205d
    4[main 336370] Block: 00000000000000000c3ac3a00701d773648fa304a324a7a11eb1e4cf71c8c972 parent: 0000000000000000189923f03da1a3130919808dc354074267ead5988b87ad19
    5[main 336373] Block: 0000000000000000084413b21b030408acf50f2741ac623973ef8a9676e93b78 parent: 0000000000000000038c09063f665c6a3b906d0d6f0969572dc4a9d119f3b6ea
    6[main 336374] Block: 00000000000000001124faaba22545ebea7ec0a5fc0218f972915698c417af9e parent: 0000000000000000084413b21b030408acf50f2741ac623973ef8a9676e93b78
    7[main 336375] Block: 000000000000000015a4f30d82ee0b924836815f7afb5d420852eedda59f10c9 parent: 00000000000000001124faaba22545ebea7ec0a5fc0218f972915698c417af9e
    

    The main chain’s block 336371 00000000000000000b858f3f63fa38dd01051abaa5974aec68c3c2712602205d was never received and stored, so likely never requested again.

  12. ajweiss commented at 9:45 pm on January 2, 2015: contributor
    Dumb question: Was this node connected to any peers other than the relaynode peer?
  13. jhoenicke commented at 9:59 am on January 3, 2015: none

    Yes, it was connected; it usually has a lot of incoming connections in addition to the usual eight outgoing connections.

    If I understand the problem correctly it didn’t request block 336371 from the other peers ever again although it requested the parent block 336370 later. Even a misbehaving peer should not be able to stop bitcoind from accepting a valid block.

    The problem was solved by restarting.

  14. sdaftuar commented at 10:35 pm on January 3, 2015: member

    I’ve been trying to track this down and I think I have a guess as to what could have occurred.

    Which version of the relay client are you using? Am I right in assuming you’ve whitelisted it?

    I believe at least the java and python versions of the relay client will send an inv for blocks they deliver followed by the block itself as part of their delivery, and because the relay client is probably whitelisted, that the relay node will be a preferred download peer and could be assigned a block to download. If that were to happen, then the node would get stuck because the getdata would go unanswered, but because the block is “in flight” no other node would be asked to provide it.

    More specifically, my guess is that the following likely occurred: a) The network forks at block 370 (as shown above).

    b) Block 371 is inv’ed and delivered by the relay node. This block would be discarded since its parent is unknown. The getheaders message sent to the relay node is discarded by the relay code.

    c) Block 372 is inv’ed by one of the other peers. The getheaders request sent in response was presumably answered and mapBlockIndex now contains the alternate block 370 along with blocks 371 and 372.

    Since you indicated most of your other peers are inbound connections, I think it’s likely that the peer who inv’ed you 372 is not a preferred download peer (see main.cpp:295). I believe that the inv handling code would request block 372 itself from the peer that inv’ed it, but to download block 371, my understanding is that the code defers assignment until later, in SendMessages.

    When SendMessages is called for each node, I believe the only preferred download peer that would have the block as available is the relay node peer. The relay would be a preferred download peer if it’s whitelisted (even though it’s inbound), and in ProcessBlockAvailability (main.cpp:348) the hash for block 371, which was unknown when received, should then be able to be found in the block index and could be assigned.

    Assuming no other peers have inv’ed blocks 371 or 372 at the point , the node would have sent a getdata to the relay peer. Unfortunately the relay peer ignores those messages, and I believe this would trigger a bug highlighted in #5463, namely that having a getdata ignored by a client causes the node to get stuck with the tip not advancing.

    If this analysis is right, it’s not clear to me what the right solution to this is; presumably #5463 would solve this problem from lingering, but perhaps a solution that gets the tip moving more quickly again would be preferable, so that miners using the relay client don’t find themselves mining on an old tip?

  15. Michagogo commented at 11:20 pm on January 3, 2015: contributor

    I had what might be a related issue. I noticed that my syncing (catching up, when starting up, running 0.10.0rc1) was stalled, and looking at the log showed that it had been stuck for over 2 hours, and that the next block was in flight from one peer. Also, the blocks ahead seemed to be getting loaded in, as it was switching block files. I used a program to forcibly close the socket, disconnecting the peer that was supposedly serving me that block, and 80 seconds later I was synced.

    getchaintips, getpeerinfo, debug.log excerpt

    You can see it synced normally until block 337077 at 2015-01-03 19:38:06, at which point it stopped, but continued to fill in the blocks ahead of there. At 2015-01-03 21:42:17 I killed the connection to that peer, at which point syncing proceeded normally.

  16. jhoenicke commented at 8:42 am on January 4, 2015: none

    I don’t have the relay on the whitelist (I didn’t bother to figure out how to do it yet). I use the Java version of RelayNode; the latest version pre-compiled in Matt’s repository (from Sept). https://github.com/TheBlueMatt/RelayNode/blob/master/client/RelayNodeClient.jar

    I haven’t asked for getpeerinfo, but I don’t think the block was stalled by another peer. At the time I restarted it was 3.5 days behind.

  17. sipa commented at 3:12 pm on January 4, 2015: member

    So it sounds like there are two independent problems:

    • Outside of IBD, stalling detection doesn’t work, so we want a means to prevent unbounded waiting for a block that doesn’t arrive (like #5463, see my comment there).
    • The relay client doesn’t reply to getheaders - this is not a problem to bitcoin core itself, but it does make the relay client useless for transferring reorgs.
  18. Michagogo commented at 3:14 pm on January 4, 2015: contributor

    IBD meaning initial node bootstrap specifically though, not catching up on start.

    On Sun, Jan 4, 2015 at 5:13 PM, Pieter Wuille notifications@github.com wrote:

    So it sounds like there are two independent problems:

    • Outside of IBD, stalling detection doesn’t work, so we want a means to prevent unbounded waiting for a block that doesn’t arrive (like #5463 #5463, see my comment there).
    • The relay client doesn’t reply to getheaders - this is not a problem to bitcoin core itself, but it does make the relay client useless for transferring reorgs.

    — Reply to this email directly or view it on GitHub #5588 (comment).

  19. sipa commented at 3:27 pm on January 4, 2015: member
    @Michagogo Eh? I’m using IBD here to refer to whenever you’re more than 1024 blocks behind.
  20. Michagogo commented at 3:29 pm on January 4, 2015: contributor

    Ah, okay.

    On Sun, Jan 4, 2015 at 5:28 PM, Pieter Wuille notifications@github.com wrote:

    @Michagogo https://github.com/Michagogo Eh? I’m using IBD here to refer to whenever you’re more than 1024 blocks behind.

    — Reply to this email directly or view it on GitHub #5588 (comment).

  21. laanwj removed this from the milestone 0.10.0 on Jan 12, 2015
  22. laanwj commented at 10:33 am on January 12, 2015: member
    Workaround in #5608, so removing 0.10.0 milestone
  23. EagleTM commented at 6:31 pm on January 12, 2015: none

    I ran into this testing 0.10.0rc1 (on 32 bit ARM) while catching up ~2000 blocks Curiously it got stuck on height=338072 (stays as last connected best block) as well for some reason:

    debug log excerpt peerinfo (a couple of hours later but still stuck) getchaintips

  24. luke-jr commented at 6:35 pm on January 12, 2015: member
    Please test 0.10.0rc3 now, this should be fixed…
  25. laanwj closed this on May 18, 2015

  26. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-05 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me