Headers-first sync never gets going #5138

issue gavinandresen openend this issue on October 24, 2014
  1. gavinandresen commented at 7:06 pm on October 24, 2014: contributor

    I ran into this writing a regression test; it is a weird edge case with the headers-first logic.

    The setup:

    Create a 200-block blockchain with blocks that have ‘old’ timestamps (see https://github.com/gavinandresen/bitcoin-git/tree/chainalerts for code to do that)

    Start up two nodes, both of which read in that 200-block chain. Connect node_b to node_a. Have node_b generate a new block. RESULT: node_a ignores it.

    Here’s -debug=net output from node_b:

    0UpdateTip: new best=00004900679d6cfbac61afa36cd6fda7f27af0e69e909373b947567d6dcc1d74  height=201
    1sending: inv (37 bytes) peer=1
    2received: getheaders (645 bytes) peer=1
    3getheaders 201 to 00004900679d6cfbac61afa36cd6fda7f27af0e69e909373b947567d6dcc1d74 from peer=1
    4sending: headers (82 bytes) peer=1
    

    and from node_a:

    0getheaders 200 to 0000000000000000000000000000000000000000000000000000000000000000 from peer=1
    1got inv: block 00004900679d6cfbac61afa36cd6fda7f27af0e69e909373b947567d6dcc1d74  new peer=1
    2received: headers (82 bytes) peer=1
    

    I think the bug is because since both nodes just started up and read blocks from disk, pindexLastCommonBlock is NULL – it should be block 200. I think.

     0(lldb) p state
     1(<anonymous>::CNodeState) $60 = {
     2  nMisbehavior = 0
     3  fShouldBan = false
     4  name = "127.0.0.1:59518"
     5  rejects = size=0 {}
     6  pindexBestKnownBlock = 0x0000000105122e10
     7  hashLastUnknownBlock = {
     8    base_uint<256> = {
     9      pn = ([0] = 0, [1] = 0, [2] = 0, [3] = 0, [4] = 0, [5] = 0, [6] = 0, [7] = 0)
    10    }
    11  }
    12  pindexLastCommonBlock = 0x0000000000000000
    13  fSyncStarted = false
    14  nStallingSince = 0
    15  vBlocksInFlight = size=0 {}
    16  nBlocksInFlight = 0
    17}
    

    Anyway, fFetch is never set to true on line 4422: -> 4422 bool fFetch = !pto->fInbound || (pindexBestHeader && (state.pindexLastCommonBlock ? state.pindexLastCommonBlock->nHeight : 0) + 144 > pindexBestHeader->nHeight);

    … because state.pindexLastCommonBlock is NULL, and 0+144 < 201

    Writing this all down… would it make sense to fix this by always fetching when state.pindexLastCommonBlock is NULL ?

  2. gavinandresen assigned sipa on Oct 24, 2014
  3. gavinandresen commented at 7:31 pm on October 24, 2014: contributor

    Possible patch that fixes the problem for me, not sure if it will cause other problems:

     0diff --git a/src/main.cpp b/src/main.cpp
     1index 0cfe90b..fc17c4b 100644
     2--- a/src/main.cpp
     3+++ b/src/main.cpp
     4@@ -4419,7 +4419,14 @@ bool SendMessages(CNode* pto, bool fSendTrickle)
     5         // Start block sync
     6         if (pindexBestHeader == NULL)
     7             pindexBestHeader = chainActive.Tip();
     8-        bool fFetch = !pto->fInbound || (pindexBestHeader && (state.pindexLastCommonBlock ? state.pindexLastCommonBlock->nHeight : 0) + 144 > pindexBestHeader->nHeight);
     9+        bool fFetch = !pto->fInbound;
    10+        if (!fFetch && pindexBestHeader)
    11+        {
    12+            if (state.pindexLastCommonBlock)
    13+                fFetch = (state.pindexLastCommonBlock->nHeight + 144 > pindexBestHeader->nHeight);
    14+            else
    15+                fFetch = true;
    16+        }
    17         if (!state.fSyncStarted && !pto->fClient && fFetch && !fImporting && !fReindex) {
    18             // Only actively request headers from a single peer, unless we're close to today.
    19             if (nSyncStarted == 0 || pindexBestHeader->GetBlockTime() > GetAdjustedTime() - 24 * 60 * 60) {
    
  4. sipa commented at 7:39 pm on October 24, 2014: member

    That patch will just make you download from all peers at startup (until you have something). That won’t hurt, but it’s not really the intended effect. What it’s trying to do is only fetch from non-outbound if we’re synced up to less than 144 blocks (a day) from the best known headers synced already.

    The problem is really just the lack of outbound connections. If there are no such connections, we should probably just fetch from all peers.

  5. ghost commented at 3:29 am on October 25, 2014: none

    Relatedly, the “chainActive.Tip()->GetBlockTime() > GetAdjustedTime() - Params().TargetSpacing() * 20” condition causes block data to not be fetched (immediately) upon receiving the block announcement when the current Tip is older by more than 3.33 hours (Params().TargetSpacing() * 20). Such a time gap is not unlikely with fully-synced nodes in regtest mode, but also not inconceivable in any.

    This causes immediate node divergence, as headers-first sync never gets going either. The patch corrects this, but I wonder whether the conditional logic needs revisiting too.

  6. gavinandresen commented at 3:47 pm on October 27, 2014: contributor

    @sipa : is there a higher-level description somewhere of how peers figure out what chains they have in common? I’m having trouble following the logic in main.cpp.

    Also, I get the feeling there might have been some premature optimization here; in particular, just looking at the data structures I expected that CNodeState->pindexLastCommonBlock should be non-null not too long after a peer has completed the version/verack sequence, but I guess that is not true? Seems to me upon connection peers should always tell each other what their best blocks are, even if they are at the same height.

  7. gavinandresen commented at 9:41 pm on November 19, 2014: contributor

    I ran into this again today writing a regression test. Here is a distilled test case that will fail:

     0#!/usr/bin/env python
     1# Copyright (c) 2014 The Bitcoin Core developers
     2# Distributed under the MIT software license, see the accompanying
     3# file COPYING or http://www.opensource.org/licenses/mit-license.php.
     4
     5#
     6# Test proper initiation of block sync between nodes
     7#
     8
     9from test_framework import BitcoinTestFramework
    10from bitcoinrpc.authproxy import AuthServiceProxy, JSONRPCException
    11from util import *
    12import os
    13import shutil
    14
    15class BlockSyncTest(BitcoinTestFramework):
    16
    17    def setup_network(self):
    18        # Nodes will all start with same chain, but
    19        # are disconnected
    20        self.nodes = start_nodes(4, self.options.tmpdir)
    21
    22    def run_test(self):
    23        # Have node0 mine two blocks, disconnected from other nodes
    24        blocks = self.nodes[0].setgenerate(True, 2)
    25
    26        # Now connect 0 to 1, then 2 to 0:
    27        connect_nodes(self.nodes[0], 1)
    28        connect_nodes(self.nodes[2], 0)
    29
    30        # Wait a few seconds for inv/getdata/block messages
    31        time.sleep(3)
    32
    33        # Everybody should have the same idea of best block
    34        assert_equal(self.nodes[1].getbestblockhash(), blocks[-1])
    35        assert_equal(self.nodes[2].getbestblockhash(), blocks[-1])
    36
    37        # if node3 mines three then connects, everybody should sync with
    38        # it's chain:
    39        blocks = self.nodes[3].setgenerate(True, 3)
    40        connect_nodes(self.nodes[3], 0)
    41        time.sleep(3)
    42        bestblocks = [ node.getbestblockhash() for node in self.nodes ]
    43        assert_equal(bestblocks, [blocks[-1]]*len(self.nodes) )
    44
    45if __name__ == '__main__':
    46    BlockSyncTest().main()
    

    Reversing the connection direction in the last connect_nodes(self.nodes[3], 0) and the test succeeds.

  8. morcos commented at 11:08 pm on December 1, 2014: member
    I went through the logic of this. When the connection direction is reversed, node3 is an outbound connection of node0 and is therefore a “nice” peer (fPreferredDownload is true). But in the connection direction of your test case, node3 is inbound and so its not preferred and since node0 has another preferredDownload peer (it made an outbound connection to node1) it does not feel the need to sync headers with node3. See main.cpp:SendMessages() at line 4520.
  9. laanwj added the label Bug on Dec 5, 2014
  10. laanwj added this to the milestone 0.11.0 on Jan 8, 2015
  11. laanwj removed this from the milestone 0.11.0 on Jan 8, 2015
  12. laanwj commented at 1:26 pm on January 8, 2015: member
    Should be fixed by #5157 (turns out to be not the case)
  13. laanwj closed this on Jan 8, 2015

  14. laanwj reopened this on Jan 8, 2015

  15. sdaftuar commented at 3:32 pm on January 8, 2015: member

    As discussed on irc, @gavinandresen’s test in his Nov 14 comment still fails. The problem is that if an inbound peer is multiple blocks ahead of you and you have other outbound peers, the download logic tries to download the intermediate blocks from one of the outbound peers. If the outbound peers haven’t announced the block, then you’ll never request them.

    That’s true even if an inbound peer continued to announce blocks that continue to build on its chain, you wouldn’t ever advance your tip to catch up, because you’ll be held up by not requesting the intermediate blocks, since you have no one to request them of.

  16. laanwj commented at 8:53 am on June 1, 2015: member
    Should be fixed by #5662
  17. laanwj closed this on Jun 1, 2015

  18. laanwj referenced this in commit cd737214ce on Sep 16, 2019
  19. sidhujag referenced this in commit f9a3ff0065 on Sep 16, 2019
  20. MarcoFalke referenced this in commit 59c138d2f1 on Sep 18, 2019
  21. random-zebra referenced this in commit 46585f7b62 on Mar 31, 2020
  22. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me