p2p: Sync chain more readily from inbound peers during IBD #24171

pull sdaftuar wants to merge 2 commits into bitcoin:master from sdaftuar:2022-01-download-from-inbound changing 3 files +62 −3
  1. sdaftuar commented at 9:09 pm on January 26, 2022: member

    When in IBD, if the honest chain is only known by inbound peers, then we must eventually sync from them in order to learn it. This change allows us to perform initial headers sync and fetch blocks from inbound peers, if we have no blocks in flight.

    The restriction on having no blocks in flight means that we will naturally throttle our block downloads to any such inbound peers that we may be downloading from, until we leave IBD. This is a tradeoff between preferring outbound peers for most of our block download, versus making sure we always eventually will get blocks we need that are only known by inbound peers even during IBD, as otherwise we may be stuck in IBD indefinitely (which could have cascading failure on the network, if a large fraction of the network managed to get stuck in IBD).

    Note that the test in the second commit fails on master, without the first commit.

  2. sdaftuar commented at 9:30 pm on January 26, 2022: member
    See discussion in #21106, eg #21106 (comment)
  3. DrahtBot added the label P2P on Jan 26, 2022
  4. DrahtBot commented at 11:03 pm on January 26, 2022: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24970 (net processing: Move cleanSubVer, fPreferredDownload and nLocalHostNonce to Peer by jnewbery)
    • #24008 (assumeutxo: net_processing changes by jamesob)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. in test/functional/p2p_block_sync.py:28 in 4fdd7fe3ea outdated
    23+    def run_test(self):
    24+        self.log.info("Setup network: node0->node1->node2")
    25+        self.log.info("Mining one block on node0")
    26+        self.generate(self.nodes[0], 1)
    27+        self.log.info("Check that all nodes synced")
    28+        self.sync_all()
    


    MarcoFalke commented at 8:10 am on January 27, 2022:

    nit: I think this can be removed, since self.generate already calls this internally (and fails if the sync fails).

    It might also be good to explain why three nodes are needed and why node0->node1 isn’t sufficient to test this.

    My understanding is that this is setting up node1 as the subject under test. Having an outbound and an inbound connection, of which the outbound is selected for block download, but fails to provide any data. The test would pass with just node0->node1 setup which means that we already sync from inbound peers if there are no outbounds at all?


    shaavan commented at 12:10 pm on January 27, 2022:

    I checked for the suggestions of @MarcoFalke, and made sure they were correct. In the definition of the generate function, sync_all is being called by default. https://github.com/bitcoin/bitcoin/blob/master/test/functional/test_framework/test_framework.py#L633-L636 Also, I checked that the test works perfectly with just two nodes, 0 and 1. I made the following changes to the test to incorporate the suggestions.

    Diff:

     0     def set_test_params(self):
     1         self.setup_clean_chain = True
     2-        self.num_nodes = 3
     3+        self.num_nodes = 2
     4 
     5     def setup_network(self):
     6         self.setup_nodes()
     7         self.connect_nodes(0, 1)
     8-        self.connect_nodes(1, 2)
     9 
    10     def run_test(self):
    11-        self.log.info("Setup network: node0->node1->node2")
    12-        self.log.info("Mining one block on node0")
    13+        self.log.info("Setup network: node0->node1")
    14+        self.log.info("Mining one block on node0 and check that all nodes synced")
    15         self.generate(self.nodes[0], 1)
    16-        self.log.info("Check that all nodes synced")
    17-        self.sync_all()
    18         self.log.info("Success!")
    

    MarcoFalke commented at 12:17 pm on January 27, 2022:

    Also, I checked that the test works perfectly with just two nodes, 0 and 1.

    Jup, the test with two nodes is passing on master and this pull. I think the goal of the test was to have a test that fails on master, but passes with this pull. For this you need three nodes.


    shaavan commented at 12:35 pm on January 27, 2022:

    Jup, the test with two nodes is passing on master and this pull. I think the goal of the test was to have a test that fails on master but passes with this pull.

    I just confirmed that it’s true. I think the number of nodes should be kept at 3.

    We can still incorporate the following change to the test to remove redundancy from it:

    Diff:

    0def run_test(self):
    1         self.log.info("Setup network: node0->node1->node2")
    2-        self.log.info("Mining one block on node0")
    3+        self.log.info("Mining one block on node0 and check that all nodes synced")
    4         self.generate(self.nodes[0], 1)
    5-        self.log.info("Check that all nodes synced")
    6-        self.sync_all()
    7         self.log.info("Success!")
    

    sdaftuar commented at 2:27 pm on January 27, 2022:

    My understanding is that this is setting up node1 as the subject under test. Having an outbound and an inbound connection, of which the outbound is selected for block download, but fails to provide any data. The test would pass with just node0->node1 setup which means that we already sync from inbound peers if there are no outbounds at all? @MarcoFalke Yes that is correct. We already had logic in place to download blocks from inbound peers in the case that we had no outbound peers, but prior to this patch we have had no way to download blocks that are only known by inbound peers if we’re in IBD.


    sdaftuar commented at 2:27 pm on January 27, 2022:
    I’ve updated the test comments and removed the self.sync_all() call.
  6. MarcoFalke approved
  7. MarcoFalke commented at 8:12 am on January 27, 2022: member
    Concept ACK. Makes sense to sync from inbounds if the outbounds don’t provide any blocks. Seems like this is already the behaviour when there are no outbounds at all.
  8. in src/net_processing.cpp:4619 in 4fdd7fe3ea outdated
    4615+            // putting undue load on (say) some home user who is just making
    4616+            // outbound connections to the network, but if our only source of
    4617+            // the latest blocks is from an inbound peer, we have to be sure to
    4618+            // eventually download it (and not just wait indefinitely for an
    4619+            // outbound peer to have it).
    4620+            if (nPreferredDownload == 0 || mapBlocksInFlight.empty()) {
    


    PastaPastaPasta commented at 8:44 am on January 27, 2022:

    nit: I would prefer combining this smth like

     0        } else if (!pto->fClient && !pto->IsAddrFetchConn()) &&
     1                  (nPreferredDownload == 0 || mapBlocksInFlight.empty())) {
     2            // Typically this is an inbound peer. If we don't have any outbound
     3            // peers, or if we aren't downloading any blocks from such peers,
     4            // then allow block downloads from this peer, too.
     5            // We prefer downloading blocks from outbound peers to avoid
     6            // putting undue load on (say) some home user who is just making
     7            // outbound connections to the network, but if our only source of
     8            // the latest blocks is from an inbound peer, we have to be sure to
     9            // eventually download it (and not just wait indefinitely for an
    10            // outbound peer to have it).
    

    shaavan commented at 11:21 am on January 27, 2022:

    I prefer the current arrangement of the conditions, because:

    1. The following comment says: Typically this is an inbound peer. This here is referring to the conditions !pto->fClient && !pto->IsAddrFetchConn() describing the node from which we want to fetch the block header. The other two conditions are specific to our node, which are checking if there is any preferred block to download on our side or if we have Blocks in Flight. Merging these two sets of conditions would make the following comment confusing.
    2. Keeping these two sets of conditions separate makes it easier to understand the code’s logic and easier to follow with the comments.

    sdaftuar commented at 2:29 pm on January 27, 2022:
    Thanks, I don’t think many code reviewers will find the new arrangement of these conditions meaningfully harder to read than the old code here, and I do expect that many reviewers would indeed find the old code much harder to read, so I’m inclined to leave this as I’ve rewritten it.

    dergoegge commented at 9:09 pm on March 14, 2022:
    iiuc the logic here is supposed to allow block/header downloads from inbound peers when the outbound peers don’t deliver but this seems to ignore the preference for outbound peers during the initial header sync. Should pto be an inbound peer on the first call to SendMessages then we would request initial header sync from pto since there are no blocks in flight at that time. However none of the outbound peers have done anything wrong (e.g. stall header download) that should make us prefer the inbound peer.

    sdaftuar commented at 11:58 am on March 15, 2022:

    If our first connection is an inbound peer, then our existing logic is likely to perform initial headers sync from that peer anyway (because we’ll sync from an inbound peer if we have no outbounds). So inherently there’s some risk if inbound and outbound connections are racing each other at startup that we might pick an inbound peer for initial sync.

    On the other hand, if we have multiple peers (inbound and outbound), then once we’ve begun initial headers sync with one peer (and nSyncStarted gets incremented above 0), then we won’t initiate headers sync based on this test anymore – instead the new test becomes whether our headers chain is close to caught up (see line 4626 below).


    dergoegge commented at 6:20 pm on March 17, 2022:

    I think I overestimated our preference for initial header sync from outbound peers and my concerns were to conservative considering the current syncing logic. My assumption was that we very strongly prefer outbound peers to avoid being fed an alternative chain but it seems that we already accept headers during IBD from any peer (inbound/outbound) whether we requested initial sync from them or not.

    Thank you for your detailed response!

  9. shaavan commented at 12:17 pm on January 27, 2022: contributor

    Concept ACK

    This PR, along with the previous condition, incorporates a new condition for selecting the peer for downloading block headers. This condition states that if there are no preferred Peers to download and no BlockinFlight, download the blocks from available peer even if that is an inbound peer. I agree with the logic of this change described in the PR’s description. I especially like the commenting of the code, which made it easier to reason with the code’s logic.

    I shall ACK this PR, as soon as this comment is addressed.

  10. jonatack commented at 2:18 pm on January 27, 2022: member
    Concept and review ACK 4fdd7fe3ea347efc6e6b16c9dd9e2ea9950e6936, modulo #24171 (review). The test fails as expected on master with AssertionError: Block sync timed out.
  11. sdaftuar force-pushed on Jan 27, 2022
  12. jonatack commented at 2:23 pm on January 27, 2022: member
    Perhaps have a look at test/functional/feature_minchainwork.py to see if some of it needs updating with this change or if the test could be added there.
  13. sdaftuar force-pushed on Jan 27, 2022
  14. sdaftuar commented at 2:32 pm on January 27, 2022: member

    @jonatack See also #24178, where I do change the behavior of nMinimumChainWork and update the test for it.

    I like the idea of having this very simple test from this PR in our repo, because I have stumbled upon this exact sequence of steps breaking in the past while writing tests, and I’d like that to never happen again! I do agree with you that there is likely room for further improving the feature_minchainwork.py test to incorporate this kind of logic too, but it seems like a bigger rewrite to me.


    Just wanted to add an additional meta-review note; I think we could bikeshed the test here or the code style indefinitely, but more important to me is that we are happy with the changed criteria in this PR. While I think my proposal here works overall, I do think there is plenty of room to criticize or defend the behavior I’m proposing, so I’d welcome feedback either way about the specifics of the change and the expected impact on the network.

    For instance, one line of thinking is that perhaps we should not prefer outbound peers over inbound peers during IBD at all, now that -maxuploadtarget is a setting that users may use to prevent their node from serving historical blocks if they don’t want to be doing that. I opted for a smaller behavior change in this patch, but I could see the argument for making a bigger one (and honestly that would simplify our reasoning about initial sync much more!).

  15. jonatack commented at 2:47 pm on January 27, 2022: member
    ACK 15c1687642631ca5bd8ddd6aa316776aee7e3ccd
  16. jamesob commented at 3:38 pm on January 27, 2022: member

    I’ve read the code on Github and I agree with the nature of the changes. The burden should always be on data providers to limit externally-requested bandwidth usage per their own thresholds (rather than relying on peers to be “considerate” a priori). On the other hand, unnecessary stalls (and certainly the prospect of network-wide stalls) in IBD are something worth avoiding. So this PR’s strategy of allowing requests to inbound peers for blocks when no blocks are otherwise in flight seems reasonable and even preferable given the possibility of stalling otherwise.

    I’ll review and test locally in a bit.

  17. ghost commented at 3:35 pm on February 10, 2022: none
    Concept ACK
  18. naumenkogs commented at 11:31 am on March 11, 2022: member

    ACK 15c1687642631ca5bd8ddd6aa316776aee7e3ccd

    I think it’s a good change even if we opt for something bigger in the future. nit: We could have a one printed statement here saying this chain is less trusted because it comes from inbound peers only.

  19. dergoegge commented at 9:22 pm on March 14, 2022: member

    Concept ACK

    Downloading blocks from inbound peers when the outbound peers don’t deliver sounds good to me.

    However mapBlocksInFlight.empty() might not be the best condition to determine if headers should be downloaded from an inbound peer. Just because there are no blocks in flight does not mean that the outbound peers won’t deliver headers.

  20. sdaftuar commented at 12:14 pm on March 15, 2022: member

    @dergoegge Thanks for the review and flagging the question about headers sync. I believe the logic I’m proposing here works ok because the test of whether we actually try for headers sync with an inbound peer isn’t only based on whether mapBlocksInFlight is empty, but also if nSyncStarted == 0 (our guard to try to sync headers from only one peer at a time). So in order for us to pick an inbound peer for headers sync, it would have to be the case that we somehow have not yet picked an existing outbound peer for headers sync.

    This could happen if (1) an outbound peer stalls headers sync and we disconnect them, or if (2) we happen to connect to an inbound peer before any outbound peers are established. I think case (1) is unlikely because if any peer sends us a block INV during the period that we’re waiting on headers, we’ll initiate headers sync with that peer (we send a getheaders in response to INV messages), and the window for headers sync to complete should be something like 20 minutes.

    Case (2) is a somewhat unavoidable issue, because when we start up we don’t know if we’ll have any outbound peers and so if we don’t try to sync from an inbound peer eventually, we may never sync the chain. (And this is how the logic works today already.)

  21. DrahtBot added the label Needs rebase on Apr 20, 2022
  22. Sammiesorrow27 approved
  23. Sync chain more readily from inbound peers during IBD
    When in IBD, if the honest chain is only known by inbound peers, then we must
    eventually sync from them in order to learn it. This change allows us to
    perform initial headers sync and fetch blocks from inbound peers, if we have no
    blocks in flight.
    
    The restriction on having no blocks in flight means that we will naturally
    throttle our block downloads to any such inbound peers that we may be
    downloading from, until we leave IBD. This is a tradeoff between preferring
    outbound peers for most of our block download, versus making sure we always
    eventually will get blocks we need that are only known by inbound peers even
    during IBD, as otherwise we may be stuck in IBD indefinitely (which could have
    cascading failure on the network, if a large fraction of the network managed to
    get stuck in IBD).
    0569b5c4bb
  24. Add functional test for block sync from inbound peers 48262a00f5
  25. sdaftuar force-pushed on May 17, 2022
  26. DrahtBot removed the label Needs rebase on May 17, 2022
  27. ajtowns commented at 11:44 pm on June 1, 2022: member

    ACK 48262a00f58489d705314ee3c31136133040bb0e

    Seems like a straightforward improvement, as demonstrated by the test case. Seems to me like there could be better conditions, but better to deploy a simple improvement now than debate forever about the best possible logic and deploy nothing.

  28. sipa commented at 6:54 pm on June 2, 2022: member
    ACK 48262a00f58489d705314ee3c31136133040bb0e
  29. laanwj merged this on Jun 2, 2022
  30. laanwj closed this on Jun 2, 2022

  31. sidhujag referenced this in commit 8c3c29b815 on Jun 3, 2022
  32. jonatack commented at 1:49 pm on June 6, 2022: member

    Post-merge re-ACK 48262a00f58489d705314ee3c31136133040bb0e

    Expanded the test to verify explicitly for myself that both of the sites using the sync_blocks_and_headers_from_peer logic are hit in p2p_block_sync.py, by checking that node1 sends both an initial getheaders and a getdata (blocks) message.

     0diff --git a/test/functional/p2p_block_sync.py b/test/functional/p2p_block_sync.py
     1index d821edc1b1..b76a02b295 100755
     2--- a/test/functional/p2p_block_sync.py
     3+++ b/test/functional/p2p_block_sync.py
     4@@ -28,8 +28,14 @@ class BlockSyncTest(BitcoinTestFramework):
     5 
     6     def run_test(self):
     7         self.log.info("Setup network: node0->node1->node2")
     8-        self.log.info("Mining one block on node0 and verify all nodes sync")
     9-        self.generate(self.nodes[0], 1)
    10+        self.log.info("Mine one block on node0, verify all nodes sync, and that node1 sends an initial getheaders and a getdata (blocks)")
    11+        with self.nodes[1].assert_debug_log(
    12+            expected_msgs=[
    13+                "[SendMessages] [net] initial getheaders (0) to peer=1 (startheight:0)",
    14+                "[SendMessages] [net] Requesting block ", " (1) peer=0",
    15+            ]
    16+        ):
    17+            self.generate(self.nodes[0], 1)
    18         self.log.info("Success!")
    
  33. DrahtBot locked this on Jun 21, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me