p2p: Advance pindexLastCommonBlock early after connecting blocks #32180

mzumsande wants to merge 4 commits into bitcoin:master from mzumsande:202403_ibd_lastcommonblock changing 2 files +29 −30
  1. mzumsande commented at 10:40 pm on March 31, 2025: contributor

    As described in #32179, pindexLastCommonBlock is updated later than necessary on master. In the case of a linear chain with no forks, it can be moved forward at the beginning of FindNextBlocksToDownload, so that the updated value can be used to estimate nWindowEnd more accurately. This helps the node request all blocks from peers within the correct 1024-block window and avoids peers being incorrectly marked as stallers.

    The second commit removes a special case introduced in 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb, which is no longer necessary because the more general logic also covers this case.

    I also changed p2p_ibd_stalling.py to cover the situation after a stall has been resolved, making sure that no additional peers are incorrectly marked as stallers.

    Fixes #32179
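The window arithmetic behind this fix can be sketched with a toy model (hypothetical Python, not Bitcoin Core code): the download window ends BLOCK_DOWNLOAD_WINDOW blocks past pindexLastCommonBlock, so a stale value artificially truncates the range of heights that may be requested, even when our tip has long moved past it.

```python
# Toy model of the 1024-block download window (not Bitcoin Core code).
BLOCK_DOWNLOAD_WINDOW = 1024  # same constant name as in net_processing.cpp


def window_end(last_common_height: int) -> int:
    """Highest block height that may currently be requested from peers."""
    return last_common_height + BLOCK_DOWNLOAD_WINDOW


# Our tip is already at height 500, but pindexLastCommonBlock was left at 0:
print(window_end(0))    # 1024: nothing past height 1024 can be requested yet
print(window_end(500))  # 1524: after advancing, the full window past our tip is usable
```

With the stale value, a peer holding only blocks above height 1024 has nothing assignable inside the window, which is the situation that led to peers being wrongly treated as stallers.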

  2. DrahtBot commented at 10:40 pm on March 31, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32180.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK yuvicc
    Stale ACK furszy, stringintech, achow101, vasild

    If your review is incorrectly listed, please react with πŸ‘Ž to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. mzumsande marked this as a draft on Mar 31, 2025
  4. DrahtBot added the label P2P on Mar 31, 2025
  5. mzumsande marked this as ready for review on Apr 2, 2025
  6. mzumsande force-pushed on Apr 7, 2025
  7. in src/net_processing.cpp:1408 in 205e651713 outdated
    1380+        state->pindexLastCommonBlock = m_chainman.ActiveTip();
    1381+    }
    1382+
    1383     // If the peer reorganized, our previous pindexLastCommonBlock may not be an ancestor
    1384     // of its current tip anymore. Go back enough to fix that.
    1385     state->pindexLastCommonBlock = LastCommonAncestor(state->pindexLastCommonBlock, state->pindexBestKnownBlock);
    


    stringintech commented at 9:59 am on April 10, 2025:

    Could there be an issue if both nodes reorganize to the same chain, but the last common block is still on another fork?

    If our new tip height is less than the outdated pindexLastCommonBlock height (like when reorging to a shorter chain with more work), we’d skip the optimization when we shouldn’t.

    Since LastCommonAncestor() brings pindexLastCommonBlock back onto the peer’s best chain, maybe the new code should come after this call?

    Not sure if this edge case matters in practice, but thought I’d mention it.


    mzumsande commented at 7:45 pm on April 11, 2025:
    Good observation! I also think that this edge case shouldn’t matter much. It’s absolutely essential that pindexLastCommonBlock is an ancestor of pindexBestKnownBlock when calling FindNextBlocks, which is why I originally left that check last. But the added check can’t change that since it only sets pindexLastCommonBlock to the tip if that tip is an ancestor of pindexBestKnownBlock. So I’ve changed the order.
  8. stringintech commented at 10:02 am on April 10, 2025: contributor
    Concept ACK.
  9. yuvicc commented at 7:47 am on April 11, 2025: contributor

    Concept ACK

    Thanks for this PR.

    I have made some notes on this PR and will also add a testing guide soon.

  10. mzumsande force-pushed on Apr 11, 2025
  11. in src/net_processing.cpp:1416 in c69ee2d5b9 outdated
    1376@@ -1379,6 +1377,12 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
    1377     if (state->pindexLastCommonBlock == state->pindexBestKnownBlock)
    1378         return;
    1379 
    1380+    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
    1381+    // allows us to determine nWindowEnd better.
    1382+    if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
    1383+        state->pindexLastCommonBlock = m_chainman.ActiveTip();
    1384+    }
    


    vasild commented at 10:36 am on April 25, 2025:

    This change assumes their tip is higher than our tip. What if their tip is lower? Maybe worth advancing pindexLastCommonBlock in that case as well. Something like this:

    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to
    // download any blocks in between, and skipping ahead here allows us to determine nWindowEnd better.
    const auto our_tip{m_chainman.ActiveTip()};
    const auto peer_best_known{state->pindexBestKnownBlock};
    if (our_tip->nHeight > state->pindexLastCommonBlock->nHeight) {
        if (peer_best_known->nHeight >= our_tip->nHeight) {
            if (peer_best_known->GetAncestor(our_tip->nHeight) == our_tip) {
                state->pindexLastCommonBlock = our_tip;
            }
        } else {
            if (our_tip->GetAncestor(peer_best_known->nHeight) == peer_best_known) {
                state->pindexLastCommonBlock = peer_best_known;
            }
            // FindNextBlocks() would be a noop below, so return here.
            return;
        }
    }
    

    mzumsande commented at 2:54 pm on April 25, 2025:
    Then we would have returned earlier because there are no blocks from this peer that would help us with our IBD: https://github.com/bitcoin/bitcoin/blob/80e6ad9e3023a57a4ef19b7d0edf9ac5be71a584/src/net_processing.cpp#L1384-L1387 (second condition)

    vasild commented at 9:10 am on April 28, 2025:
    Does it make sense to advance pindexLastCommonBlock in that case? I am just curious, it is definitely not the aim of this PR.

    mzumsande commented at 8:05 pm on April 28, 2025:
    I’d say probably not, because pindexLastCommonBlock isn’t really used for anything but assigning blocks during IBD. If we will never need any blocks from this peer (until it sends us more headers, at which point pindexLastCommonBlock will get updated anyway; I’ll also ignore weird invalidateblock scenarios here), I don’t see what benefit advancing it would have.
  12. in test/functional/p2p_ibd_stalling.py:70 in c69ee2d5b9 outdated
    66@@ -67,6 +67,7 @@ def run_test(self):
    67             height += 1
    68             block_dict[blocks[-1].sha256] = blocks[-1]
    69         stall_block = blocks[0].sha256
    70+        second_stall = blocks[500].sha256  # another block we don't provide immediately
    


    vasild commented at 11:57 am on April 25, 2025:

    nit: could use an array here:

    stall_blocks = [blocks[0].sha256, blocks[500].sha256]
    

    which would make it possible to have the following code like:

    - peers.append(node.add_outbound_p2p_connection(P2PStaller([stall_block, second_stall]), p2p_idx=id, connection_type="outbound-full-relay"))
    + peers.append(node.add_outbound_p2p_connection(P2PStaller(stall_blocks), p2p_idx=id, connection_type="outbound-full-relay"))

    - self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == 2)
    + self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == len(stall_blocks))
    

    mzumsande commented at 8:02 pm on April 28, 2025:
    done
  13. in test/functional/p2p_ibd_stalling.py:145 in c69ee2d5b9 outdated
    144+            self.all_sync_send_with_ping(peers)
    145 
    146-        self.log.info("Check that all outstanding blocks get connected")
    147-        self.wait_until(lambda: node.getblockcount() == NUM_BLOCKS)
    148+        self.log.info("Check that all outstanding blocks up to the second stall block get connected")
    149+        self.wait_until(lambda: node.getblockcount() == 500)
    


    vasild commented at 12:00 pm on April 25, 2025:
    nit: could use a variable second_stall_block_index = 500 or something like that to avoid repeating the number 500.

    mzumsande commented at 8:03 pm on April 28, 2025:
    changed to use one var for each index, plus one array.

    vasild commented at 12:51 pm on September 12, 2025:

    Looks like, according to git range-diff b7ff6a611a...3bdff9a154, this and the suggestion above were reverted in the latest push? If that was not intentional, then maybe amend this change in the latest commit:

    --- i/test/functional/p2p_ibd_stalling.py
    +++ w/test/functional/p2p_ibd_stalling.py
    @@ -63,29 +63,30 @@ class P2PIBDStallingTest(BitcoinTestFramework):
                 blocks.append(create_block(tip, create_coinbase(height), block_time))
                 blocks[-1].solve()
                 tip = blocks[-1].hash_int
                 block_time += 1
                 height += 1
                 block_dict[blocks[-1].hash_int] = blocks[-1]
    -        stall_block = blocks[0].hash_int
    -        second_stall = blocks[500].hash_int  # another block we don't provide immediately
    +        stall_index = 0
    +        second_stall_index = 500
    +        stall_blocks = [blocks[stall_index].hash_int, blocks[second_stall_index].hash_int]

             headers_message = msg_headers()
             headers_message.headers = [CBlockHeader(b) for b in blocks[:NUM_BLOCKS-1]]
             peers = []

             self.log.info("Check that a staller does not get disconnected if the 1024 block lookahead buffer is filled")
             self.mocktime = int(time.time()) + 1
             node.setmocktime(self.mocktime)
             for id in range(NUM_PEERS):
    -            peers.append(node.add_outbound_p2p_connection(P2PStaller([stall_block, second_stall]), p2p_idx=id, connection_type="outbound-full-relay"))
    +            peers.append(node.add_outbound_p2p_connection(P2PStaller(stall_blocks), p2p_idx=id, connection_type="outbound-full-relay"))
                 peers[-1].block_store = block_dict
                 peers[-1].send_and_ping(headers_message)

             # Wait until all blocks are received (except for the stall blocks), so that no other blocks are in flight.
    -        self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == 2)
    +        self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == len(stall_blocks))

             self.all_sync_send_with_ping(peers)
             # If there was a peer marked for stalling, it would get disconnected
             self.mocktime += 3
             node.setmocktime(self.mocktime)
             self.all_sync_send_with_ping(peers)
    @@ -100,13 +101,13 @@ class P2PIBDStallingTest(BitcoinTestFramework):

             self.log.info("Check that the stalling peer is disconnected after 2 seconds")
             self.mocktime += 3
             node.setmocktime(self.mocktime)
             peers[0].wait_for_disconnect()
             assert_equal(node.num_test_p2p_connections(), NUM_PEERS - 1)
    -        self.wait_until(lambda: self.is_block_requested(peers, stall_block))
    +        self.wait_until(lambda: self.is_block_requested(peers, stall_blocks[0]))
             # Make sure that SendMessages() is invoked, which assigns the missing block
             # to another peer and starts the stalling logic for them
             self.all_sync_send_with_ping(peers)

             self.log.info("Check that the stalling timeout gets doubled to 4 seconds for the next staller")
             # No disconnect after just 3 seconds
    @@ -115,37 +116,37 @@ class P2PIBDStallingTest(BitcoinTestFramework):
             self.all_sync_send_with_ping(peers)
             assert_equal(node.num_test_p2p_connections(), NUM_PEERS - 1)

             self.mocktime += 2
             node.setmocktime(self.mocktime)
             self.wait_until(lambda: sum(x.is_connected for x in node.p2ps) == NUM_PEERS - 2)
    -        self.wait_until(lambda: self.is_block_requested(peers, stall_block))
    +        self.wait_until(lambda: self.is_block_requested(peers, stall_blocks[0]))
             self.all_sync_send_with_ping(peers)

             self.log.info("Check that the stalling timeout gets doubled to 8 seconds for the next staller")
             # No disconnect after just 7 seconds
             self.mocktime += 7
             node.setmocktime(self.mocktime)
             self.all_sync_send_with_ping(peers)
             assert_equal(node.num_test_p2p_connections(), NUM_PEERS - 2)

             self.mocktime += 2
             node.setmocktime(self.mocktime)
             self.wait_until(lambda: sum(x.is_connected for x in node.p2ps) == NUM_PEERS - 3)
    -        self.wait_until(lambda: self.is_block_requested(peers, stall_block))
    +        self.wait_until(lambda: self.is_block_requested(peers, stall_blocks[0]))
             self.all_sync_send_with_ping(peers)

             self.log.info("Provide the first withheld block and check that stalling timeout gets reduced back to 2 seconds")
             with node.assert_debug_log(expected_msgs=['Decreased stalling timeout to 2 seconds'], unexpected_msgs=['Stall started']):
                 for p in peers:
    -                if p.is_connected and (stall_block in p.getdata_requests):
    -                    p.send_without_ping(msg_block(block_dict[stall_block]))
    +                if p.is_connected and (stall_blocks[0] in p.getdata_requests):
    +                    p.send_without_ping(msg_block(block_dict[stall_blocks[0]]))
                 self.all_sync_send_with_ping(peers)

             self.log.info("Check that all outstanding blocks up to the second stall block get connected")
    -        self.wait_until(lambda: node.getblockcount() == 500)
    +        self.wait_until(lambda: node.getblockcount() == second_stall_index)


         def all_sync_send_with_ping(self, peers):
             for p in peers:
                 if p.is_connected:
                     p.sync_with_ping()
    

    mzumsande commented at 10:04 pm on September 15, 2025:
    That was not on purpose - sorry. Fixed now.
  14. vasild approved
  15. vasild commented at 12:23 pm on April 25, 2025: contributor

    ACK c69ee2d5b93296d3008d6978182b2bc29bbeb457

    The code changes look reasonable. I tested that p2p_ibd_stalling.py fails without commit 1d8504c5e6 p2p: During block download, adjust pindexLastCommonBlock right away. It failed 7 times and passed 1 time. Is it expected to pass sometimes? I also checked that the code inside the modified ifs is executed during p2p_ibd_stalling.py.

  16. DrahtBot requested review from stringintech on Apr 25, 2025
  17. mzumsande force-pushed on Apr 28, 2025
  18. mzumsande commented at 8:02 pm on April 28, 2025: contributor

    c69ee2d to b7ff6a6: addressed feedback

    Is it expected to pass sometimes?

    It wasn’t - but I have tracked this down now. It’s a bit involved, I hope the following explanation makes sense:

    The initial situation is that there are 2 remaining connected peers, the node has just synced blocks up to height 500, but pindexLastCommonBlock is still at 0. The second stall block at height 501 is already requested from one peer, let’s say peer0.

    If SendMessages is invoked for peer0, it will request no blocks, update pindexLastCommonBlock to 500, and won’t mark anyone as a staller, because the block is already requested from that same peer. If peer1 were processed after that, peer0 would be marked as a staller. But if peer0 is processed twice in a row (which can happen due to the randomization in ThreadMessageHandler), we could request block 1025 from it. If peer1 is processed after that (with its pindexLastCommonBlock still at 0), all 1024+1 blocks in its window will already have been requested, so we never get to this point because we always continue here. I think this could be fixed by exchanging the order of this and this code block. I’m not sure this needs fixing though, because the current patch should avoid the problem too…

  19. DrahtBot added the label CI failed on Apr 29, 2025
  20. vasild approved
  21. vasild commented at 9:27 am on April 29, 2025: contributor

    ACK b7ff6a611a220e9380f6cd6428f1d3483c8d566f

    But if peer0 is processed twice in a row (which can happen due to the randomization in ThreadMessageHandler), we could request block 1025 from it

    I confirm this, made it reproducible:

    • git show 855b1fc2b9 | git apply -R
    • observe p2p_ibd_stalling.py failing (mostly)
    • apply the patch below to mess with the order in which nodes are processed
    • observe p2p_ibd_stalling.py succeeding
    --- i/src/net.cpp
    +++ w/src/net.cpp
    @@ -3029,12 +3029,14 @@ void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFai
             pnode->ConnectedThroughNetwork(),
             GetNodeCount(ConnectionDirection::Out));
     }

     Mutex NetEventsInterface::g_msgproc_mutex;

    +static size_t seq{0};
    +
     void CConnman::ThreadMessageHandler()
     {
         LOCK(NetEventsInterface::g_msgproc_mutex);

         while (!flagInterruptMsgProc)
         {
    @@ -3043,13 +3045,22 @@
             {
                 // Randomize the order in which we process messages from/to our peers.
                 // This prevents attacks in which an attacker exploits having multiple
                 // consecutive connections in the m_nodes list.
                 const NodesSnapshot snap{*this, /*shuffle=*/true};

    -            for (CNode* pnode : snap.Nodes()) {
    +            auto nodes = snap.Nodes();
    +            std::ranges::sort(nodes, [](CNode* a, CNode* b) {
    +                return seq % 3 == 0 ? a->GetId() > b->GetId() : a->GetId() < b->GetId();
    +            });
    +
    +            ++seq;
    +
    +            LogInfo("===== begin");
    +            for (CNode* pnode : nodes) {
    +                LogInfo("===== node id: %d, disconnect: %d", pnode->GetId(), pnode->fDisconnect);
                     if (pnode->fDisconnect)
                         continue;

                     // Receive messages
                     bool fMoreNodeWork = m_msgproc->ProcessMessages(pnode, flagInterruptMsgProc);
                     fMoreWork |= (fMoreNodeWork && !pnode->fPauseSend);
    @@ -3058,12 +3069,13 @@
                     // Send messages
                     m_msgproc->SendMessages(pnode);

                     if (flagInterruptMsgProc)
                         return;
                 }
    +            LogInfo("===== end");

             }

             WAIT_LOCK(mutexMsgProc, lock);
             if (!fMoreWork) {
                 condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this]() EXCLUSIVE_LOCKS_REQUIRED(mutexMsgProc) { return fMsgProcWake; });
             }
    
  22. DrahtBot removed the label CI failed on Apr 29, 2025
  23. DrahtBot added the label Needs rebase on Jul 18, 2025
  24. p2p: During block download, adjust pindexLastCommonBlock right away
    This allows us to calculate nWindowEnd better.
    Before, it would be calculated based on an old pindexLastCommonBlock,
    which could result in blocks not being requested for download even though
    they could have been, and in peers being wrongly marked as stallers.
    0a757acf66
  25. mzumsande force-pushed on Jul 21, 2025
  26. mzumsande commented at 3:27 pm on July 21, 2025: contributor
    b7ff6a6 to 3bdff9a: rebased due to conflict with #32868
  27. DrahtBot removed the label Needs rebase on Jul 21, 2025
  28. DrahtBot added the label CI failed on Jul 21, 2025
  29. DrahtBot removed the label CI failed on Jul 22, 2025
  30. in src/net_processing.cpp:1416 in 0a757acf66 outdated
    1410@@ -1411,6 +1411,12 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
    1411     if (state->pindexLastCommonBlock == state->pindexBestKnownBlock)
    1412         return;
    1413 
    1414+    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
    1415+    // allows us to determine nWindowEnd better.
    1416+    if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
    


    furszy commented at 8:01 pm on July 23, 2025:

    nano non-blocking nits: Could extract tip = ActiveTip() and use it everywhere here to make it slightly more readable.

    Also, this isn’t introduced by this PR, but the second check has always puzzled me a bit. It would be nice to have some util function like IsBlockInChain(const CBlockIndex* block_to_test, const CBlockIndex* chain_tip).


    mzumsande commented at 10:13 pm on September 15, 2025:

    About the first, I didn’t have a strong opinion. Seems like if we do it for ActiveTip(), we should also do it for ActiveHeight().

    Also, this isn’t introduced by this PR, but the second check has always puzzled me a bit. It would be nice to have some util function like IsBlockInChain(const CBlockIndex* block_to_test, const CBlockIndex* chain_tip).

    I’d say that a name like CChain::IsAncestorOf(const CBlockIndex* possible_ancestor, const CBlockIndex* predecessor) would be more precise, since it’s doing more than just checking whether a block is in the chain. I think I’d be concept ACK to that - how about a separate PR, changing it in the various places of the codebase where the GetAncestor / height check is used?

  31. furszy commented at 8:07 pm on July 23, 2025: member
    ACK 3bdff9a154733f8f9f379448f5595a2e90474bc6
  32. DrahtBot requested review from vasild on Jul 23, 2025
  33. stringintech commented at 10:21 am on July 30, 2025: contributor

    ACK 3bdff9a

    I think it would be nice to include how the newly added logic would also cover the assumeutxo case in the commit description; if my understanding is correct, adding something like:

    After snapshot loading, our tip becomes the snapshot block. Since peers are verified to have the snapshot in their chain, our tip is an ancestor of the peer’s best block, hence the general advancement logic will move pindexLastCommonBlock from any pre-snapshot position to the snapshot height automatically.

  34. mzumsande commented at 7:44 pm on August 4, 2025: contributor
    Thanks! I’ll address the nits if / when I have to push again.
  35. achow101 commented at 10:09 pm on September 11, 2025: member
    ACK 3bdff9a154733f8f9f379448f5595a2e90474bc6
  36. vasild commented at 12:56 pm on September 12, 2025: contributor
    ACK 3bdff9a154733f8f9f379448f5595a2e90474bc6 but see #32180 (review), non-blocking IMO, would be nice to address
  37. p2p: remove extra logic for assumeutxo
    This reverts commit 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb
    which introduced extra logic for when a snapshot was loaded.
    With the previous commit, this is no longer necessary:
    
    After snapshot loading, our tip becomes the snapshot block.
    For peers that have the most-work chain, which includes the snapshot,
    our tip is an ancestor of the peer's best block, hence the general
    advancement logic will move pindexLastCommonBlock
    from any pre-snapshot position to the snapshot height automatically.
    b561aa3646
  38. test: remove magic number when checking for blocks that have arrived
    getpeerinfo provides a list of blocks that are inflight, which can be used
    instead.
    e35c3d1545
  39. test: improve coverage for a resolved stalling situation
    This test (which would fail without the previous commit) checks
    that after the stalling block was received, we don't incorrectly
    mark another peer as a staller immediately.
    b2c6cb7f02
  40. mzumsande force-pushed on Sep 15, 2025
  41. mzumsande commented at 10:06 pm on September 15, 2025: contributor

    3bdff9a to b2c6cb7:


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-17 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me