p2p: Advance pindexLastCommonBlock early after connecting blocks #32180

mzumsande wants to merge 4 commits into bitcoin:master from mzumsande:202403_ibd_lastcommonblock, changing 2 files (+20 −22)
  1. mzumsande commented at 10:40 pm on March 31, 2025: contributor

    As described in #32179, pindexLastCommonBlock is updated later than necessary in master. In the case of a linear chain with no forks, it can be moved forward at the beginning of FindNextBlocksToDownload, so that the updated value can be used to better estimate nWindowEnd (see the sketch at the end of this comment). This helps the node request all blocks from peers within the correct 1024-block window and avoids peers being incorrectly marked as stallers.

    The second commit removes a special case introduced in 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb, which is no longer necessary because the more general logic also covers this case.

    I also changed p2p_ibd_stalling.py to cover the situation after a stall has been resolved, making sure that no additional peers are incorrectly marked as stallers.

    Fixes #32179
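
    The change boils down to the following check near the top of FindNextBlocksToDownload (condensed from the diffs quoted in the review threads below):

        // If our tip has advanced beyond pindexLastCommonBlock and the tip is an
        // ancestor of the peer's best known block, jump pindexLastCommonBlock
        // straight to the tip, so that nWindowEnd (pindexLastCommonBlock height
        // + 1024) is computed from up-to-date information.
        if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight &&
            state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
            state->pindexLastCommonBlock = m_chainman.ActiveTip();
        }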

  2. DrahtBot commented at 10:40 pm on March 31, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32180.

    Reviews

    See the guideline for information on the review process.

    Type         Reviewers
    ACK          furszy, stringintech
    Concept ACK  yuvicc
    Stale ACK    vasild

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. mzumsande marked this as a draft on Mar 31, 2025
  4. DrahtBot added the label P2P on Mar 31, 2025
  5. mzumsande marked this as ready for review on Apr 2, 2025
  6. mzumsande force-pushed on Apr 7, 2025
  7. in src/net_processing.cpp:1408 in 205e651713 outdated
    1380+        state->pindexLastCommonBlock = m_chainman.ActiveTip();
    1381+    }
    1382+
    1383     // If the peer reorganized, our previous pindexLastCommonBlock may not be an ancestor
    1384     // of its current tip anymore. Go back enough to fix that.
    1385     state->pindexLastCommonBlock = LastCommonAncestor(state->pindexLastCommonBlock, state->pindexBestKnownBlock);
    


    stringintech commented at 9:59 am on April 10, 2025:

    Could there be an issue if both nodes reorganize to the same chain, but the last common block is still on another fork?

    If our new tip height is less than the outdated pindexLastCommonBlock height (like when reorging to a shorter chain with more work), we’d skip the optimization when we shouldn’t.

    Since LastCommonAncestor() brings pindexLastCommonBlock back onto the peer's best chain, maybe the new code should come after this call?

    Not sure if this edge case matters in practice, but thought I'd mention it.


    mzumsande commented at 7:45 pm on April 11, 2025:
    Good observation! I also think that this edge case shouldn’t matter much. It’s absolutely essential that pindexLastCommonBlock is an ancestor of pindexBestKnownBlock when calling FindNextBlocks, which is why I originally left that check last. But the added check can’t change that since it only sets pindexLastCommonBlock to the tip if that tip is an ancestor of pindexBestKnownBlock. So I’ve changed the order.
  8. stringintech commented at 10:02 am on April 10, 2025: contributor
    Concept ACK.
  9. yuvicc commented at 7:47 am on April 11, 2025: contributor

    Concept ACK

    Thanks for this PR.

    I have made some notes on this PR and will also add a testing guide soon.

  10. mzumsande force-pushed on Apr 11, 2025
  11. in src/net_processing.cpp:1416 in c69ee2d5b9 outdated
    1376@@ -1379,6 +1377,12 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
    1377     if (state->pindexLastCommonBlock == state->pindexBestKnownBlock)
    1378         return;
    1379 
    1380+    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
    1381+    // allows us to determine nWindowEnd better.
    1382+    if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
    1383+        state->pindexLastCommonBlock = m_chainman.ActiveTip();
    1384+    }
    


    vasild commented at 10:36 am on April 25, 2025:

    This change assumes their tip is higher than our tip. What if their tip is lower? Maybe worth advancing pindexLastCommonBlock in that case as well. Something like this:

     0// If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to
     1// download any blocks in between, and skipping ahead here allows us to determine nWindowEnd better.
     2const auto our_tip{m_chainman.ActiveTip()};
     3const auto peer_best_known{state->pindexBestKnownBlock};
     4if (our_tip->nHeight > state->pindexLastCommonBlock->nHeight) {
     5    if (peer_best_known->nHeight >= our_tip->nHeight) { 
     6        if (peer_best_known->GetAncestor(our_tip->nHeight) == our_tip) {
     7            state->pindexLastCommonBlock = our_tip;
     8        }     
     9    } else {                        
    10        if (our_tip->GetAncestor(peer_best_known->nHeight) == peer_best_known) {
    11            state->pindexLastCommonBlock = peer_best_known;
    12        }
    13        // FindNextBlocks() would be a noop below, so return here.
    14        return;
    15    }     
    16}
    

    mzumsande commented at 2:54 pm on April 25, 2025:
    Then we would have returned earlier because there are no blocks from this peer that would help us with our IBD: https://github.com/bitcoin/bitcoin/blob/80e6ad9e3023a57a4ef19b7d0edf9ac5be71a584/src/net_processing.cpp#L1384-L1387 (second condition)

    vasild commented at 9:10 am on April 28, 2025:
    Does it make sense to advance pindexLastCommonBlock in that case? I am just curious, it is definitely not the aim of this PR.

    mzumsande commented at 8:05 pm on April 28, 2025:
    I'd say probably not, because pindexLastCommonBlock isn't really used for anything but assigning blocks during IBD. If we will never need any blocks from this peer (until it sends us more headers, at which point pindexLastCommonBlock will get updated anyway; I'll also ignore weird invalidateblock scenarios here), I don't see what benefit advancing it would have.
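
    For context, the early return mzumsande links to above looks approximately like this (paraphrased, the exact form in net_processing.cpp may differ):

        if (state->pindexBestKnownBlock == nullptr ||
            state->pindexBestKnownBlock->nChainWork < m_chainman.ActiveChain().Tip()->nChainWork ||
            state->pindexBestKnownBlock->nChainWork < m_chainman.MinimumChainWork()) {
            // This peer has nothing interesting.
            return;
        }

    The second condition (the peer's best known chain has less work than our tip) is why a peer whose tip is below ours never reaches the advancement code in the first place.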
  12. in test/functional/p2p_ibd_stalling.py:70 in c69ee2d5b9 outdated
    66@@ -67,6 +67,7 @@ def run_test(self):
    67             height += 1
    68             block_dict[blocks[-1].sha256] = blocks[-1]
    69         stall_block = blocks[0].sha256
    70+        second_stall = blocks[500].sha256  # another block we don't provide immediately
    


    vasild commented at 11:57 am on April 25, 2025:

    nit: could use an array here:

    0stall_blocks = [blocks[0].sha256, blocks[500].sha256]
    

    which would make it possible to have the following code like:

    0- peers.append(node.add_outbound_p2p_connection(P2PStaller([stall_block, second_stall]), p2p_idx=id, connection_type="outbound-full-relay"))
    1+ peers.append(node.add_outbound_p2p_connection(P2PStaller(stall_blocks), p2p_idx=id, connection_type="outbound-full-relay"))
    2
    3- self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == 2)
    4+ self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == len(stall_blocks))
    

    mzumsande commented at 8:02 pm on April 28, 2025:
    done
  13. in test/functional/p2p_ibd_stalling.py:145 in c69ee2d5b9 outdated
    144+            self.all_sync_send_with_ping(peers)
    145 
    146-        self.log.info("Check that all outstanding blocks get connected")
    147-        self.wait_until(lambda: node.getblockcount() == NUM_BLOCKS)
    148+        self.log.info("Check that all outstanding blocks up to the second stall block get connected")
    149+        self.wait_until(lambda: node.getblockcount() == 500)
    


    vasild commented at 12:00 pm on April 25, 2025:
    nit: could use a variable second_stall_block_index = 500 or something like that to avoid repeating the number 500.

    mzumsande commented at 8:03 pm on April 28, 2025:
    changed to use one var for each index, plus one array.
  14. vasild approved
  15. vasild commented at 12:23 pm on April 25, 2025: contributor

    ACK c69ee2d5b93296d3008d6978182b2bc29bbeb457

    The code changes look reasonable. I tested that p2p_ibd_stalling.py fails without commit 1d8504c5e6 p2p: During block download, adjust pindexLastCommonBlock right away. It failed 7 times and passed 1 time. Is it expected to pass sometimes? I also checked that the code inside the modified ifs is executed during p2p_ibd_stalling.py.

  16. DrahtBot requested review from stringintech on Apr 25, 2025
  17. mzumsande force-pushed on Apr 28, 2025
  18. mzumsande commented at 8:02 pm on April 28, 2025: contributor

    c69ee2d to b7ff6a6: addressed feedback

    Is it expected to pass sometimes?

    It wasn’t - but I have tracked this down now. It’s a bit involved, I hope the following explanation makes sense:

    The initial situation is that there are 2 remaining connected peers, the node has just synced blocks up to height 500, but pindexLastCommonBlock is still at 0. The second stall block at height 501 is already requested from one peer, let’s say peer0.

    If SendMessages is invoked for peer0, it will request no blocks, update pindexLastCommonBlock to 500, and won't mark anyone as a staller, because the block is already requested from that same peer. If peer1 were processed after that, peer0 would be marked as a staller. But if peer0 is processed twice in a row (which can happen due to the randomization in ThreadMessageHandler), we could request block 1025 from it. If peer1 is processed then (for which pindexLastCommonBlock is still at 0), all 1024+1 blocks in its window will already have been requested, so we never get to this point because we always continue here. I think this could be fixed by exchanging the order of this and this code block. I'm not sure if this is something that needs fixing, though, because the current patch should avoid this problem too…
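
    For the window arithmetic behind this scenario, a sketch using the 1024-block window from the PR description (WindowEnd is a hypothetical helper, not the real net_processing code):

        // How far past pindexLastCommonBlock blocks may be requested from a peer.
        constexpr int BLOCK_DOWNLOAD_WINDOW{1024};

        // nWindowEnd is derived from pindexLastCommonBlock, so a stale value shrinks
        // the range of heights that can be assigned to a peer.
        int WindowEnd(int last_common_height) { return last_common_height + BLOCK_DOWNLOAD_WINDOW; }

        // With the node synced to height 500:
        //   peer0, pindexLastCommonBlock advanced to 500: WindowEnd(500) == 1524, so block 1025 is requestable
        //   peer1, pindexLastCommonBlock still at 0:      WindowEnd(0)   == 1024, everything in that range is already in flight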

  19. DrahtBot added the label CI failed on Apr 29, 2025
  20. vasild approved
  21. vasild commented at 9:27 am on April 29, 2025: contributor

    ACK b7ff6a611a220e9380f6cd6428f1d3483c8d566f

    But if peer0 is processed twice in a row (which can happen due to the randomization in ThreadMessageHandler), we could request block 1025 from it

    I confirm this, made it reproducible:

    • git show 855b1fc2b9 |git apply -R
    • observe p2p_ibd_stalling.py is failing (mostly).
    • apply the patch below to mess with the order in which nodes are processed
    • observe p2p_ibd_stalling.py succeeding
     0--- i/src/net.cpp
     1+++ w/src/net.cpp
     2@@ -3029,12 +3029,14 @@ void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFai
     3         pnode->ConnectedThroughNetwork(),
     4         GetNodeCount(ConnectionDirection::Out));
     5 }
     6 
     7 Mutex NetEventsInterface::g_msgproc_mutex;
     8 
     9+static size_t seq{0};
    10+
    11 void CConnman::ThreadMessageHandler()
    12 {
    13     LOCK(NetEventsInterface::g_msgproc_mutex);
    14 
    15     while (!flagInterruptMsgProc)
    16     {
    17@@ -3043,13 +3045,22 @@ void CConnman::ThreadMessageHandler()
    18         {
    19             // Randomize the order in which we process messages from/to our peers.
    20             // This prevents attacks in which an attacker exploits having multiple
    21             // consecutive connections in the m_nodes list.
    22             const NodesSnapshot snap{*this, /*shuffle=*/true};
    23 
    24-            for (CNode* pnode : snap.Nodes()) {
    25+            auto nodes = snap.Nodes();
    26+            std::ranges::sort(nodes, [](CNode* a, CNode* b) {
    27+                return seq % 3 == 0 ? a->GetId() > b->GetId() : a->GetId() < b->GetId();
    28+            });
    29+
    30+            ++seq;
    31+
    32+            LogInfo("===== begin");
    33+            for (CNode* pnode : nodes) {
    34+                LogInfo("===== node id: %d, disconnect: %d", pnode->GetId(), pnode->fDisconnect);
    35                 if (pnode->fDisconnect)
    36                     continue;
    37 
    38                 // Receive messages
    39                 bool fMoreNodeWork = m_msgproc->ProcessMessages(pnode, flagInterruptMsgProc);
    40                 fMoreWork |= (fMoreNodeWork && !pnode->fPauseSend);
    41@@ -3058,12 +3069,13 @@ void CConnman::ThreadMessageHandler()
    42                 // Send messages
    43                 m_msgproc->SendMessages(pnode);
    44 
    45                 if (flagInterruptMsgProc)
    46                     return;
    47             }
    48+            LogInfo("===== end");
    49         }
    50 
    51         WAIT_LOCK(mutexMsgProc, lock);
    52         if (!fMoreWork) {
    53             condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this]() EXCLUSIVE_LOCKS_REQUIRED(mutexMsgProc) { return fMsgProcWake; });
    54         }
    
  22. DrahtBot removed the label CI failed on Apr 29, 2025
  23. DrahtBot added the label Needs rebase on Jul 18, 2025
  24. p2p: During block download, adjust pindexLastCommonBlock right away
    This allows us to calculate nWindowEnd better.
    Before, it would be calculated based on an old pindexLastCommonBlock,
    which could result in blocks not being requested for download that could
    have been requested, and peers being wrongly marked as stallers.
    0a757acf66
  25. p2p: remove extra logic for assumeutxo
    This reverts commit 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb
    which introduced extra logic for when a snapshot was loaded.
    With the previous commit, this is no longer necessary
    because the more general logic of advancing pindexLastCommonBlock
    also covers this case.
    e665560822
  26. test: remove magic number when checking for blocks that have arrived
    getpeerinfo provides a list of blocks that are inflight, which can be used
    instead.
    f57e1442a9
  27. test: improve coverage for a resolved stalling situation
    This test (which would fail without the previous commit) checks
    that after the stalling block was received, we don't incorrectly
    mark another peer as a staller immediately.
    3bdff9a154
  28. mzumsande force-pushed on Jul 21, 2025
  29. mzumsande commented at 3:27 pm on July 21, 2025: contributor
    b7ff6a6 to 3bdff9a: rebased due to conflict with #32868
  30. DrahtBot removed the label Needs rebase on Jul 21, 2025
  31. DrahtBot added the label CI failed on Jul 21, 2025
  32. DrahtBot removed the label CI failed on Jul 22, 2025
  33. in src/net_processing.cpp:1416 in 0a757acf66 outdated
    1410@@ -1411,6 +1411,12 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
    1411     if (state->pindexLastCommonBlock == state->pindexBestKnownBlock)
    1412         return;
    1413 
    1414+    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
    1415+    // allows us to determine nWindowEnd better.
    1416+    if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
    


    furszy commented at 8:01 pm on July 23, 2025:

    nano non-blocking nits: Could extract tip = ActiveTip() and use it everywhere here to make it slightly more readable.

    Also, this isn’t introduced by this PR, but the second check has always puzzled me a bit. It would be nice to have some util function like IsBlockInChain(const CBlockIndex* block_to_test, const CBlockIndex* chain_tip).
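
    A sketch of what such a helper could look like (hypothetical, no such function exists in the codebase today):

        // Returns true if `block` lies on the chain ending at `chain_tip`.
        static bool IsBlockInChain(const CBlockIndex* block, const CBlockIndex* chain_tip)
        {
            if (block == nullptr || chain_tip == nullptr) return false;
            if (block->nHeight > chain_tip->nHeight) return false;
            return chain_tip->GetAncestor(block->nHeight) == block;
        }

    With that, the second check in the diff above could be written as IsBlockInChain(m_chainman.ActiveTip(), state->pindexBestKnownBlock).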

  34. furszy commented at 8:07 pm on July 23, 2025: member
    ACK 3bdff9a154733f8f9f379448f5595a2e90474bc6
  35. DrahtBot requested review from vasild on Jul 23, 2025
  36. stringintech commented at 10:21 am on July 30, 2025: contributor

    ACK 3bdff9a

    I think it would be nice to include how the newly added logic would also cover the assumeutxo case in the commit description; if my understanding is correct, adding something like:

    After snapshot loading, our tip becomes the snapshot block. Since peers are verified to have the snapshot in their chain, our tip is an ancestor of the peer’s best block, hence the general advancement logic will move pindexLastCommonBlock from any pre-snapshot position to the snapshot height automatically.
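
    To spell out how the condition from the diff covers that case (an annotated restatement, assuming the snapshot block is at height S and pindexLastCommonBlock is below it):

        // After the snapshot is loaded, m_chainman.ActiveTip() is the snapshot block at height S,
        // and peers used for block download have been verified to have it in their chain, so:
        //   m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight
        //       -> true for any pre-snapshot pindexLastCommonBlock
        //   state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()
        //       -> true, because the snapshot block is an ancestor of the peer's best known block
        // Both checks pass, so pindexLastCommonBlock jumps directly to the snapshot block,
        // which is what the removed special-case code used to do explicitly.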

  37. mzumsande commented at 7:44 pm on August 4, 2025: contributor
    Thanks! I’ll address the nits if / when I have to push again.
