p2p: Advance pindexLastCommonBlock early after connecting blocks #32180

pull mzumsande wants to merge 4 commits into bitcoin:master from mzumsande:202403_ibd_lastcommonblock changing 2 files +29 −30
  1. mzumsande commented at 10:40 pm on March 31, 2025: contributor

    As described in #32179, pindexLastCommonBlock is updated later than necessary in master. In case of a linear chain with no forks, it can be moved forward at the beginning of FindNextBlocksToDownload, so that the updated value can be used to better estimate nWindowEnd. This helps the node to request all blocks from peers within the correct 1024-block-window and avoids peers being incorrectly marked as stallers.

    The second commit removes a special case introduced in 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb which is not necessary, because the more general logic also covers this case.

    I also changed p2p_ibd_stalling.py to cover the situation after a resolved situation, making sure that no additional peers are marked for stalling.

    Fixes #32179

  2. DrahtBot commented at 10:40 pm on March 31, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32180.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK vasild
    Concept ACK stringintech, i-am-yuvi

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. mzumsande marked this as a draft on Mar 31, 2025
  4. DrahtBot added the label P2P on Mar 31, 2025
  5. mzumsande marked this as ready for review on Apr 2, 2025
  6. mzumsande force-pushed on Apr 7, 2025
  7. in src/net_processing.cpp:1376 in 205e651713 outdated
    1380+        state->pindexLastCommonBlock = m_chainman.ActiveTip();
    1381+    }
    1382+
    1383     // If the peer reorganized, our previous pindexLastCommonBlock may not be an ancestor
    1384     // of its current tip anymore. Go back enough to fix that.
    1385     state->pindexLastCommonBlock = LastCommonAncestor(state->pindexLastCommonBlock, state->pindexBestKnownBlock);
    


    stringintech commented at 9:59 am on April 10, 2025:

    Could there be an issue if both nodes reorganize to the same chain, but the last common block is still on another fork?

    If our new tip height is less than the outdated pindexLastCommonBlock height (like when reorging to a shorter chain with more work), we’d skip the optimization when we shouldn’t.

    Since LastCommonAncestor() brings back the pindexLastCommonBlock on the peer’s best chain, maybe the new code should come after this call?

    Not sure if this edge case matters in practice, but thought I’d mention it.​​​​​​​​​​​​​​​​


    mzumsande commented at 7:45 pm on April 11, 2025:
    Good observation! I also think that this edge case shouldn’t matter much. It’s absolutely essential that pindexLastCommonBlock is an ancestor of pindexBestKnownBlock when calling FindNextBlocks, which is why I originally left that check last. But the added check can’t change that since it only sets pindexLastCommonBlock to the tip if that tip is an ancestor of pindexBestKnownBlock. So I’ve changed the order.
  8. stringintech commented at 10:02 am on April 10, 2025: none
    Concept ACK.
  9. i-am-yuvi commented at 7:47 am on April 11, 2025: contributor

    Concept ACK

    Thanks for this PR.

    I have made some notes regarding this PR and will add some guide for testing as well very soon.

  10. mzumsande force-pushed on Apr 11, 2025
  11. in src/net_processing.cpp:1384 in c69ee2d5b9 outdated
    1376@@ -1379,6 +1377,12 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
    1377     if (state->pindexLastCommonBlock == state->pindexBestKnownBlock)
    1378         return;
    1379 
    1380+    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
    1381+    // allows us to determine nWindowEnd better.
    1382+    if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
    1383+        state->pindexLastCommonBlock = m_chainman.ActiveTip();
    1384+    }
    


    vasild commented at 10:36 am on April 25, 2025:

    This change assumes their tip is higher than our tip. What if their tip is lower? Maybe worth advancing pindexLastCommonBlock in that case as well. Something like this:

     0// If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to
     1// download any blocks in between, and skipping ahead here allows us to determine nWindowEnd better.
     2const auto our_tip{m_chainman.ActiveTip()};
     3const auto peer_best_known{state->pindexBestKnownBlock};
     4if (our_tip->nHeight > state->pindexLastCommonBlock->nHeight) {
     5    if (peer_best_known->nHeight >= our_tip->nHeight) { 
     6        if (peer_best_known->GetAncestor(our_tip->nHeight) == our_tip) {
     7            state->pindexLastCommonBlock = our_tip;
     8        }     
     9    } else {                        
    10        if (our_tip->GetAncestor(peer_best_known->nHeight) == peer_best_known) {
    11            state->pindexLastCommonBlock = peer_best_known;
    12        }
    13        // FindNextBlocks() would be a noop below, so return here.
    14        return;
    15    }     
    16}
    

    mzumsande commented at 2:54 pm on April 25, 2025:
    Then we would have returned earlier because there are no blocks from this peer that would help us with our IBD: https://github.com/bitcoin/bitcoin/blob/80e6ad9e3023a57a4ef19b7d0edf9ac5be71a584/src/net_processing.cpp#L1384-L1387 (second condition)

    vasild commented at 9:10 am on April 28, 2025:
    Does it make sense to advance pindexLastCommonBlock in that case? I am just curious, it is definitely not the aim of this PR.

    mzumsande commented at 8:05 pm on April 28, 2025:
    I’d say probably not because pindexLastCommonBlock isn’t really used for anything but assigning blocks during IBD. If we will never need any blocks from this peer (until it sends us more headers, but then pindexLastCommonBlock will get updated. I’ll also ignore weird invalidateblock scenarios here), I don’t see what benefits advancing it anyway would have.
  12. in test/functional/p2p_ibd_stalling.py:70 in c69ee2d5b9 outdated
    66@@ -67,6 +67,7 @@ def run_test(self):
    67             height += 1
    68             block_dict[blocks[-1].sha256] = blocks[-1]
    69         stall_block = blocks[0].sha256
    70+        second_stall = blocks[500].sha256  # another block we don't provide immediately
    


    vasild commented at 11:57 am on April 25, 2025:

    nit: could use an array here:

    0stall_blocks = [blocks[0].sha256, blocks[500].sha256]
    

    which would make it possible to have the following code like:

    0- peers.append(node.add_outbound_p2p_connection(P2PStaller([stall_block, second_stall]), p2p_idx=id, connection_type="outbound-full-relay"))
    1+ peers.append(node.add_outbound_p2p_connection(P2PStaller(stall_blocks), p2p_idx=id, connection_type="outbound-full-relay"))
    2
    3- self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == 2)
    4+ self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == len(stall_blocks))
    

    mzumsande commented at 8:02 pm on April 28, 2025:
    done
  13. in test/functional/p2p_ibd_stalling.py:145 in c69ee2d5b9 outdated
    144+            self.all_sync_send_with_ping(peers)
    145 
    146-        self.log.info("Check that all outstanding blocks get connected")
    147-        self.wait_until(lambda: node.getblockcount() == NUM_BLOCKS)
    148+        self.log.info("Check that all outstanding blocks up to the second stall block get connected")
    149+        self.wait_until(lambda: node.getblockcount() == 500)
    


    vasild commented at 12:00 pm on April 25, 2025:
    nit: could use a variable second_stall_block_index = 500 or something like that to avoid repeating the number 500.

    mzumsande commented at 8:03 pm on April 28, 2025:
    changed to use one var for each index, plus one array.
  14. vasild approved
  15. vasild commented at 12:23 pm on April 25, 2025: contributor

    ACK c69ee2d5b93296d3008d6978182b2bc29bbeb457

    The code changes look reasonable. I tested that p2p_ibd_stalling.py fails without commit 1d8504c5e6 p2p: During block download, adjust pindexLastCommonBlock right away. It failed 7 times and passed 1 time. Is it expected to pass sometimes? I also checked that the code inside the modified ifs is executed during p2p_ibd_stalling.py.

  16. DrahtBot requested review from stringintech on Apr 25, 2025
  17. p2p: During block download, adjust pindexLastCommonBlock right away
    This allows us to calculate nWndowEnd better.
    Before, it would be calculated based on an old pindexLastCommonBlock,
    which could result in blocks not requested for download that could have been
    requested, and peers being wrongly marked as staller.
    855b1fc2b9
  18. p2p: remove extra logic for assumeutxo
    This reverts commit 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb
    which introduced extra logic for when a snapshot was loaded.
    With the previous commit, this is not longer necessary
    because the more general logic of advancing pindexLastCommonBlock
    also covers this case.
    48cd9171bb
  19. test: remove magic number when checking for blocks that have arrived
    getpeerinfo provides a list of blocks that are inflight, which can be used
    instead.
    5ed7daa102
  20. test: improve coverage for a resolved stalling situation
    This test (which would fail without the previous commit) checks
    that after the stalling block was received, we don't incorrectly
    mark another peer as a staller immediately.
    b7ff6a611a
  21. mzumsande force-pushed on Apr 28, 2025
  22. mzumsande commented at 8:02 pm on April 28, 2025: contributor

    c69ee2d to b7ff6a6: addressed feedback

    Is it expected to pass sometimes?

    It wasn’t - but I have tracked this down now. It’s a bit involved, I hope the following explanation makes sense:

    The initial situation is that there are 2 remaining connected peers, the node has just synced blocks up to height 500, but pindexLastCommonBlock is still at 0. The second stall block at height 501 is already requested from one peer, let’s say peer0.

    If SendMessages is invoked for peer0, it will request no blocks, update pindexLastCommonBlock to 500 and won’t mark anyone as a staller, because it’s already requested from the same peer. If after that, peer1 would be processed, peer0 would be marked as a staller. But if peer0 is processed twice in a row (which can happen due to the randomization in ThreadMessageHandler), we could request block 1025 from it. If peer1 is processed then (for which pindexLastCommonBlock is still at 0), all 1024+1 blocks in the window will already be requested, so we never get to this point because we always continue here. I think this could be fixed by exchanging the order of this and this code block. I’m not sure if this is something that needs fixing though because the current patch should avoid this problem too…

  23. DrahtBot added the label CI failed on Apr 29, 2025
  24. vasild approved
  25. vasild commented at 9:27 am on April 29, 2025: contributor

    ACK b7ff6a611a220e9380f6cd6428f1d3483c8d566f

    But if peer0 is processed twice in a row (which can happen due to the randomization in ThreadMessageHandler), we could request block 1025 from it

    I confirm this, made it reproducible:

    • git show 855b1fc2b9 |git apply -R
    • observe p2p_ibd_stalling.py is failing (mostly).
    • apply the patch below to mess with the order in which nodes are processed
    • observe p2p_ibd_stalling.py succeeding
     0--- i/src/net.cpp
     1+++ w/src/net.cpp
     2@@ -3029,12 +3029,14 @@ void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFai
     3         pnode->ConnectedThroughNetwork(),
     4         GetNodeCount(ConnectionDirection::Out));
     5 }
     6 
     7 Mutex NetEventsInterface::g_msgproc_mutex;
     8 
     9+static size_t seq{0};
    10+
    11 void CConnman::ThreadMessageHandler()
    12 {
    13     LOCK(NetEventsInterface::g_msgproc_mutex);
    14 
    15     while (!flagInterruptMsgProc)
    16     {
    17@@ -3043,13 +3045,22 @@ void CConnman::ThreadMessageHandler()
    18         {
    19             // Randomize the order in which we process messages from/to our peers.
    20             // This prevents attacks in which an attacker exploits having multiple
    21             // consecutive connections in the m_nodes list.
    22             const NodesSnapshot snap{*this, /*shuffle=*/true};
    23 
    24-            for (CNode* pnode : snap.Nodes()) {
    25+            auto nodes = snap.Nodes();
    26+            std::ranges::sort(nodes, [](CNode* a, CNode* b) {
    27+                return seq % 3 == 0 ? a->GetId() > b->GetId() : a->GetId() < b->GetId();
    28+            });
    29+
    30+            ++seq;
    31+
    32+            LogInfo("===== begin");
    33+            for (CNode* pnode : nodes) {
    34+                LogInfo("===== node id: %d, disconnect: %d", pnode->GetId(), pnode->fDisconnect);
    35                 if (pnode->fDisconnect)
    36                     continue;
    37 
    38                 // Receive messages
    39                 bool fMoreNodeWork = m_msgproc->ProcessMessages(pnode, flagInterruptMsgProc);
    40                 fMoreWork |= (fMoreNodeWork && !pnode->fPauseSend);
    41@@ -3058,12 +3069,13 @@ void CConnman::ThreadMessageHandler()
    42                 // Send messages
    43                 m_msgproc->SendMessages(pnode);
    44 
    45                 if (flagInterruptMsgProc)
    46                     return;
    47             }
    48+            LogInfo("===== end");
    49         }
    50 
    51         WAIT_LOCK(mutexMsgProc, lock);
    52         if (!fMoreWork) {
    53             condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this]() EXCLUSIVE_LOCKS_REQUIRED(mutexMsgProc) { return fMsgProcWake; });
    54         }
    
  26. DrahtBot removed the label CI failed on Apr 29, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-07 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me