p2p: Advance pindexLastCommonBlock early after connecting blocks #32180

mzumsande wants to merge 4 commits into bitcoin:master from mzumsande:202403_ibd_lastcommonblock changing 2 files +29 −30
  1. mzumsande commented at 10:40 pm on March 31, 2025: contributor

    As described in #32179, pindexLastCommonBlock is updated later than necessary on master. In the case of a linear chain with no forks, it can be moved forward at the beginning of FindNextBlocksToDownload, so that the updated value can be used to estimate nWindowEnd more accurately. This helps the node request all blocks from peers within the correct 1024-block window and avoids peers being incorrectly marked as stallers.

    The second commit removes a special case introduced in 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb, which is no longer necessary because the more general logic also covers this case.

    I also changed p2p_ibd_stalling.py to cover the situation after a stall has been resolved, making sure that no additional peers are incorrectly marked as stallers.

    Fixes #32179
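The window arithmetic behind this fix can be sketched with a toy model (hypothetical Python, not Bitcoin Core code): the download window ends BLOCK_DOWNLOAD_WINDOW blocks past pindexLastCommonBlock, so a stale value artificially truncates the range of heights that may be requested, even when our tip has long moved past it.

```python
# Toy model of the 1024-block download window (not Bitcoin Core code).
BLOCK_DOWNLOAD_WINDOW = 1024  # same constant name as in net_processing.cpp


def window_end(last_common_height: int) -> int:
    """Highest block height that may currently be requested from peers."""
    return last_common_height + BLOCK_DOWNLOAD_WINDOW


# Our tip is already at height 500, but pindexLastCommonBlock was left at 0:
print(window_end(0))    # 1024: nothing past height 1024 can be requested yet
print(window_end(500))  # 1524: after advancing, the full window past our tip is usable
```

With the stale value, a peer holding only blocks above height 1024 has nothing assignable inside the window, which is the situation that led to peers being wrongly treated as stallers.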

  2. DrahtBot commented at 10:40 pm on March 31, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32180.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK yuvicc
    Stale ACK furszy, stringintech, achow101, vasild

    If your review is incorrectly listed, please react with πŸ‘Ž to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. mzumsande marked this as a draft on Mar 31, 2025
  4. DrahtBot added the label P2P on Mar 31, 2025
  5. mzumsande marked this as ready for review on Apr 2, 2025
  6. mzumsande force-pushed on Apr 7, 2025
  7. in src/net_processing.cpp:1408 in 205e651713 outdated
    1380+        state->pindexLastCommonBlock = m_chainman.ActiveTip();
    1381+    }
    1382+
    1383     // If the peer reorganized, our previous pindexLastCommonBlock may not be an ancestor
    1384     // of its current tip anymore. Go back enough to fix that.
    1385     state->pindexLastCommonBlock = LastCommonAncestor(state->pindexLastCommonBlock, state->pindexBestKnownBlock);
    


    stringintech commented at 9:59 am on April 10, 2025:

    Could there be an issue if both nodes reorganize to the same chain, but the last common block is still on another fork?

    If our new tip height is less than the outdated pindexLastCommonBlock height (like when reorging to a shorter chain with more work), we’d skip the optimization when we shouldn’t.

    Since LastCommonAncestor() brings pindexLastCommonBlock back onto the peer’s best chain, maybe the new code should come after this call?

    Not sure if this edge case matters in practice, but thought I’d mention it.


    mzumsande commented at 7:45 pm on April 11, 2025:
    Good observation! I also think that this edge case shouldn’t matter much. It’s absolutely essential that pindexLastCommonBlock is an ancestor of pindexBestKnownBlock when calling FindNextBlocks, which is why I originally left that check last. But the added check can’t change that since it only sets pindexLastCommonBlock to the tip if that tip is an ancestor of pindexBestKnownBlock. So I’ve changed the order.
  8. stringintech commented at 10:02 am on April 10, 2025: contributor
    Concept ACK.
  9. yuvicc commented at 7:47 am on April 11, 2025: contributor

    Concept ACK

    Thanks for this PR.

    I have made some notes on this PR and will also add a testing guide soon.

  10. mzumsande force-pushed on Apr 11, 2025
  11. in src/net_processing.cpp:1416 in c69ee2d5b9 outdated
    1376@@ -1379,6 +1377,12 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
    1377     if (state->pindexLastCommonBlock == state->pindexBestKnownBlock)
    1378         return;
    1379 
    1380+    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
    1381+    // allows us to determine nWindowEnd better.
    1382+    if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
    1383+        state->pindexLastCommonBlock = m_chainman.ActiveTip();
    1384+    }
    


    vasild commented at 10:36 am on April 25, 2025:

    This change assumes their tip is higher than our tip. What if their tip is lower? Maybe worth advancing pindexLastCommonBlock in that case as well. Something like this:

    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to
    // download any blocks in between, and skipping ahead here allows us to determine nWindowEnd better.
    const auto our_tip{m_chainman.ActiveTip()};
    const auto peer_best_known{state->pindexBestKnownBlock};
    if (our_tip->nHeight > state->pindexLastCommonBlock->nHeight) {
        if (peer_best_known->nHeight >= our_tip->nHeight) {
            if (peer_best_known->GetAncestor(our_tip->nHeight) == our_tip) {
                state->pindexLastCommonBlock = our_tip;
            }
        } else {
            if (our_tip->GetAncestor(peer_best_known->nHeight) == peer_best_known) {
                state->pindexLastCommonBlock = peer_best_known;
            }
            // FindNextBlocks() would be a noop below, so return here.
            return;
        }
    }
    

    mzumsande commented at 2:54 pm on April 25, 2025:
    Then we would have returned earlier because there are no blocks from this peer that would help us with our IBD: https://github.com/bitcoin/bitcoin/blob/80e6ad9e3023a57a4ef19b7d0edf9ac5be71a584/src/net_processing.cpp#L1384-L1387 (second condition)

    vasild commented at 9:10 am on April 28, 2025:
    Does it make sense to advance pindexLastCommonBlock in that case? I am just curious, it is definitely not the aim of this PR.

    mzumsande commented at 8:05 pm on April 28, 2025:
    I’d say probably not, because pindexLastCommonBlock isn’t really used for anything but assigning blocks during IBD. If we will never need any blocks from this peer (until it sends us more headers, at which point pindexLastCommonBlock will get updated anyway; I’ll also ignore weird invalidateblock scenarios here), I don’t see what benefit advancing it would have.
  12. in test/functional/p2p_ibd_stalling.py:70 in c69ee2d5b9 outdated
    66@@ -67,6 +67,7 @@ def run_test(self):
    67             height += 1
    68             block_dict[blocks[-1].sha256] = blocks[-1]
    69         stall_block = blocks[0].sha256
    70+        second_stall = blocks[500].sha256  # another block we don't provide immediately
    


    vasild commented at 11:57 am on April 25, 2025:

    nit: could use an array here:

    stall_blocks = [blocks[0].sha256, blocks[500].sha256]
    

    which would make it possible to have the following code like:

    - peers.append(node.add_outbound_p2p_connection(P2PStaller([stall_block, second_stall]), p2p_idx=id, connection_type="outbound-full-relay"))
    + peers.append(node.add_outbound_p2p_connection(P2PStaller(stall_blocks), p2p_idx=id, connection_type="outbound-full-relay"))

    - self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == 2)
    + self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == len(stall_blocks))
    

    mzumsande commented at 8:02 pm on April 28, 2025:
    done
  13. in test/functional/p2p_ibd_stalling.py:145 in c69ee2d5b9 outdated
    144+            self.all_sync_send_with_ping(peers)
    145 
    146-        self.log.info("Check that all outstanding blocks get connected")
    147-        self.wait_until(lambda: node.getblockcount() == NUM_BLOCKS)
    148+        self.log.info("Check that all outstanding blocks up to the second stall block get connected")
    149+        self.wait_until(lambda: node.getblockcount() == 500)
    


    vasild commented at 12:00 pm on April 25, 2025:
    nit: could use a variable second_stall_block_index = 500 or something like that to avoid repeating the number 500.

    mzumsande commented at 8:03 pm on April 28, 2025:
    changed to use one var for each index, plus one array.

    vasild commented at 12:51 pm on September 12, 2025:

    Looks like, according to git range-diff b7ff6a611a...3bdff9a154, this and the suggestion above were reverted in the latest push? If that was not intentional, then maybe amend this change in the latest commit:

    --- i/test/functional/p2p_ibd_stalling.py
    +++ w/test/functional/p2p_ibd_stalling.py
    @@ -63,29 +63,30 @@ class P2PIBDStallingTest(BitcoinTestFramework):
                 blocks.append(create_block(tip, create_coinbase(height), block_time))
                 blocks[-1].solve()
                 tip = blocks[-1].hash_int
                 block_time += 1
                 height += 1
                 block_dict[blocks[-1].hash_int] = blocks[-1]
    -        stall_block = blocks[0].hash_int
    -        second_stall = blocks[500].hash_int  # another block we don't provide immediately
    +        stall_index = 0
    +        second_stall_index = 500
    +        stall_blocks = [blocks[stall_index].hash_int, blocks[second_stall_index].hash_int]

             headers_message = msg_headers()
             headers_message.headers = [CBlockHeader(b) for b in blocks[:NUM_BLOCKS-1]]
             peers = []

             self.log.info("Check that a staller does not get disconnected if the 1024 block lookahead buffer is filled")
             self.mocktime = int(time.time()) + 1
             node.setmocktime(self.mocktime)
             for id in range(NUM_PEERS):
    -            peers.append(node.add_outbound_p2p_connection(P2PStaller([stall_block, second_stall]), p2p_idx=id, connection_type="outbound-full-relay"))
    +            peers.append(node.add_outbound_p2p_connection(P2PStaller(stall_blocks), p2p_idx=id, connection_type="outbound-full-relay"))
                 peers[-1].block_store = block_dict
                 peers[-1].send_and_ping(headers_message)

             # Wait until all blocks are received (except for the stall blocks), so that no other blocks are in flight.
    -        self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == 2)
    +        self.wait_until(lambda: sum(len(peer['inflight']) for peer in node.getpeerinfo()) == len(stall_blocks))

             self.all_sync_send_with_ping(peers)
             # If there was a peer marked for stalling, it would get disconnected
             self.mocktime += 3
             node.setmocktime(self.mocktime)
             self.all_sync_send_with_ping(peers)
    @@ -100,13 +101,13 @@ class P2PIBDStallingTest(BitcoinTestFramework):

             self.log.info("Check that the stalling peer is disconnected after 2 seconds")
             self.mocktime += 3
             node.setmocktime(self.mocktime)
             peers[0].wait_for_disconnect()
             assert_equal(node.num_test_p2p_connections(), NUM_PEERS - 1)
    -        self.wait_until(lambda: self.is_block_requested(peers, stall_block))
    +        self.wait_until(lambda: self.is_block_requested(peers, stall_blocks[0]))
             # Make sure that SendMessages() is invoked, which assigns the missing block
             # to another peer and starts the stalling logic for them
             self.all_sync_send_with_ping(peers)

             self.log.info("Check that the stalling timeout gets doubled to 4 seconds for the next staller")
             # No disconnect after just 3 seconds
    @@ -115,37 +116,37 @@ class P2PIBDStallingTest(BitcoinTestFramework):
             self.all_sync_send_with_ping(peers)
             assert_equal(node.num_test_p2p_connections(), NUM_PEERS - 1)

             self.mocktime += 2
             node.setmocktime(self.mocktime)
             self.wait_until(lambda: sum(x.is_connected for x in node.p2ps) == NUM_PEERS - 2)
    -        self.wait_until(lambda: self.is_block_requested(peers, stall_block))
    +        self.wait_until(lambda: self.is_block_requested(peers, stall_blocks[0]))
             self.all_sync_send_with_ping(peers)

             self.log.info("Check that the stalling timeout gets doubled to 8 seconds for the next staller")
             # No disconnect after just 7 seconds
             self.mocktime += 7
             node.setmocktime(self.mocktime)
             self.all_sync_send_with_ping(peers)
             assert_equal(node.num_test_p2p_connections(), NUM_PEERS - 2)

             self.mocktime += 2
             node.setmocktime(self.mocktime)
             self.wait_until(lambda: sum(x.is_connected for x in node.p2ps) == NUM_PEERS - 3)
    -        self.wait_until(lambda: self.is_block_requested(peers, stall_block))
    +        self.wait_until(lambda: self.is_block_requested(peers, stall_blocks[0]))
             self.all_sync_send_with_ping(peers)

             self.log.info("Provide the first withheld block and check that stalling timeout gets reduced back to 2 seconds")
             with node.assert_debug_log(expected_msgs=['Decreased stalling timeout to 2 seconds'], unexpected_msgs=['Stall started']):
                 for p in peers:
    -                if p.is_connected and (stall_block in p.getdata_requests):
    -                    p.send_without_ping(msg_block(block_dict[stall_block]))
    +                if p.is_connected and (stall_blocks[0] in p.getdata_requests):
    +                    p.send_without_ping(msg_block(block_dict[stall_blocks[0]]))
                 self.all_sync_send_with_ping(peers)

             self.log.info("Check that all outstanding blocks up to the second stall block get connected")
    -        self.wait_until(lambda: node.getblockcount() == 500)
    +        self.wait_until(lambda: node.getblockcount() == second_stall_index)


         def all_sync_send_with_ping(self, peers):
             for p in peers:
                 if p.is_connected:
                     p.sync_with_ping()
    

    mzumsande commented at 10:04 pm on September 15, 2025:
    That was not on purpose - sorry. Fixed now.
  14. vasild approved
  15. vasild commented at 12:23 pm on April 25, 2025: contributor

    ACK c69ee2d5b93296d3008d6978182b2bc29bbeb457

    The code changes look reasonable. I tested that p2p_ibd_stalling.py fails without commit 1d8504c5e6 p2p: During block download, adjust pindexLastCommonBlock right away. It failed 7 times and passed 1 time. Is it expected to pass sometimes? I also checked that the code inside the modified ifs is executed during p2p_ibd_stalling.py.

  16. DrahtBot requested review from stringintech on Apr 25, 2025
  17. mzumsande force-pushed on Apr 28, 2025
  18. mzumsande commented at 8:02 pm on April 28, 2025: contributor

    c69ee2d to b7ff6a6: addressed feedback

    Is it expected to pass sometimes?

    It wasn’t - but I have tracked this down now. It’s a bit involved, I hope the following explanation makes sense:

    The initial situation is that there are 2 remaining connected peers, the node has just synced blocks up to height 500, but pindexLastCommonBlock is still at 0. The second stall block at height 501 is already requested from one peer, let’s say peer0.

    If SendMessages is invoked for peer0, it will request no blocks, update pindexLastCommonBlock to 500, and won’t mark anyone as a staller, because the block is already requested from that same peer. If peer1 were processed after that, peer0 would be marked as a staller. But if peer0 is processed twice in a row (which can happen due to the randomization in ThreadMessageHandler), we could request block 1025 from it. If peer1 is processed after that (with its pindexLastCommonBlock still at 0), all 1024+1 blocks in its window will already have been requested, so we never get to this point because we always continue here. I think this could be fixed by exchanging the order of this and this code block. I’m not sure this needs fixing though, because the current patch should avoid the problem too…

  19. DrahtBot added the label CI failed on Apr 29, 2025
  20. vasild approved
  21. vasild commented at 9:27 am on April 29, 2025: contributor

    ACK b7ff6a611a220e9380f6cd6428f1d3483c8d566f

    But if peer0 is processed twice in a row (which can happen due to the randomization in ThreadMessageHandler), we could request block 1025 from it

    I confirm this, made it reproducible:

    • git show 855b1fc2b9 | git apply -R
    • observe p2p_ibd_stalling.py failing (mostly)
    • apply the patch below to mess with the order in which nodes are processed
    • observe p2p_ibd_stalling.py succeeding
    --- i/src/net.cpp
    +++ w/src/net.cpp
    @@ -3029,12 +3029,14 @@ void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFai
             pnode->ConnectedThroughNetwork(),
             GetNodeCount(ConnectionDirection::Out));
     }

     Mutex NetEventsInterface::g_msgproc_mutex;

    +static size_t seq{0};
    +
     void CConnman::ThreadMessageHandler()
     {
         LOCK(NetEventsInterface::g_msgproc_mutex);

         while (!flagInterruptMsgProc)
         {
    @@ -3043,13 +3045,22 @@
             {
                 // Randomize the order in which we process messages from/to our peers.
                 // This prevents attacks in which an attacker exploits having multiple
                 // consecutive connections in the m_nodes list.
                 const NodesSnapshot snap{*this, /*shuffle=*/true};

    -            for (CNode* pnode : snap.Nodes()) {
    +            auto nodes = snap.Nodes();
    +            std::ranges::sort(nodes, [](CNode* a, CNode* b) {
    +                return seq % 3 == 0 ? a->GetId() > b->GetId() : a->GetId() < b->GetId();
    +            });
    +
    +            ++seq;
    +
    +            LogInfo("===== begin");
    +            for (CNode* pnode : nodes) {
    +                LogInfo("===== node id: %d, disconnect: %d", pnode->GetId(), pnode->fDisconnect);
                     if (pnode->fDisconnect)
                         continue;

                     // Receive messages
                     bool fMoreNodeWork = m_msgproc->ProcessMessages(pnode, flagInterruptMsgProc);
                     fMoreWork |= (fMoreNodeWork && !pnode->fPauseSend);
    @@ -3058,12 +3069,13 @@
                     // Send messages
                     m_msgproc->SendMessages(pnode);

                     if (flagInterruptMsgProc)
                         return;
                 }
    +            LogInfo("===== end");

             }

             WAIT_LOCK(mutexMsgProc, lock);
             if (!fMoreWork) {
                 condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100), [this]() EXCLUSIVE_LOCKS_REQUIRED(mutexMsgProc) { return fMsgProcWake; });
             }
    
  22. DrahtBot removed the label CI failed on Apr 29, 2025
  23. DrahtBot added the label Needs rebase on Jul 18, 2025
  24. p2p: During block download, adjust pindexLastCommonBlock right away
    This allows us to calculate nWindowEnd better.
    Before, it would be calculated based on an old pindexLastCommonBlock,
    which could result in blocks not being requested for download even though
    they could have been, and in peers being wrongly marked as stallers.
    0a757acf66
  25. mzumsande force-pushed on Jul 21, 2025
  26. mzumsande commented at 3:27 pm on July 21, 2025: contributor
    b7ff6a6 to 3bdff9a: rebased due to conflict with #32868
  27. DrahtBot removed the label Needs rebase on Jul 21, 2025
  28. DrahtBot added the label CI failed on Jul 21, 2025
  29. DrahtBot removed the label CI failed on Jul 22, 2025
  30. in src/net_processing.cpp:1416 in 0a757acf66 outdated
    1410@@ -1411,6 +1411,12 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
    1411     if (state->pindexLastCommonBlock == state->pindexBestKnownBlock)
    1412         return;
    1413 
    1414+    // If our tip has advanced beyond pindexLastCommonBlock, move it ahead to the tip. We don't need to download any blocks in between, and skipping ahead here
    1415+    // allows us to determine nWindowEnd better.
    1416+    if (m_chainman.ActiveHeight() > state->pindexLastCommonBlock->nHeight && state->pindexBestKnownBlock->GetAncestor(m_chainman.ActiveHeight()) == m_chainman.ActiveTip()) {
    


    furszy commented at 8:01 pm on July 23, 2025:

    nano non-blocking nits: Could extract tip = ActiveTip() and use it everywhere here to make it slightly more readable.

    Also, this isn’t introduced by this PR, but the second check has always puzzled me a bit. It would be nice to have some util function like IsBlockInChain(const CBlockIndex* block_to_test, const CBlockIndex* chain_tip).


    mzumsande commented at 10:13 pm on September 15, 2025:

    About the first, I didn’t have a strong opinion. Seems like if we do it for ActiveTip(), we should also do it for ActiveHeight().

    Also, this isn’t introduced by this PR, but the second check has always puzzled me a bit. It would be nice to have some util function like IsBlockInChain(const CBlockIndex* block_to_test, const CBlockIndex* chain_tip).

    I’d say that a name like CChain::IsAncestorOf(const CBlockIndex* possible_ancestor, const CBlockIndex* predecessor) would be more precise, since it’s doing more than just checking whether a block is in the chain. I think I’d be concept ACK to that - how about a separate PR, changing it in the various places of the codebase where the GetAncestor / height check is used?

  31. furszy commented at 8:07 pm on July 23, 2025: member
    ACK 3bdff9a154733f8f9f379448f5595a2e90474bc6
  32. DrahtBot requested review from vasild on Jul 23, 2025
  33. stringintech commented at 10:21 am on July 30, 2025: contributor

    ACK 3bdff9a

    I think it would be nice to include how the newly added logic would also cover the assumeutxo case in the commit description; if my understanding is correct, adding something like:

    After snapshot loading, our tip becomes the snapshot block. Since peers are verified to have the snapshot in their chain, our tip is an ancestor of the peer’s best block, hence the general advancement logic will move pindexLastCommonBlock from any pre-snapshot position to the snapshot height automatically.

  34. mzumsande commented at 7:44 pm on August 4, 2025: contributor
    Thanks! I’ll address the nits if / when I have to push again.
  35. achow101 commented at 10:09 pm on September 11, 2025: member
    ACK 3bdff9a154733f8f9f379448f5595a2e90474bc6
  36. vasild commented at 12:56 pm on September 12, 2025: contributor
    ACK 3bdff9a154733f8f9f379448f5595a2e90474bc6 but see #32180 (review), non-blocking IMO, would be nice to address
  37. p2p: remove extra logic for assumeutxo
    This reverts commit 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb
    which introduced extra logic for when a snapshot was loaded.
    With the previous commit, this is no longer necessary:
    
    After snapshot loading, our tip becomes the snapshot block.
    For peers that have the most-work chain, which includes the snapshot,
    our tip is an ancestor of the peer's best block, hence the general
    advancement logic will move pindexLastCommonBlock
    from any pre-snapshot position to the snapshot height automatically.
    b561aa3646
  38. test: remove magic number when checking for blocks that have arrived
    getpeerinfo provides a list of blocks that are inflight, which can be used
    instead.
    e35c3d1545
  39. test: improve coverage for a resolved stalling situation
    This test (which would fail without the previous commit) checks
    that after the stalling block was received, we don't incorrectly
    mark another peer as a staller immediately.
    b2c6cb7f02
  40. mzumsande force-pushed on Sep 15, 2025
  41. mzumsande commented at 10:06 pm on September 15, 2025: contributor

    3bdff9a to b2c6cb7:


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-17 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me