p2p: For assumeutxo, download snapshot chain before background chain

mzumsande commented at 9:44 pm on February 29, 2024: contributor

After loading a snapshot, pindexLastCommonBlock is usually already set to some block for existing peers. That means we’d continue syncing the background chain from those peers instead of prioritising the snapshot chain, which defeats the purpose of doing assumeutxo in the first place. Only existing peers are affected by this bug.

DrahtBot commented at 9:44 pm on February 29, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	fjahr, Sjors, achow101
Stale ACK	ryanofsky

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

DrahtBot added the label P2P on Feb 29, 2024

fanquake added this to the milestone 27.0 on Mar 1, 2024

fanquake requested review from ryanofsky on Mar 1, 2024

fanquake commented at 4:18 pm on March 5, 2024: member

Note that given that 27 will not ship with mainnet assumeutxo, this isn’t a blocker, and a fix can still be backported to 27.x if deemed necessary.

Sjors commented at 3:21 pm on March 11, 2024: member

Concept ACK

ac547ea43beb1d66d2c11e7e8abb76dd1bfc2883 looks reasonable, but it would be nice to somehow add a test for it.

What happens once we’ve reached the tip and need more peers for background sync? And once background sync is done, is it updated to the tip again?

Would it make sense to introduce pindexLastCommonBackgroundSyncBlock so we have more control over what peers to dedicate to the tip vs background (which might take weeks on some devices)?

mzumsande commented at 8:30 pm on March 11, 2024: contributor

ac547ea looks reasonable, but it would be nice to somehow add a test for it.

I tested this syncing on signet, but I’ll think about a functional test; I imagine that it would be hard to prevent race conditions in a functional test because we’d:

need to first connect to a peer that has the entire chain and start syncing the background chain
then load the snapshot (at the same time the peer would continue syncing the background chain)
then check that we switch to requesting from the snapshot chain (while the bg chain hopefully hasn’t finished downloading yet)

It might be possible though if that peer was a P2PConnection() (that could stall / withhold certain blocks) instead of a real node.

What happens once we’ve reached the tip and need more peers for background sync? And once background sync is done, is it updated to the tip again?

Would it make sense to introduce pindexLastCommonBackgroundSyncBlock so we have more control over what peers to dedicate to the tip vs background (which might take weeks on some devices)?

We have this logic in SendMessages: It first tries FindNextBlocksToDownload meant for the active chainstate, and, if it doesn’t fill up the vToDownload buffer (e.g. because we are at / near the tip), then calls TryDownloadingHistoricalBlocks specifically for the background chainstate. I think that this existing separation should work well, the problem that this PR fixes is that FindNextBlocksToDownload would pick historical blocks from the background chain when it shouldn’t because higher-prio blocks are available.

Sjors commented at 3:45 pm on March 12, 2024: member

I can see how it would be difficult to test, indeed.

I encountered this scenario while testing #29370 on signet. All my peers were only downloading the background state. I only got headers for the tip chain, but it didn’t progress. The workaround there was to turn the network off and on, which is consistent with your observation that only existing peers are affected.

utACK ac547ea43beb1d66d2c11e7e8abb76dd1bfc2883

fjahr commented at 5:58 pm on March 25, 2024: contributor

The code and reasoning look good to me and I have been trying to write a test but I am currently unable to reproduce the behavior described before the change, i.e. to make the test fail on master.

Here is the test so far: https://github.com/fjahr/bitcoin/commit/478fb28dac4e22175da150cee97991a66b4f5695

This now does fail on master but only because it also checks for the debug log HERE I have added. This log confirms that the test does trigger the behavior change, i.e. on master the last common block is set once, after the change here it is set twice within FindNextBlockToDownload. But, of course, we would rather like to have the test fail at the asserts at the end but that is not the case so far. @mzumsande do you have an idea why this doesn’t reproduce the behavior you have seen on master?

mzumsande commented at 10:24 pm on March 25, 2024: contributor

@mzumsande do you have an idea why this doesn’t reproduce the behavior you have seen on master?

I tested your branch, and I see the same - this is really interesting, and it works differently than I thought when I opened the PR!

I think the following happens in PeerManagerImpl::FindNextBlocks (after being called from FindNextBlocksToDownload()): We process up to 1024 blocks ahead of the current pindexWalk: https://github.com/bitcoin/bitcoin/blob/2102c978b5f2d0bfaa0290c00ed693555f3403d4/src/net_processing.cpp#L1430 But due to the lines https://github.com/bitcoin/bitcoin/blob/2102c978b5f2d0bfaa0290c00ed693555f3403d4/src/net_processing.cpp#L1497-L1502 we never ask for the blocks below the snapshot block here: Even though we don’t have their data, they are now in the active chain, so they are skipped instead. In the test, we immediately ask for the blocks after the snapshot instead, which is exactly the desired behavior.

So why did I observe a different behavior on signet? I think it only works like this in the test because there, the snapshot is just a 100 blocks away from pindexLastCommonBlock.

If this wasn’t the case, we’d request nothing in FindNextBlocksToDownload(), only move our pindexLastCommonBlock ahead by a maximum of 1024. Then we have capacity for this peer left and would actually ask for the historical blocks via TryDownloadingHistoricalBlocks().

So I think what would actually happen on mainnet and signet is that each time we can process a block request with an existing peer, we’d move pindexLastCommonBlock 1024 blocks ahead, and then request historical blocks - until pindexLastCommonBlock has reached the snapshot. So we’d continuing downloading the background chain for a while (a few minutes?) with existing peers until we switch over to the snapshot chain (instead of never switching over as I originally thought when opening this PR).

Still, I think that this behavior is not great, and the fix is an improvement - not only because we start downloading the snapshot chain immediately, we also do much less iterating in FindNextBlocksToDownload , moving pindexLastCommonBlock ahead without actually downloading anything.

fanquake removed this from the milestone 27.0 on Apr 16, 2024

fjahr commented at 11:48 pm on May 30, 2024: contributor

tACK ac547ea43beb1d66d2c11e7e8abb76dd1bfc2883

I reproduced the issue on current master with signet and confirmed that it is fixed with the change here (three times actually with slightly different logs). Based on reading the code and some custom logging I agree with @mzumsande that his description above is what is indeed happening. Based on this I am fine to add this without a test.

fjahr commented at 11:51 pm on May 30, 2024: contributor

@ryanofsky would be really great to get your review here when you get the chance :)

in src/net_processing.cpp:1402 in ac547ea43b outdated

1396@@ -1397,9 +1397,12 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
1397         return;
1398     }
1399 
1400-    if (state->pindexLastCommonBlock == nullptr) {
1401+    std::optional<int> snapshot_height{m_chainman.GetSnapshotBaseHeight()};
1402+    if (state->pindexLastCommonBlock == nullptr ||
1403+        (snapshot_height.has_value() && state->pindexLastCommonBlock->nHeight < snapshot_height.value())) {

ryanofsky commented at 9:05 pm on June 5, 2024:

In commit “p2p: For assumeutxo, download snapshot chain before background chain” (ac547ea43beb1d66d2c11e7e8abb76dd1bfc2883)

It seems like the pindexLastCommonBlock->nHeight < snapshot_height check is not strict enough because the snapshot block might not necessarily be an ancestor the peer’s best known block.

In this case, the pindexLastCommonBlock assignments on line 1406 and 1411 will reset it back to the fork point between the two nodes each time FindNextBlocksToDownload is called. And if the peer’s best block is more than BLOCK_DOWNLOAD_WINDOW blocks away from the current chain, the node will never be able to download enough blocks to reach it and validate it. This means if the snapshot is not on the best chain, the node may be unable to sync to a better chain.

Suggestion to fix would be:

 0--- a/src/net_processing.cpp
 1+++ b/src/net_processing.cpp
 2@@ -1487,12 +1487,14 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
 3         return;
 4     }
 5 
 6-    std::optional<int> snapshot_height{m_chainman.GetSnapshotBaseHeight()};
 7+    // If pindexLastCommonBlock has not been set yet, set it now. Also reset it
 8+    // if an assumetxo snapshot has been loaded and it is an ancestor of the
 9+    // snapshot, so only blocks after the snapshot will be downloaded.
10+    const CBlockIndex* snap_base{m_chainman.GetSnapshotBaseBlock()};
11     if (state->pindexLastCommonBlock == nullptr ||
12-        (snapshot_height.has_value() && state->pindexLastCommonBlock->nHeight < snapshot_height.value())) {
13+        (snap_base && snap_base->GetAncestor(state->pindexLastCommonBlock->nHeight) == state->pindexLastCommonBlock)) {
14         // Bootstrap quickly by guessing a parent of our best tip is the forking point.
15         // Guessing wrong in either direction is not a problem.
16-        // Also applies after loading a snapshot, when we want to prioritise loading the snapshot chain and therefore need to skip ahead.
17         state->pindexLastCommonBlock = m_chainman.ActiveChain()[std::min(state->pindexBestKnownBlock->nHeight, m_chainman.ActiveChain().Height())];
18     }
19

mzumsande commented at 8:39 pm on June 11, 2024:

This means if the snapshot is not on the best chain, the node may be unable to sync to a better chain.

I’ve thought about this for a while, and I’m not sure if that problem is really being fixed by your suggestion for the following (slightly philosophical) reasons:

Accepting a snapshot means that we’ve created a new chainstate with its Active Tip being set to that block.
That is a commitment, we can’t just reorg away from it to another chain that doesn’t contain the snapshot block, even if such a chain existed - at least not until the background sync has finished (we’d need the full blocks below the snapshot), and once that has happened, the snapshot chainstate gets deleted.
If all of our peers are on a better chain not containing the Snapshot, it seems to me that we are in a hopeless situation anyway: With the current patch we might download no blocks from those peers, with your suggestion we’d download useless blocks that can’t be connected - I’m not sure why one is better than the other.

So I think that the premise that we could get into this situation should be rejected: If a valid snapshot was provided and put into a release, it should only be possible if there was a massive reorg of multiple thousands of blocks (and we’d also need to have accepted header chains ending in both chaintips from different peers). I’d say that in this hypothetical situation, AssumeUtxo is not at the top of the list of our concerns - bitcoin would have far bigger problems.

Without a massive reorg, there is the alternative that the snapshot provided was bad somehow. But then (apart from the fact that it shouldn’t have been hardcoded into chainparams), it also shouldn’t have been loaded in the first place because its header should not have been accepted to our block index with our header DoS protections.

So all in all I think that once we’ve accepted a snapshot, we’ve made a commitment and have to treat it as part of the best chain, at least until background sync has finished.

ryanofsky commented at 2:00 pm on June 12, 2024:

re: #29519 (review)

Good points. My suggested change was premised on the false assumption that the node would be able to seamlessly reorg to the most-work chain. But that’s not true because of missing undo data.

Your philosophical point is interesting. I believe that:

Loading snapshots should not affect how nodes come to consensus, it should only affect what order they download blocks, and help them compute the latest UTXO sets sooner.

And you seem to be suggesting:

Loading a snapshot should force a node to only consider chains that include the snapshot block and ignore other chains, even if they have more work.

On this PR more specifically:

When I made my suggested change, I was forgetting that we don’t have any undo data for the snapshot block and blocks before it, so unfortunately it is not possible for a node to follow the most-work chain when a snapshot is loaded until all the preceding blocks are downloaded.

Given this fact, I think the best thing FindNextBlocksToDownload can do when an unvalidated snapshot is loaded is to ignore peers on other chains not containing the snapshot block, until it actually has the blocks and undo data that would be needed to validate other chains. I think the current PR is not exactly doing this though. It looks like it will still download up to BLOCK_DOWNLOAD_WINDOW useless blocks on the peer’s chain, starting at the fork point between the two chains, which is not great. It will also not download later blocks after snapshot is validated, when undo data needed to enable a re-org becomes available.

On the broader approach to downloading blocks when a snapshot is loaded:

I’d agree now that the only sensible block downloading behavior for an unvalidated snapshot chainstate is to only download blocks after the snapshot (due to lack of undo data before then) and ignore any peers whose best blocks don’t descend from the snapshot block.

But it is less clear which blocks should be downloaded and attached to the older chainstate, and whether it should target the most-work block or the snapshot block. Since d43a1f1a2fa35d377c7a9ad7ab92d1ae325bde3d from #27746 it targets the snapshot block, but was not doing that originally. Comparing the two alternatives:

Targeting the most-work block seems like a purer approach that would probably simplify the code and remove special cases, and would also have the benefit of the node syncing to the most-work chain faster, with less wasted work if it the chain turns out to be valid.
Targeting the snapshot block is also very reasonable, and is what the code is doing currently, and guarantees the snapshot will be validated even if it is not used, and could be a practical defense against eclipse attacks, or fork chains that appear to have more work but turn out to be invalid.

On whether we should care about any of this:

I don’t think we need to care about these design choices too much, since everything above is referring to a case we don’t expect to encounter, where there is a possibly-valid chain of blocks, not including the snapshot block, that has more work than any known chain including the snapshot block. The only time when we should see this is when there is a major fork with a lot of work behind it.

However, I don’t think “the premise that we could get into this situation should be rejected.” That seems like it is going too far. We should put some thought into how code will behavior in presence for forks for practical reasons, and also to inform design of the code and to write it in a clean way that isn’t full of assumptions and special cases.

I do think, as long as we are hardcoding snapshot hashes in the source code, we can reject premise that snapshot chainstate turns out to be invalid, and just refuse to run in that case. But the case where the snapshot is valid, but the snapshot block is not on the most-work chain seems different and worth thinking about.

mzumsande commented at 3:27 pm on June 12, 2024:

And you seem to be suggesting: Loading a snapshot should force a node to only consider chains that include the snapshot block and ignore other chains, even if they have more work.

This was meant more as a description of the status quo than a description of the ideal world. I think that with today’s logic we don’t really have another choice but to do so - temporarily until the background sync has finished.

but the snapshot block is not on the most-work chain seems different and worth thinking about

I agree with that.Some possible ways to deal with that:

1.) Tighten the criteria for snapshot acceptance: We could reject loading the snapshot if the snapshot block is not a predecessor of m_best_header. Currently, we only check that it has more work than the ActiveTip. This is easy to implement, though it doesn’t cover the case where we only learn about the better chain after the snapshot has been loaded.

2.) After a snapshot is accepted and we detect that the snapshot is no longer on the most-work chain we know of, we could abort syncing the snapshot, delete the chainstate and go back to normal sync mode. This sounds more complicated to implement, and I’m not convinced it’s worth it.

It looks like it will still download up to BLOCK_DOWNLOAD_WINDOW useless blocks on the peer’s chain, starting at the fork point between the two chains, which is not great.

That’s true. I will try to adjust the PR to deal with that. Since I find the logic in FindNextBlocksToDownload / TryDownloadingHistoricalBlocks / FindNextBlocks rather hard to understand, I would like to reproduce the situation in a test (at least locally, sounds like timing issues could make this hard to include in a PR) to be more confident that this is really how it works, which could take a few days.

mzumsande commented at 3:42 pm on June 14, 2024:

I have now added a first commit that avoids downloading blocks not on the snapshot chain completely. Local testing revealed that not only are we unable to reorg to a more-work chain, we also won’t fail gracefully: On master, we would still download the blocks and attempt to reorg, but then the node would just crash.

ryanofsky commented at 9:37 pm on June 5, 2024: contributor

Sorry for not seeing this earlier. I think I found a potential problem with this change, so I left a comment below and would be curious to find out more.

ryanofsky approved

ryanofsky commented at 2:28 pm on June 12, 2024: contributor

Code review ACK ac547ea43beb1d66d2c11e7e8abb76dd1bfc2883. This is an improvement over the status quo in the normal case, though it seems like it introduces some odd behavior if snapshot block is not on the most work chain, as described in other comments. I think it would be nice to clean the code up more and clarify what is trying to do, instead of merging this change as it is. But it would also be ok to merge in its current form, if you prefer that.

mzumsande commented at 3:28 pm on June 12, 2024: contributor

I think it would be nice to clean the code up more and clarify what is trying to do, instead of merging this change as it is.

I will look into improving this (see thread above) in the next days, so hold off merging please.

mzumsande force-pushed on Jun 14, 2024

mzumsande commented at 3:46 pm on June 14, 2024: contributor

ac547ea to e977c69: Added a commit that avoids downloading blocks not on the snapshot chain (see discussion above). Also made small changes to the (now) second commit using snap_base instead of GetSnapshotBaseHeight(), and reworked the documentation.

in src/net_processing.cpp:1400 in 019fba99b0 outdated

1396@@ -1397,6 +1397,14 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
1397         return;
1398     }
1399 
1400+    // When syncing with AssumeUtxo, if the snapshot is not in the peer's best chain, abort:

fjahr commented at 2:14 pm on June 18, 2024:

nit: A bit easier to parse IMO, only if you have to retouch.

0    // When we sync with AssumeUtxo and discover the snapshot is not in the peer's best chain, abort:

mzumsande commented at 8:06 pm on August 6, 2024:

done with the latest push.

fjahr commented at 2:25 pm on June 18, 2024: contributor

tACK e977c698f97ac9bba30e4e3837f41721841c28c4

I reviewed the new changes and re-tested the behavior on Signet.

I mostly agree with @mzumsande above and will post some more thoughts on #30288 rather than here since the conversation seems to have moved over there.

So, since I agree this is not something we should encounter this is not a blocker, but just to confirm my understanding: with the current state of the change, if there was an actual other best chain deeper than the snapshot discovered, our node would get stuck because all other nodes would reorg to the best chain and then we would not request blocks from any of our peers, right?

DrahtBot requested review from ryanofsky on Jun 18, 2024

DrahtBot requested review from Sjors on Jun 18, 2024

mzumsande commented at 3:39 pm on June 18, 2024: contributor

if there was an actual other best chain deeper than the snapshot discovered, our node would get stuck because all other nodes would reorg to the best chain and then we would not request blocks from any of our peers, right?

yes, but as you mentioned only if all peers were on that other chain. Whereas the behavior on master would be to download some blocks and then crash with an assert trying to reorg to the new chain - note that this would also happen if the other chain was invalid (for reasons we’d only find out during connecting the blocks) and even if there was just a single peer feeding us that chain.

fjahr commented at 10:27 pm on June 19, 2024: contributor

yes, but as you mentioned only if all peers were on that other chain.

It may be worth logging something every time we ignore a peer that is on another chain. In case we encounter this (very unlikely) scenario it would help identify the issue faster and it could be an interesting data point alone to know if anyone ever sees this at all.

Sjors commented at 4:00 pm on July 23, 2024: member

In the unlikely event of such a deep reorg and a lack of peers on the old side, I think it’s acceptable if the user needs to restart their node in order to follow the new longest headers.

p2p: Restrict downloading of blocks for snapshot chain

If the best chain of the peer doesn't include the snapshot
block, it is futile to download blocks from this chain,
because we couldn't reorg to it. We'd also crash
trying to reorg because this scenario is not handled.

7a885518d5

p2p: For assumeutxo, download snapshot chain before background chain

After loading a snapshot, pindexLastCommonBlock is usually already set
to some block for existing peers. That means we'd continue syncing the
background chain from those peers instead of prioritising the snapshot
chain, which defeats the purpose of doing assumeutxo in the first place.
Only existing peers are affected by this bug.

49d569cb1f

mzumsande force-pushed on Aug 6, 2024

mzumsande commented at 8:06 pm on August 6, 2024: contributor

It may be worth logging something every time we ignore a peer that is on another chain.

e977c69 to 49d569c: Added a log message and slightly reworded a comment.

fjahr commented at 8:10 pm on August 6, 2024: contributor

re-ACK 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb

Only trivial changes addressing comments since my last review, per diff: https://github.com/bitcoin/bitcoin/compare/e977c698f97ac9bba30e4e3837f41721841c28c4..49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb

in src/net_processing.cpp:1405 in 7a885518d5 outdated

1396@@ -1397,6 +1397,15 @@ void PeerManagerImpl::FindNextBlocksToDownload(const Peer& peer, unsigned int co
1397         return;
1398     }
1399 
1400+    // When we sync with AssumeUtxo and discover the snapshot is not in the peer's best chain, abort:
1401+    // We can't reorg to this chain due to missing undo data until the background sync has finished,
1402+    // so downloading blocks from it would be futile.
1403+    const CBlockIndex* snap_base{m_chainman.GetSnapshotBaseBlock()};
1404+    if (snap_base && state->pindexBestKnownBlock->GetAncestor(snap_base->nHeight) != snap_base) {
1405+        LogDebug(BCLog::NET, "Not downloading blocks from peer=%d, which doesn't have the snapshot block in its best chain.\n", peer.m_id);

Sjors commented at 11:19 am on August 7, 2024:

7a885518d57c6eb818ebef5fd04a575f324ee8b2: although this scenario is worthy of LogWarning, we have to be careful to avoid a log spam attack. So for now LogDebug is fine.

Sjors approved

Sjors commented at 3:54 pm on August 7, 2024: member

tACK 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb

Stalling fix

I tested (rebased on master) that without the second commit, when loading a testnet3 snapshot, and only having outbound peers, many peers keep downloading background sync blocks. A few newly connected peers do contribute to the tip.

With the second commit synced_blocks jumps to the snapshot height. Once the tip is reached (takes a while due to the loppocalypse and because the snapshot is a year old), and background sync needs to take over, synced_blocks does not jump back to the background sync height. However the inflight blocks are in the expected range for background sync.

Reorg fix

I tested the reorg scenario as well. First I invalidated block 2,870,000 near the tip and CPU mined a couple of blocks (by messing with the timestamp). I then made a snapshot for it which I hardcoded in the chain params.

I then spun up a fresh node, and first connected it to the previous node to get the right header. I then connected it to random nodes to find the real chain. Trying to load the fake snapshot runs into #30320, so I reverted that to simulate the scenario of a reorg that happens during snapshot handling.

With this PR I get the Not downloading blocks from peer log messages. The sync completely stalls, and their synced_blocks stays far below the snapshot height. After a few minutes it finds new outbound peers, but for those synced_blocks and synced_headers stays -1.

When I connect to my stale node the tip updates. The background sync then continues. Once that completes (I had to restart due to an unrelated crash) it winds back for a reorg, but fails:

02024-08-07T14:24:12Z [error] ReadBlockFromDisk: OpenBlockFile failed for FlatFilePos(nFile=-1, nPos=0)
12024-08-07T14:24:12Z [error] DisconnectTip(): Failed to read block
22024-08-07T14:24:12Z [error] A fatal internal error occurred, see debug.log for details: Failed to disconnect block.
32024-08-07T14:24:12Z [error] ProcessNewBlock: ActivateBestChain failed (Failed to disconnect block.)

It’s a pruned node, but that’s not the reason - it simply doesn’t have the undo data yet for blocks below the snapshot. Restarting just causes it to stop again. Certainly better than an assert. A followup PR could nuke the chainstate_snapshot dir in this scenario. Doing so manually makes it continue.

Reorg behaviour before this PR

Without the commits here, once the snapshot is loaded it stays at the snapshot height for a while. Both the slate and real chaintips are headers-only. This is expected stalling behavior.

I then restart the node. The chaintips remain headers-only for a while. Once I manually connect it to my stale node, it immediately syncs the fake tip. It doesn’t attempt to reorg.

Worse, when I restart the node it won’t do any background sync until I connect it to the node with the fake chaintip. It would only download from that peer.

When background sync finishes I get the same behaviour: it tries to reorg and fails to disconnect block 2,870,000.

Is there another crash you were referring to? The first commit message says:

We’d also crash trying to reorg because this scenario is not handled.

That could be clarified a bit. The scenario still isn’t handled, but we fail a bit later - and not with an assert.

Update: I do notice a difference (improvement). Before this PR after I delete the snapshot chainstate, it just syncs from the snapshot height to the tip - completely forgetting that it still needs to do a background sync. I’m a bit puzzled by this difference.

mzumsande commented at 5:58 pm on August 7, 2024: contributor

Is there another crash you were referring to? The first commit message says: We’d also crash trying to reorg because this scenario is not handled.

I didn’t mean an assert with that, just a shutdown with a fatal error. I tested this in a functional test, if I recall correctly the following happened:

After loading the snapshot, I would connect a peer with a higher-work tip branching off below the snapshot height. On master we’d download blocks from this peer, and as soon as we’d have a an alternative block with more work in setBlockIndexCandidates, we would try to reorg to that (below the snapshot height) - but shut down with a fatal error because the undo data wasn’t available.

Sjors commented at 11:18 am on August 8, 2024: member

What you recall sounds similar to what still happens. But it seems this PR makes recovery from that failure easier (just delete the snapshot dir). I just don’t fully understand why and it’s very time consuming to reproduce on testnet3.

achow101 added this to the milestone 28.0 on Aug 8, 2024

achow101 commented at 9:31 pm on August 9, 2024: member

ACK 49d569cb1fdd62a9da8dff51dccaf4680fe3d0eb

achow101 merged this on Aug 9, 2024

achow101 closed this on Aug 9, 2024

mzumsande deleted the branch on Aug 12, 2024

Johnnygatvol commented at 2:30 pm on August 13, 2024: none

I need to know where my money is going

vijaydasmp referenced this in commit 9bac31c593 on Apr 1, 2025

Fabcien referenced this in commit 67ce874e3f on Aug 11, 2025

Fabcien referenced this in commit 26514f6ff6 on Aug 11, 2025

bitcoin locked this on Aug 13, 2025

p2p: For assumeutxo, download snapshot chain before background chain #29519

Code Coverage

Reviews

Stalling fix

Reorg fix

Reorg behaviour before this PR