Context and Motivation
Whenever we modify caching behavior (optimizations, refactors, new features, additional metrics), a common concern is: “Sweet, but have you tested it via a reorg?!”
We already have tests covering basic reorg scenarios (feature_block.py, p2p_unrequested_blocks.py, feature_pruning.py). However, we’re lacking a macro regression test suite that systematically verifies Bitcoin Core behavior against historical mainnet reorg events, especially across releases, different configurations (pruned, txindex, varying memory settings), or with random undo/redo cycles.
Proposal: Historical Reorg Macro Test Suite
The goal is to create a robust regression test ensuring the latest Bitcoin Core handles historical reorgs identically to when they originally occurred. This would increase confidence that new versions do not introduce regressions in complex reorg and undo/redo logic, especially when modifying sensitive code paths. While partially covered by existing synthetic tests, this proposal is a more heavyweight alternative, using real historical blocks, performing a full IBD, and explicitly checking for behavior changes related to reorgs.
Making sure mainnet behavior is retained is critical, but we might as well extend this to cover testnet behavior too (which is a lot more volatile anyway).
Dedicated Historical Stale Block Proxy
Leveraging the existing bitcoin-data/stale-blocks dataset, which currently contains over 200 real historical stale blocks, we propose:
- A dedicated fake node (“stale-block proxy”) that replays historical mainnet headers and blocks exactly as originally observed. We can’t have reorgs during IBD otherwise, but this way we can simulate the ones that actually happened.
- The node under test would exclusively connect to this proxy node.
- The proxy sequentially presents each historical stale block as a temporary chain tip (once the stale block is reached, the proxy moves on to the next available stale block, routing real blocks via the network), forcing the test node into realistic mainnet reorg conditions during a full IBD.
- Once we reach a given height, we could validate the resulting UTXO set against known AssumeUTXO hashes.
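The serving order described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `Block`, `presentation_order`, and `count_reorgs` are hypothetical names, block height stands in for accumulated work, and the simulated node keeps the first-seen block among equal-height candidates. A real proxy would speak the P2P protocol and serve actual block data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Block:
    # Hypothetical minimal block record; real blocks carry far more data.
    height: int
    block_hash: str
    prev_hash: str


def presentation_order(
    active_chain: List[Block],
    stale_by_height: Dict[int, List[Block]],
) -> Iterator[Tuple[str, Block]]:
    """Yield ("stale" | "active", block) in the order the proxy serves them.

    At each height the stale block is served first, so the node under test
    adopts it as a temporary tip before the active chain overtakes it.
    """
    for block in sorted(active_chain, key=lambda b: b.height):
        for stale in stale_by_height.get(block.height, []):
            yield ("stale", stale)
        yield ("active", block)


def count_reorgs(order: Iterable[Tuple[str, Block]]) -> int:
    """Simulate a node that switches tip only to a strictly higher block.

    Assumption: height approximates work, and ties keep the first-seen tip.
    A reorg is counted whenever the new tip does not extend the old one.
    """
    tip = None
    reorgs = 0
    for _, block in order:
        if tip is None or block.height > tip.height:
            if tip is not None and block.prev_hash != tip.block_hash:
                reorgs += 1
            tip = block
    return reorgs


# Toy data: a four-block active chain with one stale block at height 2.
active = [Block(h, f"a{h}", f"a{h - 1}") for h in range(4)]
stale = {2: [Block(2, "s2", "a1")]}
order = list(presentation_order(active, stale))
```

With this toy data the node first adopts `s2` as its tip, ignores `a2` (same height, seen later), and reorgs once `a3` arrives, which is the pattern the proxy is meant to reproduce for each historical stale block.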
Key Testing Scenarios:
- Perform full initial block download (IBD) against the stale-block proxy, ensuring natural and historically accurate chain reorgs.
- Test various node configurations explicitly:
  - Default setup
  - Pruned nodes
  - Nodes with small and large dbcache memory allocations
  - Nodes running with txindex=1
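A possible way to express this configuration matrix is sketched below. The proxy address, the `CONFIGS` names, and the specific `-prune` and `-dbcache` values are illustrative assumptions, not part of the proposal; `-connect` is used so the node talks only to the proxy. Since pruning and txindex cannot be enabled together, the sketch keeps them as separate configurations and rejects the combination.

```python
from typing import Dict, List

# Hypothetical address where the stale-block proxy would listen.
PROXY_ADDR = "127.0.0.1:18555"

# Illustrative values only; the proposal does not fix concrete settings.
CONFIGS: Dict[str, List[str]] = {
    "default": [],
    "pruned": ["-prune=550"],
    "small_dbcache": ["-dbcache=4"],
    "large_dbcache": ["-dbcache=4096"],
    "txindex": ["-txindex=1"],
}


def check_compatible(args: List[str]) -> None:
    """Reject argument sets that enable both pruning and txindex."""
    pruned = any(a.startswith("-prune=") and a != "-prune=0" for a in args)
    txindex = "-txindex=1" in args
    if pruned and txindex:
        raise ValueError("pruning and txindex cannot be combined")


def build_commands(proxy_addr: str = PROXY_ADDR) -> Dict[str, List[str]]:
    """Return one bitcoind command line per configuration under test."""
    commands = {}
    for name, extra in CONFIGS.items():
        check_compatible(extra)
        # -connect restricts the node to the given peer (the proxy).
        commands[name] = ["bitcoind", f"-connect={proxy_addr}", *extra]
    return commands


commands = build_commands()
```

Each entry in `commands` could then be handed to whatever harness launches the node for a full IBD run against the proxy.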
Additional Randomized Undo/Redo Testing:
In addition to historical scenarios, randomly trigger smaller undo/redo reorg cycles at various block heights to further stress-test UTXO consistency, using CoinsTip::SanityCheck() for validation before and after each reorg (confirming that undoing and reapplying a block results in the same state).
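The invariant being checked can be illustrated with a toy model. This is a sketch only: `ToyBlock`, `connect`, `disconnect`, and `run_undo_redo_cycles` are hypothetical names, the UTXO set is a plain dictionary, and a snapshot comparison stands in for the real sanity check. It does not reflect how Bitcoin Core stores coins or undo data.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

Outpoint = Tuple[str, int]      # (txid, output index)
UtxoSet = Dict[Outpoint, int]   # outpoint -> amount


@dataclass
class ToyBlock:
    spends: List[Outpoint]          # outpoints that existed before the block
    creates: Dict[Outpoint, int]    # new outpoints added by the block


def connect(utxos: UtxoSet, block: ToyBlock) -> UtxoSet:
    """Apply a block and return the undo data (the coins it spent)."""
    undo = {op: utxos.pop(op) for op in block.spends}
    utxos.update(block.creates)
    return undo


def disconnect(utxos: UtxoSet, block: ToyBlock, undo: UtxoSet) -> None:
    """Revert a block: drop its outputs and restore the coins it spent."""
    for op in block.creates:
        del utxos[op]
    utxos.update(undo)


def run_undo_redo_cycles(seed: int = 0, n_blocks: int = 50,
                         max_depth: int = 5,
                         cycle_probability: float = 0.3) -> int:
    """Build a random toy chain, undoing and redoing tips along the way.

    Returns the number of undo/redo cycles performed. Raises AssertionError
    if any cycle leaves the UTXO set different from the snapshot taken
    before it.
    """
    rng = random.Random(seed)
    utxos: UtxoSet = {("genesis", 0): 50}
    chain: List[Tuple[ToyBlock, UtxoSet]] = []
    cycles = 0
    for height in range(n_blocks):
        candidates = sorted(utxos)
        spends = rng.sample(candidates, k=rng.randint(0, min(2, len(candidates))))
        creates = {(f"tx{height}", i): rng.randint(1, 100)
                   for i in range(rng.randint(1, 3))}
        block = ToyBlock(spends, creates)
        chain.append((block, connect(utxos, block)))

        if rng.random() < cycle_probability:
            depth = rng.randint(1, min(max_depth, len(chain)))
            before = dict(utxos)
            # Undo from the tip downwards, then reapply in original order.
            for blk, undo in reversed(chain[-depth:]):
                disconnect(utxos, blk, undo)
            for i in range(len(chain) - depth, len(chain)):
                blk = chain[i][0]
                chain[i] = (blk, connect(utxos, blk))
            assert utxos == before, "undo/redo changed the UTXO set"
            cycles += 1
    return cycles
```

Seeding the generator keeps any failure reproducible, which matters if such cycles are interleaved with a long IBD run.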
RFC / Questions:
- Should this be an optional functional test (run periodically, monthly, pre-release), or triggered automatically via a GitHub label for relevant PRs?
- Are there any additional scenarios or configurations we should consider?
- How could we gather more historical stale blocks for our dataset? Do we even have data for reorgs deeper than one block?