This one replaces #34643
Summary
This PR adds an optimized ChainClusterImpl for chain-shaped transaction topologies (A→B→C→D…). It replaces the generic DepGraph-based cluster with a compact linear representation (m_txdata) and a backward-absorbing chunk algorithm, reducing memory and improving performance for common chain patterns.
Key changes
m_maybe_chainflag onGroupEntryto hint at chain-shaped mergesChainClusterImplwithm_txdata(chain order) andm_split_segments(contiguous segments after removals)TryChainMergeinApplyDependenciesto detect and merge chains via the fast pathApplyRemovals/Splitcorrectly split chains when middle transactions are removed (no spurious dependencies)- Removal of
m_linearizationfromChainClusterImpl(always[0,1,…,n-1]) ComputeChunks()helper to centralize the backward-absorbing chunk logicSetFee/Relinearizefix: downgrade toNEEDS_RELINEARIZEon fee change to avoid dereferencing destroyed Refs;Relinearizepromotes back to OPTIMAL and recomputes chunks
Testing
- Unit tests:
txgraph_chain_merge(TryChainMerge),txgraph_chain_split_on_removal(ApplyRemovals/Split) - Benchmarks: ChainCluster merge and BlockBuilder paths
Trace replay comparison
Trace recording (TracingTxGraph) and replay (txgraph-replay, analyze_trace.py) code: git@github.com:HowHsu/bitcoin.git branch before_chaincluster. Usage guide: TxGraph Trace & Replay.
The trace file is on github LFS: https://howhsu.github.io/thinking-in-bitcoin/data/txgraph.trace.4.5days.final.xz
Same trace replayed on baseline vs ChainCluster branch. Parameters: max_cluster_count=64, max_cluster_size=404000, acceptable_cost=75000. Total ops: 71,313,272.
| Entry point | Baseline Total (μs) | ChainCluster Total (μs) |
|---|---|---|
| DoWork | 3,595,581 | 1,297,506 |
| CompareMainOrder | 1,445 | 1,531 |
| GetAncestors | 1 | 0 |
| GetDescendants | 8,543 | 8,361 |
| CountDistinctClusters | 575 | 2,474 |
| GetMainMemoryUsage | 157 | 156 |
| GetMainStagingDiagrams | 35,428 | 15,969 |
| IsOversized | 31,732 | 108,050 |
| StartStaging | 128 | 87 |
| AbortStaging | 0 | 0 |
| CommitStaging | 9,300 | 10,777 |
| TOTAL | 3,682,890 | 1,444,911 |
Mutations (not timed): ADD_TX 1,764,814, REMOVE_TX 33,050, ADD_DEP 791,630, SET_FEE 0, UNLINK_REF 1,764,814.
Trace analysis
analyze_trace.py on the same trace file (/tmp/txgraph.trace).
Note on oversized clusters: Clusters with size > 64 appear because the trace captures graph state at discrete snapshots (e.g. after each CommitStaging). Split is triggered lazily by DoWork/ApplyDependencies; at snapshot time, some clusters may not yet have been processed and thus exceed max_cluster_count.
Note on peak state: Peak state is the snapshot with the most transactions (27,512), not necessarily the one with the most chain clusters.
Trace format version: 1
Parameters: max_cluster_count=64 max_cluster_size=404000 acceptable_cost=75000
Processed 71313273 operations, 1763516 CommitStagings
Peak mempool size: 27512 transactions (at op [#51503379](/bitcoin-bitcoin/51503379/))
Unstaged mutations: 0 ADD_TX, 0 ADD_DEP, 0 REMOVE_TX, 1764814 UNLINK_REF
Edge sanity check (final state): OK - all edges reference live transactions
Edge sanity check (peak state): OK - all edges reference live transactions
Peak state (27,512 transactions, 8,596 clusters):
| Size | Clusters | Chains | Non-chain | Chain% |
|---|---|---|---|---|
| 1 | 7,628 | 7,628 | 0 | 100.0% |
| 2 | 575 | 575 | 0 | 100.0% |
| 3 | 103 | 72 | 31 | 69.9% |
| 4 | 75 | 58 | 17 | 77.3% |
| 5 | 21 | 18 | 3 | 85.7% |
| 6 | 9 | 4 | 5 | 44.4% |
| 7 | 12 | 9 | 3 | 75.0% |
| 8 | 4 | 1 | 3 | 25.0% |
| 9 | 31 | 30 | 1 | 96.8% |
| 10 | 25 | 23 | 2 | 92.0% |
| 13 | 1 | 1 | 0 | 100.0% |
| 16 | 1 | 0 | 1 | 0.0% |
| 20 | 2 | 2 | 0 | 100.0% |
| 24 | 10 | 9 | 1 | 90.0% |
| 25 | 90 | 87 | 3 | 96.7% |
| 26 | 3 | 0 | 3 | 0.0% |
| 29 | 1 | 0 | 1 | 0.0% |
| 33 | 1 | 0 | 1 | 0.0% |
| 48 | 1 | 0 | 1 | 0.0% |
| 51 | 1 | 0 | 1 | 0.0% |
| 59 | 1 | 0 | 1 | 0.0% |
| 14464 | 1 | 0 | 1 | 0.0% |
| TOTAL | 8,596 | 8,517 | 79 | 99.1% |
- Transactions in chain clusters: 12,355 (44.9%)
- Transactions in non-chain clusters: 15,157 (55.1%)
- Transactions with dependencies: 19,884 (72.3%)
- Excluding size=1 clusters: 4,727 in chains (23.8%), 15,157 in non-chains (76.2%), of 19,884 txs
- 1 cluster of size 14,464 exceeds
max_cluster_count=64(oversized)
Final state (6,079 transactions, 4,665 clusters):
| Size | Clusters | Chains | Non-chain | Chain% |
|---|---|---|---|---|
| 1 | 4,269 | 4,269 | 0 | 100.0% |
| 2 | 285 | 285 | 0 | 100.0% |
| 3 | 68 | 44 | 24 | 64.7% |
| 4 | 24 | 9 | 15 | 37.5% |
| 5 | 6 | 4 | 2 | 66.7% |
| 6 | 4 | 1 | 3 | 25.0% |
| 7 | 3 | 0 | 3 | 0.0% |
| 8 | 1 | 0 | 1 | 0.0% |
| 10 | 1 | 0 | 1 | 0.0% |
| 26 | 1 | 0 | 1 | 0.0% |
| 54 | 1 | 0 | 1 | 0.0% |
| 286 | 1 | 0 | 1 | 0.0% |
| 481 | 1 | 0 | 1 | 0.0% |
| TOTAL | 4,665 | 4,612 | 53 | 98.9% |
- Transactions in chain clusters: 5,033 (82.8%)
- Transactions in non-chain clusters: 1,046 (17.2%)
- Transactions with dependencies: 1,810 (29.8%)
- Excluding size=1 clusters: 764 in chains (42.2%), 1,046 in non-chains (57.8%), of 1,810 txs
- 2 clusters (sizes 286, 481) exceed
max_cluster_count=64
The trace file is about 0.5 GB, I'll find a place to upload it soon.
Memory Analysis
As per @instagibbs 's comment, I make some memory tests, the results are in below article: https://howhsu.github.io/thinking-in-bitcoin/articles/chain-cluster-memory.en.html
No-chain mempool
As @instagibbs points out, mempool usage patterns may evolve over time. We should evaluate the impact of this optimization in the no-chain scenario (i.e., when there are no chains of size ≥ 3; chains of size 2 are typically CPFP cases). Below link is the comparison and results. https://howhsu.github.io/thinking-in-bitcoin/articles/chain-cluster-nochain-baseline.en.html In short, chaincluster is ~21% faster, which is most likely caused by size=2 chains.
Compare with PR #34643
https://howhsu.github.io/thinking-in-bitcoin/articles/chain-linearize-vs-chaincluster.en.html Compare with the previous chain linearize idea. In short, chaincluster branch is about 15.5% faster.
Overhead of IsOversized and CountDistinctClusters
The increase in overhead for these two components is caused by chunk computation in Updated. This logic was moved from DoWork to IsOversized and CountDistinctClusters as part of the chain cluster design. For more details, see: https://howhsu.github.io/thinking-in-bitcoin/articles/chain-isoversized-chunk-overhead.en.html
Bench Environment
The above results were obtained on the test machine.
Hardware
| Item | Value |
|---|---|
| CPU | Intel Core i5-13600KF (14 cores, 20 threads) |
| Memory | 32 GB |
Host OS
| Item | Value |
|---|---|
| OS | Ubuntu 24.04 LTS (Noble Numbat) |
| Docker | 28.4.0 |
Docker Container
| Item | Value |
|---|---|
| Base OS | Debian 12 (bookworm) |
| GCC | 12.2.0 |
| CMake | 3.25.1 |
Fuzz testing
| Check | Result |
|---|---|
| Crash files | None |
| Error log | No assert / ERROR / deadly signal |
| Workers | All 8 finished normally |
| Total runs | ~39.6M |
| Duration | 36001 s (~10 h) |
** Fuzz Environment**
| Item | Value |
|---|---|
| Host kernel | Linux 6.17.0-14-generic (Ubuntu) |
| Container OS | Debian GNU/Linux 12 (bookworm) |
| Architecture | x86_64 |
| Compiler | Clang 20.1.8 |
| Fuzzer | libFuzzer |
| Sanitizers | AddressSanitizer, UndefinedBehaviorSanitizer |
| Target | txgraph |
| Corpus | fuzz_corpus/txgraph |