This PR relies on PR #34625
Summary
This PR adds TryLinearizeChain(), an O(N) specialised linearizer for chain-shaped
clusters, and a benchmark targeting the feerate distribution that triggers the worst-case
behaviour of the general SFL algorithm on chains.
Commits:
ad15466be1e4cluster_linearize: add O(N) fast path for chain-shaped clusters72bf0faf820ccluster_linearize: benchmark Linearize on worst-case chain feerate distribution
Motivation
Instrumentation added to a running node revealed that virtually most cluster seen in the mempool is chain-shaped — each transaction spends an output of the previous one. This is consistent with widespread use of CPFP (Child-Pays-For-Parent) fee bumping, where a low-feerate parent transaction is paired with a high-feerate child that spends one of its outputs, forming a two-transaction chain; longer chains arise from repeated CPFP stacking or batch-payment patterns.
Because chain clusters dominate in practice, a topology-specific O(N) fast path has disproportionately large impact compared with optimising for rarer topologies.
The general Linearize() function uses the Spanning Forest Linearization (SFL) algorithm, whose
MakeTopological() phase merges chunks iteratively. For a chain cluster with N transactions
and strictly increasing individual feerates (fee_i = i+1), MakeTopological triggers a
cascade that merges the entire chain into one chunk step-by-step. At merge step k, two
operations each scan the growing top-part chunk of size k:
UpdateChunkiterates over all k transactions to write theirchunk_rep(unconditionally, regardless of any inner-loop conditions).MergeChunksiterates over all k transactions to locate the dependency connecting top and bottom chunks.
This accumulates O(1+2+…+(N-1)) = O(N²) loop iterations across the full cascade. An O(N) specialised path avoids the SFL overhead entirely for chain-shaped clusters.
Algorithm
Detection (TryLinearizeChain)
A cluster with N transactions is a chain if and only if its ancestor-set sizes form exactly
{1, 2, …, N}. This is checked in O(N) using a boolean presence array;
Ancestors(i).Count() is O(1) for the fixed-size BitSet<64> used here.
Topological order
The node with k ancestors occupies position k−1 in the unique topological order. Recovered in O(N) by a single mapping pass.
Why PostLinearize is not needed
The topological order of a chain cluster is unique, and every adjacent pair has a dependency,
so ChunkLinearizationInfo’s unconditional merge and PostLinearize’s merge/swap behave
identically (the swap branch never triggers). The topological order itself is the optimal
linearization, and PostLinearize can be safely skipped.
The result is returned early from Linearize() before the SFL machinery is initialised:
0// In Linearize():
1if (auto chain_lin = TryLinearizeChain(depgraph); !chain_lin.empty()) {
2 return {std::move(chain_lin), /*optimal=*/true, /*cost=*/0, /*is_chain=*/true};
3}
The fourth return value (is_chain) allows the caller (Relinearize()) to skip the
PostLinearize() pass, which is redundant for chains: TryLinearizeChain already
produces an optimal, connected linearization.
Benchmark results
LinearizeOptimallyMonotoneChainTotal — increasing feerates (SFL worst case)
MakeMonotoneChainGraph builds a chain with fee_i = i+1, size_i = 1. For the SFL
algorithm, every prefix of the chain has feerate (k+1)/2 (strictly increasing with length),
so MakeTopological must merge the entire chain into a single chunk through N−1 steps.
Each step scans the growing top-part chunk (via UpdateChunk’s unconditional chunk_rep
write loop and MergeChunks’s dep-search loop), accumulating O(N²) iterations in total.
TryLinearizeChain eliminates this entirely.
| Size | Before (ns/op) | After (ns/op) | Speedup |
|---|---|---|---|
| 9 tx | 973 | 48 | 20× |
| 16 tx | 1,772 | 72 | 25× |
| 32 tx | 3,880 | 129 | 30× |
| 48 tx | 6,205 | 185 | 34× |
| 63 tx | 8,551 | 238 | 36× |
Instructions per call after optimisation:
| N | Total instructions | Instructions/tx |
|---|---|---|
| 9 | 954 | 106 |
| 16 | 1,360 | 85 |
| 32 | 2,295 | 72 |
| 48 | 3,230 | 67 |
| 63 | 4,100 | 65 |
The “after” timings confirm that TryLinearizeChain runs in O(N) regardless of feerate
distribution (~65–106 instructions per transaction at N=9..63, constant converging as N
grows). The speedup grows with N because the baseline trends towards O(N²) while the
optimised path remains O(N).