[RFC] CChain Concurrency Improvement (Base + Tail Architecture)

alexanderwiederin commented at 8:44 pm on January 27, 2026: none

RFC: CChain Concurrency Improvement

This PR implements a copy-on-write based CChain design to enable concurrent access.

Motivation

Current CChain uses a single vector protected by cs_main, creating bottlenecks for multi-threaded access. This PR lays the groundwork for removing cs_main locks by enabling consistent snapshots without holding locks.

Design

See full design document: CChain Concurrency (Out of date after mutex removal - will be updated soon)

TL;DR: Split chain into base (bulk of history) + tail (recent ~1,000 blocks). Use nested stlab::copy_on_write so updates copy only tail, keeping base shared.

Thanks to @purplekarrot for pointing me to stlab::copy_on_write and making design suggestions.

Changes

Implements copy-on-write based CChain with value semantics
Removes cs_main locks from RPC methods (where possible)
All validation locks remain in place (cs_main still held during validation)
API unchanged - backward compatible

Request for Comments

Is the nested copy-on-write architecture sound?
Is MAX_TAIL_SIZE = 1000 reasonable?
What are the concerns for future lock removal?
Is the added complexity worth the benefit?

DrahtBot commented at 8:44 pm on January 27, 2026: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34424.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	ismaelsadeeq

If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#34440 (Refactor CChain methods to use references, tests by optout21)
#32427 ((RFC) kernel: Replace leveldb-based BlockTreeDB with flat-file based store by sedited)
#29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

required -> requires [Subject-verb agreement: “Concurrent access … required external synchronization.” should read “Concurrent access … requires external synchronization.” to be grammatically correct.]

2026-01-30 18:32:27

alexanderwiederin renamed this:
~~[RFC] Chain Concurrency Improvement~~
[RFC] Chain Concurrency Improvement (Base + Tail Architecture)
on Jan 27, 2026

alexanderwiederin renamed this:
~~[RFC] Chain Concurrency Improvement (Base + Tail Architecture)~~
[RFC] CChain Concurrency Improvement (Base + Tail Architecture)
on Jan 27, 2026

alexanderwiederin force-pushed on Jan 28, 2026

in src/validation.h:1153 in 6b4db96405 outdated

1152@@ -1153,6 +1153,10 @@ class ChainstateManager
1153     //! @{

ismaelsadeeq commented at 1:48 pm on January 28, 2026:

In “validation: add ActiveChainSnapshot() method” 6b4db9640550b8f4cacc8debdb6eaf5a941885c9

nit: In commit message body s/acwuires/acquires

alexanderwiederin commented at 4:27 pm on January 28, 2026:

Thanks!

in src/util/copy_on_write.h:2 in 490324dd7a outdated

0@@ -0,0 +1,260 @@
1+// Copyright (c) 2013 Adobe
2+// Copyright (c) 2025-present The Bitcoin Core developers

ismaelsadeeq commented at 1:52 pm on January 28, 2026:

In “util: add copy-on-write smart pointer” 490324dd7a07dfc44bfbc06b57b0c36df29581eb

Is this a direct copy? Or has modifications, perhaps add a link in the commit message to the reference implementation.

I also think we should have a basic unit test for this utility.

alexanderwiederin commented at 4:33 pm on January 28, 2026:

It’s practically a direct copy. I removed the comments, documentation and deprecated fields and methods to keep it minimal. I also left a link to https://github.com/stlab/stlab-copy-on-write on the file header and just added Original source: https://github.com/stlab/stlab-copy-on-write/tree/abb4445 to the commit message.

Agree on the unit tests for this!

in src/chain.cpp:22 in be06d774f3

18@@ -15,12 +19,25 @@ std::string CBlockIndex::ToString() const
19 
20 void CChain::SetTip(CBlockIndex& block)
21 {
22-    CBlockIndex* pindex = &block;
23-    vChain.resize(pindex->nHeight + 1);
24-    while (pindex && vChain[pindex->nHeight] != pindex) {
25-        vChain[pindex->nHeight] = pindex;
26-        pindex = pindex->pprev;
27+    std::lock_guard<std::mutex> lock(m_write_mutex);

ismaelsadeeq commented at 1:53 pm on January 28, 2026:

In “refactor: Implement copy-on-write semantics for CChain” be06d774f34d05c7089fc388f7407331b54eb46a

Why are you not using our LOCK macro?

alexanderwiederin commented at 4:27 pm on January 28, 2026:

Implemented LOCK now. Thanks!

in src/validation.h:1156 in 6b4db96405 outdated

1152@@ -1153,6 +1153,10 @@ class ChainstateManager
1153     //! @{
1154     Chainstate& ActiveChainstate() const;
1155     CChain& ActiveChain() const EXCLUSIVE_LOCKS_REQUIRED(GetMutex()) { return ActiveChainstate().m_chain; }
1156+    CChain ActiveChainSnapshot() const {

ismaelsadeeq commented at 2:00 pm on January 28, 2026:

In “validation: add ActiveChainSnapshot() method” 6b4db9640550b8f4cacc8debdb6eaf5a941885c9

It reduces lock contention, indeed, but one downside is that it can be inconvenient for someone who just wants to know, e.g., the chain tip to get height or hash in a concurrent way. They have to copy the whole chain.

We should also support copying the chain tip, and returning the latest block index in the chain, not copying the whole chain.

ismaelsadeeq commented at 2:22 pm on January 28, 2026:

Of course, they can use the old ActiveChain, lock and make a copy, but in my opinion, we should strive to make the locking of cs_main internal to ChainstateManager and simply return copies of these values. Future code should only handle locks directly in performance-critical paths where making a copy is too expensive, or in synchronous contexts where the caller already holds the lock.

janb84 commented at 2:25 pm on January 28, 2026:

For this RFC we chose to do a limited feature set. I agree that the end goal is something like you are describing.

alexanderwiederin commented at 9:15 am on January 29, 2026:

@ismaelsadeeq let me think about the best approach here. Thanks for the feedback.

alexanderwiederin commented at 4:11 pm on January 30, 2026:

@ismaelsadeeq In the last push I refactored ActiveTip and ActiveHeight to use the snapshots and deduce the values from there. To reiterate, taking a copy of the chain is cheap due to copy-on-write - the underlying data is reference-counted. ActiveTip and ActiveHeight no longer require holding cs_main.

I’ve also removed the write_mutex on the CChain class in my last push to lean more into value semantics. The idea is that we treat the chain similar to how we would treat a std::vector. It’s not internally thread-safe, and we expect the owner to synchronise.

I’ve also added documentation to CChain describing the performance characteristics and copy-on-write behaviour.

ismaelsadeeq commented at 2:02 pm on January 28, 2026: member

Concept ACK on this.

Looks simpler than I thought.

A few comments after a quick skim through the PR

darosior commented at 3:35 pm on January 28, 2026: member

Design

See full design document: CChain Concurrency

In the design section you describe the initial state with a single block in the tail, but a merge operation empties it. It seems nicer for those to be consistent?

Is MAX_TAIL_SIZE = 1000 reasonable?

I guess you could go an order of magnitude lower and match the coinbase maturity, if we need to rewrite the chain this deep we have bigger problems than a large amount of copies?

util: add copy-on-write smart pointer

Introduce copy-on-write wrapper from stlab library to enable efficient
value semantics with shared ownership. Uses atomic reference counting to
share instances and only copies on modification when non-unique.

Based on Adobe's stlab library, distributed under Boost Software License
1.0

Original source: https://github.com/stlab/stlab-copy-on-write/tree/abb4445

Co-authored-by: janb84 <608446+janb84@users.noreply.github.com>

bca4994478

alexanderwiederin force-pushed on Jan 28, 2026

alexanderwiederin commented at 11:18 am on January 29, 2026: none

Appreciate the feedback @darosior!

Is MAX_TAIL_SIZE = 1000 reasonable?

I guess you could go an order of magnitude lower and match the coinbase maturity, if we need to rewrite the chain this deep we have bigger problems than a large amount of copies?

I think the trade-off for MAX_TAIL_SIZE is primarily about performance during normal sequential operation rather than rewrites.

Assuming the tail is copied on every AppendToTail and the base is copied on every MergeTailIntoBase (the conservative case with copy-on-write), the question becomes: should we optimise for cheap AppendToTail operations (smaller tail) or infrequent MergeTailIntoBase operations (larger tail)?

The key insight is that MergeTailIntoBase copies the entire base, so its cost scales with chain length. Therefore:

Longer chains benefit from larger MAX_TAIL_SIZE (fewer expensive base copies)
Shorter chains benefit from smaller MAX_TAIL_SIZE (cheaper tail copies)

So far, the change has only resulted in a negligible IBD performance reduction, but I think we should pay attention to this parameter.

alexanderwiederin force-pushed on Jan 30, 2026

sipa commented at 4:11 pm on January 30, 2026: member

Is there actually a need for stable CChain objects that live for non-trivial amounts of time?

If not, it seems a simpler alternative might be to just give CChain its own internal non-exposed mutex - allowing access to it without holding cs_main?

refactor: implement copy-on-write semantics for CChain

Refactor CChain into a regular type with copy-on-write semantics, using
a split base/tail architecture to enable snapshots.

Key changes:
- Make CChain a regular type supporting copy construction/assignment
- Introduce COW Impl struct holding base (COW vector) and tail (mutable
  vector)
- Optimize sequential appends by accumulating in tail until
  MAX_TAIL_SIZE
- Merge tail into base only when size threshold is reached
- Handle reorgs by rebuilding base and clearing tail
- Update all methods (FindFork, FindEarliestAtLeast, etx.) for split
  storage

This allows creating chain snapshots via simple copy while sharing the
bulk of the data. Copies only occur on modification.

Co-authored-by: janb84 <608446+janb84@users.noreply.github.com>

550204e5db

validation: add ActiveChainSnapshot() method

Add ActiveChainSnapshot() to ChainstateManager to enable lock-free
access to the active chain. Unlike ActiveChain() which requires holding
cs_main, this methods acquires the lock internally and returns a
copy-on-write snapshot of the chain.

Refactor ActiveTip() and ActiveHeight() to use ActiveChainSnapshot()
instead of ActiveChain(), removing the EXCLUSIVE_LOCKS_REQUIRED
annotation. This allows these methods to be called without holding
cs_main, as they now manage locking internally.

This reduces lock contention by allowing RPC methods and other consumers
to work with a consistent view of the chain without holding cs_main for
extended periods.

Co-authored-by: janb84 <608446+janb84@users.noreply.github.com>

342ae66371

rpc: use chain snapshot in mining RPC

Replace cs_main-locked chain access with ActiveChainSnapshot() in
getnetworkhashps to reduce lock contention.

94c21391c2

rpc: use chain snapshots in blockchain RPCs

Replace cs_main-locked chain access with ActiveChainSnapshot(),
ActiveTip() and ActiveHeight() in blockchain RPC methods to reduce
lock contention. These methods acquire cs_main internally and provide a
consistent view of the chain without requiring the caller to hold
cs_main throughout.

For RPCs that also need BlockIndex lookups, cs_main is still acquired
but only for the BlockIndex access, not for chain queries.

Updated RPCs:
- getblock, getblockheader: ActiveTip() for tip, cs_main for block lookup
- getchaintips: ActiveChainSnapshot() for active chain, cs_main for
  BlockIndex iteration
- getblockcount, getbestblockhash: ActiveHeight() only no cs_main needed
- geblockhash, getdifficulty: ActiveChainSnapshot() via
  ParseHashOrHeight
- pruneblockchain: ActiveHeight() for height calculation, cs_main for
  pruning
- getchaintxstats: ActiveChainSnapshot() for tip and Contains check

711b3f4647

alexanderwiederin force-pushed on Jan 30, 2026

alexanderwiederin commented at 1:00 pm on February 2, 2026: none

Is there actually a need for stable CChain objects that live for non-trivial amounts of time?

Based on anecdotal experience, I believe yes. From what I understand this could be interesting for: a) RPC users in the context of mining and b) kernel API consumers for use cases like indexing, analytics and monitoring.

But I acknowledge this needs a more concrete answer.

In the meantime, @sedited and/or @Sjors, do you want to chime in?

sipa commented at 1:21 pm on February 2, 2026: member

I believe that due to the fact that CBlockIndex::GetAncestor() is fast and usable without cs_main (or any locking), most if not all uses of CChain that need a stable view can be rewritten to just query the highest-height block they need, and then use CBlockIndex::GetAncestor() from there to get to earlier blocks.

CBlockIndex::GetAncestor() is less efficient than CChain::operator[], but I don’t think the difference is sufficient to really matter for any uses.

I haven’t tried this, or dug into what callers would need to change for this, so feel free to find counterexamples. But absent that, my thinking at a high level is that the CBlockIndex pprev/pskip structure effectively already allows lock-free access, and CChain is mostly an optimization that can be used when one does have access to cs_main (or whatever lock protects the chain structures). If some of those call sites need reduction in their locking, the first thing to try would be to switch them over to CBlockIndex::GetAncestor().

sedited commented at 1:38 pm on February 2, 2026: contributor

Re #34424 (comment)

Is there actually a need for stable CChain objects that live for non-trivial amounts of time?

There are a number of rpc calls where a stable CChain object is held for some amount of time, .e.g. getchaintips might do multiple deep lookups and traversals down to various fork points in the chain. Granted, none of these currently benefit from this change, because they currently also require a lock to access the block index, though that might be easier to workaround after this.

I see this change as a first step towards moving some other data structures into a similar pattern and cleaning up our locking annotations for the chain and block index data structures (for example making the pprev pointer const).

What might be less compelling for developers in this codebase is that the approach here arguably provides an API to the chain that is fairly easy to reason about for external, or rather kernel library, users.

sipa commented at 1:43 pm on February 2, 2026: member

I certainly have no objections to improving locking annotations, and making locking cleaner. But in this case, I’m really not convinced: if you need lock-free access, just don’t use CChain. It’s a precomputed index-by-height for O(1) access rather than the O(log^2 n) access that CBlockIndex::GetAncestor() provides, but at the cost of needing synchronization. If that synchronization is burdensome, just don’t use it.

Specifically for getchaintips, you can get the current active tip once at the beginning, and then instead of active_chain.Contains(block), use active_tip.GetAncestor(block->nHeight) == block.

sedited commented at 2:25 pm on February 2, 2026: contributor

re #34424 (comment)

It’s a precomputed index-by-height for O(1) access rather than the O(log^2 n) access that CBlockIndex::GetAncestor() provides, but at the cost of needing synchronization

Mmh, going by its prevalence in the current code base it seems like developers would rather have random O(1) access than forego synchronization :P

To provide a little more background, this topic was repeatedly brought up during review of the kernel api, where it was repeatedly requested to provide optimized, indexed, random access to entries in the block map without the need for synchronization. If there is no greater appetite for that here, then I guess that should be abandoned. If developers really require repeated, fast, random height queries, they can also build such an index themselves.

I do agree though that in most cases in our code a single query to the tip (that could be protected by its own lock as per your suggestion), and then a lookup to GetAncestor is probably fast enough. I’m curious how you would weigh between needing synchronization and doing a quick lookup?

sipa commented at 2:41 pm on February 2, 2026: member

Mmh, going by its prevalence in the current code base it seems like developers would rather have random O(1) access than forego synchronization :P

Maybe, or that place in the code held a lock already, so there was no reason not to use CChain.

Maybe we should investigate what would be required to drop CChain entirely. Maybe it’s just fine. Maybe it needs a few tweaks to some algorithms (like sorting certain queries by height to minimize the distances fed to GetAncestor). But if that were possible, without unacceptable performance degradations, then I think it’d be useful evidence that there is no need to expose an unsynchronized CChain - or even to actually remove it internally.

sipa commented at 3:27 pm on February 2, 2026: member

There do seem to be a lot of places where CChain::Next() is called, and I’m not sure that for all of those the overhead of a tip.GetAncestor(block->nHeight + 1) would be acceptable.

Still, maybe it’s possible to add an internal mutex to CChain, which is only held during its own function calls. This would allow exposing “get at given height in active chain”, “get successor in main chain”, “find fork point with active chain”, “check if in main chain” functions, which don’t need external synchronization (so no cs_main needed), but also lack stability. Anything that needs stability can use the GetAncestor alternative. It wouldn’t surprise me that this is sufficient for our own use cases.

alexanderwiederin commented at 9:57 pm on February 3, 2026: none

Your suggestions make sense - since CChain usage is mostly incidental to other reasons for holding cs_main, optimising CChain operations is misplaced. Simpler alternatives like GetAncestor or an internal mutex should suffice for chain access.

I’ll reconsider the approach with this in mind - thanks for taking the time!

alexanderwiederin closed this on Feb 5, 2026

[RFC] CChain Concurrency Improvement (Base + Tail Architecture) #34424