Following up on #24760 ("The BlockIndex/BlockMap should not live in memory all the time"), which was closed as stale in 2024 with "feel free to open a new issue". In that thread the main open question was empirical: is the block index actually a meaningful share of memory, and is reducing it worth the complexity? Here are some measurements, plus a motivation that is sharper than the default-node case discussed in 2024: memory-constrained / low-power nodes.
Measurements
BlockMap = std::unordered_map<uint256, CBlockIndex> at current mainnet height (~955k entries).
Microbench (real unordered_map, default allocator, 955k entries, peak RSS):
| entry layout | sizeof | BlockMap RSS | bytes/entry | delta |
|---|---|---|---|---|
| current CBlockIndex | 144 B | 231 MB | ~254 B | - |
| drop cached header (re-read from BlockTreeDB) | 96 B | 173 MB | ~190 B | -58 MB |
| DB-backed essentials only | 40 B | 114 MB | ~126 B | -117 MB |
(The ~254 B/entry includes real map node + bucket overhead, and is in the same range as the
~216-224 B figure from #24760. It is notably larger than sizeof-based math, so measuring matters.)
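For anyone who wants to reproduce the shape of these numbers without a full node, here is a minimal sketch of a per-entry measurement. It is not the benchmark used for the table above: it counts bytes requested through a counting allocator rather than peak RSS, so it excludes malloc bookkeeping and will read lower than the RSS-based figures. `CountingAllocator`, `Key`, `KeyHasher`, `Payload` and `MeasureBytesPerEntry` are hypothetical names, and the payload is an opaque byte array standing in for `CBlockIndex`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>

// Bytes currently allocated through CountingAllocator (nodes + bucket array).
static std::size_t g_live_bytes = 0;

template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        g_live_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept
    {
        g_live_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Hypothetical stand-in for uint256: a 32-byte key.
using Key = std::array<std::uint8_t, 32>;

struct KeyHasher {
    std::size_t operator()(const Key& k) const noexcept
    {
        // Use the first 8 bytes as the hash, as one would for a block hash.
        std::uint64_t v = 0;
        for (int i = 0; i < 8; ++i) v |= std::uint64_t{k[i]} << (8 * i);
        return static_cast<std::size_t>(v);
    }
};

// Opaque payload standing in for an index entry of a given sizeof.
template <std::size_t PayloadBytes>
struct Payload {
    unsigned char bytes[PayloadBytes];
};

// Fill a map with `entries` distinct keys and report allocator bytes per entry.
template <std::size_t PayloadBytes>
double MeasureBytesPerEntry(std::size_t entries)
{
    assert(entries > 0);
    using Value = Payload<PayloadBytes>;
    using Alloc = CountingAllocator<std::pair<const Key, Value>>;
    const std::size_t before = g_live_bytes;
    std::unordered_map<Key, Value, KeyHasher, std::equal_to<Key>, Alloc> map;
    for (std::uint64_t i = 0; i < entries; ++i) {
        Key k{};
        for (int b = 0; b < 8; ++b) k[b] = static_cast<std::uint8_t>(i >> (8 * b));
        map.emplace(k, Value{});
    }
    // Measured while the map is still alive, so nodes and buckets are counted.
    return static_cast<double>(g_live_bytes - before) / static_cast<double>(entries);
}
```

Calling `MeasureBytesPerEntry<144>(955000)`, `MeasureBytesPerEntry<96>(955000)` and `MeasureBytesPerEntry<40>(955000)` gives the allocator-level counterpart of the three table rows. The gap between those values and the RSS-based bytes/entry is the allocator's own overhead.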
Real-node confirmation (a synced mainnet node, restart, RSS sampled through the load phases): the fresh process goes from ~120 MB to a settled ~340 MB across the ~6 s block-index load ("Loading block index" -> "Loaded best chain"), i.e. ~220 MB of resident BlockMap, consistent with the microbench. ("Using 2.0 MiB for block index database" confirms this is the in-RAM map, not LevelDB cache.) This footprint grows with chain height.
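The RSS sampling can be reproduced with a small helper like the one below. This is a sketch that assumes Linux, where `/proc/<pid>/status` carries a `VmRSS:` line in kB; `ParseVmRssKiB` and `ReadRssKiB` are hypothetical names, and this is not necessarily the tooling used for the numbers above.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parse the "VmRSS:" line from text in /proc/<pid>/status format.
// Returns the resident set size in KiB, or -1 if the line is missing or malformed.
long ParseVmRssKiB(std::istream& in)
{
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("VmRSS:", 0) == 0) { // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            long kib = 0;
            return (fields >> kib) ? kib : -1;
        }
    }
    return -1;
}

// Read the current RSS of a process by pid (Linux only; "self" also works).
long ReadRssKiB(const std::string& pid)
{
    assert(!pid.empty());
    std::ifstream status("/proc/" + pid + "/status");
    if (!status) return -1;
    return ParseVmRssKiB(status);
}
```

Polling `ReadRssKiB` with the bitcoind pid once per second from process start until "Loaded best chain" appears in the log would give a before/after pair comparable to the ~120 MB and ~340 MB readings.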
Why this matters more for constrained nodes
On a default node, dbcache + mempool dominate and the block index is a minor share, which is
roughly the conclusion of the 2024 thread. But on a deliberately lean low-power node
(-blocksonly, no txindex/blockfilterindex, minimal -dbcache), those consumers shrink
to near-nothing and the block index becomes the dominant non-reclaimable floor: roughly
220 MB out of a ~300-400 MB total. The chainstate is mmap'd and reclaimable; the BlockMap is
not. For a 512 MB-class device, 58-117 MB is 11-23% of total RAM.
Possible directions (in increasing scope)
- Lazy-load the cached block header (nVersion, hashMerkleRoot, nTime, nBits, nNonce; 48 B) and re-read from BlockTreeDB on demand, keeping a small recent-window cache for hot paths (MTP, header relay). These fields are already persisted on disk, and are read through a bounded surface (GetBlockHeader() plus ~34 direct accesses). ~58 MB. Lowest risk.
- sipa's suggestion in #24760: decouple CBlockIndex::pprev, pull prev from DB/cache and link by hash (looping pprev is rare outside deep reorgs; height arithmetic covers the common case). Enables a DB-backed, much smaller resident index. ~117 MB+. Larger change.
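To make the first direction concrete, here is a rough sketch of a read-through cache for the five header fields. Everything in it is an assumption for illustration, not Bitcoin Core code: `HeaderFields`, `HeaderWindowCache` and the `Loader` callback are hypothetical, entries are keyed by height for brevity (a real version would more likely key by block hash), and eviction is plain FIFO rather than a tuned recent-window policy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the header fields that would move out of the
// resident entry: 4 + 32 + 4 + 4 + 4 = 48 bytes.
struct HeaderFields {
    std::int32_t nVersion;
    std::array<std::uint8_t, 32> hashMerkleRoot;
    std::uint32_t nTime;
    std::uint32_t nBits;
    std::uint32_t nNonce;
};
static_assert(sizeof(HeaderFields) == 48, "expected the 48 B quoted above");

// Bounded cache that falls back to a loader (standing in for a disk read)
// on a miss and evicts the oldest inserted entry when full.
class HeaderWindowCache
{
public:
    using Loader = std::function<HeaderFields(int height)>;

    HeaderWindowCache(std::size_t capacity, Loader loader)
        : m_capacity(capacity), m_loader(std::move(loader))
    {
        assert(m_capacity > 0);
    }

    // Returned by value so eviction can never invalidate the caller's copy.
    HeaderFields Get(int height)
    {
        auto it = m_cache.find(height);
        if (it != m_cache.end()) return it->second; // hot path: no disk read

        HeaderFields loaded = m_loader(height); // cold path: one disk read
        if (m_cache.size() >= m_capacity) {
            m_cache.erase(m_order.front());
            m_order.pop_front();
        }
        m_order.push_back(height);
        m_cache.emplace(height, loaded);
        return loaded;
    }

    std::size_t Size() const { return m_cache.size(); }

private:
    std::size_t m_capacity;
    Loader m_loader;
    std::unordered_map<int, HeaderFields> m_cache;
    std::deque<int> m_order; // insertion order, oldest at the front
};
```

With a capacity covering the heights that MTP and header relay touch, the hot paths would stay in memory and only older headers would cost a read. How large that window needs to be is exactly what a prototype would have to measure.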
Caveats and open questions
- The savings figures are from a faithful microbench + a real-node baseline confirmation, not yet a full modified-bitcoind before/after. Happy to build that prototype if there is appetite.
- Tradeoff is RAM vs occasional disk reads (header serve, deep-reorg traversal) and added complexity in foundational code. Whether that tradeoff is worth it for the constrained-node use case is the real question.
- Is there interest in this with the low-power framing, and if so which direction is preferred before someone invests in a full implementation?