coins: pack cache entry flags into list pointer #35577

pull ptrinh wants to merge 2 commits into bitcoin:master from ptrinh:coins-pack-cache-entry-flags changing 5 files +193 −21
  1. ptrinh commented at 5:39 PM on June 21, 2026: none

    Summary

    CCoinsCacheEntry keeps its DIRTY/FRESH state in a separate uint8_t m_flags. Since the struct also holds two CoinsCachePair* list pointers, that one byte forces 7 bytes of trailing padding - so the flags cost 8 bytes per cached UTXO.

    CoinsCachePair is pointer-aligned (alignment >= 8), so the low 3 bits of the m_prev pointer are always zero. This PR packs the 2 flag bits into those low bits and removes the m_flags member, funnelling all flag/list access through four small private helpers (GetFlags / PrevPtr / SetPrevPtr / SetFlagBits). The linked-list logic is otherwise unchanged.

    • sizeof(CCoinsCacheEntry): 80 -> 72 bytes
    • pool-allocator block: 152 -> 144 bytes

    No on-disk or consensus format change.

    Motivation

    The UTXO cache is the dominant RAM consumer during IBD, and reducing bytes-per-coin lets a given -dbcache hold more coins and flush less often. This helps memory-constrained / low-end nodes in particular.

    Measurements

    Memory (mainnet IBD, identical config, compared at matching heights via the cache=<MiB>(<txo>) figure in the UpdateTip log = CoinsTip().DynamicMemoryUsage()):

    height stock B/UTXO patched B/UTXO reduction
    150000 150.1 142.5 -5.1%
    180000 153.1 145.3 -5.1%
    200000 147.8 139.9 -5.3%

    CPU (bench_bitcoin -filter=CCoinsCaching, interleaved runs): stock mean 199.1 ns/op vs patched 200.1 ns/op - within run-to-run noise; no measurable regression from the masking. (Run on a laptop with ~15% variance; happy to have reviewers re-run on a quiet machine.)

    Testing

    coins_tests, coinscachepair_tests, validation_flush_tests pass unchanged.

    Notes / open questions

    • The flagged-list design was introduced in #28280; this builds directly on its encapsulated flag access.
    • Trade-off for review: 8 bytes/coin vs. a small amount of bit-masking on flag access. Benchmarks above suggest the masking is free in practice, but reviewers may weigh the added indirection.
  2. DrahtBot added the label UTXO Db and Indexes on Jun 21, 2026
  3. DrahtBot commented at 5:39 PM on June 21, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35577.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34864 (coins: make cache freshness imply dirtiness and remove invalid test states by l0rinc)
    • #34075 (fees: Introduce Mempool Based Fee Estimation to reduce overestimation by ismaelsadeeq)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  4. pinheadmz commented at 5:42 PM on June 21, 2026: member

    Looks a bit like a refactor for slim gain, with LLM generated description? @ptrinh how well do you personally understand the changes? A 5% reduction in memory usage is not nothing, but is there any performance gain? Did IBD up to block 200000 (or 900000) go any faster by moving these two bits?

  5. ptrinh commented at 6:04 PM on June 21, 2026: none

    @pinheadmz Fair questions, let me address both.

    On understanding: the change moves the DIRTY/FRESH flags out of the separate m_flags byte and into the low 2 bits of the m_prev list pointer. That works because CoinsCachePair is pointer-aligned, so the low 3 bits of any real pointer are zero and free to reuse. The point isn't the refactor itself, it's that the standalone byte forced 7 bytes of trailing padding, so removing it shrinks CCoinsCacheEntry from 80 to 72 bytes (pool block 152 to 144). The flag/list logic in AddFlags/SetClean/SelfRef is unchanged, just routed through small accessors that mask the low bits.

    On performance, you're right that the original description only justified memory and didn't show a speedup, so I measured it. The 5% only helps when you're cache-bound (small -dbcache, frequent flushes); with a large dbcache there's no flush pressure and I'd expect no difference, which matches a CCoinsCaching microbench that showed no change.

    To exercise the cache-bound case I ran reindex-chainstate to height 200000 with -dbcache=100 -par=1 -connect=0, same blocks, both binaries built from this tree (only coins.h differs):

    • Stock did one cache-pressure flush (LARGE) before 200k; patched did zero. Peak coins cache 366 MiB stock vs 351 MiB patched.
    • Wall clock, 6 runs each, interleaved in both orders to cancel page-cache warmth: stock mean 28.9s, patched mean 27.8s (about -3.6%, ranges barely overlap).

    So in the constrained regime there is a small but consistent speedup from the avoided flush, on top of the memory reduction. I did not run a full IBD to 900000 (would need the full chain and hours), so I can't claim a number there; the benefit should matter most for memory-limited nodes.

    Happy to rewrite the PR description more plainly, add the benchmark numbers, or drop it if you don't think the tradeoff (a few bit-mask ops on flag access for 8 bytes/coin) is worth it.

  6. purpleKarrot commented at 6:56 AM on June 23, 2026: contributor

    @ptrinh, that looks like an interesting low level optimization. Out of interest, can you share how you decided to pick this task? Did you look at the code, saw flags next to a pointer, thought about pointer tagging and then did a benchmark to measure the effect? Or did you start with an analysis to find bottlenecks? I would encourage you to write a blog post about this!

    Comment on the code: I would prefer if the pointer tagging functionality was extracted into a reusable class, like PointerTagPair which can then later be replaced with std::pointer_tag_pair once P3125 is available.

  7. ptrinh commented at 8:15 AM on June 23, 2026: none

    @purpleKarrot Thanks! Honest answer: this didn't come from me spotting it in the code cold. It came out of an exploration of running Bitcoin Core on low-power ARM hardware and profiling where IBD actually spends memory and CPU. I leaned on some AI-assisted tooling for the profiling. The analysis broke down the ~160 bytes/UTXO in the cache, and the thing that stood out was the small m_flags byte sitting next to the list pointers, forcing 7 bytes of padding. The adjacency is what suggested tagging; the benchmark came afterward, to check it was more than a cosmetic refactor.

    On the code suggestion: agreed, open-coding the masks in CCoinsCacheEntry isn't ideal. I'll extract it into a reusable PointerTagPair (and yes, structured so it can be swapped for std::pointer_tag_pair if P3125 lands). That's a nicer home for it regardless of whether this particular change goes in, given the overlap with #34864.

  8. util: add PointerTagPair
    Add a small helper that stores a pointer together with a few-bit tag in
    the same word, reusing the low alignment bits of the pointer that are
    always zero. The caller is responsible for ensuring the pointee is aligned
    enough to leave those bits free.
    
    This is a stop-gap for std::pointer_tag_pair (P3125); the interface is kept
    deliberately close so it can be swapped once that is available.
    8ed9f8f45c
  9. coins: pack cache entry flags into list pointer
    CCoinsCacheEntry stores DIRTY/FRESH state in a separate uint8_t m_flags.
    Because the struct also holds two CoinsCachePair pointers, that single byte
    forces 7 bytes of trailing padding, so the flags effectively cost 8 bytes
    per cached UTXO.
    
    CoinsCachePair is pointer-aligned, so the low bits of the m_prev list
    pointer are always zero. Use PointerTagPair to store the 2 flag bits there
    and drop the m_flags member. A static_assert guards the required alignment,
    and the linked-list logic in AddFlags/SetClean/SelfRef is otherwise
    unchanged.
    
    sizeof(CCoinsCacheEntry) drops from 80 to 72 bytes and the pool allocator
    block from 152 to 144 bytes. On a mainnet sync this reduces UTXO cache
    memory by ~5% (~8 bytes/UTXO; e.g. 147.8 -> 139.9 bytes/UTXO at height
    200000), letting a given -dbcache hold proportionally more coins and flush
    less often during IBD.
    
    No on-disk or consensus format change. A new coinscachepair_tests case
    checks that setting both flags on every node leaves the forward and backward
    list pointers intact. coins_tests and validation_flush_tests pass unchanged.
    The CCoinsCaching benchmark shows no measurable change (means within
    run-to-run noise), i.e. the pointer masking adds no detectable CPU cost.
    35cbbbaff0
  10. ptrinh force-pushed on Jun 23, 2026
  11. in src/coins.h:143 in 35cbbbaff0
     147 | +            pair.second.m_prev.pointer()->second.m_next = &pair;
     148 |          }
     149 | -        Assume(pair.second.m_prev && pair.second.m_next);
     150 | -        pair.second.m_flags |= flags;
     151 | +        Assume(pair.second.m_prev.pointer() && pair.second.m_next);
     152 | +        pair.second.m_prev.set_tag(pair.second.m_prev.tag() | flags);
    


    l0rinc commented at 9:49 PM on June 23, 2026:

    We're basically changing the flags to a boolean here - which was also done in https://github.com/bitcoin/bitcoin/pull/34864/changes#diff-095ce1081a930998a10b37358fae5499ac47f8cb6f25f5df5d88e920a54e0341

    I'm not against this change but I would prefer other cleanups in this area before we optimize this part prematurely. We also thought of packing more data in here, possibly even getting rid of the m_prev and only do a singly linked list instead, my full description to @andrewtoth looked like this:

    I brushed up on #28280 to rehash (pun intended) why we have a doubly linked list.

    First, having pointer links to map entries is much cheaper than having a separate map or bimap keyed (and valued) by COutPoint. Each COutPoint alone is already around 36 bytes, so any extra index that stores additional COutPoints would be significantly larger than just storing a couple of pointers per flagged entry.

    The linked list of map entries is indeed iterated linearly, but deletions of spent FRESH entries can currently be done in O(1) time, since we can safely update the previous entry to point to the current entry's next (and vice versa). With a singly linked list we can't easily do that: to remove a node we need its predecessor, but we only have a pointer to the current element. We also can't safely do a "copy-next and delete next" trick or Extract/Reinsert, because the unordered_map key is const and semantically tied to its value.

    We also can't do deletion by starting from the sentinel on every deletion (too costly), and we can't just leave entries around until BatchWrite sees them, because that would saturate the memory and cause premature flushes needlessly.

    But what is we kept the tombstones instead of deleting immediately and to track a count of deletable entries. When there is memory pressure and a significant number of deletions have accumulated, we could run a compaction phase first: iterate the singly linked list once, maintain a prev pointer during iteration, and physically remove the tombstone entries from both the map and the linked list during that pass. After compaction we could re-evaluate whether a BatchWrite is still needed (i.e. memory pressure still exists).

    This should keep individual deletions amortized constant time, and would let us drop the back pointer, reducing CCoinsCacheEntry from 80 bytes to 72. On top of that, we could explore more aggressive representation tricks: for example, packing Coin and CCoinsCacheEntry more tightly (which could get us closer to around 60–64 bytes instead of 80) and maybe even hiding the DIRTY and FRESH flag bits in the low bits of the CoinsCachePair* pointer (using alignment guarantees). Those would be more intrusive and would need careful evaluation and benchmarking and probably not worth it, but just the removal of one of the pointers would get us tens of thousands of blocks for the same cache size.

    we could even add an interface to Coin and inline Coin coin in CCoinsCacheEntry which would inherit from the ICoin interface - since inlining reconsiders the alignments and CCoinsCacheEntry would get even smaller

  12. l0rinc changes_requested
  13. l0rinc commented at 9:56 PM on June 23, 2026: contributor

    We should consider this, but I personally don't think we're ready for this, let's get the other related refactors in first (e.g. #34864)

  14. ptrinh commented at 12:22 AM on June 24, 2026: none

    Thanks @l0rinc, that's a really helpful writeup and the reasoning makes sense. Agreed this is premature given #34864 and the other cleanups in this area, and that dropping m_prev via a singly-linked list with compaction is a cleaner route to the same 80 -> 72 than tagging the flags onto the back pointer.

    I'll close this so it doesn't sit as a conflicting PR. The PointerTagPair helper can resurface later if the more intrusive packing ever turns out to be worth it. Happy to help review/test #34864 in the meantime.

  15. ptrinh closed this on Jun 24, 2026


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-30 07:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me