The BlockIndex/BlockMap should not live in memory all the time #24760

issue TheQuantumPhysicist opened this issue on April 4, 2022
  1. TheQuantumPhysicist commented at 1:53 pm on April 4, 2022: contributor

    Hello everyone. I have a proposal to improve the memory usage in Bitcoin Core. I hope I can win your attention and interest for this :-)

    Introduction

    When the Bitcoin blockchain was small, it made sense to load the block index into memory. It acts as an optimization: the block index is basically a handy tool to know whether blocks exist and to learn certain characteristics about them, such as their error state, number of transactions, header, etc.

    However, the problem with the block index is that it’s unbounded and grows strictly in memory. Loading Bitcoin Core already occupies tons of memory and takes lots of time on small machines. We’re over 2 GB of memory at times, and that’s growing. Running Bitcoin Core on smaller machines (such as a Raspberry Pi Zero) is even impossible because of that (without using swap memory and similar workarounds, which have their own problems), and there’s no reason for it other than memory restrictions. Bitcoin transaction and block processing is simple and isn’t really that demanding.

    From a probabilistic point of view, the higher the number of blocks, the less likely it is that we need all the block index entries all the time after a block is synced.

    The criticality of this issue depends on how one looks at it; maybe “This is fine” (pun intended). To me, unbounded growing data in memory is not the right way to go, and sooner or later this will become a real problem, especially since Bitcoin aims at decentralization by allowing everyone to run a node. The logic is very simple: the more memory Bitcoin Core uses, the fewer people will want to run a node (all the time).

    I would like to propose a solution (and work on it) that makes the memory usage bounded.

    I would appreciate your opinions on the matter, primarily to learn what you think and secondarily to avoid expending effort that isn’t appreciated in the bitcoin dev community, especially since I’m not really innovating here (e.g., upgrading consensus) but attempting to solve a technical issue that I have solved before elsewhere.

    The solution

    TLDR: LRU cache for the block index.

    Long answer: There is already an LRU cache in leveldb controlled using the command line argument -dbcache, which distributes memory over different parts on initialization. However, this isn’t used for the block index and its dependents. Depending on performance metrics, we can either simply read the block index directly from leveldb through a proper interface (and use its LRU cache), or we can write our own LRU cache that lives between leveldb and txdb or even between txdb and that block index interface directly (I have written a concurrent transactional LRU cache before, but that’s another discussion). If the one from leveldb performs well enough, maybe that’s good enough, and the task becomes much simpler. This can be done in multiple steps over time and not all at once.
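
    To make the idea concrete, here is a minimal, single-threaded sketch of such a cache layer, assuming a generic key/value shape and a caller-supplied loader standing in for a leveldb read; the class, its names, and the capacity handling are illustrative assumptions, not existing Bitcoin Core code.

    // A minimal LRU cache sketch: look up in memory first, fall back to a
    // user-supplied loader (e.g. a leveldb read) on a miss, and evict the
    // least recently used entry once the capacity is exceeded.
    #include <cstddef>
    #include <functional>
    #include <list>
    #include <optional>
    #include <unordered_map>
    #include <utility>

    template <typename Key, typename Value, typename Hash = std::hash<Key>>
    class LruCache {
    public:
        using Loader = std::function<std::optional<Value>(const Key&)>;

        LruCache(std::size_t capacity, Loader loader) : m_capacity(capacity), m_loader(std::move(loader)) {}

        std::optional<Value> Get(const Key& key)
        {
            auto it = m_map.find(key);
            if (it != m_map.end()) {
                // Cache hit: mark the entry as most recently used.
                m_order.splice(m_order.begin(), m_order, it->second.second);
                return it->second.first;
            }
            // Cache miss: load from the backing store and remember the result.
            std::optional<Value> loaded = m_loader(key);
            if (loaded) Put(key, *loaded);
            return loaded;
        }

        void Put(const Key& key, Value value)
        {
            auto it = m_map.find(key);
            if (it != m_map.end()) {
                it->second.first = std::move(value);
                m_order.splice(m_order.begin(), m_order, it->second.second);
                return;
            }
            m_order.push_front(key);
            m_map.emplace(key, std::make_pair(std::move(value), m_order.begin()));
            if (m_map.size() > m_capacity) {
                m_map.erase(m_order.back()); // evict the least recently used entry
                m_order.pop_back();
            }
        }

    private:
        std::size_t m_capacity;
        Loader m_loader;
        std::list<Key> m_order; // most recently used at the front
        std::unordered_map<Key, std::pair<Value, typename std::list<Key>::iterator>, Hash> m_map;
    };

    For the block index, the key would be the block hash (with BlockHasher as the hash functor) and the loader a txdb/leveldb read; whether leveldb’s own cache already performs well enough, as suggested above, is something that would have to be measured first.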

    Effects

    The block manager (a singleton) is the owner of the block index. Bitcoin Core uses this map to directly locate a block’s index entry by simply doing a find, since it’s just an unordered map/hash map.

    typedef std::unordered_map<uint256, CBlockIndex*, BlockHasher> BlockMap;
    

    Let’s discuss the effects of this proposal

    First effect: CBlockIndex by value, not by reference

    The correct way I see to do this is to have some interface instead of accessing the BlockMap directly, which will act indirectly as a value-based map (as opposed to one handing out pointers). In its simplest form, it’s implemented to just use the same map. The side effect is that modifying the block index entry of a block in any way has to be done through a proper read-modify-write operation. While this seems elaborate compared to the pointer approach used now, we have to acknowledge that this is effectively what we do anyway, since the cs_main lock is held during any modification/addition of a block index entry. The only difference is that, because of the isolation resulting from this solution, there might be a small performance drawback due to copying back and forth. However, compilers may be able to optimize this if the block index entry lives only in memory. It’s a long shot, though.
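
    As a rough illustration of that read-modify-write pattern, here is a minimal sketch; CBlockIndexValue, BlockIndexView and their members are simplified stand-ins invented for this example, not actual Bitcoin Core types, and a real version would be backed by leveldb plus a cache and guarded by cs_main.

    #include <cstdint>
    #include <map>
    #include <optional>
    #include <string>
    #include <utility>

    struct CBlockIndexValue {          // value-type snapshot of a single block index entry (stand-in)
        std::string prev_hash;         // link to the previous block by hash, not by pointer
        int height{0};
        std::uint32_t status{0};       // e.g. validity flags
    };

    class BlockIndexView {
    public:
        std::optional<CBlockIndexValue> Lookup(const std::string& hash) const
        {
            auto it = m_entries.find(hash);
            if (it == m_entries.end()) return std::nullopt;
            return it->second;         // hand out a copy, never a pointer into the map
        }

        // Read-modify-write: fetch a copy, let the caller mutate it, write it back.
        template <typename F>
        bool Modify(const std::string& hash, F&& mutate)
        {
            auto entry = Lookup(hash);
            if (!entry) return false;
            mutate(*entry);
            m_entries[hash] = *entry;  // in a real version: write through to the DB/cache
            return true;
        }

        void Insert(const std::string& hash, CBlockIndexValue entry) { m_entries[hash] = std::move(entry); }

    private:
        std::map<std::string, CBlockIndexValue> m_entries; // stand-in for leveldb plus a cache
    };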

    Second effect: No more linking blocks by pointer

    Currently, blocks are linked together using pointers: every block index entry has a pointer to the previous block’s entry in memory, so by simply using the CBlockIndex::pprev member we get the previous block’s index entry. This has to change to a function that pulls the previous entry from the database/cache, and the linking should be done not by pointer but by block hash. This should not have a significant effect on performance, assuming we don’t consistently loop over pprev to reach a block far back in history. In fact, the only reason to do such a thing is a (long) reorg. Usually the best way to walk the main chain’s history is using block-height arithmetic. The original Bitcoin Core code used to do pprev looping a lot (for example, even to create a CBlockLocator starting from the tip), but this isn’t the case anymore since block heights of the main chain are registered in CChain.
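
    Reusing the hypothetical CBlockIndexValue/BlockIndexView types from the sketch above, the pprev replacement could be as small as a lookup helper (GetPrev is an assumed name, not an existing function):

    // Instead of following CBlockIndex::pprev, fetch the previous entry by hash
    // through the lookup interface; this may be served from the cache or the DB.
    std::optional<CBlockIndexValue> GetPrev(const BlockIndexView& view, const CBlockIndexValue& entry)
    {
        if (entry.prev_hash.empty()) return std::nullopt; // genesis has no predecessor
        return view.Lookup(entry.prev_hash);
    }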

    More side effects: Many things have to go to the DB

    The problem is bigger than just BlockMap. Other things will be affected as well, such as CChain. The fact that there are many dependents on BlockMap pointers is part of the reason why the memory consumption is so high. All this has to go and be replaced by proper indexing in the key/value database at hand; i.e., leveldb.

    Steps to complete this task

    1. Create the interfaces that will replace everything that uses CBlockIndex pointers, still with a global BlockMap through an interface, not through std::unordered_map directly.
    2. Test the new design and ensure that the isolation is successful.
    3. Get rid of BlockMap and use the database (and whatever caching layer sits above it) to store the block index.
    4. Profit! Enjoy a much lower memory footprint, and more decentralization by letting smaller hardware run full nodes with less work.

    Let me know what you think, and hopefully I didn’t overlook anything in my analysis.

  2. fanquake added the label Resource usage on Apr 4, 2022
  3. maflcko commented at 2:17 pm on April 4, 2022: member

    typedef std::unordered_map<uint256, CBlockIndex*, BlockHasher> BlockMap;

    This is no longer true in current master.

    We’re over 2 GB of memory at times

    This implies that the issue is not the block index. If a memory limit is only surpassed temporarily, it is more likely the wallet, an index thread or the mempool.

    Locally for me, the block index consumes way less than 1 GB or even 500 MB on x86. It shouldn’t be that much different on aarch64?

  4. sipa commented at 2:25 pm on April 4, 2022: member

    I believe our current BlockMap structure on 64-bit systems uses 216 bytes per entry (incl. malloc overhead), plus another 8 bytes per entry in CChain. Together that should be around 156 MiB of memory usage on the main chain.
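
    (As a rough check of that figure, assuming a mainnet height of roughly 730,000 blocks at the time: (216 + 8) bytes × 730,000 ≈ 163.5 MB ≈ 156 MiB.)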

    That’s not a trivial amount, but combined with some amount of peer connection buffers, utxo cache, mempool, … needed for basic operation, it’s also far from the only contributing factor. Furthermore, it’s an extremely predictably-growing one which, for all but the most memory-starved systems, probably grows slower than normal system evolution allows (so this should become less and less of a problem over time).

    I’m not opposed to improvements that reduce the memory usage of the block map, but I’m not sure that the complexity of switching to an LRU model is worth it for saving maybe 100 MiB.

  5. TheQuantumPhysicist commented at 2:54 pm on April 4, 2022: contributor

    This is no longer true in current master.

    Very nice. I’m glad to see clear ownership there. Thank you for that.

    This implies that the issue is not the block index. If a memory limit is only surpassed temporarily, it is more likely the wallet, an index thread or the mempool.

    It’s probably the indexer, I agree there. I’m running with an empty wallet. Now, after the sync is over, it’s around 800 MB of memory on my workstation (assuming zero forks and a fresh sync), which is still not negligible.

    I believe our current BlockMap structure on 64-bit systems uses 216 bytes per entry, plus another 8 bytes per entry in CChain. Together that should be around 156 MiB of memory usage on the main chain.

    Maybe it looks small if we ignore alignment and map/set overheads. The problem is also that there are other things I constantly see linked with the block index; they all relate to the block map in some way, live in memory, and store extra information times 800k blocks. CChain was one example. Also take a look at setBlockIndexCandidates: another 800k pointers thrown into the bucket of barely-used memory (ignoring std::set’s overhead).

    And to see that this is specifically the block index, just launch a new node from genesis: you’ll see the memory usage growing linearly, starting from around 150 MB. So it’s either the block index or something else that grows linearly with the current height. I don’t think there’s anything else, because everything else is managed properly with bounded memory.

    And finally, to concede and agree with you guys, yes, it may not be the worst thing depending on the angle. But it’s something that has to be fixed sooner or later. If you guys think it’s worth shooting through it, I’ll happily do it.

  6. sipa commented at 2:57 pm on April 4, 2022: member

    My calculation takes maps/sets/allocation overhead into account.

    The default utxo cache is 450 MiB in size, plus it “steals” any unused portion of the mempool (default limited to 300 MiB). If you want to reduce memory usage during sync today, the best move is reducing -dbcache and -maxmempool.
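
    For example (the values here are purely illustrative, not recommendations), something along these lines trims both during a sync:

    bitcoind -dbcache=100 -maxmempool=50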

  7. maflcko commented at 3:02 pm on April 4, 2022: member

    So it’s either the block index or something else that grows linearly with the current height.

    It might be best to benchmark/measure this, otherwise it will be hard to guess. I’d be really surprised if this was the block index, as the block index is constructed and fully filled before IBD starts. Also, I can confirm the ~150MiB figure above on x86.

  8. TheQuantumPhysicist commented at 3:03 pm on April 4, 2022: contributor

    My calculation takes maps/sets/allocation overhead into account.

    Sounds good. I would appreciate it if you could tell me how you got these numbers, because I’m failing to understand what all that memory is used for.

    The default utxo cache size is 450 MiB in size, plus it “steals” any unused portion of the mempool (default limited to 300 MiB). If you want to reduce memory usage today during sync, the best move is reducing -dbcache and -maxmempool.

    If I run ./src/qt/bitcoin-qt --dbcache=25 --maxmempool=10, it’s still over 700 MB of usage. Taking the linear calculation from before into account, if we subtract 150 MB, it’s 550 MB that can be attributed to the block index, which is around 65% of the total consumption of memory (and growing).

  9. TheQuantumPhysicist commented at 3:05 pm on April 4, 2022: contributor

    It might be best to benchmark/measure this, otherwise it will be hard to guess. I’d be really surprised if this was the block index, as the block index is constructed and fully filled before IBD starts. Also, I can confirm the ~150MiB figure above on x86.

    I might be wrong with my figures. But I learned over the years that estimating memory consumption is really a difficult thing. So, please excuse me if I’m overextending my imagination on this.

  10. maflcko commented at 3:24 pm on April 4, 2022: member

    It might be that the GUI is using another 200 MB of memory?

    Again, there is nothing we can do without benchmarking. Locally (for me), I get:

    [Attached screenshot: Screenshot from 2022-04-04 17-15-37]

  11. sipa commented at 3:28 pm on April 4, 2022: member

    Sounds good. I would appreciate telling me how you got these numbers, because I’m failing to understand what all that memory is taken for.

    memusage.h contains algorithms for computing memory usage of various structures, including std::unordered_map. That only works with actual maps, but if you look at the code you’ll find it consists of:

    • Individually-allocated blobs of type (similar to) struct : std::pair<const Key, Value> { void* ptr; } (every unordered_map node consists of a key, a value, and a pointer to potentially another blob in the same bucket). These blobs (incl. allocation overhead) use 208 bytes.
    • The bucket table itself, sized at one pointer per bucket. The number of buckets is approximately equal to the number of elements in the map, so 8 bytes per element. (A rough sketch of this accounting follows below.)
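
    A simplified sketch of that accounting (not the actual memusage.h code; the 16-byte malloc overhead constant is an assumption):

    #include <cstddef>
    #include <cstdio>
    #include <unordered_map>

    // Roughly: one separately allocated node per element (key/value pair plus a
    // chaining pointer), plus about one bucket pointer per element in the table.
    template <typename Map>
    std::size_t RoughDynamicUsage(const Map& m)
    {
        constexpr std::size_t malloc_overhead = 16;                // assumption, allocator dependent
        struct Node { typename Map::value_type kv; void* next; };  // per-element blob
        return (sizeof(Node) + malloc_overhead) * m.size()         // individually allocated nodes
               + sizeof(void*) * m.bucket_count();                 // the bucket table itself
    }

    int main()
    {
        std::unordered_map<long, long> m;
        for (long i = 0; i < 1000; ++i) m[i] = i;
        std::printf("~%zu bytes\n", RoughDynamicUsage(m));
        return 0;
    }
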
  12. TheQuantumPhysicist commented at 4:38 pm on April 4, 2022: contributor

    It might be that the GUI is using another 200 MB of memory?

    Correct. On my Linux machine (Debian Bullseye), when I start a node from genesis, it consumes 125 MB before choosing the data dir; after choosing it, usage bumps up to 175 MB.

    Perhaps there was a misunderstanding about what I meant with my calculation. If the node starts at around 150 MB and is at 800 MB after sync (or 700 MB after removing the mempool and the cache), that can only mean that things have grown linearly in between, assuming everything else is cleaned up.

    I’ll see what I can do with benchmarks. I’m not so familiar with how that works, though. If there’s somewhere I can start or something to target to get these results, I’d appreciate being pointed there.

  13. maflcko commented at 11:27 am on April 6, 2022: member
    I used massif to check the allocated memory. Not sure if this can also help find the memory issue you are facing, but it might be worth a try.
  14. TheQuantumPhysicist commented at 11:45 am on April 6, 2022: contributor

    I will play with massif. I’ll see the benchmarks. I will also attempt to remove the block index and see what happens. I’ll be playing with this for a while on multiple angles.

    Issue #24700 shows an interesting symptom of the problem on devices with low memory. Honestly, I was able to reduce the usage of the whole node to 25-50 MB of memory in an old bitcoin fork (with millions of blocks in the block index) by completely eliminating the block index, which was very aggressive with no LRU cache. It’s difficult to know how modern bitcoin will act without trying, and I’m even convinced that after we try, we can recursively fix more related memory waste; but I believe the main culprit here is the block index. There are too many things to be checked. I’ll keep you updated with any results I arrive at.

  15. willcl-ark commented at 3:10 pm on April 10, 2024: contributor

    This issue hasn’t had activity in a while and appears to have gone stale so I’m going to close it for now.

    Feel free to open a new issue or comment here if you are still experiencing this problem so we can investigate further.

  16. willcl-ark closed this on Apr 10, 2024

