RAM usage regression in 26.x and master on ARM 32-bit #28906

issue hebasto openend this issue on November 17, 2023
  1. hebasto commented at 7:28 pm on November 17, 2023: member

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    The bitcoind -dbcache=100 -par=4 -disablewallet -reindex-chainstate command fails with OOM. First reported in #28718 (comment).

    Expected behaviour

    RAM usage does not exceed 0.9 GB.

    Steps to reproduce

    100% reproducibility with Guix compiled binaries.

    Relevant log output

    02023-11-17T18:43:31Z [loadblk] UpdateTip: new best=000000000000000016d35e6ca77dea6f4e2caa387f245a712b1df5bbab3c559c height=314541 version=0x00000002 log2_work=80.091273 tx=44112815 date='2014-08-08T11:42:41Z' progress=0.049410 cache=105.6MiB(13710725txo)
    

    How did you obtain Bitcoin Core

    Compiled from source

    What version of Bitcoin Core are you using?

    master@5aa0c82ccd6ceb4a141686fc8658f679de75a787

    Operating system and version

    Armbian 23.8.3 jammy, 1 GB swap-file

    Machine specifications

    armv7l, 8 cores, 2 GB RAM

  2. hebasto added the label Resource usage on Nov 17, 2023
  3. hebasto commented at 7:30 pm on November 17, 2023: member

    A regression is caused by #25325.

    cc @martinus @sipa @achow101

  4. pinheadmz commented at 7:48 pm on November 17, 2023: member
    Do we have a 32-bit CI task already?
  5. hebasto commented at 7:50 pm on November 17, 2023: member

    Do we have a 32-bit CI task already?

    x86, not ARM. we do

  6. martinus commented at 8:14 pm on November 17, 2023: contributor

    This is a shot in the dark, but could this be due to memory fragmentation? The PoolResource allocates blocks of 262144 bytes, which seems quite large for such a system. Could you try this patch, or is there a way for me to try this? It reduces the allocated block size to 16K (and fixes a few tests that rely on the hardcoded size).

     0diff --git a/src/support/allocators/pool.h b/src/support/allocators/pool.h
     1index c8e70ebacff..9f93511995e 100644
     2--- a/src/support/allocators/pool.h
     3+++ b/src/support/allocators/pool.h
     4@@ -184,7 +184,7 @@ public:
     5     /**
     6      * Construct a new Pool Resource object, defaults to 2^18=262144 chunk size.
     7      */
     8-    PoolResource() : PoolResource(262144) {}
     9+    PoolResource() : PoolResource(16384) {}
    10 
    11     /**
    12      * Disable copy & move semantics, these are not supported for the resource.
    13diff --git a/src/test/coins_tests.cpp b/src/test/coins_tests.cpp
    14index 12dc4d1ccc4..5f1a0120aca 100644
    15--- a/src/test/coins_tests.cpp
    16+++ b/src/test/coins_tests.cpp
    17@@ -1088,18 +1088,18 @@ BOOST_AUTO_TEST_CASE(coins_resource_is_used)
    18 {
    19     CCoinsMapMemoryResource resource;
    20     PoolResourceTester::CheckAllDataAccountedFor(resource);
    21-
    22+    size_t num_elements = 16384 / 200;
    23     {
    24         CCoinsMap map{0, CCoinsMap::hasher{}, CCoinsMap::key_equal{}, &resource};
    25         BOOST_TEST(memusage::DynamicUsage(map) >= resource.ChunkSizeBytes());
    26 
    27-        map.reserve(1000);
    28+        map.reserve(num_elements);
    29 
    30         // The resource has preallocated a chunk, so we should have space for at several nodes without the need to allocate anything else.
    31         const auto usage_before = memusage::DynamicUsage(map);
    32 
    33         COutPoint out_point{};
    34-        for (size_t i = 0; i < 1000; ++i) {
    35+        for (size_t i = 0; i < num_elements; ++i) {
    36             out_point.n = i;
    37             map[out_point];
    38         }
    39diff --git a/src/test/validation_flush_tests.cpp b/src/test/validation_flush_tests.cpp
    40index 7398091215c..c8f2c9b32fa 100644
    41--- a/src/test/validation_flush_tests.cpp
    42+++ b/src/test/validation_flush_tests.cpp
    43@@ -37,7 +37,7 @@ BOOST_AUTO_TEST_CASE(getcoinscachesizestate)
    44     };
    45 
    46     // PoolResource defaults to 256 KiB that will be allocated, so we'll take that and make it a bit larger.
    47-    constexpr size_t MAX_COINS_CACHE_BYTES = 262144 + 512;
    48+    constexpr size_t MAX_COINS_CACHE_BYTES = 16384 + 512;
    49 
    50     // Without any coins in the cache, we shouldn't need to flush.
    51     BOOST_TEST(
    
  7. hebasto referenced this in commit 1493959d8e on Nov 17, 2023
  8. hebasto commented at 10:05 pm on November 17, 2023: member

    @martinus

    This is a shot in the dark, but could this be due to memory fragmentation? The PoolResource allocates blocks of 262144 bytes, which seems quite large for such a system. Could you try this patch, or is there a way for me to try this? It reduces the allocated block size to 16K (and fixes a few tests that rely on the hardcoded size).

    I’ve tested your patch. No change in behavior, unfortunately. It still fails with OOM.

  9. martinus commented at 7:46 am on November 18, 2023: contributor
    It could be that the memory usage calculation has changed, and calculation for memory used in the CCoinsMap is now more correct than before. Previously it overestimated memory usage, and now it’s more correct but for the same cache size this also leads to a bit higher memory usage
  10. hebasto commented at 9:11 am on November 18, 2023: member

    It could be that the memory usage calculation has changed, and calculation for memory used in the CCoinsMap is now more correct than before. Previously it overestimated memory usage, and now it’s more correct but for the same cache size this also leads to a bit higher memory usage

    It depends on definition of “a bit higher” :) It was less than 0.9 GB of RAM. Now 2 GB of RAM plus 1 GB swap is not enough.

  11. martinus commented at 9:14 am on November 18, 2023: contributor
    “a bit higher” should be about 2% or so… I don’t think there is a memory leak anywhere.. Is this issue only on ARM or also on x86 32bit?
  12. martinus commented at 2:38 pm on November 18, 2023: contributor

    I investigated a bit with heaptrack and it seems there are a bit higher spikes than I have thought. It’s still not too bad, without the pool heaptrack says peack memory is ~600MB, and with the pool it’s ~700 MB. That bigger peak seems to happen in CCoinsViewCache::BatchWrite.

    Previously, the mapCoins.erase(it) freed a node, and a subsequent cacheCoins[it->first] could use that freed memory. With the pool that is currently not possible any more, because the mapCoins.erase(it) doesn’t actually free the memory but puts it into the pool. And because mapCoins and cacheCoins use two different resources, that memory can’t be reused.

    I’ll have a look if it is possible to use the same pool for both mapCoins and cacheCoins, this should reduce peak memory usage. If this would also fix the ARM issue, I can’t say…

  13. hebasto commented at 2:41 pm on November 18, 2023: member
    From the observing the RAM usage during runtime, I would say that RAM usage increases over time until OOM.
  14. hebasto commented at 5:55 pm on November 18, 2023: member

    From IRC:

    19:42 <sipa> and it’s only with guix builds?

    I also can confirm the regression for binaries that were natively compiled with GCC 11.4.0.

  15. martinus commented at 7:35 pm on November 18, 2023: contributor
    @hebasto could you try another patch? https://github.com/martinus/bitcoin/commit/04fb0b35cf18d5805d58595c27f027fae4428e62 This removes the placement new that didn’t reuse the returned pointer, which technically has always been illegal
  16. hebasto commented at 8:45 pm on November 18, 2023: member

    Is this difference in the reported cache size expected:

    • in master @ 3a93957a5dc97cb2fd0656d1e2451ebef57204df (pre-PR25325):
    0UpdateTip: new best=00000000000001bc255d63d3e82af7638566a5fbaadb6d5bdc41d0731a23d1ea height=209285 version=0x00000001 log2_work=69.067375 tx=9192843 date='2012-11-24T00:09:17Z' progress=0.010297 cache=376.0MiB(3276665txo)
    1Cache size (394377008) exceeds total space (394371840)
    
    • in master @ 5aa0c82ccd6ceb4a141686fc8658f679de75a787 (PR25325 merged):
    0UpdateTip: new best=00000000000001bc255d63d3e82af7638566a5fbaadb6d5bdc41d0731a23d1ea height=209285 version=0x00000001 log2_work=69.067375 tx=9192843 date='2012-11-24T00:09:17Z' progress=0.009937 cache=26.3MiB(3276665txo)
    

    ?

  17. hebasto commented at 8:46 pm on November 18, 2023: member

    @hebasto could you try another patch? martinus@04fb0b3 This removes the placement new that didn’t reuse the returned pointer, which technically has always been illegal

    Tested. No behavior change. Still OOM.

  18. martinus commented at 9:13 pm on November 18, 2023: contributor

    Is this difference in the reported cache size expected

    Yes, when the pool is used the flushes happen at different times because more transactions are cached, so the number of cached transaction at certain heights can be completely different

    Tested. No behavior change. Still OOM.

    Thanks for testing! Hm, I’m out of ideas for now… I don’t see anything in the code that could cause this. My only suspicion is that for some reason on ARM the freed memory from the pool is not reused, but I don’t see why that would be the case

  19. hebasto commented at 10:14 pm on November 18, 2023: member

    Is this difference in the reported cache size expected

    Yes, when the pool is used the flushes happen at different times because more transactions are cached, so the number of cached transaction at certain heights can be completely different

    The point is that the former log shows the first flush. In the latter one, there were no flushes by that point at all.

  20. martinus commented at 5:11 am on November 19, 2023: contributor

    The point is that the former log shows the first flush. In the latter one, there were no flushes by that point at all.

    That’s ok, with pool it flushes later. But on second glance, the cache size is incorrect: cache=26.3MiB(3276665txo) That’s not possible, there’s cache can only be 26.3MiB with 3276665tx in it. So this looks like the cache size estimation doesn’t work for some reason. Can you post the full log?

  21. hebasto commented at 8:41 am on November 19, 2023: member
  22. maflcko added this to the milestone 26.0 on Nov 19, 2023
  23. martinus commented at 12:02 pm on November 19, 2023: contributor

    Thanks a lot for the log! I am now pretty certain that the allocated blocks from the pool are not counted in DynamicUsage. I can reproduce practically exactly same cache sizes as in the log when I set usage_chunks=0 here (and run 32bit bitcoind on x86): https://github.com/bitcoin/bitcoin/blob/master/src/memusage.h#L201 So the bucket_count is calculated correctly which can be seen by the memory jumps.

    So either pool_resource->ChunkSizeBytes() or pool_resource->NumAllocatedChunks() must return 0 for unknown reasons…

  24. sipa commented at 12:05 pm on November 19, 2023: member
    @martinus Is it possible that due to a mistake in the guessed node size, the utxo cache entries are allocated through the fallback to the normal allocator?
  25. hebasto commented at 12:15 pm on November 19, 2023: member

    @martinus Is it possible that due to a mistake in the guessed node size, the utxo cache entries are allocated through the fallback to the normal allocator?

    Which part of code is responsible for that?

  26. sipa commented at 12:20 pm on November 19, 2023: member

    @hebasto The guess for node size is here: https://github.com/bitcoin/bitcoin/blob/v26.0rc2/src/coins.h#L148

    Can you try increasing that (say, add 64 to it), and see if that fixes things?

  27. hebasto referenced this in commit 80c02a9fe6 on Nov 19, 2023
  28. martinus commented at 12:47 pm on November 19, 2023: contributor

    @martinus Is it possible that due to a mistake in the guessed node size, the utxo cache entries are allocated through the fallback to the normal allocator?

    That sounds like a very likely explanation! Although I’m curious why the estimation would work in x86 32bit but not on ARM

  29. hebasto commented at 12:49 pm on November 19, 2023: member
    Might it depend on max_load_factor?
  30. sipa commented at 12:59 pm on November 19, 2023: member

    @hebasto No, it shouldn’t.

    We’re essentially trying to guess the sizeof of an internal data structure in std::unordered_map. If we guess too small, allocations do not go to the pool, but a fallback to the normal allocator is used. @martinus Should we make sure that even allocations that use the fallback get accounted for in the resource?

  31. martinus commented at 1:06 pm on November 19, 2023: contributor

    @sipa I thought that too, it would definitely make sense. Although it would need to move the size_t MallocUsage(size_t alloc); into the pool too

    If the size increase does not help, it could also be that ARM for some reason needs a bigger alignment for its nodes. In any case it would be interesting to run this again with this commit: https://github.com/martinus/bitcoin/commit/eaa6aaa481c45866815d67dc80c509be15f22974 this will log each time the pool is called with an allocation size it can’t handle, which should be very noisy.

  32. hebasto commented at 1:09 pm on November 19, 2023: member

    @martinus

    If the size increase does not help, it could also be that ARM for some reason needs a bigger alignment for its nodes. In any case it would be interesting to run this again with this commit: martinus@eaa6aaa this will log each time the pool is called with an allocation size it can’t handle, which should be very noisy.

    I’ve already tested my own similar commit. There were no log entries for it.

  33. sipa commented at 1:12 pm on November 19, 2023: member
    @hebasto Did the + 64 change change anything at all?
  34. hebasto commented at 1:13 pm on November 19, 2023: member

    @hebasto Did the + 64 change change anything at all?

    It is running… Will report soon.

  35. martinus commented at 1:16 pm on November 19, 2023: contributor

    I’ve already tested my own similar commit. There were no log entries for it.

    There should always be quite a few log lines for the map buckets, even when all works as expected

  36. hebasto commented at 1:23 pm on November 19, 2023: member

    @hebasto Did the + 64 change change anything at all?

    Tested https://github.com/hebasto/bitcoin/commit/80c02a9fe6607fb0a4abfca66be7be42fd744905.

    No behavior change. Still OOM.

  37. hebasto commented at 1:23 pm on November 19, 2023: member

    I’ve already tested my own similar commit. There were no log entries for it.

    There should always be quite a few log lines for the map buckets, even when all works as expected

    Testing now…

  38. hebasto commented at 2:19 pm on November 19, 2023: member

    I’ve already tested my own similar commit. There were no log entries for it.

    There should always be quite a few log lines for the map buckets, even when all works as expected

    The log is flooded with “Allocate: 104 align(8), MAX_BLOCK_SIZE_BYTES=112, ALIGN_BYTES=4”

  39. martinus commented at 2:22 pm on November 19, 2023: contributor

    The log is flooded with “Allocate: 104 align(8), MAX_BLOCK_SIZE_BYTES=112, ALIGN_BYTES=4”

    Ha! This is the issue right there. For whatever reason the nodes require alignment of 8 bytes, but the pool is configured to use the platform’s 4 byte instead. Thanks a lot for all the testing! I’ll try to prepare a fix

  40. martinus commented at 3:01 pm on November 19, 2023: contributor
    TIL that on ARM 32bit an int64_t has 8 byte alignment, even when a pointer has 4 byte alignment: https://godbolt.org/z/xW3avfGef @hebasto could you please try this commit, I believe this should fix the issue: https://github.com/martinus/bitcoin/commit/72918e0d80455ece4aa96228be92246a83c8ad50
  41. sipa commented at 3:18 pm on November 19, 2023: member
    My suggestion would be to use @martinus’ commit above in 26.0 (if it fixes things), and then try to make sure fallback allocations get counted too in master.
  42. sipa commented at 3:38 pm on November 19, 2023: member
    @hebasto Seeing your deleted comment elsewhere, I don’t think std::unordered_map::node_type is useful, as it’s just a reference to a node, it doesn’t actually contain the node data.
  43. hebasto commented at 4:19 pm on November 19, 2023: member

    @hebasto could you please try this commit, I believe this should fix the issue: martinus@72918e0

    It works.

    Here’s an excerpt from the log with the first flushing:

    02023-11-19T16:17:58Z [initload] UpdateTip: new best=00000000000002711e96b4c88a8c60b97011068f0c644bf569a12bfa72b28a0d height=213239 version=0x00000002 log2_work=69.196319 tx=10230628 date='2012-12-23T00:00:41Z' progress=0.011054 cache=376.0MiB(3527856txo)
    12023-11-19T16:17:58Z [initload] Cache size (394554040) exceeds total space (394371840)
    

    The RAM usage remains as low as expected. @martinus Thank you for the fix so much!

  44. martinus commented at 4:23 pm on November 19, 2023: contributor

    It works.

    Awesome! Should I open a PR?

  45. hebasto commented at 4:24 pm on November 19, 2023: member

    Awesome! Should I open a PR?

    Yes, please :)

  46. martinus commented at 7:00 am on November 20, 2023: contributor

    […] try to make sure fallback allocations get counted too in master.

    Related PRs: #27748 tries to do exactly this, and #28531 wants to improve MallocUsage estimation.

  47. bitcoin deleted a comment on Nov 20, 2023
  48. fanquake closed this on Nov 22, 2023

  49. Julio-Rats referenced this in commit e9beaa749c on Jan 26, 2024
  50. PastaPastaPasta referenced this in commit a895bcaa93 on Oct 24, 2024
  51. PastaPastaPasta referenced this in commit 3b0fe7d7fb on Oct 24, 2024
  52. PastaPastaPasta referenced this in commit 02741a7706 on Oct 24, 2024
  53. bitcoin locked this on Nov 21, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-03 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me