Add pool based memory resource #25325

pull martinus wants to merge 5 commits into bitcoin:master from martinus:2022-06-very-not-scary-NodePoolResource changing 16 files +1004 −24
  1. martinus commented at 7:16 am on June 10, 2022: contributor

    A memory resource similar to std::pmr::unsynchronized_pool_resource, but optimized for node-based containers. The goal is to be able to cache more coins with the same memory usage, and allocate/deallocate faster.

    This is a reimplementation of #22702. The goal was to implement it in a way that is simpler to review & test

    • There is now a generic PoolResource for allocating/deallocating memory. This has practically the same API as std::pmr::memory_resource. (Unfortunately I cannot use std::pmr because libc++ simply doesn’t implement that API).

    • Thanks to sipa there is now a fuzzer for PoolResource! On a fast machine I ran it for ~770 million executions without finding any issue.

    • The estimation of the correct node size is now gone; PoolResource now has multiple pools and just needs to be created large enough to have space for the unordered_map nodes (see the sketch after this list).
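
    For illustration, a hedged sketch of how the resource and allocator are meant to be used, loosely following the CCoinsMap wiring discussed in the review comments below (the template arguments 128/8 and the default constructor are taken from those comments and may not match the final code exactly):

    #include <support/allocators/pool.h> // the header added by this PR

    #include <cstdint>
    #include <functional>
    #include <unordered_map>
    #include <utility>

    int main()
    {
        // A resource serving blocks of up to 128 bytes with 8-byte alignment.
        PoolResource<128, 8> resource{};

        // The raw interface mirrors std::pmr::memory_resource: Allocate/Deallocate.
        void* p = resource.Allocate(/*bytes=*/16, /*alignment=*/8); // carved out of a pool chunk
        resource.Deallocate(p, /*bytes=*/16, /*alignment=*/8);      // returned to the matching freelist

        // Node-based containers use it through PoolAllocator, so every node
        // allocation comes from the pool instead of an individual ::operator new.
        using Alloc = PoolAllocator<std::pair<const uint64_t, uint64_t>, 128, 8>;
        using Map = std::unordered_map<uint64_t, uint64_t, std::hash<uint64_t>, std::equal_to<uint64_t>, Alloc>;
        Map map{/*bucket_count=*/0, Map::hasher{}, Map::key_equal{}, &resource};
        map[1] = 2;
    }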

    I ran benchmarks with #22702, mergebase, and this PR: frequency-locked Intel i7-8700, clang++ 13.0.1, reindexing up to block 690000.

    bitcoind -dbcache=5000 -assumevalid=00000000000000000002a23d6df20eecec15b21d32c75833cce28f113de888b7 -reindex-chainstate -printtoconsole=0 -stopatheight=690000
    

    The performance is practically identical with #22702, just 0.4% slower. It’s ~21% faster than master:

    [Graph: Progress in Million Transactions over Time]

    [Graph: Size of Cache in MiB over Time] Note that on cache drops, mergebase's memory doesn't go down as far because it does not free the CCoinsMap bucket array.

    [Graph: Size of Cache in Million tx over Time]

  2. martinus marked this as a draft on Jun 10, 2022
  3. DrahtBot added the label Build system on Jun 10, 2022
  4. DrahtBot added the label UTXO Db and Indexes on Jun 10, 2022
  5. martinus force-pushed on Jun 10, 2022
  6. martinus force-pushed on Jun 10, 2022
  7. DrahtBot commented at 5:46 pm on June 10, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type      | Reviewers
    ACK       | john-moffett, jonatack, LarryRuane, achow101
    Stale ACK | sipa

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  8. martinus force-pushed on Jun 11, 2022
  9. martinus force-pushed on Jun 11, 2022
  10. martinus force-pushed on Jun 11, 2022
  11. Riahiamirreza approved
  12. martinus force-pushed on Jun 11, 2022
  13. martinus force-pushed on Jun 11, 2022
  14. martinus force-pushed on Jun 11, 2022
  15. martinus force-pushed on Jun 11, 2022
  16. martinus force-pushed on Jun 11, 2022
  17. martinus force-pushed on Jun 12, 2022
  18. martinus force-pushed on Jun 12, 2022
  19. martinus force-pushed on Jun 12, 2022
  20. martinus force-pushed on Jun 12, 2022
  21. martinus renamed this:
    [WIP] Add pool based memory resource
    Add pool based memory resource
    on Jun 13, 2022
  22. martinus marked this as ready for review on Jun 13, 2022
  23. martinus force-pushed on Jun 19, 2022
  24. martinus commented at 5:15 am on June 19, 2022: contributor
    rebased to 9205b60 with minor fixes in pool.h so it is usable in boost::unordered_map
  25. glozow removed the label Build system on Aug 18, 2022
  26. in src/test/util/xoroshiro128plusplus.h:29 in afa98fe230 outdated
    25+    [[nodiscard]] constexpr static uint64_t rotl(uint64_t x, int n)
    26+    {
    27+        return (x << n) | (x >> (64 - n));
    28+    }
    29+
    30+    [[nodiscard]] constexpr static uint64_t SplitMix64(uint64_t& seedval) noexcept
    


    sipa commented at 7:55 pm on August 23, 2022:
    No need for passing seedval by reference.

    martinus commented at 8:10 pm on August 23, 2022:
    Actually it is necessary because seedval is modified; see the constructor in line 42 where m_s0 and m_s1 are initialized. I could use a pointer instead of a reference, or add a comment to make that clearer?

    sipa commented at 8:13 pm on August 23, 2022:
    Oh, no, I just didn’t pay attention to how it was used. My mistake.
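
    (For context, a minimal sketch of the seeding pattern under discussion, using the usual splitmix64 constants; the actual header may differ in detail.)

    #include <cstdint>

    class XoRoShiRo128PlusPlusSketch
    {
        uint64_t m_s0;
        uint64_t m_s1;

        // Advances seedval on every call, which is why it is taken by mutable reference.
        static constexpr uint64_t SplitMix64(uint64_t& seedval) noexcept
        {
            uint64_t z = (seedval += 0x9E3779B97F4A7C15ULL);
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
            return z ^ (z >> 31);
        }

    public:
        explicit constexpr XoRoShiRo128PlusPlusSketch(uint64_t seedval) noexcept
            : m_s0(SplitMix64(seedval)), // first call advances seedval...
              m_s1(SplitMix64(seedval))  // ...so the second call produces a different state word
        {
        }
    };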
  27. in src/test/util/xoroshiro128plusplus.h:1 in afa98fe230 outdated
    0@@ -0,0 +1,72 @@
    1+// Copyright (c) 2009-2010 Satoshi Nakamoto
    


    sipa commented at 8:04 pm on August 23, 2022:
    Pretty sure this was not written by Satoshi, or in 2010.

    martinus commented at 3:00 pm on September 18, 2022:
    That’s how rumors start! Maybe it was Satoshi? ;-) I’ll update that.

    martinus commented at 6:11 am on September 24, 2022:
    done
  28. martinus commented at 3:13 pm on September 18, 2022: contributor

    In a semi-related event, I’ve recently created a big benchmark of a lot of different hashmaps. See here: https://martin.ankerl.com/2022/08/27/hashmap-bench-01/

    Some findings that I think are relevant for this PR:

    • I have added the PoolAllocator developed here to the benchmarks and combined it with both std::unordered_map and boost::unordered_map and it works without any problems.
    • boost has completely reworked the hashmap and released that in version 1.80. Just changing the map to that implementation will probably provide some speedup. More info here.
    • Both boost::unordered_map and std::unordered_map benefit a lot memory-wise from the PoolAllocator.
    • Boost has its own pool allocator: boost::container::pmr::unsynchronized_pool_resource. It behaves relatively similarly to PoolAllocator, although it was slower in my benchmarks. So this might be an alternative to the PoolAllocator.
    • Other implementations like gtl::parallel_flat_hash_map are much faster and have low memory overhead, so they might be a good choice performance wise, although they carry much more risk because usage is not so widespread.
  29. martinus force-pushed on Sep 24, 2022
  30. martinus commented at 6:11 am on September 24, 2022: contributor
    Rebased to f1b20f0 to fix copyright header in xoroshiro128plusplus.h. Nothing else changed.
  31. DrahtBot added the label Needs rebase on Oct 19, 2022
  32. martinus force-pushed on Oct 21, 2022
  33. martinus commented at 3:58 pm on October 21, 2022: contributor
    Rebase to 509d97a fixes a merge conflict in Makefile.test_util.include, fixes a comment, adds benchmark priority, and fixes a Win64 64-bit shift warning.
  34. DrahtBot removed the label Needs rebase on Oct 21, 2022
  35. martinus force-pushed on Oct 21, 2022
  36. in src/support/allocators/pool.h:70 in de694ed10b outdated
    65+ * m_allocated_chunks list. Each chunk has bytes 262144. The last chunk has still
    66+ * some memory available for the blocks, and when m_available_memory_it is at the
    67+ * end, a new chunk will be allocated and added to the list.
    68+ */
    69+template <std::size_t MAX_BLOCK_SIZE_BYTES, std::size_t ALIGN_BYTES>
    70+class PoolResource final
    


    aureleoules commented at 7:27 am on October 25, 2022:

    de694ed10bfa623596a41e35c95c13e757787b07: consider deleting copy ctor and copy assignment operators

    diff --git a/src/support/allocators/pool.h b/src/support/allocators/pool.h
    index 5c04bb31e..00bd1ed90 100644
    --- a/src/support/allocators/pool.h
    +++ b/src/support/allocators/pool.h
    @@ -269,6 +269,9 @@ public:
         {
             return m_chunk_size_bytes;
         }
    +
    +    PoolResource(const PoolResource&) = delete;
    +    PoolResource& operator=(const PoolResource&) = delete;
     };
    

    martinus commented at 5:55 pm on November 30, 2022:
    I’ll add these when I rebase to fix the merge conflict :+1:
  37. in src/support/allocators/pool.h:142 in de694ed10b outdated
    137+    }
    138+
    139+    /**
    140+     * Replaces node with placement constructed ListNode that points to the previous node
    141+     */
    142+    void PlacementAddToList(void* p, ListNode*& node)
    


    aureleoules commented at 7:38 am on October 25, 2022:

    de694ed10bfa623596a41e35c95c13e757787b07: maybe use templates instead of void*?

    diff --git a/src/support/allocators/pool.h b/src/support/allocators/pool.h
    index 5c04bb31e..38c5f9d62 100644
    --- a/src/support/allocators/pool.h
    +++ b/src/support/allocators/pool.h
    @@ -139,7 +139,8 @@ class PoolResource final
         /**
          * Replaces node with placement constructed ListNode that points to the previous node
          */
    -    void PlacementAddToList(void* p, ListNode*& node)
    +    template <typename T>
    +    void PlacementAddToList(T* p, ListNode*& node)
         {
             node = new (p) ListNode{node};
         }
    @@ -232,7 +233,8 @@ public:
         /**
          * Returns a block to the freelists, or deletes the block when it did not come from the cunks.
          */
    -    void Deallocate(void* p, std::size_t bytes, std::size_t alignment) noexcept
    +    template <typename T>
    +    void Deallocate(T* p, std::size_t bytes, std::size_t alignment) noexcept
         {
             if (IsFreeListUsable(bytes, alignment)) {
                 const std::size_t num_alignments = NumElemAlignBytes(bytes);
    

    martinus commented at 11:25 am on October 31, 2022:
    I think it’s better to leave this as void*, because I don’t want any overloaded operator new or operator delete called depending on the type.
  38. in src/support/allocators/pool.h:101 in 509d97a7a9 outdated
     96+    const size_t m_chunk_size_bytes;
     97+
     98+    /**
     99+     * Contains all allocated pools of memory, used to free the data in the destructor.
    100+     */
    101+    std::list<std::byte*> m_allocated_chunks{};
    


    aureleoules commented at 8:02 am on October 25, 2022:

    de694ed10bfa623596a41e35c95c13e757787b07: might slightly improve performance but haven’t benchmarked

    std::vector<std::byte*> m_allocated_chunks{};
    

    martinus commented at 5:57 pm on November 30, 2022:
    I thought std::list is more fitting here, because I don't need random access, and only every once in a while one element is added. The list is only iterated once when bitcoind shuts down, so it's not at all performance-relevant here.
  39. aureleoules approved
  40. aureleoules commented at 8:09 am on October 25, 2022: member

    The code looks good but it is spooky to review. I have benchmarked the PR by running bitcoind -dbcache=5000 -assumevalid=00000000000000000002a23d6df20eecec15b21d32c75833cce28f113de888b7 -reindex-chainstate -printtoconsole=0 and compared the time elapsed reindexing blocks between height=165618 and height=686269 on both master and this PR on mainnet.

    Master took 18h06mins and this PR took 13h46mins, corresponding to a reduction of ~24% which matches the PR description.

  41. aureleoules commented at 11:15 am on October 25, 2022: member

    I have also verified that using the default cache (450MB), this PR reindexes the chainstate faster than on master.

    Master

    ./src/bitcoind -testnet -reindex-chainstate -stopatheight=1000000 -printtoconsole=0
    real    10m6.978s
    user    9m4.140s
    sys     1m34.479s

    PR

    ./src/bitcoind -testnet -reindex-chainstate -stopatheight=1000000 -printtoconsole=0
    real    9m32.270s
    user    8m29.061s
    sys     1m26.911s

  42. DrahtBot added the label Needs rebase on Nov 19, 2022
  43. DrahtBot removed the label Needs rebase on Nov 21, 2022
  44. DrahtBot added the label Needs rebase on Nov 30, 2022
  45. martinus force-pushed on Nov 30, 2022
  46. DrahtBot removed the label Needs rebase on Nov 30, 2022
  47. sipa commented at 7:19 pm on December 5, 2022: member

    Concept ACK, this approach is much easier to be convinced about, I feel. Happy to see that most of the performance gains carry over.

    I've run a partial reindex under valgrind with this PR (to height 592178, which took several days) to really stress-test it. No issues. I'll do a more in-depth code review soon.

  48. in src/support/allocators/pool.h:64 in 4ee892b57c outdated
    59+ *        │   │
    60+ *        │16 │
    61+ *        └───┘
    62+ *
    63+ * Here m_free_lists[1] holds the 2 blocks of size 8 bytes, and m_free_lists[2]
    64+ * holds the3 blocks of size 16. The blocks came from the data stored in the
    


    sipa commented at 9:22 pm on January 19, 2023:
    Typo: space before 3

    martinus commented at 6:09 am on January 21, 2023:
    done in rebase to 80722d8
  49. in src/support/allocators/pool.h:107 in 4ee892b57c outdated
    102+
    103+    /**
    104+     * Single linked lists of all data that came from deallocating.
    105+     * m_free_lists[n] will serve blocks of size n*BLOCK_ALIGNMENT_BYTES.
    106+     */
    107+    std::array<ListNode*, MAX_BLOCK_SIZE_BYTES / ELEM_ALIGN_BYTES + 1> m_free_lists{};
    


    sipa commented at 9:27 pm on January 19, 2023:
    Allocators don’t need to be able to handle allocations of size 0, so I think the + 1 could be dropped here (shifting the indices down by 1).

    martinus commented at 11:15 am on January 20, 2023:

    I’m not 100% sure that is the case

    In https://eel.is/c++draft/allocator.requirements#general-39 it states that "the return value is unspecified", so it seems to me that it can be any value, but it doesn't say that the behavior is unspecified. So it could return any pointer or even nullptr.

    In http://eel.is/c++draft/basic.stc.dynamic.allocation#2 it says "The effect of indirecting through a pointer returned from a request for zero size is undefined", so one might return any value, but it must not be dereferenced (which seems logical: with 0 elements there's nothing to dereference). The footnote here says "C++ differs from C in requiring a zero request to return a non-null pointer", so this too suggests that a pointer should be returned.

    So I think allocate(0) is not forbidden


    sipa commented at 2:45 pm on January 20, 2023:
    Fair enough.
  50. in src/support/allocators/pool.h:193 in 4ee892b57c outdated
    188+
    189+    /**
    190+     * Disable copy & move semantics, these are not supported for the resource.
    191+     */
    192+    PoolResource(const PoolResource&) = delete;
    193+    PoolResource& operator=(const PoolResource&) = delete;
    


    sipa commented at 9:38 pm on January 19, 2023:
    I know there are rules that make move constructors/assignment operators not automatically appear when copy equivalents are specified, but I’d still prefer explicitly disabling those too.
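
    A minimal stand-in (not the real PoolResource) showing the explicit form sipa asks for; deleting the copy operations already suppresses the implicit moves, so spelling out all four simply documents the intent:

    class NeitherCopyableNorMovable
    {
    public:
        NeitherCopyableNorMovable() = default;

        NeitherCopyableNorMovable(const NeitherCopyableNorMovable&) = delete;
        NeitherCopyableNorMovable& operator=(const NeitherCopyableNorMovable&) = delete;
        NeitherCopyableNorMovable(NeitherCopyableNorMovable&&) = delete;
        NeitherCopyableNorMovable& operator=(NeitherCopyableNorMovable&&) = delete;
    };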
  51. in src/support/allocators/pool.h:215 in 4ee892b57c outdated
    210+    void* Allocate(std::size_t bytes, std::size_t alignment)
    211+    {
    212+        if (IsFreeListUsable(bytes, alignment)) {
    213+            const std::size_t num_alignments = NumElemAlignBytes(bytes);
    214+
    215+            // If alignment is lower than BLOCK_ALIGNMENT_BYTES, we need to allocate a bit more.
    


    sipa commented at 9:47 pm on January 19, 2023:
    It seems this comment would be more appropriate if put a few lines higher. The placement here makes it look like it’s related to the freelist empty test.

    martinus commented at 11:22 am on January 20, 2023:
    I’ll remove the comment, it doesn’t make much sense

    martinus commented at 6:09 am on January 21, 2023:
    done in rebase to 80722d8
  52. in src/support/allocators/pool.h:223 in 4ee892b57c outdated
    218+                // to the unlinked memory. Since FreeList is trivially destructible we can just treat it as
    219+                // uninitialized memory.
    220+                return std::exchange(m_free_lists[num_alignments], m_free_lists[num_alignments]->m_next);
    221+            }
    222+
    223+            // freelist is empty: get one allocation from allocated chunk memory.
    


    sipa commented at 9:52 pm on January 19, 2023:
    This comment seems like it should be moved up as well.

    martinus commented at 11:24 am on January 20, 2023:
    The location seems ok to me, when that line is reached we know the freelist is empty so we need to get memory from the chunk

    sipa commented at 3:15 pm on January 20, 2023:
    Ok, it’s just a tiny nit of course. Given the somewhat out of place comment above I wondered if perhaps a few more comments somehow got misplaced.
  53. in src/support/allocators/pool.h:230 in 4ee892b57c outdated
    225+            if (m_available_memory_it + round_bytes > m_available_memory_end) {
    226+                // slow path, only happens when a new chunk needs to be allocated
    227+                AllocateChunk();
    228+            }
    229+
    230+            // Make sure we use the right amount of bytes for that freelist (might be rounded up),
    


    sipa commented at 9:53 pm on January 19, 2023:
    And this one too.

    martinus commented at 11:25 am on January 20, 2023:
    I think that line fits too; it comments on the m_available_memory_it + round_bytes check.

    sipa commented at 3:15 pm on January 20, 2023:
    Ok.
  54. in src/support/allocators/pool.h:234 in 4ee892b57c outdated
    229+
    230+            // Make sure we use the right amount of bytes for that freelist (might be rounded up),
    231+            return std::exchange(m_available_memory_it, m_available_memory_it + round_bytes);
    232+        }
    233+
    234+        // Can't use the pool => forward allocation to the upstream resource.
    


    sipa commented at 9:53 pm on January 19, 2023:
    I don’t think there is an upstream resource; it’s just handled by the default allocator.

    martinus commented at 6:10 am on January 21, 2023:
    done in rebase to 80722d8
  55. in src/support/allocators/pool.h:239 in 4ee892b57c outdated
    234+        // Can't use the pool => forward allocation to the upstream resource.
    235+        return ::operator new (bytes, std::align_val_t{alignment});
    236+    }
    237+
    238+    /**
    239+     * Returns a block to the freelists, or deletes the block when it did not come from the cunks.
    


    sipa commented at 9:54 pm on January 19, 2023:
    Typo: cunks.

    martinus commented at 6:10 am on January 21, 2023:
    done in rebase to 80722d8
  56. in src/support/allocators/pool.h:249 in 4ee892b57c outdated
    244+            const std::size_t num_alignments = NumElemAlignBytes(bytes);
    245+            // put the memory block into the linked list. We can placement construct the FreeList
    246+            // into the memory since we can be sure the alignment is correct.
    247+            PlacementAddToList(p, m_free_lists[num_alignments]);
    248+        } else {
    249+            // Can't use the pool => forward deallocation to the upstream resource.
    


    sipa commented at 9:54 pm on January 19, 2023:
    Likewise, no upstream resource here?

    martinus commented at 6:10 am on January 21, 2023:
    done in rebase to 80722d8
  57. in src/test/util/poolresourcetester.h:115 in 4ee892b57c outdated
    110+                ++chunk_it;
    111+                assert(chunk_it != chunks.end());
    112+                chunk_ptr_remaining = chunk_it->ptr;
    113+                chunk_size_remaining = chunk_it->size;
    114+            }
    115+            // std::cout << "free_block=(" << (void*)free_block.ptr << ", " << free_block.size << "), chunk=(" << (void*)chunk_ptr_remaining << ", " << chunk_size_remaining << ")" << std::endl;
    


    sipa commented at 9:58 pm on January 19, 2023:
    This seems like a debugging leftover.

    martinus commented at 6:18 am on January 21, 2023:
    removed in rebase to 80722d838f9a5fdbbbd82f97d353e45b5d375ad8
  58. in src/support/allocators/pool.h:101 in 4ee892b57c outdated
     96+    const size_t m_chunk_size_bytes;
     97+
     98+    /**
     99+     * Contains all allocated pools of memory, used to free the data in the destructor.
    100+     */
    101+    std::list<std::byte*> m_allocated_chunks{};
    


    sipa commented at 10:09 pm on January 19, 2023:
    Could this use std::forward_list?

    martinus commented at 2:03 pm on January 20, 2023:
    I could, but it would be less convenient, because forward_list doesn't even have a size(), which I use in NumAllocatedChunks().

    sipa commented at 3:12 pm on January 20, 2023:
    I see; it’d reduce memory usage very slightly, but if it comes with extra complexity it’s probably not worth it.
  59. in src/memusage.h:183 in ecc73d72c7 outdated
    176+                                                                         MAX_BLOCK_SIZE_BYTES,
    177+                                                                         ALIGN_BYTES>>& m)
    178+{
    179+    auto* pool_resource = m.get_allocator().resource();
    180+
    181+    size_t estimated_list_node_size = MallocUsage(sizeof(void*) * 3);
    


    sipa commented at 10:10 pm on January 19, 2023:
    A comment to explain where this estimation formula comes from would be useful (or encapsulate the sizeof(void*) * 3 constant as a static member of the resource?).
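
    One hedged reading of that suggestion (the member name is hypothetical): m_allocated_chunks is a std::list<std::byte*>, and a typical doubly-linked list node is prev + next + the stored chunk pointer, i.e. three pointers, so the resource could expose that estimate instead of memusage.h hard-coding it.

    #include <cstddef>

    struct PoolResourceSketch {
        // Estimated size of one std::list node holding a chunk pointer: prev, next, and the chunk pointer itself.
        static constexpr std::size_t LIST_NODE_SIZE_ESTIMATE{sizeof(void*) * 3};
    };

    static_assert(PoolResourceSketch::LIST_NODE_SIZE_ESTIMATE == 3 * sizeof(void*));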
  60. in src/coins.h:229 in d7dd33e8db outdated
    225@@ -217,7 +226,8 @@ class CCoinsViewCache : public CCoinsViewBacked
    226      * declared as "const".
    227      */
    228     mutable uint256 hashBlock;
    229-    mutable CCoinsMap cacheCoins;
    230+    mutable CCoinsMapMemoryResource cacheCoinsMemoryResource{};
    


    sipa commented at 10:12 pm on January 19, 2023:
    Nit: use variable names following the style guide when introducing new ones (e.g. m_cache_coins_memory_resource).

    martinus commented at 6:10 am on January 21, 2023:
    done in rebase to 80722d8
  61. martinus force-pushed on Jan 20, 2023
  62. in src/Makefile.test.include:115 in 80722d838f outdated
    111@@ -112,6 +112,7 @@ BITCOIN_TESTS =\
    112   test/net_peer_eviction_tests.cpp \
    113   test/net_tests.cpp \
    114   test/netbase_tests.cpp \
    115+  test/pool_tests.cpp \
    


    jonatack commented at 1:33 am on January 21, 2023:

    Noticed while running make check and looking for pool_tests in the output: the following minor change in the first commit 2bbe0f8eeae274fc30d924c6762cbb7a392f3b32 will run the tests in the expected order.

       test/net_tests.cpp \
       test/netbase_tests.cpp \
    -  test/pool_tests.cpp \
       test/orphanage_tests.cpp \
       test/pmt_tests.cpp \
       test/policy_fee_tests.cpp \
       test/policyestimator_tests.cpp \
    +  test/pool_tests.cpp \
       test/pow_tests.cpp \
    

    currently

    Running tests: net_peer_eviction_tests from test/net_peer_eviction_tests.cpp
    Running tests: net_tests from test/net_tests.cpp
    Running tests: netbase_tests from test/netbase_tests.cpp
    Running tests: pool_tests from test/pool_tests.cpp
    Running tests: orphanage_tests from test/orphanage_tests.cpp
    Running tests: pmt_tests from test/pmt_tests.cpp
    Running tests: policy_fee_tests from test/policy_fee_tests.cpp
    

    after

    Running tests: pmt_tests from test/pmt_tests.cpp
    Running tests: policy_fee_tests from test/policy_fee_tests.cpp
    Running tests: policyestimator_tests from test/policyestimator_tests.cpp
    Running tests: pool_tests from test/pool_tests.cpp
    Running tests: pow_tests from test/pow_tests.cpp
    Running tests: prevector_tests from test/prevector_tests.cpp
    Running tests: raii_event_tests from test/raii_event_tests.cpp
    

    martinus commented at 6:18 am on January 21, 2023:
    Interesting that the order here has any effect on the order of tests; I would have expected the test suite to order them alphabetically. I'll add this to my next update.

    martinus commented at 5:19 am on January 28, 2023:
    done in e7158613dcafd0065a94b03c8013ee4ced8ec3e3
  63. jonatack commented at 1:37 am on January 21, 2023: contributor
    Reviewing. Verified that each commit @ 80722d838f9a5fdbbbd82f97d353e45b5d375ad8 builds cleanly with unit tests green.
  64. in src/support/allocators/pool.h:225 in 2bbe0f8eea outdated
    220+                return std::exchange(m_free_lists[num_alignments], m_free_lists[num_alignments]->m_next);
    221+            }
    222+
    223+            // freelist is empty: get one allocation from allocated chunk memory.
    224+            const size_t round_bytes = num_alignments * ELEM_ALIGN_BYTES;
    225+            if (m_available_memory_it + round_bytes > m_available_memory_end) {
    


    sipa commented at 9:16 pm on January 23, 2023:

    I believe this is technically speaking UB: it may construct an out-of-bounds pointer; even without dereferencing, doing so is UB (unless it’s the exactly-one-past-the-end pointer). It’s of course fine in practice, but I think the proper way to write it is round_bytes > m_available_memory_end - m_available_memory_it.

    From https://en.cppreference.com/w/cpp/language/operator_arithmetic:

    If the pointer P points to the ith element of an array, then the expressions P + n, n + P, and P - n are pointers of the same type that point to the i+nth, i+nth, and i-nth element of the same array, respectively. The result of pointer addition may also be a one-past-the-end pointer (that is, pointer P such that the expression P - 1 points to the last element of the array). Any other situations (that is, attempts to generate a pointer that isn’t pointing at an element of the same array or one past the end) invoke undefined behavior.


    martinus commented at 6:02 pm on January 27, 2023:
    nice find, I’ll change the if

    martinus commented at 5:19 am on January 28, 2023:
    done in e7158613dcafd0065a94b03c8013ee4ced8ec3e3
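
    A hedged illustration of the rewritten check (identifier names follow the quoted hunk; surrounding members omitted):

    #include <cstddef>

    // Compare sizes instead of forming a possibly out-of-range pointer: the pointer
    // subtraction is always valid for two pointers into (or one past) the same chunk.
    bool NeedsNewChunk(const std::byte* available_it, const std::byte* available_end, std::size_t round_bytes)
    {
        // was: available_it + round_bytes > available_end  (may construct an out-of-bounds pointer)
        return round_bytes > static_cast<std::size_t>(available_end - available_it);
    }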
  65. sipa commented at 10:08 pm on January 23, 2023: member
    Code review ACK apart from one comment:
  66. in src/support/allocators/pool.h:261 in 80722d838f outdated
    256+     * true only for the same object.
    257+     */
    258+    bool IsEqual(const PoolResource& other) const noexcept
    259+    {
    260+        return this == &other;
    261+    }
    


    jonatack commented at 9:37 pm on January 24, 2023:
    2bbe0f8eeae274f Is the public helper member PoolResource::IsEqual() intended to be used (it’s currently not)?

    martinus commented at 6:06 pm on January 27, 2023:
    Right, this is unused; I originally used this in the operator==. I’ll remove it

    martinus commented at 5:19 am on January 28, 2023:
    done in e7158613dcafd0065a94b03c8013ee4ced8ec3e3
  67. in src/support/allocators/pool.h:117 in 80722d838f outdated
    112+    std::byte* m_available_memory_it = nullptr;
    113+
    114+    /**
    115+     * Points to the end of available memory for carving out allocations.
    116+     *
    117+     * That member variable is redundant, and is always equal to `m_allocated_chunks.back().get() + CHUNK_SIZE_BYTES`
    


    jonatack commented at 9:41 pm on January 24, 2023:

    2bbe0f8eeae274 it looks like CHUNK_SIZE_BYTES should be m_chunk_size_bytes or ChunkSizeBytes()

         * That member variable is redundant, and is always equal to `m_allocated_chunks.back().get() +  m_chunk_size_bytes`
    

    martinus commented at 6:26 pm on January 27, 2023:
    right, also the .get() is no more

    martinus commented at 5:19 am on January 28, 2023:
    done in e7158613dcafd0065a94b03c8013ee4ced8ec3e3
  68. in src/support/allocators/pool.h:105 in 80722d838f outdated
    100+     */
    101+    std::list<std::byte*> m_allocated_chunks{};
    102+
    103+    /**
    104+     * Single linked lists of all data that came from deallocating.
    105+     * m_free_lists[n] will serve blocks of size n*BLOCK_ALIGNMENT_BYTES.
    


    jonatack commented at 9:44 pm on January 24, 2023:
    2bbe0f8eeae274f BLOCK_ALIGNMENT_BYTES is undefined (maybe n * block alignment bytes?)

    martinus commented at 6:31 pm on January 27, 2023:
    It should be n*ELEM_ALIGN_BYTES

    martinus commented at 5:19 am on January 28, 2023:
    done in e7158613dcafd0065a94b03c8013ee4ced8ec3e3
  69. jonatack commented at 10:08 pm on January 24, 2023: contributor

    WIP review of the first commit 2bbe0f8eeae274fc30, none of my comments below are blockers.

    Have been running this patch on mainnet since January 20 without issues (M1 macOS 13.1/13.2, clang 15) but have not yet done the reindex described in the OP.

    Bench results of $ ./src/bench/bench_bitcoin -filter='Pool*.*'

    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               29.85 |       33,505,082.42 |    0.3% |      0.02 | `PoolAllocator_StdUnorderedMap`
    |               13.03 |       76,732,499.95 |    0.4% |      0.01 | `PoolAllocator_StdUnorderedMapWithPoolResource`

    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               28.63 |       34,927,088.12 |    0.2% |      0.02 | `PoolAllocator_StdUnorderedMap`
    |               12.56 |       79,586,875.90 |    0.5% |      0.01 | `PoolAllocator_StdUnorderedMapWithPoolResource`
    

    These are the values that build on my machine (fuzz test disabled):

         static_assert((ELEM_ALIGN_BYTES & (ELEM_ALIGN_BYTES - 1)) == 0, "ELEM_ALIGN_BYTES must be a power of two");
         static_assert(sizeof(ListNode) <= ELEM_ALIGN_BYTES, "Units of size ELEM_SIZE_ALIGN need to be able to store a ListNode");
         static_assert((MAX_BLOCK_SIZE_BYTES & (ELEM_ALIGN_BYTES - 1)) == 0, "MAX_BLOCK_SIZE_BYTES needs to be a multiple of the alignment.");
    +    static_assert(ALIGN_BYTES == 8);
    +    static_assert(sizeof(ListNode) == 8);
    +    static_assert(alignof(ListNode) == 8);
    +    static_assert(ELEM_ALIGN_BYTES == 8);
    
  70. in src/support/allocators/pool.h:296 in 80722d838f outdated
    300+        : m_resource(resource)
    301+    {
    302+    }
    303+
    304+    PoolAllocator(const PoolAllocator& other) noexcept = default;
    305+    PoolAllocator& operator=(const PoolAllocator& other) noexcept = default;
    


    jonatack commented at 5:57 pm on January 25, 2023:

    https://github.com/bitcoin/bitcoin/commit/2bbe0f8eeae274fc30d924c6762cbb7a392f3b32 Question, are these user-defined copy-constructor and copy-assignment operators actually needed, and if yes, should the move operators be specified as well?

    Per https://en.cppreference.com/w/cpp/language/rule_of_three#rule-of-five, “Because the presence of a user-defined destructor, copy-constructor, or copy-assignment operator prevents implicit definition of the move constructor and the move assignment operator, any class for which move semantics are desirable, has to declare all five special member functions. Unlike Rule of Three, failing to provide move constructor and move assignment is usually not an error, but a missed optimization opportunity.”


    sipa commented at 9:45 pm on January 27, 2023:
    Moving is useless for this class, as all it holds is a pointer as member variable. Copying and moving a pointer is the same.

    jonatack commented at 9:59 pm on January 27, 2023:

    Thanks. I wondered if the move constructor is used, because while this branch compiles for me (Clang 15 on ARM64) with the move assignment disabled (PoolAllocator& operator=(PoolAllocator&&) noexcept = delete;), it doesn’t build with the move constructor disabled (PoolAllocator(PoolAllocator&&) noexcept = delete;).

    ./coins.h:230:33: note: in instantiation of member function 'std::unordered_map<COutPoint, CCoinsCacheEntry, SaltedOutpointHasher, std::equal_to<COutPoint>, PoolAllocator<std::pair<const COutPoint, CCoinsCacheEntry>, 128, 8>>::unordered_map' requested here
        mutable CCoinsMap cacheCoins{0, CCoinsMap::hasher{}, CCoinsMap::key_equal{}, &m_cache_coins_memory_resource};
                                    ^
    ./support/allocators/pool.h:297:5: note: 'PoolAllocator' has been explicitly marked deleted here
        PoolAllocator(PoolAllocator&&) noexcept = delete;
        ^
    1 error generated.
    

    martinus commented at 5:23 am on January 28, 2023:
    I think the problem with adding the PoolAllocator(PoolAllocator&&) noexcept = delete; is that it makes the function available for overload resolution, but if it's chosen the compilation fails: https://stackoverflow.com/a/35406415/48181
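
    A self-contained illustration of the point from that StackOverflow answer:

    #include <utility>

    struct CopyOnly {
        CopyOnly() = default;
        CopyOnly(const CopyOnly&) = default; // no move declared: rvalues silently fall back to this
    };

    struct MoveDeleted {
        MoveDeleted() = default;
        MoveDeleted(const MoveDeleted&) = default;
        MoveDeleted(MoveDeleted&&) = delete; // still participates in overload resolution
    };

    int main()
    {
        CopyOnly a;
        CopyOnly b{std::move(a)}; // OK: the copy constructor is used
        MoveDeleted c;
        // MoveDeleted d{std::move(c)}; // error: the deleted move constructor is selected
    }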
  71. in src/support/allocators/pool.h:345 in 80722d838f outdated
    349+}
    350+
    351+template <class T1, class T2, std::size_t MAX_BLOCK_SIZE_BYTES, std::size_t ALIGN_BYTES>
    352+bool operator!=(const PoolAllocator<T1, MAX_BLOCK_SIZE_BYTES, ALIGN_BYTES>& a,
    353+                const PoolAllocator<T2, MAX_BLOCK_SIZE_BYTES, ALIGN_BYTES>& b) noexcept
    354+{
    


    jonatack commented at 6:02 pm on January 25, 2023:
    https://github.com/bitcoin/bitcoin/commit/2bbe0f8eeae274fc30d924c6762cbb7a392f3b32 Noting that these two == and != template methods don’t appear to be otherwise used in this PR.

    martinus commented at 6:52 pm on January 27, 2023:

    It's not actually used anywhere currently, but it is needed when move-assigning containers that use the allocator. In a move-assignment the containers need to check whether the allocator's operator== returns true or false; if it returns true, both containers use the same resource and they can just destroy lhs and then move the pointers of rhs to lhs without copying elements. When they don't compare equal (i.e. the resources are different), all the objects in rhs need to be recreated.

    The best explanation that I found for this stuff is this post from Howard Hinnant (he wrote std::chrono among other things) : https://stackoverflow.com/a/27472502/48181
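
    A hedged sketch of those semantics (not necessarily the PR's exact code, and assuming the PoolAllocator type and its resource() accessor from pool.h): two allocators compare equal iff they share the same PoolResource, which is what lets a container move-assignment just steal the node pointers instead of re-creating every element.

    template <class T1, class T2, std::size_t MAX_BLOCK_SIZE_BYTES, std::size_t ALIGN_BYTES>
    bool operator==(const PoolAllocator<T1, MAX_BLOCK_SIZE_BYTES, ALIGN_BYTES>& a,
                    const PoolAllocator<T2, MAX_BLOCK_SIZE_BYTES, ALIGN_BYTES>& b) noexcept
    {
        return a.resource() == b.resource(); // same underlying resource => interchangeable allocators
    }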

  72. in src/test/pool_tests.cpp:184 in 80722d838f outdated
    165+            std_map[i];
    166+            resource_map[i];
    167+        }
    168+
    169+        // Eventually the resource_map should have a much lower memory usage because it has less malloc overhead
    170+        BOOST_TEST(memusage::DynamicUsage(resource_map) <= memusage::DynamicUsage(std_map) * 90 / 100);
    


    jonatack commented at 9:07 pm on January 25, 2023:
    Nice. (In practice on my particular system, 86 / 100 is the lowest value that passes.)

    martinus commented at 6:32 pm on January 27, 2023:
    I think the savings are a bit less than 86/100 on 32-bit due to 32-bit alignment; that's why I have the number a bit on the high side here.
  73. jonatack commented at 9:50 pm on January 25, 2023: contributor
    ACK 80722d838f9a5fdbbbd82f97d353e45b5d375ad8 modulo open comments and questions
  74. martinus force-pushed on Jan 27, 2023
  75. jonatack commented at 9:30 pm on January 27, 2023: contributor
    ACK e7158613dcafd0065a94b03c8013ee4ced8ec3e3 per git range-diff 4b51290 80722d8 e715861
  76. sipa commented at 10:07 pm on January 27, 2023: member
    ACK e7158613dcafd0065a94b03c8013ee4ced8ec3e3
  77. martinus commented at 5:24 am on January 28, 2023: contributor
    Thanks for reviewing @jonatack and @sipa! I’ve addressed all comments with e715861
  78. sipa commented at 4:21 pm on January 30, 2023: member
    Needs (fairly trivial) rebase after merge of #17487.
  79. DrahtBot added the label Needs rebase on Jan 30, 2023
  80. martinus force-pushed on Jan 30, 2023
  81. martinus commented at 5:21 pm on January 30, 2023: contributor
    Rebased to resolve conflicts, diff should be viewable with git range-diff 82903a7 e715861 f58a5eee6f
  82. jonatack commented at 5:53 pm on January 30, 2023: contributor

    re-ACK f58a5eee6ff69582d6e46e93dd52e07202d257a8

    The unit test CI failures look unrelated – I didn’t reproduce them running make check locally (Clang 15 on ARM64).

    Edit: there are now more of the failures and some seem related.

  83. martinus commented at 6:12 pm on January 30, 2023: contributor

    Hm it looks like in some environments the new check fails:

    coins_tests.cpp(984): error: in "coins_tests/ccoins_flush_behavior": check view->DynamicMemoryUsage() < cache_usage has failed
    

    I think the problem is this: The test from @jamesob assumes that memory usage goes down when the cache is flushed. But the pool internally uses chunks of 262144 bytes, and when flushed this memory block will be deallocated but might be allocated again right away when the map is created, so memory usage doesn't necessarily change. E.g. on Windows some control block is immediately allocated for the map, and this makes the pool allocator allocate its first chunk.

  84. DrahtBot removed the label Needs rebase on Jan 30, 2023
  85. jonatack commented at 8:58 pm on January 30, 2023: contributor
    Fixup LGTM modulo squashing. The remaining CI failures are due to https://cirrus-ci.com/task/5963593934438400 (that seems unrelated?) and to #27001 that should be fixed in #26998.
  86. DrahtBot added the label Needs rebase on Jan 30, 2023
  87. martinus force-pushed on Jan 31, 2023
  88. martinus commented at 6:11 am on January 31, 2023: contributor

    Squashed and rebased again due to #26999

    See the full range-diff after the 2 rebases: git range-diff ceb74b8 e715861 c25a754b

    The biggest change was updating the method TestFlushBehavior in coins_tests.cpp.

  89. DrahtBot removed the label Needs rebase on Jan 31, 2023
  90. martinus force-pushed on Jan 31, 2023
  91. jonatack commented at 10:16 pm on January 31, 2023: contributor
    re-ACK 0007d69f249068a14b9b5a97d46ace9dabdc2c8b
  92. sipa commented at 8:56 pm on February 2, 2023: member
    FWIW, I rebased this on top of #27011, and am running a few CPU cores on it. No issues so far.
  93. DrahtBot added the label Needs rebase on Feb 13, 2023
  94. martinus force-pushed on Feb 14, 2023
  95. martinus commented at 8:04 am on February 14, 2023: contributor
    Rebased due to #27011. Note that this adds a /*deterministic=*/true here in test/fuzz/coins_view.cpp, see git range-diff 0007d69...78f597b
  96. DrahtBot removed the label Needs rebase on Feb 14, 2023
  97. jonatack commented at 4:58 pm on February 14, 2023: contributor
    re-ACK 78f597be2879c39d9d2b98e21ed0120d2308de20 reviewed git range-diff 2c1fe27 0007d69 78f597b and as a sanity check locally ran debug build and unit tests and fuzz build with briefly running the coins_view and coinscache_sim fuzzers
  98. DrahtBot requested review from sipa on Feb 14, 2023
  99. DrahtBot added the label Needs rebase on Feb 15, 2023
  100. martinus force-pushed on Feb 15, 2023
  101. martinus commented at 6:07 pm on February 15, 2023: contributor
    yet another rebase, only dropping the “Add xoroshiro128++ PRNG” commit which has already been added in #26153 (commit 5f05b27841af0bed1b6e7de5f46ffe33e5919e4d)
  102. DrahtBot removed the label Needs rebase on Feb 15, 2023
  103. jonatack commented at 6:34 pm on February 16, 2023: contributor
    re-ACK b5eba9ad001f1035dd641bc5880cb6bb53a8b07f
  104. DrahtBot added the label Needs rebase on Feb 17, 2023
  105. martinus commented at 6:50 am on February 18, 2023: contributor
    Rebased; this adds #include <test/util/random.h>.
  106. martinus force-pushed on Feb 18, 2023
  107. DrahtBot removed the label Needs rebase on Feb 18, 2023
  108. jonatack commented at 3:32 pm on February 18, 2023: contributor
    re-ACK d87cb99bb37637e26a9e00b9f7de4bc6f44cb79d
  109. in src/support/allocators/pool.h:118 in d87cb99bb3 outdated
    113+
    114+    /**
    115+     * Points to the end of available memory for carving out allocations.
    116+     *
    117+     * That member variable is redundant, and is always equal to `m_allocated_chunks.back() + m_chunk_size_bytes`
    118+     * whenever it is accessed, but `m_untouched_memory_end` caches this for clarity and efficiency.
    


    john-moffett commented at 5:01 pm on February 24, 2023:
    m_untouched_memory_end now called m_available_memory_end
  110. in src/support/allocators/pool.h:155 in d87cb99bb3 outdated
    150+     *
    151+     * Precondition: leftover bytes are either 0 or few enough to fit into a place in the freelist
    152+     */
    153+    void AllocateChunk()
    154+    {
    155+        // if there is still any available memory is left, put it into the freelist.
    


    john-moffett commented at 5:04 pm on February 24, 2023:
    Extra “is” after “memory”.
  111. john-moffett approved
  112. john-moffett commented at 5:12 pm on February 24, 2023: contributor

    ACK d87cb99bb37637e26a9e00b9f7de4bc6f44cb79d

    Tested and code reviewed. Very nice improvement.

    Left tiny nits in documentation, but please ignore unless updating for other reasons.

  113. in src/support/allocators/pool.h:91 in d87cb99bb3 outdated
    86+     * Internal alignment value. The larger of the requested ALIGN_BYTES and alignof(FreeList).
    87+     */
    88+    static constexpr std::size_t ELEM_ALIGN_BYTES = std::max(alignof(ListNode), ALIGN_BYTES);
    89+    static_assert((ELEM_ALIGN_BYTES & (ELEM_ALIGN_BYTES - 1)) == 0, "ELEM_ALIGN_BYTES must be a power of two");
    90+    static_assert(sizeof(ListNode) <= ELEM_ALIGN_BYTES, "Units of size ELEM_SIZE_ALIGN need to be able to store a ListNode");
    91+    static_assert((MAX_BLOCK_SIZE_BYTES & (ELEM_ALIGN_BYTES - 1)) == 0, "MAX_BLOCK_SIZE_BYTES needs to be a multiple of the alignment.");
    


    LarryRuane commented at 7:18 pm on March 3, 2023:

    I think this condition is guaranteed by the other static assertions, but this expression seems incorrect. In general, A & (B-1) == 0 doesn’t mean that A is a multiple of B, does it? If A=8 and B=3, then 8 & (3-1) is zero, but 8 isn’t a multiple of 3. If A=8, B=4, then 8 & (4-1) is also zero, so we get the correct result (8 is a multiple of 4), but it’s kind of by accident. To state it differently, if MAX_BLOCK_SIZE_BYTES is some large power of 2 (which is guaranteed above), then in binary it’s 1000...0, so ANDing with any smaller value will always produce zero.

    Maybe this is better:

        static_assert((MAX_BLOCK_SIZE_BYTES % ELEM_ALIGN_BYTES) == 0, "MAX_BLOCK_SIZE_BYTES needs to be a multiple of the alignment.");
    

    martinus commented at 11:10 am on March 4, 2023:

    In line 89 there is an assert that ELEM_ALIGN_BYTES is a power of two:

    static_assert((ELEM_ALIGN_BYTES & (ELEM_ALIGN_BYTES - 1)) == 0, "ELEM_ALIGN_BYTES must be a power of two");
    

    So given that, line 91 should make sense: ELEM_ALIGN_BYTES - 1 becomes a bitmask, and the expression actually asserts that MAX_BLOCK_SIZE_BYTES is a multiple of ELEM_ALIGN_BYTES. But right, on its own that assert wouldn't be enough, and using % is probably a bit clearer.
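
    A worked example of that point:

    int main()
    {
        static_assert((8 & (3 - 1)) == 0); // passes even though 8 is not a multiple of 3
        static_assert(8 % 3 != 0);         // the modulo form catches it

        static_assert((8 & (4 - 1)) == 0); // 4 is a power of two, so this really tests divisibility
        static_assert(8 % 4 == 0);
    }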

  114. in src/coins.h:148 in d87cb99bb3 outdated
    136+using CCoinsMap = std::unordered_map<COutPoint,
    137+                                     CCoinsCacheEntry,
    138+                                     SaltedOutpointHasher,
    139+                                     std::equal_to<COutPoint>,
    140+                                     PoolAllocator<std::pair<const COutPoint, CCoinsCacheEntry>,
    141+                                                   sizeof(std::pair<const COutPoint, CCoinsCacheEntry>) + sizeof(void*) * 4,
    


    LarryRuane commented at 8:09 am on March 4, 2023:
    I don’t understand the purpose of adding sizeof(void*) * 4; could you leave a brief comment if you get the chance? (Unless I’m just being clueless!)

    martinus commented at 11:10 am on March 4, 2023:

    Posting here now, I can add this as a comment later:

    The value here determines the maximum bytes that the PoolAllocator supports. When bigger blocks are allocated, this is just forwarded to new.

    The thing with sizeof(void*) * 4 is that it is not enough to just support sizes up to sizeof(std::pair<const COutPoint, CCoinsCacheEntry>), because the different implementations of std::unordered_map use more memory for each node. Most implementations wrap the std::pair into a struct that contains a single pointer, so they can link the nodes in a singly linked list. But not all; e.g. Microsoft's STL uses a doubly linked list. Also, libstdc++ and libc++ wrap the pair differently, which can lead to different memory usage due to alignment, and some might store the hash value as well, depending on whether the hash's operator() is noexcept or not. All in all, correctly determining the size used for allocation is really hard and brittle because it depends on a lot of implementation details.

    So by adding 4 * the size of a pointer we err on the safe side, and nodes can surely be allocated with all implementations of std::unordered_map. m_free_lists might be a bit larger than it needs to be and have a few pointers that never hold a freelist, but at least we can be sure that std::unordered_map nodes are actually allocated from the pool.
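
    A rough, purely illustrative node layout (not any real standard-library type) showing why the per-node allocation can exceed the size of the stored pair:

    #include <cstddef>
    #include <cstdint>
    #include <utility>

    template <typename Pair>
    struct HypotheticalMapNode {
        HypotheticalMapNode* next; // bucket chain pointer (some STLs use two pointers)
        std::size_t cached_hash;   // some STLs cache the hash, depending on the hasher's noexcept-ness
        Pair value;                // the user-visible element
    };

    // The node is strictly larger than the pair it stores, hence the sizeof(void*) * 4 headroom.
    static_assert(sizeof(HypotheticalMapNode<std::pair<const uint64_t, uint64_t>>) >
                  sizeof(std::pair<const uint64_t, uint64_t>));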


    hebasto commented at 3:34 pm on November 19, 2023:

    @martinus

    Could assessing std::unordered_map::node_type improve this heuristic?


    martinus commented at 3:38 pm on November 19, 2023:
    @hebasto Unfortunately not. I actually wanted to use this, but learned that it is just the type of the node handle, which more or less is a pointer to the data. The actual type of the node used for allocation is not exposed by the standard. If it were, it would make the pool implementation quite a bit simpler.
  115. LarryRuane commented at 8:12 am on March 4, 2023: contributor
    I’d like to review more, but don’t let me hold you up. I’m hosting review club on this PR on March 8 (next week); anyone here should feel free to join since you know this PR better than I do!
  116. martinus commented at 11:13 am on March 4, 2023: contributor
    @LarryRuane awesome that you’ll hold a PR review club about this PR! I’ll try to join, but can’t yet guarantee that I’ll have the time.
  117. in src/support/allocators/pool.h:69 in d87cb99bb3 outdated
    64+ * holds the 3 blocks of size 16. The blocks came from the data stored in the
    65+ * m_allocated_chunks list. Each chunk has bytes 262144. The last chunk has still
    66+ * some memory available for the blocks, and when m_available_memory_it is at the
    67+ * end, a new chunk will be allocated and added to the list.
    68+ */
    69+template <std::size_t MAX_BLOCK_SIZE_BYTES, std::size_t ALIGN_BYTES>
    


    LarryRuane commented at 8:59 pm on March 7, 2023:
    Why is std::size_t preferred over size_t within this file?

    martinus commented at 3:43 pm on March 8, 2023:
    No real reason. Technically in C++ I believe std::size_t should be preferred, especially when using includes like <cstddef> instead of <stddef.h>. I personally usually use just size_t everywhere and never had a compile problem.
  118. in src/support/allocators/pool.h:156 in d87cb99bb3 outdated
    151+     * Precondition: leftover bytes are either 0 or few enough to fit into a place in the freelist
    152+     */
    153+    void AllocateChunk()
    154+    {
    155+        // if there is still any available memory is left, put it into the freelist.
    156+        size_t remaining_available_bytes = std::distance(m_available_memory_it, m_available_memory_end);
    


    LarryRuane commented at 9:13 pm on March 7, 2023:

    Possibly related to #25325 (review), the first time this runs, both m_available_memory_it and m_available_memory_end are nullptr; is it UB to pass these to std::distance()? I think it’s okay, just wanted to raise as a possible concern. You could do something like

            size_t remaining_available_bytes = m_available_memory_it ? std::distance(m_available_memory_it, m_available_memory_end) : 0;
    

    martinus commented at 3:41 pm on March 8, 2023:
    This is explicitly defined in the standard here, in 5.1: http://eel.is/c++draft/expr.add#5
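
    A tiny check of the cited rule ([expr.add]/5.1): subtracting two null pointers is well-defined and yields zero, so the first AllocateChunk() call is fine.

    #include <cassert>
    #include <cstddef>
    #include <iterator>

    int main()
    {
        std::byte* it = nullptr;
        std::byte* end = nullptr;
        assert(end - it == 0);               // null - null is defined to be 0
        assert(std::distance(it, end) == 0); // so std::distance on two nullptrs is fine too
    }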
  119. in src/bench/pool.cpp:24 in d87cb99bb3 outdated
    19+        auto rng = ankerl::nanobench::Rng(1234);
    20+        for (size_t i = 0; i < batch_size; ++i) {
    21+            map[rng()];
    22+        }
    23+        map.clear();
    24+    });
    


    LarryRuane commented at 3:15 pm on March 8, 2023:

    This benchmark doesn’t re-use memory (add to freelist then remove from freelist); maybe it would be better if it did because that’s what happens with the real coins cache, maybe something like this:

    -    size_t batch_size = 5000;
    +    // The steady-state size of the map will be half of this value;
    +    // power-of-2 to avoid expensive mod operation during benchmark.
    +    constexpr size_t batch_size = 1 << 13;
     
         // make sure each iteration of the benchmark contains exactly 5000 inserts and one clear.
         // do this at least 10 times so we get reasonable accurate results
     
         bench.batch(batch_size).minEpochIterations(10).run([&] {
             auto rng = ankerl::nanobench::Rng(1234);
    -        for (size_t i = 0; i < batch_size; ++i) {
    -            map[rng()];
    +        for (size_t i = 0; i < batch_size * 10; ++i) {
    +            uint64_t r{rng()};
    +            // if the map has few entries, more likely add an entry, else delete
    +            if ((r & (batch_size-1)) < map.size()) {
    +                map.erase(map.begin());
    +            } else {
    +                map[r];
    +            }
             }
             map.clear();
         });
    

    I don’t think it matters which entry we delete, since map.begin() could return any entry. Deleting a truly random entry gets more complicated, and I don’t think it changes the performance significantly. The loop iterates batch_size * 10 instead of batch_size so we reach steady-state (about equal numbers of insertions and deletions).


    martinus commented at 3:46 pm on March 8, 2023:
    Actually it does reuse memory: when calling map.clear(), all of the entries of the map are deallocated, so the pool resource gets them and puts them into the freelist. The memory is only ever released in the destructor of the pool resource.

    LarryRuane commented at 4:17 pm on March 8, 2023:

    That’s a good point, I hadn’t noticed that. But perhaps it’s good to make the load more realistic to get better benchmarking results. I’m assuming that when UTXOs become spent TXOs during block validation, they’re removed from this unordered_map, but maybe that’s not true? Just seems like doing many interleaving allocations and deallocations would be more realistic, but maybe not. But this suggestion is non-blocking.

    In case anyone is wondering, I observed that the benchmark does cause allocations (and deallocations) that are too large for the pool, and these are for the hash bucket arrays. (I’m sure the fuzzer does that too, so far I’ve run only the benchmark.)

  120. LarryRuane commented at 3:18 pm on March 8, 2023: contributor
    Looks good.
  121. martinus referenced this in commit c16086fe2c on Mar 11, 2023
  122. sipa approved
  123. sipa commented at 5:18 pm on March 20, 2023: member
    ACK d87cb99bb37637e26a9e00b9f7de4bc6f44cb79d
  124. achow101 requested review from LarryRuane on Mar 20, 2023
  125. achow101 removed review request from LarryRuane on Mar 20, 2023
  126. achow101 requested review from aureleoules on Mar 20, 2023
  127. achow101 requested review from LarryRuane on Mar 20, 2023
  128. jamesob commented at 0:46 am on March 21, 2023: member
    Kicked off a bitcoinperf run; will have some results tomorrow.
  129. DrahtBot removed review request from LarryRuane on Mar 21, 2023
  130. jamesob commented at 1:02 pm on March 21, 2023: member

    Cool! Seeing a ~8% speedup over a modern region of the chain with lower memory usage.


    ibd local range dbcache=8000 667200 697200

    bench name                                 | command
    ibd.local.range.dbcache=8000.667200.697200 | bitcoind -dbcache=8000 -debug=coindb -debug=bench -listen=0 -connect=0 -addnode=127.0.0.1:8888 -prune=9999999 -printtoconsole=0 -assumevalid=000000000000000000176c192f42ad13ab159fdb20198b87e7ba3c001e47b876

    #25325 vs. $mergebase (absolute)

    bench name                                                  | x | #25325                     | $mergebase
    ibd.local.range.dbcache=8000.667200.697200.total_secs      | 3 | 4205.0444 (± 23.6683)      | 4536.8536 (± 7.9194)
    ibd.local.range.dbcache=8000.667200.697200.peak_rss_KiB    | 3 | 4428684.0000 (± 3557.8525) | 4806934.6667 (± 108.7791)
    ibd.local.range.dbcache=8000.667200.697200.cpu_kernel_secs | 3 | 210.8233 (± 1.0422)        | 209.8300 (± 0.9361)
    ibd.local.range.dbcache=8000.667200.697200.cpu_user_secs   | 3 | 26000.4633 (± 15.5244)     | 26412.5367 (± 8.9365)

    #25325 vs. $mergebase (relative)

    bench name                                                  | x | #25325 | $mergebase
    ibd.local.range.dbcache=8000.667200.697200.total_secs      | 3 | 1.00   | 1.079
    ibd.local.range.dbcache=8000.667200.697200.peak_rss_KiB    | 3 | 1.00   | 1.085
    ibd.local.range.dbcache=8000.667200.697200.cpu_kernel_secs | 3 | 1.00   | 1.000
    ibd.local.range.dbcache=8000.667200.697200.cpu_user_secs   | 3 | 1.00   | 1.016
  131. martinus commented at 2:18 pm on March 21, 2023: contributor
    Thanks for the benchmark @jamesob! There was no cache flush in the benchmark; that's why the memory usage was lower. When flushes happen, e.g. with a lower dbcache size or a longer range of blocks, memory usage should be about equal, but then there should be an even larger performance benefit for this PR because it can cache more data in the same memory.
  132. jamesob commented at 2:28 pm on March 21, 2023: member

    When flushes would happen with e.g. lower dbcache size or longer range of blocks memory usage should be about equal, but then there should be an even larger performance benefit for this PR because it can cache more data with the same memory.

    Yup - I started another run with dbcache=1000.

  133. in src/test/util/poolresourcetester.h:116 in d87cb99bb3 outdated
    111+                assert(chunk_it != chunks.end());
    112+                chunk_ptr_remaining = chunk_it->ptr;
    113+                chunk_size_remaining = chunk_it->size;
    114+            }
    115+            assert(free_block.ptr == chunk_ptr_remaining);                   // ensure addresses match
    116+            assert(free_block.size <= chunk_size_remaining);                 // ensure we no overflow
    


    LarryRuane commented at 9:48 pm on March 21, 2023:
                assert(free_block.size <= chunk_size_remaining);                 // ensure no overflow
    
  134. in src/test/pool_tests.cpp:121 in d87cb99bb3 outdated
    116+    std::vector<PtrSizeAlignment> ptr_size_alignment{};
    117+    for (size_t i = 0; i < 1000; ++i) {
    118+        // make it a bit more likely to allocate than deallocate
    119+        if (ptr_size_alignment.empty() || 0 != InsecureRandRange(4)) {
    120+            // allocate a random item
    121+            std::size_t alignment = std::size_t{1} << InsecureRandRange(7);           // 1, 2, ..., 128
    


    LarryRuane commented at 9:51 pm on March 21, 2023:
                std::size_t alignment = std::size_t{1} << InsecureRandRange(8);           // 1, 2, ..., 128
    
  135. in src/test/util/poolresourcetester.h:95 in d87cb99bb3 outdated
    90+            free_blocks.emplace_back(resource.m_available_memory_it, num_available_bytes);
    91+        }
    92+
    93+        // collect all chunks
    94+        std::vector<PtrAndBytes> chunks;
    95+        for (std::byte* ptr : resource.m_allocated_chunks) {
    


    LarryRuane commented at 9:52 pm on March 21, 2023:
            for (const std::byte* ptr : resource.m_allocated_chunks) {
    
  136. in src/test/util/poolresourcetester.h:121 in d87cb99bb3 outdated
    116+            assert(free_block.size <= chunk_size_remaining);                 // ensure we no overflow
    117+            assert((free_block.ptr & (resource.ELEM_ALIGN_BYTES - 1)) == 0); // ensure correct alignment
    118+            chunk_ptr_remaining += free_block.size;
    119+            chunk_size_remaining -= free_block.size;
    120+        }
    121+        assert(chunk_ptr_remaining == chunks.back().ptr + chunks.back().size); // ensure we are t the end of the chunks
    


    LarryRuane commented at 9:52 pm on March 21, 2023:

    nit, this would test slightly more

            assert(chunk_ptr_remaining == chunk_it->ptr + chunk_it->size); // ensure we are at the end of the chunks
            ++chunk_it;
            assert(chunk_it == chunks.end());
    
  137. achow101 commented at 10:16 pm on March 21, 2023: member

    Light ACK d87cb99bb37637e26a9e00b9f7de4bc6f44cb79d

    Not particularly well versed in allocators, but the logic of the allocator and its tests makes sense, and the benchmarks seem to show there is a noticeable improvement.

    One thing I did notice though is that when configured with --enable-debug, the benchmark PoolAllocator_StdUnorderedMapWithPoolResource is a little bit slower than PoolAllocator_StdUnorderedMap. Without --enable-debug, it’s quite a bit faster. I didn’t measure the effect of this in actual IBD, but configuring with --enable-debug is already known to make things slower, so it shouldn’t matter much.

    With --enable-debug:

    |  ns/op |         op/s | err% |   ins/op | bra/op | miss% | total | benchmark
    | 239.73 | 4,171,384.12 | 0.1% | 2,516.56 | 401.94 |  0.1% |  0.14 | PoolAllocator_StdUnorderedMap
    | 277.15 | 3,608,128.61 | 0.1% | 2,792.06 | 432.32 |  0.1% |  0.17 | PoolAllocator_StdUnorderedMapWithPoolResource

    Without --enable-debug:

    |  ns/op |          op/s | err% | ins/op | bra/op | miss% | total | benchmark
    |  21.53 | 46,436,178.12 | 0.3% | 279.84 |  59.34 |  0.4% |  0.01 | PoolAllocator_StdUnorderedMap
    |  10.45 | 95,674,714.97 | 0.2% |  82.34 |  13.71 |  0.8% |  0.01 | PoolAllocator_StdUnorderedMapWithPoolResource
  138. in src/test/util/poolresourcetester.h:85 in d87cb99bb3 outdated
    80+            std::size_t bytes = freelist_idx * resource.ELEM_ALIGN_BYTES;
    81+            auto* ptr = resource.m_free_lists[freelist_idx];
    82+            while (ptr != nullptr) {
    83+                free_blocks.emplace_back(ptr, bytes);
    84+                ptr = ptr->m_next;
    85+            }
    


    LarryRuane commented at 10:20 pm on March 21, 2023:
    Probably not important, but if a bug caused a cycle in a freelist, I think the test would allocate an unbounded amount of memory (pushing to free_blocks), which would not be a nice way to fail. I think you could calculate an upper bound on the number of free blocks (number of chunks times chunk size divided by this free list’s block size), then assert if the number of iterations exceeds that number.

    martinus commented at 5:09 am on March 24, 2023:
    I’ve left that out on purpose; I didn’t want to complicate this code any further. So far this case has never happened.
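
    A minimal sketch of the bounded walk suggested here, assuming the tester knows the number of chunks and the chunk size (num_chunks and chunk_size_bytes are illustrative names, not from the PR) and that freelist_idx >= 1 so the block size is non-zero:

        std::size_t bytes = freelist_idx * resource.ELEM_ALIGN_BYTES;
        // Upper bound on how many free blocks of this size can possibly exist.
        std::size_t max_free_blocks = num_chunks * chunk_size_bytes / bytes;
        std::size_t seen = 0;
        auto* ptr = resource.m_free_lists[freelist_idx];
        while (ptr != nullptr) {
            assert(++seen <= max_free_blocks); // a cycle in the freelist trips this instead of growing free_blocks without bound
            free_blocks.emplace_back(ptr, bytes);
            ptr = ptr->m_next;
        }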
  139. in src/test/pool_tests.cpp:40 in d87cb99bb3 outdated
    35+    resource.Deallocate(block, 8, 8);
    36+    PoolResourceTester::CheckAllDataAccountedFor(resource);
    37+    BOOST_TEST(1 == PoolResourceTester::FreeListSizes(resource)[1]);
    38+
    39+    // alignment is too small, but the best fitting freelist is used. Nothing is allocated.
    40+    block = resource.Allocate(8, 1);
    


    LarryRuane commented at 10:59 pm on March 21, 2023:

    nit, and would this make the test too specific? I don’t think so, but something to consider.

    0    void* b = resource.Allocate(8, 1);
    1    BOOST_TEST(block == b);
    
  140. in src/test/pool_tests.cpp:50 in d87cb99bb3 outdated
    45+    PoolResourceTester::CheckAllDataAccountedFor(resource);
    46+    BOOST_TEST(1 == PoolResourceTester::FreeListSizes(resource)[1]);
    47+    BOOST_TEST(expected_bytes_available == PoolResourceTester::AvailableMemoryFromChunk(resource));
    48+
    49+    // can't use chunk because alignment is too big
    50+    block = resource.Allocate(8, 16);
    


    LarryRuane commented at 11:04 pm on March 21, 2023:
    0    // can't use resource because alignment is too big, allocate system memory
    1    b = resource.Allocate(8, 16);
    2    BOOST_TEST(block == b);
    3    block = b;
    

    martinus commented at 5:42 pm on March 22, 2023:
    Here it would be BOOST_TEST(block != b), because b now has to come from ::operator new and not from the freelist

    jonatack commented at 5:04 pm on March 24, 2023:

    (Only if you have to retouch, or maybe in a follow-up), perhaps add this comment:

    0BOOST_TEST(b != block); // as `b` has to come from `::operator new` and not from the freelist
    
  141. in src/test/pool_tests.cpp:64 in d87cb99bb3 outdated
    56+    PoolResourceTester::CheckAllDataAccountedFor(resource);
    57+    BOOST_TEST(1 == PoolResourceTester::FreeListSizes(resource)[1]);
    58+    BOOST_TEST(expected_bytes_available == PoolResourceTester::AvailableMemoryFromChunk(resource));
    59+
    60+    // can't use chunk because size is too big
    61+    block = resource.Allocate(16, 8);
    


    LarryRuane commented at 11:12 pm on March 21, 2023:
    0    // can't use resource because size is too big, allocate system memory
    1    b = resource.Allocate(16, 8);
    2    BOOST_TEST(block != b);
    3    block = b;
    

    martinus commented at 5:45 pm on March 22, 2023:
    With the above change, if I’m not mistaken, I don’t think it’s safe to check BOOST_TEST(block != b) here too. This also calls ::operator new, and because the previous memory was already deallocated, it might hand out the same block of memory, depending on how malloc is implemented
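
    A minimal usage sketch (not from the PR) based on the Allocate/Deallocate signatures visible in the test snippets above; the template parameters and chunk size are illustrative:

        #include <support/allocators/pool.h>

        int main()
        {
            PoolResource<8, 8> resource(1024); // blocks up to 8 bytes, 8-byte alignment, 1 KiB chunks

            void* a = resource.Allocate(8, 8); // fits the pool: served from a freelist/chunk
            resource.Deallocate(a, 8, 8);      // goes back onto the freelist

            void* b = resource.Allocate(16, 8); // too big for the pool: forwarded to ::operator new
            resource.Deallocate(b, 16, 8);      // must be passed back with the same size and alignment
        }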
  142. in src/test/pool_tests.cpp:76 in d87cb99bb3 outdated
    71+
    72+// Allocates from 0 to n bytes were n > the PoolResource's data, and each should work
    73+BOOST_AUTO_TEST_CASE(allocate_any_byte)
    74+{
    75+    auto resource = PoolResource<128, 8>(1024);
    76+    auto counts = PoolResourceTester::FreeListSizes(resource);
    


    LarryRuane commented at 11:18 pm on March 21, 2023:
    counts is unused
  143. in src/test/pool_tests.cpp:88 in d87cb99bb3 outdated
    83+    for (uint8_t num_bytes = 1; num_bytes < num_allocs; ++num_bytes) {
    84+        uint8_t* bytes = new (resource.Allocate(num_bytes, 1)) uint8_t[num_bytes];
    85+        BOOST_TEST(bytes != nullptr);
    86+        data.emplace_back(bytes, num_bytes);
    87+
    88+        // set each byte to i
    


    LarryRuane commented at 11:26 pm on March 21, 2023:
    0        // set each byte to num_bytes
    
  144. in src/test/pool_tests.cpp:86 in d87cb99bb3 outdated
    67+    PoolResourceTester::CheckAllDataAccountedFor(resource);
    68+    BOOST_TEST(1 == PoolResourceTester::FreeListSizes(resource)[1]);
    69+    BOOST_TEST(expected_bytes_available == PoolResourceTester::AvailableMemoryFromChunk(resource));
    70+}
    71+
    72+// Allocates from 0 to n bytes were n > the PoolResource's data, and each should work
    


    LarryRuane commented at 11:59 pm on March 21, 2023:
    This test doesn’t allocate zero bytes; it allocates 1 to n bytes. It seems like zero-byte allocations should be allowed. See comments below.

    martinus commented at 6:17 pm on March 22, 2023:

    After having a closer look, you are right that allocating & freeing 0 bytes currently does not work with the resource. But I think I can’t allow allocating 0 bytes from the resource at all, because multiple 0-byte allocations would always hand out the same pointer, and the standard says that this is not allowed: https://cplusplus.github.io/LWG/issue9

    So I think the best solution is to just add a bytes > 0 condition to IsFreeListUsable, so that ::operator new will be used in that case.


    sipa commented at 9:38 pm on March 22, 2023:
    Alternatively, just round up size 0 to be at least 1 alignment?

    LarryRuane commented at 10:40 pm on March 22, 2023:

    @martinus, good catch! @sipa, good idea, this does seem to work (hope this is what you meant):

    0     [[nodiscard]] static constexpr std::size_t NumElemAlignBytes(std::size_t bytes)
    1     {
    2-        return (bytes + ELEM_ALIGN_BYTES - 1) / ELEM_ALIGN_BYTES;
    3+        return bytes > 0 ? ((bytes + ELEM_ALIGN_BYTES - 1) / ELEM_ALIGN_BYTES) : 1;
    4     }
    

    Benchmark results are identical for me, but this should be confirmed.


    sipa commented at 11:01 pm on March 22, 2023:

    Yeah, or even:

    0return (bytes + ELEM_ALIGN_BYTES - 1) / ELEM_ALIGN_BYTES + (bytes == 0);
    

    which might be a minuscule amount faster.


    LarryRuane commented at 5:02 pm on March 23, 2023:

    That’s clever! They generate similar code, but this first way is slightly less code https://godbolt.org/z/Pz77TTEd8

    0int f(int bytes)
    1{
    2    return bytes > 0 ? (bytes+8-1)/8 : 1;
    3}
    

    Or https://godbolt.org/z/adzasoPad

    0int f(int bytes)
    1{
    2    return (bytes+8-1)/8 + (bytes == 0);
    3}
    

    That’s using x86-64 gcc 11.3; results are similar for x86-64 clang 16.0.0.


    martinus commented at 5:37 pm on March 23, 2023:
    The code generation is a bit different when you use unsigned types; then the + (bytes == 0) version is the shortest for me. In my microbenchmark the + (bytes == 0) version is also the fastest, but in practice it’s most likely irrelevant

    martinus commented at 5:10 am on March 24, 2023:
    I’ve implemented the + (bytes == 0) version in 9f947fc3d4b779f017332135323b34e8f216f613

    jonatack commented at 5:03 pm on March 24, 2023:

    The code generation is a bit different when you use unsigned types; then the + (bytes == 0) version is the shortest for me. In my microbenchmark the + (bytes == 0) version is also the fastest, but in practice it’s most likely irrelevant

    Yes, it seems to be one instruction shorter after updating @LarryRuane’s examples (thanks for doing them) to size_t and checking with gcc 12.2 and clang 16.
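
    For reference, the two variants with std::size_t as discussed; ELEM_ALIGN_BYTES is fixed at 8 here purely for illustration:

        #include <cstddef>

        constexpr std::size_t ELEM_ALIGN_BYTES = 8;

        // ternary variant
        constexpr std::size_t NumElemAlignBytesA(std::size_t bytes)
        {
            return bytes > 0 ? (bytes + ELEM_ALIGN_BYTES - 1) / ELEM_ALIGN_BYTES : 1;
        }

        // branchless variant that ended up in 9f947fc3d4
        constexpr std::size_t NumElemAlignBytesB(std::size_t bytes)
        {
            return (bytes + ELEM_ALIGN_BYTES - 1) / ELEM_ALIGN_BYTES + (bytes == 0);
        }

    Both return 1 for bytes == 0 and are otherwise identical.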

  145. in src/test/pool_tests.cpp:83 in d87cb99bb3 outdated
    78+    uint8_t num_allocs = 200;
    79+
    80+    auto data = std::vector<Span<uint8_t>>();
    81+
    82+    // allocate an increasing number of bytes
    83+    for (uint8_t num_bytes = 1; num_bytes < num_allocs; ++num_bytes) {
    


    LarryRuane commented at 0:01 am on March 22, 2023:

    If we want to test zero-length allocations (but requires a change to Deallocate(), see comment there).

    0    for (uint8_t num_bytes = 0; num_bytes < num_allocs; ++num_bytes) {
    
  146. in src/test/pool_tests.cpp:93 in d87cb99bb3 outdated
    88+        // set each byte to i
    89+        std::fill(bytes, bytes + num_bytes, num_bytes);
    90+    }
    91+
    92+    // now that we got all allocated, test if all still have the correct values, and give everything back to the allocator
    93+    uint8_t val = 1;
    


    LarryRuane commented at 0:01 am on March 22, 2023:

    If we want to test zero-length allocations (but requires a change to Deallocate(), see comment there).

    0    uint8_t val = 0;
    
  147. in src/support/allocators/pool.h:247 in d87cb99bb3 outdated
    242+    {
    243+        if (IsFreeListUsable(bytes, alignment)) {
    244+            const std::size_t num_alignments = NumElemAlignBytes(bytes);
    245+            // put the memory block into the linked list. We can placement construct the FreeList
    246+            // into the memory since we can be sure the alignment is correct.
    247+            PlacementAddToList(p, m_free_lists[num_alignments]);
    


    LarryRuane commented at 0:06 am on March 22, 2023:

    This allows zero-byte allocations, which I think is supported by the standard allocator, so we should too (right?) I benchmarked this change and there was no difference, but my benchmark setup is not very good, so that should definitely be tested.

    0            if (bytes > 0) {
    1                const std::size_t num_alignments = NumElemAlignBytes(bytes);
    2                // put the memory block into the linked list. We can placement construct the FreeList
    3                // into the memory since we can be sure the alignment is correct.
    4                PlacementAddToList(p, m_free_lists[num_alignments]);
    5            }
    

    martinus commented at 6:30 pm on March 23, 2023:
    I’ll implement the + (bytes == 0) for NumElemAlignBytes
  148. jamesob commented at 0:34 am on March 22, 2023: member

    Back with results for dbcache=1000; less noticeable speedup (5%) and increased memory usage (11%).


    ibd local range dbcache=1000 667200 697200

    bench name | command
    ibd.local.range.dbcache=1000.667200.697200 | bitcoind -dbcache=1000 -debug=coindb -debug=bench -listen=0 -connect=0 -addnode=127.0.0.1:8888 -prune=9999999 -printtoconsole=0 -assumevalid=000000000000000000176c192f42ad13ab159fdb20198b87e7ba3c001e47b876

    #25325 vs. $mergebase (absolute)

    bench name | x | #25325 | $mergebase
    ibd.local.range.dbcache=1000.667200.697200.total_secs | 3 | 4721.4819 (± 11.2932) | 4990.6752 (± 19.7466)
    ibd.local.range.dbcache=1000.667200.697200.peak_rss_KiB | 3 | 2906596.0000 (± 86412.3208) | 2607876.0000 (± 4288.9414)
    ibd.local.range.dbcache=1000.667200.697200.cpu_kernel_secs | 3 | 499.0333 (± 4.3527) | 581.9333 (± 4.1179)
    ibd.local.range.dbcache=1000.667200.697200.cpu_user_secs | 3 | 26882.3333 (± 17.6349) | 27490.5000 (± 12.0885)

    #25325 vs. $mergebase (relative)

    bench name | x | #25325 | $mergebase
    ibd.local.range.dbcache=1000.667200.697200.total_secs | 3 | 1.00 | 1.057
    ibd.local.range.dbcache=1000.667200.697200.peak_rss_KiB | 3 | 1.11 | 1.000
    ibd.local.range.dbcache=1000.667200.697200.cpu_kernel_secs | 3 | 1.00 | 1.166
    ibd.local.range.dbcache=1000.667200.697200.cpu_user_secs | 3 | 1.00 | 1.023
  149. in src/test/pool_tests.cpp:122 in d87cb99bb3 outdated
    117+    for (size_t i = 0; i < 1000; ++i) {
    118+        // make it a bit more likely to allocate than deallocate
    119+        if (ptr_size_alignment.empty() || 0 != InsecureRandRange(4)) {
    120+            // allocate a random item
    121+            std::size_t alignment = std::size_t{1} << InsecureRandRange(7);           // 1, 2, ..., 128
    122+            std::size_t size = (InsecureRandRange(2000) / alignment + 1) * alignment; // multiple of alignment
    


    LarryRuane commented at 0:53 am on March 22, 2023:

    This change will cause the pool resource allocator to be used much more often (since the max block size is 128).

    0            std::size_t size = (InsecureRandRange(200) / alignment + 1) * alignment;  // multiple of alignment
    
  150. LarryRuane commented at 1:42 am on March 22, 2023: contributor

    All comments are non-blocking, except I am concerned about zero-length allocations.

    I stepped through pool_tests.cpp in the debugger and all was as expected. I didn’t run or review the fuzz test or the other modified unit tests, but I’ll try to get to those in the next couple of days.

  151. LarryRuane commented at 2:33 pm on March 22, 2023: contributor
    ACK d87cb99bb37637e26a9e00b9f7de4bc6f44cb79d
  152. martinus commented at 5:23 pm on March 22, 2023: contributor
    @jamesob interesting that you didn’t see a bigger speedup, but I guess it depends on a lot of other factors as well. How fast is your hard disk, and how much RAM does your computer have?
  153. Add pool based memory resource & allocator
    A memory resource similar to std::pmr::unsynchronized_pool_resource, but
    optimized for node-based containers.
    
    Co-Authored-By: Pieter Wuille <pieter@wuille.net>
    b8401c3281
  154. Calculate memory usage correctly for unordered_maps that use PoolAllocator
    Extracts the resource from a PoolAllocator and uses it for
    calculation of the node's memory usage.
    e19943f049
  155. Add PoolResource fuzzer
    Fuzzes PoolResource with random allocations/deallocations, and multiple
    asserts.
    
    Co-Authored-By: Pieter Wuille <pieter@wuille.net>
    1afca6b663
  156. Call ReallocateCache() on each Flush()
    This frees up all associated memory with the map, not only the nodes.
    This is necessary in preparation for using the PoolAllocator for
    CCoinsMap, which does not actually free any memory on clear().
    5e4ac5abf5
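
    Not the PR’s ReallocateCache(), just a generic sketch of the idiom the commit message describes: replacing the container releases the bucket array as well, which clear() alone would not.

        #include <string>
        #include <unordered_map>

        template <typename Map>
        void ReleaseAllMemory(Map& m)
        {
            // Move-assigning a freshly constructed map destroys all nodes and
            // deallocates the bucket array; clear() keeps the bucket array.
            m = Map{};
        }

        int main()
        {
            std::unordered_map<int, std::string> map{{1, "a"}, {2, "b"}};
            ReleaseAllMemory(map);
        }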
  157. Use PoolAllocator for CCoinsMap
    In my benchmarks, using this pool allocator for CCoinsMap gives about
    20% faster `-reindex-chainstate` with -dbcache=5000 with practically the
    same memory usage. The change in max RSS changed was 0.3%.
    
    The `validation_flush_tests` tests need to be updated because
    memory allocation is now done in large pools instead of one node at a
    time, so the limits need to be updated accordingly.
    9f947fc3d4
  158. martinus force-pushed on Mar 23, 2023
  159. martinus commented at 7:34 pm on March 23, 2023: contributor

    Updated d87cb99 -> 9f947fc3d4b779f017332135323b34e8f216f613 (pr25325.1 -> pr25325.2)

    There is a single behavior change in pool.h: NumElemAlignBytes now adds + (bytes == 0) so that allocations of 0 bytes work with the PoolAllocator.

    Other than that, updated tests to include allocation of 0 bytes, and fixed all the nits.

    I tried to benchmark any difference between the new + (bytes == 0) check and the bytes > 0 ? variant, and all variants behave exactly the same in the PoolAllocator_StdUnorderedMapWithPoolResource benchmark. Exactly the same number of instructions, same number of branches, and the time fluctuates only due to measurement precision:

    |  ns/op |          op/s | err% | ins/op | cyc/op |   IPC | bra/op | miss% | total | benchmark
    |  40.69 | 24,575,107.54 | 0.1% | 151.20 | 129.46 | 1.168 |  23.54 |  2.1% |  1.10 | old d87cb99bb3
    |  40.70 | 24,568,640.94 | 0.2% | 151.20 | 129.48 | 1.168 |  23.54 |  2.0% |  1.10 | bytes > 0 ?
    |  40.47 | 24,707,366.33 | 0.7% | 151.20 | 128.74 | 1.174 |  23.54 |  2.1% |  1.10 | new 9f947fc3d4: + (bytes == 0)
  160. john-moffett approved
  161. john-moffett commented at 3:48 pm on March 24, 2023: contributor

    ACK 9f947fc3d4b779f017332135323b34e8f216f613

    Verified the changes with range-diff:

     01:  45508ec79 ! 1:  b8401c328 Add pool based memory resource & allocator
     1    @@ src/support/allocators/pool.h (new)
     2-     * whenever it is accessed, but `m_untouched_memory_end` caches this for clarity and efficiency.
     3+     * whenever it is accessed, but `m_available_memory_end` caches this for clarity and efficiency.
     4    @@ src/support/allocators/pool.h (new)
     5-     * into m_free_lists.
     6+     * into m_free_lists. Round up for the special case when bytes==0.
     7    @@ src/support/allocators/pool.h (new)
     8-        return (bytes + ELEM_ALIGN_BYTES - 1) / ELEM_ALIGN_BYTES;
     9+        return (bytes + ELEM_ALIGN_BYTES - 1) / ELEM_ALIGN_BYTES + (bytes == 0);
    10    @@ src/support/allocators/pool.h (new)
    11-        // if there is still any available memory is left, put it into the freelist.
    12+        // if there is still any available memory left, put it into the freelist.
    13    @@ src/test/pool_tests.cpp (new)
    14-    block = resource.Allocate(8, 1);
    15+    void* b = resource.Allocate(8, 1);
    16+    BOOST_TEST(b == block); // we got the same block of memory as before
    17    @@ src/test/pool_tests.cpp (new)
    18-    // can't use chunk because alignment is too big
    19-    block = resource.Allocate(8, 16);
    20+    // can't use resource because alignment is too big, allocate system memory
    21+    b = resource.Allocate(8, 16);
    22+    BOOST_TEST(b != block);
    23+    block = b;
    24    @@ src/test/pool_tests.cpp (new)
    25+
    26+    // it's possible that 0 bytes are allocated, make sure this works. In that case the call is forwarded to operator new
    27+    // 0 bytes takes one entry from the first freelist
    28+    void* p = resource.Allocate(0, 1);
    29+    BOOST_TEST(0 == PoolResourceTester::FreeListSizes(resource)[1]);
    30+    BOOST_TEST(expected_bytes_available == PoolResourceTester::AvailableMemoryFromChunk(resource));
    31+
    32+    resource.Deallocate(p, 0, 1);
    33+    PoolResourceTester::CheckAllDataAccountedFor(resource);
    34+    BOOST_TEST(1 == PoolResourceTester::FreeListSizes(resource)[1]);
    35+    BOOST_TEST(expected_bytes_available == PoolResourceTester::AvailableMemoryFromChunk(resource));
    36    @@ src/test/pool_tests.cpp (new)
    37-    auto counts = PoolResourceTester::FreeListSizes(resource);
    38    @@ src/test/pool_tests.cpp (new)
    39-    for (uint8_t num_bytes = 1; num_bytes < num_allocs; ++num_bytes) {
    40+    for (uint8_t num_bytes = 0; num_bytes < num_allocs; ++num_bytes) {
    41    @@ src/test/pool_tests.cpp (new)
    42-        // set each byte to i
    43+        // set each byte to num_bytes
    44    @@ src/test/pool_tests.cpp (new)
    45-    uint8_t val = 1;
    46+    uint8_t val = 0;
    47    @@ src/test/pool_tests.cpp (new)
    48-            std::size_t alignment = std::size_t{1} << InsecureRandRange(7);           // 1, 2, ..., 128
    49-            std::size_t size = (InsecureRandRange(2000) / alignment + 1) * alignment; // multiple of alignment
    50+            std::size_t alignment = std::size_t{1} << InsecureRandRange(8);          // 1, 2, ..., 128
    51+            std::size_t size = (InsecureRandRange(200) / alignment + 1) * alignment; // multiple of alignment
    52    @@ src/test/util/poolresourcetester.h (new)
    53-        for (std::byte* ptr : resource.m_allocated_chunks) {
    54+        for (const std::byte* ptr : resource.m_allocated_chunks) {
    55    @@ src/test/util/poolresourcetester.h (new)
    56-            assert(free_block.size <= chunk_size_remaining);                 // ensure we no overflow
    57+            assert(free_block.size <= chunk_size_remaining);                 // ensure no overflow
    58    @@ src/test/util/poolresourcetester.h (new)
    59-        assert(chunk_ptr_remaining == chunks.back().ptr + chunks.back().size); // ensure we are t the end of the chunks
    60+        // ensure we are at the end of the chunks
    61+        assert(chunk_ptr_remaining == chunk_it->ptr + chunk_it->size);
    62+    ++chunk_it;
    63+        assert(chunk_it == chunks.end());
    642:  ed2e1cbe8 = 2:  e19943f04 Calculate memory usage correctly for unordered_maps that use PoolAllocator
    653:  477e16dbf = 3:  1afca6b66 Add PoolResource fuzzer
    664:  d6b85474d = 4:  5e4ac5abf Call ReallocateCache() on each Flush()
    675:  d87cb99bb ! 5:  9f947fc3d Use PoolAllocator for CCoinsMap
    68    @@ src/coins.h: struct CCoinsCacheEntry
    69+/**
    70+ * PoolAllocator's MAX_BLOCK_SIZE_BYTES parameter here uses sizeof the data, and adds the size
    71+ * of 4 pointers. We do not know the exact node size used in the std::unordered_node implementation
    72+ * because it is implementation defined. Most implementations have an overhead of 1 or 2 pointers,
    73+ * so nodes can be connected in a linked list, and in some cases the hash value is stored as well.
    74+ * Using an additional sizeof(void*)*4 for MAX_BLOCK_SIZE_BYTES should thus be sufficient so that
    75+ * all implementations can allocate the nodes from the PoolAllocator.
    76+ */
    
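
    A hedged sketch of the MAX_BLOCK_SIZE_BYTES sizing rule quoted at the end of the range-diff above; the function name is illustrative, not the PR's declaration:

        #include <cstddef>
        #include <utility>

        // Node payload plus four pointers of headroom for the implementation-defined
        // per-node overhead of std::unordered_map (linked-list pointers, possibly a
        // cached hash value).
        template <typename Key, typename Value>
        constexpr std::size_t NodePoolMaxBlockSize()
        {
            return sizeof(std::pair<const Key, Value>) + sizeof(void*) * 4;
        }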
  162. DrahtBot requested review from achow101 on Mar 24, 2023
  163. DrahtBot requested review from jonatack on Mar 24, 2023
  164. DrahtBot requested review from LarryRuane on Mar 24, 2023
  165. DrahtBot requested review from sipa on Mar 24, 2023
  166. DrahtBot removed review request from LarryRuane on Mar 24, 2023
  167. DrahtBot requested review from LarryRuane on Mar 24, 2023
  168. DrahtBot removed review request from LarryRuane on Mar 24, 2023
  169. DrahtBot requested review from LarryRuane on Mar 24, 2023
  170. jonatack approved
  171. jonatack commented at 5:05 pm on March 24, 2023: contributor

    Nice review work, @LarryRuane.

    re-ACK 9f947fc3d4b779f017332135323b34e8f216f613

  172. DrahtBot removed review request from LarryRuane on Mar 24, 2023
  173. DrahtBot requested review from LarryRuane on Mar 24, 2023
  174. LarryRuane commented at 5:40 pm on March 24, 2023: contributor
    ACK 9f947fc3d4b779f017332135323b34e8f216f613
  175. DrahtBot removed review request from LarryRuane on Mar 24, 2023
  176. martinus commented at 7:21 am on March 28, 2023: contributor
    @sipa could you have another look after my update from d87cb99 -> 9f947fc3d4b779f017332135323b34e8f216f613?
  177. martinus referenced this in commit 319adf5b4d on Apr 4, 2023
  178. achow101 commented at 8:11 pm on April 20, 2023: member
    re-ACK 9f947fc3d4b779f017332135323b34e8f216f613
  179. DrahtBot removed review request from achow101 on Apr 20, 2023
  180. achow101 merged this on Apr 20, 2023
  181. achow101 closed this on Apr 20, 2023

  182. sipa commented at 8:32 pm on April 20, 2023: member
    Posthumous utACK 9f947fc3d4b779f017332135323b34e8f216f613
  183. martinus commented at 4:12 am on April 21, 2023: contributor
    Wohoo :tada: Thanks everyone for making this happen!
  184. sidhujag referenced this in commit 634c900ebd on Apr 21, 2023
  185. martinus deleted the branch on Dec 4, 2023
