Add allocator for node based containers #22702

pull martinus wants to merge 5 commits into bitcoin:master from martinus:2019-08-bulkpoolallocator changing 20 files +1164 −28
  1. martinus commented at 8:37 pm on August 13, 2021: contributor

    jamesob has benchmarked my rebased branch (see #16801 (comment)) concerning the allocator for node based containers, and has seen a 9% speedup along with 8% reduction in memory. That’s why I had another good look at my previously closed PR #16801 and updated the code. I’ve updated the code quite a bit, mostly trying to simplify it and improve correctness:

    • Renamed classes to be more in line with naming used for thestd::pmr allocators (see https://en.cppreference.com/w/cpp/memory/polymorphic_allocator)
    • Simplified Allocator thanks to being able to use C++17 behavior (std::allocator_traits)
    • Simpler allocation: only allocate blocks of the same size and just keep them in a std::vector.
    • Only access memory when actually needed. E.g. Linux uses optimistic memory allocation, so memory pages from malloc are only actually made available to the program when accessed
    • Correctly handle alignment for the inplace linked list.

    jamesob it would be great if you could run your benchmarks again with the updated code.

    edit laanwj: removed @ signs and html comments from message

  2. martinus force-pushed on Aug 13, 2021
  3. DrahtBot added the label Build system on Aug 13, 2021
  4. DrahtBot added the label UTXO Db and Indexes on Aug 13, 2021
  5. practicalswift commented at 6:46 pm on August 15, 2021: contributor

    Concept ACK

    The numbers look very promising!

    Seems like this could be one of those rare optimization opportunities actually worth pursuing :)

  6. in src/support/allocators/node_allocator.h:34 in 3574aeebcb outdated
    29+ * There's no free lunch, so there are also disadvantages:
    30+ *
    31+ * - Memory that's been used for nodes is always put back into a freelist and never given back to the system. Memory
    32+ *   is only freed when the MemoryResource is destructed.
    33+ *
    34+ * - The freelist is a simple first-in-last-out linked list, it doesn't reorder elements. So freeing and malloc'ing again
    


    JeremyRubin commented at 6:56 pm on August 15, 2021:
    last in first out?
  7. in src/support/allocators/node_allocator.h:46 in 3574aeebcb outdated
    41+ *
    42+ * MemoryResource is an immobile object that actually allocates, holds and manages chunks of memory. Since there is
    43+ * unfortunately no way to determine the size of nodes that we want to optimize for in advance, MemoryResource uses
    44+ * a simple heuristic: We assume the first call to Allocate with 1 element is for the node, and upon that first call the
    45+ * MemoryResource is configured to use that as it's chunk size. Note that due to this heuristic we cannot implement an
    46+ * `std::pmr::memory_resource` because it does not gie the information about the number of elements.
    


    JeremyRubin commented at 6:57 pm on August 15, 2021:
    gie -> give
  8. JeremyRubin commented at 7:11 pm on August 15, 2021: contributor

    I’m curious what the impact would be on the performance if you allocated N chunks at a time, always, to fill a page.

    E.g., if you want to allocate one node, always allocate something like aligned_alloc(sysconf(_SC_PAGESIZE), size) and divide it up into a bunch of pointers for the free list.

    Cool properties of doing so: Better cache alignment on items added around the same time, fewer things to keep in the allocated chunks list, if you were to sort the list by pointer (perhaps lazily, when we’re not doing anything else) you would get something with good cache alignment again.

  9. martinus commented at 5:52 am on August 16, 2021: contributor

    I’m curious what the impact would be on the performance if you allocated N chunks at a time, always, to fill a page.

    I’ve run the benchmark with ::aligned_alloc(sysconf(_SC_PAGESIZE), ...); and only allocating a single page, and at least in the benchmark there’s no difference:

    ns/op op/s err% total benchmark
    140.95 7,094,953.03 0.2% 0.08 NodeAllocator_StdUnorderedMap
    115.57 8,652,942.75 0.1% 0.06 NodeAllocator_StdUnorderedMapWithNodeAllocator ::operator new
    115.33 8,670,873.08 0.2% 0.06 NodeAllocator_StdUnorderedMapWithNodeAllocatoraligned_alloc
  10. JeremyRubin commented at 9:19 am on August 16, 2021: contributor
    Do you have specific code for that? The changes sounded a bit more complex than just aligning the allocator. Did you divide the allocation into N free list chunks? Or just over-allocate?
  11. martinus commented at 10:03 am on August 16, 2021: contributor

    Do you have specific code for that? The changes sounded a bit more complex than just aligning the allocator. Did you divide the allocation into N free list chunks? Or just over-allocate?

    MemoryResource::AllocateNewBlock is the only place where I actually malloc a new block that’s divided into the chunks. Here I’ve replaced the ::operator new(num_chunks* m_chunk_size_bytes) with ::alligned_alloc(sysconf(_SC_PAGESIZE), num_chunks* m_chunk_size_bytes). And use free in ~MemoryResource. I also changed the default allocation size m_block_size_bytes from 262144 to sysconf(_SC_PAGESIZE).

    So for node size of 104 bytes and page size 4096, this should allocate a page-alligned block with 39*104=4056 bytes.

  12. jamesob commented at 2:13 pm on August 16, 2021: member

    Some fresh benches in (high dbcache). Continuing to see fairly substantial time/memory improvements with this change.

    ibd local range 500000 540000

    commands index

    bench name command
    ibd.local.range.500000.540000 bitcoind -dbcache=10000 -debug=coindb -debug=bench -listen=0 -connect=0 -addnode=127.0.0.1:8888 -prune=9999999 -printtoconsole=0 -assumevalid=000000000000000000176c192f42ad13ab159fdb20198b87e7ba3c001e47b876

    #22702 vs. $mergebase (absolute)

    bench name x #22702 $mergebase
    ibd.local.range.500000.540000.total_secs 3 2888.4055 (± 24.7775) 3131.5871 (± 4.8045)
    ibd.local.range.500000.540000.peak_rss_KiB 3 5886729.3333 (± 3815.6315) 6352470.6667 (± 979.1552)
    ibd.local.range.500000.540000.cpu_kernel_secs 3 263.5433 (± 1.9605) 269.2233 (± 2.6299)
    ibd.local.range.500000.540000.cpu_user_secs 3 12852.1300 (± 15.0661) 13134.5300 (± 2.9587)

    #22702 vs. $mergebase (relative)

    bench name x #22702 $mergebase
    ibd.local.range.500000.540000.total_secs 3 1 1.084
    ibd.local.range.500000.540000.peak_rss_KiB 3 1 1.079
    ibd.local.range.500000.540000.cpu_kernel_secs 3 1 1.022
    ibd.local.range.500000.540000.cpu_user_secs 3 1 1.022
  13. martinus commented at 5:46 pm on August 16, 2021: contributor
    Thanks for the benchmark! Good to see its 8.4% faster. I think the only reason the memory is lower is because the memusage::DynamicUsage now overestimates the memory requirements of the std::unordered_map that uses the node_allocator. I guess I should correct that. Then the memory usage should stay roughly the same (as it should, that’s what -dbcache is for), but the number of transactions that can be cached will increase.
  14. JeremyRubin commented at 7:22 pm on August 16, 2021: contributor

    MemoryResource::AllocateNewBlock is the only place where I actually malloc a new block that’s divided into the chunks. Here I’ve replaced the ::operator new(num_chunks* m_chunk_size_bytes) with ::alligned_alloc(sysconf(_SC_PAGESIZE), num_chunks* m_chunk_size_bytes). And use free in ~MemoryResource. I also changed the default allocation size m_block_size_bytes from 262144 to sysconf(_SC_PAGESIZE).

    hmm yeah it’s interesting to think through the alignment issues. I think I understand the code a bit better now so it makes sense that most of you allocations are already aligned. One thing that strikes me is that it seems that you have these 104 byte chunks (13 uint64_t s). You might get better properties if you pad them out to the nearest 32/64/128 bytes (memory overhead 24 bytes :( ), because then you’ll never have nodes across cache lines. Memory usage is a bit worse though, but cache perfomance may be better this way? https://www.usenix.org/legacy/publications/library/proceedings/als00/2000papers/papers/full_papers/sears/sears_html/index.html for reference.

  15. JeremyRubin commented at 7:29 pm on August 16, 2021: contributor

    might also be interesting to bump the CScript prevector size from 28 to 35 (7 bytes). This would help with the above as well because if we’re 104/128 byte aligned we have the capacity & then we’d also save a prevector allocation (which means the vector costs 25 bytes + 34-35 bytes) for a non p2sh segwit v0-v1 output, and also means even more indirection.

    This might have system impacts very broadly because prevector is a leaky optimization sadly.

  16. in src/test/node_allocator_tests.cpp:248 in 3574aeebcb outdated
    243+};
    244+
    245+constexpr bool isMultiple(size_t a, size_t b)
    246+{
    247+    return (a / b) * b == a;
    248+}
    


    jonatack commented at 10:55 am on August 17, 2021:

    Tested that IsMultiple() can be removed (Debian Clang 13)

    0test/node_allocator_tests.cpp:245:16: warning: unused function 'isMultiple' [-Wunused-function]
    1constexpr bool isMultiple(size_t a, size_t b)
    2               ^
    31 warning generated.
    
  17. in src/support/allocators/node_allocator.h:132 in 3574aeebcb outdated
    86+
    87+    /**
    88+     * Copying a memory resource is not allowed, it is an immobile object.
    89+     */
    90+    MemoryResource(const MemoryResource&) = delete;
    91+    MemoryResource& operator=(const MemoryResource&) = delete;
    


    jonatack commented at 11:04 am on August 17, 2021:

    Perhaps rule of 5 (https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-five, https://www.stroustrup.com/C++11FAQ.html#default)

    0     /**
    1-     * Copying a memory resource is not allowed, it is an immobile object.
    2+     * Copying/moving a memory resource is not allowed; it is an immobile object.
    3      */
    4     MemoryResource(const MemoryResource&) = delete;
    5     MemoryResource& operator=(const MemoryResource&) = delete;
    6+    MemoryResource(MemoryResource&&) = delete;
    7+    MemoryResource& operator=(MemoryResource&&) = delete;
    
  18. jonatack commented at 11:27 am on August 17, 2021: contributor
    Concept/Approach ACK. Interesting work! Some initial review feedback at https://github.com/jonatack/bitcoin/commit/df355206323e61603b181d426f4977f026559784
  19. 0xB10C commented at 10:04 pm on August 17, 2021: contributor

    I’ve benchmarked the performance of CChainState::ConnectBlock between cde8a991f525b72d3a7ac76e0c83aaa611169f22 (PR) and the mergebase 803ef70fd9f65ef800567ff9456fac525bc3e3c2 (MB) using the validation::connect_block tracepoint. The tracepoint reports the time it took to connect each block. This means my results don’t include the time it took to download and, AFAIK, not the time it took to persist the block to disk. I didn’t look at memory usage.

    I’ve oriented myself on @jamesob’s parameters and synced the blocks 500.000 to 540.000 from a localhost peer three times for the PR and the MB. I used the default dbcache (shouldn’t matter for my measurements as cache flushes don’t happen while inside CChainState::ConnectBlock). I’ve used an idle workstation with an i5 6500 and an NVME drive. Room and hardware temperature between runs changed which could have impacted performance. A bpftrace script to hook into the tracepoint can be found here.

    Total time spent in CChainState::ConnectBlock per run for PR and MB:

    run 1 2 3
    PR 1427s 1487s 1486s
    MB 1577s 1670s 1661s

    image (PR run 2 and run 3 are overlapping quite a bit and indistinguishably here)

    My results confirm that less time is spent in CChainState::ConnectBlock with PR compared to MB. A performance improvement between 6% - 10% is likely.

    To rule out temperature effects on performance, these measurements could be redone with a properly cooled server/workstation.


    EDIT: I did another three runs of PR and MB with calling pyperf system tune beforehand as @martinus suggested. PR is still faster.

    run 4 5 6
    PR 1585s 1537s 1546s
    MB 1647s 1688s 1644s

    image

  20. martinus commented at 7:03 am on August 18, 2021: contributor
    @JeremyRubin I actually tried to bump it up to 128 byte to see what that does, but I didn’t see any major difference. The benchmark got maybe 1% slower, but maybe the benchmark is also not very realistic. I also tried to reduce the size by 8 byte (which can be done by making use of the Coin’s padding in CCoinsCacheEntry, I might open anothe PR for that), and it got about ~1% faster. I think though that even with no performance benefit the node_allocator is worthwhile because it has lower memory overhead, so we can cache more entries with the same -dbcache setting. @jonatack Thanks for the review! I’m not sure though how I should best incorporate your changes? Should I copy them and add a Co-authored-by in the commit message? (I’ve never done that) @0xB10C Thanks for the benchmark! You might get a bit more stable result with pyperf: I always use sudo pyperf system tune before I run a benchmark which does a few things to make benchmarks much more stable (e.g. locks the CPU to a fixed frequency, disables turbo boost)
  21. laanwj removed the label Build system on Aug 18, 2021
  22. laanwj added the label Utils/log/libs on Aug 18, 2021
  23. laanwj added the label Resource usage on Aug 18, 2021
  24. jonatack commented at 9:00 pm on August 18, 2021: contributor

    @jonatack Thanks for the review! I’m not sure though how I should best incorporate your changes? Should I copy them and add a Co-authored-by in the commit message? (I’ve never done that)

    Thanks! I’d say squash in the changes you want to keep. You can add an empty line followed by Co-authored-by: Jon Atack <jon@atack.com> to the bottom of a commit message if you like, but it was just a regular average review :)

  25. jamesob commented at 4:17 pm on August 19, 2021: member

    Bench results with more modest dbcache (800): 6% speedup, 6.7% less memory usage.

    bench name command
    ibd.local.range.500000.540000 bitcoind -dbcache=800 -debug=coindb -debug=bench -listen=0 -connect=0 -addnode=127.0.0.1:8888 -prune=9999999 -printtoconsole=0 -assumevalid=000000000000000000176c192f42ad13ab159fdb20198b87e7ba3c001e47b876
    bench name x #22702 $mergebase
    ibd.local.range.500000.540000.total_secs 3 3519.2667 (± 58.0440) 3752.9732 (± 54.8626)
    ibd.local.range.500000.540000.peak_rss_KiB 3 2282000.0000 (± 45539.0428) 2443188.0000 (± 22211.2803)
    ibd.local.range.500000.540000.cpu_kernel_secs 3 491.2067 (± 4.5352) 493.7567 (± 13.1047)
    ibd.local.range.500000.540000.cpu_user_secs 3 13717.5467 (± 15.0752) 13957.7867 (± 18.7773)
    bench name x #22702 $mergebase
    ibd.local.range.500000.540000.total_secs 3 1 1.066
    ibd.local.range.500000.540000.peak_rss_KiB 3 1 1.071
    ibd.local.range.500000.540000.cpu_kernel_secs 3 1 1.005
    ibd.local.range.500000.540000.cpu_user_secs 3 1 1.018
  26. martinus force-pushed on Aug 20, 2021
  27. martinus commented at 5:03 pm on August 20, 2021: contributor

    I’ve now updated the code with correct memory estimation, and benchmarked it. I had a fully synchronized node, and ran -reindex-chainstate from block 0 to 690000. All done on a Intel i7 8700, locked to 3200 MHz. I used this command:

    0/usr/bin/time -v bitcoind -datadir=/run/media/martinus/big/bitcoin/db -dbcache=5000 -assumevalid=00000000000000000002a23d6df20eecec15b21d32c75833cce28f113de888b7 -reindex-chainstate -printtoconsole=0 -stopatheight=690000
    

    The PR ran quite a bit faster than master, 20.8%, with practically the same memory usage. Here some interesting data from /usr/bin/time:

    what mergebase #22702 Change
    Elapsed (wall clock) time (h:mm:ss or m:ss) 4:06:05 3:14:49 20.8% lower
    Maximum resident set size (kbytes) 6837496 6856104 0.27% more
    Minor (reclaiming a frame) page faults 101689566 70194069 31% fewer
    File system inputs 693891464 687479120 0.9% fewer
    File system outputs 239830024 188574064 21.4% fewer

    Here are some graphs that I created from parsing the debug.log file:

    Progress in Million Transactions over Time(1)

    Size of Cache in MiB over Time

    CCoinsMap requires currently 104 byte for one node, and node_allocator basically uses exactly that (a bit more for house keeping of the allocated blocks but that’s practically negigable), and the std::unordered_map’s default requires 128 byte (16 byte house keeping, and 8 more byte for 16 byte alignment). So we can cache quite a bit more transactions in the same amount of memory:

    Size of Cache in Million tx over Time

    Due to the different allocation behavior (it allocates one big chunk of memory right as the first Coin is added), I also had to update the checks in validation_flush_tests.cpp

  28. martinus force-pushed on Aug 20, 2021
  29. in src/support/allocators/node_allocator.h:38 in 6fa1a72e7d outdated
    32+ * There's no free lunch, so there are also disadvantages:
    33+ *
    34+ * - Memory that's been used for nodes is always put back into a free list and never given back to the system. Memory
    35+ *   is only freed when the MemoryResource is destructed.
    36+ *
    37+ * - The free list is a simple last-in-first-out linked list, it doesn't reorder elements. So freeing and malloc'ing again
    


    sipa commented at 8:51 pm on August 20, 2021:
    This does not sound like a disadvantage when formulated this way?

    martinus commented at 5:35 am on August 21, 2021:
    yes I wanted to clarify that the linked list’s can become relatively random access pattern into the memory, which can be slower due to lots of cache misses. I’ll fix the comment
  30. in src/support/allocators/node_allocator.h:257 in 6fa1a72e7d outdated
    262+    }
    263+
    264+    //! A single linked list of all data available in the pool. This list is used for allocations of single elements.
    265+    void* m_free_chunks = nullptr;
    266+
    267+    //! A contains all allocated blocks of memory, used to free the data in the destructor.
    


    sipa commented at 8:58 pm on August 20, 2021:
    “A contains” … what is A?

    martinus commented at 5:36 am on August 21, 2021:
    Should be just “Contains all allocated blocks of memory”

    martinus commented at 8:46 am on October 26, 2021:
    fixed in 89c62017f2a099de5353579b08f79037805ca70f
  31. in src/support/allocators/node_allocator.h:270 in 6fa1a72e7d outdated
    265+    void* m_free_chunks = nullptr;
    266+
    267+    //! A contains all allocated blocks of memory, used to free the data in the destructor.
    268+    std::vector<void*> m_allocated_blocks{};
    269+
    270+    //! The pool's size for the memory blocks. First call to Allocate() determines the used size.
    


    sipa commented at 9:07 pm on August 20, 2021:
    I think this comment is incorrect (this is the size of one chunk; blocks are a multiple of that).

    martinus commented at 5:53 am on August 27, 2021:
    thanks, I’ve updated the comment
  32. in src/support/allocators/node_allocator.h:158 in 6fa1a72e7d outdated
    163+            // slow path, only happens when a new block needs to be allocated
    164+            AllocateNewBlock();
    165+        }
    166+
    167+        // peel off one chunk from the untouched memory
    168+        auto tmp = m_untouched_memory_iterator;
    


    sipa commented at 9:21 pm on August 20, 2021:
    Perhaps add a comment here explaining that the next pointer of in-use elements actually doesn’t matter until it’s deallocated, so it doesn’t need initialization here.

    martinus commented at 5:54 am on August 27, 2021:
    I’ve updated the comment
  33. martinus force-pushed on Aug 22, 2021
  34. martinus renamed this:
    Add allocator for node based containers
    [WIP] Add allocator for node based containers
    on Aug 22, 2021
  35. martinus commented at 5:28 pm on August 22, 2021: contributor
    Unfortunately Microsofts implementation of unordered_map behaves quite a bit different from libc++ and libstdc++, so the heuristic that I’m using to detect the node size doesn’t work. I’ve added a WIP to the header until I’ve fixed this. Most likely I’ll ditch the heuristic completely, and properly calculate the correct size. I’m already doing that in the tests anyways (except for windows)
  36. martinus force-pushed on Aug 23, 2021
  37. martinus force-pushed on Aug 24, 2021
  38. martinus force-pushed on Aug 24, 2021
  39. martinus force-pushed on Aug 24, 2021
  40. in src/test/validation_flush_tests.cpp:146 in c6561fd394 outdated
    142@@ -135,7 +143,7 @@ BOOST_AUTO_TEST_CASE(getcoinscachesizestate)
    143         BOOST_CHECK(usage_percentage >= 0.9);
    144         BOOST_CHECK(usage_percentage < 1);
    145         BOOST_CHECK_EQUAL(
    146-            chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, 1 << 10),
    147+            chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, 512),
    


    jonatack commented at 8:50 pm on August 25, 2021:

    As far as I can tell, this can remain 1 << 10 (1024).

    In both cases, invoking the test with src/test/test_bitcoin -t validation_flush_tests -l test_suite prints CoinsTip usage percentage: 0.997634

    0            chainstate.GetCoinsCacheSizeState(MAX_COINS_CACHE_BYTES, /*max_mempool_size_bytes*/ 1 << 10), // 1024
    

    martinus commented at 5:54 am on August 27, 2021:
    Thanks again, I’ve squashed your review and rebased
  41. in src/test/node_allocator_tests.cpp:152 in c6561fd394 outdated
    147+            auto map_b = Factory::CreateContainer(&mr_b);
    148+            map_b[123] = 321;
    149+
    150+            map_b = std::move(map_a);
    151+
    152+            // map_a now uses mr_b, since propagate_on_container_copy_assignment is std::true_type
    


    jonatack commented at 9:25 pm on August 25, 2021:
    0            // map_a now uses mr_b, since propagate_on_container_move_assignment is std::true_type
    
  42. jonatack commented at 9:31 pm on August 25, 2021: contributor

    Currently-testing almost-ACK c6561fd394aab850ac2d8bc6889eaaa274c4abda

    Learning more about allocators thanks to this patch and the links in your documentation.

    A second round of review ideas at https://github.com/jonatack/bitcoin/commits/pr22702-review-2, feel free to pick/choose/ignore/squash in. I’m running a node built on that/this branch.

  43. martinus force-pushed on Aug 27, 2021
  44. martinus commented at 6:27 am on August 27, 2021: contributor

    I’ve run the same benchmark as #22702 (comment) again, but this time with -dbcache=10000, and on my brand new much faster SSD (NVMe Corsair MP400). The results are very similar as before, just both runs are relatively faster. Interestingly, with such a large dbsize the branch never needs to flush the cache, only when finished, while master has to flush once because it got full around the 190 minute mark. That’s also why memory usage of the branch is a bit lower here.

    what mergebase #22702 Change
    Elapsed (wall clock) time (h:mm:ss or m:ss) 3:29:56 2:43:43 22.0% faster
    Maximum resident set size (kbytes) 10303716 9637256 6.4% lower
    Minor (reclaiming a frame) page faults 24088467 3836829 84% fewer
    File system inputs 693485000 696413568 0.4% more
    File system outputs 109392776 81335536 25.6% fewer

    Again, graphs are generated from parsing the debug.log files, on an Intel i7 8700 locked to 3.2 GHz: Progress in Million Transactions over Time Size of Cache in MiB over Time Size of Cache in Million tx over Time

  45. martinus renamed this:
    [WIP] Add allocator for node based containers
    Add allocator for node based containers
    on Aug 27, 2021
  46. martinus commented at 7:46 am on August 27, 2021: contributor
    I’ve removed [WIP] in the title after the code cleanup & fixes, everything should work now as expected on all platforms
  47. jonatack commented at 2:55 pm on September 2, 2021: contributor
    ACK 952c37a31c39b7acdb5ab2634a354a1c548708a2 code review and tested on debian 5.10.46-4 (2021-08-03) running a (synced) mainnet node since 8 days so far
  48. martinus closed this on Sep 2, 2021

  49. martinus reopened this on Sep 2, 2021

  50. in src/bench/node_allocator.cpp:29 in 952c37a31c outdated
    24+
    25+    COutPoint p{tx.GetHash(), 0};
    26+
    27+    bench.epochIterations(5000 * 10).run([&] {
    28+        // modify hash a bit so we get a new entry in the map
    29+        ++p.n;
    


    benthecarman commented at 2:00 am on September 3, 2021:
    will this be handled correctly in the case of overflow?

    martinus commented at 5:29 am on September 3, 2021:
    I think p.n cannot overflow here. It is initialized to 0, and since I’ve specified the number of iterations each measurement will have exactly 50000 iterations. I do this so I can be sure each measurement has exactly the same number of calls to map.clear();, which gives more stable benchmark results. Nanobench defaults to 11 measurements, so in total n will reach 550000.
  51. 0xB10C commented at 9:50 am on September 3, 2021: contributor
    I think it would be better to not @ @jonatack in the commit message. IIRC he’ll get a notification each time this commit is included in some bitcoin-fork.
  52. martinus commented at 12:47 pm on September 3, 2021: contributor

    I think it would be better to not @ @jonatack in the commit message. IIRC he’ll get a notification each time this commit is included in some bitcoin-fork.

    Ah right, it even says so in CONTRIBUTING.md. I’ll change the commit message.

  53. jonatack commented at 12:58 pm on September 3, 2021: contributor
    The @ is not bothersome until merge, so it can wait until need to retouch.
  54. benthecarman commented at 0:17 am on September 4, 2021: contributor

    tACK 952c37a31c39b7acdb5ab2634a354a1c548708a2

    Did IDB on a testnet node, ubuntu 20.04

  55. DrahtBot commented at 5:28 pm on September 9, 2021: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  56. jamesob commented at 2:17 pm on October 12, 2021: member

    Given the high potential performance benefit of this PR, it’d be great to see it move forward somehow. @martinus: are there any limitations or concerns you have about this? Are there any risks in particular we should be testing for? @sipa @ryanofsky @TheBlueMatt: given you’re some resident C++/systems experts, are there any concerns you have? Tests you’d like to see?

    I just want to best understand what kind of work we should be doing to assess the suitability of this change.

  57. sipa commented at 2:30 pm on October 12, 2021: member

    It’s been on my list to go through the code in detail.

    So far, my biggest concern is that the object size gets computed incorrectly by a future libstdc++ (or equivalent) change, and we’ll silently fall back to old performance. @martinus What do you think the impact would be from having a list of memory blobs per allocation size? That would alleviate any such concerns, but I’m not sure whether its performance impact might be non-negligible, of if there are other issues with that approach. A pointer to the last-used allocation size’s blobs could be cached, perhaps.

  58. ryanofsky commented at 3:03 pm on October 12, 2021: contributor

    @sipa @ryanofsky @TheBlueMatt: given you’re some resident C++/systems experts, are there any concerns you have? Tests you’d like to see?

    No specific concerns from me. It’s just a PR adding a lot of dense code that will take time to read over. This seems like a very worthwhile change if it is speeding up IBD, and it seems less risky than other more invasive performance improvements.

    Dumb question, just because the title of this PR is makes me immediately feel stupid: What is a “node-based” container? Just any map container? Maybe a more motivating PR title would be “Implement node allocator for UTXO cache for faster IBD”. Also it would be good if PR description described the new PR instead of just comparing it to previous PR, since probably some potential reviewers like me are not familiar with the previous PR.

  59. sipa commented at 3:14 pm on October 12, 2021: member
    @ryanofsky I’m not exactly sure where the name “node based” comes from, but my interpretation is this: it’s an allocator that optimizes the allocation of lots of objects of one specific preset size, without the ability to return individual memory back to the system. Such an allocator is specifically useful for standard library collection types which involve a separately-allocated object (which I guess are called nodes?) for each item (this would apply to things like std::{,forward_}list, std::{,unordered_}{,multi}{set,map}; I’m unsure about std::deque).
  60. ryanofsky commented at 3:24 pm on October 12, 2021: contributor
    Yeah I don’t mean to quibble over the name. I’m sure it’s appropriate. Just suggesting to make not make the term seem like assumed knowledge for reviewing the PR. Naively I would think “separately-allocated object” would apply to trees and lists but not hash tables so {,unordered_} part of this is specifically what’s confusing me here. But I should just sit down and read node_allocator.h, since the code itself is nicely documented.
  61. sipa commented at 3:27 pm on October 12, 2021: member
    @ryanofsky Yeah I’m just trying to explain my understanding, not nit about the name. std::unordered_{multi,}{set}{map} use a single hashtable with pointers that start singly-linked lists of objects in that bucket, each of which points to the next one. Those objects are potential candidates for optimization by the allocator (the hashtable itself isn’t).
  62. martinus commented at 3:29 pm on October 12, 2021: contributor

    @jamesob concerning limitations, I can’t really think of much… I think the biggest change is that the MemoryResource doesn’t free memory any more. So once the map reaches a size, memory usage won’t decrease even when the map gets smaller. You can see that nicely in this graph: https://user-images.githubusercontent.com/14386/131080505-23278cc5-8977-4a73-beec-4f2ece769fae.png The blue line is mergebase and when map size decreases memory goes down. For this PR the memory stays constant (at a lower level because it needs less memory).

    Another concern is that on some systems malloc could be exceptionally fast, or implemented differently, so that this node_allocator is not actually a performance / memory benefit. @sipa concerning incorrectly calculating object sizes, I think this can be covered in unit tests. In the new test test_chunks_are_used I’m trying to create std::unordered_map with all kind of alignments with and without a noexcept hash, and assert if the MemoryResource is actually used. I could also add another test that actually uses CCoinsMap to see if the allocator is used.

    About having a list of memory blobs per allocation size: I’ve thought about that, but I’ve tried to keep the allocator as simple as possible and didn’t want to make it more complex. By the way the logic is quite a bit similar to python’s pymalloc (which I’ve discovered after I wrote most of it…): https://www.evanjones.ca/memoryallocator/ Newer versions have multiple pools of different byte sizes, and also the ability to actually free chunks. Of course then the allocator would become much more complex. @ryanofsky Node based containers are all containers where for each element they need to allocate data (the actual content + some control structure). For std::unordered_map that node usually consists of key+value pair and a pointer. Sometimes hash value is also stored in the node. Only std::vector, std::deque, std::array are not node based. The nodes are a big performance issue because it’s always a layer of indirection and requires allocation, but it is necessary so pointers/references to data stays valid even when the container is modified.

  63. ryanofsky commented at 3:30 pm on October 12, 2021: contributor
    Makes sense! I didn’t know that about those containers.
  64. sipa commented at 3:30 pm on October 12, 2021: member
    @martinus Ah good, the unit tests covering this that way mostly alleviates my concern. I’ll review the code.
  65. in src/support/allocators/node_allocator.h:164 in 952c37a31c outdated
    159+        m_untouched_memory_iterator = static_cast<char*>(tmp) + m_chunk_size_bytes;
    160+        return static_cast<T*>(tmp);
    161+    }
    162+
    163+    /**
    164+     * Puts p back into the free list f it was actually allocated from the memory block.
    


    martinus commented at 6:27 am on October 14, 2021:
    typo: “puts p back into the free list if it was…”
  66. in src/support/allocators/node_allocator.h:267 in 952c37a31c outdated
    262+
    263+    //! Points to the beginning of available memory for carving out chunks.
    264+    void* m_untouched_memory_iterator = nullptr;
    265+
    266+    //! Points to the end of available memory for carving out chunks.
    267+    void* m_untouched_memory_end = nullptr;
    


    ryanofsky commented at 3:34 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    m_untouched_memory_end is always m_allocated_blocks.back() + alloc_size. Seems like it would be safer and less confusing to drop this unnecessary variable. Safer to avoid possibility of bugs where object gets into an inconsistent state. Less confusing to need one less variable to understand this.


    martinus commented at 10:27 am on October 24, 2021:
    I’d prefer to keep that member variable because it is used in Allocate(size_t n) which should be really fast. I think it would be a bit slower with additional indirections. Also I currently don’t need special handling for when nothing is allocated yet because both pointers are initialized to nullptr

    ryanofsky commented at 4:27 pm on October 27, 2021:

    If you prefer to keep this variable, I think for clarity it should have a comment like

    // m_untouched_memory_end member variable is redundant, and is always equal to (m_allocated_pools.back().get() + POOL_SIZE_BYTES) whenever it is accessed, but m_untouched_memory_end caches this for clarity and efficiency.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)

  67. in src/support/allocators/node_allocator.h:86 in 952c37a31c outdated
    81+
    82+public:
    83+    /**
    84+     * In-place linked list of the allocation chunks, used for the free list.
    85+     */
    86+    struct ChunkNode {
    


    ryanofsky commented at 3:54 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    Why are individual allocations sometimes called “chunks” and sometimes called “nodes”? Can we drop “chunk” terminology and just refer to individual allocations consistently as nodes?

    Personally I would love to drop the custom chunk/node/block terms entirely, just call the allocator a fixed size allocator, call the allocations allocations, and call the blocks pools. Just IMO, though.


    martinus commented at 6:02 am on October 26, 2021:
    I also wasn’t really sure what terms to use. I’ll try to clean that up.
  68. in src/support/allocators/node_allocator.h:230 in 952c37a31c outdated
    225+     * The memory block needs to be correctly aligned and large enough to hold both T and ChunkNode.
    226+     */
    227+    template <typename T>
    228+    [[nodiscard]] static constexpr size_t CalcRequiredChunkSizeBytes() noexcept
    229+    {
    230+        const auto alignment_max = std::max(std::alignment_of_v<T>, std::alignment_of_v<ChunkNode>);
    


    ryanofsky commented at 4:15 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    Strictly speaking, keeping ChunkNode structs aligned shouldn’t be required, because we could memcpy them to and from the empty spaces instead of static_casting them. But I guess not taking the max alignment is unlikely to save any space, if we already need to use the max size anyway?

    In any case, it would be good to have sentence here to say why this is using ChunkNode alignment, since it seems like ChunkNode alignment shouldn’t need to matter in principle.


    martinus commented at 6:45 am on October 25, 2021:
    I prefer to keep the alignment here, as far as I remember some systems can produce very slow code when using memcpy without the alignment requirement. It shouldn’t matter for x86 though. I’ll add a comment
  69. in src/support/allocators/node_allocator.h:93 in 952c37a31c outdated
    88+    };
    89+
    90+    /**
    91+     * Construct a new Memory Resource object that uses the specified chunk size to optimize for.
    92+     */
    93+    explicit MemoryResource(size_t chunk_size_bytes) : m_chunk_size_bytes(chunk_size_bytes) {}
    


    ryanofsky commented at 7:11 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    It seems like it would be more ideal for chunk_size_bytes to be a compile time constant (template parameter) instead of a runtime constant (class member), since in all cases can be known at compile time, and the fact that it isn’t mean that other things like size of a block can’t be constant either. Not being able to easily compute size of the block (it requires runtime multiplication and division) I think indirectly led to m_untouched_memory_end being added unnecessarily. Not having a compile-time block size also appears to be a reason m_allocated_blocks is using void* and raw new/deletes instead of safer types like std::unique_ptr and std::array

    Making this const would also have more minor benefit of improving clarity by being able to declare useful constants up front and not have things like / m_chunk_size_bytes * m_chunk_size_bytes various places in the code.


    martinus commented at 5:28 pm on October 24, 2021:
    I’ve tried to make it a template argument before, but couldn’t get it to compile. But I’ve tried it again and figured it out now, the main issue that I missed is that the Allocator then needs to have a struct rebind for this to work. (this is described in a little footnote here: https://en.cppreference.com/w/cpp/named_req/Allocator#cite_note-2 The rebind is not optional because the second template argument will be a size_t, and not a template type)

    martinus commented at 8:48 am on October 26, 2021:
    done in 89c62017f2a099de5353579b08f79037805ca70f
  70. in src/support/allocators/node_allocator.h:116 in 952c37a31c outdated
    111+    ~MemoryResource() noexcept
    112+    {
    113+        for (auto* block : m_allocated_blocks) {
    114+            ::operator delete(block);
    115+        }
    116+    }
    


    ryanofsky commented at 8:30 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    This could go away using std::unique_ptr


    martinus commented at 5:25 pm on October 24, 2021:
    Right! I’ll switch to using std::unique_ptr<char[]>.

    martinus commented at 8:47 am on October 26, 2021:
    Done in 89c62017f2a099de5353579b08f79037805ca70f
  71. in src/support/allocators/node_allocator.h:204 in 952c37a31c outdated
    199+    }
    200+
    201+    /**
    202+     * Counts number of free entries in the free list. This is an O(n) operation. Mostly for debugging / logging / testing.
    203+     */
    204+    [[nodiscard]] size_t NumFreeChunks() const noexcept
    


    ryanofsky commented at 8:41 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    Exposing this method seems pretty dubious. Maybe do friend class MemoryResourceTester; instead and move it to test code. My concerns here are verbosity of this header, and temptation for someone to call this inappropriately in some performance counter or log statement. If this should be exposed publicly, maybe it should have a name like CountFreeChunks so it doesn’t look like an innocuous accessor method.


    martinus commented at 6:03 am on October 26, 2021:
    That’s a good idea, then I can move NumFreeChunks and NumBlocks to the test class because it shouldn’t be needed anywhere else

    martinus commented at 8:48 am on October 26, 2021:
    done in 89c62017f2a099de5353579b08f79037805ca70f
  72. in src/support/allocators/node_allocator.h:234 in 952c37a31c outdated
    229+    {
    230+        const auto alignment_max = std::max(std::alignment_of_v<T>, std::alignment_of_v<ChunkNode>);
    231+        const auto size_max = std::max(sizeof(T), sizeof(ChunkNode));
    232+
    233+        // find closest multiple of alignment_max that holds size_max
    234+        return ((size_max + alignment_max - 1U) / alignment_max) * alignment_max;
    


    ryanofsky commented at 8:49 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    Suggestion not for here, but I wouldn’t mind some other PR adding simple CEIL_DIV, ROUND_UP util macros and using them places like this.

  73. in src/support/allocators/node_allocator.h:244 in 952c37a31c outdated
    239+     * Allocate one full memory block which is used to carve out chunks.
    240+     * The block size is the multiple of m_chunk_size_bytes that comes closest to BlockSizeBytes.
    241+     */
    242+    void AllocateNewBlock()
    243+    {
    244+        static_assert(sizeof(char) == 1U);
    


    ryanofsky commented at 8:56 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    This should be removed since it’s true by definition that sizeof returns char sized units and sizeof(char) is 1. Size of a char can vary (it is CHAR_BIT bits which is usually 8), but I don’t think any code here depends on a particular char size.


    martinus commented at 8:48 am on October 26, 2021:
    done in 89c62017f2a099de5353579b08f79037805ca70f
  74. in src/support/allocators/node_allocator.h:80 in 952c37a31c outdated
    75+ *   Whenever a node is given back with Deallocate(), it is put into that free list.
    76+ */
    77+class MemoryResource
    78+{
    79+    //! Size in bytes to allocate per block, currently hardcoded to 256 KiB.
    80+    static constexpr size_t BlockSizeBytes = 262144;
    


    ryanofsky commented at 9:04 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    I think this should be named MaxBlockSizeBytes instead of BlockSizeBytes since actual block size may be slightly less depending on alignment. Also convention is to use snake case for variable/constant names and camel case for type/function names, so maybe would be good MAX_BLOCK_SIZE_BYTES to also make it more obvious this is a compile-time constant.


    martinus commented at 6:00 am on October 26, 2021:
    When refactoring to use compile time NODE_SIZE_BYTES (renamed to ALLOCATION_SIZE_BYTES) I can now calculate the correct blocksize right here.

    martinus commented at 8:48 am on October 26, 2021:
    done in 89c62017f2a099de5353579b08f79037805ca70f
  75. in src/support/allocators/node_allocator.h:52 in 952c37a31c outdated
    47+ *
    48+ * MemoryResource is an immobile object that actually allocates, holds and manages chunks of memory. Currently it is only
    49+ * able provide optimized alloc/free for a single fixed node size which is given in the constructor. Only allocations that match this
    50+ * size will be provided from the preallocated blocks of memory; all other requests simply use ::operator new().
    51+ * To get the correct node size in all cases it is unfortunately necessary to have a look at the various implementations of
    52+ * std::unordered_map. There is currently no standard way of getting the node size programmatically.
    


    ryanofsky commented at 9:13 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    This doesn’t doesn’t really have anything to do with MemoryResource, and I’d think it should be part of the Allocator paragraph not the MemoryResource one. I’d reshuffle this design section a little bit and put MemoryResource paragraph first, Allocator paragraph second, and information about determining the node sizes third.


    martinus commented at 8:49 am on October 26, 2021:
    reshuffled in 89c62017f2a099de5353579b08f79037805ca70f
  76. in src/support/allocators/node_allocator.h:303 in 952c37a31c outdated
    316+        : m_memory_resource(memory_resource)
    317+    {
    318+    }
    319+
    320+    /**
    321+     * Conversion constructor for rebinding. All Allocators use the same memory_resource.
    


    ryanofsky commented at 9:21 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    Curious, when does rebinding happen? Maybe say something about when/if this would happen in a sentence here.


    martinus commented at 5:15 pm on October 24, 2021:
    The C++ API for allocators is a bit stupid, it actually happens all the time. No allocation actually ever happens with the std::pair<const Key, Value> type, the container’s rebind to the internal node type, possibly to some control structure, and for unordered_map to the underlying array type. I’ll add a comment

    martinus commented at 8:49 am on October 26, 2021:
    comment added in 89c62017f2a099de5353579b08f79037805ca70f
  77. in src/support/allocators/node_allocator.h:422 in 952c37a31c outdated
    416+    static void Construct(ContainerType* ptr, MemoryResource* memory_resource)
    417+    {
    418+        assert(ptr != nullptr && memory_resource != nullptr && memory_resource->ChunkSizeBytes() == NodeSizeBytes);
    419+        ::new ((void*)ptr) ContainerType{0, Hash{}, Equals{}, AllocatorType{memory_resource}};
    420+    }
    421+};
    


    ryanofsky commented at 9:34 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    These construct methods seem sketchy and prone to misuse. Can we drop these? The one place these are called is in ReallocateCache and it should be possible to eliminate them there by keeping the existing placement new:

    0::new (&cacheCoinsMemoryResource) MemoryResource{CCoinsMapFactory::CreateMemoryResource()};
    1::new (&cacheCoins) CCoinsMap{CCoinsMapFactory::CreateContainer(&cacheCoinsMemoryResource)};
    

    martinus commented at 6:04 am on October 25, 2021:
    Good idea, I didn’t think about writing it that way

    martinus commented at 8:49 am on October 26, 2021:
    done in 89c62017f2a099de5353579b08f79037805ca70f
  78. in src/coins.cpp:238 in 952c37a31c outdated
    221@@ -222,6 +222,7 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
    222 bool CCoinsViewCache::Flush() {
    223     bool fOk = base->BatchWrite(cacheCoins, hashBlock);
    224     cacheCoins.clear();
    225+    ReallocateCache();
    


    ryanofsky commented at 9:44 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    I think it makes sense to add ReallocateCache cache (though I would be curious if someone knows a reason it wasn’t added here initially).

    But if we are going to add this here I think we should remove the other ReallocateCache() call in validation.h which is now redundant, and make the ReallocateCache method private so it is not uselessly confusing.


    martinus commented at 8:20 am on October 23, 2021:

    ReallocateCache is now necessary here, because Flush is triggered when memory is full, and with the node_allocator clear() does not free any memory the memory usage would stay constant. It was not necessary before because clear() already deallocated all of the nodes and only kept the memory for the unordered_map’s internal indexing array.

    Right! CoinsTip().ReallocateCache(); in validation.cpp is now unnecessary. Good catch.


    martinus commented at 8:52 am on October 26, 2021:
    Done that in a separate commit in 1e4a65b21c70ee745a8ae02c3af109f4f5e16422
  79. in src/memusage.h:165 in 952c37a31c outdated
    162+template <typename T>
    163+struct NodeSize {
    164+};
    165+
    166+template <typename Key, typename V, typename Hash, typename Equals, typename Allocator>
    167+struct NodeSize<std::unordered_map<Key, V, Hash, Equals, Allocator>> {
    


    ryanofsky commented at 9:51 pm on October 22, 2021:

    In commit “Add allocator for node based containers” (952c37a31c39b7acdb5ab2634a354a1c548708a2)

    Just so this information is easily accessible, can you add a comment about how this is tested for accuracy or where the tests can be found?


    martinus commented at 10:51 am on October 26, 2021:
    Added in a91b603
  80. ryanofsky approved
  81. ryanofsky commented at 10:02 pm on October 22, 2021: contributor

    Code review ACK 952c37a31c39b7acdb5ab2634a354a1c548708a2 for everything except benchmarks & tests, which I skimmed but didn’t really look at yet.

    Looks great! I left a lot of suggestions, but none are critical so please ignore them if they don’t seem worth implementing. Some of them might make more sense to follow up later, anyway.

  82. martinus commented at 2:07 pm on October 23, 2021: contributor
    Thanks a lot for the detailed review @ryanofsky! I’ll go through all that and see what I can do
  83. martinus commented at 2:38 pm on October 26, 2021: contributor
    @ryanofsky I have done quite a bit of refactoring based on your suggestion, thanks again! It’s a big change unfortunately, but I think it addresses almost everything you mentioned.
  84. in src/memusage.h:199 in a91b6032f0 outdated
    202 
    203 template<typename X, typename Y>
    204 static inline size_t DynamicUsage(const std::unordered_set<X, Y>& s)
    205 {
    206-    return MallocUsage(sizeof(unordered_node<X>)) * s.size() + MallocUsage(sizeof(void*) * s.bucket_count());
    207+    return MallocUsage(sizeof(std::pair<X, void*>)) * s.size() + MallocUsage(sizeof(void*) * s.bucket_count());
    


    ryanofsky commented at 7:51 pm on October 26, 2021:

    To confirm, replacing unordered_node<X> with std::pair<X, void*> is just a simplification for consistent style, not a change in behavior?

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 8:11 am on October 29, 2021:
    exactly, I found it simpler & less code to just use std::pair<...> than writing many unordered_node implementations.
  85. in src/bench/node_allocator.cpp:23 in a91b6032f0 outdated
    18+    tx.vin.resize(1);
    19+    tx.vin[0].scriptSig = CScript() << OP_2;
    20+    tx.vin[0].scriptWitness.stack.push_back({2});
    21+    tx.vout.resize(1);
    22+    tx.vout[0].scriptPubKey = CScript() << OP_2 << OP_EQUAL;
    23+    tx.vout[0].nValue = 10 * COIN;
    


    ryanofsky commented at 11:02 pm on October 26, 2021:

    Filling this fake transaction just to throw it all away and call tx.GetHash seems unnecessarily complicated. Could just construct the outpoint with any uint256 instead of a real transaction hash

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 2:58 pm on November 5, 2021:
    right! I just copied that code without really thinking too much about it. I’ll clean that up.

    martinus commented at 5:24 pm on November 7, 2021:
    Done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  86. in src/memusage.h:208 in a91b6032f0 outdated
    212+template <typename Key, typename Value, typename Hash, typename Equals, typename Alloc>
    213+static inline size_t DynamicUsage(const std::unordered_map<Key, Value, Hash, Equals, Alloc>& m)
    214 {
    215-    return MallocUsage(sizeof(unordered_node<std::pair<const X, Y> >)) * m.size() + MallocUsage(sizeof(void*) * m.bucket_count());
    216+    using ContainerType = std::unordered_map<Key, Value, Hash, Equals, Alloc>;
    217+    if constexpr (std::is_same_v<Alloc, node_allocator::Allocator<std::pair<const Key, Value>, memusage::NodeSize<ContainerType>::Value()>>) {
    


    ryanofsky commented at 11:10 pm on October 26, 2021:

    Would be good to replace std::pair<const Key, Value> with ContainerType::value_type to make this more obvious.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:26 pm on November 7, 2021:
    done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  87. in src/memusage.h:213 in a91b6032f0 outdated
    217+    if constexpr (std::is_same_v<Alloc, node_allocator::Allocator<std::pair<const Key, Value>, memusage::NodeSize<ContainerType>::Value()>>) {
    218+        // Assumes that DynamicUsage of the MemoryResource is called separately. We don't do it here
    219+        // because multiple maps could use the same MemoryResource.
    220+        return MallocUsage(sizeof(void*) * m.bucket_count());
    221+    } else {
    222+        auto node_size = NodeSize<std::unordered_map<Key, Value, Hash, Equals, Alloc>>::Value();
    


    ryanofsky commented at 11:20 pm on October 26, 2021:

    Could be shortened to NodeSize<ContainerType>

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:26 pm on November 7, 2021:
    Split that code up into two methods in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  88. in src/support/allocators/node_allocator.h:25 in a91b6032f0 outdated
    20+ * containers that experience heavy load.
    21+ *
    22+ * ## Behavior
    23+ *
    24+ * MemoryResource mallocs pools of memory and uses these to carve out memory for the nodes. Nodes
    25+ * that are destroyed by the Allocator are actually put back into a free list for further use. This
    


    ryanofsky commented at 3:41 pm on October 27, 2021:

    Would s/destroyed by the Allocator/freed/ to be clear this happens independently of the allocator, and whether or not it is used.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:27 pm on November 7, 2021:
    done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  89. in src/support/allocators/node_allocator.h:53 in a91b6032f0 outdated
    48+ * Allocator is a cheaply copyable, `std::allocator`-compatible type used for the containers.
    49+ * Similar to `std::pmr::polymorphic_allocator`, it holds a pointer to a memory resource.
    50+ *
    51+ * MemoryResource is an immobile object that actually allocates, holds and manages memory. Currently
    52+ * it is only able to provide optimized alloc/free for a single fixed allocation size which is given
    53+ * in the constructor. Only allocations that match this size will be provided from the preallocated
    


    ryanofsky commented at 3:44 pm on October 27, 2021:

    Should drop “which is given in the constructor”, since it’s not true anymore and is not a really relevant detail describing the high level behavior here.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)

  90. in src/support/allocators/node_allocator.h:56 in a91b6032f0 outdated
    51+ * MemoryResource is an immobile object that actually allocates, holds and manages memory. Currently
    52+ * it is only able to provide optimized alloc/free for a single fixed allocation size which is given
    53+ * in the constructor. Only allocations that match this size will be provided from the preallocated
    54+ * pools of memory; all other requests simply use ::operator new(). To get the correct node size in
    55+ * all cases it is unfortunately necessary to have a look at the various implementations of
    56+ * std::unordered_map. There is currently no standard way of getting the node size programmatically.
    


    ryanofsky commented at 3:55 pm on October 27, 2021:

    It is unexpected to mention std::unordered_map here when everything above is talking about containers generally. Also unclear what “all cases” is referring to. Would drop last two sentences and just say something like:

    Using node_allocator with a standard container types like std::list, std::unordered_map requires knowing node sizes of those containers which are non-standard implementation details. The \ref memusage::NodeSize trait and \ref node_allocator::Factory class can be used to help with this.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:29 pm on November 7, 2021:
    done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  91. in src/support/allocators/node_allocator.h:153 in a91b6032f0 outdated
    148+     * @param n Number of objects to allocate for
    149+     */
    150+    template <typename T>
    151+    [[nodiscard]] T* Allocate(size_t n)
    152+    {
    153+        if constexpr (ALLOCATION_SIZE_BYTES == detail::RequiredAllocationSizeBytes<T>()) {
    


    ryanofsky commented at 4:10 pm on October 27, 2021:

    It seems like it would be better to make this a static_assert than an if constexpr. I also don’t see a reason for this method to take an n argument. Switching to static_assert and dropping n argument here would have the following advantages:

    • It would simplify this method and remove footguns where calling it incorrectly could result in bad runtime behavior instead of a compile time errors.
    • Moving the size and n checks to Allocator::allocate instead of MemoryResource::Allocate would allow more flexiblility like falling back to a nested allocator like std::allocator instead of global ::operator new (In theory this would even allow chaining different sized node_allocator::Allocator types together so an allocator could transparently direct diffferent sized allocations to different sized pools, and fall back to std::allocator if no pool fits)

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:30 pm on November 7, 2021:
    Hm, I don’t really want to add a counter for m_free_allocations but also don’t want to iterate through the freelist. I think it’s ok to keep the MemoryResource::DynamicMemoryUsage as it is because all that memory is the responsibility of that class. I got rid of the if constexpr though in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5.
  92. in src/support/allocators/node_allocator.h:78 in a91b6032f0 outdated
    73+
    74+/**
    75+ * In-place linked list of the allocations, used for the free list.
    76+ */
    77+struct FreeList {
    78+    void* next;
    


    ryanofsky commented at 4:19 pm on October 27, 2021:

    Would expect FreeList* instead of void*. void* could make sense if these were ever pointers to something besides FreeLists, but if next is ever non-null it is always pointing to a FreeList.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 6:41 am on November 6, 2021:

    Reading through the code again I now think that I actually have undefined behavior: I’m basically reinterpret_casting T* to FreeList*, and both types are completely unrelated, so according to the type aliasing rules accessing this is undefined behavior:

    Performing a class member access that designates a non-static data member or a non-static member function on a glvalue that does not actually designate an object of the appropriate type - such as one obtained through a reinterpret_cast - results in undefined behavior

    See https://en.cppreference.com/w/cpp/language/reinterpret_cast

    This can be fixed by using memcpy, and then I can also relax the alignment.


    martinus commented at 7:46 am on November 6, 2021:
    I guess I didn’t have enough coffee yet. I can’t use reinterpret_cast, but since the T was already destroyed, I can use placement new in that memory to create the FreeList. No reinterpret_cast needed at all.

    martinus commented at 5:31 pm on November 7, 2021:
    MemoryResource::Deallocate now uses pacement new in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5, and uses the returned pointer.
  93. in src/support/allocators/node_allocator.h:242 in a91b6032f0 outdated
    237+    //! Contains all allocated pools of memory, used to free the data in the destructor.
    238+    std::vector<std::unique_ptr<char[]>> m_allocated_pools{};
    239+
    240+    //! A single linked list of all data available in the MemoryResource. This list is used for
    241+    //! allocations of single elements.
    242+    void* m_free_allocations = nullptr;
    


    ryanofsky commented at 4:20 pm on October 27, 2021:

    Again seems like type should be FreeList* not void*

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:32 pm on November 7, 2021:
    done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5. I also changed the m_untouched_memory_iterator and m_untouched_memory_end from void* to char*.
  94. in src/memusage.h:211 in a91b6032f0 outdated
    215-    return MallocUsage(sizeof(unordered_node<std::pair<const X, Y> >)) * m.size() + MallocUsage(sizeof(void*) * m.bucket_count());
    216+    using ContainerType = std::unordered_map<Key, Value, Hash, Equals, Alloc>;
    217+    if constexpr (std::is_same_v<Alloc, node_allocator::Allocator<std::pair<const Key, Value>, memusage::NodeSize<ContainerType>::Value()>>) {
    218+        // Assumes that DynamicUsage of the MemoryResource is called separately. We don't do it here
    219+        // because multiple maps could use the same MemoryResource.
    220+        return MallocUsage(sizeof(void*) * m.bucket_count());
    


    ryanofsky commented at 5:05 pm on October 27, 2021:

    Having this special case to handle shared memory resources seems clunky, especially because it adds a circular dependency between memusage and node_allocator types, instead of letting node_allocator have a one-way dependency on memusage.

    Also it seems like awkward accounting to have dynamic usage of containers not actually include the memory they are using, and for all the cost of all memory be attributed to the memory resource with no distinction between allocated and unallocated memory.

    Another way of doing the accounting would be to keep the containers attributed memory usage unchanged, and just attribute size of unallocated memory to MemoryResource, instead of size of allocated+unallocated memory.

    Specifically the suggestion would be to:

    • Drop the if constexpr special case code here and circular dependency of memusage on node_allocator
    • Change definition of MemoryResource::DynamicMemoryUsage to only include unallocated space not total space by:
      • Subtracting POOL_SIZE_BYTES * m_allocated_pools.size() from current value
      • Adding m_untouched_memory_end - m_untouched_memory_iterator + (length of m_free_allocations list) * ALLOCATION_SIZE_BYTES
      • Maybe adding a counter variable to get length of m_free_allocations list efficiently

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 7:09 am on November 13, 2021:

    I’ve played with this for a while now, but it doesn’t work that way because in memusage.h I still need to know if the node_allocator is used for a container, because the node sizes are different. For the standard std::unordered_map I need to use memusage::MallocUsage for the nodes, and for node_allocator I have use the node size directly. So I need at least a forward declaration of node_allocator::Allocator. But that is fine I guess, I’ll create a forward declaration then there’s no circular dependency any more.

    Also I’ll split up NodeSize calculation into a separate independent file that both memusage and the allocator can use, so there are no circular dependency for that too.

  95. in src/support/allocators/node_allocator.h:402 in a91b6032f0 outdated
    380+ *
    381+ * This calculates the size of the container's internally used node correctly for all supported
    382+ * platforms, which is also asserted by the unit tests.
    383+ */
    384+template <typename Key, typename Value, typename Hash = std::hash<Key>, typename Equals = std::equal_to<Key>>
    385+class UnorderedMapFactory
    


    ryanofsky commented at 5:17 pm on October 27, 2021:

    I don’t see a reason to tie this to unordered_map. Would suggest renaming this from UnorderedMapFactory to just Factory, taking a typename Container template argument and calling NodeSize<Container>::Value() below

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)

  96. in src/support/allocators/node_allocator.h:417 in a91b6032f0 outdated
    392+
    393+    /**
    394+     * Creates the std::unordered_map container, and asserts that the specified memory_resource is
    395+     * correct.
    396+     */
    397+    [[nodiscard]] static ContainerType CreateContainer(MemoryResourceType* memory_resource)
    


    ryanofsky commented at 5:24 pm on October 27, 2021:

    This CreateContainer method seems like a pretty superfluous wrapper around the ContainerType constructor. Maybe we don’t need a factory class at all and can just provide some template typedefs for ccoins to use:

    0template<typename BaseContainer>
    1using MemoryResourceType = MemoryResource<NodeSize<BaseContainer>::Value()>;
    2
    3template<typename BaseContainer>
    4using AllocatorType = Allocator<BaseContainer::value_type, NodeSize<BaseContainer>::Value()>;
    

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 10:50 am on November 6, 2021:
    I’d like to keep the CreateContainer here because otherwise constructing an unordered_map is quite a lot of non-obvious code, like first argument is always 0, then create a temporary hash of the correct typ, equals, and finally the Allocator.
  97. ryanofsky commented at 5:37 pm on October 27, 2021: contributor

    This looks great and thanks for using some my earlier suggestions. Again feel free to ignore all suggestions from me, as I’m mostly writing to help understand things.

    I mostly finished going through the review again, except I still didn’t look deeply at test yet. I’m just posting comments mid-review now because they are piling up

    I’d also recommend squashing all commits exception the validation ReallocateCache commit to avoid churn within PR and make review more straightforward for new reviewers.

  98. in src/memusage.h:170 in a91b6032f0 outdated
    170+struct NodeSize<std::unordered_map<Key, V, Hash, Equals, Allocator>> {
    171+    [[nodiscard]] static constexpr size_t Value()
    172+    {
    173+        // libstdc++, libc++, and MSVC implement the nodes differently. To get the correct size
    174+        // with the correct alignment, we can simulate that with accordingly nested std::pairs.
    175+        using ValueType = std::pair<const Key, V>;
    


    ryanofsky commented at 4:15 pm on October 28, 2021:

    Putting this comment next to this typedef is a little confusing, because the comment is describing simulating pairs which simulate nonstandard internal data structures, but this pair isn’t really simulating anything, since it’s actually part of standard container definition.

    Would suggest either moving comment, or only using std::pair for simulated pairs, or doing both things. By only using std::pair for simulated pairs, I mean something like:

    0using ContainerType = std::unordered_map<Key, Value, Hash, Equals, Alloc>;
    1using ValueType = typename ContainerType::value_type;
    

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:46 pm on November 7, 2021:
    Moved comment in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  99. in src/memusage.h:188 in a91b6032f0 outdated
    194+        } else {
    195+            // hash is stored along ValueType, and that is wrapped with the pointer.
    196+            return sizeof(std::pair<void*, std::pair<ValueType, size_t>>);
    197+        }
    198+#endif
    199+#endif
    


    ryanofsky commented at 4:38 pm on October 28, 2021:

    Can you use #elif above to get rid of this nesting and just make a flat list of cases?

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:46 pm on November 7, 2021:
    done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  100. in src/support/allocators/node_allocator.h:388 in a91b6032f0 outdated
    383+ */
    384+template <typename Key, typename Value, typename Hash = std::hash<Key>, typename Equals = std::equal_to<Key>>
    385+class UnorderedMapFactory
    386+{
    387+public:
    388+    static constexpr size_t ALLOCATION_SIZE_BYTES = memusage::NodeSize<std::unordered_map<Key, Value, Hash, Equals>>::Value();
    


    ryanofsky commented at 4:57 pm on October 28, 2021:

    In practice, this is probably fine, but logically and in theory, it seems like there is a bug here because this is only using node size to determine allocation size, ignoring node alignment and FreeList size and alignment. It seems like this should be using the RequiredAllocationSizeBytes function to take these things into account, like:

     0diff --git a/src/support/allocators/node_allocator.h b/src/support/allocators/node_allocator.h
     1index 7ff94d39700..5b667ca99b8 100644
     2--- a/src/support/allocators/node_allocator.h
     3+++ b/src/support/allocators/node_allocator.h
     4@@ -385,7 +385,8 @@ template <typename Key, typename Value, typename Hash = std::hash<Key>, typename
     5 class UnorderedMapFactory
     6 {
     7 public:
     8-    static constexpr size_t ALLOCATION_SIZE_BYTES = memusage::NodeSize<std::unordered_map<Key, Value, Hash, Equals>>::Value();
     9+    static constexpr size_t ALLOCATION_SIZE_BYTES = detail::RequiredAllocationSizeBytes<typename
    10+                                                    memusage::NodeSize<std::unordered_map<Key, Value, Hash, Equals>>::NodeType>();
    11     using MemoryResourceType = MemoryResource<ALLOCATION_SIZE_BYTES>;
    12     using AllocatorType = Allocator<std::pair<const Key, Value>, ALLOCATION_SIZE_BYTES>;
    13     using ContainerType = std::unordered_map<Key, Value, Hash, Equals, AllocatorType>;
    14diff --git a/src/memusage.h b/src/memusage.h
    15index fd781f77bf5..af1f33963d6 100644
    16--- a/src/memusage.h
    17+++ b/src/memusage.h
    18@@ -163,35 +163,38 @@ struct NodeSize {
    19 // with multiple configurations (alignments, noexcept hash, node sizes).
    20 template <typename Key, typename V, typename Hash, typename Equals, typename Allocator>
    21 struct NodeSize<std::unordered_map<Key, V, Hash, Equals, Allocator>> {
    22-    [[nodiscard]] static constexpr size_t Value()
    23-    {
    24         // libstdc++, libc++, and MSVC implement the nodes differently. To get the correct size
    25         // with the correct alignment, we can simulate that with accordingly nested std::pairs.
    26         using ValueType = std::pair<const Key, V>;
    27+        using NodeType =
    28 
    29 #if defined(_MSC_VER)
    30         // list node contains 2 pointers and no hash; see
    31         // https://github.com/microsoft/STL/blob/main/stl/inc/unordered_map and
    32         // https://github.com/microsoft/STL/blob/main/stl/inc/list
    33-        return sizeof(std::pair<std::pair<void*, void*>, ValueType>);
    34+                      std::pair<std::pair<void*, void*>, ValueType>
    35 #else
    36 
    37 #if defined(_LIBCPP_VERSION) // defined in any C++ header from libc++
    38         // libc++ always stores hash and pointer in the node
    39         // see https://github.com/llvm/llvm-project/blob/release/13.x/libcxx/include/__hash_table#L92
    40-        return sizeof(std::pair<ValueType, std::pair<size_t, void*>>);
    41+                      std::pair<ValueType, std::pair<size_t, void*>>
    42 #else
    43+        std::conditional_t<
    44         // libstdc++ doesn't store hash when its operator() is noexcept;
    45         // see hashtable_policy.h, struct _Hash_node
    46         // https://gcc.gnu.org/onlinedocs/libstdc++/latest-doxygen/a05689.html
    47-        if (noexcept(std::declval<Hash>()(std::declval<const Key&>()))) {
    48-            return sizeof(std::pair<void*, ValueType>);
    49-        } else {
    50+            noexcept(std::declval<Hash>()(std::declval<const Key&>())),
    51+                          std::pair<void*, ValueType>,
    52             // hash is stored along ValueType, and that is wrapped with the pointer.
    53-            return sizeof(std::pair<void*, std::pair<ValueType, size_t>>);
    54-        }
    55+                          std::pair<void*, std::pair<ValueType, size_t>>
    56+        >
    57 #endif
    58 #endif
    59+    ;
    60+    [[nodiscard]] static constexpr size_t Value()
    61+    {
    62+        return sizeof(NodeType);
    63     }
    64 };
    65 
    

    Fix above is trying to be minimal but it could be cleaned up more by renaming NodeSize to something like ContainerTraits and dropping the Value() method.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 11:14 am on November 6, 2021:
    Thanks, and I agree that is the correct way to do it. I don’t think it was a bug though, at least the worst thing that could happen was that it falls back to default ::operator new, but for that there are unit tests too.

    martinus commented at 5:47 pm on November 7, 2021:
    Done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  101. in src/support/allocators/node_allocator.h:286 in a91b6032f0 outdated
    281+     */
    282+    using propagate_on_container_move_assignment = std::true_type;
    283+
    284+    /**
    285+     * Swapping two containers with unequal allocators who are *not* propagated is undefined
    286+     * behavior. Unfortunately this is the default! Obviously, we don't want that.
    


    ryanofsky commented at 5:07 pm on October 28, 2021:

    I’m curious why this is phrased negatively. Is there some downside to swapping allocators when containers are swapped? Swapping everything in the container when the container is swapped seems like the obvious and good thing to do.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 8:08 am on October 29, 2021:

    I’ll reword my comment, I meant that the default for propagate_on_container_swap is std::false_type, which is bad because that’s undefined behavior. I’ll change the comment:

    0    /**
    1     * The default for propagate_on_container_swap is std::false_type. This is bad,because swapping
    2     * two containers with unequal allocators but not propagating the allocator is undefined
    3     * behavior! Obviously, we want so swap the allocator as well, so we set that to true.
    4     *
    5     * see https://www.foonathan.net/2015/10/allocatorawarecontainer-propagation-pitfalls/
    6     */
    

    martinus commented at 5:47 pm on November 7, 2021:
    changed in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  102. in src/support/allocators/node_allocator.h:289 in a91b6032f0 outdated
    287+     */
    288+    using propagate_on_container_swap = std::true_type; // to avoid the undefined behavior
    289+
    290+    /**
    291+     * Move and swap have to propagate the allocator, so for consistency we do the same for copy
    292+     * assignment.
    


    ryanofsky commented at 5:10 pm on October 28, 2021:

    Again wondering why it’s phrased like we “have to” do these things. Is there some downside, or some particular way this is not ideal for our use-case?

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 11:17 am on November 6, 2021:
    no real downside, it’s just that because the allocator has a state it also has to be propagated.
  103. in src/support/allocators/node_allocator.h:310 in a91b6032f0 outdated
    308+     * The rebound type is determined with the `struct rebind`. The rebind is necessary feature
    309+     * for standard allocators, and it happens when the allocator shall be used for different types.
    310+     * E.g. for std::unordered_map this happens for allocation of the internally used nodes, for the
    311+     * underlying index array, and possibly for some control structure. In fact the type required by
    312+     * the standard `std::pair<const Key, Value>` type is never actually used for any allocation
    313+     * (which is therefore a stupid requirement, but that's just C++ being C++).
    


    ryanofsky commented at 5:22 pm on October 28, 2021:

    This is interesting. It seems to suggest about the concern that if “the object size gets computed incorrectly by a future libstdc++ change… we’ll silently fall back to old performance” that in theory we could detect at compile time, not runtime, whether a container implementation was calling Allocator::allocate with an unexpected T size for that implementation.

    But it doesn’t seem worth implementing this if it requires lots of template trickery or tracking down compile errors on different platforms, especially if NodeSize unit test coverage works OK practically.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)

  104. in src/test/node_allocator_tests.cpp:67 in a91b6032f0 outdated
    62+#define CHECK_MEMORY_RESOURCE(mr, num_free_allocations, num_pools)                   \
    63+    BOOST_CHECK_EQUAL(num_free_allocations,                                          \
    64+                      node_allocator::MemoryResourceTester::NumFreeAllocations(mr)); \
    65+    BOOST_CHECK_EQUAL(num_pools, node_allocator::MemoryResourceTester::NumPools(mr));
    66+
    67+#define CHECK_IN_RANGE(what, lowerInclusive, upperInclusive) \
    


    ryanofsky commented at 5:32 pm on October 28, 2021:

    Probably should use snake case variable names

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:49 pm on November 7, 2021:
    done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  105. in src/test/node_allocator_tests.cpp:62 in a91b6032f0 outdated
    57+
    58+} // namespace node_allocator
    59+
    60+BOOST_FIXTURE_TEST_SUITE(node_allocator_tests, BasicTestingSetup)
    61+
    62+#define CHECK_MEMORY_RESOURCE(mr, num_free_allocations, num_pools)                   \
    


    ryanofsky commented at 5:36 pm on October 28, 2021:

    Might suggest calling this something like CHECK_NUMS_FREES_POOLS just so it’s easier to remember what the arguments are to this guy below (and what it is doing).

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:49 pm on November 7, 2021:
    done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  106. in src/test/node_allocator_tests.cpp:73 in a91b6032f0 outdated
    68+    BOOST_TEST(what >= lowerInclusive);                      \
    69+    BOOST_TEST(what <= upperInclusive);
    70+
    71+BOOST_AUTO_TEST_CASE(too_small)
    72+{
    73+    node_allocator::MemoryResource<sizeof(void*)> mr{};
    


    ryanofsky commented at 5:45 pm on October 28, 2021:

    Writing this comment feels like reaching new depths of pedantry, but it doesn’t seem guaranteed that void* is bigger than char. It can’t be smaller, because sizeof(char) is always 1, but if they are the same size, then the test isn’t really meaningful. I guess my suggestion would be to replace void* with a type guaranteed to be bigger like std::pair<char, char>.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)

  107. in src/test/node_allocator_tests.cpp:115 in a91b6032f0 outdated
    110+        {
    111+            auto b = a;
    112+        }
    113+
    114+        BOOST_CHECK(node_allocator::MemoryResourceTester::NumFreeAllocations(mr) >
    115+                    num_free_allocations);
    


    ryanofsky commented at 6:01 pm on October 28, 2021:

    Could do a stricter check?

    0BOOST_CHECK(node_allocator::MemoryResourceTester::NumFreeAllocations(mr) >=
    1                    num_free_allocations + 1000);
    

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:50 pm on November 7, 2021:
    stricter check in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  108. in src/test/node_allocator_tests.cpp:130 in a91b6032f0 outdated
    125+        // moving the map should not create new nodes
    126+        m = std::move(a);
    127+        BOOST_CHECK_EQUAL(node_allocator::MemoryResourceTester::NumFreeAllocations(mr),
    128+                          num_free_allocations);
    129+    }
    130+    // a is destroyed, still all allocations should stay roughly the same.
    


    ryanofsky commented at 6:06 pm on October 28, 2021:

    Can you add “because its contents were moved to m” to this comment? I was a little confused by what this was doing at first, and the extra hint would have helped.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)

  109. in src/test/node_allocator_tests.cpp:138 in a91b6032f0 outdated
    133+
    134+    m = Factory::CreateContainer(&mr);
    135+
    136+    // now we got everything free
    137+    BOOST_CHECK(node_allocator::MemoryResourceTester::NumFreeAllocations(mr) >
    138+                num_free_allocations + 50);
    


    ryanofsky commented at 6:16 pm on October 28, 2021:

    Why only >50 here, not >=1000 again?

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 11:29 am on November 6, 2021:
    it sure can be made stricter :+1:

    martinus commented at 5:57 pm on November 7, 2021:
    stricter in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  110. in src/test/node_allocator_tests.cpp:20 in a91b6032f0 outdated
    14+#include <type_traits>
    15+#include <unordered_map>
    16+
    17+namespace node_allocator {
    18+
    19+class MemoryResourceTester
    


    ryanofsky commented at 6:23 pm on October 28, 2021:

    It seems like another thing these tests could be checking for aside from number of free allocations and number of pools is amount of slack space the last pool (untouched_end - untouched_iterator)/alloc size.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)

  111. in src/test/node_allocator_tests.cpp:146 in a91b6032f0 outdated
    138+                num_free_allocations + 50);
    139+}
    140+
    141+BOOST_AUTO_TEST_CASE(different_memoryresource_assignment)
    142+{
    143+    using Factory = node_allocator::UnorderedMapFactory<uint64_t, uint64_t>;
    


    ryanofsky commented at 6:30 pm on October 28, 2021:

    I get that it makes sense to have test coverage for std::unordered_map, but I feel like it would be if ideal if unordered_map vagaries with IN_RANGE(1U, 2U) (101U, 102U) soft checks were confined to ~one test, and all other tests used a simpler container like std::list that was easier to think about and had better defined allocation behavior, so the tests would be easier to understand, less fragile, and in some places more meaningful.

    Not saying you should rewrite all tests, but this might be something to think about doing selectively or for new tests.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)

  112. in src/test/node_allocator_tests.cpp:222 in a91b6032f0 outdated
    217+    // finally map_a is destroyed, but since it was moved, no more free allocations.
    218+    CHECK_IN_RANGE(node_allocator::MemoryResourceTester::NumFreeAllocations(mr_a), 100U, 102U);
    219+}
    220+
    221+
    222+BOOST_AUTO_TEST_CASE(different_memoryresource_swap)
    


    ryanofsky commented at 6:52 pm on October 28, 2021:

    Amount of code duplication between different_memoryresource_* tests is a little eye-watering. This seems like a place where BOOST_FIXTURE_TEST_CASE and a test fixture with unordered_map m_map_a and optional<unordered_map> m_map_b members and some helper methods would cut down duplication and make the checks more human readable.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:56 pm on November 7, 2021:
    deduplicated in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  113. in src/test/node_allocator_tests.cpp:246 in a91b6032f0 outdated
    290+};
    291+
    292+template <size_t S>
    293+struct alignas(32) A32 {
    294+    char data[S];
    295+};
    


    ryanofsky commented at 6:56 pm on October 28, 2021:

    Could all of this just be

    0template<size_t SIZE, size_t ALIGN>
    1struct alignas(ALIGN) A {
    2    char data[SIZE];
    3};
    

    ?

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 8:16 am on October 29, 2021:
    Ah, I didn’t know that it’s possible to use a template argument in alignas, that simplifies things :+1:

    martinus commented at 5:51 pm on November 7, 2021:
    done in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5, and renamed the struct to AlignedSize
  114. in src/test/node_allocator_tests.cpp:333 in a91b6032f0 outdated
    328+    static_assert(16U == node_allocator::detail::RequiredAllocationSizeBytes<A8<16>>());
    329+
    330+    static_assert(16U == node_allocator::detail::RequiredAllocationSizeBytes<A16<1>>());
    331+    static_assert(16U == node_allocator::detail::RequiredAllocationSizeBytes<A16<8>>());
    332+    static_assert(16U == node_allocator::detail::RequiredAllocationSizeBytes<A16<16>>());
    333+    static_assert(32U == node_allocator::detail::RequiredAllocationSizeBytes<A16<17>>());
    


    ryanofsky commented at 7:05 pm on October 28, 2021:

    IMO would be good to move these A8 and bigger checks out of the preprocessor block to deduplicate and make it obvious these are all the same whether you have have 4 byte or 8 byte pointers.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 5:55 pm on November 7, 2021:

    deduplicated in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5. Also I have added a

    0static_assert(__STDCPP_DEFAULT_NEW_ALIGNMENT__ >= std::alignment_of_v<T>, "make sure AllocatePool() aligns correctly");
    

    Into Allocate, because actually the 16 byte alignments are not allowed when the pools are not correctly aligned as well.

  115. in src/test/node_allocator_tests.cpp:366 in a91b6032f0 outdated
    361+
    362+namespace {
    363+
    364+template <typename T>
    365+struct NotNoexceptHash {
    366+    size_t operator()(const T& x) const /* noexcept */ { return std::hash<T>{}(x); }
    


    ryanofsky commented at 7:08 pm on October 28, 2021:

    Suggest writing /* not noexcept */ to make comment intent more plain.

    Also, I have only the vaguest understanding of noexcept but could this be noexcept(false)?

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 11:41 am on November 6, 2021:
    noexcept(false) and writing nothing is equivalent
  116. in src/test/node_allocator_tests.cpp:398 in a91b6032f0 outdated
    393+        map.clear();
    394+        BOOST_CHECK_EQUAL(node_allocator::MemoryResourceTester::NumFreeAllocations(mr), 5);
    395+    }
    396+
    397+    // makes sure clear frees all allocations. With MSVC there might be an additional allocation
    398+    // used for a control structure.
    


    ryanofsky commented at 7:13 pm on October 28, 2021:

    The hidden mystery of node_allocator_tests is revealed! This is reason for all the IN_RANGE(101, 102) stuff :arrow_up:. I wish there was a spoiler tag.

    (Version: a91b6032f037ab1a26ac4baa761e809fbfeb1bdd)


    martinus commented at 8:13 am on October 29, 2021:
    Maybe I shouldn’t have but that reveal at the very last place where that CHECK_IN_RANGE is used :sweat_smile:

    martinus commented at 5:52 pm on November 7, 2021:
    Moved that comment up in e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5
  117. ryanofsky approved
  118. ryanofsky commented at 7:29 pm on October 28, 2021: contributor
    Code review ACK a91b6032f037ab1a26ac4baa761e809fbfeb1bdd. My suggestions are all minor and can be ignored or could be followed up later.
  119. martinus commented at 2:08 pm on November 2, 2021: contributor
    Thanks again @ryanofsky, I’ll of course look through your review closely again but I don’t have much time right now, so it can take a while.
  120. martinus force-pushed on Nov 7, 2021
  121. martinus force-pushed on Nov 7, 2021
  122. sipa commented at 6:02 pm on November 7, 2021: member

    A few high-level comments:

    • I think it would be nice to split up the first commit more (e.g. separating the introduction of the allocator from the changes to make the UTXO set use it)
    • Don’t @ mention people in the PR description; they’ll get random spam anytime j-random-altcoin cherry-picks it.
    • I find the “semantic” cyclic dependency between node_allocator and memusage (as in: neither can be used without the other, effectively) somewhat ugly. Would it make sense to have the logic for determining node size for containers live in node_allocator itself?
  123. martinus commented at 6:03 pm on November 7, 2021: contributor

    I have rebased & squashed to e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5 and integrated most suggestions from @ryanofsky. This update is mostly style improvements, deduplicating code, but also two correctness improvements:

    • MemoryResource::Deallocate previously just static_cast void* to FreeList. This was technically undefined behavior. Instead, I’m now using placement new.
    • MemoryResource::Allocate now has a static_assert to make sure the pool’s data is correctly aligned for node alignments.
  124. martinus commented at 8:10 am on November 8, 2021: contributor
    @sipa I’ll split up the node_allocator into multiple files, and also into multiple commits. It’s a bit tricky though to get rid of the cyclic dependency, because MemoryResource needs some functionality of memusage to calculate its dynamic usage, and memusage needs to know the Allocator.
  125. ryanofsky commented at 8:05 pm on November 9, 2021: contributor

    It’s a bit tricky though to get rid of the cyclic dependency, because MemoryResource needs some functionality of memusage to calculate its dynamic usage, and memusage needs to know the Allocator.

    FWIW, I gave a suggestion to remove the circular dependency in #22702 (review). Instead of not including the size of allocated nodes in the container size when an allocator is used, the container can report its size the same way whether or not an allocator is used, and the allocator can only report the size of unallocated space instead of the total allocated + unallocated space. I think this would make the accounting more intuitive, in addition to making the container size calculation independent of the allocator size calculation.

  126. martinus force-pushed on Nov 13, 2021
  127. martinus force-pushed on Nov 13, 2021
  128. martinus force-pushed on Nov 13, 2021
  129. martinus force-pushed on Nov 14, 2021
  130. martinus commented at 8:18 am on November 14, 2021: contributor

    Rebased e2a5547f12e5da2718ba17ebac5d5aeb5a2e24d5 -> 290649c with plenty of refactoring. No functional changes.

    • Split code into multiple files, each with its own concern.
    • Split up commits into logical chunks
    • Added another test for NodeSize calculation
  131. DrahtBot added the label Needs rebase on Nov 15, 2021
  132. martinus force-pushed on Nov 15, 2021
  133. martinus commented at 8:27 pm on November 15, 2021: contributor
    Rebased to 8dfd71ca7fddf1a9436d04a16f5a8d9d98ee8c74 to resolve conflicts
  134. DrahtBot removed the label Needs rebase on Nov 15, 2021
  135. ryanofsky commented at 9:21 pm on November 17, 2021: contributor
    I’m starting to catch up on this, but so far the new changes look great, and it’s nice the way this is now split up into smaller commits and files. I think being able to look at different parts in isolation now makes this PR more approachable and easier to review.
  136. in src/test/node_size_tests.cpp:80 in 119c3b90d6 outdated
    75+    for (size_t i = 0; i < num_entries; ++i) {
    76+        map[i];
    77+    }
    78+
    79+    // there should be a entry with exactly 123 allocations
    80+    auto it = std::find_if(SingletonAllocationInfo().begin(), SingletonAllocationInfo().end(), [](const auto& kv) {
    


    ryanofsky commented at 8:55 pm on November 18, 2021:

    In commit “Correctly calculate node size for std::unordered_map” (119c3b90d6b7e9b81915a09f5ca5370f2ef1ea57)

    Might be good to replace this with a loop and verify there is exactly 1 entry with 123 allocations (and the expected size), instead of at least 1.


    martinus commented at 6:10 pm on December 9, 2021:
    This is unfortunately not the case on Windows, there the allocator is used to allocate one control structure as well
  137. in src/support/allocators/node_allocator/allocator.h:51 in 96d8e33aca outdated
    46+ * Allocator is a cheaply copyable, `std::allocator`-compatible type used for the containers.
    47+ * Similar to `std::pmr::polymorphic_allocator`, it holds a pointer to a memory resource.
    48+ *
    49+ * MemoryResource is an immobile object that actually allocates, holds and manages memory. Currently
    50+ * it is only able to provide optimized alloc/free for a single fixed allocation size which is given
    51+ * in the constructor. Only allocations that match this size will be provided from the preallocated
    


    ryanofsky commented at 3:41 pm on November 22, 2021:

    In commit “Add allocator for node based containers” (96d8e33acaa079491403d8e8afa6709978dec205)

    Would drop “which is given in constructor” (no longer true)

  138. in src/test/node_allocator_tests.cpp:150 in 96d8e33aca outdated
    145+#define CHECK_NUMS_FREES_POOLS(mr, num_free_allocations, num_pools)                  \
    146+    BOOST_CHECK_EQUAL(num_free_allocations,                                          \
    147+                      node_allocator::MemoryResourceTester::NumFreeAllocations(mr)); \
    148+    BOOST_CHECK_EQUAL(num_pools, node_allocator::MemoryResourceTester::NumPools(mr));
    149+
    150+BOOST_AUTO_TEST_CASE(too_small)
    


    ryanofsky commented at 4:25 pm on November 22, 2021:

    In commit “Add allocator for node based containers” (96d8e33acaa079491403d8e8afa6709978dec205)

    Purpose of this test could be a made a little clearer with comment like “Verify that MemoryResource specialized for a larger char pair type will do smaller single char allocations as well”

  139. in src/test/node_allocator_tests.cpp:162 in 96d8e33aca outdated
    157+    // mr is used
    158+    CHECK_NUMS_FREES_POOLS(mr, 0, 1);
    159+    mr.Deallocate<char>(ptr);
    160+    CHECK_NUMS_FREES_POOLS(mr, 1, 1);
    161+
    162+    // void* works too, use freelist
    


    ryanofsky commented at 4:25 pm on November 22, 2021:

    In commit “Add allocator for node based containers” (96d8e33acaa079491403d8e8afa6709978dec205)

    Test is no longer using void*. Probably would just say // freelist is used or drop comment.

  140. in src/test/validation_flush_tests.cpp:158 in 66f745394d outdated
    154@@ -147,8 +155,8 @@ BOOST_AUTO_TEST_CASE(getcoinscachesizestate)
    155             CoinsCacheSizeState::OK);
    156     }
    157 
    158-    // Flushing the view doesn't take us back to OK because cacheCoins has
    159-    // preallocated memory that doesn't get reclaimed even after flush.
    160+    // Flushing the view does take us back to OK because cacheCoins' MemoryResource frees its
    


    ryanofsky commented at 5:17 pm on November 22, 2021:

    In commit “Make use of node_allocator in CCoinsMap” (66f745394ddfb546b9d0d29fe764110944dea158)

    Extra punctuation here s/cacheCoins' MemoryResource/cacheCoinsMemoryResource/

  141. in src/support/allocators/node_allocator/memory_resource.h:104 in 96d8e33aca outdated
     99+     */
    100+    template <typename T>
    101+    [[nodiscard]] T* Allocate()
    102+    {
    103+        static_assert(ALLOCATION_SIZE_BYTES == detail::RequiredAllocationSizeBytes<T>());
    104+        static_assert(__STDCPP_DEFAULT_NEW_ALIGNMENT__ >= std::alignment_of_v<T>, "make sure AllocatePool() aligns correctly");
    


    ryanofsky commented at 6:02 pm on November 22, 2021:

    In commit “Add allocator for node based containers” (96d8e33acaa079491403d8e8afa6709978dec205)

    It doesn’t seem like this static_assert is verifying anything related to our code. It’s not calling any of our own functions, just verifying a relationship between __STDCPP_DEFAULT_NEW_ALIGNMENT__ and std::alignment_of. Also assert message doesn’t seem to make sense. Check might make more sense inside the RequiredAllocationSizeBytes function than outside here.


    martinus commented at 6:15 am on November 25, 2021:
    I think that check is necessary. In AllocatePool I’m doing new char[POOL_SIZE_BYTES], and this uses __STDCPP_DEFAULT_NEW_ALIGNMENT__ as the alignment for the pool’s memory. So if you try to use the node_allocator with a type that has an alignment > __STDCPP_DEFAULT_NEW_ALIGNMENT__ this is a problem, even when detail::RequiredAllocationSizeBytes is done correctly. I had a test case that used 16 byte alignment, and with that static_assert that test failed on a 32bit machine.

    ryanofsky commented at 7:47 pm on November 30, 2021:

    re: #22702 (review)

    I think that check is necessary. In AllocatePool I’m doing new char[POOL_SIZE_BYTES]

    Maybe can move check closer to where it’s needed

     0diff --git a/src/support/allocators/node_allocator/memory_resource.h b/src/support/allocators/node_allocator/memory_resource.h
     1index f4ba0138924..cf956da318f 100644
     2--- a/src/support/allocators/node_allocator/memory_resource.h
     3+++ b/src/support/allocators/node_allocator/memory_resource.h
     4@@ -101,7 +101,6 @@ public:
     5     [[nodiscard]] T* Allocate()
     6     {
     7         static_assert(ALLOCATION_SIZE_BYTES == detail::RequiredAllocationSizeBytes<T>());
     8-        static_assert(__STDCPP_DEFAULT_NEW_ALIGNMENT__ >= std::alignment_of_v<T>, "make sure AllocatePool() aligns correctly");
     9 
    10         if (m_free_allocations) {
    11             // we've already got data in the free list, unlink one element
    12@@ -120,7 +119,7 @@ public:
    13         // than what has been malloc'ed.
    14         if (m_untouched_memory_iterator == m_untouched_memory_end) {
    15             // slow path, only happens when a new pool needs to be allocated
    16-            AllocatePool();
    17+            AllocatePool<T>();
    18         }
    19 
    20         // peel off one allocation from the untouched memory. The next pointer of in-use
    21@@ -164,8 +163,10 @@ private:
    22     /**
    23      * Allocate one full memory pool which is used to carve out allocations.
    24      */
    25+    template <typename T>
    26     void AllocatePool()
    27     {
    28+        static_assert(__STDCPP_DEFAULT_NEW_ALIGNMENT__ >= std::alignment_of_v<T>, "make sure alignment is compatible with T");
    29         m_allocated_pools.emplace_back(new char[POOL_SIZE_BYTES]);
    30         m_untouched_memory_iterator = m_allocated_pools.back().get();
    31         m_untouched_memory_end = m_untouched_memory_iterator + POOL_SIZE_BYTES;
    

    martinus commented at 6:55 pm on May 10, 2022:
    I think I’d prefer to keep the static_assert at the Allocate entry point, because currently AllocatePool doesn’t need to know the type T. I guess it’s more a matter of style :)
  142. in src/test/node_allocator_helpers.h:13 in 119c3b90d6 outdated
     8+ * Wrapper around std::hash, but without noexcept. Useful for testing unordered containers
     9+ * because they potentially behave differently with/without a noexcept hash.
    10+ */
    11+template <typename T>
    12+struct NotNoexceptHash {
    13+    size_t operator()(const T& x) const /* not noexcept */
    


    sipa commented at 6:19 pm on November 22, 2021:
    I’m somewhat surprised that the compiler/language doesn’t automatically declare a function non-throwing if all its calls are declared to be noexcept. Are you sure that’s not a risk here?

    martinus commented at 6:23 am on November 25, 2021:
    I think this is safe, the standard says that only a handful functions are implicitly nothrowing: default ctor and dtor, move & copy ctor, move & assignment operator. There has been a proposal for a noexcept(auto) to add implicitly nothrowing anywhere else, but it seems that there were too many problems.
  143. in src/support/allocators/node_allocator/node_size.h:27 in 119c3b90d6 outdated
    22+ * It is important that the calculation here matches exactly the behavior of std::unordered_map
    23+ * so the node_allocator actually works. This is tested in node_allocator_tests/test_allocations_are_used
    24+ * with multiple configurations (alignments, noexcept hash, node sizes).
    25+ */
    26+template <typename Key, typename V, typename Hash, typename Equals, typename Allocator>
    27+struct NodeSize<std::unordered_map<Key, V, Hash, Equals, Allocator>> {
    


    ryanofsky commented at 8:37 pm on November 22, 2021:

    In commit “Correctly calculate node size for std::unordered_map” (119c3b90d6b7e9b81915a09f5ca5370f2ef1ea57)

    I think it would make more sense to move this class back to src/memusage.h, or to a new src/memusage_nodesize.h file, instead of making it part of node_allocator. This is a helper class for figuring out memory usage of a standard container, so functionally it is more related to memusage than node_allocator. Memusage needs this to function, while node_allocator::Allocator and node_allocator::MemoryResource are both usable without it. Also, In the future, we may also want to experiment with writing new allocators, and it would be nice if they could reuse this NodeSize class without having to reinvent it or depend on node_allocator themselves. Also, combined with other suggestions, moving this class out of node_allocator would allow us to break the node_allocator <-> memusage cyclic dependency and have node_allocator just depend on memusage in one direction.

    (Moving this class out of node_allocator could also let you split off the first two commits of this PR into their own PR so this PR could be smaller, even though I do think this PR is small enough and wouldn’t see a need for that.)

  144. in src/support/allocators/node_allocator/node_size.h:52 in 119c3b90d6 outdated
    47+    using SimulatedNodeType = std::conditional_t<noexcept(std::declval<Hash>()(std::declval<const Key&>())),
    48+                                                 std::pair<void*, ValueType>,                     // no hash stored
    49+                                                 std::pair<void*, std::pair<ValueType, size_t>>>; // hash stored along ValueType, and that is wrapped with the pointer.
    50+#endif
    51+
    52+    [[nodiscard]] static constexpr size_t Value()
    


    sipa commented at 9:05 pm on November 22, 2021:
    Nit: I think with C++17, this can be a constexpr static member variable, rather than a function (static constexpr size_t VALUE = sizeof(SimulatedNodeType);).
  145. in src/memusage.h:178 in 96d8e33aca outdated
    169@@ -169,6 +170,14 @@ static inline size_t DynamicUsage(const std::unordered_map<Key, Value, Hash, Equ
    170     return MallocUsage(node_size) * m.size() + MallocUsage(sizeof(void*) * m.bucket_count());
    171 }
    172 
    173+template <typename Key, typename Value, typename Hash, typename Equals, typename AllocT, size_t AllocS>
    174+static inline size_t DynamicUsage(const std::unordered_map<Key, Value, Hash, Equals, node_allocator::Allocator<AllocT, AllocS>>& m)
    175+{
    176+    // Since node_allocator is used, the memory of the nodes are the MemoryResource's responsibility and
    177+    // won't be added here. Also multiple maps could use the same MemoryResource.
    178+    return MallocUsage(sizeof(void*) * m.bucket_count());
    


    ryanofsky commented at 9:32 pm on November 22, 2021:

    In commit “Add allocator for node based containers” (96d8e33acaa079491403d8e8afa6709978dec205)

    I still think it’s unfortunate to be adding this memusage specialization when it is not necessary. I don’t like how it:

    • Makes memusage.h depend node_allocator, when the job of memusage is just to tell us about memory usage of standard types, and it shouldn’t need to be referencing a particular specialized allocator.
    • Redefines the concept of memory usage for this one unordered_map+node_allocator container to not consider memory allocated by the container to be “used” by the container, even though the container allocated that memory, and is still responsible for deallocating (if not freeing) it.
    • Will require new specializations of map+node_allocator or list+node_allocator to be added here if node_allocator is used with other map or list containers in the future.

    I think code can be simpler, definition of “usage” can be more consistent, and cyclic dependencies and special cases can be removed using earlier suggestions from #22702 (review), and only downside would be adding a counter variable. On top of the current PR, changes would look like:

      0diff --git a/src/Makefile.am b/src/Makefile.am
      1index b39d03fe8cb..9d09682e243 100644
      2--- a/src/Makefile.am
      3+++ b/src/Makefile.am
      4@@ -217,7 +217,6 @@ BITCOIN_CORE_H = \
      5   shutdown.h \
      6   signet.h \
      7   streams.h \
      8-  support/allocators/node_allocator/allocator_fwd.h \
      9   support/allocators/node_allocator/allocator.h \
     10   support/allocators/node_allocator/factory.h \
     11   support/allocators/node_allocator/memory_resource.h \
     12@@ -575,7 +574,6 @@ libbitcoin_util_a_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES)
     13 libbitcoin_util_a_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
     14 libbitcoin_util_a_SOURCES = \
     15   support/lockedpool.cpp \
     16-  support/allocators/node_allocator/memory_resource.cpp \
     17   chainparamsbase.cpp \
     18   clientversion.cpp \
     19   compat/glibcxx_sanity.cpp \
     20diff --git a/src/memusage.h b/src/memusage.h
     21index bc20ee4052c..b9faff9b2d9 100644
     22--- a/src/memusage.h
     23+++ b/src/memusage.h
     24@@ -7,7 +7,6 @@
     25 
     26 #include <indirectmap.h>
     27 #include <prevector.h>
     28-#include <support/allocators/node_allocator/allocator_fwd.h>
     29 #include <support/allocators/node_allocator/node_size.h>
     30 
     31 #include <stdlib.h>
     32@@ -163,21 +162,13 @@ static inline size_t DynamicUsage(const std::unordered_set<X, Y>& s)
     33     return MallocUsage(sizeof(unordered_node<X>)) * s.size() + MallocUsage(sizeof(void*) * s.bucket_count());
     34 }
     35 
     36-template <typename Key, typename Value, typename Hash, typename Equals>
     37-static inline size_t DynamicUsage(const std::unordered_map<Key, Value, Hash, Equals>& m)
     38+template <typename Key, typename Value, typename Hash, typename Equals, typename Alloc>
     39+static inline size_t DynamicUsage(const std::unordered_map<Key, Value, Hash, Equals, Alloc>& m)
     40 {
     41-    auto node_size = node_allocator::NodeSize<std::unordered_map<Key, Value, Hash, Equals>>::Value();
     42+    auto node_size = node_allocator::NodeSize<std::unordered_map<Key, Value, Hash, Equals, Alloc>>::Value();
     43     return MallocUsage(node_size) * m.size() + MallocUsage(sizeof(void*) * m.bucket_count());
     44 }
     45 
     46-template <typename Key, typename Value, typename Hash, typename Equals, typename AllocT, size_t AllocS>
     47-static inline size_t DynamicUsage(const std::unordered_map<Key, Value, Hash, Equals, node_allocator::Allocator<AllocT, AllocS>>& m)
     48-{
     49-    // Since node_allocator is used, the memory of the nodes are the MemoryResource's responsibility and
     50-    // won't be added here. Also multiple maps could use the same MemoryResource.
     51-    return MallocUsage(sizeof(void*) * m.bucket_count());
     52-}
     53-
     54 }
     55 
     56 #endif // BITCOIN_MEMUSAGE_H
     57diff --git a/src/support/allocators/node_allocator/allocator.h b/src/support/allocators/node_allocator/allocator.h
     58index 778471bb20b..c728ab8638a 100644
     59--- a/src/support/allocators/node_allocator/allocator.h
     60+++ b/src/support/allocators/node_allocator/allocator.h
     61@@ -5,7 +5,6 @@
     62 #ifndef BITCOIN_SUPPORT_ALLOCATORS_NODE_ALLOCATOR_ALLOCATOR_H
     63 #define BITCOIN_SUPPORT_ALLOCATORS_NODE_ALLOCATOR_ALLOCATOR_H
     64 
     65-#include <support/allocators/node_allocator/allocator_fwd.h>
     66 #include <support/allocators/node_allocator/memory_resource.h>
     67 
     68 #include <cstdint>
     69diff --git a/src/support/allocators/node_allocator/allocator_fwd.h b/src/support/allocators/node_allocator/allocator_fwd.h
     70deleted file mode 100644
     71index e3fa0ddd889..00000000000
     72--- a/src/support/allocators/node_allocator/allocator_fwd.h
     73+++ /dev/null
     74@@ -1,17 +0,0 @@
     75-// Copyright (c) 2021 The Bitcoin Core developers
     76-// Distributed under the MIT software license, see the accompanying
     77-// file COPYING or http://www.opensource.org/licenses/mit-license.php.
     78-
     79-#ifndef BITCOIN_SUPPORT_ALLOCATORS_NODE_ALLOCATOR_ALLOCATOR_FWD_H
     80-#define BITCOIN_SUPPORT_ALLOCATORS_NODE_ALLOCATOR_ALLOCATOR_FWD_H
     81-
     82-#include <cstddef>
     83-
     84-namespace node_allocator {
     85-
     86-template <typename T, size_t ALLOCATION_SIZE_BYTES>
     87-class Allocator;
     88-
     89-} // namespace node_allocator
     90-
     91-#endif // BITCOIN_SUPPORT_ALLOCATORS_NODE_ALLOCATOR_ALLOCATOR_FWD_H
     92diff --git a/src/support/allocators/node_allocator/memory_resource.cpp b/src/support/allocators/node_allocator/memory_resource.cpp
     93deleted file mode 100644
     94index 368752fa2b9..00000000000
     95--- a/src/support/allocators/node_allocator/memory_resource.cpp
     96+++ /dev/null
     97@@ -1,12 +0,0 @@
     98-#include <support/allocators/node_allocator/memory_resource.h>
     99-
    100-#include <memusage.h>
    101-
    102-namespace node_allocator::detail {
    103-
    104-size_t DynamicMemoryUsage(size_t pool_size_bytes, std::vector<std::unique_ptr<char[]>> const& allocated_pools)
    105-{
    106-    return memusage::MallocUsage(pool_size_bytes) * allocated_pools.size() + memusage::DynamicUsage(allocated_pools);
    107-}
    108-
    109-} // namespace node_allocator::detail
    110diff --git a/src/support/allocators/node_allocator/memory_resource.h b/src/support/allocators/node_allocator/memory_resource.h
    111index f4ba0138924..e60e847c86d 100644
    112--- a/src/support/allocators/node_allocator/memory_resource.h
    113+++ b/src/support/allocators/node_allocator/memory_resource.h
    114@@ -8,6 +8,8 @@
    115 #include <utility>
    116 #include <vector>
    117 
    118+#include <memusage.h>
    119+
    120 #ifndef BITCOIN_SUPPORT_ALLOCATORS_NODE_ALLOCATOR_MEMORY_RESOURCE_H
    121 #define BITCOIN_SUPPORT_ALLOCATORS_NODE_ALLOCATOR_MEMORY_RESOURCE_H
    122 
    123@@ -38,12 +40,6 @@ template <typename T>
    124     return ((size_max + alignment_max - 1U) / alignment_max) * alignment_max;
    125 }
    126 
    127-/**
    128- * Calculates dynamic memory usage of a MemoryResource. Not a member of MemoryResource so we can implement this
    129- * in a cpp and don't depend on memusage.h in the header.
    130- */
    131-[[nodiscard]] size_t DynamicMemoryUsage(size_t pool_size_bytes, std::vector<std::unique_ptr<char[]>> const& allocated_pools);
    132-
    133 } // namespace detail
    134 
    135 /**
    136@@ -102,6 +98,7 @@ public:
    137     {
    138         static_assert(ALLOCATION_SIZE_BYTES == detail::RequiredAllocationSizeBytes<T>());
    139         static_assert(__STDCPP_DEFAULT_NEW_ALIGNMENT__ >= std::alignment_of_v<T>, "make sure AllocatePool() aligns correctly");
    140+        ++m_allocations;
    141 
    142         if (m_free_allocations) {
    143             // we've already got data in the free list, unlink one element
    144@@ -138,6 +135,7 @@ public:
    145     void Deallocate(void* p) noexcept
    146     {
    147         static_assert(ALLOCATION_SIZE_BYTES == detail::RequiredAllocationSizeBytes<T>());
    148+        --m_allocations;
    149 
    150         // put it into the linked list.
    151         //
    152@@ -150,11 +148,11 @@ public:
    153     }
    154 
    155     /**
    156-     * Calculates bytes allocated by the memory resource.
    157+     * Calculates number of bytes allocated to the memory resource minus number of bytes allocated from the memory resource.
    158      */
    159     [[nodiscard]] size_t DynamicMemoryUsage() const noexcept
    160     {
    161-        return detail::DynamicMemoryUsage(POOL_SIZE_BYTES, m_allocated_pools);
    162+        return memusage::MallocUsage(POOL_SIZE_BYTES) * m_allocated_pools.size() + memusage::DynamicUsage(m_allocated_pools) - ALLOCATION_SIZE_BYTES * m_allocations;
    163     }
    164 
    165 private:
    166@@ -171,6 +169,9 @@ private:
    167         m_untouched_memory_end = m_untouched_memory_iterator + POOL_SIZE_BYTES;
    168     }
    169 
    170+    //! Number of current allocations.
    171+    size_t m_allocations = 0;
    172+
    173     //! Contains all allocated pools of memory, used to free the data in the destructor.
    174     std::vector<std::unique_ptr<char[]>> m_allocated_pools{};
    175 
    

    martinus commented at 7:40 am on November 27, 2021:

    Thanks for the patch! This makes discussion a lot easier. I think there’s one bug in it, this now overestimates memory usage. In memusage.h all maps are now calculated like this:

    0return MallocUsage(node_size) * m.size() + MallocUsage(sizeof(void*) * m.bucket_count());
    

    To correct for the incorrectly added MallocUsage(node_size) * m.size() it should be memoryusage::MallocUsage(ALLOCATION_SIZE_BYTES) * m_allocations. That should be correct in most cases, but it might not be when RequiredAllocationSizeBytes<typename NodeSize...> (in factory.h) gives a different result than the NodeSize. The only way to do this correctly is to also pass the result of NodeSize to the MemoryResource, but this complicates things.

    I also don’t like that in most cases memoryresource.DynamicMemoryUsage() will return a negative number (causing size_t underflow) to correct for the overestimation in memusage. This will happen whenever the overestimation is large enough.


    ryanofsky commented at 7:59 pm on November 30, 2021:

    re: #22702 (review)

    Thanks for the patch! This makes discussion a lot easier. I think there’s one bug in it […]

    That’s a good catch, but there is an easier fix for for that bug. Space internally wasted by MemoryResource should count towards MemoryResource usage. The fix actually simplifies code even more

     0diff --git a/src/support/allocators/node_allocator/memory_resource.h b/src/support/allocators/node_allocator/memory_resource.h
     1index e60e847c86d..c1bf680e071 100644
     2--- a/src/support/allocators/node_allocator/memory_resource.h
     3+++ b/src/support/allocators/node_allocator/memory_resource.h
     4@@ -98,7 +98,7 @@ public:
     5     {
     6         static_assert(ALLOCATION_SIZE_BYTES == detail::RequiredAllocationSizeBytes<T>());
     7         static_assert(__STDCPP_DEFAULT_NEW_ALIGNMENT__ >= std::alignment_of_v<T>, "make sure AllocatePool() aligns correctly");
     8-        ++m_allocations;
     9+        m_allocated += sizeof(T);
    10 
    11         if (m_free_allocations) {
    12             // we've already got data in the free list, unlink one element
    13@@ -135,7 +135,7 @@ public:
    14     void Deallocate(void* p) noexcept
    15     {
    16         static_assert(ALLOCATION_SIZE_BYTES == detail::RequiredAllocationSizeBytes<T>());
    17-        --m_allocations;
    18+        m_allocated -= sizeof(T);
    19 
    20         // put it into the linked list.
    21         //
    22@@ -152,7 +152,7 @@ public:
    23      */
    24     [[nodiscard]] size_t DynamicMemoryUsage() const noexcept
    25     {
    26-        return memusage::MallocUsage(POOL_SIZE_BYTES) * m_allocated_pools.size() + memusage::DynamicUsage(m_allocated_pools) - ALLOCATION_SIZE_BYTES * m_allocations;
    27+        return memusage::MallocUsage(POOL_SIZE_BYTES) * m_allocated_pools.size() + memusage::DynamicUsage(m_allocated_pools) - m_allocated;
    28     }
    29 
    30 private:
    31@@ -169,8 +169,8 @@ private:
    32         m_untouched_memory_end = m_untouched_memory_iterator + POOL_SIZE_BYTES;
    33     }
    34 
    35-    //! Number of current allocations.
    36-    size_t m_allocations = 0;
    37+    //! Number of bytes allocated.
    38+    size_t m_allocated = 0;
    39 
    40     //! Contains all allocated pools of memory, used to free the data in the destructor.
    41     std::vector<std::unique_ptr<char[]>> m_allocated_pools{};
    

    I also don’t like that in most cases memoryresource.DynamicMemoryUsage() will return a negative number (causing size_t underflow) to correct for the overestimation in memusage. This will happen whenever the overestimation is large enough.

    I don’t understand this. The only negative term in the DynamicMemoryUsage calculation is the actual number of bytes allocated, not a memusage estimate. The result should always be positive, and higher with more memusage overestimation.


    martinus commented at 7:58 am on December 1, 2021:

    Here is a concrete example that I just tried with CCoinsMap. I created a CCoinsMap and added 100k entries. memusage::DynamicUsage calculates this:

    0return MallocUsage(node_size) * m.size() + MallocUsage(sizeof(void*) * m.bucket_count());
    1// return MallocUsage(104) * 100000 + MallocUsage(8 * 172933)  
    2// return 128 * 100000 + 1383488
    3// return 14183488
    

    So DynamicUsage estimates 13.5MB memory usage. This is too much, because node_allocator doesn’t have the overhead of malloc. MallocUsage uses 128 bytes per node, but when node_allocator the bytes per node are actually just 104. So when calculating MemoryResource::DynamicMemoryUsage we’d have to subtract the 128*100000 and not 104*100000 to get the correct memory usage:

    0return memusage::MallocUsage(POOL_SIZE_BYTES) * m_allocated_pools.size() + memusage::DynamicUsage(m_allocated_pools) - ALLOCATION_SIZE_BYTES * m_allocations;
    1// return MallocUsage(262080) * 40 + 528 - 104 * 100000 // <-- incorrect, has to be 128*100000
    2// return 262096 * 40 + 528 - 128*100000 // correct, but no easy way to get the correct 128 bytes within MemoryResource
    3// return -2315632
    
  146. ryanofsky approved
  147. ryanofsky commented at 9:51 pm on November 22, 2021: contributor

    Code review ACK 8dfd71ca7fddf1a9436d04a16f5a8d9d98ee8c74. Finally caught up on this again! Appreciate the new count-allocations test, and all the breakups and cleanups.

    I did leave some more suggestions below about not making memusage.h depend on node_allocator, and only making the dependency go the other way. But I raised concerns about that before, and think they are minor, and it’s fine ignore them and stick with the current approach. Overall everything looks very good here!

  148. ryanofsky approved
  149. DrahtBot added the label Needs rebase on Dec 1, 2021
  150. martinus force-pushed on Dec 11, 2021
  151. martinus commented at 7:26 am on December 11, 2021: contributor

    Rebased to 02da536 to resolve conflicts, and some very minor changes:

    • Fixed some comments
    • use static constexpr size_t VALUE in NodeSize
  152. DrahtBot removed the label Needs rebase on Dec 11, 2021
  153. laanwj added this to the milestone 24.0 on Feb 10, 2022
  154. DrahtBot added the label Needs rebase on Mar 16, 2022
  155. martinus force-pushed on Mar 21, 2022
  156. martinus commented at 8:15 pm on March 21, 2022: contributor
    Rebased to resolve conflict in src/Makefile.bench.include
  157. DrahtBot removed the label Needs rebase on Mar 21, 2022
  158. martinus force-pushed on Mar 22, 2022
  159. jonatack commented at 10:19 am on March 22, 2022: contributor
    Now that v23 has been branched off, this would be a good time to move forward on this. Planning to re-review.
  160. laanwj commented at 11:00 am on March 30, 2022: member

    Now that v23 has been branched off, this would be a good time to move forward on this.

    Agree, it would be good to get this in early in the release cycle. I’ll start testing it as well.

  161. DrahtBot added the label Needs rebase on Apr 26, 2022
  162. Correctly calculate node size for std::unordered_map
    Node sizes are calculated correctly for libc++, libstdc++ and MSVC
    implementations of std::unordered_map. This is tested for all kind of
    possible types used for std::unordered_map.
    51b0394b81
  163. Make use of node_allocator::NodeSize in DynamicUsage
    Makes use of the new NodeSize calculation in memusage::DynamicUsage
    so that the memory usage is more accurate.
    c5c111ac45
  164. Add allocator for node based containers
    This allocator is potentially useful for all node-based containers like
    `std::list`, `std::map`, `std::unordered_map` that are heavily used.
    
    The allocator uses pools to be more memory efficient and faster than
    the default std::allocator. Currently this provides specializations for
    `std::unordered_map`, and other specializations can be added as needed.
    
    This updates the `memusage::DynamicUsage` calculation to take the
    node_allocator allocator into account.
    
    Also, the dynamic usage for standard `std::unordered_map` is made more
    accurate with the new node_size calculation.
    
    Rule of 5, drop unused function, spelling fixups and more code cleanup
    by jonatack, countless suggestions & improvements by ryanofsky.
    
    Co-authored-by: Jon Atack <jon@atack.com>
    Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
    159fa152fc
  165. Make use of node_allocator in CCoinsMap
    In my benchmarks, using this pool allocator for CCoinsMap gives about
    20% faster `-reindex-chainstate` with -dbcache=5000 with practically the
    same memory usage. The change in max RSS changed was 0.3%.
    
    Note that the memory usage behavior is now different, since MemoryResource
    does not free any memory when the map is cleared. Instead, the memory is
    held in a freelist. This requires now that `CCoinsViewCache::Flush`
    needs to call `ReallocateCache` so memory is actually freed. If that
    wouldn't happen, memory usage would not decrease and Flush() would be
    triggered again soon after.
    
    Also, the `validation_flush_tests` tests need to be updated because
    memory allocation is now done in large pools instead of one node at a
    time, so the limits need to be updated accordingly.
    01d77649f7
  166. Don't call ReallocateCache twice in ResizeCoinsCaches
    Removes the call to CoinsTip().ReallocateCache() in ResizeCoinsCaches because this is now done in CCoinsViewCache::Flush().
    
    Also makes ReallocateCache private because it's not needed outside any more.
    95c53bf2c2
  167. martinus force-pushed on Apr 26, 2022
  168. martinus commented at 8:39 am on April 26, 2022: contributor
    Rebased to resolve conflict in Makefile.test.include By the way, is there an easy way to see what has actually changed in a rebase? I just moved 2 lines.
  169. DrahtBot removed the label Needs rebase on Apr 26, 2022
  170. jonatack commented at 7:40 pm on May 6, 2022: contributor

    Rebased to resolve conflict in Makefile.test.include By the way, is there an easy way to see what has actually changed in a rebase? I just moved 2 lines.

    git range-diff a19f641 472f5b0 95c53bf

    (Will re-review this PR soon.)

  171. laanwj commented at 2:11 pm on May 10, 2022: member
    I have been testing this on a busy node (~400 connections) for more than a month. I have noticed no crashes, no memory leaks, and no other unexpected behavior.
  172. sipa commented at 4:53 pm on May 10, 2022: member
    I see a number of unaddressed comments here by @ryanofsky. What is the status of those?
  173. martinus commented at 6:58 pm on May 10, 2022: contributor

    I see a number of unaddressed comments here by @ryanofsky. What is the status of those?

    I’ve now gone through all unaddressed comments and marked all as resolved that I consider fully resolved. As far as I can say only nits remain

  174. laanwj commented at 11:34 am on June 2, 2022: member

    I have been testing this on a busy node (~400 connections) for more than a month. I have noticed no crashes, no memory leaks, and no other unexpected behavior.

    Still no issues!

    Still, to be honest, I think this change is a bit scary, and I have a hard time signing off on it. The reason is that it adds quite a lot of complexity, and depends on quite some C++ magic and cleverness (e.g. relying on unordered_map internals). C++ being not exactly a memory-safe language, this increases the risk of accidental foot-guns. I appreciate the work it’s great to make things faster, and I really don’t want to discourage it, but, especially as long as it’s used for consensus code we do have to err on the side of safety and correctness.

  175. martinus commented at 9:07 am on June 3, 2022: contributor

    Still, to be honest, I think this change is a bit scary

    I guess it is, it got a lot more complex that I initially anticipated. I don’t think it can be made much simpler. I’m pretty sure it works, but obviously there’s no 100% guarantee. So I’m fine with it not being merged, at least then I can’t be responsible for a bitcoind crash :sweat_smile:

    Since the code here is not tightly coupled to bitcoind, when I have the time I will try to create a generic c++ library out of it.

    Thanks everyone who reviewed & tested or otherwise spent time on this!

  176. martinus closed this on Jun 3, 2022

  177. jonatack commented at 9:13 am on June 3, 2022: contributor
    @martinus this pull was discussed yesterday at the weekly meeting (see https://www.erisian.com.au/bitcoin-core-dev/log-2022-06-02.html#l-358). I’ve been running it without issues as well for some time and am concept ACK (but need to re-review carefully). @sipa wrote “I need to look at the latest code changes again, but I’m very concept ACK on that allocator”, followed by @laanwj “to be clear i’m not against it, but we need to be really sure it gets enough review”. For info :)
  178. 0xB10C commented at 1:12 pm on June 3, 2022: contributor
    I’m trying to understand what potential problems we fear here. Is it that the node crashes or is it consensus problems (or both)?
  179. laanwj commented at 1:22 pm on June 3, 2022: member

    I didn’t mean for it to be closed @martinus. Of course, you’re free to. Making a generic library sounds like a good idea. We could subtree or import it once it’s “proven”.

    I just wanted to voice my concern. If we can drum up enough reviewers here with the appropriate knowledge about C++ internals (and it doesn’t all rest on @sipa’s shoulders, for example), it’d be good. Something that was proposed in the meeting yesterday was an advanced review club.

    I’m trying to understand what potential problems we fear here. Is it that the node crashes or is it consensus problems (or both)?

    Memory corruption resulting in consensus problems would be the worst outcome. Other crashes (especially if hard to debug / diagnose) would be bad too, of course.

  180. jamesob commented at 2:23 pm on June 3, 2022: member

    I’m trying to understand what potential problems we fear here. Is it that the node crashes or is it consensus problems (or both)?

    I think the major concern here could be summarized as “we’re worried about unknown unknowns.” Swapping out an allocator (or as in the original case of #16718, the whole map implementation) entails displacing a lot of low-level C++ code that is widely deployed and tested by the standard library with code that we (or if not just us, a much smaller subset of the software world) maintain and test.

    If anything goes wrong with the UTXO set, it’s no news to everyone that it certainly spells big issues for bitcoin. While the performance gains here are very enticing, and indeed @luke-jr earlier made the interesting point that maybe moving this critical code into our domain of control (and out of the black box of std) may be preferable, since we would explicitly understand and maintain a very core dependency of the validation process, this “mountain man” approach obviously has an associated cost: maintaining such an important and sensitive piece of the system represents a liability that we would no longer shoulder alongside the wider C++ usage base.

    I originally proposed the hashmap implementation change that (I think?) inspired @martinus to undertake this work. I’m glad he has done this work, even if it remains experimental, because it’s important for us to understand this design space better. But that said I can understand why @laanwj would express an abundance of caution about this kind of thing, and it resonates with me. Everyone here understands that a change like this should not be done lightly, but even given that, it’s hard to know when we’ve “tested enough” to know that this will be safe for perpetuity. Unknown unknowns.

    All that said, I’m far from the most qualified systems guy on this project, so it would be interesting to hear from others like @sipa @ryanofsky @theuni @TheBlueMatt. Maybe we can fuzz this to death on a bunch of different platforms and convince ourselves that it’s as safe or safer than using std’s allocator. Such a determination is out of my ability.

  181. martinus commented at 3:04 pm on June 5, 2022: contributor
    I think I’ve found a way to get almost all of the benefit of the node_allocator with a significantly less scary implementation. Same memory advantage, but a bit slower. I’ll prepare a new PR :+1:
  182. sipa commented at 6:18 pm on June 5, 2022: member

    @martinus I started working on a fuzz test for the allocator, to show at least its behavior w.r.t. its external interface is correct.

    If you’re going to simplify things: will there still be an allocator class, and a concept of a memory resource? If so, will the memory resource still be parametrized by the size of the pooled objects in it?

  183. martinus commented at 8:26 pm on June 5, 2022: contributor

    @martinus I started working on a fuzz test for the allocator, to show at least its behavior w.r.t. its external interface is correct.

    That’s awesome!

    If you’re going to simplify things: will there still be an allocator class, and a concept of a memory resource? If so, will the memory resource still be parametrized by the size of the pooled objects in it?

    There will be just a memory resource, which will be passed to a std::pmr::polymorphic_allocator. I have a prototype of the memory resource here: https://gist.github.com/martinus/0cee4dd177beda78bb1e2e3d128c2d70 That’s basically all that is needed in <100 lines of code. This implements the abstract std::pmr::memory_resource interface which is much simpler than implementing a full allocator. Doesn’t need any rebinding, type casting, reinterpret_cast.

    Also, instead of having to correctly guess the pool’s memory size I simply use a vector with multiple pools of different size. Most of them will stay empty, but that overhead is negligible.

    Usage is like so:

    0using CCoinsMap = std::pmr::unordered_map<COutPoint, CCoinsCacheEntry, SaltedOutpointHasher>;
    1auto mr = NodePoolResource<256>(); // large enough to fit nodes size
    2auto map = CCoinsMap{0, SaltedOutpointHasher{}, std::equal_to<COutPoint>{}, &mr};
    
  184. sipa commented at 4:05 pm on June 6, 2022: member

    @martinus Cool, that looks a lot simpler. The runtime indirection from the polymorphic resource does sound like it’d introduce a performance penalty, so I think we’ll want benchmarks to see if it doesn’t hurt.

    A few thoughts:

    • I like the idea of giving the resource a pool for every allocation size up to a limit; that seems far less fragile than something that only works for one size.
    • It seems it would be possible to actually (possibly later, even) introduce a class like std::polymorphic_allocator, but which is parametrized by the concrete type of the resource? If the NodePoolResource is marked final (perhaps a good idea to do in general), then the compiler would be able to optimize away the dynamic calls even. This would allow getting the same benefits as the earlier approach in a more incremental way (and perhaps may be useful for benchmarking the overhead of the polymorphism exactly).
    • I believe it may be desirable to round up size at the start of do_allocate / do_deallocate instead of using 0 for small sizes. While that is not optimally efficient, it’s still better in practice than deferring to the default allocator (as at least on common platforms, that has an overhead of at least two pointers). The requirements would be (a) size has to be a multiple of alignment_of(FreeList*) and of alignment, and (b) size has to be at least sizeof(FreeList*) and at least size. Because sizes are always a multiple of alignment, and alignments are always powers of two, I believe these two requirements can be met as std::max(S, (size + A - 1) & ~size_t{A - 1}), where S = sizeof(FreeList*) and A = alignment_of(FreeList*).
    • When the above is done, I think the code can otherwise actually ignore the input alignment value (making it work for any alignment up to the alignment of the pools themselves, which ought to be sufficient for any object).
    • Would it make sense to use an array of pools, rather than a vector? That avoids one memory indirection at runtime, and for MAX_BLOCK_SIZE_BYTES=256, an array of 63 elements suffices (even with A=S=4).
  185. martinus commented at 10:59 am on June 7, 2022: contributor

    Thanks @spia for having a close look!

    I believe it may be desirable to round up size at the start of do_allocate / do_deallocate instead of using 0 for small sizes.

    I can do that and it probably makes the resource more widely usable, but at least for the CCoinsMap it won’t make a difference because it has the same alignment as the FreeList*.

    Would it make sense to use an array of pools, rather than a vector?

    Yes! I originally wanted to get rid of the template argument, but using std::array is actually a bit faster in my benchmarks so I’ll go that route.

    I’m working on that in this branch, where I’m now using std::array and rounding up: https://github.com/bitcoin/bitcoin/compare/master...martinus:2022-06-very-not-scary-NodePoolResource

    I hope to have something PR-ready in a week or two.

  186. sipa commented at 8:21 pm on June 7, 2022: member

    @martinus Here is a generic memory resource fuzzer: https://github.com/sipa/bitcoin/commits/202206_pmrallocfuzz

    It also includes a somewhat modified version of your code, with a few more template parameters/sanity checks, and the rounding as I was thinking of it. Seems to work great. Feel free to try on top of your version of the resource, or combine ideas/whatever.

    I like making the code generally usable in wider scenarios than just the ones we care about, as it means generic tests can be written for it, which need fewer assumptions.

    Another idea for something to include in the tests which I haven’t done: add a member function to the resource to ask if it still has any outstanding allocations (by summing the size of all allocated chunks, and comparing with the sum of all elements in the freelist).

  187. martinus commented at 5:31 am on June 10, 2022: contributor

    Another idea for something to include in the tests which I haven’t done: add a member function to the resource to ask if it still has any outstanding allocations (by summing the size of all allocated chunks, and comparing with the sum of all elements in the freelist).

    Excellent idea, I’ve now added an even stronger check: make sure that all data in the freelist actually comes from the chunks, doesn’t overlap in any way, and each byte from the chunks can be found in the freelists or is still available

  188. fanquake removed this from the milestone 24.0 on Sep 13, 2022
  189. achow101 referenced this in commit 5aa0c82ccd on Apr 20, 2023
  190. bitcoin locked this on Sep 13, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-26 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me