coins: cache UTXO outpoint hash codes #35195

pull l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:l0rinc/noexcept-false changing 2 files +21 −7
  1. l0rinc commented at 4:22 PM on May 2, 2026: contributor

    Problem: CCoinsMap uses SaltedOutpointHasher for UTXO cache lookups. Since #16957 marked the hasher noexcept, libstdc++ no longer stores cached hash codes in unordered-map nodes.

    #16957 optimized for memory: the original benchmark reported about 9.4% lower peak RSS, but also about 1.6% slower runtime from recomputing hashes. This PR takes the opposite side of that same tradeoff for the UTXO cache: allow libstdc++ to store one cached hash code per unordered-map node, avoiding repeated SipHash work in rehashing and some table operations.
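
The rehashing cost mentioned above can be observed with a counting hasher. This is a minimal sketch, not code from this PR: `CountingHasher` and `HashCallsDuringRehash` are hypothetical names, and the multiplicative hash is only a stand-in for SipHash. It counts how often the hasher runs while a populated `std::unordered_map` is rehashed, once with a `noexcept` call operator and once with `noexcept(false)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <unordered_map>

// Hypothetical stand-in for a salted hasher. It records every invocation
// through a pointer so that copies stored inside the map share one counter.
template <bool NoThrow>
struct CountingHasher {
    std::size_t* calls;
    std::size_t operator()(std::uint64_t key) const noexcept(NoThrow)
    {
        ++*calls;
        return static_cast<std::size_t>(key * 0x9E3779B97F4A7C15ULL);
    }
};

// Only the exception specification differs between the two variants.
static_assert(std::is_nothrow_invocable_v<const CountingHasher<true>&, const std::uint64_t&>);
static_assert(!std::is_nothrow_invocable_v<const CountingHasher<false>&, const std::uint64_t&>);

// Fill a map, reset the counter, then force a rehash and report how many
// times the hasher had to run to redistribute the existing nodes.
template <bool NoThrow>
std::size_t HashCallsDuringRehash()
{
    std::size_t calls{0};
    std::unordered_map<std::uint64_t, int, CountingHasher<NoThrow>> map(0, CountingHasher<NoThrow>{&calls});
    for (std::uint64_t i{0}; i < 1000; ++i) map.emplace(i, 0);
    assert(map.size() == 1000);
    calls = 0;
    map.rehash(map.bucket_count() * 4);
    return calls;
}
```

Comparing `HashCallsDuringRehash<true>()` with `HashCallsDuringRehash<false>()` on libstdc++ should show fewer calls for the `noexcept(false)` variant, because rehashing can reuse the stored hash codes. Other standard libraries may pick their node layout without looking at the exception specification, in which case the two counts can be equal.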

    The supporting measurements repeatedly point in the same direction:

    • Full IBD/reindex-chainstate runs were about 2-3% faster at both low and high -dbcache.
    • AssumeUTXO loading showed the strongest signal, up to about 11% faster. This is the cleanest dbcache-exercising benchmark here because it avoids block download and full validation work, so the coins-cache effect is less diluted.
    • Compatibility measurements on top of the input-fetcher parallelization and SipHash optimization work still showed cached hash codes helping: about 3% faster on an Intel N150 after the SipHash change, and still about 1% faster on M4 with -dbcache=10000.
    • Theoretically, memory can increase on libstdc++ by up to one cached size_t per node. In practice, recent Massif runs were roughly neutral at moderate dbcache values, while a very large dbcache run showed the expected higher peak.

    For CCoinsMap, no allocator sizing change is needed since its PoolAllocator block size already reserves implementation-defined node overhead and explicitly accounts for implementations where "the hash value is stored as well". That assumption remained in the code after the #16957 noexcept change disabled cached hash codes for this hasher, so there is no memory calculation to restore in this PR.
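
The sizing argument above can be illustrated with simplified node layouts. This is a rough sketch under stated assumptions: `DemoKey`, `DemoValue`, `NodeWithoutHash`, `NodeWithHash` and `kNodeBudget` are hypothetical stand-ins, the real node types are internal to each standard library, and the budget is assumed to be the pair size plus the `sizeof(void*)*4` overhead quoted from the `PoolAllocator` comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-ins, not the real COutPoint / cache entry types.
struct DemoKey {
    unsigned char txid[32];
    std::uint32_t n;
};
struct DemoValue {
    std::uint64_t a;
    std::uint64_t b;
};
using DemoPair = std::pair<const DemoKey, DemoValue>;

// Simplified node layouts (assumption): a singly linked node holding the
// pair, with or without one cached hash code.
struct NodeWithoutHash {
    void* next;
    DemoPair value;
};
struct NodeWithHash {
    void* next;
    DemoPair value;
    std::size_t hash;
};

// Assumed per-node budget: the pair itself plus four pointers of overhead.
constexpr std::size_t kNodeBudget{sizeof(DemoPair) + sizeof(void*) * 4};

// Both layouts fit, so caching the hash code needs no larger pool block.
static_assert(sizeof(NodeWithoutHash) <= kNodeBudget);
static_assert(sizeof(NodeWithHash) <= kNodeBudget);
```

Under these assumptions the cached `size_t` consumes part of the already reserved overhead rather than pushing the node past the pool's block size.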

    This is also consistent with concerns raised in the original #16957 discussion: reviewers noted the noexcept change was slower, that it imposed a penalty when memory was not limiting, and that "it might be possible to regain the performance loss by caching the hash ourselves." This PR lets libstdc++ do that caching for us.

    Fix: Revert #16957's SaltedOutpointHasher noexcept change, then document the now-intentional noexcept(false) contract and add a focused unit test.

    The implementation still does not throw; only the exception specification used by libstdc++'s unordered-map node policy changes: for a fast hash functor that may throw, libstdc++ caches hash codes in nodes.

    The test checks both inputs to that policy:

    • SaltedOutpointHasher is not nothrow-invocable
    • on libstdc++, it remains classified as a fast hash

    This mirrors the type-level contract style used in libstdc++'s own hash tests.
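
The two checks can be sketched in isolation as follows. This is an illustration rather than the test added by this PR: `DemoHasher` is a hypothetical stand-in for `SaltedOutpointHasher`, and the final predicate is only my reading of the libstdc++ policy described above (cache unless the hash is both fast and nothrow-invocable). `std::__is_fast_hash` is a libstdc++ internal, so it is guarded by `__GLIBCXX__`; on other libraries the sketch simply assumes a fast hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <type_traits>

// Hypothetical stand-in for SaltedOutpointHasher: the body does not throw,
// but the call operator is deliberately declared noexcept(false).
struct DemoHasher {
    std::uint64_t salt;
    std::size_t operator()(std::uint64_t key) const noexcept(false)
    {
        return static_cast<std::size_t>((key ^ salt) * 0x9E3779B97F4A7C15ULL);
    }
};

// First input: the hasher must not be nothrow-invocable.
constexpr bool kNothrowInvocable{std::is_nothrow_invocable_v<const DemoHasher&, const std::uint64_t&>};
static_assert(!kNothrowInvocable);

// Second input: on libstdc++ the hasher is still classified as fast.
#ifdef __GLIBCXX__
constexpr bool kFastHash{std::__is_fast_hash<DemoHasher>::value};
static_assert(kFastHash);
#else
constexpr bool kFastHash{true}; // assumption for illustration only
#endif

// Sketch of the resulting decision: hash codes are cached unless the hash
// is both fast and nothrow-invocable.
constexpr bool kWouldCacheHashCode{!(kFastHash && kNothrowInvocable)};
static_assert(kWouldCacheHashCode);
```

Expressing the contract with `static_assert` means an accidental `noexcept` on the call operator would be caught at compile time instead of showing up later as a performance regression.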

    Reproducer:

    <details> <summary>Fresh reindex-chainstate run</summary>

    for DBCACHE in 1000 30000; do \
        COMMITS="ef499680c8d426ac04f3a5d4ad67cbb3426e0e21 1016fba624f840fc893b8de61df4a149ac9551a8"; \
        STOP=946649; CC=gcc; CXX=g++; \
        BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
        (echo ""; for c in $COMMITS; do git fetch -q origin "$c" 2>/dev/null || true; git log -1 --pretty='%h %s' $c || exit 1; done) && \
        (echo "" && echo "$(date -I) | reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 1 && echo HDD || echo SSD)"; echo "") && \
        hyperfine \
        --sort command \
        --runs 1 \
        --export-json "$BASE_DIR/rdx-$(sed -E 's/[^ ]+/\L&/g;s/[.]/_/g;s/ /-/g'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
        --parameter-list COMMIT ${COMMITS// /,} \
        --prepare "killall -9 bitcoind 2>/dev/null; rm -f ./build/bin/bitcoind; git clean -fxd; git reset --hard {COMMIT} && \
          cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release && ninja -C build bitcoind -j1 && \
          ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20; rm -f $DATA_DIR/debug.log; rm -rfd $DATA_DIR/indexes;" \
        --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log && grep 'Bitcoin Core version' $DATA_DIR/debug.log | grep -q \"\$(git rev-parse --short=12 {COMMIT})\"; \
                    cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
        "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0";
    done
    
    ef499680c8 Merge bitcoin/bitcoin#34176: wallet: crash fix, handle non-writable db directories
    1016fba624 util: allow caching outpoint hash codes
    
    2026-04-30 | reindex-chainstate | 946649 blocks | dbcache 1000 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | SSD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=946649 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = ef499680c8d426ac04f3a5d4ad67cbb3426e0e21)
      Time (abs ≡):        19288.792 s               [User: 33333.953 s, System: 1949.252 s]
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=946649 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 1016fba624f840fc893b8de61df4a149ac9551a8)
      Time (abs ≡):        18968.171 s               [User: 33375.739 s, System: 2030.692 s]
    
    Relative speed comparison
            1.02          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=946649 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = ef499680c8d426ac04f3a5d4ad67cbb3426e0e21)
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=946649 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 1016fba624f840fc893b8de61df4a149ac9551a8)
    
    ef499680c8 Merge bitcoin/bitcoin#34176: wallet: crash fix, handle non-writable db directories
    1016fba624 util: allow caching outpoint hash codes
    
    2026-04-30 | reindex-chainstate | 946649 blocks | dbcache 30000 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | SSD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=946649 -dbcache=30000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = ef499680c8d426ac04f3a5d4ad67cbb3426e0e21)
      Time (abs ≡):        17615.001 s               [User: 24395.224 s, System: 771.559 s]
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=946649 -dbcache=30000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 1016fba624f840fc893b8de61df4a149ac9551a8)
      Time (abs ≡):        17209.581 s               [User: 24079.479 s, System: 785.327 s]
    
    Relative speed comparison
            1.02          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=946649 -dbcache=30000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = ef499680c8d426ac04f3a5d4ad67cbb3426e0e21)
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=946649 -dbcache=30000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 1016fba624f840fc893b8de61df4a149ac9551a8)
    

    </details>

    <details> <summary>Earlier supporting measurements</summary>

    Original #16957 result:

    master                                4:13:59   7696728 KiB
    2019-09-SaltedOutpointHasher-noexcept 4:18:11   6971412 KiB
    change                                +1.65%    -9.42%
    

    #16957 also showed that increasing dbcache on the noexcept branch could recover performance while staying below old memory use:

    master -dbcache=5000     4:13:59   7696728 KiB
    noexcept -dbcache=5471   4:01:16   7282044 KiB
    

    Earlier cached-hash measurements:

    IBD to height 909090, -dbcache=5000
    baseline:     27046.236 s ± 631.042 s
    cached hash:  26226.707 s ± 373.829 s
    change:       ~3.0% faster
    
    reindex-chainstate to height 909090, -dbcache=5000
    baseline:     27728.594 s ± 96.850 s
    cached hash:  26847.390 s ± 105.590 s
    change:       ~3.2% faster
    
    loadtxoutset, -dbcache=500
    baseline:     451.699 s ± 1.126 s
    cached hash:  422.857 s ± 1.488 s
    change:       ~6.4% faster
    
    loadtxoutset, -dbcache=3000
    baseline:     426.088 s ± 1.824 s
    cached hash:  403.923 s ± 1.043 s
    change:       ~5.2% faster
    
    loadtxoutset, -dbcache=4000
    baseline:     440.708 s ± 2.242 s
    cached hash:  423.934 s ± 4.145 s
    change:       ~3.8% faster
    

    Later noexcept removal measurements:

    reindex-chainstate, height 916000, -dbcache=15000
    baseline:     23856.269 s
    no noexcept:  22734.025 s
    change:       ~4.7% faster
    
    reindex-chainstate, height 917000, -dbcache=30000
    baseline:     12703.654 s
    no noexcept:  11957.649 s
    change:       ~5.9% faster
    
    assumeutxo on i9 SSD, -dbcache=450
    baseline:     347.241 s ± 3.282 s
    no noexcept:  309.558 s ± 1.951 s
    change:       ~10.9% faster
    

    Memory observations:

    Original #16957:
    master:      7696728 KiB
    noexcept:    6971412 KiB
    change:      -9.42%
    
    Recent Massif, assumeutxo -dbcache=450:
    baseline:     744.5 MB
    cached hash:  737.7 MB
    
    Recent Massif, assumeutxo -dbcache=4500:
    baseline:     4.640 GB
    cached hash:  4.641 GB
    
    Very large dbcache:
    baseline:     24.11 GB
    cached hash:  25.49 GB
    

    </details>

    <details> <summary>Compatibility with input-fetcher and `SipHash` work</summary>

    This change has been in my queue for more than a year, so some supporting measurements are older and were collected while adjacent input-fetcher and SipHash work was still evolving. The relevant compatibility measurements are documented in this [#31132](/bitcoin-bitcoin/31132/) review comment.

    After input fetching, SipHash inlining, and noexcept removal on an Intel N150:

    reindex-chainstate, height 943349, -dbcache=1000, Intel N150
    
    input fetcher baseline:     20014.228 s
    SipHash optimized:          18933.341 s
    no noexcept:                18388.590 s
    
    change from SipHash optimized to no noexcept: ~2.9% faster
    change from input fetcher baseline to no noexcept: ~8.1% faster
    

    After input fetching, SipHash inlining, and noexcept removal on an M4 Max:

    reindex-chainstate, height 943349, -dbcache=10000, Apple M4 Max
    
    input fetcher baseline:     4836.390 s
    SipHash optimized:          4543.009 s
    no noexcept:                4493.407 s
    
    change from SipHash optimized to no noexcept: ~1.1% faster
    change from input fetcher baseline to no noexcept: ~7.1% faster
    

    These measurements suggest cached hash-code nodes remain useful even when the hasher itself gets faster and block input fetching is parallelized.

    </details>

  2. Revert "make SaltedOutpointHasher noexcept"
    This reverts commit 67d99900b0d770038c9c5708553143137b124a6c.
    fb0c207567
  3. util: allow caching outpoint hash codes
    `SaltedOutpointHasher` has been `noexcept` since #16957, which makes libstdc++ omit cached hash codes from `std::unordered_map` nodes.
    
    That saves one `size_t` per node, but it also makes `CCoinsMap` recompute the outpoint hash in table operations that could otherwise reuse the cached code.
    
    Declare the operator `noexcept(false)` to restore libstdc++ hash-code caching for this fast hash functor.
    The implementation still does not throw; only the exception specification used by the container policy changes.
    
    This restores a value that #16957 deliberately removed, but the pool-backed `CCoinsMap` allocation budget still accounts for implementations storing hash values.
    The existing `PoolAllocator` comment says that "in some cases the hash value is stored as well" and that "sizeof(void*)*4" overhead "should thus be sufficient so that all implementations can allocate the nodes from the PoolAllocator."
    
    Add a unit test for the exception-specification contract and the libstdc++ fast-hash classification.
    16e77fdf13
  4. DrahtBot added the label UTXO Db and Indexes on May 2, 2026
  5. DrahtBot commented at 4:22 PM on May 2, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35195.

    Reviews

    See the guideline for information on the review process.

    | Type | Reviewers |
    | --- | --- |
    | Concept ACK | optout21 |

    If your review is incorrectly listed, please copy-paste `<!--meta-tag:bot-skip-->` into the comment that the bot should ignore.

  6. luke-jr referenced this in commit 2969207e15 on May 3, 2026
  7. luke-jr referenced this in commit 6d2d191c0f on May 3, 2026
  8. l0rinc marked this as a draft on May 5, 2026
  9. l0rinc marked this as a draft on May 5, 2026
  10. in src/util/hasher.h:63 in fb0c207567
      68 | -     * memory savings of about 9% which allow for a larger dbcache setting.
      69 | -     *
      70 | -     * @see https://gcc.gnu.org/onlinedocs/gcc-13.2.0/libstdc++/manual/manual/unordered_associative.html
      71 | -     */
      72 | -    size_t operator()(const COutPoint& id) const noexcept
      73 | +    size_t operator()(const COutPoint& id) const
    


    optout21 commented at 9:43 AM on May 28, 2026:

    fb0c207 Revert "make SaltedOutpointHasher noexcept":

    Nit: In this particular case, I don't see value in splitting the change into 2 commits; I would prefer a single commit.

  11. in src/test/hash_tests.cpp:18 in 16e77fdf13
      13 | +#ifdef __GLIBCXX__
      14 | +#include <bits/functional_hash.h>
      15 | +#endif
      16 | +
      17 | +#include <type_traits>
      18 | +
    


    optout21 commented at 9:46 AM on May 28, 2026:

    16e77fd util: allow caching outpoint hash codes:

    Nit: I suggest removing the blank lines here. A blank line as a separator between project and external includes makes sense, but these are not needed.

  12. in src/test/hash_tests.cpp:60 in 16e77fdf13
      52 | @@ -46,6 +53,16 @@ BOOST_AUTO_TEST_CASE(murmurhash3)
      53 |  #undef T
      54 |  }
      55 |  
      56 | +BOOST_AUTO_TEST_CASE(salted_outpoint_hasher_cache_policy)
      57 | +{
      58 | +    constexpr bool nothrow_invocable{std::is_nothrow_invocable_v<const SaltedOutpointHasher&, const COutPoint&>};
      59 | +    BOOST_CHECK(!nothrow_invocable);
      60 | +#ifdef __GLIBCXX__
    


    optout21 commented at 9:46 AM on May 28, 2026:

    16e77fd util: allow caching outpoint hash codes:

    Nit: I suggest adding a blank line here, to show that the two parts are separate checks.

  13. in src/util/hasher.h:69 in 16e77fdf13
      65 | +     * `noexcept(false)` is intentional even though the body cannot throw:
      66 | +     * libstdc++ caches hash codes for potentially throwing fast hash functions.
      67 | +     *
      68 | +     * @see https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html
      69 | +     */
      70 | +    size_t operator()(const COutPoint& id) const noexcept(false)
    


    optout21 commented at 9:54 AM on May 28, 2026:

    16e77fd util: allow caching outpoint hash codes:

    Have you considered controlling hash caching via __is_fast_hash (as described in the linked manual)? That seems clearer than abusing 'noexcept'. Or would that not work on other platforms?

  14. optout21 commented at 10:02 AM on May 28, 2026: contributor

    Concept ACK 16e77fdf1324e7419eff5adb358f2c9b27db948a

    This PR attempts to improve performance by allowing CCoinsMap key hashes to be cached. In general, the tradeoff between extra memory and extra CPU is not clear-cut, but here the benchmarks indicate that caching (more memory, less CPU) results in faster execution, so it should be applied. Code-wise, I don't see issues or dangers in changing the hasher from noexcept to noexcept(false). Left some nits.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-11 10:51 UTC
