dbwrapper: avoid copying `CDBIterator` keys in `GetKey()` #35128

pull l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:l0rinc/dbwrapper-key-spanreader changing 3 files +8 −2
  1. l0rinc commented at 10:11 AM on April 21, 2026: contributor

    Problem

    CDBIterator::GetKey() only deserializes the current LevelDB key once and GetKeyImpl() already exposes that key as a contiguous borrowed byte span, and GetKey() creates a fresh local reader and only performs immediate forward reads before returning.

    The copied DataStream currently insulates the iterator entry from a failed decode, so switching to a borrowed reader is only safe if a deserialization failure still returns false and leaves the same key/value readable afterward.

    [!NOTE] The same simplification does not apply to GetValue(), because that path deobfuscates the value bytes in place first and still needs an owning mutable buffer.

    Fix

    Add a preparatory test with an invalid reads and checks that the failed decode does not consume the current iterator entry. Then switch GetKey() to SpanReader so the key bytes are read in place instead of being copied into a temporary DataStream.

    This keeps the same exception swallowing and bool return semantics while avoiding the extra allocation and copy.

    Context

    Related to #34483 and #35025

    Reproducer

    gettxoutsetinfo is ~10-12% faster for up-to-date blocks (run on SSD), see:

    <details><summary>2026-04-20 | gettxoutsetinfo | rpi5-8 | aarch64 | Cortex-A76 | 4 cores | 7.7Gi RAM | ext4 | SSD</summary>

    COMMITS="64a88c8c1edc7ee5cef623d9aa8179a239e27ce9 57dc0202ddb7b4cbdd521fb237a25fc4d7f28ddf"; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    mkdir -p "$LOG_DIR" && \
    (echo ""; for c in $COMMITS; do git cat-file -e "$c^{commit}" 2>/dev/null || git fetch -q origin "$c" || exit 1; git log -1 --pretty='%h %s' "$c" || exit 1; done) && \
    (echo "" && echo "$(date -I) | gettxoutsetinfo | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 1 && echo HDD || echo SSD)"; echo "") && \
    hyperfine \
      --sort command \
      --runs 10 \
      --export-json "$BASE_DIR/gettxoutsetinfo-$(sed -E 's/([a-f0-9]{8})[a-f0-9]* ?/\1-/g;s/-$//'<<<"$COMMITS")-$(date +%s).json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall -9 bitcoind 2>/dev/null || true; rm -f $DATA_DIR/debug.log; git clean -fxd && git reset --hard {COMMIT} && \
        cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release && ninja -C build bitcoind bitcoin-cli -j$(nproc) && \
        ./build/bin/bitcoind -datadir=$DATA_DIR -connect=0 -listen=0 -dnsseed=0 -coinstatsindex=0 -txindex=0 -blockfilterindex=0 -daemon -printtoconsole=0; \
        ./build/bin/bitcoin-cli -datadir=$DATA_DIR -rpcwait getblockcount >/dev/null" \
      --conclude "./build/bin/bitcoin-cli -datadir=$DATA_DIR stop 2>/dev/null || true; killall bitcoind 2>/dev/null || true; sleep 10; \
        grep -q 'Done loading' $DATA_DIR/debug.log && grep 'Bitcoin Core version' $DATA_DIR/debug.log | grep -q \"\$(git rev-parse --short=12 {COMMIT})\"; \
        cp $DATA_DIR/debug.log $LOG_DIR/gettxoutsetinfo-{COMMIT}-$(date +%s).log" \
      "./build/bin/bitcoin-cli -datadir=$DATA_DIR -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null"
    
    64a88c8c1e Merge bitcoin/bitcoin#35096: kernel: align height parameters to int32_t in btck API
    57dc0202dd dbwrapper: use SpanReader for iterator keys
    
    Benchmark 1: ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/BitcoinData -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null (COMMIT = 64a88c8c1edc7ee5cef623d9aa8179a239e27ce9)
      Time (mean ± σ):     109.002 s ±  3.091 s    [User: 0.003 s, System: 0.004 s]
      Range (min … max):   106.191 s … 113.608 s    10 runs
    
    Benchmark 2: ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/BitcoinData -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null (COMMIT = 57dc0202ddb7b4cbdd521fb237a25fc4d7f28ddf)
      Time (mean ± σ):     97.711 s ±  1.172 s    [User: 0.003 s, System: 0.004 s]
      Range (min … max):   96.651 s … 100.104 s    10 runs
    
    Relative speed comparison
            1.12 ±  0.03  ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/BitcoinData -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null (COMMIT = 64a88c8c1edc7ee5cef623d9aa8179a239e27ce9)
            1.00          ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/BitcoinData -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null (COMMIT = 57dc0202ddb7b4cbdd521fb237a25fc4d7f28ddf)
    

    </details>

  2. test: cover failed `CDBIterator::GetKey()` deserialization
    The upcoming change will replace the temporary owning `DataStream` inside `CDBIterator::GetKey()` with a borrowed reader over the current LevelDB key bytes.
    The copied `DataStream` currently insulates the iterator entry from a failed decode, so the optimization is only safe if a deserialization failure still returns `false` and leaves the same key/value readable afterward.
    
    Extend `dbwrapper_iterator` to read a one-byte key as a `uint16_t`.
    The read must fail, return `false`, and still allow the same key and value to be read afterward.
    This would fail if `GetKey()` stopped swallowing deserialization exceptions, or if a failed decode started consuming shared iterator state instead of only temporary reader state.
    
    Drop the dead `const_cast` in the test while here, since `dbw` is already non-const.
    
    Locking down that contract first makes the following `SpanReader` switch a behavior-preserving optimization.
    f0e498af5c
  3. dbwrapper: use `SpanReader` for iterator keys
    `CDBIterator::GetKey()` only deserializes the current LevelDB key once.
    `GetKeyImpl()` already exposes the current key as a contiguous borrowed byte span, and `GetKey()` creates a fresh local reader and only performs immediate forward reads before returning.
    
    Switch this path to `SpanReader` so the key bytes are read in place instead of being copied into a temporary `DataStream`.
    This keeps the same exception swallowing and `bool` return semantics while avoiding the extra allocation and copy.
    
    The preceding test locks down the subtle safety property that matters here: a failed decode must not consume the current iterator entry.
    Note that the same simplification does not apply to `GetValue()`, because that path deobfuscates the value bytes in place first and still needs an owning mutable buffer.
    5de2f97a05
  4. DrahtBot commented at 10:11 AM on April 21, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sedited, andrewtoth, optout21, theStack, achow101

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  5. sedited approved
  6. sedited commented at 12:51 PM on April 21, 2026: contributor

    ACK 5de2f97a0521fe75b47f9840141c4ace74656de9

  7. andrewtoth approved
  8. andrewtoth commented at 1:50 PM on April 21, 2026: contributor

    ACK 5de2f97a0521fe75b47f9840141c4ace74656de9

    Saw a ~7% performance boost on gettxoutsetinfo on my i9.

  9. andrewtoth commented at 2:24 PM on April 21, 2026: contributor

    Also got a nice speedup by not reallocating a DataStream for each GetValue:

    diff --git a/src/dbwrapper.h b/src/dbwrapper.h
    index 75ac5fdb05..c439c0f03b 100644
    --- a/src/dbwrapper.h
    +++ b/src/dbwrapper.h
    @@ -124,6 +124,7 @@ public:
     private:
         const CDBWrapper &parent;
         const std::unique_ptr<IteratorImpl> m_impl_iter;
    +    DataStream m_val_stream{};
     
         void SeekImpl(std::span<const std::byte> key);
         std::span<const std::byte> GetKeyImpl() const;
    @@ -163,9 +164,10 @@ public:
     
         template<typename V> bool GetValue(V& value) {
             try {
    -            DataStream ssValue{GetValueImpl()};
    -            dbwrapper_private::GetObfuscation(parent)(ssValue);
    -            ssValue >> value;
    +            m_val_stream.clear();
    +            m_val_stream.write(GetValueImpl());
    +            dbwrapper_private::GetObfuscation(parent)(m_val_stream);
    +            m_val_stream >> value;
             } catch (const std::exception&) {
                 return false;
             }
    

    Feel free to take it.

  10. l0rinc commented at 9:36 AM on April 22, 2026: contributor

    got a nice speedup by not reallocating a DataStream for each GetValue

    I agree, this should be tackled too! There are a few similar DataStream reuse opportunities, so I think it makes more sense to handle them consistently instead of special-casing GetValue() in isolation. That needs a broader audit to make sure the reused scratch buffers are not shared across threads/reentrant call paths, and to make sure oversized values don’t leave large allocations hanging around longer than intended - so I’d prefer to do it in a follow-up PR rather than fold it into this one.


    I’ve been sketching a small shared helper for that (ScopedDataStreamUsage): require the scratch stream to be empty on entry, clear it on exit, and drop unusually large retained allocations. With that in place we can apply the pattern consistently and also remove the now-redundant per-call reserve() handling.

    diff --git a/src/streams.h b/src/streams.h
    --- a/src/streams.h	(revision c2c5d65d7f6755ea9b3e8c3d21c5577be8d85cf1)
    +++ b/src/streams.h	(revision 7480c31ff5bf68cb34c8cb092302ffab65549a5a)
    @@ -9,6 +9,7 @@
     #include <serialize.h>
     #include <span.h>
     #include <support/allocators/zeroafterfree.h>
    +#include <util/byte_units.h>
     #include <util/check.h>
     #include <util/log.h>
     #include <util/obfuscation.h>
    @@ -198,6 +199,7 @@
         iterator end()                                   { return vch.end(); }
         size_type size() const                           { return vch.size() - m_read_pos; }
         bool empty() const                               { return vch.size() == m_read_pos; }
    +    size_t capacity() const                          { return vch.capacity(); }
         void resize(size_type n, value_type c = value_type{}) { vch.resize(n + m_read_pos, c); }
         void reserve(size_type n)                        { vch.reserve(n + m_read_pos); }
         const_reference operator[](size_type pos) const  { return vch[pos + m_read_pos]; }
    @@ -268,6 +270,34 @@
         size_t GetMemoryUsage() const noexcept;
     };
     
    +// Require empty scratch streams on entry and reset them on exit.
    +class ScopedDataStreamUsage
    +{
    +    static constexpr auto DEFAULT_MAX_CAPACITY{1_MiB};
    +
    +    DataStream& m_stream;
    +    const size_t m_max_capacity;
    +
    +public:
    +    explicit ScopedDataStreamUsage(DataStream& stream, size_t max_capacity = DEFAULT_MAX_CAPACITY)
    +        : m_stream(stream), m_max_capacity(max_capacity)
    +    {
    +        assert(m_stream.empty());
    +    }
    +
    +    ScopedDataStreamUsage(const ScopedDataStreamUsage&) = delete;
    +    ScopedDataStreamUsage& operator=(const ScopedDataStreamUsage&) = delete;
    +
    +    ~ScopedDataStreamUsage()
    +    {
    +        if (m_stream.capacity() <= m_max_capacity) [[likely]] {
    +            m_stream.clear(); // Keep the existing capacity for the next reuse.
    +        } else {
    +            m_stream = DataStream{}; // Drop unusually large scratch allocations.
    +        }
    +    }
    +};
    +
     template <typename IStream>
     class BitStreamReader
     {
    

    I’d like to keep this PR scoped to the GetKey() change and take the GetValue()/other scratch-buffer reuses in a separate PR.

  11. andrewtoth commented at 4:21 PM on April 22, 2026: contributor

    I’d prefer to do it in a follow-up PR

    I don't think it depends on this, so please make a PR and I will ACK!

    There are a few similar DataStream reuse opportunities

    I wasn't able to find any others. Other places with one-shot DataStream usages would not work due to being const and/or not being thread-safe. CDBIterator must only be accessed in a single thread. This seems like the only place where this would be a benefit. Maybe Seek but that is only called once, and can reuse the same one in CDBIterator. Did I miss anywhere?

    make sure oversized values don’t leave large allocations hanging around longer than intended

    The usages are Coin, CDiskBlockIndex, compact block filters. None of these are very large. Also, all usages of CDBIterator are wrapped in a local std::unique_ptr which is then destroyed after it is iterated and goes out of scope.

  12. l0rinc force-pushed on Apr 23, 2026
  13. l0rinc force-pushed on Apr 23, 2026
  14. DrahtBot added the label CI failed on Apr 23, 2026
  15. l0rinc commented at 8:04 AM on April 23, 2026: contributor

    (started working on ScopedDataStreamUsage and accidentally pushed to this branch - restored it to the acked version)

  16. DrahtBot removed the label CI failed on Apr 23, 2026
  17. optout21 commented at 9:49 AM on April 23, 2026: contributor

    ACK 5de2f97a0521fe75b47f9840141c4ace74656de9

    A simple fix for an unnecessary extra copy when accessing keys from iterator in LevelDB. Good catch! Reviewed code; analyzed additional suggestions. Verified that the new test passes with and without the second commit. Also triggered the test to fail by intentionally breaking the code (in the catch branch seek past last).

  18. theStack approved
  19. theStack commented at 3:28 PM on April 23, 2026: contributor

    ACK 5de2f97a0521fe75b47f9840141c4ace74656de9

    With this change, gettxoutsetinfo is about 7.5% faster on my arm64 machine (48.209s on PR vs. 53.561s on master).

  20. achow101 commented at 6:39 PM on April 23, 2026: member

    ACK 5de2f97a0521fe75b47f9840141c4ace74656de9

  21. achow101 merged this on Apr 23, 2026
  22. achow101 closed this on Apr 23, 2026

  23. l0rinc deleted the branch on Apr 23, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-12 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me