dbwrapper: avoid copying `CDBIterator` keys in `GetKey()` #35128

pull l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:l0rinc/dbwrapper-key-spanreader changing 3 files +8 −2
  1. l0rinc commented at 10:11 AM on April 21, 2026: contributor

    Problem

    CDBIterator::GetKey() only deserializes the current LevelDB key once and GetKeyImpl() already exposes that key as a contiguous borrowed byte span, and GetKey() creates a fresh local reader and only performs immediate forward reads before returning.

    The copied DataStream currently insulates the iterator entry from a failed decode, so switching to a borrowed reader is only safe if a deserialization failure still returns false and leaves the same key/value readable afterward.

    [!NOTE] The same simplification does not apply to GetValue(), because that path deobfuscates the value bytes in place first and still needs an owning mutable buffer.

    Fix

    Add a preparatory test with an invalid reads and checks that the failed decode does not consume the current iterator entry. Then switch GetKey() to SpanReader so the key bytes are read in place instead of being copied into a temporary DataStream.

    This keeps the same exception swallowing and bool return semantics while avoiding the extra allocation and copy.

    Context

    Related to #34483 and #35025

    Reproducer

    gettxoutsetinfo is ~10-12% faster for up-to-date blocks (run on SSD), see:

    <details><summary>2026-04-20 | gettxoutsetinfo | rpi5-8 | aarch64 | Cortex-A76 | 4 cores | 7.7Gi RAM | ext4 | SSD</summary>

    COMMITS="64a88c8c1edc7ee5cef623d9aa8179a239e27ce9 57dc0202ddb7b4cbdd521fb237a25fc4d7f28ddf"; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    mkdir -p "$LOG_DIR" && \
    (echo ""; for c in $COMMITS; do git cat-file -e "$c^{commit}" 2>/dev/null || git fetch -q origin "$c" || exit 1; git log -1 --pretty='%h %s' "$c" || exit 1; done) && \
    (echo "" && echo "$(date -I) | gettxoutsetinfo | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 1 && echo HDD || echo SSD)"; echo "") && \
    hyperfine \
      --sort command \
      --runs 10 \
      --export-json "$BASE_DIR/gettxoutsetinfo-$(sed -E 's/([a-f0-9]{8})[a-f0-9]* ?/\1-/g;s/-$//'<<<"$COMMITS")-$(date +%s).json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall -9 bitcoind 2>/dev/null || true; rm -f $DATA_DIR/debug.log; git clean -fxd && git reset --hard {COMMIT} && \
        cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release && ninja -C build bitcoind bitcoin-cli -j$(nproc) && \
        ./build/bin/bitcoind -datadir=$DATA_DIR -connect=0 -listen=0 -dnsseed=0 -coinstatsindex=0 -txindex=0 -blockfilterindex=0 -daemon -printtoconsole=0; \
        ./build/bin/bitcoin-cli -datadir=$DATA_DIR -rpcwait getblockcount >/dev/null" \
      --conclude "./build/bin/bitcoin-cli -datadir=$DATA_DIR stop 2>/dev/null || true; killall bitcoind 2>/dev/null || true; sleep 10; \
        grep -q 'Done loading' $DATA_DIR/debug.log && grep 'Bitcoin Core version' $DATA_DIR/debug.log | grep -q \"\$(git rev-parse --short=12 {COMMIT})\"; \
        cp $DATA_DIR/debug.log $LOG_DIR/gettxoutsetinfo-{COMMIT}-$(date +%s).log" \
      "./build/bin/bitcoin-cli -datadir=$DATA_DIR -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null"
    
    64a88c8c1e Merge bitcoin/bitcoin#35096: kernel: align height parameters to int32_t in btck API
    57dc0202dd dbwrapper: use SpanReader for iterator keys
    
    Benchmark 1: ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/BitcoinData -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null (COMMIT = 64a88c8c1edc7ee5cef623d9aa8179a239e27ce9)
      Time (mean ± σ):     109.002 s ±  3.091 s    [User: 0.003 s, System: 0.004 s]
      Range (min … max):   106.191 s … 113.608 s    10 runs
    
    Benchmark 2: ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/BitcoinData -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null (COMMIT = 57dc0202ddb7b4cbdd521fb237a25fc4d7f28ddf)
      Time (mean ± σ):     97.711 s ±  1.172 s    [User: 0.003 s, System: 0.004 s]
      Range (min … max):   96.651 s … 100.104 s    10 runs
    
    Relative speed comparison
            1.12 ±  0.03  ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/BitcoinData -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null (COMMIT = 64a88c8c1edc7ee5cef623d9aa8179a239e27ce9)
            1.00          ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/BitcoinData -rpcclienttimeout=0 -named gettxoutsetinfo hash_type='none' use_index='false' >/dev/null (COMMIT = 57dc0202ddb7b4cbdd521fb237a25fc4d7f28ddf)
    

    </details>

  2. test: cover failed `CDBIterator::GetKey()` deserialization
    The upcoming change will replace the temporary owning `DataStream` inside `CDBIterator::GetKey()` with a borrowed reader over the current LevelDB key bytes.
    The copied `DataStream` currently insulates the iterator entry from a failed decode, so the optimization is only safe if a deserialization failure still returns `false` and leaves the same key/value readable afterward.
    
    Extend `dbwrapper_iterator` to read a one-byte key as a `uint16_t`.
    The read must fail, return `false`, and still allow the same key and value to be read afterward.
    This would fail if `GetKey()` stopped swallowing deserialization exceptions, or if a failed decode started consuming shared iterator state instead of only temporary reader state.
    
    Drop the dead `const_cast` in the test while here, since `dbw` is already non-const.
    
    Locking down that contract first makes the following `SpanReader` switch a behavior-preserving optimization.
    f0e498af5c
  3. dbwrapper: use `SpanReader` for iterator keys
    `CDBIterator::GetKey()` only deserializes the current LevelDB key once.
    `GetKeyImpl()` already exposes the current key as a contiguous borrowed byte span, and `GetKey()` creates a fresh local reader and only performs immediate forward reads before returning.
    
    Switch this path to `SpanReader` so the key bytes are read in place instead of being copied into a temporary `DataStream`.
    This keeps the same exception swallowing and `bool` return semantics while avoiding the extra allocation and copy.
    
    The preceding test locks down the subtle safety property that matters here: a failed decode must not consume the current iterator entry.
    Note that the same simplification does not apply to `GetValue()`, because that path deobfuscates the value bytes in place first and still needs an owning mutable buffer.
    5de2f97a05
  4. DrahtBot commented at 10:11 AM on April 21, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sedited, andrewtoth

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  5. sedited approved
  6. sedited commented at 12:51 PM on April 21, 2026: contributor

    ACK 5de2f97a0521fe75b47f9840141c4ace74656de9

  7. andrewtoth approved
  8. andrewtoth commented at 1:50 PM on April 21, 2026: contributor

    ACK 5de2f97a0521fe75b47f9840141c4ace74656de9

    Saw a ~7% performance boost on gettxoutsetinfo on my i9.

  9. andrewtoth commented at 2:24 PM on April 21, 2026: contributor

    Also got a nice speedup by not reallocating a DataStream for each GetValue:

    diff --git a/src/dbwrapper.h b/src/dbwrapper.h
    index 75ac5fdb05..c439c0f03b 100644
    --- a/src/dbwrapper.h
    +++ b/src/dbwrapper.h
    @@ -124,6 +124,7 @@ public:
     private:
         const CDBWrapper &parent;
         const std::unique_ptr<IteratorImpl> m_impl_iter;
    +    DataStream m_val_stream{};
     
         void SeekImpl(std::span<const std::byte> key);
         std::span<const std::byte> GetKeyImpl() const;
    @@ -163,9 +164,10 @@ public:
     
         template<typename V> bool GetValue(V& value) {
             try {
    -            DataStream ssValue{GetValueImpl()};
    -            dbwrapper_private::GetObfuscation(parent)(ssValue);
    -            ssValue >> value;
    +            m_val_stream.clear();
    +            m_val_stream.write(GetValueImpl());
    +            dbwrapper_private::GetObfuscation(parent)(m_val_stream);
    +            m_val_stream >> value;
             } catch (const std::exception&) {
                 return false;
             }
    

    Feel free to take it.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-22 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me