validation: reduce persisted UTXO set size by prioritizing positive lookups (RFC) #33817

Pull request: l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:l0rinc/bip30-bloom-filter-removal, changing 2 files (+4 −4)
  1. l0rinc commented at 12:43 PM on November 7, 2025: contributor

    draft to gather comments and conceptual reviews


    Context

    BIP30 prevents duplicate transaction IDs by checking whether outputs already exist in the UTXO set before adding them. LevelDB's FilterPolicy stores a per-table probabilistic filter to optimize for negative lookups.
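To make the "optimize for negative lookups" point concrete, here is a minimal, self-contained Bloom filter sketch. It is a hypothetical illustration, not LevelDB's `FilterPolicy` implementation: the class name `ToyBloomFilter`, the double-hashing scheme built on `std::hash`, and the parameters are assumptions. The property it shows is that a filter can only answer "definitely absent" or "maybe present". A miss can therefore skip reading a table, but the table that actually holds a present key must still be read after the filter check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy Bloom filter for illustration; this is NOT LevelDB's FilterPolicy code.
// It derives k probe positions from two std::hash values (double hashing).
class ToyBloomFilter {
public:
    ToyBloomFilter(std::size_t expected_keys, std::size_t bits_per_key)
        : bits_(expected_keys * bits_per_key + 1, false),
          num_probes_(std::max<std::size_t>(1, bits_per_key * 69 / 100)) {} // k ~= 0.69 * bits/key

    // Record a key by setting all k probe bits.
    void Add(const std::string& key)
    {
        for (std::size_t i = 0; i < num_probes_; ++i) bits_[Probe(key, i)] = true;
    }

    // false => the key is definitely absent, so a lookup could skip this table.
    // true  => the key may be present, so the table must still be read.
    bool MayContain(const std::string& key) const
    {
        for (std::size_t i = 0; i < num_probes_; ++i) {
            if (!bits_[Probe(key, i)]) return false;
        }
        return true;
    }

private:
    std::size_t Probe(const std::string& key, std::size_t i) const
    {
        const std::size_t h1 = std::hash<std::string>{}(key);
        const std::size_t h2 = std::hash<std::string>{}(key + '#') | 1; // second hash, forced odd
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    std::size_t num_probes_;
};

// Fraction of never-inserted keys that the filter fails to reject.
double FalsePositiveRate(const ToyBloomFilter& filter, int trials)
{
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        if (filter.MayContain("missing-" + std::to_string(i))) ++hits;
    }
    return static_cast<double>(hits) / trials;
}
```

For example, after adding 1,000 keys to `ToyBloomFilter(1000, 10)`, `MayContain` returns `true` for every added key (no false negatives), while `FalsePositiveRate` for unseen keys is typically on the order of 1%.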

    After the first ~230k blocks (BIP30/BIP34 windows), validation does not deliberately probe the UTXO set for missing entries (missing coins imply invalid transactions). Bloom filters therefore slow the common case (present-key lookups) while bloating the on-disk tables.

    History

    Bloom filters were introduced in the Ultraprune PR (#1677) without explicit documentation of their purpose.

    Fix

    For blocks prior to the assumevalid anchor, we already skip script verification, relying on accumulated proof of work. Skipping BIP30 for those deeply buried blocks is consistent with assumevalid's purpose (especially after the recent checkpoint removal).

    Removing the LevelDB bloom filters slightly speeds up present-key workloads (~11% faster AssumeUTXO load) and reduces the on-disk chainstate size by ~2% because filter blocks are not stored.

    Disclaimer

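    Nodes syncing from genesis with -assumevalid=0 still perform full BIP30 validation, which may be a few seconds slower without the bloom filters (in the reindex-chainstate measurements below, removing the filters alone added ~11 s over the first 230k blocks). Checks beyond height 1,983,701 remain enforced regardless of fScriptChecks.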
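The enforcement rule described above can be sketched as a small predicate. This is a hypothetical, simplified model, not Bitcoin Core's `ConnectBlock` code: the struct `BlockContext`, its field names, the function `ShouldCheckBip30`, and the idea of modeling the proposal as "also require script checks" are assumptions made for illustration. Only the height threshold comes from the text above.

```cpp
#include <cassert>

// Hypothetical, simplified model of the BIP30 decision described above.
// NOT Bitcoin Core's ConnectBlock code; names are illustrative assumptions.
constexpr int kAlwaysCheckBip30FromHeight{1'983'702}; // "checks beyond 1,983,701"

struct BlockContext {
    int height;
    bool bip34_guarantees_unique_txids; // assumption: BIP34 already rules out duplicate txids
    bool is_historical_bip30_exception; // the known duplicate-coinbase blocks
    bool script_checks;                 // false for blocks buried under the assumevalid anchor
};

// Returns true if the block's new outputs are probed against the UTXO set,
// i.e. if the node performs the (usually negative) "does this coin already exist?" lookups.
bool ShouldCheckBip30(const BlockContext& block, bool bury_behind_assumevalid)
{
    bool enforce = !block.is_historical_bip30_exception && !block.bip34_guarantees_unique_txids;
    if (bury_behind_assumevalid) {
        enforce = enforce && block.script_checks; // the proposal, as modeled in this sketch
    }
    // Blocks at or above the threshold are checked regardless of script checks.
    return enforce || block.height >= kAlwaysCheckBip30FromHeight;
}
```

With -assumevalid=0, `script_checks` would be true for every block, so in this model the early-chain checks still run, matching the disclaimer above.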

    Performance

    The performance change is best demonstrated by an AssumeUTXO load, since this change was mostly motivated by UTXO set size and memory reduction.

    AssumeUTXO loads with default dbcache show ~11% faster bootstrapping.

    <details> <summary>11%, 8% and 1% faster assumeutxo load | 880000 blocks | dbcache 450/4500/45000 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD</summary>

    COMMITS="745eb053a41c487cc10f20644c65dc8455cf8974 5cb93dad7c06db82642169d8f7d07442d215f49c"; \
    CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/ShallowBitcoinData"; LOG_DIR="$BASE_DIR/logs"; UTXO_SNAPSHOT_PATH="$BASE_DIR/utxo-880000.dat"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done; echo "") && \
    for DBCACHE in 450 4500 45000; do \
      (echo "assumeutxo load | 880000 blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)";) &&\
      hyperfine \
      --sort command \
      --runs 5 \
      --export-json "$BASE_DIR/assumeutxo-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$DBCACHE-$CC-$(date +%s).json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall bitcoind 2>/dev/null; rm -rf $DATA_DIR/blocks $DATA_DIR/chainstate $DATA_DIR/chainstate_snapshot $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
                 cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind bitcoin-cli -j2 && \
                 ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 20 && \
                 ./build/bin/bitcoind -datadir=$DATA_DIR -daemon -blocksonly -connect=0 -dbcache=$DBCACHE -printtoconsole=0; sleep 20" \
       --conclude "build/bin/bitcoin-cli -datadir=$DATA_DIR stop || true; killall bitcoind || true; sleep 10; \
                   echo '{COMMIT} | dbcache=$DBCACHE | chainstate: $(find $DATA_DIR/chainstate_snapshot -type f 2>/dev/null | wc -l) files, $(du -sb $DATA_DIR/chainstate_snapshot 2>/dev/null | cut -f1) bytes' >> $DATA_DIR/debug.log; \
                   cp $DATA_DIR/debug.log $LOG_DIR/debug-assumeutxo-{COMMIT}-dbcache-$DBCACHE-$(date +%s).log" \
        "COMPILER=$CC DBCACHE=$DBCACHE ./build/bin/bitcoin-cli -datadir=$DATA_DIR -rpcclienttimeout=0 loadtxoutset $UTXO_SNAPSHOT_PATH"; \
    done
    
    745eb053a4 Merge bitcoin-core/gui#901: Add createwallet, createwalletdescriptor, and migratewallet to history filter
    5cb93dad7c leveldb: remove bloom filters from leveldb
    
    Benchmark 1: COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
      Time (mean ± σ):     696.452 s ± 57.904 s    [User: 0.002 s, System: 0.001 s]
      Range (min … max):   655.482 s … 797.623 s    5 runs
    
    Benchmark 2: COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
      Time (mean ± σ):     628.999 s ± 37.939 s    [User: 0.002 s, System: 0.001 s]
      Range (min … max):   596.216 s … 673.440 s    5 runs
    
    Relative speed comparison
            1.11 ±  0.11  COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
            1.00          COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
    assumeutxo load | 880000 blocks | dbcache 4500 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD
    Benchmark 1: COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
      Time (mean ± σ):     674.430 s ± 37.704 s    [User: 0.001 s, System: 0.001 s]
      Range (min … max):   642.483 s … 734.178 s    5 runs
    
    Benchmark 2: COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
      Time (mean ± σ):     622.827 s ± 16.068 s    [User: 0.001 s, System: 0.002 s]
      Range (min … max):   610.489 s … 650.770 s    5 runs
    
    Relative speed comparison
            1.08 ±  0.07  COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
            1.00          COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
    assumeutxo load | 880000 blocks | dbcache 45000 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD
    Benchmark 1: COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
      Time (mean ± σ):     484.569 s ± 16.260 s    [User: 0.001 s, System: 0.002 s]
      Range (min … max):   469.979 s … 507.771 s    5 runs
    
    Benchmark 2: COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
      Time (mean ± σ):     482.040 s ± 12.817 s    [User: 0.002 s, System: 0.001 s]
      Range (min … max):   465.205 s … 500.719 s    5 runs
    
    Relative speed comparison
            1.01 ±  0.04  COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
            1.00          COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
    

    </details>
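The "Relative speed comparison" figures above can be reproduced from the reported means and standard deviations with first-order error propagation for a ratio of two independent measurements. The sketch below is an illustration (the names `RelativeSpeed` and `CompareMeans` are hypothetical), not hyperfine's source code; it merely appears consistent with the printed values.

```cpp
#include <cassert>
#include <cmath>

// Ratio of two benchmark means and its first-order propagated standard deviation:
//   r = mean_a / mean_b,  sigma_r = r * sqrt((sd_a / mean_a)^2 + (sd_b / mean_b)^2)
struct RelativeSpeed {
    double ratio;
    double stddev;
};

RelativeSpeed CompareMeans(double mean_a, double sd_a, double mean_b, double sd_b)
{
    const double ratio = mean_a / mean_b;
    const double rel_a = sd_a / mean_a; // relative uncertainty of the baseline
    const double rel_b = sd_b / mean_b; // relative uncertainty of the candidate
    return {ratio, ratio * std::sqrt(rel_a * rel_a + rel_b * rel_b)};
}
```

For example, `CompareMeans(696.452, 57.904, 628.999, 37.939)` gives a ratio of about 1.107 and a standard deviation of about 0.114, which round to the 1.11 ± 0.11 reported for dbcache=450. At that setting the propagated uncertainty is about as large as the measured improvement.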

    <img width="1477" height="802" alt="image" src="https://github.com/user-attachments/assets/e1c7720b-59f3-4dda-80fd-f7eeb1066f41" /> (note: image will be moved to a comment later)

    For reference, here is how the change affects reindex-chainstate per 100k block chunk: <img width="4500" height="3000" alt="block_chunk_comparison" src="https://github.com/user-attachments/assets/b51b9224-ad4f-4f5d-899d-df3d32c75d69" /> (note: image will be moved to a comment later)

    Persisted Size

    UTXO set size depends on LevelDB compaction scheduling. To stabilize measurements, we instrumented the code to compact after every block connection, which exposes the exact effect of the bloom filters on the number of LevelDB files and their total size. This instrumentation is for on-disk size measurement only, not for performance.

    <details> <summary>compact after each block connection for stable size</summary>

    log index directory stats for every update tip

    From 3e5414c6ef6f4cefbb0ad49d3c164823850e42b2 Mon Sep 17 00:00:00 2001
    From: =?UTF-8?q?L=C5=91rinc?= <pap.lorinc@gmail.com>
    Date: Wed, 29 Oct 2025 08:53:07 +0100
    Subject: [PATCH] log index directory stats for every update tip
    
    ---
     src/validation.cpp | 33 +++++++++++++++++++++++++++++++--
     1 file changed, 31 insertions(+), 2 deletions(-)
    
    diff --git a/src/validation.cpp b/src/validation.cpp
    index af523b06d74e4..cc68023b1e7dc 100644
    --- a/src/validation.cpp
    +++ b/src/validation.cpp
    @@ -68,11 +68,13 @@
     #include <cassert>
     #include <chrono>
     #include <deque>
    +#include <filesystem>
     #include <numeric>
     #include <optional>
     #include <ranges>
     #include <span>
     #include <string>
    +#include <thread>
     #include <tuple>
     #include <utility>
    
    @@ -2942,6 +2944,27 @@ void Chainstate::PruneAndFlush()
         }
     }
    
    +static std::pair<size_t, size_t> GetDirectoryStats(const fs::path& dir_path)
    +{
    +    assert(fs::exists(dir_path) && fs::is_directory(dir_path));
    +    for (int attempts{0}; attempts < 100; ++attempts) {
    +        try {
    +            size_t file_count{0}, total_bytes{0};
    +            for (const auto& entry : fs::recursive_directory_iterator(dir_path)) {
    +                if (entry.is_regular_file()) {
    +                    ++file_count;
    +                    total_bytes += entry.file_size();
    +                }
    +            }
    +            return {file_count, total_bytes};
    +        } catch (const fs::filesystem_error&) {
    +            // can fail during compaction
    +            std::this_thread::sleep_for(std::chrono::seconds(5));
    +        }
    +    }
    +    std::terminate();
    +}
    +
     static void UpdateTipLog(
         const ChainstateManager& chainman,
         const CCoinsViewCache& coins_tip,
    @@ -2953,8 +2976,12 @@ static void UpdateTipLog(
    
         AssertLockHeld(::cs_main);
    
    -    // Disable rate limiting in LogPrintLevel_ so this source location may log during IBD.
    -    LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Info, /*should_ratelimit=*/false, "%s%s: new best=%s height=%d version=0x%08x log2_work=%f tx=%lu date='%s' progress=%f cache=%.1fMiB(%utxo)%s\n",
    +    const fs::path datadir{"/mnt/my_storage/BitcoinData"}; // TODO shouldn't be hard-coded
    +    auto [chainstate_files, chainstate_bytes] = GetDirectoryStats(datadir / "chainstate");
    +    auto [index_files, index_bytes] = GetDirectoryStats(datadir / "blocks" / "index");
    +
    +    LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Info, /*should_ratelimit=*/false,
    +                   "%s%s: new best=%s height=%d version=0x%08x log2_work=%f tx=%lu date='%s' progress=%f cache=%.1fMiB(%utxo) chainstate=%zu files/%zu bytes index=%zu files/%zu bytes%s\n",
                        prefix, func_name,
                        tip->GetBlockHash().ToString(), tip->nHeight, tip->nVersion,
                        log(tip->nChainWork.getdouble()) / log(2.0), tip->m_chain_tx_count,
    @@ -2962,6 +2989,8 @@ static void UpdateTipLog(
                        chainman.GuessVerificationProgress(tip),
                        coins_tip.DynamicMemoryUsage() * (1.0 / (1 << 20)),
                        coins_tip.GetCacheSize(),
    +                   chainstate_files, chainstate_bytes,
    +                   index_files, index_bytes,
                        !warning_messages.empty() ? strprintf(" warning='%s'", warning_messages) : "");
     }
    
    
    From 76d866de450e30bd60edddd221a64266fb6488da Mon Sep 17 00:00:00 2001
    From: =?UTF-8?q?L=C5=91rinc?= <pap.lorinc@gmail.com>
    Date: Tue, 4 Nov 2025 18:07:59 +0100
    Subject: [PATCH] compact after each block connection
    
    ---
     src/dbwrapper.cpp  | 7 ++++++-
     src/dbwrapper.h    | 2 ++
     src/txdb.h         | 1 +
     src/validation.cpp | 2 ++
     4 files changed, 11 insertions(+), 1 deletion(-)
    
    diff --git a/src/dbwrapper.cpp b/src/dbwrapper.cpp
    index fe5f9cb0893d7..8e2be54f35fd3 100644
    --- a/src/dbwrapper.cpp
    +++ b/src/dbwrapper.cpp
    @@ -245,7 +245,7 @@ CDBWrapper::CDBWrapper(const DBParams& params)
    
         if (params.options.force_compact) {
             LogInfo("Starting database compaction of %s", fs::PathToString(params.path));
    -        DBContext().pdb->CompactRange(nullptr, nullptr);
    +        CompactFull();
             LogInfo("Finished database compaction of %s", fs::PathToString(params.path));
         }
    
    @@ -348,6 +348,11 @@ bool CDBWrapper::IsEmpty()
         return !(it->Valid());
     }
    
    +void CDBWrapper::CompactFull()
    +{
    +    DBContext().pdb->CompactRange(nullptr, nullptr);
    +}
    +
     struct CDBIterator::IteratorImpl {
         const std::unique_ptr<leveldb::Iterator> iter;
    
    diff --git a/src/dbwrapper.h b/src/dbwrapper.h
    index b9b98bd96ade3..8aba4feb08e6c 100644
    --- a/src/dbwrapper.h
    +++ b/src/dbwrapper.h
    @@ -284,6 +284,8 @@ class CDBWrapper
             ssKey2 << key_end;
             return EstimateSizeImpl(ssKey1, ssKey2);
         }
    +
    +    void CompactFull();
     };
    
     #endif // BITCOIN_DBWRAPPER_H
    diff --git a/src/txdb.h b/src/txdb.h
    index ea0cf9d77e596..394993fa5264a 100644
    --- a/src/txdb.h
    +++ b/src/txdb.h
    @@ -56,6 +56,7 @@ class CCoinsViewDB final : public CCoinsView
    
    //! @returns filesystem path to on-disk storage or std::nullopt if in memory.
         std::optional<fs::path> StoragePath() { return m_db->StoragePath(); }
    +    void CompactFull() const { m_db->CompactFull(); }
     };
    
     #endif // BITCOIN_TXDB_H
    diff --git a/src/validation.cpp b/src/validation.cpp
    index cc68023b1e7dc..b6e796d539357 100644
    --- a/src/validation.cpp
    +++ b/src/validation.cpp
    @@ -3027,6 +3027,8 @@ void Chainstate::UpdateTip(const CBlockIndex* pindexNew)
                 }
             }
         }
    +    m_blockman.m_block_tree_db->CompactFull();
    +    this->CoinsDB().CompactFull();
         UpdateTipLog(m_chainman, coins_tip, pindexNew, __func__, "",
                      util::Join(warning_messages, Untranslated(", ")).original);
     }
    

    </details>

    Running a before/after reindex-chainstate and plotting the on-disk size of the chainstate index for every block shows that the PR reduces the UTXO index by roughly 222MB (2%).
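As a rough cross-check of where a ~2% saving could come from, a Bloom filter costs about `bits_per_key / 8` bytes per key, and LevelDB's built-in Bloom policy picks roughly `k = 0.69 * bits_per_key` probes (0.69 ≈ ln 2). The sketch assumes 10 bits per key (the value commonly passed to `leveldb::NewBloomFilterPolicy`; treat this as an assumption) and ignores per-filter-block overhead. The names `FilterEstimate` and `EstimateBloomFilter` are hypothetical. This is back-of-the-envelope arithmetic, not a measurement.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Back-of-the-envelope Bloom filter cost model (assumptions, not measurements).
struct FilterEstimate {
    std::uint64_t bytes;        // approximate filter bytes stored on disk
    int probes;                 // number of hash probes k
    double false_positive_rate; // expected rate for keys that are not present
};

FilterEstimate EstimateBloomFilter(std::uint64_t num_keys, int bits_per_key)
{
    // LevelDB-style choice of k ~= bits_per_key * ln(2), truncated and clamped to [1, 30].
    int k = static_cast<int>(bits_per_key * 0.69);
    if (k < 1) k = 1;
    if (k > 30) k = 30;
    const std::uint64_t bits = num_keys * static_cast<std::uint64_t>(bits_per_key);
    // Classic approximation (1 - e^(-k * n / m))^k, where m / n = bits_per_key.
    const double fpr = std::pow(1.0 - std::exp(-static_cast<double>(k) / bits_per_key), k);
    return {(bits + 7) / 8, k, fpr};
}
```

For a hypothetical 100 million keys, `EstimateBloomFilter(100'000'000, 10).bytes` is 125,000,000 bytes (about 125 MB) with k = 6 and a false-positive rate of roughly 0.84%, the same order of magnitude as the ~222MB reduction reported above.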

    <details> <summary>instrumented benchmark patch</summary>

    COMMITS="76d866de450e30bd60edddd221a64266fb6488da fd69291daff5cee0763023203a24d52cd7aab183"; \
    STOP=921129; DBCACHE=450; \
    CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
    (echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
    hyperfine \
      --sort command \
      --runs 1 \
      --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
        cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
        ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
      --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
                  cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
      "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"
    
    76d866de45 compact after each block connection
    fd69291daf leveldb: remove bloom filters from leveldb
    
    reindex-chainstate | 921129 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 76d866de450e30bd60edddd221a64266fb6488da)
      Time (abs ≡):        56611.911 s               [User: 45475.199 s, System: 5006.131 s]
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fd69291daff5cee0763023203a24d52cd7aab183)
      Time (abs ≡):        53522.995 s               [User: 41015.776 s, System: 4894.659 s]
    
    Relative speed comparison
            1.06          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 76d866de450e30bd60edddd221a64266fb6488da)
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fd69291daff5cee0763023203a24d52cd7aab183)
    

    </details>

    <img width="4500" height="3600" alt="chainstate_growth" src="https://github.com/user-attachments/assets/e3196d2c-0155-463a-a7e5-db42b6fe0844" /> (note: image will be moved to a comment later)

    Full validation

    To help with reproducibility, the first commit (removing the LevelDB bloom filters) introduces a slight regression on its own, which demonstrates the need for the second commit (burying BIP30 behind assumevalid).

    With BIP30 checks still active and without LevelDB bloom filters, the first 230k blocks validate ~7% slower.

    <details> <summary>7% slower reindex-chainstate | 230000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD</summary>

    COMMITS="2b9c3511986bb2f55310dd5fe7b6367fcc63e44e 166d35713cf61986bb4b37283cb8b001ad013771"; \
    STOP=230000; DBCACHE=450; \
    CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
    (echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
    hyperfine \
      --sort command \
      --runs 2 \
      --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
        cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
        ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
      --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
                  cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
      "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"
    
    2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
    166d35713c leveldb: remove bloom filters from leveldb
    
    reindex-chainstate | 230000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
      Time (mean ± σ):     170.615 s ±  0.468 s    [User: 186.278 s, System: 10.035 s]
      Range (min … max):   170.285 s … 170.946 s    2 runs
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 166d35713cf61986bb4b37283cb8b001ad013771)
      Time (mean ± σ):     181.904 s ±  0.534 s    [User: 196.567 s, System: 10.482 s]
      Range (min … max):   181.526 s … 182.281 s    2 runs
    
    Relative speed comparison
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
            1.07 ±  0.00  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 166d35713cf61986bb4b37283cb8b001ad013771)
    

    </details>

    With BIP30 buried behind assumevalid and without LevelDB bloom filters, the first 230k blocks validate ~33% faster.

    <details> <summary>33% faster reindex-chainstate | 230000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD</summary>

    COMMITS="2b9c3511986bb2f55310dd5fe7b6367fcc63e44e 060a83df97a84e39a44a7f4a8ea27512d2e7b008"; \
    STOP=230000; DBCACHE=450; \
    CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
    (echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
    hyperfine \
      --sort command \
      --runs 2 \
      --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
        cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
        ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
      --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
                  cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
      "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"
    
    2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
    060a83df97 validation: bury bip30 checks behind assumevalid
    
    reindex-chainstate | 230000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
      Time (mean ± σ):     170.827 s ±  0.718 s    [User: 186.351 s, System: 10.223 s]
      Range (min … max):   170.319 s … 171.334 s    2 runs
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 060a83df97a84e39a44a7f4a8ea27512d2e7b008)
      Time (mean ± σ):     128.569 s ±  0.168 s    [User: 143.057 s, System: 10.436 s]
      Range (min … max):   128.449 s … 128.688 s    2 runs
    
    Relative speed comparison
            1.33 ±  0.01  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 060a83df97a84e39a44a7f4a8ea27512d2e7b008)
    

    </details>

  2. leveldb: remove bloom filters from LevelDB
    LevelDB's `FilterPolicy` stores a per-table probabilistic filter to optimize for negative lookups.
    Outside the BIP30/BIP34 window (first ~230k blocks), validation does not deliberately probe the UTXO set for missing entries (missing coins imply invalid transactions).
    Filters therefore slow the common case (present-key lookups) while adding a probabilistic structure to on-disk tables.
    
    Bloom filters were introduced in the Ultraprune PR (#1677) without explicit documentation of their purpose. Removing them slightly speeds up present-key workloads (~11% faster AssumeUTXO load) and reduces the on-disk chainstate size by ~2% because filter blocks are not stored.
    
    This commit is placed before burying BIP30 behind assumevalid to make performance changes reproducible in isolation.
    
    Benchmarking reindex-chainstate for the first 230k blocks (to quantify the cost of negative lookups without filters) shows only a small slowdown on misses, totaling ~11 seconds (~7%), while later blocks can be faster due to optimizing for the common case.
    
    -----
    
    2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
    166d35713c leveldb: remove bloom filters from leveldb
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
      Time (mean ± σ):     170.615 s ±  0.468 s    [User: 186.278 s, System: 10.035 s]
      Range (min … max):   170.285 s … 170.946 s    2 runs
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 166d35713cf61986bb4b37283cb8b001ad013771)
      Time (mean ± σ):     181.904 s ±  0.534 s    [User: 196.567 s, System: 10.482 s]
      Range (min … max):   181.526 s … 182.281 s    2 runs
    e02714eb10
  3. validation: bury BIP30 checks behind assumevalid
    BIP30 prevents duplicate transaction IDs by checking whether outputs already exist in the UTXO set before adding them. This applies to blocks <227,930 (pre-BIP34 activation) and is conservatively re-enforced after height 1,983,701.
    
    BIP30 checks are the only place in validation where we intentionally query the UTXO database for entries we expect not to find. For blocks prior to the `assumevalid` anchor, we already skip script verification and other checks, relying on accumulated proof of work. Skipping BIP30 for those deeply buried blocks is consistent with assumevalid's purpose.
    
    This removes negative UTXO lookups during IBD when `assumevalid` is used. Nodes syncing from genesis with -assumevalid=0 still perform full BIP30 validation. Checks beyond height 1,983,701 remain enforced regardless of `fScriptChecks`.
    
    -----
    
    2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
    060a83df97 validation: bury bip30 checks behind assumevalid
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
      Time (mean ± σ):     170.827 s ±  0.718 s    [User: 186.351 s, System: 10.223 s]
      Range (min … max):   170.319 s … 171.334 s    2 runs
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 060a83df97a84e39a44a7f4a8ea27512d2e7b008)
      Time (mean ± σ):     128.569 s ±  0.168 s    [User: 143.057 s, System: 10.436 s]
      Range (min … max):   128.449 s … 128.688 s    2 runs
    d6ce0ee916
  4. DrahtBot added the label Validation on Nov 7, 2025
  5. DrahtBot commented at 12:43 PM on November 7, 2025: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33817.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34004 (Implementation of SwiftSync by rustaceanrob)
    • #32317 (kernel: Separate UTXO set access from validation functions by sedited)
    • #30214 (refactor: Improve assumeutxo state representation by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  6. maflcko commented at 3:29 PM on November 7, 2025: member

    Skipping BIP30 for those deeply buried blocks

    Not sure about this. Wouldn't this mean someone can feed a -nominimumchainwork node a bogus chain, so that the node crashes or is stuck irrecoverably on the bogus chain?

    Even if it wasn't, I am not sure if touching validation.cpp is worth it for basically a rounding error on overall IBD speed?

  7. l0rinc commented at 3:34 PM on November 7, 2025: contributor

    I am not sure if touching validation.cpp is worth it

    It's not about IBD necessarily, but reduced disk footprint and adjusting the database to resemble the usage more closely:

    Removing the LevelDB bloom filters slightly speeds up present-key workloads (~11% faster AssumeUTXO load) and reduces the on-disk chainstate size by ~2% because filter blocks are not stored.

  8. gmaxwell commented at 9:51 PM on November 7, 2025: contributor

    Hm. I don't know we were aware that you could turn off the filters in leveldb-- I thought they were used to also decide what level an entry might be in!

    Have you tried to characterize if this opens up any DOS attacks with unconfirmed transactions?

    I think assumeutxo load time is not the best benchmark for this-- it's a one time operation and already pretty fast. It would be more compelling if it could be shown to reduce IBD time or block validation time at tip-- though the validation time graph you've provided isn't very compelling.

    Nor do I think 2% storage is particularly compelling. But if there isn't a potential downside, why not?

  9. l0rinc commented at 6:49 PM on November 9, 2025: contributor

    Thanks for the comments! I'm still testing how this combines with other LevelDB options, and how it integrates with other changes such as #31132, which could benefit from faster reads (I'm getting mixed results on different systems for now), and how much memory is saved by skipping the filters (these are all really slow to measure reliably). In the meantime please keep the conceptual reviews coming, appreciate the high-level context.

  10. DrahtBot commented at 3:28 PM on December 16, 2025: contributor

    <!--cf906140f33d8803c4a75a2196329ecb-->

    🐙 This pull request conflicts with the target branch and needs rebase.

  11. DrahtBot added the label Needs rebase on Dec 16, 2025
  12. DrahtBot commented at 1:14 AM on March 15, 2026: contributor

    <!--13523179cfe9479db18ec6c5d236f789-->

    ⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it with one of the labels 'Up for grabs' or 'Insufficient Review', so that it can be picked up in the future.
  13. l0rinc commented at 2:57 PM on April 7, 2026: contributor

    Closing for lack of interest and lack of obvious results; I was getting mixed measurements.

  14. l0rinc closed this on Apr 7, 2026


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-30 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me