validation: reduce persisted UTXO set size by prioritizing positive lookups (RFC) #33817

pull l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:l0rinc/bip30-bloom-filter-removal changing 2 files +4 −4
  1. l0rinc commented at 12:43 pm on November 7, 2025: contributor

    draft to gather comments and conceptual reviews


    Context

    BIP30 prevents duplicate transaction IDs by checking whether outputs already exist in the UTXO set before adding them. LevelDB’s FilterPolicy stores a per-table probabilistic filter to optimize for negative lookups.

    After the first ~230k blocks (BIP30/BIP34 windows), validation does not deliberately probe the UTXO set for missing entries (missing coins imply invalid transactions). Bloom filters therefore slow the common case (present-key lookups) while bloating the on-disk tables.
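
    The trade-off described above can be sketched with a toy Bloom filter. This is only an illustration, not LevelDB's `FilterPolicy` implementation: the `ToyBloomFilter` class, its bit-array size, and the double-hashing scheme built on `std::hash` are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy Bloom filter (hypothetical, NOT LevelDB's code): k probes into a bit array.
class ToyBloomFilter
{
public:
    ToyBloomFilter(size_t num_bits, int num_probes)
        : m_bits(num_bits, false), m_num_probes(num_probes)
    {
        assert(num_bits > 0 && num_probes > 0);
    }

    // Set every bit that the key's probes map to.
    void Add(const std::string& key)
    {
        for (int i = 0; i < m_num_probes; ++i) m_bits[Index(key, i)] = true;
    }

    // false => the key is definitely absent, so the table read can be skipped.
    // true  => the key may be present, so the table still has to be read.
    bool MayContain(const std::string& key) const
    {
        for (int i = 0; i < m_num_probes; ++i) {
            if (!m_bits[Index(key, i)]) return false;
        }
        return true;
    }

private:
    // Double hashing: derive the i-th probe position from two base hashes.
    size_t Index(const std::string& key, int probe) const
    {
        const size_t h1 = std::hash<std::string>{}(key);
        const size_t h2 = std::hash<std::string>{}(key + "#") | 1;
        return (h1 + static_cast<size_t>(probe) * h2) % m_bits.size();
    }

    std::vector<bool> m_bits;
    int m_num_probes;
};
```

    For a key that was added, `MayContain` always returns true, so the probe is extra work before the unavoidable table read. Only lookups for absent keys can be answered by the filter alone, which is the case validation rarely exercises outside the BIP30 window.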

    History

    Bloom filters were introduced in the Ultraprune PR (#1677) without explicit documentation of their purpose.

    Fix

    For blocks prior to the assumevalid anchor, we already skip script verification, relying on accumulated proof of work. Skipping BIP30 for those deeply buried blocks is consistent with assumevalid’s purpose (especially after the recent checkpoint removal).

    Removing the LevelDB bloom filters slightly speeds up present-key workloads (~11% faster AssumeUTXO load) and reduces the on-disk chainstate size by ~2% because filter blocks are not stored.

    Disclaimer

    Nodes syncing from genesis with -assumevalid=0 still perform full BIP30 validation, which may be a few seconds slower. Checks beyond 1,983,701 remain enforced regardless of fScriptChecks.

    Performance

    The performance change is best demonstrated by an AssumeUTXO load, since this change was mostly motivated by UTXO set size and memory reduction.

    AssumeUTXO loads with default dbcache show ~11% faster bootstrapping.

    COMMITS="745eb053a41c487cc10f20644c65dc8455cf8974 5cb93dad7c06db82642169d8f7d07442d215f49c"; \
    CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/ShallowBitcoinData"; LOG_DIR="$BASE_DIR/logs"; UTXO_SNAPSHOT_PATH="$BASE_DIR/utxo-880000.dat"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done; echo "") && \
    for DBCACHE in 450 4500 45000; do \
      (echo "assumeutxo load | 880000 blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)";) &&\
      hyperfine \
      --sort command \
      --runs 5 \
      --export-json "$BASE_DIR/assumeutxo-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$DBCACHE-$CC-$(date +%s).json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall bitcoind 2>/dev/null; rm -rf $DATA_DIR/blocks $DATA_DIR/chainstate $DATA_DIR/chainstate_snapshot $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
                 cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind bitcoin-cli -j2 && \
                 ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 20 && \
                 ./build/bin/bitcoind -datadir=$DATA_DIR -daemon -blocksonly -connect=0 -dbcache=$DBCACHE -printtoconsole=0; sleep 20" \
       --conclude "build/bin/bitcoin-cli -datadir=$DATA_DIR stop || true; killall bitcoind || true; sleep 10; \
                   echo '{COMMIT} | dbcache=$DBCACHE | chainstate: $(find $DATA_DIR/chainstate_snapshot -type f 2>/dev/null | wc -l) files, $(du -sb $DATA_DIR/chainstate_snapshot 2>/dev/null | cut -f1) bytes' >> $DATA_DIR/debug.log; \
                   cp $DATA_DIR/debug.log $LOG_DIR/debug-assumeutxo-{COMMIT}-dbcache-$DBCACHE-$(date +%s).log" \
        "COMPILER=$CC DBCACHE=$DBCACHE ./build/bin/bitcoin-cli -datadir=$DATA_DIR -rpcclienttimeout=0 loadtxoutset $UTXO_SNAPSHOT_PATH"; \
    done

    745eb053a4 Merge bitcoin-core/gui#901: Add createwallet, createwalletdescriptor, and migratewallet to history filter
    5cb93dad7c leveldb: remove bloom filters from leveldb

    Benchmark 1: COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
      Time (mean ± σ):     696.452 s ± 57.904 s    [User: 0.002 s, System: 0.001 s]
      Range (min … max):   655.482 s … 797.623 s    5 runs

    Benchmark 2: COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
      Time (mean ± σ):     628.999 s ± 37.939 s    [User: 0.002 s, System: 0.001 s]
      Range (min … max):   596.216 s … 673.440 s    5 runs

    Relative speed comparison
            1.11 ±  0.11  COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
            1.00          COMPILER=gcc DBCACHE=450 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
    assumeutxo load | 880000 blocks | dbcache 4500 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD
    Benchmark 1: COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
      Time (mean ± σ):     674.430 s ± 37.704 s    [User: 0.001 s, System: 0.001 s]
      Range (min … max):   642.483 s … 734.178 s    5 runs

    Benchmark 2: COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
      Time (mean ± σ):     622.827 s ± 16.068 s    [User: 0.001 s, System: 0.002 s]
      Range (min … max):   610.489 s … 650.770 s    5 runs

    Relative speed comparison
            1.08 ±  0.07  COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
            1.00          COMPILER=gcc DBCACHE=4500 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
    assumeutxo load | 880000 blocks | dbcache 45000 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD
    Benchmark 1: COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
      Time (mean ± σ):     484.569 s ± 16.260 s    [User: 0.001 s, System: 0.002 s]
      Range (min … max):   469.979 s … 507.771 s    5 runs

    Benchmark 2: COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
      Time (mean ± σ):     482.040 s ± 12.817 s    [User: 0.002 s, System: 0.001 s]
      Range (min … max):   465.205 s … 500.719 s    5 runs

    Relative speed comparison
            1.01 ±  0.04  COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 745eb053a41c487cc10f20644c65dc8455cf8974)
            1.00          COMPILER=gcc DBCACHE=45000 ./build/bin/bitcoin-cli -datadir=/mnt/my_storage/ShallowBitcoinData -rpcclienttimeout=0 loadtxoutset /mnt/my_storage/utxo-880000.dat (COMMIT = 5cb93dad7c06db82642169d8f7d07442d215f49c)
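
    The "Relative speed comparison" rows can be re-derived from the means and standard deviations reported above. A minimal sketch, assuming the ± figure is the first-order propagation of the two relative deviations (an assumption that matches the numbers shown); `Timing`, `Ratio`, and `RelativeSpeed` are names invented for this example.

```cpp
#include <cassert>
#include <cmath>

struct Timing {
    double mean_s;   // mean wall-clock time in seconds
    double stddev_s; // standard deviation in seconds
};

struct Ratio {
    double value;       // slow.mean / fast.mean
    double uncertainty; // propagated standard deviation of the ratio
};

// How many times slower `slow` is than `fast`, with first-order error propagation:
// sigma_ratio = ratio * sqrt((sigma_slow/mean_slow)^2 + (sigma_fast/mean_fast)^2).
Ratio RelativeSpeed(const Timing& slow, const Timing& fast)
{
    assert(slow.mean_s > 0 && fast.mean_s > 0);
    const double value = slow.mean_s / fast.mean_s;
    const double rel_err = std::hypot(slow.stddev_s / slow.mean_s, fast.stddev_s / fast.mean_s);
    return {value, value * rel_err};
}
```

    With the dbcache=450 figures, `RelativeSpeed({696.452, 57.904}, {628.999, 37.939})` gives roughly 1.11 ± 0.11, so the uncertainty is as large as the measured gain on this HDD machine.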
    

    For reference, here is how the change affects reindex-chainstate per 100k block chunk: (note: image will be moved to a comment later)

    Persisted Size

    UTXO set size depends on LevelDB compaction scheduling. To stabilize measurements, we instrumented the code to compact after every block connection, so that the exact effect of the bloom filters on the number of LevelDB files and their total size is visible. This is for on-disk size measurement only, not for performance.

    log index directory stats for every update tip

    From 3e5414c6ef6f4cefbb0ad49d3c164823850e42b2 Mon Sep 17 00:00:00 2001
    From: =?UTF-8?q?L=C5=91rinc?= <pap.lorinc@gmail.com>
    Date: Wed, 29 Oct 2025 08:53:07 +0100
    Subject: [PATCH] log index directory stats for every update tip

    ---
     src/validation.cpp | 33 +++++++++++++++++++++++++++++++--
     1 file changed, 31 insertions(+), 2 deletions(-)

    diff --git a/src/validation.cpp b/src/validation.cpp
    index af523b06d74e4..cc68023b1e7dc 100644
    --- a/src/validation.cpp
    +++ b/src/validation.cpp
    @@ -68,11 +68,13 @@
     #include <cassert>
     #include <chrono>
     #include <deque>
    +#include <filesystem>
     #include <numeric>
     #include <optional>
     #include <ranges>
     #include <span>
     #include <string>
    +#include <thread>
     #include <tuple>
     #include <utility>

    @@ -2942,6 +2944,27 @@ void Chainstate::PruneAndFlush()
         }
     }

    +static std::pair<size_t, size_t> GetDirectoryStats(const fs::path& dir_path)
    +{
    +    assert(fs::exists(dir_path) && fs::is_directory(dir_path));
    +    for (int attempts{0}; attempts < 100; ++attempts) {
    +        try {
    +            size_t file_count{0}, total_bytes{0};
    +            for (const auto& entry : fs::recursive_directory_iterator(dir_path)) {
    +                if (entry.is_regular_file()) {
    +                    ++file_count;
    +                    total_bytes += entry.file_size();
    +                }
    +            }
    +            return {file_count, total_bytes};
    +        } catch (const fs::filesystem_error&) {
    +            // can fail during compaction
    +            std::this_thread::sleep_for(std::chrono::seconds(5));
    +        }
    +    }
    +    std::terminate();
    +}
    +
     static void UpdateTipLog(
         const ChainstateManager& chainman,
         const CCoinsViewCache& coins_tip,
    @@ -2953,8 +2976,12 @@ static void UpdateTipLog(

         AssertLockHeld(::cs_main);

    -    // Disable rate limiting in LogPrintLevel_ so this source location may log during IBD.
    -    LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Info, /*should_ratelimit=*/false, "%s%s: new best=%s height=%d version=0x%08x log2_work=%f tx=%lu date='%s' progress=%f cache=%.1fMiB(%utxo)%s\n",
    +    const fs::path datadir{"/mnt/my_storage/BitcoinData"}; // TODO shouldn't be hard-coded
    +    auto [chainstate_files, chainstate_bytes] = GetDirectoryStats(datadir / "chainstate");
    +    auto [index_files, index_bytes] = GetDirectoryStats(datadir / "blocks" / "index");
    +
    +    LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Info, /*should_ratelimit=*/false,
    +                   "%s%s: new best=%s height=%d version=0x%08x log2_work=%f tx=%lu date='%s' progress=%f cache=%.1fMiB(%utxo) chainstate=%zu files/%zu bytes index=%zu files/%zu bytes%s\n",
                        prefix, func_name,
                        tip->GetBlockHash().ToString(), tip->nHeight, tip->nVersion,
                        log(tip->nChainWork.getdouble()) / log(2.0), tip->m_chain_tx_count,
    @@ -2962,6 +2989,8 @@ static void UpdateTipLog(
                        chainman.GuessVerificationProgress(tip),
                        coins_tip.DynamicMemoryUsage() * (1.0 / (1 << 20)),
                        coins_tip.GetCacheSize(),
    +                   chainstate_files, chainstate_bytes,
    +                   index_files, index_bytes,
                        !warning_messages.empty() ? strprintf(" warning='%s'", warning_messages) : "");
     }
    
    From 76d866de450e30bd60edddd221a64266fb6488da Mon Sep 17 00:00:00 2001
    From: =?UTF-8?q?L=C5=91rinc?= <pap.lorinc@gmail.com>
    Date: Tue, 4 Nov 2025 18:07:59 +0100
    Subject: [PATCH] compact after each block connection

    ---
     src/dbwrapper.cpp  | 7 ++++++-
     src/dbwrapper.h    | 2 ++
     src/txdb.h         | 1 +
     src/validation.cpp | 2 ++
     4 files changed, 11 insertions(+), 1 deletion(-)

    diff --git a/src/dbwrapper.cpp b/src/dbwrapper.cpp
    index fe5f9cb0893d7..8e2be54f35fd3 100644
    --- a/src/dbwrapper.cpp
    +++ b/src/dbwrapper.cpp
    @@ -245,7 +245,7 @@ CDBWrapper::CDBWrapper(const DBParams& params)

         if (params.options.force_compact) {
             LogInfo("Starting database compaction of %s", fs::PathToString(params.path));
    -        DBContext().pdb->CompactRange(nullptr, nullptr);
    +        CompactFull();
             LogInfo("Finished database compaction of %s", fs::PathToString(params.path));
         }

    @@ -348,6 +348,11 @@ bool CDBWrapper::IsEmpty()
         return !(it->Valid());
     }

    +void CDBWrapper::CompactFull()
    +{
    +    DBContext().pdb->CompactRange(nullptr, nullptr);
    +}
    +
     struct CDBIterator::IteratorImpl {
         const std::unique_ptr<leveldb::Iterator> iter;

    diff --git a/src/dbwrapper.h b/src/dbwrapper.h
    index b9b98bd96ade3..8aba4feb08e6c 100644
    --- a/src/dbwrapper.h
    +++ b/src/dbwrapper.h
    @@ -284,6 +284,8 @@ class CDBWrapper
             ssKey2 << key_end;
             return EstimateSizeImpl(ssKey1, ssKey2);
         }
    +
    +    void CompactFull();
     };

     #endif // BITCOIN_DBWRAPPER_H
    diff --git a/src/txdb.h b/src/txdb.h
    index ea0cf9d77e596..394993fa5264a 100644
    --- a/src/txdb.h
    +++ b/src/txdb.h
    @@ -56,6 +56,7 @@ class CCoinsViewDB final : public CCoinsView

         //! @returns filesystem path to on-disk storage or std::nullopt if in memory.
         std::optional<fs::path> StoragePath() { return m_db->StoragePath(); }
    +    void CompactFull() const { m_db->CompactFull(); }
     };

     #endif // BITCOIN_TXDB_H
    diff --git a/src/validation.cpp b/src/validation.cpp
    index cc68023b1e7dc..b6e796d539357 100644
    --- a/src/validation.cpp
    +++ b/src/validation.cpp
    @@ -3027,6 +3027,8 @@ void Chainstate::UpdateTip(const CBlockIndex* pindexNew)
                 }
             }
         }
    +    m_blockman.m_block_tree_db->CompactFull();
    +    this->CoinsDB().CompactFull();
         UpdateTipLog(m_chainman, coins_tip, pindexNew, __func__, "",
                      util::Join(warning_messages, Untranslated(", ")).original);
     }
    

    Running a before/after reindex-chainstate and plotting the on-disk size of the chainstate index for every block shows that the PR reduces the UTXO index by roughly 222MB (2%).

    COMMITS="76d866de450e30bd60edddd221a64266fb6488da fd69291daff5cee0763023203a24d52cd7aab183"; \
    STOP=921129; DBCACHE=450; \
    CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
    (echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
    hyperfine \
      --sort command \
      --runs 1 \
      --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
        cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
        ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
      --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
                  cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
      "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"

    76d866de45 compact after each block connection
    fd69291daf leveldb: remove bloom filters from leveldb

    reindex-chainstate | 921129 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD

    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 76d866de450e30bd60edddd221a64266fb6488da)
      Time (abs ≡):        56611.911 s               [User: 45475.199 s, System: 5006.131 s]
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fd69291daff5cee0763023203a24d52cd7aab183)
      Time (abs ≡):        53522.995 s               [User: 41015.776 s, System: 4894.659 s]

    Relative speed comparison
            1.06          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 76d866de450e30bd60edddd221a64266fb6488da)
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=921129 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fd69291daff5cee0763023203a24d52cd7aab183)
    

    Full validation

    To help with reproducibility, the first commit introduces a slight regression to demonstrate the need for the second commit.

    With BIP30 checks still active and without LevelDB bloom filters, the first 230k blocks validate ~7% slower.

    COMMITS="2b9c3511986bb2f55310dd5fe7b6367fcc63e44e 166d35713cf61986bb4b37283cb8b001ad013771"; \
    STOP=230000; DBCACHE=450; \
    CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
    (echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
    hyperfine \
      --sort command \
      --runs 2 \
      --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
        cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
        ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
      --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
                  cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
      "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"

    2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
    166d35713c leveldb: remove bloom filters from leveldb

    reindex-chainstate | 230000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD

    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
      Time (mean ± σ):     170.615 s ±  0.468 s    [User: 186.278 s, System: 10.035 s]
      Range (min … max):   170.285 s … 170.946 s    2 runs

    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 166d35713cf61986bb4b37283cb8b001ad013771)
      Time (mean ± σ):     181.904 s ±  0.534 s    [User: 196.567 s, System: 10.482 s]
      Range (min … max):   181.526 s … 182.281 s    2 runs

    Relative speed comparison
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
            1.07 ±  0.00  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 166d35713cf61986bb4b37283cb8b001ad013771)
    

    With BIP30 buried behind assumevalid and without LevelDB bloom filters, the first 230k blocks validate ~33% faster.

    COMMITS="2b9c3511986bb2f55310dd5fe7b6367fcc63e44e 060a83df97a84e39a44a7f4a8ea27512d2e7b008"; \
    STOP=230000; DBCACHE=450; \
    CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin $c && git log -1 --pretty='%h %s' $c || exit 1; done) && \
    (echo "" && echo "reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $BASE_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 0 && echo SSD || echo HDD)"; echo "") &&\
    hyperfine \
      --sort command \
      --runs 2 \
      --export-json "$BASE_DIR/rdx-$(sed -E 's/(\w{8})\w+ ?/\1-/g;s/-$//'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall bitcoind 2>/dev/null; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard && \
        cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_IPC=OFF && ninja -C build bitcoind -j2 && \
        ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20" \
      --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log; \
                  cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
      "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0"

    2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
    060a83df97 validation: bury bip30 checks behind assumevalid

    reindex-chainstate | 230000 blocks | dbcache 450 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | xfs | SSD

    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
      Time (mean ± σ):     170.827 s ±  0.718 s    [User: 186.351 s, System: 10.223 s]
      Range (min … max):   170.319 s … 171.334 s    2 runs

    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 060a83df97a84e39a44a7f4a8ea27512d2e7b008)
      Time (mean ± σ):     128.569 s ±  0.168 s    [User: 143.057 s, System: 10.436 s]
      Range (min … max):   128.449 s … 128.688 s    2 runs

    Relative speed comparison
            1.33 ±  0.01  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 060a83df97a84e39a44a7f4a8ea27512d2e7b008)
    
  2. leveldb: remove bloom filters from LevelDB
    LevelDB's `FilterPolicy` stores a per-table probabilistic filter to optimize for negative lookups.
    Outside the BIP30/BIP34 window (first ~230k blocks), validation does not deliberately probe the UTXO set for missing entries (missing coins imply invalid transactions).
    Filters therefore slow the common case (present-key lookups) while adding a probabilistic structure to on-disk tables.
    
    Bloom filters were introduced in the Ultraprune PR (#1677) without explicit documentation of their purpose. Removing them slightly speeds up present-key workloads (~11% faster AssumeUTXO load) and reduces the on-disk chainstate size by ~2% because filter blocks are not stored.
    
    This commit is placed before burying BIP30 behind assumevalid to make performance changes reproducible in isolation.
    
    Benchmarking reindex-chainstate for the first 230k blocks (to quantify the cost of negative lookups without filters) shows only a small slowdown on misses, totaling a few seconds, while later blocks can be faster due to optimizing for the common case.
    
    -----
    
    2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
    166d35713c leveldb: remove bloom filters from leveldb
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
      Time (mean ± σ):     170.615 s ±  0.468 s    [User: 186.278 s, System: 10.035 s]
      Range (min … max):   170.285 s … 170.946 s    2 runs
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 166d35713cf61986bb4b37283cb8b001ad013771)
      Time (mean ± σ):     181.904 s ±  0.534 s    [User: 196.567 s, System: 10.482 s]
      Range (min … max):   181.526 s … 182.281 s    2 runs
    e02714eb10
  3. validation: bury BIP30 checks behind assumevalid
    BIP30 prevents duplicate transaction IDs by checking whether outputs already exist in the UTXO set before adding them. This applies to blocks <227,930 (pre-BIP34 activation) and is conservatively re-enforced after height 1,983,701.
    
    BIP30 checks are the only place in validation where we intentionally query the UTXO database for entries we expect not to find. For blocks prior to the `assumevalid` anchor, we already skip script verification and other checks, relying on accumulated proof of work. Skipping BIP30 for those deeply buried blocks is consistent with assumevalid's purpose.
    
    This removes negative UTXO lookups during IBD when `assumevalid` is used. Nodes syncing from genesis with -assumevalid=0 still perform full BIP30 validation. Checks beyond 1,983,701 remain enforced regardless of `fScriptChecks`.
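
    The rule described in this commit message can be summarized as a small decision function. This is a hedged sketch, not the code in the PR: `ShouldCheckBip30` and both constant names are hypothetical, the heights are the ones quoted above, and the sketch omits any exceptions or chain-identity conditions that the real validation code applies.

```cpp
#include <cassert>

// Heights quoted in the commit message; the constant names are invented for this sketch.
constexpr int PRE_BIP34_LIMIT{227930};              // BIP30 applies to blocks below this height
constexpr int BIP30_REENFORCED_AFTER{1983701};      // BIP30 is re-enforced after this height

// Hypothetical helper: does a block at `height` get the BIP30 duplicate-output lookup?
// `script_checks` stands in for fScriptChecks, which is false for blocks buried under assumevalid.
bool ShouldCheckBip30(int height, bool script_checks)
{
    assert(height >= 0);
    if (height > BIP30_REENFORCED_AFTER) return true; // enforced regardless of fScriptChecks
    if (height >= PRE_BIP34_LIMIT) return false;      // outside the BIP30 window per the text
    return script_checks;                             // pre-BIP34: skipped only behind assumevalid
}
```

    Under these assumptions, a node started with -assumevalid=0 has `script_checks` set for every block, so it keeps the negative lookups for the whole pre-BIP34 range.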
    
    -----
    
    2b9c351198 Merge bitcoin/bitcoin#33768: refactor: remove dead branches in `SingletonClusterImpl`
    060a83df97 validation: bury bip30 checks behind assumevalid
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 2b9c3511986bb2f55310dd5fe7b6367fcc63e44e)
      Time (mean ± σ):     170.827 s ±  0.718 s    [User: 186.351 s, System: 10.223 s]
      Range (min … max):   170.319 s … 171.334 s    2 runs
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=230000 -dbcache=450 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 060a83df97a84e39a44a7f4a8ea27512d2e7b008)
      Time (mean ± σ):     128.569 s ±  0.168 s    [User: 143.057 s, System: 10.436 s]
      Range (min … max):   128.449 s … 128.688 s    2 runs
    d6ce0ee916
  4. DrahtBot added the label Validation on Nov 7, 2025
  5. DrahtBot commented at 12:43 pm on November 7, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33817.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  6. maflcko commented at 3:29 pm on November 7, 2025: member

    Skipping BIP30 for those deeply buried blocks

    Not sure about this. Wouldn’t this mean someone can feed a -nominimumchainwork node a bogus chain, so that the node crashes or is stuck irrecoverably on the bogus chain?

    Even if it wasn’t, I am not sure if touching validation.cpp is worth it for basically a rounding error on overall IBD speed?

  7. l0rinc commented at 3:34 pm on November 7, 2025: contributor

    I am not sure if touching validation.cpp is worth it

    It’s not about IBD necessarily, but reduced disk footprint and adjusting the database to resemble the usage more closely:

    Removing the LevelDB bloom filters slightly speeds up present-key workloads (~11% faster AssumeUTXO load) and reduces the on-disk chainstate size by ~2% because filter blocks are not stored.

  8. gmaxwell commented at 9:51 pm on November 7, 2025: contributor

    Hm. I don’t know we were aware that you could turn off the filters in leveldb– I thought they were used to also decide what level an entry might be in!

    Have you tried to characterize if this opens up any DOS attacks with unconfirmed transactions?

    I think assumeutxo load time is not the best benchmark for this– it’s a one time operation and already pretty fast. It would be more compelling if it could be shown to reduce IBD time or block validation time at tip– though the validation time graph you’ve provided isn’t very compelling.

    Nor do I think 2% storage is particularly compelling. But if there isn’t a potential downside, why not?

  9. l0rinc commented at 6:49 pm on November 9, 2025: contributor
    Thanks for the comments! I’m still testing how this combines with other LevelDB options, and how it integrates with other changes such as #31132, which could benefit from faster reads (I’m getting mixed results on different systems for now), and how much memory is saved by skipping the filters (these are all really slow to measure reliably). In the meantime please keep the conceptual reviews coming, appreciate the high-level context.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-11-09 21:13 UTC
