Revert compact block cache inefficiencies #33253
pull ajtowns wants to merge 3 commits into bitcoin:master from ajtowns:202508-cache-friendly-compactblock changing 9 files +166 −28-
ajtowns commented at 10:03 am on August 25, 2025: contributor
-
DrahtBot commented at 10:03 am on August 25, 2025: contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage & Benchmarks
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33253.
Reviews
See the guideline for information on the review process.
Type Reviewers ACK achow101, polespinasa, davidgumberg, cedwies, instagibbs Concept ACK l0rinc If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #33191 (net: Provide block templates to peers on request by ajtowns)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
-
DrahtBot added the label CI failed on Aug 25, 2025
-
DrahtBot commented at 10:29 am on August 25, 2025: contributor
🚧 At least one of the CI tasks failed. Task
ARM, unit tests, no functional tests
: https://github.com/bitcoin/bitcoin/runs/48805674537 LLM reason (✨ experimental): The CI failure is caused by a segmentation fault in the bench_sanity_check test.Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:
-
Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
-
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
-
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
-
-
ajtowns force-pushed on Aug 25, 2025
-
ajtowns force-pushed on Aug 25, 2025
-
ajtowns force-pushed on Aug 25, 2025
-
DrahtBot removed the label CI failed on Aug 25, 2025
-
fanquake requested review from instagibbs on Aug 25, 2025
-
instagibbs commented at 2:41 pm on August 25, 2025: memberrelated comment: #29752 (comment)
-
instagibbs commented at 2:59 pm on August 25, 2025: member
Seems like a03aef9cec35b0d03aa63d7e8093f0420cd4b40b is the problematic commit for my machine? Doubles the runtime.
Both reverted:
ns/op op/s err% total benchmark 1,955,234.00 511.45 2.1% 0.02 BlockEncoding
No reversions:
ns/op op/s err% total benchmark 3,945,310.00 253.47 3.0% 0.05 BlockEncoding
a8203e94123b6ea6e4f4a6320e3ad20457f44a28 reverted only:
ns/op op/s err% total benchmark 3,929,789.00 254.47 3.6% 0.05 BlockEncoding
a03aef9cec35b0d03aa63d7e8093f0420cd4b40b reverted only:
ns/op op/s err% total benchmark 1,964,505.00 509.03 2.8% 0.02 BlockEncoding
edit:
aj> instagibbs: re: 33253, the benchmark doesn’t exercise a8203e9 – it just passes an empty vector of extra transactions
-
jonatack commented at 3:53 pm on August 25, 2025: member
ARM64 M1 Max (2022), macOS 15.6.1
benchmark only
0| ns/op | op/s | err% | total | benchmark 1|--------------------:|--------------------:|--------:|----------:|:---------- 2| 2,643,000.00 | 378.36 | 1.5% | 0.03 | `BlockEncoding`
benchmark with reversions (the difference is from the last commit,
Revert "[refactor] rewrite vTxHashes as a vector of CTransactionRef
)0| ns/op | op/s | err% | total | benchmark 1|--------------------:|--------------------:|--------:|----------:|:---------- 2| 2,268,791.00 | 440.76 | 1.1% | 0.03 | `BlockEncoding`
-
in src/bench/blockencodings.cpp:74 in eb1920a82f outdated
69+ LOCK2(cs_main, pool.cs); 70+ 71+ { 72+ // a reasonably large mempool of 50k txs, ~10MB total 73+ std::vector<CTransactionRef> refs; 74+ for (size_t i = 0; i < 50000; ++i) {
l0rinc commented at 6:25 pm on August 25, 2025:nit: we might as well reserve50'000
herein src/bench/blockencodings.cpp:87 in eb1920a82f outdated
82+ refs.push_back(MakeTransactionRef(tx)); 83+ } 84+ 85+ // ensure mempool ordering is different to memory ordering of transactions, 86+ // to simulate a mempool that has changed over time 87+ std::shuffle(refs.begin(), refs.end(), rng);
l0rinc commented at 6:26 pm on August 25, 2025:nit:
std::ranges::shuffle
might be simpler:0 std::ranges::shuffle(refs, rng);
in src/bench/blockencodings.cpp:98 in eb1920a82f outdated
93+ 94+ BenchCBHAST cmpctblock{rng, 3000}; 95+ 96+ bench.run([&] { 97+ PartiallyDownloadedBlock pdb{&pool}; 98+ (void)pdb.InitData(cmpctblock, {});
l0rinc commented at 6:27 pm on August 25, 2025:we could consume the result to make sure the benchmark isn’t optimized away and that we’re measuring the success/failure scenario correctly
l0rinc commented at 8:32 pm on August 25, 2025:as mentioned by @instagibbs and @achow101, we could add two benches here, for empty extra and with some (partially overlapping) one, something like:
0 std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn; 1 // ... 2 extra_txn.reserve(extra_tx_count); 3 for (size_t i{0}; i < extra_tx_count; ++i) { 4 size_t index{i < (extra_tx_count / 2) ? mempool_size + i : mempool_size - i}; // some overlap with mempool 5 auto tx{CreateTestTransaction(sigspam, index)}; 6 extra_txn.emplace_back(tx->GetWitnessHash(), tx); 7 }
in src/bench/blockencodings.cpp:102 in eb1920a82f outdated
97+ PartiallyDownloadedBlock pdb{&pool}; 98+ (void)pdb.InitData(cmpctblock, {}); 99+ }); 100+} 101+ 102+BENCHMARK(BlockEncoding, benchmark::PriorityLevel::HIGH);
l0rinc commented at 6:28 pm on August 25, 2025:nit: newline at the very end is customary for the linter and formatter to not complainin src/bench/blockencodings.cpp:64 in eb1920a82f outdated
62+{ 63+ const auto testing_setup = MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN); 64+ CTxMemPool& pool = *Assert(testing_setup->m_node.mempool); 65+ std::array<std::byte,200> sigspam; 66+ sigspam.fill(std::byte(42)); 67+ InsecureRandomContext rng(11);
l0rinc commented at 6:32 pm on August 25, 2025:alternatively this may be slightly more descriptive than
11
and it could even be used to init thesigspam
:0 FastRandomContext rng{/*fDeterministic=*/true}; 1 auto sigspam{rng.randbytes<200>()};
davidgumberg commented at 9:38 pm on August 26, 2025:+1in src/bench/blockencodings.cpp:91 in eb1920a82f outdated
86+ // to simulate a mempool that has changed over time 87+ std::shuffle(refs.begin(), refs.end(), rng); 88+ 89+ for (size_t i = 0; i < refs.size(); ++i) { 90+ AddTx(refs[i], /*fee=*/refs[i]->vout[0].nValue, pool); 91+ }
l0rinc commented at 6:36 pm on August 25, 2025:nit: direct iteration would be slightly simpler:
0 for (auto& tx : refs) { 1 AddTx(tx, /*fee=*/tx->vout[0].nValue, pool); 2 }
in src/bench/blockencodings.cpp:82 in eb1920a82f outdated
77+ tx.vin[0].scriptSig = CScript() << sigspam << OP_1; 78+ tx.vin[0].scriptWitness.stack.push_back({1}); 79+ tx.vout.resize(1); 80+ tx.vout[0].scriptPubKey = CScript() << OP_1 << OP_EQUAL; 81+ tx.vout[0].nValue = i; 82+ refs.push_back(MakeTransactionRef(tx));
l0rinc commented at 6:40 pm on August 25, 2025:nit: for consistency we might as well useemplace_back
like we do inDummyBlock
in src/bench/blockencodings.cpp:100 in eb1920a82f outdated
89+ for (size_t i = 0; i < refs.size(); ++i) { 90+ AddTx(refs[i], /*fee=*/refs[i]->vout[0].nValue, pool); 91+ } 92+ } 93+ 94+ BenchCBHAST cmpctblock{rng, 3000};
l0rinc commented at 6:59 pm on August 25, 2025:nit:
0 BenchCBHAST cmpctblock{rng, /*txs=*/3'000};
in src/blockencodings.h:147 in f2b9cbed71 outdated
143@@ -144,8 +144,8 @@ class PartiallyDownloadedBlock { 144 145 explicit PartiallyDownloadedBlock(CTxMemPool* poolIn) : pool(poolIn) {} 146 147- // extra_txn is a list of extra orphan/conflicted/etc transactions to look at 148- ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn); 149+ // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
l0rinc commented at 7:06 pm on August 25, 2025:I redid the a8203e94123b6ea6e4f4a6320e3ad20457f44a28 cherry-pick, this line and the top comment are new additions (+ some conflicts were resolved slightly differently to how I have, but the result seems correct to me)in src/blockencodings.cpp:117 in decb48a607 outdated
113@@ -114,12 +114,12 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c 114 std::vector<bool> have_txn(txn_available.size()); 115 { 116 LOCK(pool->cs); 117- for (const auto& tx : pool->txns_randomized) { 118- uint64_t shortid = cmpctblock.GetShortID(tx->GetWitnessHash()); 119+ for (const auto& [wtxid, txit] : pool->txns_randomized) {
l0rinc commented at 7:16 pm on August 25, 2025:redid the a03aef9cec35b0d03aa63d7e8093f0420cd4b40b revert, after resolving the conflicts I ended up with a very similar change - except for some comments, somevTxHashes
vstxns_randomized
changes which weren’t revertedin src/bench/blockencodings.cpp:65 in decb48a607 outdated
60+ 61+static void BlockEncoding(benchmark::Bench& bench) 62+{ 63+ const auto testing_setup = MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN); 64+ CTxMemPool& pool = *Assert(testing_setup->m_node.mempool); 65+ std::array<std::byte,200> sigspam;
l0rinc commented at 8:32 pm on August 25, 2025:We could extract the constant used here to the front, like:
0 constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000};
in src/bench/blockencodings.cpp:54 in decb48a607 outdated
50+public: 51+ BenchCBHAST(InsecureRandomContext& rng, int txs) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64()) 52+ { 53+ shorttxids.reserve(txs); 54+ while (txs-- > 0) { 55+ shorttxids.push_back(rng.randbits<SHORTTXIDS_LENGTH*8>());
l0rinc commented at 8:52 pm on August 25, 2025:we’re lucky that this doesn’t generate any duplicates, but we could test those cases as well:
0struct BenchCBHAST : CBlockHeaderAndShortTxIDs 1{ 2 BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64()) 3 { 4 std::unordered_set<uint64_t> seen; 5 shorttxids.reserve(txs); 6 while (txs-- > 0) { 7 uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()}; 8 assert(seen.insert(id).second); 9 shorttxids.push_back(id); 10 } 11 if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0); 12 } 13};
and we could bench all 4 possibilities:
0for (const auto duplicates : {false, true}) { 1 BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates}; 2 for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) { 3 bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : "")) 4 .run([&] { 5 PartiallyDownloadedBlock pdb{&pool}; 6 const auto status{pdb.InitData(cmpctblock, extra)}; 7 assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK)); 8 }); 9 } 10}
resulting in something like:
ns/op op/s err% total benchmark 1,371,045.12 729.37 1.3% 5.36 reconstructions without extra
1,309,789.19 763.48 2.8% 5.35 reconstructions with extra
71,741.96 13,938.84 2.3% 5.42 reconstructions without extra (dup)
70,113.89 14,262.51 1.2% 5.50 reconstructions with extra (dup)
ajtowns commented at 3:56 am on August 26, 2025:This is just testing how quickly we can iterate through all the txs in the mempool when looking at a compact block – which is the worst case scenario that can still be fast (if we finish solving the block with transactions from the extra pool, and don’t need to go back to the network). Hitting duplicates automatically puts us in the slow resolve-via-network path, so benching that shouldn’t be terribly interesting.
l0rinc commented at 4:17 am on August 26, 2025:To make sure that’s what we’re testing, we should still assert the outcome ofInitData
wasREAD_STATUS_OK
- otherwise it would be possible that we accidentally generated duplicates here.
ajtowns commented at 5:37 pm on August 26, 2025:Added an assert and comment.in src/bench/blockencodings.cpp:1 in decb48a607 outdated
0@@ -0,0 +1,102 @@ 1+// Copyright (c) 2011-2022 The Bitcoin Core developers
l0rinc commented at 9:07 pm on August 25, 2025:nit:
0// Copyright (c) 2025-present The Bitcoin Core developers
in src/bench/blockencodings.cpp:16 in decb48a607 outdated
11+#include <script/script.h> 12+#include <sync.h> 13+#include <test/util/setup_common.h> 14+#include <test/util/txmempool.h> 15+#include <txmempool.h> 16+#include <univalue.h>
l0rinc commented at 9:07 pm on August 25, 2025:#include <univalue.h>
and#include <rpc/mempool.h>
don’t seem to be neededin src/bench/blockencodings.cpp:25 in decb48a607 outdated
21+ 22+ 23+static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs) 24+{ 25+ LockPoints lp; 26+ AddToMempool(pool, CTxMemPoolEntry(tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp));
l0rinc commented at 9:08 pm on August 25, 2025:CTxMemPoolEntry{}
would indicate slightly better that it’s a constructorl0rinc approvedl0rinc commented at 9:15 pm on August 25, 2025: contributorConcept ACK, excellent find.
I redid the benchmarks, got more modest changes with GCC and Clang:
0echo "> C++ compiler .......................... GNU $(gcc -dumpfullversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 4,386,276.06 227.98 0.2% 17,156,885.24 15,741,370.59 1.090 1,287,419.20 3.2% 5.40 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 4,373,623.58 228.64 0.2% 17,156,871.09 15,696,695.56 1.093 1,287,418.38 3.2% 5.41 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 4,380,919.51 228.26 0.2% 17,156,870.18 15,719,209.66 1.091 1,287,418.17 3.2% 5.40 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 3,168,586.91 315.60 0.1% 17,156,847.31 11,378,141.35 1.508 1,287,390.88 3.2% 5.29 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 3,241,851.00 308.47 0.2% 17,156,842.43 11,640,138.08 1.474 1,287,385.26 3.2% 5.29 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 3,173,445.04 315.11 0.1% 17,156,847.31 11,394,621.15 1.506 1,287,390.88 3.2% 5.28 BlockEncoding
i.e. 38% improvement with gcc
0echo "> C++ compiler .......................... clang $(clang -dumpversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 3,967,492.38 252.05 0.1% 15,494,632.58 14,241,688.80 1.088 1,070,986.04 4.0% 5.37 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 3,984,996.65 250.94 0.1% 15,494,608.15 14,303,936.25 1.083 1,070,973.80 4.0% 5.38 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 4,078,221.50 245.20 0.1% 15,494,616.62 14,638,662.82 1.058 1,070,976.86 4.0% 5.37 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 3,066,879.10 326.06 0.0% 15,497,664.10 11,011,637.17 1.407 1,070,979.09 4.0% 5.28 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 3,135,228.99 318.96 0.1% 15,497,665.21 11,253,673.49 1.377 1,070,979.81 4.0% 5.29 BlockEncoding
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 3,127,079.54 319.79 0.1% 15,497,665.71 11,223,561.21 1.381 1,070,979.87 4.0% 5.30 BlockEncoding
i.e. 29% improvement for clang
Similarly to other people before me, I think we could include a few more scenarios in the benchmark. I left a few suggestions explaining my suggestions, but it’s probably easier if I post the final benchmark code here as well:
0// Copyright (c) 2025-present The Bitcoin Core developers 1// Distributed under the MIT software license, see the accompanying 2// file COPYING or http://www.opensource.org/licenses/mit-license.php. 3 4#include <bench/bench.h> 5#include <blockencodings.h> 6#include <consensus/amount.h> 7#include <kernel/cs_main.h> 8#include <primitives/transaction.h> 9#include <script/script.h> 10#include <sync.h> 11#include <test/util/setup_common.h> 12#include <test/util/txmempool.h> 13#include <txmempool.h> 14#include <util/check.h> 15 16#include <memory> 17#include <unordered_set> 18#include <vector> 19#include <algorithm> 20 21static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs) 22{ 23 LockPoints lp; 24 AddToMempool(pool, CTxMemPoolEntry{tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp}); 25} 26 27namespace { 28CTransactionRef CreateTestTransaction(std::span<const std::byte> sigspam, size_t index) 29{ 30 CMutableTransaction tx; 31 tx.vin.resize(1); 32 tx.vin[0].scriptSig = CScript() << sigspam << OP_1; 33 tx.vin[0].scriptWitness.stack.push_back({1}); 34 tx.vout.resize(1); 35 tx.vout[0].scriptPubKey = CScript() << OP_1 << OP_EQUAL; 36 tx.vout[0].nValue = index; 37 return MakeTransactionRef(tx); 38} 39 40CBlock DummyBlock() 41{ 42 CBlock block; 43 block.nVersion = 5; 44 block.hashPrevBlock.SetNull(); 45 block.hashMerkleRoot.SetNull(); 46 block.nTime = 1231006505; 47 block.nBits = 0x1d00ffff; 48 block.nNonce = 2083236893; 49 block.fChecked = false; 50 CMutableTransaction tx; 51 tx.vin.resize(1); 52 tx.vout.resize(1); 53 block.vtx.emplace_back(MakeTransactionRef(tx)); // dummy coinbase 54 return block; 55} 56 57struct BenchCBHAST : CBlockHeaderAndShortTxIDs 58{ 59 BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64()) 60 { 61 std::unordered_set<uint64_t> seen; 62 shorttxids.reserve(txs); 63 while (txs-- > 0) { 64 uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()}; 65 assert(seen.insert(id).second); 66 shorttxids.push_back(id); 67 } 68 if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0); 69 } 70}; 71} // anon namespace 72 73static void BlockEncoding(benchmark::Bench& bench) 74{ 75 constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000}; 76 77 const auto testing_setup{MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN)}; 78 auto& pool{*Assert(testing_setup->m_node.mempool)}; 79 80 FastRandomContext rng{/*fDeterministic=*/true}; 81 const auto sigspam{rng.randbytes<sigspam_size>()}; 82 83 LOCK2(cs_main, pool.cs); 84 std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn; 85 86 { 87 // a reasonably large mempool of 50k txs, ~10MB total 88 std::vector<CTransactionRef> refs; 89 refs.reserve(mempool_size); 90 for (size_t i{0}; i < mempool_size; ++i) { 91 refs.emplace_back(CreateTestTransaction(sigspam, i)); 92 } 93 94 // ensure mempool ordering is different to memory ordering of transactions, 95 // to simulate a mempool that has changed over time 96 std::ranges::shuffle(refs, rng); 97 98 extra_txn.reserve(extra_tx_count); 99 for (size_t i{0}; i < extra_tx_count; ++i) { 100 size_t index{i < (extra_tx_count / 2) ? mempool_size + i : mempool_size - i}; // some overlap with mempool 101 auto tx{CreateTestTransaction(sigspam, index)}; 102 extra_txn.emplace_back(tx->GetWitnessHash(), tx); 103 } 104 105 for (const auto& tx : refs) { 106 AddTx(tx, /*fee=*/tx->vout[0].nValue, pool); 107 } 108 } 109 110 for (const auto duplicates : {false, true}) { 111 BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates}; 112 for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) { 113 bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : "")) 114 .run([&] { 115 PartiallyDownloadedBlock pdb{&pool}; 116 const auto status{pdb.InitData(cmpctblock, extra)}; 117 assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK)); 118 }); 119 } 120 } 121} 122 123BENCHMARK(BlockEncoding, benchmark::PriorityLevel::HIGH);
Edit: what is the expected memory increase because of the change?
in src/blockencodings.cpp:160 in f2b9cbed71 outdated
156@@ -152,7 +157,7 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c 157 // Note that we don't want duplication between extra_txn and mempool to 158 // trigger this case, so we compare witness hashes first 159 if (txn_available[idit->second] && 160- txn_available[idit->second]->GetWitnessHash() != extra_txn[i]->GetWitnessHash()) { 161+ txn_available[idit->second]->GetWitnessHash() != extra_txn[i].second->GetWitnessHash()) {
polespinasa commented at 10:21 pm on August 25, 2025:I think this should be:
0if (txn_available[idit->second] && 1 txn_available[idit->second]->GetWitnessHash() != extra_txn[i].first) { ...
ajtowns commented at 4:02 am on August 26, 2025:Once we’ve found a shortid match, looking up the actual tx is fine, so I don’t think this warrants changing.
davidgumberg commented at 8:16 pm on August 26, 2025:What makes this section not performance critical, is that we’re likely going to have to do a full round-trip anyways to get the collided transactions, but I still think the suggestion is more correct/consistent, so might be nice to do if retouching. Out of scope, but we can also remove the first clause of theif
check entirely, since it’s checking the inverse of theif
block above.polespinasa commented at 11:10 pm on August 25, 2025: contributorConcept ACK
Nice catch No reversions eb1920a82ffa2086c3e8e3e8946b9f43053455aa:
ns/op op/s err% total benchmark 2,163,341.00 462.25 7.4% 0.03 BlockEncoding
Only f2b9cbed7196617880d177c089332e4fd48f28ca reverted
ns/op op/s err% total benchmark 2,097,861.00 476.68 5.3% 0.03 BlockEncoding
Only decb48a607fb3911a25e5e8b37628a5e15b3ce17 reverted
ns/op op/s err% total benchmark 1,727,394.00 578.91 2.4% 0.02 BlockEncoding
Both reverted
ns/op op/s err% total benchmark 1,697,419.00 589.13 2.2% 0.02 BlockEncoding
ajtowns force-pushed on Aug 26, 2025ajtowns commented at 4:06 am on August 26, 2025: contributorCan we bench that? Maybe as a separate benchmark instead of having this one benchmark include both.
Added benchmarks with 50k txs in mempool and 0, 100 (default), and 5000 entries in extra pool. For reference datum recommends setting the value to 1M.
instagibbs commented at 4:05 pm on August 26, 2025: memberBenchmarks now match expectation: each commit separately improves the benchmarks :heavy_check_mark:achow101 added this to the milestone 30.0 on Aug 26, 2025bench/blockencodings: add compact block reconstruction benchmark df5a50e5deRevert "refactor: Simplify `extra_txn` to be a vec of CTransactionRef instead of a vec of pair<Wtxid, CTransactionRef>"
This reverts commit a8203e94123b6ea6e4f4a6320e3ad20457f44a28.
Revert "[refactor] rewrite vTxHashes as a vector of CTransactionRef"
This reverts commit a03aef9cec35b0d03aa63d7e8093f0420cd4b40b.
ajtowns force-pushed on Aug 26, 2025achow101 commented at 5:41 pm on August 26, 2025: memberACK b7b249d3adfbd3c7b0c4d0499a86300f57982972DrahtBot requested review from polespinasa on Aug 26, 2025DrahtBot requested review from l0rinc on Aug 26, 2025ajtowns commented at 5:43 pm on August 26, 2025: contributorEdit: what is the expected memory increase because of the change?
32 bytes per mempool transaction and per extra txn (of which there are 100 by default).
A 300MB mempool might fill up with about ~100k txs (at about 440vb each on average), which would be 3.2MB total. (The 3.2MB would not be an increase overall, but would rather cause earlier eviction from the mempool, ie a ~1% reduction in capacity)
janb84 commented at 6:16 pm on August 26, 2025: contributorI have done a review on the code and tried to understand what to look for, to improve my PR reviewing in the future. I’m missing some experience to give “judgement” on this PR.
The PR restores code to undo some “optimisations” that had a negative impact on the performance (restores caching ) and adds some extra benchmarks.
results of benchmarking this PR (both reverted):
ns/op op/s err% total benchmark 2,546,375.00 392.72 1.9% 0.03 BlockEncodingLargeExtra
2,414,458.00 414.17 4.1% 0.03 BlockEncodingNoExtra
2,466,416.00 405.45 2.2% 0.03 BlockEncodingStdExtra
polespinasa commented at 6:35 pm on August 26, 2025: contributorlgtm code review and tested ACK b7b249d3adfbd3c7b0c4d0499a86300f57982972 nothing to add that has not already been said in other commentsw0xlt referenced this in commit b5488017b3 on Aug 26, 2025in src/bench/blockencodings.cpp:102 in df5a50e5de outdated
97+ extratxn.push_back(refs[i]); 98+ } 99+ 100+ BenchCBHAST cmpctblock{rng, 3000}; 101+ 102+ bench.run([&] {
davidgumberg commented at 11:24 pm on August 26, 2025:nit:
0 bench.unit("block").run([&] {
and maybe:
0 bench.unit("block").timeUnit(1ms, "ms").run([&] {
davidgumberg approveddavidgumberg commented at 0:20 am on August 27, 2025: contributorcrACK b7b249d3adfbd3c
This PR reverts two performance regressions that broke cache locality of the wtxid’s we need to hash during compact block reconstruction and it adds a nice benchmark that demonstrates the regressions and might have caught them before they were merged.
For more context: https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-08-23#1755938803-1755950270;
I reviewed the code changes here directly, and I also compared them to the commits they claimed to be reverting, and the only differences I found were comment changes, small refactor-y changes, and changes related to new code.
0DIFF_NOISE="^(diff --git|index|@@)" 1REVERTING="b9300d8" 2REVERTED="a8203e9" 3 4diff --color -u\ 5 <(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \ 6 <(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
0@@ -1,9 +1,27 @@ 1+--- a/src/bench/blockencodings.cpp 2++++ b/src/bench/blockencodings.cpp 3+ 4+- std::vector<std::pair<Wtxid, CTransactionRef>> extratxn; 5++ std::vector<CTransactionRef> extratxn; 6+ extratxn.reserve(n_extra); 7+ for (size_t i = n_pool; i < n_pool + n_extra; ++i) { 8+- extratxn.emplace_back(refs[i]->GetWitnessHash(), refs[i]); 9++ extratxn.push_back(refs[i]); 10+ } 11 --- a/src/blockencodings.cpp 12 +++ b/src/blockencodings.cpp 13 14--ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn) { 15+-/* Reconstructing a compact block is in the hot-path for block relay, 16+- * so we want to do it as quickly as possible. Because this often 17+- * involves iterating over the entire mempool, we put all the data we 18+- * need (ie the wtxid and a reference to the actual transaction data) 19+- * in a vector and iterate over the vector directly. This allows optimal 20+- * CPU caching behaviour, at a cost of only 40 bytes per transaction. 21+- */ 22+-ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn) 23+-{ 24 +ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn) { 25- if (cmpctblock.header.IsNull() || (cmpctblock.shorttxids.empty() && cmpctblock.prefilledtxn.empty())) 26+ LogDebug(BCLog::CMPCTBLOCK, "Initializing PartiallyDownloadedBlock for block %s using a cmpctblock of %u bytes\n", cmpctblock.header.GetHash().ToString(), GetSerializeSize(cmpctblock)); 27 for (size_t i = 0; i < extra_txn.size(); i++) { 28 - uint64_t shortid = cmpctblock.GetShortID(extra_txn[i].first); 29 + if (extra_txn[i] == nullptr) { 30@@ -21,8 +39,10 @@ 31 txn_available[idit->second].reset(); 32 --- a/src/blockencodings.h 33 +++ b/src/blockencodings.h 34- // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form 35+ 36+- // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form 37 - ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn); 38++ // extra_txn is a list of extra orphan/conflicted/etc transactions to look at 39 + ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn); 40 bool IsTxAvailable(size_t index) const; 41 --- a/src/net_processing.cpp 42@@ -38,8 +58,19 @@ 43 --- a/src/test/blockencodings_tests.cpp 44 +++ b/src/test/blockencodings_tests.cpp 45 46--std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn; 47-+std::vector<CTransactionRef> extra_txn; 48+-const std::vector<std::pair<Wtxid, CTransactionRef>> empty_extra_txn; 49++const std::vector<CTransactionRef> empty_extra_txn; 50+ 51+ CBlock block(BuildBlockTestCase(rand_ctx)); 52+- std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn; 53++ std::vector<CTransactionRef> extra_txn; 54+ extra_txn.resize(10); 55+ // Add an unrelated tx to extra_txn: 56+- extra_txn[0] = {non_block_tx->GetWitnessHash(), non_block_tx}; 57++ extra_txn[0] = non_block_tx; 58+ // and a tx from the block that's not in the mempool: 59+- extra_txn[1] = {block.vtx[1]->GetWitnessHash(), block.vtx[1]}; 60++ extra_txn[1] = block.vtx[1]; 61 62 --- a/src/test/fuzz/partially_downloaded_block.cpp 63 +++ b/src/test/fuzz/partially_downloaded_block.cpp
0DIFF_NOISE="^(diff --git|index|@@)" 1REVERTING="b7b249d" 2REVERTED="a03aef9" 3 4diff --color -u\ 5 <(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \ 6 <(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
0@@ -1,39 +1,48 @@ 1 --- a/src/blockencodings.cpp 2 +++ b/src/blockencodings.cpp 3 LOCK(pool->cs); 4-- for (size_t i = 0; i < pool->vTxHashes.size(); i++) { 5-- uint64_t shortid = cmpctblock.GetShortID(pool->vTxHashes[i].first); 6-+ for (const auto& tx : pool->vTxHashes) { 7+- for (const auto& [wtxid, txit] : pool->txns_randomized) { 8+- uint64_t shortid = cmpctblock.GetShortID(wtxid); 9++ for (const auto& tx : pool->txns_randomized) { 10 + uint64_t shortid = cmpctblock.GetShortID(tx->GetWitnessHash()); 11 std::unordered_map<uint64_t, uint16_t>::iterator idit = shorttxids.find(shortid); 12 if (!have_txn[idit->second]) { 13-- txn_available[idit->second] = pool->vTxHashes[i].second->GetSharedTx(); 14+- txn_available[idit->second] = txit->GetSharedTx(); 15 + txn_available[idit->second] = tx; 16 have_txn[idit->second] = true; 17 --- a/src/test/blockencodings_tests.cpp 18 +++ b/src/test/blockencodings_tests.cpp 19 // Number of shared use_counts we expect for a tx we haven't touched 20--// (block + mempool + our copy from the GetSharedTx call) 21+-// (block + mempool entry + our copy from the GetSharedTx call) 22 -constexpr long SHARED_TX_OFFSET{3}; 23-+// (block + mempool entry + mempool vTxHashes + our copy from the GetSharedTx call) 24++// (block + mempool entry + mempool txns_randomized + our copy from the GetSharedTx call) 25 +constexpr long SHARED_TX_OFFSET{4}; 26 27 --- a/src/txmempool.cpp 28 +++ b/src/txmempool.cpp 29 30-- vTxHashes.emplace_back(tx.GetWitnessHash(), newit); 31-+ vTxHashes.emplace_back(newit->GetSharedTx()); 32- newit->vTxHashesIdx = vTxHashes.size() - 1; 33- if (vTxHashes.size() > 1) { 34-+ // Update vTxHashesIdx of the to-be-moved entry. 35-+ Assert(GetEntry(vTxHashes.back()->GetHash()))->vTxHashesIdx = it->vTxHashesIdx; 36-+ // Remove entry from vTxHashes by replacing it with the back and deleting the back. 37- vTxHashes[it->vTxHashesIdx] = std::move(vTxHashes.back()); 38-- vTxHashes[it->vTxHashesIdx].second->vTxHashesIdx = it->vTxHashesIdx; 39- vTxHashes.pop_back(); 40+- txns_randomized.emplace_back(tx.GetWitnessHash(), newit); 41++ txns_randomized.emplace_back(newit->GetSharedTx()); 42+ newit->idx_randomized = txns_randomized.size() - 1; 43+ if (txns_randomized.size() > 1) { 44++ // Update idx_randomized of the to-be-moved entry. 45++ Assert(GetEntry(txns_randomized.back()->GetHash()))->idx_randomized = it->idx_randomized; 46+ // Remove entry from txns_randomized by replacing it with the back and deleting the back. 47+ txns_randomized[it->idx_randomized] = std::move(txns_randomized.back()); 48+- txns_randomized[it->idx_randomized].second->idx_randomized = it->idx_randomized; 49+ txns_randomized.pop_back(); 50+- if (txns_randomized.size() * 2 < txns_randomized.capacity()) { 51++ if (txns_randomized.size() * 2 < txns_randomized.capacity()) 52+ txns_randomized.shrink_to_fit(); 53+- } 54+- } else { 55++ } else 56+ txns_randomized.clear(); 57+- } 58+ 59 --- a/src/txmempool.h 60 +++ b/src/txmempool.h 61 using txiter = indexed_transaction_set::nth_index<0>::type::const_iterator; 62-- std::vector<std::pair<uint256, txiter>> vTxHashes GUARDED_BY(cs); //!< All tx witness hashes/entries in mapTx, in random order 63-+ std::vector<CTransactionRef> vTxHashes GUARDED_BY(cs); //!< All transactions in mapTx, in random order 64+- std::vector<std::pair<Wtxid, txiter>> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx with their wtxids, in arbitrary order 65++ std::vector<CTransactionRef> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx, in random order
benchmark % faster BlockEncodingLargeExtra
19.9% BlockEncodingNoExtra
18.5% BlockEncodingStdExtra
18.6% % faster calculated as: $
1 - \frac{\text{time}_\text{branch}}{\text{time}_\text{master}}
$0$ ./build/bin/bench_bitcoin --filter="BlockEncoding.*" -min-time=5000
df5a50e5de (master + benchmark added)
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 1,644,722.24 608.01 0.3% 17,098,172.55 7,050,895.83 2.425 1,138,993.60 3.8% 5.47 BlockEncodingLargeExtra
1,461,528.56 684.22 0.2% 15,650,495.41 6,266,693.24 2.497 1,064,736.70 3.5% 5.49 BlockEncodingNoExtra
1,476,456.22 677.30 0.9% 15,680,631.05 6,328,282.73 2.478 1,066,421.94 3.5% 5.55 BlockEncodingStdExtra
b9300d8d0a74b15a220a2ce529d5157d109c7ed3 (extratxn better data locality)
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 1,566,821.01 638.23 0.2% 17,099,295.07 6,718,281.29 2.545 1,134,042.47 3.9% 5.53 BlockEncodingLargeExtra
1,467,993.68 681.20 0.1% 15,656,135.06 6,294,675.71 2.487 1,064,516.33 3.6% 5.47 BlockEncodingNoExtra
1,454,058.97 687.73 0.5% 15,687,165.75 6,234,731.73 2.516 1,066,490.20 3.5% 5.48 BlockEncodingStdExtra
b7b249d3ad (mempool better data locality)
ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark 1,316,987.01 759.31 0.1% 17,049,029.82 5,652,300.18 3.016 1,133,975.24 3.8% 5.50 BlockEncodingLargeExtra
1,190,773.91 839.79 0.3% 15,606,374.31 5,102,241.10 3.059 1,064,614.44 3.5% 5.48 BlockEncodingNoExtra
1,201,347.37 832.40 0.2% 15,637,600.08 5,147,161.34 3.038 1,066,621.86 3.5% 5.43 BlockEncodingStdExtra
cedwies commented at 12:09 pm on August 28, 2025: nonecode-review ACK b7b249d
Read-through + local bench; results match others which I have seen. This PR restores cache locality in the compact-block reconstruction hot path and adds a well focused benchmark.
- Reverts the “vector of
CTransactionRef
” refactors and restores a cache-friendly layout for reconstruction. InitData
again takesstd::vector<std::pair<Wtxid, CTransactionRef>> extra_txn
, causing the wtxid to be available without dereferencing in the hot loop.- Early-exit and collision behavior are preserved.
- Bench design is scoped to the hot path we care about and does not mix in network or dup-resolution behavior.
Strong improvement. The memory overhead is small and well justified by the speedup.
in src/bench/blockencodings.cpp:126 in df5a50e5de outdated
121+ BlockEncodingBench(bench, 50000, 100); 122+} 123+ 124+static void BlockEncodingLargeExtra(benchmark::Bench& bench) 125+{ 126+ BlockEncodingBench(bench, 50000, 5000);
instagibbs commented at 1:21 pm on August 28, 2025:micro-nit: annotate numbered argsin src/bench/blockencodings.cpp:121 in df5a50e5de outdated
116+} 117+ 118+static void BlockEncodingStdExtra(benchmark::Bench& bench) 119+{ 120+ static_assert(DEFAULT_BLOCK_RECONSTRUCTION_EXTRA_TXN == 100); 121+ BlockEncodingBench(bench, 50000, 100);
instagibbs commented at 1:21 pm on August 28, 2025:micro-nit: annotate numbered argsin src/bench/blockencodings.cpp:115 in df5a50e5de outdated
110+ }); 111+} 112+ 113+static void BlockEncodingNoExtra(benchmark::Bench& bench) 114+{ 115+ BlockEncodingBench(bench, 50000, 0);
instagibbs commented at 1:21 pm on August 28, 2025:micro-nit: annotate numbered argsin src/bench/blockencodings.cpp:75 in df5a50e5de outdated
70+ 71+ // bump up the size of txs 72+ std::array<std::byte,200> sigspam; 73+ sigspam.fill(std::byte(42)); 74+ 75+ // a reasonably large mempool of 50k txs, ~10MB total
instagibbs commented at 1:25 pm on August 28, 2025:0 // a reasonably large mempool of 50k txs, ~10MB total plus variable extra txns
instagibbs approvedinstagibbs commented at 1:32 pm on August 28, 2025: memberACK b7b249d3adfbd3c7b0c4d0499a86300f57982972
non-blocking nits only
achow101 merged this on Aug 28, 2025achow101 closed this on Aug 28, 2025
l0rinc commented at 0:04 am on August 29, 2025: contributorpost-merge ACK
ajtowns
DrahtBot
instagibbs
jonatack
l0rinc
davidgumberg
achow101
polespinasa
janb84
cedwies
Milestone
30.0
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-02 12:13 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me