Revert compact block cache inefficiencies #33253

pull ajtowns wants to merge 3 commits into bitcoin:master from ajtowns:202508-cache-friendly-compactblock changing 9 files +166 −28
  1. ajtowns commented at 10:03 am on August 25, 2025: contributor
    Reconstructing compact blocks is on the hot path for block relay, so revert changes from #28391 and #29752 that made it slower. Also add a benchmark to validate reconstruction performance, and a comment giving some background as to the approach.
  2. DrahtBot commented at 10:03 am on August 25, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33253.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK achow101, polespinasa, davidgumberg, cedwies, instagibbs
    Concept ACK l0rinc

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #33191 (net: Provide block templates to peers on request by ajtowns)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. ajtowns commented at 10:05 am on August 25, 2025: contributor

    On my dev machine with a release build, I see a 64% performance gain on the benchmark after the reversion patches are applied.

    cc @sipa @gmaxwell

  4. DrahtBot added the label CI failed on Aug 25, 2025
  5. DrahtBot commented at 10:29 am on August 25, 2025: contributor

    🚧 At least one of the CI tasks failed. Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/48805674537 LLM reason (✨ experimental): The CI failure is caused by a segmentation fault in the bench_sanity_check test.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. ajtowns force-pushed on Aug 25, 2025
  7. ajtowns force-pushed on Aug 25, 2025
  8. ajtowns force-pushed on Aug 25, 2025
  9. DrahtBot removed the label CI failed on Aug 25, 2025
  10. fanquake requested review from instagibbs on Aug 25, 2025
  11. instagibbs commented at 2:41 pm on August 25, 2025: member
    related comment: #29752 (comment)
  12. instagibbs commented at 2:59 pm on August 25, 2025: member

    Seems like a03aef9cec35b0d03aa63d7e8093f0420cd4b40b is the problematic commit for my machine? Doubles the runtime.

    Both reverted:

    ns/op op/s err% total benchmark
    1,955,234.00 511.45 2.1% 0.02 BlockEncoding

    No reversions:

    ns/op op/s err% total benchmark
    3,945,310.00 253.47 3.0% 0.05 BlockEncoding

    a8203e94123b6ea6e4f4a6320e3ad20457f44a28 reverted only:

    ns/op op/s err% total benchmark
    3,929,789.00 254.47 3.6% 0.05 BlockEncoding

    a03aef9cec35b0d03aa63d7e8093f0420cd4b40b reverted only:

    ns/op op/s err% total benchmark
    1,964,505.00 509.03 2.8% 0.02 BlockEncoding

    edit:

    aj> instagibbs: re: 33253, the benchmark doesn’t exercise a8203e9 – it just passes an empty vector of extra transactions

  13. jonatack commented at 3:53 pm on August 25, 2025: member

    ARM64 M1 Max (2022), macOS 15.6.1

    benchmark only

    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|        2,643,000.00 |              378.36 |    1.5% |      0.03 | `BlockEncoding`
    

    benchmark with reversions (the difference is from the last commit, Revert "[refactor] rewrite vTxHashes as a vector of CTransactionRef)

    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|        2,268,791.00 |              440.76 |    1.1% |      0.03 | `BlockEncoding`
    
  14. in src/bench/blockencodings.cpp:74 in eb1920a82f outdated
    69+    LOCK2(cs_main, pool.cs);
    70+
    71+    {
    72+        // a reasonably large mempool of 50k txs, ~10MB total
    73+        std::vector<CTransactionRef> refs;
    74+        for (size_t i = 0; i < 50000; ++i) {
    


    l0rinc commented at 6:25 pm on August 25, 2025:
    nit: we might as well reserve 50'000 here
  15. in src/bench/blockencodings.cpp:87 in eb1920a82f outdated
    82+            refs.push_back(MakeTransactionRef(tx));
    83+        }
    84+
    85+        // ensure mempool ordering is different to memory ordering of transactions,
    86+        // to simulate a mempool that has changed over time
    87+        std::shuffle(refs.begin(), refs.end(), rng);
    


    l0rinc commented at 6:26 pm on August 25, 2025:

    nit: std::ranges::shuffle might be simpler:

    0        std::ranges::shuffle(refs, rng);
    
  16. in src/bench/blockencodings.cpp:98 in eb1920a82f outdated
    93+
    94+    BenchCBHAST cmpctblock{rng, 3000};
    95+
    96+    bench.run([&] {
    97+        PartiallyDownloadedBlock pdb{&pool};
    98+        (void)pdb.InitData(cmpctblock, {});
    


    l0rinc commented at 6:27 pm on August 25, 2025:
    we could consume the result to make sure the benchmark isn’t optimized away and that we’re measuring the success/failure scenario correctly

    l0rinc commented at 8:32 pm on August 25, 2025:

    as mentioned by @instagibbs and @achow101, we could add two benches here, for empty extra and with some (partially overlapping) one, something like:

    0    std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
    1    // ...
    2        extra_txn.reserve(extra_tx_count);
    3        for (size_t i{0}; i < extra_tx_count; ++i) {
    4            size_t index{i < (extra_tx_count / 2) ? mempool_size + i : mempool_size - i}; // some overlap with mempool
    5            auto tx{CreateTestTransaction(sigspam, index)};
    6            extra_txn.emplace_back(tx->GetWitnessHash(), tx);
    7        }
    
  17. in src/bench/blockencodings.cpp:102 in eb1920a82f outdated
     97+        PartiallyDownloadedBlock pdb{&pool};
     98+        (void)pdb.InitData(cmpctblock, {});
     99+    });
    100+}
    101+
    102+BENCHMARK(BlockEncoding, benchmark::PriorityLevel::HIGH);
    


    l0rinc commented at 6:28 pm on August 25, 2025:
    nit: newline at the very end is customary for the linter and formatter to not complain
  18. in src/bench/blockencodings.cpp:64 in eb1920a82f outdated
    62+{
    63+    const auto testing_setup = MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN);
    64+    CTxMemPool& pool = *Assert(testing_setup->m_node.mempool);
    65+    std::array<std::byte,200> sigspam;
    66+    sigspam.fill(std::byte(42));
    67+    InsecureRandomContext rng(11);
    


    l0rinc commented at 6:32 pm on August 25, 2025:

    alternatively this may be slightly more descriptive than 11 and it could even be used to init the sigspam:

    0    FastRandomContext rng{/*fDeterministic=*/true};
    1    auto sigspam{rng.randbytes<200>()};
    

    davidgumberg commented at 9:38 pm on August 26, 2025:
    +1
  19. in src/bench/blockencodings.cpp:91 in eb1920a82f outdated
    86+        // to simulate a mempool that has changed over time
    87+        std::shuffle(refs.begin(), refs.end(), rng);
    88+
    89+        for (size_t i = 0; i < refs.size(); ++i) {
    90+            AddTx(refs[i], /*fee=*/refs[i]->vout[0].nValue, pool);
    91+        }
    


    l0rinc commented at 6:36 pm on August 25, 2025:

    nit: direct iteration would be slightly simpler:

    0        for (auto& tx : refs) {
    1            AddTx(tx, /*fee=*/tx->vout[0].nValue, pool);
    2        }
    
  20. in src/bench/blockencodings.cpp:82 in eb1920a82f outdated
    77+            tx.vin[0].scriptSig = CScript() << sigspam << OP_1;
    78+            tx.vin[0].scriptWitness.stack.push_back({1});
    79+            tx.vout.resize(1);
    80+            tx.vout[0].scriptPubKey = CScript() << OP_1 << OP_EQUAL;
    81+            tx.vout[0].nValue = i;
    82+            refs.push_back(MakeTransactionRef(tx));
    


    l0rinc commented at 6:40 pm on August 25, 2025:
    nit: for consistency we might as well use emplace_back like we do in DummyBlock
  21. in src/bench/blockencodings.cpp:100 in eb1920a82f outdated
    89+        for (size_t i = 0; i < refs.size(); ++i) {
    90+            AddTx(refs[i], /*fee=*/refs[i]->vout[0].nValue, pool);
    91+        }
    92+    }
    93+
    94+    BenchCBHAST cmpctblock{rng, 3000};
    


    l0rinc commented at 6:59 pm on August 25, 2025:

    nit:

    0    BenchCBHAST cmpctblock{rng, /*txs=*/3'000};
    
  22. in src/blockencodings.h:147 in f2b9cbed71 outdated
    143@@ -144,8 +144,8 @@ class PartiallyDownloadedBlock {
    144 
    145     explicit PartiallyDownloadedBlock(CTxMemPool* poolIn) : pool(poolIn) {}
    146 
    147-    // extra_txn is a list of extra orphan/conflicted/etc transactions to look at
    148-    ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn);
    149+    // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
    


    l0rinc commented at 7:06 pm on August 25, 2025:
    I redid the a8203e94123b6ea6e4f4a6320e3ad20457f44a28 cherry-pick, this line and the top comment are new additions (+ some conflicts were resolved slightly differently to how I have, but the result seems correct to me)
  23. in src/blockencodings.cpp:117 in decb48a607 outdated
    113@@ -114,12 +114,12 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c
    114     std::vector<bool> have_txn(txn_available.size());
    115     {
    116     LOCK(pool->cs);
    117-    for (const auto& tx : pool->txns_randomized) {
    118-        uint64_t shortid = cmpctblock.GetShortID(tx->GetWitnessHash());
    119+    for (const auto& [wtxid, txit] : pool->txns_randomized) {
    


    l0rinc commented at 7:16 pm on August 25, 2025:
    redid the a03aef9cec35b0d03aa63d7e8093f0420cd4b40b revert, after resolving the conflicts I ended up with a very similar change - except for some comments, some vTxHashes vs txns_randomized changes which weren’t reverted
  24. achow101 commented at 7:44 pm on August 25, 2025: member

    aj> instagibbs: re: 33253, the benchmark doesn’t exercise a8203e9 – it just passes an empty vector of extra transactions

    Can we bench that? Maybe as a separate benchmark instead of having this one benchmark include both.

  25. in src/bench/blockencodings.cpp:65 in decb48a607 outdated
    60+
    61+static void BlockEncoding(benchmark::Bench& bench)
    62+{
    63+    const auto testing_setup = MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN);
    64+    CTxMemPool& pool = *Assert(testing_setup->m_node.mempool);
    65+    std::array<std::byte,200> sigspam;
    


    l0rinc commented at 8:32 pm on August 25, 2025:

    We could extract the constant used here to the front, like:

    0    constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000};
    
  26. in src/bench/blockencodings.cpp:54 in decb48a607 outdated
    50+public:
    51+    BenchCBHAST(InsecureRandomContext& rng, int txs) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64())
    52+    {
    53+        shorttxids.reserve(txs);
    54+        while (txs-- > 0) {
    55+            shorttxids.push_back(rng.randbits<SHORTTXIDS_LENGTH*8>());
    


    l0rinc commented at 8:52 pm on August 25, 2025:

    we’re lucky that this doesn’t generate any duplicates, but we could test those cases as well:

     0struct BenchCBHAST : CBlockHeaderAndShortTxIDs
     1{
     2    BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64())
     3    {
     4        std::unordered_set<uint64_t> seen;
     5        shorttxids.reserve(txs);
     6        while (txs-- > 0) {
     7            uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()};
     8            assert(seen.insert(id).second);
     9            shorttxids.push_back(id);
    10        }
    11        if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0);
    12    }
    13};
    

    and we could bench all 4 possibilities:

     0for (const auto duplicates : {false, true}) {
     1    BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates};
     2    for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) {
     3        bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : ""))
     4             .run([&] {
     5                 PartiallyDownloadedBlock pdb{&pool};
     6                 const auto status{pdb.InitData(cmpctblock, extra)};
     7                 assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK));
     8             });
     9    }
    10}
    

    resulting in something like:

    ns/op op/s err% total benchmark
    1,371,045.12 729.37 1.3% 5.36 reconstructions without extra
    1,309,789.19 763.48 2.8% 5.35 reconstructions with extra
    71,741.96 13,938.84 2.3% 5.42 reconstructions without extra (dup)
    70,113.89 14,262.51 1.2% 5.50 reconstructions with extra (dup)

    ajtowns commented at 3:56 am on August 26, 2025:
    This is just testing how quickly we can iterate through all the txs in the mempool when looking at a compact block – which is the worst case scenario that can still be fast (if we finish solving the block with transactions from the extra pool, and don’t need to go back to the network). Hitting duplicates automatically puts us in the slow resolve-via-network path, so benching that shouldn’t be terribly interesting.

    l0rinc commented at 4:17 am on August 26, 2025:
    To make sure that’s what we’re testing, we should still assert the outcome of InitData was READ_STATUS_OK - otherwise it would be possible that we accidentally generated duplicates here.

    ajtowns commented at 5:37 pm on August 26, 2025:
    Added an assert and comment.
  27. in src/bench/blockencodings.cpp:1 in decb48a607 outdated
    0@@ -0,0 +1,102 @@
    1+// Copyright (c) 2011-2022 The Bitcoin Core developers
    


    l0rinc commented at 9:07 pm on August 25, 2025:

    nit:

    0// Copyright (c) 2025-present The Bitcoin Core developers
    
  28. in src/bench/blockencodings.cpp:16 in decb48a607 outdated
    11+#include <script/script.h>
    12+#include <sync.h>
    13+#include <test/util/setup_common.h>
    14+#include <test/util/txmempool.h>
    15+#include <txmempool.h>
    16+#include <univalue.h>
    


    l0rinc commented at 9:07 pm on August 25, 2025:
    #include <univalue.h> and #include <rpc/mempool.h> don’t seem to be needed
  29. in src/bench/blockencodings.cpp:25 in decb48a607 outdated
    21+
    22+
    23+static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs)
    24+{
    25+    LockPoints lp;
    26+    AddToMempool(pool, CTxMemPoolEntry(tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp));
    


    l0rinc commented at 9:08 pm on August 25, 2025:
    CTxMemPoolEntry{} would indicate slightly better that it’s a constructor
  30. l0rinc approved
  31. l0rinc commented at 9:15 pm on August 25, 2025: contributor

    Concept ACK, excellent find.

    I redid the benchmarks, got more modest changes with GCC and Clang:

    0echo ">  C++ compiler .......................... GNU $(gcc -dumpfullversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
    
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    4,386,276.06 227.98 0.2% 17,156,885.24 15,741,370.59 1.090 1,287,419.20 3.2% 5.40 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    4,373,623.58 228.64 0.2% 17,156,871.09 15,696,695.56 1.093 1,287,418.38 3.2% 5.41 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    4,380,919.51 228.26 0.2% 17,156,870.18 15,719,209.66 1.091 1,287,418.17 3.2% 5.40 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    3,168,586.91 315.60 0.1% 17,156,847.31 11,378,141.35 1.508 1,287,390.88 3.2% 5.29 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    3,241,851.00 308.47 0.2% 17,156,842.43 11,640,138.08 1.474 1,287,385.26 3.2% 5.29 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    3,173,445.04 315.11 0.1% 17,156,847.31 11,394,621.15 1.506 1,287,390.88 3.2% 5.28 BlockEncoding

    i.e. 38% improvement with gcc


    0echo ">  C++ compiler .......................... clang $(clang -dumpversion)" && rm -rf build && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ >/dev/null 2>&1 && cmake --build build -j$(nproc) >/dev/null 2>&1 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000 && build/bin/bench_bitcoin -filter='BlockEncoding' -min-time=5000
    
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    3,967,492.38 252.05 0.1% 15,494,632.58 14,241,688.80 1.088 1,070,986.04 4.0% 5.37 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    3,984,996.65 250.94 0.1% 15,494,608.15 14,303,936.25 1.083 1,070,973.80 4.0% 5.38 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    4,078,221.50 245.20 0.1% 15,494,616.62 14,638,662.82 1.058 1,070,976.86 4.0% 5.37 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    3,066,879.10 326.06 0.0% 15,497,664.10 11,011,637.17 1.407 1,070,979.09 4.0% 5.28 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    3,135,228.99 318.96 0.1% 15,497,665.21 11,253,673.49 1.377 1,070,979.81 4.0% 5.29 BlockEncoding
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    3,127,079.54 319.79 0.1% 15,497,665.71 11,223,561.21 1.381 1,070,979.87 4.0% 5.30 BlockEncoding

    i.e. 29% improvement for clang


    Similarly to other people before me, I think we could include a few more scenarios in the benchmark. I left a few suggestions explaining my suggestions, but it’s probably easier if I post the final benchmark code here as well:

      0// Copyright (c) 2025-present The Bitcoin Core developers
      1// Distributed under the MIT software license, see the accompanying
      2// file COPYING or http://www.opensource.org/licenses/mit-license.php.
      3
      4#include <bench/bench.h>
      5#include <blockencodings.h>
      6#include <consensus/amount.h>
      7#include <kernel/cs_main.h>
      8#include <primitives/transaction.h>
      9#include <script/script.h>
     10#include <sync.h>
     11#include <test/util/setup_common.h>
     12#include <test/util/txmempool.h>
     13#include <txmempool.h>
     14#include <util/check.h>
     15
     16#include <memory>
     17#include <unordered_set>
     18#include <vector>
     19#include <algorithm>
     20
     21static void AddTx(const CTransactionRef& tx, const CAmount& fee, CTxMemPool& pool) EXCLUSIVE_LOCKS_REQUIRED(cs_main, pool.cs)
     22{
     23    LockPoints lp;
     24    AddToMempool(pool, CTxMemPoolEntry{tx, fee, /*time=*/0, /*entry_height=*/1, /*entry_sequence=*/0, /*spends_coinbase=*/false, /*sigops_cost=*/4, lp});
     25}
     26
     27namespace {
     28CTransactionRef CreateTestTransaction(std::span<const std::byte> sigspam, size_t index)
     29{
     30    CMutableTransaction tx;
     31    tx.vin.resize(1);
     32    tx.vin[0].scriptSig = CScript() << sigspam << OP_1;
     33    tx.vin[0].scriptWitness.stack.push_back({1});
     34    tx.vout.resize(1);
     35    tx.vout[0].scriptPubKey = CScript() << OP_1 << OP_EQUAL;
     36    tx.vout[0].nValue = index;
     37    return MakeTransactionRef(tx);
     38}
     39
     40CBlock DummyBlock()
     41{
     42    CBlock block;
     43    block.nVersion = 5;
     44    block.hashPrevBlock.SetNull();
     45    block.hashMerkleRoot.SetNull();
     46    block.nTime = 1231006505;
     47    block.nBits = 0x1d00ffff;
     48    block.nNonce = 2083236893;
     49    block.fChecked = false;
     50    CMutableTransaction tx;
     51    tx.vin.resize(1);
     52    tx.vout.resize(1);
     53    block.vtx.emplace_back(MakeTransactionRef(tx)); // dummy coinbase
     54    return block;
     55}
     56
     57struct BenchCBHAST : CBlockHeaderAndShortTxIDs
     58{
     59    BenchCBHAST(FastRandomContext& rng, int txs, bool duplicates) : CBlockHeaderAndShortTxIDs(DummyBlock(), rng.rand64())
     60    {
     61        std::unordered_set<uint64_t> seen;
     62        shorttxids.reserve(txs);
     63        while (txs-- > 0) {
     64            uint64_t id{rng.randbits<SHORTTXIDS_LENGTH * 8>()};
     65            assert(seen.insert(id).second);
     66            shorttxids.push_back(id);
     67        }
     68        if (duplicates) shorttxids.at(shorttxids.size() - 1) = shorttxids.at(0);
     69    }
     70};
     71} // anon namespace
     72
     73static void BlockEncoding(benchmark::Bench& bench)
     74{
     75    constexpr size_t mempool_size{50'000}, sigspam_size{200}, extra_tx_count{10}, shortid_count{3'000};
     76
     77    const auto testing_setup{MakeNoLogFileContext<const ChainTestingSetup>(ChainType::MAIN)};
     78    auto& pool{*Assert(testing_setup->m_node.mempool)};
     79
     80    FastRandomContext rng{/*fDeterministic=*/true};
     81    const auto sigspam{rng.randbytes<sigspam_size>()};
     82
     83    LOCK2(cs_main, pool.cs);
     84    std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
     85
     86    {
     87        // a reasonably large mempool of 50k txs, ~10MB total
     88        std::vector<CTransactionRef> refs;
     89        refs.reserve(mempool_size);
     90        for (size_t i{0}; i < mempool_size; ++i) {
     91            refs.emplace_back(CreateTestTransaction(sigspam, i));
     92        }
     93
     94        // ensure mempool ordering is different to memory ordering of transactions,
     95        // to simulate a mempool that has changed over time
     96        std::ranges::shuffle(refs, rng);
     97
     98        extra_txn.reserve(extra_tx_count);
     99        for (size_t i{0}; i < extra_tx_count; ++i) {
    100            size_t index{i < (extra_tx_count / 2) ? mempool_size + i : mempool_size - i}; // some overlap with mempool
    101            auto tx{CreateTestTransaction(sigspam, index)};
    102            extra_txn.emplace_back(tx->GetWitnessHash(), tx);
    103        }
    104
    105        for (const auto& tx : refs) {
    106            AddTx(tx, /*fee=*/tx->vout[0].nValue, pool);
    107        }
    108    }
    109
    110    for (const auto duplicates : {false, true}) {
    111        BenchCBHAST cmpctblock{rng, /*txs=*/shortid_count, duplicates};
    112        for (const auto& extra : {extra_txn, std::vector<std::pair<Wtxid, CTransactionRef>>{}}) {
    113            bench.name(strprintf("reconstructions %s%s", extra.empty() ? "without extra" : "with extra", duplicates ? " (dup)" : ""))
    114                 .run([&] {
    115                     PartiallyDownloadedBlock pdb{&pool};
    116                     const auto status{pdb.InitData(cmpctblock, extra)};
    117                     assert(status == (duplicates ? READ_STATUS_FAILED : READ_STATUS_OK));
    118                 });
    119        }
    120    }
    121}
    122
    123BENCHMARK(BlockEncoding, benchmark::PriorityLevel::HIGH);
    

    Edit: what is the expected memory increase because of the change?

  32. in src/blockencodings.cpp:160 in f2b9cbed71 outdated
    156@@ -152,7 +157,7 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c
    157                 // Note that we don't want duplication between extra_txn and mempool to
    158                 // trigger this case, so we compare witness hashes first
    159                 if (txn_available[idit->second] &&
    160-                        txn_available[idit->second]->GetWitnessHash() != extra_txn[i]->GetWitnessHash()) {
    161+                        txn_available[idit->second]->GetWitnessHash() != extra_txn[i].second->GetWitnessHash()) {
    


    polespinasa commented at 10:21 pm on August 25, 2025:

    I think this should be:

    0if (txn_available[idit->second] && 
    1    txn_available[idit->second]->GetWitnessHash() != extra_txn[i].first) { ...
    

    ajtowns commented at 4:02 am on August 26, 2025:
    Once we’ve found a shortid match, looking up the actual tx is fine, so I don’t think this warrants changing.

    davidgumberg commented at 8:16 pm on August 26, 2025:
    What makes this section not performance critical, is that we’re likely going to have to do a full round-trip anyways to get the collided transactions, but I still think the suggestion is more correct/consistent, so might be nice to do if retouching. Out of scope, but we can also remove the first clause of the if check entirely, since it’s checking the inverse of the if block above.
  33. polespinasa commented at 11:10 pm on August 25, 2025: contributor

    Concept ACK

    Nice catch No reversions eb1920a82ffa2086c3e8e3e8946b9f43053455aa:

    ns/op op/s err% total benchmark
    2,163,341.00 462.25 7.4% 0.03 BlockEncoding

    Only f2b9cbed7196617880d177c089332e4fd48f28ca reverted

    ns/op op/s err% total benchmark
    2,097,861.00 476.68 5.3% 0.03 BlockEncoding

    Only decb48a607fb3911a25e5e8b37628a5e15b3ce17 reverted

    ns/op op/s err% total benchmark
    1,727,394.00 578.91 2.4% 0.02 BlockEncoding

    Both reverted

    ns/op op/s err% total benchmark
    1,697,419.00 589.13 2.2% 0.02 BlockEncoding
  34. ajtowns force-pushed on Aug 26, 2025
  35. ajtowns commented at 4:06 am on August 26, 2025: contributor

    Can we bench that? Maybe as a separate benchmark instead of having this one benchmark include both.

    Added benchmarks with 50k txs in mempool and 0, 100 (default), and 5000 entries in extra pool. For reference datum recommends setting the value to 1M.

  36. instagibbs commented at 4:05 pm on August 26, 2025: member
    Benchmarks now match expectation: each commit separately improves the benchmarks :heavy_check_mark:
  37. achow101 added this to the milestone 30.0 on Aug 26, 2025
  38. bench/blockencodings: add compact block reconstruction benchmark df5a50e5de
  39. Revert "refactor: Simplify `extra_txn` to be a vec of CTransactionRef instead of a vec of pair<Wtxid, CTransactionRef>"
    This reverts commit a8203e94123b6ea6e4f4a6320e3ad20457f44a28.
    b9300d8d0a
  40. Revert "[refactor] rewrite vTxHashes as a vector of CTransactionRef"
    This reverts commit a03aef9cec35b0d03aa63d7e8093f0420cd4b40b.
    b7b249d3ad
  41. ajtowns force-pushed on Aug 26, 2025
  42. achow101 commented at 5:41 pm on August 26, 2025: member
    ACK b7b249d3adfbd3c7b0c4d0499a86300f57982972
  43. DrahtBot requested review from polespinasa on Aug 26, 2025
  44. DrahtBot requested review from l0rinc on Aug 26, 2025
  45. ajtowns commented at 5:43 pm on August 26, 2025: contributor

    Edit: what is the expected memory increase because of the change?

    32 bytes per mempool transaction and per extra txn (of which there are 100 by default).

    A 300MB mempool might fill up with about ~100k txs (at about 440vb each on average), which would be 3.2MB total. (The 3.2MB would not be an increase overall, but would rather cause earlier eviction from the mempool, ie a ~1% reduction in capacity)

  46. janb84 commented at 6:16 pm on August 26, 2025: contributor

    I have done a review on the code and tried to understand what to look for, to improve my PR reviewing in the future. I’m missing some experience to give “judgement” on this PR.

    The PR restores code to undo some “optimisations” that had a negative impact on the performance (restores caching ) and adds some extra benchmarks.

    results of benchmarking this PR (both reverted):

    ns/op op/s err% total benchmark
    2,546,375.00 392.72 1.9% 0.03 BlockEncodingLargeExtra
    2,414,458.00 414.17 4.1% 0.03 BlockEncodingNoExtra
    2,466,416.00 405.45 2.2% 0.03 BlockEncodingStdExtra
  47. polespinasa commented at 6:35 pm on August 26, 2025: contributor
    lgtm code review and tested ACK b7b249d3adfbd3c7b0c4d0499a86300f57982972 nothing to add that has not already been said in other comments
  48. w0xlt referenced this in commit b5488017b3 on Aug 26, 2025
  49. in src/bench/blockencodings.cpp:102 in df5a50e5de outdated
     97+        extratxn.push_back(refs[i]);
     98+    }
     99+
    100+    BenchCBHAST cmpctblock{rng, 3000};
    101+
    102+    bench.run([&] {
    


    davidgumberg commented at 11:24 pm on August 26, 2025:

    nit:

    0    bench.unit("block").run([&] {
    

    and maybe:

    0    bench.unit("block").timeUnit(1ms, "ms").run([&] {
    
  50. davidgumberg approved
  51. davidgumberg commented at 0:20 am on August 27, 2025: contributor

    crACK b7b249d3adfbd3c

    This PR reverts two performance regressions that broke cache locality of the wtxid’s we need to hash during compact block reconstruction and it adds a nice benchmark that demonstrates the regressions and might have caught them before they were merged.

    For more context: https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-08-23#1755938803-1755950270;

    I reviewed the code changes here directly, and I also compared them to the commits they claimed to be reverting, and the only differences I found were comment changes, small refactor-y changes, and changes related to new code.

    0DIFF_NOISE="^(diff --git|index|@@)"
    1REVERTING="b9300d8"
    2REVERTED="a8203e9"
    3
    4diff --color -u\
    5  <(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \
    6  <(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
    
     0@@ -1,9 +1,27 @@
     1+--- a/src/bench/blockencodings.cpp
     2++++ b/src/bench/blockencodings.cpp
     3+ 
     4+-    std::vector<std::pair<Wtxid, CTransactionRef>> extratxn;
     5++    std::vector<CTransactionRef> extratxn;
     6+     extratxn.reserve(n_extra);
     7+     for (size_t i = n_pool; i < n_pool + n_extra; ++i) {
     8+-        extratxn.emplace_back(refs[i]->GetWitnessHash(), refs[i]);
     9++        extratxn.push_back(refs[i]);
    10+     }
    11 --- a/src/blockencodings.cpp
    12 +++ b/src/blockencodings.cpp
    13  
    14--ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn) {
    15+-/* Reconstructing a compact block is in the hot-path for block relay,
    16+- * so we want to do it as quickly as possible. Because this often
    17+- * involves iterating over the entire mempool, we put all the data we
    18+- * need (ie the wtxid and a reference to the actual transaction data)
    19+- * in a vector and iterate over the vector directly. This allows optimal
    20+- * CPU caching behaviour, at a cost of only 40 bytes per transaction.
    21+- */
    22+-ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn)
    23+-{
    24 +ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn) {
    25-     if (cmpctblock.header.IsNull() || (cmpctblock.shorttxids.empty() && cmpctblock.prefilledtxn.empty()))
    26+     LogDebug(BCLog::CMPCTBLOCK, "Initializing PartiallyDownloadedBlock for block %s using a cmpctblock of %u bytes\n", cmpctblock.header.GetHash().ToString(), GetSerializeSize(cmpctblock));
    27      for (size_t i = 0; i < extra_txn.size(); i++) {
    28 -        uint64_t shortid = cmpctblock.GetShortID(extra_txn[i].first);
    29 +        if (extra_txn[i] == nullptr) {
    30@@ -21,8 +39,10 @@
    31                      txn_available[idit->second].reset();
    32 --- a/src/blockencodings.h
    33 +++ b/src/blockencodings.h
    34-     // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
    35+ 
    36+-    // extra_txn is a list of extra transactions to look at, in <witness hash, reference> form
    37 -    ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<std::pair<Wtxid, CTransactionRef>>& extra_txn);
    38++    // extra_txn is a list of extra orphan/conflicted/etc transactions to look at
    39 +    ReadStatus InitData(const CBlockHeaderAndShortTxIDs& cmpctblock, const std::vector<CTransactionRef>& extra_txn);
    40      bool IsTxAvailable(size_t index) const;
    41 --- a/src/net_processing.cpp
    42@@ -38,8 +58,19 @@
    43 --- a/src/test/blockencodings_tests.cpp
    44 +++ b/src/test/blockencodings_tests.cpp
    45  
    46--std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
    47-+std::vector<CTransactionRef> extra_txn;
    48+-const std::vector<std::pair<Wtxid, CTransactionRef>> empty_extra_txn;
    49++const std::vector<CTransactionRef> empty_extra_txn;
    50+ 
    51+     CBlock block(BuildBlockTestCase(rand_ctx));
    52+-    std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn;
    53++    std::vector<CTransactionRef> extra_txn;
    54+     extra_txn.resize(10);
    55+         // Add an unrelated tx to extra_txn:
    56+-        extra_txn[0] = {non_block_tx->GetWitnessHash(), non_block_tx};
    57++        extra_txn[0] = non_block_tx;
    58+         // and a tx from the block that's not in the mempool:
    59+-        extra_txn[1] = {block.vtx[1]->GetWitnessHash(), block.vtx[1]};
    60++        extra_txn[1] = block.vtx[1];
    61  
    62 --- a/src/test/fuzz/partially_downloaded_block.cpp
    63 +++ b/src/test/fuzz/partially_downloaded_block.cpp
    
    0DIFF_NOISE="^(diff --git|index|@@)"
    1REVERTING="b7b249d"
    2REVERTED="a03aef9"
    3
    4diff --color -u\
    5  <(git diff -U1 "$REVERTED^..$REVERTED" | grep -Ev "${DIFF_NOISE}") \
    6  <(git diff -U1 "$REVERTING..$REVERTING^" | grep -Ev "${DIFF_NOISE}")
    
     0@@ -1,39 +1,48 @@
     1 --- a/src/blockencodings.cpp
     2 +++ b/src/blockencodings.cpp
     3      LOCK(pool->cs);
     4--    for (size_t i = 0; i < pool->vTxHashes.size(); i++) {
     5--        uint64_t shortid = cmpctblock.GetShortID(pool->vTxHashes[i].first);
     6-+    for (const auto& tx : pool->vTxHashes) {
     7+-    for (const auto& [wtxid, txit] : pool->txns_randomized) {
     8+-        uint64_t shortid = cmpctblock.GetShortID(wtxid);
     9++    for (const auto& tx : pool->txns_randomized) {
    10 +        uint64_t shortid = cmpctblock.GetShortID(tx->GetWitnessHash());
    11          std::unordered_map<uint64_t, uint16_t>::iterator idit = shorttxids.find(shortid);
    12              if (!have_txn[idit->second]) {
    13--                txn_available[idit->second] = pool->vTxHashes[i].second->GetSharedTx();
    14+-                txn_available[idit->second] = txit->GetSharedTx();
    15 +                txn_available[idit->second] = tx;
    16                  have_txn[idit->second]  = true;
    17 --- a/src/test/blockencodings_tests.cpp
    18 +++ b/src/test/blockencodings_tests.cpp
    19  // Number of shared use_counts we expect for a tx we haven't touched
    20--// (block + mempool + our copy from the GetSharedTx call)
    21+-// (block + mempool entry + our copy from the GetSharedTx call)
    22 -constexpr long SHARED_TX_OFFSET{3};
    23-+// (block + mempool entry + mempool vTxHashes + our copy from the GetSharedTx call)
    24++// (block + mempool entry + mempool txns_randomized + our copy from the GetSharedTx call)
    25 +constexpr long SHARED_TX_OFFSET{4};
    26  
    27 --- a/src/txmempool.cpp
    28 +++ b/src/txmempool.cpp
    29  
    30--    vTxHashes.emplace_back(tx.GetWitnessHash(), newit);
    31-+    vTxHashes.emplace_back(newit->GetSharedTx());
    32-     newit->vTxHashesIdx = vTxHashes.size() - 1;
    33-     if (vTxHashes.size() > 1) {
    34-+        // Update vTxHashesIdx of the to-be-moved entry.
    35-+        Assert(GetEntry(vTxHashes.back()->GetHash()))->vTxHashesIdx = it->vTxHashesIdx;
    36-+        // Remove entry from vTxHashes by replacing it with the back and deleting the back.
    37-         vTxHashes[it->vTxHashesIdx] = std::move(vTxHashes.back());
    38--        vTxHashes[it->vTxHashesIdx].second->vTxHashesIdx = it->vTxHashesIdx;
    39-         vTxHashes.pop_back();
    40+-    txns_randomized.emplace_back(tx.GetWitnessHash(), newit);
    41++    txns_randomized.emplace_back(newit->GetSharedTx());
    42+     newit->idx_randomized = txns_randomized.size() - 1;
    43+     if (txns_randomized.size() > 1) {
    44++        // Update idx_randomized of the to-be-moved entry.
    45++        Assert(GetEntry(txns_randomized.back()->GetHash()))->idx_randomized = it->idx_randomized;
    46+         // Remove entry from txns_randomized by replacing it with the back and deleting the back.
    47+         txns_randomized[it->idx_randomized] = std::move(txns_randomized.back());
    48+-        txns_randomized[it->idx_randomized].second->idx_randomized = it->idx_randomized;
    49+         txns_randomized.pop_back();
    50+-        if (txns_randomized.size() * 2 < txns_randomized.capacity()) {
    51++        if (txns_randomized.size() * 2 < txns_randomized.capacity())
    52+             txns_randomized.shrink_to_fit();
    53+-        }
    54+-    } else {
    55++    } else
    56+         txns_randomized.clear();
    57+-    }
    58+ 
    59 --- a/src/txmempool.h
    60 +++ b/src/txmempool.h
    61      using txiter = indexed_transaction_set::nth_index<0>::type::const_iterator;
    62--    std::vector<std::pair<uint256, txiter>> vTxHashes GUARDED_BY(cs); //!< All tx witness hashes/entries in mapTx, in random order
    63-+    std::vector<CTransactionRef> vTxHashes GUARDED_BY(cs); //!< All transactions in mapTx, in random order
    64+-    std::vector<std::pair<Wtxid, txiter>> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx with their wtxids, in arbitrary order
    65++    std::vector<CTransactionRef> txns_randomized GUARDED_BY(cs); //!< All transactions in mapTx, in random order
    
    benchmark % faster
    BlockEncodingLargeExtra 19.9%
    BlockEncodingNoExtra 18.5%
    BlockEncodingStdExtra 18.6%

    % faster calculated as: $1 - \frac{\text{time}_\text{branch}}{\text{time}_\text{master}}$

    0$  ./build/bin/bench_bitcoin --filter="BlockEncoding.*" -min-time=5000
    

    df5a50e5de (master + benchmark added)

    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    1,644,722.24 608.01 0.3% 17,098,172.55 7,050,895.83 2.425 1,138,993.60 3.8% 5.47 BlockEncodingLargeExtra
    1,461,528.56 684.22 0.2% 15,650,495.41 6,266,693.24 2.497 1,064,736.70 3.5% 5.49 BlockEncodingNoExtra
    1,476,456.22 677.30 0.9% 15,680,631.05 6,328,282.73 2.478 1,066,421.94 3.5% 5.55 BlockEncodingStdExtra

    b9300d8d0a74b15a220a2ce529d5157d109c7ed3 (extratxn better data locality)

    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    1,566,821.01 638.23 0.2% 17,099,295.07 6,718,281.29 2.545 1,134,042.47 3.9% 5.53 BlockEncodingLargeExtra
    1,467,993.68 681.20 0.1% 15,656,135.06 6,294,675.71 2.487 1,064,516.33 3.6% 5.47 BlockEncodingNoExtra
    1,454,058.97 687.73 0.5% 15,687,165.75 6,234,731.73 2.516 1,066,490.20 3.5% 5.48 BlockEncodingStdExtra

    b7b249d3ad (mempool better data locality)

    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    1,316,987.01 759.31 0.1% 17,049,029.82 5,652,300.18 3.016 1,133,975.24 3.8% 5.50 BlockEncodingLargeExtra
    1,190,773.91 839.79 0.3% 15,606,374.31 5,102,241.10 3.059 1,064,614.44 3.5% 5.48 BlockEncodingNoExtra
    1,201,347.37 832.40 0.2% 15,637,600.08 5,147,161.34 3.038 1,066,621.86 3.5% 5.43 BlockEncodingStdExtra
  52. cedwies commented at 12:09 pm on August 28, 2025: none

    code-review ACK b7b249d

    Read-through + local bench; results match others which I have seen. This PR restores cache locality in the compact-block reconstruction hot path and adds a well focused benchmark.

    • Reverts the “vector of CTransactionRef” refactors and restores a cache-friendly layout for reconstruction.
    • InitData again takes std::vector<std::pair<Wtxid, CTransactionRef>> extra_txn, causing the wtxid to be available without dereferencing in the hot loop.
    • Early-exit and collision behavior are preserved.
    • Bench design is scoped to the hot path we care about and does not mix in network or dup-resolution behavior.

    Strong improvement. The memory overhead is small and well justified by the speedup.

  53. in src/bench/blockencodings.cpp:126 in df5a50e5de outdated
    121+    BlockEncodingBench(bench, 50000, 100);
    122+}
    123+
    124+static void BlockEncodingLargeExtra(benchmark::Bench& bench)
    125+{
    126+    BlockEncodingBench(bench, 50000, 5000);
    


    instagibbs commented at 1:21 pm on August 28, 2025:
    micro-nit: annotate numbered args
  54. in src/bench/blockencodings.cpp:121 in df5a50e5de outdated
    116+}
    117+
    118+static void BlockEncodingStdExtra(benchmark::Bench& bench)
    119+{
    120+    static_assert(DEFAULT_BLOCK_RECONSTRUCTION_EXTRA_TXN == 100);
    121+    BlockEncodingBench(bench, 50000, 100);
    


    instagibbs commented at 1:21 pm on August 28, 2025:
    micro-nit: annotate numbered args
  55. in src/bench/blockencodings.cpp:115 in df5a50e5de outdated
    110+    });
    111+}
    112+
    113+static void BlockEncodingNoExtra(benchmark::Bench& bench)
    114+{
    115+    BlockEncodingBench(bench, 50000, 0);
    


    instagibbs commented at 1:21 pm on August 28, 2025:
    micro-nit: annotate numbered args
  56. in src/bench/blockencodings.cpp:75 in df5a50e5de outdated
    70+
    71+    // bump up the size of txs
    72+    std::array<std::byte,200> sigspam;
    73+    sigspam.fill(std::byte(42));
    74+
    75+    // a reasonably large mempool of 50k txs, ~10MB total
    


    instagibbs commented at 1:25 pm on August 28, 2025:
    0    // a reasonably large mempool of 50k txs, ~10MB total plus variable extra txns
    
  57. instagibbs approved
  58. instagibbs commented at 1:32 pm on August 28, 2025: member

    ACK b7b249d3adfbd3c7b0c4d0499a86300f57982972

    non-blocking nits only

  59. achow101 merged this on Aug 28, 2025
  60. achow101 closed this on Aug 28, 2025

  61. l0rinc commented at 0:04 am on August 29, 2025: contributor
    post-merge ACK

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-02 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me