optimization: CheckBlock input duplicate detection #31682

pull l0rinc wants to merge 4 commits into bitcoin:master from l0rinc:l0rinc/optimize-CheckBlock-input-duplicate-check changing 5 files +174 −24
  1. l0rinc commented at 3:03 pm on January 18, 2025: contributor

    CheckBlock’s latency is critical for efficiently validating correct inputs during transaction validation, including mempool acceptance and new block creation.

    This PR improves performance and maintainability by introducing the following changes:

    • Benchmarks now measure CheckBlock performance for both valid and invalid cases separately (without serialization cost), ensuring accurate insights into critical paths.
    • Simplified checks for the most common cases (1 or 2 inputs - 70-90% of transactions have a single input).
    • Optimized the general case by replacing std::set with sorted std::vector for improved locality.
    • Added tests for uint256, transaction_identifier, and COutPoint comparisons, and separate validation tests against previous implementation to ensure correctness.

    The goal of this change is to make block validation faster via a low-risk alternative.


    Before:

    ns/block block/s err% total benchmark
    361,535.93 2,765.98 0.5% 11.02 CheckBlockBench
    ns/op op/s err% total benchmark
    3,176,164.16 314.85 0.4% 10.98 DuplicateInputs

    After:

    ns/block block/s err% total benchmark
    186,777.35 5,353.97 0.3% 10.98 CheckBlockBench
    ns/op op/s err% total benchmark
    985,571.51 1,014.64 0.8% 10.91 DuplicateInputs

    ~2x faster CheckBlockBench and ~3x faster DuplicateInputs


    Before:

    ns/block block/s err% ins/block cyc/block IPC bra/block miss% total benchmark
    1,096,261.84 912.19 0.1% 7,963,390.88 3,487,375.26 2.283 1,266,941.00 1.8% 11.03 CheckBlockBench
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    8,366,309.48 119.53 0.0% 23,865,177.67 26,620,160.23 0.897 5,972,887.41 4.0% 10.78 DuplicateInputs

    After:

    ns/block block/s err% ins/block cyc/block IPC bra/block miss% total benchmark
    845,976.08 1,182.07 0.1% 6,518,255.88 2,691,237.50 2.422 922,585.85 1.5% 10.78 `CheckBlockBench
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    3,923,195.87 254.89 0.1% 14,660,648.04 12,480,569.27 1.175 3,681,101.94 4.0% 11.00 DuplicateInputs

    ~1.3x faster CheckBlockBench and ~2x faster DuplicateInputs


    While the point of the change isn’t necessarily to speed up IBD, I’ve measured a reindex-chainstate until 879000 blocks multiple times - which consistently showed a ~1% speedup.

     0hyperfine \
     1--runs 2 \
     2--parameter-list COMMIT 2a170319e462ff35129b652f071ba0d0816468dc,d1689061579203ddbe0376a700890c1081b33f20 \
     3--prepare 'rm -rfd /mnt/my_storage/BitcoinData/debug.log build/ && git checkout {COMMIT} && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
     4--cleanup 'mv /mnt/my_storage/BitcoinData/debug.log /mnt/my_storage/logs/debug-{COMMIT}-$(date +%s.log)' \
     5'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=879000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0'
     6
     7Benchmark 1: COMMIT=2a170319e462ff35129b652f071ba0d0816468dc ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=879000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
     8  Time (mean ± σ):     19928.276 s ±  7.167 s    [User: 39987.270 s, System: 829.023 s]
     9  Range (min … max):   19923.208 s … 19933.345 s    2 runs
    10 
    11Benchmark 2: COMMIT=d1689061579203ddbe0376a700890c1081b33f20 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=879000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
    12  Time (mean ± σ):     19709.943 s ± 39.615 s    [User: 39803.778 s, System: 826.234 s]
    13  Range (min … max):   19681.931 s … 19737.955 s    2 runs
    14  
    15Summary
    16  COMMIT=d1689061579203ddbe0376a700890c1081b33f20 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=879000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0 ran
    17    1.01 ± 0.00 times faster than COMMIT=2a170319e462ff35129b652f071ba0d0816468dc ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=879000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
    

    Related PRs:

    • #14397 - very similar solution, I only found it after pushing the PR (was closed for inactivity)
    • #14837 - a faster, but a lot more complex alternative (was closed for complexity and inactivity).
  2. test: Add unit tests for uint256, Txid, and COutPoint comparisons
    - Added tests for sorting and comparison operators (<, >, <=, >=, ==, !=).
    - Validated correctness and stability of sorting using shuffled input.
    2a170319e4
  3. DrahtBot commented at 3:03 pm on January 18, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31682.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept NACK 1440000bytes

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  4. l0rinc marked this as a draft on Jan 18, 2025
  5. l0rinc force-pushed on Jan 18, 2025
  6. DrahtBot commented at 3:18 pm on January 18, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35818754067

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. DrahtBot added the label CI failed on Jan 18, 2025
  8. l0rinc force-pushed on Jan 18, 2025
  9. l0rinc force-pushed on Jan 18, 2025
  10. bench: measure `CheckBlock` speed separately from serialization
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs' -min-time=10000
    
    > C++ compiler .......................... AppleClang 16.0.0.16000026
    
    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          361,535.93 |            2,765.98 |    0.5% |     11.02 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        3,176,164.16 |              314.85 |    0.4% |     10.98 | `DuplicateInputs`
    
    > C++ compiler .......................... GNU 13.3.0
    
    |            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        1,096,261.84 |              912.19 |    0.1% |    7,963,390.88 |    3,487,375.26 |  2.283 |   1,266,941.00 |    1.8% |     11.03 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        8,366,309.48 |              119.53 |    0.0% |   23,865,177.67 |   26,620,160.23 |  0.897 |   5,972,887.41 |    4.0% |     10.78 | `DuplicateInputs`
    272c0522cf
  11. l0rinc force-pushed on Jan 18, 2025
  12. l0rinc marked this as ready for review on Jan 18, 2025
  13. DrahtBot removed the label CI failed on Jan 18, 2025
  14. in src/consensus/tx_check.cpp:50 in 0fae6e9615 outdated
    49+        vInOutPoints.reserve(tx.vin.size());
    50+        for (const auto& txin : tx.vin) {
    51+            vInOutPoints.push_back(txin.prevout);
    52+        }
    53+        std::sort(vInOutPoints.begin(), vInOutPoints.end());
    54+        if (std::ranges::adjacent_find(vInOutPoints) != vInOutPoints.end()) [[unlikely]] return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-inputs-duplicate");
    


    1440000bytes commented at 6:01 pm on January 18, 2025:
    I just reviewed this code and it looks okay to me. However, this is consensus related code and an optimization had caused CVE-2018-17144 so regular contributors might know better.
  15. darosior commented at 1:28 am on January 19, 2025: member
    How much of an improvement does this translate into for a user? The bar should be pretty high for touching such critical code.
  16. consensus,optimization: simplify duplicate checks for trivial inputs
    * Single input transaction: no need to check for duplicates (~70-90% of the cases, see https://transactionfee.info/charts/transactions-1in).
    * Transaction with 2 inputs: no need to create a set.
    * Many inputs: check duplicates by adding them to a sorted set (old method).
    
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs' -min-time=10000
    
    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          327,098.50 |            3,057.18 |    0.9% |     10.94 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        3,368,218.02 |              296.89 |    1.0% |     11.13 | `DuplicateInputs`
    093b3b49c2
  17. consensus,optimization: replace tree with sorted vector
    A pre-sized vector retains locality (enabling SIMD operations), speeding up sorting and equality checks.
    
    To make sure the new branches are covered, `basic_transaction_tests` was extended with separate 2 & 3 duplicate cases, and a randomized one comparing against the old implementation. The iterations and ranges were chosen such that every new branch is expected to be hit once.
    
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs' -min-time=10000
    
    > C++ compiler .......................... AppleClang 16.0.0.16000026
    
    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          186,777.35 |            5,353.97 |    0.3% |     10.98 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          985,571.51 |            1,014.64 |    0.8% |     10.91 | `DuplicateInputs`
    
    > C++ compiler .......................... GNU 13.3.0
    
    |            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |          845,976.08 |            1,182.07 |    0.1% |    6,518,255.88 |    2,691,237.50 |  2.422 |     922,585.85 |    1.5% |     10.78 | `CheckBlockBench
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        3,923,195.87 |              254.89 |    0.1% |   14,660,648.04 |   12,480,569.27 |  1.175 |   3,681,101.94 |    4.0% |     11.00 | `DuplicateInputs`
    ec62db0f48
  18. l0rinc force-pushed on Jan 19, 2025
  19. l0rinc commented at 6:20 pm on January 19, 2025: contributor

    The bar should be pretty high for touching such critical code.

    Absolutely, every change should have an extremely high bar here. For context, compared to the changes described in https://bitcoincore.org/en/2018/09/20/notice, this update doesn’t involve caching or reusing previously calculated values elsewhere. Instead, it operates entirely within the method, maintaining a black-box approach that ensures the exact same behavior. To validate this, I’ve added an extensive suite of tests that thoroughly cover these cases.

    How much of an improvement does this translate into for a user?

    The primary goal of this change is to improve block validation performance while maintaining a low risk profile. The PR description includes detailed measurements of the improvements. If there are additional benchmarks or scenarios you’d like me to measure, please let me know, and I’ll be happy to provide them.

  20. mzumsande commented at 7:36 pm on January 19, 2025: contributor

    This is not just affecting CheckBlock, it is called whenever a tx is validated (so also during mempool acceptance).

    The reason that this check is increased significantly percent-wise by these changes might just be that the non-contextual checks are extremely fast in the first place (compared with contextual checks such as script verification / input lookup). Since the non-contextual checks aren’t really ever performed in isolation, an interesting benchmark for me would be how much this improves transaction validation as a whole - whether it’s in the context of a mempool acceptance or IBD benchmark.

    CheckBlock’s latency is critical both for quickly discarding invalid blocks

    I don’t think that this is the case, since it’s guarded by PoW - it’s very expensive to create blocks with a valid header that fail CheckBlock, so it’s impossible to create them in bulk. So if we do encounter them, I’d say that performance is not really critical, unless the effect is extreme (in the order of seconds / minutes).

  21. l0rinc commented at 6:51 pm on January 20, 2025: contributor

    it is called whenever a tx is validated (so also during mempool acceptance)

    Updated the description, thanks.

    how much this improves transaction validation as a whole

    I can only optimize small parts of these, step-by-step. This was the next part I found that could be cheaply sped up (having preexisting tests, benchmarks - signaling that others were also interested in it).

    I have created a chainman.ProcessTransaction benchmark locally (will commit separately, not strictly related to this PR) to check the overall effect, but as you’ve stated, this is a small part of the whole process, so I regularly got roughly ~1% speedup only. Not huge, but we have to decide if it’s proportional to the risk or not.

    0git checkout bf090135d01076aa6d4fbfe7a185682dd6f9f489 >/dev/null 2>&1 \
    1&& cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 \
    2&& cmake --build build -j$(nproc) >/dev/null 2>&1 \
    3&& build/src/bench/bench_bitcoin -filter='ProcessTransactionBench' -min-time=10000 \
    4&& git checkout f25b91cd5a50e1035bbdb884ab11858702fadf33 >/dev/null 2>&1 \
    5&& cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 \
    6&& cmake --build build -j$(nproc) >/dev/null 2>&1 \
    7&& build/src/bench/bench_bitcoin -filter='ProcessTransactionBench' -min-time=10000
    
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    56,570.90 17,676.93 0.1% 229,118.71 179,945.97 1.273 15,499.48 0.5% 11.03 ProcessTransactionBench
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    56,152.78 17,808.56 0.0% 228,009.18 178,596.02 1.277 15,224.76 0.4% 10.92 ProcessTransactionBench

    Similarly, I ran a few reindex-chainstates (as proxies for a stable IBD) and - as posted in the description -, got a similar ~1% speedup (2 runs per bench, 19923 & 19933 seconds before, and 19681 and 19737 seconds after). Not earth-shattering globally, but I think the risk (given all the tests and benchmarks) is relatively low as well.

  22. 1440000bytes commented at 8:06 pm on January 20, 2025: none

    I regularly got roughly ~1% speedup only. Not huge, but we have to decide if it’s proportional to the risk or not.

    got a similar ~1% speedup (2 runs per bench, 19923 & 19933 seconds before, and 19681 and 19737 seconds after).

    NACK


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me