optimization: speed up CheckBlock input checks (duplicate detection & nulls) #31682

l0rinc wants to merge 7 commits into bitcoin:master from l0rinc:l0rinc/optimize-CheckBlock-input-duplicate-check, changing 6 files (+311 −40)
  1. l0rinc commented at 3:03 pm on January 18, 2025: contributor

    CheckBlock’s latency is critical for validating inputs efficiently during transaction validation, including mempool acceptance and new block creation.

    This PR improves performance and maintainability by introducing the following changes:

    • Simplified checks for the most common cases (1 or 2 inputs; 70-90% of transactions have a single input).
    • Optimized the general case by replacing std::set with a sorted std::vector for better locality.
    • Reduced null prevout checks from linear to constant time (by scanning only the front of the sorted values).

    The goal of this change is to make block validation faster via a low-risk alternative.
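
    To illustrate the core container change, here is a minimal before/after sketch (hypothetical helper names, assuming Bitcoin Core’s COutPoint; the actual diff additionally special-cases 1- and 2-input transactions and reports distinct rejection reasons):

    #include <algorithm>
    #include <set>
    #include <vector>

    // assumes COutPoint from <primitives/transaction.h>
    bool HasDuplicatesSet(const std::vector<COutPoint>& prevouts) // old approach
    {
        std::set<COutPoint> seen; // node-based tree: one allocation per insert
        for (const auto& out : prevouts) {
            if (!seen.insert(out).second) return true;
        }
        return false;
    }

    bool HasDuplicatesSortedVector(std::vector<COutPoint> prevouts) // new approach
    {
        std::sort(prevouts.begin(), prevouts.end()); // contiguous memory stays cache-friendly
        return std::ranges::adjacent_find(prevouts) != prevouts.end();
    }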


    Before (AppleClang 16.0.0):

    | ns/block | block/s | err% | total | benchmark |
    |---------:|--------:|-----:|------:|:----------|
    | 372,743.63 | 2,682.81 | 1.1% | 10.99 | `CheckBlockBench` |

    | ns/op | op/s | err% | total | benchmark |
    |------:|-----:|-----:|------:|:----------|
    | 3,304,694.54 | 302.60 | 0.5% | 11.05 | `DuplicateInputs` |
    | 9,585.10 | 104,328.55 | 0.1% | 11.03 | `ProcessTransactionBench` |

    After (AppleClang 16.0.0):

    | ns/block | block/s | err% | total | benchmark |
    |---------:|--------:|-----:|------:|:----------|
    | 179,971.00 | 5,556.45 | 0.3% | 11.02 | `CheckBlockBench` |

    | ns/op | op/s | err% | total | benchmark |
    |------:|-----:|-----:|------:|:----------|
    | 963,177.98 | 1,038.23 | 1.7% | 10.92 | `DuplicateInputs` |
    | 9,410.90 | 106,259.75 | 0.3% | 11.01 | `ProcessTransactionBench` |
    • 2.07x faster CheckBlockBench
    • 3.4x faster DuplicateInputs
    • 1.8% faster ProcessTransactionBench

    Before (GNU 13.3.0):

    | ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |
    |---------:|--------:|-----:|----------:|----------:|----:|----------:|------:|------:|:----------|
    | 1,096,261.84 | 912.19 | 0.1% | 7,963,390.88 | 3,487,375.26 | 2.283 | 1,266,941.00 | 1.8% | 11.03 | `CheckBlockBench` |

    | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
    |------:|-----:|-----:|-------:|-------:|----:|-------:|------:|------:|:----------|
    | 8,366,309.48 | 119.53 | 0.0% | 23,865,177.67 | 26,620,160.23 | 0.897 | 5,972,887.41 | 4.0% | 10.78 | `DuplicateInputs` |
    | 56,199.57 | 17,793.73 | 0.1% | 229,263.01 | 178,766.31 | 1.282 | 15,509.97 | 0.5% | 10.91 | `ProcessTransactionBench` |

    After (GNU 13.3.0):

    | ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |
    |---------:|--------:|-----:|----------:|----------:|----:|----------:|------:|------:|:----------|
    | 834,855.94 | 1,197.81 | 0.0% | 6,518,548.86 | 2,656,039.78 | 2.454 | 919,160.84 | 1.5% | 10.78 | `CheckBlockBench` |

    | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
    |------:|-----:|-----:|-------:|-------:|----:|-------:|------:|------:|:----------|
    | 4,261,492.75 | 234.66 | 0.0% | 17,379,823.40 | 13,559,793.33 | 1.282 | 4,265,714.28 | 3.4% | 11.00 | `DuplicateInputs` |
    | 55,819.53 | 17,914.88 | 0.1% | 227,828.15 | 177,520.09 | 1.283 | 15,184.36 | 0.4% | 10.91 | `ProcessTransactionBench` |
    • 31% faster CheckBlockBench
    • 2x faster DuplicateInputs
    • 0.5% faster ProcessTransactionBench

    While the point of the change isn’t necessarily to speed up IBD, you can see the IBD measurements at #31682 (comment)


    Related PRs:

    • #14397 - a very similar solution; I only found it after pushing this PR (it was closed for inactivity)
    • #14837 - a faster but considerably more complex alternative (it was closed for complexity and inactivity).
  2. DrahtBot commented at 3:03 pm on January 18, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31682.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept NACK 1440000bytes
    Concept ACK sipa

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31868 (optimization: speed up block serialization by l0rinc)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. l0rinc marked this as a draft on Jan 18, 2025
  4. l0rinc force-pushed on Jan 18, 2025
  5. DrahtBot commented at 3:18 pm on January 18, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35818754067

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. DrahtBot added the label CI failed on Jan 18, 2025
  7. l0rinc force-pushed on Jan 18, 2025
  8. l0rinc force-pushed on Jan 18, 2025
  9. l0rinc force-pushed on Jan 18, 2025
  10. l0rinc marked this as ready for review on Jan 18, 2025
  11. DrahtBot removed the label CI failed on Jan 18, 2025
  12. in src/consensus/tx_check.cpp:50 in 0fae6e9615 outdated
    49+        vInOutPoints.reserve(tx.vin.size());
    50+        for (const auto& txin : tx.vin) {
    51+            vInOutPoints.push_back(txin.prevout);
    52+        }
    53+        std::sort(vInOutPoints.begin(), vInOutPoints.end());
    54+        if (std::ranges::adjacent_find(vInOutPoints) != vInOutPoints.end()) [[unlikely]] return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-inputs-duplicate");
    


    1440000bytes commented at 6:01 pm on January 18, 2025:
    I just reviewed this code and it looks okay to me. However, this is consensus-related code, and an optimization once caused CVE-2018-17144, so regular contributors might know better.
  13. darosior commented at 1:28 am on January 19, 2025: member
    How much of an improvement does this translate into for a user? The bar should be pretty high for touching such critical code.
  14. l0rinc force-pushed on Jan 19, 2025
  15. l0rinc commented at 6:20 pm on January 19, 2025: contributor

    The bar should be pretty high for touching such critical code.

    Absolutely, every change should have an extremely high bar here. For context, compared to the changes described in https://bitcoincore.org/en/2018/09/20/notice, this update doesn’t involve caching or reusing previously calculated values elsewhere. Instead, it operates entirely within the method, maintaining a black-box approach that ensures the exact same behavior. To validate this, I’ve added an extensive suite of tests that thoroughly cover these cases.

    How much of an improvement does this translate into for a user?

    The primary goal of this change is to improve block validation performance while maintaining a low risk profile. The PR description includes detailed measurements of the improvements. If there are additional benchmarks or scenarios you’d like me to measure, please let me know, and I’ll be happy to provide them.

  16. mzumsande commented at 7:36 pm on January 19, 2025: contributor

    This is not just affecting CheckBlock, it is called whenever a tx is validated (so also during mempool acceptance).

    The reason that this check is increased significantly percent-wise by these changes might just be that the non-contextual checks are extremely fast in the first place (compared with contextual checks such as script verification / input lookup). Since the non-contextual checks aren’t really ever performed in isolation, an interesting benchmark for me would be how much this improves transaction validation as a whole - whether it’s in the context of a mempool acceptance or IBD benchmark.

    CheckBlock’s latency is critical both for quickly discarding invalid blocks

    I don’t think that this is the case, since it’s guarded by PoW - it’s very expensive to create blocks with a valid header that fail CheckBlock, so it’s impossible to create them in bulk. So if we do encounter them, I’d say that performance is not really critical, unless the effect is extreme (in the order of seconds / minutes).

  17. l0rinc commented at 6:51 pm on January 20, 2025: contributor

    it is called whenever a tx is validated (so also during mempool acceptance)

    Updated the description, thanks.

    how much this improves transaction validation as a whole

    I can only optimize small parts of these, step-by-step. This was the next part I found that could be cheaply sped up (having preexisting tests, benchmarks - signaling that others were also interested in it).

    I have created a chainman.ProcessTransaction benchmark locally (will commit separately, not strictly related to this PR) to check the overall effect, but as you’ve stated, this is a small part of the whole process, so I regularly got roughly ~1% speedup only. Not huge, but we have to decide if it’s proportional to the risk or not.

    git checkout bf090135d01076aa6d4fbfe7a185682dd6f9f489 >/dev/null 2>&1 \
    && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 \
    && cmake --build build -j$(nproc) >/dev/null 2>&1 \
    && build/src/bench/bench_bitcoin -filter='ProcessTransactionBench' -min-time=10000 \
    && git checkout f25b91cd5a50e1035bbdb884ab11858702fadf33 >/dev/null 2>&1 \
    && cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 \
    && cmake --build build -j$(nproc) >/dev/null 2>&1 \
    && build/src/bench/bench_bitcoin -filter='ProcessTransactionBench' -min-time=10000
    
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    56,570.90 17,676.93 0.1% 229,118.71 179,945.97 1.273 15,499.48 0.5% 11.03 ProcessTransactionBench
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    56,152.78 17,808.56 0.0% 228,009.18 178,596.02 1.277 15,224.76 0.4% 10.92 ProcessTransactionBench

    Similarly, I ran a few reindex-chainstates (as proxies for a stable IBD) and, as posted in the description, got a similar ~1% speedup (2 runs per bench: 19923 & 19933 seconds before, 19681 & 19737 seconds after). Not earth-shattering globally, but I think the risk (given all the tests and benchmarks) is relatively low as well.

  18. 1440000bytes commented at 8:06 pm on January 20, 2025: none

    I regularly got roughly ~1% speedup only. Not huge, but we have to decide if it’s proportional to the risk or not.

    got a similar ~1% speedup (2 runs per bench, 19923 & 19933 seconds before, and 19681 and 19737 seconds after).

    NACK

  19. laanwj added the label Validation on Jan 21, 2025
  20. l0rinc commented at 2:09 pm on January 21, 2025: contributor

    @1440000bytes, this behavior isn’t helpful - you’re just reiterating what I’ve already explained, without suggesting workable solutions.

    Anyway, I’ve created an alternative PR that focuses solely on the tests and benchmarks, excluding the controversial optimizations. It validates the affected code and removes the microbenchmarks that overemphasize this segment’s significance, replacing them with a higher-level one, as suggested by @mzumsande.

  21. l0rinc marked this as a draft on Jan 21, 2025
  22. 1440000bytes commented at 2:34 pm on January 21, 2025: none

    @1440000bytes, this behavior isn’t helpful - you’re just reiterating what I’ve already explained, without suggesting workable solutions.

    I don’t think optimization is worth changing consensus code.

  23. in src/consensus/tx_check.cpp:42 in 093b3b49c2 outdated
    37@@ -38,10 +38,17 @@ bool CheckTransaction(const CTransaction& tx, TxValidationState& state)
    38     // of a tx as spent, it does not check if the tx has duplicate inputs.
    39     // Failure to run this check will result in either a crash or an inflation bug, depending on the implementation of
    40     // the underlying coins database.
    41-    std::set<COutPoint> vInOutPoints;
    42-    for (const auto& txin : tx.vin) {
    43-        if (!vInOutPoints.insert(txin.prevout).second)
    44+    if (tx.vin.size() == 2) {
    45+        if (tx.vin[0].prevout == tx.vin[1].prevout) [[unlikely]] {
    


    theuni commented at 4:04 pm on January 21, 2025:

    to allow the compiler to optimize for the case where paths of execution including that statement are less likely than any alternative path of execution that does not include such a statement

    See here for why this is probably a bad idea: https://blog.aaronballman.com/2020/08/dont-use-the-likely-or-unlikely-attributes/

    tl;dr: It’s impl-defined whether or not this makes the parent if(tx.vin.size() == 2) unlikely as well.
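
    A minimal sketch of the concern, mirroring the shape of the diff above (illustration only, not the PR’s final code):

    if (tx.vin.size() == 2) {
        if (tx.vin[0].prevout == tx.vin[1].prevout) [[unlikely]] {
            // The attribute appertains to this inner statement, but an implementation
            // may also treat the enclosing size() == 2 branch as unlikely, because
            // every path through this statement also passes through that branch.
        }
    }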


    l0rinc commented at 4:48 pm on January 21, 2025:
    Thanks for the context! Do you think it makes sense to update that here, or should I just close the PR and leave #31699 as a follow-up?

    l0rinc commented at 9:40 am on January 22, 2025:
    Fixed, thanks
  24. l0rinc force-pushed on Jan 22, 2025
  25. l0rinc commented at 9:50 am on January 22, 2025: contributor

    I pushed a new version that is based on #31699 - to separate the testing/benching from the consensus change. This will remain in draft to gather comments.


    I’ve also retriggered an IBD (simulated via multiple -reindex-chainstate runs for stability). Here I compared the performance of this PR applied on top of my other IBD optimizations (https://github.com/bitcoin/bitcoin/pull/31144 and #31645) to measure the future potential of this change. I ran it until block 800k this time to avoid the validation overhead.

    hyperfine --runs 2 --parameter-list COMMIT dcc3fe7f9c4b68a4b4622ef958a3262cd8a4b08a,053ccbefa3563e776932bb847aae9c302d07de61 --prepare 'rm -rfd /mnt/my_storage/BitcoinData/debug.log build/ && git checkout {COMMIT} && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' --cleanup 'mv /mnt/my_storage/BitcoinData/debug.log /mnt/my_storage/logs/debug-{COMMIT}-$(date +%s.log)' 'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0'
    Benchmark 1: COMMIT=dcc3fe7f9c4b68a4b4622ef958a3262cd8a4b08a ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
      Time (mean ± σ):     12537.220 s ±  7.591 s    [User: 13362.903 s, System: 523.533 s]
      Range (min … max):   12531.852 s … 12542.588 s    2 runs

    Benchmark 2: COMMIT=053ccbefa3563e776932bb847aae9c302d07de61 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
      Time (mean ± σ):     12251.387 s ± 52.389 s    [User: 13169.814 s, System: 543.757 s]
      Range (min … max):   12214.343 s … 12288.432 s    2 runs

    Summary
      COMMIT=053ccbefa3563e776932bb847aae9c302d07de61 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0 ran
        1.02 ± 0.00 times faster than COMMIT=dcc3fe7f9c4b68a4b4622ef958a3262cd8a4b08a ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=30000 -reindex-chainstate -connect=0 -printtoconsole=0
    

    This reveals a ~2.5% speedup (taking the best runs out of 2).

  26. l0rinc renamed this:
    optimization: `CheckBlock` input duplicate detection
    optimization,RFC: `CheckBlock` input duplicate detection
    on Jan 22, 2025
  27. l0rinc renamed this:
    optimization,RFC: `CheckBlock` input duplicate detection
    RFC: optimize `CheckBlock` input duplicate detection
    on Jan 22, 2025
  28. 1440000bytes commented at 8:23 pm on January 22, 2025: none
    I don’t like your behavior of repeatedly pushing minimal changes that affect consensus code
  29. pinheadmz commented at 8:25 pm on January 22, 2025: member
    @1440000bytes keep your comments on topic. the topic is technical. comments about people are not relevant.
  30. 1440000bytes commented at 8:29 pm on January 22, 2025: none

    @1440000bytes keep your comments on topic. the topic is technical. comments about people are not relevant.

    It is on topic and a response to #31682 (comment)

    I reviewed it technically before others did.

  31. pinheadmz commented at 8:30 pm on January 22, 2025: member
    Everyone needs to cool down on this pull request. 24-hour bans are next for users that mention people and not code. After that it’s 7-day bans.
  32. 1440000bytes commented at 9:11 pm on January 22, 2025: none
    Merge this pull request if my comments are so off topic.
  33. Julian128 commented at 11:02 am on January 23, 2025: none
    Interesting how std::vector is almost always the best performing container, even when uniqueness or sorting is required.
  34. luke-jr commented at 3:33 pm on January 23, 2025: member

    std::vector is probably less likely to have bugs than std::set, so there’s an improvement in that regard as well.

    I feel like we shouldn’t need to fully sort, though. Perhaps a custom check would improve performance more significantly? (OTOH, the “normal flow” scenario might end up being a full sort anyway, unsure)

  35. in src/test/transaction_tests.cpp:412 in 9b855f18e0 outdated
    412-    // Random real transaction (e2769b09e784f32f62ef849763d4f45b98e07ba658647343b915ff832b110436)
    413-    unsigned char ch[] = {0x01, 0x00, 0x00, 0x00, 0x01, 0x6b, 0xff, 0x7f, 0xcd, 0x4f, 0x85, 0x65, 0xef, 0x40, 0x6d, 0xd5, 0xd6, 0x3d, 0x4f, 0xf9, 0x4f, 0x31, 0x8f, 0xe8, 0x20, 0x27, 0xfd, 0x4d, 0xc4, 0x51, 0xb0, 0x44, 0x74, 0x01, 0x9f, 0x74, 0xb4, 0x00, 0x00, 0x00, 0x00, 0x8c, 0x49, 0x30, 0x46, 0x02, 0x21, 0x00, 0xda, 0x0d, 0xc6, 0xae, 0xce, 0xfe, 0x1e, 0x06, 0xef, 0xdf, 0x05, 0x77, 0x37, 0x57, 0xde, 0xb1, 0x68, 0x82, 0x09, 0x30, 0xe3, 0xb0, 0xd0, 0x3f, 0x46, 0xf5, 0xfc, 0xf1, 0x50, 0xbf, 0x99, 0x0c, 0x02, 0x21, 0x00, 0xd2, 0x5b, 0x5c, 0x87, 0x04, 0x00, 0x76, 0xe4, 0xf2, 0x53, 0xf8, 0x26, 0x2e, 0x76, 0x3e, 0x2d, 0xd5, 0x1e, 0x7f, 0xf0, 0xbe, 0x15, 0x77, 0x27, 0xc4, 0xbc, 0x42, 0x80, 0x7f, 0x17, 0xbd, 0x39, 0x01, 0x41, 0x04, 0xe6, 0xc2, 0x6e, 0xf6, 0x7d, 0xc6, 0x10, 0xd2, 0xcd, 0x19, 0x24, 0x84, 0x78, 0x9a, 0x6c, 0xf9, 0xae, 0xa9, 0x93, 0x0b, 0x94, 0x4b, 0x7e, 0x2d, 0xb5, 0x34, 0x2b, 0x9d, 0x9e, 0x5b, 0x9f, 0xf7, 0x9a, 0xff, 0x9a, 0x2e, 0xe1, 0x97, 0x8d, 0xd7, 0xfd, 0x01, 0xdf, 0xc5, 0x22, 0xee, 0x02, 0x28, 0x3d, 0x3b, 0x06, 0xa9, 0xd0, 0x3a, 0xcf, 0x80, 0x96, 0x96, 0x8d, 0x7d, 0xbb, 0x0f, 0x91, 0x78, 0xff, 0xff, 0xff, 0xff, 0x02, 0x8b, 0xa7, 0x94, 0x0e, 0x00, 0x00, 0x00, 0x00, 0x19, 0x76, 0xa9, 0x14, 0xba, 0xde, 0xec, 0xfd, 0xef, 0x05, 0x07, 0x24, 0x7f, 0xc8, 0xf7, 0x42, 0x41, 0xd7, 0x3b, 0xc0, 0x39, 0x97, 0x2d, 0x7b, 0x88, 0xac, 0x40, 0x94, 0xa8, 0x02, 0x00, 0x00, 0x00, 0x00, 0x19, 0x76, 0xa9, 0x14, 0xc1, 0x09, 0x32, 0x48, 0x3f, 0xec, 0x93, 0xed, 0x51, 0xf5, 0xfe, 0x95, 0xe7, 0x25, 0x59, 0xf2, 0xcc, 0x70, 0x43, 0xf9, 0x88, 0xac, 0x00, 0x00, 0x00, 0x00, 0x00};
    414-    std::vector<unsigned char> vch(ch, ch + sizeof(ch) -1);
    415-    DataStream stream(vch);
    416+    // Serialized random real transaction (e2769b09e784f32f62ef849763d4f45b98e07ba658647343b915ff832b110436)
    417+    std::vector<unsigned char> vch{0x01, 0x00, 0x00, 0x00, 0x01, 0x6b, 0xff, 0x7f, 0xcd, 0x4f, 0x85, 0x65, 0xef, 0x40, 0x6d, 0xd5, 0xd6, 0x3d, 0x4f, 0xf9, 0x4f, 0x31, 0x8f, 0xe8, 0x20, 0x27, 0xfd, 0x4d, 0xc4, 0x51, 0xb0, 0x44, 0x74, 0x01, 0x9f, 0x74, 0xb4, 0x00, 0x00, 0x00, 0x00, 0x8c, 0x49, 0x30, 0x46, 0x02, 0x21, 0x00, 0xda, 0x0d, 0xc6, 0xae, 0xce, 0xfe, 0x1e, 0x06, 0xef, 0xdf, 0x05, 0x77, 0x37, 0x57, 0xde, 0xb1, 0x68, 0x82, 0x09, 0x30, 0xe3, 0xb0, 0xd0, 0x3f, 0x46, 0xf5, 0xfc, 0xf1, 0x50, 0xbf, 0x99, 0x0c, 0x02, 0x21, 0x00, 0xd2, 0x5b, 0x5c, 0x87, 0x04, 0x00, 0x76, 0xe4, 0xf2, 0x53, 0xf8, 0x26, 0x2e, 0x76, 0x3e, 0x2d, 0xd5, 0x1e, 0x7f, 0xf0, 0xbe, 0x15, 0x77, 0x27, 0xc4, 0xbc, 0x42, 0x80, 0x7f, 0x17, 0xbd, 0x39, 0x01, 0x41, 0x04, 0xe6, 0xc2, 0x6e, 0xf6, 0x7d, 0xc6, 0x10, 0xd2, 0xcd, 0x19, 0x24, 0x84, 0x78, 0x9a, 0x6c, 0xf9, 0xae, 0xa9, 0x93, 0x0b, 0x94, 0x4b, 0x7e, 0x2d, 0xb5, 0x34, 0x2b, 0x9d, 0x9e, 0x5b, 0x9f, 0xf7, 0x9a, 0xff, 0x9a, 0x2e, 0xe1, 0x97, 0x8d, 0xd7, 0xfd, 0x01, 0xdf, 0xc5, 0x22, 0xee, 0x02, 0x28, 0x3d, 0x3b, 0x06, 0xa9, 0xd0, 0x3a, 0xcf, 0x80, 0x96, 0x96, 0x8d, 0x7d, 0xbb, 0x0f, 0x91, 0x78, 0xff, 0xff, 0xff, 0xff, 0x02, 0x8b, 0xa7, 0x94, 0x0e, 0x00, 0x00, 0x00, 0x00, 0x19, 0x76, 0xa9, 0x14, 0xba, 0xde, 0xec, 0xfd, 0xef, 0x05, 0x07, 0x24, 0x7f, 0xc8, 0xf7, 0x42, 0x41, 0xd7, 0x3b, 0xc0, 0x39, 0x97, 0x2d, 0x7b, 0x88, 0xac, 0x40, 0x94, 0xa8, 0x02, 0x00, 0x00, 0x00, 0x00, 0x19, 0x76, 0xa9, 0x14, 0xc1, 0x09, 0x32, 0x48, 0x3f, 0xec, 0x93, 0xed, 0x51, 0xf5, 0xfe, 0x95, 0xe7, 0x25, 0x59, 0xf2, 0xcc, 0x70, 0x43, 0xf9, 0x88, 0xac, 0x00, 0x00, 0x00, 0x00, 0x00};
    


    sipa commented at 3:50 pm on January 23, 2025:

    This could be written more concisely now:

    static CMutableTransaction CreateTransaction()
    {
        // Serialized random real transaction (e2769b09e784f32f62ef849763d4f45b98e07ba658647343b915ff832b110436)
        static constexpr auto ser_tx = "01000000016bff7fcd4f8565ef406dd5d63d4ff94f318fe82027fd4dc451b04474019f74b4000000008c493046022100da0dc6aecefe1e06efdf05773757deb168820930e3b0d03f46f5fcf150bf990c022100d25b5c87040076e4f253f8262e763e2dd51e7ff0be157727c4bc42807f17bd39014104e6c26ef67dc610d2cd192484789a6cf9aea9930b944b7e2db5342b9d9e5b9ff79aff9a2ee1978dd7fd01dfc522ee02283d3b06a9d03acf8096968d7dbb0f9178ffffffff028ba7940e000000001976a914badeecfdef0507247fc8f74241d73bc039972d7b88ac4094a802000000001976a914c10932483fec93ed51f5fe95e72559f2cc7043f988ac0000000000"_hex;
        CMutableTransaction tx;
        DataStream(ser_tx) >> TX_WITH_WITNESS(tx);
        return tx;
    }
    

    l0rinc commented at 5:29 pm on January 23, 2025:
  36. in src/test/transaction_tests.cpp:1102 in 9b855f18e0 outdated
    1097+        uint256{2},
    1098+        uint256{3}
    1099+    };
    1100+
    1101+    std::vector shuffled{original};
    1102+    std::ranges::shuffle(shuffled, FastRandomContext());
    


    sipa commented at 4:04 pm on January 23, 2025:

    You can access the test context’s RNG here:

    std::ranges::shuffle(shuffled, m_rng);
    

    l0rinc commented at 5:29 pm on January 23, 2025:
  37. in src/test/transaction_tests.cpp:1136 in 9b855f18e0 outdated
    1131+        Txid::FromUint256(uint256{2}),
    1132+        Txid::FromUint256(uint256{3})
    1133+    };
    1134+
    1135+    std::vector shuffled{original};
    1136+    std::ranges::shuffle(shuffled, FastRandomContext());
    


    sipa commented at 4:08 pm on January 23, 2025:

    Same here,

    std::ranges::shuffle(shuffled, m_rng);
    

    l0rinc commented at 5:29 pm on January 23, 2025:
  38. in src/bench/mempool_stress.cpp:187 in 323a00338d outdated
    182+    });
    183+}
    184+
    185 BENCHMARK(ComplexMemPool, benchmark::PriorityLevel::HIGH);
    186 BENCHMARK(MempoolCheck, benchmark::PriorityLevel::HIGH);
    187+BENCHMARK(ProcessTransactionBench, benchmark::PriorityLevel::HIGH);
    


    sipa commented at 4:12 pm on January 23, 2025:
    Nit: newline at end of file

    l0rinc commented at 5:29 pm on January 23, 2025:
    Done
  39. in src/bench/duplicate_inputs.cpp:79 in 323a00338d outdated
    73-        assert(!CheckBlock(block, cvstate, chainparams.GetConsensus(), false, false));
    74-        assert(cvstate.GetRejectReason() == "bad-txns-inputs-duplicate");
    75-    });
    76-}
    77-
    78-BENCHMARK(DuplicateInputs, benchmark::PriorityLevel::HIGH);
    


    sipa commented at 4:17 pm on January 23, 2025:
    Adding a new, more high-level benchmark is great, but I don’t see why you’d delete the existing benchmark?

    l0rinc commented at 5:29 pm on January 23, 2025:
    It was suggested that these may be too low level and may be misleading (distract us from real bottlenecks) - I’ve restored them so that reviewers can decide

    sipa commented at 5:31 pm on January 23, 2025:
    If so, I disagree with that argument. The presence of a benchmark does not imply it’s somehow a prioritized target for optimization. It’s also useful as-is in order not to accidentally introduce performance regressions.
  40. sipa commented at 4:48 pm on January 23, 2025: member

    Concept ACK.

    I agree with the concerns about being cautious when changing such consensus-critical code, especially given the fact that we’ve had an actual critical vulnerability related to duplicate checking. However, the changes here are very confined to a single function, easily reviewed, and I have a general preference for avoiding std::set when a sorted std::vector works as well (even beyond what benchmarks can measure, it reduces memory fragmentation, for example).

    That said, I absolutely respect the view “This optimization is not worth the risk”, and I don’t think it requires “suggesting workable solutions”. There is no problem to be solved; it’s an improvement, and both “doing it” and “not doing anything like this at all” are valid choices we can make.

  41. l0rinc force-pushed on Jan 23, 2025
  42. maflcko commented at 10:44 am on January 24, 2025: member
    (Side-note: In theory the std::flat_set constructor (4) (https://en.cppreference.com/w/cpp/container/flat_set/flat_set) could be used as a drop-in replacement to do the sort+uniq, but it requires C++23, isn’t yet implemented in any released standard library, and would only turn two calls into one.)
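
    For reference, a hedged sketch of how that could look once such a standard library ships (hypothetical helper name; assumes a C++23 std::flat_set over Bitcoin Core’s COutPoint):

    #include <cstddef>
    #include <flat_set>
    #include <vector>

    // std::flat_set constructor (4) sorts the moved-in container and erases
    // duplicates, so a shrinking size signals a duplicate input.
    bool HasDuplicateFlatSet(std::vector<COutPoint> prevouts)
    {
        const size_t n{prevouts.size()};
        const std::flat_set<COutPoint> unique(std::move(prevouts));
        return unique.size() != n;
    }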
  43. test: validate duplicate detection in `CheckTransaction`
    The `CheckTransaction` validation function in https://github.com/bitcoin/bitcoin/blob/master/src/consensus/tx_check.cpp#L41-L45 relies on a correct ordering relation for detecting duplicate transaction inputs.
    
    This update to the tests ensures that:
    * Accurate detection of duplicates: Beyond trivial cases (e.g., two identical inputs), duplicates are detected correctly in more complex scenarios.
    * Consistency across methods: Both sorted sets and hash-based sets behave identically when detecting duplicates for `COutPoint` and related values.
    * Robust ordering and equality relations: The function maintains expected behavior for ordering and equality checks.
    
    Using randomized testing with shuffled inputs (to avoid any remaining bias introduced), the enhanced test validates that `CheckTransaction` remains robust and reliable across various input configurations. It confirms identical behavior to a hashing-based duplicate detection mechanism, ensuring consistency and correctness.
    
    To make sure the new branches in the follow-up commits will be covered, `basic_transaction_tests` was extended with a randomized test comparing against the old implementation (and also against an alternative duplicate-detection approach). The iterations and ranges were chosen such that every new branch is expected to be hit at least once.
    6421b986f6
  44. bench: measure `CheckBlock` speed separately from serialization
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs' -min-time=10000
    
    > C++ compiler .......................... AppleClang 16.0.0.16000026
    
    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          372,743.63 |            2,682.81 |    1.1% |     10.99 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        3,304,694.54 |              302.60 |    0.5% |     11.05 | `DuplicateInputs`
    
    > C++ compiler .......................... GNU 13.3.0
    
    |            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        1,096,261.84 |              912.19 |    0.1% |    7,963,390.88 |    3,487,375.26 |  2.283 |   1,266,941.00 |    1.8% |     11.03 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        8,366,309.48 |              119.53 |    0.0% |   23,865,177.67 |   26,620,160.23 |  0.897 |   5,972,887.41 |    4.0% |     10.78 | `DuplicateInputs`
    8f81c3d994
  45. l0rinc force-pushed on Jan 25, 2025
  46. l0rinc commented at 2:43 pm on January 25, 2025: contributor

    I feel like we shouldn’t need to fully sort, though

    We likely wouldn’t need that just for duplicate checking, but this seems “good enough”, since it’s a built-in function - especially now that I have integrated it with null checks as well.

    The code now separates (in many tiny commits to ease review) the validations by input size:

    • 1 input - it’s either a coinbase, in which case we need neither the duplicate check nor the null-prevout check, or it isn’t, in which case we’re done with validation. Given that this is currently the most common case, it seemed important to treat it separately.
    • 2 inputs - we check that the two prevouts are not equal and that neither is null.
    • otherwise we sort the prevouts, check for duplicates via a built-in (possibly SIMD-optimized) method, and check for nulls at the head of the sorted values in constant time (we could also check for null during sortedPrevouts collection, but that would change the validation order for duplicates & nulls).

    I’ve also extended the tests to cover random null check validations.
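
    Putting the branches above together, a condensed sketch of the intended shape (not the exact diff: it assumes Bitcoin Core’s CTransaction/COutPoint types, uses a hypothetical helper name, and collapses the distinct reject reasons into a single bool, whereas the real change preserves the original validation results and order):

    #include <algorithm>
    #include <vector>

    bool HasDuplicateOrNullInputs(const CTransaction& tx)
    {
        if (tx.IsCoinBase()) return false; // single null prevout by definition, nothing to check here
        if (tx.vin.size() == 1) return tx.vin[0].prevout.IsNull();
        if (tx.vin.size() == 2) { // most common multi-input case: compare directly
            return tx.vin[0].prevout == tx.vin[1].prevout ||
                   tx.vin[0].prevout.IsNull() || tx.vin[1].prevout.IsNull();
        }
        std::vector<COutPoint> sorted_prevouts;
        sorted_prevouts.reserve(tx.vin.size());
        for (const auto& txin : tx.vin) sorted_prevouts.push_back(txin.prevout);
        std::sort(sorted_prevouts.begin(), sorted_prevouts.end());
        if (std::ranges::adjacent_find(sorted_prevouts) != sorted_prevouts.end()) return true; // duplicate prevout
        for (const auto& out : sorted_prevouts) { // null txids sort to the front
            if (!out.hash.IsNull()) break;
            if (out.IsNull()) return true; // null prevout
        }
        return false;
    }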

  47. bench: add `ProcessTransactionBench` to measure `CheckBlock` in context
    The newly introduced `ProcessTransactionBench` incorporates multiple steps in the validation pipeline, offering a more comprehensive view of `CheckBlock` performance within a realistic transaction validation context.
    
    Previous microbenchmarks, such as DeserializeAndCheckBlockTest and DuplicateInputs, focused on isolated aspects of transaction and block validation. While these tests provided valuable insights for targeted profiling, they lacked context regarding the broader validation process, where interactions between components play a critical role.
    
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='ProcessTransactionBench' -min-time=10000
    
    > C++ compiler .......................... AppleClang 16.0.0.16000026
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |            9,585.10 |          104,328.55 |    0.1% |     11.03 | `ProcessTransactionBench`
    
    > C++ compiler .......................... GNU 13.3.0
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |           56,199.57 |           17,793.73 |    0.1% |      229,263.01 |      178,766.31 |  1.282 |      15,509.97 |    0.5% |     10.91 | `ProcessTransactionBench`
    7c4f0d045c
  48. optimization: move duplicate checks outside of coinbase branch
    `IsCoinBase` means a single input with a NULL prevout, so it makes sense to restrict the duplicate check to non-coinbase transactions only.
    The behavior is the same as before, except that single-input transactions aren't checked for duplicates anymore (~70-90% of the cases, see https://transactionfee.info/charts/transactions-1in).
    I've added braces to the conditions and loops to simplify review of follow-up commits.
    
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs|ProcessTransactionBench' -min-time=10000
    
    > C++ compiler .......................... AppleClang 16.0.0.16000026
    
    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          335,917.12 |            2,976.92 |    1.3% |     11.01 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        3,286,337.42 |              304.29 |    1.1% |     10.90 | `DuplicateInputs`
    |            9,561.02 |          104,591.35 |    0.2% |     11.02 | `ProcessTransactionBench`
    c161ef8ac4
  49. optimization: simplify duplicate checks for trivial inputs
    No need to create a set for checking duplicates in two-input transactions.
    
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs|ProcessTransactionBench' -min-time=10000
    
    > C++ compiler .......................... AppleClang 16.0.0.16000026
    
    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          314,137.30 |            3,183.32 |    1.2% |     11.04 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        3,220,592.73 |              310.50 |    1.3% |     10.92 | `DuplicateInputs`
    |            9,425.98 |          106,089.77 |    0.3% |     11.00 | `ProcessTransactionBench`
    da9cf359a8
  50. optimization: replace tree with sorted vector
    A pre-sized vector retains locality (enabling SIMD operations), speeding up sorting and equality checks.
    It's also simpler (therefore more reliable) than a sorted set. It also causes less memory fragmentation.
    
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs|ProcessTransactionBench' -min-time=10000
    
    > C++ compiler .......................... AppleClang 16.0.0.16000026
    
    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          181,922.54 |            5,496.85 |    0.2% |     10.98 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          997,739.30 |            1,002.27 |    1.0% |     10.94 | `DuplicateInputs`
    |            9,449.28 |          105,828.15 |    0.3% |     10.99 | `ProcessTransactionBench`
    
    Co-authored-by: Pieter Wuille <pieter@wuille.net>
    cd94cd7de4
  51. optimization: look for NULL prevouts in the sorted values
    For the 2 input case we simply check them both, like we did with equality.
    
    For the general case, we take advantage of sorting, making invalid value detection constant time instead of linear in the worst case.
    
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='CheckBlockBench|DuplicateInputs|ProcessTransactionBench' -min-time=10000
    
    > C++ compiler .......................... AppleClang 16.0.0.16000026
    
    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          179,971.00 |            5,556.45 |    0.3% |     11.02 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          963,177.98 |            1,038.23 |    1.7% |     10.92 | `DuplicateInputs`
    |            9,410.90 |          106,259.75 |    0.3% |     11.01 | `ProcessTransactionBench`
    
    > C++ compiler .......................... GNU 13.3.0
    
    |            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |          834,855.94 |            1,197.81 |    0.0% |    6,518,548.86 |    2,656,039.78 |  2.454 |     919,160.84 |    1.5% |     10.78 | `CheckBlockBench`
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        4,261,492.75 |              234.66 |    0.0% |   17,379,823.40 |   13,559,793.33 |  1.282 |   4,265,714.28 |    3.4% |     11.00 | `DuplicateInputs`
    |           55,819.53 |           17,914.88 |    0.1% |      227,828.15 |      177,520.09 |  1.283 |      15,184.36 |    0.4% |     10.91 | `ProcessTransactionBench`
    9a3397456d
  52. l0rinc force-pushed on Jan 26, 2025
  53. 1440000bytes commented at 3:26 pm on January 27, 2025: none

    I have been disappointed for many reasons but will list two:

    1. We need Greg Maxwell and Pieter Wuille for every consensus code change, according to some bitcoiners: https://njump.me/nevent1qqspr7rhrga7kcd0tcq6vw9pus6xeg7h8cwv3g2fsmhu6yxu5tuugygpz4mhxue69uhhyetvv9ujuerpd46hxtnfduhsyg83wf2cdfqzcp4weqvdz3u2gk42phqkc75ufp5ajlp4qvmdzmuwgvpsgqqqqqqse3zhej

    They have no time to write a single comment on consensus proposals but have time to review pull requests that change consensus code.

    2. This pull request tries to change how consensus code works. Your “high standards” change with ego.

    I don’t care if I am banned. I care more about bitcoin than you. I will continue to review pull requests.

  54. in src/consensus/tx_check.cpp:70 in 9a3397456d
    78+        for (const auto& in : sortedPrevouts) {
    79+            if (!in.hash.IsNull()) break; // invalid values can only be at the beginning
    80+            if (in.IsNull()) {
    81                 return state.Invalid(TxValidationResult::TX_CONSENSUS, "bad-txns-prevout-null");
    82+            }
    83+        }
    


    jasonribble commented at 11:38 pm on January 27, 2025:
    Is `in` a common variable name in C/C++ or Bitcoin? Not sure what it is without digging into other code

    Julian128 commented at 9:11 am on January 28, 2025:
    I also think it’s bad naming, especially because it’s used as a keyword in other languages.

    jasonribble commented at 3:08 pm on January 28, 2025:
    100% agree. @l0rinc what do you think?

    l0rinc commented at 3:15 pm on January 28, 2025:

    Guys/bots, please stop the useless spamming!

    If you care about Bitcoin, please review the code instead: check at a high level that the original behavior is retained and the risk is minimized, suggest meaningful changes that make it more reviewable and less risky, identify missing test cases and better benchmarks, and raise valid concerns.

  55. l0rinc renamed this:
    RFC: optimize `CheckBlock` input duplicate detection
    RFC: optimize `CheckBlock` input checks (duplicate detection & nulls)
    on Jan 28, 2025
  56. Julian128 commented at 2:46 pm on January 30, 2025: none
    I experimented a bit more and it looks like I got another 17% speed improvement (on block413567 on my Linux machine) over your changes @l0rinc, by using a faster comparison operator and doing direct comparisons for relatively small input sizes. Can I create a new pull request for that?
  57. l0rinc commented at 3:14 pm on January 30, 2025: contributor

    by using a faster comparison operator and doing direct comparisons for relatively small input sizes.

    Quadratic comparison would likely be faster than sorting for very small input sizes (e.g. <10, I guess), but it would complicate the code and testing considerably for a case that is likely not representative. I also thought of that but I want the code to be faster while not being more difficult to comprehend. But feel free to challenge my bias by pushing a different PR with those changes and let the best approach win! You can cherry-pick it on top of these changes if you want.
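
    For reference, a hedged sketch of the quadratic alternative being discussed (hypothetical helper name, shown only to illustrate the trade-off for small input counts):

    #include <cstddef>
    #include <vector>

    // O(n^2) pairwise comparison: no allocation or sorting, attractive only while
    // the input count stays small (hypothetical threshold, as discussed above).
    bool HasDuplicatePrevoutsQuadratic(const std::vector<CTxIn>& vin) // assumes Bitcoin Core's CTxIn
    {
        for (size_t i{0}; i < vin.size(); ++i) {
            for (size_t j{i + 1}; j < vin.size(); ++j) {
                if (vin[i].prevout == vin[j].prevout) return true;
            }
        }
        return false;
    }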

  58. l0rinc renamed this:
    RFC: optimize `CheckBlock` input checks (duplicate detection & nulls)
    optimization: speed up `CheckBlock` input checks (duplicate detection & nulls)
    on Feb 14, 2025
  59. l0rinc marked this as ready for review on Feb 14, 2025
