bench: replace benchmark block with more representative one (413567 → 784588) #32457

pull l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:l0rinc/bench-block-413567-to-784000 changing 8 files +68 −22
  1. l0rinc commented at 9:30 am on May 9, 2025: contributor

    Draft, until I investigate if we can generate a similar block instead of adding a real one to the repo


    Summary

    This PR replaces our benchmark’s reference block with one that’s more modern and representative of current usage patterns.

    Context

    The current benchmark block was mined in 2016 and added in PR #9049. Since it predates many modern script types, our benchmarks don’t accurately reflect current network conditions.

    Suggestion

    We’re replacing it with block 784588 from 2023, which provides a better balance - it’s recent enough to include modern script types while still containing legacy scripts typically encountered during IBD.

    The PR consists of two commits:

    • first documenting the current block’s script type distribution;
    • then replacing it with the new block and updating assertions accordingly.
  2. bench: document the measured block's properties
    This commit documents the current benchmark-base block's properties, to highlight the differences with the replacement block in the next commit.
    3878444c76
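
    A script type distribution like the one this commit documents could be tallied roughly as follows. This is a minimal sketch, not the code from the commit: `ScriptType`, `Classify` and `CountScriptTypes` are hypothetical names, and the classifier only matches the byte layout of the standard output templates instead of using Bitcoin Core's own script solver.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical categories for illustration; anything unrecognised is OTHER.
enum class ScriptType : size_t { P2PKH, P2SH, P2WPKH, P2WSH, P2TR, OTHER, COUNT };

using Script = std::vector<uint8_t>;

// Classify an output script purely by the byte layout of its standard template.
ScriptType Classify(const Script& s)
{
    // OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if (s.size() == 25 && s[0] == 0x76 && s[1] == 0xa9 && s[2] == 0x14 &&
        s[23] == 0x88 && s[24] == 0xac) return ScriptType::P2PKH;
    // OP_HASH160 <20 bytes> OP_EQUAL
    if (s.size() == 23 && s[0] == 0xa9 && s[1] == 0x14 && s[22] == 0x87) return ScriptType::P2SH;
    // OP_0 <20 bytes>
    if (s.size() == 22 && s[0] == 0x00 && s[1] == 0x14) return ScriptType::P2WPKH;
    // OP_0 <32 bytes>
    if (s.size() == 34 && s[0] == 0x00 && s[1] == 0x20) return ScriptType::P2WSH;
    // OP_1 <32 bytes>
    if (s.size() == 34 && s[0] == 0x51 && s[1] == 0x20) return ScriptType::P2TR;
    return ScriptType::OTHER;
}

// Count how many scripts fall into each category, indexed by ScriptType.
std::array<size_t, static_cast<size_t>(ScriptType::COUNT)> CountScriptTypes(
    const std::vector<Script>& scripts)
{
    std::array<size_t, static_cast<size_t>(ScriptType::COUNT)> counts{};
    for (const auto& script : scripts) {
        ++counts[static_cast<size_t>(Classify(script))];
    }
    return counts;
}
```

    Comparing the resulting counts for the old and the new block would show which script types the 2016 block lacks.
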
  3. bench: measure behavior of a more representative block
    https://mempool.space/block/413567 was mined in 2016 and added as a benchmark-base in https://github.com/bitcoin/bitcoin/pull/9049.
    It lacks modern script types, making the benchmarks unrepresentative of current usage.
    In this commit we're replacing it with https://mempool.space/block/784588 from 2023. This block was selected because it's old enough to include legacy script types encountered during IBD, while also containing modern script types in proportions that better reflect current block composition.
    f01fdd00ec
  4. DrahtBot commented at 9:30 am on May 9, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32457.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32554 (RFC: bench: replace embedded raw block with configurable block generator by l0rinc)
    • #32532 (script: short-circuit GetLegacySigOpCount for known scripts by l0rinc)
    • #31682 ([IBD] specialize CheckBlock’s input & coinbase checks by l0rinc)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. DrahtBot added the label Tests on May 9, 2025
  6. in src/bench/strencodings.cpp:14 in f01fdd00ec
    11 #include <vector>
    12 
    13 static void HexStrBench(benchmark::Bench& bench)
    14 {
    15-    auto const& data = benchmark::data::block413567;
    16+    auto const& data = benchmark::data::block_784588;
    


    maflcko commented at 9:40 am on May 9, 2025:
    Instead of a block, this could just be random bytes from a fast random context?

    l0rinc commented at 3:12 pm on May 9, 2025:
    Yes, this one definitely, but in the other cases I’m worried about introducing a strong bias. It’s not like we’re changing these very often, but I’ll investigate anyway. Let’s see how close we can get without adding 1.5 MB to the repo.

    laanwj commented at 3:24 pm on May 9, 2025:
    It’s not just the amount of data; we’re still scarred by the xz backdoor incident :smile:

    l0rinc commented at 3:44 pm on May 9, 2025:
    Understandable, but that’s why I added the hashes here, to make it self-validating.

    laanwj commented at 5:01 pm on May 9, 2025:
    Hahaha, agreed, it would be extremely far-fetched to put data in a specific block just to add it to the repository two years later.

    maflcko commented at 8:12 am on May 13, 2025:

    Yes, this one definitely, but in the other cases I’m worried about introducing a strong bias.

    Again, it would be good to list the benchmark that needs this. Also, serialization itself shouldn’t care whether the data is synthetic (random) or exactly matches a real past block. If you worry about a bias, it should actually be easier to provide synthetic data than to try to find a fitting past block. In any case, there will always be a bias, even if the data is fully synthetic, as the real chain progresses and we probably don’t want to update this for every release. For the benchmarks where it doesn’t matter, I’d say to just leave them as-is. For the benchmarks where it matters, it would be good to explain why and then find a solution for each benchmark.
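
    For a benchmark such as the hex-encoding one, the synthetic input discussed in this thread could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the PR: `MakeDeterministicBytes` and `ToHex` are hypothetical helpers, and a fixed-seed `std::mt19937_64` stands in for Bitcoin Core's `FastRandomContext` so that the example compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: fill a buffer with reproducible pseudo-random bytes.
// The fixed seed makes every run encode exactly the same input.
std::vector<uint8_t> MakeDeterministicBytes(size_t size, uint64_t seed)
{
    std::mt19937_64 rng{seed};
    std::vector<uint8_t> out;
    out.reserve(size);
    while (out.size() < size) {
        const uint64_t word = rng();
        // Split each 64-bit word into up to eight bytes, lowest byte first.
        for (int i = 0; i < 8 && out.size() < size; ++i) {
            out.push_back(static_cast<uint8_t>(word >> (8 * i)));
        }
    }
    return out;
}

// Plain hex encoder, standing in for the function under benchmark.
std::string ToHex(const std::vector<uint8_t>& data)
{
    static constexpr char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(data.size() * 2);
    for (const uint8_t byte : data) {
        hex.push_back(digits[byte >> 4]);
        hex.push_back(digits[byte & 0x0f]);
    }
    return hex;
}
```

    The buffer size can be chosen to match the size of the block it replaces, so the amount of work per iteration stays comparable without keeping the raw block in the repository.
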

  7. laanwj commented at 12:49 pm on May 9, 2025: member
    Agree with the rationale of this PR, but having 1MB+ binary files in the repo is really meh.
  8. l0rinc commented at 12:57 pm on May 9, 2025: contributor
    Agree - do you have a better idea?
  9. maflcko commented at 1:02 pm on May 9, 2025: member
    Is there a benchmark that needs this? If yes, going for synthetic, but representative (and easily adjustable) data may be a better choice for that benchmark.
  10. laanwj commented at 2:19 pm on May 9, 2025: member

    Is there a benchmark that needs this? If yes, going for synthetic, but representative (and easily adjustable) data may be a better choice for that benchmark.

    Yes, as there is a lot of random data in a block whose exact value isn’t important to benchmarking (only that it’s always the same), it seems possible to deterministically construct a similar block from code.
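
    Constructing similar data deterministically from code could be sketched as below. This is only an illustration and is not the generator from #32554: `TypeWeight`, `MakeScriptPubKey` and `GenerateScripts` are hypothetical names, the weights in any mix are placeholders rather than measured proportions, and only output scripts are produced, not a full block.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

enum class ScriptType { P2PKH, P2SH, P2WPKH, P2WSH, P2TR };

using Script = std::vector<uint8_t>;

// Hypothetical mix entry: relative weight of one script type.
struct TypeWeight {
    ScriptType type;
    uint32_t weight;
};

// Build the standard output template for `type`; the hash or key slot is
// filled with pseudo-random bytes, since its exact value is irrelevant here.
Script MakeScriptPubKey(ScriptType type, std::mt19937_64& rng)
{
    auto append_payload = [&rng](Script& dst, size_t n) {
        for (size_t i = 0; i < n; ++i) dst.push_back(static_cast<uint8_t>(rng()));
    };
    Script script;
    switch (type) {
    case ScriptType::P2PKH: // OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG
        script = {0x76, 0xa9, 0x14};
        append_payload(script, 20);
        script.push_back(0x88);
        script.push_back(0xac);
        break;
    case ScriptType::P2SH: // OP_HASH160 <20> OP_EQUAL
        script = {0xa9, 0x14};
        append_payload(script, 20);
        script.push_back(0x87);
        break;
    case ScriptType::P2WPKH: // OP_0 <20>
        script = {0x00, 0x14};
        append_payload(script, 20);
        break;
    case ScriptType::P2WSH: // OP_0 <32>
        script = {0x00, 0x20};
        append_payload(script, 32);
        break;
    case ScriptType::P2TR: // OP_1 <32>
        script = {0x51, 0x20};
        append_payload(script, 32);
        break;
    }
    return script;
}

// Generate `count` scripts whose types follow the weights in `mix`.
// The same seed always yields the same sequence.
std::vector<Script> GenerateScripts(size_t count, const std::vector<TypeWeight>& mix, uint64_t seed)
{
    uint64_t total = 0;
    for (const auto& entry : mix) total += entry.weight;
    std::vector<Script> scripts;
    if (total == 0) return scripts;
    std::mt19937_64 rng{seed};
    scripts.reserve(count);
    for (size_t i = 0; i < count; ++i) {
        // Manual weighted pick: walk the cumulative weights until `pick` fits.
        uint64_t pick = rng() % total;
        for (const auto& entry : mix) {
            if (pick < entry.weight) {
                scripts.push_back(MakeScriptPubKey(entry.type, rng));
                break;
            }
            pick -= entry.weight;
        }
    }
    return scripts;
}
```

    The weighted pick is written by hand instead of using `std::discrete_distribution`, whose exact output sequence is not fixed by the standard; the raw `std::mt19937_64` output is, so the generated data stays identical across standard libraries. Adjusting the mix then only means editing the weights.
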

  11. l0rinc commented at 9:40 pm on May 18, 2025: contributor
    Added a random block generator in #32554 - let me know if it makes sense so I can close this one.
  12. l0rinc commented at 1:43 pm on May 21, 2025: contributor
  13. l0rinc closed this on May 21, 2025


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-25 21:12 UTC