bench: Remove various extraneous benchmarks #31153

pull dergoegge wants to merge 2 commits into bitcoin:master from dergoegge:2024-10-nuke-benchmarks changing 11 files +35 −654
  1. dergoegge commented at 2:27 pm on October 25, 2024: member

    Measuring performance of components that do not meaningfully impact the performance of critical aspects of the system (e.g. block validation or mining) seems extraneous.

    Maintaining extraneous benchmark tests incentivizes non-impactful improvements. This frequently comes up with new contributors and they can’t be blamed for thinking that e.g. hex parsing should be made faster if a benchmark test exists.

  2. DrahtBot commented at 2:27 pm on October 25, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31153.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK stickies-v, fanquake, brunoerg, mzumsande
    Approach NACK davidgumberg

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30987 (Don’t zero-after-free DataStream: Faster IBD on some configurations by davidgumberg)
    • #30116 (p2p: Fill reconciliation sets (Erlay) attempt 2 by sr-gi)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Tests on Oct 25, 2024
  4. stickies-v commented at 3:04 pm on October 25, 2024: contributor
    Concept ACK. Perhaps also worth updating doc/benchmarking.md with a brief section on which benchmarks are typically useful to add, and which benchmarks should rather be ephemeral (e.g. provided on a separate branch, supplementary to a PR)?
  5. in src/bench/logging.cpp:1 in cda9d1c9a6 outdated
    -1@@ -1,64 +0,0 @@
    0-// Copyright (c) 2020-2022 The Bitcoin Core developers
    


    jonatack commented at 3:24 pm on October 25, 2024:
    Not sure about removing the logging benchmarks. Logging can be very high frequency and is also used for logging lock contention. Even if that is more likely to occur for developers, it may be useful or valuable to be able to track this.
  6. jonatack commented at 3:34 pm on October 25, 2024: member

    Not sure about some of these. People can be asked to add benchmarks for performance improvements or to verify that a change to existing code doesn’t degrade performance, and having existing benchmarks to refer to may make that easier. Seems odd to throw out work like #22284.

    Are any of the benchmarks called by outside sites that track their results over time?

  7. fanquake commented at 3:35 pm on October 25, 2024: member
    Concept ACK - agree with stickies-v re updating the doc.
  8. brunoerg commented at 3:59 pm on October 25, 2024: contributor
    Concept ACK
  9. instagibbs commented at 4:26 pm on October 25, 2024: member

    Perhaps also worth updating doc/benchmarking.md with a brief section on which benchmarks are typically useful to add, and which benchmarks should rather be ephemeral (e.g. provided on a separate branch, supplementary to a PR)?

    I think this documentation update would be helpful to get in even without this PR.

  10. dergoegge commented at 4:34 pm on October 25, 2024: member

    People can be asked to add benchmarks for performance improvements or to verify that a change to existing code doesn’t degrade performance

    Imo, performance improvements (and accompanying benchmarks) should only be made to code that makes critical things slower than they need to be. Otherwise we’re just improving performance for the sake of doing it, which is not something we have the resources for. We should aim our resources with intention, and the benchmarks can act as the scope.

    For example, should we spend our time making hex parsing 500x faster or maybe shave 10% from IBD? I’m arguing for scoping our tests to the latter (and other critical things e.g. mining) so that we don’t get distracted.

    Are any of the benchmarks called by outside sites that track their results over time?

    https://corecheck.dev sort of does but it’s quite noisy.

  11. dergoegge commented at 4:35 pm on October 25, 2024: member
    Will add the doc changes next week
  12. maflcko commented at 8:56 am on October 28, 2024: member

    Hex parsing is used in mining via DecodeHexBlk, no?

    Also, some of the benchmarks do not exist to be optimized, but merely to give a feeling and to avoid a regression where an algorithm is swapped to go from O(n) to O(n^2). So in a sense, they are a bit of a unit test.

    In any case, updating the docs should be uncontroversial and useful going forward.

  13. jarolrod commented at 9:11 am on October 29, 2024: member

    but merely to give a feeling and to avoid a regression where an algorithm is swapped to go from O(n) to O(n^2). So in a sense, they are a bit of a unit test.

    This is a good reason to keep.

    I’m arguing for scoping our tests to the latter (and other critical things e.g. mining) so that we don’t get distracted.

    And this is an important statement.

    We don’t necessarily need to throw away work that has previously been a useful metric in issues (https://github.com/bitcoin/bitcoin/issues/17501) to achieve the goal of new benchmark contributions being focused on specific areas.

  14. dergoegge commented at 10:02 am on October 29, 2024: member

    throw away work that has previously been a useful metric in issues (https://github.com/bitcoin/bitcoin/issues/17501)

    The benchmarks weren’t mentioned in the issue you linked or the PR that fixed the issue?

  15. jarolrod commented at 10:20 am on October 29, 2024: member

    throw away work that has previously been a useful metric in issues (#17501)

    The benchmarks weren’t mentioned in the issue you linked or the PR that fixed the issue?

    I said it’s a useful metric, as in given that context, would it not be nice to have the benchmark?

  16. dergoegge commented at 10:29 am on October 29, 2024: member

    as in given that context

    I see. No I don’t think they would have been useful in that context, because all the base58 benchmarks run with inputs of the same size, so they couldn’t have found that particular issue and therefore were also not useful in verifying the fix.

    practicalswift mentioned a fuzz test in the PR, which is likely how he found the issue (speculating).

  17. jarolrod commented at 10:33 am on October 29, 2024: member

    as in given that context

    I see. No I don’t think they would have been useful in that context, because all the base58 benchmarks run with inputs of the same size, so they couldn’t have found that particular issue and therefore were also not useful in verifying the fix.

    practicalswift mentioned a fuzz test in the PR, which is likely how he found the issue (speculating).

    I’m not intending to argue, and as I said, your main point in the PR is valid.

    If a bench didn’t exist at the time of the issue, I imagine a review comment would have mentioned it would be nice to have, as a metric. I don’t think that’s controversial.

  18. dergoegge commented at 3:34 pm on October 30, 2024: member
    Added the doc commit, ready for further review
  19. in src/bench/parse_hex.cpp:30 in fcfb1d862a outdated
    24-static void HexParse(benchmark::Bench& bench)
    25-{
    26-    auto data = generateHexString(130); // Generates 678B0EDA0A1FD30904D5A65E3568DB82DB2D918B0AD8DEA18A63FECCB877D07CAD1495C7157584D877420EF38B8DA473A6348B4F51811AC13C786B962BEE5668F9 by default
    27-
    28-    bench.batch(data.size()).unit("base16").run([&] {
    29-        auto result = TryParseHex(data);
    


    maflcko commented at 3:40 pm on October 30, 2024:
    Hex parsing is used in mining via DecodeHexBlk, no?

    dergoegge commented at 4:10 pm on October 30, 2024:
    Does that meaningfully impact block submission? If so, it seems like DecodeHexBlk should be the target for the benchmark (or submitblock as a whole)

    maflcko commented at 12:03 pm on October 31, 2024:

    I see your point, but it seems a bit arbitrary to remove this one and not ExpandDescriptor, which should also be an end-to-end target, no?

    Maybe it would help review if you created a list of all benchmarks, and then added a reason why you kept it, or why you removed it for each one. Otherwise, it may be harder for reviewers to follow and see how the change is self-consistent?


    dergoegge commented at 2:58 pm on November 4, 2024:
    Thanks, removed the ExpandDescriptor bench as well. Will comment on how I picked the benches soon.
  20. davidgumberg commented at 6:17 pm on October 30, 2024: contributor

    Concept ACK, Approach NACK

    I think there should be a burden on PRs that claim to improve performance to demonstrate a user-impacting scenario where the improvements are relevant, and I agree that having benchmarks that test workloads that will never impact users leads to PRs being opened and review being spent on things that don’t deserve attention.

    I think a similar burden exists for this PR to demonstrate or at least describe a rationale for why these removed benchmarks don’t test behavior that would impact users.

  21. in doc/benchmarking.md:66 in fcfb1d862a outdated
    67+- Block propagation (Cost: slow block propagation may increase the rate of
    68+  orphaned blocks)
    69+
    70+Pull requests that improve the performance of components covered by a benchmark
    71+are valuable. Adding new benchmarks is valuable if the component being tested
    72+affects critical functions mentioned above.
    


    maflcko commented at 10:36 am on October 31, 2024:

    Maybe add a note to say that performance optimizations of micro-benchmarks are not valuable, if they do not show a visible end-user end-to-end performance improvement? Also could mention that the contribution of benchmarks that do not cover an end-user end-to-end scenario may not be valuable?

    Even if they do, I think some improvements may still be rejected, if the code bloat or review/maintenance is too much to justify the improvement.


    dergoegge commented at 2:57 pm on November 4, 2024:
    Done. Let me know if you’re not happy with the wording.
  22. maflcko commented at 10:39 am on October 31, 2024: member
    No strong opinion on the removals. I think the doc update is valuable and could be extended. Simply declaring the removed benchmarks as potentially not valuable, when looked at too narrowly, could be less controversial and still achieve the essence of the goal of this pull request.
  23. kevkevinpal commented at 0:50 am on November 1, 2024: contributor

    Not yet sure on this PR; we already have priority_level for benchmarks.

    All the benchmarks being removed seem to be benchmark::PriorityLevel::HIGH; would it make sense to move them down a level instead?

    This frequently comes up with new contributors and they can’t be blamed for thinking that e.g. hex parsing should be made faster if a benchmark test exists.

    I understand this is one of the motivations for this change, but does this happen often for benchmarks that are marked benchmark::PriorityLevel::LOW?

  24. in doc/benchmarking.md:54 in fcfb1d862a outdated
    55-- P2P throughput
    56+
    57+Benchmarks help with monitoring for performance regressions and act as a scope
    58+for future performance improvements. They should cover components that impact
    59+performance critical functions of the system. Functions are performance
    60+critical, if the cost associated with a degradation in perforamnce is high. A
    


    mzumsande commented at 5:43 pm on November 1, 2024:
    typo: perforamnce
  25. in doc/benchmarking.md:72 in fcfb1d862a outdated
    71+are valuable. Adding new benchmarks is valuable if the component being tested
    72+affects critical functions mentioned above.
    73+
    74+Benchmarks are ill-suited for testing of denial of service issues as they are
    75+restricted to the same input set (introducing bias). [Fuzz
    76+tests](/doc/fuzzing.md) are better suited for this purpose, as they are much
    


    mzumsande commented at 5:55 pm on November 1, 2024:
    Not sure I follow this point: under “denial of service” I would understand attacks that overwhelm the target with repeated requests (which are often, but not necessarily, identical) and as a result may just slow it down or make it unresponsive instead of outright crashing it. While I agree that benchmarks don’t really find these, I wouldn’t see fuzz tests designed to find crashes as the best way to test these issues (as opposed to integration tests with some sort of stress)?

    dergoegge commented at 2:12 pm on November 4, 2024:

    Fuzz testing does not only find crashes, it finds slow running inputs as well. Fuzz engines usually terminate an input’s execution if it takes longer than N seconds to run, usually referred to as a “timeout”. See #28812 for examples.

    Depending on what the harness is testing, a slow input might indicate a DoS issue (e.g. infinite loop, quadratic behavior, …).

  26. mzumsande commented at 6:02 pm on November 1, 2024: contributor

    Concept ACK

    I wouldn’t oppose deleting some benchmarks, but I also like the idea of using priorities. That, combined with the doc, should be sufficient to deter micro-optimizations in non-critical code.

  27. dergoegge force-pushed on Nov 4, 2024
  28. dergoegge commented at 3:16 pm on November 4, 2024: member
    I’m not sure if using the priority levels is a good alternative to deleting the benches. Afaict, the priority level was introduced to avoid running slow benchmarks in the CI, as opposed to indicating how “important” they are. As a result, we’re not catching issues with these tests until someone executes them (e.g. #29061 (comment)) and I’d prefer not to expand on that.
  29. DrahtBot added the label Needs rebase on Nov 13, 2024
  30. [bench] Remove various extraneous benchmarks 810c2dc769
  31. [doc] Ammend notes on benchmarking 8b19137a68
  32. dergoegge force-pushed on Nov 13, 2024
  33. DrahtBot removed the label Needs rebase on Nov 13, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-03 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me