test: re-bucket long-running tests #30879

pull willcl-ark wants to merge 1 commits into bitcoin:master from willcl-ark:ci-p2p_node_network_limited changing 1 files +3 −3
  1. willcl-ark commented at 2:02 pm on September 12, 2024: member

    Re-bucket:

    • p2p_node_network_limited -v*transport
    • feature_assume_utxo

    On CI runners these tests are taking longer than their current bucket suggests, often being among the last to finish.

    Re-bucket them to improve CI efficiency.

  2. DrahtBot commented at 2:02 pm on September 12, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK maflcko

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Tests on Sep 12, 2024
  4. DrahtBot added the label CI failed on Sep 13, 2024
  5. theStack commented at 11:58 am on September 13, 2024: contributor
    The re-bucketed test in the commit doesn’t match the one in the commit/PR title (p2p_timeouts vs. p2p_node_network_limited), so I suppose one of the two has to be adapted.
  6. maflcko commented at 12:30 pm on September 13, 2024: member

    Locally these tests take about 20 seconds and appear in the correct bucket.

    I think the sorting isn’t really based on the “correct bucket”, but rather the inverse overall duration, the buckets are just a guideline/reminder. That is, as long as the longest running test is started first, everything is fine.

    Did you measure if this actually makes a meaningful difference?

  7. maflcko commented at 12:33 pm on September 13, 2024: member
    Also, could rebase for a fresh CI? At least for me the CI timeouts went away earlier this night when trying to reproduce, but I don’t know if they are really gone, and whether the issue was inside this repo, or on the CI runners, or on the CI backend software, or on the Cirrus/GHA integration.
  8. willcl-ark force-pushed on Sep 16, 2024
  9. willcl-ark force-pushed on Sep 16, 2024
  10. willcl-ark commented at 8:05 am on September 16, 2024: member

    The re-bucketed test in the commit doesn’t match the one in the commit/PR title (p2p_timeouts vs. p2p_node_network_limited), so I suppose one of the two has to be adapted.

    Ugh, I cherry-picked the wrong tests from my working branch. The title is correct, tests in the commit here updated.

    I think the sorting isn’t really based on the “correct bucket”, but rather the inverse overall duration, the buckets are just a guideline/reminder. That is, as long as the longest running test is started first, everything is fine.

    Did you measure if this actually makes a meaningful difference?

    Not sure how to measure this, but I have been frequently seeing these tests taking the longest to run out of the suite, see e.g. https://github.com/bitcoin/bitcoin/actions/runs/10833854937/job/30163805745?pr=30866#step:12:64 and (same PR another job): https://github.com/bitcoin/bitcoin/actions/runs/10833854937/job/30163805709?pr=30866#step:7:5343

    It’s only going to make a difference of some number of seconds, but possibly up to 2 minutes, even in the best case. Whether that can be considered meaningful I will leave for reviewers to determine. If we launch this test right at the end, and other long-running ones finish, it could be adding ~240 secs onto the total test suite time each run.

  11. maflcko commented at 8:08 am on September 16, 2024: member

    Thanks, the explanation is correct and the two links with data are useful to support this change.

    review ACK 45663cd02cfb99c0bf91828b33a0611bb223497e

  12. maflcko commented at 8:14 am on September 16, 2024: member

    Nit: May be good to reword the pull request description to say that this better reflects the reality in CI runs, and may even speed them up. The mention of “CI timeouts” could be removed, because adding or removing 200 seconds from a run shouldn’t be a fix (or cause) of a timeout. Any run that takes longer than half of the overall timeout value is a warning that should be investigated and fixed, because the CI timeout is really only there to catch cases where there is a true timeout (zombie process, infinite loop, infinite sleep, …).

    But just a doc-nit, up to you. The change looks good either way.

  13. willcl-ark commented at 8:20 am on September 16, 2024: member

    Nit: May be good to reword the pull request description to say that this better reflects the reality in CI runs, and may even speed them up. The mention of “CI timeouts” could be removed, because adding or removing 200 seconds from a run shouldn’t be a fix (or cause) of a timeout. Any run that takes longer than half of the overall timeout value is a warning that should be investigated and fixed, because the CI timeout is really only there to catch cases where there is a true timeout (zombie process, infinite loop, infinite sleep, …).

    But just a doc-nit, up to you. The change looks good either way.

    Thanks, re-worded the description :)

  14. DrahtBot removed the label CI failed on Sep 16, 2024
  15. maflcko commented at 9:30 am on September 16, 2024: member
    Should feature_assumeutxo also be moved up a bit? According to https://corecheck.dev/tests it is one of the top-20 slowest tests. Also, according to the CI runs in this pull (Asan + macOS) it is in the tail.
  16. test: re-bucket long-running tests
    - p2p_node_network_limited -v*transport
    - feature_assume_utxo
    
    On CI runners these tests are taking longer than their current bucket
    suggests, often being among the last to finish.
    
    Re-bucket them to improve CI efficiency.
    f5a2000579
  17. willcl-ark force-pushed on Sep 23, 2024
  18. willcl-ark renamed this:
    test: re-bucket p2p_node_network_limited
    test: re-bucket long-running tests
    on Sep 23, 2024
  19. willcl-ark commented at 2:25 pm on September 23, 2024: member

    Should feature_assumeutxo also be moved up a bit? According to corecheck.dev/tests it is one of the top-20 slowest tests. Also, according to the CI runs in this pull (Asan + macOS) it is in the tail.

    Added feature_assumeutxo to the < 2m bucket

    1p1c is the other one I have noticed cropping up as long-running, but might leave it for now.

  20. maflcko commented at 9:55 am on September 27, 2024: member
    feature_config_args.py could be moved as well, but at some point it doesn’t matter too much. I wonder if this even has an effect at all.
  21. maflcko commented at 9:56 am on September 27, 2024: member

    review ACK f5a2000579b140a1f51fc433706c775ca560c62c

    Haven’t checked what effect this has, but it sounds plausible and harmless.

  22. fanquake merged this on Sep 27, 2024
  23. fanquake closed this on Sep 27, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me