Increase timeout or remove valgrind CI job? #113

dergoegge opened this issue on April 12, 2023
  1. dergoegge commented at 9:19 am on April 12, 2023: member

    I think the valgrind job has only passed ~5 times across all the merged/closed PRs. Otherwise it always (afaict) fails due to timing out.

    So maybe we should:

    • Increase the timeout? (unless the job would take way too long anyway)
    • Remove the job?
    • Only run on corpora that were changed in a PR?
  2. fanquake commented at 9:22 am on April 12, 2023: member
    We are currently set at the limit of 120m for non-credit tasks, see https://cirrus-ci.org/faq/#instance-timed-out. Maybe we can use some compute credits.
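    For reference, a minimal sketch of what those two knobs look like in a Cirrus config (use_compute_credits and timeout_in are the relevant fields; the task name, image and script below are placeholders, not the actual qa-assets .cirrus.yml):

    ```yaml
    # Hypothetical sketch only; everything besides the Cirrus field names is made up.
    valgrind_fuzz_task:
      use_compute_credits: true   # without credits, Cirrus caps the timeout at 120m
      timeout_in: 480m            # placeholder value; whatever the job actually needs
      container:
        image: ubuntu:22.04
        cpu: 4
        memory: 8G
      run_script:
        - ./run_valgrind_fuzz.sh  # placeholder for the valgrind fuzz step
    ```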
  3. maflcko commented at 9:24 am on April 12, 2023: contributor

    I run it locally, and it takes 12 CPU days, so I am not sure this is a good thing to spend credits on.

    Is there any value in running it remotely? If yes, we could spin up a remote (dedicated) runner for it.

  4. maflcko commented at 9:25 am on April 12, 2023: contributor
    (12 CPU days on a slow CPU running valgrind in qemu :sweat_smile: )
  5. fanquake commented at 12:52 pm on April 19, 2023: member
    Relevant CI job now that credits PR has been merged: https://cirrus-ci.com/task/6079733960540160
  6. fanquake commented at 1:04 pm on April 19, 2023: member
    Looks like it is working: [credits screenshot]
  7. fanquake commented at 9:27 am on April 20, 2023: member

    Relevant CI job now that credits PR has been merged: https://cirrus-ci.com/task/6079733960540160

    Ultimately timed out after 8 hours. However, looking at the job, it also didn’t use any compute credits? [credits screenshot]

  8. maflcko commented at 9:30 am on April 20, 2023: contributor

    I calculated the cost here: #114 (comment)

    If you want to see it in Cirrus CI, you may have to refresh the page.

  9. fanquake commented at 9:33 am on April 20, 2023: member
    Don’t know. I have refreshed multiple times, and am logged in. Nothing showing up.
  10. maflcko commented at 9:01 am on May 2, 2023: contributor
    I think this can be closed. Not sure what else can be done here?
  11. dergoegge commented at 9:37 am on May 2, 2023: member
    Hm, the job still times out even with the credits, so maybe we should remove it from CI?
  12. maflcko commented at 10:17 am on May 2, 2023: contributor
    All fuzz targets are run, except for a few slow inputs for one remaining target, so it seems better than removing it completely?
  13. fanquake commented at 10:19 am on May 2, 2023: member
    Yeah, I’m happy for this to remain, even if it never quite completes, as long as we’re running 95+% of targets.
  14. dergoegge commented at 10:21 am on May 2, 2023: member
    Ok! Should we exclude the really slow targets from running then?
  15. maflcko commented at 10:22 am on May 2, 2023: contributor
    It seems better to run them than not to run them at all?
  16. dergoegge commented at 10:27 am on May 2, 2023: member

    I find the red CI super annoying because it has to be looked at every time, and the slow targets still run in the other jobs (with MSan, for example).

    We could also set up our own remote runner (like you suggested earlier), but is that worth it?

  17. maflcko commented at 10:34 am on May 2, 2023: contributor

    is that worth it?

    At least one person or CI should run all targets in valgrind on a regular basis. I do it, but it would be good to have a backup.

  18. dergoegge commented at 3:17 pm on May 5, 2023: member

    Closing.

    I will set up a machine for the valgrind job sometime soon and then open a PR to no longer run it in CI here.

  19. dergoegge closed this on May 5, 2023

  20. maflcko commented at 3:25 pm on May 5, 2023: contributor
    I still think it is good to have a result that is trivially and publicly visible for anyone with internet access. Maybe the name of the task can be changed to indicate that a timeout can be ignored?
  21. maflcko commented at 3:11 pm on May 16, 2023: contributor
    I am also thinking about adding a script or CI task to check that a pull only adds files, not removes them. Such a task would then fail (red) if non-reduced and stale inputs are deleted. So avoiding a red CI at all costs may not be a worthy goal?
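    A rough sketch of what such an additions-only check could look like as a Cirrus task, assuming the base branch is main and using git diff with --diff-filter=D against the merge base to detect deletions (the task name and image are made up):

    ```yaml
    # Hypothetical "no deletions" check; branch, task name and image are assumptions.
    no_deletions_task:
      container:
        image: ubuntu:22.04
      check_script:
        - git fetch origin main
        # Fail (red) if the pull deletes any file relative to the merge base.
        - test -z "$(git diff --diff-filter=D --name-only origin/main...HEAD)"
    ```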
  22. dergoegge commented at 3:27 pm on May 16, 2023: member

    So avoiding a red CI by all costs may not be a worthy goal?

    My main concern with the red CI is that it has to be looked at: the valgrind job either fails because of some bug or it times out, and we definitely want to catch the case where there’s a bug. I know it’s easy to check, but I just find it very annoying. I couldn’t find this in the Cirrus docs, but maybe there is an option to ignore timeouts?

    Otherwise, I think we can just have one (or more) remote runners that publish the result to GitHub somehow, so we still get the green CI (I think I saw something about Cirrus allowing remote runners). I am happy to set that up at some point, and until then we keep it the way it is now.

    adding a script or CI task to check that a pull only adds files, not removes them. Such a task would then fail (red) if non-reduced and stale inputs are deleted.

    Sounds good and this should only fail on that ~one pruning PR every 6 months.

  23. maflcko commented at 5:50 pm on May 16, 2023: contributor

    Otherwise I think

    Yeah, it is called a persistent worker. See https://github.com/bitcoin/bitcoin/blob/904631e0fc00ac9c8a03d1ce226d071bf88c00db/.cirrus.yml#L17 (a rough sketch follows at the end of this comment).

    Sounds good and this should only fail on that ~one pruning PR every 6 months.

    https://github.com/bitcoin-core/qa-assets/pull/125
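
    Relating to the persistent worker mentioned above, a rough sketch of how such a task could look, assuming a self-hosted worker registered with a matching label and assuming the corpora are run via Bitcoin Core's test/fuzz/test_runner.py with its --valgrind flag (the label, env var and paths are made up):

    ```yaml
    # Hypothetical persistent-worker task; label, env var and paths are assumptions.
    valgrind_fuzz_task:
      persistent_worker:
        labels:
          name: valgrind-runner   # must match a label the self-hosted worker registered with
      env:
        DIR_FUZZ_CORPORA: /corpora/fuzz_seed_corpus
      fuzz_script:
        - ./test/fuzz/test_runner.py --valgrind "$DIR_FUZZ_CORPORA"
    ```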

