fuzz: Use immediate task runner to increase fuzz stability #31841

pull maflcko wants to merge 1 commits into bitcoin:master from maflcko:2502-fuzz-stable changing 1 files +6 −2
  1. maflcko commented at 1:42 pm on February 11, 2025: member

    Leaking a scheduler with a non-empty queue from the fuzz initialization phase into the fuzz target execution phase is problematic, because it messes with coverage data. This in turn is problematic, because it leads to:

    • Decrease in fuzz target execution stability (non-determinism when running the fuzz target).
    • Decrease in fuzz input merge stability (non-determinism when selecting a minimum set of fuzz input to reach maximum coverage), which leads to qa-assets bloat.

    Fix one such issue. Tracking issue: #29018

    Can be tested via: RUST_BACKTRACE=1 cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/bld-cmake $PWD/../b-c-qa-assets/fuzz_corpora/ partially_downloaded_block.

    The failure is non-deterministic (obviously) and will show coverage in validation signals such as UpdatedBlockTip before this change and will have this one fixed after this change.

  2. DrahtBot commented at 1:42 pm on February 11, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31841.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK marcofleon, dergoegge

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Tests on Feb 11, 2025
  4. maflcko force-pushed on Feb 11, 2025
  5. maflcko commented at 4:47 pm on February 11, 2025: member

    Actually, this is incomplete. It will handle the coverage generated in the notification itself (f()), but there is no guarantee that the re-lock is handled as well when the context ends happens before fuzz target execution starts after init. Relevant scope for the re-lock:

    https://github.com/bitcoin/bitcoin/blob/86528937e5c4da2e12c46085fc41e87ed759258e/src/scheduler.cpp#L56-L61

    The change here should be an improvement, but I wonder if there is a trivial one-line complete solution.

    On alternative would be to inline the scheduler in tests (make it synchronous), but the patch would be larger.

  6. maflcko renamed this:
    test: Clear scheduler after init to increase fuzz stability
    test: Use serial task runner to increase fuzz stability
    on Feb 11, 2025
  7. DrahtBot renamed this:
    test: Use serial task runner to increase fuzz stability
    test: Use serial task runner to increase fuzz stability
    on Feb 11, 2025
  8. maflcko force-pushed on Feb 11, 2025
  9. maflcko force-pushed on Feb 11, 2025
  10. maflcko force-pushed on Feb 11, 2025
  11. maflcko force-pushed on Feb 11, 2025
  12. DrahtBot added the label CI failed on Feb 11, 2025
  13. DrahtBot commented at 7:27 pm on February 11, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37046883391

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  14. maflcko renamed this:
    test: Use serial task runner to increase fuzz stability
    fuzz: Use serial task runner to increase fuzz stability
    on Feb 11, 2025
  15. maflcko commented at 7:47 pm on February 11, 2025: member

    On alternative would be to inline the scheduler in tests (make it synchronous), but the patch would be larger.

    Looks like this will lead to deadlocks due to lock order inversions between cs_main and cs_wallet in the unit tests. I’ll try to apply it to the fuzz tests only. If there are still issues in this pull, or in the future, they could be worked around for the affected fuzz target.

  16. maflcko force-pushed on Feb 11, 2025
  17. maflcko force-pushed on Feb 11, 2025
  18. DrahtBot removed the label CI failed on Feb 11, 2025
  19. maflcko force-pushed on Feb 19, 2025
  20. dergoegge commented at 10:36 am on February 19, 2025: member
    Concept ACK
  21. maflcko commented at 10:38 am on February 19, 2025: member
    (rebased without changes to make testing easier)
  22. fuzz: Use serial task runner to increase fuzz stability fa4fb6a8f1
  23. maflcko force-pushed on Mar 14, 2025
  24. fanquake commented at 6:50 am on March 17, 2025: member
  25. marcofleon commented at 7:29 pm on March 17, 2025: contributor

    ACK fa4fb6a8f15c295a2a3d3ffd737e115b8be46c1f

    Seems fine to me to use ImmediateTaskRunner for fuzz tests. I checked stability for partially_downloaded_block and saw an increase from 90% to 94%.

    Am I correct in saying that this change makes calling something like SyncWithValidationInterfaceQueue() in fuzz targets effectively a no-op?

    edit: quick nit, I think the PR title means to say “use immediate task runner” yes?

  26. DrahtBot requested review from dergoegge on Mar 17, 2025
  27. maflcko renamed this:
    fuzz: Use serial task runner to increase fuzz stability
    fuzz: Use immediate task runner to increase fuzz stability
    on Mar 17, 2025
  28. maflcko commented at 7:35 pm on March 17, 2025: member

    Am I correct in saying that this change makes calling something like SyncWithValidationInterfaceQueue() in fuzz targets effectively a no-op?

    Yes, they could be removed now. Happy to include that here, but they should also be harmless.

  29. marcofleon commented at 10:57 am on March 18, 2025: contributor

    Happy to include that here

    No need, just confirming for my own understanding.

  30. dergoegge approved
  31. dergoegge commented at 1:11 pm on March 20, 2025: member

    Code review ACK fa4fb6a8f15c295a2a3d3ffd737e115b8be46c1f

    The only downside to this is that we diverge from the production setup and e.g. bugs that only occur when fuzzing and using the SerialTaskRunner would be masked by this change. We could consider having a compile or runtime option to enable the serial runner and run both versions in CI.

  32. maflcko commented at 2:13 pm on March 20, 2025: member

    The only downside to this is that we diverge from the production setup and e.g. bugs that only occur when fuzzing and using the SerialTaskRunner would be masked by this change.

    Right, but I wonder what type of bug this could possibly uncover, given that the fuzz targets are mostly single threaded and require some kind of sync (like SyncWithValidationInterfaceQueue) when they use the task runner.

    Obviously, bugs in CScheduler itself could be missed. Though, if we are worried about that, a CScheduler fuzz target should be added. Unrelated to this pull, if we want to test multiple threads to detect lock issues and data races, I presume we’d have to add a specific fuzz target for those as well and add a tsan-fuzz CI task.

  33. fanquake merged this on Mar 21, 2025
  34. fanquake closed this on Mar 21, 2025

  35. TheCharlatan commented at 5:36 am on March 21, 2025: contributor
    Post-merge ACK
  36. maflcko deleted the branch on Mar 21, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-29 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me