fuzz: Left over tmp files when fuzzing with afl++ #28811

issue dergoegge openend this issue on November 7, 2023
  1. dergoegge commented at 10:34 am on November 7, 2023: member

    I frequently run out of disk space when fuzzing our targets with afl++ due to left over tmp files created by our TestingSetups. Similar issues were previously reported (see #22572, #22472).

    The tmp files are left over when test cases cause timeouts (or crashes) and the afl++ fork-server simply kills the process, leaving it no time to cleanup. Note: this is not a bug in afl++, they specifically advise against creating tmp files in fuzz tests. Increasing the timeout threshold sort of works but it more or less just slows down the disk filling.

    TestingSetup should have a memory-only option that results in no disk usage at all.

    • Using a ram disks doesn’t work because they would run out of memory
    • The fix in #22472 doesn’t work when fuzzing on multiple cores
  2. dergoegge renamed this:
    fuzz: Left over tmp files when fuzzing with afl
    fuzz: Left over tmp files when fuzzing with afl++
    on Nov 7, 2023
  3. maflcko added the label Tests on Nov 7, 2023
  4. maflcko commented at 10:39 am on November 7, 2023: member

    I am not sure how doable this is, as it requires every storage access to be mocked in all parts of the codebase?

    I think it would be better to fix the underlying timeouts and crashes, and treat them as bugs.

  5. dergoegge commented at 11:08 am on November 7, 2023: member

    I think it would be better to fix the underlying timeouts and crashes, and treat them as bugs.

    I mean for crashes sure but for timeouts I have the suspicion that this happens on almost all targets that use TestingSetup. There might even be another reason for why these files stick around besides timeouts.

    There is just no nice way of preventing this besides “don’t create tmp files” and I don’t want to be running around “fixing” dozens of targets and trying to maintain this everytime we add a new one.

    Ideally we should probably do both: dont create tmp files and avoid timeouts in targets.

    I am not sure how doable this is, as it requires every storage access to be mocked in all parts of the codebase?

    This is why I opened the issue, I don’t know how we can achieve this without a bunch of refactoring.

    Block storage would probably be the biggest blocker, making that interface more abstract and then having a InMemoryBlockManager would be nice but no one is gonna review that or it would take years to get merged.


    I have not yet tried restricting the maximum input size that afl++ generates, maybe that works as a temporary workaround.

  6. maflcko commented at 11:49 am on November 7, 2023: member

    I don’t want to be running around “fixing” dozens of targets and trying to maintain this everytime we add a new one.

    In essence, this is identical to the rpc timeout issue in functional tests. I don’t think a cheat code is available to sidestep maintenance here.

    If you only care about not creating leftover tmp files, you can increase the timeout massively. If temp files are still created, it should be reported and treated as a bug and fixed.

    However, I think we should also treat lower timeouts as bugs, because they harm the fuzz engine from making progress, and also block the CI if many of those inputs end up in the git repo.

  7. dergoegge commented at 1:11 pm on November 7, 2023: member

    The fact that timeouts or crashes lead to left over files on disk is a bug and we should fix that. Otherwise it isn’t really possible to conduct long running fuzzing campaigns using afl++ (or other engines that use the fork-server model) without running out of disk space.

    We should also fix the timeouts and crashes themselves but that is not what I wanted to document with this issue.

  8. maflcko commented at 1:41 pm on November 7, 2023: member

    We should also fix the timeouts and crashes themselves but that is not what I wanted to document with this issue.

    Ok. I’ve opened #28812 to investigate and fix them. OSS-Fuzz only has 5 targets affected, and it would be good if you checked and reported any timeouts and crashes you found as well.

  9. maflcko commented at 11:46 am on December 8, 2023: member

    Not sure what to do here. There are some options:

    • Forcing a static directory path per fuzzing task (Identified by target name + the CPU it runs on, or the “worker ID”), which will be cleared before the fuzz process starts.
    • Detecting a low storage and then aborting the fuzzing campaign with a verbose error message, educating about the timeout?
    • (something else?)

    I think an early abort makes the most sense, because with frequent timeouts, it doesn’t make sense to continue the fuzzing campaign anyway.

  10. dergoegge commented at 12:59 pm on December 8, 2023: member
    • we could (but probably won’t) refactor our code so that we don’t need to go to disk at all.
    • or use a custom signal handler for e.g. SIGUSR1 that deletes the data dir and then exits without any other cleanup. Afl++ can be told to send any signal instead of SIGKILL to children on timeout, etc with AFL_KILL_SIGNAL.

    I’ve worked around this so far by using a ram disk and fuzzing inside a docker container, once the disk is full the whole thing just dies and I can inspect the container afterwards for timeout inputs.


dergoegge maflcko

Labels
Tests


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me