Also keep fuzz inputs that increase coverage on older branches #266

pull ekzyis wants to merge 1 commit into bitcoin-core:main from ekzyis:keep-fuzz-inputs-for-older-branches changing 1 file +55 −30
  1. ekzyis commented at 2:49 PM on March 22, 2026: contributor

    closes #265

    I think this does what was suggested in #265 (comment).

    I haven't run this yet. I only ran the original script once in an Ubuntu 24.04 VM.

    I'm currently concerned with two things:

    1. Can I keep pointing the output dir of the fuzz engines at the real corpus? Since the fuzz inputs have hash-based filenames, overwrites should be idempotent. Before, afl-cmin was only run once per fuzz target, and libFuzzer (via the test runner) was run once per sanitizer.

    2. I'm new to fuzzing. Did I actually understand what I'm doing here?
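A minimal sketch of the idempotency argument in point 1, assuming inputs are stored under the SHA-1 of their content and that `sha1sum` is available. The helper `add_input` and the throwaway directory layout are hypothetical and not part of the script under review.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: copy a file into a corpus directory under the SHA-1 of
# its content, so identical inputs always map to the same filename.
add_input() {
  local corpus_dir=$1 input=$2 name
  name=$(sha1sum "$input" | cut -d' ' -f1)
  cp -f "$input" "$corpus_dir/$name"
}

# Demo with throwaway directories.
work=$(mktemp -d)
mkdir "$work/corpus"
printf 'abc' > "$work/input_a"
printf 'abc' > "$work/input_b"   # same content as input_a
printf 'xyz' > "$work/input_c"

# Add every input twice; the second pass only overwrites identical files.
for f in "$work"/input_*; do add_input "$work/corpus" "$f"; done
for f in "$work"/input_*; do add_input "$work/corpus" "$f"; done

corpus_count=$(find "$work/corpus" -type f | wc -l | tr -d ' ')
echo "files in corpus: $corpus_count"
```

Three inputs with two distinct contents end up as two corpus files no matter how often they are added, which is why repeated runs into the same directory should be harmless under this naming scheme.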

  2. ekzyis marked this as a draft on Mar 22, 2026
  3. ekzyis force-pushed on Mar 22, 2026
  4. in delete_nonreduced_fuzz_inputs.sh:106 in 888f0bb249 outdated
     114 | -      git commit -m "Reduced inputs for ${sanitizer}"
     115 | -    )
     116 | +      (
     117 | +        cd ../qa-assets
     118 | +        git add "${FUZZ_CORPORA_DIR}"
     119 | +        git commit -m "Reduced inputs for ${sanitizer}"
    


    ekzyis commented at 2:53 PM on March 22, 2026:

    This is going to create a commit per sanitizer and ref. I think it should create one commit per sanitizer for all refs (without wasting time rebuilding the same ref multiple times), or at least mention the ref in the commit message.
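One way to at least mention the refs in the commit message could look like the sketch below. The function name `commit_message`, the message layout, and the sanitizer name in the example call are assumptions for illustration only.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build one commit message per sanitizer that lists every
# ref the inputs were reduced against.
commit_message() {
  local sanitizer=$1
  shift
  local refs
  # Join the remaining arguments with commas inside a subshell so the changed
  # IFS does not leak into the caller.
  refs=$(IFS=,; echo "$*")
  echo "Reduced inputs for ${sanitizer} (refs: ${refs})"
}

commit_message "address" master 30.x 29.x
```

The result could be passed to `git commit -m` once per sanitizer, after all refs have been processed.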

  5. in delete_nonreduced_fuzz_inputs.sh:44 in 888f0bb249 outdated
      38 | @@ -39,29 +39,45 @@ git clone --depth=1 https://github.com/bitcoin-core/qa-assets.git
      39 |    git commit -a -m "Delete fuzz inputs"
      40 |  )
      41 |  
      42 | -git clone --depth=1 https://github.com/bitcoin/bitcoin.git
      43 | +# TODO: optimize? --no-single-branch increased size from 69M to 170M
      44 | +# could use ls-remote to list tags and then only fetch tags we need
      45 | +git clone --depth=1 --no-single-branch https://github.com/bitcoin/bitcoin.git
    


    ekzyis commented at 2:56 PM on March 22, 2026:

    Not sure how resource-constrained the VMs that run this will be.


    maflcko commented at 1:24 PM on April 7, 2026:

    I think it is fine to just ask the user to pass in --extra-ref=29.x --extra-ref=...x and then fetch those with --depth=1, but no strong opinion, auto-detection may work as well.


    ekzyis commented at 2:45 PM on April 7, 2026:

    Oh, thanks for the idea! I'll consider it.
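The `--extra-ref` suggestion above could be sketched roughly as follows. The option name comes from the comment; the function names, the `EXTRA_REFS` array, and the remote name `origin` are assumptions. The sketch only prints the shallow fetch commands so that it runs without network access.

```shell
#!/usr/bin/env bash

# Hypothetical option parsing: collect every --extra-ref=<ref> argument.
EXTRA_REFS=()
parse_extra_refs() {
  EXTRA_REFS=()
  local arg
  for arg in "$@"; do
    case "$arg" in
      --extra-ref=*) EXTRA_REFS+=("${arg#--extra-ref=}") ;;
      *) echo "unknown argument: $arg" >&2; return 1 ;;
    esac
  done
}

# Print (rather than run) the shallow fetch for each requested ref.
print_fetch_commands() {
  local ref
  for ref in "${EXTRA_REFS[@]}"; do
    echo "git fetch --depth=1 origin ${ref}"
  done
}

parse_extra_refs --extra-ref=30.x --extra-ref=29.x
print_fetch_commands
```

Compared with `--no-single-branch`, this would fetch only the refs the user asked for, at the cost of the user having to know which branches are still relevant.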

  6. in delete_nonreduced_fuzz_inputs.sh:53 in 78f513dab4
      50 | -  echo "Adding reduced seeds with afl-cmin"
      51 | +  # A fuzz input will be kept if it increases coverage on master or any of the
      52 | +  # last three major versions.
      53 | +  REFS=("master")
      54 | +  CURRENT_MAJOR_VERSION=$(git tag --list 'v*' --sort=-v:refname | sed 's/^v//' | awk -F. '{ print $1 }' | head -1)
      55 | +  PREV_MAJOR_VERSIONS=$(seq $((CURRENT_MAJOR_VERSION - 3)) $((CURRENT_MAJOR_VERSION - 1)))
    


    murchandamus commented at 3:46 PM on March 25, 2026:

    Wouldn’t that mean we’d cover the four most recent versions? The current plus three previous?


    ekzyis commented at 5:08 PM on March 25, 2026:

    Oh, yes, current plus three previous. Classic off-by-one. I didn't verify the output against the versions mentioned in #265 (master, 30.x, 29.x).

    For reference, this is the output on current master (2fe76ed8324):

    $ CURRENT_MAJOR_VERSION=$(git tag --list 'v*' --sort=-v:refname | sed 's/^v//' | awk -F. '{ print $1 }' | head -1)
    $ echo $CURRENT_MAJOR_VERSION
    31
    $ PREV_MAJOR_VERSIONS=$(seq $((CURRENT_MAJOR_VERSION - 3)) $((CURRENT_MAJOR_VERSION - 1)))
    $ echo $PREV_MAJOR_VERSIONS
    28 29 30
    

    So it should be this:

    diff --git a/delete_nonreduced_fuzz_inputs.sh b/delete_nonreduced_fuzz_inputs.sh
    index 49968fe7da..61426864f9 100644
    --- a/delete_nonreduced_fuzz_inputs.sh
    +++ b/delete_nonreduced_fuzz_inputs.sh
    @@ -50,7 +50,7 @@ git clone --depth=1 --no-single-branch https://github.com/bitcoin/bitcoin.git
       # last three major versions.
       REFS=("master")
       CURRENT_MAJOR_VERSION=$(git tag --list 'v*' --sort=-v:refname | sed 's/^v//' | awk -F. '{ print $1 }' | head -1)
    -  PREV_MAJOR_VERSIONS=$(seq $((CURRENT_MAJOR_VERSION - 3)) $((CURRENT_MAJOR_VERSION - 1)))
    +  PREV_MAJOR_VERSIONS=$(seq $((CURRENT_MAJOR_VERSION - 2)) $((CURRENT_MAJOR_VERSION - 1)))
       for version in $PREV_MAJOR_VERSIONS; do
         # versions before 29.x didn't use cmake
         if [ "$version" -lt 29 ]; then
    

    Then I can also remove the `-lt 29` check below, which only exists because versions before 29.x didn't use cmake.
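The off-by-one can be checked without a git checkout with a small sketch like the one below. It is not the code from the PR: the function `prev_major_versions` is hypothetical, it reads tag names from stdin instead of calling `git tag`, and the sample tags are made up.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the COUNT major versions preceding the newest one.
# Reads tag names (e.g. v31.0, v30.2) from stdin, one per line.
prev_major_versions() {
  local count=$1 current
  # Extract the major number of each v<major>.<rest> tag and keep the largest.
  current=$(sed -n 's/^v\([0-9][0-9]*\)\..*/\1/p' | sort -n | tail -n 1)
  [ -n "$current" ] || return 1
  seq $((current - count)) $((current - 1))
}

# Sample tags only; a real run would pipe in `git tag --list 'v*'`.
printf 'v29.1\nv30.2\nv31.0rc1\n' | prev_major_versions 2
```

With a count of 2 and a newest major version of 31, this prints 29 and 30, which together with master gives the three refs discussed above. A count of 3 reproduces the `28 29 30` output from the shell session.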

  7. in delete_nonreduced_fuzz_inputs.sh:88 in 78f513dab4 outdated
     100 | -      echo "No input corpus for $fuzz_target (ignoring)"
     101 | -    fi
     102 | +      FUZZ=$fuzz_target afl-cmin -T all -A -i "../all_inputs/$fuzz_target" -o "$FUZZ_TARGET_DIR/$ref_sha1" -- ./build_fuzz/bin/fuzz
     103 | +      # use cp instead of mv because mv fails if source and destination is same file
     104 | +      cp "$FUZZ_TARGET_DIR/$ref_sha1/"* "$FUZZ_TARGET_DIR/"
     105 | +      rm -r "$FUZZ_TARGET_DIR/$ref_sha1"
    


    murchandamus commented at 3:52 PM on March 25, 2026:

    I have the impression that this makes separate corpora in different directories for the fuzzing with each version. Am I understanding that right? If so, I’m not sure I understand why you chose to do it this way. When you fuzz, each input that exercises new code paths will be written to the target directory. So if you instead fuzz with all versions into the same directory, it should still retain any inputs that lead to new coverage, but it might produce a smaller input set, because the inputs already present in the target repository will be processed before inputs of the same length from other sources.


    ekzyis commented at 4:54 PM on March 25, 2026:

    Hey, thanks for taking a look at this!

    > I have the impression that this makes separate corpora in different directories for the fuzzing with each version

    The intention is to have a single corpus for all versions. Maybe you got confused by -o $FUZZ_TARGET_DIR/$ref_sha1? I'm using a version-specific output dir because afl-cmin requires an empty output directory. FUZZ_TARGET_DIR itself is version-independent. I then run cp afterwards to copy everything up into FUZZ_TARGET_DIR, which is also where the previous code put it.

    (I just realized that I don't need to fuzz into a version-specific output directory. It just needs to be empty. Maybe reusing the output dir but clearing it before every run would make its purpose more clear.)

    > the inputs already present in the target repository will be processed before inputs of the same length from other sources.

    Oh, that's good to know! But since afl-cmin needs an empty output directory, I think you mean libFuzzer here, which is what fuzz/test_runner.py uses.


    murchandamus commented at 8:19 PM on March 25, 2026:

    Ah okay, I was unaware that afl-cmin behaves differently there. I’m indeed more familiar with libFuzzer, which I use for my fuzzing cronjobs. Carry on then!
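The "reuse one output dir but clear it before every run" idea from this thread might be sketched as follows. The wrapper `minimize_into` and the stand-in `fake_minimizer` are hypothetical; a real invocation would pass the scratch directory to afl-cmin via `-o`, as in the diff above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical wrapper: run a minimizer into a freshly emptied scratch
# directory, then merge its output into the shared corpus directory.
# Usage: minimize_into CORPUS_DIR SCRATCH_DIR COMMAND [ARGS...]
# The scratch directory is appended as the command's last argument.
minimize_into() {
  local corpus_dir=$1 scratch_dir=$2
  shift 2
  rm -rf "$scratch_dir"
  mkdir -p "$scratch_dir" "$corpus_dir"
  "$@" "$scratch_dir"
  # cp -f tolerates files that already exist in the corpus. This sketch
  # assumes the minimizer wrote at least one file.
  cp -f "$scratch_dir"/* "$corpus_dir"/
  rm -rf "$scratch_dir"
}

# Stand-in for afl-cmin so the sketch runs without AFL++ installed:
# it just writes one file named after the ref into the output directory.
fake_minimizer() {
  local ref=$1 out_dir=$2
  printf '%s' "$ref" > "$out_dir/input_from_$ref"
}

work=$(mktemp -d)
for ref in master 30.x 29.x; do
  minimize_into "$work/corpus" "$work/scratch" fake_minimizer "$ref"
done
ls "$work/corpus"
```

The corpus directory stays version-independent, while the scratch directory is guaranteed to be empty at the start of every run.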

  8. murchandamus commented at 3:57 PM on March 25, 2026: contributor

    Thanks for taking a stab at this. I’m not that fluent in Shell and also don’t know as much about fuzzing as @maflcko and @dergoegge, but I have a couple of comments for you.

  9. Also keep fuzz inputs that increase coverage on older branches 88faa27677
  10. ekzyis force-pushed on Mar 25, 2026
  11. ekzyis commented at 1:14 PM on March 26, 2026: contributor

    The script should be converted to Rust first, see #268 (comment).

    Glad to close this PR, because this turned out to be conceptually simple, but surprisingly time-consuming due to shell issues.

  12. ekzyis closed this on Mar 26, 2026

  13. ekzyis deleted the branch on Mar 26, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/qa-assets. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-17 08:25 UTC
