Keep fuzz inputs for older branches #274

ekzyis commented at 1:51 AM on April 12, 2026: contributor

close #265

see comment below for more information

in delete-nonreduced-fuzz-inputs/src/main.rs:89 in ca43f569ba outdated

  87 |      }
  88 | +    Ok(git_refs)
  89 | +}
  90 | +
  91 | +fn app() -> AppResult {
  92 | +    let mut git_refs = vec!["master".to_string()];

ekzyis commented at 1:53 AM on April 12, 2026:

~we might want to use the output of git rev-parse --short master instead of master in the commit messages~

afaik, this wasn't a problem before, so I don't think it's a problem now

ekzyis marked this as a draft on Apr 12, 2026

ekzyis force-pushed on Apr 12, 2026

ekzyis commented at 11:10 PM on April 12, 2026: contributor

https://github.com/bitcoin-core/qa-assets/commit/4babc489a28d335b2e7e1a4c723a8e77efa61c9e looks good to me now. Updated PR description.

I ran this against a test fuzz corpora in https://github.com/ekzyis/qa-assets/commit/a4ef010134b76b0a8a481069ca2255aef6a52410 (a different branch) to speed things up. Took 30 minutes with --extra-ref v30.2 --extra-ref v29.3. To do the same, run this in the VM:

$ git clone --depth=1 --branch=test-keep-fuzz-inputs-for-older-branches https://github.com/ekzyis/qa-assets
$ cp -r qa-assets/delete-nonreduced-fuzz-inputs .
$ cd delete-nonreduced-fuzz-inputs
$ sudo cargo run -- --extra-ref v30.2 --extra-ref v29.3

I had to allow empty commits because fuzzing another ref might not lead to any new fuzz inputs.

ekzyis marked this as ready for review on Apr 12, 2026

in delete-nonreduced-fuzz-inputs/src/main.rs:74 in 4babc489a2

  72 | +    let mut git_refs = Vec::new();
  73 | +    let mut args = env::args().skip(1);
  74 | +    while let Some(arg) = args.next() {
  75 |          match arg.as_str() {
  76 |              "--help" | "-h" => return Err(help("help requested")),
  77 | +            "--extra-ref" => {

maflcko commented at 8:05 AM on April 13, 2026:

I don't think this matches the docs above, which say:

Usage: delete-nonreduced-fuzz-inputs [--extra-ref=<ref>]...

My preference would be to keep the = and do the parsing, as suggested earlier in #269 (review):

    for arg in args {
        if let Some(s) = arg.strip_prefix("-h") {
            Err(help("Help requested"))?;
        } else if let Some(e) = arg.strip_prefix("--extra-ref=) {
//
} else {
            Err(help(&format!(
                "Too many args, or unknown named arg: {arg}"
            )))?;
        }
     }

But just a nit. An alternative could be to remove the = in the docs. Up to you.

ekzyis commented at 12:46 PM on April 13, 2026:

Oh, I tried using strip_prefix before, but with the existing match, and the combination was an experimental feature not available on the cargo version in the ubuntu VM. Seems like I didn't consider to not use match.

Using your pattern in e298604b38 now. I only changed this to recognize -hfoo as an unknown argument and support --help:

-        if let Some(_) = arg.strip_prefix("-h") {
+        if arg == "-h" || arg == "--help" {

in delete-nonreduced-fuzz-inputs/src/main.rs:492 in 4babc489a2

 485 | @@ -407,6 +486,23 @@ fn run_afl_cmin<P: AsRef<Path>, Q: AsRef<Path>>(
 486 |      Ok(())
 487 |  }
 488 |  
 489 | +/// Move every file in `dir` to its parent directory.
 490 | +fn move_files_to_parent_dir(dir: &PathBuf) -> AppResult {
 491 | +    for entry in fs::read_dir(dir).map_err(|e| format!("fs::read_dir failed: {}", e.to_string()))? {
 492 | +        let entry = entry.map_err(|e| format!("failed to read file: {}", e.to_string()))?;

maflcko commented at 8:15 AM on April 13, 2026:

        let entry = entry.map_err(|e| format!("failed to read entry: {}", e.to_string()))?;

nit

ekzyis commented at 12:47 PM on April 13, 2026:

fixed in 4b7804521f

in delete-nonreduced-fuzz-inputs/src/main.rs:501 in 4babc489a2

 496 | +            .ok_or_else(|| format!("expected {} to have parent", from.display()))?
 497 | +            .join(entry.file_name());
 498 | +        fs::rename(&from, &to).map_err(|e| format!("fs::rename failed: {}", e.to_string()))?;
 499 | +    }
 500 | +
 501 | +    fs::remove_dir_all(dir).map_err(|e| format!("fs::remove_dir_all failed: {}", e.to_string()))?;

maflcko commented at 8:16 AM on April 13, 2026:

why not https://doc.rust-lang.org/std/fs/fn.remove_dir.html ?

ekzyis commented at 1:05 PM on April 13, 2026:

Oh, good question! There was no reason. I tried using fs::remove_dir now, but I realized the directory isn't actually empty after the loop for v30.2, so fs::remove_dir fails.

First, I thought it might be because one shouldn't read a dir and move files in the same loop, so I collected all file names before renaming in 76292bffb9. This didn't fix the issue (probably still a good thing to do).

I then printed out all rename operations in 11d2d33b2d and can see that they all succeed, but the directory is indeed not empty when fs::remove_dir runs, see full log.

I think I'm running into the same problem I had when implementing this part in #266 with mv in bash:

$ mv v30.2/007c7bebe2400a5f05d47382971a33e2244b4fc2 007c7bebe2400a5f05d47382971a33e2244b4fc2
mv: 'v30.2/007c7bebe2400a5f05d47382971a33e2244b4fc2' and '007c7bebe2400a5f05d47382971a33e2244b4fc2' are the same file

except fs::rename doesn't even throw :thinking: I also didn't (and still don't) understand how they can be the same file. Sure, they have the same content, and same name, but wasn't one created when fuzzing with master, and the other one with v30.2? But when I run ls -i, sure enough, they point to the same inode:

$ ls -i v30.2/007c7bebe2400a5f05d47382971a33e2244b4fc2
4497220 v30.2/007c7bebe2400a5f05d47382971a33e2244b4fc2
$ ls -i 007c7bebe2400a5f05d47382971a33e2244b4fc2
4497220 007c7bebe2400a5f05d47382971a33e2244b4fc2

Not sure what's going on here. It's why I started to use cp instead of mv, as mentioned in the comment.

I'll figure this out to make sure this isn't hiding a bug instead of just using cp.

update:

Sure, they have the same content, and same name, but wasn't one created when fuzzing with master, and the other one with v30.2?

Oh, one explanation could be that afl-cmin isn't actually creating new inodes. The input file in all_inputs is the same inode:

$ ls -i all_inputs/addition_overflow/007c7bebe2400a5f05d47382971a33e2244b4fc2
4497220 all_inputs/addition_overflow/007c7bebe2400a5f05d47382971a33e2244b4fc2

maflcko commented at 1:36 PM on April 13, 2026:

Maybe afl-cmin creates a hard link? According to my LLM:

• afl-cmin primarily creates hard links to the minimized inputs (it uses link() in src/afl-cmin.c, os.link() in afl-cmin.py, and ln in afl-cmin.bash).

maflcko commented at 1:45 PM on April 13, 2026:

If those are really hard-links, I presume remove_dir_all is actually the correct function to call here

ekzyis commented at 5:33 PM on April 13, 2026:

Yes, afl-cmin in C++, bash and python is creating hard links. The script is installing the python version:

$ file $(which afl-cmin)
/usr/local/bin/afl-cmin: Python script, ASCII text executable

Ok, this explains it! Using remove_dir_all in 4373c7715e.

maflcko approved

maflcko commented at 8:17 AM on April 13, 2026: contributor

lgtm, just some nits

ekzyis force-pushed on Apr 13, 2026

ekzyis commented at 1:24 PM on April 13, 2026: contributor

Thanks for the review! I didn't squash the commits yet. I thought it would make the next review easier.

I'm currently debugging why I can't use fs::remove_dir, see #274 (review).

Keep fuzz inputs for older branches 4373c7715e

ekzyis force-pushed on Apr 13, 2026

ekzyis commented at 5:34 PM on April 13, 2026: contributor

Squashed all commits into 4373c7715e.

Btw, I didn't make sure that libFuzzer will only add fuzz targets to the output directory, and not delete the ones in the output directory that don't increase coverage for the fuzzed binary. I only looked at the documentation for -merge, and it doesn't mention it would:

You may use the same flag to add more interesting items to an existing corpus. Only the inputs that trigger new coverage will be added to the first corpus.

For -set_cover_merge, which is what we're using, I only found this comment in the source code:

// Merges all corpora into the first corpus. A file is added into
// the first corpus only if it adds new features. Unlike `Merger::Merge`,
// this implementation calculates an approximation of the minimum set
// of corpora files, that cover all known features (set cover problem).
// Generally, this means that files with more features are preferred for
// merge into the first corpus. When two files have the same number of
// features, the smaller one is preferred.
size_t Merger::SetCoverMerge(const std::set<uint32_t> &InitialFeatures,

in delete-nonreduced-fuzz-inputs/src/main.rs:500 in 4373c7715e

 495 | +        .ok_or_else(|| format!("expected {} to have parent", dir.display()))?;
 496 | +
 497 | +    for name in names {
 498 | +        let from = dir.join(&name);
 499 | +        let to = parent.join(&name);
 500 | +        fs::rename(&from, &to).map_err(|e| format!("fs::rename failed: {}", e.to_string()))?;

maflcko commented at 5:45 PM on April 14, 2026:

heh, I guess there could be a check to see if the file to exists, and then delete from. Otherwise, keep the move. This would allow to use remove_dir, but not important. Either way is fine.

maflcko approved

maflcko merged this on Apr 14, 2026

maflcko closed this on Apr 14, 2026

ekzyis deleted the branch on Apr 14, 2026