close #265
see comment below for more information
close #265
see comment below for more information
87 | } 88 | + Ok(git_refs) 89 | +} 90 | + 91 | +fn app() -> AppResult { 92 | + let mut git_refs = vec!["master".to_string()];
~we might want to use the output of git rev-parse --short master instead of master in the commit messages~
afaik, this wasn't a problem before, so I don't think it's a problem now
https://github.com/bitcoin-core/qa-assets/commit/4babc489a28d335b2e7e1a4c723a8e77efa61c9e looks good to me now. Updated PR description.
I ran this against a test fuzz corpora in https://github.com/ekzyis/qa-assets/commit/a4ef010134b76b0a8a481069ca2255aef6a52410 (a different branch) to speed things up. Took 30 minutes with --extra-ref v30.2 --extra-ref v29.3. To do the same, run this in the VM:
$ git clone --depth=1 --branch=test-keep-fuzz-inputs-for-older-branches https://github.com/ekzyis/qa-assets
$ cp -r qa-assets/delete-nonreduced-fuzz-inputs .
$ cd delete-nonreduced-fuzz-inputs
$ sudo cargo run -- --extra-ref v30.2 --extra-ref v29.3
I had to allow empty commits because fuzzing another ref might not lead to any new fuzz inputs.
72 | + let mut git_refs = Vec::new(); 73 | + let mut args = env::args().skip(1); 74 | + while let Some(arg) = args.next() { 75 | match arg.as_str() { 76 | "--help" | "-h" => return Err(help("help requested")), 77 | + "--extra-ref" => {
I don't think this matches the docs above, which say:
Usage: delete-nonreduced-fuzz-inputs [--extra-ref=<ref>]...
My preference would be to keep the = and do the parsing, as suggested earlier in #269 (review):
for arg in args {
if let Some(s) = arg.strip_prefix("-h") {
Err(help("Help requested"))?;
} else if let Some(e) = arg.strip_prefix("--extra-ref=) {
//
} else {
Err(help(&format!(
"Too many args, or unknown named arg: {arg}"
)))?;
}
}
But just a nit. An alternative could be to remove the = in the docs. Up to you.
Oh, I tried using strip_prefix before, but with the existing match, and the combination was an experimental feature not available on the cargo version in the ubuntu VM. Seems like I didn't consider to not use match.
Using your pattern in e298604b38 now. I only changed this to recognize -hfoo as an unknown argument and support --help:
- if let Some(_) = arg.strip_prefix("-h") {
+ if arg == "-h" || arg == "--help" {
485 | @@ -407,6 +486,23 @@ fn run_afl_cmin<P: AsRef<Path>, Q: AsRef<Path>>( 486 | Ok(()) 487 | } 488 | 489 | +/// Move every file in `dir` to its parent directory. 490 | +fn move_files_to_parent_dir(dir: &PathBuf) -> AppResult { 491 | + for entry in fs::read_dir(dir).map_err(|e| format!("fs::read_dir failed: {}", e.to_string()))? { 492 | + let entry = entry.map_err(|e| format!("failed to read file: {}", e.to_string()))?;
let entry = entry.map_err(|e| format!("failed to read entry: {}", e.to_string()))?;
nit
fixed in 4b7804521f
496 | + .ok_or_else(|| format!("expected {} to have parent", from.display()))? 497 | + .join(entry.file_name()); 498 | + fs::rename(&from, &to).map_err(|e| format!("fs::rename failed: {}", e.to_string()))?; 499 | + } 500 | + 501 | + fs::remove_dir_all(dir).map_err(|e| format!("fs::remove_dir_all failed: {}", e.to_string()))?;
Oh, good question! There was no reason. I tried using fs::remove_dir now, but I realized the directory isn't actually empty after the loop for v30.2, so fs::remove_dir fails.
First, I thought it might be because one shouldn't read a dir and move files in the same loop, so I collected all file names before renaming in 76292bffb9. This didn't fix the issue (probably still a good thing to do).
I then printed out all rename operations in 11d2d33b2d and can see that they all succeed, but the directory is indeed not empty when fs::remove_dir runs, see full log.
I think I'm running into the same problem I had when implementing this part in #266 with mv in bash:
$ mv v30.2/007c7bebe2400a5f05d47382971a33e2244b4fc2 007c7bebe2400a5f05d47382971a33e2244b4fc2
mv: 'v30.2/007c7bebe2400a5f05d47382971a33e2244b4fc2' and '007c7bebe2400a5f05d47382971a33e2244b4fc2' are the same file
except fs::rename doesn't even throw :thinking: I also didn't (and still don't) understand how they can be the same file. Sure, they have the same content, and same name, but wasn't one created when fuzzing with master, and the other one with v30.2? But when I run ls -i, sure enough, they point to the same inode:
$ ls -i v30.2/007c7bebe2400a5f05d47382971a33e2244b4fc2
4497220 v30.2/007c7bebe2400a5f05d47382971a33e2244b4fc2
$ ls -i 007c7bebe2400a5f05d47382971a33e2244b4fc2
4497220 007c7bebe2400a5f05d47382971a33e2244b4fc2
Not sure what's going on here. It's why I started to use cp instead of mv, as mentioned in the comment.
I'll figure this out to make sure this isn't hiding a bug instead of just using cp.
update:
Sure, they have the same content, and same name, but wasn't one created when fuzzing with master, and the other one with v30.2?
Oh, one explanation could be that afl-cmin isn't actually creating new inodes. The input file in all_inputs is the same inode:
$ ls -i all_inputs/addition_overflow/007c7bebe2400a5f05d47382971a33e2244b4fc2
4497220 all_inputs/addition_overflow/007c7bebe2400a5f05d47382971a33e2244b4fc2
Maybe afl-cmin creates a hard link? According to my LLM:
• afl-cmin primarily creates hard links to the minimized inputs (it uses link() in src/afl-cmin.c, os.link() in afl-cmin.py, and ln in afl-cmin.bash).
If those are really hard-links, I presume remove_dir_all is actually the correct function to call here
lgtm, just some nits
Thanks for the review! I didn't squash the commits yet. I thought it would make the next review easier.
I'm currently debugging why I can't use fs::remove_dir, see #274 (review).
Squashed all commits into 4373c7715e.
Btw, I didn't make sure that libFuzzer will only add fuzz targets to the output directory, and not delete the ones in the output directory that don't increase coverage for the fuzzed binary. I only looked at the documentation for -merge, and it doesn't mention it would:
You may use the same flag to add more interesting items to an existing corpus. Only the inputs that trigger new coverage will be added to the first corpus.
For -set_cover_merge, which is what we're using, I only found this comment in the source code:
// Merges all corpora into the first corpus. A file is added into
// the first corpus only if it adds new features. Unlike `Merger::Merge`,
// this implementation calculates an approximation of the minimum set
// of corpora files, that cover all known features (set cover problem).
// Generally, this means that files with more features are preferred for
// merge into the first corpus. When two files have the same number of
// features, the smaller one is preferred.
size_t Merger::SetCoverMerge(const std::set<uint32_t> &InitialFeatures,
495 | + .ok_or_else(|| format!("expected {} to have parent", dir.display()))?; 496 | + 497 | + for name in names { 498 | + let from = dir.join(&name); 499 | + let to = parent.join(&name); 500 | + fs::rename(&from, &to).map_err(|e| format!("fs::rename failed: {}", e.to_string()))?;
heh, I guess there could be a check to see if the file to exists, and then delete from. Otherwise, keep the move. This would allow to use remove_dir, but not important. Either way is fine.