agroce
commented at 4:30 pm on July 16, 2021:
contributor
This avoids the possibility of flooding /tmp during long fuzzing campaigns by allowing an environment variable to be set that causes the test_common_Bitcoin Core directory to be cleared after each fuzz run. Because libFuzzer campaigns seem to produce less data in /tmp, I believe the possibility may not have been noticed before. A 4 hour Eclipser run produced nearly 350GB in test_common_Bitcoin Core. Tested locally.
Add environment option to keep /tmp/ clean
This avoids the possibility of flooding /tmp during long fuzzing campaigns
c07998c0d1
DrahtBot added the label
Tests
on Jul 16, 2021
agroce
commented at 6:03 pm on July 16, 2021:
contributor
Looks as if I need to add -lstdc++fs somewhere, perhaps, to build in all envs. Anyone know where?
Use boost to avoid adding new link8e03c48a5b
Type for 2nd arg to boost::filesystem::remove_alld13ce4e81d
Remove extra semicolon35323b08d3
Include boost error codes5c31c13708
Just call, don't add dependencyd0beba5ae2
maflcko
commented at 8:34 am on July 17, 2021:
member
Is this only an issue with Eclipser?
If not, can you please provide a list of the largest files/folders in that directory?
Locally, I am seeing that the size stays constant, as it should be:
agroce
commented at 1:53 pm on July 17, 2021:
contributor
I haven’t tried with AFL. It may be only under rare circumstances? But not in libFuzzer (or an AFL persist loop?)? I’ll show you a bad run later. You end up with many (very many) entries with different hashes in the ‘/tmp/test_common_Bitcoin Core/` dir. Are you saying multiple fuzz processes should use the same one? Maybe if one is launched while another is still somehow live?
Yes, I plan to add an Eclipser guide once I’ve hashed out some issues.
agroce
commented at 2:11 pm on July 17, 2021:
contributor
maflcko
commented at 7:22 am on July 18, 2021:
member
So there will be plenty of leftover directories if you CTRL+C out of the process, but then the fix here wouldn’t improve that either.
agroce
commented at 9:19 am on July 18, 2021:
contributor
Well, the processes are running under a fuzzer’s control. Some exit abnormally for reasons not clear, that aren’t flagged as crashes. For Eclipser these may be QEMU failures.
Because the last process may leave a non-cleaned-up directory, the next one (Eclipser/AFL may run millions) will clean itup so they don’t slowly accumulate over time.
Running any fuzzer that uses a new process per run or a fork will probably do this, is the way to reproduce it. I can give Eclipser install/run instructions if you want…
agroce
commented at 10:01 am on July 18, 2021:
contributor
At a guess, what happens is that every 1000 or so runs die without cleaning up but without crashing. The next run will clean up, with this set. It’s not ideal, and we should figure out the underlying solution, but it makes process-based fuzzing feasible on limited space systems.
agroce
commented at 6:35 pm on July 18, 2021:
contributor
I’ll add a full guide later, but I think you can reliably reproduce by installing Eclipser 1.x:
it will take a while (you can probably seed with something less than the full QA assets for process message, but I haven’t tried that myself), but eventually data will start piling up in /tmp. The most likely culprit is QEMU bugs that terminate the process, but are not counted as crashes (some definitely happen during corpus replay).
agroce
commented at 4:37 pm on July 19, 2021:
contributor
Confirmed that running AFL without setting this:
Produces many dirs in /tmp/test_common_Bitcoin Core
These are due to certain inputs, not crashes or timeouts of some sort, since I used plain google AFL, which terminates if a seed input dies unusually, and this was during seed execution.
The rate was 273MB, over 17 directories that did not get cleaned up, over about 6K seed inputs (some from the QA assets, some discovered locally by Eclipser/libFuzzer). So whatever causes no cleanup is quite rare. Do we want to find out what inputs cause this? I can zip them up.
maflcko
commented at 4:53 pm on July 19, 2021:
member
Good find that this doesn’t reproduce with libFuzzer. OSS-Fuzz reports also “Fuzzing Engine: afl”.
OSS-Fuzz has tagged them with “UNREPRODUCIBLE”, so I doubt the issue is with the seeds. Do you have the stdout/stderr from afl when the process exited abnormally?
agroce
commented at 5:11 pm on July 19, 2021:
contributor
Nothing ever exits abnormally. I killed afl/eclipser after a while, and observed the extra dirs in /tmp. Eventually, enough of these will cause src/test/fuzz/fuzz to abort with a “no space” message, producing spurious UNREPRODUCIBLE “crashes.” The only way to diagnose I see is to check which inputs cause the problem (assuming it is deterministic). The UNREPRODUCIBLE is because the “failing” tests aren’t the problem, the problem is earlier non-crashes that exhaust storage space.
maflcko
commented at 5:31 pm on July 19, 2021:
member
Well, something has to exit abnormally, unless I am misunderstanding something.
The static should keep the object around until after the process exits the main function normally. If the datadir is still around when it shouldn’t, it implies the process didn’t exit normally?
agroce
commented at 5:39 pm on July 19, 2021:
contributor
It’s not just afl, though; it and eclipser are using very different approaches (source instrumentation via compilation, vs. QEMU on an un-instrumented binary).
I should have some useful data soon. I agree it’s a mystery!
agroce
commented at 5:55 pm on July 19, 2021:
contributor
What’s weird is any kind of abort/crash should stop afl corpus replay. Eclipser does have some QEMU failures during corpu replay, but those can’t affect afl…
I’m scanning the QA assets now. But it’s run 1700 without seeing any “cookie crumbs” in /tmp, so maybe something does depend on the fuzzing process. I’ll also scan the (much larger) local corpus, in case it IS a deterministic problem with inputs, but not present in the QA asset seeds.
agroce
commented at 6:12 pm on July 19, 2021:
contributor
5500 without anything being left in /tmp. Hypothesis:
nothing is wrong in bitcoin code
the fuzzers here (unlike libfuzzer) cause some non-crash early exits of some sort (QEMU failures will explain Eclipser, maybe afl does some kind of silent timeout/fork-server business I don’t know about during corpus replay), and those prevent cleanup
over time this accumulates junk causing spurious failures like OSS-Fuzz sees, and I see
maflcko
commented at 6:14 pm on July 19, 2021:
member
Maybe this doesn’t happen during replay, but only while searching for new fuzz inputs?
agroce
commented at 6:39 pm on July 19, 2021:
contributor
it happens during corpus replay, though, with afl, I know.
agroce
commented at 6:42 pm on July 19, 2021:
contributor
Ok, no crumbs in /tmp for standalone runs with no instrumentation. Checking files I generated.
agroce
commented at 7:44 pm on July 19, 2021:
contributor
Ok, 13K of my corpus inputs checked, and nothing. It’s an artifact of the fuzzers, so I’m not sure there’s a way to avoid this other than something like this PR, and setting the env variable for OSS-Fuzz. Something has to clean it up, and I’m not sure an external watchdog can even work for OSS-Fuzz. However, OSS-Fuzz may be, as you point out, skipping fuzz.cpp’s main and running AFL on the libFuzzer harness basically, so something deeper may be needed there.
agroce
commented at 8:58 pm on July 19, 2021:
contributor
Nothing. I’ll look to see if I can make this fix also fix the OSS-Fuzz issue, inside the libFuzzer signature function (just looked, and that is how afl/honggfuzz hook there, also). I’ll see if I can find a way with minimal performance impact on libFuzzer, somehow (I think a static guard to do it once per run will work for that).
Try to fix OSS-Fuzz issues also24ad8174da
Just do the remove_all in the common initializede32caef5b
fanquake referenced this in commit
d542603c5a
on Jul 20, 2021
Fails banman, so back to last solution6344eb698c
agroce
commented at 5:38 pm on July 20, 2021:
contributor
@MarcoFalke do you have any idea why trying to remove the old dirs at the start of initialization for libFuzzer breaks thatt banman assertion? Is there initialization before that initialization, and I’m destroying it?
maflcko
commented at 5:45 pm on July 20, 2021:
member
Your solution only works when a single process is run on the machine. This assumption is violated when several fuzz targets are running or the same target in several processes.
maflcko
commented at 5:46 pm on July 20, 2021:
member
I think we only have two possible solutions here:
Get rid of all disk acess (might take some time to realize)
Fix the underlying bug
agroce
commented at 6:02 pm on July 20, 2021:
contributor
Oh I see, of course, how stupid of me.
My guess is the “underlying bug” is fuzzers producing terminations that 1) aren’t crashes but 2) avoid cleanup code execution. I think anything QEMU based is going to do that occasionally, and looks like AFL also does it, even during corpus replay.
One idea, I don’t know where the files come from, but could they be unlinked once opened? Depending on the access pattern that could work (since the handles will stay valid, but the files will disappear at process termination). But it won’t work if they are reopened, rather than just referenced through a one-time handle.
agroce
commented at 6:03 pm on July 20, 2021:
contributor
No disk access will likely improve throughput, at least in some situations, anyway, even over /tmp.
Clarify how much this fixes60afee48c3
agroce
commented at 6:49 pm on July 20, 2021:
contributor
What about using tmpfile – is that feasible?
maflcko
commented at 9:21 am on July 21, 2021:
member
If the program terminates abnormally, it is implementation-defined if these temporary files are deleted.
agroce
commented at 2:41 pm on July 24, 2021:
contributor
Interesting. What fuzzer have you been trying to reproduce under? This should do it:
0> rm -rf /tmp/test_common_Bitcoin*
1> git clone https://github.com/google/AFL.git
2> cd AFL
3> sudo make install
4> cd bitcoin
5> CC=afl-clang CXX=afl-clang++ ./configure --enable-fuzz
6> make clean; make -j 5
7> FUZZ=process_message afl-fuzz -i qa-assets/fuzz_seed_corpus/process_message -o fuzz_afl -m 500 -t 30000 -- src/test/fuzz/fuzz
give it a while (maybe an hour or two?), and /tmp should contain more than one leftover in test_common_Bitcoin Core. If that doesn’t work, maybe it isn’t a necessary result of afl/eclipser and there is some solution/environment aspect…
maflcko
commented at 9:29 am on July 25, 2021:
member
agroce
commented at 12:30 pm on July 25, 2021:
contributor
Hmm, sorry, missed that. Ok, that’s interesting. OSS-Fuzz seems to have seen these spurious space-triggered crashes, too. That’s presumably with aflplusplus?
Is there some difference in the environments? I guess if so the QEMU failures for Eclipser might not always cause this problem, conceivably, if we can isolate the issue.
Probably unrelated: do you know why aflplusplus can’t handle process_message? The lto instrumentation should generally be lower overhead/faster. The throughput on process_message with 2.57b is not great (3/sec or so) but aflplusplus times out every test even with crazy multi-second limits.
maflcko
commented at 2:35 pm on July 25, 2021:
member
do you know why aflplusplus can’t handle process_message?
The “every input times out” issue also happens with the historic google/afl in llvm_mode. Though it doesn’t happen with the aflpp_driver, otherwise OSS-Fuzz would have reported the issue, I presume?
The timeout won’t happen if I comment out SyncWithValidationInterfaceQueue.
maflcko
commented at 3:40 pm on July 25, 2021:
member
Haven’t been able to reproduce the “/tmp/ fills up” issue with afl-clang-fast/++ from AFL++, yet.
maflcko
commented at 1:10 pm on July 26, 2021:
member
Ok, so the root of the problem seems to be that fork can’t copy threads. Using another thread after fork is probably UB?! Unfortunately there doesn’t seem to be a way in AFL(pp) to disable fork, even for the aflpp_driver. So the solution might be to remove all threads, which would likely also fix #22551 .
agroce
commented at 1:53 pm on July 26, 2021:
contributor
One threadsafe(-ish) solution I thought of:
Instead of ‘remove_all’, crawl the dir and (even with no env variable, just unconditionally to avoid changes for OSS-Fuzz?) remove any subdirs in test_common… that are older than half an hour. They’ll never accumulate to the point of causing space issues, and no other fuzzer should have an input running anywhere near that long.
maflcko
commented at 4:15 pm on July 26, 2021:
member
I used the following diff to remove all threads, but it didn’t help
agroce
commented at 5:49 pm on July 26, 2021:
contributor
What do you think of simply automatically killing “old” subdirs of test_common_Bitcoin Core in process-based fuzzing (do nothing to libFuzzer target based fuzzing)?
maflcko
commented at 7:00 am on July 27, 2021:
member
separate-process-based fuzzing should be unaffected. See comments #22472 (comment) (no crash after 12 hours with my patch) and your screenshot in #22551 (also no crash?, disk not filled up?)
With fork-based fuzzing I am seeing a background noise of crashes (though unrelated to the disk space issue).
Preparing the affected fuzz targets for fork might also magically fix the other issues, I suspect.
maflcko
commented at 8:54 am on July 27, 2021:
member
agroce
commented at 1:36 pm on July 27, 2021:
contributor
Sorry, I didn’t mean true separate process-based (which might also see this problem, depending on the cause). I meant anything that’s not doing libFuzzer “test is a function call” approach. My big afl run isn’t crashing because it’s compiled with my PR and the environment variable is set to TRUE. :)
For me, afl or Eclipser in docker ubuntu 20.04 both reliably end up filling /tmp and failing most tests due to that, without the patch, so I’m running with the patch.
practicalswift
commented at 10:40 am on July 30, 2021:
contributor
Concept ACK on working around this issue (if we cannot fix the root cause).
Instead of removing the entire tree /tmp/test_common_Bitcoin Core/ on exit would it be possible to remove only the specific sub-directory used during the fuzzing session (say /tmp/test_common_Bitcoin Core/6d8a[…]dc38/)?
agroce
commented at 2:15 pm on July 30, 2021:
contributor
The trick is it has to remove not on exit (since failing to do so as it should is the problem) but on entry, when it’s the “last” one it needs to kill, whose name is unknown.
practicalswift
commented at 11:08 pm on July 30, 2021:
contributor
The trick is it has to remove not on exit (since failing to do so as it should is the problem) but on entry, when it’s the “last” one it needs to kill, whose name is unknown.
Oh, of course! I misunderstood the problem. Sorry! :)
agroce
commented at 11:13 pm on August 4, 2021:
contributor
If this isn’t causing much issue on OSS-Fuzz, perhaps a stopgap for Eclipser etc. fuzzing would be to remove code inside fuzz and add a script users can launch in background to occasionally clean up old leftover directories?
maflcko
commented at 5:47 am on August 5, 2021:
member
I’d still prefer to fix the underlying issue over a temporary workaround.
In #22472 (comment) I removed all threads. I wonder if I have to remove all locks too for fork to work properly in afl?
agroce
commented at 6:23 am on August 5, 2021:
contributor
I am concerned QEMU failures are going to always produce leftovers, for Eclipser…
Merge branch 'bitcoin:master' into master971aab70b0
DrahtBot
commented at 6:15 am on January 8, 2022:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#20744 (Use std::filesystem. Remove Boost Filesystem & System by fanquake)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
DrahtBot added the label
Needs rebase
on Feb 3, 2022
DrahtBot
commented at 4:11 pm on February 3, 2022:
contributor
🐙 This pull request conflicts with the target branch and needs rebase.
Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.
fanquake
commented at 1:35 pm on April 26, 2022:
member
What is the status of this? Needs to be reworked to remove boost::filesystem.
agroce
commented at 4:44 pm on April 26, 2022:
contributor
@MarcoFalke wanted a root cause fix; I haven’t been actively fuzzing bitcoin core lately, so am not sure if that ever happened? I assume not. I suspect for some fuzzers, it may be needed even with root-cause fixes, for some abnormal exits and long fuzzer runs.
maflcko
commented at 8:40 am on April 29, 2022:
member
Yeah, I’d prefer to fix the underlying issue. In the meantime, I think it is fine to provide a temporary opt-in workaround.
maflcko
commented at 8:40 am on April 29, 2022:
member
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2024-11-22 00:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me