fuzz: Faster utxo_snapshot fuzz target

maflcko commented at 8:23 am on August 13, 2024: member

Two commits to speed up unit and fuzz tests.

Can be tested by running the fuzz target and looking at the time it took, or by looking at the flamegraph. For example:

0FUZZ=utxo_snapshot perf record -g --call-graph dwarf ./src/test/fuzz/fuzz  -runs=100
1hotspot ./perf.data

DrahtBot commented at 8:23 am on August 13, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	TheCharlatan, marcofleon, brunoerg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30342 (kernel, logging: Pass Logger instances to kernel objects by ryanofsky)
#30157 (Fee Estimation via Fee rate Forecasters by ismaelsadeeq)
#29307 (util: explicitly close all AutoFiles that have been written by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Tests on Aug 13, 2024

test: Disallow fee_estimator construction in ChainTestingSetup

It is expensive to construct, and only one test uses it.

Fix both issues by disallowing the construction and moving it to the
single test that uses it.

fae8c73d9e

maflcko force-pushed on Aug 13, 2024

DrahtBot added the label CI failed on Aug 13, 2024

DrahtBot commented at 8:31 am on August 13, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28695005223

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

dergoegge commented at 8:33 am on August 13, 2024: member

What is the expected speed up?

TheCharlatan approved

TheCharlatan commented at 9:30 am on August 13, 2024: contributor

Nice, ACK fa48ded21168740640ff9d6e5b413a322d58b114

I’d re-ack quickly if you want to fix the other includes as well.

DrahtBot removed the label CI failed on Aug 13, 2024

maflcko commented at 9:00 am on August 14, 2024: member

I see another 5% in the ECC_Start flame on my machine, which can be avoided, so I’ll try to rework the second commit.

maflcko force-pushed on Aug 14, 2024

maflcko commented at 9:47 am on August 14, 2024: member

Force pushed for :rocket: lazy re-init, which gives:

100x iterations/s in the lazy case
1.35x iterations/s in the expensive case

marcofleon commented at 11:49 am on August 14, 2024: contributor

Tested ACK fa198d1a196482bf363ba57a3f4d6d2d860b8ce5

DrahtBot requested review from TheCharlatan on Aug 14, 2024

maflcko commented at 12:38 pm on August 14, 2024: member

Thanks for the review! Obviously this is fine to merge, but I am still unhappy about the average performance increase, which isn’t 100x, but closer to the worst case performance increase (1.35x). I’ll push once more, sorry.

marcofleon commented at 12:52 pm on August 14, 2024: contributor

Sounds good, I’ll review again when it’s improved then. Looked to be about 2x when I just tested it.

maflcko force-pushed on Aug 14, 2024

maflcko commented at 3:40 pm on August 14, 2024: member

Ok, I think this is easiest to solve by defining two fuzz targets, one that searches for invalid inputs (fast), and one that searches for valid (or invalid) inputs (slow).

This makes it possible to find the bug from #30514 (comment) “immediately”.

I’ve based this on a previous commit, with the bug included, so that it is trivial to test:

0FUZZ=utxo_snapshot_invalid UBSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1:report_error_type=1" ./src/test/fuzz/fuzz

(Took a few seconds to hit the crash, but previously, I couldn’t find it even after days)

maflcko force-pushed on Aug 15, 2024

fuzz: Remove unused DataStream object fa645c7a86

maflcko force-pushed on Aug 15, 2024

maflcko commented at 8:08 am on August 15, 2024: member

I realized that I limited the search space too much for the fuzz target that deals with invalid inputs only. I’ve fixed it in the last push, and added a commit to remove an unused symbol.

This brings back the performance increase to 10x (or so) for the “fast” fuzz target. Also, the bug from #30514 (comment) is no longer found “immediately”.

One of the processes just found it after 700k iterations after one hour, which seems reasonable. The process command:

0FUZZ=utxo_snapshot_invalid UBSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1:report_error_type=1" nice ./bld/src/test/fuzz/fuzz -max_len=128 -use_value_profile=1

TheCharlatan commented at 1:02 pm on August 15, 2024: contributor

I still see a doubling in performance of the valid test if the validation cache is initialized with minimal values. Do you want to add it back?

current HEAD:

0[#10000](/bitcoin-bitcoin/10000/)	DONE   cov: 2794 ft: 3029 corp: 66/642b lim: 33 exec/s: 161 rss: 406Mb
1Done 10000 runs in 62 second(s)

+ min validation cache:

0[#10000](/bitcoin-bitcoin/10000/)	DONE   cov: 2788 ft: 3011 corp: 54/431b lim: 68 exec/s: 333 rss: 350Mb
1Done 10000 runs in 30 second(s)

maflcko commented at 2:29 pm on August 15, 2024: member

I still see a doubling in performance of the valid test if the validation cache is initialized with minimal values. Do you want to add it back?

Sure, done. Personally, I didn’t care about the “valid” fuzz target much, but making it twice as fast can’t hurt.

For reference, the flame graph for the “valid” fuzz target now shows that 80% of the time is spent in “ProcessNewBlockHeader”, which I do not know how to avoid, and I don’t think it is worth it. The goal for this pull request was to make finding the bug #30514 (comment) possible by a fuzz engine in a reasonable time, without having a human construct the fuzz inputs manually, or manually write a fuzz input generator, which I did to find the bug.

in src/test/util/setup_common.h:88 in fa69afd518 outdated

83@@ -82,6 +84,7 @@ struct ChainTestingSetup : public BasicTestingSetup {
84     bool m_coins_db_in_memory{true};
85     bool m_block_tree_db_in_memory{true};
86     std::function<void()> m_make_chainman{};
87+    std::function<void(kernel::ChainstateManagerOpts&)> m_modify_chainman_opts{}; // Called before m_make_chainman

TheCharlatan commented at 3:16 pm on August 15, 2024:

Nit: I find these callbacks a bit ugly, how about:

 0diff --git a/src/test/fuzz/utxo_snapshot.cpp b/src/test/fuzz/utxo_snapshot.cpp
 1index 0b4e0d183a..a30ac53f97 100644
 2--- a/src/test/fuzz/utxo_snapshot.cpp
 3+++ b/src/test/fuzz/utxo_snapshot.cpp
 4@@ -54,0 +55 @@ void initialize_chain()
 5+                                               .min_validation_cache = true,
 6diff --git a/src/test/util/setup_common.cpp b/src/test/util/setup_common.cpp
 7index cf47d16faf..62ff61b227 100644
 8--- a/src/test/util/setup_common.cpp
 9+++ b/src/test/util/setup_common.cpp
10@@ -247 +247 @@ ChainTestingSetup::ChainTestingSetup(const ChainType chainType, TestOpts opts)
11-    m_make_chainman = [this, &chainparams] {
12+    m_make_chainman = [this, &chainparams, opts] {
13@@ -249 +249 @@ ChainTestingSetup::ChainTestingSetup(const ChainType chainType, TestOpts opts)
14-        const ChainstateManager::Options chainman_opts{
15+        ChainstateManager::Options chainman_opts{
16@@ -256,0 +257,4 @@ ChainTestingSetup::ChainTestingSetup(const ChainType chainType, TestOpts opts)
17+        if (opts.min_validation_cache) {
18+            chainman_opts.script_execution_cache_bytes = 0;
19+            chainman_opts.signature_cache_bytes = 0;
20+        }
21diff --git a/src/test/util/setup_common.h b/src/test/util/setup_common.h
22index 7d69551516..9ff2c5978e 100644
23--- a/src/test/util/setup_common.h
24+++ b/src/test/util/setup_common.h
25@@ -57,0 +58 @@ struct TestOpts {
26+    bool min_validation_cache{false}

maflcko commented at 6:09 am on August 16, 2024:

Sure, fixed the style

in src/test/fuzz/utxo_snapshot.cpp:34 in fa0dd5896b outdated

33+#include <functional>
34+#include <ios>
35+#include <memory>
36+#include <optional>
37+#include <ranges>
38+#include <vector>

TheCharlatan commented at 7:29 pm on August 15, 2024:

Nit: The includes need to be corrected again.

maflcko commented at 6:10 am on August 16, 2024:

Sure, removed the ranges include.

(Maybe another reason to enforce iwyu in the CI)

TheCharlatan approved

TheCharlatan commented at 7:33 pm on August 15, 2024: contributor

ACK fa69afd51879f8441df7a51144318a49e7dd27b9

The logic is a bit convoluted, but I don’t have many suggestions for improving it and I think it is important to get this performance improvement deployed quickly.

DrahtBot requested review from marcofleon on Aug 15, 2024

fuzz: Speed up utxo_snapshot by lazy re-init

The re-init is expensive, so skip it if there is no need.

Also, add an even faster fuzz target utxo_snapshot_invalid, which does
not need any re-init at all.

fa386642b4

maflcko force-pushed on Aug 16, 2024

marcofleon commented at 10:55 am on August 16, 2024: contributor

exec:s exec:s_dos

The second one just shows the same plot zoomed in on the last 600k iterations or so. I’m going to follow up with a plot of executions vs time as well. And I’ll do more than a million iterations, as it seems like utxo_snapshot_invalid might end up being faster if fuzzed for hours.

I compiled using only --with-sanitizers=fuzzer. Should I test for speed using other sanitizers or will the performance difference be similar regardless?

maflcko commented at 11:18 am on August 16, 2024: member

@marcofleon Thanks for checking. I presume you checked the utxo_snapshot target, not the utxo_snapshot_invalid target and you found that the new target is slower in your run? I guess it is a bit hard to compare a random search path done by the fuzz engine in the input search space, because different paths may take a different time. As mentioned earlier, improving the “valid” target isn’t a goal of this pull (https://github.com/bitcoin/bitcoin/pull/30644#issuecomment-2291393187). However, I’d be interested to understand this nonetheless. Can you try (if you want):

Iterating over the same set of input files, without a search/generation?
Try to set the same generate-seed? However, this will probably not help, because the coverage feedback is different, so the two runs will diverge. It is possible that the fuzz engine thinks the expensive reinit path is more interesting, because it yields more coverage (possibly non-determinstic), so spends more time there.
Plot a flame graph, comparing the time spent in each flame/function before and after?
Test the utxo_snapshot_invalid target?
Set if constexpr (!INVALID) { dirty_chainman = true ; } to always force th expensive path, and then do more runs than N=1?
Run the test more often than N=1?
Check which commit helps/hurts

marcofleon commented at 11:32 am on August 16, 2024: contributor

Thank you for the suggestions, I’ll do more trials for sure. That actually is the utxo_snapshot_invalid target compared to the old utxo_snapshot (sorry, file names should’ve been clearer). It’s much faster at the beginning. I ran it from scratch, not on any corpus. I’m generating a corpus now for the invalid target and then I’ll do a comparison for both when fuzzing from a corpus.

Also, for the bug from #30514 (comment). What’s the reason for max_len=128? Is that necessary to find the bug in a reasonable amount of time?

marcofleon commented at 11:36 am on August 16, 2024: contributor

To clarify, I reproduced the bug fairly quickly with the line you provided in #30644 (comment). I would like to see what happens with max_len not specified.

maflcko commented at 11:39 am on August 16, 2024: member

Thank you for the suggestions, I’ll do more trials for sure. That actually is the utxo_snapshot_invalid target compared to the old utxo_snapshot (sorry, file names should’ve been clearer).

Are you sure you are on the right commit binaries? Locally I am seeing a speed difference of night and day. 1e6 iterations with the new target should happen within a few minutes, whereas with the old it would take hours/days.

Also, for the bug from #30514 (comment). What’s the reason for max_len=128? Is that necessary to find the bug in a reasonable amount of time?

I just spawned many threads with many settings and this one just happened to finish first. However, for this bug I found that -use_value_profile=1 helps to find the bug twice as fast on my machine (in number of iterations) and max_len didn’t affect the result (at least not visibly).

maflcko commented at 11:45 am on August 16, 2024: member

I just re-tried FUZZ=utxo_snapshot_invalid ./fuzz_exe_fa0b66400c12d11579b38b1ac76f677b23e8c82a -seed=1337 -runs=1000000 and it took less time than to write the second-last comment, and as much time as FUZZ=utxo_snapshot ./fuzz_exe_389cf32aca0be6c4b01151c92cc3be79c120f197 -seed=1337 -runs=50000.

marcofleon commented at 11:46 am on August 16, 2024: contributor

I’ll recheck my setup and get back to you.

TheCharlatan approved

TheCharlatan commented at 12:48 pm on August 16, 2024: contributor

Re-ACK fa0b66400c12d11579b38b1ac76f677b23e8c82a

I saw the old “libFuzzer disabled leak detection” info message in the invalid test, but not sure what might be causing it.

maflcko commented at 12:57 pm on August 16, 2024: member

I saw the old “libFuzzer disabled leak detection” info message in the invalid test, but not sure what might be causing it.

I haven’t seen that one. Are you saying it only happens for this test? What are the steps to reproduce?

TheCharlatan commented at 1:18 pm on August 16, 2024: contributor

I haven’t seen that one. Are you saying it only happens for this test? What are the steps to reproduce?

Only saw it for the invalid test after about 2 million iterations just running FUZZ=utxo_snapshot_invalid ./src/test/fuzz/fuzz, configured with address,fuzzer (though I guess that should not matter).

maflcko commented at 1:34 pm on August 16, 2024: member

Lsan should be enabled by default, so if there was a problem, it should still be found. I’ve enabled it explicitly in #30665, but that should be unrelated to the changes here.

maflcko commented at 9:58 am on August 19, 2024: member

@marcofleon Another question I’d ask is: What software are you using? Apple-, Microsoft-, or “Anti-Virus”- software? They have a history of scanning, intercepting, and (temporarily) blocking file IO, so absent any more information or steps to reproduce, this would be my most likely guess. Most useful for me right now would be any kind of performance measurement on your machine, for example a flame graph of a fuzz run, or any other measurement that shows where the most time is spent.

(I may or may not create a follow-up to remove some file IO, to possibly improve speed the fuzz target on a non-ramdisk, but fuzzing on a non-ramdisk seems an edge case anyway and unclear whether it is worth it to improve.)

marcofleon commented at 1:38 pm on August 19, 2024: contributor

I’m using Debian. Here’s my recent run (2 million iterations) using an initial corpus for both targets:

Screenshot 2024-08-19 at 10 54 46 AM

The new target is clearly faster. I think this is a more useful run because we generally fuzz from a corpus of inputs. I also reproduced the bug using only -use_value_profile=1. So overall, I would say this PR is good to go. I agree that it’s probably not really worth spending time to improve it for non-ramdisk fuzzing, but up to you.

It is interesting that the new target eventually slows down significantly when fuzzing from no inputs. I ran this: FUZZ=utxo_snapshot_invalid perf record -g --call-graph dwarf ./src/test/fuzz/fuzz -runs=200000 and here is flamegraph: https://marcofleon.github.io/flamegraph/19-8-utxo_snap_invalid.svg

marcofleon commented at 1:41 pm on August 19, 2024: contributor

Let me know if you think there is anything more I can do to test. Looking at that flame graph is fun so I’m happy to keep fuzzing. It seems like your guess was spot on, as it does look like FlushStateToDisk is a pretty solid portion of the time spent.

maflcko commented at 2:30 pm on August 19, 2024: member

Ok, the flamegraph helps. With a quick skim, it looks like all flames are somehow IO (read or write) related. This means that you are not evaluating the fuzz target itself, but the speed of your storage and how it can handle the data that the fuzz engine produces for it to read and write. I guess, starting with an empty fuzz input set somehow encourages the fuzz engine to create large inputs, which then cause a high IO usage.

My recommendation would be to make sure you are fuzzing on a ramdisk, because otherwise you constrain the fuzz target to proceed quickly. What is df --human-readable /tmp/?

If you want to continue fuzzing on an SSD or even “rusty storage”, eliminating the “AutoFile” flames (~20%) should be trivial by using fmemopen. However, that may not be enough and you’d also have to open the assumed-chainstate in memory, which would be another ~40% of flames. However, as mentioned previously, this is best done in a separate pull request.

marcofleon commented at 2:47 pm on August 19, 2024: contributor

Makes sense, was just working with Niklas to get this done. The flame graphs are a useful new tool for me too, I’ll be utilizing those more. Thanks for the help @maflcko.

marcofleon commented at 2:55 pm on August 19, 2024: contributor

ACK fa0b66400c12d11579b38b1ac76f677b23e8c82a

maflcko added this to the milestone 28.0 on Aug 19, 2024

maflcko force-pushed on Aug 19, 2024

maflcko commented at 3:14 pm on August 19, 2024: member

Force pushed to remove stray, leftover, unused lines in the diff of the last commit. Should be easy to re-review.

in src/test/util/setup_common.h:31 in fab31b9aa0 outdated

27@@ -28,7 +28,6 @@ class arith_uint256;
28 class CFeeRate;
29 class Chainstate;
30 class FastRandomContext;
31-

TheCharlatan commented at 9:24 pm on August 19, 2024:

I don’t think this empty line should be removed.

maflcko commented at 5:54 am on August 20, 2024:

Thanks, added back

fuzz: Speed up utxo_snapshot fuzz target

This speeds up the fuzz target, which allows "valid" inputs. It does not
affect the "INVALID" fuzz target.

fa899fb7aa

maflcko force-pushed on Aug 20, 2024

TheCharlatan approved

TheCharlatan commented at 6:59 am on August 20, 2024: contributor

Re-ACK fa899fb7aa8a14acecadd8936ad5824fa0f697ff

maflcko removed review request from marcofleon on Aug 20, 2024

maflcko requested review from marcofleon on Aug 20, 2024

marcofleon approved

marcofleon commented at 9:22 am on August 20, 2024: contributor

Re ACK fa899fb7aa8a14acecadd8936ad5824fa0f697ff

brunoerg approved

brunoerg commented at 9:08 pm on August 20, 2024: contributor

ACK fa899fb7aa8a14acecadd8936ad5824fa0f697ff

Left it running for a while, got a clear improvement compared to master.

achow101 merged this on Aug 21, 2024

achow101 closed this on Aug 21, 2024

maflcko deleted the branch on Aug 21, 2024

fuzz: Faster utxo_snapshot fuzz target #30644

Code Coverage

Reviews

Conflicts