Fuzz: a more efficient descriptor parsing target

darosior commented at 1:13 pm on June 14, 2023: member

The current descriptor parsing fuzz target requires valid public or private keys to be provided. This is unnecessary as we are only interested in fuzzing the descriptor parsing logic here (other targets are focused on fuzzing keys serializations). And it’s pretty inefficient, especially for formats that need a checksum (xpub, xprv, WIF).

This introduces a new target that mocks the keys as an index in a list of precomputed keys. Keys are represented as 2 hex characters in the descriptor. The key type (private, public, extended, ..) is deterministically based on this one-byte value. Keys are deterministically generated at target initialization. This is much more efficient and also largely reduces the size of the seeds. TL;DR: for instance instead of requiring the fuzzer to generate a pk(xpub6DdBu7pBoyf7RjnUVhg8y6LFCfca2QAGJ39FcsgXM52Pg7eejUHLBJn4gNMey5dacyt4AjvKzdTQiuLfRdK8rSzyqZPJmNAcYZ9kVVEz4kj) to parse a valid descriptor, it just needs to generate a pk(03).

Note we only mock the keys themselves, not the entire descriptor key expression. As we want to fuzz the real code that parses the rest of the key expression (origin, derivation paths, ..).

This is a target i used for reviewing #17190 and #27255, and figured it was worth PR’ing on its own since the added complexity for mocking the keys is minimal and it could help prevent introducing bugs to the descriptor parsing logic much more efficiently.

DrahtBot commented at 1:13 pm on June 14, 2023: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	MarcoFalke, achow101
Concept ACK	dergoegge

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#26573 (Wallet: don’t underestimate the fees when spending a Taproot output by darosior)
#26567 (Wallet: estimate the size of signed inputs using descriptors by darosior)
#22838 (descriptors: Be able to specify change and receiving in a single descriptor string by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

darosior force-pushed on Jun 14, 2023

DrahtBot added the label CI failed on Jun 14, 2023

maflcko commented at 2:09 pm on June 14, 2023: member

Looks like this touches non-fuzz code? No opinion if people want this, but a much simpler implementation would be to just create a copy of the buffer and inject the pre-generated static string into it. (This is basically what a fuzz engine does automatically, with the difference that you can now provide the string dictionary yourself, directly in the fuzz target).

the fuzz target would look like:

0auto str{fdp.ConsumeString()};
1if (fdp.ConsumeBool()) {
2  str = MockDescriptor(str);
3}
4const auto desc = Parse(str);

With MockDescriptor doing a search of pk(xx) and then replacing it with pk(yy), where yy=raw_pubkeys[int(xx) % raw_pubkeys.size()] (already hex encoded).

This has the benefits of not mocking out the parsing logic, which for GetXOnlyPubKey is actually worthy to fuzz? Also, it allows to print the fuzz input, if needed, and use it over RPC for debugging. Finally, it doesn’t touch real code, only test code.

darosior commented at 2:20 pm on June 14, 2023: member

In what you suggest MockDescriptor would basically have to reimplement all of the descriptor parsing logic to be able to detect when is a key expected and replace the hex-encoded byte by an actual key. I figured the slight modification to the descriptor code to make key parsing mockable was preferable. (Note that’s what we have already in Miniscript which allows us to have an efficient miniscript_string fuzz target.)

To be clear this is not only about mocking pk() expressions but anywhere we’d expect a key.

DrahtBot removed the label CI failed on Jun 14, 2023

maflcko commented at 2:49 pm on June 14, 2023: member

Ok, I see. I guess another alternative would be to randomly inject random pre-generated keys at random positions in the string, absent of any logic. Then let the fuzz engine figure out the right positions via coverage feedback.

No strong opinion, just leaving random ideas that can be implemented with less code.

dergoegge commented at 2:50 pm on June 14, 2023: member

Concept ACK

Mocking seems fine to me, but I wonder if we could achieve something similar by placing (in)valid encoded keys in a fuzz dictionary (e.g. https://github.com/bitcoin-core/qa-assets/pull/122), which is also almost the same as Marco’s suggestion. This would be less efficient compared to what is in this PR but the inputs would still be available for debugging over RPC. We can of course also just do both since adding to the dictionaries is very easy.

darosior commented at 3:16 pm on June 14, 2023: member

Thanks both for throwing in ideas. Note i don’t have strong opinions here either, it’s just some review code that i figured could be helpful having in too. However i still don’t think the approaches suggested here would be better:

Inserting valid keys at random positions in the input. I’m assuming you describe something that looks like 1) get the number of keys from the fuzzer output 2) get the positions to insert each at from the fuzzer output. I may be underestimating the fuzzer’s capabilities but it sounds much less efficient.
Including valid keys in a dictionary. Again, i’m not very familiar with the inernals of fuzzing engines but i suspect this would make the fuzzer grind the key with virtually no chance of finding another valid one, wasting a lot of cycles.
I don’t think keeping the raw seed human readable for debugging over RPC should generally be a goal, especially not at the expense of efficiency. To get something readable you can simply run ./src/test/fuzz/fuzz ./crash-XXXXX with a ToString() printed to stdout.

sipa commented at 3:19 pm on June 14, 2023: member

Another alternative may be to just perform search-replace in the fuzz-read string before handing it to the parser? Eg anything of the form “%XX” where XX is two hex characters, is replaced by a lookup in a table.

darosior commented at 3:21 pm on June 14, 2023: member

Hmm good point. Using a distinguishable character fixes the issue of having to “basically reimplement descriptor parsing logic”. ——- Original Message ——- On Wednesday, June 14th, 2023 at 5:19 PM, Pieter Wuille @.***> wrote:

Another alternative may be to just perform search-replace in the fuzz-read string before handing it to the parser? Eg anything of the form “%XX” where XX is two hex characters, is replaced by a lookup in a table.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

maflcko commented at 3:31 pm on June 14, 2023: member

I may be underestimating the fuzzer’s capabilities but it sounds much less efficient.

I am happy to run a bench, if you happen to have a bug laying around :)

darosior commented at 3:35 pm on June 14, 2023: member

I don’t, but i think Pieter settled the debate anyways. :) ——- Original Message ——- On Wednesday, June 14th, 2023 at 5:31 PM, MacrabFalke @.***> wrote:

I may be underestimating the fuzzer’s capabilities but it sounds much less efficient.

I am happy to run a bench, if you happen to have a crash+bug laying around :)

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

darosior force-pushed on Jun 15, 2023

DrahtBot added the label CI failed on Jun 15, 2023

darosior force-pushed on Jun 15, 2023

darosior commented at 2:42 pm on June 15, 2023: member

Updated following @sipa’s suggestion of using a marker to be able to search and replace in the descriptor string directly.

The diff is now twice smaller, and this does not touch the descriptor parsing logic anymore. After running the new mocked_descriptor_parse target for half a hour on my laptop i get more branch coverage for descriptor.cpp than with the existing corpus for descriptor_parse.

`descriptor_parse`

`mocked_descriptor_parse`

DrahtBot removed the label CI failed on Jun 15, 2023

in src/test/fuzz/descriptor_parse.cpp:48 in 845810d94d outdated

42+
43+            // If this is a "raw" key, generate a normal privkey. Otherwise generate
44+            // an extended one.
45+            if (IdIsCPubKey(i) || IdIsXOnlyPubKey(i) || IdIsConstPrivKey(i)) {
46+                CKey privkey;
47+                privkey.Set(key_data, key_data + 32, true);

achow101 commented at 5:08 pm on June 15, 2023:

In 845810d94df83747634a60ff59635a670bba1124 “fuzz: add a new, more efficient, descriptor parsing target”

Perhaps also include uncompressed keys?

darosior commented at 8:43 am on June 16, 2023:

Good call, done.

darosior force-pushed on Jun 16, 2023

in src/test/fuzz/descriptor_parse.cpp:51 in bf1d242e44 outdated

47+                CKey privkey;
48+                privkey.Set(key_data, key_data + 32, true);
49+                if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i)) {
50+                    CPubKey pubkey{privkey.GetPubKey()};
51+                    if (IdIsUnCompPubKey(i)) assert(pubkey.Decompress());
52+                    keys_str[i] = HexStr(pubkey);

achow101 commented at 5:22 pm on June 16, 2023:

In fad60fc6f6486d3b28de4bbe937e293f65d5cbd5 “fuzz: increase coverage of the descriptor targets”

The second argument of Set is the compressedness, so this can be simplified a bit:

0                privkey.Set(key_data, key_data + 32, IdIsCompPubKey(i));
1                if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i)) {
2                    CPubKey pubkey{privkey.GetPubKey()};
3                    keys_str[i] = HexStr(pubkey);

darosior commented at 12:16 pm on June 18, 2023:

Done.

 0diff --git a/src/test/fuzz/descriptor_parse.cpp b/src/test/fuzz/descriptor_parse.cpp
 1index 9a7b6ea308..529dadadc1 100644
 2--- a/src/test/fuzz/descriptor_parse.cpp
 3+++ b/src/test/fuzz/descriptor_parse.cpp
 4@@ -45,10 +45,9 @@ public:
 5             // an extended one.
 6             if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i) || IdIsXOnlyPubKey(i) || IdIsConstPrivKey(i)) {
 7                 CKey privkey;
 8-                privkey.Set(key_data, key_data + 32, true);
 9+                privkey.Set(key_data, key_data + 32, !IdIsUnCompPubKey(i));
10                 if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i)) {
11                     CPubKey pubkey{privkey.GetPubKey()};
12-                    if (IdIsUnCompPubKey(i)) assert(pubkey.Decompress());
13                     keys_str[i] = HexStr(pubkey);
14                 } else if (IdIsXOnlyPubKey(i)) {
15                     const XOnlyPubKey pubkey{privkey.GetPubKey()};

darosior force-pushed on Jun 18, 2023

luke-jr commented at 2:11 pm on June 24, 2023: member

Suggest putting “Fuzz” in title, and labelling.

darosior renamed this:
~~A more efficient descriptor parsing target~~
Fuzz: a more efficient descriptor parsing target
on Jun 27, 2023

DrahtBot added the label Tests on Jun 27, 2023

DrahtBot added the label Needs rebase on Jul 17, 2023

fuzz: make the parsed descriptor testing into a function

We'll be reusing it in the new target.

d60229ede5

darosior force-pushed on Jul 21, 2023

DrahtBot removed the label Needs rebase on Jul 21, 2023

DrahtBot added the label CI failed on Jul 21, 2023

maflcko commented at 10:42 am on July 21, 2023: member

For testing I’ve injected a bug in currently uncovered code:

 0diff --git a/src/script/descriptor.cpp b/src/script/descriptor.cpp
 1index 09ded5fc61..b69db182ab 100644
 2--- a/src/script/descriptor.cpp
 3+++ b/src/script/descriptor.cpp
 4@@ -509,7 +509,7 @@ public:
 5         out = "[" + origin_str + "]" + EncodeExtPubKey(xpub) + FormatHDKeypath(end_path);
 6         if (IsRange()) {
 7             out += "/*";
 8-            assert(m_derive == DeriveType::UNHARDENED);
 9+            assert(m_derive != DeriveType::UNHARDENED); // Injected BUG!! (bad)
10         }
11         return true;
12     }

…

But it looks like the fuzz target just crashed immediately anyway (see CI)

darosior force-pushed on Jul 21, 2023

darosior commented at 12:44 pm on July 21, 2023: member

Thanks for testing. Fixed the issue. I also introduced the bug you shared and it does make the target crash.

DrahtBot removed the label CI failed on Jul 21, 2023

maflcko commented at 4:44 pm on July 21, 2023: member

Did quick check to see how many iterations it would take to find my injected bug with libfuzzer:

-use_value_profile=1:
-use_value_profile=0:

in src/test/fuzz/descriptor_parse.cpp:60 in 84dee4fe69 outdated

55+                } else {
56+                    keys_str[i] = EncodeSecret(privkey);
57+                }
58+            } else {
59+                CExtKey ext_privkey;
60+                ext_privkey.SetSeed({(std::byte *)key_data, 32});

maflcko commented at 4:55 pm on July 21, 2023:

Nit: If you make key_data std::byte, you can avoid the c-style cast:

 0diff --git a/src/test/fuzz/descriptor_parse.cpp b/src/test/fuzz/descriptor_parse.cpp
 1index 1d2b672260..84dc005575 100644
 2--- a/src/test/fuzz/descriptor_parse.cpp
 3+++ b/src/test/fuzz/descriptor_parse.cpp
 4@@ -36,16 +36,16 @@ public:
 5     //! When initializing the target, populate the list of keys.
 6     void Init() {
 7         // The data to use as a private key or a seed for an xprv.
 8-        uint8_t key_data[32] = {1};
 9+        std::array<std::byte, 32> key_data{std::byte{1}};
10         // Generate keys of all kinds and store them in the keys array.
11         for (size_t i{0}; i < TOTAL_KEYS_GENERATED; i++) {
12-            key_data[31] = i;
13+            key_data[31] = std::byte(i);
14 
15             // If this is a "raw" key, generate a normal privkey. Otherwise generate
16             // an extended one.
17             if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i) || IdIsXOnlyPubKey(i) || IdIsConstPrivKey(i)) {
18                 CKey privkey;
19-                privkey.Set(key_data, key_data + 32, !IdIsUnCompPubKey(i));
20+                privkey.Set(UCharCast(key_data.begin()), UCharCast(key_data.end()), !IdIsUnCompPubKey(i));
21                 if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i)) {
22                     CPubKey pubkey{privkey.GetPubKey()};
23                     keys_str[i] = HexStr(pubkey);
24@@ -57,7 +57,7 @@ public:
25                 }
26             } else {
27                 CExtKey ext_privkey;
28-                ext_privkey.SetSeed({(std::byte *)key_data, 32});
29+                ext_privkey.SetSeed(key_data);
30                 if (IdIsXprv(i)) {
31                     keys_str[i] = EncodeExtKey(ext_privkey);
32                 } else {

darosior commented at 5:07 pm on July 21, 2023:

It’s not avoiding them, it’s hiding them behind UCharCast? :p

It boils down to choosing between casting to uint8_t or to std::byte, and i went for the option with less casts.

maflcko commented at 5:09 pm on July 21, 2023:

Yeah this doesn’t matter for the fuzz tests.

I only left the comment, so that it would be easier if CKey::Set was changed to take std::byte, it would be a smaller diff in the future.

darosior commented at 5:10 pm on July 21, 2023:

Happy to do it then.

darosior commented at 5:13 pm on July 21, 2023:

Done.

(However i just checked and UCharCast does indeed use C-style casts) https://github.com/bitcoin/bitcoin/blob/d23fda05842ba4539b225bbab01b94df0060f697/src/span.h#L272

EDIT: (Oh but the const version doesn’t) https://github.com/bitcoin/bitcoin/blob/d23fda05842ba4539b225bbab01b94df0060f697/src/span.h#L275

EDIT2: should call pastapastapasta to fix this mess

in src/test/fuzz/descriptor_parse.cpp:84 in 84dee4fe69 outdated

79+    //! Get a valid descriptor string from a descriptor string whose keys were mocked.
80+    std::optional<std::string> GetDescriptor(std::string_view mocked_desc) const {
81+        // The smallest fragment would be "pk(%00)"
82+        if (mocked_desc.size() < 7) return {};
83+
84+        // The actual, valid, descriptor string to be returned.

maflcko commented at 4:57 pm on July 21, 2023:

0        // The actual, descriptor string to be returned.

pretty sure it is not valid

darosior commented at 5:08 pm on July 21, 2023:

Done, thanks. (And also in the function’s doc comment.)

maflcko commented at 5:10 pm on July 21, 2023:

Done in the wrong commit, though? :sweat_smile:

darosior commented at 5:13 pm on July 21, 2023:

:facepalm:

darosior commented at 5:15 pm on July 21, 2023:

And i can’t even blame it on anything but myself this time! Fixed.

maflcko approved

maflcko commented at 5:00 pm on July 21, 2023: member

nice, lgtm ACK 84dee4fe690e08a5adaad1c78530666da07075d8 🦄

Signature:

0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
2trusted comment: nice, lgtm ACK 84dee4fe690e08a5adaad1c78530666da07075d8 🦄
3ag2blJR9VoWl2NoAXwUZSDU54s5hPaFz7IJt+iLlaSMLYtYwbr4/XMw4CW9xm1A+BXAn9ZXtWQxrwFSRr/COBg==

darosior force-pushed on Jul 21, 2023

fuzz: add a new, more efficient, descriptor parsing target

This new target focuses on fuzzing the actual descriptor parsing logic
by not requiring the fuzzer to produce valid keys (nor a valid checksum
for that matter).
This should make it much more efficient to find bugs we could introduce
moving forward.

Using a character as a marker (here '%') to be able to search and
replace in the string without having to mock the actual descriptor
parsing logic was an insight from Pieter Wuille.

90a24741e7

fuzz: increase coverage of the descriptor targets

Once a descriptor is successfully parsed, execute more of its methods.
There is probably still room for improvements by checking for some
invariants, but this is a low hanging fruit that significantly increases
the code coverage of these targets.

131314b62e

darosior force-pushed on Jul 21, 2023

maflcko commented at 5:18 pm on July 21, 2023: member

re-ACK 131314b62e899f95d1863083d303b489b3212b16 🐓

Signature:

0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
2trusted comment: re-ACK 131314b62e899f95d1863083d303b489b3212b16  🐓
3PiOVYYPtzobuHwHpzUHpihKOhhWkacXf+ZxTpk/y7EwMzhRSE9kfEk+pz/XQl8QIUv2W4fuwNKvY2pFyASt8Bg==

darosior requested review from achow101 on Jul 26, 2023

achow101 commented at 5:47 pm on July 27, 2023: member

ACK 131314b62e899f95d1863083d303b489b3212b16

DrahtBot removed review request from achow101 on Jul 27, 2023

achow101 merged this on Jul 27, 2023

achow101 closed this on Jul 27, 2023

in src/test/fuzz/descriptor_parse.cpp:77 in 131314b62e

72+    std::optional<uint8_t> IdxFromHex(std::string_view hex_characters) const {
73+        if (hex_characters.size() != 2) return {};
74+        auto idx = ParseHex(hex_characters);
75+        if (idx.size() != 1) return {};
76+        return idx[0];
77+    }

maflcko commented at 8:11 am on July 28, 2023:

Could just use the raw (single) byte here, but that would interfere with libFuzzer -only_ascii=1, which makes me wonder if this is the first ascii fuzz target and whether we should set the option somewhere somehow during input generation?

maflcko commented at 10:14 am on July 28, 2023:

Fuzz inputs until crash

Funny how -only_ascii=1 performs worse than -only_ascii=0 (https://github.com/bitcoin/bitcoin/pull/27888#issuecomment-1645976254).

maflcko commented at 10:15 am on July 28, 2023:

(or at least, not significantly better)

darosior commented at 10:16 am on July 28, 2023:

Maybe the target returning early on non-ASCII has basically the same effect?

darosior deleted the branch on Jul 28, 2023

maflcko commented at 10:46 am on July 28, 2023: member

-mutate_depth=3 seems to be the best so far:

Fuzz inputs until crash

maflcko commented at 11:41 am on July 28, 2023: member

With -mutate_depth=14 being worse (be aware that the y-axis no longer matches to all previous plots)

Fuzz inputs until crash

sidhujag referenced this in commit b11d694c10 on Aug 9, 2023

fanquake referenced this in commit ee7e4b0e40 on Feb 27, 2024

bitcoin locked this on Jul 27, 2024

Fuzz: a more efficient descriptor parsing target #27888

Reviews

Conflicts

descriptor_parse

mocked_descriptor_parse

`descriptor_parse`

`mocked_descriptor_parse`