Fuzz: a more efficient descriptor parsing target #27888

pull darosior wants to merge 3 commits into bitcoin:master from darosior:efficient_desc_target changing 1 files +144 −5
  1. darosior commented at 1:13 pm on June 14, 2023: member

    The current descriptor parsing fuzz target requires valid public or private keys to be provided. This is unnecessary as we are only interested in fuzzing the descriptor parsing logic here (other targets are focused on fuzzing keys serializations). And it’s pretty inefficient, especially for formats that need a checksum (xpub, xprv, WIF).

    This introduces a new target that mocks the keys as an index in a list of precomputed keys. Keys are represented as 2 hex characters in the descriptor. The key type (private, public, extended, ..) is deterministically based on this one-byte value. Keys are deterministically generated at target initialization. This is much more efficient and also largely reduces the size of the seeds. TL;DR: for instance instead of requiring the fuzzer to generate a pk(xpub6DdBu7pBoyf7RjnUVhg8y6LFCfca2QAGJ39FcsgXM52Pg7eejUHLBJn4gNMey5dacyt4AjvKzdTQiuLfRdK8rSzyqZPJmNAcYZ9kVVEz4kj) to parse a valid descriptor, it just needs to generate a pk(03).

    Note we only mock the keys themselves, not the entire descriptor key expression. As we want to fuzz the real code that parses the rest of the key expression (origin, derivation paths, ..).

    This is a target i used for reviewing #17190 and #27255, and figured it was worth PR’ing on its own since the added complexity for mocking the keys is minimal and it could help prevent introducing bugs to the descriptor parsing logic much more efficiently.

  2. DrahtBot commented at 1:13 pm on June 14, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK MarcoFalke, achow101
    Concept ACK dergoegge

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26573 (Wallet: don’t underestimate the fees when spending a Taproot output by darosior)
    • #26567 (Wallet: estimate the size of signed inputs using descriptors by darosior)
    • #22838 (descriptors: Be able to specify change and receiving in a single descriptor string by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. darosior force-pushed on Jun 14, 2023
  4. DrahtBot added the label CI failed on Jun 14, 2023
  5. maflcko commented at 2:09 pm on June 14, 2023: member

    Looks like this touches non-fuzz code? No opinion if people want this, but a much simpler implementation would be to just create a copy of the buffer and inject the pre-generated static string into it. (This is basically what a fuzz engine does automatically, with the difference that you can now provide the string dictionary yourself, directly in the fuzz target).

    the fuzz target would look like:

    0auto str{fdp.ConsumeString()};
    1if (fdp.ConsumeBool()) {
    2  str = MockDescriptor(str);
    3}
    4const auto desc = Parse(str);
    

    With MockDescriptor doing a search of pk(xx) and then replacing it with pk(yy), where yy=raw_pubkeys[int(xx) % raw_pubkeys.size()] (already hex encoded).

    This has the benefits of not mocking out the parsing logic, which for GetXOnlyPubKey is actually worthy to fuzz? Also, it allows to print the fuzz input, if needed, and use it over RPC for debugging. Finally, it doesn’t touch real code, only test code.

  6. darosior commented at 2:20 pm on June 14, 2023: member

    In what you suggest MockDescriptor would basically have to reimplement all of the descriptor parsing logic to be able to detect when is a key expected and replace the hex-encoded byte by an actual key. I figured the slight modification to the descriptor code to make key parsing mockable was preferable. (Note that’s what we have already in Miniscript which allows us to have an efficient miniscript_string fuzz target.)

    To be clear this is not only about mocking pk() expressions but anywhere we’d expect a key.

  7. DrahtBot removed the label CI failed on Jun 14, 2023
  8. maflcko commented at 2:49 pm on June 14, 2023: member

    Ok, I see. I guess another alternative would be to randomly inject random pre-generated keys at random positions in the string, absent of any logic. Then let the fuzz engine figure out the right positions via coverage feedback.

    No strong opinion, just leaving random ideas that can be implemented with less code.

  9. dergoegge commented at 2:50 pm on June 14, 2023: member

    Concept ACK

    Mocking seems fine to me, but I wonder if we could achieve something similar by placing (in)valid encoded keys in a fuzz dictionary (e.g. https://github.com/bitcoin-core/qa-assets/pull/122), which is also almost the same as Marco’s suggestion. This would be less efficient compared to what is in this PR but the inputs would still be available for debugging over RPC. We can of course also just do both since adding to the dictionaries is very easy.

  10. darosior commented at 3:16 pm on June 14, 2023: member

    Thanks both for throwing in ideas. Note i don’t have strong opinions here either, it’s just some review code that i figured could be helpful having in too. However i still don’t think the approaches suggested here would be better:

    • Inserting valid keys at random positions in the input. I’m assuming you describe something that looks like 1) get the number of keys from the fuzzer output 2) get the positions to insert each at from the fuzzer output. I may be underestimating the fuzzer’s capabilities but it sounds much less efficient.
    • Including valid keys in a dictionary. Again, i’m not very familiar with the inernals of fuzzing engines but i suspect this would make the fuzzer grind the key with virtually no chance of finding another valid one, wasting a lot of cycles.
    • I don’t think keeping the raw seed human readable for debugging over RPC should generally be a goal, especially not at the expense of efficiency. To get something readable you can simply run ./src/test/fuzz/fuzz ./crash-XXXXX with a ToString() printed to stdout.
  11. sipa commented at 3:19 pm on June 14, 2023: member
    Another alternative may be to just perform search-replace in the fuzz-read string before handing it to the parser? Eg anything of the form “%XX” where XX is two hex characters, is replaced by a lookup in a table.
  12. darosior commented at 3:21 pm on June 14, 2023: member

    Hmm good point. Using a distinguishable character fixes the issue of having to “basically reimplement descriptor parsing logic”. ——- Original Message ——- On Wednesday, June 14th, 2023 at 5:19 PM, Pieter Wuille @.***> wrote:

    Another alternative may be to just perform search-replace in the fuzz-read string before handing it to the parser? Eg anything of the form “%XX” where XX is two hex characters, is replaced by a lookup in a table.

    — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

  13. maflcko commented at 3:31 pm on June 14, 2023: member

    I may be underestimating the fuzzer’s capabilities but it sounds much less efficient.

    I am happy to run a bench, if you happen to have a bug laying around :)

  14. darosior commented at 3:35 pm on June 14, 2023: member

    I don’t, but i think Pieter settled the debate anyways. :) ——- Original Message ——- On Wednesday, June 14th, 2023 at 5:31 PM, MacrabFalke @.***> wrote:

    I may be underestimating the fuzzer’s capabilities but it sounds much less efficient.

    I am happy to run a bench, if you happen to have a crash+bug laying around :)

    — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

  15. darosior force-pushed on Jun 15, 2023
  16. darosior force-pushed on Jun 15, 2023
  17. DrahtBot added the label CI failed on Jun 15, 2023
  18. darosior force-pushed on Jun 15, 2023
  19. darosior commented at 2:42 pm on June 15, 2023: member

    Updated following @sipa’s suggestion of using a marker to be able to search and replace in the descriptor string directly.

    The diff is now twice smaller, and this does not touch the descriptor parsing logic anymore. After running the new mocked_descriptor_parse target for half a hour on my laptop i get more branch coverage for descriptor.cpp than with the existing corpus for descriptor_parse.

    descriptor_parse

    image

    mocked_descriptor_parse

    image

  20. DrahtBot removed the label CI failed on Jun 15, 2023
  21. in src/test/fuzz/descriptor_parse.cpp:48 in 845810d94d outdated
    42+
    43+            // If this is a "raw" key, generate a normal privkey. Otherwise generate
    44+            // an extended one.
    45+            if (IdIsCPubKey(i) || IdIsXOnlyPubKey(i) || IdIsConstPrivKey(i)) {
    46+                CKey privkey;
    47+                privkey.Set(key_data, key_data + 32, true);
    


    achow101 commented at 5:08 pm on June 15, 2023:

    In 845810d94df83747634a60ff59635a670bba1124 “fuzz: add a new, more efficient, descriptor parsing target”

    Perhaps also include uncompressed keys?


    darosior commented at 8:43 am on June 16, 2023:
    Good call, done.
  22. darosior force-pushed on Jun 16, 2023
  23. in src/test/fuzz/descriptor_parse.cpp:51 in bf1d242e44 outdated
    47+                CKey privkey;
    48+                privkey.Set(key_data, key_data + 32, true);
    49+                if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i)) {
    50+                    CPubKey pubkey{privkey.GetPubKey()};
    51+                    if (IdIsUnCompPubKey(i)) assert(pubkey.Decompress());
    52+                    keys_str[i] = HexStr(pubkey);
    


    achow101 commented at 5:22 pm on June 16, 2023:

    In fad60fc6f6486d3b28de4bbe937e293f65d5cbd5 “fuzz: increase coverage of the descriptor targets”

    The second argument of Set is the compressedness, so this can be simplified a bit:

    0                privkey.Set(key_data, key_data + 32, IdIsCompPubKey(i));
    1                if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i)) {
    2                    CPubKey pubkey{privkey.GetPubKey()};
    3                    keys_str[i] = HexStr(pubkey);
    

    darosior commented at 12:16 pm on June 18, 2023:

    Done.

     0diff --git a/src/test/fuzz/descriptor_parse.cpp b/src/test/fuzz/descriptor_parse.cpp
     1index 9a7b6ea308..529dadadc1 100644
     2--- a/src/test/fuzz/descriptor_parse.cpp
     3+++ b/src/test/fuzz/descriptor_parse.cpp
     4@@ -45,10 +45,9 @@ public:
     5             // an extended one.
     6             if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i) || IdIsXOnlyPubKey(i) || IdIsConstPrivKey(i)) {
     7                 CKey privkey;
     8-                privkey.Set(key_data, key_data + 32, true);
     9+                privkey.Set(key_data, key_data + 32, !IdIsUnCompPubKey(i));
    10                 if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i)) {
    11                     CPubKey pubkey{privkey.GetPubKey()};
    12-                    if (IdIsUnCompPubKey(i)) assert(pubkey.Decompress());
    13                     keys_str[i] = HexStr(pubkey);
    14                 } else if (IdIsXOnlyPubKey(i)) {
    15                     const XOnlyPubKey pubkey{privkey.GetPubKey()};
    
  24. darosior force-pushed on Jun 18, 2023
  25. luke-jr commented at 2:11 pm on June 24, 2023: member
    Suggest putting “Fuzz” in title, and labelling.
  26. darosior renamed this:
    A more efficient descriptor parsing target
    Fuzz: a more efficient descriptor parsing target
    on Jun 27, 2023
  27. DrahtBot added the label Tests on Jun 27, 2023
  28. DrahtBot added the label Needs rebase on Jul 17, 2023
  29. fuzz: make the parsed descriptor testing into a function
    We'll be reusing it in the new target.
    d60229ede5
  30. darosior force-pushed on Jul 21, 2023
  31. DrahtBot removed the label Needs rebase on Jul 21, 2023
  32. DrahtBot added the label CI failed on Jul 21, 2023
  33. maflcko commented at 10:42 am on July 21, 2023: member

    For testing I’ve injected a bug in currently uncovered code:

     0diff --git a/src/script/descriptor.cpp b/src/script/descriptor.cpp
     1index 09ded5fc61..b69db182ab 100644
     2--- a/src/script/descriptor.cpp
     3+++ b/src/script/descriptor.cpp
     4@@ -509,7 +509,7 @@ public:
     5         out = "[" + origin_str + "]" + EncodeExtPubKey(xpub) + FormatHDKeypath(end_path);
     6         if (IsRange()) {
     7             out += "/*";
     8-            assert(m_derive == DeriveType::UNHARDENED);
     9+            assert(m_derive != DeriveType::UNHARDENED); // Injected BUG!! (bad)
    10         }
    11         return true;
    12     }
    

    But it looks like the fuzz target just crashed immediately anyway (see CI)

  34. darosior force-pushed on Jul 21, 2023
  35. darosior commented at 12:44 pm on July 21, 2023: member
    Thanks for testing. Fixed the issue. I also introduced the bug you shared and it does make the target crash.
  36. DrahtBot removed the label CI failed on Jul 21, 2023
  37. maflcko commented at 4:44 pm on July 21, 2023: member

    Did quick check to see how many iterations it would take to find my injected bug with libfuzzer:

    • -use_value_profile=1: pull_uvp1

    • -use_value_profile=0: pull_uvp0

  38. in src/test/fuzz/descriptor_parse.cpp:60 in 84dee4fe69 outdated
    55+                } else {
    56+                    keys_str[i] = EncodeSecret(privkey);
    57+                }
    58+            } else {
    59+                CExtKey ext_privkey;
    60+                ext_privkey.SetSeed({(std::byte *)key_data, 32});
    


    maflcko commented at 4:55 pm on July 21, 2023:

    Nit: If you make key_data std::byte, you can avoid the c-style cast:

     0diff --git a/src/test/fuzz/descriptor_parse.cpp b/src/test/fuzz/descriptor_parse.cpp
     1index 1d2b672260..84dc005575 100644
     2--- a/src/test/fuzz/descriptor_parse.cpp
     3+++ b/src/test/fuzz/descriptor_parse.cpp
     4@@ -36,16 +36,16 @@ public:
     5     //! When initializing the target, populate the list of keys.
     6     void Init() {
     7         // The data to use as a private key or a seed for an xprv.
     8-        uint8_t key_data[32] = {1};
     9+        std::array<std::byte, 32> key_data{std::byte{1}};
    10         // Generate keys of all kinds and store them in the keys array.
    11         for (size_t i{0}; i < TOTAL_KEYS_GENERATED; i++) {
    12-            key_data[31] = i;
    13+            key_data[31] = std::byte(i);
    14 
    15             // If this is a "raw" key, generate a normal privkey. Otherwise generate
    16             // an extended one.
    17             if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i) || IdIsXOnlyPubKey(i) || IdIsConstPrivKey(i)) {
    18                 CKey privkey;
    19-                privkey.Set(key_data, key_data + 32, !IdIsUnCompPubKey(i));
    20+                privkey.Set(UCharCast(key_data.begin()), UCharCast(key_data.end()), !IdIsUnCompPubKey(i));
    21                 if (IdIsCompPubKey(i) || IdIsUnCompPubKey(i)) {
    22                     CPubKey pubkey{privkey.GetPubKey()};
    23                     keys_str[i] = HexStr(pubkey);
    24@@ -57,7 +57,7 @@ public:
    25                 }
    26             } else {
    27                 CExtKey ext_privkey;
    28-                ext_privkey.SetSeed({(std::byte *)key_data, 32});
    29+                ext_privkey.SetSeed(key_data);
    30                 if (IdIsXprv(i)) {
    31                     keys_str[i] = EncodeExtKey(ext_privkey);
    32                 } else {
    

    darosior commented at 5:07 pm on July 21, 2023:

    It’s not avoiding them, it’s hiding them behind UCharCast? :p

    It boils down to choosing between casting to uint8_t or to std::byte, and i went for the option with less casts.


    maflcko commented at 5:09 pm on July 21, 2023:

    Yeah this doesn’t matter for the fuzz tests.

    I only left the comment, so that it would be easier if CKey::Set was changed to take std::byte, it would be a smaller diff in the future.


    darosior commented at 5:10 pm on July 21, 2023:
    Happy to do it then.

    darosior commented at 5:13 pm on July 21, 2023:

    Done.

    (However i just checked and UCharCast does indeed use C-style casts) https://github.com/bitcoin/bitcoin/blob/d23fda05842ba4539b225bbab01b94df0060f697/src/span.h#L272

    EDIT: (Oh but the const version doesn’t) https://github.com/bitcoin/bitcoin/blob/d23fda05842ba4539b225bbab01b94df0060f697/src/span.h#L275

    EDIT2: should call pastapastapasta to fix this mess

  39. in src/test/fuzz/descriptor_parse.cpp:84 in 84dee4fe69 outdated
    79+    //! Get a valid descriptor string from a descriptor string whose keys were mocked.
    80+    std::optional<std::string> GetDescriptor(std::string_view mocked_desc) const {
    81+        // The smallest fragment would be "pk(%00)"
    82+        if (mocked_desc.size() < 7) return {};
    83+
    84+        // The actual, valid, descriptor string to be returned.
    


    maflcko commented at 4:57 pm on July 21, 2023:
    0        // The actual, descriptor string to be returned.
    

    pretty sure it is not valid


    darosior commented at 5:08 pm on July 21, 2023:
    Done, thanks. (And also in the function’s doc comment.)

    maflcko commented at 5:10 pm on July 21, 2023:
    Done in the wrong commit, though? :sweat_smile:

    darosior commented at 5:13 pm on July 21, 2023:
    :facepalm:

    darosior commented at 5:15 pm on July 21, 2023:
    And i can’t even blame it on anything but myself this time! Fixed.
  40. maflcko approved
  41. maflcko commented at 5:00 pm on July 21, 2023: member

    nice, lgtm ACK 84dee4fe690e08a5adaad1c78530666da07075d8 🦄

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: nice, lgtm ACK 84dee4fe690e08a5adaad1c78530666da07075d8 🦄
    3ag2blJR9VoWl2NoAXwUZSDU54s5hPaFz7IJt+iLlaSMLYtYwbr4/XMw4CW9xm1A+BXAn9ZXtWQxrwFSRr/COBg==
    
  42. darosior force-pushed on Jul 21, 2023
  43. darosior force-pushed on Jul 21, 2023
  44. fuzz: add a new, more efficient, descriptor parsing target
    This new target focuses on fuzzing the actual descriptor parsing logic
    by not requiring the fuzzer to produce valid keys (nor a valid checksum
    for that matter).
    This should make it much more efficient to find bugs we could introduce
    moving forward.
    
    Using a character as a marker (here '%') to be able to search and
    replace in the string without having to mock the actual descriptor
    parsing logic was an insight from Pieter Wuille.
    90a24741e7
  45. fuzz: increase coverage of the descriptor targets
    Once a descriptor is successfully parsed, execute more of its methods.
    There is probably still room for improvements by checking for some
    invariants, but this is a low hanging fruit that significantly increases
    the code coverage of these targets.
    131314b62e
  46. darosior force-pushed on Jul 21, 2023
  47. maflcko commented at 5:18 pm on July 21, 2023: member

    re-ACK 131314b62e899f95d1863083d303b489b3212b16 🐓

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: re-ACK 131314b62e899f95d1863083d303b489b3212b16  🐓
    3PiOVYYPtzobuHwHpzUHpihKOhhWkacXf+ZxTpk/y7EwMzhRSE9kfEk+pz/XQl8QIUv2W4fuwNKvY2pFyASt8Bg==
    
  48. darosior requested review from achow101 on Jul 26, 2023
  49. achow101 commented at 5:47 pm on July 27, 2023: member
    ACK 131314b62e899f95d1863083d303b489b3212b16
  50. DrahtBot removed review request from achow101 on Jul 27, 2023
  51. achow101 merged this on Jul 27, 2023
  52. achow101 closed this on Jul 27, 2023

  53. in src/test/fuzz/descriptor_parse.cpp:77 in 131314b62e
    72+    std::optional<uint8_t> IdxFromHex(std::string_view hex_characters) const {
    73+        if (hex_characters.size() != 2) return {};
    74+        auto idx = ParseHex(hex_characters);
    75+        if (idx.size() != 1) return {};
    76+        return idx[0];
    77+    }
    


    maflcko commented at 8:11 am on July 28, 2023:
    Could just use the raw (single) byte here, but that would interfere with libFuzzer -only_ascii=1, which makes me wonder if this is the first ascii fuzz target and whether we should set the option somewhere somehow during input generation?

    maflcko commented at 10:14 am on July 28, 2023:

    Fuzz inputs until crash

    Funny how -only_ascii=1 performs worse than -only_ascii=0 (https://github.com/bitcoin/bitcoin/pull/27888#issuecomment-1645976254).


    maflcko commented at 10:15 am on July 28, 2023:
    (or at least, not significantly better)

    darosior commented at 10:16 am on July 28, 2023:
    Maybe the target returning early on non-ASCII has basically the same effect?
  54. darosior deleted the branch on Jul 28, 2023
  55. maflcko commented at 10:46 am on July 28, 2023: member

    -mutate_depth=3 seems to be the best so far:

    Fuzz inputs until crash

  56. maflcko commented at 11:41 am on July 28, 2023: member

    With -mutate_depth=14 being worse (be aware that the y-axis no longer matches to all previous plots)

    Fuzz inputs until crash

  57. sidhujag referenced this in commit b11d694c10 on Aug 9, 2023
  58. fanquake referenced this in commit ee7e4b0e40 on Feb 27, 2024
  59. bitcoin locked this on Jul 27, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 19:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me