fuzz: provide more realistic values to the base58(check) decoders #31917

pull l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:l0rinc/base-encoding-input-fuzz-size-reduction changing 1 files +25 −26
  1. l0rinc commented at 4:23 pm on February 20, 2025: contributor

    This is a follow-up to #30746, expanding coverage by:

    • restricting every input for the base58 conversions, capping max sizes to 100 instead of 1000 or all available input (suggested by marcofleon in #30746 (review)) since most actual usage has lengths of e.g. 21, 34, 78.
    • providing more valid values to the decoder (suggested by maflcko in #30746 (review)) by randomly providing a random input or a valid encoded one; this also enables unifying the roundtrip tests to a single roundtrip per fuzz.
  2. DrahtBot commented at 4:23 pm on February 20, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31917.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK mzumsande, maflcko
    Concept ACK arejula27

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Tests on Feb 20, 2025
  4. in src/test/fuzz/base_encode_decode.cpp:43 in 21b8b8464b outdated
    50 {
    51-    FuzzedDataProvider provider(buffer.data(), buffer.size());
    52-    const std::string random_string{provider.ConsumeRandomLengthString(1000)};
    53-    const int max_ret_len{provider.ConsumeIntegralInRange<int>(-1, 1000)};
    54+    // Encode/Decode roundtrip
    55+    const auto encoded{EncodeBase58Check(buffer)};
    


    marcofleon commented at 4:30 pm on February 20, 2025:

    The problem actually turned out to be this. The input that times out is about 1MB so passing in the whole buffer causes the timeout.

    I don’t think the change from 1000 to 100 for both random_string and max_ret_len makes a difference.


    brunoerg commented at 5:33 pm on February 20, 2025:
    I think we use this function on the RPC target and we limit the length to 64.

    l0rinc commented at 5:55 pm on February 20, 2025:
    Thanks for the help, I pushed a new version where every input is restricted (used 100 to be slightly different from the RPC one you’ve mentioned) and <1000, even though it seems that wasn’t the actual bottleneck.
  5. l0rinc force-pushed on Feb 20, 2025
  6. l0rinc force-pushed on Feb 20, 2025
  7. l0rinc force-pushed on Feb 21, 2025
  8. arejula27 commented at 1:15 pm on February 23, 2025: none

    Concept ACK a63c5c1a8a012b694

    I’ve reviewed the changes and have one main observation:

    1. Fuzz Test Limitations

    Limiting fuzz to the most common sizes makes sense for efficiency. However, if long strings are still expected to be supported, there should be explicit test coverage for them.

    I checked the unit tests for Base58 in src/test/base58_tests.cpp, which reads test cases from test/data/base58_encode_decode.json. The longest existing test case appears on line 18, with a hex-encoded string of size 512. Given that this PR removes the long string test from the fuzz tests, it may be beneficial to add a new test case in base58_encode_decode.json to verify behaviour at 1000 characters.

    2. Fuzz Test Execution

    After static review, I ran the fuzz tests using:

    0$ cmake --preset=libfuzzer
    1$ cmake --build build_fuzz -j "$(($(nproc)+1))"
    2
    3$ FUZZ=base58_encode_decode build_fuzz/src/test/fuzz/fuzz -max_total_time=30 -
    4$ FUZZ=base58check_encode_decode build_fuzz/src/test/fuzz/fuzz -max_total_time=30
    5$ FUZZ=base32_encode_decode build_fuzz/src/test/fuzz/fuzz -max_total_time=30 
    6$ FUZZ=base64_encode_decode build_fuzz/src/test/fuzz/fuzz -max_total_time=30 
    7$ FUZZ=psbt_base64_decode build_fuzz/src/test/fuzz/fuzz -max_total_time=30 
    

    All tests passed successfully within the 30-second run.

  9. maflcko added this to the milestone 29.0 on Feb 26, 2025
  10. in src/test/fuzz/base_encode_decode.cpp:62 in 23fc349959 outdated
    76 
    77 FUZZ_TARGET(base32_encode_decode)
    78 {
    79-    const std::string random_string{buffer.begin(), buffer.end()};
    80+    FuzzedDataProvider provider{buffer.data(), buffer.size()};
    81+    const auto random_string{provider.ConsumeRandomLengthString(100)};
    


    maflcko commented at 8:27 am on February 28, 2025:
    Why are you restricting this to 100? It wasn’t restricted before

    l0rinc commented at 2:21 pm on February 28, 2025:
    to not use all the buffer, where we had problems before. Do you suggest we use the whole buffer here, like before, or to give it a bigger max value?

    maflcko commented at 2:36 pm on February 28, 2025:

    to not use all the buffer, where we had problems before.

    What problems? A problem was introduced in the base58 fuzz target in the previous change, however you are changing the base32/64 here and below.

    Generally, if there is no reason to change the code, I think it shouldn’t be changed. Otherwise, it will just be confusing, hard to review, and may even introduce problems.


    l0rinc commented at 2:54 pm on February 28, 2025:

    What problems? A problem was introduced in the base58 fuzz target in the previous change, however you are changing the base32/64 here and below.

    I’ve learned that using the whole buffer can be problematic for long running tasks, and since base conversion is mostly scale dependent anyway, we’d get better coverage of special values (e.g. all zeros, powers of the base, etc), by using a value that isn’t too large.

    there is no reason to change the code

    This PR is meant to unify the related base fuzz tests. I’d argue it’s easier to review them if they all follow the same steps. This is a follow-up to a previous PR, we’re not modifying legacy code here.


    l0rinc commented at 5:26 pm on February 28, 2025:
    Kept the base58 fuzz changes only, removed the base[32/64] changes
  11. in src/test/fuzz/base_encode_decode.cpp:78 in 23fc349959 outdated
     95 
     96 FUZZ_TARGET(base64_encode_decode)
     97 {
     98-    const std::string random_string{buffer.begin(), buffer.end()};
     99+    FuzzedDataProvider provider{buffer.data(), buffer.size()};
    100+    const auto random_string{provider.ConsumeRandomLengthString(100)};
    


    maflcko commented at 8:27 am on February 28, 2025:
    same
  12. maflcko removed this from the milestone 29.0 on Feb 28, 2025
  13. l0rinc force-pushed on Feb 28, 2025
  14. in src/test/fuzz/base_encode_decode.cpp:69 in ac4b10f702 outdated
    65@@ -67,9 +66,9 @@ FUZZ_TARGET(base32_encode_decode)
    66         assert(encoded_string == ToLower(TrimStringView(random_string)));
    67     }
    68     // Encode/Decode roundtrip
    69-    const auto encoded{EncodeBase32(buffer)};
    70+    const auto encoded{EncodeBase32(random_string)};
    


    mzumsande commented at 8:26 pm on March 5, 2025:
    this is changed back and fourth between the two commits, probably a rebase error.

    l0rinc commented at 9:32 pm on March 5, 2025:
    Thanks for the review. It was a rebase error indeed, thanks for catching it.
  15. in src/test/fuzz/base_encode_decode.cpp:31 in ac4b10f702 outdated
    32-    if (DecodeBase58(random_string, decoded, max_ret_len)) {
    33+    if (std::vector<unsigned char> decoded; DecodeBase58(random_string, decoded, max_ret_len)) {
    34         const auto encoded_string{EncodeBase58(decoded)};
    35         assert(encoded_string == TrimStringView(random_string));
    36-        assert(encoded_string.empty() || !DecodeBase58(encoded_string, decoded, provider.ConsumeIntegralInRange<int>(0, decoded.size() - 1)));
    37+        assert(decoded.empty() || !DecodeBase58(encoded_string, decoded, provider.ConsumeIntegralInRange<int>(0, decoded.size() - 1)));
    


    mzumsande commented at 8:39 pm on March 5, 2025:
    why is the check changed from encoded_string.empty() to decoded.empty() in the first commit - especially if this function is being rewritten in the next commit anyway?

    l0rinc commented at 9:32 pm on March 5, 2025:
    it’s the decoded size that determines if the (0, decoded.size() - 1) range makes sense or not - I’m correcting it in the first commit since this was a bug that could cause a fuzz failure (added explanation to commit message). I’m reordering in the second commit so that we have single roundtrip logic instead of 2 separate ones.

    mzumsande commented at 6:53 pm on March 13, 2025:
    ok, makes sense, but I’d challenge that this is “a bug that could cause a fuzz failure” (otherwise I would’ve expected this failure to be found by now): My understanding is that in the relevant special case, both decoded an encoded_string are empty. However, it should be impossible for encoded_string to be empty and decoded non-empty (and vice versa) because of how base58 works.
  16. fuzz: Always restrict base conversion input lengths
    They seem to cause timeouts:
    > Issue 397734700: bitcoin-core:base58check_encode_decode: Timeout in base58check_encode_decode
    
    The `encoded_string.empty()` check was corrected here to `decoded.empty()` to make sure the `(0, decoded.size() - 1)` range is always valid.
    
    Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
    Co-authored-by: marcofleon <marleo23@proton.me>
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    bad1433ef2
  17. fuzz: make sure DecodeBase58(Check) is called with valid values more often
    In Base58 fuzz the two roundtrips are merged now, the new `decode_input` switches between a completely random input and a valid encoded one, to make sure the decoding passes more often.
    The `max_ret_len` can also exceed the original length now and is being validated more thoroughly.
    
    Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
    Co-authored-by: marcofleon <marleo23@proton.me>
    d5537c18a9
  18. l0rinc force-pushed on Mar 5, 2025
  19. mzumsande commented at 6:54 pm on March 13, 2025: contributor
    Code Review / lightly tested ACK d5537c18a9034647ba4c9ed4008abd7fee33989e
  20. maflcko commented at 12:47 pm on March 15, 2025: member

    review ACK d5537c18a9034647ba4c9ed4008abd7fee33989e 🚛

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: review ACK d5537c18a9034647ba4c9ed4008abd7fee33989e 🚛
    3lZRbdgnghqtziDGBOrJTlPhuDIsZdigxD/VS2S8f5+dWrvPvXpUWSzcRe/nWCWjUTD2bz/MArCoV2/DrCVmxDA==
    
  21. maflcko added this to the milestone 29.0 on Mar 15, 2025
  22. maflcko commented at 12:50 pm on March 15, 2025: member
    (In theory this could be backported to 29.0 to avoid a fuzz timeout there, but it isn’t too important, up to the backport maintainer)
  23. fanquake merged this on Mar 16, 2025
  24. fanquake closed this on Mar 16, 2025

  25. fanquake added the label Needs backport (29.x) on Mar 17, 2025
  26. glozow referenced this in commit 15ecae31a8 on Mar 17, 2025
  27. glozow referenced this in commit 458655bca8 on Mar 17, 2025
  28. glozow commented at 2:14 am on March 17, 2025: member
    Backported in #32062
  29. glozow removed the label Needs backport (29.x) on Mar 17, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-29 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me