test: cover base32/base58/base64 with symmetric roundtrip fuzz (and padding) tests #30746

pull l0rinc wants to merge 4 commits into bitcoin:master from l0rinc:l0rinc/roundtrip-tests-base32-base64 changing 5 files +114 −50
  1. l0rinc commented at 6:58 am on August 29, 2024: contributor
    Added fuzzed roundtrips for base32/base58/base64 encoding to make sure encoding/decoding are symmetric. Note that if we omit the padding in EncodeBase32 we won’t be able to decode it with DecodeBase32. Added dedicated padding tests to cover failure behavior Also moved over the Base58 json test edge cases from https://github.com/bitcoin/bitcoin/pull/30035
  2. DrahtBot commented at 6:58 am on August 29, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30746.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK hodlinator
    Stale ACK marcofleon

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. l0rinc marked this as ready for review on Aug 29, 2024
  4. l0rinc renamed this:
    Add base32 and base64 roundtrip random padding tests
    test: cover base32/base64 with random roundtrip padding tests
    on Aug 29, 2024
  5. DrahtBot added the label Tests on Aug 29, 2024
  6. in src/test/base32_tests.cpp:60 in caddc34c6c outdated
    56+    BOOST_CHECK( DecodeBase32("aaaaa==="s));
    57+    BOOST_CHECK(!DecodeBase32("aaaaaa=="s));
    58+    BOOST_CHECK( DecodeBase32("aaaaaaa="s));
    59+}
    60+
    61+BOOST_AUTO_TEST_CASE(base32_roundtrip) {
    


    fjahr commented at 2:54 pm on August 29, 2024:
    Shouldn’t this be a fuzzing test?

    l0rinc commented at 5:08 pm on August 29, 2024:
    I’ve modelled these based on https://github.com/bitcoin/bitcoin/blob/master/src/test/base58_tests.cpp#L84-L98 - but I guess the existing fuzz tests already cover this as well, so I’ll migrate them all to https://github.com/bitcoin/bitcoin/blob/master/src/test/fuzz/base_encode_decode.cpp#L20-L54, thanks for the hint

    l0rinc commented at 5:29 pm on August 29, 2024:
    Done
  7. in src/test/base64_tests.cpp:63 in caddc34c6c outdated
    62+    BOOST_CHECK(!DecodeBase64("a==="s));
    63+    BOOST_CHECK( DecodeBase64("YQ=="s));
    64+    BOOST_CHECK( DecodeBase64("YWE="s));
    65+}
    66+
    67+BOOST_AUTO_TEST_CASE(base64_roundtrip) {
    


    fjahr commented at 2:54 pm on August 29, 2024:
    Here too, this is probably better implemented as a fuzzing test.

    l0rinc commented at 5:08 pm on August 29, 2024:
  8. l0rinc force-pushed on Aug 29, 2024
  9. l0rinc renamed this:
    test: cover base32/base64 with random roundtrip padding tests
    test: cover base32/base58/base64 with symmetric roundtrip fuzz (and some padding) tests
    on Aug 29, 2024
  10. l0rinc renamed this:
    test: cover base32/base58/base64 with symmetric roundtrip fuzz (and some padding) tests
    test: cover base32/base58/base64 with symmetric roundtrip fuzz (and padding) tests
    on Aug 29, 2024
  11. DrahtBot added the label CI failed on Aug 30, 2024
  12. l0rinc force-pushed on Aug 30, 2024
  13. l0rinc force-pushed on Aug 30, 2024
  14. l0rinc force-pushed on Aug 31, 2024
  15. l0rinc force-pushed on Sep 1, 2024
  16. l0rinc commented at 4:24 pm on September 1, 2024: contributor
    @maflcko, the Fuzz tests seem to be timing out now, is the new setup too heavy, should I maybe add them to different suites?
  17. maflcko commented at 5:55 am on September 2, 2024: member

    @maflcko, the Fuzz tests seem to be timing out now, is the new setup too heavy, should I maybe add them to different suites?

    Did you run them locally, and can point out which commit makes them time out? Is there a flame graph, or some other kind of benchmark or performance debugging that would help tack this down?

    Absent that, it seems odd that the fuzz tests would time out here.

  18. l0rinc force-pushed on Sep 2, 2024
  19. l0rinc force-pushed on Sep 2, 2024
  20. l0rinc commented at 2:29 pm on September 2, 2024: contributor

    it seems odd that the fuzz tests would time out here

    Split the big fuzz target into separate base58_encode_decode / base58check_encode_decode / base32_encode_decode / base64_encode_decode / psbt_base64_decode - it’s cleaner like this anyway.

    Edit: seems this fixed the timeout

  21. in src/test/fuzz/base_encode_decode.cpp:67 in 58fc3ce0a3 outdated
    79+    if (auto result{DecodeBase32(random_string)}) {
    80+        auto encoded_string{EncodeBase32(*result)};
    81+        assert(encoded_string == ToLower(TrimString(random_string)));
    82     }
    83+    // Encode/Decode roundtrip
    84+    auto encoded{EncodeBase32(random_data)};
    


    dergoegge commented at 10:06 am on September 10, 2024:

    Reusing the same data from the fuzz input for two different things (e.g. decode/encode and encode/decode) is usually not a good idea.

    How about adding a decode step after the encoding above (i.e. decode -> encode -> decode)? Or alternatively split this second round-trip into its own harness as well?


    l0rinc commented at 11:22 am on September 10, 2024:

    Thanks for the review!

    is usually not a good idea

    Can you please expand on that? I don’t mind splitting, as you’ve recommended, but I have to understand what the current drawbacks are.


    dergoegge commented at 1:03 pm on September 10, 2024:
    It might be less obvious (and maybe not even a problem) for this simple harness but the decode/encode and encode/decode round-trips are different tests. Their input spaces are not the same, which means that reusing the same fuzz input for both leads to less efficient fuzzing (e.g. the corpus will include inputs that are only relevant to one of the tests but both will be executed).

    maflcko commented at 1:35 pm on September 10, 2024:

    There is also an overhead that we probably don’t want a separate fuzz target for every trivial function, because each fuzz target may be put into a separate binary and is allocated a separate folder. So in very trivial cases, I think it is fine to combine several trivial function calls into one fuzz target.

    Encode/Decode of a base seems to be trivial enough. I think the previous CI timeout may be unrelated #30746 (comment) , because there are general CI issues right now. Generally, it is best to not rely on CI and separately double-check and confirm any result locally.

    If you want to combine the test cases, but not want to always exclude both test branches, the CallOneOf helper can be used:

    0        CallOneOf(
    1            fuzzed_data_provider,
    2            [&] {
    3                Branch1(fuzzed_data_provider);
    4            },
    5            [&] {
    6                Branch2(fuzzed_data_provider);
    7        });
    

    l0rinc commented at 2:35 pm on September 10, 2024:
    Thanks @maflcko. @dergoegge, would a CallOneOf based solution satisfy your needs?

    dergoegge commented at 10:59 am on September 13, 2024:

    There is also an overhead that we probably don’t want a separate fuzz target for every trivial function, because each fuzz target may be put into a separate binary and is allocated a separate folder.

    I don’t think the binary sizes would be a problem after we merge something like #30882 and the extra folder for the corpus is also not a huge cost (they compress really nicely).

    I’d prefer to follow best practices over having to weigh triviality against some external overhead of running these tests. “Put separate tests into separate harnesses” is easier to follow than “aggregate trivial functions in one harness” because it is entirely subjective. What is trivial? How many unrelated trivial functions are allowed in one harness? I think this creates confusion for devs that aren’t experts on fuzzing.

    would a CallOneOf based solution satisfy your needs?

    I don’t think this solves the issue, e.g. mutations that change the calloneof branch result in completely changing the meaning of the fuzz input (i.e. switching between the different tests).

    I don’t mean to derail this PR, it’s already improving things by splitting up base_encode_decode. Feel free to leave it as is.


    maflcko commented at 11:38 am on September 13, 2024:

    I don’t think the binary sizes would be a problem after we merge something like #30882 and the extra folder for the corpus is also not a huge cost (they compress really nicely).

    Are you sure? IIUC 30882 creates a separate binary for each fuzz target, so depending on the compile/link options (static, without lto, …), each fuzz target may or may not be heavy. For example, the OSS-Fuzzing already went down due to storage limitations (https://github.com/google/oss-fuzz/pull/12232). I am not saying that OSS-Fuzz should be considered a blocker, given that there are now several in-house backups, but I wanted to mention it for context.

    I’d prefer to follow best practices over having to weigh triviality against some external overhead of running these tests. “Put separate tests into separate harnesses” is easier to follow than “aggregate trivial functions in one harness” because it is entirely subjective. What is trivial? How many unrelated trivial functions are allowed in one harness? I think this creates confusion for devs that aren’t experts on fuzzing.

    I think I agree that it would be ideal to have an easy/trivial guideline to follow. However, I still think there is some threshold where a separate fuzz target (or maybe even a fuzz target at all) may not be the best choice. There are many pure util functions that (in theory) could go into a separate fuzz target. For example, StringForFeeEstimateHorizon https://github.com/bitcoin/bitcoin/blob/e43ce250c6fb65cc0c8903296c8ab228539d2204/src/test/fuzz/kitchen_sink.cpp#L45 should (in theory) be moved to a separate target. However, it is such a trivial function (a single switch-case) that linking the binary and starting it takes longer than reaching full coverage from an empty initial seed corpus. Maybe the function is too trivial to be put into a fuzz target at all, but I think there are plenty other pure util functions, such that there could easily be 500-1500 fuzz targets with similar properties. Of course there is nothing inherently wrong with having so many fuzz targets, but I could imagine that some tooling (introspector, coverage reports per fuzz target) or humans consuming the tool’s output could be overwhelmed by the number of trivial fuzz targets and miss the actually important ones.


    l0rinc commented at 11:45 am on September 13, 2024:

    Thanks for the directions @dergoegge, @maflcko.

    I don’t mean to derail this PR, it’s already improving things by splitting up base_encode_decode. Feel free to leave it as is.

    I’m all for separation of concerns, that’s what I’ve done here as well. If there’s a simple and non-controvertial way to make it even better, let me know!


    hodlinator commented at 2:02 pm on October 26, 2024:

    (My instinct would be to go with the following to avoid rocking the boat when it comes to seed corpus naming etc, but I’m too new to this aspect of fuzzing to carry much weight).

     0void base58_encode_decode(FuzzBufferType buffer);
     1void base58check_encode_decode(FuzzBufferType buffer);
     2void base32_encode_decode(FuzzBufferType buffer);
     3void base64_encode_decode(FuzzBufferType buffer);
     4void psbt_base64_decode(FuzzBufferType buffer);
     5
     6FUZZ_TARGET(base_encode_decode)
     7{
     8    base58_encode_decode(buffer);
     9    base58check_encode_decode(buffer);
    10    base32_encode_decode(buffer);
    11    base64_encode_decode(buffer);
    12    psbt_base64_decode(buffer);
    13}
    

    l0rinc commented at 11:49 am on October 28, 2024:
    Is this a blocker for anyone? If anyone has strong feelings about this, please let me know, we can either restructure here or provide it in another PR. Resolving for now.
  22. deep709 approved
  23. maflcko removed the label CI failed on Sep 12, 2024
  24. Oaksta85 approved
  25. achow101 requested review from marcofleon on Oct 15, 2024
  26. achow101 commented at 3:58 pm on October 15, 2024: member
  27. DrahtBot added the label CI failed on Oct 18, 2024
  28. l0rinc force-pushed on Oct 24, 2024
  29. DrahtBot removed the label CI failed on Oct 24, 2024
  30. in src/test/fuzz/base_encode_decode.cpp:27 in ad917fb105 outdated
    30-        const std::string encoded_string = EncodeBase58(decoded);
    31-        assert(encoded_string == TrimStringView(encoded_string));
    32-        assert(ToLower(encoded_string) == ToLower(TrimString(random_encoded_string)));
    33+    if (DecodeBase58(random_string, decoded, 100)) {
    34+        auto encoded_string{EncodeBase58(decoded)};
    35+        assert(encoded_string == TrimString(random_string));
    


    hodlinator commented at 1:05 pm on October 26, 2024:

    Should keep using TrimStringView here to avoid allocating extra std::string instances. Fuzzing code is re-run at high frequency so optimization (without compromising code coverage) is important.

    0        assert(encoded_string == TrimStringView(random_string));
    

    l0rinc commented at 11:48 am on October 28, 2024:
    Done, thanks
  31. in src/test/fuzz/base_encode_decode.cpp:21 in ad917fb105 outdated
    20-FUZZ_TARGET(base_encode_decode)
    21+FUZZ_TARGET(base58_encode_decode)
    22 {
    23-    const std::string random_encoded_string(buffer.begin(), buffer.end());
    24+    std::string random_string(buffer.begin(), buffer.end());
    25+    std::vector<unsigned char> random_data{ToByteVector(random_string)};
    


    hodlinator commented at 1:09 pm on October 26, 2024:

    I understand the change to use random_string as something that might either be encoded already, or as something to be encoded in the later half of these functions, so the rename makes sense. Would avoid stripping const for these slightly longer lived variables, just for peace of mind.

    0    const std::string random_string{buffer.begin(), buffer.end()};
    1    const std::vector<unsigned char> random_data{ToByteVector(random_string)};
    

    Another aspect is that buffer has the type using FuzzBufferType = std::span<const uint8_t> already, so it could potentially be used directly instead of introducing and allocating random_data. (Not super happy about the memcmp though).

     0 FUZZ_TARGET(base58_encode_decode)
     1 {
     2-    std::string random_string(buffer.begin(), buffer.end());
     3-    std::vector<unsigned char> random_data{ToByteVector(random_string)};
     4+    const std::string random_string{buffer.begin(), buffer.end()};
     5 
     6     // Decode/Encode roundtrip
     7     std::vector<unsigned char> decoded;
     8@@ -27,11 +28,12 @@ FUZZ_TARGET(base58_encode_decode)
     9         assert(encoded_string == TrimString(random_string));
    10         assert(encoded_string.empty() || !DecodeBase58(encoded_string, decoded, decoded.size() - 1));
    11     }
    12
    13     // Encode/Decode roundtrip
    14-    auto encoded{EncodeBase58(random_data)};
    15+    auto encoded{EncodeBase58(buffer)};
    16     std::vector<unsigned char> roundtrip_decoded;
    17     assert(DecodeBase58(encoded, roundtrip_decoded, random_string.size())
    18-        && roundtrip_decoded == random_data);
    19+        && !std::memcmp(roundtrip_decoded.data(), buffer.data(), buffer.size()));
    20 }
    

    l0rinc commented at 11:47 am on October 28, 2024:

    Would avoid stripping const for these slightly longer lived variables, just for peace of mind.

    Done, thanks.

    Not super happy about the memcmp though

    agree, memcmp is ugly, but std::ranges::equal should work here.

  32. in src/test/fuzz/base_encode_decode.cpp:94 in ad917fb105 outdated
    112+    std::string random_string(buffer.begin(), buffer.end());
    113 
    114     PartiallySignedTransaction psbt;
    115     std::string error;
    116-    (void)DecodeBase64PSBT(psbt, random_encoded_string, error);
    117+    (void) DecodeBase64PSBT(psbt, random_string, error);
    


    hodlinator commented at 1:27 pm on October 26, 2024:

    Why add the space?

    Could also verify error is returned correctly.

    0    assert(DecodeBase64PSBT(psbt, random_string, error) == error.empty());
    

    l0rinc commented at 11:47 am on October 28, 2024:
    Perfection! ✨👌✨
  33. in src/test/fuzz/base_encode_decode.cpp:28 in ad917fb105 outdated
    27+    // Decode/Encode roundtrip
    28     std::vector<unsigned char> decoded;
    29-    if (DecodeBase58(random_encoded_string, decoded, 100)) {
    30-        const std::string encoded_string = EncodeBase58(decoded);
    31-        assert(encoded_string == TrimStringView(encoded_string));
    32-        assert(ToLower(encoded_string) == ToLower(TrimString(random_encoded_string)));
    


    hodlinator commented at 1:45 pm on October 26, 2024:
    Good that you removed the normalizing of the case for Base58 roundtripping as case is significant there (unlike how we treat base32). :+1:

    l0rinc commented at 11:47 am on October 28, 2024:
    astute observation ;)
  34. in src/test/fuzz/base_encode_decode.cpp:28 in ad917fb105 outdated
    31-        assert(encoded_string == TrimStringView(encoded_string));
    32-        assert(ToLower(encoded_string) == ToLower(TrimString(random_encoded_string)));
    33+    if (DecodeBase58(random_string, decoded, 100)) {
    34+        auto encoded_string{EncodeBase58(decoded)};
    35+        assert(encoded_string == TrimString(random_string));
    36+        assert(encoded_string.empty() || !DecodeBase58(encoded_string, decoded, decoded.size() - 1));
    


    hodlinator commented at 2:29 pm on October 26, 2024:

    Could test slightly more aggressively in line with the removed test: https://github.com/bitcoin/bitcoin/blob/a5b4e42a575b4680a9eedb6ab2275d2a723cc20d/src/test/base58_tests.cpp#L93-L94

    0        assert(encoded_string.empty() || !DecodeBase58(encoded_string, decoded, fuzzed_data_provider.ConsumeIntegralInRange(0, decoded.size() - 1)));
    

    l0rinc commented at 11:47 am on October 28, 2024:
    Good idea, this will simplify the string generation as well since it requires a FuzzedDataProvider anyway
  35. hodlinator commented at 2:32 pm on October 26, 2024: contributor

    Concept ACK ad917fb105b68fb369455ec7c42292a7673db1ad

    • Extra tests/coverage, does not modify files containing the functionality being tested.
    • Does remove base58_random_encode_decode unit test, but everything seems to be carried over to the modified fuzz test (except for testing for a variable count of insufficient result buffer sizes).
  36. l0rinc force-pushed on Oct 28, 2024
  37. in src/test/fuzz/base_encode_decode.cpp:23 in 5bc0a8daab outdated
    22-FUZZ_TARGET(base_encode_decode)
    23+FUZZ_TARGET(base58_encode_decode)
    24 {
    25-    const std::string random_encoded_string(buffer.begin(), buffer.end());
    26+    FuzzedDataProvider provider(buffer.data(), buffer.size());
    27+    const std::string random_string{provider.ConsumeRemainingBytesAsString()};
    


    hodlinator commented at 10:09 pm on October 28, 2024:

    I’m afraid FuzzedDataProvider::remaining_bytes_ becomes zero after this line in the current version, meaning provider.ConsumeIntegralInRange<int>(0, decoded.size() - 1) below will always return the minimum value if I’m reading the ConsumeIntegralInRange-implementation correctly.

    0    const std::string random_string{provider.ConsumeRandomLengthString(101)};
    

    Could maybe (in another PR) add something like bool FuzzedDataProvider::consume_remaining_called_{false} which is set by ConsumeRemainingBytes*() and then assert against it in the various consume methods, to catch cases like these.


    maflcko commented at 7:18 am on October 29, 2024:

    Could maybe (in another PR)

    In theory it could also be caught at compile time with https://clang.llvm.org/docs/AttributeReference.html#consumed-annotation-checking


    maflcko commented at 11:58 am on October 29, 2024:
    However, as the file is taken from upstream, I wonder how much it should be modified at all.

    l0rinc commented at 12:06 pm on October 29, 2024:
    hmmm, I’ll check it out, I wasn’t aware that’s how this works (fuzzing is currently completely broken for me locally), but this sounds like a huge blind-spot for fuzzing - like you said, we should investigate, false confidence in this area is dangerous.

    l0rinc commented at 2:41 pm on October 29, 2024:
    Done, thanks

    hodlinator commented at 9:45 am on October 31, 2024:
    @dergoegge any thoughts on my suggestion of altering FuzzedDataProvider to trigger asserts when calls like ConsumeIntegralInRange<int>(0, decoded.size() - 1) are made after calls to ConsumeRemainingBytes*()? Maybe such an improvement could be upstreamed even (possibly with a flag to turn it off for sloppy tests).

    maflcko commented at 9:53 am on October 31, 2024:
    I’d say it would be easier to sell to upstream if it was checked at compile-time at no runtime cost/overhead. Is there a benefit of doing it at runtime?

    hodlinator commented at 9:59 am on October 31, 2024:
    It might add extra friction to agree on preprocessor logic to make the code still compile on other compilers. Also, doing it at compile-time means it’s an all/nothing thing that can’t be turned off for individual sloppy tests. But I agree it would be great to see if upstream would take a compile-time version first.

    maflcko commented at 10:05 am on October 31, 2024:

    agree on preprocessor logic

    Maybe I am missing something, but it should be pretty standard, similar to

    https://github.com/bitcoin/bitcoin/blob/f07a533dfcb172321972e5afb3b38a4bd24edb87/src/span.h#L22-L30

    ?


    hodlinator commented at 10:09 am on October 31, 2024:
    Yeah, hopefully shouldn’t be much of an obstacle, maybe I’m too pessimistic about that aspect.

    l0rinc commented at 11:37 am on October 31, 2024:
    Is there anything else for me to do here?

    hodlinator commented at 12:25 pm on October 31, 2024:
    Nothing more to do as far as this PR goes. Still interested to hear from folks more focused on fuzzing.

    dergoegge commented at 3:29 pm on November 1, 2024:
    Re #30746 (review): I think I agree with marco that a compile time check would be nicer and we should ideally upstream it so that our FuzzedDataProvider impl stays in sync.

    maflcko commented at 3:22 pm on November 29, 2024:

    I tried to implement this, but (compile-time) annotations are either useless, because they are not applied globally, or they are too verbose for this, because they need to be applied globally/virally.

    I think doing the check at runtime with a bool is probably good enough. Though that requires modifying the header file in some way or another.

    An alternative could be to forbid the direct use of the depleting member functions (with a linter) and add a new helper which eats a FuzzedDataProvider&& and returns the bytes in it. This way, the existing use-after-move checks can be used.


    hodlinator commented at 8:28 pm on November 29, 2024:

    Great that you made time to look into this!

    An alternative could be to forbid the direct use of the depleting member functions (with a linter) and add a new helper which eats a FuzzedDataProvider&& and returns the bytes in it. This way, the existing use-after-move checks can be used.

    So you mean disallowing direct calls to ConsumeRemainingBytes*-family of methods only, and provide a helper that does only that part instead after swallowing up FuzzedDataProvider&& so it should no longer be used. Sounds like it’s worth a try to me!


    maflcko commented at 9:29 am on December 2, 2024:
    Yes, and disallowing could be approximated with a single call to git grep in the linter, similar to https://github.com/bitcoin/bitcoin/blob/dbc8ba12f3b3548dd6955293c5d650320ca39c5b/test/lint/test_runner/src/main.rs#L270
  38. l0rinc force-pushed on Oct 29, 2024
  39. hodlinator approved
  40. hodlinator commented at 10:31 am on October 31, 2024: contributor

    ACK 6fd185c035c1cc4dd961cf14a2087e97fb069440

    range-diff shows all notable changes since my Concept ACK are in response to my suggestions.

    Ran each new fuzz target with:

    0₿ FUZZ=<target name> build_fuzz/src/test/fuzz/fuzz -max_total_time=30 -jobs=8
    

    Also ran:

    0₿ ctest -j<X> --test-dir build --output-on-failure -R base
    
  41. l0rinc requested review from fjahr on Nov 4, 2024
  42. l0rinc requested review from dergoegge on Nov 4, 2024
  43. marcofleon commented at 2:23 pm on November 5, 2024: contributor

    Code review ACK 6fd185c035c1cc4dd961cf14a2087e97fb069440

    Nice improvements to the fuzz test. The separation into different targets for each encoding looks good to me. I guess in principle, each roundtrip should be separated as well. But seems fine as is for these simple targets.

  44. DrahtBot requested review from hodlinator on Nov 5, 2024
  45. hodlinator approved
  46. hodlinator commented at 4:34 pm on November 5, 2024: contributor

    re-ACK 6fd185c035c1cc4dd961cf14a2087e97fb069440

    (Hoping Drahtbot will react properly to this comment.. already ACKed the same PR version above).

  47. in src/test/base64_tests.cpp:43 in 5ecef7b152 outdated
    42-    BOOST_CHECK(!DecodeBase64("nQB/pZw=\0invalid"s));
    43-    BOOST_CHECK(!DecodeBase64("nQB/pZw=invalid\0"s));
    44+    BOOST_CHECK(!DecodeBase64("invalid\0"s)); // correct size, invalid due to \0
    45+    BOOST_CHECK( DecodeBase64("nQB/pZw="s)); // valid
    46+    BOOST_CHECK(!DecodeBase64("nQB/pZw=\0invalid"s)); // correct size, invalid due to \0
    47+    BOOST_CHECK(!DecodeBase64("nQB/pZw=invalid\0"s)); // invalid size
    


    maflcko commented at 11:01 am on November 8, 2024:
    I don’t think this is correct. The size is identical for both, so how can it be invalid for one, but not the other? They are invalid due to the embedded = and \0.

    l0rinc commented at 11:12 am on December 2, 2024:
    I forgot about this comment, thanks for checking, both tests on 42 and 43 were invalid because of the non-trailing =, not the \0 - updated them

    hodlinator commented at 1:43 pm on December 3, 2024:

    They are invalid, but still worth testing regardless of why precisely the current implementation errors out?

    0BOOST_CHECK(!DecodeBase64("nQB/pZw=\0invalid"); // padding and NUL followed by invalid data
    1BOOST_CHECK(!DecodeBase64("nQB/pZw=invalid\0"); // padding followed by invalid data and NUL
    

    l0rinc commented at 2:54 pm on December 3, 2024:
    What’s the advantage of the comment stating exactly what the code already states? We’re already testing \0 and non-trailing =, what else do these ones add?

    hodlinator commented at 8:13 pm on December 4, 2024:

    What’s the advantage of the comment stating exactly what the code already states?

    Valid point. Better to describe what is wrong according to the Base64 standard (+ \0 I guess).

    We’re already testing \0 and non-trailing =, what else do these ones add?

    They cover slightly different cases, mixing of \0 and =. They may not test a unique error-handling path with the current implementation, but if they can uncover issues with a potential future implementation I think it was a mistake to remove them (@maflcko was just stating the comments were invalid as I understand it).


    maflcko commented at 8:34 am on December 5, 2024:

    (@maflcko was just stating the comments were invalid as I understand it).

    Yes, it seemed odd to touch a line that doesn’t need to be touched only to add a comment that is wrong. It seems better to not touch the line in this case.


    l0rinc commented at 9:28 am on December 5, 2024:
    Done, thanks
  48. l0rinc force-pushed on Dec 2, 2024
  49. l0rinc requested review from maflcko on Dec 2, 2024
  50. l0rinc requested review from marcofleon on Dec 2, 2024
  51. l0rinc requested review from hodlinator on Dec 2, 2024
  52. in src/test/base32_tests.cpp:45 in 37e5fcfd4f outdated
    41+{
    42+    // Is valid without padding
    43+    BOOST_CHECK_EQUAL(EncodeBase32("fooba"), "mzxw6ytb");
    44+
    45+    // Valid size
    46+    BOOST_CHECK(!DecodeBase32("========"s));
    


    hodlinator commented at 11:07 am on December 3, 2024:

    nit: Could use ""sv or simply "" to avoid string allocations here and in base64_padding.

    0    BOOST_CHECK(!DecodeBase32("========"));
    

    l0rinc commented at 2:54 pm on December 3, 2024:
    I’ll consider it if I need to touch again

    l0rinc commented at 9:24 am on December 5, 2024:
    Done, thanks, kept the "sv only for the values with \0 in them
  53. in src/test/base64_tests.cpp:42 in 37e5fcfd4f outdated
    41-    BOOST_CHECK(DecodeBase64("nQB/pZw="s));
    42-    BOOST_CHECK(!DecodeBase64("nQB/pZw=\0invalid"s));
    43-    BOOST_CHECK(!DecodeBase64("nQB/pZw=invalid\0"s));
    44+    BOOST_CHECK(!DecodeBase64("invalid\0"s)); // correct size, invalid due to \0
    45+    BOOST_CHECK( DecodeBase64("nQB/pZw="s)); // valid
    46+    BOOST_CHECK(!DecodeBase64("nQB/pZw=invalid"s)); // invalid, padding only allowed at the end
    


    hodlinator commented at 1:36 pm on December 3, 2024:

    Last line no longer has an embedded NUL character as specified in the comment above this code block (// Decoding strings with embedded NUL characters should fail). Might belong more in base64_padding or at least in its own newline-separated block here:

    0    BOOST_CHECK(!DecodeBase64("invalid\0"s)); // correct size, invalid due to \0
    1    BOOST_CHECK( DecodeBase64("nQB/pZw="s)); // valid
    2
    3    BOOST_CHECK(!DecodeBase64("nQB/pZw=invalid"s)); // invalid, padding only allowed at the end
    

    l0rinc commented at 2:57 pm on December 3, 2024:
    You’re suggesting an empty line here only, right? I’ll consider it if I need to edit again

    hodlinator commented at 8:40 pm on December 4, 2024:
    The comment above the block doesn’t match line 42, this is wrong - so you should do more than consider it if you edit IMO. But it’s not a blocker for me.

    l0rinc commented at 9:24 am on December 5, 2024:
    moved the values without \0 above the comment in 32 and 64 tests, thanks!
  54. l0rinc requested review from hodlinator on Dec 3, 2024
  55. l0rinc force-pushed on Dec 5, 2024
  56. l0rinc force-pushed on Dec 5, 2024
  57. l0rinc force-pushed on Dec 5, 2024
  58. l0rinc commented at 9:28 am on December 5, 2024: contributor

    Thanks @maflcko & @hodlinator for the comments, I’ve rebased the PR in the first push without any changes and addresses your concerns in the second one - let me know if there’s anything left:

    • Separated \0 tests from length validation tests
    • Suffixed strings with \0 with "sv, kept the rest as normal strings
    • Brought back the =\0 case
  59. hodlinator approved
  60. hodlinator commented at 11:40 pm on December 5, 2024: contributor

    re-ACK 3b66b7e20ddba94d251c97818700d46030b14cc5

    • Confirmed that changes since last review match #30746 (comment). Thank you for incorporating our feedback.
    • Ran unit tests and each fuzz target for 5 seconds.
  61. in src/test/fuzz/base_encode_decode.cpp:24 in a32002f2d5 outdated
    23+FUZZ_TARGET(base58_encode_decode)
    24 {
    25-    const std::string random_encoded_string(buffer.begin(), buffer.end());
    26+    FuzzedDataProvider provider(buffer.data(), buffer.size());
    27+    const std::string random_string{provider.ConsumeRandomLengthString(1000)};
    28+    const auto max_ret_len{provider.ConsumeIntegralInRange<int>(1, 300)};
    


    hodlinator commented at 10:20 am on December 30, 2024:

    nit: As random_string is currently 0-1000 long, and max_ret_len is 1-300, most of the time DecodeBase58() is just returning false since max_ret_len is too small. But maybe that is intentional to stress that aspect?

    Might be better to have more similar lengths, and also to include 0 as max_ret_len since it might occur in places where that parameter is calculated.

    0    const std::string random_string{provider.ConsumeRandomLengthString(1000)};
    1    const auto max_ret_len{provider.ConsumeIntegralInRange<int>(0, 1000)};
    

    l0rinc commented at 11:52 am on December 30, 2024:
    Good point, and even negative values theoretically - thanks, pushed
  62. hodlinator approved
  63. hodlinator commented at 10:24 am on December 30, 2024: contributor

    re-ACK a32002f2d5cff72639c9782d70ae52ec162d9ef8

    • Only added commit that fuzzes the max length parameter of DecodeBase58&DecodeBase58Check.

    Re-ran the 2 modified fuzz-targets 30 seconds each.

  64. l0rinc force-pushed on Dec 30, 2024
  65. hodlinator approved
  66. hodlinator commented at 3:26 pm on December 30, 2024: contributor

    re-ACK 6983d82c84d4ca351a7cd9d1e0e20ab878da6475

    Responding to my previous feedback + testing negative max return values.

    Retested modified targets.

  67. test: Add padding tests for Base32/Base64 ae40cf1a8e
  68. test: Extend base58_encode_decode.json with edge cases
    Added edge cases, such as transitions at powers of 58, to verify boundary behavior.
    5dd3a0d8a8
  69. test: Fuzz Base32/Base58/Base64 roundtrip conversions
    This commit introduces symmetric encode-decode roundtrips for all bases.
    Minor refactors were also included:
    • Split each base into a separate fuzz target.
    • Added symmetric encode-decode roundtrip tests for all bases.
    • Removed trim testing for encoded_string, as Base58 does not use whitespace padding.
    • Made comparisons stricter by removing unnecessary lowercase conversions for bases that have mixed-case alphabets.
    
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    635bc58f46
  70. fuzz: Add fuzzing for max_ret_len in DecodeBase58/DecodeBase58Check
    Different values are used for max_ret_len throughout the codebase (e.g., 21, 34, 78).
    Theoretically, negative and zero values are also permitted. Let's stress-test those as well.
    
    Co-authored-by: brunoerg <brunoely.gc@gmail.com>
    f919d919eb
  71. in src/test/base58_tests.cpp:82 in 3b66b7e20d outdated
    81@@ -82,20 +82,4 @@ BOOST_AUTO_TEST_CASE(base58_DecodeBase58)
    82     BOOST_CHECK(!DecodeBase58Check("3vQB7B6MrGQZaxCuFg4oh\0" "0IOl"s, result, 100));
    


    TheCharlatan commented at 9:13 am on January 4, 2025:

    In commit 3b66b7e20ddba94d251c97818700d46030b14cc5

    s/Spit/Split in commit message.


    l0rinc commented at 12:20 pm on January 4, 2025:

    Hah, indeed, proofread all other commit messages:

    0git range-diff 836d733a6788c7f7e66da6f8007fb62f15161429..6983d82c84d4ca351a7cd9d1e0e20ab878da6475 ae40cf1a8e16462a8b9dfd076d440bc8ec796c2b..f919d919eb8425ef2bb25aa0ebf61c90ab9b07fa
    
  72. l0rinc force-pushed on Jan 4, 2025
  73. hodlinator commented at 9:43 am on January 6, 2025: contributor

    re-ACK f919d919eb8425ef2bb25aa0ebf61c90ab9b07fa

    Only commit message improvements since last ACK.


    PR summary nit:

    Added dedicated padding tests to hardcode the failure behavior.

    Not sure “hardcode” is the best verb there, how about:

    “Added dedicated padding tests to cover failure behavior.”?


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-15 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me