test: cover base32/base58/base64 with symmetric roundtrip fuzz (and padding) tests #30746

pull l0rinc wants to merge 3 commits into bitcoin:master from l0rinc:l0rinc/roundtrip-tests-base32-base64 changing 5 files +107 −47
  1. l0rinc commented at 6:58 am on August 29, 2024: contributor
    Added fuzzed roundtrips for base32/base58/base64 encoding to make sure encoding/decoding are symmetric. Note that if we omit the padding in EncodeBase32 we won’t be able to decode it with DecodeBase32. Added dedicated padding tests to hardcode the failure behavior. Also moved over the Base58 json test edge cases from https://github.com/bitcoin/bitcoin/pull/30035
  2. DrahtBot commented at 6:58 am on August 29, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  3. l0rinc marked this as ready for review on Aug 29, 2024
  4. l0rinc renamed this:
    Add base32 and base64 roundtrip random padding tests
    test: cover base32/base64 with random roundtrip padding tests
    on Aug 29, 2024
  5. DrahtBot added the label Tests on Aug 29, 2024
  6. in src/test/base32_tests.cpp:60 in caddc34c6c outdated
    56+    BOOST_CHECK( DecodeBase32("aaaaa==="s));
    57+    BOOST_CHECK(!DecodeBase32("aaaaaa=="s));
    58+    BOOST_CHECK( DecodeBase32("aaaaaaa="s));
    59+}
    60+
    61+BOOST_AUTO_TEST_CASE(base32_roundtrip) {
    


    fjahr commented at 2:54 pm on August 29, 2024:
    Shouldn’t this be a fuzzing test?

    l0rinc commented at 5:08 pm on August 29, 2024:
    I’ve modelled these based on https://github.com/bitcoin/bitcoin/blob/master/src/test/base58_tests.cpp#L84-L98 - but I guess the existing fuzz tests already cover this as well, so I’ll migrate them all to https://github.com/bitcoin/bitcoin/blob/master/src/test/fuzz/base_encode_decode.cpp#L20-L54, thanks for the hint

    l0rinc commented at 5:29 pm on August 29, 2024:
    Done
  7. in src/test/base64_tests.cpp:63 in caddc34c6c outdated
    62+    BOOST_CHECK(!DecodeBase64("a==="s));
    63+    BOOST_CHECK( DecodeBase64("YQ=="s));
    64+    BOOST_CHECK( DecodeBase64("YWE="s));
    65+}
    66+
    67+BOOST_AUTO_TEST_CASE(base64_roundtrip) {
    


    fjahr commented at 2:54 pm on August 29, 2024:
    Here too, this is probably better implemented as a fuzzing test.

    l0rinc commented at 5:08 pm on August 29, 2024:
  8. l0rinc force-pushed on Aug 29, 2024
  9. l0rinc renamed this:
    test: cover base32/base64 with random roundtrip padding tests
    test: cover base32/base58/base64 with symmetric roundtrip fuzz (and some padding) tests
    on Aug 29, 2024
  10. l0rinc renamed this:
    test: cover base32/base58/base64 with symmetric roundtrip fuzz (and some padding) tests
    test: cover base32/base58/base64 with symmetric roundtrip fuzz (and padding) tests
    on Aug 29, 2024
  11. DrahtBot added the label CI failed on Aug 30, 2024
  12. l0rinc force-pushed on Aug 30, 2024
  13. l0rinc force-pushed on Aug 30, 2024
  14. l0rinc force-pushed on Aug 31, 2024
  15. l0rinc force-pushed on Sep 1, 2024
  16. l0rinc commented at 4:24 pm on September 1, 2024: contributor
    @maflcko, the Fuzz tests seem to be timing out now, is the new setup too heavy, should I maybe add them to different suites?
  17. maflcko commented at 5:55 am on September 2, 2024: member

    @maflcko, the Fuzz tests seem to be timing out now, is the new setup too heavy, should I maybe add them to different suites?

    Did you run them locally, and can point out which commit makes them time out? Is there a flame graph, or some other kind of benchmark or performance debugging that would help tack this down?

    Absent that, it seems odd that the fuzz tests would time out here.

  18. test: Add base32/base64 padding tests 3df7e121b2
  19. test: Extend the base58_encode_decode.json with edge cases
    e.g. on a transition of power of 58 to check the boundaries
    96d44ed388
  20. l0rinc force-pushed on Sep 2, 2024
  21. test: Fuzz Base32/Base58/Base64 roundtrips both ways
    A few other minor refactors were also done:
    * Spit out each base to separate fuzz target
    * Added symmetric encode-decode roundtrips all bases
    * Removed trim testing of encoded_string since base58 doesn't have whitespace padding
    * made the comparison stricter by removing unnecessary lowercase conversions for bases that have both cases in their alphabet
    58fc3ce0a3
  22. l0rinc force-pushed on Sep 2, 2024
  23. l0rinc commented at 2:29 pm on September 2, 2024: contributor

    it seems odd that the fuzz tests would time out here

    Split the big fuzz target into separate base58_encode_decode / base58check_encode_decode / base32_encode_decode / base64_encode_decode / psbt_base64_decode - it’s cleaner like this anyway.

    Edit: seems this fixed the timeout

  24. in src/test/fuzz/base_encode_decode.cpp:67 in 58fc3ce0a3
    79+    if (auto result{DecodeBase32(random_string)}) {
    80+        auto encoded_string{EncodeBase32(*result)};
    81+        assert(encoded_string == ToLower(TrimString(random_string)));
    82     }
    83+    // Encode/Decode roundtrip
    84+    auto encoded{EncodeBase32(random_data)};
    


    dergoegge commented at 10:06 am on September 10, 2024:

    Reusing the same data from the fuzz input for two different things (e.g. decode/encode and encode/decode) is usually not a good idea.

    How about adding a decode step after the encoding above (i.e. decode -> encode -> decode)? Or alternatively split this second round-trip into its own harness as well?


    l0rinc commented at 11:22 am on September 10, 2024:

    Thanks for the review!

    is usually not a good idea

    Can you please expand on that? I don’t mind splitting, as you’ve recommended, but I have to understand what the current drawbacks are.


    dergoegge commented at 1:03 pm on September 10, 2024:
    It might be less obvious (and maybe not even a problem) for this simple harness but the decode/encode and encode/decode round-trips are different tests. Their input spaces are not the same, which means that reusing the same fuzz input for both leads to less efficient fuzzing (e.g. the corpus will include inputs that are only relevant to one of the tests but both will be executed).

    maflcko commented at 1:35 pm on September 10, 2024:

    There is also an overhead that we probably don’t want a separate fuzz target for every trivial function, because each fuzz target may be put into a separate binary and is allocated a separate folder. So in very trivial cases, I think it is fine to combine several trivial function calls into one fuzz target.

    Encode/Decode of a base seems to be trivial enough. I think the previous CI timeout may be unrelated #30746 (comment) , because there are general CI issues right now. Generally, it is best to not rely on CI and separately double-check and confirm any result locally.

    If you want to combine the test cases, but not want to always exclude both test branches, the CallOneOf helper can be used:

    0        CallOneOf(
    1            fuzzed_data_provider,
    2            [&] {
    3                Branch1(fuzzed_data_provider);
    4            },
    5            [&] {
    6                Branch2(fuzzed_data_provider);
    7        });
    

    l0rinc commented at 2:35 pm on September 10, 2024:
    Thanks @maflcko. @dergoegge, would a CallOneOf based solution satisfy your needs?

    dergoegge commented at 10:59 am on September 13, 2024:

    There is also an overhead that we probably don’t want a separate fuzz target for every trivial function, because each fuzz target may be put into a separate binary and is allocated a separate folder.

    I don’t think the binary sizes would be a problem after we merge something like #30882 and the extra folder for the corpus is also not a huge cost (they compress really nicely).

    I’d prefer to follow best practices over having to weigh triviality against some external overhead of running these tests. “Put separate tests into separate harnesses” is easier to follow than “aggregate trivial functions in one harness” because it is entirely subjective. What is trivial? How many unrelated trivial functions are allowed in one harness? I think this creates confusion for devs that aren’t experts on fuzzing.

    would a CallOneOf based solution satisfy your needs?

    I don’t think this solves the issue, e.g. mutations that change the calloneof branch result in completely changing the meaning of the fuzz input (i.e. switching between the different tests).

    I don’t mean to derail this PR, it’s already improving things by splitting up base_encode_decode. Feel free to leave it as is.


    maflcko commented at 11:38 am on September 13, 2024:

    I don’t think the binary sizes would be a problem after we merge something like #30882 and the extra folder for the corpus is also not a huge cost (they compress really nicely).

    Are you sure? IIUC 30882 creates a separate binary for each fuzz target, so depending on the compile/link options (static, without lto, …), each fuzz target may or may not be heavy. For example, the OSS-Fuzzing already went down due to storage limitations (https://github.com/google/oss-fuzz/pull/12232). I am not saying that OSS-Fuzz should be considered a blocker, given that there are now several in-house backups, but I wanted to mention it for context.

    I’d prefer to follow best practices over having to weigh triviality against some external overhead of running these tests. “Put separate tests into separate harnesses” is easier to follow than “aggregate trivial functions in one harness” because it is entirely subjective. What is trivial? How many unrelated trivial functions are allowed in one harness? I think this creates confusion for devs that aren’t experts on fuzzing.

    I think I agree that it would be ideal to have an easy/trivial guideline to follow. However, I still think there is some threshold where a separate fuzz target (or maybe even a fuzz target at all) may not be the best choice. There are many pure util functions that (in theory) could go into a separate fuzz target. For example, StringForFeeEstimateHorizon https://github.com/bitcoin/bitcoin/blob/e43ce250c6fb65cc0c8903296c8ab228539d2204/src/test/fuzz/kitchen_sink.cpp#L45 should (in theory) be moved to a separate target. However, it is such a trivial function (a single switch-case) that linking the binary and starting it takes longer than reaching full coverage from an empty initial seed corpus. Maybe the function is too trivial to be put into a fuzz target at all, but I think there are plenty other pure util functions, such that there could easily be 500-1500 fuzz targets with similar properties. Of course there is nothing inherently wrong with having so many fuzz targets, but I could imagine that some tooling (introspector, coverage reports per fuzz target) or humans consuming the tool’s output could be overwhelmed by the number of trivial fuzz targets and miss the actually important ones.


    l0rinc commented at 11:45 am on September 13, 2024:

    Thanks for the directions @dergoegge, @maflcko.

    I don’t mean to derail this PR, it’s already improving things by splitting up base_encode_decode. Feel free to leave it as is.

    I’m all for separation of concerns, that’s what I’ve done here as well. If there’s a simple and non-controvertial way to make it even better, let me know!

  25. deep709 approved
  26. maflcko removed the label CI failed on Sep 12, 2024
  27. Oaksta85 approved

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-20 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me