fuzz: Make FuzzedSock fuzz friendlier #30211

pull dergoegge wants to merge 3 commits into bitcoin:master from dergoegge:2024-05-fix-fuzzsock changing 2 files +19 −10
  1. dergoegge commented at 1:50 pm on May 31, 2024: member
    FuzzedSock has a few issues that block a fuzzer from making progress. See commit messages for details.
  2. [fuzz] Use fuzzer friendly ConsumeRandomLengthByteVector in FuzzedSock::Recv
    See comment on FuzzedDataProvider::ConsumeRandomLengthString.
    865cdf3692
  3. DrahtBot commented at 1:50 pm on May 31, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK marcofleon, brunoerg
    Approach ACK vasild

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  4. DrahtBot added the label Tests on May 31, 2024
  5. dergoegge commented at 1:53 pm on May 31, 2024: member

    These issues were discovered by Marco (@marcofleon) and me while he was working on #28803. He will have a PR for that up soon and this PR is a prerequisite (although it also helps with other harnesses that use FuzzedSock).

    cc @vasild perhaps you are interested in reviewing this.

  6. brunoerg commented at 1:10 pm on June 1, 2024: contributor
    Concept ACK
  7. in src/test/fuzz/util/net.cpp:209 in 617d43e656 outdated
    209             pad_to_len_bytes = false;
    210         }
    211     } else {
    212-        random_bytes = m_fuzzed_data_provider.ConsumeBytes<uint8_t>(
    213-            m_fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, len));
    214+        random_bytes = ConsumeRandomLengthByteVector(m_fuzzed_data_provider, len);
    


    vasild commented at 5:15 am on June 2, 2024:

    I guess using ConsumeRandomLengthByteVector() is slower than ConsumeBytes() because it uses an intermediate std::string, but is probably insignificant.

    The comment says:

    0// Returns a std::string of length from 0 to |max_length|. When it runs out of
    1// input data, returns what remains of the input. Designed to be more stable
    2// with respect to a fuzzer inserting characters than just picking a random
    3// length and then consuming that many bytes with |ConsumeBytes|.
    4inline std::string  
    5FuzzedDataProvider::ConsumeRandomLengthString(size_t max_length) {
    

    I am just curious, did you encounter a case where the current/previous implementation that uses ConsumeBytes() is “unstable” or had problems in some other way?


    dergoegge commented at 9:39 am on June 3, 2024:

    did you encounter a case where the current/previous implementation that uses ConsumeBytes() is “unstable” or had problems in some other way?

    No I just found this while investigating the peek issue (which was the main blocker).

  8. in src/test/fuzz/util/net.cpp:196 in 617d43e656 outdated
    192@@ -193,21 +193,20 @@ ssize_t FuzzedSock::Recv(void* buf, size_t len, int flags) const
    193     bool pad_to_len_bytes{m_fuzzed_data_provider.ConsumeBool()};
    194     if (m_peek_data.has_value()) {
    195         // `MSG_PEEK` was used in the preceding `Recv()` call, return `m_peek_data`.
    196-        random_bytes.assign({m_peek_data.value()});
    197+        random_bytes = m_peek_data.value();
    


    vasild commented at 5:28 am on June 2, 2024:
    This will overflow the buffer in buf in the memcopy below if peek read is called with N bytes and then non-peek read is called with M bytes and M < N.

    dergoegge commented at 9:40 am on June 3, 2024:
    Fixed.
  9. in src/test/fuzz/util/net.h:46 in 617d43e656 outdated
    42@@ -43,7 +43,7 @@ class FuzzedSock : public Sock
    43      * If `MSG_PEEK` is used, then our `Recv()` returns some random data as usual, but on the next
    44      * `Recv()` call we must return the same data, thus we remember it here.
    45      */
    46-    mutable std::optional<uint8_t> m_peek_data;
    47+    mutable std::optional<std::vector<uint8_t>> m_peek_data;
    


    vasild commented at 5:34 am on June 2, 2024:

    from the commit message:

    FuzzedSock only supports peeking at one byte at a time, which is not fuzzer friendly when trying to receive long data.

    I am not sure how likely is this to happen, but at least it is not ruled out by the recv(2) documentation: a peek read of 10 bytes returns just 1 byte and repeating the peek read of 10 bytes returns the same 1 byte over and over again. The app can never peek past the 1 byte without actually consuming/reading that byte. I guess it must be ready for this behavior from the OS or from a mocked fuzzed socket.


    dergoegge commented at 9:45 am on June 3, 2024:

    I could only find the following in the docs and based the changes here on that.

    0This flag causes the receive operation to return data from
    1the beginning of the receive queue without removing that
    2data from the queue.  Thus, a subsequent receive call will
    3return the same data.
    

    We could make it so that a peek read only returns one byte and a subsequent normal read return N bytes where the first byte corresponds to the peek byte. But I’m not sure if that would change much for our use case.


    vasild commented at 12:52 pm on June 10, 2024:

    We could make it so that a peek read only returns one byte and a subsequent normal read return N bytes where the first byte corresponds to the peek byte.

    This was the case before this PR, right?

    But I’m not sure if that would change much for our use case.

    Where can I see that use case? Is there some code that does not work without the changes to FuzzedSock::Recv() done in this PR?

    The doc you cited, in my understanding, means that what I described in the first comment of this thread is not ruled out and might happen.


    vasild commented at 12:53 pm on June 10, 2024:
    std::vector<uint8_t> can be used instead of std::optional<std::vector<uint8_t>> and if the vector is empty that is the same as the optional without a value.

    dergoegge commented at 1:09 pm on June 10, 2024:

    This was the case before this PR, right?

    No, a peek read before this PR would return 1 byte and a normal read following that would also only return that one byte (with the exception of sometimes padding with a bunch of zeros). This is what causes the fuzz input to never be interpreted as one continuous sequence of data, which is why mutations that insert pieces of data (e.g. from a dictionary) are ineffective.

    Where can I see that use case? Is there some code that does not work without the changes to FuzzedSock::Recv() done in this PR?

    Yes the i2p harness is unable to cover most of the target code without these changes: #30230.


    dergoegge commented at 1:12 pm on June 10, 2024:
    Feel free to open a follow up.

    vasild commented at 2:02 pm on June 12, 2024:
    Removed the optional in #30273.
  10. in src/test/fuzz/util/net.cpp:389 in 617d43e656 outdated
    378@@ -380,7 +379,7 @@ bool FuzzedSock::Wait(std::chrono::milliseconds timeout, Event requested, Event*
    379         return false;
    380     }
    381     if (occurred != nullptr) {
    382-        *occurred = m_fuzzed_data_provider.ConsumeBool() ? requested : 0;
    383+        *occurred = m_fuzzed_data_provider.ConsumeBool() ? 0 : requested;
    


    vasild commented at 5:40 am on June 2, 2024:
    Wow! Good catch! Maybe this deserves a comment because there is deeper reason for the way it is written than one may see at a first glance. (aka to prevent somebody restoring it back to ? requested : 0 in a future refactor, code reorganization).

    dergoegge commented at 9:41 am on June 3, 2024:
    Added some comments
  11. vasild commented at 5:44 am on June 2, 2024: contributor

    Approach ACK 617d43e656816a532ca275707dffeb6e2a2e1fc7

    Just one blocker below (about the overflow).

    Thank you for the improvements!

  12. dergoegge force-pushed on Jun 3, 2024
  13. [fuzz] Make peeking through FuzzedSock::Recv fuzzer friendly
    FuzzedSock only supports peeking at one byte at a time, which is not
    fuzzer friendly when trying to receive long data.
    
    Fix this by supporting peek data of arbitrary length instead of only one
    byte.
    a7fceda68b
  14. [fuzz] Avoid endless waiting in FuzzedSock::{Wait,WaitMany}
    Currently, when the FuzzedDataProvider of a FuzzedSock runs out of data,
    FuzzedSock::Wait and WaitMany will simulate endless waiting as the
    requested events are never simulated as occured.
    
    Fix this by simulating event occurence when ConsumeBool() returns false
    (e.g. when the data provider runs out).
    
    Co-authored-by: dergoegge <n.goeggi@gmail.com>
    22d0f1a27e
  15. dergoegge force-pushed on Jun 3, 2024
  16. marcofleon approved
  17. marcofleon commented at 12:29 pm on June 4, 2024: contributor
    Tested ACK 22d0f1a27ef7733b51b3c2138a8d01713df8f248 In i2p the private key needs to be at least 387 bytes. When this key is generated (using FuzzedSock) the bytes were being broken up such that the length of the saved key was always less than 387. Having Recv return the length of a byte vector, instead of a single byte, when MSG_PEEK is enabled fixes this and allows the fuzzer to continue through the I2P target (https://github.com/bitcoin/bitcoin/issues/28803).
  18. DrahtBot requested review from vasild on Jun 4, 2024
  19. DrahtBot requested review from brunoerg on Jun 4, 2024
  20. brunoerg approved
  21. brunoerg commented at 12:39 pm on June 4, 2024: contributor
    utACK 22d0f1a27ef7733b51b3c2138a8d01713df8f248
  22. fanquake merged this on Jun 4, 2024
  23. fanquake closed this on Jun 4, 2024

  24. in src/test/fuzz/util/net.cpp:222 in 22d0f1a27e
    214@@ -216,7 +215,11 @@ ssize_t FuzzedSock::Recv(void* buf, size_t len, int flags) const
    215         }
    216         return r;
    217     }
    218-    std::memcpy(buf, random_bytes.data(), random_bytes.size());
    219+    // `random_bytes` might exceed the size of `buf` if e.g. Recv is called with
    220+    // len=N and MSG_PEEK first and afterwards called with len=M (M < N) and
    221+    // without MSG_PEEK.
    222+    size_t recv_len{std::min(random_bytes.size(), len)};
    223+    std::memcpy(buf, random_bytes.data(), recv_len);
    


    vasild commented at 1:04 pm on June 10, 2024:

    Now it will not overflow buf but it will lose the trailing N - M bytes from the peek.

    To fix this, when assigning random_bytes earlier, it should only take/pop the first len bytes from m_peek_data.

    I think all this may be unnecessary if the app has to work with peek only returning 1 byte, as discussed in #30211 (review)


    dergoegge commented at 1:16 pm on June 10, 2024:
    These changes were very much necessary for FuzzedSock to be fuzzer friendly. Feel free to open a follow up, if you have suggestion how further improve things.

    vasild commented at 2:01 pm on June 12, 2024:

    … will lose the trailing N - M bytes from the peek.

    Fixed in https://github.com/bitcoin/bitcoin/pull/30273

  25. vasild commented at 1:04 pm on June 10, 2024: contributor
    I think this was merged prematurely.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me