test: Fuzzing siphash against reference implementation [Request for feedback]

elichai commented at 9:29 pm on September 8, 2020: contributor

Hash functions are complex, and even though the implementations look simple they can easily go wrong, usually in the buffer handling around the hash function itself(the “Writer”) examples: Rewriting the last byte that was written, processing the buffer too “early” when Write was called with exactly a full buffer(different hash functions require different behavior in these cases), the wrong amount of zeros were written in the case the buffer was exactly full when finalizing, and more.

I’ve personally found bugs in an implementation of a hash function via fuzzing against a reference implementation (the bug was actually in the ref impl), and there’s a lot of precedent (See https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=924038 for the amount of bugs just in the reference implementations submitted to NIST’s SHA3 competition)

This can also make PRs like #18014 and future SHA256 optimizations easier to review and be confident in.

I started with siphash specifically because it’s small and simple so I can get feedback on this before continuing to SHA256 etc. The downside of this method is that it means committing another implementation of the same thing separately.

About the implementation itself: I’ve used a constant seperator so it would write each time different sizes, and sometimes even empty writes. and it’s constant so coverage based fuzzers can easily figure this out and make the inputs cover all the branches.

Any feedback is welcome :)

sipa commented at 9:41 pm on September 8, 2020: member

I think including a full other reference implementation is overkill.

Every byte array as input has exactly one hash, regardless of how it is split up in chucks and hashed. You can verify the “hash everything at once” code path using test vectors. These are usually published by the same body that standardized the function, and are picked to give good coverage. By the nature of hashing, it tends to be sufficient to test things with a random inputs for a variety of lengths, so that the padding code gets exercised - but beyond that, the actual hashed data is of little relevance, as tiny errors in the actual hashing logic should mean a completely different hash anyway.

As you point out, the buffering logic is often far more error-prone, and harder to test. I think it’s a useful target for fuzzing, but all you need for that is a fuzzer that takes as input a number of chunks, and then hashes them both concatenated at once, and split up as separate writes, and compares the result.

practicalswift commented at 9:43 pm on September 8, 2020: contributor

Concept ACK

This is excellent!

BTW, if you’re interested in differential cryptography fuzzing you should check out @guidovranken’s really cool Cryptofuzz project.

The list of bug trophies for Cryptofuzz is amazing! Cryptofuzz has found bugs in OpenSSL, LibreSSL, BoringSSL, libgcrypt, Crypto++, NSS, Botan, wolfCrypt, sjcl, SymCrypt, mbed TLS, etc – more or less everything under test :) @catenacyber’s elliptic-curve-differential-fuzzer project is really cool too.

elichai commented at 9:55 pm on September 8, 2020: contributor

As you point out, the buffering logic is often far more error-prone, and harder to test. I think it’s a useful target for fuzzing, but all you need for that is a fuzzer that takes as input a number of chunks, and then hashes them both concatenated at once, and split up as separate writes, and compares the result.

I thought about this, but even if you test chuncks vs single write, you still go through the same writer with the same logic, so it might not find an off by one bug in the memcpy of one of the branches in Write/Finalize.

sipa commented at 9:57 pm on September 8, 2020: member

I thought about this, but even if you test chuncks vs single write, you still go through the same writer with the same logic, so it might not find an off by one bug in the memcpy of one of the branches in Write/Finalize.

That should be caught using test vectors, as the code path is identical for all inputs of the same length

elichai commented at 10:09 pm on September 8, 2020: contributor

That should be caught using test vectors, as the code path is identical for all inputs of the same length

You can see in the paper I’ve linked that some times the test vectors don’t cover all cases (a bug that returned the same value everytime a message of size BLOCK_SIZE was written, no matter the content, they had only 1 vector with a message of that length, so no one noticed). Also different implementations implement the buffering differently and thus might have different branching logic, so test vectors written to cover all branches of one implementation might not cover all branches of other implementations.

but I do agree that most bugs will be covered by test vectors + fuzzing hashing by chunks vs hashing in one shot

practicalswift commented at 10:16 pm on September 8, 2020: contributor

The nice thing about differential fuzzing as suggested by @elichai is that we don’t have to make any assumptions about the implementation details of the function being fuzzed. Personally I think that benefit is worth the cost (in the form of an extra file in src/test/fuzz/).

sipa commented at 10:16 pm on September 8, 2020: member

You can see in the paper I’ve linked that some times the test vectors don’t cover all cases (a bug that returned the same value everytime a message of size BLOCK_SIZE was written, no matter the content, they had only 1 vector with a message of that length, so no one noticed).

So include an test vector for every length, up to some multiple of the block size.

That’s still far less effort than including/integrating a whole separate implementation and maintain it.

sipa commented at 10:21 pm on September 8, 2020: member

@practicalswift I think that spending time on blindly writing fuzzers without making any assumptions about the code you’re testing is often a waste of time. You can be far more effective by focussing on things that aren’t guaranteed by the source code, or are likely to trigger edge cases. And this isn’t just your time, but also the time of people reviewing and maintaining the codebase later.

Fuzzing is good at finding combinations of inputs that trigger various code paths, so use it for that.

practicalswift commented at 10:31 pm on September 8, 2020: contributor

@sipa

I think that spending time on blindly writing fuzzers without making any assumptions about the code you’re testing is often a waste of time.

FWIW the fuzzers that found https://github.com/sipa/miniscript/issues/7, https://github.com/sipa/miniscript/issues/12, https://github.com/sipa/miniscript/issues/13, https://github.com/sipa/miniscript/issues/14, https://github.com/sipa/miniscript/issues/15 and https://github.com/sipa/miniscript/issues/25 were all written without making any assumptions about the code being tested :)

Don’t trust: fuzz! :)

sipa commented at 10:32 pm on September 8, 2020: member

Don’t put words in my mouth.

I’m not saying that you can’t find issues that way. I’m saying you can be more effective at it if you understand the code better.

DrahtBot added the label Build system on Sep 8, 2020

guidovranken commented at 9:07 am on September 9, 2020: contributor

Thanks for the heads up @practicalswift . I’ve implemented differential testing of Bitcoin’s Siphash. No bugs found. @sipa Untested but the use of int count may cause it to produce incorrect results or worse with inputs >= 2GB.

MarcoFalke removed the label Build system on Sep 9, 2020

MarcoFalke added the label Tests on Sep 9, 2020

MarcoFalke commented at 9:25 am on September 9, 2020: member

I don’t have any background in implementing hash functions, so I have no opinion on whether to only test that different chunk sizes on the same input reveal the same output or to test against a full reference impl. Maintaining a reference impl for hash functions should hopefully be of little overhead because we can assume they are well tested, bug free and stable, right?

Though, a requirement for merging this is that all tests pass. Right now they fail:

0SUMMARY: UndefinedBehaviorSanitizer: null-pointer-use /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_iterator.h:820:16 in

elichai force-pushed on Sep 9, 2020

practicalswift commented at 11:33 am on September 9, 2020: contributor

@guidovranken

@sipa Untested but the use of int count may cause it to produce incorrect results or worse with inputs >= 2GB.

Yes, a signed integer overflow takes place in src/crypto/siphash.cpp:

 0$ src/test/fuzz/simplest_possible_siphash_fuzzer -rss_limit_mb=8000 crash-061a172add013c03beedf57eb2a121a8289696af
 1crypto/siphash.cpp:56:10: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
 2$ cat src/test/fuzz/simplest_possible_siphash_fuzzer.cpp
 3#include <cstdint>
 4#include <vector>
 5
 6#include <crypto/siphash.h>
 7
 8void test_one_input(const std::vector<uint8_t>& buffer)
 9{
10    CSipHasher(0, 0).Write(buffer.data(), buffer.size()).Finalize();
11}

elichai commented at 12:16 pm on September 9, 2020: contributor

I don’t have any background in implementing hash functions, so I have no opinion on whether to only test that different chunk sizes on the same input reveal the same output or to test against a full reference impl. Maintaining a reference impl for hash functions should hopefully be of little overhead because we can assume they are well tested, bug free and stable, right?

Though, a requirement for merging this is that all tests pass. Right now they fail:
0SUMMARY: UndefinedBehaviorSanitizer: null-pointer-use /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_iterator.h:820:16 in 

Thanks! fixed.

practicalswift commented at 12:38 pm on September 9, 2020: contributor

Tested ACK 7f803816ee91fb2e050cdd7fd7df2f96806e0245 – the fuzzing harness is written in such a way that it is able to trigger a signed integer overflow:

0$ src/test/fuzz/crypto_siphash -rss_limit_mb=16000 crash-b992191b7160e637042d9918c3dffc562bc8bb14
1crypto/siphash.cpp:56:10: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
2SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior crypto/siphash.cpp:56:10 in

Thanks for doing this @elichai. Don’t trust: fuzz! :)

Crypt-iQ commented at 3:25 pm on October 21, 2020: contributor

Concept ACK - really cool to see differential fuzzing in action.

elichai commented at 4:10 pm on October 21, 2020: contributor

Fuzzing is good at finding combinations of inputs that trigger various code paths, so use it for that.

Using the fuzzer to find new test vectors and add them manually is a good idea. the problem is that every change in the logic (ie #18014) will require to re-run and generate new test vectors (and obviously test those against a reference implementation and against single full writes)

laanwj commented at 3:18 pm on November 19, 2020: member

I think including a full other reference implementation is overkill.

Well, I’d normally agree, but looking at the actual code it’s really small and placed in the test directory in any case. So, assuming there are no other problems such as licensing, I have no problems with it in that regard.

DrahtBot commented at 3:40 am on December 4, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

DrahtBot added the label Needs rebase on Dec 15, 2020

practicalswift commented at 8:21 pm on February 28, 2021: contributor

Needs rebase @elichai :)

Add siphash reference impl to fuzzers veorq/SipHash#bab35c64d10f63587a3693a71200620f0ee03cc4 b71cf8469a

Add a fuzz harness for siphash against reference implementation e0d809082f

elichai force-pushed on Mar 19, 2021

DrahtBot removed the label Needs rebase on Mar 19, 2021

practicalswift commented at 2:50 pm on March 19, 2021: contributor

cr ACK e0d809082f5193be422252d11a88ec44f7d14170: patch still looks correct

Personally I think this fuzzing harness has already proven itself since it was able to trigger a now fixed signed integer overflow. No need to theorize when we have empirical evidence: practical is better than theoretical :)

Thanks @elichai for raising the quality bar!

MarcoFalke commented at 5:01 am on May 6, 2021: member

How is this different from https://github.com/google/oss-fuzz/pull/5717 ?

fanquake commented at 3:22 pm on March 24, 2022: member

I think we can close this for now. I’m also ~0 on adding another implementation to this repository.

fanquake closed this on Mar 24, 2022

MarcoFalke commented at 5:34 pm on March 24, 2022: member

There is differential fuzzing in ./src/test/fuzz/crypto_diff_fuzz_chacha20.cpp IIRC, so I think it is generally ok to do differential fuzzing.

DrahtBot locked this on Mar 24, 2023

test: Fuzzing siphash against reference implementation [Request for feedback] #19920

Conflicts