Hash functions are complex, and even though their implementations look simple, they can easily go wrong, usually in the buffer handling around the hash function itself (the “Writer”). Examples: overwriting the last byte that was written; processing the buffer too “early” when Write was called with exactly a full buffer (different hash functions require different behavior in this case); writing the wrong number of zeros when the buffer is exactly full at finalization; and more.
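To make the boundary cases concrete, here is a minimal sketch of a buffered writer. It is a toy, not SipHash and not the code in this PR: `ToyWriter`, `BLOCK`, and the mixing step are made up for illustration, and only the final-block layout (leftover bytes, zero padding, then the length byte) follows the SipHash convention. The point is that the result must not depend on where the writes are split, including splits that land exactly on a block boundary.

```python
BLOCK = 8  # block size in bytes (SipHash also works on 8-byte words)


class ToyWriter:
    """Toy block hasher. NOT a real hash; it only illustrates the buffering."""

    def __init__(self):
        self.state = 0
        self.buf = bytearray()  # bytes that have not been compressed yet
        self.total = 0          # total number of bytes written so far

    def _compress(self, block):
        # Placeholder mixing step, chosen only so that block order matters.
        word = int.from_bytes(block, "little")
        self.state = (self.state * 31 + word) % (1 << 64)

    def write(self, data):
        self.total += len(data)
        self.buf += data
        # Consume every complete block; the leftover is always 0..BLOCK-1 bytes.
        while len(self.buf) >= BLOCK:
            self._compress(bytes(self.buf[:BLOCK]))
            del self.buf[:BLOCK]

    def finalize(self):
        # Final block: leftover bytes, zero padding, then the length mod 256.
        pad = BLOCK - 1 - len(self.buf)
        last = bytes(self.buf) + b"\x00" * pad + bytes([self.total % 256])
        self._compress(last)
        return self.state


def digest_in_two_writes(msg, split):
    """Hash msg with two writes, split at the given offset."""
    w = ToyWriter()
    w.write(msg[:split])
    w.write(msg[split:])
    return w.finalize()
```

Comparing `digest_in_two_writes(msg, i)` for every offset `i` against a single write of the whole message exercises the empty write, the exactly-full buffer, and the exactly-full buffer at finalization.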
I’ve personally found bugs in a hash function implementation by fuzzing it against a reference implementation (the bug was actually in the reference implementation), and there’s a lot of precedent: see https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=924038 for the number of bugs found just in the reference implementations submitted to NIST’s SHA-3 competition.
Fuzzing against a reference implementation can also make PRs like #18014 and future SHA256 optimizations easier to review with confidence.
I started with SipHash specifically because it’s small and simple, so I can get feedback on this approach before continuing to SHA256 etc. The downside of this method is that it means committing a second, separate implementation of the same function.
About the implementation itself: I split the fuzz input on a constant separator, so each run performs writes of different sizes, sometimes even empty ones. The separator is constant so that coverage-based fuzzers can easily figure it out and generate inputs that cover all the branches.
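The splitting idea can be sketched roughly as follows. This is an illustration, not the harness in this PR: the `SEPARATOR` value and all function names are hypothetical, and `hashlib.sha256` stands in for both the hash under test and the reference, since only the write pattern matters here.

```python
import hashlib

# Hypothetical separator; the actual constant used in the PR is not shown here.
SEPARATOR = b"\xff\xfe"


def split_into_writes(fuzz_input):
    """Split the fuzz input on the constant separator; chunks may be empty."""
    return fuzz_input.split(SEPARATOR)


def streamed_digest(chunks):
    """Stand-in for the hash under test: one write call per chunk."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.digest()


def reference_digest(chunks):
    """Stand-in for the reference: hashes the concatenated chunks in one shot."""
    return hashlib.sha256(b"".join(chunks)).digest()


def fuzz_one(fuzz_input):
    """One fuzz iteration: both sides must agree on the same logical message."""
    chunks = split_into_writes(fuzz_input)
    assert streamed_digest(chunks) == reference_digest(chunks)
```

Two adjacent separators produce an empty chunk, which is how the empty writes come about. The separator bytes themselves are dropped from the message, so both sides hash the concatenation of the chunks rather than the raw fuzz input.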
Any feedback is welcome :)