The current implementation of ChaCha20 advances the block counter even when the last 64-byte pseudorandom block is only partially used, so the unused keystream bytes are discarded. This code buffers those leftover bytes and reuses them in subsequent calls.
This improvement is relevant to the new design of BIP324 where:
- One instance of ChaCha20 (for encrypting the length) uses only 3 pseudorandom bytes per message, so throwing away 61 of every 64 bytes would be a 95% waste.
- There are efficiencies to be had from being able to send a vector of plaintexts to encrypt rather than creating a concatenated buffer, but that means the block counter cannot be advanced in between calls.
Unfortunately, this also means that our implementation no longer behaves identically to the reference implementation by Daniel J. Bernstein (though, if I had to guess, it is closer to his intention for a stream cipher). The differential fuzz test against his implementation from #22704 therefore no longer makes sense: if it is not run against the pristine version, it does not serve its original purpose.
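
To illustrate the buffering idea, here is a minimal Python sketch. It is not the C++ code in this change. The class name `BufferedChaCha20` and its methods are hypothetical, and the block function assumes the original ChaCha20 layout with a 64-bit block counter and a 64-bit nonce. Leftover keystream bytes from a partially used block are kept and consumed by the next call, so the block counter only advances when a fresh block is actually needed.

```python
import struct

MASK32 = 0xFFFFFFFF

def _rotl32(v, c):
    return ((v << c) & MASK32) | (v >> (32 - c))

def _quarter_round(s, a, b, c, d):
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = _rotl32(s[b] ^ s[c], 7)

def chacha20_block(key, nonce, counter):
    """Return one 64-byte keystream block (32-byte key, 8-byte nonce)."""
    init = list(struct.unpack("<4I", b"expand 32-byte k"))
    init += struct.unpack("<8I", key)
    init += [counter & MASK32, (counter >> 32) & MASK32]
    init += struct.unpack("<2I", nonce)
    s = init[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        _quarter_round(s, 0, 4, 8, 12)
        _quarter_round(s, 1, 5, 9, 13)
        _quarter_round(s, 2, 6, 10, 14)
        _quarter_round(s, 3, 7, 11, 15)
        _quarter_round(s, 0, 5, 10, 15)
        _quarter_round(s, 1, 6, 11, 12)
        _quarter_round(s, 2, 7, 8, 13)
        _quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *[(x + y) & MASK32 for x, y in zip(s, init)])

class BufferedChaCha20:
    """Hypothetical wrapper that keeps unused keystream bytes between calls."""

    def __init__(self, key, nonce):
        self.key = key
        self.nonce = nonce
        self.counter = 0      # index of the next block to generate
        self.leftover = b""   # unused tail of the most recent block

    def keystream(self, n):
        # Serve as much as possible from the buffered tail first.
        out = bytearray(self.leftover[:n])
        self.leftover = self.leftover[len(out):]
        # Generate new blocks only when the buffer is exhausted.
        while len(out) < n:
            block = chacha20_block(self.key, self.nonce, self.counter)
            self.counter += 1
            need = n - len(out)
            out += block[:need]
            self.leftover = block[need:]
        return bytes(out)

    def crypt(self, data):
        ks = self.keystream(len(data))
        return bytes(a ^ b for a, b in zip(data, ks))

    def crypt_many(self, chunks):
        # Encrypt a list of plaintexts as one continuous stream,
        # without first concatenating them into a single buffer.
        return [self.crypt(chunk) for chunk in chunks]
```

With this sketch, encrypting `[a, b, c]` through `crypt_many` yields the same bytes as encrypting `a + b + c` in one call, and a stream of 3-byte length encryptions consumes a new block only about once every 21 messages instead of once per message.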