[crypto] Reduce wasted pseudorandom bytes in ChaCha20 #25354

pull dhruv wants to merge 1 commits into bitcoin:master from dhruv:chacha20-partial-blocks changing 5 files +109 −330
  1. dhruv commented at 3:41 am on June 13, 2022: contributor

    The current implementation of ChaCha20 advances the block even when the last pseudorandom block of 64 bytes is only partially used. This code buffers and reuses those pseudorandom bytes.

    This improvement is relevant to the new design of BIP324 where:

    • One instance of ChaCha20(for encrypting the length) uses only 3 pseudorandom bytes per message, so throwing away 61/64 bytes would be a 95% waste.
    • There are efficiencies to be had from being able to send a vector of plaintexts to encrypt rather than creating a concatenated buffer, but that means the block cannot be advanced in between calls.

    Unfortunately, this also means that our implementation is no longer identical to the reference implementation by Daniel J Bernstein (if I had to guess though, closer to his intention for a stream cipher), and the differential fuzz test against his implementation from #22704 no longer makes sense (if it’s not against the pristine version, it does not serve the original purpose)

  2. DrahtBot added the label Build system on Jun 13, 2022
  3. DrahtBot added the label Utils/log/libs on Jun 13, 2022
  4. laanwj commented at 1:28 pm on June 13, 2022: member

    if I had to guess though, closer to his intention for a stream cipher

    Yes, this behavior sounds really really strange. The key stream bytes you get shouldn’t depend on which quantities you request them in? I wonder why the reference implementation does this.

  5. dhruv commented at 3:12 pm on June 13, 2022: contributor
    @laanwj I don’t have access to DJB but I have tried to ask him upon your prompt: https://twitter.com/dhruv/status/1536354037552951296
  6. sipa commented at 3:59 pm on June 13, 2022: member

    Concept ACK on changing this; it much more closely matches the expectation of a stream cipher interface, especially as the specification does not say anything about skipping bytes.

    That said, I think that wondering about the origin of this is mostly pointless apart from historical curiosity. It’s an implementation issue, which seems to have leaked into the protocol definitions of things that have been built on top (incl. RFC8439 itself, which skips the second 32 byte of the chacha20 stream in the chacha20poly1305 cipher). Any protocol relying on it obviously can’t just swap out this implementation as it’d be a protocol break, but other than that, either behavior can be implemented in function of either implementation (add a buffer on top in one direction, or introduce explicit skipping steps in the other).

  7. real-or-random commented at 4:59 pm on June 13, 2022: contributor
    It’s just the API. See the comment starting with “Encryption/decryption of arbitrary length messages” in https://web.archive.org/web/20170119113129/http://www.ecrypt.eu.org/stream/svn/viewcvs.cgi/ecrypt/trunk/api/skel/ecrypt-sync.h?view=auto
  8. sipa commented at 5:11 pm on June 13, 2022: member

    It’s just the API. See the comment starting with “Encryption/decryption of arbitrary length messages” in https://web.archive.org/web/20170119113129/http://www.ecrypt.eu.org/stream/svn/viewcvs.cgi/ecrypt/trunk/api/skel/ecrypt-sync.h?view=auto

    but he is NOT allowed to make additional encryption calls once he

    • has called ECRYPT_encrypt_bytes() (unless he starts a new message
    • of course).

    Right, that explains. It’s still a bit bizarre that the function even bothers updating the position counter in the state then, but there’s nothing wrong with it.

  9. dhruv commented at 5:13 pm on June 13, 2022: contributor
    @real-or-random Nice detective work! Thank you.
  10. laanwj commented at 8:10 pm on June 13, 2022: member

    That said, I think that wondering about the origin of this is mostly pointless apart from historical curiosity. It’s an implementation issue

    Right. I’m happy that we found this issue before BIP324 was deployed. It would have been virtually impossible to make this optimization later.

  11. sipa commented at 10:19 pm on June 13, 2022: member
    I wrote a fuzz test for verifying that invoking the ChaCha20::Crypto() and/or ChaCha20::Keystream() function multiple times concatenated works identical to calling it once for the whole block: https://github.com/sipa/bitcoin/commits/pr25354. Feel free to steal/include/incorporate.
  12. dhruv commented at 3:24 am on June 14, 2022: contributor

    I wrote a fuzz test for verifying that invoking the ChaCha20::Crypto() and/or ChaCha20::Keystream() function multiple times concatenated works identical to calling it once for the whole block: https://github.com/sipa/bitcoin/commits/pr25354. Feel free to steal/include/incorporate.

    Thanks, @sipa! Yes, I will incorporate it once I’m done refreshing all BIP324 PRs to the new spec in the next few days.

  13. DrahtBot commented at 8:39 am on June 15, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26153 (Reduce wasted pseudorandom bytes in ChaCha20 + various improvements by sipa)
    • #25712 (crypto: drop 16 byte key support for ChaCha20 by theStack)
    • #25698 (crypto: avoid potential buffer overread in ChaCha20::SetKey by theStack)
    • #25695 (tidy: add modernize-use-using by fanquake)
    • #25361 (BIP324: Cipher suite by dhruv)
    • #25172 (refactor: use std:: prefix for std lib funcs by fanquake)
    • #23561 (BIP324: Handshake prerequisites by dhruv)
    • #23322 ([Fuzz] Poly1305 differential fuzzing against Floodyberry’s implementation by prakash1512)
    • #23233 (BIP324: Add encrypted p2p transport {de}serializer by dhruv)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  14. Reduce wasted pseudorandom bytes in ChaCha20 70d2a4582c
  15. dhruv force-pushed on Jun 25, 2022
  16. dhruv commented at 3:48 am on June 25, 2022: contributor
    Rebased with master for other branches in the project
  17. in src/Makefile.test.include:249 in 70d2a4582c
    245@@ -246,7 +246,6 @@ test_fuzz_fuzz_SOURCES = \
    246  test/fuzz/crypto_chacha20.cpp \
    247  test/fuzz/crypto_chacha20_poly1305_aead.cpp \
    248  test/fuzz/crypto_common.cpp \
    249- test/fuzz/crypto_diff_fuzz_chacha20.cpp \
    


    MarcoFalke commented at 6:45 am on June 27, 2022:

    nit:

     0diff --git a/test/sanitizer_suppressions/ubsan b/test/sanitizer_suppressions/ubsan
     1index 67ef512895..e698ffce53 100644
     2--- a/test/sanitizer_suppressions/ubsan
     3+++ b/test/sanitizer_suppressions/ubsan
     4@@ -19,7 +19,6 @@ unsigned-integer-overflow:bench/bench.h
     5 unsigned-integer-overflow:FuzzedDataProvider.h
     6 unsigned-integer-overflow:leveldb/
     7 unsigned-integer-overflow:minisketch/
     8-unsigned-integer-overflow:test/fuzz/crypto_diff_fuzz_chacha20.cpp
     9 implicit-integer-sign-change:*/include/boost/
    10 implicit-integer-sign-change:*/include/c++/
    11 implicit-integer-sign-change:*/new_allocator.h
    
  18. theStack commented at 1:54 pm on July 27, 2022: contributor

    Concept ACK

    In order to increase test coverage, I’d suggest to

    0            ...
    1            if (prev_block_start_pos >= 64) {
    2                prev_block_start_pos = 0;
    3            }
    4            if (bytes) continue;
    5            ...
    
    • add a corresponding test for ChaCha20::Crypt
  19. vincenzopalazzo commented at 10:52 pm on July 28, 2022: none
    Concept ACK
  20. in src/crypto/chacha20.cpp:66 in 70d2a4582c
    61 }
    62 
    63 ChaCha20::ChaCha20(const unsigned char* k, size_t keylen)
    64 {
    65     SetKey(k, keylen);
    66+    prev_block_start_pos = 0;
    


    stratospher commented at 8:03 am on September 18, 2022:
    70d2a45: could do memset(prev_block_bytes, 0, sizeof(prev_block_bytes)); here too.

    aureleoules commented at 10:44 am on September 21, 2022:
    prev_block_start_pos already initialized in class
  21. aureleoules commented at 9:59 am on September 21, 2022: member

    PR

    ./src/bench/bench_bitcoin -filter='CHACHA20_1MB'

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.89 346,242,492.84 0.4% 20.13 6.62 3.039 0.08 0.0% 0.03 CHACHA20_1MB
    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.88 346,669,699.46 0.6% 20.13 6.59 3.052 0.08 0.0% 0.03 CHACHA20_1MB

    Master

    ./src/bench/bench_bitcoin -filter='CHACHA20_1MB'

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.56 390,349,081.17 0.3% 17.03 5.87 2.900 0.05 0.0% 0.03 CHACHA20_1MB
    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.57 389,658,756.50 0.4% 17.03 5.89 2.892 0.05 0.0% 0.03 CHACHA20_1MB

    is this implementation slower?

  22. in src/crypto/chacha20.cpp:60 in 70d2a4582c
    55 
    56 ChaCha20::ChaCha20()
    57 {
    58     memset(input, 0, sizeof(input));
    59+    memset(prev_block_bytes, 0, sizeof(prev_block_bytes));
    60+    prev_block_start_pos = 0;
    


    aureleoules commented at 10:43 am on September 21, 2022:
    prev_block_start_pos already initialized in class
  23. dhruv commented at 5:49 pm on September 22, 2022: contributor

    @aureleoules

    You’re right, there’s a slowdown. However, the existing implementation is incorrect and skipping blocks will be ~20x slower for the BIP324 specification. Here are my numbers:

    master:

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    1.25 799,288,251.65 0.1% 17.33 4.74 3.654 0.05 0.0% 11.02 CHACHA20_1MB

    #25354 (This PR):

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    1.34 747,649,058.86 0.0% 19.94 5.07 3.933 0.08 0.0% 10.99 CHACHA20_1MB

    #26153 (Alternate implementation by @sipa):

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    1.25 802,562,761.09 0.3% 16.70 4.72 3.538 0.03 0.0% 10.98 CHACHA20_1MB

    Since the implementation by @sipa meets the non-inferiority bar on the benchmarks (corroborated by @aureleoules’ comment on #26153), I’m closing this PR and will shortly rebase all my branches for BIP324 to use #26153.

  24. dhruv closed this on Sep 22, 2022

  25. fanquake referenced this in commit 1e0198b6c1 on Feb 15, 2023
  26. bitcoin locked this on Sep 22, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-15 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me