Reduce wasted pseudorandom bytes in ChaCha20 + various improvements #26153

pull sipa wants to merge 11 commits into bitcoin:master from sipa:202209_chacha20 changing 15 files +582 −225
  1. sipa commented at 11:22 pm on September 21, 2022: member

    This is an alternative to #25354 (by my benchmarking, somewhat faster), subsumes #25712, and adds additional test vectors.

    It separates the multiple-of-64-bytes-only “core” logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20’s new built-in caching.

    I thought about rebasing #25712 on top of this, but the changes before are fairly extensive, so redid it instead.

  2. sipa force-pushed on Sep 21, 2022
  3. sipa force-pushed on Sep 21, 2022
  4. DrahtBot commented at 6:12 am on September 22, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ajtowns, dhruv
    Concept ACK theStack

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #25325 (Add pool based memory resource by martinus)
    • #24748 (test/BIP324: functional tests for v2 P2P encryption by stratospher)
    • #23169 (fix initialization in FastRandomContext: removes undefined behavior & uninitialized read by martinus)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. fanquake commented at 8:41 am on September 22, 2022: member

    cc @theStack re #25698 & #25712.

    (by my benchmarking, somewhat faster), @aureleoules want to benchmark here as well?

  6. in src/crypto/chacha20.h:53 in d1aa65b292 outdated
    56-    ChaCha20();
    57-    ChaCha20(const unsigned char* key, size_t keylen);
    58-    void SetKey(const unsigned char* key, size_t keylen); //!< set key with flexible keylength; 256bit recommended */
    59-    void SetIV(uint64_t iv); // set the 64bit nonce
    60-    void Seek(uint64_t pos); // set the 64bit block counter
    61+    ChaCha20() : m_aligned() {}
    


    aureleoules commented at 9:22 am on September 22, 2022:

    nit: can be removed


    sipa commented at 6:45 pm on September 22, 2022:
    Fixed.
  7. aureleoules commented at 9:34 am on September 22, 2022: member

    PR

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.58 387,995,472.42 0.5% 16.98 5.93 2.866 0.03 0.0% 0.03 CHACHA20_1MB
    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.57 389,442,111.22 0.1% 16.98 5.89 2.883 0.03 0.0% 0.03 CHACHA20_1MB

    Master

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.57 389,658,756.50 0.4% 17.03 5.89 2.892 0.05 0.0% 0.03 CHACHA20_1MB
    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.56 390,349,081.17 0.3% 17.03 5.87 2.900 0.05 0.0% 0.03 CHACHA20_1MB

    #25354

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.89 346,242,492.84 0.4% 20.13 6.62 3.039 0.08 0.0% 0.03 CHACHA20_1MB
    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    2.88 346,669,699.46 0.6% 20.13 6.59 3.052 0.08 0.0% 0.03 CHACHA20_1MB

    Results seem very close to master and definitely faster than #25354.

  8. sipa force-pushed on Sep 22, 2022
  9. sipa force-pushed on Sep 26, 2022
  10. jarolrod commented at 6:48 pm on September 27, 2022: member

    noticed something interesting, this PR is faster than master and #25354 when I tested on linux x86 but not on macOS arm64. When testing on macOS arm64, #25354 is faster:

    macOS arm64

    pr

     0|             ns/byte |              byte/s |    err% |     total | benchmark
     1|--------------------:|--------------------:|--------:|----------:|:----------
     2|                1.45 |      687,478,118.34 |    0.1% |      0.02 | `CHACHA20_1MB`
     3|                1.48 |      676,913,544.12 |    0.4% |      0.01 | `CHACHA20_256BYTES`
     4|                1.53 |      652,987,660.64 |    0.3% |      0.01 | `CHACHA20_64BYTES`
     5|                4.12 |      242,489,722.97 |    0.8% |      0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
     6|                2.05 |      488,239,791.48 |    0.0% |      0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
     7|                5.16 |      193,768,633.08 |    0.3% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
     8|                2.61 |      383,482,499.71 |    1.1% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
     9|                8.47 |      118,017,131.11 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
    10|                4.27 |      234,365,831.19 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
    

    master

     0|             ns/byte |              byte/s |    err% |     total | benchmark
     1|--------------------:|--------------------:|--------:|----------:|:----------
     2|                1.48 |      675,393,000.64 |    0.1% |      0.02 | `CHACHA20_1MB`
     3|                1.51 |      661,164,154.45 |    0.7% |      0.01 | `CHACHA20_256BYTES`
     4|                1.58 |      632,796,536.80 |    0.4% |      0.01 | `CHACHA20_64BYTES`
     5|                4.18 |      239,082,464.13 |    0.3% |      0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
     6|                2.09 |      478,583,295.30 |    0.3% |      0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
     7|                5.22 |      191,467,070.40 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
     8|                2.64 |      379,449,038.71 |    0.5% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
     9|                8.32 |      120,243,551.71 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
    10|                4.19 |      238,565,070.80 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
    

    25354

     0|             ns/byte |              byte/s |    err% |     total | benchmark
     1|--------------------:|--------------------:|--------:|----------:|:----------
     2|                1.45 |      689,833,570.39 |    0.1% |      0.02 | `CHACHA20_1MB`
     3|                1.46 |      683,001,138.22 |    0.5% |      0.01 | `CHACHA20_256BYTES`
     4|                1.50 |      667,103,331.45 |    0.4% |      0.01 | `CHACHA20_64BYTES`
     5|                4.11 |      243,166,834.68 |    0.4% |      0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
     6|                2.05 |      488,903,601.82 |    0.1% |      0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
     7|                5.10 |      196,238,854.79 |    0.3% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
     8|                2.61 |      383,104,484.46 |    1.4% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
     9|                8.02 |      124,743,480.76 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
    10|                4.05 |      246,740,398.27 |    0.1% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
    

    Linux x86

    pr

     0|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
     1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
     2|                1.99 |      502,338,808.12 |    0.6% |           16.75 |            4.80 |  3.490 |           0.02 |    0.0% |      0.02 | `CHACHA20_1MB`
     3|                2.03 |      492,836,102.56 |    1.4% |           17.35 |            4.93 |  3.519 |           0.07 |    0.1% |      0.01 | `CHACHA20_256BYTES`
     4|                2.14 |      466,586,919.27 |    0.4% |           19.14 |            5.33 |  3.591 |           0.23 |    0.0% |      0.01 | `CHACHA20_64BYTES`
     5|                5.46 |      183,057,234.02 |    0.0% |           51.50 |           13.59 |  3.791 |           0.16 |    0.0% |      0.06 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
     6|                2.73 |      366,170,466.28 |    0.0% |           25.75 |            6.79 |  3.790 |           0.08 |    0.0% |      0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
     7|                7.45 |      134,198,234.80 |    0.0% |           70.04 |           18.55 |  3.776 |           1.43 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
     8|                3.74 |      267,643,600.56 |    0.0% |           35.06 |            9.30 |  3.770 |           0.71 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
     9|               13.01 |       76,836,476.82 |    0.1% |          122.10 |           32.39 |  3.770 |           5.02 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
    10|                6.94 |      144,054,781.63 |    0.5% |           61.21 |           16.39 |  3.735 |           2.49 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
    

    master

     0|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
     1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
     2|                2.00 |      499,954,466.20 |    0.3% |           16.88 |            4.84 |  3.486 |           0.03 |    0.0% |      0.02 | `CHACHA20_1MB`
     3|                2.05 |      486,935,198.75 |    1.4% |           17.37 |            4.97 |  3.496 |           0.05 |    0.0% |      0.01 | `CHACHA20_256BYTES`
     4|                2.22 |      450,220,269.82 |    0.4% |           18.86 |            5.37 |  3.512 |           0.13 |    0.0% |      0.01 | `CHACHA20_64BYTES`
     5|                5.68 |      176,107,008.87 |    0.2% |           51.76 |           13.70 |  3.777 |           0.19 |    0.0% |      0.07 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
     6|                2.83 |      353,532,983.23 |    0.1% |           25.88 |            6.85 |  3.779 |           0.09 |    0.0% |      0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
     7|                7.72 |      129,486,776.95 |    0.2% |           72.62 |           18.74 |  3.875 |           2.06 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
     8|                3.89 |      257,245,464.66 |    0.3% |           36.35 |            9.40 |  3.865 |           1.02 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
     9|               13.97 |       71,562,220.46 |    0.2% |          135.22 |           33.84 |  3.995 |           7.67 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
    10|                7.05 |      141,907,027.48 |    0.4% |           67.76 |           17.04 |  3.977 |           3.81 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
    

    25354

     0|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
     1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
     2|                1.98 |      505,256,746.75 |    0.1% |           16.98 |            4.80 |  3.540 |           0.05 |    0.0% |      0.02 | `CHACHA20_1MB`
     3|                2.06 |      484,794,937.08 |    0.2% |           17.46 |            5.00 |  3.493 |           0.08 |    0.0% |      0.01 | `CHACHA20_256BYTES`
     4|                2.22 |      449,753,059.35 |    0.3% |           18.89 |            5.38 |  3.511 |           0.17 |    0.0% |      0.01 | `CHACHA20_64BYTES`
     5|                5.66 |      176,546,195.18 |    0.7% |           51.97 |           13.63 |  3.813 |           0.22 |    0.0% |      0.07 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
     6|                2.81 |      355,375,284.35 |    0.1% |           25.99 |            6.81 |  3.817 |           0.11 |    0.0% |      0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
     7|                8.04 |      124,406,853.03 |    0.4% |           73.18 |           19.44 |  3.765 |           2.14 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
     8|                4.03 |      248,427,666.34 |    0.3% |           36.63 |            9.74 |  3.760 |           1.07 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
     9|               15.09 |       66,259,474.70 |    0.4% |          136.82 |           36.48 |  3.751 |           7.92 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
    10|                7.53 |      132,846,610.84 |    0.4% |           68.57 |           18.20 |  3.768 |           3.94 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`
    
  11. sipa commented at 7:06 pm on September 27, 2022: member
    @jarolrod It’s possible that these differences are just compiler/cpu cache randomness. E.g. just having functions be layed out slightly differently in memory can cause a difference in slight timing differences. Objectively, this PR does less work per byte for long messages than master, while 25354 does more work per byte, but that’s not always possible to benchmark when you’re talking about a difference of a few CPU cycles per 64 bytes. The compiler may in some cases just emit slightly better or slightly worse code, just due to e.g. alignment differences in where the functions end up in the binary and the CPU caching effects.
  12. theStack commented at 10:30 pm on October 2, 2022: contributor
    Concept ACK
  13. DrahtBot added the label Needs rebase on Oct 19, 2022
  14. sipa force-pushed on Oct 19, 2022
  15. sipa commented at 5:54 pm on October 19, 2022: member
    Rebased.
  16. DrahtBot removed the label Needs rebase on Oct 19, 2022
  17. sipa force-pushed on Oct 19, 2022
  18. DrahtBot added the label Needs rebase on Nov 19, 2022
  19. sipa force-pushed on Nov 19, 2022
  20. DrahtBot removed the label Needs rebase on Nov 19, 2022
  21. DrahtBot added the label Needs rebase on Nov 21, 2022
  22. sipa force-pushed on Nov 21, 2022
  23. DrahtBot removed the label Needs rebase on Nov 21, 2022
  24. DrahtBot added the label Needs rebase on Nov 30, 2022
  25. sipa force-pushed on Dec 5, 2022
  26. sipa commented at 7:23 pm on December 5, 2022: member
    Rebased.
  27. DrahtBot removed the label Needs rebase on Dec 5, 2022
  28. w0xlt commented at 7:13 pm on December 15, 2022: contributor

    PR

    ns/byte byte/s err% total benchmark
    0.91 1,094,178,058.96 0.5% 0.01 CHACHA20_1MB
    0.94 1,065,805,397.14 0.3% 0.01 CHACHA20_256BYTES
    1.03 969,669,114.48 0.4% 0.01 CHACHA20_64BYTES
    2.65 376,719,455.06 0.3% 0.03 CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT
    1.33 752,662,840.82 0.4% 0.02 CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT
    3.66 273,544,917.38 0.4% 0.01 CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT
    1.84 543,100,043.35 0.5% 0.01 CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT
    6.58 151,969,296.31 1.5% 0.01 CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT
    3.28 304,787,007.02 0.4% 0.01 CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT

    master

    ns/byte byte/s err% total benchmark
    0.92 1,092,717,412.60 0.1% 0.01 CHACHA20_1MB
    0.93 1,070,541,510.75 0.0% 0.01 CHACHA20_256BYTES
    1.01 991,308,062.99 0.9% 0.01 CHACHA20_64BYTES
    2.65 377,589,142.93 0.2% 0.03 CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT
    1.32 755,706,485.70 0.0% 0.02 CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT
    3.56 280,597,405.87 0.0% 0.01 CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT
    1.79 559,191,860.78 0.0% 0.01 CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT
    6.43 155,424,293.51 0.0% 0.01 CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT
    3.23 309,596,620.11 0.6% 0.01 CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT
  29. in src/crypto/chacha20.cpp:308 in 4d85d8483b outdated
    310+            c[i] = m[i] ^ m_buffer[64 - m_bufleft + i];
    311+        }
    312+        m_bufleft -= reuse;
    313+        bytes -= reuse;
    314+        c += reuse;
    315+        m += reuse;
    


    aureleoules commented at 1:36 pm on December 22, 2022:
    I checked the test coverage of this PR and this does not seem to be tested (https://btc-coverage.s3.eu-west-1.amazonaws.com/PR-26153/src/crypto/chacha20.cpp.gcov.html). Is it possible to add a test?

    sipa commented at 9:27 pm on January 18, 2023:
    @aureleoules Added a commit to that effect, can you check it’s now covered?

    aureleoules commented at 10:52 am on January 19, 2023:
    I updated the coverage (same url), and it’s well covered now. Well done.
  30. sipa force-pushed on Jan 18, 2023
  31. in src/crypto/chacha20.cpp:299 in 013c0dd4cc outdated
    301+    }
    302+}
    303+
    304+void ChaCha20::Crypt(const unsigned char* m, unsigned char* c, size_t bytes)
    305+{
    306+    unsigned reuse = std::min<size_t>(m_bufleft, bytes);
    


    ajtowns commented at 8:54 pm on January 19, 2023:
    nit: reuse should be inside the if (m_bufleft) here as well?

    sipa commented at 10:26 pm on January 19, 2023:
    Done.
  32. in src/test/util/xoroshiro128plusplus.h:67 in 3e916f17d1 outdated
    62+        m_s1 = rotl(s1, 28);
    63+        return result;
    64+    }
    65+
    66+    static constexpr result_type min() noexcept { return std::numeric_limits<uint64_t>::min(); }
    67+    static constexpr result_type max() noexcept { return std::numeric_limits<uint64_t>::max(); }
    


    ajtowns commented at 9:03 pm on January 19, 2023:
    std::numeric_limits<result_type>::max() ?

    sipa commented at 10:26 pm on January 19, 2023:
    Done.
  33. ajtowns approved
  34. ajtowns commented at 9:59 pm on January 19, 2023: contributor
    ACK 4d85d8483bbc8fdc8338c1b2ebe62c1ababc85ca – code review, tests pass at each commit
  35. sipa force-pushed on Jan 19, 2023
  36. ajtowns commented at 11:37 pm on January 19, 2023: contributor
    ACK aca35e568239e02673e23f38c0fad58fc4911fd4
  37. fanquake commented at 10:41 am on January 20, 2023: member
    @dhruv @real-or-random @jonasnick might be interested in reviewing here?
  38. dhruv commented at 4:47 pm on January 20, 2023: contributor
    tACK that the usage of partial ChaCha20 blocks doesn’t advance the block counter anymore when using the unaligned version: The evidence is that the downstream BIP324 PRs #25361, #23233 and #24545, based on this PR have passing test vectors. @fanquake I can review this in the coming days too, but I don’t know anything about MuHash3072 and xoroshiro128++ yet so it’ll take some time. I’ll do the partial review first.
  39. in src/crypto/chacha20.cpp:274 in b56981e246 outdated
    297         c += 64;
    298         m += 64;
    299     }
    300 }
    301+
    302+void ChaCha20::Keystream(unsigned char* c, size_t bytes)
    


    dhruv commented at 6:13 pm on January 23, 2023:
    Would it be valuable to inline the unaligned variant functions too?

    sipa commented at 6:38 pm on January 30, 2023:
    No, they’re never called within this compilation unit, so there shouldn’t be any place they even can be inlined?
  40. in src/test/fuzz/crypto_chacha20.cpp:68 in 167a829147 outdated
    62+    unsigned char key[32] = {0};
    63+    auto key_bytes = provider.ConsumeBytes<unsigned char>(32);
    64+    std::copy(key_bytes.begin(), key_bytes.end(), key);
    65+    uint64_t iv = provider.ConsumeIntegral<uint64_t>();
    66+    uint64_t total_bytes = provider.ConsumeIntegralInRange<uint64_t>(0, 1000000);
    67+    uint64_t seek = provider.ConsumeIntegralInRange<uint64_t>(0, ~(total_bytes >> 6));
    


    dhruv commented at 7:07 pm on January 23, 2023:

    I’m a little confused by what ~(total_bytes >> 6) is trying to do here. I understand the converting bytes to blocks, but then what’s the bitwise not for? Wouldn’t it result in seeking way past the total_bytes?

    The test of course still works since the seeking is consistent across both instances. I just don’t understand the intent yet.


    dhruv commented at 5:18 am on January 24, 2023:

    Oh, I just got it. That term + the number of blocks for total_bytes will equal 2^64-1 so it is indeed the max seek position.

    A comment might help future readers.


    sipa commented at 7:01 pm on January 30, 2023:
    I added a comment.
  41. in src/test/crypto_tests.cpp:151 in 376c9284ee outdated
    148@@ -149,7 +149,7 @@ static void TestChaCha20(const std::string &hex_message, const std::string &hexk
    149     } else {
    150         rng.Keystream(outres.data(), outres.size());
    151     }
    152-    BOOST_CHECK(out == outres);
    153+    BOOST_CHECK_EQUAL(hexout, HexStr(outres));
    


    dhruv commented at 7:58 pm on January 23, 2023:
    Can we eliminate the out variable now by using hexout.size() / 2 instead of out.size()?
  42. in src/test/crypto_tests.cpp:165 in aca35e5682 outdated
    160@@ -161,6 +161,27 @@ static void TestChaCha20(const std::string &hex_message, const std::string &hexk
    161         }
    162         BOOST_CHECK_EQUAL(hexout, HexStr(outres));
    163     }
    164+
    165+    // Repeat 10x, but fragmented into 3 chunks, to exercise the ChaCha20 class's caching.
    166+    for (int i = 0; i < 10; ++i) {
    


    dhruv commented at 8:22 pm on January 23, 2023:

    Out of curiosity, since every loop re-seeks to seek, why do we repeat this 10 times? Is that primarily for different chunks?

    Separately, is non-determinism acceptable in unit tests? Is there a way to reproduce failures?


    sipa commented at 7:02 pm on January 30, 2023:

    Out of curiosity, since every loop re-seeks to seek, why do we repeat this 10 times? Is that primarily for different chunks?

    Just because it’s very easy to do, and may result in more interesting cases being hit.

    Separately, is non-determinism acceptable in unit tests? Is there a way to reproduce failures?

    The unit tests’ RNG is initialized deterministically, so there shouldn’t be any non-determinism.

  43. in src/test/fuzz/crypto_chacha20.cpp:133 in 167a829147 outdated
    127+            crypt2.Crypt(data2.data() + bytes2, data2.data() + bytes2, now);
    128+        } else {
    129+            crypt2.Keystream(data2.data() + bytes2, now);
    130+        }
    131+        bytes2 += now;
    132+        if (is_last) break;
    


    dhruv commented at 9:53 pm on January 23, 2023:
    0        if (is_last || bytes2 == total_bytes) break;
    

    dhruv commented at 9:55 pm on January 23, 2023:
    If it’s useful to test with 0 bytes as a parameter, maybe bool is_last = (iter == 255) || (bytes2 == total_bytes) || provider.ConsumeBool(); above?

    sipa commented at 7:02 pm on January 30, 2023:
    I’ve added that last suggestion.
  44. dhruv commented at 10:19 pm on January 23, 2023: contributor

    crACK tACK aca35e568239e02673e23f38c0fad58fc4911fd4 with optional comments and a couple questions.

    I rebased #25361 #23233 #23561 and #24545 on this commit and all tests passed (which includes the motivating test cases for this PR wrt the partial ChaCha20 blocks)

    I did not code review the XoRoShiRo128++ implementation.

  45. real-or-random commented at 11:11 am on January 24, 2023: contributor

    @dhruv @real-or-random @jonasnick might be interested in reviewing here?

    I can review it, but I’ll probably find time only in February. (I don’t know if it makes sense to wait for me or not.)

  46. sipa force-pushed on Jan 30, 2023
  47. Split ChaCha20 into aligned/unaligned variants e37bcaa0a6
  48. Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 6babf40213
  49. Make unrestricted ChaCha20 cipher not waste keystream bytes
    Co-authored-by: dhruv <856960+dhruv@users.noreply.github.com>
    12ff72476a
  50. Add xoroshiro128++ PRNG
    Xoroshiro128++ is a fast non-cryptographic random generator.
    Reference implementation is available at https://prng.di.unimi.it/
    
    Co-Authored-By: Pieter Wuille <pieter@wuille.net>
    5f05b27841
  51. Add fuzz test for testing that ChaCha20 works as a stream 38eaece67b
  52. Use ChaCha20 caching in FastRandomContext 5d16f75763
  53. Use ChaCha20Aligned in MuHash3072 code f21994a02e
  54. Only support 32-byte keys in ChaCha20{,Aligned} 62ec713961
  55. Inline ChaCha20 32-byte specific constants 93aee8bbda
  56. Improve test vectors for ChaCha20 fb243d25f7
  57. sipa force-pushed on Jan 30, 2023
  58. Add unit test for ChaCha20's new caching 511aa4f1c7
  59. sipa force-pushed on Jan 31, 2023
  60. ajtowns commented at 3:34 am on February 7, 2023: contributor
    ut reACK 511aa4f1c7508f15cab8d7e58007900ad6fd3d5d
  61. MarcoFalke requested review from dhruv on Feb 7, 2023
  62. dhruv commented at 6:21 pm on February 7, 2023: contributor

    tACK crACK 511aa4f1c7

    Reviewed git range-diff ceb74b844c aca35e5 511aa4f1c7 Downstream BIP324 branches include 511aa4f1c7 with green CI

  63. fanquake merged this on Feb 15, 2023
  64. fanquake closed this on Feb 15, 2023

  65. fanquake commented at 2:59 pm on February 15, 2023: member
    Going to move this forward. @real-or-random would still be great for you to leave any post-merge commentary, if you get the time.
  66. sidhujag referenced this in commit fa531606e4 on Feb 15, 2023
  67. in src/crypto/muhash.cpp:302 in f21994a02e outdated
    298@@ -299,7 +299,7 @@ Num3072 MuHash3072::ToNum3072(Span<const unsigned char> in) {
    299     unsigned char tmp[Num3072::BYTE_SIZE];
    300 
    301     uint256 hashed_in{(HashWriter{} << in).GetSHA256()};
    302-    ChaCha20(hashed_in.data(), hashed_in.size()).Keystream(tmp, Num3072::BYTE_SIZE);
    303+    ChaCha20Aligned(hashed_in.data(), hashed_in.size()).Keystream64(tmp, Num3072::BYTE_SIZE / 64);
    


    real-or-random commented at 11:32 am on March 8, 2023:
    0    static_assert(Num3072::BYTE_SIZE % 64 == 0);
    1    ChaCha20Aligned(hashed_in.data(), hashed_in.size()).Keystream64(tmp, Num3072::BYTE_SIZE / 64);
    

    sipa commented at 6:37 pm on July 18, 2023:
    Done in #28100.
  68. in src/crypto/chacha20.h:66 in 12ff72476a outdated
    62+        m_aligned.SetKey(key, keylen);
    63+        m_bufleft = 0;
    64+    }
    65 
    66     /** set the 64-bit nonce. */
    67     void SetIV(uint64_t iv) { m_aligned.SetIV(iv); }
    


    real-or-random commented at 11:51 am on March 8, 2023:

    (not this PR)

    Should SetIV() also set m_bufleft = 0? If the caller sets an IV without seeking, and we have bytes left in the buffer, then the caller will get those remaining bytes which belong to the wrong buffer.

    I mean, setting an IV without seeking is probably not a good idea anyway, and none of the callers currently do this. But I feel that’s something that then should be excluded by the interface of the class, i.e., SetIV() and SetKey() should ideally seek to 0 (and perhaps be renamed to something line SetIvAndSeek0 or SetIvAndReset.


    sipa commented at 6:37 pm on July 18, 2023:
    I had forgotten about this PR comment, but this was effectively done in #27985.
  69. in src/crypto/chacha20.cpp:43 in 511aa4f1c7
    67 
    68-ChaCha20::ChaCha20()
    69+ChaCha20Aligned::ChaCha20Aligned()
    70 {
    71     memset(input, 0, sizeof(input));
    72 }
    


    real-or-random commented at 11:53 am on March 8, 2023:

    (not this PR)

    But I think initializing with an effective zero key should not be allowed by the interface of the class.


    sipa commented at 6:36 pm on July 18, 2023:
    Done in #28100.
  70. in src/crypto/chacha20.h:63 in 511aa4f1c7
    66+    /** set 32-byte key. */
    67+    void SetKey32(const unsigned char* key32)
    68+    {
    69+        m_aligned.SetKey32(key32);
    70+        m_bufleft = 0;
    71+    }
    


    real-or-random commented at 11:57 am on March 8, 2023:

    Since this is a rekeying operation, consider memory_cleanseing the cache buffer. (It will probably be overwritten soon anyway. but SetKey() should not happen frequently, so we don’t need to worry about performance.)

    I guess there should also be destructor that does the same. And (not this PR), a destructor of ChaCha20Aligned should memory_cleanse the key.


    sipa commented at 6:36 pm on July 18, 2023:
    Done in #28100.
  71. in src/crypto/chacha20.h:41 in 511aa4f1c7
    38+    void Keystream64(unsigned char* c, size_t blocks);
    39+
    40+    /** enciphers the message <input> of length <64*blocks> and write the enciphered representation into <output>
    41+     *  Used for encryption and decryption (XOR)
    42+     */
    43+    void Crypt64(const unsigned char* input, unsigned char* output, size_t blocks);
    


    real-or-random commented at 12:01 pm on March 8, 2023:

    (not this PR)

    nit: I’d prefer an interface that distinguishes between encryption and decryption. That makes the code easier to change if it will be reused for other ciphers.


    sipa commented at 6:36 pm on July 18, 2023:
    Done in #28100.
  72. real-or-random commented at 12:02 pm on March 8, 2023: contributor
    Here are some post-merge comments. None of them are critical. Sorry, many of them are not directly related to the changes in this PR, but I felt it’s still easier that I add all of them to this review.
  73. fanquake commented at 12:04 pm on March 8, 2023: member
    @real-or-random thanks for circling around to this post-merge.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-26 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me