Reduce wasted pseudorandom bytes in ChaCha20 + various improvements

sipa commented at 11:22 pm on September 21, 2022: member

This is an alternative to #25354 (by my benchmarking, somewhat faster), subsumes #25712, and adds additional test vectors.

It separates the multiple-of-64-bytes-only “core” logic (which becomes simpler) from a layer around which performs caching/slicing to support arbitrary byte amounts. Both have their uses (in particular, the MuHash3072 code can benefit from multiple-of-64-bytes assumptions), plus the separation results in more readable code. Also, since FastRandomContext effectively had its own (more naive) caching on top of ChaCha20, that can be dropped in favor of ChaCha20’s new built-in caching.

I thought about rebasing #25712 on top of this, but the changes before are fairly extensive, so redid it instead.

sipa force-pushed on Sep 21, 2022

DrahtBot commented at 6:12 am on September 22, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	ajtowns, dhruv
Concept ACK	theStack

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#25325 (Add pool based memory resource by martinus)
#24748 (test/BIP324: functional tests for v2 P2P encryption by stratospher)
#23169 (fix initialization in FastRandomContext: removes undefined behavior & uninitialized read by martinus)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

fanquake commented at 8:41 am on September 22, 2022: member

cc @theStack re #25698 & #25712.

(by my benchmarking, somewhat faster), @aureleoules want to benchmark here as well?

in src/crypto/chacha20.h:53 in d1aa65b292 outdated

56-    ChaCha20();
57-    ChaCha20(const unsigned char* key, size_t keylen);
58-    void SetKey(const unsigned char* key, size_t keylen); //!< set key with flexible keylength; 256bit recommended */
59-    void SetIV(uint64_t iv); // set the 64bit nonce
60-    void Seek(uint64_t pos); // set the 64bit block counter
61+    ChaCha20() : m_aligned() {}

aureleoules commented at 9:22 am on September 22, 2022:

nit: can be removed

sipa commented at 6:45 pm on September 22, 2022:

Fixed.

aureleoules commented at 9:34 am on September 22, 2022: member

PR

ns/byte	byte/s	err%	ins/byte	cyc/byte	IPC	bra/byte	miss%	total	benchmark
2.58	387,995,472.42	0.5%	16.98	5.93	2.866	0.03	0.0%	0.03	`CHACHA20_1MB`

ns/byte	byte/s	err%	ins/byte	cyc/byte	IPC	bra/byte	miss%	total	benchmark
2.57	389,442,111.22	0.1%	16.98	5.89	2.883	0.03	0.0%	0.03	`CHACHA20_1MB`

Master

ns/byte	byte/s	err%	ins/byte	cyc/byte	IPC	bra/byte	miss%	total	benchmark
2.57	389,658,756.50	0.4%	17.03	5.89	2.892	0.05	0.0%	0.03	`CHACHA20_1MB`

ns/byte	byte/s	err%	ins/byte	cyc/byte	IPC	bra/byte	miss%	total	benchmark
2.56	390,349,081.17	0.3%	17.03	5.87	2.900	0.05	0.0%	0.03	`CHACHA20_1MB`

#25354

ns/byte	byte/s	err%	ins/byte	cyc/byte	IPC	bra/byte	miss%	total	benchmark
2.89	346,242,492.84	0.4%	20.13	6.62	3.039	0.08	0.0%	0.03	`CHACHA20_1MB`

ns/byte	byte/s	err%	ins/byte	cyc/byte	IPC	bra/byte	miss%	total	benchmark
2.88	346,669,699.46	0.6%	20.13	6.59	3.052	0.08	0.0%	0.03	`CHACHA20_1MB`

Results seem very close to master and definitely faster than #25354.

sipa force-pushed on Sep 22, 2022

sipa force-pushed on Sep 26, 2022

jarolrod commented at 6:48 pm on September 27, 2022: member

noticed something interesting, this PR is faster than master and #25354 when I tested on linux x86 but not on macOS arm64. When testing on macOS arm64, #25354 is faster:

macOS arm64

pr

 0|             ns/byte |              byte/s |    err% |     total | benchmark
 1|--------------------:|--------------------:|--------:|----------:|:----------
 2|                1.45 |      687,478,118.34 |    0.1% |      0.02 | `CHACHA20_1MB`
 3|                1.48 |      676,913,544.12 |    0.4% |      0.01 | `CHACHA20_256BYTES`
 4|                1.53 |      652,987,660.64 |    0.3% |      0.01 | `CHACHA20_64BYTES`
 5|                4.12 |      242,489,722.97 |    0.8% |      0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
 6|                2.05 |      488,239,791.48 |    0.0% |      0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
 7|                5.16 |      193,768,633.08 |    0.3% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
 8|                2.61 |      383,482,499.71 |    1.1% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
 9|                8.47 |      118,017,131.11 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
10|                4.27 |      234,365,831.19 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

master

 0|             ns/byte |              byte/s |    err% |     total | benchmark
 1|--------------------:|--------------------:|--------:|----------:|:----------
 2|                1.48 |      675,393,000.64 |    0.1% |      0.02 | `CHACHA20_1MB`
 3|                1.51 |      661,164,154.45 |    0.7% |      0.01 | `CHACHA20_256BYTES`
 4|                1.58 |      632,796,536.80 |    0.4% |      0.01 | `CHACHA20_64BYTES`
 5|                4.18 |      239,082,464.13 |    0.3% |      0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
 6|                2.09 |      478,583,295.30 |    0.3% |      0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
 7|                5.22 |      191,467,070.40 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
 8|                2.64 |      379,449,038.71 |    0.5% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
 9|                8.32 |      120,243,551.71 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
10|                4.19 |      238,565,070.80 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

25354

 0|             ns/byte |              byte/s |    err% |     total | benchmark
 1|--------------------:|--------------------:|--------:|----------:|:----------
 2|                1.45 |      689,833,570.39 |    0.1% |      0.02 | `CHACHA20_1MB`
 3|                1.46 |      683,001,138.22 |    0.5% |      0.01 | `CHACHA20_256BYTES`
 4|                1.50 |      667,103,331.45 |    0.4% |      0.01 | `CHACHA20_64BYTES`
 5|                4.11 |      243,166,834.68 |    0.4% |      0.05 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
 6|                2.05 |      488,903,601.82 |    0.1% |      0.02 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
 7|                5.10 |      196,238,854.79 |    0.3% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
 8|                2.61 |      383,104,484.46 |    1.4% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
 9|                8.02 |      124,743,480.76 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
10|                4.05 |      246,740,398.27 |    0.1% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

Linux x86

pr

 0|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
 1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
 2|                1.99 |      502,338,808.12 |    0.6% |           16.75 |            4.80 |  3.490 |           0.02 |    0.0% |      0.02 | `CHACHA20_1MB`
 3|                2.03 |      492,836,102.56 |    1.4% |           17.35 |            4.93 |  3.519 |           0.07 |    0.1% |      0.01 | `CHACHA20_256BYTES`
 4|                2.14 |      466,586,919.27 |    0.4% |           19.14 |            5.33 |  3.591 |           0.23 |    0.0% |      0.01 | `CHACHA20_64BYTES`
 5|                5.46 |      183,057,234.02 |    0.0% |           51.50 |           13.59 |  3.791 |           0.16 |    0.0% |      0.06 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
 6|                2.73 |      366,170,466.28 |    0.0% |           25.75 |            6.79 |  3.790 |           0.08 |    0.0% |      0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
 7|                7.45 |      134,198,234.80 |    0.0% |           70.04 |           18.55 |  3.776 |           1.43 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
 8|                3.74 |      267,643,600.56 |    0.0% |           35.06 |            9.30 |  3.770 |           0.71 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
 9|               13.01 |       76,836,476.82 |    0.1% |          122.10 |           32.39 |  3.770 |           5.02 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
10|                6.94 |      144,054,781.63 |    0.5% |           61.21 |           16.39 |  3.735 |           2.49 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

master

 0|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
 1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
 2|                2.00 |      499,954,466.20 |    0.3% |           16.88 |            4.84 |  3.486 |           0.03 |    0.0% |      0.02 | `CHACHA20_1MB`
 3|                2.05 |      486,935,198.75 |    1.4% |           17.37 |            4.97 |  3.496 |           0.05 |    0.0% |      0.01 | `CHACHA20_256BYTES`
 4|                2.22 |      450,220,269.82 |    0.4% |           18.86 |            5.37 |  3.512 |           0.13 |    0.0% |      0.01 | `CHACHA20_64BYTES`
 5|                5.68 |      176,107,008.87 |    0.2% |           51.76 |           13.70 |  3.777 |           0.19 |    0.0% |      0.07 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
 6|                2.83 |      353,532,983.23 |    0.1% |           25.88 |            6.85 |  3.779 |           0.09 |    0.0% |      0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
 7|                7.72 |      129,486,776.95 |    0.2% |           72.62 |           18.74 |  3.875 |           2.06 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
 8|                3.89 |      257,245,464.66 |    0.3% |           36.35 |            9.40 |  3.865 |           1.02 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
 9|               13.97 |       71,562,220.46 |    0.2% |          135.22 |           33.84 |  3.995 |           7.67 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
10|                7.05 |      141,907,027.48 |    0.4% |           67.76 |           17.04 |  3.977 |           3.81 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

25354

 0|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
 1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
 2|                1.98 |      505,256,746.75 |    0.1% |           16.98 |            4.80 |  3.540 |           0.05 |    0.0% |      0.02 | `CHACHA20_1MB`
 3|                2.06 |      484,794,937.08 |    0.2% |           17.46 |            5.00 |  3.493 |           0.08 |    0.0% |      0.01 | `CHACHA20_256BYTES`
 4|                2.22 |      449,753,059.35 |    0.3% |           18.89 |            5.38 |  3.511 |           0.17 |    0.0% |      0.01 | `CHACHA20_64BYTES`
 5|                5.66 |      176,546,195.18 |    0.7% |           51.97 |           13.63 |  3.813 |           0.22 |    0.0% |      0.07 | `CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
 6|                2.81 |      355,375,284.35 |    0.1% |           25.99 |            6.81 |  3.817 |           0.11 |    0.0% |      0.03 | `CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
 7|                8.04 |      124,406,853.03 |    0.4% |           73.18 |           19.44 |  3.765 |           2.14 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
 8|                4.03 |      248,427,666.34 |    0.3% |           36.63 |            9.74 |  3.760 |           1.07 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
 9|               15.09 |       66,259,474.70 |    0.4% |          136.82 |           36.48 |  3.751 |           7.92 |    0.2% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
10|                7.53 |      132,846,610.84 |    0.4% |           68.57 |           18.20 |  3.768 |           3.94 |    0.0% |      0.01 | `CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

sipa commented at 7:06 pm on September 27, 2022: member

@jarolrod It’s possible that these differences are just compiler/cpu cache randomness. E.g. just having functions be layed out slightly differently in memory can cause a difference in slight timing differences. Objectively, this PR does less work per byte for long messages than master, while 25354 does more work per byte, but that’s not always possible to benchmark when you’re talking about a difference of a few CPU cycles per 64 bytes. The compiler may in some cases just emit slightly better or slightly worse code, just due to e.g. alignment differences in where the functions end up in the binary and the CPU caching effects.

theStack commented at 10:30 pm on October 2, 2022: contributor

Concept ACK

DrahtBot added the label Needs rebase on Oct 19, 2022

sipa force-pushed on Oct 19, 2022

sipa commented at 5:54 pm on October 19, 2022: member

Rebased.

DrahtBot removed the label Needs rebase on Oct 19, 2022

sipa force-pushed on Oct 19, 2022

DrahtBot added the label Needs rebase on Nov 19, 2022

sipa force-pushed on Nov 19, 2022

DrahtBot removed the label Needs rebase on Nov 19, 2022

DrahtBot added the label Needs rebase on Nov 21, 2022

sipa force-pushed on Nov 21, 2022

DrahtBot removed the label Needs rebase on Nov 21, 2022

DrahtBot added the label Needs rebase on Nov 30, 2022

sipa force-pushed on Dec 5, 2022

sipa commented at 7:23 pm on December 5, 2022: member

Rebased.

DrahtBot removed the label Needs rebase on Dec 5, 2022

w0xlt commented at 7:13 pm on December 15, 2022: contributor

PR

ns/byte	byte/s	err%	total	benchmark
0.91	1,094,178,058.96	0.5%	0.01	`CHACHA20_1MB`
0.94	1,065,805,397.14	0.3%	0.01	`CHACHA20_256BYTES`
1.03	969,669,114.48	0.4%	0.01	`CHACHA20_64BYTES`
2.65	376,719,455.06	0.3%	0.03	`CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
1.33	752,662,840.82	0.4%	0.02	`CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
3.66	273,544,917.38	0.4%	0.01	`CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
1.84	543,100,043.35	0.5%	0.01	`CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
6.58	151,969,296.31	1.5%	0.01	`CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
3.28	304,787,007.02	0.4%	0.01	`CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

master

ns/byte	byte/s	err%	total	benchmark
0.92	1,092,717,412.60	0.1%	0.01	`CHACHA20_1MB`
0.93	1,070,541,510.75	0.0%	0.01	`CHACHA20_256BYTES`
1.01	991,308,062.99	0.9%	0.01	`CHACHA20_64BYTES`
2.65	377,589,142.93	0.2%	0.03	`CHACHA20_POLY1305_AEAD_1MB_ENCRYPT_DECRYPT`
1.32	755,706,485.70	0.0%	0.02	`CHACHA20_POLY1305_AEAD_1MB_ONLY_ENCRYPT`
3.56	280,597,405.87	0.0%	0.01	`CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT`
1.79	559,191,860.78	0.0%	0.01	`CHACHA20_POLY1305_AEAD_256BYTES_ONLY_ENCRYPT`
6.43	155,424,293.51	0.0%	0.01	`CHACHA20_POLY1305_AEAD_64BYTES_ENCRYPT_DECRYPT`
3.23	309,596,620.11	0.6%	0.01	`CHACHA20_POLY1305_AEAD_64BYTES_ONLY_ENCRYPT`

in src/crypto/chacha20.cpp:308 in 4d85d8483b outdated

310+            c[i] = m[i] ^ m_buffer[64 - m_bufleft + i];
311+        }
312+        m_bufleft -= reuse;
313+        bytes -= reuse;
314+        c += reuse;
315+        m += reuse;

aureleoules commented at 1:36 pm on December 22, 2022:

I checked the test coverage of this PR and this does not seem to be tested (https://btc-coverage.s3.eu-west-1.amazonaws.com/PR-26153/src/crypto/chacha20.cpp.gcov.html). Is it possible to add a test?

sipa commented at 9:27 pm on January 18, 2023:

@aureleoules Added a commit to that effect, can you check it’s now covered?

aureleoules commented at 10:52 am on January 19, 2023:

I updated the coverage (same url), and it’s well covered now. Well done.

sipa force-pushed on Jan 18, 2023

in src/crypto/chacha20.cpp:299 in 013c0dd4cc outdated

301+    }
302+}
303+
304+void ChaCha20::Crypt(const unsigned char* m, unsigned char* c, size_t bytes)
305+{
306+    unsigned reuse = std::min<size_t>(m_bufleft, bytes);

ajtowns commented at 8:54 pm on January 19, 2023:

nit: reuse should be inside the if (m_bufleft) here as well?

sipa commented at 10:26 pm on January 19, 2023:

Done.

in src/test/util/xoroshiro128plusplus.h:67 in 3e916f17d1 outdated

62+        m_s1 = rotl(s1, 28);
63+        return result;
64+    }
65+
66+    static constexpr result_type min() noexcept { return std::numeric_limits<uint64_t>::min(); }
67+    static constexpr result_type max() noexcept { return std::numeric_limits<uint64_t>::max(); }

ajtowns commented at 9:03 pm on January 19, 2023:

std::numeric_limits<result_type>::max() ?

sipa commented at 10:26 pm on January 19, 2023:

Done.

ajtowns approved

ajtowns commented at 9:59 pm on January 19, 2023: contributor

ACK 4d85d8483bbc8fdc8338c1b2ebe62c1ababc85ca – code review, tests pass at each commit

sipa force-pushed on Jan 19, 2023

ajtowns commented at 11:37 pm on January 19, 2023: contributor

ACK aca35e568239e02673e23f38c0fad58fc4911fd4

fanquake commented at 10:41 am on January 20, 2023: member

@dhruv @real-or-random @jonasnick might be interested in reviewing here?

dhruv commented at 4:47 pm on January 20, 2023: contributor

tACK that the usage of partial ChaCha20 blocks doesn’t advance the block counter anymore when using the unaligned version: The evidence is that the downstream BIP324 PRs #25361, #23233 and #24545, based on this PR have passing test vectors. @fanquake I can review this in the coming days too, but I don’t know anything about MuHash3072 and xoroshiro128++ yet so it’ll take some time. I’ll do the partial review first.

in src/crypto/chacha20.cpp:274 in b56981e246 outdated

297         c += 64;
298         m += 64;
299     }
300 }
301+
302+void ChaCha20::Keystream(unsigned char* c, size_t bytes)

dhruv commented at 6:13 pm on January 23, 2023:

Would it be valuable to inline the unaligned variant functions too?

sipa commented at 6:38 pm on January 30, 2023:

No, they’re never called within this compilation unit, so there shouldn’t be any place they even can be inlined?

in src/test/fuzz/crypto_chacha20.cpp:68 in 167a829147 outdated

62+    unsigned char key[32] = {0};
63+    auto key_bytes = provider.ConsumeBytes<unsigned char>(32);
64+    std::copy(key_bytes.begin(), key_bytes.end(), key);
65+    uint64_t iv = provider.ConsumeIntegral<uint64_t>();
66+    uint64_t total_bytes = provider.ConsumeIntegralInRange<uint64_t>(0, 1000000);
67+    uint64_t seek = provider.ConsumeIntegralInRange<uint64_t>(0, ~(total_bytes >> 6));

dhruv commented at 7:07 pm on January 23, 2023:

I’m a little confused by what ~(total_bytes >> 6) is trying to do here. I understand the converting bytes to blocks, but then what’s the bitwise not for? Wouldn’t it result in seeking way past the total_bytes?

The test of course still works since the seeking is consistent across both instances. I just don’t understand the intent yet.

dhruv commented at 5:18 am on January 24, 2023:

Oh, I just got it. That term + the number of blocks for total_bytes will equal 2^64-1 so it is indeed the max seek position.

A comment might help future readers.

sipa commented at 7:01 pm on January 30, 2023:

I added a comment.

in src/test/crypto_tests.cpp:151 in 376c9284ee outdated

148@@ -149,7 +149,7 @@ static void TestChaCha20(const std::string &hex_message, const std::string &hexk
149     } else {
150         rng.Keystream(outres.data(), outres.size());
151     }
152-    BOOST_CHECK(out == outres);
153+    BOOST_CHECK_EQUAL(hexout, HexStr(outres));

dhruv commented at 7:58 pm on January 23, 2023:

Can we eliminate the out variable now by using hexout.size() / 2 instead of out.size()?

in src/test/crypto_tests.cpp:165 in aca35e5682 outdated

160@@ -161,6 +161,27 @@ static void TestChaCha20(const std::string &hex_message, const std::string &hexk
161         }
162         BOOST_CHECK_EQUAL(hexout, HexStr(outres));
163     }
164+
165+    // Repeat 10x, but fragmented into 3 chunks, to exercise the ChaCha20 class's caching.
166+    for (int i = 0; i < 10; ++i) {

dhruv commented at 8:22 pm on January 23, 2023:

Out of curiosity, since every loop re-seeks to seek, why do we repeat this 10 times? Is that primarily for different chunks?

Separately, is non-determinism acceptable in unit tests? Is there a way to reproduce failures?

sipa commented at 7:02 pm on January 30, 2023:

Out of curiosity, since every loop re-seeks to seek, why do we repeat this 10 times? Is that primarily for different chunks?

Just because it’s very easy to do, and may result in more interesting cases being hit.

Separately, is non-determinism acceptable in unit tests? Is there a way to reproduce failures?

The unit tests’ RNG is initialized deterministically, so there shouldn’t be any non-determinism.

in src/test/fuzz/crypto_chacha20.cpp:133 in 167a829147 outdated

127+            crypt2.Crypt(data2.data() + bytes2, data2.data() + bytes2, now);
128+        } else {
129+            crypt2.Keystream(data2.data() + bytes2, now);
130+        }
131+        bytes2 += now;
132+        if (is_last) break;

dhruv commented at 9:53 pm on January 23, 2023:

0        if (is_last || bytes2 == total_bytes) break;

dhruv commented at 9:55 pm on January 23, 2023:

If it’s useful to test with 0 bytes as a parameter, maybe bool is_last = (iter == 255) || (bytes2 == total_bytes) || provider.ConsumeBool(); above?

sipa commented at 7:02 pm on January 30, 2023:

I’ve added that last suggestion.

dhruv commented at 10:19 pm on January 23, 2023: contributor

crACK tACK aca35e568239e02673e23f38c0fad58fc4911fd4 with optional comments and a couple questions.

I rebased #25361 #23233 #23561 and #24545 on this commit and all tests passed (which includes the motivating test cases for this PR wrt the partial ChaCha20 blocks)

I did not code review the XoRoShiRo128++ implementation.

real-or-random commented at 11:11 am on January 24, 2023: contributor

@dhruv @real-or-random @jonasnick might be interested in reviewing here?

I can review it, but I’ll probably find time only in February. (I don’t know if it makes sense to wait for me or not.)

sipa force-pushed on Jan 30, 2023

Split ChaCha20 into aligned/unaligned variants e37bcaa0a6

Rename ChaCha20::Seek -> Seek64 to clarify multiple of 64 6babf40213

Make unrestricted ChaCha20 cipher not waste keystream bytes

Co-authored-by: dhruv <856960+dhruv@users.noreply.github.com>

12ff72476a

Add xoroshiro128++ PRNG

Xoroshiro128++ is a fast non-cryptographic random generator.
Reference implementation is available at https://prng.di.unimi.it/

Co-Authored-By: Pieter Wuille <pieter@wuille.net>

5f05b27841

Add fuzz test for testing that ChaCha20 works as a stream 38eaece67b

Use ChaCha20 caching in FastRandomContext 5d16f75763

Use ChaCha20Aligned in MuHash3072 code f21994a02e

Only support 32-byte keys in ChaCha20{,Aligned} 62ec713961

Inline ChaCha20 32-byte specific constants 93aee8bbda

Improve test vectors for ChaCha20 fb243d25f7

sipa force-pushed on Jan 30, 2023

Add unit test for ChaCha20's new caching 511aa4f1c7

sipa force-pushed on Jan 31, 2023

ajtowns commented at 3:34 am on February 7, 2023: contributor

ut reACK 511aa4f1c7508f15cab8d7e58007900ad6fd3d5d

maflcko requested review from dhruv on Feb 7, 2023

dhruv commented at 6:21 pm on February 7, 2023: contributor

tACK crACK 511aa4f1c7

Reviewed git range-diff ceb74b844c aca35e5 511aa4f1c7 Downstream BIP324 branches include 511aa4f1c7 with green CI

fanquake merged this on Feb 15, 2023

fanquake closed this on Feb 15, 2023

fanquake commented at 2:59 pm on February 15, 2023: member

Going to move this forward. @real-or-random would still be great for you to leave any post-merge commentary, if you get the time.

sidhujag referenced this in commit fa531606e4 on Feb 15, 2023

in src/crypto/muhash.cpp:302 in f21994a02e outdated

298@@ -299,7 +299,7 @@ Num3072 MuHash3072::ToNum3072(Span<const unsigned char> in) {
299     unsigned char tmp[Num3072::BYTE_SIZE];
300 
301     uint256 hashed_in{(HashWriter{} << in).GetSHA256()};
302-    ChaCha20(hashed_in.data(), hashed_in.size()).Keystream(tmp, Num3072::BYTE_SIZE);
303+    ChaCha20Aligned(hashed_in.data(), hashed_in.size()).Keystream64(tmp, Num3072::BYTE_SIZE / 64);

real-or-random commented at 11:32 am on March 8, 2023:

0    static_assert(Num3072::BYTE_SIZE % 64 == 0);
1    ChaCha20Aligned(hashed_in.data(), hashed_in.size()).Keystream64(tmp, Num3072::BYTE_SIZE / 64);

sipa commented at 6:37 pm on July 18, 2023:

Done in #28100.

in src/crypto/chacha20.h:66 in 12ff72476a outdated

62+        m_aligned.SetKey(key, keylen);
63+        m_bufleft = 0;
64+    }
65 
66     /** set the 64-bit nonce. */
67     void SetIV(uint64_t iv) { m_aligned.SetIV(iv); }

real-or-random commented at 11:51 am on March 8, 2023:

(not this PR)

Should SetIV() also set m_bufleft = 0? If the caller sets an IV without seeking, and we have bytes left in the buffer, then the caller will get those remaining bytes which belong to the wrong buffer.

I mean, setting an IV without seeking is probably not a good idea anyway, and none of the callers currently do this. But I feel that’s something that then should be excluded by the interface of the class, i.e., SetIV() and SetKey() should ideally seek to 0 (and perhaps be renamed to something line SetIvAndSeek0 or SetIvAndReset.

sipa commented at 6:37 pm on July 18, 2023:

I had forgotten about this PR comment, but this was effectively done in #27985.

in src/crypto/chacha20.cpp:43 in 511aa4f1c7

67 
68-ChaCha20::ChaCha20()
69+ChaCha20Aligned::ChaCha20Aligned()
70 {
71     memset(input, 0, sizeof(input));
72 }

real-or-random commented at 11:53 am on March 8, 2023:

(not this PR)

But I think initializing with an effective zero key should not be allowed by the interface of the class.

sipa commented at 6:36 pm on July 18, 2023:

Done in #28100.

in src/crypto/chacha20.h:63 in 511aa4f1c7

66+    /** set 32-byte key. */
67+    void SetKey32(const unsigned char* key32)
68+    {
69+        m_aligned.SetKey32(key32);
70+        m_bufleft = 0;
71+    }

real-or-random commented at 11:57 am on March 8, 2023:

Since this is a rekeying operation, consider memory_cleanseing the cache buffer. (It will probably be overwritten soon anyway. but SetKey() should not happen frequently, so we don’t need to worry about performance.)

I guess there should also be destructor that does the same. And (not this PR), a destructor of ChaCha20Aligned should memory_cleanse the key.

sipa commented at 6:36 pm on July 18, 2023:

Done in #28100.

in src/crypto/chacha20.h:41 in 511aa4f1c7

38+    void Keystream64(unsigned char* c, size_t blocks);
39+
40+    /** enciphers the message <input> of length <64*blocks> and write the enciphered representation into <output>
41+     *  Used for encryption and decryption (XOR)
42+     */
43+    void Crypt64(const unsigned char* input, unsigned char* output, size_t blocks);

real-or-random commented at 12:01 pm on March 8, 2023:

(not this PR)

nit: I’d prefer an interface that distinguishes between encryption and decryption. That makes the code easier to change if it will be reused for other ciphers.

sipa commented at 6:36 pm on July 18, 2023:

Done in #28100.

real-or-random commented at 12:02 pm on March 8, 2023: contributor

Here are some post-merge comments. None of them are critical. Sorry, many of them are not directly related to the changes in this PR, but I felt it’s still easier that I add all of them to this review.

fanquake commented at 12:04 pm on March 8, 2023: member

@real-or-random thanks for circling around to this post-merge.

fanquake referenced this in commit 5eb669024f on Aug 18, 2023

Frank-GER referenced this in commit e6242939ee on Sep 8, 2023

ogabrielides referenced this in commit 9c93afa077 on Nov 15, 2023

ogabrielides referenced this in commit 186416f525 on Nov 15, 2023

ogabrielides referenced this in commit 2e2bcfb8a2 on Nov 17, 2023

ogabrielides referenced this in commit 6183bb1fff on Nov 17, 2023

UdjinM6 referenced this in commit 7f81bf0e22 on Nov 17, 2023

PastaPastaPasta referenced this in commit 5a4406ef98 on Nov 19, 2023

PastaPastaPasta referenced this in commit 722d14c340 on Nov 19, 2023

bitcoin locked this on Jul 17, 2024

Reduce wasted pseudorandom bytes in ChaCha20 + various improvements #26153

Reviews

Conflicts

PR

Master

#25354

macOS arm64

Linux x86