This PR enables AVX2, SSE4.1 and x86 SHA-NI implementations of SHA256 instead of the “standard” one.
A note about testing: at runtime, the SHA-NI implementation is available only if the CPU has the `sha_ni` flag set.
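For reviewers who want to check a test machine first, the runtime condition above can be sketched as a CPUID query. This is a minimal illustration, not the code in this PR: the names `EbxReportsShaNi`, `CpuHasShaNi` and `SKETCH_HAVE_CPUID` are hypothetical, and the sketch assumes the SHA extensions are reported in bit 29 of EBX for CPUID leaf 7, subleaf 0, as documented by Intel.

```cpp
#include <cstdint>

// Pick the CPUID helper offered by the compiler; on non-x86 targets none is used.
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define SKETCH_HAVE_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

// Hypothetical helper: true if the EBX value returned by CPUID leaf 7,
// subleaf 0 has bit 29 (SHA extensions) set.
constexpr bool EbxReportsShaNi(std::uint32_t ebx)
{
    return ((ebx >> 29) & 1u) != 0;
}

// Hypothetical helper: true only if the running CPU advertises SHA-NI.
// Falls back to false when CPUID is not available on the target.
inline bool CpuHasShaNi()
{
#if defined(SKETCH_HAVE_CPUID) && defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 0);                  // regs[0] holds the highest basic leaf
    if (regs[0] < 7) return false;     // leaf 7 is not supported
    __cpuidex(regs, 7, 0);             // regs[1] holds EBX
    return EbxReportsShaNi(static_cast<std::uint32_t>(regs[1]));
#elif defined(SKETCH_HAVE_CPUID)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid_count returns 0 when the requested leaf is unsupported.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
    return EbxReportsShaNi(ebx);
#else
    return false;
#endif
}
```

Calling `CpuHasShaNi()` on the i5-8350U machine below would be expected to return `false`, which matches the `SHA256D64_1024_SHANI` benchmark there falling back to the `'standard,sse41(4way)'` implementation.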
Here are the benchmark results on my machine with an Intel Core i5-8350U (no `sha_ni` flag) and Windows 11 Pro 22H2:
>.\src\bench_bitcoin.exe -filter=SHA256D64.*

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.79 | 357,807,381.52 | 0.2% | 0.01 | SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation
| 4.29 | 232,954,767.62 | 0.0% | 0.01 | SHA256D64_1024_SHANI using the 'standard,sse41(4way)' SHA256 implementation
| 4.33 | 231,031,727.38 | 0.6% | 0.01 | SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation
| 11.65 | 85,836,280.29 | 0.3% | 0.01 | SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation
On another machine with an Intel Core i9-12950HX (`sha_ni` flag set) and Windows 11 Home 22H2:
- the master branch @ 058488276f8dc244fe534ba45ec8dd2b4b198a2e:
>.\src\bench_bitcoin.exe -filter=SHA256.*

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 24.74 | 40,421,883.67 | 1.2% | 0.02 | `SHA256D64_1024_AVX2 using the 'standard' SHA256 implementation`
| 25.09 | 39,856,473.88 | 2.7% | 0.02 | `SHA256D64_1024_SHANI using the 'standard' SHA256 implementation`
| 25.02 | 39,965,849.49 | 1.4% | 0.02 | `SHA256D64_1024_SSE4 using the 'standard' SHA256 implementation`
| 25.06 | 39,900,152.21 | 1.4% | 0.02 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
| 17.87 | 55,953,892.74 | 4.0% | 0.01 | `SHA256_32b_AVX2 using the 'standard' SHA256 implementation`
| 17.59 | 56,839,372.12 | 2.5% | 0.01 | `SHA256_32b_SHANI using the 'standard' SHA256 implementation`
| 17.95 | 55,721,393.03 | 2.9% | 0.01 | `SHA256_32b_SSE4 using the 'standard' SHA256 implementation`
| 18.47 | 54,140,724.95 | 2.1% | 0.01 | `SHA256_32b_STANDARD using the 'standard' SHA256 implementation`
| 8.27 | 120,885,364.41 | 2.3% | 0.09 | `SHA256_AVX2 using the 'standard' SHA256 implementation`
| 8.10 | 123,519,312.24 | 0.5% | 0.09 | `SHA256_SHANI using the 'standard' SHA256 implementation`
| 8.15 | 122,720,467.32 | 1.7% | 0.09 | `SHA256_SSE4 using the 'standard' SHA256 implementation`
| 7.93 | 126,141,581.31 | 1.4% | 0.09 | `SHA256_STANDARD using the 'standard' SHA256 implementation`
- this PR:
>.\src\bench_bitcoin.exe -filter=SHA256.*

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 2.24 | 446,126,616.75 | 1.7% | 0.01 | `SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
| 1.05 | 953,349,958.44 | 3.3% | 0.01 | `SHA256D64_1024_SHANI using the 'x86_shani(1way,2way)' SHA256 implementation`
| 2.82 | 354,952,157.43 | 1.1% | 0.01 | `SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
| 5.45 | 183,471,444.57 | 1.6% | 0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
| 4.37 | 228,701,362.34 | 1.4% | 0.01 | `SHA256_32b_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
| 1.34 | 748,698,312.44 | 2.4% | 0.01 | `SHA256_32b_SHANI using the 'x86_shani(1way,2way)' SHA256 implementation`
| 4.30 | 232,450,436.16 | 2.6% | 0.01 | `SHA256_32b_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
| 4.54 | 220,072,501.29 | 3.6% | 0.01 | `SHA256_32b_STANDARD using the 'standard' SHA256 implementation`
| 2.14 | 467,093,278.53 | 4.2% | 0.02 | `SHA256_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
| 0.44 | 2,279,721,873.93 | 2.9% | 0.01 | `SHA256_SHANI using the 'x86_shani(1way,2way)' SHA256 implementation`
| 2.09 | 478,148,608.59 | 2.2% | 0.02 | `SHA256_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
| 2.06 | 486,570,650.06 | 2.4% | 0.02 | `SHA256_STANDARD using the 'standard' SHA256 implementation`