This PR reintroduces the 1-way SSE4 SHA256 implementation, written with intrinsics as suggested in #13442, specifically for MSVC builds, where it yields a throughput gain of roughly 40 to 50% in the benchmarks below.
Here are benchmarks on my machine with an Intel Core i5-8350U CPU (no sha_ni flag) running Windows 11 Pro 22H2:
- before this PR (8a9e37fb95cbb0bf7f6e06fa05d8381db04d61e2):
>.\src\bench_bitcoin.exe -filter=SHA256_.*

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 9.92 | 100,826,852.23 | 0.1% | 0.01 | SHA256_32b_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation
| 9.90 | 101,038,141.67 | 0.3% | 0.01 | SHA256_32b_SHANI using the 'standard,sse41(4way)' SHA256 implementation
| 10.02 | 99,788,852.31 | 0.9% | 0.01 | SHA256_32b_SSE4 using the 'standard,sse41(4way)' SHA256 implementation
| 10.01 | 99,883,509.98 | 0.8% | 0.01 | SHA256_32b_STANDARD using the 'standard' SHA256 implementation
| 4.48 | 223,348,893.31 | 1.1% | 0.05 | SHA256_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation
| 4.47 | 223,668,612.58 | 1.2% | 0.05 | SHA256_SHANI using the 'standard,sse41(4way)' SHA256 implementation
| 4.45 | 224,638,332.29 | 0.7% | 0.05 | SHA256_SSE4 using the 'standard,sse41(4way)' SHA256 implementation
| 4.45 | 224,542,494.67 | 0.6% | 0.05 | SHA256_STANDARD using the 'standard' SHA256 implementation
- with this PR:
>.\src\bench_bitcoin.exe -filter=SHA256_.*

| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 7.04 | 142,024,691.36 | 0.2% | 0.01 | SHA256_32b_AVX2 using the 'sse41(1way),sse41(4way),avx2(8way)' SHA256 implementation
| 7.03 | 142,222,222.22 | 0.2% | 0.01 | SHA256_32b_SHANI using the 'sse41(1way),sse41(4way)' SHA256 implementation
| 7.08 | 141,231,323.51 | 0.8% | 0.01 | SHA256_32b_SSE4 using the 'sse41(1way),sse41(4way)' SHA256 implementation
| 9.88 | 101,196,866.84 | 0.4% | 0.01 | SHA256_32b_STANDARD using the 'standard' SHA256 implementation
| 3.01 | 332,270,069.11 | 1.3% | 0.03 | SHA256_AVX2 using the 'sse41(1way),sse41(4way),avx2(8way)' SHA256 implementation
| 3.00 | 332,989,244.45 | 0.3% | 0.03 | SHA256_SHANI using the 'sse41(1way),sse41(4way)' SHA256 implementation
| 3.04 | 328,612,270.38 | 2.0% | 0.03 | SHA256_SSE4 using the 'sse41(1way),sse41(4way)' SHA256 implementation
| 4.45 | 224,678,709.45 | 0.4% | 0.05 | SHA256_STANDARD using the 'standard' SHA256 implementation
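
For anyone who wants to check the gain against the tables above, here is a minimal sketch that derives it from the ns/byte column. The helper names are hypothetical and not part of this PR or of `bench_bitcoin`; the input values are copied from the two SSE4 rows (before and with this PR).

```python
# Hypothetical helpers for reading the benchmark tables; not part of the PR.

def throughput_gain_pct(ns_before: float, ns_after: float) -> float:
    """Percent increase in byte/s implied by a drop in ns/byte."""
    return (ns_before / ns_after - 1.0) * 100.0


def time_saved_pct(ns_before: float, ns_after: float) -> float:
    """Percent reduction in time spent per byte."""
    return (1.0 - ns_after / ns_before) * 100.0


# (ns/byte before this PR, ns/byte with this PR), taken from the SSE4 rows.
ROWS = {
    "SHA256_32b_SSE4": (10.02, 7.08),
    "SHA256_SSE4": (4.45, 3.04),
}


def summarize(rows):
    """Map each benchmark name to (throughput gain %, time saved %)."""
    return {
        name: (throughput_gain_pct(before, after), time_saved_pct(before, after))
        for name, (before, after) in rows.items()
    }


if __name__ == "__main__":
    for name, (gain, saved) in summarize(ROWS).items():
        print(f"{name}: +{gain:.1f}% throughput, -{saved:.1f}% time per byte")
```

With these inputs the throughput gain comes out at about 42% for the 32-byte benchmark and about 46% for the larger one, which corresponds to roughly 30% less time per byte.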
Based on #24773.