This PR enables the hardware-accelerated SSE4.1, AVX2 and x86 SHA-NI implementations of SHA256 when building on Windows, where they replace the default "standard" implementation.
Testing note: the SHA-NI implementation is selected dynamically at runtime and is used only if the CPU reports the `sha_ni` flag.
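For reviewers who want to check what their own CPU reports, the runtime selection described above can be sketched roughly as follows. This is a minimal illustration and not the code touched by this PR: `Cpuid`, `CpuFeatures`, `DetectCpuFeatures` and `ChooseSha256Impl` are hypothetical names, the returned labels are placeholders, and the preference order (SHA-NI first, then SSE4.1, then the fallback) is an assumption. The CPUID bit positions are the documented ones: SSE4.1 is leaf 1, ECX bit 19, and SHA extensions are leaf 7 (subleaf 0), EBX bit 29. AVX2 is left out because a real detector also has to confirm OS support for the YMM state, which this sketch does not do.

```cpp
#include <cstdint>
#include <string>

// Hypothetical portable wrapper around the CPUID instruction.
// On non-x86 targets it reports all-zero registers, i.e. no features.
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
static void Cpuid(uint32_t leaf, uint32_t subleaf, uint32_t (&r)[4])
{
    int out[4];
    __cpuidex(out, static_cast<int>(leaf), static_cast<int>(subleaf));
    for (int i = 0; i < 4; ++i) r[i] = static_cast<uint32_t>(out[i]);
}
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
static void Cpuid(uint32_t leaf, uint32_t subleaf, uint32_t (&r)[4])
{
    __cpuid_count(leaf, subleaf, r[0], r[1], r[2], r[3]);
}
#else
static void Cpuid(uint32_t, uint32_t, uint32_t (&r)[4])
{
    r[0] = r[1] = r[2] = r[3] = 0;
}
#endif

// Hypothetical feature set; only the flags relevant to this sketch.
struct CpuFeatures {
    bool sse41 = false;
    bool shani = false;
};

CpuFeatures DetectCpuFeatures()
{
    CpuFeatures f;
    uint32_t r[4];
    Cpuid(0, 0, r);
    const uint32_t max_leaf = r[0]; // highest supported basic leaf
    if (max_leaf >= 1) {
        Cpuid(1, 0, r);
        f.sse41 = ((r[2] >> 19) & 1) != 0; // leaf 1, ECX bit 19
    }
    if (max_leaf >= 7) {
        Cpuid(7, 0, r);
        f.shani = ((r[1] >> 29) & 1) != 0; // leaf 7 subleaf 0, EBX bit 29
    }
    return f;
}

// Assumed preference order; the labels are placeholders for illustration.
std::string ChooseSha256Impl(const CpuFeatures& f)
{
    if (f.sse41 && f.shani) return "shani";
    if (f.sse41) return "sse41";
    return "standard";
}
```

Calling `ChooseSha256Impl(DetectCpuFeatures())` from a small `main` gives a quick indication of whether the `*_SHANI` benchmarks below can be expected to use the SHA-NI code path on a given machine.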
Here are the benchmark results on my machine using "Release" binaries:
- the master branch @ 47da4f9b716d11294d4fb0f30b04a7bcf128cc14:
> build\bin\Release\bench_bitcoin.exe -filter="SHA256.*_SHANI"
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 11.35 | 88,074,183.58 | 0.3% | 0.01 | `SHA256D64_1024_SHANI using the 'standard' SHA256 implementation`
| 13.07 | 76,522,397.95 | 0.4% | 0.01 | `SHA256_32b_SHANI using the 'standard' SHA256 implementation`
| 4.26 | 234,554,580.85 | 0.9% | 0.05 | `SHA256_SHANI using the 'standard' SHA256 implementation`
> build\bin\Release\bench_bitcoin.exe -filter="SHA256D64_1024.*"
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 11.29 | 88,586,104.35 | 0.3% | 0.01 | `SHA256D64_1024_AVX2 using the 'standard' SHA256 implementation`
| 11.36 | 88,062,348.83 | 0.9% | 0.01 | `SHA256D64_1024_SHANI using the 'standard' SHA256 implementation`
| 11.33 | 88,275,862.07 | 0.9% | 0.01 | `SHA256D64_1024_SSE4 using the 'standard' SHA256 implementation`
| 11.33 | 88,263,973.06 | 0.8% | 0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
- this PR:
> build\bin\Release\bench_bitcoin.exe -filter="SHA256.*_SHANI"
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 1.27 | 790,145,684.72 | 0.5% | 0.01 | `SHA256D64_1024_SHANI using the 'x86_shani(1way;2way)' SHA256 implementation`
| 1.97 | 508,638,668.32 | 0.8% | 0.01 | `SHA256_32b_SHANI using the 'x86_shani(1way;2way)' SHA256 implementation`
| 0.63 | 1,587,301,587.30 | 0.5% | 0.01 | `SHA256_SHANI using the 'x86_shani(1way;2way)' SHA256 implementation`
> build\bin\Release\bench_bitcoin.exe -filter="SHA256D64_1024.*"
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 4.02 | 248,878,761.99 | 0.3% | 0.01 | `SHA256D64_1024_AVX2 using the 'standard;sse41(4way);avx2(8way)' SHA256 implementation`
| 1.27 | 787,950,595.69 | 0.8% | 0.01 | `SHA256D64_1024_SHANI using the 'x86_shani(1way;2way)' SHA256 implementation`
| 7.00 | 142,826,631.80 | 0.8% | 0.01 | `SHA256D64_1024_SSE4 using the 'standard;sse41(4way)' SHA256 implementation`
| 11.23 | 89,055,578.20 | 1.0% | 0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
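
To make the comparison easier to read, the speedup of each benchmark can be computed as the ratio of the ns/byte values (master divided by this PR). The sketch below does only that arithmetic; `Row`, `kRows`, `Speedup` and `PrintSpeedups` are hypothetical helper names. The SHA-NI rows use the numbers from the `SHA256.*_SHANI` runs, and the remaining rows use the `SHA256D64_1024.*` runs. For example, `SHA256D64_1024_SHANI` goes from 11.35 to 1.27 ns/byte, which is about 8.9x.

```cpp
#include <cstdio>

// Speedup factor of `after` over `before`, both in ns/byte (lower is faster).
constexpr double Speedup(double before_ns_per_byte, double after_ns_per_byte)
{
    return before_ns_per_byte / after_ns_per_byte;
}

struct Row {
    const char* name;
    double before; // ns/byte on master
    double after;  // ns/byte with this PR
};

// ns/byte values copied from the tables above.
constexpr Row kRows[] = {
    {"SHA256D64_1024_SHANI", 11.35, 1.27},
    {"SHA256_32b_SHANI", 13.07, 1.97},
    {"SHA256_SHANI", 4.26, 0.63},
    {"SHA256D64_1024_AVX2", 11.29, 4.02},
    {"SHA256D64_1024_SSE4", 11.33, 7.00},
    {"SHA256D64_1024_STANDARD", 11.33, 11.23},
};

// Prints one line per benchmark with its speedup factor.
void PrintSpeedups()
{
    for (const Row& row : kRows) {
        std::printf("%-26s %5.2fx\n", row.name, Speedup(row.before, row.after));
    }
}
```

The `STANDARD` row is included as a control: it runs the same code on both branches, so its ratio should stay close to 1.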