Based on #13191.
This adds SHA256 implementations that use Intel's SHA Extensions (SHA-NI) instructions via compiler intrinsics. Building them requires GCC 4.9 or Clang 3.4, or later.
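Not every x86 CPU has these instructions, so code that uses them generally has to check for support at runtime before selecting a SHA-NI code path. The following is a minimal sketch of such a check, not the detection code from this PR. It assumes a GCC or Clang toolchain (for `<cpuid.h>`), and `HaveShaNiSketch` is a hypothetical name. The feature flag it tests is CPUID leaf 7, subleaf 0, EBX bit 29.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang helper header; assumed available
#endif

// Hypothetical helper: returns true if the CPU reports the SHA extensions.
// On non-x86 targets it always returns false.
bool HaveShaNiSketch()
{
#if defined(__x86_64__) || defined(__i386__)
    // Leaf 7 must be supported before it can be queried.
    if (__get_cpuid_max(0, nullptr) < 7) return false;
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return ((ebx >> 29) & 1) != 0;  // CPUID.(EAX=7,ECX=0):EBX bit 29 = SHA
#else
    return false;
#endif
}
```

A caller would typically evaluate this once at startup and fall back to another implementation when it returns false.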
In addition to #13191, two extra implementations are provided:
- (a) A variable-length SHA256 implementation using SHA extensions.
- (b) A 2-way 64-byte input double-SHA256 implementation using SHA extensions.
Benchmarks for 9001-element Merkle tree root computation on an AMD Ryzen 1800X system:
- Using generic C++ code (pre-#10821): 6.1ms
- Using SSE4 (master, #10821): 4.6ms
- Using 4-way SSE4 specialized for 64-byte inputs (#13191): 2.8ms
- Using 8-way AVX2 specialized for 64-byte inputs (#13191): 2.1ms
- Using 2-way SHA-NI specialized for 64-byte inputs (this PR): 0.56ms
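The 64-byte specialization matters here because each inner node of the Merkle tree is the hash of two concatenated 32-byte child hashes, so every hash above the leaf level has a fixed 64-byte input. The sketch below shows that level-by-level reduction, assuming the usual rule of duplicating the last entry of an odd-sized level. It is an illustration only: `ComputeMerkleRootSketch`, `Hash` and `Hash64Fn` are hypothetical names, and the 64-byte hash is passed in as a callback rather than being one of the implementations from this PR.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Hash = std::array<uint8_t, 32>;

// Hypothetical callback: writes a 32-byte digest of exactly 64 input bytes.
// In practice this is where a specialized double-SHA256 would be plugged in.
using Hash64Fn = std::function<void(uint8_t* out32, const uint8_t* in64)>;

// Reduces a list of leaf hashes to a single root, one level at a time.
Hash ComputeMerkleRootSketch(std::vector<Hash> level, const Hash64Fn& hash64)
{
    if (level.empty()) return Hash{};
    while (level.size() > 1) {
        // Odd-sized level: duplicate the last entry so it can be paired.
        if (level.size() & 1) level.push_back(level.back());
        for (std::size_t i = 0; i < level.size() / 2; ++i) {
            // Concatenate the two children into one 64-byte block.
            uint8_t in[64];
            std::copy(level[2 * i].begin(), level[2 * i].end(), in);
            std::copy(level[2 * i + 1].begin(), level[2 * i + 1].end(), in + 32);
            // Safe to overwrite level[i]: both inputs were already copied out.
            hash64(level[i].data(), in);
        }
        level.resize(level.size() / 2);
    }
    return level[0];
}
```

Because all pairs within a level are independent, an N-way implementation can process N of these 64-byte blocks per call, which is what the multi-way variants in the list above exploit.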
Benchmarks for 32-byte SHA256 on the same system:
- Using SSE4 (master, #10821): 190ns
- Using SHA-NI (this PR): 53ns
Benchmarks for 1,000,000-byte SHA256 on the same system:
- Using SSE4 (master, #10821): 2.5ms
- Using SHA-NI (this PR): 0.51ms