sipa
commented at 6:46 am on July 14, 2017:
member
This adds an SSE4 assembly version of the SHA256 transform by Intel, and uses it at run time if SSE4 instructions are available, and use a fallback C++ implementation otherwise. Nearly every x86_64 CPU supports SSE4. The feature is only enabled when compiled with --enable-experimental-asm.
In order to avoid build dependencies and other complications, the original Intel YASM code was translated to GCC extended asm syntax.
This gives around a 50% speedup on the SHA256 benchmark for me.
It is based on an earlier patch by @laanwj, though only includes a single assembly version (for now), and removes the YASM dependency.
in
src/crypto/sha256.cpp:181
in
e486f5774doutdated
I’d prefer to do this setup explicitly during initialization; this also avoids having to use an atomic pointer, which seems overkill (why would it ever change during runtime?) and may be inefficient on some platforms.
(also the detection might be more involved on some platforms, so it’s better for clarity to drive it from an init function instead of magically at first call).
luke-jr
commented at 7:15 am on July 14, 2017:
member
Even with inline assembly, there are build complications unfortunately. The compile will fail if the target doesn’t support it..
laanwj added the label
Validation
on Jul 14, 2017
laanwj added the label
Utils and libraries
on Jul 14, 2017
sipa force-pushed
on Jul 14, 2017
sipa
commented at 8:36 am on July 14, 2017:
member
@luke-jr There are system macros to test whether you’re compiling for x86_64 or not.
luke-jr
commented at 9:34 am on July 14, 2017:
member
You said almost every x86_64 CPU. Are we going to drop support for the outliers then?
meshcollider
commented at 10:13 am on July 14, 2017:
contributor
One of the travis builds obviously has an issue with it too:
crypto/sha256_sse42.cpp:42:9: error: inline assembly requires more registers than available
theuni
commented at 4:11 pm on July 14, 2017:
member
The clang/osx build succeeds when -fomit-frame-pointer is used. I don’t speak enough asm to know if a register can be freed up.
gmaxwell
commented at 4:20 pm on July 14, 2017:
contributor
Even with inline assembly, there are build complications unfortunately. The compile will fail if the target doesn’t support it..
No it won’t– these files are compiled without -msse4.2 already. The only thing required is that its x86_64, which the build tests for.
sipa
commented at 6:24 pm on July 14, 2017:
member
@luke-jr There is runtime detection to see if the CPU supports the extension. The only requirement is that the target is x86_64.
jonasschnelli
commented at 7:11 pm on July 14, 2017:
contributor
0Generated test/data/base58_keys_invalid.json.h
1crypto/sha256_sse42.cpp:42:9: error: inline assembly requires more registers than available
2 "shl $0x6,%2;"
3 ^
41 error generated.
No problem on Win/ OSX Linux
sipa
commented at 7:41 pm on July 14, 2017:
member
@jonasschnelli@theuni figured it out - clang isn’t compiling with -fomit-frame-pointer, and thus there is one fewer register available. Unfortunately, omitting the frame pointer still makes this code not work…
sipa force-pushed
on Jul 14, 2017
sipa
commented at 10:11 pm on July 14, 2017:
member
Updated the code to use one fewer register. The original YASM code used the dx register for two purposes, which I had separated out into two separate registers. They’re merged now.
sipa force-pushed
on Jul 14, 2017
sipa force-pushed
on Jul 14, 2017
in
src/crypto/sha256_sse42.cpp:975
in
dc1fa8410coutdated
970+; * Redistributions in binary form must reproduce the above copyright
971+; notice, this list of conditions and the following disclaimer in the
972+; documentation and/or other materials provided with the
973+; distribution.
974+;
975+; * Neither the name of the Intel Corporation nor the names of its
TheBlueMatt
commented at 11:28 pm on July 14, 2017:
We’re gonna have to do something to meet this condition, though it doesnt appear we’d have to do much.
This is the standard three clause BSD license, it is GPL and whatnot compatible. The source code to Bitcoin, which contains this notice, is part of the “documentation and/or other materials” we provide.
TheBlueMatt
commented at 11:37 pm on July 14, 2017:
We ship sans-source all the time? I figured we’d just put a “contains softare copyright Intel” in the –help output or a README somewhere.
sipa
commented at 1:05 am on July 15, 2017:
member
Marking as WIP, as this does not seem to produce correct hashes on OSX (cc @theuni).
sipa force-pushed
on Jul 15, 2017
theuni
commented at 5:46 am on July 15, 2017:
member
I poked at this for hours and came up empty-handed. I’ll wait for someone else to confirm my osx breakage isn’t just local.
theuni
commented at 6:51 am on July 15, 2017:
member
two more data points:
@fanquake verified that this crashes on osx for him as well.
I managed to reproduce a crash on Linux with an old clang (3.2), and it’s even uglier, crashing gdb as well:
Starting program: /home/cory/dev/bitcoin2/src/bitcoind
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/libthread_db.so.1”.
Program received signal SIGSEGV, Segmentation fault.
/build/buildd/gdb-7.6~20130417/gdb/dwarf2read.c:10350: internal-error: dwarf2_record_block_ranges: Assertion dwarf2_per_objfile->ranges.readin' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n
/build/buildd/gdb-7.6~20130417/gdb/dwarf2read.c:10350: internal-error: dwarf2_record_block_ranges: Assertion dwarf2_per_objfile->ranges.readin’ failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
0x000000000074c910 in sha256_sse42::Transform (
/build/buildd/gdb-7.6~20130417/gdb/dwarf2read.c:10350: internal-error: dwarf2_record_block_ranges: Assertion dwarf2_per_objfile->ranges.readin' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n
/build/buildd/gdb-7.6~20130417/gdb/dwarf2read.c:10350: internal-error: dwarf2_record_block_ranges: Assertion dwarf2_per_objfile->ranges.readin’ failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
Segmentation fault (core dumped)
sipa force-pushed
on Jul 15, 2017
sipa force-pushed
on Jul 16, 2017
sipa force-pushed
on Jul 16, 2017
theuni
commented at 6:41 am on July 16, 2017:
member
Tested ACK08b7438f73236fc738fb655f766e77a81e6b7311. Good on OSX now!
Edit: Though I’d prefer to have the cpu check done separately.
sipa
commented at 7:42 pm on July 16, 2017:
member
Rebased, and moved the autodetection to an explicit SHA256AutoDetect() function that is called during initialization.
sipa force-pushed
on Jul 16, 2017
sipa
commented at 11:50 pm on July 16, 2017:
member
Improved the self test (it now tests 0, 1, and 2-block transforms), and made it assert when the selftest fails rather than failing over to the standard implementation. This way, it won’t hide problems.
gmaxwell approved
gmaxwell
commented at 0:39 am on July 17, 2017:
contributor
It’d be helpful to add a little note about the ‘L’ prefix and what problem it solves. If nothing else, it may turn up as a another useful google hit for someone in the future.
in
src/crypto/sha256.cpp:175
in
7308332e70outdated
Like with the rand init, I think we’d save ourselves from future oopses by setting this to nullptr initially, and letting SHA256AutoDetect() set the fallback to sha256::Transform if necessary.
I don’t think that will work - there is some SHA256 work before main (IIRC to set up the chain parameteters). Better if that uses the ‘canonical’ SHA256.
laanwj
commented at 6:42 am on July 18, 2017:
member
utACK, looks good to me now, but I still think it’s too late for 0.15.
At least to enable it by default, I’m ok with an --enable-experimental-asm option, then enabling it by default after the 0.15 branch-off.
sipa
commented at 8:07 am on July 18, 2017:
member
@laanwj Added a --enable-experimental-asm configure option, disabled by default.
laanwj added this to the milestone 0.15.0
on Jul 18, 2017
Me too, and it didn’t work for me unless I changed it. Using $withval here most llikelys pick up the last –with check result (for qrencode, which wasn’t installed in my case, so it always had no)
On second thought, I’d actually prefer doing it that way in order to keep sha256_sse4.cpp completely generic. It was very helpful for me while testing to just throw together a quick test app using the .cpp directly.
in
src/crypto/sha256_sse4.cpp:9
in
66043d3682outdated
4@@ -5,6 +5,8 @@
5 // This is a translation to GCC extended asm syntax from YASM code by Intel
6 // (available at the bottom of this file).
78+#include "config/bitcoin-config.h"
9+
Nit: Seems this is a log message with the side-effect of detecting the SHA256 implementation.
I’d prefer to assign the result explicitly, so that if someone happens to comment this out, or moves it to debug category, it won’t just be skipped.
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-05-27 21:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me