Use constant-time conditional moves instead of byte slicing

sipa commented at 7:34 pm on December 2, 2014: contributor

This avoids the potential speed differences between reading from the beginning and the end of cache lines that exist in the byte-slicing approach. It’s also slightly faster.
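For readers unfamiliar with the technique, here is a minimal sketch of a constant-time table lookup built on a conditional move. It is not the code from this pull request: the names `ct_cmov` and `ct_table_select`, the 16-entry table, and the 8-word entry size are all assumptions chosen for illustration. The idea is that every entry is read on every call, and the wanted one is kept via a bit mask, so the memory access pattern does not depend on the secret index.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16   /* assumed number of precomputed entries */
#define ENTRY_WORDS 8   /* assumed entry size in 32-bit words */

/* Hypothetical helper: copy src into dst when flag is 1, leave dst
 * unchanged when flag is 0. No branch depends on flag. */
static void ct_cmov(uint32_t *dst, const uint32_t *src, size_t n, uint32_t flag) {
    uint32_t mask = (uint32_t)0 - flag; /* 0x00000000 or 0xFFFFFFFF */
    size_t i;
    for (i = 0; i < n; i++) {
        dst[i] = (dst[i] & ~mask) | (src[i] & mask);
    }
}

/* Select entry `index` from a flat table of TABLE_SIZE * ENTRY_WORDS words.
 * All TABLE_SIZE entries are touched regardless of index. */
static void ct_table_select(uint32_t *out, const uint32_t *table, uint32_t index) {
    uint32_t i;
    for (i = 0; i < TABLE_SIZE; i++) {
        uint32_t diff = i ^ index;
        /* equal is 1 when diff == 0 and 0 otherwise, without a comparison */
        uint32_t equal = 1 ^ ((diff | ((uint32_t)0 - diff)) >> 31);
        ct_cmov(out, table + (size_t)i * ENTRY_WORDS, ENTRY_WORDS, equal);
    }
}
```

The C standard does not promise that a compiler keeps this branch-free or keeps all the loads, which is why inspecting the generated assembly, as done in this thread, is still worthwhile.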

I’ve looked at the generated code with -O3, and it looks like it is actually iterating over all data, but it’s hard to be sure. The result is slower than an equivalent that just picks the right value to add directly.

gmaxwell commented at 8:35 pm on December 2, 2014: contributor

The O2 build assembly is super easy to read for me, and appears clearly correct. O3 is harder to tell due to the inlining.

But I can confirm that even with O3 it’s running all 16 iterations:

 a63:       83 3c 24 10             cmpl   $0x10,(%rsp)
 a67:       4c 89 84 24 a8 02 00    mov    %r8,0x2a8(%rsp)
 a6e:       00 
 a6f:       0f 85 d3 fe ff ff       jne    948 <secp256k1_ecmult_gen+0x188>

(comparison with 16) Still no guarantee that a sufficiently advanced pipeline won’t realize the read is pointless and skip it. :)

ACK. (well, ignoring that it needs to be defined for field GMP)

sipa force-pushed on Dec 2, 2014

sipa commented at 8:43 pm on December 2, 2014: contributor

Fixed.

sipa force-pushed on Dec 3, 2014

sipa commented at 1:27 am on December 3, 2014: contributor

Updated to use a simpler cmov implementation (not faster though, at -O3), and to store just the x and y coordinates instead of the full point (so, dropping the infinity flag, which we know is never set in the precomputed table). Gives another 3% speedup for signing.
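A rough sketch of the storage change described above, under stated assumptions: the type names `fe_t`, `point_t`, and `point_xy_t`, the 8-word field element, and the function names are hypothetical and do not reflect the library's actual definitions. It shows a compact table entry holding only x and y, a conditional move over that entry, and the conversion back to a full point with the infinity flag cleared.

```c
#include <stdint.h>

/* Hypothetical types, for illustration only. */
typedef struct { uint32_t n[8]; } fe_t;               /* field element */
typedef struct { fe_t x, y; int infinity; } point_t;  /* full affine point */
typedef struct { fe_t x, y; } point_xy_t;             /* compact table entry */

/* Conditionally overwrite *r with *a; flag must be 0 or 1. */
static void fe_cmov(fe_t *r, const fe_t *a, int flag) {
    uint32_t mask1 = (uint32_t)0 - (uint32_t)flag; /* all ones when flag is 1 */
    uint32_t mask0 = ~mask1;
    int i;
    for (i = 0; i < 8; i++) {
        r->n[i] = (r->n[i] & mask0) | (a->n[i] & mask1);
    }
}

/* A table entry has only two coordinates to move, with no flag to copy. */
static void point_xy_cmov(point_xy_t *r, const point_xy_t *a, int flag) {
    fe_cmov(&r->x, &a->x, flag);
    fe_cmov(&r->y, &a->y, flag);
}

/* Expand a table entry into a full point; table entries are never infinity. */
static void point_from_xy(point_t *r, const point_xy_t *a) {
    r->x = a->x;
    r->y = a->y;
    r->infinity = 0;
}
```

In this sketch, dropping the flag makes each entry smaller, so scanning the whole table on every lookup touches less memory.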

Use constant-time conditional moves instead of byte slicing efb7d4b299

sipa force-pushed on Dec 3, 2014

gmaxwell commented at 7:08 pm on December 3, 2014: contributor

ACK.

sipa merged this on Dec 3, 2014

sipa closed this on Dec 3, 2014

sipa referenced this in commit 7b92cf66c7 on Dec 3, 2014

gmaxwell cross-referenced this on Dec 3, 2014 from issue Reduce memory timing leak risk by gmaxwell

Use constant-time conditional moves instead of byte slicing #132