- Track carry explicitly instead of adding to scalar
- Branch-free code for carry calculations
Gives a ~0.6% improvement for bench_verify (64bit, endo=yes)
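To illustrate the second bullet, here is a minimal sketch (not the PR's code) that turns one window value into a signed wNAF digit, once with a branch and once without. It assumes the window value `word` satisfies 0 <= word < 2^w and 2 <= w <= 31; the function names are hypothetical. Because word < 2^w, bit w-1 is set exactly when word >= 2^(w-1), so the carry can be read straight off that bit and the digit fixed up by subtracting carry * 2^w, with no conditional.

```c
#include <assert.h>
#include <stdint.h>

/* Branching version (hypothetical): if the window value is at least
   2^(w-1), emit a negative digit and carry 1 into the next window. */
static int32_t digit_branchy(int64_t word, int w, int *carry) {
    if (word >= ((int64_t)1 << (w - 1))) {
        *carry = 1;
        return (int32_t)(word - ((int64_t)1 << w));
    }
    *carry = 0;
    return (int32_t)word;
}

/* Branch-free version (hypothetical): the carry is bit (w-1) of the
   window value, and carry * 2^w is subtracted unconditionally.
   Assumes 0 <= word < 2^w and 2 <= w <= 31. */
static int32_t digit_branch_free(int64_t word, int w, int *carry) {
    *carry = (int)((word >> (w - 1)) & 1);
    return (int32_t)(word - ((int64_t)*carry << w));
}
```

Both functions agree on the assumed input range; the branch-free form just avoids a data-dependent jump in the inner loop.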
ACK, nice!
With the latest changes, performance improved by ~2% in total for bench_verify (64bit, endo=yes), and perhaps half that for endo=no, but...
Note that the tests currently fail for endo=no because of the VERIFY_CHECK for w <= 15: when endo=no, _ecmult_wnaf is called with WINDOW_G == 16. There is a latent bug here, because 'int' is only required to be 16 bits wide AFAIK, and if it is in fact 16 bits, _ecmult_wnaf won't work for w > 15 (hence the check). This bug is independent of this PR.
I guess the simplest fix is to change the element type of 'wnaf' to int32_t. Alternatively, we could make it explicitly int16_t and just set WINDOW_G to 15 in all cases. Thoughts?
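One way to see the width problem, as a rough sketch under assumptions: in a scheme like this, a width-w window read from the scalar can hold up to 2^w - 1 before it is turned into a signed digit, while the finished digits stay within |d| <= 2^(w-1) - 1. The helper names below are hypothetical. The point is only that for w = 16 the raw window value overflows a 16-bit int (INT16_MAX = 32767) even though the digits themselves would still fit in int16_t, whereas int32_t covers every w up to 31.

```c
#include <assert.h>
#include <stdint.h>

/* Largest raw window value: w bits, all ones (2^w - 1). */
static int64_t max_window_value(int w) {
    return ((int64_t)1 << w) - 1;
}

/* Largest digit magnitude in a width-w wNAF: digits are odd and
   |d| <= 2^(w-1) - 1. */
static int64_t max_digit_magnitude(int w) {
    return ((int64_t)1 << (w - 1)) - 1;
}

/* Hypothetical check: does the raw window value fit in a signed type
   whose maximum is type_max? */
static int window_fits(int w, int64_t type_max) {
    return max_window_value(w) <= type_max;
}
```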
The code is not 16-bit safe in a number of places. I've fixed the things I've touched, but at some point a dedicated effort should be made to fix the rest. That said, the current requirement for a 64-bit long long keeps me from simply trying to build it for a 16-bit platform. :)
OK, I just changed the check to w <= 31 for now, to reflect the effective assumption of 32-bit int.
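If the 32-bit int assumption is going to stay, it could also be stated to the compiler rather than only in a VERIFY_CHECK. A hypothetical sketch: the preprocessor check rejects platforms where int is narrower than 32 bits, and wnaf_width_ok mirrors the relaxed w <= 31 bound. The lower bound of 2 is an added assumption, not something from this thread.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical build-time guard: refuse to compile where int is
   narrower than 32 bits, instead of failing only at runtime. */
#if INT_MAX < 2147483647
#error "this wNAF sketch assumes int is at least 32 bits wide"
#endif

/* Hypothetical runtime bound matching the relaxed check: a width-w
   window must fit in a 32-bit int, so w may be at most 31.
   The w >= 2 lower bound is an assumption of this sketch. */
static int wnaf_width_ok(int w) {
    return w >= 2 && w <= 31;
}
```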
Tested ACK, though if others feel we should react more seriously to the 16-bit incompatibility I'll withdraw it :)
Very good stuff, thanks Peter!
- Initialize 'wnaf' to zeroes using memset
- Add new 'len' arg to speed up smaller scalars (mostly for endo=yes)
I reconsidered the "external initialisation of wnaf" change (that version has been rebased out) and switched to a memset inside _ecmult_wnaf instead. Still a ~2% improvement.
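Putting these pieces together, here is a minimal sketch of a wNAF loop with the memset done inside the function and a len argument that bounds how many bit positions are scanned. It is modeled loosely on the discussion, not copied from the PR: it uses a uint64_t in place of the 256-bit scalar type, omits the sign handling, and assumes 2 <= w <= 31, 1 <= len <= 64 and a < 2^(len-1), which guarantees that no carry is left over past position len. wnaf_sketch and wnaf_value are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical width-w wNAF of a, writing len digits into wnaf.
   Assumes 2 <= w <= 31, 1 <= len <= 64 and a < 2^(len-1).
   Returns the index of the highest nonzero digit plus one (0 if a == 0). */
static int wnaf_sketch(int32_t *wnaf, int len, uint64_t a, int w) {
    int last_set_bit = -1;
    int bit = 0;
    int carry = 0;

    /* Zero the output here, so callers need not initialise it. */
    memset(wnaf, 0, (size_t)len * sizeof(wnaf[0]));

    while (bit < len) {
        int now;
        int64_t word;
        /* A 0 bit with no carry, or a 1 bit absorbed by the carry
           (which then stays 1), produces no digit. */
        if ((int)((a >> bit) & 1) == carry) {
            bit++;
            continue;
        }
        /* Read up to w bits, but never past position len. */
        now = w;
        if (now > len - bit) {
            now = len - bit;
        }
        word = (int64_t)((a >> bit) & (((uint64_t)1 << now) - 1)) + carry;
        /* Branch-free carry: bit w-1 decides whether the digit is negative. */
        carry = (int)((word >> (w - 1)) & 1);
        word -= (int64_t)carry << w;

        wnaf[bit] = (int32_t)word;
        last_set_bit = bit;
        bit += now;
    }
    return last_set_bit + 1;
}

/* Recompute sum(wnaf[i] * 2^i) via Horner's rule; assumes len <= 62. */
static int64_t wnaf_value(const int32_t *wnaf, int len) {
    int64_t v = 0;
    int i;
    for (i = len - 1; i >= 0; i--) {
        v = v * 2 + wnaf[i];
    }
    return v;
}
```

For example, with len = 16 and a = 0x7FFF, w = 4, the loop emits -1 at position 0, lets the carry ride through the run of ones, and emits 1 at position 15, stopping after 16 positions rather than 64. Bounding the scan this way is the kind of saving the len argument targets for the smaller scalars used with endo=yes.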
ACK