The issue is that MSVC for 32-bit targets implements 64x64->64 bit multiplications using a non-constant subroutine. The subroutine is not constant-time because it shortcuts when the high 32 bits of both multiplicands are all 0.
See https://research.kudelskisecurity.com/2017/01/16/when-constant-time-source-may-not-save-you/ and also https://www.bearssl.org/ctmul.html for a broader view of the issue.
By inspection of our 8x32 scalar and 10x26 field code, I found four places in the field code where the high bits are not guaranteed to be zero.
This PR inserts VERIFY_CHECKS in the 8x32 scalar code to ensure the high bits are indeed 0. There, all ->64 multiplications are in fact 32x32->64.
Moreover, this PR modifies the four multiplications in the 10x26 such that the right multiplicand, which is a constant, has never all high bits set to zero. This is ensured by shifting that constant to the left. The costs are two additional shift instructions for shifting the product back to right, for each field element multiplication and doubling. The correctness follows from the VERIFY_BITS statements for the other multiplicands preceeding the multiplications, which ensure that we have enough unused high bits such that the multiplication won’t overflow even with the left-shifted constant.
I feel that this is the most reasonable thing we can do without too much effort and loss of performance.
Possible alternatives:
- Do the same but with MSVC conditional compilation: I think that’s also okay but conditional compilation makes makes testing harder, in particular because noone here uses Windows.
- Write assembly for the multiplication. That may also improve performance for MSVC because their routine is not only variable time but also slow. But this is more work, harder to review and I don’t care about performance for MSVC. Moreover, they have a different asm syntax…
- Blacklist the 32-bit for MSVC 32-bit. The drawback is that this leaves us with no option there, and people will probably just comment out the blacklisting. And MSVC is indeed used, for example it’s a travis-tested target for rust-secp256k1.
- Do nothing (and blame the compiler). But I don’t think that’s clever given that this PR is simple. Of course there are many other non-constant multiplication issues for different platform but I don’t think that’s a reason to ignore this one.
In general, what do people think about pointing out in the README that the library is supposed to be portable but tested tested mostly on gcc and clang, and it’s therefore recommend to compile it there if possible? This sounds a little bit like “Optimized for Netspace Navigator” but please read https://blog.mozilla.org/nfroyd/2018/05/29/when-implementation-monoculture-right-thing/ for why Mozilla dropped MSVC and how nice clang-cl is as a drop-in replacement for MSVC.
This is WIP because I need to add detailed comments, and I first want to see what people think.
edit: And I had a godbolt environment to play around with this but I lost it. I’ll make a new one if people are interested.