Test demonstrating discrepancy in sqr output

peterdettman commented at 8:47 am on June 22, 2014: contributor

Please refer to field_5x52_int128_impl.h, secp256k1_fe_sqr_inner method, the reduction part at the end (comments should also apply to secp256k1_fe_mul_inner, and the issue may exist for other field implementations, but here I discuss ‘64bit’).

The docs/contract for secp256k1_fe_sqr say that the output will be magnitude 1, and other comments explain that the elements are therefore limited to M_(2^53-1) for all limbs except the first (M_(2^49-1)). Yet, by inspection the reduction in _sqr_inner appears insufficient (t1’s value is unconstrained when “r[1] = t1 + c”). Indeed, inputs can be easily found to reveal the problem, as demonstrated in the test case with this PR.

(I emphasise that the output value is mathematically correct, but violates the limits for magnitude 1.)

e.g. sqr(-65537) leads to a carry into t1, which is already 0xFFFFFFFFFFFFF, and the carry does not propagate. Indeed, in this case, it would have to propagate all the way up to t4 and back to t0 again! The test case samples some other problematic inputs across a large range of bit lengths; it is not an isolated case, and the carry into t1(==0xFF….) can be larger than 1.

It’s not clear whether this can actually cause higher-level failures. Negation uses (M+1), normalisation seems unaffected. In theory, mul_int(x, 4096) would overflow unexpectedly! mul_int(x, 8) could lead to a value that was in theory too large for mul/sqr input, but in practice those methods tolerate larger values. If there were a 9x29 field, there would definitely be problems, with only 3 bits to spare.

It would be unfortunate to have to add an entire extra reduction round in sqr/mul, but I think an earlier reduction of t9 might help (similar to the normalization changes from https://github.com/bitcoin/secp256k1/pull/24). Not sure if I’d be competent to attempt the asm version, assuming it is similarly affected.

I suppose it might be considered to “fix” this by adjusting the magnitude system or method contracts, so I await a discussion before proceeding.

Test demonstrating discrepancy in sqr output c0a4eef7c6

sipa commented at 1:49 pm on June 22, 2014: contributor

This is scary, thanks a lot for tracking that down.

Not verifying the assumptions implied by the magnitude/normalization properties in the assertions and tests is probably the first mistake. I think I’ll turn your secp256k1_fe_verify into a real asserting test, and call it everywhere magnitude assumptions are made.

For solving the actual problem (if there actually is one), I’ll see if we can relax the magnitude promises made by _sqr and _mul without additional normalizations.

sipa commented at 10:57 pm on June 22, 2014: contributor

Ok, I think you misread the constraints implies by the magnitude system.

Magnitude M means d[0]…d[3] are allowed to be up to 2^53-1; that is 0x 1 FFFFFFFFFFFFFULL * M, not 0x 0 FFFFFFFFFFFFFULL * M. Similarly, d[4] is allowed to be up to 0x 1 FFFFFFFFFFFFULL * M.

I think I originally intended magnitude 1 to mean no overflows apart from allowing up to 2^256-1, instead of up to N-1, but realized that this required an extra reduction step in multiplication, which seemed like it could be avoided, so I relaxed it. I hope this is consistent everywhere.

I did implement (based on your secp256k1_fe_verify function) a stronger check that allowed up to 2^52 + 2^32 - 1 (and 2^48 + 2^32 - 1), and enabled these checks for all secp256k1_fe_ functions, and the unit tests (including the one you added here) pass with it.

Stronger checks (and stronger contracts) are certainly useful, but I believe now that the code is correct.

peterdettman commented at 2:20 am on June 23, 2014: contributor

Ah yes, I read that wrong. In my defence, I was reasoning from the implementation of _negate, which doesn’t seem consistent with the now-clarified magnitude limits. Perhaps it’s better at least to say (M+1)*(2^52-1)?

However, if you won’t mind me saying, I think widening the magnitude limit for the lower limbs is going in the wrong direction. I’m thinking it might be simpler to just let d[4] have an extra bit (or even the same range as the other limbs) at mag 1. With an early reduction of the highest limbs of the intermediate product, mul and sqr ought to be able to make a single reduction pass and leave any extra in d[4]. I haven’t entirely thought this through yet, but I think it would be worthwhile to prototype mul/sqr with an early t9 reduction and see what would end up in t4 after the first pass. The larger size of the intermediate product could be a problem, but perhaps again here, mul/sqr could just reduce t4 at the start of the calculation.

sipa commented at 1:01 am on June 25, 2014: contributor

If you can find a way to make the reduction at the end of multiplication / squaring stronger without impacting performance, by all means please do.

As long as we don’t have that, I think we should make the assumptions around magnitude clear, and verify them throughout the field operations when -DVERIFY is enabled.

peterdettman commented at 3:52 am on June 25, 2014: contributor

Certainly let’s check the magnitudes under VERIFY. mul/sqr can await code demonstrating stronger reductions as you say.

In the meantime, suppose we set the magnitude limits to (M« 52)-1 (resp. 48) and change the contracts for mul/sqr to say 16=>2. I figure _negate could be 15=>M+1. Perhaps _add should be restricted to producing M<=16 output also. Existing group operations might need some adjustments for the negate parameters but otherwise remain valid.

My reasoning is that we are throwing away a bit of resolution by using 2^53-1. The above allows max 4 bits overflow in the limbs, which the current limits already imply. Again though, 9x29 would be a problem and we’d have to use max 8 after all.

I do still think _negate is technically wrong in its current form (consider a->n[0] == 2^53-1 at mag 1), although I don’t actually see how one could go about generating a problematic input easily.

sipa commented at 3:15 pm on September 2, 2014: contributor

Closing this, as #44 took care of this. Reworking the contracts to get rid of the factor 2 can happen elsewhere.

sipa closed this on Sep 2, 2014

peterdettman deleted the branch on Oct 30, 2014

Test demonstrating discrepancy in sqr output #36