WIP batching ecdsa verification using recovery with no recid #658

pull elichai wants to merge 7 commits into bitcoin-core:master from elichai:2019-08-batch-recover changing 13 files +472 −12
  1. elichai commented at 1:17 AM on August 22, 2019: contributor

    Hi, the idea was proposed by @roconnor-blockstream in https://github.com/bitcoin/bitcoin/issues/15621.

    Basically, the idea is to change the cost of verifying a k-of-n multisig from n verifications (the number of pubkeys) to k recoveries (the number of signatures).

    Currently the performance isn't great (no gmp, no endo; Bitcoin's default):

    $ ./bench_recover
    ecdsa_recover: min 66.2us / avg 68.4us / max 74.5us
    ecdsa_recover_batch: 10: min 84.5us / avg 85.1us / max 85.8us
    $ ./bench_verify 
    ecdsa_verify: min 58.0us / avg 59.2us / max 60.2us
    ecdsa_verify_openssl: min 334us / avg 340us / max 346us
    

    It's already improved by batching the gej->ge conversions and by batching the Rx inversions. It can be further optimized using a suggestion from @sipa: pass in the public keys and then match against them using something like:

    static int ge_equals_gej_var(const secp256k1_ge *a, const secp256k1_gej *b) {
      secp256k1_fe z2s;
      secp256k1_fe u1, s1;
      if (a->infinity && b->infinity) {
        return 1;
      } else if (a->infinity || b->infinity) {
        return 0;
      }
    
    #ifdef VERIFY
      VERIFY_CHECK(b->x.magnitude == 1);
      VERIFY_CHECK(b->y.magnitude == 1);
      secp256k1_fe_verify(&b->x);
      secp256k1_fe_verify(&b->y);
    #endif
    
      /* Check a.x * b.z^2 == b.x && a.y * b.z^3 == b.y, to avoid inverses. */
      secp256k1_fe_sqr(&z2s, &b->z);
      secp256k1_fe_mul(&u1, &a->x, &z2s);
      secp256k1_fe_mul(&s1, &a->y, &z2s); secp256k1_fe_mul(&s1, &s1, &b->z);
      return secp256k1_fe_equal_var(&u1, &b->x) && secp256k1_fe_equal_var(&s1, &b->y);
    }
    

    And then matching through the list (I need to think about how to do that most efficiently, because it's still 2 muls and a sqr per comparison).
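
    As a toy illustration of that matching loop (small integers over GF(17), not the real 256-bit field; every name below is made up for this sketch and is not libsecp256k1 API): precomputing the Jacobian point's z^2 and z^3 once reduces each candidate comparison to 2 field muls, versus the 1 sqr + 3 muls the function above spends per call.

```c
#include <stdint.h>

#define TF_P 17

static int64_t tf_mod(int64_t a) { return ((a % TF_P) + TF_P) % TF_P; }

typedef struct { int64_t x, y; } tf_affine;
typedef struct { int64_t x, y, z; } tf_jac;   /* affine point (x/z^2, y/z^3) */

/* Count how many of the n affine candidates equal the Jacobian point b.
 * z^2 and z^3 are computed once up front, so the loop body is 2 muls. */
static int tf_match_count(const tf_affine *cand, int n, const tf_jac *b) {
    int64_t z2 = tf_mod(b->z * b->z);
    int64_t z3 = tf_mod(z2 * b->z);
    int i, count = 0;
    for (i = 0; i < n; i++) {
        if (tf_mod(cand[i].x * z2) == b->x && tf_mod(cand[i].y * z3) == b->y) {
            count++;
        }
    }
    return count;
}
```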

    Another idea might be a way to batch secp256k1_gej_add_var(&xj2, &xj2, &u1j, NULL), which is basically computing A = R+X and B = R-X. I don't have a concrete suggestion for how to batch that yet; it needs some thought.

    This is still a work in progress, as we'd want to see good performance before even considering this. It would already provide a performance boost in cases where k < 7/10*n (2-of-3 and up), but I'm not sure differentiating these cases is a great idea (more complexity in consensus code).
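
    For context, the 7/10 cutoff is consistent with the benchmarks above if the batch figure is read as per-signature cost: 59.2us / 85.1us ≈ 0.70, and k recoveries beat n verifications exactly when k * cost(recover) < n * cost(verify). A sketch of that break-even check, using integer costs in units of 0.1us to avoid float (the function name is made up):

```c
#include <stddef.h>

/* Returns 1 if k batched recoveries are expected to beat n plain
 * verifications, given per-operation costs in any consistent integer
 * unit (e.g. tenths of a microsecond). */
static int batch_recovery_wins(size_t k, size_t n,
                               unsigned cost_recover, unsigned cost_verify) {
    return k * cost_recover < n * cost_verify;
}
```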

    Edit: For reference, @0xb10c kindly provided me with some stats on multisig usage: currently 9.2% of all historic inputs are multisig, with the following distribution (ordered from least to most used). Thanks, @0xb10c!

    4-of-8: 1
    1-of-14: 1
    11-of-11: 1
    13-of-13: 1
    6-of-15: 1
    4-of-11: 1
    6-of-10: 2
    1-of-8: 2
    1-of-10: 2
    6-of-8: 2
    8-of-10: 2
    7-of-12: 2
    3-of-10: 2
    2-of-12: 3
    5-of-15: 3
    5-of-10: 4
    8-of-9: 5
    12-of-12: 5
    1-of-9: 7
    12-of-15: 8
    6-of-11: 8
    6-of-7: 9
    10-of-15: 11
    3-of-9: 11
    5-of-9: 12
    6-of-9: 13
    2-of-10: 14
    3-of-15: 21
    1-of-15: 27
    5-of-6: 28
    6-of-6: 41
    9-of-9: 51
    2-of-14: 66
    8-of-15: 70
    2-of-15: 74
    7-of-7: 98
    15-of-15: 102
    1-of-12: 117
    5-of-8: 120
    10-of-10: 144
    8-of-8: 150
    3-of-8: 169
    6-of-13: 250
    11-of-15: 280
    5-of-7: 306
    2-of-7: 322
    4-of-9: 431
    2-of-8: 600
    5-of-5: 711
    2-of-9: 1024
    1-of-7: 1801
    4-of-7: 2174
    4-of-4: 2870
    1-of-5: 3216
    1-of-4: 3702
    1-of-6: 6260
    4-of-5: 10719
    3-of-7: 16731
    2-of-5: 20763
    4-of-6: 21304
    3-of-6: 24457
    3-of-3: 29275
    1-of-3: 39495
    2-of-4: 310673
    1-of-1: 478276
    2-of-6: 511406
    3-of-5: 624550
    1-of-2: 651491
    3-of-4: 3849322
    2-of-2: 25711364
    2-of-3: 73384150
    Total inputs: 1143408605
    Total multisig inputs: 105709334
    ms inputs / inputs 0.0924510568992963
    
  2. Add a new FE method for addition, and a batch inverse for scalars dee74d3606
  3. elichai renamed this:
    WIF batching ecdsa verification using recovery with no recid
    WIP batching ecdsa verification using recovery with no recid
    on Aug 22, 2019
  4. elichai force-pushed on Aug 22, 2019
  5. peterdettman commented at 9:31 AM on August 22, 2019: contributor

    Other idea might be a way to batch secp256k1_gej_add_var(&xj2, &xj2, &u1j, NULL); which is basically computing A = R+X, B=R-X. I don't have a concrete suggestion into how to batch that yet but it needs some thought.

    From memory, calculating both P+Q and P-Q can be combined with the total cost of one point addition + 1 field mul + 1 field sqr (+ some cheaper field ops).

    To see this, consider P-Q = P+(-Q), -Q = (Q.x, -Q.y), therefore much of the point addition P+Q is the same as for P+(-Q), so merging the two calculations can avoid the redundancies.
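
    A toy demonstration of that merging (affine coordinates on y^2 = x^3 + 7 over GF(17), with made-up names; not the Jacobian formulas libsecp256k1 actually uses): since -Q = (Q.x, -Q.y), the slopes of P+Q and P+(-Q) share the denominator Q.x - P.x, so the one modular inversion, the expensive step in affine addition, serves both results.

```c
#include <stdint.h>

#define TOY_P 17

static int64_t toy_mod(int64_t a) { return ((a % TOY_P) + TOY_P) % TOY_P; }

/* Modular inverse via Fermat's little theorem: a^(p-2) mod p. */
static int64_t toy_inv(int64_t a) {
    int64_t r = 1, b = toy_mod(a);
    int e;
    for (e = 0; e < TOY_P - 2; e++) {
        r = toy_mod(r * b);
    }
    return r;
}

typedef struct { int64_t x, y; } toy_pt;

/* Computes sum = P+Q and diff = P-Q with a single inversion.
 * Assumes P.x != Q.x (distinct x, so no doubling/infinity cases). */
static void toy_add_sub(toy_pt p, toy_pt q, toy_pt *sum, toy_pt *diff) {
    int64_t dinv = toy_inv(q.x - p.x);         /* shared denominator */
    int64_t l1 = toy_mod((q.y - p.y) * dinv);  /* slope of P+Q */
    int64_t l2 = toy_mod((-q.y - p.y) * dinv); /* slope of P+(-Q) */
    sum->x = toy_mod(l1 * l1 - p.x - q.x);
    sum->y = toy_mod(l1 * (p.x - sum->x) - p.y);
    diff->x = toy_mod(l2 * l2 - p.x - q.x);
    diff->y = toy_mod(l2 * (p.x - diff->x) - p.y);
}
```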

  6. Add batch verification for recovery modules 16ec57cf9f
  7. Added tests for batch recovery 0c76f58694
  8. Added benchmarks for batch ecdsa recovery b53ca6d690
  9. elichai force-pushed on Aug 22, 2019
  10. Batch subtraction and addition ffa5a8ba86
  11. elichai commented at 1:29 PM on September 27, 2019: contributor

    I batched the addition and subtraction and it doesn't seem to change much in the benchmark (should've checked how much time is spent on that lol)

  12. Add tests for batch add-sub ace6430058
  13. Pass pubkeys into the batch verification to skip gej->ge conversions d73df81906
  14. in src/modules/recovery/main_impl.h:123 in ace6430058 outdated
     119 | @@ -120,6 +120,69 @@ static int secp256k1_ecdsa_sig_recover(const secp256k1_ecmult_context *ctx, cons
     120 |      return !secp256k1_gej_is_infinity(&qj);
     121 |  }
     122 |  
     123 | +static int secp256k1_ecdsa_sig_recover_four(const secp256k1_ecmult_context *ctx, secp256k1_gej out[], size_t *n, const secp256k1_scalar *sigr, const secp256k1_scalar *rn, const secp256k1_scalar* sigs, const secp256k1_scalar *message) {
    


    gmaxwell commented at 12:41 AM on October 22, 2019:

    secp256k1_gej out[]

    FWIW, you pretty much never want to pass an array argument in C. ( https://lkml.org/lkml/2015/9/3/428 )


    elichai commented at 11:03 AM on October 22, 2019:

    Yeah, I know. I did it because I find it more readable than const pointers of const pointers. But if it bothers people, I'll refactor the function declarations before it gets out of WIP.


    elichai commented at 11:05 AM on October 22, 2019:

    Actually, I'm a bit sad that because people do sizeof(arr) and expect to get the size of the array, we lose array arguments. I find them a nice way for the code to self-document the function's contract.

  15. elichai commented at 1:22 PM on October 22, 2019: contributor

    There are some rough edges, but I think it's ready for review, at least on the direction I'm taking here.

    Still missing a bunch more tests and benchmarks (for the API and for the new internal functions).

  16. elichai marked this as ready for review on Oct 22, 2019
  17. elichai commented at 9:55 AM on October 24, 2019: contributor

    I'll try to sum up here the IRC conversation on the topic with @sipa and @gmaxwell:

    1. Limit the number of keys/sigs to 32; then the output can be a bitmask of which keys matched. That way it's only n outputs instead of n*4 (which is an unnecessary detail and complication), though this requires the caller to loop through each int output to convert the bitmask into array indices. This would also let us allocate the 32 elements of gej/scalars etc. on the stack, so we wouldn't need the scratch space (and heap) anymore, at the cost of always allocating the maximum needed on the stack.

    2. Another thing to take into consideration is ordering: a 2-of-3 whose 2 signatures correspond to the first 2 keys will be faster with plain verification (i.e. just 2 verifications) than with any kind of recovery batching.

    3. @gmaxwell proposed another optimization I haven't really understood and linked https://cse.iitkgp.ac.in/~abhij/publications/ECDSA-SP-ACNS2014.pdf; I'll read the paper later. (Feel free to expand on that optimization, though.)

    4. I think the main conclusion was that the best optimizations can be achieved if this function doesn't try to be general purpose but is CMS-specific instead. Examples of such optimizations (not an exhaustive list): A. Stop deserializing keys after the first failure (the discussion wasn't clear about what a deserialization failure means in the context of consensus). B. Being able to fail/succeed early (by passing in the threshold). C. Switching between batch and regular verification (e.g. @gmaxwell proposed that if in k-of-n the first sig matches key number n-k, we should use regular verification).
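
    On point 1, the caller-side loop that turns the bitmask output back into key indices could look like this (a minimal sketch; the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a 32-bit match bitmask (bit i set => a signature matched key i)
 * into an array of key indices. Returns the number of indices written;
 * out must have room for 32 entries. */
static size_t mask_to_indices(uint32_t mask, size_t out[32]) {
    size_t i, n = 0;
    for (i = 0; i < 32; i++) {
        if (mask & ((uint32_t)1 << i)) {
            out[n++] = i;
        }
    }
    return n;
}
```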

  18. elichai commented at 9:59 AM on October 24, 2019: contributor

    I propose we use the following interface. would love feedback on it.

    We'd have a direct CMS function that accepts pubkeys, signatures, messages, k, and n (or a float of k/n? probably a bad idea). The actual parsing of keys would be on the Core side, implemented so that on the first parse failure it stops parsing and treats the current position as the actual n. That way, if there's e.g. a 6-of-10 scheme but key number 8 fails to parse, Core would pass 8 keys and 6 signatures here and tell secp that it's actually a 6-of-8. Then, if the first two sigs fail to generate valid keys, we can fail early.

    And this way we keep the weird der_lax and the other requirements (i.e. compressed/uncompressed) on the Core side, while the rest of CMS lives in secp. Thoughts? @sipa @gmaxwell
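
    As a rough sketch, the interface described above might look something like the following prototype (the function name and parameter layout are entirely hypothetical; only the context/key/signature types come from the real secp256k1.h):

```c
/* Hypothetical CMS entry point: check that the k signatures can be
 * matched against the n successfully-parsed keys, in script order.
 * Returns 1 on success, 0 on failure; free to fail early internally
 * once fewer unchecked keys than unmatched signatures remain. */
int secp256k1_ecdsa_verify_multisig(
    const secp256k1_context *ctx,
    const unsigned char *msg32,                      /* 32-byte message hash */
    const secp256k1_pubkey *pubkeys, size_t n,       /* the n parsed keys */
    const secp256k1_ecdsa_signature *sigs, size_t k  /* the k signatures */
);
```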

  19. gmaxwell commented at 9:28 PM on October 31, 2019: contributor

    No float. That would be crazy. :P There should be no float anywhere in the library itself (benchmarks or some test harness -- whatever, sure); the library should work on systems that don't implement float, plus float is begging for crazy off-by-ones.

    N and M are fine. That interface sounds good -- though I'll need to think more about whether it's sufficient for Bitcoin. I think it is, at least if we accept changing the policy (not consensus) behaviour, which I think would be fine.

    I like this style of interface because it says little about how it's internally implemented, which is usually a good idea: the caller won't be a super crypto expert, and it lets the inside stay flexible (like using whatever verification approach makes the most sense for the input).

  20. real-or-random commented at 12:53 PM on May 8, 2023: contributor

    Closing due to lack of activity.

  21. real-or-random closed this on May 8, 2023

  22. z3r086 approved

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 21:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me