Add “silentpayments” module implementing BIP352 (take 4, limited to full-node scanning) #1765

pull theStack wants to merge 11 commits into bitcoin-core:master from theStack:silentpayments_module_fullnode_only changing 23 files +8109 −35
  1. theStack commented at 12:06 pm on October 31, 2025: contributor

    Description

    This PR implements BIP352 with scanning limited to full nodes. Light-client scanning is planned to be added in a separate, future PR. The following 7 API functions are currently introduced:

    Sender side [BIP description]:

    • secp256k1_silentpayments_sender_create_outputs: given a list of $n$ secret keys $a_1 … a_n$, a serialized outpoint, and a list of recipients (each consisting of silent payments scan pubkey and spend pubkey), create the corresponding transaction outputs (x-only public keys) for the sending transaction
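    For reference, the derivation this function performs is, per BIP352, roughly the following (a sketch of the BIP's math, not of this PR's code; $A_{sum} = a_{sum} \cdot G$ denotes the sum of the input public keys and $outpoint_L$ the lexicographically smallest input outpoint):

```latex
\begin{aligned}
a_{sum} &= a_1 + a_2 + \dots + a_n \\
input\_hash &= \mathrm{hash}_{BIP0352/Inputs}\big(outpoint_L \parallel A_{sum}\big) \\
t_k &= \mathrm{hash}_{BIP0352/SharedSecret}\big(\mathrm{ser}_P(input\_hash \cdot a_{sum} \cdot B_{scan}) \parallel \mathrm{ser}_{32}(k)\big) \\
P_k &= B_{spend} + t_k \cdot G
\end{aligned}
```

    where $B_{scan}$ and $B_{spend}$ are a recipient's scan and spend pubkeys, and $k$ counts the outputs created for that recipient (which is why recipients are grouped by scan pubkey internally).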

    Receiver side, label creation [BIP description]:

    • secp256k1_silentpayments_recipient_label_create: given a scan secret key and label integer, calculate the corresponding label tweak and label object
    • secp256k1_silentpayments_recipient_label_serialize: given a label object, create the corresponding 33-byte serialization
    • secp256k1_silentpayments_recipient_label_parse: given a 33-byte label representation, create the corresponding label object
    • secp256k1_silentpayments_recipient_create_labeled_spend_pubkey: given a spend public key and a label object, create the corresponding labeled spend public key

    Receiver side, scanning [BIP description]:

    • secp256k1_silentpayments_recipient_prevouts_summary_create: given a list of $n$ public keys $A_1 … A_n$ and a serialized outpoint, create a prevouts_summary object needed for scanning
    • secp256k1_silentpayments_recipient_scan_outputs: given a prevouts_summary object, a recipient's scan secret key and spend public key, and the relevant transaction outputs (x-only public keys), scan for outputs belonging to the recipient and return the tweak(s) needed for spending the output(s). Optionally, a label_lookup callback function can be provided to also scan for labels.
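    For intuition, the scanner recomputes the same ECDH point from its end of the key material, which is what makes the protocol work (again a sketch of BIP352's math, not of this PR's code):

```latex
input\_hash \cdot b_{scan} \cdot A_{sum} \;=\; input\_hash \cdot a_{sum} \cdot B_{scan},
\qquad
t_k = \mathrm{hash}_{BIP0352/SharedSecret}\big(\mathrm{ser}_P(ecdh\_shared\_secret) \parallel \mathrm{ser}_{32}(k)\big)
```

    For $k = 0, 1, \dots$ the scanner then checks whether $B_{spend} + t_k \cdot G$ (or a labeled variant thereof) matches one of the transaction outputs, stopping at the first $k$ without a match.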

    For a higher-level overview of what exactly these functions do, it's suggested to look at a corresponding Python implementation that was created based on the secp256k1lab project (it passes the test vectors, so this “executable pseudo-code” should be correct).
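    As a taste of that pseudo-code: the tagged hashes BIP352 uses throughout (e.g. the “BIP0352/Inputs” hash over the smallest outpoint and the input pubkey sum, which the prevouts_summary creation is based on) can be sketched in a few lines of Python. This is my illustration based on the BIP, not code from this PR:

```python
import hashlib

def tagged_hash(tag: str, msg: bytes) -> bytes:
    """BIP340-style tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + msg).digest()

def input_hash(outpoint_smallest36: bytes, pubkey_sum_ser33: bytes) -> bytes:
    """hash_BIP0352/Inputs(outpoint_L || A_sum), per BIP352."""
    assert len(outpoint_smallest36) == 36 and len(pubkey_sum_ser33) == 33
    return tagged_hash("BIP0352/Inputs", outpoint_smallest36 + pubkey_sum_ser33)
```

    The tag pre-hashing means hashes for different purposes (e.g. “BIP0352/Label” vs. “BIP0352/Inputs”) can never collide on the same message.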

    Changes to the previous take

    Based on the latest state of the previous PR #1698 (take 3), the following changes have been made:

    The scope reduction isn't immediately visible in the commit count (only one commit exclusively introduced light-client-relevant functionality and could be removed entirely), but the review burden compared to #1698 is still significantly lower in terms of LOC, especially in the receiving commit.

    Open questions / TODOs

    • Recent proposals of reducing the worst-case scanning time (see posts by w0xlt and jonasnick, #1698 (comment) ff.) are not taken into account yet. :arrow_right: solved by marking already-found outputs, see #1765 (comment) :heavy_check_mark:
    • Not providing prevouts_summary (de)serialization functionality yet in the API poses the risk that users try to do it anyway by treating the opaque object as “serialized”. How to cope with that? Is adding a “don't do this” comment in the API header sufficient? :arrow_right: solved by adding a “don't do this” comment in the API header (same phrasing as in other modules), see #1765 (comment) :heavy_check_mark:
  2. real-or-random added the label feature on Nov 6, 2025
  3. w0xlt commented at 9:27 pm on November 6, 2025: none

    Added the optimized version on top of this PR: https://github.com/w0xlt/secp256k1/commit/8d16914cad57ba07da09d104f0c605ae6284462f

    For more context: #1698 (comment)

  4. theStack commented at 1:58 am on November 7, 2025: contributor

    Small supplementary update: I've created a corresponding Python implementation of the provided API functions based on secp256k1lab (https://github.com/theStack/secp256k1lab/blob/add_bip352_module_review_helper/src/secp256k1lab/bip352.py) (also linked in the PR description). The hope is that this makes reviewing this PR a bit easier by providing a less noisy, “executable pseudo-code”-like description of what happens under the hood. The code passes the BIP352 test vectors and hence should be correct.

    Added the optimized version on top of this PR: w0xlt@8d16914

    For more context: #1698 (comment)

    Thanks for rebasing on top of this PR, much appreciated! I will take a closer look within the next days.

  5. w0xlt commented at 2:04 am on November 14, 2025: none

    Nit: Not related to optimization, but the diff below removes some redundant public-key serialization code:

      diff --git a/src/modules/silentpayments/main_impl.h b/src/modules/silentpayments/main_impl.h
      index 106da20..922433d 100644
      --- a/src/modules/silentpayments/main_impl.h
      +++ b/src/modules/silentpayments/main_impl.h
      @@ -21,6 +21,19 @@
       /** magic bytes for ensuring prevouts_summary objects were initialized correctly. */
       static const unsigned char secp256k1_silentpayments_prevouts_summary_magic[4] = { 0xa7, 0x1c, 0xd3, 0x5e };
       
      +/* Serialize a ge to compressed 33 bytes. Keeps eckey_pubkey_serialize usage uniform
      + * (expects non-const ge*), and centralizes the VERIFY_CHECK. */
      +static SECP256K1_INLINE void secp256k1_sp_ge_serialize33(const secp256k1_ge* in, unsigned char out33[33]) {
      +    size_t len = 33;
      +    secp256k1_ge tmp = *in;
      +    int ok = secp256k1_eckey_pubkey_serialize(&tmp, out33, &len, 1);
      +#ifdef VERIFY
      +    VERIFY_CHECK(ok && len == 33);
      +#else
      +    (void)ok;
      +#endif
      +}
      +
       /** Sort an array of silent payment recipients. This is used to group recipients by scan pubkey to
        *  ensure the correct values of k are used when creating multiple outputs for a recipient.
        *
      @@ -68,13 +81,11 @@ static int secp256k1_silentpayments_calculate_input_hash_scalar(secp256k1_scalar
           secp256k1_sha256 hash;
           unsigned char pubkey_sum_ser[33];
           unsigned char input_hash[32];
      -    size_t len;
           int ret, overflow;
       
           secp256k1_silentpayments_sha256_init_inputs(&hash);
           secp256k1_sha256_write(&hash, outpoint_smallest36, 36);
      -    ret = secp256k1_eckey_pubkey_serialize(pubkey_sum, pubkey_sum_ser, &len, 1);
      -    VERIFY_CHECK(ret && len == sizeof(pubkey_sum_ser));
      +    secp256k1_sp_ge_serialize33(pubkey_sum, pubkey_sum_ser);
           secp256k1_sha256_write(&hash, pubkey_sum_ser, sizeof(pubkey_sum_ser));
           secp256k1_sha256_finalize(&hash, input_hash);
           /* Convert input_hash to a scalar.
      @@ -85,15 +96,13 @@ static int secp256k1_silentpayments_calculate_input_hash_scalar(secp256k1_scalar
            * an error to ensure strict compliance with BIP0352.
            */
           secp256k1_scalar_set_b32(input_hash_scalar, input_hash, &overflow);
      -    ret &= !secp256k1_scalar_is_zero(input_hash_scalar);
      +    ret = !secp256k1_scalar_is_zero(input_hash_scalar);
           return ret & !overflow;
       }
       
       static void secp256k1_silentpayments_create_shared_secret(const secp256k1_context *ctx, unsigned char *shared_secret33, const secp256k1_ge *public_component, const secp256k1_scalar *secret_component) {
           secp256k1_gej ss_j;
           secp256k1_ge ss;
      -    size_t len;
      -    int ret;
       
           secp256k1_ecmult_const(&ss_j, public_component, secret_component);
           secp256k1_ge_set_gej(&ss, &ss_j);
      @@ -103,12 +112,7 @@ static void secp256k1_silentpayments_create_shared_secret(const secp256k1_contex
            * impossible at this point considering we have already validated the public key and
            * the secret key.
            */
      -    ret = secp256k1_eckey_pubkey_serialize(&ss, shared_secret33, &len, 1);
      -#ifdef VERIFY
      -    VERIFY_CHECK(ret && len == 33);
      -#else
      -    (void)ret;
      -#endif
      +    secp256k1_sp_ge_serialize33(&ss, shared_secret33);
       
           /* Leaking these values would break indistinguishability of the transaction, so clear them. */
           secp256k1_ge_clear(&ss);
      @@ -585,7 +589,6 @@ int secp256k1_silentpayments_recipient_scan_outputs(
                       secp256k1_ge output_negated_ge, tx_output_ge;
                       secp256k1_gej tx_output_gej, label_gej;
                       unsigned char label33[33];
      -                size_t len;
       
                       secp256k1_xonly_pubkey_load(ctx, &tx_output_ge, tx_outputs[j]);
                       secp256k1_gej_set_ge(&tx_output_gej, &tx_output_ge);
      @@ -595,7 +598,6 @@ int secp256k1_silentpayments_recipient_scan_outputs(
                       secp256k1_ge_neg(&output_negated_ge, &output_ge);
                       secp256k1_gej_add_ge_var(&label_gej, &tx_output_gej, &output_negated_ge, NULL);
                       secp256k1_ge_set_gej_var(&label_ge, &label_gej);
      -                ret = secp256k1_eckey_pubkey_serialize(&label_ge, label33, &len, 1);
                       /* Serialize must succeed because the point was just loaded.
                        *
                        * Note: serialize will also fail if label_ge is the point at infinity, but we know
      @@ -603,7 +605,7 @@ int secp256k1_silentpayments_recipient_scan_outputs(
                        * Thus, we know that label_ge = tx_output_gej + output_negated_ge cannot be the
                        * point at infinity.
                        */
      -                VERIFY_CHECK(ret && len == 33);
      +                secp256k1_sp_ge_serialize33(&label_ge, label33);
                       label_tweak = label_lookup(label33, label_context);
                       if (label_tweak != NULL) {
                           found = 1;
      @@ -617,7 +619,6 @@ int secp256k1_silentpayments_recipient_scan_outputs(
                       secp256k1_gej_neg(&label_gej, &tx_output_gej);
                       secp256k1_gej_add_ge_var(&label_gej, &label_gej, &output_negated_ge, NULL);
                       secp256k1_ge_set_gej_var(&label_ge, &label_gej);
      -                ret = secp256k1_eckey_pubkey_serialize(&label_ge, label33, &len, 1);
                       /* Serialize must succeed because the point was just loaded.
                        *
                        * Note: serialize will also fail if label_ge is the point at infinity, but we know
      @@ -625,7 +626,7 @@ int secp256k1_silentpayments_recipient_scan_outputs(
                        * Thus, we know that label_ge = tx_output_gej + output_negated_ge cannot be the
                        * point at infinity.
                        */
      -                VERIFY_CHECK(ret && len == 33);
      +                secp256k1_sp_ge_serialize33(&label_ge, label33);
                       label_tweak = label_lookup(label33, label_context);
                       if (label_tweak != NULL) {
                           found = 1;
    
  6. w0xlt commented at 2:25 am on November 14, 2025: none

    nit: The following diff removes the implicit cast and clarifies that k is 4 bytes

      diff --git a/src/modules/silentpayments/main_impl.h b/src/modules/silentpayments/main_impl.h
      index 922433d..d94aed6 100644
      --- a/src/modules/silentpayments/main_impl.h
      +++ b/src/modules/silentpayments/main_impl.h
      @@ -512,7 +512,8 @@ int secp256k1_silentpayments_recipient_scan_outputs(
           secp256k1_xonly_pubkey output_xonly;
           unsigned char shared_secret[33];
           const unsigned char *label_tweak = NULL;
      -    size_t j, k, found_idx;
      +    size_t j, found_idx;
      +    uint32_t k;
           int found, combined, valid_scan_key, ret;
       
           /* Sanity check inputs */
    
  7. in src/modules/silentpayments/main_impl.h:625 in c11d30c25f
    620+                ret = secp256k1_eckey_pubkey_serialize(&label_ge, label33, &len, 1);
    621+                /* Serialize must succeed because the point was just loaded.
    622+                 *
    623+                 * Note: serialize will also fail if label_ge is the point at infinity, but we know
    624+                 * this cannot happen since we only hit this branch if tx_output != output_xonly.
    625+                 * Thus, we know that label_ge = tx_output_gej + output_negated_ge cannot be the
    


    jonasnick commented at 4:21 pm on November 14, 2025:
    Should be - tx_output_gej?
  8. jonasnick commented at 4:25 pm on November 14, 2025: contributor

    Thanks @theStack for the new PR. I can confirm that this PR is a rebased version of #1698, with the light client functionality removed and comments addressed, except for:

  9. jonasnick commented at 4:33 pm on November 14, 2025: contributor

    Not providing prevouts_summary (de)serialization functionality yet in the API poses the risk that users try to do it anyway by treating the opaque object as “serialized”. How to cope with that? Is adding a “don’t do this” comment in API header sufficient?

    Is there a reason for serializing prevouts_summary without light client functionality? If not, I think the don’t do this comment is sufficient. Right now, in contrast to the docs of all other opaque objects, this is missing, however:

    The exact representation of data inside the opaque data structures is implementation defined and not guaranteed to be portable between different platforms or versions.

  10. theStack force-pushed on Nov 15, 2025
  11. theStack commented at 0:48 am on November 15, 2025: contributor

    @w0xlt, @jonasnick: Thanks for the reviews! I’ve addressed the suggested changes:

    • in _recipient_scan_outputs: changed the type of k to uint32_t (comment above)
    • in _recipient_label_create: added a scan key validity check (+ added a test for that) (#1698 - comment)
    • unified all mentions of “Silent Payments” to title case in the header API and example (#1698 - comment)
    • fixed typo s/elemement/element/ (#1698 - review)
    • in _recipient_scan_outputs: fixed comment in second label candidate (review above)
    • extended the API header comment for the _prevouts_summary opaque data structure, to point out that the data structure is implementation defined (like docs of all other opaque structs) (comment above)

    Nit: Not related to optimization, but the diff below removes some redundant public-key serialization code:

    Given that this compressed-pubkey-serialization pattern shows up repeatedly also in other modules (ellswift, musig), I think it would make the most sense to add a general helper (e.g. in eckey{,_impl}.h), which could be done in an independent PR. I’ve opened issue #1773 to see if there is conceptual support for doing this.

    Not providing prevouts_summary (de)serialization functionality yet in the API poses the risk that users try to do it anyway by treating the opaque object as “serialized”. How to cope with that? Is adding a “don’t do this” comment in API header sufficient?

    Is there a reason for serializing prevouts_summary without light client functionality? If not, I think the don’t do this comment is sufficient.

    Good point, I can’t think of a good reason for full nodes wanting to serialize prevouts_summary.

  12. theStack force-pushed on Nov 15, 2025
  13. nymius commented at 10:52 pm on November 20, 2025: none

    To address the open questions, I’ve reviewed the proposed changes by @w0xlt on 8d16914cad57ba07da09d104f0c605ae6284462f.

    I'm going to focus more on the key aspects I extracted from the review and the merits of each change, rather than on the big-O improvement claims, because I didn't get that far.

    These are multiple different changes rather than a single one, so to make the review easier I suggest breaking it into multiple commits. For each of them, I would state the purpose and the real-world scenario where the change would be relevant.

    Also, I would use clearer names for the variables or at least document their purpose.

    The changes I’ve identified so far are the following:

    • Improve label lookup using a hash table: I think this is implementation-dependent and should be improved by the user rather than by the library itself. Examples, as part of the documentation, are usage demonstrations; while they can point the user to best practices, I prefer clarity over performance in them. For example, in the rust-secp256k1 bindings, I used a HashMap for the example because it is a familiar standard structure for Rust users. If users would like to gain more performance there, they have other tools available to replace that structure themselves.
    • On secp256k1_silentpayments_recipient_sort_cmp, I understand the change: (r1->index < r2->index) ? -1 : (r1->index > r2->index) ? 1 : 0 is meant to make the unstable secp256k1_hsort implementation stable. Considering it only affects secp256k1_silentpayments_sender_create_outputs, which didn't receive more changes than a variable removal, what are the performance improvements there?
    • SECP256K1_SP_SCAN_BATCH: I just learned about ge_set_gej_all during this review, but it seems the Jacobian-to-affine conversion is needed for the comparison against labels, and it is faster to do in batches. I think this improvement will only affect the small subset of transactions with a large number of outputs. A good benchmark for this would be coinjoin transactions; although they are not supported by BIP352, a test case is doable.
    • Double head: I'm not sure of the target of this change; I guess it is to skip already found or scanned inputs, but I couldn't figure out what each head is tracking.
    • Binary tree search for x-only lookups: I think this explains itself: faster lookups in the unlabeled case. This may have its merits, but I need to remove the other changes to get a clear answer.
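    On the first point, a caller-side hash-table label cache really is just a map from the 33-byte serialized label to its 32-byte tweak, as consulted by a label_lookup callback. A Python sketch (my illustration with hypothetical names, not part of the PR):

```python
from typing import Dict, Optional

# Hypothetical caller-side cache: 33-byte serialized label -> 32-byte label tweak.
labels_cache: Dict[bytes, bytes] = {}

def add_label(label33: bytes, label_tweak32: bytes) -> None:
    """Register a precomputed label (done once per label, e.g. for change)."""
    labels_cache[label33] = label_tweak32

def label_lookup(label33: bytes) -> Optional[bytes]:
    """O(1) average-case lookup; None signals 'no such label' to the scanner."""
    return labels_cache.get(label33)
```

    A linear list scan would do the same job; the hash table only matters once the cache holds many labels, which is why keeping this on the caller side (where the trade-off is known) seems reasonable.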

    In general I agree with @jonasnick that we should define a clear target to benchmark and improve. As I've said before, the base case should be a wallet with a single label for change. For other improvements, I would try to match them up with plausible real-world scenarios before making complex changes to the code base of the PR. Finally, looking at bench_silentpayments_full_tx_scan, the use_labels case is very simple. If we want to test performance improvements on the label lookup, I would start there.

    In conclusion, from the proposed commit and the discussion around it, the only changes I’ve found clear enough to consider are:

    • Tracking of found outputs: simple enough, small performance improvement for the usual case, better for larger transactions.
  14. w0xlt commented at 11:34 pm on November 20, 2025: none

    Thanks @nymius for reviewing the changes, addressing the main points, and proposing a simplification.

    I’m currently splitting the optimization commit into smaller pieces to make it easier to review. I’ll also take a closer look at your commit and run it against the benchmark files.

    The only part of the discussion that still feels a bit ambiguous is the “base” or “usual” case. From my understanding of #1698#pullrequestreview-3341766084, the concern is not about typical usage, but rather about an attacker crafting malicious transactions with many outputs, causing the scanning process to take hours.

    So the goal of this optimization would be to mitigate that scenario, not the collaborative one.

  15. w0xlt commented at 5:55 am on November 21, 2025: none

    I ran the examples/silentpayments_mixed_1.c file with the simplified version suggested by @nymius (https://github.com/jonasnick/secp256k1/commit/311b4ebb2bf6612c46cefd80a8812d9c9b5dc6c4). It shows slightly worse performance (see below), but the simpler approach may still be worth it.

    Without the secp256k1_silentpayments_recipient_sort_cmp stabilization, I got 82s for the complex version vs. 114s for the simpler one. Whether the 30s difference justifies the additional complexity is up for discussion — I don’t have a strong opinion.

    Answering the questions: secp256k1_silentpayments_recipient_sort_cmp stabilization + heads speed up the non-adversarial case. The stable sort ensures that the transaction outputs are ordered sequentially by the index $k$.

    The optimized receiver implementation relies on a heuristic (the head pointers) that assumes the next output it is looking for ($k+1$) is located immediately after the previous one ($k$).

    • With Stable Sort: The scanner complexity is roughly O(N) (Linear).
    • Without Stable Sort: The scanner complexity degrades to O(N²) (Quadratic).
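    The effect of the head-pointer heuristic can be modeled with a toy Python scan that counts equality comparisons (my illustration, not the actual C code): if the outputs arrive in $k$ order, every search hits at the head immediately; otherwise each search degrades toward a full pass:

```python
def scan_comparisons(tx_outputs, wanted):
    """Count comparisons for a head-pointer scan: the search for each next
    expected output starts where the previous match ended (wrapping around)."""
    comparisons = 0
    head = 0
    n = len(tx_outputs)
    for w in wanted:
        for off in range(n):
            i = (head + off) % n  # start at the head pointer
            comparisons += 1
            if tx_outputs[i] == w:
                head = i + 1  # heuristic: the next match is right after this one
                break
    return comparisons

n = 100
in_order = scan_comparisons(list(range(n)), list(range(n)))          # 100 comparisons
out_of_order = scan_comparisons(list(reversed(range(n))), list(range(n)))  # ~n^2
```

    This also shows why the heuristic is fragile: it only pays off if the sender (here an attacker, who won't cooperate) emits outputs in $k$ order.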
  16. theStack commented at 3:46 pm on November 21, 2025: contributor

    @w0xlt, @nymius: Thanks for investigating this deeper. I’ve now also had a chance to look at the suggested optimizations and came to similar conclusions as stated in #1765 (comment). I particularly agree with the stated points that the changes should not increase complexity significantly and that the most important optimization candidate to consider for mitigating the worst-case scanning attack is “skip outputs that we have already found” (as previously stated by @jonasnick, see #1698 (comment) and https://github.com/jonasnick/secp256k1/commit/311b4ebb2bf6612c46cefd80a8812d9c9b5dc6c4). I don’t think stabilizing the sorting helps at all, since this is something that happens at the sender side, and we can’t rely on the attacker using a specific implementation (even if they did, it’s trivial for them to shuffle the outputs after creation).

    As a target to benchmark, I propose the following modified example that exhibits the worst-case scanning time based on a labels cache with one entry (for change outputs), by creating a tx with 23255 outputs [1] all targeted at Bob: https://github.com/bitcoin-core/secp256k1/commit/1df4287eb8daa34a07888300e0b18ba1c97a3432

    Shower-thought from this morning: what if we treat the tx_outputs input as an actual list that we modify, removing an entry if it is found? This would have a similar effect as the “track found outputs” idea, but without the need for dynamic memory allocation. It's a ~10-line diff and seems to work fine: https://github.com/theStack/secp256k1/commit/9aba4709b937eb44507494b3bd5e80871193671c It reduces the run-time of the proposed example above from ~10 minutes to roughly ~2 seconds on my machine.
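    A toy Python model of the idea (my illustration; the linked commit does the equivalent in C on the tx_outputs pointer array): removing each matched entry shrinks the search space for every subsequent expected output, so a transaction where everything matches stays linear instead of quadratic:

```python
def scan_checks(tx_outputs, expected, remove_found):
    """Toy model of the scan loop: for each expected output (k = 0, 1, ...),
    search the remaining tx_outputs; return the number of per-output checks
    (standing in for the expensive label-candidate computations)."""
    outputs = list(tx_outputs)  # work on a copy
    checks = 0
    for e in expected:
        for i, out in enumerate(outputs):
            checks += 1
            if out == e:
                if remove_found:
                    del outputs[i]  # shift the remaining entries left by one
                break
    return checks
```

    For 100 outputs that all match in order, removal needs 100 checks, while the unmodified loop needs 100·101/2 = 5050.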

    Any thoughts on this? Maybe I’m still missing something.

    [1] that’s an upper bound of maximum outputs per block: floor(1000000/43) = 23255

  17. nymius commented at 7:34 pm on November 21, 2025: none

    Shower-thought from this morning: what if we treat the tx_outputs input as an actual list that we modify, removing an entry if it is found? This would have a similar effect as the “track found outputs” idea, but without the need for dynamic memory allocation. It's a ~10-line diff and seems to work fine: theStack@9aba470 It reduces the run-time of the proposed example above from ~10 minutes to roughly ~2 seconds on my machine.

    Any thoughts on this? Maybe I’m still missing something.

    1df4287eb8daa34a07888300e0b18ba1c97a3432 is a good target. These are the runtimes I've obtained testing against this target:

    Branch                                                         Runtime
    theStack@9aba470                                               ~1.41s
    theStack@9103229d27d85fa8b199705f29bd7dada54ebaa7 (baseline)   ~9m
    jonasnick@311b4eb                                              ~1.49s

    I had to increase the stack size to be able to fit all the N_OUTPUTS-sized allocations in the example.

    Initially I preferred the is_found allocation rather than the element shifts. But your solution seems to be more performant.

  18. w0xlt commented at 10:14 am on November 22, 2025: none

    @theStack Yes — if we want to keep only the adversarial-scenario optimizations, we can drop sort stabilization and the extra heads.

    I like your idea of avoiding dynamic memory allocation. That’s a very interesting direction. On my machine, the scan completes in about 0.4s, which feels like a good balance between simplicity and the optimization needed for the labeled case.

    Below are the changes I had to make for your example to run on my machine and to record the scan time.

      diff --git a/examples/silentpayments.c b/examples/silentpayments.c
      index 5e71e73..d43332f 100644
      --- a/examples/silentpayments.c
      +++ b/examples/silentpayments.c
      @@ -10,6 +10,7 @@
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
      +#include <time.h>
       
       #include <secp256k1_extrakeys.h>
       #include <secp256k1_silentpayments.h>
      @@ -112,15 +113,21 @@ const unsigned char* label_lookup(
           return NULL;
       }
       
      +static secp256k1_xonly_pubkey tx_inputs[N_INPUTS];
      +static const secp256k1_xonly_pubkey *tx_input_ptrs[N_INPUTS];
      +static secp256k1_xonly_pubkey tx_outputs[N_OUTPUTS];
      +static secp256k1_xonly_pubkey *tx_output_ptrs[N_OUTPUTS];
      +static secp256k1_silentpayments_found_output found_outputs[N_OUTPUTS];
      +static secp256k1_silentpayments_found_output *found_output_ptrs[N_OUTPUTS];
      +static secp256k1_silentpayments_recipient recipients[N_OUTPUTS];
      +static const secp256k1_silentpayments_recipient *recipient_ptrs[N_OUTPUTS];
      +/* 2D array for holding multiple public key pairs. The second index, i.e., [2],
      + * is to represent the spend and scan public keys. */
      +static unsigned char (*sp_addresses[N_OUTPUTS])[2][33];
      +
       int main(void) {
           unsigned char randomize[32];
           unsigned char serialized_xonly[32];
      -    secp256k1_xonly_pubkey tx_inputs[N_INPUTS];
      -    const secp256k1_xonly_pubkey *tx_input_ptrs[N_INPUTS];
      -    secp256k1_xonly_pubkey tx_outputs[N_OUTPUTS];
      -    secp256k1_xonly_pubkey *tx_output_ptrs[N_OUTPUTS];
      -    secp256k1_silentpayments_found_output found_outputs[N_OUTPUTS];
      -    secp256k1_silentpayments_found_output *found_output_ptrs[N_OUTPUTS];
           secp256k1_silentpayments_prevouts_summary prevouts_summary;
           secp256k1_pubkey unlabeled_spend_pubkey;
           struct labels_cache bob_labels_cache;
      @@ -209,11 +216,6 @@ int main(void) {
           {
               secp256k1_keypair sender_keypairs[N_INPUTS];
               const secp256k1_keypair *sender_keypair_ptrs[N_INPUTS];
      -        secp256k1_silentpayments_recipient recipients[N_OUTPUTS];
      -        const secp256k1_silentpayments_recipient *recipient_ptrs[N_OUTPUTS];
      -        /* 2D array for holding multiple public key pairs. The second index, i.e., [2],
      -         * is to represent the spend and scan public keys. */
      -        unsigned char (*sp_addresses[N_OUTPUTS])[2][33];
               unsigned char seckey[32];
       
               printf("Sending...\n");
      @@ -340,6 +342,9 @@ int main(void) {
                    *        `secp256k1_silentpayments_recipient_prevouts_summary_create`
                    *     2. Call `secp256k1_silentpayments_recipient_scan_outputs`
                    */
      +            clock_t start, end;
      +            double cpu_time_used;
      +
                   ret = secp256k1_silentpayments_recipient_prevouts_summary_create(ctx,
                       &prevouts_summary,
                       smallest_outpoint,
      @@ -356,14 +361,20 @@ int main(void) {
       
                   /* Scan the transaction */
                   n_found_outputs = 0;
      +
      +            start = clock();
                   ret = secp256k1_silentpayments_recipient_scan_outputs(ctx,
                       found_output_ptrs, &n_found_outputs,
      -                (const secp256k1_xonly_pubkey * const *)tx_output_ptrs, N_OUTPUTS,
      +                (const secp256k1_xonly_pubkey **)tx_output_ptrs, N_OUTPUTS,
                       bob_scan_key,
                       &prevouts_summary,
                       &unlabeled_spend_pubkey,
                       label_lookup, &bob_labels_cache /* NULL, NULL for no labels */
                   );
      +            end = clock();
      +            cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
      +            printf("Bob's scan took %f seconds\n", cpu_time_used);
      +
                   if (!ret) {
                       printf("This transaction is not valid for Silent Payments, skipping.\n");
                       return EXIT_SUCCESS;
      @@ -435,7 +446,7 @@ int main(void) {
                   n_found_outputs = 0;
                   ret = secp256k1_silentpayments_recipient_scan_outputs(ctx,
                       found_output_ptrs, &n_found_outputs,
      -                (const secp256k1_xonly_pubkey * const *)tx_output_ptrs, 1, /* dummy scan with one output (we only care about Bob) */
      +                (const secp256k1_xonly_pubkey **)tx_output_ptrs, 1, /* dummy scan with one output (we only care about Bob) */
                       carol_scan_key,
                       &prevouts_summary,
                       &unlabeled_spend_pubkey,
    
  19. theStack commented at 4:39 pm on November 22, 2025: contributor

    @nymius, @w0xlt: Thanks once again for the quick feedback and for benchmarking! Shortly after my previous comment, I was notified about yet another approach to tackle the worst-case scanning time attack (kudos to @furszy for bringing up the idea!) that I think is even more elegant: we can use the pointers in the tx_outputs list directly to track outputs, by setting them to NULL once one has been found and only treating them if they are non-NULL. With this, it's only a four-line code change: https://github.com/theStack/secp256k1/commit/2087f9209df35764745dd744051e3d125d3c6b43. It kind of combines the previous two approaches of https://github.com/jonasnick/secp256k1/commit/311b4ebb2bf6612c46cefd80a8812d9c9b5dc6c4 and https://github.com/theStack/secp256k1/commit/9aba470 (mark found outputs, not in a newly allocated array but by modifying the tx_outputs input list, in order to avoid dynamic memory allocation), with very similar run-time results.
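    The marking idea, modeled in Python with None standing in for a NULL pointer (my illustration; the linked commit does this on the C pointer array): found entries are overwritten in place and skipped cheaply on all later passes:

```python
def scan_marking(tx_output_ptrs, expected):
    """Toy model of the four-line change: on a match, overwrite the entry
    with None and skip None entries in later passes. The input list is
    modified in place, like the tx_outputs pointer array in the C code."""
    checks = 0
    n_found = 0
    for e in expected:
        for i, out in enumerate(tx_output_ptrs):
            if out is None:
                continue  # already found earlier: skip without an expensive check
            checks += 1  # stands in for the per-output candidate computations
            if out == e:
                tx_output_ptrs[i] = None  # mark as found
                n_found += 1
                break
    return n_found, checks
```

    Compared to removing entries, the list keeps its length (so the skip itself still iterates), but no pointers are shifted and no extra memory is needed.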

    The only tiny drawback of these non-malloc approaches might be that something which is conceptually an “in” parameter gets modified, which could be a bit unsound in a strict API-design sense. On the other hand, it shouldn't matter for the user (I doubt that the lists passed in would ever be reused for anything else by the callers afterwards), and we already do the same in the sending API for the recipients, so it's probably fine.

    @theStack Yes — if we want to keep only the adversarial-scenario optimizations, we can drop sort stabilization and the extra heads.

    The way I see it currently, code paths for non-adversarial scenarios with increasing k values would be hit so rarely in practice that I'm sceptical it's worth putting much effort into those optimizations. When scanning, the vast majority of transactions won't have any matches in the first place. Of those few that do have a match, the vast majority will very likely not contain any repeated recipient either (IMHO it doesn't make much sense to do that, unless the recipient explicitly asks “I want to receive my payment split up into multiple UTXOs, but still in a single tx”?), so in the bigger picture those optimizations wouldn't matter all that much, and I'd assume the dominant factor should be by far all the (unavoidable) ECDH computations per transaction. But that's still more of a guess, and it's good to already have optimization ideas at hand if we need them in the future.

  20. w0xlt commented at 6:06 am on November 23, 2025: none
    @theStack Thanks for continuing to refine the optimization. The deletion approach performs slightly better (0.40 s vs. 0.45 s), likely because deleting items shrinks the array and cuts the number of loop iterations by about 50% compared to nullifying them.
  21. theStack referenced this in commit 1c8233acf2 on Nov 25, 2025
  22. theStack referenced this in commit c16252ddcc on Nov 25, 2025
  23. theStack force-pushed on Nov 25, 2025
  24. theStack commented at 7:08 pm on November 25, 2025: contributor

    To summarize, the following table shows the proposed mitigations for the worst-case scanning attack so far, with benchmark results from my machine. The previous baseline commit with the worst-case example has been updated to include @w0xlt’s changes, in order to work without stack size limit changes. (EDIT: These benchmark results are based on a baseline that doesn’t represent the worst-case and are thus not representative, as noticed by w0xlt below.)

    | Branch | Approach | Runtime |
    | ------ | -------- | ------- |
    | https://github.com/theStack/secp256k1/commit/c16252ddccdeaefdc6c4ec4425ddda91ee813320 (Branch baseline) | modified example to exercise worst-case scanning, no fix | 641.391838s |
    | https://github.com/theStack/secp256k1/commit/ec2797715ac51ebe9d57e396d1cb46d4fd474eab (Branch fix1_…) | mark found outputs in calloced array | 0.543969s |
    | https://github.com/theStack/secp256k1/commit/135ca0a194f871f5be9fa78d47c4aad8e13a06e0 (Branch fix2_…) | remove matched outputs by shifting remaining entries | 0.514952s |
    | https://github.com/theStack/secp256k1/commit/8360150954facf9887ebdc687dada781ddcfec9a (Branch fix3_…) | mark found outputs by NULL in tx_outputs input array | 0.544740s |

    The run-times of the fixes vary slightly (the removal approach “fix2” being the fastest, confirming #1765 (comment) above), but are all in the same ballpark. I don’t think exact performance results matter much here, as the goal of the mitigation should IMHO be to roughly cut the run-time down from “minutes” to “seconds” (and remember, this is already the absolute worst case: one giant non-standard transaction filling a whole block, and it can only slow down one specific SP recipient). Thus, I decided to pick the simplest approach that avoids dynamic memory allocation, i.e. fix number 3, using NULL as a marker in tx_outputs.

    With that tackled, I believe that all of the open questions and TODOs are addressed now (updated the PR description accordingly). The latest force-push also includes a rebase on master (to include the CI fix #1771).

  25. real-or-random referenced this in commit e7f7083b53 on Nov 27, 2025
  26. real-or-random commented at 4:30 pm on November 27, 2025: contributor
    I assume you want to rebase on master now that #1774 has been merged.
  27. theStack force-pushed on Nov 27, 2025
  28. theStack commented at 0:12 am on November 28, 2025: contributor

    I assume you want to rebase on master now that #1774 has been merged.

    Yes, done. Using the new secp256k1_eckey_pubkey_serialize33 function, the sending and receiving commit diffs got a bit smaller (as outlined in #1765#pullrequestreview-3462461331), -13 LOC in total.

  29. nymius commented at 5:39 am on November 29, 2025: none
    Thanks for the summary, I’ve checked the results. I agree with the fix3_* branch as the final solution; it’s elegant and the trade-offs are clear.
  30. w0xlt commented at 9:47 am on November 30, 2025: none

    Hi @theStack . That’s not actually the worst case. The worst case is when the attacker shuffles the outputs randomly (or in any adversarial order). I modified your benchmark to randomly shuffle outputs, and scanning takes several minutes to hours with the current optimization proposal.

    https://github.com/w0xlt/secp256k1/commit/435cb96c2dc3c9d4a0fa967fb84432d8dd77783d

    The issue is that the scanning function receives a label lookup callback, which makes it harder to build a sorted index upfront. Instead, for each output tweak index k, it linearly scans all transaction outputs — resulting in O(n²) complexity. An attacker creating ~23,000 unordered outputs can make scanning take hours.

    If we slightly change the API to replace the label lookup callback with a label entry set, we can sort once and search fast:

    1. Build a sorted index of outputs by serialized x-only pubkey — O(n log n) upfront
    2. Binary search for each candidate — O(log n) per lookup


    That way, scanning 23,255 adversarially-ordered outputs drops from hours to ~0.3s. The optimization is purely receiver-side, requires no protocol changes, and works regardless of output ordering. This reduces architectural flexibility, though.

    Before: for each k, scan all n outputs → O(k × n) ≈ O(n²)
    After: sort outputs, binary search → O(n log n + k log n)
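
    A minimal sketch of the sort-then-binary-search idea over plain 32-byte x-only pubkey serializations, using standard qsort/bsearch (the names `output_index_build`/`output_index_contains` are illustrative, not from the linked commit):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the proposed receiver-side output index: sort
 * the 32-byte x-only pubkey serializations once upfront (O(n log n)),
 * then look up each candidate output computed per label via binary
 * search (O(log n)) instead of a linear scan for every output tweak k. */
typedef struct {
    unsigned char x[32]; /* serialized x-only public key */
} xonly_ser;

static int xonly_ser_cmp(const void *a, const void *b) {
    return memcmp(a, b, 32);
}

/* O(n log n): build the sorted index in place. */
static void output_index_build(xonly_ser *outputs, size_t n) {
    qsort(outputs, n, sizeof(xonly_ser), xonly_ser_cmp);
}

/* O(log n): check whether a candidate pubkey appears among the outputs. */
static int output_index_contains(const xonly_ser *outputs, size_t n,
                                 const xonly_ser *candidate) {
    return bsearch(candidate, outputs, n, sizeof(xonly_ser), xonly_ser_cmp) != NULL;
}
```

    The overall scan would then do one `output_index_build` upfront and one `output_index_contains` per candidate, instead of one linear pass over all outputs per output tweak k.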



    The following commit implements this proposal:

 https://github.com/w0xlt/secp256k1/commit/2305d7361ad4a2ead9e335ea76ea4fc047e275cf

    You can verify by running ./build/bin/silentpayments_example (ordered) and ./build/bin/silentpayments_shuffled_example (shuffled).

  31. theStack commented at 2:12 am on December 1, 2025: contributor

    Hi @theStack . That’s not actually the worst case. The worst case is when the attacker shuffles the outputs randomly (or in any adversarial order).

    @w0xlt: Good catch, I indeed missed taking the output order into account for the worst-case scenario, so https://github.com/theStack/secp256k1/commit/c16252ddccdeaefdc6c4ec4425ddda91ee813320 is not a useful target (in hindsight, the benchmark results looked too good to be true, and my previous words of “we can’t rely on the attacker using a specific implementation” strike back very hard). Thanks for the updated benchmark and the new worst-case fix proposal!

    I modified your benchmark to randomly shuffle outputs, and scanning takes several minutes to hours with the current optimization proposal.

    Are you sure it could take up to “hours”? That would be significantly worse than the previous baseline commit (~10 minutes on my machine), which seems implausible to me. In the unoptimized scanning implementation, the run-time should be very much independent of the order of outputs, as the total number of inner loop iterations is constant ($1+2+3+…+N = (N \cdot (N+1))/2 = 270409140$ for our worst-case scenario of $N = 23255$), and it’d be very surprising if any of the three proposed optimizations performed worse than the baseline for shuffled outputs.
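
    The iteration count above follows from the closed form of the triangular-number sum (a trivial sketch; `total_inner_iterations` is just an illustrative helper, not code from the PR):

```c
#include <assert.h>
#include <stdint.h>

/* The unoptimized scan performs 1 + 2 + ... + N = N*(N+1)/2 inner-loop
 * iterations in total, independent of the output order. */
static uint64_t total_inner_iterations(uint64_t n_outputs) {
    return n_outputs * (n_outputs + 1) / 2;
}
```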

    I only did two runs so far with your shuffle-patch added, and got runtimes of ~10 minutes (unoptimized) and ~5 minutes (NULL-ify patch, “fix3”). Will do some more thorough benchmarks tomorrow, including your proposed solution.

  32. w0xlt commented at 3:57 am on December 1, 2025: none
    @theStack Thanks for looking into this. Sorry if I wasn’t clear earlier. The shuffled benchmark does perform better in the optimized version than in the baseline, but it loses the sub-second scan times. The new proposed approach (assuming it works as expected) should keep those sub-second times even on the shuffled benchmark.
  33. theStack commented at 6:55 pm on December 1, 2025: contributor

    @w0xlt: I’ve looked at your proposed scanning implementation https://github.com/w0xlt/secp256k1/commit/2305d7361ad4a2ead9e335ea76ea4fc047e275cf and can confirm that it fixes the worst-case scanning attack, as the run-time of the modified example with shuffled outputs is reduced to <1s. What makes this different approach potentially interesting is that it could also speed up the common case (i.e. no matches), as the number of point additions needed for label scanning seems to be significantly smaller for the small-scale user (where N_LABELS < 2 * N_OUTPUTS for the average transaction; if N_LABELS=1 and we assume the average transaction to scan for has N_OUTPUTS=2, your approach is 4x faster w.r.t. needed point additions). To verify that I understood it correctly and to make the idea digestible to other interested reviewers, I’ve tried to summarize the current PR’s scanning approach (as proposed in BIP-352) vs. your approach in the following gist, written in a somewhat hand-wavy way; I hope the pseudo-code is understandable: https://gist.github.com/theStack/25c77747838610931e8bbeb9d76faf78. Please let me know if I got that right; I think the approach leads to the same result, but I’m still trying to wrap my head around it. // EDIT: I’ve been told by @setavenger that this approach is also used by the Frigate Electrum Server and the BlindBit wallet.

    On the other hand, I’m still uncertain about whether adding the additional code complexity and involving dynamic memory allocation at all in a secp256k1 module is worth all this, considering that we are talking about an attack that costs the adversary money (at least the fees for the whole block space) and can only slow down one specific SP recipient. It seems a very unusual kind of attack. Changing the scanning API at this point also seems questionable, as it has already gone through several iterations and refinements over the years, and I think the “pass labels directly” approach is slower for users with a higher number of labels. Also, the mental load for reviewers is significantly higher if the approach doesn’t follow the algorithm as specified in BIP-352, as they have to convince themselves that the result is equivalent.

    I’m hence debating whether one solution could be to keep the PR simply as-is and accept the fact that an adversary can slow down a single entity for a few minutes (for non-standard transactions) or a few seconds (for standard transactions, i.e. tx size <= 100kvB). I don’t have a good final answer to this question myself, but want to at least discuss it and come to a decision that both maintainers and users are on board with before taking any further action.

    My (in retrospect somewhat naive) hope was that there could be a simple few-lines fix for reducing the worst-case scanning attack down to seconds without the need of dynamic memory allocation, but such an easy fix doesn’t seem to be in sight.

  34. w0xlt commented at 4:51 am on December 2, 2025: none

    Hi @theStack thanks a lot for taking the time to review the proposal and write up the comparison.

    My view is that we should aim for the most efficient and resilient solution, even if the attack scenario is somewhat unusual. The spec can be updated as needed to reflect improvements. Independently of adversarial cases, reducing scanning time from several minutes to under a second is a major win for user experience.

    Regarding complexity: although this approach adds some moving parts, the underlying idea (e.g., using binary search) is conceptually straightforward, so I don’t think it should pose a significant burden by itself.

    As for the dynamic memory concern, the allocation doesn’t necessarily have to happen inside the secp256k1 module. We could let the caller provide the required workspace. For example, could we leverage secp256k1_scratch_space here?

  35. theStack commented at 5:03 pm on December 2, 2025: contributor

    @w0xlt: After tinkering a bit with your proposed implementation https://github.com/w0xlt/secp256k1/commit/2305d7361ad4a2ead9e335ea76ea4fc047e275cf, I found that it could be significantly simplified, and it turned out we don’t even need dynamic memory allocation:

    • instead of building an explicit index, we can simply sort the user-supplied tx_outputs array of x-only pubkeys in place and perform binary search directly on that (using the existing secp256k1_xonly_pubkey_cmp)
    • the preparation of the label set group elements array is not needed either; _pubkey_load is a very cheap operation, so it doesn’t hurt to call it repeatedly in each k loop iteration (remember that the common case is not having a match, i.e. k doesn’t exceed 0 for most scanned transactions)

    Doing this in https://github.com/theStack/secp256k1/commit/1a40e14d31089960d4113b49eeffd287b81cd2a1 still keeps the worst-case scanning benchmark time down at ~450ms on my machine. Admittedly, there is a tiny bit of run-time overhead introduced with the simplification (repeated x-only pubkey serialization for the common case), but I don’t think that matters much, considering that the involved elliptic curve operations are likely orders of magnitude more expensive than that. With these new insights, I can retract my doubts about code complexity and dynamic memory allocation of your proposed approach, and it’s good to have two potential scanning implementations. :+1:

    It should be pointed out that the “label set” approach doesn’t come for free though: one major drawback is that for users with a larger label set, the typical scanning scenario (assuming most txs don’t have more than a few outputs) can at some point be slower, as the number of point additions is higher than with the “BIP” approach. So if we follow your approach, it’s better for small-scale users and eliminates the worst-case scanning attack, but users with a lot of labels are likely not happy with it and would very much prefer the “BIP” scanning approach. Not really sure how to deal with that and what to prioritize. (Provide two scanning implementations and let the user pick? :grimacing: )

    I’ll try anyway to come up with another “modified example” benchmark to better verify the claims about typical scanning scenarios, depending on the number of labels involved and the transaction output count. We have spent a good amount of time on the worst-case scanning scenario now, but it would be wrong to solely optimize for that and lose sight of the typical scanning scenario.

    Independently of adversarial cases, reducing scanning time from several minutes to under a second is a major win for user experience.

    I don’t agree with that statement, as I can’t think of a legitimate non-adversarial case that would lead to scanning times of several minutes, in particular as this would at the very least involve creating a large non-standard transaction. When considering user experience, by far the most relevant case I’d consider is the “no match” scenario (one k loop iteration), as it is the most frequent one, followed by the “single match” scenario (two k loop iterations). Everything above that should already be quite rare in practice.

    As for the dynamic memory concern, the allocation doesn’t necessarily have to happen inside the secp256k1 module. We could let the caller provide the required workspace. For example, could we leverage secp256k1_scratch_space here?

    Good question. Right now, the scratch space API is not even exposed to the user. Btw, I don’t think dynamic memory allocation is a strict no-go in general; it just seems that there are still some open questions in that area, and waiting until they are all tackled would probably delay the PR for an unforeseeable time. As of now it seems we can easily avoid dynamic memory allocation anyway, in both of the proposed implementations.

  36. w0xlt commented at 10:04 pm on December 2, 2025: none

    Not really sure how to deal with that and what to prioritize. (Provide two scanning implementations and let the user pick? 😬 )

    Some (very) rough ideas on the trade-off…

    If we can define a general rule that determines when it’s better to use the label-set scan versus the BIP-based scan, we could do something like:

    /* Choose algorithm based on this transaction's characteristics */
    if (n_labels > 2 * n_tx_outputs) {
        /* BIP approach: iterate over outputs, add, and look up in label callback */
        use_bip_scan();
    } else {
        /* Label-set approach: iterate over labels, add, and search in output index */
        use_labelset_scan();
    }
    

    The heuristic above is just a rough estimate. The BIP version performs roughly 2N point additions, where N is the number of outputs, while the label-set version performs about L additions, where L is the number of configured labels.

    But to build a reliable decision rule, we’d need more benchmark data across a range of (N, L) combinations.
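
    Under the stated ~2N vs. ~L point-addition estimate, the decision rule boils down to a single comparison (a hypothetical helper, sketched only from the rough cost model above, not measured data):

```c
#include <assert.h>
#include <stdint.h>

/* Rough cost model from the discussion: the BIP-style scan needs about
 * 2*N point additions (N = number of tx outputs), while the label-set
 * scan needs about L (L = number of configured labels), so the
 * label-set scan wins whenever L < 2*N. */
static int prefer_labelset_scan(uint64_t n_labels, uint64_t n_tx_outputs) {
    return n_labels < 2 * n_tx_outputs;
}
```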

  37. theStack commented at 7:11 pm on December 3, 2025: contributor

    Here is a first attempt at creating a scanning benchmark for the common case scenario (i.e. no matches and hence only one k iteration), comparing the “BIP” and “Label-set” approaches over a combination of L (number of labels to scan for) and N (number of tx outputs): https://github.com/theStack/secp256k1/commit/fef63fd40f2fad9e26eaf7145d9a38eaceb1e8ef. Running this modified example outputs the following on my arm64 machine:

    $ ./build/bin/silentpayments_example
    Silent Payments (BIP-352) scanning benchmarks
    [common case scenario, i.e. only one k iteration without match]

    Legend: L... number of labels, N... number of transaction outputs

    ===== BIP approach (calculate label candidates for each output, look them up in labels cache) =====
    L= 1: [N=2:  69 us] [N=5:  82 us] [N=10:  89 us] [N=20: 121 us] [N=50: 197 us] [N=100: 343 us] [N=200: 632 us]
    L= 2: [N=2:  61 us] [N=5:  80 us] [N=10:  78 us] [N=20: 101 us] [N=50: 177 us] [N=100: 314 us] [N=200: 566 us]
    L= 3: [N=2:  54 us] [N=5:  58 us] [N=10:  76 us] [N=20:  95 us] [N=50: 166 us] [N=100: 288 us] [N=200: 535 us]
    L= 5: [N=2:  51 us] [N=5:  55 us] [N=10:  67 us] [N=20: 160 us] [N=50: 159 us] [N=100: 271 us] [N=200: 496 us]
    L=10: [N=2:  49 us] [N=5:  55 us] [N=10:  66 us] [N=20:  89 us] [N=50: 158 us] [N=100: 276 us] [N=200: 494 us]
    L=20: [N=2:  54 us] [N=5:  55 us] [N=10:  68 us] [N=20:  90 us] [N=50: 159 us] [N=100: 271 us] [N=200: 505 us]
    L=50: [N=2:  49 us] [N=5:  55 us] [N=10:  71 us] [N=20:  88 us] [N=50: 160 us] [N=100: 274 us] [N=200: 504 us]

    ===== Label-set approach (calculate output candidate for each label, look it up in outputs) =====
    L= 1: [N=2:  51 us] [N=5:  47 us] [N=10:  48 us] [N=20:  51 us] [N=50:  59 us] [N=100:  77 us] [N=200: 116 us]
    L= 2: [N=2:  49 us] [N=5:  57 us] [N=10:  50 us] [N=20:  53 us] [N=50:  65 us] [N=100:  81 us] [N=200: 120 us]
    L= 3: [N=2:  49 us] [N=5:  51 us] [N=10:  50 us] [N=20:  56 us] [N=50:  63 us] [N=100:  82 us] [N=200: 118 us]
    L= 5: [N=2:  58 us] [N=5:  52 us] [N=10:  55 us] [N=20:  56 us] [N=50:  68 us] [N=100:  81 us] [N=200: 119 us]
    L=10: [N=2:  63 us] [N=5:  59 us] [N=10:  61 us] [N=20:  62 us] [N=50:  71 us] [N=100:  87 us] [N=200: 126 us]
    L=20: [N=2:  70 us] [N=5:  71 us] [N=10:  74 us] [N=20:  90 us] [N=50:  83 us] [N=100: 101 us] [N=200: 139 us]
    L=50: [N=2: 104 us] [N=5: 107 us] [N=10: 111 us] [N=20: 113 us] [N=50: 122 us] [N=100: 141 us] [N=200: 192 us]
    

    These results should be taken with a large grain of salt, as the numbers fluctuate a lot (visible for the BIP approach, where all lines should show about the same results, as the performance is in theory independent of L; also, the results for the first two L values are surprisingly worse, maybe due to some unintended caching effects benefiting later runs?). Still, one can at least see the previously suspected general trend: the BIP approach gets worse with increasing N, while the label-set approach gets worse with increasing L. As a very rough first statement, I’d say that assuming users don’t use more than a handful (let’s say up to 10) of labels, and that transactions with more outputs become more common in the future, the label-set approach seems superior, and the only reason to keep the BIP approach is to target users that want to scan for a large number (let’s say dozens) of labels.

    Comments or ideas on how to best reason about this would be much appreciated (also a rough review of the modified benchmark example, it could also be that there are bugs). Maybe someone has an idea how to connect this with some actual historic on-chain stats? The list of L and N values to benchmark for can be adapted by changing the two arrays n_labels_bench and n_outputs_bench at the top of the example file.

    If we can define a general rule that determines when it’s better to use the label-set scan versus the BIP-based scan, we could do something like:

    Makes sense, yeah, I thought about something similar. What makes this a bit inconvenient (aside from the additional review and maintenance burden, obviously) is that users would have to both pass the label set and implement and pass in a callback function, which seems to result in a bloated API. Maybe it’s better to focus on one approach and ship another scan function for “giant label set” power users later? I could be wrong, but I don’t think using more than three or four labels will be a very common use-case.

  38. w0xlt commented at 8:27 am on December 4, 2025: none

    @theStack Thanks very much for this benchmark. It is very insightful.

    From my understanding, your benchmark confirms that the label-set approach performs better overall and scales more smoothly with the transaction size N (number of outputs). For small N they are comparable, but as N grows the label-set approach becomes significantly faster than the BIP approach.

    I also evaluated Montgomery batch inversion via secp256k1_ge_set_all_gej_var in the label-set scanning path on top of your commit. This seems to mainly help when there are many labels (large L). For L = 50 the label-set approach becomes noticeably faster still, so it dominates the BIP-style scanning for almost all (L, N) combinations in this benchmark. All runs are using the same “common case” scenario as in your example (one k iteration, no match).

    https://github.com/w0xlt/secp256k1/commit/1e8d196e456de957e497275205e3e77df2b4f74b

    Below are the benchmark results as measured on my machine (no batch inversion):

    ===== BIP approach (calculate label candidates for each output, look them up in labels cache) =====
    L= 1: [N=2: 351 us] [N=5: 200 us] [N=10: 231 us] [N=20: 261 us] [N=50: 289 us] [N=100: 451 us] [N=200: 745 us]
    L= 2: [N=2:  81 us] [N=5: 109 us] [N=10:  82 us] [N=20: 153 us] [N=50: 218 us] [N=100: 379 us] [N=200: 631 us]
    L= 3: [N=2:  55 us] [N=5:  59 us] [N=10:  75 us] [N=20:  99 us] [N=50: 185 us] [N=100: 299 us] [N=200: 586 us]
    L= 5: [N=2:  52 us] [N=5:  58 us] [N=10:  70 us] [N=20:  90 us] [N=50: 171 us] [N=100: 302 us] [N=200: 514 us]
    L=10: [N=2:  48 us] [N=5:  53 us] [N=10:  62 us] [N=20:  86 us] [N=50: 153 us] [N=100: 269 us] [N=200: 489 us]
    L=20: [N=2:  44 us] [N=5:  52 us] [N=10:  64 us] [N=20:  86 us] [N=50: 147 us] [N=100: 267 us] [N=200: 481 us]
    L=50: [N=2:  44 us] [N=5:  47 us] [N=10:  61 us] [N=20:  91 us] [N=50: 153 us] [N=100: 257 us] [N=200: 469 us]

    ===== Label-set approach (calculate output candidate for each label, look it up in outputs) =====
    L= 1: [N=2:  39 us] [N=5:  38 us] [N=10:  37 us] [N=20:  38 us] [N=50:  41 us] [N=100:  48 us] [N=200:  64 us]
    L= 2: [N=2:  37 us] [N=5:  36 us] [N=10:  36 us] [N=20:  37 us] [N=50:  41 us] [N=100:  46 us] [N=200:  65 us]
    L= 3: [N=2:  40 us] [N=5:  36 us] [N=10:  37 us] [N=20:  39 us] [N=50:  43 us] [N=100:  49 us] [N=200:  75 us]
    L= 5: [N=2:  39 us] [N=5:  39 us] [N=10:  41 us] [N=20:  42 us] [N=50:  47 us] [N=100:  51 us] [N=200:  84 us]
    L=10: [N=2:  47 us] [N=5:  47 us] [N=10:  45 us] [N=20:  46 us] [N=50:  49 us] [N=100:  55 us] [N=200:  71 us]
    L=20: [N=2:  58 us] [N=5:  56 us] [N=10:  58 us] [N=20:  55 us] [N=50:  59 us] [N=100:  66 us] [N=200:  84 us]
    L=50: [N=2:  88 us] [N=5:  85 us] [N=10:  93 us] [N=20:  88 us] [N=50:  94 us] [N=100: 100 us] [N=200: 120 us]
    

    The numbers after applying batch inversion:

    ===== Label-set approach (calculate output candidate for each label, look it up in outputs) =====
    L= 1: [N=2:  38 us] [N=5:  37 us] [N=10:  37 us] [N=20:  37 us] [N=50:  41 us] [N=100:  48 us] [N=200:  61 us]
    L= 2: [N=2:  35 us] [N=5:  38 us] [N=10:  37 us] [N=20:  44 us] [N=50:  41 us] [N=100:  47 us] [N=200:  62 us]
    L= 3: [N=2:  35 us] [N=5:  38 us] [N=10:  39 us] [N=20:  38 us] [N=50:  46 us] [N=100:  46 us] [N=200:  66 us]
    L= 5: [N=2:  49 us] [N=5:  36 us] [N=10:  36 us] [N=20:  40 us] [N=50:  42 us] [N=100:  49 us] [N=200:  63 us]
    L=10: [N=2:  37 us] [N=5:  37 us] [N=10:  37 us] [N=20:  37 us] [N=50:  41 us] [N=100:  49 us] [N=200:  65 us]
    L=20: [N=2:  46 us] [N=5:  47 us] [N=10:  42 us] [N=20:  42 us] [N=50:  48 us] [N=100:  56 us] [N=200:  73 us]
    L=50: [N=2:  54 us] [N=5:  46 us] [N=10:  46 us] [N=20:  50 us] [N=50:  52 us] [N=100:  60 us] [N=200:  82 us]
    

    The only code change here is using batch inversion in the label-set path.

  39. theStack commented at 7:33 pm on December 4, 2025: contributor

    Here is a first attempt at creating a scanning benchmark for the common case scenario (i.e. no matches and hence only one k iteration), comparing the “BIP” and “Label-set” approaches over a combination of L (number of labels to scan for) and N (number of tx outputs): https://github.com/theStack/secp256k1/commit/fef63fd40f2fad9e26eaf7145d9a38eaceb1e8ef

    This is now available as an actual benchmark using the framework (i.e. implemented in the module’s bench_impl.h), leading to more stable results, especially for the lower-N cases (N<=10), which I’d expect to be hit most frequently: https://github.com/theStack/secp256k1/commit/8eced6432bf8a6fd4dc968f846ca8373043d092b, see https://gist.github.com/theStack/25c77747838610931e8bbeb9d76faf78?permalink_comment_id=5892266#gistcomment-5892266 for the results on my machine. Both approaches are now also benchmarked for no-labels scanning (i.e. L=0), and for the BIP approach only L=1 is tested, as higher values don’t influence the run-time if we assume that the label cache lookup cost is negligible. @w0xlt:

    From my understanding, your benchmark confirms that the label-set approach performs better overall and scales more smoothly with the transaction size N (number of outputs). For small N they are comparable, but as N grows the label-set approach becomes significantly faster than the BIP approach.

    I agree. It seems to me that, unless there is a really strong reason to support use-cases with dozens of labels, we should consider switching from the BIP to the label-set scanning approach. I’m not sure though whether larger N values appear frequently enough in blocks nowadays to have a significant influence on the overall scanning time (I’d roughly guess that the vast majority of SP-eligible txs don’t exceed 10 outputs, though that should be verified), but that could change in the future.

    I also evaluated Montgomery batch inversion via secp256k1_ge_set_all_gej_var in the label-set scanning path on top of your commit. This seems to mainly help when there are many labels (large L). For L = 50 the label-set approach becomes noticeably faster still, so it dominates the BIP-style scanning for almost all (L, N) combinations in this benchmark. All runs are using the same “common case” scenario as in your example (one k iteration, no match).

    That’s great! I didn’t review this code in detail yet, but it seems simple enough to be considered on top if we go with the label-set scanning approach; the benchmarks also look very promising. I will integrate and run them on my machine as well tomorrow.

  40. in src/modules/silentpayments/main_impl.h:236 in ffffd7ff98 outdated
    217+        ARG_CHECK(recipients[i]->index == i);
    218+    }
    219+
    220+    seckey_sum_scalar = secp256k1_scalar_zero;
    221+    for (i = 0; i < n_plain_seckeys; i++) {
    222+        ret = secp256k1_scalar_set_b32_seckey(&addend, plain_seckeys[i]);
    


    furszy commented at 9:31 pm on December 4, 2025:

    In https://github.com/bitcoin-core/secp256k1/commit/ffffd7ff98368b29759cd3d9933896fb9fa69b1f:

    what if plain_seckeys[i] is NULL? (same for taproot_seckeys[i] below)


    theStack commented at 1:47 am on December 6, 2025:
    Good catch, will add an ARG_CHECK to check for non-NULL, here and in other API functions (scanning) where similar checks for “array of pointers” elements are missing. I’ve also opened #1779 to do the same for API functions in master for consistency.

    theStack commented at 0:29 am on December 13, 2025:

    Introducing non-NULL checks for this PR actually paid off a lot, as they revealed two sneaky bugs:

    • in the code exercising the recipient test vectors (run_silentpayments_test_vector_receive), the found_outputs pointer array was set up incorrectly, as the init loop went only over the number of expected finds (num_found_output_pubkeys), rather than the full number of tx outputs (num_to_scan_outputs)
    • in the example, the same tx_outputs array of pointers is used twice for scanning (once for Bob, once for Carol); in the first call, tx outputs that have a match are marked with NULL, i.e. in the second call those outputs were skipped for scanning. This went unnoticed as Carol is obviously not expected to find Bob’s outputs anyway (so in some sense one could say it’s not even that bad), but I would still say it’s a bug and we should set up the tx outputs pointer array again before the second call.
  41. in src/modules/silentpayments/main_impl.h:311 in ffffd7ff98 outdated
    308+         */
    309+        if (k < UINT32_MAX) {
    310+            k++;
    311+        } else {
    312+            return 0;
    313+        }
    


    furszy commented at 4:23 pm on December 5, 2025:

    In ffffd7ff98368b29759cd3d9933896fb9fa69b1f:

    As k depends on n_recipients, wouldn’t be simpler to ensure that n_recipients < UINT32_MAX early on the function?

    ARG_CHECK(n_recipients > 0 && n_recipients < UINT32_MAX);
    

    theStack commented at 2:52 am on December 6, 2025:
    Bounding the number of recipients seems reasonable to me (even though by protocol they could exceed the 32-bit range in theory, if they don’t all go to the same scan pubkey and we were to allow ~185 GB sized transactions at some point :p). I wonder if we could even just set the type of n_recipients to uint32_t instead of size_t, so the limit is already enforced at compile-time? IIRC there were some previous discussions between @josibake and @jonasnick about that topic (in take 3 or 2 of this PR), will look those up next week before changing.

    real-or-random commented at 8:18 am on December 11, 2025:

    Bounding the number of recipients seems reasonable to me (even though by protocol they could exceed the 32-bit range in theory, if they are not going all to the same scan pubkey, and we would at some point allow ~185 GB sized transactions :p).

    Since no one has a machine with infinite memory, any implementation of the protocol will need to have some limit. I think this simply shows that the protocol spec could be improved: there should be reasonable limits for all arrays. Sometimes they’re implied, e.g., if the size is serialized as a fixed-length integer, but sometimes they’re not. Even if they’re implied, it would be better to be explicit in the BIP and just state the limit.

    This is something I learned from working on multiple BIPs that specify schemes with arrays or other containers. When you write the spec, you want it to be implementable on many platforms, so you try not to be too specific when it comes to choices like maximum sizes. But later, when you actually go ahead and implement the thing, you feel it would be great if the spec just told you the limit (because this would have prevented this entire discussion).

    I wonder if we could even just set the type of n_recipients to uint32_t instead of size_t, so the limit is already enforced at compile-time? IIRC there were some previous discussions between @josibake and @jonasnick about that topic (in take 3 or 2 of this PR), will look those up next week before changing.

    I can’t remember these discussions, but using uint32_t is the most natural thing to do. And then we should update the BIP to make that clear in the spec.


    theStack commented at 2:43 am on December 12, 2025:

    I wonder if we could even just set the type of n_recipients to uint32_t instead of size_t, so the limit is already enforced at compile-time? IIRC there were some previous discussions between @josibake and @jonasnick about that topic (in take 3 or 2 of this PR), will look those up next week before changing.

    I can’t remember these discussions, but using uint32_t is the most natural thing to do. And then we should update the BIP to make that clear in the spec.

    Fwiw, the previous discussion I had in mind started here: #1698 (review) (see enumeration point 3 specifically). It seems that limiting n_recipients to unsigned 32-bit values via ARG_CHECKing was proposed (same as #1765 (review) above), but doing it by changing its type wasn’t considered yet. I agree that the BIP should state the limit, for clarity and to avoid being out-of-spec; will take a look into that.


    theStack commented at 6:04 pm on December 12, 2025:

    Eunovo commented at 7:26 am on December 17, 2025:

    As k depends on n_recipients, wouldn’t it be simpler to ensure that n_recipients < UINT32_MAX early in the function?

    k doesn’t depend on n_recipients. It is reset to 0 any time the recipient scan_pubkey changes. You can have 2^64-1 recipients that share significantly fewer scan_pubkeys, and this will be fine if k does not exceed UINT32_MAX.


    theStack commented at 11:03 am on December 17, 2025:

    As k depends on n_recipients, wouldn’t it be simpler to ensure that n_recipients < UINT32_MAX early in the function?

    k doesn’t depend on n_recipients. It is reset to 0 any time the recipient scan_pubkey changes. You can have 2^64-1 recipients that share significantly fewer scan_pubkeys, and this will be fine if k does not exceed UINT32_MAX.

    Right, I think the intended statement here was “the upper bound of k depends on n_recipients” (or “k is bounded by / can’t exceed n_recipients”).

    Explicitly limiting n_recipients is not strictly needed, but I’d still say it makes a lot of sense. If we imagine a hypothetical scenario where someone has $\geq 2^{32}$ recipients (ignoring for a moment that this is orders of magnitude higher than what current consensus rules allow and thus will very likely never happen), it’s potentially confusing to the user that this can either succeed or fail, depending on how large the individual scan_pubkey groups are. An API doc comment like “sending to more than $2^{32}-1$ recipients that share the same scan public key is not allowed”, verified only at run-time, is just more complex than “sending to more than $2^{32}-1$ recipients is not allowed”, which is enforced at compile-time by the data type.

    That said, if the proposed BIP change https://github.com/bitcoin/bips/pull/2055 doesn’t make it in, I think reintroducing the k < UINT32_MAX run-time checks before incrementing k isn’t the end of the world either (though it introduces a branch that we likely can’t test, which we usually try to avoid).

  42. in include/secp256k1_silentpayments.h:48 in ffffd7ff98 outdated
    43+ *  be used.
    44+ */
    45+typedef struct secp256k1_silentpayments_recipient {
    46+    secp256k1_pubkey scan_pubkey;
    47+    secp256k1_pubkey spend_pubkey;
    48+    size_t index;
    


    furszy commented at 5:04 pm on December 5, 2025:

    q: Instead of having this weird index field in the public API, wouldn’t it be simpler to include the generated output directly inside secp256k1_silentpayments_recipient? E.g.

    struct secp256k1_silentpayments_recipient {
        /* Inputs */
        secp256k1_pubkey in_scan_pubkey;
        secp256k1_pubkey in_spend_pubkey;
        /* Output */
        secp256k1_xonly_pubkey out_generated_output;
    };
    

    theStack commented at 2:34 am on December 6, 2025:
    I think this change would be slightly simpler for the implementation, but possibly quite confusing for the user. Note that the passed-in _silentpayments_recipient array gets sorted in place (to group by scan pubkey), so we need some way to map the generated outputs back to the original recipients. Doing this by just keeping the original order (that’s what the index field is needed for) seems the most obvious (expected?) way. Admittedly, with your proposed change a user would have the necessary information to do the mapping themselves, but it seems odd to e.g. pass in two recipients (r1, r2), get the result back in a different order (r2, r1) and then have to find out to which position each recipient has moved, in order to get the corresponding generated output for each.

    furszy commented at 4:36 pm on December 6, 2025:

    Admittedly with your proposed change an user would have the necessary information to do the mapping themselves, but it seems odd to e.g. pass in two recipients (r1, r2), get the result back in a different order (r2, r1) and then having to find out again at what position each recipient has moved, in order to get the corresponding generated output each.

    If you have all the information, does the initial ordering matter?

    ================================================================================

    Also, I woke up creative.. another idea to remove the index field could be to store the initial ordering inside the generated_outputs array. We basically have an n_recipients-sized array with elements that are essentially buffers :). So.. we could do something like:

    diff --git a/src/modules/silentpayments/main_impl.h b/src/modules/silentpayments/main_impl.h
    --- a/src/modules/silentpayments/main_impl.h	(revision ffffd7ff98368b29759cd3d9933896fb9fa69b1f)
    +++ b/src/modules/silentpayments/main_impl.h	(date 1765038514517)
    @@ -185,13 +185,14 @@
         const unsigned char * const *plain_seckeys,
         size_t n_plain_seckeys
     ) {
    -    size_t i, k;
    +    size_t i, k, j;
         secp256k1_scalar seckey_sum_scalar, addend, input_hash_scalar;
         secp256k1_ge prevouts_pubkey_sum_ge;
         secp256k1_gej prevouts_pubkey_sum_gej;
         unsigned char shared_secret[33];
         secp256k1_pubkey current_scan_pubkey;
         int ret, sum_is_zero;
    +    secp256k1_xonly_pubkey *output = NULL;
     
         /* Sanity check inputs. */
         VERIFY_CHECK(ctx != NULL);
    @@ -211,9 +212,6 @@
         } else {
             ARG_CHECK(n_plain_seckeys == 0);
         }
    -    for (i = 0; i < n_recipients; i++) {
    -        ARG_CHECK(recipients[i]->index == i);
    -    }
     
         seckey_sum_scalar = secp256k1_scalar_zero;
         for (i = 0; i < n_plain_seckeys; i++) {
    @@ -268,6 +266,13 @@
             return 0;
         }
         secp256k1_scalar_mul(&seckey_sum_scalar, &seckey_sum_scalar, &input_hash_scalar);
    +
    +    /* Store recipients ordering prior to sorting the array */
    +    for (i = 0; i < n_recipients; i++) {
    +        /* Use the output struct as a buffer for storing the pointer of the recipient */
    +        memcpy(generated_outputs[i]->data, &recipients[i], sizeof(secp256k1_silentpayments_recipient*));
    +    }
    +
         /* _recipient_sort sorts the array of recipients in place by their scan public keys (lexicographically).
          * This ensures that all recipients with the same scan public key are grouped together, as specified in BIP0352.
          *
    @@ -294,7 +299,17 @@
                 secp256k1_silentpayments_create_shared_secret(ctx, shared_secret, &pk, &seckey_sum_scalar);
                 k = 0;
             }
    -        if (!secp256k1_silentpayments_create_output_pubkey(ctx, generated_outputs[recipients[i]->index], shared_secret, &recipients[i]->spend_pubkey, k)) {
    +
    +        /* Look-up for the correct output to initialize based on the initial ordering */
    +        for (j = 0; j < n_recipients; j++) {
    +            secp256k1_silentpayments_recipient *ptr = NULL;
    +            memcpy(&ptr, generated_outputs[j]->data, sizeof(secp256k1_silentpayments_recipient*));
    +            if (ptr == recipients[i]) {
    +                output = generated_outputs[j];
    +            }
    +        }
    +
    +        if (!secp256k1_silentpayments_create_output_pubkey(ctx, output, shared_secret, &recipients[i]->spend_pubkey, k)) {
                 secp256k1_scalar_clear(&seckey_sum_scalar);
                 secp256k1_memclear_explicit(&shared_secret, sizeof(shared_secret));
                 return 0;
    
    ===================================================================
    diff --git a/include/secp256k1_silentpayments.h b/include/secp256k1_silentpayments.h
    --- a/include/secp256k1_silentpayments.h	(revision ffffd7ff98368b29759cd3d9933896fb9fa69b1f)
    +++ b/include/secp256k1_silentpayments.h	(date 1765038119367)
    @@ -45,7 +45,6 @@
     typedef struct secp256k1_silentpayments_recipient {
         secp256k1_pubkey scan_pubkey;
         secp256k1_pubkey spend_pubkey;
    -    size_t index;
     } secp256k1_silentpayments_recipient;
     
     /** Create Silent Payments outputs for recipient(s).
    
    ===================================================================
    diff --git a/src/modules/silentpayments/tests_impl.h b/src/modules/silentpayments/tests_impl.h
    --- a/src/modules/silentpayments/tests_impl.h	(revision ffffd7ff98368b29759cd3d9933896fb9fa69b1f)
    +++ b/src/modules/silentpayments/tests_impl.h	(date 1765038518151)
    @@ -98,7 +98,6 @@
         for (i = 0; i < 3; i++) {
             CHECK(secp256k1_ec_pubkey_parse(CTX, &recipients[i].scan_pubkey, (*sp_addresses[i])[0], 33));
             CHECK(secp256k1_ec_pubkey_parse(CTX, &recipients[i].spend_pubkey,(*sp_addresses[i])[1], 33));
    -        recipients[i].index = i;
             recipient_ptrs[i] = &recipients[i];
             generated_output_ptrs[i] = &generated_outputs[i];
         }
    @@ -174,7 +173,6 @@
             CHECK(secp256k1_ec_pubkey_parse(CTX, &r[i].scan_pubkey, (*sp_addresses[i])[0], 33));
             CHECK(secp256k1_ec_pubkey_parse(CTX, &r[i].spend_pubkey,(*sp_addresses[i])[1], 33));
             /* Set the index value incorrectly */
    -        r[i].index = 0;
             rp[i] = &r[i];
             op[i] = &o[i];
         }
    @@ -183,13 +181,6 @@
         t[0] = &taproot;
         p[0] = ALICE_SECKEY;
     
    -    /* Fails if the index is set incorrectly */
    -    CHECK_ILLEGAL(CTX, secp256k1_silentpayments_sender_create_outputs(CTX, op, rp, 2, SMALLEST_OUTPOINT, NULL, 0, p, 1));
    -
    -    /* Set the index correctly for the next tests */
    -    for (i = 0; i < 2; i++) {
    -        r[i].index = i;
    -    }
         CHECK(secp256k1_silentpayments_sender_create_outputs(CTX, op, rp, 2, SMALLEST_OUTPOINT, NULL, 0, p, 1));
     
         /* Check that null arguments are handled */
    @@ -257,7 +248,6 @@
             CHECK(secp256k1_ec_pubkey_negate(CTX, &neg_spend_pubkey));
             r[0].spend_pubkey = neg_spend_pubkey;
             for (i = 0; i < 2; i++) {
    -            r[i].index = i;
                 rp[i] = &r[i];
             }
             CHECK(secp256k1_silentpayments_sender_create_outputs(CTX, op, rp, 2, SMALLEST_OUTPOINT, NULL, 0, p, 1) == 0);
    

    That being said, this improves the public API at the cost of making the internal code a bit more complex. So I don’t feel strongly about it; I just had some fun thinking about different ways to remove the ugly index field. So no need to take it.


    theStack commented at 3:58 am on December 7, 2025:

    If you have all the information, does the initial ordering matter?

    In theory it doesn’t, in practice I think some users would be at least surprised or annoyed by getting the generated outputs back in an order that is different from the passed in recipients (some might even lose money by confusing the amounts for e.g. the actual recipient and the change output).

    Also, I woke up creative.. another idea to remove the index field could be to store the initial ordering inside the generated_outputs array. We basically have an n_recipients-sized array with elements that are essentially buffers :).

    That’s a neat and indeed very creative idea 😃. (Ab)using a pubkey object’s underlying memory for temporarily storing a pointer value feels very hacky though, so I’m not sure if we want to introduce this kind of trick in general. On the other hand, simplifying the API would indeed be a win, so I’m not fully opposed either. Curious what others think about this.


    real-or-random commented at 9:29 am on December 11, 2025:

    That’s a neat and indeed very creative idea 😃. (Ab)using a pubkey object’s underlying memory for temporarily storing a pointer value feels very hacky though, so I’m not sure if we want to introduce this kind of trick in general. On the other hand, simplifying the API would indeed be a win, so I’m not fully opposed either. Curious what others think about this.

    My thinking is that I’m slightly in favor. This is not as crazy as it may look. The data member of the pubkey is even unsigned char, so this should be perfectly legal in C. (If people have doubts, we could even convert to intptr_t first and then read out the individual chars from an actual integer type.)

    I have thought of getting rid of index, including hacky ways, but somehow I didn’t see this way.

    edit: But it will be nice to hear other people’s opinions. There was a lot of discussion about the thing already. I can imagine that others had found this approach but rejected it for various reasons.


    theStack commented at 2:56 am on December 12, 2025:
    Glad that there is support for this idea. One potential drawback I only noticed now is that finding the output pointer is done via a loop over n_recipients, which might come with a performance loss for larger numbers of recipients, as the nested loops run in $O(n^2)$ time. Maybe that can be avoided as well somehow?

    real-or-random commented at 8:02 am on December 12, 2025:

    Oh, indeed. I don’t think it can be avoided (but I’m happy to be proven wrong).

    Hm, then my current feeling is that we should keep what we have. Avoiding a bit of (harmless!) API inconvenience at the cost of a bit of added internal complexity seems to be a reasonable tradeoff with users in mind. But doing it at the cost of increasing the asymptotic run-time complexity seems wrong.

    edit: Still curious to hear what the others think.

  43. in src/modules/silentpayments/main_impl.h:287 in ffffd7ff98 outdated
    267+     */
    268+    if (!secp256k1_silentpayments_calculate_input_hash_scalar(&input_hash_scalar, outpoint_smallest36, &prevouts_pubkey_sum_ge)) {
    269+        secp256k1_scalar_clear(&seckey_sum_scalar);
    270+        return 0;
    271+    }
    272+    secp256k1_scalar_mul(&seckey_sum_scalar, &seckey_sum_scalar, &input_hash_scalar);
    


    furszy commented at 7:00 pm on December 5, 2025:

    In https://github.com/bitcoin-core/secp256k1/commit/ffffd7ff98368b29759cd3d9933896fb9fa69b1f:

    This is not realistically possible but.. should also check that the input hash scalar is not the inverse of the aggr sk (aka the result is not 1).


    theStack commented at 3:39 am on December 6, 2025:
    I’m trying to wrap my head around what the exact implications would be if that indeed happened. If $a \cdot inputhash = 1$, the shared secret would then be equal to the recipient scan public key $B_{scan}$. That sounds problematic; I suppose it’s a loss of privacy as the output tweaks are revealed, but the funds are still safe? (Wouldn’t any other small-ish value also be problematic, as guessing $2 \cdot B_{scan}$, $3 \cdot B_{scan}$ etc. is trivial as well?)

    real-or-random commented at 10:27 am on December 11, 2025:

    In ffffd7f:

    This is not realistically possible but.. should also check that the input hash scalar is not the inverse of the aggr sk (aka the result is not 1).

    Hm, what should go wrong? 1 is a valid secret key for ECDH.

    I’m trying to wrap my head around on what the exact implications would be if that indeed happened. If $a * inputhash = 1$,the shared secret would then be equal to the recipient spend public key $B_{scan}$. That sounds problematic, I suppose it’s a loss of privacy as the output tweaks are revealed, but the funds are still safe? (Wouldn’t any other small-ish value also be problematic, as guessing $2* B_{scan}, 3*B_{scan}$ etc. is trivial as well?).

    No, I think this is a (common) misconception. With this argument, any value would be bad. If the secret key was 1784a975e6545c8cdb932dd228de6d6c, wouldn’t this be bad too? If the attacker guessed that value, then privacy would equally be lost.

    Small values will be easier to guess for attackers that start guessing at 0 (or 1). But there’s no point in assuming a specific attacker. What matters here to the sender is just that it will be sufficiently unpredictable (because the raw shared secret goes into a hash), i.e., it has sufficient entropy from the attacker’s perspective.

    I think what adds to this misconception is that the situation is more complex for “human-chosen” randomness, e.g., passwords. It’s bad to use “hamster” as a password because it’s an actual word, and so humans are more likely to pick it. Real attackers make use of that fact. So even if the space of allowed passwords is huge, letting a person pick one may not yield enough entropy (depending on the person). The point here is that the process of generating a password is not clearly specified and depends on the person.

    But cryptography is usually simpler. We know the exact process of generating a secret key, and so we know the exact distribution of secret keys. Typically, secret keys are just chosen uniformly at random (and we’ll often want this stronger requirement instead of just “sufficient entropy”, e.g., in a one-time pad). Ruling out values such as 1 only removes entropy (though ruling out one value makes only a negligible difference, of course.)

    A valid reason to rule out specific values would not be that the attacker can guess them, but to avoid creating problems in further computations. Sometimes that’s the case. For example, we typically rule out 0 as a secret key because we’d like to avoid the special casing in (de)serializing the point at infinity (and other corner cases). Sometimes corner values can break the protocol. But I don’t see a problem with a 1 here. (But maybe we’re overlooking something, and @furszy sees a valid problem.)

    In this specific case, we’re dealing with a derived value and not with one chosen uniformly at random. But as I said above, it can be argued that it has sufficient entropy: assuming that the individual summed-up keys are independent, if one of the individual summed-up keys is uniformly random, then the sum is. Then we multiply with a hash, so the product is also uniformly random. (So even if 0 or 1 are problematic values, they would occur only with negligible probability.)

    If you want to be fully formal, you’d need to take into account that the hash value depends on this sum (the sum pubkey goes into the hash and the attacker can even know this pubkey sum). So in theory, you could, for example, have a hash function that always maps to the inverse. This won’t happen for SHA256 obviously, and if you assume the ROM, you can make that formal.

    In fact, even a malicious sender could not force 0 or 1. This is like a Fiat-Shamir construction. The sender will need to pick the pubkey sum (and thus the seckey sum) before getting the hash value, so it’s not feasible to pick the seckey sum depending on the hash value.


    theStack commented at 7:08 pm on December 16, 2025:
    @real-or-random: Thanks for elaborating, that was very helpful for understanding! Thoughts of “why would 1 be any different from any other value” crossed my mind before, but I still had doubts. It’s clear now where the misconception came from, and I’m convinced that a check is not needed here.
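    For reference, the quantity under discussion, written in BIP352-style notation (with $a = \sum_i a_i$ denoting the sum of the sender's input secret keys), is roughly:

```latex
\mathit{ecdh\_shared\_secret} = (\mathit{input\_hash} \cdot a) \cdot B_{\mathit{scan}},
\qquad a = \sum_{i} a_i .
```

    Since the scalars form a prime-order group, if $a$ is uniformly random and $\mathit{input\_hash} \neq 0$, then the product $\mathit{input\_hash} \cdot a$ is uniformly random as well, so ruling out individual values such as $1$ would only remove a negligible amount of entropy without adding security.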
  44. in src/modules/silentpayments/main_impl.h:169 in ffffd7ff98 outdated
    166+     * error anyway to protect against this function being called with malicious inputs, i.e., spend_pubkey = -(_create_output_tweak(shared_secret33, k))*G
    167+     */
    168+    if (!secp256k1_eckey_pubkey_tweak_add(&output_ge, &output_tweak_scalar)) {
    169+        secp256k1_scalar_clear(&output_tweak_scalar);
    170+        return 0;
    171+    };
    


    furszy commented at 7:22 pm on December 5, 2025:
  45. furszy commented at 7:33 pm on December 5, 2025: member
    A bit late but joining this party full-time now. I just finished the sending part. Will keep reviewing. Cool stuff.
  46. in src/modules/silentpayments/tests_impl.h:193 in ffffd7ff98 outdated
    152+     */
    153+    sp_outputs[0] = &BOB_OUTPUT;
    154+    sp_outputs[1] = &CAROL_OUTPUT_TWO;
    155+    sp_outputs[2] = &CAROL_OUTPUT_ONE;
    156+    test_recipient_sort_helper(sp_addresses, sp_outputs);
    157+}
    


    furszy commented at 1:35 am on December 7, 2025:

    In ffffd7ff98368b29759cd3d9933896fb9fa69b1f:

    The heapsort comment was a bit misleading to me. It only matters because we’re hardcoding the expected outputs, but it’s not something we actually care about.

    The goal of the test is to make sure that shuffling the inputted recipients never changes which outputs are produced, and that each expected output appears exactly once.

    So we don’t need to rely on any final ordering assumptions at all; they depend on heapsort being stable, which (per the comment) it is not. Take the following if you like it:

    static void shuffle(unsigned char (*sp_addresses[])[2][33], size_t size) {
        size_t i, j;
        for (i = size - 1; i > 0; i--) {
            unsigned char (*tmp)[2][33] = sp_addresses[i];
            j = testrand_bits(8) % (i + 1);
            sp_addresses[i] = sp_addresses[j];
            sp_addresses[j] = tmp;
        }
    }
    
    /* Ensure that shuffling the inputted recipients never changes
     * which outputs are produced, and that each expected output
     * appears exactly once */
    static void test_recipient_sort(void) {
        int i, j;
        unsigned char (*sp_addresses[3])[2][33] = { &CAROL_ADDRESS, &CAROL_ADDRESS, &BOB_ADDRESS };
        unsigned char const *seckey_ptrs[1] = { ALICE_SECKEY };
        unsigned char xonly_ser[32];
    
        secp256k1_silentpayments_recipient recipients[3];
        const secp256k1_silentpayments_recipient *recipient_ptrs[3];
        secp256k1_xonly_pubkey generated_outputs[3];
        secp256k1_xonly_pubkey *generated_output_ptrs[3];
    
        for (i = 0; i < 6; i++) {
            int pos_bob = -1, pos_carol_one = -1, pos_carol_two = -1;
    
            /* Randomize array */
            shuffle(sp_addresses, 3);
    
            for (j = 0; j < 3; j++) {
                CHECK(secp256k1_ec_pubkey_parse(CTX, &recipients[j].scan_pubkey, (*sp_addresses[j])[0], 33));
                CHECK(secp256k1_ec_pubkey_parse(CTX, &recipients[j].spend_pubkey,(*sp_addresses[j])[1], 33));
                recipients[j].index = j;
                recipient_ptrs[j] = &recipients[j];
                generated_output_ptrs[j] = &generated_outputs[j];
            }
            CHECK(secp256k1_silentpayments_sender_create_outputs(CTX, generated_output_ptrs,
                                                                 recipient_ptrs, 3,
                                                                 SMALLEST_OUTPOINT,
                                                                 NULL, 0,
                                                                 seckey_ptrs, 1) == 1);
    
            /* Verify output correctness and they all appear once */
            for (j = 0; j < 3; j++) {
                int* pos = NULL;
                CHECK(secp256k1_xonly_pubkey_serialize(CTX, xonly_ser, &generated_outputs[j]));
                if (secp256k1_memcmp_var(xonly_ser, BOB_OUTPUT, 32) == 0) {
                    pos = &pos_bob;
                } else if (secp256k1_memcmp_var(xonly_ser, CAROL_OUTPUT_ONE, 32) == 0) {
                    pos = &pos_carol_one;
                } else if (secp256k1_memcmp_var(xonly_ser, CAROL_OUTPUT_TWO, 32) == 0) {
                    pos = &pos_carol_two;
                } else {
                    TEST_FAILURE("Error: unknown generated output");
                }
                CHECK(*pos == -1); /* enforce uniqueness */
                *pos = j;
            }
        }
    }
    

    theStack commented at 4:03 am on December 7, 2025:

    Thanks for taking a closer look there. I agree this could be improved.

    The goal of the test is to make sure that shuffling the inputted recipients never changes which outputs are produced, and that each expected output appears exactly once.

    So we don’t need to rely on any final ordering assumptions at all; they depend on heapsort’s stability, which (per comment) is unstable.

    Note that order of the generated outputs does indeed matter and should be tested. The sending API as-is guarantees that for a passed in recipient at position i (recipients[i]), the generated output at the same position (generated_outputs[i]) can be found (and hence be spent later) by that exact recipient. The only thing not guaranteed due to unstable heap sort is the order in which the increasing k values are used for generating those outputs, for recipients that share the same scan public key.

    For example, let’s say we have three recipients, leading to three outputs:
    
        r0 = (scan_pk, spend_pk_A) -> tx_out0
        r1 = (scan_pk, spend_pk_B) -> tx_out1
        r2 = (scan_pk, spend_pk_C) -> tx_out2

    These three recipients are all in one group, sharing the same ecdh_shared_secret (see BIP-352). The output tweak $t_k$ used for creating an output depends on that shared secret and a counter k that is increased for each spend public key in the group, starting with k=0. Due to heap sort being unstable, the order in which these spend public keys are picked from the group to create outputs is not guaranteed, i.e. it might be that spend_pk_C is used first to create the output with k=0 (-> assigned to tx_out2), then spend_pk_A with k=1 (-> assigned to tx_out0) and lastly spend_pk_B with k=2 (-> assigned to tx_out1). For a recipient, the k value that was used for output generation doesn’t matter; they will still find their outputs if they include all of the transaction’s outputs and follow the scanning protocol correctly. Note that assigning the generated outputs at the intended position (matching the recipient position) is done using the index field in the recipient data structure (see also previous related comment #1765 (review)).

    So in the quoted test code, CAROL_OUTPUT_ONE means “output to Carol, generated with k=0” and CAROL_OUTPUT_TWO means “output to Carol, generated with k=1” accordingly. If sp_addresses[i] is assigned to BOB_ADDRESS, sp_outputs[i] must be BOB_OUTPUT (this case is unambiguous, as Bob’s scan pubkey appears only once in the recipient list). If sp_addresses[i] is assigned to CAROL_ADDRESS, sp_outputs[i] must be either CAROL_OUTPUT_ONE or CAROL_OUTPUT_TWO (that scan pubkey appears twice in the recipient list, so the order within those is ambiguous).

    I will give some more thought to how to better express this in the test; suggestions of course welcome.


    furszy commented at 3:30 pm on December 7, 2025:

    I wrote this from memory based on the other comment, the one where I say “If you have all the information, does the initial ordering matter?” (https://github.com/bitcoin-core/secp256k1/pull/1765#discussion_r2595102919). I just read your response there. Thanks.

    We can easily adapt the generalized test to verify that the initial recipients’ ordering matches the final outputs order. We just need to cache their initial positions during creation (post-shuffle), then check that Bob stays in the same spot and that the two Carols appear exactly once, working around the heapsort instability. That is simple to do.

    Overall, this latest test covers strictly more cases than the previous one. And with this approach, we can also easily confirm we’re not biased by the fixed number of outputs we currently have (right now we only verify arrays of length 3). E.g. it would be nice to test against arrays of length two as well. Also, something very important: it would be good to test that k is increased for a second output from Bob when there are two outputs for Carol (if k is only ever increased and not reset, or something unexpected happens, we have a major issue). I’m not expecting this to fail, it is just test coverage we should have.

  47. in include/secp256k1_silentpayments.h:150 in 913fdee7e1 outdated
    145+ *  In:           scan_key32: pointer to the recipient's 32 byte scan key
    146+ *                         m: integer for the m-th label (0 is used for change outputs)
    147+ */
    148+SECP256K1_API SECP256K1_WARN_UNUSED_RESULT int secp256k1_silentpayments_recipient_create_label(
    149+    const secp256k1_context *ctx,
    150+    secp256k1_pubkey *label,
    


    furszy commented at 5:16 pm on December 8, 2025:

    I’m unsure about the label use-case so far but leaving that topic aside (will keep thinking about it),

    I’m wondering if we could use a typedef for the label here instead of relying on the public key structure. At least for me, public keys are meant for sending coins to, and we don’t want anyone sending coins to the group element label = hash(scan_key || m) * G. Clarifying the intent with a simple typedef might save us a few headaches.


    theStack commented at 6:11 pm on December 9, 2025:
    The idea of avoiding the pubkey type for labels has some merit, though I think that introducing a typedef alone would not completely solve the problem. In order to (de)serialize labels (needed to e.g. create entries for the labels cache, or load labels from wallet backups), a user has to call the secp256k1_pubkey_{serialize,parse} functions, i.e. the “pubkey” terminology would still show up and be mixed with _silentpayments_label types (which might be even more confusing for users). Should we introduce aliases for these functions as well? Or maybe having the typedef alone would still be better than not having it (with proper documentation); will give this some more thought.

    real-or-random commented at 1:11 pm on December 11, 2025:

    I think that is a great point. My current thinking is that we should make an entirely new type including serialization functions.

    The pubkey type is a bit stretched sometimes. If we could redesign the entire library from scratch, I’d seriously consider having entirely separate pubkey types for all schemes. You can (ab)use an ECDSA pubkey to perform ECDH, but neither ECDH nor ECDSA have been designed to do this. Usually these things are okay, but they create plenty of headaches in theory, and maybe a careful API should, at least, ask its users to convert their keys explicitly. The only real separation we currently have is the one for Schnorr sigs, but this is because they use x-only keys.

    But even if we try to be in line with the existing code base, a label is indeed a bit far from a pubkey, or even a key: there is not even a corresponding secret key (or secret label). Sure, the underlying group element has some discrete logarithm, but what I want to say is that it doesn’t have a meaning in the API of the scheme.


    theStack commented at 5:53 pm on December 16, 2025:

    I think that is a great point. My current thinking is that we should make an entirely new type including serialization functions.

    Good idea. I was initially a bit reluctant to add a new type and more API functions, but now I’m quite happy with the secp256k1_silentpayments_label type (see latest force-push), and think it’s conceptually the right approach to have a clear separation. I’ve added public API functions for both parsing and serializing labels. In theory, only serializing is really needed right now for users (to create the labels cache), but I thought it’s better to add both for consistency. Could still remove the parsing function if reviewers feel it fits better into a future module extension. It doesn’t hurt to have it now, but it might become more relevant in practice if the “LabelSet” scanning approach is implemented.

    But even if we try to be in line with the existing code base, a label is indeed a bit far from a pubkey, or even a key: there is not even a corresponding secret key (or secret label). Sure, the underlying group element has some discrete logarithm, but what I want to say is that it doesn’t have a meaning in the API of the scheme.

    Unless I’m misinterpreting your comment, I think the discrete logarithm of a label’s underlying group element is indeed quite important in the API: the corresponding “secret label” is what we call the label tweak, returned together with the label in _silentpayments_recipient_label_create, and it is needed later for spending a received payment to a labeled address. The label cache is a mapping of label to label tweak, so it’s also reflected in the scanning API, where a callback function has to be provided that performs a lookup in that data structure (i.e. returns a label tweak if a given label is found).
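    For illustration, a minimal label cache and a matching lookup callback could look like the sketch below. The cache layout (fixed-size array, linear scan) and all names except the callback signature are hypothetical; the signature follows the secp256k1_silentpayments_label_lookup type proposed in this PR. A real wallet would use a proper key-value store keyed by the serialized label.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LABEL_CACHE_MAX 8

/* Hypothetical cache entry: serialized label (key) -> label tweak (value). */
typedef struct {
    unsigned char label33[33]; /* serialized label */
    unsigned char tweak32[32]; /* corresponding label tweak */
} label_cache_entry;

typedef struct {
    label_cache_entry entries[LABEL_CACHE_MAX];
    size_t n;
} label_cache;

/* Callback: return a pointer to the 32-byte label tweak on a match, NULL otherwise. */
static const unsigned char *label_lookup(const unsigned char *label33, const void *label_context) {
    const label_cache *cache = (const label_cache *)label_context;
    size_t i;
    for (i = 0; i < cache->n; i++) {
        if (memcmp(cache->entries[i].label33, label33, 33) == 0) {
            return cache->entries[i].tweak32;
        }
    }
    return NULL;
}
```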


    real-or-random commented at 8:26 am on December 17, 2025:

    Unless I’m misinterpreting your comment, I think the discrete logarithm of a label’s underlying group element is indeed quite important in the API:

    Yes, sorry, I was wrong here. It plays a role in the API, but it’s not meant to be “secret”. But I think my main point still holds up: calling label stuff a “key” is misleading. There’s no authorization or secrecy aspect to it.

  48. real-or-random referenced this in commit be5e4f02fd on Dec 10, 2025
  49. theStack commented at 7:15 am on December 11, 2025: contributor

    Thinking a bit more about the “LabelSet” scanning approach, and particularly about how to avoid the repeated x-only pubkey serialization in the compare and binary search functions (as pointed out in #1765 (comment)), I’ve come to realize that the tx outputs could simply be provided as already serialized x-only pubkeys (raw 32 bytes, directly extracted from a taproot output’s scriptPubKey), as we don’t do any elliptic curve operations with them (in contrast to the “BIP” approach). That interface change gets rid of the repeated internal serialization and, maybe even more importantly for overall performance, avoids the need for preparatory secp256k1_xonly_pubkey_parse calls from the user for each tx output, each costing a field element square root calculation (in order to recover the y coordinate, via $y = \sqrt{x^3 + 7}$). I’ve updated the benchmark with that method, adding the xonly-pubkey parsing calls prior to scanning for the BIP benchmark, in order to have a fair comparison: https://github.com/theStack/secp256k1/commit/ab201ed32b5b326d7a065ac0bcaab52b09693ecd (this still doesn’t include the batch inversion patch from @w0xlt yet).
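    For context on why each parse costs a square root: since the secp256k1 field prime satisfies $p \equiv 3 \pmod 4$, recovering the y coordinate from an x-only pubkey amounts to one modular exponentiation with a ~256-bit exponent per output:

```latex
y = \sqrt{x^3 + 7} \equiv (x^3 + 7)^{\frac{p+1}{4}} \pmod{p}
```

    Passing the raw 32-byte x coordinates to the scanner skips this work entirely, since no group operation on the output point is needed in the “LabelSet” approach.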

    While it’s nice to have a faster implementation for small-scale users, I’m still unsure how to proceed with this PR. No matter how much we optimize the “LabelSet” approach, it will at some point always be worse for use-cases with a larger number of labels. I’ve e.g. been made aware that for use-cases like “proof of payments”, having tens of thousands of labels doesn’t even seem unlikely. In an ideal world we would probably implement both approaches and choose whichever is faster, either automatically with a single API function (as prototyped in https://gist.github.com/theStack/25c77747838610931e8bbeb9d76faf78?permalink_comment_id=5897811#gistcomment-5897811) or by providing two different API functions, with good documentation recommending when to use which one, depending on the L and N parameters.

    We could either:

    1. only release the “BIP” scanning approach for now (that’s the current state of this PR), with the drawback that it doesn’t provide the best possible performance for small-scale users (only a handful of labels) yet
    2. only release the “LabelSet” scanning approach, with the drawback that it’s practically unusable for power-users with a large number of labels, as each additional label would degrade scanning performance
    3. release both already (best for the user for maximum performance, but increases complexity)

    I’d advocate for option 1 or 3 (now being aware that having lots of labels could be much more common than I thought, option 2 doesn’t seem like a good idea, as it would make the module practically unusable for a certain user group), with a tendency towards option 1 to keep it simple.

    To take a step back, the original goal of looking deeper into alternative scanning approaches was fixing the “worst-case scanning attack”, caused by a single 1vMB transaction full of outputs that all go to a single recipient group (i.e. all having the same scan pubkey), leading to a scanning time of several minutes. The “LabelSet” scanning approach fixes this attack nicely for users with small numbers of labels, but for larger numbers of labels the “BIP” scanning approach would still be used, which I think suffers from this problem inherently due to the nested looping over the transaction outputs.

    Circling back to what I stated earlier:

    I’m hence debating on whether one solution could be to keep the PR simply as-is and accept the fact that an adversary can slow down a single entity for a few minutes (for non-standard transactions), to a few seconds (for standard transactions, i.e. tx size <= 100kvB).

  50. in src/modules/silentpayments/main_impl.h:105 in ffffd7ff98 outdated
     95+    secp256k1_declassify(ctx, &ss, sizeof(ss));
     96+    /* This can only fail if the shared secret is the point at infinity, which should be
     97+     * impossible at this point considering we have already validated the public key and
     98+     * the secret key.
     99+     */
    100+    secp256k1_eckey_pubkey_serialize33(&ss, shared_secret33);
    


    real-or-random commented at 10:32 am on December 11, 2025:
    Now that I wrote this other long comment above, I’m not sure if this argument is confusing here. The shared secret could, in theory, be the point at infinity, no? I mean, it will happen only with negligible probability even for malicious senders, so I don’t see how it would be a problem. But the comment here says that we have validated the public key and secret key. This is true: we have validated the sum secret key to be non-zero, but this validation happened before we multiply by a hash, and the hash could be zero. Okay, it seems that we also check that the hash is not zero in secp256k1_silentpayments_calculate_input_hash_scalar. So I guess the code is perfectly fine, but the comment could be made more precise.

    theStack commented at 6:03 pm on December 16, 2025:
    Reworked the comment to call the involved parts of the point multiplication the “public component” and “secret component” (not entirely sure if these are good names, but at least they are consistent with the function’s current parameter names), with a proper explanation of what they represent at the sender and receiver call-sites, respectively. Now that I think about it, another option might be to VERIFY_CHECK both the group element and the scalar inputs to be non-infinity/non-zero and have proper comments at each call-site.

    real-or-random commented at 8:41 am on December 17, 2025:

    another option might be to VERIFY_CHECK both of the group element and the scalar inputs to be non-infinity/non-zero and have proper comments at each call-site.

    That sounds very clean, yes.
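    A hedged sketch of what that VERIFY_CHECK variant could look like is below. The types ge_t and scalar_t and the assert-based VERIFY_CHECK are stand-ins for the library-internal secp256k1_ge, secp256k1_scalar and the real macro; the function name and parameters mirror the discussion above but are not the module's actual internals.

```c
#include <assert.h>
#include <string.h>

typedef struct { int infinity; } ge_t;            /* stand-in for secp256k1_ge */
typedef struct { unsigned char d[32]; } scalar_t; /* stand-in for secp256k1_scalar */

#define VERIFY_CHECK(cond) assert(cond)

static int scalar_is_zero(const scalar_t *s) {
    unsigned char acc = 0;
    int i;
    for (i = 0; i < 32; i++) acc |= s->d[i];
    return acc == 0;
}

/* Sketch of the suggested pattern: assert the preconditions here, and document
 * at each call-site why they hold (sender: summed secret key checked non-zero;
 * receiver: scan key validated and the input-hash scalar checked to be
 * non-zero before the multiplication). */
static void create_shared_secret(unsigned char *shared_secret33,
                                 const scalar_t *secret_component,
                                 const ge_t *public_component) {
    VERIFY_CHECK(!public_component->infinity);
    VERIFY_CHECK(!scalar_is_zero(secret_component));
    /* ... ECDH multiplication and 33-byte serialization elided ... */
    memset(shared_secret33, 0, 33);
}
```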

  51. w0xlt commented at 9:25 pm on December 11, 2025: none

    As I understand it, although the LabelSet approach was originally proposed to avoid worst-case performance issues, an interesting side effect is that it gives the user more control and predictability over scanning costs: runtime depends more on the number of labels (which the wallet controls) and less on the number of P2TR outputs in the block (which the wallet cannot control).

    The LabelSet approach also scales more smoothly with respect to the number of labels than the BIP approach scales with respect to the number of outputs, as shown in the benchmarks:

    Benchmark                               ,    Min(us)    ,    Avg(us)    ,    Max(us)
    sp_full_scan_BIP-algo_L=50_N=50         ,   138.0       ,   139.0       ,   139.0
    sp_full_scan_LabelSet-algo_L=50_N=50    ,    54.2       ,    54.3       ,    54.6
    

    The hybrid approach gets the best of both worlds and matches the faster of the two algorithms.

    That said, I agree that the BIP is the easiest and most thoroughly reviewed solution (and the safest to adopt). The LabelSet approach appears to offer the best default performance per the benchmark but introduces new assumptions and requires additional review. The hybrid approach is likely the best engineering solution, though it comes at the cost of greater complexity and review effort.

  52. build: add skeleton for new silentpayments (BIP352) module b7fade69e6
  53. theStack force-pushed on Dec 16, 2025
  54. theStack commented at 5:02 pm on December 16, 2025: contributor

    Thanks for all the review comments so far, very helpful!

    Force-pushed with the following changes:

    • added non-NULL ARG_CHECKs for “array of pointer” parameters, including API tests (as done in #1779 for existing functions)
    • fixed two “array of pointer” initialization bugs that were uncovered with these checks, one in the receiving test vectors code, one in the example (see #1765 (review) for details)
    • switched the data type for parameters regarding “number of recipients” (index and n_recipients) from size_t to uint32_t and removed the now obsolete “k exceeds UINT32_MAX” error handling cases accordingly (see also the suggested BIP change: https://github.com/bitcoin/bips/pull/2055)
    • introduced a new secp256k1_silentpayments_label type, including (de)serialization API functions (as suggested in #1765 (review))
    • deterministic sorting of recipients: modified our compare function to use the index as tie-breaker (stabilizing the unstable heap sort result): for the end user this doesn’t matter much, but having a well-defined deterministic order helps reasoning, simplifies our testing code, and removes some potentially confusing “don’t rely on the order” comments (kudos to @w0xlt who proposed this in a different context a while ago for tackling the “worst-case scanning attack”, see #1765 (comment))
    • reworked comment in _create_shared_secret w.r.t. the point multiplication result not being infinity (referring to both the sender and receiver call-sites), see #1765 (review)
    • updated the benchmark commit to remove “silentpayments_output_scan” from help texts (only relevant for light-clients), adapted the error messages to master (see #1778)
    • various smaller fixes in the API docs and the impl. comments about untestable code branches (e.g. s/greater than or equal to the curve order/zero or greater than or equal to the curve order)

    Regarding #1765 (review), the test_recipient_sort function has not been modified yet (other than removing the now obsolete “unstable heapsort” comment, so at least that potentially confusing part is gone); I’m still thinking and discussing with @furszy how to best generalize this. A first draft of what that could look like, based on the linked suggestion above, is here: https://gist.github.com/theStack/c0fd8fa9a5c474f8baf4c57169758d41
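    As a sketch, such an index tie-breaking comparator could look like the following (the entry struct and field names are illustrative, not the module's actual recipient type):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Break comparison ties with the recipient's original input index, so even an
 * unstable sort (like the heapsort used internally) produces a fully
 * deterministic order. */
typedef struct {
    unsigned char scan_pubkey33[33]; /* primary sort key */
    uint32_t index;                  /* original position in the input array */
} recipient_entry;

static int recipient_cmp(const recipient_entry *a, const recipient_entry *b) {
    int c = memcmp(a->scan_pubkey33, b->scan_pubkey33, 33);
    if (c != 0) return c;
    /* Tie-breaker: entries with equal scan pubkeys keep their input order. */
    if (a->index < b->index) return -1;
    return a->index > b->index;
}
```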

  55. theStack force-pushed on Dec 16, 2025
  56. theStack commented at 6:38 pm on December 16, 2025: contributor

    Force-pushed another time (sorry for the noise!) to rename the label creation and (de)serialization API functions to be more consistent:

    • _silentpayments_recipient_create_label -> _silentpayments_recipient_label_create
    • _silentpayments_label_serialize -> _silentpayments_recipient_label_serialize
    • _silentpayments_label_parse -> _silentpayments_recipient_label_parse

    Also updated the API summary in the PR description accordingly.

  57. in include/secp256k1_silentpayments.h:353 in 26e1e88f32
    348+ *             prevouts_summary: pointer to the transaction prevouts summary data (see
    349+ *                               `secp256k1_silentpayments_recipient_prevouts_summary_create`).
    350+ *       unlabeled_spend_pubkey: pointer to the recipient's unlabeled spend public key
    351+ *                 label_lookup: pointer to a callback function for looking up
    352+ *                               a label value. This function takes a label
    353+ *                               public key as an argument and returns a pointer to
    


    real-or-random commented at 8:49 am on December 17, 2025:

    nit: still saying “label public key” here.

    either way, it will be good to emphasize that it takes a serialized label

  58. in include/secp256k1_silentpayments.h:293 in 26e1e88f32
    288+ *
    289+ *  In:         label: pointer to the label public key to check (computed during
    290+ *                     scanning)
    291+ *      label_context: pointer to the recipients label cache.
    292+ */
    293+typedef const unsigned char* (*secp256k1_silentpayments_label_lookup)(const unsigned char* label33, const void* label_context);
    


    real-or-random commented at 9:03 am on December 17, 2025:

    Perhaps the wording can be improved here (because currently, “label cache” is not explained):

    (Please reformat the paragraph, I guessed the line lengths…)

     /** Type of callback function for label lookups
      *
      * A function of this type will be used to retrieve the label tweak for a given
      * label during scanning. A typical implementation will perform a lookup in a
      * key-value store called the "label cache".
      *
      *  For creating the label cache,
      *  `secp256k1_silentpayments_recipient_label_create` and
      *  `secp256k1_silentpayments_recipient_label_serialize` can be used.
      *
      *  Returns: pointer to the 32-byte label tweak if there is a match.
      *           NULL pointer if there is no match.
      *
      *  In:         label: pointer to the label public key to check (computed during
      *                     scanning)
      *      label_context: pointer to the recipient's label cache.
      */
     typedef const unsigned char* (*secp256k1_silentpayments_label_lookup)(const unsigned char* label33, const void* label_context);
    
  59. in include/secp256k1_silentpayments.h:330 in 26e1e88f32
    325+ *  tweak if the label is found, or NULL otherwise. The returned pointer must remain
    326+ *  valid until the next call to `label_lookup` or until the function returns,
    327+ *  whichever comes first. It is not retained beyond that.
    328+ *
    329+ *  For creating the labels cache, `secp256k1_silentpayments_recipient_label_create`
    330+ *  and `secp256k1_silentpayments_recipient_label_serialize` can be used.
    


    real-or-random commented at 9:03 am on December 17, 2025:
     *  For creating the label cache, `secp256k1_silentpayments_recipient_label_create`
     *  and `secp256k1_silentpayments_recipient_label_serialize` can be used.
    

    (as above)

  60. silentpayments: sending
    Add a routine for the entire sending flow which takes a set of private keys,
    the smallest outpoint, and list of recipients and returns a list of
    x-only public keys by performing the following steps:
    
    1. Sum up the private keys
    2. Calculate the input_hash
    3. For each recipient group:
        3a. Calculate a shared secret
        3b. Create the requested number of outputs
    
    This function assumes a single sender context in that it requires the
    sender to have access to all of the private keys. In the future, this
    API may be expanded to allow for multiple senders or for a single
    sender who does not have access to all private keys at any given time,
    but for now these modes are considered out of scope / unsafe.
    
    Internal to the library, add:
    
    1. A function for creating shared secrets (i.e., a*B or b*A)
    2. A function for generating the "SharedSecret" tagged hash
    3. A function for creating a single output public key
    3966a5ecfe
  61. silentpayments: recipient label support
    Add function for creating a label tweak. This requires a tagged hash
    function for labels. This function is used by the receiver for creating
    labels to be used for a) creating labeled addresses and b) to populate
    a labels cache when scanning.
    
    Add function for creating a labeled spend pubkey. This involves taking
    a label tweak, turning it into a public key and adding it to the spend
    public key. This function is used by the receiver to create a labeled
    silent payment address.
    
    Add tests for the label API.
    2c073807c3
  62. theStack force-pushed on Dec 17, 2025
  63. silentpayments: receiving
    Add routine for scanning a transaction and returning the necessary
    spending data for any found outputs. This function works with labels via
    a lookup callback and requires access to the transaction outputs.
    Requiring access to the transaction outputs is not suitable for light
    clients; light client support is planned for a separate PR.
    
    Add an opaque data type for passing around the prevout public key sum
    and the input hash tweak (input_hash). This data is passed to the scanner
    before the ECDH step as two separate elements so that the scanner can
    multiply the scan_key * input_hash before doing ECDH.
    
    Finally, add test coverage for the receiving API.
    2ae0f1fd3b
  64. silentpayments: add examples/silentpayments.c
    Demonstrate sending and scanning on full nodes.
    d9e07af42f
  65. silentpayments: add benchmarks for scanning
    Add a benchmark for a full transaction scan.
    Only benchmarks for scanning are added as this is the most
    performance critical portion of the protocol.
    
    Co-authored-by: Sebastian Falbesoner <91535+thestack@users.noreply.github.com>
    d67a5cb796
  66. tests: add BIP-352 test vectors
    Add the BIP-352 test vectors. The vectors are generated with a Python script
    that converts the .json file from the BIP to C code:
    
    $ ./tools/tests_silentpayments_generate.py test_vectors.json > ./src/modules/silentpayments/vectors.h
    
    Co-authored-by: Ron <4712150+macgyver13@users.noreply.github.com>
    Co-authored-by: Sebastian Falbesoner <91535+thestack@users.noreply.github.com>
    Co-authored-by: Tim Ruffing <1071625+real-or-random@users.noreply.github.com>
    17a4b793fc
  67. tests: add constant time tests
    Co-authored-by: Jonas Nick <2582071+jonasnick@users.noreply.github.com>
    Co-authored-by: Sebastian Falbesoner <91535+thestack@users.noreply.github.com>
    27b78161dc
  68. tests: add sha256 tag test
    Test midstate tags used in silent payments.
    afee0a5f4f
  69. ci: enable silentpayments module 369cb7e44f
  70. docs: update README 2ee7d87934
  71. theStack force-pushed on Dec 17, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-17 20:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me