Silent Payments module: discussion about different scanning approaches (BIP, LabelSet, hybrid)

theStack commented at 3:21 am on January 11, 2026: contributor

As suggested in #1792 (comment), this issue is created to discuss different scanning approaches for Silent Payments. Currently PRs for both the BIP and LabelSet approaches are open:

#1765
#1792

A prototype implementation of the hybrid approach is available at https://github.com/w0xlt/secp256k1/commit/6c76c7c5fa30c5fa7676511de22c47953ee76bd1#diff-4d053be8d1f6d948b412f26ae89711a9dcd2a2683da581cb52b9f6757480361b

See https://gist.github.com/theStack/25c77747838610931e8bbeb9d76faf78 for a summary, and https://github.com/theStack/secp256k1lab/blob/add_bip352_module_review_helper/src/secp256k1lab/bip352.py for a Python implementation based on secp256k1lab.

As the BIP approach suffers from the worst-case scanning attack, following the LabelSet approach with a limit on labels to scan for is the currently suggested way to proceed.

theStack commented at 3:39 am on January 11, 2026: contributor

Answering comments #1792 (comment) and #1792 (comment) here (to keep the discussion in #1792 focused on the LabelSet PR): @Sjors:

I think it’s fine to initially go for the LabelSet approach, since many current implementations are mobile, which means they benefit from the safety this provides and won’t run into the label limit anytime soon.

I agree.

I do think that hybrid approach is useful. Someone who goes through the trouble of automating label generation might happily add some CPU cores if lets them avoid the complexity of splitting their users across multiple watch keys. Especially if the cost of an attack far outweighs the cost to deal with it.

What makes the hybrid approach somewhat unattractive in my opinion is the very clunky API. Users have to pass in the labels information twice, providing both a list of labels and a label cache lookup callback function. The two underlying data structures also have to be maintained and kept in sync accordingly by the user. I would say that this clearly doesn’t meet the “hard to misuse API” goal that we usually try to follow in this library.

We could document the hardware required to guarantee that a worst case block takes less than 10 minutes to process. For the hybrid approach that’s just a single number IIUC. For the LabelSet approach it’s a function of L. With that in place a warning should be enough IMO, for L > ? based on lower end hardware (mobile) and e.g. a 10 second worst case.

Documenting this makes sense yeah. With “warning”, I guess you mean something purely in the docs, or is there another way during run-time? (I can’t think of any). @Eunovo:

I had an offline conversation with @RubenSomsen about this. The conversations around the scanning approaches have been focused on the time it takes to complete a tx scan, but it’s also important that we consider the time it takes to scan an entire Block.

The current per-transaction insights/benchmarks also apply for blocks, if all transactions in a block were equally-sized in terms of number of outputs. That’s of course a simplifying assumption, but I think one that is good enough to give a rough estimate in terms of EC point addition cost for common scenarios (see examples below).

BIP-style scanning scales with the number of outputs, independently of the number of transactions.

LabelSet scanning scales with the number of labels * the number of transactions. LabelSet will be Fast when you scan Blocks with few transactions and few labels, but it will get slower compared to BIP-style scanning as the number of transactions increases, even without an increase in the number of labels.

Right. I think what you state is ultimately just a different way of expressing the per-transaction turn-over point inequation $L < 2 * N$ (indicating when LabelSet is faster than BIP), where “few transactions per block” maps to a higher N value, and “high number of transactions per block” to a lower N value.

Two concrete example calculations, showing that for up to a few labels, LabelSet approach should be a bit faster:

The best-case scenario for the BIP approach is if every transaction consists of only one output (high number of txs per block). Even in that case, for the single-label scenario, the BIP approach has to do twice as many EC point additions per block as for the LabelSet approach (L=1,N=1 => num_txs * 1 < num_txs * 2).
A much more realistic scenario is probably to assume that every transaction consists of two outputs (due to change). In that case, scanning up to 3 labels is still faster than the BIP approach (L=3,N=2 => num_txs * 3 < num_txs * 4).

I think we should do more Block scanning benchmarks. I am interested in doing this myself next week. I can use LabelSet scanning in https://github.com/bitcoin/bitcoin/pull/32966 and benchmark Block scanning times.

That’s a good idea, looking forward to see results on real-world block data.

Sjors commented at 3:53 am on January 12, 2026: member

What makes the hybrid approach somewhat unattractive in my opinion is the very clunky API.

I’m not wedded to the particular API, but it’s nice to have something with the same scaling properties. But if it’s only recommend for power users, having the API be a bit clunky might be acceptable.

We could document the hardware required to guarantee that a worst case block takes less than 10 minutes to process. […]

Documenting this makes sense yeah. With “warning”, I guess you mean something purely in the docs, or is there another way during run-time? (I can’t think of any).

Just docs.

jonasnick commented at 4:11 pm on January 19, 2026: contributor

My working assumption is that we ultimately want to support many labels for some use cases and also optimize for few-label wallets. Hence, looking at @theStack’s benchmarks, in the long run we should have both the BIP approach and the labelset approach (thanks @w0xlt for coming up with this!).

As we’ve discussed, the BIP approach suffers from a quadratic scaling issue under adversarially created transactions. I believe we can address this relatively easily, and I’d prefer to do so before shipping the BIP approach. It would be great to hear thoughts from the BIP authors on this (@RubenSomsen, @josibake). I’ve discussed possible mitigations with @real-or-random and the ~~most straightforward option seems to be to put a limit on k. I do not think this would prevent any realistic use case (I even struggle to imagine a use case that requires more than one k)~~ (EDIT: this was based on a misunderstanding, see below). We could run some worst case benchmarks to find a reasonable limit.

I haven’t yet looked at the proposed hybrid approach (implementing both approaches under a unified API). What are the advantages? The downside, as mentioned by @theStack, appears to be a clunky API. If there isn’t a compelling benefit, I’d lean toward implementing one approach at a time.

RubenSomsen commented at 2:27 pm on January 20, 2026: none

I calculated the theoretical numbers for comparison. You can view them here. I’ll be referring to these numbers throughout.

Labels allow the recipient to distinguish the source of funds, without requiring another silent payments address (which would scale linearly - 10 addresses means 10x the scanning cost). I think communicating the functionality to users will be a bit of a challenge because it’s hard for people to wrap their head arounds the fact that using a different label for the same address still means the address can be linked to the same user (perhaps a name change will help - “tagged addresses”?), but regardless, it’s clearly a useful feature and it’s crucial for non-donation use cases where recipients need to know who paid them (e.g. an exchange receiving funds from users).

Hypothetically, if we ignored the issue of the worst-case targeted attack and I had to pick one algorithm, my strong inclination would be to go with the BIP style approach. The primary reason for this is that we get a clear and consistent upper bound on the maximum scanning cost for all users, irrespective of their label usage. The numbers reveal that we only lose a minor amount of performance for users that don’t use labels. This performance can be regained in the future by supporting both algorithms.

But of course we’re talking about this because the issue has been hard to ignore. The numbers show that a worst-case targeted attack block is nearly 360x slower (@Eunovo had this benchmarked at 4 minutes on his laptop) than a non-targeted worst-case block.

If we wanted to work around it, there is one “soft” approach - (1) let the API call contain a limited range for K so it has to be called repeatedly. This basically pushes the problem to wallets, which have to decide if and how to communicate this to the user (e.g., “An unusually large tx contained at least X outputs for you. It’s slow to scan but so far contained Y sats. Want to continue?”).

A more strict approach is to simply (2) limit K in the BIP. Technically this is a backwards incompatible change, but in practice no wallets have supported sending multiple outputs to the same recipient so it seems safe to assume this won’t have any practical impact. This does raise the question of what we would limit K to. @Sjors came up with a theoretical scenario where users from one exchange are sending money to another exchange, causing a single transaction with lots of different outputs to the same recipient. At K = 1000, I imagine even this type of scenario would be comfortably covered, but this results in 30x slower block validation (~20 seconds?). What upper bound are we willing to accept for the targeted attack?

The other approach is to (3) go with the LabelSet approach and limit the use of labels. Again, we’d be faced with having to determine a limit, but this time it seems really inevitable it will limit functionality. If we wanted to match the same non-targeted upper bound as the BIP style approach, this would result in a very conservative label limit of 12. And whatever limit we’d pick, if we put it in the BIP, then any future implementation that does want a higher limit will not be BIP compliant.

Note I think even with the BIP style approach we could be more clear about label usage. Currently there is a very non-committal reference to 100k labels (resulting in a ~4MB lookup table) in the BIP. By default, I’d like for most scanning tools to either scan for no labels other than the change label (primarily light clients because to them label usage has bandwidth overhead), or 100k labels. Anyone who wants to go over 100k (or derive labels in a non-standard way) is on their own.

Finally, there is also the idea of (4) putting a lower limit on the number of sats in an eligible output. I think putting this in the BIP is difficult because any reasonable definition of dust would change with the BTC price, and I’m inclined not to explicitly push for this at the wallet level either, but it does make the attack more costly. At the current dust limit, the attacker has to send ~$7000 worth of sats to the (lucky?) victim. On top of that they have to pay one block’s worth of fees, though this can be as low as a few hundred dollars.

Personally, I feel (2) is the most clear-cut approach, assuming we can agree on a reasonable limit for K, but perhaps (1) is also a decent middle ground if we feel the attack is not all that practical but still want to do something about it.

I’d also like to take this moment to appoint @theStack co-author of BIP352. I think he’s proven himself more than capable, and with @josibake’s current reduced involvement I think it will greatly help move things forward.

jonasnick commented at 2:53 pm on January 21, 2026: contributor

Thanks for weighing in @RubenSomsen. I have a clarifying question:

Sjors came up with a theoretical scenario where users from one exchange are sending money to another exchange, causing a single transaction with lots of different outputs to the same recipient.

Why would this transaction have outputs with k > 0? Wouldn’t each user get an address from the destination exchange with a different label?

RubenSomsen commented at 3:10 pm on January 21, 2026: none

@jonasnick thanks for considering the arguments.

k changes whenever multiple outputs are sent to the same recipient, irrespective of what label is used. This is needed to ensure the resulting outputs can’t be linked. Everything except the label part would be the exact same, and the label is assumed to be public information so anyone could try to subtract it to find the link.

real-or-random commented at 4:45 pm on January 21, 2026: contributor

Hey Ruben, thanks for the detailed comments. I, more or less, agree with everything you said. A more detailed response:

(1) let the API call contain a limited range for K so it has to be called repeatedly.

Yeah, I had this idea, too. But it doesn’t really convince me unless we see actual demand from wallets. I think scanning should be a background process, so it’s okay if it takes something on the order of a few seconds but it shouldn’t be crazy. (Where do we draw the line exactly? Okay, bikeshedding very much possible…) But this should be well-documented. If it turns out that wallets need to implement some other, more responsive functionality (e.g., some RPC that scans a single transaction and needs to be quick), then we could reconsider that idea. But currently, I don’t think it will be necessary.

A more strict approach is to simply (2) limit K in the BIP.

The fact that @jonasnick and I came to the same conclusion that limiting k seems to be a good idea (without knowing that you had suggested the same) probably means we’re on the right track. (Yes, when we talked about it, we wrongly assumed that different labels can work with the same k=0, but even if we need different k values for different labels, limiting k seems fine to me.) I agree that limiting k seems much less restrictive for applications than limiting the number of labels.

Yet another idea is to require outputs to be in sequential k order, i.e., the k=0 output appears before the k=1 output in the list of outputs which appears before the k=2 output in the list, etc. @jonasnick had suggested this in one of the PR, but wasn’t sure whether it limits flexibility too much.

I like this idea, too. It’s more straightforward and feels less like a kludge. It should make the scanning linear in the number of outputs, which should get rid of all the performance issues. (Is it reasonable to get a benchmark here?) And having no limit on k is nice because wallet implementors won’t need to scratch their heads around it. And I doubt that this approach is too limiting. It precludes any protocol on top of silent payments which would rely on the order of outputs.

Some protocols such as LN do indeed rely on the order of outputs. But you probably wouldn’t run them on top of SP?
It also excludes BIP69, namely the idea that outputs should appear in deterministic (sorted) order to avoid wallet fingerprinting. This is not an unreasonable idea, but it seems that it has mostly been abandoned, with Electrum being the last big wallet to implement it. See https://github.com/spesmilo/electrum/issues/8849 and also the report mentioned there. So there’s not too much value for SP transactions to blend in with BIP69 transactions. @RubenSomsen What’s your opinion on this idea?

There’s also the possibility to combine this with a maximum k, e.g., k=0 to k=99 may be everywhere in the list but outputs with k >= 100 must be sequential. In that sense, it’s even a relaxation of simply limiting k. But this seems a bit overengineered.

theStack commented at 3:41 am on January 22, 2026: contributor

A more strict approach is to simply (2) limit K in the BIP.

Yet another idea is to require outputs to be in sequential k order, i.e., the k=0 output appears before the k=1 output in the list of outputs which appears before the k=2 output in the list, etc.

Here are worst-case benchmarks for both of these proposed protocol fixes (I called them “limited k” and “ordered k” rules): https://github.com/theStack/secp256k1/commit/edf222f2106cd45fbaf2f1d47ee87a774c35d42a (based on #1765). A transaction with N=23255 outputs is created, where the matched outputs in the same recipient group (of size K) are placed at the very end, at positions [N-K, N-1], in order to maximize the total number of iterations of the inner loop.

Benchmark results on my arm64 laptop:

0<compile with SP_ORDERED_K_RULE_ENABLED set to 0>
1$ ./build/bin/bench silentpayments_scan_worstcase
2Benchmark                               ,    Min(us)       ,    Avg(us)       ,    Max(us)
3
4[ "limited k" protocol rule ]
5silentpayments_scan_worstcase_K=10      ,   192912.0       ,   192912.0       ,   192912.0
6silentpayments_scan_worstcase_K=100     ,  1644624.0       ,  1644624.0       ,  1644624.0
7silentpayments_scan_worstcase_K=1000    , 16637185.0       , 16637185.0       , 16637185.0

0<compile with SP_ORDERED_K_RULE_ENABLED set to 1>
1$ ./build/bin/bench silentpayments_scan_worstcase
2Benchmark                               ,    Min(us)       ,    Avg(us)       ,    Max(us)
3
4[ "ordered k" protocol rule ]
5silentpayments_scan_worstcase_K=23255   ,   416104.0       ,   416104.0       ,   416104.0

With a limit of K=1000, scanning takes ~16s (close to the assumed ~20s stated by @RubenSomsen above) on my machine, while the “ordered k” scanning for the absolute worst-case (all outputs match) only took <0.5s. This confirms the assumption that scanning with the “ordered k” protocol rule would get rid of all the performance issues.

Sjors commented at 8:03 am on January 22, 2026: member

(1) let the API call contain a limited range for K so it has to be called repeatedly.

Yeah, I had this idea, too. But it doesn’t really convince me unless we see actual demand from wallets. I think scanning should be a background process, so it’s okay if it takes something on the order of a few seconds but it shouldn’t be crazy. (Where do we draw the line exactly? Okay, bikeshedding very much possible…)

On mobile there’s a couple of lines.

One line is at about 1/100th of a second. Anything that takes longer would block smooth scrolling, since the UI is typically the main thread (disclaimer: it’s been a decade since implemented an iOs app that had to worry about this, but it’s still a thing). That means it needs to be done in a background thread. If we’re going to be well above that anyway, then the mobile developer can’t avoid using a background thread and we don’t have to worry about this particular line.

Having to use a background thread means they have to worry about thread safety. Although it could still be a single long running thread, dedicated to libsecp operations. But then you can’t sign anything, or do any other secp operation, while a heavy block is being scanned. So perhaps developers need to worry about thread safety anyway, and this doesn’t make it worse. And perhaps most operations are safe anyway, e.g. scan 1 block, sign a transaction, scan the next block, doesn’t really interfere?

Still, (mobile) UI needs to give some feedback to the user. It can be a block height that frequently goes up, or a progress bar that gradually moves. There I would say that a few seconds is the max you want to go by without any feedback, or the user might think the app is frozen.

If a specific block takes exceptionally long, it’s probably also good to show the user something to relax them. That’s easier to implement if the call returns after a second or so, saying it needs more. The alternative if for the developer to implement yet another thread that measures the time a call takes and compares it to a baseline. @theStack wrote:

Benchmark results on my arm64 laptop:

Desktop users are probably more patient anyway. I think we should measure the performance on mobile phones. Perhaps not the slowest one out there, although it would be nice if the slowest known smartphone that can run an application with libsecp, gets an answer from the library within 1 second.

jonasnick commented at 10:40 am on January 22, 2026: contributor

k changes whenever multiple outputs are sent to the same recipient, irrespective of what label is used.

Sorry, that was a significant error in my mental model (at least my most recent one). In that case, I agree that having more than one k is a relevant use case and I take back what I said about restricting k being obviously the most straightforward option.

Annother thought on the limited-k vs. ordered-k approaches: limited-k appears to lead to a more secure libsecp API. The limit would be enforced in the secp256k1_silentpayments_sender_create_outputs function. If it’s exceeded, the function returns an error and no transaction will be made.

With the ordered-k approach we’d instead have to add a warning to the API doc of sender_create_outputs stating that the generated outputs must appear exactly in the same order in the transaction. If the wallet would (accidentally) ignore the warning and, e.g., sort the outputs via BIP 69 before making a transaction, it would be a mess to recover the coins.

real-or-random commented at 12:31 pm on January 22, 2026: contributor

With the ordered-k approach we’d instead have to add a warning to the API doc of sender_create_outputs stating that the generated outputs must appear exactly in the same order in the transaction. If the wallet would (accidentally) ignore the warning and, e.g., sort the outputs via BIP 69 before making a transaction, it would be a mess to recover the coins.

That’s a great point.

Here are worst-case benchmarks for both of these proposed protocol fixes (I called them “limited k” and “ordered k” rules): theStack@edf222f (based on #1765). A transaction with N=23255 outputs is created, where the matched outputs in the same recipient group (of size K) are placed at the very end, at positions [N-K, N-1], in order to maximize the total number of iterations of the inner loop.

These are useful numbers, and they confirm that scanning takes O(K * N). I believe for picking a reasonable max K_max, we’d rather want to look at entire blocks, as brought up by @Eunovo. The worst-case block is probably one filled with transactions, each having N = K_max many outputs. And then we’ll start to see the “quadratic” impact when considering different K_max values. For example, what’s the scanning time for a block filled with N = K_max = 1000 transactions vs. a block filled with (about 10x as many) N = K_max = 100 transactions?

theStack commented at 1:56 pm on January 22, 2026: contributor

Here are worst-case benchmarks for both of these proposed protocol fixes (I called them “limited k” and “ordered k” rules): theStack@edf222f (based on #1765). A transaction with N=23255 outputs is created, where the matched outputs in the same recipient group (of size K) are placed at the very end, at positions [N-K, N-1], in order to maximize the total number of iterations of the inner loop.

These are useful numbers, and they confirm that scanning takes O(K * N). I believe for picking a reasonable max K_max, we’d rather want to look at entire blocks, as brought up by @Eunovo. The worst-case block is probably one filled with transactions, each having N = K_max many outputs. And then we’ll start to see the “quadratic” impact when considering different K_max values. For example, what’s the scanning time for a block filled with N = K_max = 1000 transactions vs. a block filled with (about 10x as many) N = K_max = 100 transactions?

Fair point. Note that for transactions where all outputs match (with N = K_max), the “quadratic” impact can be easily avoided by marking found outputs and skipping those in further iterations (that’s currently already done in #1765). If we want to keep that optimization, I’d assume that the worst-case scanning cost per block is reached by maximizing the number of unmatched outputs (in contrast to any found output, those have to be processed over and over again, blowing the cost up) in a single transaction.

EDIT: Sorry, that was wrong. If the matched outputs are ordered in reverse by an adversary (i.e. from max_k-1…0), then marking and skipping found outputs alone doesn’t gain anything.

w0xlt commented at 2:53 pm on January 22, 2026: contributor

The limited-k and ordered-k proposals shift the problem from ‘how do we optimize scanning?’ to ‘what constraints are acceptable?’ .

The discussion about optimizations (LabelSet, hybrid) was initially motivated by the worst-case attack. If we’re willing to accept protocol constraints like limited-k or ordered-k, the BIP approach might suffice since these constraints bound the worst-case.

In this scenario, the benefits of LabelSet would be:

Simpler API (no callback function, direct label entries)
Simpler implementation (no y-parity handling)
Wallet label count constraint (L ≤ 500) instead of protocol constraint (limited-k)

Both alternatives seem acceptable for typical users.

real-or-random commented at 3:30 pm on January 22, 2026: contributor

@theStack Let’s do some math:

Assume a transaction with N outputs, and K of them are silent payment outputs for a specific recipient, order in reverse (which is the worst case as you’ve figured out).

The 1st iteration in the K loop needs to loop over N outputs
The 2nd iteration in the K loops needs to loop over N-1 outputs
…
The K-th iteration in the K loops needs to loop over N-K+1 outputs

Total number of iterations of the inner loop for a single transaction with N outputs is and K silent payment outputs is:

0N + (N-1) + ... + (N-K+1)
1 = ((N-K) + K) + ((N-K) + (K-1)) + ... + ((N-K) + 1)
2 = K * (N-K) + (K + (K-1) + ... + 1)
3 = K * (N-K) + K*(K+1)/2
4 = NK - K^2 + K^2/2 + K/2
5 = NK - K^2/2 + K/2.

With the simplifying assumption that the number of outputs in a block is limited to B (and not the number of bytes), and we fit T transactions in it, each transaction has N = B/T outputs. This means that for an entire block with T transactions, the number of iterations of the inner loop is

0T * ((BK/T) - K^2/2 + K/2)
1  = T * (-K^2/2 + K/2) + BK

This is clearly maximized for T = 1… So if I’m not mistaken, the worst block appears to be one with a single transaction with K at the limit and N as high as possible to still fit in the block. And this matches exactly the benchmarks you’ve already produced, right?

theStack commented at 5:07 pm on January 22, 2026: contributor

@theStack Let’s do some math: … This is clearly maximized for T = 1… So if I’m not mistaken, the worst block appears to be one with a single transaction with K at the limit and N as high as possible to still fit in the block. @real-or-random: Nice, I’ve followed these steps on pen and paper and could eventually (after some temporary sign confusion) reproduce the conclusion. I assume @RubenSomsen reached a similar conclusion, since in the gist writeup, the “targeted attack” scenario only considers the single-transaction, maximum-N case.

And this matches exactly the benchmarks you’ve already produced, right?

Yes.

With the ordered-k approach we’d instead have to add a warning to the API doc of sender_create_outputs stating that the generated outputs must appear exactly in the same order in the transaction. If the wallet would (accidentally) ignore the warning and, e.g., sort the outputs via BIP 69 before making a transaction, it would be a mess to recover the coins.

Another potential drawback of the “ordered-k” rule is that it breaks existing test vectors in the BIP, which currently explicitly check that order does not matter. Adapting those is probably not the end of the world, but in comparison the “limited-k” rule is significantly easier to cope with in that regards (and probably less confusing for existing SP wallets testing against the vectors), as only new test vectors would be added; currently, the largest test case w.r.t. number of outputs is N=4. So in that sense “ordered-k” feels slightly more like a breaking change than “limited-k”.

RubenSomsen commented at 11:31 pm on January 22, 2026: none

@real-or-random thanks for your comments.

require outputs to be in sequential k order

Other than what was already mentioned about “ordered-k”, I’m also concerned what putting restrictions on output order does to privacy. ~~If a known SP user sends you money from the 1st and 5th output in a tx, it implies they own the 2nd, 3rd and 4th output as well.~~ (edit: incorrect, see the next comment) And in coinjoin scenarios, you now have to insist on a certain order for outputs, which reveals information to other coinjoin participants. @theStack

I assume @RubenSomsen reached a similar conclusion, since in the gist writeup, the “targeted attack” scenario only considers the single-transaction, maximum-N case.

Yes, a single tx with maximum N is the worst case for two reasons:

The attack scales directly with the number of outputs, and a single tx is the optimal way to maximize them
Whenever limit-K approaches N, you can get an up to 2x speedup by removing checked outputs (note I am assuming the outputs are always randomized). This is why in my gist I have a different formula for the “limit K” scenario and the unlimited “worst case” scenario (where K == N). @w0xlt

Wallet label count constraint (L ≤ 500) instead of protocol constraint (limited-k)

I think one important point of nuance is that the slowdown from using labels under LabelSet is consistently present. Every user who approaches L=500 will forever be up to 4x slower than BIP style (non-targeted). Under BIP style, if limit K=120 then only one user will be 4x slower (and everyone else will actually be ~3x faster) for as long as that user is being attacked.

I understand not everyone is sold on labels, and it’s uncertain what adoption will be like, but I do feel pretty confident in saying we’re far more likely to see high label usage than anyone trying to make transactions with lots of outputs to the same recipient in the foreseeable future. Labels enable some pretty common use cases, such as knowing who paid you, accompanying a payment with a description before the payment is made, and even the ability to spend funds that were sent to some set of labels separately from your other funds.

Let me test the waters with a concrete proposal - how would people feel about a K=1000 limit (~31x slower, benchmarked at 16 seconds by @theStack)? That’s a limit no future use case would conceivably hit in practice. If someone attacked you for a full day, filling 144 blocks with outputs to you, UX wise the scanning experience will be equivalent to having been offline for a month and then coming back online (and everyone else gets to scan those 144 blocks ~3x faster than normal).

real-or-random commented at 8:49 am on January 23, 2026: contributor

Other than what was already mentioned about “ordered-k”, I’m also concerned what putting restrictions on output order does to privacy. If a known SP user sends you money from the 1st and 5th output in a tx, it implies they own the 2nd, 3rd and 4th output as well.

I don’t think so. The 1st output could have k=0 and the 5th output could have k=1?

And in coinjoin scenarios, you now have to insist on a certain order for outputs, which reveals information to other coinjoin participants.

My thinking is that it’s not yet clear how exactly a CoinJoin would be created privately. So it’s difficult to judge whether ordered-k would have an impact here. But sure, not having ordered-k retains all the flexibility.

I tend to think that limited-k is the most pragmatic solution.

Let me test the waters with a concrete proposal - how would people feel about a K=1000 limit (~31x slower, benchmarked at 16 seconds by @theStack)? That’s a limit no future use case would conceivably hit in practice. If someone attacked you for a full day, filling 144 blocks with outputs to you, UX wise the scanning experience will be equivalent to having been offline for month and then coming back online (and everyone else gets to scan those 144 blocks ~3x faster than normal).

By the way, one can fit 2324 P2TR outputs in a standard transaction (see https://bitcoinops.org/en/tools/calc-size/). But that would be ~ 2.3 * 16s = 37 s….

I have really no idea what number we should aim at. I can’t follow @Sjors’s conclusions:

although it would be nice if the slowest known smartphone that can run an application with libsecp, gets an answer from the library within 1 second.

My thinking is that scanning will always take place in a background thread, and in a typical scenario, you’d scan more than one block (e.g., what @RubenSomsen mentions when he says “equivalent to having been offline for [a] month and then coming back online”). So there’s no scenario of a user actively sitting in front of a phone and waiting for the wallet app to respond. (Sure, if that scenario existed, we would want to aim for something below a second, maybe much lower. But it’s not the scenario we need to consider.)

For a background thread, perhaps even 37 s is not too crazy, considering also that the victim will be paid for the scanning in some sense. Even if your machine is 10x slower, you’re still well below the expected block time. And again: If you’re offline for a week or a month, you’d also expect your wallet to be busy for quite some time catching up (not only with SP but also just with block validation!).

Side track: Another possible consideration is multiple CPUs. Scanning of different transactions is easy to parallelize. That would suggest limiting the number N of outputs too. (In theory, one could also parallelize the inner loop, but that insight is worth nothing if we don’t implement it that way.) But on the other hand, our limit should probably consider some kind of “weakest but still meaningful” scanning client. And it’s fair to assume that this is one with just one CPU (or at least just one CPU available for scanning).

real-or-random closed this on Jan 23, 2026

real-or-random reopened this on Jan 23, 2026

RubenSomsen commented at 12:28 pm on January 23, 2026: none

I don’t think so. The 1st output could have k=0 and the 5th output could have k=1?

Ah I see, I misunderstood. In that case my argument can be dismissed.

By the way, one can fit 2324 P2TR outputs in a standard transaction

I would not be opposed to K = 2324 (71x slower) either. At that pace one targeted attack block is equivalent to coming back online after half a day. It’s 5x better than the current worst-case, where one targeted attack block takes 2.5 days.

considering also that the victim will be paid for the scanning in some sense

Let me push back on this a little - there is currently nothing in the spec stopping an attacker from sending 0 sat outputs. There’s even the possibility that all outputs are 0 sats except for one and you have to grind through all the 0 sat outputs to figure out whether it’s yours. Individual wallets could choose to set limits here, but that won’t be consistent across the ecosystem.

Another possible consideration is multiple CPUs

I think even for limited-k this could apply. Finding a result where k=1 should be very rare. From k=2 onward you could just throw all your threads at the problem. For eight threads you’d check k=2 until k=9 simultaneously, then at worst you find nothing and seven of your threads will have done some superfluous computation - harmless in practice.

But on the other hand, our limit should probably consider some kind of “weakest but still meaningful” scanning client

I imagine that any device with no more than one CPU would be a light client that also has bandwidth constraints, at which point you’d have to limit your label usage and the LabelSet approach can be applied.

Hmm, in thinking about this, I’m realizing one important factor that has been overlooked. Light clients generally do not have access to the full set of taproot outputs (you could download them, and even apply cut-through, but it’d be more bandwidth), so they’d need the LabelSet approach.

The steps are:

Receive tweak data (~33 bytes per eligible tx)
Calculate what would be your taproot output(s) (i.e. the LabelSet approach)
Check against a filter if they’re in the block (more labels == more false positives)

Perhaps we’re not getting around the fact that both approaches have use cases…

bitcoin-core deleted a comment on Jan 23, 2026

real-or-random added the label feature on Jan 23, 2026

real-or-random added the label performance on Jan 23, 2026

theuni commented at 8:38 pm on January 23, 2026: contributor

Apologies for coming in late, I’ve just been catching up on the various attacks and approaches to mitigate them.

One proposal that seems noticeably absent is a BIP-style + filter approach. E.g. rather than maintaining a pre-sorted labels cache, instead pre-generate a probabilistic (bloom/cuckoo/xor/ribbon/etc.) filter and pass that in for queries instead. Then only do a real find if it’s found in the filter. Those finds could even be done as a batch at the end.

SIMD could be leveraged to parallelize any necessary hashing operations.

IIUC, that would eliminate any output ordering concerns while requiring a theoretically minimal amount of memory compared to a sorted (hash) set. The cost of course would be a trivial amount of false-positives that would result in unnecessary lookups.

…Or am I greatly over-simplifying the problem? :)

theStack commented at 5:30 am on January 24, 2026: contributor

With a limit of K=1000, scanning takes ~16s on my machine

Update on these worst-case benchmarks: due to a silly bug in the code for reversing the matched outputs, half of them landed right at the start of all outputs, rather than within the [n-k,n-1] group at the end (d’oh!). This effectively cut down the scanning time in half; correspondingly, the corrected worst-case benchmark results are about twice as high, i.e. ~32s for K=1000 (rather than the ~16s stated previously). [1] See https://github.com/theStack/secp256k1/commit/b16369f6ed103c973201c99ff1569613da00705b for the fixed version, I’d be glad if someone is willing to run them on their machine, and/or could give the benchmark code a quick sanity check.

This probably doesn’t make finding a reasonable K limit to agree on any easier :| I haven’t given much thought on what worst-case time are still acceptable, but in terms of K I guess I would still feel fine if it is limited to the “hundreds” range, as that still seems absurdly high for any real use case.

Hmm, in thinking about this, I’m realizing one important factor that has been overlooked. Light clients generally do not have access to the full set of taproot outputs (you could download them, and even apply cut-through, but it’d be more bandwidth), so they’d need the LabelSet approach.

The steps are:

Receive tweak data (~33 bytes per eligible tx)

Calculate what would be your taproot output(s) (i.e. the LabelSet approach)

Check against a filter if they’re in the block (more labels == more false positives)

Good point. I think for that scenario we don’t need a LabelSet scanning function though: the second step is comparably simple (-> just create one taproot ouput per labeled spend pubkey with k=0), and if the filter matches and the full block/transaction data has been downloaded, the actual scanning can then be done with the BIP approach.

It took me a while to remember and find it, but luckily an API function for calculating taproot output candidates actually already exists in the take 3 PR for that exact purpose. The reason we took it out for the take 4 PR was to reduce scope and focus on full-node functionality first. Once we ship that function (and other related ones, like e.g. prevouts_summary (de)serialization) in a future release, I think light clients are all set for the above scenario, and a dedicated LabelSet scanning function should not be strictly necessary. @theuni: The label cache lookup cost is treated as negligible in comparison to all the other much more expensive (elliptic curve) operations happening in the loop iterations. With the current BIP approach API, providing an efficient lookup function is in the responsibility of the user. I’ve so far tried out a hash table implementation (khash+rapidhash) and that seemed to work well enough, even for one million entries there was no noticeable decline in performance [2]. Note that the quadratic scaling issue discussed for the BIP approach is independent on the number of labels, it can occur even with L=1, if all outputs of an adversarial tx are targeted to that single labeled address.

[1] note that we could achieve the previous benchmark results by randomizing the outputs within the scan function, to bring down the average iterations to a find a match to N/2. I’m not sure though if that’s a good idea; tending to think that modifying “list of pointer” arguments is really ugly from an API point of view (even if it’s only done for high output counts). [2] https://groups.google.com/g/bitcoindev/c/bP6ktUyCOJI/m/HGCF9YxNAQAJ

Eunovo commented at 3:41 pm on January 26, 2026: none

Let me push back on this a little - there is currently nothing in the spec stopping an attacker from sending 0 sat outputs. There’s even the possibility that all outputs are 0 sats except for one and you have to grind through all the 0 sat outputs to figure out whether it’s yours. Individual wallets could choose to set limits here, but that won’t be consistent across the ecosystem. @RubenSomsen I previously thought that it was a good idea for wallets to skip doing any work for 0 sat outputs. If we skip checking 0 outputs in a transaction, with N outputs and M outputs set to 0:

The scanner has to check at least K=0..M-1 to ensure that no potential outputs are lost. This means the scanner has to do (N-M) work M times. In a worst-case attack with N_max outputs and N_max /2 outputs set to zero, the scanner has to check K=0..(N_max/2) -1 for N_max/2 outputs that have non-zero values. This means any wallet that implements this output skipping technique will have to do this (N_max^2)/4 work, just to ensure that their wallet doesn’t have any payments in the transaction when they could have just checked K=0 and only done the N_max work once.

Hence, it’s not advisable to implement any limits that determine which outputs to skip, as that could open wallets up to unnecessary scanning work.

[1] note that we could achieve the previous benchmark results by randomizing the outputs within the scan function, to bring down the average iterations to a find a match to N/2. I’m not sure though if that’s a good idea; tending to think that modifying “list of pointer” arguments is really ugly from an API point of view (even if it’s only done for high output counts). @theStack Isn’t this already done for the LabelSet approach? The outputs must be sorted to enable binary searches. If we already accept sorting the tx_outputs, what is the reason not to accept random shuffling?

theStack commented at 5:13 pm on January 26, 2026: contributor

[1] note that we could achieve the previous benchmark results by randomizing the outputs within the scan function, to bring down the average iterations to a find a match to N/2. I’m not sure though if that’s a good idea; tending to think that modifying “list of pointer” arguments is really ugly from an API point of view (even if it’s only done for high output counts).

@theStack Isn’t this already done for the LabelSet approach? The outputs must be sorted to enable binary searches. If we already accept sorting the tx_outputs, what is the reason not to accept random shuffling?

That’s a good point. The major difference is that for the LabelSet approach the sorting is important even for the common case (=no match) scenario for performance reasons ($L * log(N)$ vs. $L * N$ output comparisons if we do it with binary vs. linear search), while for the BIP approach it is only relevant to reduce scanning time of pathological transactions. I.e. it boils down to a philosophical question whether we should include “optimizing for the worst-case” code paths, and whether we value a clean API more than a 2x speed-up for a scenario that the majority of users will likely never hit in their life-time. I’m leaning towards the former, but am still open for both. ~~If I’m not mistaken, sorting outputs should have the same effect as shuffling, given that the generated outputs are pseudorandom~~ (EDIT: sorting doesn’t work, see #1799 (comment)). We could:

~~sort~~/randomize the tx outputs internally (as in the LabelSet approach)
shift the responsibility to the user, i.e. recommend or demand from the user to ~~sort~~/randomize outputs prior calling (by mentioning that in the API docs)
not do anything of the above
(anything else?)

Happy to hear opinions about that.

w0xlt commented at 10:31 pm on January 26, 2026: contributor

Small note: the LabelSet scan path (PR #1792) still has an optimization opportunity: batch inversion for the per-label candidate points. That reduces field inversions from O(L) to O(L/chunk) per k, which materially improves scanning time at larger label counts. See the benchmark discussion here:

https://gist.github.com/theStack/25c77747838610931e8bbeb9d76faf78?permalink_comment_id=5897811#gistcomment-5897811

Also, the hybrid approach still seems valuable for “power user” label counts, even if the initial API is a bit clunky. The prototype heuristic (8 * n_tx_outputs < n_labels + 2) is a reasonable starting point but can be refined using Ruben’s cost calculations.

real-or-random commented at 8:33 am on January 27, 2026: contributor

In the BIP approach:

I.e. it boils down to a philosophical question whether we should include “optimizing for the worst-case” code paths, and whether we value a clean API more than a 2x speed-up for a scenario that the majority of users will likely never hit in their life-time. I’m leaning towards the former, but am still open for both. If I’m not mistaken, sorting outputs should have the same effect as shuffling, given that the generated outputs are pseudorandom.

We can’t really rely on outputs being pseudorandom. The non-SP outputs don’t need to be random at all, and the other values can be ground by the sender. It should suffice to randomize the direction in which the outputs are processed, i.e., whether you start from the top or the bottom. Thus, we only need one random bit per transaction. We could totally let the user pass a randomness argument and hash it together with the txid, or perform some more efficient PRF to get a random bit.

real-or-random commented at 9:41 am on January 27, 2026: contributor

Labels enable some pretty common use cases, such as knowing who paid you […]

Indeed, and I consider this almost a requirement for a good UX. (Unless your use case is donations, which is a valid use case but just not the only one.) @w0xlt approach is a great implementation, but I feel that typical clients will need as many labels as incoming payments, and this makes the BIP approach still quite attractive.

Hmm, in thinking about this, I’m realizing one important factor that has been overlooked. Light clients generally do not have access to the full set of taproot outputs (you could download them, and even apply cut-through, but it’d be more bandwidth), so they’d need the LabelSet approach.

Hm, indeed. I just took a look at the take 3 PR again, and this is a bit hidden there in the API and also in the example because that one doesn’t use labels for the light client.

But doesn’t this also mean that light clients are inherently limited to a small number of labels due to bandwidth? They need to query every candidate output in a light client protocol… Perhaps some tricks could be played, e.g., reusing a label for a new payment after the first payment has been received. I don’t know.

Perhaps we’re not getting around the fact that both approaches have use cases…

Yes, this appears to be true.

If we use the BIP approach for full clients, then light clients will be inherently more restricted in functionality and not just in terms of trust/privacy. And how are we going to educate wallet devs and users about this?

If we use the LabelSet approach for full clients, then even those can’t have a massive set of labels, i.e., they can’t have a massive set of incoming payments (assuming they want to keep track of them).

Eunovo commented at 10:58 am on January 27, 2026: none

We can’t really rely on outputs being pseudorandom. The non-SP outputs don’t need to be random at all, and the other values can be ground by the sender. It should suffice to randomize the direction in which the outputs are processed, i.e., whether you start from the top or the bottom. Thus, we only need one random bit per transaction. We could totally let the user pass a randomness argument and hash it together with the txid, or perform some more efficient PRF to get a random bit.

I may have misunderstood your suggestion, but to divide the expected runtime by 2, we need to perform a uniform random search. Randomising the direction means that the attacker has a 1/2 chance of triggering the worst-case search scenario; full uniform random shuffling will always ensure the worst-case search runtime is divided by 2. @theStack Skipping already matched outputs in the scanning algorithm doesn’t produce any speedup in the worst-case ordering without random shuffling, because all the matched outputs are at the end. A Fisher-Yates shuffle implementation should be O(N) time, so it shouldn’t meaningfully affect scanning time, and the changes to the common case scanning time should be negligible.

real-or-random commented at 12:48 pm on January 27, 2026: contributor

Randomising the direction means that the attacker has a 1/2 chance of triggering the worst-case search scenario;

I admit I haven’t spent much thought on this, and so I may very well be wrong here. But my current thinking is that if the algorithm needs to look at i outputs to find a specific one at position i when starting from the top, then it will need to look at N-i when starting at the bottom (+/- 1, let’s not care). On average, that’s i+(N-i)/2 = N/2. Is this wrong? (I agree, the attacker has a 1/2 chance of triggering the worst case, but I think such an attacker will also trigger the best case with probability 1/2.)

Eunovo commented at 1:39 pm on January 27, 2026: none

I admit I haven’t spent much thought on this, and so I may very well be wrong here. But my current thinking is that if the algorithm needs to look at i outputs to find a specific one at position i when starting from the top, then it will need to look at N-i when starting at the bottom (+/- 1, let’s not care). On average, that’s i+(N-i)/2 = N/2. Is this wrong? (I agree, the attacker has a 1/2 chance of triggering the worst case, but I think such an attacker will also trigger the best case with probability 1/2.

Indeed, the attacker can also trigger the best case with a probability of 1/2, but the runtime will have a high variance (very fast and very slow) as opposed to random shuffling, which will always end up around the N/2 runtime.

It’s also possible to randomise the direction N times, so that half of the time the result is in front and the other half is at the end (if the outputs are sorted in K ascending or descending order). We can avoid shuffling this way, but I don’t know how to provide the randomness. Going by what you said earlier, we will need N random bits?

w0xlt commented at 6:27 pm on January 27, 2026: contributor

Perhaps we’re not getting around the fact that both approaches have use cases…

I might be missing something, but doesn’t the hybrid approach address all the scenarios discussed here, without requiring any protocol restrictions?

theStack commented at 6:49 pm on January 27, 2026: contributor

@real-or-random:

We can’t really rely on outputs being pseudorandom. The non-SP outputs don’t need to be random at all, and the other values can be ground by the sender.

Oh right, I forgot about non-SP outputs, and also oversimplified the SP part in my head, assuming that all recipients would have the same labeled spend pubkey, i.e. L=1 (with differently labeled recpient addresses, each SP output can be ground individually). So sorting outputs doesn’t help indeed.

Hm, indeed. I just took a look at the #1698 again, and this is a bit hidden there in the API and also in the example because that one doesn’t use labels for the light client.

But doesn’t this also mean that light clients are inherently limited to a small number of labels due to bandwidth? They need to query every candidate output in a light client protocol… Perhaps some tricks could be played, e.g., reusing a label for a new payment after the first payment has been received. I don’t know.

That’s definitely the case, yes. I believe @RubenSomsen has done some calculations on what number of labels is still feasible for light clients, in light of e.g. the increasing false positive rate of BIP158 filters.

If we use the BIP approach for full clients, then light clients will be inherently more restricted in functionality and not just in terms of trust/privacy.

I can’t follow this conclusion, can you elaborate? I’d agree to the statement “if we only release the BIP approach for now, the SP module is not useful for light clients (yet)”, and think that’s still fine for a first release; focusing on full-node functionality first was in fact the main motivation behind opening the take 4 PR (but again, not sure if you meant it like that).

My current understanding is: for light clients, neither a BIP nor LabelSet scanning function alone (e.g. take 5 as-is) are helpful. What they need is a function to derive outputs for each of their spend pubkeys (unlabeled and labeled) and k=0 (that’s exactly what _recipient_create_output_pubkeys in the take 3 PR does), in order to check if a block/tx is relevant for them in the first place. Only if that’s the case (i.e. one of the derived outputs is included in the downloaded tx’s outputs, and thus a k=0 match has been detected manually by the user), then a “full” scanning function would come into play, for also finding the potential remaining outputs (k>=1).

So concluding from that, I’d think that the ingredients “BIP approach scanning function” (released now, already useful for full-nodes on its own) and “k=0 outputs creation function” (released later) are sufficient for proper light client support, without noticeable drawbacks in performance. Light client scanners will spend most of their time with the outputs creation function (-> common case, no match), and only ever hit the “full” scanning functions for matches, so it shouldn’t matter too much in the grand scheme whether BIP or LabelSet approach is used for that last step. Did I miss something? The outputs creation function is in some sense a down-sized variant of LabelSet approach. For optimizing it, @w0xlt’s batch inversion idea should work as well and could be explored. @Eunovo:

@theStack Skipping already matched outputs in the scanning algorithm doesn’t produce any speedup in the worst-case ordering without random shuffling, because all the matched outputs are at the end.

Yes. That insight was discussed last week (see #1799 (comment) and #1799 (comment)).

A Fisher-Yates shuffle implementation should be O(N) time, so it shouldn’t meaningfully affect scanning time, and the changes to the common case scanning time should be negligible.

I agree that performance for the common case shouldn’t be an issue if we sort. Even if we were worried about that, we could do it conditionally, e.g. only if the number of outputs is larger than a critical number, or even only after the first match (k=0). The point of my previous post was mostly about introducing more complexity and a slightly less ideal API (modifying “arrays of pointers” that are supposed to be only inputs is a bit ugly; on the other hand, we do it on the sender API as well…).

theStack commented at 0:04 am on January 28, 2026: contributor

Perhaps we’re not getting around the fact that both approaches have use cases…

I might be missing something, but doesn’t the hybrid approach address all the scenarios discussed here, without requiring any protocol restrictions?

Looking at the worst-case scenario: if the number of labels is very large (let’s say L=100_000, to mention a number that is currently mentioned in BIP-352) and an adversarial tx with N=K=23255 is scanned, then both the BIP and LabelSet approach would suffer from a very high scanning time [1], and hence the hybrid approach wouldn’t help us to mitigate.

For supporting use-cases with a high amount of labels, I don’t think there is a way around introducing protocol limits (unless we feel okay with worst-case scanning times in the multiple minutes range, but recent discussions haven’t indicated that).

[1] Under the assumption that a match is found after iterating over L/2 labels on average with LabelSet, that would be N*L/2 = ~1.16 billion inner loop iterations in total, each doing one EC point addition. With BIP style scanning, we have an upper bound of N*N/2 = ~270 million inner loop iterations, each doing two EC point additions. So for this specific example, LabelSet seems to do strictly worse, even without the “randomize outputs” speedup for the BIP approach. (Maybe we should benchmark it though to verify, these calculations e.g. don’t take potential batch inversion optimizations into account.)

w0xlt commented at 6:16 pm on January 28, 2026: contributor

if the number of labels is very large (let’s say L=100_000 … both the BIP and LabelSet approach would suffer from a very high scanning time

Yes, I was thinking about PR #1792 (L ≤ 500). If we want to support a large number of labels, the hybrid approach alone is unlikely to handle the worst-case scenario. I’ll run some benchmarks later.

real-or-random commented at 1:51 pm on January 29, 2026: contributor

For randomization:

Indeed, the attacker can also trigger the best case with a probability of 1/2, but the runtime will have a high variance (very fast and very slow) as opposed to random shuffling, which will always end up around the N/2 runtime.

It’s also possible to randomise the direction N times, so that half of the time the result is in front and the other half is at the end (if the outputs are sorted in K ascending or descending order). We can avoid shuffling this way, but I don’t know how to provide the randomness. Going by what you said earlier, we will need N random bits?

Okay, yeah, we may want to randomize the direction in every loop iteration. This still seems to be simpler (and more efficient?) than a random shuffle. Obtaining random bits is an orthogonal issue in the sense that it will be necessary for any kind randomization we may want to use.

Taking a step back, I believe this randomization discussion is relevant to this thread, but it’s not the primary thing that is blocking us from making progress. Perhaps what we could do now is to add an (optional?) rand32 arg, i.e., 32 bytes of caller-provided randomness, to the scanning API. The value could even be entirely unused in the first iteration that we merge, but this leaves us all doors open for future refinements.

theStack commented at 8:06 pm on January 29, 2026: contributor

Taking a step back, I believe this randomization discussion is relevant to this thread, but it’s not the primary thing that is blocking us from making progress.

Agree. The ~2x speedup gained from randomization is nice and we should implement it in some way or another, but it doesn’t address the fundamental issue of quadratic scaling. Doing only that, we’re unfortunately still operating in the same order of magnitude.

In my view, the primary blocker to move forward is reaching an agreement on a concrete K_max limit. This is based on my impression that there is general consensus regarding the importance of use cases with a high number of labels, which suggests a BIP-style scanning approach with a protocol fix. Light clients are limited by the number of labels they can use, but that’s an inherent problem, and as far as I understood not something a secp256k1 module could solve (releasing the LabelSet approach scanning function in take 5 wouldn’t change anything about that fact, as it is only faster if the label count is very low; the main problem though is not having full transaction and prevouts data in the first place).

I’ll update the take 4 draft PR, by implementing the limitation with an arbitrary value (maybe with K=1000, since that was suggested before and seems to at least not have received strong objections so far) on both sender and receiver sides, and include the worst-case benchmarks shown above, so it’s easier for reviewers to form an opinion and experiment with other K_max values.

Happy to hear feedback about the approach to move on, I’ll also ping developers of existing SP wallets to chime in the discussion. One question might be if this is even the right place to discuss protocol changes, or should that happen on the mailing list instead, and/or a PR in the bips repository (my intuition would be to only open a BIPs PR after we already have a consensus on what the limit would be)?

Perhaps what we could do now is to add an (optional?) rand32 arg, i.e., 32 bytes of caller-provided randomness, to the scanning API. The value could even be entirely unused in the first iteration that we merge, but this leaves us all doors open for future refinements.

Makes sense to stay flexible. Before going ahead with that, an alternative idea: instead of an explicit parameter, would it be an option to leverage the secp256k1_context for that purpose, e.g. with a _silentpayments_context_randomize function (that could still be introduced in the future)? I’m asking this since the API is already quite large w.r.t. number of parameters (currently the BIP approach scanning function has 10 parameters, with the additional rand32 it would be 11…), and passing in a different randomness each time the scan function is called seems not needed; it should be sufficient to set it once at the start when the module is initialized (reusing seems fine, it doesn’t gain the attacker any advantage)?

real-or-random commented at 9:22 am on January 30, 2026: contributor

would it be an option to leverage the secp256k1_context for that purpose, e.g. with a _silentpayments_context_randomize function (that could still be introduced in the future)? […] and passing in a different randomness each time the scan function is called seems not needed; it should be sufficient to set it once at the start when the module is initialized (reusing seems fine, it doesn’t gain the attacker any advantage)?

Oh sure. We could reuse the same randomness every time. It’s just the secret of some PRF and the attacker won’t learn about it if we just throw it into some SHA256 or similar. [1] I’d even go a step further. We already have functionality to randomize the context (for protecting ecmult_gen), so we could simply reuse that randomness. This means we don’t have to change the API at all, great call.

[1] This reminds me of the secret key in the SipHash PRF even. It’s “secret” but also only relevant to performance in the end.

real-or-random commented at 10:24 am on January 30, 2026: contributor

If we use the BIP approach for full clients, then light clients will be inherently more restricted in functionality and not just in terms of trust/privacy.

I can’t follow this conclusion, can you elaborate? […]

My current understanding is: for light clients, neither a BIP nor LabelSet scanning function alone (e.g. take 5 as-is) are helpful. What they need is a function to derive outputs for each of their spend pubkeys (unlabeled and labeled) and k=0 (that’s exactly what _recipient_create_output_pubkeys in the take 3 PR does), in order to check if a block/tx is relevant for them in the first place. Only if that’s the case (i.e. one of the derived outputs is included in the downloaded tx’s outputs, and thus a k=0 match has been detected manually by the user), then a “full” scanning function would come into play, for also finding the potential remaining outputs (k>=1).

Yes, that’s my understanding, too. I think the confusion comes from me using the term “LabelSet approach” a bit more broadly.

My thinking is, and I believe that matches Ruben’s insight from above, that the performance of light clients will at least have a linear factor in the number L of labels. As you say, a light client will need to derive L outputs (k=0 each) per eligible transaction and will also need to send these over the wire as part of some light client protocol, which checks inclusion in the UTXO set. This means that a light client needs o(L) computation and o(L) bandwidth for scanning. This is similar in performance to the LabelSet scanning approach for full clients, and this makes it slow for use cases with a large number of labels. In contrast, the running time of the BIP approach does not really depend on the number of labels (under our working assumption that lookups in the label cache are O(1)).

This means that, if we move forward with the BIP approach, then, for instance, a wallet generated on a full client may not be portable to a light client due to the two scanning methods having different performance characteristics. But perhaps this is simply a drawback that we need to accept if we want to move forward at all.

edit: Sorry, I had submitted this accidentally when I was still in the middle of writing.

theStack commented at 3:35 pm on February 3, 2026: contributor

@real-or-random: Thanks for elaborating, that pretty much matches my understanding.

This means that, if we move forward with the BIP approach, then, for instance, a wallet generated on a full client may not be portable to a light client due to the two scanning methods having different performance characteristics. But perhaps this is simply a drawback that we need to accept if we want to move forward at all.

I agree that this is a potential drawback. If we document that trade-off properly in the API docs and explain the implications of generating/scanning a high number of labels, I tend to think that’s acceptable though?

I’ve drafted a potential change in BIP-352 for introducing a K_max protocol limit as discussed before: https://github.com/theStack/bips/commit/961d1442139ceecd6c0cc5775ef911d69aabed4c (the value 1000 should be seen as a placeholder; what changes with different values is the last sentence in the footnote, e.g. with K_max=100 it would be “order of seconds” instead). The “take 4” PR has also been updated to enforce that change: #1765 (comment)

To gauge feedback on this protocol restriction in the broader ecosystem, I’m pinging developers of existing and upcoming wallets with Silent Payments support: @benma @craigraw @cygnet3 @Eunovo @jvgelder @nymius @pythcoiner @setavenger @Sosthene00 Please comment if you have any concerns about a K_max protocol restriction, for current or potential future SP use cases.

(As mentioned in a previous comment, if someone feels the discussion should happen elsewhere, like e.g. the mailing list, I’m fine to move over there as well, though I’m not sure if that’s necessary).

real-or-random commented at 6:46 pm on February 3, 2026: contributor

I agree that this is a potential drawback. If we document that trade-off properly in the API docs and explain the implications of generating/scanning a high number of labels, I tend to think that’s acceptable though?

I tend to agree. When I wrote my comment above, what I had in mind was using labels for different incoming payments in the sense that you create a fresh label for every incoming payment. But that’s totally the wrong way because it defeats the purpose of silent payments, namely to avoid the sender and the recipient needing to interact for every payment.

With that in mind, I suspect that the average SP recipient user doesn’t have too many labels, and then light client scanning will be fine.

Of course, labels are still useful to distinguish payments for different purposes, e.g., some organization receiving donations for multiple distinct purposes, or an exchange receiving deposits could have a label per user. The latter potentially requires a huge number of labels, but that’s certainly an advanced use case, and the exchange won’t want to use a light client…

I’m curious if this matches @RubenSomsen’s thinking.

(As mentioned in a previous comment, if someone feels the discussion should happen elsewhere, like e.g. the mailing list, I’m fine to move over there as well, though I’m not sure if that’s necessary).

The discussion started here because the issue was caught when reviewing the PR, and there’s nothing wrong with that. It would feel a bit unnatural at this stage to move the discussion to the list now (if you ask me.) In any case, we would like to change the BIP first, and this will happen in the BIPs repo. It may still be a good idea to mention on the mailing list that there’s a discussion going on here, with the note that people can also post to the list if they really don’t want to use GitHub. @theStack If you agree, can you post to the list?

nymius commented at 7:12 pm on February 3, 2026: none

I’ve been following the discussion, and I think the rationale for limiting k is sane. I don’t have a real counter example yet where this limitation may be detrimental. The ones I have tried are mainly related to exchanges or larger custodians, but they lead me nowhere. Maybe Cake Wallet developers have some ideas about it.

I think that mentioning this on the mailing list may bring new eyes with other perspectives to consider the limit.

RubenSomsen commented at 1:09 am on February 4, 2026: none

I’m curious if this matches @RubenSomsen’s thinking.

It’s hard to predict how people are going to use labels, but if there are going to be SP only wallets that support a large number of labels, then users may still opt to hand out a lot of labels simply to track their incoming payments in the same way they might do with regular wallets today. That may not play into the strengths of SP, but that doesn’t necessarily mean we will not see some of that behavior in practice.

It’s also possible that we’d see some light clients that opt to download all the taproot outputs to do BIP style scanning (i.e. supporting 100k labels). It’s more bandwidth, but this can be mitigated to some extent using cut-through, similar to the tweak data.

My general sense is to try and allow for these possibilities, even if it’s not at all certain they will come into existence.

craigraw commented at 9:29 am on February 4, 2026: none

I agree that this is a potential drawback. If we document that trade-off properly in the API docs and explain the implications of generating/scanning a high number of labels, I tend to think that’s acceptable though?

I don’t agree. The benefit of a high number of labels (which is relatively weak in most cases) does not outweigh the benefit of user optionality to switch wallet software. Putting it another way, I would take a working ecosystem of silent payments wallets using different scanning techniques over a theoretical advantage of tracking incoming payments with labels. Every payment needs a payment confirmation, and we should explore ways of doing this that align with static payment codes rather than compromising the ecosystem with an approach that does not scale well and runs contrary to its design goals.

Please comment if you have any concerns about a K_max protocol restriction, for current or potential future SP use cases.

IMO any use of labels over a small number greatly compromises the practical ability of light clients to recover the wallet. I believe data makes the point best. I estimated the required time to scan 1000 labels using Frigate on an 8x RTX 3060 GPU machine over a 64 week period:

	Time (ms)	Tx/sec
No label	16,949	4,199,074
Change label	17,042	4,176,159
10 labels	19,334	3,681,086
30 labels	25,109	2,834,446
1000 labels (est.)	291,000 (4.9 min)	244,000

RubenSomsen commented at 11:41 am on February 4, 2026: none

The benefit of a high number of labels (which is relatively weak in most cases) does not outweigh the benefit of user optionality to switch wallet software

While I don’t share the conclusion, I recognize the concern. It is plausible we will have two types of wallets: those that support a large number of labels and those that do not. The former will be cross-compatible with the latter but not vice versa.

greatly compromises the practical ability of light clients to recover the wallet

Allow me to point out that we need more precise terminology here. In this thread we have been talking about two types of wallets - those that run their own full node and those that use auxiliary data to do the scanning. The latter has (perhaps imprecisely) been referred to as a “light client”. What you are referring to here is a third model of “outsourced scanning”.

I believe data makes the point best

I think this is an interesting model and it needs to be further explored, and the field of parallel GPU computation seems under-explored in general, but I think these benchmarks need more work.

The benchmarks only implement the LabelSet approach - from what I understand that was design choice to avoid the need to receive, store, or re-calculate a user’s labels (up to ~4MB). I am not necessarily convinced that is the right tradeoff.

And from what I saw, there were also numbers produced saying 8x RTX 3080 was several times slower than 8x RTX 3060 - which until investigated further makes these numbers seem somewhat unreliable.

The drop-off in performance with more labels also seems more steep than what we’ve encountered in other benchmarks, but the general trend is indisputably correct - LabelSet does not perform well at a high number of labels.

But regardless of where these benchmarks end up, we still have two distinct label limits for wallets. What I’d like to see is for there to be two clear standards to maximize cross-compatibility between the wallets that support the lower number of labels. Otherwise some wallets support 10 labels and some 25, and things do get messy.

With good engineering we ought to be able to get the more restricted wallets to still support up to 100 labels. Even in the outsourced scanning model without storing data it’s doable to re-calculate 100 labels for every scan session and do BIP style scanning. And for light clients that rely on auxiliary data it’s possible to implement custom block filters that will perform well enough for this use case. That said, the path with least resistance is certainly to either have 100k labels or none (except change) as the two limits.

Summarizing the problem neutrally: BIP style scanning, which is required to support many labels, cannot be applied across the board. For light clients that rely on auxiliary data this is because the bandwidth optimal way does not give access to taproot outputs (though some clients may opt to use more bandwidth). For outsourced scanning the results seem inconclusive, but one problem stems around the fact that the full set of labels needs to be available for every user when scanning. Given that BIP style scanning may not see full adoption, the question arises whether this is a good enough reason to drop label support altogether in order to maximize cross-compatibility.

craigraw commented at 1:07 pm on February 4, 2026: none

And from what I saw, there were also numbers produced saying 8x RTX 3080 was several times slower than 8x RTX 3060 - which until investigated further makes these numbers seem somewhat unreliable.

This is incorrect afaik (see benchmarks) although it does appear that more independent memory controllers and PCIe lanes is a factor. I agree that further research into this is required, but it’s somewhat unrelated to this discussion: with an outsourced scanning model that uses ephemeral client keys, you cannot use the BIP style scanning approach. I think we agree here anyway.

I believe it is worth considering other approaches to payment origination/confirmation before committing to a “two types of sp wallet” future.

RubenSomsen commented at 1:56 pm on February 4, 2026: none

This is incorrect afaik (see benchmarks)

I don’t want to publish anything that was not explicitly made public, but those benchmarks do not contain the 3080, while the ones I saw in a spreadsheet do. Am I wrong to assume you have seen them?

it does appear that more independent memory controllers and PCIe lanes is a factor. I agree that further research into this is required

Yes, I am very much interested in having the bottlenecks mapped out.

an outsourced scanning model that uses ephemeral client keys, you cannot use the BIP style scanning approach

While I don’t think this affects the more general conclusion that some wallets likely won’t be supporting BIP style scanning, I don’t fully agree with that statement. At a modest number of labels (e.g. 100), the labels can be generated on-the-fly without prohibitive cost, and then the BIP style approach will be faster. Technically it would also be possible for the user (or a third party on behalf of the user) to send all of its label data (100k = ~4MB) to the server. And depending on how you define “ephemeral”, storing encrypted label data for which you forget the decryption key (most logically the scan private key) would also work.

I believe it is worth considering other approaches to payment origination/confirmation

What did you have in mind? Notifications?

One more general high level observation: I think the issue is that the “right” choice in terms of protocol constraints very much depends on what kind of wallet ecosystem consequently will come into existence - this is difficult to predict and seems to be where the majority of the disagreement originates from.

macgyver13 commented at 4:34 pm on February 4, 2026: none

I’d like to clarify the context around the benchmarking data that’s been referenced in this discussion.

The benchmark spreadsheet I shared was intended to inform hardware selection for hosting a scanning cluster, not to make protocol-level recommendations. The data was collected on virtual instances from vast.ai without performance guarantees, so accuracy may vary. I shared these results informally with community members to provide general guidance, but intentionally avoided publishing them as definitive facts because the methodology wasn’t rigorous enough. My goal was to support informed discussion, not to establish concrete benchmarks.

And from what I saw, there were also numbers produced saying 8x RTX 3080 was several times slower than 8x RTX 3060 - which until investigated further makes these numbers seem somewhat unreliable.

The difference between 8x RTX 3060 and 8x RTX 3080 based on my testing is not sufficient to draw strong conclusions. I have had retesting some of the results on my list but have kept putting it off for several months now. Using verifiable hardware and establishing a trusted benchmarking process are probably the next best steps if such data is deemed valuable. I am happy to support that effort - please reach out if you have resources.

Re: benchmarks I will stand behind the claim that more “lower cost” GPUs do perform better than fewer “higher cost” GPUs under multi-user workloads.

I want to support this important conversation about scanning approaches while ensuring the data is used appropriately - as directional input rather than authoritative measurements.

defenwycke commented at 9:10 pm on February 4, 2026: none

Solving BIP-352’s Scanning & Labels Problem

Hello all,

I’m sorry I didn’t see this sooner. I’ve been working on a project that implements Silent Payments and ran into the same scaling issues you’re discussing. Thought I’d share what I landed on - maybe it helps.

The Scanning Problem

BIP-352’s current approach:

For each transaction: For k in 0..K: tweak = hash(shared_secret || k) check if output matches

K can explode. More outputs = more iterations. It’s O(transactions × K) which doesn’t scale.

The fix is simple: include the output index in the tweak.

tweak = hash(shared_secret || output_index)

That’s it.

Now:

- No K iteration at all
- Each output position is inherently unique
- Scanning becomes O(n) - one check per output
- The output index is already known from the transaction structure

If you need some entropy for edge cases (same recipient multiple times in one tx), bound a nonce to something small like 0-99:

tweak = hash(shared_secret || output_index || nonce)

But honestly, output_index alone handles 99% of cases.

The Labels Problem

Labels are trickier. The current approach tweaks the spend pubkey:

labeled_spend = spend_pubkey + hash(scan_pubkey || label) * G

This has issues:

1. If scan_pubkey is public, anyone can enumerate all your labeled addresses
2. Sharing multiple labeled addresses reveals they're all you (same scan_pubkey)
3. It's still O(n × labels) for full scanning

My solution: labels are just encrypted metadata, not cryptographic - labels don’t need to be in the address derivation at all. They’re UX, not a crypto feature.

How it works:

1. Recipient creates label locally: { index: 1, name: "Donations" }
2. Recipient shares address with hint: sp1<addr>?l=1
3. Sender sees ?l=1, includes it in encrypted payload
4. Address derivation is standard (no label involvement)
5. Recipient decrypts payload, sees label: 1, looks up locally → "Donations"

The label number is meaningless without the recipient’s local dictionary.

- Alice's label=1 means "Donations"
- Bob's label=1 means "Salary"
- Observer sees encrypted blob with some number in it
- Learns nothing

Privacy properties: - Scanning stays O(n) - labels don’t affect it - No address linkability (all payments use same base keys) - Label semantics are recipient-local (never transmitted) - Everyone uses labels (default to 0), so no fingerprinting

Implementation:

PaymentMetadata {
    label: u32,           // Just a number
    memo: Option<String>, // Optional sender note
}
// Encrypted with ECDH shared secret, included in tx

I use fixed-size encryption (80 bytes) to prevent size-based fingerprinting. Memo capped at 59 bytes - plenty for "Invoice [#123](/bitcoin-core-secp256k1/123/)" type stuff.

Summary

BIP-352 Currently:

K iteration - O(n × K)
Labels scanning - O(n × labels)
Label linkability - Same scan_pubkey links addresses

Simpler fix:

K iteration - Include output_index in tweak → O(n)
Labels scanning - Labels in encrypted metadata → O(n)
Label linkability - Labels not in derivation → unlinkable

The key insight: separate the crypto layer from the UX layer. Address derivation should be simple and fast. Labels are just metadata that rides along encrypted.

I estimate if you implement the suggestions you will be ~400x faster than limited-K.

BIP-352 current (K=1000) / single tx = 16.6 sec / with 10 labels = 166 sec BIP-352 ordered-K / single tx = 0.4 sec / with 10 labels = 4 sec Output index + metadata labels / single tx = 0.4 sec / with 10 labels = 0.4 sec

Happy to discuss if anyone wants to dig into the details. Good luck.

theStack commented at 9:13 pm on February 4, 2026: contributor

@craigraw: Thanks for chiming in. Providing a scanning function that scales well with the number of labels doesn’t necessarily mean users will take advantage of that capability immediately (or ever). In my view, the motivation behind releasing the BIP approach now is not primarily to encourage using lots of labels, but to remain most flexible without having to change the API later, if the demand for those use cases ever appears. Other than the small protocol fix needed to avoid the quadratic scaling issue, I don’t see much drawbacks in releasing the BIP approach.

If an API doc warning like “creation of more than N labels is strongly discouraged for wallet compatibility reasons, only do this if you are fully aware of the drawbacks” or similar is deemed insufficient (it seemed that this alone didn’t convince you), we could go one step further and e.g. restrict the m range for the label creation function to raise awareness on the trade-off directly at the root. Of course, all that is only making it slightly harder, and if a user really wants to use a large number of labels, it’s trivial to side-step a restriction like this as well.

Please comment if you have any concerns about a K_max protocol restriction, for current or potential future SP use cases.

IMO any use of labels over a small number greatly compromises the practical ability of light clients to recover the wallet. […]

This statement is somewhat unrelated to the quoted sentence and the concrete BIP change proposal, which doesn’t even mention the term “label”; scanning with the BIP approach is vulnerable to the quadratic scaling issue even if only a single label is scanned. I assume your argument is basically “no wallet should scan using the BIP approach, therefore there is no need to fix this issue in the first place?”.

craigraw commented at 7:25 am on February 5, 2026: none

This statement is somewhat unrelated to the quoted sentence and the concrete BIP change proposal

It was, apologies - my statement was rather a contribution to this general discussion. To answer your question directly, I have no specific concerns with a K_max protocol restriction.

I assume your argument is basically “no wallet should scan using the BIP approach, therefore there is no need to fix this issue in the first place?”.

No, this is not my argument. Wallets should explore different approaches to scanning depending on their needs. Exchanges may use specialised implementations with BIP style scanning to handle very large numbers of labels. But this module might be considered the reference implementation for scanning, so decisions made here will affect the whole ecosystem.

If an API doc warning like “creation of more than N labels is strongly discouraged for wallet compatibility reasons, only do this if you are fully aware of the drawbacks”

What number did you have in mind for N? I think this is really the critical question, since it may influence the winners and losers in scanning approaches if we are to aim for wallet compatibility.

Other than the small protocol fix needed to avoid the quadratic scaling issue, I don’t see much drawbacks in releasing the BIP approach.

I assume by “the BIP approach” you also mean the accompanying API doc. One drawback is the risk that the API doc warning is ignored and we end up with a fractured ecosystem of wallets that cannot reliably recover each other’s funds. Another drawback lies in making public servers or light client scanning too slow in trying to support some documented minimum number of labels. Although we can’t quantify these risks precisely, both of them are at some degree probably fatal to widespread adoption.

real-or-random commented at 3:20 pm on February 5, 2026: contributor

Two quick remarks:

I believe it is worth considering other approaches to payment origination/confirmation before committing to a “two types of sp wallet” future.

It is certainly worth considering other approaches, but I haven’t seen many of these worked out.
Assuming people will want to use both types, I doubt that this future is avoidable at all. Already from an implementation point of view, we wouldn’t want to have a slower-than-possible full scanning algorithm just because other approaches to scanning do inherently scale worse.

real-or-random commented at 3:27 pm on February 5, 2026: contributor

The fix is simple: include the output index in the tweak.

tweak = hash(shared_secret || output_index)

That’s a stronger form of “ordered k” with even more restrictions and, arguably, an annoying API. The caller needs to know upfront not just the order of all SP outputs but even their position. This could be very annoying if you want to mix SP outputs with non-SP outputs. And makes the drawback pointed out by @jonasnick even worse, namely that if you put the outputs at the wrong position, they’ll be lost (or at least difficult to recover – but note that the recipient may not even be aware that there should have been an incoming payment, so there’s no reason for them to even consider some difficult recovery procedure).

The Labels Problem

03. Sender sees ?l=1, includes it in encrypted payload

This requires sending some payload as part of the tx, and this will be a fundamental change to the SP design. It’s too late to talk about this for v1.

theStack commented at 6:04 pm on February 5, 2026: contributor

If an API doc warning like “creation of more than N labels is strongly discouraged for wallet compatibility reasons, only do this if you are fully aware of the drawbacks”

What number did you have in mind for N? I think this is really the critical question, since it may influence the winners and losers in scanning approaches if we are to aim for wallet compatibility.

Unfortunately I don’t have a good answer to that, and think that probably needs more research. The obvious conservative pick for maximum wallet compatibility would be N=1, since this is the bare minimum to still be compliant with BIP-352 recommendations.

Other than the small protocol fix needed to avoid the quadratic scaling issue, I don’t see much drawbacks in releasing the BIP approach.

I assume by “the BIP approach” you also mean the accompanying API doc.

Yes.

One drawback is the risk that the API doc warning is ignored and we end up with a fractured ecosystem of wallets that cannot reliably recover each other’s funds. […]

I see your concerns, but want to point out that this problem would not disappear by avoiding a certain scanning approach. If we assumed for one moment that the BIP approach doesn’t exist and all wallets used the LabelSet approach, then we could still end up with a large discrepancy between the number of labels created. For some light client wallets it’s likely reasonable to only ever scan for the change label, while other wallets attached to full nodes might be targeted to only run on powerful servers that have the computing power to even scan for hundreds or thousands of labels (used e.g. by exchanges), even without the attractive “each additional label is free” property of the BIP approach.

I’m not sure how much a secp256k1 module could help prevent the “fractured ecosystem” issue, other than providing proper API docs. I do agree that it would be irresponsible to recommend creating lots of labels as a default, without at least mentioning the potential wallet interoperability drawbacks. The current PR #1765 has a TODO for that in the API docs, concrete suggestions are welcome.

defenwycke commented at 8:44 pm on February 5, 2026: none

@real-or-random

That’s a stronger form of “ordered k” with even more restrictions and, arguably, an annoying API. The caller needs to know upfront not just the order of all SP outputs but even their position. This could be very annoying if you want to mix SP outputs with non-SP outputs. And makes the drawback pointed out by @jonasnick even worse, namely that if you put the outputs at the wrong position, they’ll be lost (or at least difficult to recover – but note that the recipient may not even be aware that there should have been an incoming payment, so there’s no reason for them to even consider some difficult recovery procedure).

Your correct! I’m switching to a count based K. This will remove fund loss risk when shuffling within my project. Keep up the good work.

craigraw commented at 6:23 am on February 6, 2026: none

The obvious conservative pick for maximum wallet compatibility would be N=1, since this is the bare minimum to still be compliant with BIP-352 recommendations.

I agree with this choice. IMO when faced with a clear and serious risk for a speculative and relatively weak upside, the conservative choice is the correct engineering decision should wallet interoperability be a goal.

I see your concerns, but want to point out that this problem would not disappear by avoiding a certain scanning approach.

I agree (but again this is not my argument).

I do agree that it would be irresponsible to recommend creating lots of labels as a default, without at least mentioning the potential wallet interoperability drawbacks.

Speaking generally, I see it as a problem of omission. Without a clear recommendation in the BIP highlighting the risks of lots of labels, we have recommendations elsewhere that encourage this. For example, take the image at https://bitcoin.design/guide/how-it-works/silent-payments/#labels, part of a well known and much referenced guide to creating wallet interfaces. It clearly recommends adding a label for “Invoice #47654”, suggesting that it is reasonable for a mobile (light client) wallet to create a new label for every invoice. Based on your quoted statement, we agree this is irresponsible. I don’t blame the UX designer who created this though - we have not provided the technical guidance to indicate the risks to wallet interoperability.

I’m not sure how much a secp256k1 module could help prevent the “fractured ecosystem” issue, other than providing proper API docs.

Something is better than nothing. If the API docs discussed the risks to wallet interoperability and recommended N = 1 for maximum interoperability, we would as a developer community at least have a quotable reference.

theStack commented at 3:56 pm on February 12, 2026: contributor

The announcement of a potential K_max restriction with request for feedback (see comment pinging SP wallet devs here and the mailing list post) has passed one week, and so far no concerns about a limit have been expressed. Two SP wallet developers have confirmed explicitly that they don’t see a problem with this change (https://github.com/bitcoin-core/secp256k1/issues/1799#issuecomment-3843157542, #1799 (comment)), @craigraw was expressing the need of clear API documentation, to strive for maximum wallet compatibility. I agree that this is important, and the stated suggestion sounds reasonable to me:

If the API docs discussed the risks to wallet interoperability and recommended N = 1 for maximum interoperability, we would as a developer community at least have a quotable reference.

(This is only slightly related to the K_max protocol change and the BIP vs. LabelSet decision-making process though, as we’d want to include such a warning in the API docs about the drawbacks of creating a larger number of labels in either scanning approach.)

It’s still an open question what concrete value to set K_max to, and there are some optimizations investigated (shout out to @w0xlt) that look promising and might affect this decision, but my conclusion from what I’ve written above is that we should focus on the BIP scanning approach, as the primary blocker (quadratic scaling issue) can be mitigated. If there are no objections, I will close the LabelSet scanning PR and open a BIP-352 PR with a proposed K_max limit soon, with the goal of getting the BIP scanning PR PR ready.

theStack commented at 8:41 pm on February 19, 2026: contributor

Thanks for all the feedback! There is now a PR open for the proposed K_max protocol limit: https://github.com/bitcoin/bips/pull/2106, where the discussion can be continued.

theStack commented at 3:37 pm on March 4, 2026: contributor

Closing this issue, as we’ve decided to go with the BIP scanning approach, and the major blocker for that was resolved with the K_max protocol change (see see BIPs PR, mailing list post). Thanks to everyone for joining the discussion!

theStack closed this on Mar 4, 2026

Silent Payments module: discussion about different scanning approaches (BIP, LabelSet, hybrid) #1799