Silent Payments module: discussion about different scanning approaches (BIP, LabelSet, hybrid) #1799

issue opened by theStack on January 11, 2026
  1. theStack commented at 3:21 am on January 11, 2026: contributor

    As suggested in #1792 (comment), this issue is created to discuss different scanning approaches for Silent Payments. PRs for both the BIP and LabelSet approaches are currently open.

    A prototype implementation of the hybrid approach is available at https://github.com/w0xlt/secp256k1/commit/6c76c7c5fa30c5fa7676511de22c47953ee76bd1#diff-4d053be8d1f6d948b412f26ae89711a9dcd2a2683da581cb52b9f6757480361b

    See https://gist.github.com/theStack/25c77747838610931e8bbeb9d76faf78 for a summary, and https://github.com/theStack/secp256k1lab/blob/add_bip352_module_review_helper/src/secp256k1lab/bip352.py for a Python implementation based on secp256k1lab.

    As the BIP approach suffers from the worst-case scanning attack, following the LabelSet approach with a limit on labels to scan for is the currently suggested way to proceed.

  2. theStack commented at 3:39 am on January 11, 2026: contributor

    Answering comments #1792 (comment) and #1792 (comment) here (to keep the discussion in #1792 focused on the LabelSet PR).

    @Sjors:

    I think it’s fine to initially go for the LabelSet approach, since many current implementations are mobile, which means they benefit from the safety this provides and won’t run into the label limit anytime soon.

    I agree.

    I do think that the hybrid approach is useful. Someone who goes through the trouble of automating label generation might happily add some CPU cores if it lets them avoid the complexity of splitting their users across multiple watch keys. Especially if the cost of an attack far outweighs the cost to deal with it.

    What makes the hybrid approach somewhat unattractive in my opinion is the very clunky API. Users have to pass in the labels information twice, providing both a list of labels and a label cache lookup callback function. The two underlying data structures also have to be maintained and kept in sync accordingly by the user. I would say that this clearly doesn’t meet the “hard to misuse API” goal that we usually try to follow in this library.

    We could document the hardware required to guarantee that a worst case block takes less than 10 minutes to process. For the hybrid approach that’s just a single number IIUC. For the LabelSet approach it’s a function of L. With that in place a warning should be enough IMO, for L > ? based on lower end hardware (mobile) and e.g. a 10 second worst case.

    Documenting this makes sense yeah. With “warning”, I guess you mean something purely in the docs, or is there another way during run-time? (I can’t think of any).

    @Eunovo:

    I had an offline conversation with @RubenSomsen about this. The conversations around the scanning approaches have been focused on the time it takes to complete a tx scan, but it’s also important that we consider the time it takes to scan an entire Block.

    The current per-transaction insights/benchmarks also apply for blocks, if all transactions in a block were equally-sized in terms of number of outputs. That’s of course a simplifying assumption, but I think one that is good enough to give a rough estimate in terms of EC point addition cost for common scenarios (see examples below).

    BIP-style scanning scales with the number of outputs, independently of the number of transactions.

    LabelSet scanning scales with the number of labels * the number of transactions. LabelSet will be fast when you scan Blocks with few transactions and few labels, but it will get slower compared to BIP-style scanning as the number of transactions increases, even without an increase in the number of labels.

    Right. I think what you state is ultimately just a different way of expressing the per-transaction turn-over point inequality $L < 2 * N$ (indicating when LabelSet is faster than BIP), where “few transactions per block” maps to a higher N value, and “high number of transactions per block” to a lower N value.

    Two concrete example calculations, showing that for up to a few labels, the LabelSet approach should be a bit faster (a small cost-model sketch reproducing them follows below):

    • The best-case scenario for the BIP approach is if every transaction consists of only one output (high number of txs per block). Even in that case, for the single-label scenario, the BIP approach has to do twice as many EC point additions per block as for the LabelSet approach (L=1,N=1 => num_txs * 1 < num_txs * 2).
    • A much more realistic scenario is probably to assume that every transaction consists of two outputs (due to change). In that case, scanning up to 3 labels is still faster than the BIP approach (L=3,N=2 => num_txs * 3 < num_txs * 4).
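
    As a quick sanity check, the sketch below reproduces these numbers (assuming, as above, that LabelSet scanning costs roughly L EC point additions per transaction and BIP-style scanning roughly 2*N; real costs also include hashing and cache lookups, so this is a toy model, not a benchmark):

    # Toy cost model: per-transaction EC point additions approximated as
    # L for LabelSet scanning and 2*N for BIP-style scanning.
    def labelset_cost(num_txs: int, num_labels: int) -> int:
        return num_txs * num_labels

    def bip_cost(num_txs: int, num_outputs_per_tx: int) -> int:
        return num_txs * 2 * num_outputs_per_tx

    for L, N in [(1, 1), (3, 2), (10, 2)]:
        ls, bip = labelset_cost(1000, L), bip_cost(1000, N)
        faster = "LabelSet" if ls < bip else "BIP"
        print(f"L={L}, N={N}: LabelSet={ls}, BIP={bip} -> {faster} faster")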

    I think we should do more Block scanning benchmarks. I am interested in doing this myself next week. I can use LabelSet scanning in https://github.com/bitcoin/bitcoin/pull/32966 and benchmark Block scanning times.

    That’s a good idea, looking forward to see results on real-world block data.

  3. Sjors commented at 3:53 am on January 12, 2026: member

    What makes the hybrid approach somewhat unattractive in my opinion is the very clunky API.

    I’m not wedded to the particular API, but it’s nice to have something with the same scaling properties. But if it’s only recommended for power users, having the API be a bit clunky might be acceptable.

    We could document the hardware required to guarantee that a worst case block takes less than 10 minutes to process. […]

    Documenting this makes sense yeah. With “warning”, I guess you mean something purely in the docs, or is there another way during run-time? (I can’t think of any).

    Just docs.

  4. jonasnick commented at 4:11 pm on January 19, 2026: contributor

    My working assumption is that we ultimately want to support many labels for some use cases and also optimize for few-label wallets. Hence, looking at @theStack’s benchmarks, in the long run we should have both the BIP approach and the labelset approach (thanks @w0xlt for coming up with this!).

    As we’ve discussed, the BIP approach suffers from a quadratic scaling issue under adversarially created transactions. I believe we can address this relatively easily, and I’d prefer to do so before shipping the BIP approach. It would be great to hear thoughts from the BIP authors on this (@RubenSomsen, @josibake). I’ve discussed possible mitigations with @real-or-random and the most straightforward option seems to be to put a limit on k. I do not think this would prevent any realistic use case (I even struggle to imagine a use case that requires more than one k) (EDIT: this was based on a misunderstanding, see below). We could run some worst case benchmarks to find a reasonable limit.

    I haven’t yet looked at the proposed hybrid approach (implementing both approaches under a unified API). What are the advantages? The downside, as mentioned by @theStack, appears to be a clunky API. If there isn’t a compelling benefit, I’d lean toward implementing one approach at a time.

  5. RubenSomsen commented at 2:27 pm on January 20, 2026: none

    I calculated the theoretical numbers for comparison. You can view them here. I’ll be referring to these numbers throughout.

    Labels allow the recipient to distinguish the source of funds, without requiring another silent payments address (which would scale linearly - 10 addresses means 10x the scanning cost). I think communicating the functionality to users will be a bit of a challenge because it’s hard for people to wrap their heads around the fact that using a different label for the same address still means the address can be linked to the same user (perhaps a name change will help - “tagged addresses”?), but regardless, it’s clearly a useful feature and it’s crucial for non-donation use cases where recipients need to know who paid them (e.g. an exchange receiving funds from users).

    Hypothetically, if we ignored the issue of the worst-case targeted attack and I had to pick one algorithm, my strong inclination would be to go with the BIP style approach. The primary reason for this is that we get a clear and consistent upper bound on the maximum scanning cost for all users, irrespective of their label usage. The numbers reveal that we only lose a minor amount of performance for users that don’t use labels. This performance can be regained in the future by supporting both algorithms.

    But of course we’re talking about this because the issue has been hard to ignore. The numbers show that a worst-case targeted attack block is nearly 360x slower (@Eunovo had this benchmarked at 4 minutes on his laptop) than a non-targeted worst-case block.

    If we wanted to work around it, there is one “soft” approach - (1) let the API call contain a limited range for K so it has to be called repeatedly. This basically pushes the problem to wallets, which have to decide if and how to communicate this to the user (e.g., “An unusually large tx contained at least X outputs for you. It’s slow to scan but so far contained Y sats. Want to continue?”).

    A more strict approach is to simply (2) limit K in the BIP. Technically this is a backwards incompatible change, but in practice no wallets have supported sending multiple outputs to the same recipient so it seems safe to assume this won’t have any practical impact. This does raise the question of what we would limit K to. @Sjors came up with a theoretical scenario where users from one exchange are sending money to another exchange, causing a single transaction with lots of different outputs to the same recipient. At K = 1000, I imagine even this type of scenario would be comfortably covered, but this results in 30x slower block validation (~20 seconds?). What upper bound are we willing to accept for the targeted attack?

    The other approach is to (3) go with the LabelSet approach and limit the use of labels. Again, we’d be faced with having to determine a limit, but this time it seems really inevitable it will limit functionality. If we wanted to match the same non-targeted upper bound as the BIP style approach, this would result in a very conservative label limit of 12. And whatever limit we’d pick, if we put it in the BIP, then any future implementation that does want a higher limit will not be BIP compliant.

    Note I think even with the BIP style approach we could be more clear about label usage. Currently there is a very non-committal reference to 100k labels (resulting in a ~4MB lookup table) in the BIP. By default, I’d like for most scanning tools to either scan for no labels other than the change label (primarily light clients because to them label usage has bandwidth overhead), or 100k labels. Anyone who wants to go over 100k (or derive labels in a non-standard way) is on their own.

    Finally, there is also the idea of (4) putting a lower limit on the number of sats in an eligible output. I think putting this in the BIP is difficult because any reasonable definition of dust would change with the BTC price, and I’m inclined not to explicitly push for this at the wallet level either, but it does make the attack more costly. At the current dust limit, the attacker has to send ~$7000 worth of sats to the (lucky?) victim. On top of that they have to pay one block’s worth of fees, though this can be as low as a few hundred dollars.

    Personally, I feel (2) is the most clear-cut approach, assuming we can agree on a reasonable limit for K, but perhaps (1) is also a decent middle ground if we feel the attack is not all that practical but still want to do something about it.

    I’d also like to take this moment to appoint @theStack co-author of BIP352. I think he’s proven himself more than capable, and with @josibake’s current reduced involvement I think it will greatly help move things forward.

  6. jonasnick commented at 2:53 pm on January 21, 2026: contributor

    Thanks for weighing in @RubenSomsen. I have a clarifying question:

    Sjors came up with a theoretical scenario where users from one exchange are sending money to another exchange, causing a single transaction with lots of different outputs to the same recipient.

    Why would this transaction have outputs with k > 0? Wouldn’t each user get an address from the destination exchange with a different label?

  7. RubenSomsen commented at 3:10 pm on January 21, 2026: none

    @jonasnick thanks for considering the arguments.

    k changes whenever multiple outputs are sent to the same recipient, irrespective of what label is used. This is needed to ensure the resulting outputs can’t be linked. Everything except the label part would be the exact same, and the label is assumed to be public information so anyone could try to subtract it to find the link.
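
    To make the linkability point concrete, here is a toy sketch (scalar arithmetic mod the group order stands in for point operations; this is not the real protocol or library code) showing that reusing the same k for two differently-labeled outputs to one recipient would let anyone who knows the public label tweaks link them:

    import hashlib

    N_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

    def h(*parts: bytes) -> int:
        return int.from_bytes(hashlib.sha256(b"".join(parts)).digest(), "big") % N_ORDER

    b_spend, shared_secret = 7, 13                       # toy secrets
    label_1, label_2 = h(b"label", bytes([1])), h(b"label", bytes([2]))
    t_k = h(b"secret", bytes([0]))                       # same k = 0 reused for both outputs

    out_1 = (b_spend + label_1 + t_k) % N_ORDER          # stands in for B_spend + label_1*G + t_k*G
    out_2 = (b_spend + label_2 + t_k) % N_ORDER
    # The difference depends only on the (public) label tweaks, so the outputs are linkable:
    assert (out_1 - out_2) % N_ORDER == (label_1 - label_2) % N_ORDER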

  8. real-or-random commented at 4:45 pm on January 21, 2026: contributor

    Hey Ruben, thanks for the detailed comments. I, more or less, agree with everything you said. A more detailed response:

    (1) let the API call contain a limited range for K so it has to be called repeatedly.

    Yeah, I had this idea, too. But it doesn’t really convince me unless we see actual demand from wallets. I think scanning should be a background process, so it’s okay if it takes something on the order of a few seconds but it shouldn’t be crazy. (Where do we draw the line exactly? Okay, bikeshedding very much possible…) But this should be well-documented. If it turns out that wallets need to implement some other, more responsive functionality (e.g., some RPC that scans a single transaction and needs to be quick), then we could reconsider that idea. But currently, I don’t think it will be necessary.

    A more strict approach is to simply (2) limit K in the BIP.

    The fact that @jonasnick and I came to the same conclusion that limiting k seems to be a good idea (without knowing that you had suggested the same) probably means we’re on the right track. (Yes, when we talked about it, we wrongly assumed that different labels can work with the same k=0, but even if we need different k values for different labels, limiting k seems fine to me.) I agree that limiting k seems much less restrictive for applications than limiting the number of labels.

    Yet another idea is to require outputs to be in sequential k order, i.e., the k=0 output appears before the k=1 output in the list of outputs which appears before the k=2 output in the list, etc. @jonasnick had suggested this in one of the PRs but wasn’t sure whether it limits flexibility too much.

    I like this idea, too. It’s more straightforward and feels less like a kludge. It should make the scanning linear in the number of outputs, which should get rid of all the performance issues. (Is it reasonable to get a benchmark here?) And having no limit on k is nice because wallet implementors won’t need to scratch their heads around it. And I doubt that this approach is too limiting. It precludes any protocol on top of silent payments which would rely on the order of outputs.

    • Some protocols such as LN do indeed rely on the order of outputs. But you probably wouldn’t run them on top of SP?
    • It also excludes BIP69, namely the idea that outputs should appear in deterministic (sorted) order to avoid wallet fingerprinting. This is not an unreasonable idea, but it seems that it has mostly been abandoned, with Electrum being the last big wallet to implement it. See https://github.com/spesmilo/electrum/issues/8849 and also the report mentioned there. So there’s not too much value for SP transactions to blend in with BIP69 transactions.

    @RubenSomsen What’s your opinion on this idea?

    There’s also the possibility to combine this with a maximum k, e.g., k=0 to k=99 may be everywhere in the list but outputs with k >= 100 must be sequential. In that sense, it’s even a relaxation of simply limiting k. But this seems a bit overengineered.
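
    For illustration, a rough sketch of why an ordered-k rule makes scanning linear (hypothetical helper names, ignoring labels and y-parity handling; not the libsecp256k1 API): the outputs are walked once and only the candidate for the current k is ever computed and compared.

    def scan_ordered_k(tx_outputs, compute_candidate):
        """compute_candidate(k) returns the expected output for tweak index k."""
        found = []
        k = 0
        candidate = compute_candidate(k)
        for out in tx_outputs:
            if out == candidate:         # match: advance to the next expected k
                found.append((k, out))
                k += 1
                candidate = compute_candidate(k)
        return found                     # O(N) comparisons, O(K) candidate computations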

  9. theStack commented at 3:41 am on January 22, 2026: contributor

    A more strict approach is to simply (2) limit K in the BIP.

    Yet another idea is to require outputs to be in sequential k order, i.e., the k=0 output appears before the k=1 output in the list of outputs which appears before the k=2 output in the list, etc.

    Here are worst-case benchmarks for both of these proposed protocol fixes (I called them “limited k” and “ordered k” rules): https://github.com/theStack/secp256k1/commit/edf222f2106cd45fbaf2f1d47ee87a774c35d42a (based on #1765). A transaction with N=23255 outputs is created, where the matched outputs in the same recipient group (of size K) are placed at the very end, at positions [N-K, N-1], in order to maximize the total number of iterations of the inner loop.

    Benchmark results on my arm64 laptop:

    <compile with SP_ORDERED_K_RULE_ENABLED set to 0>
    $ ./build/bin/bench silentpayments_scan_worstcase
    Benchmark                               ,    Min(us)       ,    Avg(us)       ,    Max(us)

    [ "limited k" protocol rule ]
    silentpayments_scan_worstcase_K=10      ,   192912.0       ,   192912.0       ,   192912.0
    silentpayments_scan_worstcase_K=100     ,  1644624.0       ,  1644624.0       ,  1644624.0
    silentpayments_scan_worstcase_K=1000    , 16637185.0       , 16637185.0       , 16637185.0
    
    <compile with SP_ORDERED_K_RULE_ENABLED set to 1>
    $ ./build/bin/bench silentpayments_scan_worstcase
    Benchmark                               ,    Min(us)       ,    Avg(us)       ,    Max(us)

    [ "ordered k" protocol rule ]
    silentpayments_scan_worstcase_K=23255   ,   416104.0       ,   416104.0       ,   416104.0
    

    With a limit of K=1000, scanning takes ~16s (close to the assumed ~20s stated by @RubenSomsen above) on my machine, while the “ordered k” scanning for the absolute worst-case (all outputs match) only took <0.5s. This confirms the assumption that scanning with the “ordered k” protocol rule would get rid of all the performance issues.

  10. Sjors commented at 8:03 am on January 22, 2026: member

    (1) let the API call contain a limited range for K so it has to be called repeatedly.

    Yeah, I had this idea, too. But it doesn’t really convince me unless we see actual demand from wallets. I think scanning should be a background process, so it’s okay if it takes something on the order of a few seconds but it shouldn’t be crazy. (Where do we draw the line exactly? Okay, bikeshedding very much possible…)

    On mobile there’s a couple of lines.

    One line is at about 1/100th of a second. Anything that takes longer would block smooth scrolling, since the UI is typically the main thread (disclaimer: it’s been a decade since I implemented an iOS app that had to worry about this, but it’s still a thing). That means it needs to be done in a background thread. If we’re going to be well above that anyway, then the mobile developer can’t avoid using a background thread and we don’t have to worry about this particular line.

    Having to use a background thread means they have to worry about thread safety. Although it could still be a single long running thread, dedicated to libsecp operations. But then you can’t sign anything, or do any other secp operation, while a heavy block is being scanned. So perhaps developers need to worry about thread safety anyway, and this doesn’t make it worse. And perhaps most operations are safe anyway, e.g. scan 1 block, sign a transaction, scan the next block, doesn’t really interfere?

    Still, (mobile) UI needs to give some feedback to the user. It can be a block height that frequently goes up, or a progress bar that gradually moves. There I would say that a few seconds is the max you want to go by without any feedback, or the user might think the app is frozen.

    If a specific block takes exceptionally long, it’s probably also good to show the user something to relax them. That’s easier to implement if the call returns after a second or so, saying it needs more. The alternative is for the developer to implement yet another thread that measures the time a call takes and compares it to a baseline.

    @theStack wrote:

    Benchmark results on my arm64 laptop:

    Desktop users are probably more patient anyway. I think we should measure the performance on mobile phones. Perhaps not the slowest one out there, although it would be nice if the slowest known smartphone that can run an application with libsecp gets an answer from the library within 1 second.

  11. jonasnick commented at 10:40 am on January 22, 2026: contributor

    k changes whenever multiple outputs are sent to the same recipient, irrespective of what label is used.

    Sorry, that was a significant error in my mental model (at least my most recent one). In that case, I agree that having more than one k is a relevant use case and I take back what I said about restricting k being obviously the most straightforward option.

    Another thought on the limited-k vs. ordered-k approaches: limited-k appears to lead to a more secure libsecp API. The limit would be enforced in the secp256k1_silentpayments_sender_create_outputs function. If it’s exceeded, the function returns an error and no transaction will be made.
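
    A minimal sketch of that “hard to misuse” property (a hypothetical wrapper, not the actual secp256k1_silentpayments_sender_create_outputs signature; K_MAX is an illustrative value, not an agreed-upon limit):

    K_MAX = 1000   # illustrative only

    def create_outputs(outputs_per_recipient: dict) -> dict:
        # Refuse to proceed if any recipient would need k values beyond the limit,
        # so a non-compliant transaction is never created in the first place.
        for recipient, outputs in outputs_per_recipient.items():
            if len(outputs) > K_MAX:
                raise ValueError(f"{len(outputs)} outputs for one recipient exceeds K_MAX={K_MAX}")
        # ... derive the actual output tweaks here ...
        return outputs_per_recipient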

    With the ordered-k approach we’d instead have to add a warning to the API doc of sender_create_outputs stating that the generated outputs must appear exactly in the same order in the transaction. If the wallet would (accidentally) ignore the warning and, e.g., sort the outputs via BIP 69 before making a transaction, it would be a mess to recover the coins.

  12. real-or-random commented at 12:31 pm on January 22, 2026: contributor

    With the ordered-k approach we’d instead have to add a warning to the API doc of sender_create_outputs stating that the generated outputs must appear exactly in the same order in the transaction. If the wallet would (accidentally) ignore the warning and, e.g., sort the outputs via BIP 69 before making a transaction, it would be a mess to recover the coins.

    That’s a great point.

    Here are worst-case benchmarks for both of these proposed protocol fixes (I called them “limited k” and “ordered k” rules): theStack@edf222f (based on #1765). A transaction with N=23255 outputs is created, where the matched outputs in the same recipient group (of size K) are placed at the very end, at positions [N-K, N-1], in order to maximize the total number of iterations of the inner loop.

    These are useful numbers, and they confirm that scanning takes O(K * N). I believe for picking a reasonable max K_max, we’d rather want to look at entire blocks, as brought up by @Eunovo. The worst-case block is probably one filled with transactions, each having N = K_max many outputs. And then we’ll start to see the “quadratic” impact when considering different K_max values. For example, what’s the scanning time for a block filled with N = K_max = 1000 transactions vs. a block filled with (about 10x as many) N = K_max = 100 transactions?

  13. theStack commented at 1:56 pm on January 22, 2026: contributor

    Here are worst-case benchmarks for both of these proposed protocol fixes (I called them “limited k” and “ordered k” rules): theStack@edf222f (based on #1765). A transaction with N=23255 outputs is created, where the matched outputs in the same recipient group (of size K) are placed at the very end, at positions [N-K, N-1], in order to maximize the total number of iterations of the inner loop.

    These are useful numbers, and they confirm that scanning takes O(K * N). I believe for picking a reasonable max K_max, we’d rather want to look at entire blocks, as brought up by @Eunovo. The worst-case block is probably one filled with transactions, each having N = K_max many outputs. And then we’ll start to see the “quadratic” impact when considering different K_max values. For example, what’s the scanning time for a block filled with N = K_max = 1000 transactions vs. a block filled with (about 10x as many) N = K_max = 100 transactions?

    Fair point. Note that for transactions where all outputs match (with N = K_max), the “quadratic” impact can be easily avoided by marking found outputs and skipping those in further iterations (that’s currently already done in #1765). If we want to keep that optimization, I’d assume that the worst-case scanning cost per block is reached by maximizing the number of unmatched outputs (in contrast to any found output, those have to be processed over and over again, blowing the cost up) in a single transaction.

    EDIT: Sorry, that was wrong. If the matched outputs are ordered in reverse by an adversary (i.e. from max_k-1…0), then marking and skipping found outputs alone doesn’t gain anything.

  14. w0xlt commented at 2:53 pm on January 22, 2026: none

    The limited-k and ordered-k proposals shift the problem from ‘how do we optimize scanning?’ to ‘what constraints are acceptable?’.

    The discussion about optimizations (LabelSet, hybrid) was initially motivated by the worst-case attack. If we’re willing to accept protocol constraints like limited-k or ordered-k, the BIP approach might suffice since these constraints bound the worst-case.

    In this scenario, the benefits of LabelSet would be:

    • Simpler API (no callback function, direct label entries)
    • Simpler implementation (no y-parity handling)
    • Wallet label count constraint (L ≤ 500) instead of protocol constraint (limited-k)

    Both alternatives seem acceptable for typical users.

  15. real-or-random commented at 3:30 pm on January 22, 2026: contributor

    @theStack Let’s do some math:

    Assume a transaction with N outputs, and K of them are silent payment outputs for a specific recipient, ordered in reverse (which is the worst case, as you’ve figured out).

    • The 1st iteration of the K loop needs to loop over N outputs
    • The 2nd iteration of the K loop needs to loop over N-1 outputs
    • The K-th iteration of the K loop needs to loop over N-K+1 outputs

    The total number of iterations of the inner loop for a single transaction with N outputs and K silent payment outputs is:

    N + (N-1) + ... + (N-K+1)
     = ((N-K) + K) + ((N-K) + (K-1)) + ... + ((N-K) + 1)
     = K * (N-K) + (K + (K-1) + ... + 1)
     = K * (N-K) + K*(K+1)/2
     = NK - K^2 + K^2/2 + K/2
     = NK - K^2/2 + K/2.
    

    With the simplifying assumption that the number of outputs in a block is limited to B (and not the number of bytes), and we fit T transactions in it, each transaction has N = B/T outputs. This means that for an entire block with T transactions, the number of iterations of the inner loop is

    T * ((BK/T) - K^2/2 + K/2)
      = T * (-K^2/2 + K/2) + BK
    

    This is clearly maximized for T = 1… So if I’m not mistaken, the worst block appears to be one with a single transaction with K at the limit and N as high as possible to still fit in the block. And this matches exactly the benchmarks you’ve already produced, right?
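
    For what it’s worth, the closed forms above can be checked with a few lines of plain Python (no EC operations, just counting inner-loop iterations):

    def inner_iterations(N: int, K: int) -> int:
        # Worst case: the K matching outputs are the last K of N, ordered in reverse.
        return sum(N - i for i in range(K))          # N + (N-1) + ... + (N-K+1)

    def closed_form(N: int, K: int) -> float:
        return N*K - K*K/2 + K/2

    assert inner_iterations(23255, 1000) == closed_form(23255, 1000)

    # Block-level cost with B total outputs split into T equal transactions:
    def block_iterations(B: int, K: int, T: int) -> float:
        return T * closed_form(B // T, K)

    # Fewer, larger transactions cost more, i.e. T = 1 is the worst case:
    B, K = 23255, 100
    assert block_iterations(B, K, 1) > block_iterations(B, K, 5) > block_iterations(B, K, 25)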

  16. theStack commented at 5:07 pm on January 22, 2026: contributor

    @theStack Let’s do some math: … This is clearly maximized for T = 1… So if I’m not mistaken, the worst block appears to be one with a single transaction with K at the limit and N as high as possible to still fit in the block.

    @real-or-random: Nice, I’ve followed these steps with pen and paper and could eventually (after some temporary sign confusion) reproduce the conclusion. I assume @RubenSomsen reached a similar conclusion, since in the gist writeup, the “targeted attack” scenario only considers the single-transaction, maximum-N case.

    And this matches exactly the benchmarks you’ve already produced, right?

    Yes.

    With the ordered-k approach we’d instead have to add a warning to the API doc of sender_create_outputs stating that the generated outputs must appear exactly in the same order in the transaction. If the wallet would (accidentally) ignore the warning and, e.g., sort the outputs via BIP 69 before making a transaction, it would be a mess to recover the coins.

    Another potential drawback of the “ordered-k” rule is that it breaks existing test vectors in the BIP, which currently explicitly check that order does not matter. Adapting those is probably not the end of the world, but in comparison the “limited-k” rule is significantly easier to cope with in that regard (and probably less confusing for existing SP wallets testing against the vectors), as only new test vectors would be added; currently, the largest test case w.r.t. number of outputs is N=4. So in that sense “ordered-k” feels slightly more like a breaking change than “limited-k”.

  17. RubenSomsen commented at 11:31 pm on January 22, 2026: none

    @real-or-random thanks for your comments.

    require outputs to be in sequential k order

    Other than what was already mentioned about “ordered-k”, I’m also concerned about what putting restrictions on output order does to privacy. If a known SP user sends you money from the 1st and 5th output in a tx, it implies they own the 2nd, 3rd and 4th output as well. (edit: incorrect, see the next comment) And in coinjoin scenarios, you now have to insist on a certain order for outputs, which reveals information to other coinjoin participants.

    @theStack

    I assume @RubenSomsen reached a similar conclusion, since in the gist writeup, the “targeted attack” scenario only considers the single-transaction, maximum-N case.

    Yes, a single tx with maximum N is the worst case for two reasons:

    1. The attack scales directly with the number of outputs, and a single tx is the optimal way to maximize them

    2. Whenever limit-K approaches N, you can get an up to 2x speedup by removing checked outputs (note I am assuming the outputs are always randomized). This is why in my gist I have a different formula for the “limit K” scenario and the unlimited “worst case” scenario (where K == N).

    @w0xlt

    Wallet label count constraint (L ≤ 500) instead of protocol constraint (limited-k)

    I think one important point of nuance is that the slowdown from using labels under LabelSet is consistently present. Every user who approaches L=500 will forever be up to 4x slower than BIP style (non-targeted). Under BIP style, with a limit of K=120, only one user will be 4x slower (and everyone else will actually be ~3x faster) for as long as that user is being attacked.

    I understand not everyone is sold on labels, and it’s uncertain what adoption will be like, but I do feel pretty confident in saying we’re far more likely to see high label usage than anyone trying to make transactions with lots of outputs to the same recipient in the foreseeable future. Labels enable some pretty common use cases, such as knowing who paid you, accompanying a payment with a description before the payment is made, and even the ability to spend funds that were sent to some set of labels separately from your other funds.


    Let me test the waters with a concrete proposal - how would people feel about a K=1000 limit (~31x slower, benchmarked at 16 seconds by @theStack)? That’s a limit no future use case would conceivably hit in practice. If someone attacked you for a full day, filling 144 blocks with outputs to you, UX wise the scanning experience will be equivalent to having been offline for a month and then coming back online (and everyone else gets to scan those 144 blocks ~3x faster than normal).

  18. real-or-random commented at 8:49 am on January 23, 2026: contributor

    Other than what was already mentioned about “ordered-k”, I’m also concerned what putting restrictions on output order does to privacy. If a known SP user sends you money from the 1st and 5th output in a tx, it implies they own the 2nd, 3rd and 4th output as well.

    I don’t think so. The 1st output could have k=0 and the 5th output could have k=1?

    And in coinjoin scenarios, you now have to insist on a certain order for outputs, which reveals information to other coinjoin participants.

    My thinking is that it’s not yet clear how exactly a CoinJoin would be created privately. So it’s difficult to judge whether ordered-k would have an impact here. But sure, not having ordered-k retains all the flexibility.

    I tend to think that limited-k is the most pragmatic solution.

    Let me test the waters with a concrete proposal - how would people feel about a K=1000 limit (~31x slower, benchmarked at 16 seconds by @theStack)? That’s a limit no future use case would conceivably hit in practice. If someone attacked you for a full day, filling 144 blocks with outputs to you, UX wise the scanning experience will be equivalent to having been offline for month and then coming back online (and everyone else gets to scan those 144 blocks ~3x faster than normal).

    By the way, one can fit 2324 P2TR outputs in a standard transaction (see https://bitcoinops.org/en/tools/calc-size/). But that would be ~ 2.3 * 16s = 37 s….

    I have really no idea what number we should aim at. I can’t follow @Sjors’s conclusions:

    although it would be nice if the slowest known smartphone that can run an application with libsecp, gets an answer from the library within 1 second.

    My thinking is that scanning will always take place in a background thread, and in a typical scenario, you’d scan more than one block (e.g., what @RubenSomsen mentions when he says “equivalent to having been offline for [a] month and then coming back online”). So there’s no scenario of a user actively sitting in front of a phone and waiting for the wallet app to respond. (Sure, if that scenario existed, we would want to aim for something below a second, maybe much lower. But it’s not the scenario we need to consider.)

    For a background thread, perhaps even 37 s is not too crazy, considering also that the victim will be paid for the scanning in some sense. Even if your machine is 10x slower, you’re still well below the expected block time. And again: If you’re offline for a week or a month, you’d also expect your wallet to be busy for quite some time catching up (not only with SP but also just with block validation!).

    Side track: Another possible consideration is multiple CPUs. Scanning of different transactions is easy to parallelize. That would suggest limiting the number N of outputs too. (In theory, one could also parallelize the inner loop, but that insight is worth nothing if we don’t implement it that way.) But on the other hand, our limit should probably consider some kind of “weakest but still meaningful” scanning client. And it’s fair to assume that this is one with just one CPU (or at least just one CPU available for scanning).

  19. real-or-random closed this on Jan 23, 2026

  20. real-or-random reopened this on Jan 23, 2026

  21. RubenSomsen commented at 12:28 pm on January 23, 2026: none

    I don’t think so. The 1st output could have k=0 and the 5th output could have k=1?

    Ah I see, I misunderstood. In that case my argument can be dismissed.

    By the way, one can fit 2324 P2TR outputs in a standard transaction

    I would not be opposed to K = 2324 (71x slower) either. At that pace one targeted attack block is equivalent to coming back online after half a day. It’s 5x better than the current worst-case, where one targeted attack block takes 2.5 days.

    considering also that the victim will be paid for the scanning in some sense

    Let me push back on this a little - there is currently nothing in the spec stopping an attacker from sending 0 sat outputs. There’s even the possibility that all outputs are 0 sats except for one and you have to grind through all the 0 sat outputs to figure out whether it’s yours. Individual wallets could choose to set limits here, but that won’t be consistent across the ecosystem.

    Another possible consideration is multiple CPUs

    I think even for limited-k this could apply. Finding a result where k=1 should be very rare. From k=2 onward you could just throw all your threads at the problem. For eight threads you’d check k=2 until k=9 simultaneously, then at worst you find nothing and seven of your threads will have done some superfluous computation - harmless in practice.
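
    A rough sketch of this batching idea (hypothetical check_k() helper, not a real API; since k values are consecutive, the first miss ends the scan and only the final batch does superfluous work):

    from concurrent.futures import ThreadPoolExecutor

    def scan_higher_k(check_k, k_max, threads=8):
        # check_k(k) returns True if some output matches tweak index k.
        found = []
        k = 2                                    # k=0 and k=1 already checked sequentially
        with ThreadPoolExecutor(max_workers=threads) as pool:
            while k <= k_max:
                batch = list(range(k, min(k + threads, k_max + 1)))
                results = list(pool.map(check_k, batch))
                for kk, hit in zip(batch, results):
                    if not hit:
                        return found             # consecutive k: a miss means nothing higher matches
                    found.append(kk)
                k = batch[-1] + 1
        return found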

    But on the other hand, our limit should probably consider some kind of “weakest but still meaningful” scanning client

    I imagine that any device with no more than one CPU would be a light client that also has bandwidth constraints, at which point you’d have to limit your label usage and the LabelSet approach can be applied.

    Hmm, in thinking about this, I’m realizing one important factor that has been overlooked. Light clients generally do not have access to the full set of taproot outputs (you could download them, and even apply cut-through, but it’d be more bandwidth), so they’d need the LabelSet approach.

    The steps are:

    • Receive tweak data (~33 bytes per eligible tx)
    • Calculate what would be your taproot output(s) (i.e. the LabelSet approach)
    • Check against a filter if they’re in the block (more labels == more false positives)

    Perhaps we’re not getting around the fact that both approaches have use cases…

  22. bitcoin-core deleted a comment on Jan 23, 2026
  23. real-or-random added the label feature on Jan 23, 2026
  24. real-or-random added the label performance on Jan 23, 2026
  25. theuni commented at 8:38 pm on January 23, 2026: contributor

    Apologies for coming in late, I’ve just been catching up on the various attacks and approaches to mitigate them.

    One proposal that seems noticeably absent is a BIP-style + filter approach. E.g. rather than maintaining a pre-sorted labels cache, instead pre-generate a probabilistic (bloom/cuckoo/xor/ribbon/etc.) filter and pass that in for queries instead. Then only do a real find if it’s found in the filter. Those finds could even be done as a batch at the end.
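
    For illustration, the filter could look roughly like this toy Bloom-style sketch (parameters are illustrative and untuned; a real implementation might prefer one of the cuckoo/xor/ribbon variants mentioned above):

    import hashlib

    class LabelFilter:
        def __init__(self, size_bits=1 << 20):
            self.size = size_bits
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item: bytes):
            for salt in (b"a", b"b", b"c"):
                yield int.from_bytes(hashlib.sha256(salt + item).digest()[:8], "big") % self.size

        def add(self, item: bytes):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def maybe_contains(self, item: bytes) -> bool:
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    # Scanning would only fall back to the exact label-cache lookup when
    # maybe_contains() returns True; false positives just cost one extra lookup.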

    SIMD could be leveraged to parallelize any necessary hashing operations.

    IIUC, that would eliminate any output ordering concerns while requiring a theoretically minimal amount of memory compared to a sorted (hash) set. The cost of course would be a trivial amount of false-positives that would result in unnecessary lookups.

    …Or am I greatly over-simplifying the problem? :)

  26. theStack commented at 5:30 am on January 24, 2026: contributor

    With a limit of K=1000, scanning takes ~16s on my machine

    Update on these worst-case benchmarks: due to a silly bug in the code for reversing the matched outputs, half of them landed right at the start of all outputs, rather than within the [n-k,n-1] group at the end (d’oh!). This effectively cut the scanning time in half; correspondingly, the corrected worst-case benchmark results are about twice as high, i.e. ~32s for K=1000 (rather than the ~16s stated previously).[1] See https://github.com/theStack/secp256k1/commit/b16369f6ed103c973201c99ff1569613da00705b for the fixed version; I’d be glad if someone is willing to run them on their machine, and/or could give the benchmark code a quick sanity check.

    This probably doesn’t make finding a reasonable K limit to agree on any easier :| I haven’t given much thought to what worst-case times are still acceptable, but in terms of K I guess I would still feel fine if it is limited to the “hundreds” range, as that still seems absurdly high for any real use case.

    Hmm, in thinking about this, I’m realizing one important factor that has been overlooked. Light clients generally do not have access to the full set of taproot outputs (you could download them, and even apply cut-through, but it’d be more bandwidth), so they’d need the LabelSet approach.

    The steps are:

    • Receive tweak data (~33 bytes per eligible tx)

    • Calculate what would be your taproot output(s) (i.e. the LabelSet approach)

    • Check against a filter if they’re in the block (more labels == more false positives)

    Good point. I think for that scenario we don’t need a LabelSet scanning function though: the second step is comparably simple (-> just create one taproot output per labeled spend pubkey with k=0), and if the filter matches and the full block/transaction data has been downloaded, the actual scanning can then be done with the BIP approach.
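
    Roughly sketched (hypothetical helper names; candidate_output() stands in for an API function that computes taproot output candidates, and block_filter for e.g. a BIP158-style filter query):

    def light_client_scan(tweaks, labeled_spend_pubkeys, block_filter, candidate_output):
        matches = []
        for tweak in tweaks:                          # ~33 bytes of tweak data per eligible tx
            for spend_pubkey in labeled_spend_pubkeys:
                cand = candidate_output(spend_pubkey, tweak, k=0)   # k=0 candidates only
                if block_filter.match(cand):          # filter hit: fetch the tx, then do a full BIP-style scan
                    matches.append((tweak, cand))
        return matches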

    It took me a while to remember and find it, but luckily an API function for calculating taproot output candidates actually already exists in the take 3 PR for that exact purpose. The reason we took it out for the take 4 PR was to reduce scope and focus on full-node functionality first. Once we ship that function (and other related ones, like e.g. prevouts_summary (de)serialization) in a future release, I think light clients are all set for the above scenario, and a dedicated LabelSet scanning function should not be strictly necessary.

    @theuni: The label cache lookup cost is treated as negligible in comparison to all the other much more expensive (elliptic curve) operations happening in the loop iterations. With the current BIP approach API, providing an efficient lookup function is the responsibility of the user. I’ve so far tried out a hash table implementation (khash+rapidhash) and that seemed to work well enough; even for one million entries there was no noticeable decline in performance [2]. Note that the quadratic scaling issue discussed for the BIP approach is independent of the number of labels; it can occur even with L=1, if all outputs of an adversarial tx are targeted to that single labeled address.

    [1] Note that we could achieve the previous benchmark results by randomizing the outputs within the scan function, to bring down the average number of iterations to find a match to N/2. I’m not sure though if that’s a good idea; I tend to think that modifying “list of pointer” arguments is really ugly from an API point of view (even if it’s only done for high output counts).

    [2] https://groups.google.com/g/bitcoindev/c/bP6ktUyCOJI/m/HGCF9YxNAQAJ

  27. Eunovo commented at 3:41 pm on January 26, 2026: none

    Let me push back on this a little - there is currently nothing in the spec stopping an attacker from sending 0 sat outputs. There’s even the possibility that all outputs are 0 sats except for one and you have to grind through all the 0 sat outputs to figure out whether it’s yours. Individual wallets could choose to set limits here, but that won’t be consistent across the ecosystem.

    @RubenSomsen I previously thought that it was a good idea for wallets to skip doing any work for 0 sat outputs. If we skip checking 0 sat outputs in a transaction with N outputs and M outputs set to 0:

    The scanner has to check at least K=0..M-1 to ensure that no potential outputs are lost. This means the scanner has to do (N-M) work M times. In a worst-case attack with N_max outputs and N_max/2 outputs set to zero, the scanner has to check K=0..(N_max/2)-1 for the N_max/2 outputs that have non-zero values. This means any wallet that implements this output-skipping technique will have to do (N_max^2)/4 work, just to ensure that their wallet doesn’t have any payments in the transaction, when they could have just checked K=0 and only done the N_max work once.

    Hence, it’s not advisable to implement any limits that determine which outputs to skip, as that could open wallets up to unnecessary scanning work.

    [1] Note that we could achieve the previous benchmark results by randomizing the outputs within the scan function, to bring down the average number of iterations to find a match to N/2. I’m not sure though if that’s a good idea; I tend to think that modifying “list of pointer” arguments is really ugly from an API point of view (even if it’s only done for high output counts).

    @theStack Isn’t this already done for the LabelSet approach? The outputs must be sorted to enable binary searches. If we already accept sorting the tx_outputs, what is the reason not to accept random shuffling?

  28. theStack commented at 5:13 pm on January 26, 2026: contributor

    [1] Note that we could achieve the previous benchmark results by randomizing the outputs within the scan function, to bring down the average number of iterations to find a match to N/2. I’m not sure though if that’s a good idea; I tend to think that modifying “list of pointer” arguments is really ugly from an API point of view (even if it’s only done for high output counts).

    @theStack Isn’t this already done for the LabelSet approach? The outputs must be sorted to enable binary searches. If we already accept sorting the tx_outputs, what is the reason not to accept random shuffling?

    That’s a good point. The major difference is that for the LabelSet approach the sorting is important even for the common case (=no match) scenario for performance reasons ($L * log(N)$ vs. $L * N$ output comparisons if we do it with binary vs. linear search), while for the BIP approach it is only relevant to reduce scanning time of pathological transactions. I.e. it boils down to a philosophical question of whether we should include “optimizing for the worst-case” code paths, and whether we value a clean API more than a 2x speed-up for a scenario that the majority of users will likely never hit in their lifetime. I’m leaning towards the former, but am still open to both. If I’m not mistaken, sorting outputs should have the same effect as shuffling, given that the generated outputs are pseudorandom. We could:

    • sort/randomize the tx outputs internally (as in the LabelSet approach)
    • shift the responsibility to the user, i.e. recommend or require that the user sorts/randomizes outputs prior to calling (by mentioning that in the API docs)
    • not do anything of the above
    • (anything else?)

    Happy to hear opinions about that.

  29. w0xlt commented at 10:31 pm on January 26, 2026: none

    Small note: the LabelSet scan path (PR #1792) still has an optimization opportunity: batch inversion for the per-label candidate points. That reduces field inversions from O(L) to O(L/chunk) per k, which materially improves scanning time at larger label counts. See the benchmark discussion here:

    https://gist.github.com/theStack/25c77747838610931e8bbeb9d76faf78?permalink_comment_id=5897811#gistcomment-5897811
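
    For reference, the general batch-inversion trick looks roughly like this (sketched over integers mod the field prime; an actual implementation would operate on secp256k1 field elements):

    P = 2**256 - 2**32 - 977   # secp256k1 field prime

    def batch_inverse(values):
        # Montgomery's trick: n inverses for the price of one modular inversion
        # plus about 3n multiplications.
        prefix = [1]
        for v in values:
            prefix.append(prefix[-1] * v % P)
        inv_total = pow(prefix[-1], P - 2, P)      # the single inversion
        out = [0] * len(values)
        for i in range(len(values) - 1, -1, -1):
            out[i] = prefix[i] * inv_total % P     # inverse of values[i]
            inv_total = inv_total * values[i] % P
        return out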

    Also, the hybrid approach still seems valuable for “power user” label counts, even if the initial API is a bit clunky. The prototype heuristic (8 * n_tx_outputs < n_labels + 2) is a reasonable starting point but can be refined using Ruben’s cost calculations.

  30. real-or-random commented at 8:33 am on January 27, 2026: contributor

    In the BIP approach:

    I.e. it boils down to a philosophical question whether we should include “optimizing for the worst-case” code paths, and whether we value a clean API more than a 2x speed-up for a scenario that the majority of users will likely never hit in their life-time. I’m leaning towards the former, but am still open for both. If I’m not mistaken, sorting outputs should have the same effect as shuffling, given that the generated outputs are pseudorandom.

    We can’t really rely on outputs being pseudorandom. The non-SP outputs don’t need to be random at all, and the other values can be ground by the sender. It should suffice to randomize the direction in which the outputs are processed, i.e., whether you start from the top or the bottom. Thus, we only need one random bit per transaction. We could totally let the user pass a randomness argument and hash it together with the txid, or perform some more efficient PRF to get a random bit.
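
    A tiny sketch of this “one random bit per transaction” idea (hypothetical helper; any PRF over caller-provided randomness and the txid would do):

    import hashlib

    def scan_direction(randomness32: bytes, txid: bytes) -> int:
        # Returns +1 to process outputs top-to-bottom, -1 for bottom-to-top.
        digest = hashlib.sha256(randomness32 + txid).digest()
        return 1 if digest[0] & 1 else -1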

  31. real-or-random commented at 9:41 am on January 27, 2026: contributor

    Labels enable some pretty common use cases, such as knowing who paid you […]

    Indeed, and I consider this almost a requirement for a good UX. (Unless your use case is donations, which is a valid use case but just not the only one.) @w0xlt’s approach is a great implementation, but I feel that typical clients will need as many labels as incoming payments, and this makes the BIP approach still quite attractive.

    Hmm, in thinking about this, I’m realizing one important factor that has been overlooked. Light clients generally do not have access to the full set of taproot outputs (you could download them, and even apply cut-through, but it’d be more bandwidth), so they’d need the LabelSet approach.

    Hm, indeed. I just took a look at the take 3 PR again, and this is a bit hidden there in the API and also in the example because that one doesn’t use labels for the light client.

    But doesn’t this also mean that light clients are inherently limited to a small number of labels due to bandwidth? They need to query every candidate output in a light client protocol… Perhaps some tricks could be played, e.g., reusing a label for a new payment after the first payment has been received. I don’t know.

    Perhaps we’re not getting around the fact that both approaches have use cases…

    Yes, this appears to be true.

    If we use the BIP approach for full clients, then light clients will be inherently more restricted in functionality and not just in terms of trust/privacy. And how are we going to educate wallet devs and users about this?

    If we use the LabelSet approach for full clients, then even those can’t have a massive set of labels, i.e., they can’t have a massive set of incoming payments (assuming they want to keep track of them).

  32. Eunovo commented at 10:58 am on January 27, 2026: none

    We can’t really rely on outputs being pseudorandom. The non-SP outputs don’t need to be random at all, and the other values can be ground by the sender. It should suffice to randomize the direction in which the outputs are processed, i.e., whether you start from the top or the bottom. Thus, we only need one random bit per transaction. We could totally let the user pass a randomness argument and hash it together with the txid, or perform some more efficient PRF to get a random bit.

    I may have misunderstood your suggestion, but to divide the expected runtime by 2, we need to perform a uniform random search. Randomising the direction means that the attacker has a 1/2 chance of triggering the worst-case search scenario; full uniform random shuffling will always ensure the worst-case search runtime is divided by 2.

    @theStack Skipping already matched outputs in the scanning algorithm doesn’t produce any speedup in the worst-case ordering without random shuffling, because all the matched outputs are at the end. A Fisher-Yates shuffle implementation should be O(N) time, so it shouldn’t meaningfully affect scanning time, and the changes to the common case scanning time should be negligible.
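
    For illustration, a minimal O(N) Fisher-Yates sketch (shuffling an index array instead of the caller’s pointer list, which could sidestep the “modifying list-of-pointer arguments” API concern above):

    import secrets

    def shuffled_indices(n: int) -> list:
        # Fisher-Yates over indices 0..n-1; the caller's output array stays untouched.
        idx = list(range(n))
        for i in range(n - 1, 0, -1):
            j = secrets.randbelow(i + 1)
            idx[i], idx[j] = idx[j], idx[i]
        return idx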

  33. real-or-random commented at 12:48 pm on January 27, 2026: contributor

    Randomising the direction means that the attacker has a 1/2 chance of triggering the worst-case search scenario;

    I admit I haven’t spent much thought on this, and so I may very well be wrong here. But my current thinking is that if the algorithm needs to look at i outputs to find a specific one at position i when starting from the top, then it will need to look at N-i when starting at the bottom (+/- 1, let’s not care). On average, that’s (i + (N-i))/2 = N/2. Is this wrong? (I agree, the attacker has a 1/2 chance of triggering the worst case, but I think such an attacker will also trigger the best case with probability 1/2.)

  34. Eunovo commented at 1:39 pm on January 27, 2026: none

    I admit I haven’t spent much thought on this, and so I may very well be wrong here. But my current thinking is that if the algorithm needs to look at i outputs to find a specific one at position i when starting from the top, then it will need to look at N-i when starting at the bottom (+/- 1, let’s not care). On average, that’s (i + (N-i))/2 = N/2. Is this wrong? (I agree, the attacker has a 1/2 chance of triggering the worst case, but I think such an attacker will also trigger the best case with probability 1/2.)

    Indeed, the attacker can also trigger the best case with a probability of 1/2, but the runtime will have a high variance (very fast and very slow) as opposed to random shuffling, which will always end up around the N/2 runtime.

    It’s also possible to randomise the direction N times, so that half of the time the result is in front and the other half is at the end (if the outputs are sorted in K ascending or descending order). We can avoid shuffling this way, but I don’t know how to provide the randomness. Going by what you said earlier, we will need N random bits?

