Revert “bnb: exit selection when best_waste is 0” #25495

pull murchandamus wants to merge 1 commits into bitcoin:master from murchandamus:2022-06-fix-waste-0-bug changing 2 files +4 −6
  1. murchandamus commented at 9:38 pm on June 28, 2022: contributor

    This reverts commit 9b5950db8683f9b4be03f79ee0aae8a780b01a4b.

    Waste can be negative. At feerates lower than long_term_feerate this means that a waste of 0 may be a suboptimal solution and this causes the search to exit prematurely. Only when the feerate is equal to the long_term_feerate would achieving a waste of 0 indicate that we have achieved an optimal solution, because it would mean that the excess is 0. It seems unlikely that this would ever occur outside of test cases, and even then we should prefer solutions with more inputs over solutions with fewer according to previous decisions—but solutions with more inputs are found later in the branch exploration.

    The “optimization” described in #18257 and implemented in #18262 is therefore a premature exit on a suboptimal solution and should be reverted.

  2. Revert "bnb: exit selection when best_waste is 0"
    This reverts commit 9b5950db8683f9b4be03f79ee0aae8a780b01a4b.
    
    Waste can be negative. At feerates lower than long_term_feerate this
    means that a waste of 0 may be a suboptimal solution and this causes the
    search to exit prematurely.
    Only when the feerate is equal to the long_term_feerate would achieving
    a waste of 0 indicate that we have achieved an optimal solution,
    because it would mean that the excess is 0. It seems unlikely
    that this would ever occur outside of test cases, and even then we
    should prefer solutions with more inputs over solutions with fewer
    according to previous decisions—but solutions with more inputs are found
    later in the branch exploration.
    
    The "optimization" described in #18257 and implemented in #18262 is
    therefore a premature exit on a suboptimal solution and should be reverted.
    af56d63eca
  3. sipa commented at 10:34 pm on June 28, 2022: member

    utACK af56d63eca8a246c02506c2aef7ea8a22e2d07bb

    (I looked just at the diff, and guessed why the original code was wrong without looking at the description)

  4. DrahtBot added the label Wallet on Jun 28, 2022
  5. S3RK commented at 6:52 am on June 29, 2022: contributor

    utACK af56d63eca8a246c02506c2aef7ea8a22e2d07bb

    Agree we should continue searching even solution with zero waste was found. Didn’t test

  6. glozow commented at 10:23 am on June 29, 2022: member
    utACK af56d63eca8a246c02506c2aef7ea8a22e2d07bb, agree it is incorrect to stop here unless we could rule out the possibility of a better solution with negative waste. SelectCoinsBnB doesn’t know what long term feerate and effective feerate are (and probably shouldn’t) so it’s better to have no exit early condition at all.
  7. fanquake requested review from achow101 on Jun 29, 2022
  8. DrahtBot commented at 11:59 am on June 29, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  9. achow101 commented at 2:48 pm on June 29, 2022: member

    ACK af56d63eca8a246c02506c2aef7ea8a22e2d07bb

    Oops, yes, waste can be negative.

  10. fanquake merged this on Jun 29, 2022
  11. fanquake closed this on Jun 29, 2022

  12. achow101 referenced this in commit c92eb6cda0 on Jul 11, 2022
  13. yancyribbens commented at 1:45 pm on August 20, 2022: contributor

    @Xekyo Can you point me to why this decision is so or where this discussion happened?

    ..we should prefer solutions with more inputs over solutions with fewer according to previous decisions

    In your paper, section 3.1.4 discusses conflicting goals, although it isn’t all-together clear to me that we should maximize the number of UTXOs used. If however, we do want to maximize the number of UTXOs used, I think the sort should be reversed here so the selection process starts with the smallest (non-dust) utxos?

  14. murchandamus commented at 4:30 pm on September 2, 2022: contributor

    @yancyribbens: The code had an early exit when an input set happened to achieve a waste score of 0. Waste is calculated by comparing the current cost of spending inputs with a hypothetical long-term cost (and adding the cost of change or excess in the case of changeless solutions). Specifically, the input-component of the waste score is calculated from input_weight × (feerate - longterm_feerate). As you can see, if the current feerate is lower than the long-term feerate, the waste score of inputs will be negative. (I blogged about the waste metric, if you want to read more.) By picking the input set that results in the lowest waste score, we automatically prefer spending more and heavier inputs at low feerates, and fewer and lighter inputs at high feerates.

    E.g. at 1 s/vB, we would prefer a solution with two P2PKH (2×148 B × (1 s/vB - 10 s/vB) = -2664 s) inputs over a solution with three P2WPKH inputs (3×68 vB × (1 s/vB - 10 s/vB) = -1836 s), and a solution with three P2WPKH inputs over a solution with two P2WPKH inputs (2×68 vB × (1 s/vB - 10 s/vB) = -1224 s).

    On the other hand, at high feerates say 60 s/vB, we’ll prefer a solution with one P2PKH input (148 B × (60 s/vB - 10 s/vB) = 7400 s) over one with three P2WPKH inputs (148 B < 3×68 vB), but prefer two P2WPKH inputs over one P2PKH input (2×68 vB < 148 B), and obviously one P2WPKH over a solution with two P2WPKH inputs.

    Over the course of many transactions, this causes our wallet to opportunistically use more and heavier UTXOs at low feerates, which reduces our overall transaction fees, while preventing runaway UTXO pool bloat as we have seen from strategies that always try to minimize the input set (e.g. largest first selection).

    So, we don’t want to maximize the UTXOs used, but we just want to prefer spending them at lower feerates when it is opportune. By starting with the biggest UTXOs, we’ll find one solution as quick as possible, but the algorithm hypothetically enumerates all possible input set (except that we stop after trying 100,000 combinations). While the reverse sort order would also enumerate all, I would expect that it performs much worse to find the first solution. It would be like trying to fill a bucket with sand and rocks where we currently put the rocks first, then pour the sand, and then instead pour the sand first and then try to fit the big rocks: the smaller pieces are much more likely to fit a remaining gap.

  15. yancyribbens commented at 11:54 am on September 4, 2022: contributor

    @Xekyo thanks, I read your blog post. I understand the strategy where when fees are cheap, keep adding utxos to the selection and if fees are expensive go for a minimal utxo selection set. The thing that I think is confusing is the terminology, specifically “negative waste”. Personally, I think “weight” is a better word that we should be aiming to minimize.

    For example: selection_weight = current_waste + input_weight × (feerate - longterm_feerate)

    We therefore try to minimize the selection weight, and waste is always the excess (positive) amount.

  16. murchandamus commented at 2:53 pm on September 6, 2022: contributor
    Thanks for the feedback that the terminology is confusing. Would consequently referring to it as waste_score as in my blog post be less confusing? I’m afraid that overloading the term “weight” with a second meaning in the context of transaction building would not be a good way reduce the confusion. I guess in hindsight it might have been better to invert it and call it opportunity_cost, input_set_preference or similar.
  17. yancyribbens commented at 8:50 am on September 8, 2022: contributor

    Thanks BTW for writing up the terminology in your blog post. It’s much easier to understand than trying to read the reference implementation.

    In your post you describe weight as:

    weight: total weight of the input set

    Which I think is a bit confusing since you’re defining weight in terms of itself. Reading the history of weight it looks like it can also be called vbytes which to me helps make sense of what weight is (a measure of data size after some possible transform).

    I understand not wanting to overload the term “weight”, however in terms of graph theory, I think “weight” is typical terminology used to describe the value assigned to an edge. In the case of DFS, using graph terminology feels fitting.

    I think waste_score is better, or maybe even just score. Terminology that helps describe the combination of ?=waste + vbytes or ?=waste + weight (preference of following an edge) would be helpful. In hindsight I think edge_weight = waste + vbytes * (feerate - longterm_feerate) would be preferable. opportunity_cost and input_set_preference are good as well I think.

  18. murchandamus commented at 6:16 pm on September 13, 2022: contributor

    In your post you describe weight as:

    weight: total weight of the input set

    Which I think is a bit confusing since you’re defining weight in terms of itself.

    Oh, I’m explaining what the components of the formula, not defining terms.

    Reading the history of weight it looks like it can also be called vbytes which to me helps make sense of what weight is (a measure of data size after some possible transform).

    I guess I could have added a bit more detail here in my blog post. There are four terms referring to different expressions of transaction size in Bitcoin:

    • size (raw byte length):
      The actual amount of bytes necessary to transfer or store a transaction. The relevant metric in the context of the pre-segwit blocksize limit.
    • weight The weight of a transaction with 1 weightunit per byte of witness data and 4 weightunits per byte of non-witness data. Used in the context of the post-segwit blockweight limit.
    • virtual size Derived from weight by dividing by 4, to make it more human-comparable to values people were used to before segwit.
    • stripped size The remaining size of a transaction after pruning the witness data. Relevant for non-segwit only, who don’t process witness data and continue to enforce the blocksize limit.

    The formula works out when using weight and the feerate is given in sat/kWU. If the feerate were given in sat/kvB as Bitcoin Core usually uses, one should use the vsize.

    I think edge_weight = waste + vbytes * (feerate - longterm_feerate) would be preferable.

    It’s not clear to me what you’re proposing here, given that vbytes * (feerate - longterm_feerate) is already a component of waste. I also don’t see what the inspiration for “edge” in this context would be.

  19. yancyribbens commented at 11:08 am on September 14, 2022: contributor

    Thanks for the explanation.

    It’s not clear to me what you’re proposing here, given that vbytes * (feerate - longterm_feerate) is already a component of waste

    I had thought weight and virtual size where interchangeable terms although it looks like I was mistaken.

    I also don’t see what the inspiration for “edge” in this context would be.

    The use of edge here was to try and avoid confusion with block weight. I agree with your earlier post that overloading the therm weight is confusing. Using edge_weight would make it clear we are talking about the weight of each edge (branch), not the block weight. Although, perhaps waste_score or opportunity_cost is better since weight is not in the name at all.

  20. murchandamus commented at 6:49 pm on September 15, 2022: contributor
    As a follow-up to clarify perhaps: weight in the context of the waste score is not at all redefined. It’s exactly the same “weight” as in the context of the block weight limit and transaction weight. I just clarify for the formula that only the weight of the inputs is relevant in the context. You could of course use the complete weight of the resulting transaction, but the weight of the transaction header and recipient outputs would be the same for all solutions and thus just shift all compared values by a fixed offset respective to comparing the weight of the inputs.
  21. yancyribbens commented at 8:39 am on September 16, 2022: contributor

    weight in the context of the waste score is not at all redefined.

    I understand that. Originally, I had proposed re-defining waste to be a component of weight since I think a negate waste is confusing.

    I’m afraid that overloading the term “weight”

    Now that I understand better what block weight is, I agree that overloading the term would be confusing.

    you could of course use the complete weight of the resulting transaction

    I’m not sure what is meant by this. Could you clarify what you mean?

  22. murchandamus commented at 6:45 pm on September 16, 2022: contributor

    Let’s say you’re building a transaction that pays two recipients to a P2SH and a P2WSH output. Your UTXO pool consists only of segwit UTXOs, and let’s say we only find two input set candidates, one with a single P2SH-P2WSH input and one with two P2WSH inputs.

    The weight of the transaction with two P2WSH inputs is:

        2×272 + 42 + 124 + 172
    

    and the weight of the transaction with one P2SH-P2WSH input is:

       1×364 + 42 + 124 + 172
    

    As you can see, whatever the current feerate in that calculation the cost stemming from the transaction header and the two outputs (42 + 124 + 172) will be the same across all evaluated input sets and is therefore irrelevant if we’re just looking for the minimal waste score.

  23. yancyribbens commented at 8:27 am on September 17, 2022: contributor

    Alright, I see what you mean. The point is that the difference in feerate does not need to take into account the difference in feerate for different transaction types. It’s only important that the difference in fee changes the waste score in a way that sways the UTXO selection process, but shifting by a fixed value IE the difference in transaction weight isn’t important for the selection to work.

    Back to the earlier conversation, maybe we can settle on a code change PR which changes the terminology to waste_score instead of waste. Anything with weight in the name seems of of the question and a more radical change like opportunity_cost or input_set_preference would require too much change to existing documentation.

  24. murchandamus commented at 10:06 pm on September 18, 2022: contributor

    Alright, I see what you mean. The point is that the difference in feerate does not need to take into account the difference in feerate for different transaction types.

    I’m not sure I understand what you mean. The targeted feerate is fixed across the complete transaction building process. It’s a user input before we start picking inputs.

    It’s only important that the difference in fee changes the waste score in a way that sways the UTXO selection process, but shifting by a fixed value IE the difference in transaction weight isn’t important for the selection to work.

    The different weights of different input types do matter, larger outputs have a more negative waste score at low feerates and a higher waste score at high feerates—so legacy outputs get picked preferentially at low feerates, while they get selected less at high feerates.
    It’s the transaction parts of the transaction that are preset (the header and recipient outputs) that are the same for each selection attempt. We can ignore their transaction weight because they are the same regardless of what input set we pick, and therefore they don’t sway the input selection.

    Back to the earlier conversation, maybe we can settle on a code change PR which changes the terminology to waste_score instead of waste. Anything with weight in the name seems of of the question and a more radical change like opportunity_cost or input_set_preference would require too much change to existing documentation.

    TBH, I’m not convinced that renaming waste to waste_score is that significant an improvement of readability. Perhaps it would be more useful to add a line or two to the documentation to explain that waste can be negative and how it works.

  25. yancyribbens commented at 7:39 am on September 19, 2022: contributor

    The different weights of different input types do matter..

    That makes sense, I see what you mean now.

    TBH, I’m not convinced that renaming waste to waste_score is that significant an improvement of readability.

    Alright. Thanks for entertaining the possibility.

  26. murchandamus deleted the branch on Jun 22, 2023
  27. bitcoin locked this on Jun 21, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-02-06 21:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me