wallet: Remove redundant waste calculation

yancyribbens commented at 9:52 am on September 11, 2022: contributor

The SelectCoinsBnB algorithm computes the waste score while it’s trying to find a match. Then, if it finds a match, it computes the waste score a second time here. To simplify the code (particularly since this is a complex section) its easier to digest and understand if it does one thing, to find a possible selection. I don’t think having it calculate the waste score twice adds much and we can depend on the test harness to make sure the waste score is correct.

DrahtBot added the label Wallet on Sep 11, 2022

yancyribbens force-pushed on Sep 12, 2022

wallet: Remove redundant waste calculation d78589322a

glozow requested review from murchandamus on Sep 12, 2022

DrahtBot commented at 4:59 pm on September 12, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#25932 (refactor: Simplify backtrack logic by yancyribbens)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

S3RK commented at 6:13 am on September 14, 2022: contributor

There are two uses for waste metric in coin selection:

BnB algorithm is designed to optimise for the waste. This is why keeping track of the current and best waste is required to make algorithm work
waste is used to compare solutions produced by different algorithms. This is why there is a generic waste calculation for the SelectionResult

Also, the two definitions of waste are coincidentally equal right now, but they don’t have to. There is a possibility that we will use additional heuristics for 2) without changing 1).

in src/wallet/coinselection.cpp:96 in d78589322a

92@@ -93,19 +93,24 @@ std::optional<SelectionResult> SelectCoinsBnB(std::vector<OutputGroup>& utxo_poo
93         bool backtrack = false;
94         if (curr_value + curr_available_value < selection_target || // Cannot possibly reach target with the amount remaining in the curr_available_value.
95             curr_value > selection_target + cost_of_change || // Selected value is out of range, go back and try other branch
96-            (curr_waste > best_waste && (utxo_pool.at(0).fee - utxo_pool.at(0).long_term_fee) > 0)) { // Don't select things which we know will be more wasteful if the waste is increasing

S3RK commented at 6:15 am on September 14, 2022:

By removing this criteria for backtracking you significantly expand the search space that will lead a) to longer execution time b) overall less less solutions found due to hard cap of TOTAL_TRIES.

yancyribbens commented at 7:23 am on September 14, 2022:

This doesn’t change the search space. With this extra condition removed, this code path is used instead. And in the case where curr_waste > best_waste using the code path L98, nothing different happens because of this condition. Since best_waste is better than curr_waste, nothing happens and in both branches, backtrack=true is set. Running the benchmarks will show there’s no difference.

yancyribbens commented at 7:28 am on September 14, 2022:

@S3RK The search space is not changed. The code is now simpler, however, the change could be pulled from this PR and added to a new PR since it’s independent of other improvements.

S3RK commented at 5:15 pm on September 14, 2022:

I agree that without limitation to the number of iterations you’d get the same result with your algorithm. But we actually have a hard limit of TOTAL_TRIES. That means if there are more combinations of coins than TOTAL_TRIES we’re not going to explore all of them. This is why we want to exit as early as possible, in particular this condition allows to cut off branches where we didn’t reach the target yet but the solution is already worse than our current best.

in src/wallet/coinselection.cpp:101 in d78589322a

 98             backtrack = true;
 99         } else if (curr_value >= selection_target) {       // Selected value is within range
100-            curr_waste += (curr_value - selection_target); // This is the excess value which is added to the waste for the below comparison
101+
102+            // This is the excess waste for the current iteration.
103+            curr_waste = (curr_value - selection_target);

S3RK commented at 6:16 am on September 14, 2022:

This is just the excess not the waste. Check line42 in this file or GetSelectionWaste for the waste formula.

yancyribbens commented at 3:27 pm on September 14, 2022:

@S3RK shouldn’t the variable be called excess_val or something instead of curr_waste then? To me either would make sense although as the variable is named right now, it doesn’t seem to be called excess.

S3RK commented at 5:16 pm on September 14, 2022:

No, waste Contains excess but there are other components to it

yancyribbens commented at 3:59 pm on September 14, 2022: contributor

@S3RK thanks for taking a look and your comment. I understand the idea to formalize a scoring system so we can have a metric that’s algorithm independent. However, the boolean check done here is all that’s needed to ensure we minimize our UTXO set when fees are expensive:

(utxo_pool.at(0).fee - utxo_pool.at(0).long_term_fee) > 0

Which as I pointed out in a previous PR is a more confusing way of saying: utxo_pool.at(0).fee > utxo_pool.at(0).long_term_fee

I guess we could leave the code as is where it independently computes the waste, although as you can see by this PR, it’s not actually needed. IE if you notice, all tests still pass even though it no longer tracks the “waste score”. Instead I suppose we get rid of the other waste calulation.

Also, the two definitions of waste are coincidentally equal right now, but they don’t have to. There is a possibility that we will use additional heuristics for 2) without changing 1).

I think what you’re saying is we don’t acctually need this assert then to check for equality?

S3RK commented at 6:51 pm on September 14, 2022: contributor

However, the boolean check done here is all that’s needed to ensure we minimize our UTXO set when fees are expensive:

I addressed the question about the Boolean check in the inline thread. Here however, I want to warn against mis-representing the BnB algo. BnB doesn’t just minimize UTXO set when fees are high, it optimizes for the waste metric that consists of two parts 1) excess 2) opportunity cost of spending inputs at long_term_fee_rate

I guess we could leave the code as is where it independently computes the waste, although as you can see by this PR, it’s not actually needed. IE if you notice, all tests still pass even though it no longer tracks the “waste score”.

If I remember the tests correctly, then this rather shows deficiency in tests rather than the correctness of implemented approach. 1) There is only one test where we hit TOTAL_TRIES limit and in this test we just verify there is no solution found 2) There are no tests asserting waste that has both components excess and input opportunity costs

I’d recommend you contribute these tests first before modifying the algorithm.

Instead I suppose we get rid of the other waste calulation.

The other waste calculation is used to set waste score for the BnB SelectionResult for the purpose of comparing it to other solution. See use-case 2 from my previous comment.

I think what you’re saying is we don’t acctually need this assert then to check for equality?

While I’d say we don’t really need this assert, others might disagree and there’s no harm in it.

achow101 commented at 7:34 pm on September 14, 2022: member

NACK @S3RK is correct and this change would be detrimental. Waste in BnB is used as an optimizing parameter - it allows us to prune branches when searching. Removing it means that we are searching more branches (which impacts the iteration limit), and the change which makes it only the excess means that we may be pruning branches that we were not pruning previously, in addition to not pruning many branches that we would have.

achow101 closed this on Sep 14, 2022

murchandamus commented at 7:44 pm on September 15, 2022: contributor

Yeah, @S3RK and @achow101 are correct. In a high-feerate situation (i.e. feerate > long_term_feerate_estimate) adding more inputs will only increase the waste. So if a partial input set already has a higher waste than the current best solution we cannot find a better solution (one with a lower waste) by adding more inputs. Thus, if the waste of a partial input set is higher than the best waste, we can skip the entire subtree from the search space. I believe this backtrack clause to be a major speed-up, because otherwise evaluating all possible subsets of the cut subtree would take up to exponential work in the size of the subtree.

I think that you were onto this optimization though: https://github.com/bitcoin/bitcoin/pull/26104

bitcoin locked this on Sep 15, 2023

wallet: Remove redundant waste calculation #26061

Conflicts