This PR adds tests for some edge cases in coin selection.
There are 4 sections, grouped by the type of UTXO pool:
- a few small inputs of equal value
- one big input and many small inputs
- just one input
- one big and one small input
```python
tx1 = w0.sendtoaddress(w1.getnewaddress("", "bech32"), 1)
tx2 = w0.sendtoaddress(w1.getnewaddress("", "bech32"), 1)
tx3 = w0.sendtoaddress(w1.getnewaddress("", "legacy"), 1)
self.generate(self.nodes[0], 1)

# TODO: set long_term_fee_rate explicitly
```
I noticed there is a todo left here. Is that an open question?
fixed that
```python
w0.sendtoaddress(w1.getnewaddress("", "legacy"), 0.05)

# We will target a solution with 20 coins that will add up exactly to 1btc.
# Add a coin that will cover all the fees and exactly match the target.
# TODO: fix that knapsack targets change even if there is no change needed
# (148*20+42+31)*350+5000000 = 6061550
```
What's the 42 here? Is that the header plus a P2SH output?
42 is the header (11) + a P2WPKH output (31). I'll update the comment.
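For reviewers checking the numbers: the quoted arithmetic does work out. A quick sketch of the fee calculation, taking the vbyte sizes and the 350 sat/vbyte rate directly from the quoted comment (the meaning of the trailing `+31` is my reading of the formula, not confirmed by the test):

```python
P2PKH_INPUT = 148        # legacy (P2PKH) input size in vbytes, per the comment
HEADER_PLUS_OUTPUT = 42  # tx header (11) + P2WPKH output (31)
EXTRA_OUTPUT = 31        # the extra +31 term from the quoted formula
FEE_RATE = 350           # sat/vbyte, from the test
REMAINDER = 5_000_000    # sat, i.e. the 0.05 BTC the added coin must match

tx_vbytes = P2PKH_INPUT * 20 + HEADER_PLUS_OUTPUT + EXTRA_OUTPUT  # 3033
coin_value = tx_vbytes * FEE_RATE + REMAINDER
print(coin_value)  # 6061550
```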
```python
# TODO: implement optimization and uncomment the assertions
# assert_equal(len(tx['vin']), 1)
# assert_equal(len(tx['vout']), 1)
# assert(tx['vin'][0]['txid'] in [tx1, tx2])

# fees for 1 legacy input tx = (148+42)*20 = 3800sat
```
I'm wondering what the 42 is here. Is that the output plus header? My first intuition was that it was just the output, but then I realized that it probably is the whole transaction weight that's being calculated. Does that mean that we're expecting a wrapped segwit output here? Perhaps you could split up the calculation a bit more for clarity, e.g. (10+148+32), if I interpret that correctly.
Yes, 42 is output + header. I'll update the comment
```python
def select_coins_for(target):
    result = w1.fundrawtransaction(w1.createrawtransaction([], {dummyaddress: target}), {"fee_rate": fee_rate})
    return w1.decoderawtransaction(result['hex'])

# 1-input-1-output tx is 110vbytes which results in 2200sat fee at 20sat/vbyte
```
```python
# 1-input-1-output tx is 110vbytes (42/4+68+31) which results in 2200sat fee at 20sat/vbyte
```
I'll update the comment
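To spell out the size arithmetic behind that suggested comment (a sketch using the numbers given there: 42 weight units of header, a 68-vbyte P2WPKH input, a 31-vbyte P2WPKH output; the rounding up to whole vbytes is my assumption):

```python
import math

HEADER_WEIGHT = 42   # tx header in weight units; divide by 4 for vbytes
P2WPKH_INPUT = 68    # vbytes
P2WPKH_OUTPUT = 31   # vbytes
FEE_RATE = 20        # sat/vbyte, from the test

vbytes = math.ceil(HEADER_WEIGHT / 4 + P2WPKH_INPUT + P2WPKH_OUTPUT)  # ceil(109.5) = 110
fee = vbytes * FEE_RATE
print(vbytes, fee)  # 110 2200
```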
```python
assert_equal(len(tx['vin']), 1)
assert_equal(len(tx['vout']), 1)
assert_equal(tx['vin'][0]['txid'], tx3)

# a solution with two bech32 inputs and a change is the most efficient
# but it could only be found by SRD
```
I wondered for a bit why the solution could not be found by Knapsack, but then I realized that it probably would be. However, a solution with one bech32 input and one legacy input would be preferred, because it overshoots the target less by spending more in fees. Given the 1000 iterations, Knapsack would definitely find that solution.
However, then if SRD also found the solution with two bech32 inputs, it would get preferred per the waste metric. Was that your reasoning as well?
Exactly. Knapsack prefers a solution with legacy+bech32 as it's closer to the target + MIN_CHANGE due to higher fees. I'll update the comment
```python
# This is more than cost of change and BnB won't find a solution with single input
# We either get
# 1) 2x bech32 solution (SRD only)
# 2) legacy + bech32 solution
tx = select_coins_for(0.99994899)
# legacy + dropping more for fees is still better in terms of waste than legacy + bech32
```
What do you think would be the outcome if we changed knapsack to keep the solution with the best waste instead of the solution with the smallest change?
To keep the solution with the best waste out of 1000 trials? Hm… that would fix this test case, but it'd make Knapsack behave exactly like running SRD in a loop. I guess we could then just drop Knapsack altogether.
```python
w0 = self.nodes[0].get_wallet_rpc(self.default_wallet_name)
self.nodes[0].createwallet(wallet_name="test_one_big_and_many_small_coins")
w1 = self.nodes[0].get_wallet_rpc("test_one_big_and_many_small_coins")

# Create a bunch of small coins
for i in range(100):
```
I was just looking into this user asking about a transaction with 856 inputs a few days ago. I'd be curious to see if we could still reproduce behavior like that on a wallet with a UTXO pool of only about 1,000. I think SRD would hopefully make that better, but I would say we could even go higher than 100 here. :)
Actually, I was trying to keep the number of small coins as low as possible while still demonstrating some inefficiencies of coin selection.
What is the test scenario that would be possible with 1000 coins, but not possible with 100 coins?
P.S. I think we can definitely reproduce this scenario; I see no reason why not. If the UTXO pool is equivalent to this test case, then even with SRD a better solution is only found in ~856/1000 = 85% of cases, which leaves 15% to a really bad knapsack alternative.
Yeah, I think it would be easy to produce the scenario:
- 100 small coins, 1 big coin
- no BnB solution
- a large set of the small coins produces a smaller change output than the 1 big coin, so knapsack prefers the former
- SRD only finds the 1 big coin by itself as a solution in ~1/101 cases
⇒ usually the SRD result wins; its waste is worse than the best solution's, but better than knapsack's
One could probably make it even less likely that the best solution is found by SRD by having two big ones that are both necessary
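The ~1/101 figure above can be sanity-checked with a toy simulation. This is a hypothetical sketch, not the actual SRD implementation: real SRD draws coins without replacement until the target is met, but the big-coin-alone solution still requires the big coin to come up first, so a single uniform draw models that probability:

```python
import random

def big_coin_drawn_first(n_small=100, trials=200_000, seed=1):
    # SRD can return "the 1 big coin by itself" only when the big coin
    # is the first coin drawn: probability 1/(n_small + 1)
    rng = random.Random(seed)
    coins = ["big"] + ["small"] * n_small
    hits = sum(rng.choice(coins) == "big" for _ in range(trials))
    return hits / trials

rate = big_coin_drawn_first()
# rate should land close to 1/101 ≈ 0.0099
```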
Isn't this exactly test case 3) in test_one_big_and_many_small_coins?
```python
for i in range(100):
    w0.sendtoaddress(w1.getnewaddress("", "legacy"), 0.05)

# We will target a solution with 20 coins that will add up exactly to 1btc.
# Add a coin that will cover all the fees and exactly match the target.
# TODO: fix that knapsack targets change even if there is no change needed
```
I'm not sure we should be improving Knapsack. I'd rather get rid of it, but only after we have better coin selection than Knapsack. @achow101 and I were looking into why Knapsack still produces such a high percentage of the selection results in simulations and realized that after all those 1000 random walks, it'll still prefer lowest_larger if the latter produces a smaller change (that's still bigger than min_change).
Also, even when Knapsack can't find changeless transactions, wouldn't the expectation be that BnB finds it?
I'm not really arguing for improving it. I guess one way to fix that would be to remove the Knapsack run that targets no change and leave only the Knapsack run that targets MIN_CHANGE.
P.S. BnB won't produce the same solutions even in situations where a changeless solution is possible, because BnB optimizes for waste. But I think this is okay.
```python
# lock big coin
w1.lockunspent(False, [tx for tx in w1.listunspent() if tx['txid'] == txid])
return fee_total, max_fee

# in all the scenarios below the best solution would be to take one big coin and create a change output
best_fee = (42 + 68 + 31) * fee_rate / COIN
```
42 kinda throws me off a bit, as before, but I'm still assuming it's header + output.
Yes, it is. I'll split up the calculation.
```python
assert_approx(fee_total, best_fee * 100)

# 1) knapsack latches to the exact match solution and can't find the best solution
# SRD can find a solution better than knapsack if draws the big coin in the first 21 tries
# chances for SRD to find it are roughly 2*(21/102) = ±40%.
# Chances are doubled due to coinselection run twice with and without APS enabled
```
What's "APS"?
Avoid partial spends. We run transaction building twice (with and without APS).
> What's "APS"?
For more detail, see the configuration flag `-maxapsfee` added by #14582 in 0.21, which sets the maximum allowed avoid partial spends (APS) fee. It defaults to 0 (APS is only used when the fee is the same with and without it), and setting it to -1 will disable APS, unless `-avoidpartialspends` is set.
```python
assert_approx(max_fee, best_fee)
assert_approx(fee_total, best_fee * 100)

# 1) knapsack latches to the exact match solution and can't find the best solution
# SRD can find a solution better than knapsack if draws the big coin in the first 21 tries
# chances for SRD to find it are roughly 2*(21/102) = ±40%.
```
I think the chance is (1-(101/102)×(100/101)×…×(81/82)) = (1-81/102)… oh, I see, that's (21/102). So a better SRD solution would need to occur at least once across the two runs, which is 1-(81/102)² = 36.93%… okay, I gotcha.
Yes, that is correct. Not sure if I can improve the comment here somehow.
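For anyone reproducing the number above, exact fractions make the two-run probability easy to verify (a sketch; the 21-of-102 single-run chance and the doubling from the APS second run are taken from the comment being discussed):

```python
from fractions import Fraction

# single run: SRD must draw the big coin within the first 21 of 102 draws
p_single = 1 - Fraction(81, 102)        # telescoped product = 21/102

# selection runs twice (with and without APS), so at-least-one-hit chance:
p_either = 1 - (1 - p_single) ** 2

print(float(p_single), float(p_either))  # ~0.2059 and ~0.3694
```

Note the rough `2*(21/102) ≈ 41%` in the test comment slightly overstates the exact 36.93%, since it double-counts the case where both runs succeed.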
```python
# knapsack always finds the best solution
fee_total, max_fee = select_coins(big_coin=1.01051, target=1.00000001, iterations=100)
assert_approx(max_fee, best_fee)
assert_approx(fee_total, best_fee * 100)

# 1) knapsack latches to the exact match solution and can't find the best solution
```
We've previously used "exact match" for any changeless solution, so it wasn't immediately obvious to me that it was used in the literal sense here. Perhaps you should clarify that you mean that it's literally the same to the sat.
Will update the comment
Updated, but I think we should stop using "exact match" in place of "changeless solution". A changeless solution is a solution in a range around the exact match, constrained by the cost of change/discard fee rate.
```python
assert_approx(max_fee, knapsack_fee)
# total fee is lower due to lucky SRD solutions
assert_greater_than(knapsack_fee * 100, fee_total)
assert_greater_than(fee_total, best_fee * 100)

# 2) no exact match, effective value of the big coin is smaller than target + MIN_CHANGE
```
I noticed another use of "exact match" here. :)
Thanks. Will retouch the comment
```python
# total fee is lower due to lucky SRD solutions
assert_greater_than(knapsack_fee * 100, fee_total)
assert_greater_than(fee_total, best_fee * 100)

# 2) no exact match, effective value of the big coin is smaller than target + MIN_CHANGE
# BnB can find the same solution from previous test but dropping the change output to fees
```
```python
# BnB will find the same solution from previous test but dropping the change output to fees
```
BnB is deterministic, so if it "can" I would expect that it does? Does "change output" mean "remainder" here? Is this the BnB without an upper bound, or the variant we currently use?
Any BnB, since we're within the cost of change bound. I'll retouch the comment
```python
assert_greater_than(srd_fee * 100, fee_total)
assert_greater_than(fee_total, best_fee * 100)

# 4) knapsack latches to exact match + MIN_CHANGE
# SRD still occasionally find better solution
fee_total, max_fee = select_coins(big_coin=1.00051, target=0.99, iterations=100)
```
It's a bit hard for me to follow how the exact values were picked, since both the big_coin and target are changing. I haven't tried to reproduce each calculation step by step in this first review, and I do see that many of your test cases explain some of the calculation, but I noticed that this one, for example, does not.
Since it's not trivial for me to follow, I kinda expect that other reviewers might also need a bit help. :grin:
This is basically the same as test case "1)" in this section, but all the values decreased by MIN_CHANGE. The idea is to demonstrate that Knapsack latches not only to down-to-satoshi matches. I'll improve the comment
```python
# if change is > MIN_CHANGE than we just spend big coin and create a change
tx = select_coins_for(49.9)
assert_equal(len(tx['vin']), 1)
assert_equal(len(tx['vout']), 2)

# if small coins > MIN_CHANGE than we just add one small coin to the big one and create change
```
This comment confuses me. The big coin doesn't have enough funds to pay the fees and create the output here, so we would always need both coins.
Oops. The idea was that you have more than 1 small coin. If a small coin is > MIN_CHANGE, then the tests in test_one_big_and_many_small_coins don't work. Any idea how to improve that? For now, I added more than one small coin in this test case.
Cool! Concept ACK. Did a light review, haven't reproduced all the numbers.
Retouched comments to address Xekyo's feedback.
Looks good. What is the next step you plan for this PR? How can one help?
I thought this PR could be merged roughly as it is (addressing all the concerns, of course), as I believe the test is valuable in itself.
As a follow-up, I'd like to fix some of the inefficiencies of coin selection by increasing the BnB upper limit. But for that we need to improve how we drop change for fees. I tried one approach in #23367 but it doesn't work that well. So I'm going to split #23367 into parts that can be merged independently (the first one -> #24649).
Let's talk at the next wallet meeting.
While I appreciate more tests, these tests seem to be extremely specific to the way that coin selection currently works and look like they would start failing as soon as we change something, such as removing knapsack. So I'm not sure if we actually want to have these tests if they are going to be removed and/or require a ton of recalculation for future coin selection changes.
If we do want these tests, then I don't think it should use legacy addresses so much as hopefully we won't make those by default soon.
> While I appreciate more tests, these tests seem to be extremely specific to the way that coin selection currently works and look like they would start failing as soon as we change something, such as removing knapsack. So I'm not sure if we actually want to have these tests if they are going to be removed and/or require a ton of recalculation for future coin selection changes.
Agree, these are not your normal tests, primarily because it's hard to define the "right" or "expected" outcome. I created these tests to expose some shortcomings of current coin selection and justify increasing the BnB limit (#24752). It'd be nice to have them around in some form to avoid regressions; maybe we can incorporate them somehow into simulations. wdyt?
> If we do want these tests, then I don't think it should use legacy addresses so much as hopefully we won't make those by default soon.
To throw my relatively unproductive opinion into the mix, this is a staggeringly large number of magic numbers. Ideally all numbers in tests are "obvious" or programmatic. It's a nightmare to maintain otherwise.
Didn't read or think enough about the overall thrust of the tests.
Thanks for the feedback. Maybe a better way to verify coin selection would be to integrate some simulation/benchmark into this repo.