This PR adds tests for some edge cases in coin selection.
There are 4 sections by the type of utxo pool:
- a few small inputs of equal value
- one big input and many small inputs
- just one input
- one big and one small input
This PR adds tests for some edge cases in coin selection.
There are 4 sections by the type of utxo pool:
36+ tx1 = w0.sendtoaddress(w1.getnewaddress("", "bech32"), 1)
37+ tx2 = w0.sendtoaddress(w1.getnewaddress("", "bech32"), 1)
38+ tx3 = w0.sendtoaddress(w1.getnewaddress("", "legacy"), 1)
39+ self.generate(self.nodes[0], 1)
40+
41+ # TODO: set long_term_fee_rate explicitly
141+ w0.sendtoaddress(w1.getnewaddress("", "legacy"), 0.05)
142+
143+ # We will target a solution with 20 coins that will add up exactly to 1btc.
144+ # Add a coin that will cover all the fees and exactly match the target.
145+ # TODO: fix that knapsack targets change even if there is no change needed
146+ # (148*20+42+31)*350+5000000 = 6061550
83+ # TODO: implement optimization and uncomment the assertions
84+ # assert_equal(len(tx['vin']), 1)
85+ # assert_equal(len(tx['vout']), 1)
86+ # assert(tx['vin'][0]['txid'] in [tx1, tx2])
87+
88+ # fees for 1 legacy input tx = (148+42)*20 = 3800sat
56+
57+ def select_coins_for(target):
58+ result = w1.fundrawtransaction(w1.createrawtransaction([], {dummyaddress: target}), {"fee_rate": fee_rate})
59+ return w1.decoderawtransaction(result['hex'])
60+
61+ # 1-input-1-output tx is 110vbytes which results in 2200sat fee at 20sat/vbyte
0 # 1-input-1-output tx is 110vbytes (42/4+68+31) which results in 2200sat fee at 20sat/vbyte
106+ assert_equal(len(tx['vin']), 1)
107+ assert_equal(len(tx['vout']), 1)
108+ assert_equal(tx['vin'][0]['txid'], tx3)
109+
110+ # a solution with two bech32 inputs and a change is the most efficient
111+ # but it could only be found by SRD
I was wondering for a bit why the solution could not be found by Knapsack, but then I realized that it probably would, but then a solution with one bech32 input and one legacy input would be preferred because it overshoots the target less by spending more in fees. Given the 1000 iterations, Knapsack would definitely find that solution.
However, then if SRD also found the solution with two bech32 inputs, it would get preferred per the waste metric. Was that your reasoning as well?
118+ # This is more than cost of change and BnB won't find a solution with single input
119+ # We either get
120+ # 1) 2x bech32 solution (SRD only)
121+ # 2) legacy + bech32 solution
122+ tx = select_coins_for(0.99994899)
123+ # legacy + dropping more for fees is still better in terms of waste than legacy + bech32
135+ w0 = self.nodes[0].get_wallet_rpc(self.default_wallet_name)
136+ self.nodes[0].createwallet(wallet_name="test_one_big_and_many_small_coins")
137+ w1 = self.nodes[0].get_wallet_rpc("test_one_big_and_many_small_coins")
138+
139+ # Create a bunch of small coins
140+ for i in range(100):
Actually I was trying to keep the number of small coins as a low as possible while still demonstrating some inefficiencies of coin selection.
What is the test scenario that would be possible with 1000 coins, but not possible with 100 coins?
P.S. I think we can definitely reproduce this scenario, I see no reasons why not? If utxo pool is equivalent to this test cases then even with SRD it will find better solution in ~856/1000 = 85%, which leaves 15% to a really bad knapsack alternative.
Yeah, I think it would be easy to produce the scenario:
• 100 small coins, 1 big coin • no BnB solution • a large set of the small coins produces a smaller change output than the 1 big coin, so knapsack prefers the former • SRD only finds the 1 big coin by itself as a solution in ~1/101 cases
⇒ usually SRD finds the best, but it’s a worse waste than best solution, but better than knapsack
One could probably make it even less likely that the best solution is found by SRD by having two big ones that are both necessary
test_one_big_and_many_small_coins
?
140+ for i in range(100):
141+ w0.sendtoaddress(w1.getnewaddress("", "legacy"), 0.05)
142+
143+ # We will target a solution with 20 coins that will add up exactly to 1btc.
144+ # Add a coin that will cover all the fees and exactly match the target.
145+ # TODO: fix that knapsack targets change even if there is no change needed
lowest_larger
if the latter produces a smaller change (that’s still bigger than min_change
).
I’m not really arguing for improving it. I guess maybe one way that fix that would be to just remove Knapsack that targets no change and leave only knapsack that targets a MIN_CHANGE.
P.S. BnB won’t produce same solutions even in the situations when a changeless solution is possible, because BnB optimizes for waste. But I think this is okay.
164+ # lock big coin
165+ w1.lockunspent(False, [tx for tx in w1.listunspent() if tx['txid'] == txid])
166+ return fee_total, max_fee
167+
168+ # in all the scenarios below the best solution would be to take one big coin and create a change output
169+ best_fee = (42 + 68 + 31) * fee_rate / COIN
175+ assert_approx(fee_total, best_fee * 100)
176+
177+ # 1) knapsack latches to the exact match solution and can't find the best solution
178+ # SRD can find a solution better than knapsack if draws the big coin in the first 21 tries
179+ # chances for SRD to find it are roughly 2*(21/102) = ±40%.
180+ # Chances are doubled due to coinselection run twice with and without APS enabled
What’s “APS”?
In more detail, see the configuration flag -maxapsfee
added by #14582 in 0.21, which sets the max allowed avoid partial spends (APS) fee. It defaults to 0 (fee is the same with and without APS), and setting it to -1 will disable APS, unless -avoidpartialspends
is set.
174+ assert_approx(max_fee, best_fee)
175+ assert_approx(fee_total, best_fee * 100)
176+
177+ # 1) knapsack latches to the exact match solution and can't find the best solution
178+ # SRD can find a solution better than knapsack if draws the big coin in the first 21 tries
179+ # chances for SRD to find it are roughly 2*(21/102) = ±40%.
172+ # knapsack always finds the best solution
173+ fee_total, max_fee = select_coins(big_coin=1.01051, target=1.00000001, iterations=100)
174+ assert_approx(max_fee, best_fee)
175+ assert_approx(fee_total, best_fee * 100)
176+
177+ # 1) knapsack latches to the exact match solution and can't find the best solution
183+ assert_approx(max_fee, knapsack_fee)
184+ # total fee is lower due to lucky SRD solutions
185+ assert_greater_than(knapsack_fee * 100, fee_total)
186+ assert_greater_than(fee_total, best_fee * 100)
187+
188+ # 2) no exact match, effective value of the big coin is smaller than target + MIN_CHANGE
184+ # total fee is lower due to lucky SRD solutions
185+ assert_greater_than(knapsack_fee * 100, fee_total)
186+ assert_greater_than(fee_total, best_fee * 100)
187+
188+ # 2) no exact match, effective value of the big coin is smaller than target + MIN_CHANGE
189+ # BnB can find the same solution from previous test but dropping the change output to fees
0 # BnB will find the same solution from previous test but dropping the change output to fees
BnB is deterministic, so if it “can” I would expect that it does? Does “change output” mean “remainder” here?—Is this the BnB without an upper bound or the variant we currently use?
208+ assert_greater_than(srd_fee * 100, fee_total)
209+ assert_greater_than(fee_total, best_fee * 100)
210+
211+ # 4) knapsack latches to exact match + MIN_CHANGE
212+ # SRD still occasionally find better solution
213+ fee_total, max_fee = select_coins(big_coin=1.00051, target=0.99, iterations=100)
It’s a bit hard for me to follow how the exact values were picked, since both the big_coin
and target
are changing. I haven’t tried to step by step reproduce each calculation in this first review, and I do see that many of your test cases do explain some of the calculation, but I just noticed that this one for example does not.
Since it’s not trivial for me to follow, I kinda expect that other reviewers might also need a bit help. :grin:
249+ # if change is > MIN_CHANGE than we just spend big coin and create a change
250+ tx = select_coins_for(49.9)
251+ assert_equal(len(tx['vin']), 1)
252+ assert_equal(len(tx['vout']), 2)
253+
254+ # if small coins > MIN_CHANGE than we just add one small coin to the big one and create change
test_one_big_and_many_small_coins
doesn’t work. Any idea how to improve that? For now, added more than one small coin in this test case
I thought this PR can be merged roughly as it is (addressing all the concerns of course), as I believe the test is valuable in itself.
As a follow up I’d like to fix some of the inefficiencies of coin selection by increasing BnB upper limit. But for that we need to improve how we drop change for fees. I tried one approach in #23367 but it doesn’t work that well. So I’m going splitting #23367 into parts that could be merged independently (the first one -> #24649).
Let’s talk on the next wallet meeting.
While I appreciate more tests, these tests seem to be extremely specific to the way that coin selection currently works and look like they would start failing as soon as we change something, such as removing knapsack. So I’m not sure if we actually want to have these tests if they are going to be removed and/or require a ton of recalculation for future coin selection changes.
If we do want these tests, then I don’t think it should use legacy addresses so much as hopefully we won’t make those by default soon.
While I appreciate more tests, these tests seem to be extremely specific to the way that coin selection currently works and look like they would start failing as soon as we change something, such as removing knapsack. So I’m not sure if we actually want to have these tests if they are going to be removed and/or require a ton of recalculation for future coin selection changes.
Agree, these are not your normal tests, primarily because it’s hard to define the “right” or “expected” outcome. I created these tests to expose some shortcomings of current coin selection and justify increasing BnB limit (#24752). It’d be nice to have them around in some form to avoid regressions, maybe we can incorporate them somehow into simulations. wdyt?
If we do want these tests, then I don’t think it should use legacy addresses so much as hopefully we won’t make those by default soon.
To throw my relatively unproductive opinion into the mix, this is a staggeringly large number of magic numbers. Ideally all numbers in tests are “obvious” or programmatic. It’s a nightmare to maintain otherwise.
Didn’t read or think enough about overall thrust of the tests.