This PR adds tests for some edge cases in coin selection.
There are 4 sections, grouped by the type of UTXO pool:
- a few small inputs of equal value
- one big input and many small inputs
- just one input
- one big and one small input
```python
tx1 = w0.sendtoaddress(w1.getnewaddress("", "bech32"), 1)
tx2 = w0.sendtoaddress(w1.getnewaddress("", "bech32"), 1)
tx3 = w0.sendtoaddress(w1.getnewaddress("", "legacy"), 1)
self.generate(self.nodes[0], 1)

# TODO: set long_term_fee_rate explicitly
```
I noticed there is a todo left here. Is that an open question?
fixed that
```python
w0.sendtoaddress(w1.getnewaddress("", "legacy"), 0.05)

# We will target a solution with 20 coins that will add up exactly to 1btc.
# Add a coin that will cover all the fees and exactly match the target.
# TODO: fix that knapsack targets change even if there is no change needed
# (148*20+42+31)*350+5000000 = 6061550
```
What's the 42 here? Is that the header plus a P2SH output?
42 is the header (11) + a P2WPKH output (31). I'll update the comment.
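For reviewers checking the numbers: the quoted arithmetic does work out. A quick sketch of the fee calculation, taking the vbyte sizes and the 350 sat/vbyte rate directly from the quoted comment (the meaning of the trailing `+31` is my reading of the formula, not confirmed by the test):

```python
P2PKH_INPUT = 148        # legacy (P2PKH) input size in vbytes, per the comment
HEADER_PLUS_OUTPUT = 42  # tx header (11) + P2WPKH output (31)
EXTRA_OUTPUT = 31        # the extra +31 term from the quoted formula
FEE_RATE = 350           # sat/vbyte, from the test
REMAINDER = 5_000_000    # sat, i.e. the 0.05 BTC the added coin must match

tx_vbytes = P2PKH_INPUT * 20 + HEADER_PLUS_OUTPUT + EXTRA_OUTPUT  # 3033
coin_value = tx_vbytes * FEE_RATE + REMAINDER
print(coin_value)  # 6061550
```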
```python
# TODO: implement optimization and uncomment the assertions
# assert_equal(len(tx['vin']), 1)
# assert_equal(len(tx['vout']), 1)
# assert(tx['vin'][0]['txid'] in [tx1, tx2])

# fees for 1 legacy input tx = (148+42)*20 = 3800sat
```
I'm wondering what the 42 is here. Is that the output plus header? My first intuition was that it was just the output, but then I realized that it probably is the whole transaction weight that's being calculated. Does that mean that we're expecting a wrapped segwit output here? Perhaps you could split up the calculation a bit more for clarity, e.g. (10+148+32), if I interpret that correctly.
Yes, 42 is output + header. I'll update the comment
```python
def select_coins_for(target):
    result = w1.fundrawtransaction(w1.createrawtransaction([], {dummyaddress: target}), {"fee_rate": fee_rate})
    return w1.decoderawtransaction(result['hex'])

# 1-input-1-output tx is 110vbytes which results in 2200sat fee at 20sat/vbyte
```
```python
# 1-input-1-output tx is 110vbytes (42/4+68+31) which results in 2200sat fee at 20sat/vbyte
```
I'll update the comment
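To spell out the size arithmetic behind that suggested comment (a sketch using the numbers given there: 42 weight units of header, a 68-vbyte P2WPKH input, a 31-vbyte P2WPKH output; the rounding up to whole vbytes is my assumption):

```python
import math

HEADER_WEIGHT = 42   # tx header in weight units; divide by 4 for vbytes
P2WPKH_INPUT = 68    # vbytes
P2WPKH_OUTPUT = 31   # vbytes
FEE_RATE = 20        # sat/vbyte, from the test

vbytes = math.ceil(HEADER_WEIGHT / 4 + P2WPKH_INPUT + P2WPKH_OUTPUT)  # ceil(109.5) = 110
fee = vbytes * FEE_RATE
print(vbytes, fee)  # 110 2200
```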
```python
assert_equal(len(tx['vin']), 1)
assert_equal(len(tx['vout']), 1)
assert_equal(tx['vin'][0]['txid'], tx3)

# a solution with two bech32 inputs and a change is the most efficient
# but it could only be found by SRD
```
I wondered for a bit why the solution could not be found by Knapsack, but then I realized that it probably would be. However, a solution with one bech32 input and one legacy input would be preferred, because it overshoots the target less by spending more in fees. Given the 1000 iterations, Knapsack would definitely find that solution.
However, then if SRD also found the solution with two bech32 inputs, it would get preferred per the waste metric. Was that your reasoning as well?
Exactly. Knapsack prefers a solution with legacy+bech32 as it's closer to the target + MIN_CHANGE due to higher fees. I'll update the comment
```python
# This is more than cost of change and BnB won't find a solution with single input
# We either get
# 1) 2x bech32 solution (SRD only)
# 2) legacy + bech32 solution
tx = select_coins_for(0.99994899)
# legacy + dropping more for fees is still better in terms of waste than legacy + bech32
```
What do you think would be the outcome if we changed knapsack to keep the solution with the best waste instead of the solution with the smallest change?
To keep the solution with the best waste out of 1000 trials? Hm… that would fix this test case, but it'd make Knapsack behave exactly like running SRD in a loop. I guess we could then just drop Knapsack altogether.
```python
w0 = self.nodes[0].get_wallet_rpc(self.default_wallet_name)
self.nodes[0].createwallet(wallet_name="test_one_big_and_many_small_coins")
w1 = self.nodes[0].get_wallet_rpc("test_one_big_and_many_small_coins")

# Create a bunch of small coins
for i in range(100):
```
I was just looking into this user asking about a transaction with 856 inputs a few days ago. I'd be curious to see if we could still reproduce behavior like that on a wallet with a UTXO pool of only about 1,000. I think SRD would hopefully make that better, but I would say we could even go higher than 100 here. :)
Actually, I was trying to keep the number of small coins as low as possible while still demonstrating some inefficiencies of coin selection.
What is the test scenario that would be possible with 1000 coins, but not possible with 100 coins?
P.S. I think we can definitely reproduce this scenario; I see no reason why not. If the UTXO pool is equivalent to this test case, then even with SRD a better solution is only found in ~856/1000 = 85% of cases, which leaves 15% to a really bad knapsack alternative.
Yeah, I think it would be easy to produce the scenario:
- 100 small coins, 1 big coin
- no BnB solution
- a large set of the small coins produces a smaller change output than the 1 big coin, so knapsack prefers the former
- SRD only finds the 1 big coin by itself as a solution in ~1/101 cases
⇒ usually the SRD result wins; its waste is worse than the best solution's, but better than knapsack's
One could probably make it even less likely that the best solution is found by SRD by having two big ones that are both necessary
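The ~1/101 figure above can be sanity-checked with a toy simulation. This is a hypothetical sketch, not the actual SRD implementation: real SRD draws coins without replacement until the target is met, but the big-coin-alone solution still requires the big coin to come up first, so a single uniform draw models that probability:

```python
import random

def big_coin_drawn_first(n_small=100, trials=200_000, seed=1):
    # SRD can return "the 1 big coin by itself" only when the big coin
    # is the first coin drawn: probability 1/(n_small + 1)
    rng = random.Random(seed)
    coins = ["big"] + ["small"] * n_small
    hits = sum(rng.choice(coins) == "big" for _ in range(trials))
    return hits / trials

rate = big_coin_drawn_first()
# rate should land close to 1/101 ≈ 0.0099
```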
Isn't this exactly test case 3) in test_one_big_and_many_small_coins?
```python
for i in range(100):
    w0.sendtoaddress(w1.getnewaddress("", "legacy"), 0.05)

# We will target a solution with 20 coins that will add up exactly to 1btc.
# Add a coin that will cover all the fees and exactly match the target.
# TODO: fix that knapsack targets change even if there is no change needed
```
I'm not sure we should be improving Knapsack. I'd rather get rid of it, but only after we have better coin selection than Knapsack. @achow101 and I were looking into why Knapsack still produces such a high percentage of the selection results in simulations and realized that after all those 1000 random walks, it'll still prefer lowest_larger if the latter produces a smaller change (that's still bigger than min_change).
Also, even when Knapsack can't find changeless transactions, wouldn't the expectation be that BnB finds it?
I'm not really arguing for improving it. I guess one way to fix that would be to remove the Knapsack run that targets no change and leave only the Knapsack run that targets MIN_CHANGE.
P.S. BnB won't produce the same solutions even in situations where a changeless solution is possible, because BnB optimizes for waste. But I think this is okay.
```python
# lock big coin
w1.lockunspent(False, [tx for tx in w1.listunspent() if tx['txid'] == txid])
return fee_total, max_fee

# in all the scenarios below the best solution would be to take one big coin and create a change output
best_fee = (42 + 68 + 31) * fee_rate / COIN
```
42 kinda throws me off a bit, as before, but I'm still assuming it's header + output.
Yes, it is. I'll split up the calculation.
```python
assert_approx(fee_total, best_fee * 100)

# 1) knapsack latches to the exact match solution and can't find the best solution
# SRD can find a solution better than knapsack if draws the big coin in the first 21 tries
# chances for SRD to find it are roughly 2*(21/102) = ±40%.
# Chances are doubled due to coinselection run twice with and without APS enabled
```
What's "APS"?
Avoid partial spends. We run transaction building twice (with and without APS).
> What's "APS"?
For more detail, see the configuration flag `-maxapsfee` added by #14582 in 0.21, which sets the maximum allowed avoid partial spends (APS) fee. It defaults to 0 (APS is only used when the fee is the same with and without it), and setting it to -1 will disable APS, unless `-avoidpartialspends` is set.
```python
assert_approx(max_fee, best_fee)
assert_approx(fee_total, best_fee * 100)

# 1) knapsack latches to the exact match solution and can't find the best solution
# SRD can find a solution better than knapsack if draws the big coin in the first 21 tries
# chances for SRD to find it are roughly 2*(21/102) = ±40%.
```
I think the chance is (1-(101/102)×(100/101)×…×(81/82)) = (1-81/102)… oh, I see, that's (21/102). So a better SRD solution would need to occur at least once across the two runs, which is 1-(81/102)² = 36.93%… okay, I gotcha.
Yes, that is correct. Not sure if I can improve the comment here somehow.
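For anyone reproducing the number above, exact fractions make the two-run probability easy to verify (a sketch; the 21-of-102 single-run chance and the doubling from the APS second run are taken from the comment being discussed):

```python
from fractions import Fraction

# single run: SRD must draw the big coin within the first 21 of 102 draws
p_single = 1 - Fraction(81, 102)        # telescoped product = 21/102

# selection runs twice (with and without APS), so at-least-one-hit chance:
p_either = 1 - (1 - p_single) ** 2

print(float(p_single), float(p_either))  # ~0.2059 and ~0.3694
```

Note the rough `2*(21/102) ≈ 41%` in the test comment slightly overstates the exact 36.93%, since it double-counts the case where both runs succeed.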
```python
# knapsack always finds the best solution
fee_total, max_fee = select_coins(big_coin=1.01051, target=1.00000001, iterations=100)
assert_approx(max_fee, best_fee)
assert_approx(fee_total, best_fee * 100)

# 1) knapsack latches to the exact match solution and can't find the best solution
```
We've previously used "exact match" for any changeless solution, so it wasn't immediately obvious to me that it was used in the literal sense here. Perhaps you should clarify that you mean that it's literally the same to the sat.
Will update the comment
Updated, but I think we should stop using "exact match" in place of "changeless solution". A changeless solution is a solution in a range around the exact match, constrained by the cost of change/discard fee rate.
```python
assert_approx(max_fee, knapsack_fee)
# total fee is lower due to lucky SRD solutions
assert_greater_than(knapsack_fee * 100, fee_total)
assert_greater_than(fee_total, best_fee * 100)

# 2) no exact match, effective value of the big coin is smaller than target + MIN_CHANGE
```
I noticed another use of "exact match" here. :)
Thanks. Will retouch the comment
```python
# total fee is lower due to lucky SRD solutions
assert_greater_than(knapsack_fee * 100, fee_total)
assert_greater_than(fee_total, best_fee * 100)

# 2) no exact match, effective value of the big coin is smaller than target + MIN_CHANGE
# BnB can find the same solution from previous test but dropping the change output to fees
```
```python
# BnB will find the same solution from previous test but dropping the change output to fees
```
BnB is deterministic, so if it "can" I would expect that it does? Does "change output" mean "remainder" here? Is this the BnB without an upper bound, or the variant we currently use?
Any BnB, since we're within the cost of change bound. I'll retouch the comment
```python
assert_greater_than(srd_fee * 100, fee_total)
assert_greater_than(fee_total, best_fee * 100)

# 4) knapsack latches to exact match + MIN_CHANGE
# SRD still occasionally find better solution
fee_total, max_fee = select_coins(big_coin=1.00051, target=0.99, iterations=100)
```
It's a bit hard for me to follow how the exact values were picked, since both the big_coin and target are changing. I haven't tried to reproduce each calculation step by step in this first review, and I do see that many of your test cases explain some of the calculation, but I noticed that this one, for example, does not.
Since it's not trivial for me to follow, I kinda expect that other reviewers might also need a bit help. :grin:
This is basically the same as test case "1)" in this section, but all the values decreased by MIN_CHANGE. The idea is to demonstrate that Knapsack latches not only to down-to-satoshi matches. I'll improve the comment
```python
# if change is > MIN_CHANGE than we just spend big coin and create a change
tx = select_coins_for(49.9)
assert_equal(len(tx['vin']), 1)
assert_equal(len(tx['vout']), 2)

# if small coins > MIN_CHANGE than we just add one small coin to the big one and create change
```
This comment confuses me. The big coin doesn't have enough funds to pay the fees and create the output here, so we would always need both coins.
Oops. The idea was that you have more than 1 small coin. If a small coin is > MIN_CHANGE, then the tests in test_one_big_and_many_small_coins don't work. Any idea how to improve that? For now, I added more than one small coin in this test case.
Cool! Concept ACK. Did a light review, haven't reproduced all the numbers.
Retouched comments to address Xekyo's feedback.
Looks good. What is the next step you plan for this PR? How can one help?
I thought this PR could be merged roughly as it is (addressing all the concerns, of course), as I believe the test is valuable in itself.
As a follow-up, I'd like to fix some of the inefficiencies of coin selection by increasing the BnB upper limit. But for that we need to improve how we drop change for fees. I tried one approach in #23367 but it doesn't work that well. So I'm going to split #23367 into parts that can be merged independently (the first one -> #24649).
Let's talk at the next wallet meeting.
While I appreciate more tests, these tests seem to be extremely specific to the way that coin selection currently works and look like they would start failing as soon as we change something, such as removing knapsack. So I'm not sure if we actually want to have these tests if they are going to be removed and/or require a ton of recalculation for future coin selection changes.
If we do want these tests, then I don't think it should use legacy addresses so much as hopefully we won't make those by default soon.
> While I appreciate more tests, these tests seem to be extremely specific to the way that coin selection currently works and look like they would start failing as soon as we change something, such as removing knapsack. So I'm not sure if we actually want to have these tests if they are going to be removed and/or require a ton of recalculation for future coin selection changes.
Agree, these are not your normal tests, primarily because it's hard to define the "right" or "expected" outcome. I created these tests to expose some shortcomings of current coin selection and justify increasing the BnB limit (#24752). It'd be nice to have them around in some form to avoid regressions; maybe we can incorporate them somehow into simulations. wdyt?
> If we do want these tests, then I don't think it should use legacy addresses so much as hopefully we won't make those by default soon.
To throw my relatively unproductive opinion into the mix, this is a staggeringly large number of magic numbers. Ideally all numbers in tests are "obvious" or programmatic. It's a nightmare to maintain otherwise.
Didn't read or think enough about the overall thrust of the tests.
Thanks for the feedback. Maybe a better way to verify coin selection would be to integrate some simulation/benchmark into this repo.