coinselector_tests use non-representative CoinSelectionParams and UTXO costs #27754

issue Xekyo openend this issue on May 25, 2023
  1. Xekyo commented at 1:29 pm on May 25, 2023: member

    Is there an existing issue for this?

    • I have searched the existing issues

    Current behaviour

    Most test cases in the coinselector_tests have effective_feerate, long_term_feerate, discard_feerate, and cost_of_change in the corresponding CoinSelectionParams set to 0 which is not accepted during production use of the wallet with the default configuration—we require transactions to at least pay minRelayTxFee (1 ṩ/vB). Under a minimum feerate of 1 ṩ/vB, a (standard) change output will always incur a cost_of_change of at least 31 ṩ.

    Similarly, the add_coin() functions uses values for the UTXOs that are not encountered in production use. The UTXOs are created with feerate, input_bytes, custom_size, and fees set to 0. Inputs can never be smaller than 41 bytes, and thus must incur a fee of at least 41 ṩ during production use.

    image

    The main purpose of the coinselector_tests is to test that coin selection behaves as expected for production use. While it is also necessary to test edge cases and outliers, the strange values for CoinSelectionParams and UTXOs cause coin selection to behave differently than in production which means that production behavior is not well-reflected in the tests. It is also confusing to developers trying to reason about coin selection behavior.

    E.g. our coin selection works by creating multiple input set candidates and then picking the one that scores lowest on “waste”. In the case of multiple input sets tying on waste, we pick the one using more inputs. When cost_of_change is set to 0, our waste calculation surmises that we created a changeless transaction and any overselection will be dropped to fees. This is incorrect when we are actually comparing input sets produced by coin selection algorithms that produce change (e.g. SRD vs Knapsack in the check_max_weight test case) which then has an unexpected outcome.

    Expected behaviour

    Coin selection tests should be tested with values encountered during production.

    • We should test feerates from the range of commonly encountered feerates: e.g. [1000, 5000, 40_000, 500_000]. As edge cases we may also want to have some tests for feerate=0 (permitted by consensus but non-standard), feerate=2_000_000 (should trigger extreme feerate warning).
    • Long_term_feerate should be set to the default value of 10_000 except in cases where it is explicit focus of the tests
    • discard_feerate should be set to 3000 except in cases where it is explicit focus of the tests
    • UTXOs should be created using representative conditions for the coinselector_tests. Preferably they would be calculated from the feerate in the CSParams and a representative input weight. Inputs should be assumed to weigh between ~57.5–148 vB by default, perhaps with an edge cases testing for 41 vB (minimum input size). Inputs should incur fees of at least 58 ṩ (P2TR at 1 ṩ/vB).
    • Change should be assumed to cost fees of at least 31 ṩ (P2WPKH at 1 ṩ/vB).

    The CoinSelectionParams should be set up in the SETUP for the test cases once, and only CSP that are explicitly relevant to the test should be amended in the individual test cases. The add_coin() functions should use values corresponding to the test case’s CSP, unless they are explicitly set differently to test something.

    Steps to reproduce

    I was looking at coinselector_tests.cpp and came upon the check_max_weight test case. In Scenario 1 there are 1515 coins of 0.033 ₿ and one coin of 50 ₿. A solution only composed of the smaller coins would exceed the allowed weight, and it is checking whether a valid solution is found.

    What surprised me was that the expected outcome was supposed to be that only one input is picked; the 50 ₿ input. This was surprising to me because SRD now uses a heap to kick out the lowest value UTXOs if it exceeds the allowed max_weight, so I would expect SRD always to find a solution if there is one.

    Since the test sets long-term-feerate, and feerate to 0, and both a lowest_larger solution per Knapsack and a solution with many inputs per SRD would have change, I would expect their waste to be just the cost_of_change which should be the same for either (and at feerate of 0 should be zero). So I would expect both input sets to tie on waste.

    ChooseSelection breaks ties on waste by preferring the input set with more inputs.

    Yet, the test consistently picks the single input solution by knapsack while I would have expected it to pick the SRD solution because they should be tied for waste and the latter uses more inputs.

    So, I added some print statements to ChooseSelection and learned that the input sets in this test do have non-zero waste. I had a hard time explaining where the waste score for these input sets came from until I realized that our transaction building assumes that no change is created when cost_of_change is set to 0, and thus any overshoot during the selection is presumed to be excess and added to the waste score.

    At production feerates and costs the SRD solution would be preferred, because both input sets produce change and the SRD solution would produce a lower waste score.

    What version of Bitcoin Core are you using?

    master@fa53611cf1b2ab2f49470ba75ee29a94a7b89105


Xekyo


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-26 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me