josibake
commented at 1:13 pm on March 16, 2022:
member
Concept
Following #23789, Bitcoin Core wallet will now generate a change address that matches the payment address type. This improves privacy by not revealing which of the outputs is the change at the time of the transaction in scenarios where the input address types differ from the payment address type. However, information about the change can be leaked in a later transaction. This proposal attempts to address that concern.
Leaking information in a later transaction
Consider the following scenario:
Alice has a wallet with bech32 type UTXOs and pays Bob, who gives her a P2SH address
Alice’s wallet generates a P2SH change output, preserving her privacy in txid: a
Alice then pays Carol, who gives her a bech32 address
Alice’s wallet combines the P2SH UTXO with a bech32 UTXO and txid: b has two bech32 outputs
From a chain analysis perspective, it is reasonable to infer that the P2SH input in txid: b was the change from txid: a. To avoid leaking information in this scenario, Alice’s wallet should avoid picking the P2SH output and instead fund the transaction with only bech32 Outputs. If the payment to Carol can be funded with just the P2SH output, it should be preferred over the bech32 outputs as this will convert the P2SH UTXO to bech32 UTXOs via the payment and change outputs of the new transaction.
TLDR; Avoid mixing output types, spend non-default OutputTypes when it is economical to do so.
Approach
AvailableCoins now populates a struct, which makes it easier to access coins by OutputType. Coin selection tries to find a funding solution by each output type and chooses the most economical by waste metric. If a solution can’t be found without mixing, coin selection runs over the entire wallet, allowing mixing, which is the same as the current behavior.
I’ve also added a functional test (test/functional/wallet_avoid_mixing_output_types.py) and unit test (src/wallet/test/availablecoins_tests.cpp.
in
src/wallet/spend.cpp:790
in
312d9347b3outdated
788- // Choose coins to use
789- std::optional<SelectionResult> result = SelectCoins(wallet, vAvailableCoins, /* nTargetValue */ selection_target, coin_control, coin_selection_params);
790+ // First, do coin selection on a filtered subset of coins, filtered by OutputType
791+ // Attempt with older output types first (LEGACY > P2SH > BECH32 > BECH32M)
792+ std::optional<SelectionResult> result = [&] {
793+ for (const OutputType& type : OUTPUT_TYPES) {
good catch, fixed the formatting! also added a comment re: output_types
maflcko
commented at 1:41 pm on March 16, 2022:
member
I think an argument against this change is that any privacy leaking tx could very well be a coin-join between two wallets that use different output types. Though, I don’t think any such coin-join software exists today, so until then it probably makes sense to use inputs of a single type. #24362#pullrequestreview-909069657 mentions that legacy inputs could preferably be used during low-fee periods to reduce the fees paid by the user. Though, if #24362 is merged, then it would probably not be worth it to optimize for fees.
josibake force-pushed
on Mar 16, 2022
josibake
commented at 1:59 pm on March 16, 2022:
member
I think an argument against this change is that any privacy leaking tx could very well be a coin-join between two wallets that use different output types. Though, I don’t think any such coin-join software exists today, so until then it probably makes sense to use inputs of a single type.
I agree, until mixed inputs in transactions are more common, this change should improve user privacy.
#24362#pullrequestreview-909069657 mentions that legacy inputs could preferably be used during low-fee periods to reduce the fees paid by the user. Though, if #24362 is merged, then it would probably not be worth it to optimize for fees.
I prefer this approach over #24362 in order to keep the privacy benefits of matching change output addresses for all address types. I also see this as a small first step, where we can optimize for fees in a later PR if it is deemed useful/worth it.
josibake force-pushed
on Mar 16, 2022
murchandamus
commented at 2:20 pm on March 16, 2022:
contributor
Very exciting, I’m glad you’re working on this. I was just discussing this idea with people yesterday.
A couple comments on the approach described in the OP:
I noticed that you suggest preferring the spending of older types and compare it to spending newer types preferentially. While I do agree that the old types are the ones we’d want to get rid off rather, I am not sure this would lead to the optimal outcome. I would suggest running coin selection for each of the OutputGroup sets in parallel and then choosing the input set per the waste metric. Due to how the waste metric works, I’d expect this to lead to the older larger output types getting preferentially spent at lower feerates (and in the current feerate environment, usually first). What are your thoughts on that?
Regarding splitting the OutputGroups: since each OutputGroup will only have one type, you could just get the available set once and subdivide it from there to run the coin selection in parallel. Would that perhaps be easier?
CoinSelection is pretty fast, with the exception of Knapsack, so running it on subsets of the UTXO pool shouldn’t be a huge issue, running it on all subsets in parallel may be even faster than running it on the whole UTXO pool once since the complexity of each subset would be much smaller.
Ceterum censeo Knapsackinem esse delendam.
josibake
commented at 2:20 pm on March 16, 2022:
member
force pushed to fix clang-formatting and add comments per @MarcoFalke ’s suggestion. can be verified with git range-diff master f900010 fdf1fa5
in
src/outputtype.h:25
in
fdf1fa5e6coutdated
21@@ -22,6 +22,7 @@ enum class OutputType {
22 BECH32M,
23 };
2425+// Keep output types sorted by age, with older types first
I mean multisig P2SH should not be mixed with legacy either? Mapping to OutputType can’t fix this. In fact it will probably break anyway when the wallet can spend segwit v2 outputs.
It’s unlikely that multisig things would actually be mixed with single key things, but I think it would be incorrect to group P2SH with p2sh-segwit. Conceivably, the same multisig could have been p2sh, then upgraded to p2sh-segwit, and then to bech32. In that case, it would still be legacy, p2sh-segwit, and bech32.
This is not guaranteed to be true. We could have a P2SH multisig and that would be LEGACY.
so if a P2SH_SEWGIT and a P2SH multisig were spent together, when the UTXO is later spent it would reveal that some inputs were legacy multisig and others p2sh-segwit?
As an alternative, would it be better to map WITNESS_V0_SCRIPTHASH to P2SH and map SCRIPTHASH and PUBKEYHASH to LEGACY?
By the time we get determining the output type, we already know that the output either belongs to the wallet, or is watch only. Thus we can do additional solver invocations for scripthash with the redeemScript in order to figure out the true type.
For watch only outputs, we may not be able to do that because not all of the scripts will necessarily be available. However we could still use solving data provided in coinControl to try to determine that.
Now also considering the reedem script when a UTXO is P2SH
josibake
commented at 6:52 pm on March 16, 2022:
member
I would suggest running coin selection for each of the OutputGroup sets in parallel and then choosing the input set per the waste metric. Due to how the waste metric works, I’d expect this to lead to the older larger output types getting preferentially spent at lower feerates (and in the current feerate environment, usually first). What are your thoughts on that?
This makes a lot of sense. By running coin selection on each OutputGroup set you are preserving privacy by avoiding mixing the inputs, but also still choosing the best spend w.r.t to fees. Running them in parallel (as opposed to sequentially) also helps speed things up (which you also mentioned).
murchandamus
commented at 7:08 pm on March 16, 2022:
We could perhaps amend availableCoins to just return the outputs grouped by type already. E.g. availableCoins could be a struct with a vector for each an output type, and a member all that returns the four vectors concatenated.
DrahtBot
commented at 7:22 pm on March 17, 2022:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#25685 (wallet: Faster transaction creation by removing pre-set-inputs fetching responsibility from Coin Selection by furszy)
#25659 (wallet: simplify ListCoins implementation by furszy)
#25647 (wallet: return change from SelectionResult by S3RK)
#24814 (refactor: improve complexity of removing preselected coins by rag-hav)
#24699 (wallet: Improve AvailableCoins performance by reducing duplicated operations by achow101)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
in
src/wallet/spend.cpp:790
in
fdf1fa5e6coutdated
788- // Choose coins to use
789- std::optional<SelectionResult> result = SelectCoins(wallet, vAvailableCoins, /* nTargetValue */ selection_target, coin_control, coin_selection_params);
790+ // First, do coin selection on a filtered subset of coins, filtered by OutputType
791+ // Attempt with older output types first (LEGACY > P2SH > BECH32 > BECH32M)
792+ std::optional<SelectionResult> result = [&] {
793+ // This takes advantange of the fact that OUTPUT_TYPES is already sorted by age (LEGACY, P2SH, BECH32, BECH32M)
This assumes there is a strict preference for newer UTXO types. Rather, it should try to prefer using UTXOs that are not the user’s preferred address type(s).
murchandamus
commented at 5:22 pm on March 22, 2022:
I think the amended approach is now that it would pick the input set with the best waste metric score from all SelectionResults of running coin selection on the separate output types.
sorry, forgot to respond to this! yes, rather than rely on a forced ordering of old to new, I’m implementing @Xekyo ’s suggestion of running coin selection on each output group and then picking the solution with the best waste metric. If no solution is found, then run coin selection over all coins (current behavior).
I’ve updated it now to not force an ordering on OutputType.
danai554 approved
DrahtBot added the label
Needs rebase
on Mar 22, 2022
maflcko referenced this in commit
a697a3fc91
on Mar 24, 2022
josibake force-pushed
on Mar 28, 2022
josibake renamed this:
[RFC] wallet: avoid mixing different `OutputTypes` during coin selection
wallet: avoid mixing different `OutputTypes` during coin selection
on Mar 28, 2022
DrahtBot removed the label
Needs rebase
on Mar 28, 2022
in
test/functional/wallet_abandonconflict.py:43
in
e556ee628eoutdated
murchandamus
commented at 4:07 pm on March 28, 2022:
Review note: I first thought that the self.generate(self.nodes[1], 1) here would of course confirm all of the above transactions, but this obviously only happens after all three transactions get built, which means that unconfirmed change UTXOs could have been used in construction of the latter.
Note to author: This may point to an underlying problem. Since the UTXO pools divided by type are much smaller, the backoff steps allowing for unconfirmed change to be used may come into play much more often. Then, since the waste metric does not yet account for the unconfirmed inputs, we may spend unconfirmed UTXOs more often. This in turn may cause a problem especially if the follow-up transaction is aiming for a higher feerate than the transaction that created the unconfirmed input, at least while #15553 remains unsolved.
interesting, is it by design that the waste metric does not consider unconfirmed change? a few ideas to address this:
have the waste metric also consider unconfirmed change
consider things like unconfirmed inputs when comparing solutions
for example, if solution A is to spend all confirmed inputs with a waste metric of X and solution B spends unconfirmed change with a waste metric of Y, prefer solution A even if X > Y.
murchandamus
commented at 3:14 pm on March 30, 2022:
No, Bitcoin Core ignoring the feerate of parent transactions when using unconfirmed inputs is a bug, but one that turned out more difficult to fix than anticipated. I have started a branch (https://github.com/Xekyo/bitcoin/tree/ancestor-aware-funding) which I intend to push forward in collaboration with @glozow. We were stuck by not being able to get information on unconfirmed parent transactions in the wallet yet, and this should now be resolved by a node:wallet interface that provides mining scores which @glozow came up with.
Then, the intended goal is that we’d automatically add sufficient fees to elevate the mining score for both the new and the ancestor transactions to the target feerate and account for these additional fees in the waste metric.
This is better handled now by running by OutputType at each stage of back-off
in
test/functional/wallet_avoidreuse.py:307
in
eaaece5598outdated
292@@ -293,7 +293,7 @@ def test_getbalances_used(self):
293294 # send multiple transactions, reusing one address
295 for _ in range(101):
296- self.nodes[0].sendtoaddress(new_addr, 1)
murchandamus
commented at 5:41 pm on March 28, 2022:
It’s not clear to me why this change is being made. There don’t seem to be any checks or asserts referring to the UTXO pool of nodes[0], and wouldn’t all the coinbase outputs and change outputs be of the same type?
this is related to your earlier comment (https://github.com/bitcoin/bitcoin/pull/24584#discussion_r836601796) regarding unconfirmed UTXOs being spent more often. In this test, after the first tx is sent, the next 100 txs build a chain where the unconfirmed change is being spent, eventually hitting the mempool chain limit of 25 and causing an error. By explicitly setting fee_rate=1, nodes[0] starts spending its confirmed coinbase UTXOs instead.
This also demonstrates your earlier point that this change is causing unconfirmed outputs to be spent more often
I re-worked this to run coin selection by OutputType at each back-off, which fixed the preference for spending unconfirmed UTXOs. as such, I was able to leave the tests as is and removed the test changes from the PR.
murchandamus
commented at 5:51 pm on March 28, 2022:
Maybe call this return_value_unused.
It’s not obvious what this dummy is and why it’s needed. I take it that it’s a return value for the script solver to put the actual solution components in, and we don’t care about it here because we only want to learn the type. Perhaps you could add a comment here to elucidate that for future reviewers.
815@@ -816,9 +816,40 @@ static bool CreateTransactionInternal(
816 // Get available coins
murchandamus
commented at 5:55 pm on March 28, 2022:
Commit message in “wallet: run coin selection by OutputType” 49d802f25cac9390fb715b0bd4051c5e592365e3 is missing a word in the first sentence of the body.
you’re right, they can all be named result and I think that’s better than having individual names for each run (result_legacy, result_p2sh, etc)
in
src/wallet/spend.cpp:823
in
49d802f25coutdated
821- std::optional<SelectionResult> result = SelectCoins(wallet, availableCoins.all(), /* nTargetValue */ selection_target, coin_control, coin_selection_params);
822+
823+ // Run coin selection on each OutputType and compute the Waste Metric
824+ std::vector<SelectionResult> results;
825+ if (auto result1{SelectCoins(wallet, availableCoins.Legacy, /* nTargetValue */ selection_target, coin_control, coin_selection_params)}) {
826+ result1->ComputeAndSetWaste(coin_selection_params.m_cost_of_change);
murchandamus
commented at 6:07 pm on March 28, 2022:
I’m a bit surprised that we need to post-process all the SelectionResults when they get returned with ComputeAndSetWaste(…). It would feel more natural, that each coin selection algorithm automatically calculates the waste and returns it in the SelectionResults if the feerate is known (which is obvious from coin_control, I think.
Also, if we do need it, you could just collect all the results and then run it once for each result at the end of this block.
murchandamus
commented at 6:19 pm on March 28, 2022:
Perhaps call these is_…, i.e. is_bech32_address(…), because you’re testing the input object for a property and returning a boolean.
in
test/functional/wallet_avoid_mixing_output_types.py:125
in
072e755e55outdated
120+
121+ self.generate(B, 1)
122+ self.log.debug(
123+ "Make the same payment again, with a higher feerate. "
124+ "Check that Alice's wallet prefers bech32 UTXOs."
125+ )
murchandamus
commented at 6:27 pm on March 28, 2022:
I’m wondering whether there is a misunderstanding here. It’s not that certain types are strictly preferred at the corresponding feerate categories, but that larger input scripts will simply affect the waste score more due to their size in either direction due to their larger transaction weight. This in turn leads to a higher waste score for the same count of legacy inputs at high feerates vs more efficient types, and a more negative waste score at low feerates. Obviously, this would cause one specific type to always win if you always have the same count of inputs among the options, but wouldn’t necessarily hold true if the input sets could have varying input counts.
I think we are on the same page, it’s just my wording in the comments isn’t correct. Initially, I wrote the comments for when I was strictly preferring older types to newer types. I’ll update these comments to be more accurate.
in
test/functional/wallet_avoid_mixing_output_types.py:102
in
072e755e55outdated
murchandamus
commented at 6:36 pm on March 28, 2022:
What is the starting situation of the nodes. Are they both empty? Is Alice or Bob the default node that has the previous block rewards? Do they both have funds? Are the funds all of one type, or mixed?
in
test/functional/wallet_avoid_mixing_output_types.py:25
in
072e755e55outdated
20+ CASE 2: Alice has mixed UTXOs (P2SH, bech32, legacy) and makes multiple
21+ payments to Bob's bech32 address, where legacy is preferred at
22+ low fee rates, P2SH at medium fee rates, and bech32 at high rates.
23+ CASE 3: Alice has mixed UTXOs (legacy, P2SH, bech32) and makes multiple
24+ payments to Bob, where legacy and P2SH are mixed at lower fee rates
25+ and finally all UTXOs are mixed for the final payment.
murchandamus
commented at 6:39 pm on March 28, 2022:
It feels that these tests are more focused on the cost efficiency of the input sets than validating that input sets of only one type are produced.
• Could you perhaps provide a bit more information on the initial setup of the wallets’ UTXO pools?
• From what I gather, most of these tests end up using a single input to fund their transactions. A single input transaction will always have the same type for all inputs. :grin: Given that I understood that right, it would be more convincing if the wallets had to select from a number of UTXOs of various values and types, and needed to pick multiple UTXOs to create the input set, while you then showed that we consistently pick input sets with multiple inputs of the same type. I also feel that the inputs being of the same type would be the more important aspect of this PR rather than which type of inputs gets chosen exactly, although showing that the solution with the lower waste was picked would certainly be an appreciated secondary test scenario.
• The way these tests are implemented right now, they may be a bit too tightly coupled to the exact behavior of the coin selection algorithms we currently use, but I am hoping to change the composition of coin selection algorithms in the near term. This would be another reason why I’d prefer if the test were implemented more generically “all input have the same type” vs “exactly this type of inputs gets chosen”. :)
Looking at the test again, I think I am trying to test for some things that would be better to test for in a unit test. Ideally, this test should create large UTXO sets for each node, spend back and forth at different fee rates and to different address types. The assertions should check that all inputs are of the same OutputType and finally test a scenario where the wallet can fund from any single OutputType and is forced to mix OutputTypes to fund the transaction
this should be addressed now. I re-worked the test to make it less dependent on specific UTXOs and transactions and added a more detailed explanation.
murchandamus
commented at 9:53 pm on March 28, 2022:
contributor
Thanks for working on this, I like how you’ve split up availableCoins and how you have structured your tests. I have some ideas on some additional tests you might want to add, and a few questions.
glozow
commented at 8:52 pm on March 29, 2022:
member
Concept ACK
Since the UTXO pools divided by type are much smaller, the backoff steps allowing for unconfirmed change to be used may come into play much more often. Then, since the waste metric does not yet account for the unconfirmed inputs, we may spend unconfirmed UTXOs more often.
is it by design that the waste metric does not consider unconfirmed change?
Our preference for different coin selection solutions is spread out over multiple places in the code, including the waste metric and the sequence of CoinEligibilityFilters used in each AttemptSelection() call. The way we encode our preference for confirmed UTXOs is by attempting coin selection with confirmed UTXOs first, and only trying with unconfirmed UTXOs if that fails.
Approach-wise, did you consider adding output type to CoinEligibilityFilter instead, and trying selection for each output type within AttemptSelection()? This way you try to do selection for every output type for confirmed UTXOs only, before you try unconfirmed.
Another possibility is adding the logic for preferring confirmed over unconfirmed UTXOs to SelectionResult and editing the operator< to account for it.
josibake
commented at 1:17 pm on March 30, 2022:
member
Approach-wise, did you consider adding output type to CoinEligibilityFilter instead, and trying selection for each output type within AttemptSelection()? This way you try to do selection for every output type for confirmed UTXOs only, before you try unconfirmed.
This is how I wrote the code initially (by adding OutputType to CoinEligibilityFilter) but abandoned it in favor of separating the OutputTypes into different vectors during GetAvailableCoins and then running coin selection on each vector separately and comparing the results.
Another possibility is adding the logic for preferring confirmed over unconfirmed UTXOs to SelectionResult and editing the operator< to account for it.
I like this idea. I think it keeps the logic a little cleaner.
josibake
commented at 1:24 pm on March 30, 2022:
member
based on feedback so far (thanks, everyone!), I think next steps are:
Better break out P2SH, vs lumping all P2SH with P2SH_SEGWIT
Account for unconfirmed txs being preferred when picking the “best” solution
Refactor functional tests to be less tightly coupled to coin selection and better documented
Potentially move some tests out of functional into unit tests
sidhujag referenced this in commit
fad0b37317
on Apr 2, 2022
DrahtBot added the label
Needs rebase
on Apr 4, 2022
josibake
commented at 6:36 pm on April 23, 2022:
member
re-worked this based on feedback and overall much happier with the approach:
When output is P2SH, run the solver on the redeemScript to determine the true OutputType. This puts P2SH outputs which are multisig into the legacy bucket @achow101
Instead of running SelectCoins multiple times over each OutputType, I moved the logic into AttemptSelection. This ensures that all of the existing back-offs are respected, which addressed the issue regarding unconfirmed change @Xekyo@glozow
Refactored the test to be better documented and less brittle w.r.t coin selection
Removed changes to existing functional tests as they are no longer failing after fixing the unconfirmed change issue. I do believe there is room for improvement if a test relies on specific behavior from coin selection, but that is out of the scope of this PR
josibake force-pushed
on Apr 23, 2022
josibake force-pushed
on Apr 23, 2022
josibake
commented at 6:57 pm on April 23, 2022:
member
force pushed a few minor code style edits. also noticed the branch is failing to build after rebasing on master. seems related to the fuzzer, will troubleshoot in the morning. fixed
DrahtBot removed the label
Needs rebase
on Apr 23, 2022
josibake force-pushed
on Apr 24, 2022
josibake force-pushed
on Apr 25, 2022
LarryRuane
commented at 4:14 pm on May 4, 2022:
contributor
Concept ACK
The second commit (68d96078714c1ed1804c293a2f05c3fb33092efa) doesn’t compile due to the allow_mixed_output_types optional argument to AttemptSelection(). Fixed in the third commit, but it’s good if each commit compiles.
I would favor making allow_mixed_output_types non-optional.
josibake
commented at 4:17 pm on May 4, 2022:
member
Concept ACK
The second commit (68d9607) doesn’t compile due to the allow_mixed_output_types optional argument to AttemptSelection(). Fixed in the third commit, but it’s good if each commit compiles.
ah, good catch! i must have broken it when trying to break up the commits more cleanly.
I would favor making allow_mixed_output_types non-optional.
can you explain a bit more what you mean by this?
LarryRuane
commented at 4:34 pm on May 4, 2022:
contributor
can you explain a bit more what you mean by this?
It’s just that I prefer code to be more explicit rather than hidden. Optional arguments are good if an existing function has many callers, and it would be disruptive to change all of them. In this case, all the calls (except one in src/bench) are in the same file (spend.cpp), so it wouldn’t be much to update the callers. But this is just a suggestion.
josibake
commented at 4:51 pm on May 4, 2022:
member
It’s just that I prefer code to be more explicit rather than hidden
this is a great point. ill update when fixing the compile issue
bitcoin deleted a comment
on May 5, 2022
bitcoin deleted a comment
on May 5, 2022
josibake force-pushed
on May 5, 2022
josibake
commented at 12:50 pm on May 5, 2022:
member
fixed the compilation issue by my moving the allow_mixed_output_types bool to the correct commit and made the boolean non-optional (thanks @LarryRuane). having the bool non-optional helps the readability of the logic in coin selection.
also put the function comment changes for AttemptSelection in the correct commit.
nit, perhaps move this down to just before the switch below (just so it’s declared close to where it’s used).
in
src/wallet/spend.cpp:634
in
12bb64f9fcoutdated
617@@ -525,34 +618,36 @@ std::optional<SelectionResult> SelectCoins(const CWallet& wallet, const std::vec
618619 // If possible, fund the transaction with confirmed UTXOs only. Prefer at least six
620 // confirmations on outputs received from other wallets and only spend confirmed change.
621- if (auto r1{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 6, 0), vCoins, coin_selection_params)}) return r1;
622- if (auto r2{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 1, 0), vCoins, coin_selection_params)}) return r2;
623+ if (auto r1{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 6, 0), coins, coin_selection_params, /*allow_mixed_output_types=*/false)}) return r1;
I agree but decided to leave them alone since it’s not really in scope for this PR and would have involved touching extra lines just to remove the number
in
test/functional/wallet_avoid_mixing_output_types.py:15
in
12bb64f9fcoutdated
10+ - BECH32
11+ - P2SH-SEGWIT
12+ - LEGACY
13+
14+This test verifies that mixing different output types is avoided unless
15+absolutely necessary. Both wallets start with zero funds and Alice mines
0absolutely necessary. Both wallets start with zero funds. Alice mines
in
test/functional/wallet_avoid_mixing_output_types.py:16
in
12bb64f9fcoutdated
11+ - P2SH-SEGWIT
12+ - LEGACY
13+
14+This test verifies that mixing different output types is avoided unless
15+absolutely necessary. Both wallets start with zero funds and Alice mines
16+enough blocks to have spendable coinbase outputs. Alice sends a series of 10
0enough blocks to have spendable coinbase outputs. Alice sends 10
in
test/functional/wallet_avoid_mixing_output_types.py:24
in
12bb64f9fcoutdated
19+
20+Bob then sends random valued payments back to Alice, some of which need
21+unconfirmed change, and we verify that none of these payments contain mixed
22+inputs. Finally, Bob sends the remainder of his funds, which requires mixing.
23+
24+They payment values are random, but chosen such that they sum up to a specified
0 """Return a randomly chosen list of n positive integers summing to m.
bitcoin deleted a comment
on May 5, 2022
in
src/wallet/spend.h:40
in
12bb64f9fcoutdated
31@@ -32,10 +32,35 @@ struct TxSize {
32 TxSize CalculateMaximumSignedTxSize(const CTransaction& tx, const CWallet* wallet, const std::vector<CTxOut>& txouts, const CCoinControl* coin_control = nullptr);
33 TxSize CalculateMaximumSignedTxSize(const CTransaction& tx, const CWallet* wallet, const CCoinControl* coin_control = nullptr) EXCLUSIVE_LOCKS_REQUIRED(wallet->cs_wallet);
3435+/** COutputs available for spending, stored by OutputType.
36+ * This struct is really just a wrapper around OutputType vectors with a convenient
37+ * method for concatenating and returning all COutputs as one vector.
38+ *
39+ * clear(), size() methods are implemented so that one can interact with
40+ * the AvailableCoins struct as if it were a vector */
Nit: I think our style guide suggests this formatting.
0/**
1 * COutputs available for spending, stored by OutputType.
2 * This struct is really just a wrapper around OutputType vectors with a convenient
3 * method for concatenating and returning all COutputs as one vector.
4 *
5 * clear(), size() methods are implemented so that one can interact with
6 * the AvailableCoins struct as if it were a vector
7 */
LarryRuane
commented at 6:05 pm on May 5, 2022:
contributor
Concept, tested, and mostly code review ACK12bb64f9fcd11207e5996e41204ab2ea822379c1
Suggestions are minor. I would like to get a better understanding of the algorithm by studying the functional test, so I may have a few more comments. But this looks very good! Thanks for hosting review club!
in
src/wallet/spend.h:109
in
66db9bff47outdated
106+ * param@[in] coins The vector of coins available for selection prior to filtering
107+ * param@[in] coin_selection_params Parameters for the coin selection
108+ * returns If successful, a SelectionResult containing the input set
109+ * If failed, a nullopt
110+ */
111+std::optional<SelectionResult> ChooseSelectionAlgorithm(const CWallet& wallet, const CAmount& nTargetValue, const CoinEligibilityFilter& eligibility_filter, const std::vector<COutput>& coins,
Naming nit: this function does not “choose a selection algorithm”, but rather it picks one out of several coin selection algorithms’ results. How about ChooseSelectionResult or similar?
LarryRuane
commented at 6:40 pm on May 5, 2022:
contributor
The description has a typo (Aproach). Also, for future reference, my understanding is that the description (initial comment) should be fairly concise and mostly plaintext, because it ends up in the git commit history. It’s probably better to make the detailed description a separate comment. It’s probably too late here, because if you added a comment with details, it would end up at the bottom.
in
src/wallet/spend.cpp:668
in
4b0a67bfa7outdated
662 return r8;
663 }
664 }
665 }
666+ // If our wallet does not allow us to spend unconfirmed change, attempt to fund the transaction with mixed OutputTypes
667+ if (auto r9{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 1, 0), coins, coin_selection_params, /*allow_mixed_output_types=*/true)}) return r9;
The allow mixed outputs should be on an earlier backoff. Given the issues the Bitcoin Core wallet still has around not accounting for lower feerates in parent transactions of unconfirmed UTXOs, I would prefer to allow mixed inputs before allowing unconfirmed UTXOs. If I’m alone in that, I would most certainly prefer mixed inputs over having more than twelve unconfirmed ancestor transactions.
my reasoning for starting it at the r6 back-off is because that is also the first back-off where we allow partial spending of OutputGroups. so the status quo in the wallet is “spend unconfirmed change before relaxing privacy-preserving constraints.”
you could argue that mixed outputs are not as bad as partially spending an output group w.r.t privacy, but i would still advocate for it being considered along with unconfirmed change, meaning if spending unconfirmed change is more efficient in terms of the waste metric, do that instead of mixing.
I’m not sure that’s an apt comparison. Address reuse is generally discouraged and hopefully happens seldom for most wallets, especially those that care about privacy. In addition, we already run opportunistic avoid-partial-spend in parallel for every single coin selection attempt and prefer its result as long as it is not more expensive even when not explicitly required.
This PR however follows a change in the wallet behavior that will generally match the output type of the recipient on every spend. We must expect wallets to frequently have outputs of different types as a result. This PR does not introduce avoidance of mixing as parallel coin selection results but rather requires it for almost all selection attempts. Splitting the UTXO pool into two or more sets doesn’t half the size of the possible input set candidates, but reduces the combination space to tiny fractions.
I expect this already to cause a reduction in changeless transactions, but worse, I expect the change as proposed to considerably increase the use of unconfirmed inputs and thusly reduce the resulting transactions’ reliability. Given the existing feerate estimation issue with unconfirmed inputs, this would more frequently cause the wallet to undershoot the target feerate, especially if we allow up to 12 unconfirmed inputs before considering mixing.
I’d prefer that mixing be allowed with r2, I’d be okay with r3. I might be convinced to introduce mixing on r4 via simulation results showing that coin selection results and reliability do not get significantly worse. My stance on this would generally be improved by us having finished ancestor aware fee estimation and the waste metric having been updated to account for automatic bumping of low feerate parent transactions.
thanks for the detailed explanation and example. very good points about splitting the UTXO set.
I’ve changed the logic and added a new back-off that allows mixing before considering unconfirmed change. as such, you will still get the privacy improvement when spending from confirmed UTXOs.
what’s nice about how this is implemented is that we can easily move the logic up or down by changing the boolean so if we are able to fix the fee estimation issue and convince ourselves via simulations that there are not undesirable effects, we can easily move the restriction down at a later date by changing a boolean.
in
test/functional/wallet_avoid_mixing_output_types.py:36
in
12bb64f9fcoutdated
31+from test_framework.test_framework import BitcoinTestFramework
32+from test_framework.blocktools import COINBASE_MATURITY
33+
34+
35+def is_bech32_address(node, addr):
36+ """Check if an address cointains a bech32 output."""
I would have expected to count all five types (incl. bech32m and others), even if you don’t explicitly create them here. This test would currently not fail if I Alice additionally sent Bob a P2TR output.
murchandamus
commented at 8:14 pm on May 5, 2022:
contributor
I like the new approach much better.
It seems to me that there should be some tests for the new AvailableCoins struct that ensures that all types get sorted into the right buckets. For example I would have hoped to see all types getting generated and seeing that all buckets were populated. This only happens indirectly right now, by testing whether there are mixed inputs when sweeping the wallet.
Also, I would prefer if all four types were tested instead of just two.
josibake
commented at 10:04 am on May 7, 2022:
member
it seems to me that there should be some tests for the new AvailableCoins struct that ensures that all types get sorted into the right buckets
I am working on a unit test for GetAvailableCoins which covers this. I think a unit test is more appropriate than a functional test in this case. I was planning to add the unit test in a follow-up PR, but also happy to add it here, so long as this PR isn’t getting too big
Also, I would prefer if all four types were tested instead of just two.
I’ll update the functional test, because agreed, no reason not to
josibake
commented at 10:07 am on May 7, 2022:
member
thanks for the review @LarryRuane and suggestions on code style!
be fairly concise and mostly plaintext, because it ends up in the git commit history.
completely forgot about this, thanks for the reminder!
josibake force-pushed
on May 7, 2022
josibake force-pushed
on May 7, 2022
josibake
commented at 12:31 pm on May 7, 2022:
member
josibake
commented at 5:15 pm on May 13, 2022:
member
updated with a unit test for GetAvailableCoins and changed the logic to allow mixing OutputTypes before spending unconfirmed change (thanks @Xekyo for the detailed explanation on this), along with a few other stylistic changes.
josibake force-pushed
on May 13, 2022
in
src/wallet/spend.cpp:240
in
7f0ef4163eoutdated
226+
227+ if (wtx.tx->vout[i].scriptPubKey.IsPayToScriptHash()) {
228+ CTxDestination destination;
229+ ExtractDestination(wtx.tx->vout[i].scriptPubKey, destination);
230+ const CScript redeemScript{GetScriptForDestination(destination)};
231+ type = Solver(redeemScript, return_values_unused);
In 7f0ef4163e5dcde1487a7c4b09efa4b1d6b9f870 “refactor: add AvailableCoins struct”
This looks wrong. ExtractDestination will give the P2SH address for this scriptPubKey, and then we get that same script again and run the Solver on it. It’s just going to return TxoutType::SCRIPTHASH which we already knew.
What this should be doing is looking up the redeemScript in the wallet (by ScriptID) and then running Solver with that redeemScript. We also need to remember that this was inside of a P2SH so that when checking TxoutType::WITNESS_V0_* later, we can know whether it is bech32 or p2sh-segwit.
In 7f0ef4163e5dcde1487a7c4b09efa4b1d6b9f870 “refactor: add AvailableCoins struct”
If something is just SCRIPTHASH, it should be Legacy.
in
src/wallet/spend.h:46
in
7f0ef4163eoutdated
42+ * the AvailableCoins struct as if it were a vector
43+ */
44+struct AvailableCoins {
45+ /* Vectors for each OutputType */
46+ std::vector<COutput> Legacy;
47+ std::vector<COutput> P2SH;
In 7f0ef4163e5dcde1487a7c4b09efa4b1d6b9f870 “refactor: add AvailableCoins struct”
0 std::vector<COutput> P2SH_Segwit;
DrahtBot added the label
Needs rebase
on May 23, 2022
josibake force-pushed
on Jun 9, 2022
josibake
commented at 1:30 pm on June 9, 2022:
member
rebased and responded to feedback from @achow101, specifically, fixing the logic for how P2SH is handled when running available coins.
I am confident this logic is correct for the standard output types (bech32m, bech32, p2sh-segwit, legacy), which covers > 95% of UTXOs (per https://transactionfee.info/charts/inputs-types-by-count/). I’d like to add more test cases for P2SH, but not considering that a blocker for right now
DrahtBot removed the label
Needs rebase
on Jun 9, 2022
in
src/wallet/spend.h:53
in
36ccbbc48aoutdated
49+ std::vector<COutput> P2SH_Segwit;
50+ std::vector<COutput> Bech32;
51+ std::vector<COutput> Bech32m;
52+
53+ /** Other is a catch all for anything that doesn't match the known OutputTypes */
54+ std::vector<COutput> Other;
murchandamus
commented at 7:32 pm on June 14, 2022:
71 /**
72 * Wrapper function for AvailableCoins which skips the `feerate` parameter. Use this function
73 * to list all available coins (e.g. listunspent RPC) while not intending to fund a transaction.
74 */
75-void AvailableCoinsListUnspent(const CWallet& wallet, std::vector<COutput>& vCoins, const CCoinControl* coinControl = nullptr, const CAmount& nMinimumAmount = 1, const CAmount& nMaximumAmount = MAX_MONEY, const CAmount& nMinimumSumAmount = MAX_MONEY, const uint64_t nMaximumCount = 0) EXCLUSIVE_LOCKS_REQUIRED(wallet.cs_wallet);
76+void AvailableCoinsListUnspent(const CWallet& wallet, AvailableCoins& coins, const CCoinControl* coinControl = nullptr, const CAmount& nMinimumAmount = 1, const CAmount& nMaximumAmount = MAX_MONEY, const CAmount& nMinimumSumAmount = MAX_MONEY, const uint64_t nMaximumCount = 0) EXCLUSIVE_LOCKS_REQUIRED(wallet.cs_wallet);
murchandamus
commented at 7:50 pm on June 14, 2022:
In 36ccbbc48a80116eb8e979667abd5f44cf36b030 “refactor: add AvailableCoins struct”
Consistency nit: I noticed that vCoins was renamed to availableCoins throughout this commit, except here it’s renamed to coins, then in the test it is available. In the next commit, the instances of AvailableCoins seem to all be named coins. Perhaps it should be either coins or available_coins throughout.
originally, i tried to match whatever name was being used to minimize the diff, but looking at this again i think having a consistent naming of available_coins throughout is much better (despite a slightly larger diff)
murchandamus
commented at 8:23 pm on June 14, 2022:
In f42db05a0ea8a4acfaf8e0117824cfd0f1d0547b “refactor: use AvailableCoins struct in SelectCoins”
The commit message mentions a newly introuduced method ChooseSelectionAlgorithm, but the code introduces ChooseSelectionResult.
in
src/wallet/spend.cpp:638
in
df6aa1d597outdated
635+ // Also avoid mixing different UTXO OutputTypes to preserve privacy
636+ if (auto r1{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 6, 0), coins, coin_selection_params, /*allow_mixed_output_types=*/false)}) return r1;
637+ if (auto r2{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 1, 0), coins, coin_selection_params, /*allow_mixed_output_types=*/false)}) return r2;
638+
639+ // If we can't fund the tx from confirmed UTXOs of a single OutputType, allow mixed output types
640+ if (auto r3{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 1, 0), coins, coin_selection_params, /*allow_mixed_output_types=*/true)}) return r3;
murchandamus
commented at 8:36 pm on June 14, 2022:
In df6aa1d597b4de1f4707719b3ad5518a94645ee1 “wallet: run coin selection by OutputType”
AFAICT, this duplicates the selection attempts with segregated types for CoinEligibilityFilter(1, 1, 0).
Either way, the selection with segregated types is attempted first and mixing only happens as a last resort. So, directly setting allow_mixed_output_types=true should result in equivalent back-off steps without repeating the segregated output attempts before allowing mixing.
Not adding this line also obsoletes the following commit to fix the duplicate appearance of r3, 321a13e177dc79581ec896568ab8e7633d95b17f “refactor: update variable names”.
0 // If we can't fund the tx from confirmed UTXOs of a single OutputType, allow mixed output types
1 if (auto r2{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 1, 0), coins, coin_selection_params, /*allow_mixed_output_types=*/true)}) return r2;
ah, good catch! AttemptSelection will only try with mixed inputs if it is unable to find at least one solution from segregated inputs and if mixing is allowed. So yes, running r2 with allow_mixed_inputs=true achieves want we want without needing the duplicated r3
in
test/functional/wallet_avoid_mixing_output_types.py:80
in
784b03888eoutdated
71+
72+
73+def is_legacy_address(node, addr):
74+ """Check if an address contains a legacy output."""
75+ addr_info = node.getaddressinfo(addr)
76+ return (not addr_info['isscript'] and not addr_info['iswitness'])
murchandamus
commented at 8:43 pm on June 14, 2022:
In 784b03888e70a4744e7b387576f874ac18339dbb “test: functional test for new coin selection logic "
Not sure if this is relevant, but I think this may be inconsistent with the categorization pointed out by @achow101 previously, which IIUC prescribes that non-segwit P2SH addresses should be labeled as “legacy”.
it is inconsistent in the general sense, but for the test it is correct in that we are only generating p2sh-segwit outputs. i’ll update the comment in the function, tho, to avoid any confusion
murchandamus
commented at 8:52 pm on June 14, 2022:
contributor
Looks pretty good, I think I’d consider this ready if the unnecessary additional back-off were removed.
DrahtBot added the label
Needs rebase
on Jun 17, 2022
josibake force-pushed
on Jun 19, 2022
josibake force-pushed
on Jun 19, 2022
josibake
commented at 1:53 pm on June 19, 2022:
member
force-pushed for rebase (pretty big rebase following #25005) and also to address feedback from @Xekyo
a few notes:
after #25005 , it made more sense to leave the function named AvailableCoins (as opposed to changing it to GetAvailableCoins)
available_coins is the preferred variable name throughout (vs coin, available, availableCoins, vCoins, etc)
dropped a refactor commit after removing the extraneous step in the back-offs
fixed the functional test to be more clear and removed the unconfirmed tx test as mixing is allowed before spending unconfirmed change
fixed an error in bench/coin_selection.cpp after making allow_mixed_output_types boolean required for the function call
DrahtBot removed the label
Needs rebase
on Jun 19, 2022
we check that the provider exists a few lines up with bool solvable = provider ? IsSolvable(*provider, output.scriptPubKey) : false; so no need to check it again here.
good point with GetCScript; i’m thinking we should print something to the logs and continue if we can’t get the script? also open to other suggestions
added a check on spendable before looking up the CScript (since it doesn’t make sense to try and find the script if it’s not spendable by us) and handled the case where ExtractDest fails or GetScript fails with a continue.
im thinking just print a log and continue if we can’t extract a destination?
in
src/wallet/spend.cpp:239
in
d8cd07ec67outdated
235+ // We don't need those here, so we are leaving them in return_values_unused
236+ std::vector<std::vector<uint8_t>> return_values_unused;
237+ TxoutType type;
238+ bool is_from_p2sh{false};
239+
240+ if (wtx.tx->vout[i].scriptPubKey.IsPayToScriptHash()) {
furszy
commented at 5:19 pm on June 20, 2022:
member
Some initial, code-only, feedback so we can eliminate duplicated operations/code across the different vectors calls:
What about using a single map<OutputType, std::vector> instead of have one vector definition per output type and encapsulate the vector push, shuffle and erase inside the CoinsResult::push_back, CoinsResult::shuffle and CoinsResult::erase methods?
Then another point is that the bench binary isn’t compiling:
Need to add the new flag allow_mixed_output_types to src/bench/coin_selection.cpp, line 78.
DrahtBot added the label
Needs rebase
on Jun 20, 2022
josibake
commented at 12:43 pm on June 21, 2022:
member
Then another point is that the bench binary isn’t compiling:
Need to add the new flag allow_mixed_output_types to src/bench/coin_selection.cpp, line 78.
DrahtBot removed the label
Needs rebase
on Jun 21, 2022
josibake force-pushed
on Jun 22, 2022
josibake
commented at 9:46 am on June 22, 2022:
member
rebased off master and took suggestions from @furszy (thanks for the review!); git range-diff master eca9588 e04037c
What about using a single map<OutputType, std::vector> instead of have one vector definition per output type and encapsulate the vector push, shuffle and erase inside the CoinsResult::push_back, CoinsResult::shuffle and CoinsResult::erase methods?
this is how i originally started prototyping this, but i realized there are quite a few places outside of coin selection that use the OutputTypes enum and decided against adding an “OTHER” OutputType. i do think your suggested approach is better, but merits its own discussion in a follow-up PR. i also think we could improve the mapping from TxoutType -> OutputType in the same follow up (or a separate one)
josibake force-pushed
on Jun 22, 2022
josibake
commented at 10:12 am on June 22, 2022:
member
small fix-up on the functional test removing some unneeded lines; git range-diff master e04037c 7dfac26
josibake force-pushed
on Jun 22, 2022
josibake force-pushed
on Jun 24, 2022
josibake
commented at 10:43 am on June 24, 2022:
member
rebased master and force pushed to fix a ci issue
in
src/wallet/spend.cpp:476
in
838685a4d0outdated
473+ results.push_back(*result);
474+ };
475+
476+ // If we can't fund the transaction from any individual OutputType, run coin selection
477+ // over all available coins, else pick the best solution from the results
478+ std::optional<SelectionResult> result = [&] {
it was originally a lambda, but i do think it’s more legible without it. also, removing the lambda also takes care of the shadowing
in
src/wallet/spend.cpp:479
in
838685a4d0outdated
476+ // If we can't fund the transaction from any individual OutputType, run coin selection
477+ // over all available coins, else pick the best solution from the results
478+ std::optional<SelectionResult> result = [&] {
479+ if (results.size() == 0) {
480+ if (allow_mixed_output_types) {
481+ if (auto result{ChooseSelectionResult(wallet, nTargetValue, eligibility_filter, available_coins.all(), coin_selection_params)}) {
nit, readability; I had a little trouble reading which arguments were to which functions. (I’m unsure if remove_preset should be snake_case or camelCase; it’s both a function and a variable):
agree that this is not the most readable :sweat_smile: , however i really like @furszy ’s suggestion for refactoring this: move the shuffle and erase functions into the CoinsResult struct. i didn’t take his suggestion in this PR because it requires a bigger refactor than the scope of this PR, but i plan to address it in a follow-up.
43+ * the CoinsResult struct as if it were a vector
44+ */
45 struct CoinsResult {
46- std::vector<COutput> coins;
47- // Sum of all the coins amounts
48+ /* Vectors for each OutputType */
60+ /** The following methods are provided so that CoinsResult can mimic a vector,
61+ * i.e methods can work with invidivudal OutputType vectors or on the entire object */
62+ uint64_t size() const;
63+ void clear();
64+
65+ /** Sum of all available coins **/
49+ std::vector<COutput> legacy;
50+ std::vector<COutput> P2SH_segwit;
51+ std::vector<COutput> bech32;
52+ std::vector<COutput> bech32m;
53+
54+ /** Other is a catch all for anything that doesn't match the known OutputTypes */
0 /** Other is a catch-all for anything that doesn't match the known OutputTypes */
in
src/wallet/spend.h:59
in
838685a4d0outdated
56+
57+ /** Concatenate and return all COutputs as one vector */
58+ std::vector<COutput> all() const;
59+
60+ /** The following methods are provided so that CoinsResult can mimic a vector,
61+ * i.e methods can work with invidivudal OutputType vectors or on the entire object */
0 * i.e., methods can work with individual OutputType vectors or on the entire object */
in
src/wallet/spend.cpp:463
in
838685a4d0outdated
460+{
461+ // Run coin selection on each OutputType and compute the Waste Metric
462+ std::vector<SelectionResult> results;
463+ if (auto result{ChooseSelectionResult(wallet, nTargetValue, eligibility_filter, available_coins.legacy, coin_selection_params)}) {
464+ results.push_back(*result);
465+ };
38+ * COutputs available for spending, stored by OutputType.
39+ * This struct is really just a wrapper around OutputType vectors with a convenient
40+ * method for concatenating and returning all COutputs as one vector.
41+ *
42+ * clear(), size() methods are implemented so that one can interact with
43+ * the CoinsResult struct as if it were a vector
110+ * param@[in] allow_mixed_output_types Relax restriction that SelectionResults must be of the same OutputType
111+ * returns If successful, a SelectionResult containing the input set
112+ * If failed, a nullopt
113+ */
114+std::optional<SelectionResult> AttemptSelection(const CWallet& wallet, const CAmount& nTargetValue, const CoinEligibilityFilter& eligibility_filter, const CoinsResult& available_coins,
115+ const CoinSelectionParams& coin_selection_params, bool allow_mixed_output_types);
LarryRuane
commented at 10:08 pm on June 25, 2022:
i tried to avoid changing the formatting as much as possible to minimize the diff on this PR, so im inclined to leave it as is, but happy to change if others have an opinion about it.
not sure what the recommended best practice here is :man_shrugging:
in
src/wallet/spend.h:126
in
838685a4d0outdated
134+ * returns If successful, a SelectionResult containing the input set
135+ * If failed, a nullopt
136 */
137-std::optional<SelectionResult> AttemptSelection(const CWallet& wallet, const CAmount& nTargetValue, const CoinEligibilityFilter& eligibility_filter, std::vector<COutput> coins,
138+std::optional<SelectionResult> ChooseSelectionResult(const CWallet& wallet, const CAmount& nTargetValue, const CoinEligibilityFilter& eligibility_filter, const std::vector<COutput>& available_coins,
139 const CoinSelectionParams& coin_selection_params);
LarryRuane
commented at 10:10 pm on June 25, 2022:
in
src/wallet/test/availablecoins_tests.cpp:3
in
838685a4d0outdated
0@@ -0,0 +1,105 @@
1+// Copyright (c) 2022 The Bitcoin Core developers
2+// Distributed under the MIT software license, see the accompanying
3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
LarryRuane
commented at 10:16 pm on June 25, 2022:
0// file COPYING or https://www.opensource.org/licenses/mit-license.php.
huh, interesting. seems like we have a mix of http and https throughout the codebase in the license header (mental note to fix this in a follow up)
in
test/functional/wallet_avoid_mixing_output_types.py:4
in
838685a4d0outdated
0@@ -0,0 +1,189 @@
1+#!/usr/bin/env python3
2+# Copyright (c) 2022 The Bitcoin Core developers
3+# Distributed under the MIT software license, see the accompanying
4+# file COPYING or http://www.opensource.org/licenses/mit-license.php.
LarryRuane
commented at 10:20 pm on June 25, 2022:
0# file COPYING or https://www.opensource.org/licenses/mit-license.php.
in
test/functional/wallet_avoid_mixing_output_types.py:7
in
838685a4d0outdated
0@@ -0,0 +1,189 @@
1+#!/usr/bin/env python3
2+# Copyright (c) 2022 The Bitcoin Core developers
3+# Distributed under the MIT software license, see the accompanying
4+# file COPYING or http://www.opensource.org/licenses/mit-license.php.
5+"""Test output type mixing during coin selection
6+
7+A wallet may have different type UTXOs to choose from during coin selection,
LarryRuane
commented at 10:21 pm on June 25, 2022:
0A wallet may have different types of UTXOs to choose from during coin selection,
LarryRuane
commented at 10:38 pm on June 25, 2022:
contributor
Code review ACK838685a4d018961dcd8315a59028aaa262b61916
Looks very good. My suggestions are all nits. I don’t really understand the wallet code well; my review is pretty superficial so don’t put much weight on my ACK.
Not sure about this because average users will use bech32/bech32m addresses. If some users prefer other addresses then it should be okay to accumulate other types in the wallet.
Still need to review and test the approach. My first reaction after looking at the diagram was if the transaction has mixed inputs, will it be okay to use mixed outputs?
Or never use mixed inputs in a transaction?
in
src/wallet/spend.cpp:238
in
2f3d3622a8outdated
235+
236+ // If the Output is P2SH and spendable, we want to know if it is
237+ // a P2SH (legacy) or one of P2SH-P2WPKH, P2SH-P2WSH (P2SH-Segwit). We can determine
238+ // this from the redeemScript. If the Output is not spendable, it will be classified
239+ // as a P2SH (legacy), since we have no way of knowing otherwise without the redeemScript
240+ if (output.scriptPubKey.IsPayToScriptHash() && spendable) {
In 2f3d3622a8b0d5dccdd574d5e37d8239a2f8ed4d “refactor: store by OutputType in CoinsResult”
I think WITNESS_UNKNOWN should also be classified as bech32m.
0 case TxoutType::WITNESS_UNKNOWN:
1 case TxoutType::WITNESS_V1_TAPROOT:
in
test/functional/wallet_avoid_mixing_output_types.py:177
in
ffcc600156outdated
172+ A.sendtoaddress(B.getnewaddress(address_type="bech32m"), v)
173+
174+ self.generate(A, 1)
175+
176+ self.log.info("Sending payments from B to A")
177+ for v in generate_payment_values(10, 20):
ah, good point. there is nothing to prevent that, altho it is incredibly unlikely. ill change the values to 5 payments which add up to 9 to guarantee there is never a value of >= 10
in
test/functional/wallet_avoid_mixing_output_types.py:74
in
ffcc600156outdated
69+ """
70+ addr_info = node.getaddressinfo(addr)
71+ return (
72+ addr_info['isscript'] and
73+ not addr_info['iswitness']
74+ )
In ffcc600156e45968b47586e6758f107a56d6b80e “test: functional test for new coin selection logic”
The easiest way to check for address type is to check what the descriptor starts with. sh(wpkh( or sh(wsh( is p2sh-segwit, wpkh( or wsh( is bech32, tr( is bech32m, and pkh( is legacy.
nice! this test only runs with --descriptors so ill update the is_<addrtype> functions
achow101
commented at 8:18 pm on June 27, 2022:
member
It would be nice to have a test for the backoff behavior where we don’t do mixing of 6 conf funds.
in
src/wallet/spend.h:40
in
2f3d3622a8outdated
40+ * method for concatenating and returning all COutputs as one vector.
41+ *
42+ * clear(), size() methods are implemented so that one can interact with
43+ * the AvailableCoins struct as if it were a vector
44+ */
45 struct CoinsResult {
murchandamus
commented at 6:40 pm on June 28, 2022:
In 2f3d3622a8b0d5dccdd574d5e37d8239a2f8ed4d refactor: store by OutputType in CoinsResult:
Naming-nit:
From what I can tell, the struct that previously was AvailableCoins was renamed to CoinsResult. I’d submit that AvailableCoins was more descriptive.—Isn’t any function’s output a “result”? It feels like adding “…Result” dosen’t add any information. The most notable things about this struct are that it a) lists the available UTXOs, b) separated by output types. These aspects are not captured well by CoinsResult.
Since AvailableCoins seems to be out of favor: We usually call the set of UTXOs available to a wallet its “UTXO pool”. Since this codebase uses “Coin” for UTXO frequently, maybe call this struct CoinPool, or CoinPools to stress that there are multiple sets?
CoinsResult was introduced in #25005 , which changed AvailableCoins to return a struct instead of populating a vector and was introduced here after a rebase.
Personally, I prefer AvailableCoins as the name of the struct (and renaming the AvailableCoins function to GetAvailableCoins), but changing the name again in this PR felt a little silly and would have expanded the diff on this PR. I have a follow-up PR in mind which incorporates some feedback from @furszy regarding the CoinsResult struct, and I think that would be a better place to address the naming
in
src/bench/coin_selection.cpp:57
in
3520cf9d76outdated
murchandamus
commented at 6:50 pm on June 28, 2022:
In commit message for 3520cf9d76d24e43b3ba08fe03687ee1b2e2368e (refactor: use AvailableCoins struct in SelectCoins)
The struct has been renamed to CoinsResult but is referred to as AvailableCoins in the commit message.
in
src/wallet/spend.h:124
in
3520cf9d76outdated
121+ * (according to the waste metric) will be chosen.
122+ *
123+ * param@[in] wallet The wallet which provides solving data for the coins
124+ * param@[in] nTargetValue The target value
125+ * param@[in] eligilibity_filter A filter containing rules for which coins are allowed to be included in this selection
126+ * param@[in] available_coins The vector of coins available for selection prior to filtering
murchandamus
commented at 7:25 pm on June 28, 2022:
Consistency nit: In 3520cf9d76d24e43b3ba08fe03687ee1b2e2368e (refactor: use AvailableCoins struct in SelectCoins)
As below, should perhaps be referred to as the struct?
0 * param@[in] available_coins The struct of coins, organized by OutputType, available for selection prior to filtering
josibake
commented at 4:53 pm on July 5, 2022:
member
Not sure about this because average users will use bech32/bech32m addresses. If some users prefer other addresses then it should be okay to accumulate other types in the wallet.
regardless of what the wallet default address type is, since #23789 , a user will accumulate non-default address types in their wallet based on the change address matching logic. this is the primary motivation for this PR. perhaps a better phrasing in the description would be “non-default” (instead of older types).
josibake force-pushed
on Jul 5, 2022
josibake
commented at 5:37 pm on July 5, 2022:
member
sorry for the late response all, and thanks for the review! force-pushed to address feedback from @LarryRuane , @achow101 and @Xekyo ; changes can be verified with git range-diff master 838685a 7f12300
DrahtBot added the label
Needs rebase
on Jul 6, 2022
josibake force-pushed
on Jul 7, 2022
josibake
commented at 12:04 pm on July 7, 2022:
member
force pushed for rebase; git range-diff master 7f12300 2e30c09
DrahtBot removed the label
Needs rebase
on Jul 7, 2022
Not sure about this because average users will use bech32/bech32m addresses. If some users prefer other addresses then it should be okay to accumulate other types in the wallet.
regardless of what the wallet default address type is, since #23789 , a user will accumulate non-default address types in their wallet based on the change address matching logic. this is the primary motivation for this PR. perhaps a better phrasing in the description would be “non-default” (instead of older types).
It happens only when user sends bitcoin to non-default type. This was my point in the quoted part of comment.
josibake
commented at 5:45 pm on July 7, 2022:
member
It happens only when user sends bitcoin to non-default type. This was my point in the quoted part of comment.
gotcha, i think we are on the same page, unless i am misunderstanding your comment. my point was this is not related to average user behavior. for example:
if i have P2PKH as my default and i regularly pay to bech32/bech32m addresses, i will begin to accumulate non-default UTXOs in my wallet, which might be undesirable for me, the user.
although, reading this again, that section of the description is a bit outdated. originally, i implemented this to prefer spending “older types” first, but it was pointed out by other reviewers that it would be better to always spend the most economical UTXOs without forcing an ordering by OutputType. ill update the the description to remove references to older type UTXOs as that is not really the primary concern of this PR: the issue this PR addresses is the privacy leak that can happen when different type UTXOs are mixed when funding a transaction.
achow101
commented at 8:42 pm on July 7, 2022:
member
ACK2e30c09e1ec488a0bd22c88b7f79d8d1bc3b098e
DrahtBot added the label
Needs rebase
on Jul 8, 2022
josibake force-pushed
on Jul 12, 2022
josibake
commented at 10:28 am on July 12, 2022:
member
force-pushed for a rebase; git range-diff master 2e30c09 6530d19
DrahtBot removed the label
Needs rebase
on Jul 12, 2022
in
src/wallet/test/coinselector_tests.cpp:68
in
4be724d8e7outdated
66@@ -67,7 +67,7 @@ static void add_coin(const CAmount& nValue, int nInput, CoinSet& set, CAmount fe
67 set.insert(coin);
68 }
murchandamus
commented at 4:47 pm on July 13, 2022:
Not a request for this PR, but in the future perhaps consider using a scripted-diffs for large renaming operations.
in
src/wallet/spend.cpp:629
in
074e58f46boutdated
625@@ -601,34 +626,35 @@ std::optional<SelectionResult> SelectCoins(const CWallet& wallet, CoinsResult& a
626627 // If possible, fund the transaction with confirmed UTXOs only. Prefer at least six
628 // confirmations on outputs received from other wallets and only spend confirmed change.
629- if (auto r1{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 6, 0), available_coins, coin_selection_params)}) return r1;
630- if (auto r2{AttemptSelection(wallet, value_to_select, CoinEligibilityFilter(1, 1, 0), available_coins, coin_selection_params)}) return r2;
631+ // Also, allow mixing if and only if a solution from any single output cannot be found
murchandamus
commented at 5:35 pm on July 13, 2022:
Nit: In “wallet: run coin selection by OutputType” (074e58f46b6ca2fe51fb6cb52ac296b971153e9c)
The “output” here threw me off, I think you meant “output type”
0 // Also, allow mixing if and only if a solution from any single output type cannot be found
Perhaps go with:
0 // Allow mixing only if no solution from any single output type can be found
Also, this comment seems to be one line too early, shouldn’t this be between r1 and r2?
in
test/functional/wallet_avoid_mixing_output_types.py:55
in
d0d64c7858outdated
50+ addr_info = node.getaddressinfo(addr)
51+ return addr_info['desc'].startswith('tr(')
52+
53+
54+def is_p2sh_segwit_address(node, addr):
55+ """Check if an address contains a P2SH-Segwit outputself.
murchandamus
commented at 5:41 pm on July 13, 2022:
0 """Check if an address contains a P2SH-Segwit output.
in
test/functional/wallet_avoid_mixing_output_types.py:18
in
d0d64c7858outdated
13+
14+This test verifies that mixing different output types is avoided unless
15+absolutely necessary. Both wallets start with zero funds. Alice mines
16+enough blocks to have spendable coinbase outputs. Alice sends 10
17+random value payments to Bob, creating 5 bech32 UTXOs and 5 P2SH UTXOs,
18+for a total of 30BTC in Bob's wallet.
murchandamus
commented at 5:53 pm on July 13, 2022:
In “test: functional test for new coin selection logic” (d0d64c7858a0d0e2a402a635041126770c7123a5):
This description seems to be outdated. I think it’s now all four types and three UTXOs each for a total of 40 ₿.
in
src/wallet/test/availablecoins_tests.cpp:36
in
6530d1950eoutdated
iirc, i copied this from other tests which use CreateTransaction. personally, i prefer it to just -1 because it’s more clear to me having a NAME_CONSTANT, but i don’t have a super strong feelings about it
murchandamus approved
murchandamus
commented at 6:06 pm on July 13, 2022:
contributor
ACK6530d1950e6f6d5e1e64a2d59f244bb8c6b3996a
Got a few nits pertaining to comments and descriptions mostly. The code appears ready to ship. I think that some readability improvements could be made in a follow-up especially for instances where the same operations is performed for each output type. Basically the idea that @LarryRuane suggested would also apply where AttemptSelection(…) is called. The output types could be put in a list and the AttemptSelection(…) call could be made for each of the listed output types.
achow101
commented at 8:04 pm on July 18, 2022:
member
re-ACK6530d1950e6f6d5e1e64a2d59f244bb8c6b3996a
aureleoules
commented at 0:06 am on July 19, 2022:
member
I believe this PR conflicts with master silently?
I cannot compile the code when rebasing with master but do not get any merge conflicts.
It looks like the issue comes from the last commit 6530d1950e6f6d5e1e64a2d59f244bb8c6b3996a.
in
src/wallet/test/availablecoins_tests.cpp:80
in
6530d1950eoutdated
75+ // For each OutputType, We expect 2 UTXOs in our wallet following the self transfer:
76+ // 1. One UTXO as the recipient
77+ // 2. One UTXO from the change, due to payment address matching logic
78+
79+ // Bech32m
80+ wallet->GetNewDestination(OutputType::BECH32M, "", dest, error);
Store COutputs by OutputType in CoinsResult.
The struct stores vectors of `COutput`s by `OutputType`
for more convenient access
2e67291ca3
refactor: use CoinsResult struct in SelectCoins
Pass the whole CoinsResult struct to SelectCoins instead of only a
vector. This means we now have to remove preselected coins from each
OutputType vector and shuffle each vector individually.
Pass the whole CoinsResult struct to AttemptSelection. This involves
moving the logic in AttemptSelection to a newly named function,
ChooseSelectionResult. This will allow us to run ChooseSelectionResult
over each OutputType in a later commit. This ensures the backoffs work
properly.
Update unit and bench tests to use CoinResult.
77b0707206
josibake force-pushed
on Jul 19, 2022
josibake force-pushed
on Jul 19, 2022
wallet: run coin selection by `OutputType`
Run coin selection on each OutputType separately, choosing the best
solution according to the waste metric.
This is to avoid mixing UTXOs that are of different OutputTypes,
which can hurt privacy.
If no single OutputType can fund the transaction, then coin selection
considers the entire wallet, potentially mixing (current behavior).
This is done inside AttemptSelection so that all OutputTypes are
considered at each back-off in coin selection.
438e04845b
test: functional test for new coin selection logic
Create a wallet with mixed OutputTypes and send a volley of payments,
ensuring that there are no mixed OutputTypes in the txs. Finally,
verify that OutputTypes are mixed only when necessary.
da03cb41a4
test: add unit test for AvailableCoins
test that UTXOs are bucketed correctly after
running AvailableCoins
71d1d13627
josibake force-pushed
on Jul 19, 2022
josibake
commented at 4:47 pm on July 19, 2022:
member
responded to review from @Xekyo , fixed the silent merge (thanks @aureleoules , @furszy ); git range-diff master 6530d19 71d1d13
I think that some readability improvements could be made in a follow-up especially for instances where the same operations is performed for each output type. Basically the idea that @LarryRuane suggested would also apply where AttemptSelection(…) is called. The output types could be put in a list and the AttemptSelection(…) call could be made for each of the listed output types.
definitely agree! both @furszy and @LarryRuane had some great suggestions which id like to tackle in a follow-up
I think I originally tried this and ran into strange behavior. Also, in other places an empty SelectionResult is returned as opposed to a nullopt. Going to leave this for now, but might be worth taking a second look at and perhaps using the newly created BResult class
It shouldn’t be a problem to return std::nullopt, but the current code is equivalent anyways. The default constructor for a std::optional is to make a nullopt object.
aureleoules
commented at 11:02 am on July 20, 2022:
I understand this returns the SelectionResult with the least waste as std::min_element compares the elements with the operator<, but maybe add a comment to explicit this behavior?
+nit
0 // Find the best solution (lowest waste)
1 return std::make_optional(*std::min_element(results.begin(), results.end()));
I thinkskip_if_no_sqlite() is sufficient for this? will add if i need to touch the functional test file again
aureleoules
commented at 11:20 am on July 20, 2022:
member
ACK71d1d13627ccd27319f347e2d8167c8fe8a433f4.
I believe the CoinsResult struct is much easier to work with than the previous coin vector.
This PR does seems to fix the privacy issue explained in the description.
kristapsk
commented at 8:39 am on July 21, 2022:
contributor
Concept ACK
achow101
commented at 8:19 pm on July 21, 2022:
member
re-ACK71d1d13627ccd27319f347e2d8167c8fe8a433f4
unknown changes_requested
unknown
commented at 11:13 pm on July 21, 2022:
none
Approach NACK
I was not able to achieve the privacy mentioned in PR description. Steps to reproduce:
Create a wallet w1, send 0.001 four times from signet faucet to bech32 addresses, wait for confirmation and spend more than 0.002 to a legacy address.
There will be 2 UTXO in wallet: 1 legacy and other bech32
This is a valid argument and there can be multiple interpretations even without coinjoin: #24584#pullrequestreview-911639998
Also still curious about the two questions/solutions I had about solving this problem in #24584 (comment)
achow101
commented at 11:25 pm on July 21, 2022:
member
Approach NACK
I was not able to achieve the privacy mentioned in PR description. Steps to reproduce:
1. Create a wallet w1, send 0.001 four times from signet faucet to bech32 addresses, wait for confirmation and spend more than 0.002 to a legacy address.
2. There will be 2 UTXO in wallet: 1 legacy and other bech32
3. Send more than 0.001 to a bech32 address
I don’t see how this example tests what is described in the OP? You ended up making a transaction that has mixed inputs, but the that’s because those are the only funds in your wallet. The point of this PR is to avoid mixing input types, not never mix input types.
josibake
commented at 9:47 am on July 22, 2022:
member
hey @1440000bytes , thanks for testing! as mentioned, the goal is not to never mix; the goal is to avoid mixing if possible. I’ve updated the OP to explicitly state this. also, take a look at the functional test, as this demonstrates the desired behavior: creating a wallet with mixed UTXO types, funding several transactions without mixing, and finally emptying the wallet (which is impossible to do without mixing).
This is a valid argument and there can be multiple interpretations even without coinjoin: #24584#pullrequestreview-911639998
while there can be other interpretations, the most likely interpretation today is that the non-default input in the subsequent transaction is the change from the previous transaction (and in the majority of cases, this interpretation is correct).
Also still curious about the two questions/solutions I had about solving this problem in #24584 (comment)
i responded here: #24584 (comment) , please let me know which parts of your original question you feel are unaddressed.
How about insufficient funds error message or use mixed output types for change? This could happen even when some transactions are unconfirmed and confirmed UTXOs are of different types
murchandamus
commented at 7:21 pm on July 26, 2022:
contributor
How about insufficient funds error message or use mixed output types for change? This could happen even when some transactions are unconfirmed and confirmed UTXOs are of different types
An advanced user that cares so much about not mixing inputs is probably already using coin control to manually select inputs. I think this PR is a reasonable middle-path of rolling out a privacy improvement while minimizing negative impact on the UX. Adding cases in which transaction building fails while there are sufficient funds available as the default behavior would be an unreasonable UX disimprovement. I would consider the addition of options for more strict behavior to be out of scope for this PR.
in
src/wallet/test/availablecoins_tests.cpp:91
in
71d1d13627
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-04-05 21:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me