Issue#1643: Coinselection prunes extraneous inputs from ApproximateBestSubset

murchandamus commented at 0:48 am on September 13, 2014: contributor

Improvement for Issue#1643: A further pass over the available inputs has been added to ApproximateBestSubset after a candidate set has been found. It will prune any extraneous inputs in the selected subset, in order to decrease the number of inputs and the resulting change.

murchandamus commented at 0:50 am on September 13, 2014: contributor

This is the first time I am trying to collaborate on an open-source project, please feel free to point out when I am doing something wrong.

BitcoinPullTester commented at 1:02 am on September 13, 2014: none

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4906_3b92ce7ad950ceba14f1cc6ad637923f71ca61be/ for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

murchandamus commented at 1:38 am on September 13, 2014: contributor

I was made aware by @sipa that a candidate set that ended up matching the targetValue with the addition of a dust UTXO would still be picked immediately. That seems to be correct.

However, unless one is trying to make a minuscule payment, it seems that the likelihood to add a bigger UTXO at some point that would exceed the target should be sufficiently larger, than the chance that you’d only add inputs that inch closer and closer to the target to match it finally without exceeding it. It should rather be unlikely to even happen in 1000 iterations (gut feeling), so the patch should work fine in the majority of situations.

gmaxwell commented at 10:31 pm on September 15, 2014: contributor

Have you simulated this, e.g. how a wallet progressed over time? I would expect this to result in grinding down the wallet into lots of dust change over time even worse than the current approach.

Generally, so long as it doesn’t result in bloating things up to the point where the transaction confirms slowly, we should generally prefer to make transactions bigger (under the rational that transaction fees will increase over time or at worst stay constant). E.g. If you already have a change output, a pass that looks at the scriptpubkeys you’re spending and keeps adding more inputs assigned to the same pubkeys until the fee increase is some threshold or the change exceeds 2x the payment value (or something like that) would probably result in lower total transaction fees over time. (and better privacy)

murchandamus commented at 11:11 am on September 18, 2014: contributor

@gmaxwell I had not simulated it in advance. I will do so though, just haven’t gotten around to it yet. Perhaps I’ll get around to it this evening, otherwise I will probably get around to it tomorrow.

murchandamus closed this on Sep 18, 2014

murchandamus reopened this on Sep 18, 2014

murchandamus commented at 8:29 am on September 23, 2014: contributor

@gmaxwell So, I have done some simulations. My approach was: • To recreate the wallet logic in Python. • To add the pruning feature conditionally • To have a wallet of each type, starting wtih the same utxopool, execute and receive a number of payments. • They obviously end up with the same amount, and I can compare the number and sizes of final utxo.

So far I have tried a few different scenarios, but I ended up being limited at around 10k utxo and 1000 operations, as my python implementation was not as fast as I had hoped. My initial experiments suggest that the pruning wallet does end up with a slightly bigger number of utxo, but it’s only slightly bigger than the regular wallet. (I’ll be adding the results later, when I am on the right computer.) I have started implementing the simulation in a compiled language to speed it up in order to do some more and bigger experiments.

laanwj added the label Wallet on Sep 25, 2014

gmaxwell commented at 8:19 am on October 8, 2014: contributor

Please keep us updated on your progress. If you’d like some more eyes on your testing code, feel free to point it out.

murchandamus commented at 1:07 pm on October 11, 2014: contributor

@gmaxwell The code for my simulation can be found here: https://github.com/Xekyo/CoinSelectionSimulator

It is working now, and can be executed in bearable time, but the code could be a bit clearer. I am currently in my exam period and haven’t had as much time to work on it, as I would like.

I have tried a few different strategies to select coins, and have been experimenting with different distributions of incoming and outgoing payments.

Different strategies:

“Regular Wallet” implements the algorithm currently in use as I understand it, to be used as a baseline.
“Pruning Wallet” selects the coins just like the regular wallet, but afterwards discards inputs that are smaller than the remaining change in the order they were selected.
“Pruning Wallet with minimum number of Inputs”, like PW, but when pruning keeps at least the miminum number of Inputs.
“Double Wallet”: Like PW, but when unable to directly match the target, aims for twice the target, in order to create change in the magnitude of what the user was trying to transfer instead of dust.

Some results:

Experiment 1: Start: 5000 random UTXO from range(2500, 250000) satoshis 10000 transactions (randomly 50/50 distributed incoming/outgoing) in the range(540,250000) satoshis Regular Wallet: 4853 changes created, in range(1,1591) satoshis, average change 63.01±97.78 satoshi, spent (1,12) inputs per transaction sent, average 1.44±1.15

… writing up my results I realize that I will want to create a .csv instead of the current text-form result, sorry gonna go back to studying now, but I’ll probably be able to do it in the next break.

Some questions I haven’t been able to answer satisfyingly:

What would constitute realistic data for incoming and outgoing transactions of one wallet (how many incoming/outgoing transactions, what average size and distributions for each direction, is it necessary to regard the change in value over time)?
Haven’t researched yet: When does the required transaction fee increase?

murchandamus commented at 12:59 pm on October 19, 2014: contributor

Hi, Finished with my exams for this semester, finally had time to pick this up again.

(When I created the .csv files with my output, I realized that my random models would sometimes generate spending instructions that asked for more than the wallet’s current value. While those were were ignored as impossible, they still got into the statistics, so I wanted to fix that first.)

I’ve been thinking about what one might want to be optimizing for. This is what I got so far:

Non-dust change: The creation of small change outputs should be avoided. They bloat the blockchain and are expensive to spend. -> If a change output has to be created, it would be preferable to create a change output of the magnitude of the payment value. (Which would also help obscure recipient output and change output.)
Privacy: UTXO should be picked non-deterministically, and as few different pubkeys as possible should be involved. -> Whenever the CoinSelection selects a UTXO, any UTXO assigned to the same pubkey should be preferred in selection. -> Small inputs that are significantly smaller than the size of the change and don’t share their pubkey with a larger input should be pruned, as they increase transaction fee and decrease privacy.
Minimize transaction fee A transaction input set should be preferred when it costs less to send. -> If spending some input costs more than what it contributes to the outputs it should not be added. Also, inputs that are extraneous should be pruned if that would lower the transaction fee.
Consolidate small UTXO? Once created, very small UTXO are in the blockchain anyway. Is it preferable to spend very small UTXO, in order to remove them from the UTXO-pool, or to ignore them until they become valuable enough to spend in their own right?

Here are the statistics from my latest experiment. ex19

gmaxwell commented at 7:48 am on October 20, 2014: contributor

Very interesting results!

gmaxwell commented at 7:59 am on October 20, 2014: contributor

Consolidate small UTXO? Once created, very small UTXO are in the blockchain anyway. Is it preferable to spend very small UTXO, in order to remove them from the UTXO-pool

It’s preferable to spend them: since it reduces the storage for a minimal full node (see the pruning patches in #4701)… subject to the restriction that you don’t want someone to be able to gratitiously increase your transaction costs by sending you tiny utxo.

RHavar commented at 2:34 pm on February 18, 2015: contributor

Some questions I haven’t been able to answer satisfyingly:

What would constitute realistic data for incoming and outgoing transactions of one wallet (how many incoming/outgoing transactions, what average size and distributions for each direction, is it necessary to regard the change in value over time)?

Perhaps I can help. I have data from MoneyPot.com’s hot wallet which you can use for simulation: https://gist.github.com/RHavar/7cd6f3fcf2bd3e485458

The positive amounts are amounts deposited, the negative amounts are sends. The data is sorted over time (oldest at the top, newest at the bottom) denominated in bitcoins.

Things to note:

deposits are only added to the list when they get 1 confirmation (so they tend to come in batches) and send amounts are added to the list instantly
At various times, the cumulative total dips below 0. This happens when people have won enough money to be able to withdraw more than the total people have put in the site. For the purpose of simulation, you might want to assume there is an additional 50 BTC input at the very start, to avoid this case.

jgarzik commented at 6:15 pm on July 23, 2015: contributor

This PR has been open for a while, but garnered no ACKs. The author seems to have put a fair amount of time and thought into it. However, this definitely needs more review and testing.

Ping, maintainers/testers/helpers?

murchandamus commented at 9:28 pm on July 23, 2015: contributor

@jgarzik: Hi Jeff, Thanks for the ping. I had lost track of this, but I see now that @RHavar posted data that might help to explore what the patch would do with some realistic data for transaction sizes. I’ll have a look this weekend.

gmaxwell added the label Privacy on Aug 18, 2015

murchandamus commented at 7:52 am on September 1, 2015: contributor

I’ve created another testcase using the MoneyPot.com data. On that one, I get different results yet again, this time the Regular wallet has the least UTXO in the end.

Looking over the results, I noticed that all wallets created a change output in the single digit satoshis. Do I remember correctly that the wallet shouldn’t create Dust Outputs? Shouldn’t the change be at least 540 satoshis? If so, I should probably fix that behavior still and then have another look.

However, if it ends up looking as promising as today, I would propose to just expire this Pull-Request.

maflcko commented at 9:20 am on September 23, 2015: member

@Xekyo:

prune extraneous inputs

Wouldn’t “extraneous inputs” imply an issue in the coin selection algorithm which should be fixed instead? I invite you to look at #6696 which is a different approach. I am happy to shoot up some simulations if anyone is interested. (Ping me at the other PR)

laanwj commented at 3:41 pm on November 20, 2015: member

From what I understand ApproximateBestSubset is an approximate algorithm for the following:

0Input: [sizes], nLow, nTarget
1Output: X=subset([sizes]) for which nTargetValue <= sum(X) < nLow
2        and sum(X)-nTargetValue as small as possible

The “approximate” part means that it may select a solution which is not the optimal one (e.g. sum(X)-nTargetValue is not really as small as possible).

Your fix drops elements i from the result X for which sum(X)-i is still >= nTargetvalue, which if possible leads to a trivially better solution.

From what I understand this can improve the result because vValue is sorted from low to high, and elements are added in that order, it can happen that an element is added to get the sum above nTargetValue, but makes earlier-added small elements redundant.

What about moving the post-processing step to the end? This removes the performance impact per checked subset, and still makes sure redundant outputs are removed from the final result.

I’ve managed to reproduce this as well. For e.g. these input values:

0vValues = [10, 10, 10, 10, 10, 10, 10, 10, 10, 100, 1000]
1nTargetValue = 251

ApproximateBestSubset sometimes selects [10, 10, 1000]. Postprocessing like this can drop the redundant 10 inputs.

Concept ACK - this cannot make things worse. Would be nice to have a unit test.

maflcko commented at 4:10 pm on November 20, 2015: member

@laanwj Great that you had a look at this.

vValue is sorted from low to high …

Guess I missed that during review.

unit test @Xekyo are you still working on this? Should I take over?

murchandamus commented at 12:30 pm on November 21, 2015: contributor

Hi,

Am 20.11.2015 11:11 schrieb “MarcoFalke” notifications@github.com:

@Xekyo are you still working on this? Should I take over?

I’d be interested to take another look, but I’m currently traveling. Please feel free to take over. Otherwise, I’ll check it out in December.

Xekyo

murchandamus commented at 9:27 pm on December 6, 2015: contributor

@laanwj I’ve edited my fork to move the post-processing step to the end of ApproximateBestSubset. However, this patch may cause fewer dust outputs to be spent which contradicts Greg’s assessment above. Are you sure about “cannot make things worse”? I feel my simulations have been somewhat inconclusive on that point.

Coinselection prunes extraneous inputs from ApproximateBestSubset

A further pass over the available inputs has been added to ApproximateBestSubset after a candidate set has been found. It will prune any extraneous inputs in the selected subset, in order to decrease the number of input and the resulting change.

5c03483e26

murchandamus force-pushed on Dec 6, 2015

murchandamus commented at 10:30 pm on December 6, 2015: contributor

@MarcoFalke I realized my mistake and fixed it. After all my commits, I did the rebase as you suggested, and pushed to “Fix-#1643”. I hope I did that right. :)

murchandamus commented at 11:26 pm on December 6, 2015: contributor

I’ve added a simple test with

0   vValues = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 100, 100]
1   nTargetValue = 222

with an expected nBest == 230 on one iteration.

Sorry, I’m not sure how I would run the test on my machine right now, so I figured I’d just push it. I’ll check back tomorrow.

in src/test/coinselection_tests.cpp: in 238b7a2bd7 outdated

14+
15+using namespace std;
16+
17+BOOST_FIXTURE_TEST_SUITE(coinselection_tests, BasicTestingSetup)
18+
19+BOOST_AUTO_TEST_CASE(sanity)

maflcko commented at 11:52 am on December 7, 2015:

I think you can just add this to src/wallet/test/wallet_tests.cpp.

murchandamus commented at 3:50 pm on December 7, 2015:

Ah thanks, I had been wondering why there was no wallet tests in the regular test directory.

in src/wallet/wallet.cpp: in 238b7a2bd7 outdated

1634     }
1635+
1636+    //Reduces the approximate best subset by removing any inputs that are smaller than the surplus of nTotal beyond nTargetValue. 
1637+    for (unsigned int i = 0; i < vValue.size(); i++)
1638+    {                        
1639+	if (vfBest[i] && (nBest - vValue[i].first) >= nTargetValue )

laanwj commented at 4:19 pm on December 7, 2015:

Nit: indentation

murchandamus commented at 4:30 pm on December 7, 2015:

I’ve replaced the tab with four spaces.

in src/wallet/wallet.cpp: in 238b7a2bd7 outdated

1624@@ -1625,13 +1625,21 @@ static void ApproximateBestSubset(vector<pair<CAmount, pair<const CWalletTx*,uns
1625                             nBest = nTotal;
1626                             vfBest = vfIncluded;
1627                         }
1628-                        nTotal -= vValue[i].first;

laanwj commented at 4:21 pm on December 7, 2015:

This (along with && !fReachedTarget is a new, different change to the algorithm than the post-processing step initially proposed.

I think we should separate these into different PRs, the postprocessing to remove extraneous outputs below is obviously correct, but I’m not sure about this.

murchandamus commented at 4:25 pm on December 7, 2015:

@laanwj Fair enough, I’ll take it out and create another PR later.

laanwj commented at 4:35 pm on December 7, 2015: member

ACK

Moved set reduction to the end of ApproximateBestSubset to reduce performance impact af9510e037

murchandamus force-pushed on Dec 7, 2015

murchandamus commented at 4:38 pm on December 7, 2015: contributor

I rebased and used fixup on the commits that only fixed previous mistakes.

murchandamus force-pushed on Dec 7, 2015

Added a test for the pruning of extraneous inputs after ApproximateBestSet fc0f52d780

murchandamus force-pushed on Dec 7, 2015

murchandamus commented at 7:12 pm on December 7, 2015: contributor

@laanwj Travis had cleared my patch just when I realized that I had also messed up the indentation on the test. I’ve rebased it into two commits, one commit for the code, and one for the test. I expect this to turn green again momentarily. Sorry for keeping Travis so busy. :(

Is there anything else that needs to be done about this PR?

laanwj merged this on Dec 8, 2015

laanwj closed this on Dec 8, 2015

laanwj referenced this in commit 0800092fc2 on Dec 8, 2015

maflcko commented at 1:01 pm on December 9, 2015: member

Wallet code itself looks good! Post-merge Tested ACK. @Xekyo Thanks for sticking with this so long! There seems to be a small issue with the test code but I will create another PR for this…

Note @laanwj: vValue is not sorted from low to high but from high to low but I think you meant it the right way. ;)

maflcko commented at 1:10 pm on December 9, 2015: member

Oh, and you mentioned this will improve cases such as #1643. Which is not true, at least for this very transaction mentioned in #1643.

Still, I assume pruning will help for really large wallets (like the ones of exchanges or heavy users) when they have a odd distribution of input amount sizes.

laanwj referenced this in commit 96e8d12033 on Dec 10, 2015

laanwj referenced this in commit cc452d073c on Jul 1, 2016

laanwj referenced this in commit 20f3cd75f6 on Jul 1, 2016

lateminer referenced this in commit 77ea5eb519 on Jan 7, 2018

bitcoin locked this on Feb 15, 2022

murchandamus deleted the branch on Oct 6, 2023

Issue#1643: Coinselection prunes extraneous inputs from ApproximateBestSubset #4906