Coin selection algorithm proposal #23164

brunoerg commented at 2:00 AM on October 3, 2021: contributor

I’ve been working on a coin selection algorithm using Evolutionary Algorithm. I am creating this issue to discuss the possibilities related to it. If here is not the best place to discuss it, feel free to close this issue.

If you don’t know what an Evolutionary Algorithm is, here is a good definition: "EA is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions (see also loss function). Evolution of the population then takes place after the repeated application of the above operators". Anyway, I recommend you to study more about EA before trying to understand this proposal.

In Evolutionary Algorithms we have genes, chromosomes, and population, for example:

So, to begin our algorithm, we must first create an initial population. The population will contain an arbitrary number of possible solutions to the problem, oftentimes called members. In this case, a gene is a UTXO, so, every chromosome is a set of UTXOs.

member1 = [utxo1, utxo2, utxo3]
member2 = [uxto2, utxo5, utxo6, utxo7, utxo8]
...

To create the initial population, we can use the following approach (considering 5 members per population):

1 member composed of all UTXOs
1 member that selects randomly from the shuffled UTXOs until the target is exceeded
3 random members

Obs.: Most evolutionary algorithms set a length for the chromosome. However, for this approach, we don’t do it because we don't know how many UTXOs our final solution will use.

Ok, having our initial population, it is time to evaluate each solution (fitness). To do it, we can use the waste metric, introduced in Bitcoin Core recently. So, our metric to evaluate the members is the cost of creating change, the excess selection amount, and the cost of spending inputs now as opposed to sometime in the future (when we expect to be able to consolidate inputs). See more: https://bitcoincore.reviews/22009

After evaluating each member, we can define what is the best one among them and build our next population (new generation).

Our new population will have (considering 5 members):

3 members from mutation
1 member keeping the same chromosome of the best solution from the previous generation
1 new random member

Considering 50% mutation rate, we create 3 members copying the same chromosome of the best solution from the previous generation and applying mutation, like:

for (gene in chromosome) {
  const random_value = getRandomValueBetween0and1()
  if (random_value == 1) {
    gene = getRandomUtxo()
  }
}

Ok, now we have a new population, and then, we can repeat all the processes (fitness and mutation) N times (being N the number of generations), the best member of the last generation will be our final solution.

brunoerg added the label Feature on Oct 3, 2021

lsilva01 commented at 4:30 AM on October 5, 2021: contributor

That sounds interesting. Could you provide a performance benchmark (or other relevant metrics) between the current implementation and the one suggested in this issue?

brunoerg commented at 10:45 AM on October 5, 2021: contributor

That sounds interesting. Could you provide a performance benchmark (or other relevant metrics) between the current implementation and the one suggested in this issue?

Yeah! I'm doing it and as soon as I finish it I'll post here. I've implemented this algorithm in coinselect js (https://github.com/brunoerg/coinselect/blob/master/genetic.js) to be able to compare with other ones.

martinus commented at 3:58 PM on October 6, 2021: contributor

It would also be interested in a comparison with repeated random draws. So when the EA does e.g. 10000 fitness evaluations, compare that with choosing the best individual from 10000 random draws. That should give some insight into how well the evolution itself actually works.

brunoerg commented at 12:05 AM on October 7, 2021: contributor

It would also be interested in a comparison with repeated random draws. So when the EA does e.g. 10000 fitness evaluations, compare that with choosing the best individual from 10000 random draws. That should give some insight into how well the evolution itself actually works. @martinus Interesting, but... "So when the EA does e.g. 10000 fitness evaluations, compare that with choosing the best individual from 10000 random draws" I have to evaluate that 10000 random draws to know what is the best one, didn't understand it..

martinus commented at 5:07 AM on October 7, 2021: contributor

@brunoerg sorry for not wording this clear... I've meant that it would be interesting to see if an optimization that uses EA to mutate & crossover is actually an improvement over just random draws. To make the comparison fair, the EA and random draws should do the same number of fitness evaluations. If the EA optimizes with 10000 evaluations to find the best individual, a fair comparison would be with an algorithm that simply randomly choose 10000 individuals to find the best. If EA performs consistently better, then this is a sign that the crossover & mutation actually is a benefit. If just randomly picking individuals perform better this is a sign that the crossover & mutation does not really work well to explore the fitness space. It could go both ways.

brunoerg commented at 10:53 AM on October 7, 2021: contributor

@martinus Cool! Going to test it and post the results here! Thank you!

brunoerg commented at 5:13 PM on October 18, 2021: contributor

@martinus thanks for your suggestion. Testing the mutation I proposed vs. creating the new individuals randomly, I realized that the results were very similar. So, I changed the approach. Considering X generations, until the generation X/2, I don't apply mutation, I create the new individuals randomly, after that, considering 5 individuals for my population, I do: 1 individual with the same chromosome from the best individual of previous generation 1 individual from mutation (as I've done) 2 random individuals with more or less* genes that the best individual of previous generations

if the fee rate is lower than the long term fee rate, I create 1 individual 40% bigger and 1 individual 20% bigger. Else, I create 2 individuals but smallers (40% and 20%).

murchandamus commented at 7:41 PM on October 19, 2021: contributor

This is pretty cool, I'm curious whether it can be tweaked to find more interesting solutions than random draws.

I would imagine that a smaller mutation rate and a larger population would be helpful, but I'm not sure that the genes are sufficiently decoupled for the algorithm to actually converge on a better solution. E.g. with an EA for a robot searching a labyrinth, the genes would indicate how the robot acts when seeing a specific surrounding, and learning better reactions for different situations contributing to an overall greater fitness. I don't see how just flipping half the UTXO could converge on a better input set, especially for larger UTXO pools (say 100+).

Perhaps it would be more interesting to compose new descendants from the UTXOs picked in the fittest and runner-up solution. Let's say the best solution A is composed of the three UTXOs {A1, A2, A3} and the second best solution B has the four UTXOs {B1, B2, B3, B4}. The UTXOs in each solution are sorted by value. The next generation would reuse A, and a few descendants generated by flipping a coin on whether to use A1 or B1, A2 or B2, A3 or B3, nothing or B4, with a small chance (5%?) of taking neither or both resulting e.g. in {A1, A2, B3} and {A1, B2, A3, B3}. Or perhaps by taking the start of A and the end of B and vice versa. Then maybe generate two more such descendants from the second and third best. Cull and redo any solutions that don't have enough funds in the first place. Rather than flipping each UTXO with a 50% chance, maybe have a small chance that a UTXO is dropped, gets replaced by a random different one, or a previously unselected UTXO is randomly added.

5 seems very small for the population. Perhaps a bigger population would make fitter individuals stand out more quickly. Maybe try something like 20 or 30?

brunoerg commented at 1:04 AM on October 20, 2021: contributor

@Xekyo Thanks, Murch! <br><br>

I would imagine that a smaller mutation rate and a larger population would be helpful

Yeah, I'm gonna test with a smaller one, most algorithms use less than 1% for mutation rate.

Let's say the best solution A is composed of the three UTXOs {A1, A2, A3} and the second best solution B has the four UTXOs {B1, B2, B3, B4}. The UTXOs in each solution are sorted by value. The next generation would reuse A, and a few descendants generated by flipping a coin on whether to use A1 or B1, A2 or B2, A3 or B3, nothing or B4, with a small chance (5%?) of taking neither or both resulting e.g. in {A1, A2, B3} and {A1, B2, A3, B3}. Or perhaps by taking the start of A and the end of B and vice versa.

Interesting. It would be a great crossover and mutation.

Then maybe generate two more such descendants from the second and third best

I think it would be interesting if we increase the population. Considering 5 members, the second/third best solutions might be really bad, we shouldn't consider them to create decendents.. However, if we consider 20 or 30 members, I'm sure the second or third best solutions are decent and we can use them to create the new ones.

If the best solution A is {A1, A2, A3} and the second best solution B is {B1, B2, B3, B4}, we will first apply crossover, right? So, I think we can flip a coin to decide the genes (50% A/50% B), in this case, it would generate some ones like:

A = {A1, A2, A3} B = {B1, B2, B3, B4} possible descendants: X = {A1, A2, B3} X = {A1, B2, B3} X = {A1, A2, A3, B4} X = {A1, B2, B3, B4} ...

And then, we can apply mutation, maybe we can do:

for gene in genes:
  random = getRandomValueBetween1And1000()
  if random == 1:
     //gets replaced by a random different one
  elif random == 2:
    //drop it

random = getRandomValueBetween1And1000()
if random > 1: 
  //push a new UTXO

Of course, it is just an example to show we can use what you suggested as mutation:

have a small chance that a UTXO is dropped, gets replaced by a random different one, or a previously unselected UTXO is randomly added.

About the new generations, considering 20 members (A (the best one), B (second), C, D, E, F, G...) the decedents could be: 1 - A 2 - Crossover A/B + mutation 3 - Crossover A/B + mutation 4- Crossover A/B + mutation 5 - Crossover B/C + mutation 6 - Crossover B/C + mutation 7 - Crossover B/C + mutation 8 - Crossover A/C + mutation 9 - Crossover A/C + mutation 10 - Crossover A/C + mutation ... What do you think?

murchandamus commented at 8:36 PM on October 20, 2021: contributor

You could have multiple different ways of creating descendants. For the head-tail-swap, I was actually thinking like this:

AAA + BBBB = AABB + BBA // i.e. the head attaches to the tail of the other and vice versa to generate two new individuals while the cutting point wouldn't necessarily be the middle.

Another would pick either the A or B UTXO for each position:

AAA + BBBB = BABB (just one example)

I assume that literature about EA would have additional suggestions, but I'm not sure whether the computational effort is defensible compared to more systematic approaches. Perhaps the calculation could be interrupted early if there hasn't been an improvement for ten generations beside the maximum generation count, but that wouldn't do anything for the worst case, of course.

brunoerg commented at 11:50 AM on October 24, 2021: contributor

@Xekyo

Yeah, I know different approaches for crossover, but I think I will have to implement them to analyse what is the best one for this scenario. Something that makes it difficult is the length of the chromosome, I know some approaches that just works if all the chromosomes have the same length.

Perhaps the calculation could be interrupted early if there hasn't been an improvement for ten generations beside the maximum generation count

Make sense. With the waste metric, we could execute this algorithm last, stopping its execution when it finds a solution with a better waste metric compared to the previous algorithms (knapsolver, srd)? What do you think?

murchandamus commented at 8:43 PM on October 25, 2021: contributor

You could make the chromosomes equal length by having a boolean flag for every UTXO in the wallet, but that might cause issues for very large wallets.

Make sense. With the waste metric, we could execute this algorithm last, stopping its execution when it finds a solution with a better waste metric compared to the previous algorithms (knapsolver, srd)? What do you think?

That might be an option, but I think it would first have to be shown that it produces good results in an affordable computational effort.

bitcoin deleted a comment on Oct 25, 2021

brunoerg commented at 12:15 AM on October 26, 2021: contributor

@Xekyo I am gonna implement various solutions (different crossover and mutation) to check if the computational effort is satisfactory. The complexity of an EA depends on its implementation. Thanks for the tips.

aureleoules commented at 9:18 AM on September 16, 2022: member

Very interesting algorithm, have you continued working on it @brunoerg?

brunoerg commented at 7:38 PM on September 16, 2022: contributor

@aureleoules more or less, hope to do more simulations soon. Do you have any suggestions to improve the idea?

aureleoules commented at 11:53 PM on September 17, 2022: member

Not really sorry, but I'd love to see your draft.

murchandamus commented at 6:56 PM on September 20, 2022: contributor

I'm not sure whether I made myself clear above. I find evolutionary algorithms interesting, but I remain skeptical that they're a good approach for coin selection. I would suspect that they require a lot of computation and would be very slow, while I don't see why recombining different subsets of input selections would hone in on a better solution.

I would second what @martinus said above. Before getting too much into the details on this, it would be good to substantiate that it even produces useful results, e.g. by showing that the input sets improve over time as the EA runs, and that it can beat e.g. the best out of multiple random selections.

priyanshiiit commented at 5:14 AM on November 15, 2022: none

murchandamus commented at 9:42 PM on December 2, 2022: contributor

Mh, their description of “How Bitcoin selects transaction inputs” matches the “Knapsack” algorithm, but BnB was merged to Bitcoin Core in 2018. They also cite my Master thesis as having been written in 2020 instead of 2016. Gonna have to give the whole paper a read some time.

murchandamus commented at 8:41 PM on December 9, 2022: contributor

I finished reading the paper. I did not find it insightful. Their approach does not seem useful to me. It is optimizing for things that I would not consider important (reducing dust events, which is already achieved with a minimum change) and is extremely inefficient in trying to attain them. The simulation results are implausible (apparently their average BnB result is 10^10 sats greater than the target), making me think that their scenarios are not representative for the task at hand or their re-implementation of Bitcoin Core's coin selection inaccurate.

It seems to me that they are essentially replicating the behavior of BnB at high feerates, but do so by randomly generating the input sets instead of efficiently traversing the possible solution space, while only using a single output type and ignoring fees.

adamjonas commented at 8:53 PM on March 10, 2023: member

This has gone stale and @brunoerg confirmed can close.

adamjonas closed this on Mar 10, 2023

bitcoin locked this on Mar 9, 2024