For-block transaction selection algorithm makes RelayNetwork-esque services much harder #6531

issue TheBlueMatt opened this issue on August 7, 2015
  1. TheBlueMatt commented at 12:17 AM on August 7, 2015: member

    See https://github.com/TheBlueMatt/RelayNode/issues/12#issuecomment-128234446. Essentially, there is no tie-breaking mechanism in the transaction selection, and we end up picking an essentially random subset of transactions that have the same feePerKb. Current blocks/mempools contain a ton of transactions selected out of a pool with the same feePerKb. I /think/ building the heap with a tie-breaking mechanism, e.g. the transaction hash, should fix the issue, but it would need testing/more digging to confirm.
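A deterministic tie-break could be sketched like this (illustrative Python, not Bitcoin Core's actual heap-based CreateNewBlock code; the txids and feerates are made up):

```python
# Sketch: order candidate transactions by feerate, breaking feerate
# ties deterministically by txid so every node that applies this rule
# produces the same ordering. Hypothetical data, not Core's real code.

def select_order(mempool):
    """Sort by fee_per_kb descending; equal feerates sort by txid."""
    return sorted(mempool, key=lambda tx: (-tx["fee_per_kb"], tx["txid"]))

mempool = [
    {"txid": "beef01", "fee_per_kb": 10000},
    {"txid": "aaaa01", "fee_per_kb": 10000},  # same feerate as beef01
    {"txid": "cafe01", "fee_per_kb": 20000},
]

# Highest feerate first; the two 10000 sat/kB ties sort by txid,
# so the result is cafe01, aaaa01, beef01 on every node.
ordered = select_order(mempool)
```

With this rule, two nodes holding the same transactions produce the same ordering regardless of the order they heard the transactions in.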

  2. TheBlueMatt commented at 12:17 AM on August 7, 2015: member

    This is currently making the relay network incredibly inefficient.

  3. sipa commented at 12:24 AM on August 7, 2015: member

    Ideally, this would reflect some real costs, or there is no reason why different miners would use the same tie-breaking...

  4. laanwj added the label Mining on Aug 7, 2015
  5. TheBlueMatt commented at 10:18 PM on August 7, 2015: member

    Indeed, but there is no reason why miners would pick different tie-breakers, either :). Still, if you want something for which you could make a reasonable argument, use time-entered-mempool. I would wildly assume that this would work nearly as well.


  6. hno commented at 8:35 PM on October 9, 2015: none

    As a miner I'd like to see mostly the same transactions included in successive block templates from the same node. With larger blocks there are several optimizations that can be done in mining operations if the bulk of the mined block is static.

    Sorting same-priority transactions in first-come, first-served order (time of entry into the mempool) seems natural to me.

    I don't think hash-based sorting is a good idea, as it adds another dimension to transaction priority beyond fee and the time a transaction was seen. Such abstract dimensions tend to open the door to unexpected creative (ab)use cases.

  7. sipa commented at 8:38 PM on October 9, 2015: member

    First-seen-based order will differ between nodes; hash-based order does not.

  8. hno commented at 7:47 AM on October 13, 2015: none

    Of course first-seen order will differ somewhat between nodes, but not by a huge amount. Moreover, arrival time does not change on a given node, which minimizes per-node churn in which transactions are selected.

    Hash-based selection significantly increases template churn per node, which actually complicates compression in a distributed setup. To inject the same level of churn under a first-come policy, a spammer would need to exploit another priority rule, such as increasing the fee.

    Also, time-of-arrival selection is very easy to explain and predict when describing how equal-fee transactions get selected for block inclusion.

    And time of arrival is already a major factor in Bitcoin network decisions, both in block chain consensus and in relay policy, both of which use a first-seen policy.

    Note: IMHO there is very little need for more precise consensus on next-block content across the network. It would help centrally managed next-block compression algorithms, but I don't see much benefit in any other context.

  9. TheBlueMatt commented at 10:01 AM on October 13, 2015: member

    Regarding first-seen not differing much: this is true unless someone is trying to make it differ. It's rather trivial to make many nodes differ entirely if your tie-breaker is first-seen. Considering that, in a healthy fee market environment, there will always be new transactions with a fee that causes new work to be required to match the filter, I'm not too concerned with churn causing poor filter matches. Remember, such a metric is really only there to make such systems work in the face of an attacker who is deliberately breaking them.

    Regarding the need for such a system: it isn't just about centralized systems. It makes just as much sense in a decentralized one, if that one needs mempool syncing to compress blocks (all current block compression protocols have this requirement). In a decentralized setup, you would need to sync your mempool transaction-eviction selection to use the same metric.

  10. dcousens commented at 11:36 PM on October 13, 2015: contributor

    Maybe make it optional?

    Sort by hash by default, but allow miners to sort by time (hell, even the first change should make the latter simpler for hands-on miners).

  11. TheBlueMatt commented at 11:56 PM on October 13, 2015: member

    @dcousens making it configurable would entirely defeat the purpose of standardizing so that compression works better.

  12. dcousens commented at 12:01 AM on October 14, 2015: contributor

    @TheBlueMatt other than its mention in the last two comments, it appears I've missed that the purpose of this is compression. Any reference I can read/re-read?

  13. sipa commented at 12:44 AM on October 14, 2015: member

    When there are many equal-feerate transactions on the network (such as during the recent stress tests), a multiple of what is enough to fill blocks, miners will pick a random subset based on the order they received them in. That results in very bad block compression, as there is no way to estimate what another miner is going to put in a block.

    This is not a problem when transactions all have slightly different feerates (as is usually the case), because miners all pick the same top ones first in that case.

    About this breaking "hearing first" as priority... yes, but that is inevitable (in a hopefully expensive-to-exploit way) with limited mempools.
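The divergence sipa describes can be shown with a toy model (illustrative Python, not Bitcoin Core code; the txids and arrival orders are hypothetical). Two nodes hold the same equal-feerate transactions but heard them in different orders; with a first-seen tie-break their two-slot "blocks" differ, while a txid tie-break lets a peer predict the block contents exactly:

```python
# Toy model: four hypothetical equal-feerate txids, two nodes that
# received them in different orders, and room for only two in a block.

TXS = {"a1", "b2", "c3", "d4"}            # same mempool on both nodes
node1_arrival = ["a1", "c3", "b2", "d4"]  # order node 1 heard them
node2_arrival = ["d4", "b2", "a1", "c3"]  # order node 2 heard them

BLOCK_SIZE = 2

# First-seen tie-break: each node takes the first two it heard.
first_seen_1 = node1_arrival[:BLOCK_SIZE]
first_seen_2 = node2_arrival[:BLOCK_SIZE]

# Txid tie-break: both nodes sort the same set the same way.
by_txid_1 = sorted(TXS)[:BLOCK_SIZE]
by_txid_2 = sorted(TXS)[:BLOCK_SIZE]

print(set(first_seen_1) == set(first_seen_2))  # False: selections diverge
print(by_txid_1 == by_txid_2)                  # True: identical selection
```

Under the first-seen rule the two nodes' picks share no transactions at all in this example, which is exactly the worst case for predicting another miner's block.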

  14. TheBlueMatt commented at 4:05 AM on October 14, 2015: member

    @sipa currently it is not based on the order miners received them. It is effectively random, depending on the layout of other transactions in the mempool and how the heap ends up being built/used.

  15. hno commented at 7:42 AM on October 16, 2015: none

    As a miner I prefer the contents of my templates to be predictable and easy to explain; using first-come, first-served with fee as the priority is a very natural rule.

    I do not care much for the spam transactions, but I do care that block content selection makes sense to users.

    I don't see templates being somewhat different in the long tail across the network as a huge problem for compression. It is a problem for the current relay node block compression algorithm, but only because that algorithm compresses based on the contents of a single mempool.

    The number of miners is fairly limited, and I do not expect any of them to have a problem carrying compression dictionaries for the next 5-10 expected blocks of each significant miner, even if all of them carry a different subset of low-priority "spam" transactions in the tails of their blocks. We all have mostly the same transactions in our mempools; only the ordering differs somewhat. For an attacker to create a noticeable problem for distributed compression, he would need to find fast paths into many miners and be able to carefully control their mempool contents. Just doing the attack on 3-4 paths in the network is not very significant from a distributed compression-ratio perspective.

    Additionally, the "split view" across the network exists no matter what same-fee ordering priority is used, as the same can be accomplished with double-spend transactions.

  16. TheBlueMatt commented at 9:14 PM on October 28, 2016: member

    Seems to have been solved by the evolution of fee markets (see, e.g., compact block performance numbers).

  17. TheBlueMatt closed this on Oct 28, 2016

  18. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-24 15:15 UTC
