When I run estimatefee 1 right now, I get -1 as the reply, meaning that it isn't possible to get a 95% chance of inclusion in the next block no matter what fee I include, but based on the collected mempool stats that isn't true.
I traced the TxConfirmStats::EstimateMedianVal() call to see what was happening. It went like this:
In order to get "enough transaction data points in this range of buckets", it first grouped buckets 97 through 75 inclusive, found that their average confirm probability was 94.45% (below the 95% threshold), and so hit the break and stopped looking further.
It turns out that if it had looked at bucket 74 it would have had enough data points on its own, and would have had a probability of 95.51%, and so we could have returned a value of around 0.01 BTC.
So I'm seeing a 94.45% chance of confirmation in the next block if the fee is 0.01156269 or more per kB, and a 95.51% chance if the fee is in the range 0.01051153 - 0.01156269 per kB.
It appears that the code makes the assumption that higher fees always result in a higher probability, but that isn't actually the case.
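To make the failure mode concrete, here is a minimal sketch of a top-down scan of the kind described above. It is a simplified model, not the actual TxConfirmStats::EstimateMedianVal() code; the function name, the `min_txs` parameter and all bucket counts are made up for illustration.

```python
# Simplified model of a top-down bucket scan; NOT Bitcoin Core's actual code.
# Each bucket: (lowest fee in BTC/kB, txs confirmed within target, total txs).
# All counts are hypothetical, chosen only to reproduce the behaviour.
BUCKETS_HIGH_TO_LOW = [
    (0.0200, 40, 45),
    (0.0150, 60, 62),
    (0.0116, 89, 93),
    (0.0105, 191, 200),  # passes on its own (95.5%) but is never examined
    (0.0095, 150, 200),
]

def estimate_top_down(buckets, threshold=0.95, min_txs=200):
    """Scan from the most expensive bucket down, merging buckets until a
    group holds at least min_txs data points. The search ends at the first
    group whose confirmation rate is below threshold. Returns the lowest
    passing fee, or -1 if no group passed before the search ended."""
    best_fee = -1
    confirmed = total = 0
    for fee, conf, tot in buckets:
        confirmed += conf
        total += tot
        if total < min_txs:
            continue  # not enough data yet; keep merging cheaper buckets
        if confirmed / total < threshold:
            break  # first failing group ends the search
        best_fee = fee
        confirmed = total = 0  # start a new group
    return best_fee

print(estimate_top_down(BUCKETS_HIGH_TO_LOW))
```

With these made-up counts the first three buckets merge into one group at 189/200 = 94.5%, the scan breaks, and the cheaper bucket that would have passed is never reached.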
Wouldn't it be better to start at the cheapest bucket and work upwards, stopping at the first one that has a high enough probability, rather than stopping as soon as we find an expensive bucket with too low a probability? Otherwise it is possible for a small miner to sabotage fee estimation by targeting just one bucket. If he has 5% of the hash rate and refuses to confirm transactions in a particular expensive (0.01 BTC per kB) bucket, he will ensure that the search stops at that bucket for all estimatefee 1 calls. He can create enough such transactions to make the bucket statistically significant for less than 1 BTC per day, while causing almost 'unlimited' damage to the usefulness of estimatefee 1.
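The bottom-up alternative suggested here could look roughly like the following. This is only a sketch under the same simplified bucket model; the function name, `min_txs` and the counts are hypothetical, and returning the fee floor of the most expensive bucket in the passing group is my own assumption rather than anything in the existing code.

```python
# Sketch of a bottom-up scan; a hypothetical alternative, not existing code.
# Each bucket: (lowest fee in BTC/kB, txs confirmed within target, total txs).
# All counts are made up for illustration.
BUCKETS_LOW_TO_HIGH = [
    (0.0095, 150, 200),
    (0.0105, 191, 200),
    (0.0116, 89, 93),
    (0.0150, 60, 62),
    (0.0200, 40, 45),
]

def estimate_bottom_up(buckets, threshold=0.95, min_txs=200):
    """Scan from the cheapest bucket up, merging buckets until a group holds
    at least min_txs data points. Returns the fee floor of the last bucket
    merged into the first group that meets threshold, or -1 if none does."""
    confirmed = total = 0
    for fee, conf, tot in buckets:
        confirmed += conf
        total += tot
        if total < min_txs:
            continue  # not enough data yet; keep merging pricier buckets
        if confirmed / total >= threshold:
            return fee  # first group that is good enough
        confirmed = total = 0  # group failed; start a new one above it
    return -1

print(estimate_bottom_up(BUCKETS_LOW_TO_HIGH))
```

On this data the scan rejects the 0.0095 bucket (75%) and returns 0.0105, even though the more expensive buckets above it would fail as a group. A single sabotaged expensive bucket therefore no longer blocks the estimate, at the cost of trusting a cheaper bucket while pricier ones look worse.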
Also, isn't 95% too high a threshold when almost 5% of blocks are empty or nearly empty? If I lower the threshold to 90% I see a much more reasonable result, of around 0.0006 BTC per kB for next-block confirmation. In his original email Alex wrote:
Since even the highest fee transactions are confirmed within the first block only 90-93% of the time, I decided to use 80% as my cutoff.
That seems like a reasonable argument. Why was the number later changed from 80% to 95%?