Detect and ignore transactions that were CPFP'd in the fee estimator

darosior commented at 3:07 pm on June 15, 2022: member

The fee estimator currently considers the fee of a transaction in isolation (and doesn’t consider transactions with unconfirmed parents). This could hold well when it was just a few outliers. But as CPFP is becoming common practice whether it is among wallets (#23074 has a few instances) or protocols (for instance the LN with anchor outputs), i’m afraid that if fees rise this can lead to significantly underestimated results. See the first commit’s message for details and #23074 for history.

This PR is a much simpler alternative to #23074. The point is to fix the issue before we come up with a reasonable way to track package feerates in the fee estimator.

The functional test is in a separate commit so reviewers can more easily cherry-pick it on master and see the fee estimator would return the wrong estimate.

darosior commented at 3:08 pm on June 15, 2022: member

cc @glozow @instagibbs @naumenkogs since you Concept ACK’d #23074. This is implementing #23074 (comment).

instagibbs commented at 3:12 pm on June 15, 2022: member

seems reasonable to not take signals that are obviously bogus, will give more thinking

laanwj added the label Mempool on Jun 15, 2022

in test/functional/feature_fee_estimation.py:118 in e928f4ce73 outdated

114@@ -115,13 +115,13 @@ def check_estimates(node, fees_seen):
115     check_smart_estimates(node, fees_seen)
116 
117 
118-def send_tx(wallet, node, utxo, feerate):
119+def send_tx(wallet, node, feerate, utxo=None):

luke-jr commented at 10:07 pm on June 18, 2022:

Prefer just passing None as utxo explicitly, as appropriate.

darosior commented at 5:20 pm on June 22, 2022:

Done. What do you think of this PR conceptually?

luke-jr commented at 5:10 pm on June 26, 2022:

Feels like a hack that would be better off identifying CPFP from the block contents itself. Possibly even using it for the estimates, if practical. But at least this should be better than nothing.

DrahtBot commented at 0:27 am on June 21, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	ismaelsadeeq
Concept ACK	naumenkogs, murchandamus, josibake

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

in src/policy/fees.cpp:614 in e928f4ce73 outdated

609+    // Note this check is only best-effort, as the child might not be part of the block. Or
610+    // another child might be part of the block that we didn't have in our mempool.
611+    CFeeRate feerate_entry{entry->GetFee(), (uint32_t)entry->GetTxSize()};
612+    for (const CTxMemPoolEntry& child : entry->GetMemPoolChildren()) {
613+        CFeeRate ancestor_score{child.GetModFeesWithAncestors(), (uint32_t)child.GetSizeWithAncestors()};
614+        if (ancestor_score > feerate_entry) return false;

glozow commented at 10:51 am on June 22, 2022:

Hm, I don’t think ancestor_score > individual_feerate is really an indication that it was CPFP’d. We’d want to see if it had descendants in the block. It could just mean that this transaction had high-feerate ancestors.

Suggested alternative: I think ancestor_score < individual_feerate would be a good indication that this tx was the sponsor of a CPFP in the block. Maybe we use that condition, and drop all of the transaction’s ancestors from the fee estimator? Or, to reduce over-pruning, we drop the ancestors whose feerates are lower than this transaction’s?

darosior commented at 12:04 pm on June 22, 2022:

(TL;DR your suggestion is best but i’m leaving the response to the first section for context)

Yeah it’s a best effort. I could make sure that the descendant was also part of the block. Is it worth it though? If an ancestor-less transaction with mempool child that pay a higher feerate than itself got mined, the child probably got included as well (not necessarily, but i assumed very probably). Do you think that assumption isn’t reasonable? My doubt is more with more convoluted packages. For instance:

0A ------ D
1      __/ /
2B /      /
3      _ /
4C /

With A 1sat/vb, B 10sat/vb, C 10sat/vb and D 1sat/vb. It would be a false positive for A. To prevent this we could simply condition the check on the descendants feerate of A:

 0diff --git a/src/policy/fees.cpp b/src/policy/fees.cpp
 1index 029dd108c3..5d5dfb2efe 100644
 2--- a/src/policy/fees.cpp
 3+++ b/src/policy/fees.cpp
 4@@ -609,9 +609,12 @@ bool CBlockPolicyEstimator::processBlockTx(unsigned int nBlockHeight, const CTxM
 5     // Note this check is only best-effort, as the child might not be part of the block. Or
 6     // another child might be part of the block that we didn't have in our mempool.
 7     CFeeRate feerate_entry{entry->GetFee(), (uint32_t)entry->GetTxSize()};
 8-    for (const CTxMemPoolEntry& child : entry->GetMemPoolChildren()) {
 9-        CFeeRate ancestor_score{child.GetModFeesWithAncestors(), (uint32_t)child.GetSizeWithAncestors()};
10-        if (ancestor_score > feerate_entry) return false;
11+    CFeeRate feerate_desc{entry->GetModFeesWithDescendants(), (uint32_t)entry->GetSizeWithDescendants()};
12+    if (feerate_desc > feerate_entry) {
13+        for (const CTxMemPoolEntry& child : entry->GetMemPoolChildren()) {
14+            CFeeRate ancestor_score{child.GetModFeesWithAncestors(), (uint32_t)child.GetSizeWithAncestors()};
15+            if (ancestor_score > feerate_entry) return false;
16+        }
17     }
18 
19     // How many blocks did it take for miners to include this transaction?

Regarding your suggestion. do you mean we should iterate over all the mempool entries that were part of the block to detect CPFPs beforehand? I agree it’s much better. It makes sure that the child was part of the same block. And your suggestion to only drop the ancestors whose feerates are lower than this transaction’s covers the case i detailed above. Nice! Is that roughly what you were thinking about:

 0diff --git a/src/policy/fees.cpp b/src/policy/fees.cpp
 1index 029dd108c3..8c0e71ea18 100644
 2--- a/src/policy/fees.cpp
 3+++ b/src/policy/fees.cpp
 4@@ -604,16 +604,6 @@ bool CBlockPolicyEstimator::processBlockTx(unsigned int nBlockHeight, const CTxM
 5         return false;
 6     }
 7 
 8-    // Do not consider this transaction's feerate if it might have been mined due to one of
 9-    // its children (CPFP).
10-    // Note this check is only best-effort, as the child might not be part of the block. Or
11-    // another child might be part of the block that we didn't have in our mempool.
12-    CFeeRate feerate_entry{entry->GetFee(), (uint32_t)entry->GetTxSize()};
13-    for (const CTxMemPoolEntry& child : entry->GetMemPoolChildren()) {
14-        CFeeRate ancestor_score{child.GetModFeesWithAncestors(), (uint32_t)child.GetSizeWithAncestors()};
15-        if (ancestor_score > feerate_entry) return false;
16-    }
17-
18     // How many blocks did it take for miners to include this transaction?
19     // blocksToConfirm is 1-based, so a transaction included in the earliest
20     // possible block has confirmation count of 1
21@@ -625,6 +615,7 @@ bool CBlockPolicyEstimator::processBlockTx(unsigned int nBlockHeight, const CTxM
22         return false;
23     }
24 
25+    CFeeRate feerate_entry{entry->GetFee(), (uint32_t)entry->GetTxSize()};
26     feeStats->Record(blocksToConfirm, (double)feerate_entry.GetFeePerK());
27     shortStats->Record(blocksToConfirm, (double)feerate_entry.GetFeePerK());
28     longStats->Record(blocksToConfirm, (double)feerate_entry.GetFeePerK());
29@@ -659,6 +650,18 @@ void CBlockPolicyEstimator::processBlock(unsigned int nBlockHeight,
30     shortStats->UpdateMovingAverages();
31     longStats->UpdateMovingAverages();
32 
33+    for (const auto& entry : entries) {
34+        CFeeRate individual_feerate{entry->GetFee(), (uint32_t)entry->GetTxSize()};
35+        CFeeRate ancestor_score{entry->GetModFeesWithAncestors(), (uint32_t)entry->GetSizeWithAncestors()};
36+
37+        if (individual_feerate > ancestor_score) {
38+            for (const CTxMemPoolEntry& parent : entry->GetMemPoolParents()) {
39+                CFeeRate parent_feerate{parent.GetFee(), (uint32_t)parent.GetTxSize()};
40+                if (parent_feerate < individual_feerate) _removeTx(parent.GetTx().GetHash(), true);
41+            }
42+        }
43+    }
44+
45     unsigned int countedTxs = 0;
46     // Update averages with data points from current block
47     for (const auto& entry : entries) {

glozow commented at 2:06 pm on June 23, 2022:

Yeah, it’s more sensible to just look at the sponsor’s ancestors since you’re guaranteed that they were all in the block too.

I like the second approach in your above comment, i.e. 5485acfe88051234a09862819ddc2953f9d42058. I was going to suggest possibly removing all ancestors, and not just direct parents, but I see that we can’t really query all ancestors since we only have the single entry.

glozow commented at 10:56 am on June 22, 2022: member

Agree it’s better not to feed inaccurate data to our fee estimator. Do we have a preference for underestimation vs overestimation of feerates? It’s probably bad to include both sponsors and sponsees in CPFP situations?

Ideally, one day we come up with a way to accurately record CPFP feerates in the fee estimator. Not to scope creep - I don’t know how we could do it properly right now and I think this PR is better than having inaccurate estimates.

darosior commented at 11:12 am on June 22, 2022: member

Do we have a preference for underestimation vs overestimation of feerates?

I think it’s always better to be slightly overestimate than to slightly under-estimate, as the latter can result in frustration for the regular onchain user or degraded service quality for professionals (for instance stuck transactions can create a “shortage” of confirmed UTxOs or too long mempool chains). As far as i can tell from discussions and the algorithm it’s always been the opinion to err on the conservative side.

It’s probably bad to include both sponsors and sponsees in CPFP situations?

Sponsees aren’t included since transactions with mempool dependencies don’t get considered for fee estimation. And i don’t think we can do that until we find a proper way of considering packages in the fee estimator.

darosior force-pushed on Jun 22, 2022

darosior commented at 5:21 pm on June 22, 2022: member

Force-pushed to address @luke-jr’s nit and include @glozow’s great suggestion to identify CPFP from child, not parents.

glozow commented at 2:12 pm on June 23, 2022: member

transactions with mempool dependencies don’t get considered for fee estimation.

Ah, my bad! Seems like we’re definitely underestimating in CPFP situations right now, then, since we include the parents but not the child.

I think it’s always better to be slightly overestimate than to slightly under-estimate

I agree, it seems safer to sometimes pay a little extra than to miss your confirmation target.

in src/policy/fees.cpp:675 in 5485acfe88 outdated

659+    // the data points.
660+    for (const auto& entry : entries) {
661+        CFeeRate individual_feerate{entry->GetFee(), (uint32_t)entry->GetTxSize()};
662+        CFeeRate ancestor_score{entry->GetModFeesWithAncestors(), (uint32_t)entry->GetSizeWithAncestors()};
663+
664+        if (individual_feerate > ancestor_score) {

mzumsande commented at 4:39 pm on July 13, 2022:

I’m not sure why we have this criterium individual_feerate > ancestor_score. In a CPFP situation with a child C, a low-fee parent P1 which it pays for, and another high-fee parent P2 of C, it could prevent us from removing P1 from the fee estimation, because C’s ancestor score (which is influenced by P2) might be larger than its individual feerate.

I understand that the check is a heuristic and can’t cover everything, but would there be any downside if we just dropped this check and only applied the second criterion, parent_feerate < individual_feerate?

darosior commented at 9:36 am on July 14, 2022:

Only having parent_feerate < individual_feerate does not account for the size of the dependencies. For instance, a transaction C (2000 sat, 100 vb) spending a transaction B (200 sat, 1000 vb) which in turn spends a transaction A (1000sat, 100 vb):

0A --- B --- C

C clearly does not CPFP A, while it has a higher feerate. We want to check parent_feerate < ancestor_score. We do it indirectly since individual_feerate > ancestor_score.

individual_feerate > ancestor_score is to make sure the child is paying, to prevent a pay-for-sibling scenario:

0A
1  \
2     C
3  /
4B

B is 2sat/vb, A is 1sat/vb, C is 1sat/vb. The ancestor score of C is larger than the feerate of A, but it’s not a CPFP.

So this may be too conservative in some scenario, such as the one you describe. But i can’t think of a reasonable way to take it into account. Also i think we better be overly conservative if that gives us confidence we won’t let some false positives through.

mzumsande commented at 12:39 pm on July 14, 2022:

I don’t understand how individual_feerate > ancestor_score helps in your first example: Both B and C have been excluded from the fee estimation already in validation, because they had mempool ancestors. The algorithm added in this PR can only ever remove (direct) parents from the fee estimation, it doesn’t loop through ancestors of a higher level - so C and its ancestor score seem irrelevant, because the processing of C could only ever remove B (which is already removed) but can’t remove A because it’s not a parent of C? And while processing B (which could remove A in theory) the second criterion parent_feerate < individual_feerate would work out, so we don’t remove A there either.

darosior commented at 1:27 pm on July 14, 2022:

You are right, i wrote this example assuming we were going through all ancestors of a transaction. :facepalm:

Nonetheless the first example works also if A and B are siblings.

darosior commented at 2:43 pm on April 3, 2023:

Resolving this, i don’t think we can get rid of this heuristic. #25380 (comment) gives a counter example where it would lead to a false positive. Please let me know if you disagree.

in src/policy/fees.cpp:668 in 200a49f7e7 outdated

651@@ -652,6 +652,23 @@ void CBlockPolicyEstimator::processBlock(unsigned int nBlockHeight,
652     shortStats->UpdateMovingAverages();
653     longStats->UpdateMovingAverages();
654 
655+    // Do not consider transactions that were CPFP'd.
656+    // For each mempool transaction that was included in a block, if its individual feerate is higher
657+    // than its ancestor score it incentivized the inclusion of some of its parents in the block.

ariard commented at 2:05 am on July 21, 2022:

While I agree than a child with an individual feerate than its ancestor score incentivized the inclusion of some of its parents in the block, I don’t think it means the parent wouldn’t have make it alone on its own feerate in the block. I wonder if a most sophisticated heuristic wouldn’t be to qualify the sponsoring if the parent feerate is lower than your top mempool feerate group at block reception. Of course, considering block reception would make things free of block propagation allowing your mempool content to fluctuate though assuming perfect information propagation might be okay.

I believe such situations could arise when there is no entity with the full-view of the package, in the sense that the CPFP can be attached by a different transaction issuer than the parent one. Therefore this second party might have different views of the mempool or more conservative fee-bumping strategy.

However, I think that type of situations is for now scarce enough than the current proposed heuristic individual_feerate > ancestor_score is good enough ?

In any case, I think it would be good to document the heuristic rational and the bounds here, in case of future changes in the ecosystem of Bitcoin applications invalidating this heuristic.

darosior commented at 11:34 am on July 26, 2022:

I don’t think it’s reasonable to assume the content of the mempool of the mining pool node was the same at the time it created a block template (which can be relatively long before a grinder found a valid pow) than our mempool at the time we received the block (which can be relatively long after they found it). Even then, i don’t think it’d be worth creating a block template just for a slightly more accurate filtering. Instead, i think any more complexity than a simple conservative heuristic should go into making the estimator packages-aware.

Regarding the documentation of the heuristic, what are you suggesting? You are commenting on a code comment documenting this heuristic.

ariard commented at 2:08 am on July 21, 2022: none

I agree it’s better to overestimate rather than underestimate, especially if you’ve have time-sensitive safety confirmations or offering block inclusion SLA to your withdrawals. I still wonder if we can’t come up with a better heuristic.

naumenkogs commented at 8:44 am on August 25, 2022: member

Concept ACK.

The inaccuracy seems real. Since we’re targeting a simple heuristic, there will be downsides. The risk is overestimating and paying extra I guess. If we really want to check, we can measure how would fees differ by applying this change to historic blocks (?)

Otherwise, it would help if @darosior summarized the alternatives suggested in couple sentences (or helping in navigating the design scope e.g. “what could be possibly done”).

As for the current code, I think it’s an improvement, but I think we indeed need more eyes/opinions.

glozow requested review from josibake on Dec 5, 2022

DrahtBot added the label Needs rebase on Dec 19, 2022

murchandamus commented at 7:30 pm on February 17, 2023: contributor

Concept ACK

Did some light review, looks reasonable, would review again when rebased

darosior force-pushed on Apr 3, 2023

DrahtBot removed the label Needs rebase on Apr 3, 2023

darosior commented at 2:41 pm on April 3, 2023: member

Thanks everyone for the interest, finally reviving this. Rebased and modernized the code (use static casts and named parameters, make use of batched RPC calls after #26656). @naumenkogs

If we really want to check, we can measure how would fees differ by applying this change to historic blocks (?)

We’d also need historical snapshots of a mempool to do that. If someone has got some and is willing to test, i’d be very interested in seeing the results.

Otherwise, it would help if @darosior summarized the alternatives suggested in couple sentences (or helping in navigating the design scope e.g. “what could be possibly done”).

The design space is actually quite narrow. Short of tracking packages, we can only get rid of invalid data points. I aim for the detection logic to be straightforward and conservative: better let a few invalid data points still go through rather than drop some valid ones. To achieve this we go through the mempool entries that were included in a block when connecting it. We then apply two heuristics to determine whether it is a child paying for a parent we may be tracking for fee estimation:

The individual feerate of this transaction is higher than its ancestor feerate. (Will be equal if it has no ancestor.)
The individual feerate of the parent is lower than the individual feerate of this transaction. (Note it is not necessary to check if the parent itself has ancestors since if it does we wouldn’t be tracking it for fee estimation.)

It was suggested by @mzumsande in #25380 (review) we may be able to get rid of the first heuristic, effectively only checking if a transaction we are tracking was mined along with a child transaction that has a higher feerate. I don’t think it is correct, as the child transaction’s effective feerate may be lower than the parent’s. For instance:

0A (1000sats; 100vb)
1                    \
2                      C (1200sats; 100vb)
3                    /
4B (100sats; 100vb)

In this diagram, C would be incorrectly accounted as a CPFP of A if we were to only use the second heuristic. However by using both heuristics C would still be detected as a CPFP of B (and only of B).

Another thing that could be done here is to take into account “second-order” CPFP. That is, if the child of the child of a transaction we are tracking for fee estimation is bumping. For instance (same as above + D):

0A (1000sats; 100vb)
1                    \
2                      C (1200sats; 100vb) -- D (10000sats; 100vb)
3                    /
4B (100sats; 100vb)

In this diagram we would not be detecting neither A nor B as being CPFP’d. But solving this requires looking at the descendant feerate of an entry, which AFAICT brings a lot of complexity (and computation) to make sure we don’t count a descendant multiple times in more complicated packages. In addition i suspect most CPFPs used on the network currently are “first-order” CPFPs (e.g. an “accelerate” button on a deposit on an onchain wallet, or anchor outputs for commitment transactions in Lightning). Therefore, even if it was reasonable to try to account for longer chains of CPFPs (which i’m not convinced it is), i’d be more inclined to not attempts to do so here. If we want to do this let’s merge this simple and straightforward fix for most of the CPFP usage and try to tackle edge cases in a follow-up.

Finally, i’m considering being more strict on the first heuristic. We could for instance require that the child is at least a 10% increase in feerate to be considered a CPFP. Because currently a parent would be considered to have been CPFP’d if it has a child with an individual feerate 0.001sat/vb higher than its ancestor feerate. In this case i think the parent’s feerate is still a good datapoint and we shouldn’t get rid of it. What do you all think?

josibake commented at 8:29 am on April 5, 2023: member

Concept ACK

We’d also need historical snapshots of a mempool to do that. If someone has got some and is willing to test, i’d be very interested in seeing the results.

I have historical snapshots and would be happy to help run the test. Although, since running the test is not trivial, I wouldn’t make a blocker for the PR.

DrahtBot added the label CI failed on May 30, 2023

DrahtBot removed the label CI failed on May 31, 2023

DrahtBot added the label Needs rebase on Jun 20, 2023

in src/policy/fees.cpp:678 in 5ae45ad9cf outdated

659+        CFeeRate ancestor_score{entry->GetModFeesWithAncestors(), static_cast<uint32_t>(entry->GetSizeWithAncestors())};
660+
661+        if (individual_feerate > ancestor_score) {
662+            for (const CTxMemPoolEntry& parent : entry->GetMemPoolParents()) {
663+                CFeeRate parent_feerate{parent.GetFee(), static_cast<uint32_t>(parent.GetTxSize())};
664+                if (parent_feerate < individual_feerate) _removeTx(parent.GetTx().GetHash(), /* inBlock = */true);

ismaelsadeeq commented at 9:04 pm on July 4, 2023:

0 if (parent_feerate < individual_feerate) _removeTx(parent.GetTx().GetHash(), /* inBlock = */true);

This means parents that are confirmed from previous blocks are going to be removed as well right? not just parents from this block we are processing?

glozow commented at 9:32 am on July 5, 2023:

No, any parents from previous blocks would not exist in the mempool anymore. They would have been removed when that block was processed.

ismaelsadeeq commented at 11:16 am on July 5, 2023:

Thank you If the ancestors are confirmed in the previous block, and the child (with fee rate higher than ancestors score) is in the block we are processing, is that also considered CPFP? if yes then we are not removing those ancestors here right (they were removed when that block was confirmed and recorded in block stats)?

Also why is /* inBlock = */true ?

murchandamus commented at 2:00 pm on July 5, 2023:

CPFP refers to a couple of unconfirmed transactions where the descendant transaction pays a higher feerate than the parent transaction. If the parent transaction is already confirmed, it’s no longer a CPFP situation.

ismaelsadeeq commented at 2:55 pm on July 5, 2023:

Thank you @Xekyo

darosior commented at 4:36 pm on July 18, 2023:

Also why is /* inBlock = */true ?

That’s so it’s not counted as a failure.

glozow commented at 9:36 am on July 5, 2023: member

I have historical snapshots and would be happy to help run the test. Although, since running the test is not trivial, I wouldn’t make a blocker for the PR.

Maybe something a bit simpler related to historical data: are we able to see what percentage of block transactions actually have ancestors/descendants? My understanding is that the percentage is quite low. So an alternative approach could be to just ignore any transactions with in-block descendants (just like we ignore any with in-block ancestors) instead of trying add logic to detect CPFPs.

darosior commented at 3:50 pm on July 18, 2023: member

Good point. I would have bet it’s a non-trivial percentage (and an increasing one), so i wrote a quick Python script to measure this. It’s there.

Here is the results for a few (small) block ranges i picked at random. The CPFP usage seems non trivial to me (between 10% to 15% of all transactions in a block).

0Between block heights 799241 and 799251:                                                                                                                                                                                                                                                                                                                                   
1    - The average percentage of transactions with a descendant in the same block is 11.518659558263519%
2    - The average percentage of transactions with an ancestor in the same block is 19.363290175171365%
3    - The highest percentage of transactions with a descendant in the same block is 23.660842165568603
4    - The highest percentage of transactions with an ancestor in the same block is 30.215827338129497
5    - The lowest percentage of transactions with a descendant in the same block is 2.486788933789245
6    - The lowest percentage of transactions with an ancestor in the same block is 6.264840182648401

0Between block heights 789241 and 789251:                                                  
1    - The average percentage of transactions with a descendant in the same block is 16.697372248618247%
2    - The average percentage of transactions with an ancestor in the same block is 38.6041888878115%
3    - The highest percentage of transactions with a descendant in the same block is 22.668609492089924
4    - The highest percentage of transactions with an ancestor in the same block is 61.427238805970156
5    - The lowest percentage of transactions with a descendant in the same block is 11.335830212234708
6    - The lowest percentage of transactions with an ancestor in the same block is 22.358722358722357

0Between block heights 759241 and 759251:                                                  
1    - The average percentage of transactions with a descendant in the same block is 12.82089552238806%
2    - The average percentage of transactions with an ancestor in the same block is 13.124378109452737%
3    - The highest percentage of transactions with a descendant in the same block is 25.75250836120401
4    - The highest percentage of transactions with an ancestor in the same block is 23.076923076923077
5    - The lowest percentage of transactions with a descendant in the same block is 4.3478260869565215
6    - The lowest percentage of transactions with an ancestor in the same block is 4.3478260869565215

0Between block heights 659241 and 659251:
1    - The average percentage of transactions with a descendant in the same block is 13.264294370386692%
2    - The average percentage of transactions with an ancestor in the same block is 13.80412030406522%
3    - The highest percentage of transactions with a descendant in the same block is 18.257956448911223
4    - The highest percentage of transactions with an ancestor in the same block is 17.75544388609715
5    - The lowest percentage of transactions with a descendant in the same block is 9.807767752059632
6    - The lowest percentage of transactions with an ancestor in the same block is 10.6553911205074

This data is interesting, and IMO relevant, but it’s missing one crucial thing in the context of this PR: the percentage of transactions with descendant(s) in the same block among fee estimation candidates. I’m working on adding this now.

darosior commented at 4:34 pm on July 18, 2023: member

Here are the results for a few block ranges with the percentage of candidates for fee estimation among all transactions, and the percentage of CPFP among these candidates. It’s around 10%.

0Between block heights 798241 and 798251:
1    - The average percentage of transactions with a descendant in the same block is 8.198356016709338%
2    - The average percentage of transactions with an ancestor in the same block is 16.332030723622154%
3    - The highest percentage of transactions with a descendant in the same block is 17.461263408820024%
4    - The highest percentage of transactions with an ancestor in the same block is 28.820545339251748%
5    - The lowest percentage of transactions with a descendant in the same block is 1.0730593607305936%
6    - The lowest percentage of transactions with an ancestor in the same block is 3.881278538812785%
7    - The average percentage of transactions in a block that would be considered for fee estimation is 83.66796927637785%
8    - The average percentage of transactions with a descendant in the same block among candidates is 6.22644548236431%

0Between block heights 797241 and 797251:
1    - The average percentage of transactions with a descendant in the same block is 22.226875865846193%
2    - The average percentage of transactions with an ancestor in the same block is 40.3943426012436%
3    - The highest percentage of transactions with a descendant in the same block is 35.889395667046756%
4    - The highest percentage of transactions with an ancestor in the same block is 59.27003438243851%
5    - The lowest percentage of transactions with a descendant in the same block is 11.915269196822594%
6    - The lowest percentage of transactions with an ancestor in the same block is 14.298323036187114%
7    - The average percentage of transactions in a block that would be considered for fee estimation is 59.60565739875641%
8    - The average percentage of transactions with a descendant in the same block among candidates is 13.410085941300471%

0Between block heights 796241 and 796251:
1    - The average percentage of transactions with a descendant in the same block is 6.943150046598323%
2    - The average percentage of transactions with an ancestor in the same block is 17.707362534948743%
3    - The highest percentage of transactions with a descendant in the same block is 20.015729453401494%
4    - The highest percentage of transactions with an ancestor in the same block is 44.85330681253108%
5    - The lowest percentage of transactions with a descendant in the same block is 0.5854372484449323%
6    - The lowest percentage of transactions with an ancestor in the same block is 0.7318123116659492%
7    - The average percentage of transactions in a block that would be considered for fee estimation is 82.29263746505126%
8    - The average percentage of transactions with a descendant in the same block among candidates is 5.309439744187596%

0Between block heights 756241 and 756251:
1    - The average percentage of transactions with a descendant in the same block is 14.151887221464301%
2    - The average percentage of transactions with an ancestor in the same block is 14.502046384720327%
3    - The highest percentage of transactions with a descendant in the same block is 23.841059602649008%
4    - The highest percentage of transactions with an ancestor in the same block is 25.57552822453485%
5    - The lowest percentage of transactions with a descendant in the same block is 5.805038335158817%
6    - The lowest percentage of transactions with an ancestor in the same block is 5.914567360350493%
7    - The average percentage of transactions in a block that would be considered for fee estimation is 85.49795361527967%
8    - The average percentage of transactions with a descendant in the same block among candidates is 9.845220998883038%

0Between block heights 656241 and 656251:
1    - The average percentage of transactions with a descendant in the same block is 19.724950884086446%
2    - The average percentage of transactions with an ancestor in the same block is 19.51277013752456%
3    - The highest percentage of transactions with a descendant in the same block is 33.03791235844412%
4    - The highest percentage of transactions with an ancestor in the same block is 32.890201870999505%
5    - The lowest percentage of transactions with a descendant in the same block is 2.5210084033613445%
6    - The lowest percentage of transactions with an ancestor in the same block is 2.5210084033613445%
7    - The average percentage of transactions in a block that would be considered for fee estimation is 80.48722986247544%
8    - The average percentage of transactions with a descendant in the same block among candidates is 10.447178285491116%

policy/fees: ignore transactions that were CPFP'd

The fee estimator would previously assume that a miner was only
incentivized by the fees of a transaction in isolation, completely
disregarding its descendants. For instance, upon confirmation of a
package {A, B} where A is 1sat/vb and B 20sat/vb, the fee estimator
would assume that 1sat/vb was sufficient for getting this transaction
confirmed.

This can lead to significantly underestimated fees as CPFP gets more
usage on the network. It is currently actively used by many wallets, and
will get more adoption along with L2s that rely on it for fee bumping
(for instance the Lightning Network with anchor outputs).

I previously attempted a boutique accounting of child feerates [0], but a
decent tracking of ancestor scores as new child come in and parts of
packages get confirmed will require significantly more work. In the
meantime just detect transactions that were likely confirmed because of
their descendants' feerate and don't take them into account for fee
estimation. It's not perfect but should prevent the mistake in most
cases (and for the very unlikely edge cases the current algorithm is pretty
resistent to outliers anyways).

[0] https://github.com/bitcoin/bitcoin/pull/23074

77e426d270

qa: fee est test: keep the list of confirmed utxos up to date in RBF test 506b0e4b2d

qa: fee est: test we ignore parent transactions in a CPFP 12e36414dc

darosior force-pushed on Jul 18, 2023

darosior commented at 6:33 pm on July 18, 2023: member

Rebased.

Also optimized my script a bit and ran it over a longer period (through May and June of this year, which was marked with pretty high fees and usage):

0Between block heights 788250 and 796890:
1    - The average percentage of transactions with a descendant in the same block is 14.775036605051687%
2    - The average percentage of transactions with an ancestor in the same block is 32.52518731543957%
3    - The highest percentage of transactions with a descendant in the same block is 58.61574323112785%
4    - The highest percentage of transactions with an ancestor in the same block is 87.25593113583876%
5    - The lowest percentage of transactions with a descendant in the same block is 0.07524454477050413%
6    - The lowest percentage of transactions with an ancestor in the same block is 0.07524454477050413%
7    - The average percentage of transactions in a block that would be considered for fee estimation is 67.47481268456043%
8    - The average percentage of transactions with a descendant in the same block among candidates is 13.464749562330006%

DrahtBot removed the label Needs rebase on Jul 18, 2023

glozow commented at 1:05 pm on August 1, 2023: member

@darosior thanks for the data! The proportion of transactions with descendants seems pretty high, so my comment about ignoring stuff with descendants is probably not a good idea.

ismaelsadeeq commented at 10:57 am on September 10, 2023: member

Please correct me if I am wrong on this.

The CBlockPolicyEstimator is not package aware, Some transactions that are recorded in the fee statistics as confirmed, with their actual fee rate shouldn’t be. Their child may pay for their inclusion in the block, we should record the child as confirmed with its ancestor score instead.

The fee estimator does not take into account transactions that have a parent in the mempool.

That means we are not just disregarding a some amount of fee rate statistics (all transactions with in mempool parent), We are tracking the stat with inaccurate feerate.

This PR is trying to ensure that the ancestors of a transaction that is assumed to be CPFP should not be recorded in the fee estimator statistics as confirmed.

If CPFP adoption continues to increase, the data that CBlockPolicyEstimator will have for fee estimation will be reduced drastically, which might potentially result in the fee estimator giving us a high fee estimate or some unexpected fee rate, because we don’t have the right amount of data.

On Master e25af11225d9d94ecf7068bf7a9a359268786fbe CBlockPolicyEstimator updates from mempool every time a new block is connected, Instead of mempool cs being blocked only on mempool transaction removal, we were blocking on both transaction removals and fee estimator updating.

If we want to further improve fee estimation e.g. opting into the approach in #23074 comment or add heavy-calculation steps to CBlockPolicyEstimator, it might not be viable as we would be slowing down block relay in the process.

With this #28368 we can now conveniently do that and try to add transactions to the fee estimator by ancestor feerate using #23074 comment approach or similar. ~~I’ve started looking at that and drafted a gist on how that will look like.~~ Edit: The approach in the gist has some issues I would like to know what you think @darosior @glozow.

Idk why @DrahdBot did not update but this PR conflicts with #28368 as updating the fee estimator from the new interface does not have details to detect the heuristic in this PR.

darosior commented at 8:35 am on September 11, 2023: member

This PR is trying to ensure that the ancestors of a transaction that is assumed to be CPFP should not be recorded in the fee estimator statistics as confirmed.

The intent of this PR is to prevent a transaction tracked in the fee estimator to be assumed as being mined thanks to solely its own feerate when it’s not. Technically it’s equivalent to what you describe.

If CPFP adoption continues to increase, the data that CBlockPolicyEstimator will have for fee estimation will be reduced drastically, which might potentially result in the fee estimator giving us a high fee estimate or some unexpected fee rate, because we don’t have the right amount of data.

The fee estimator would fail if it doesn’t have enough data points to give a reasonable estimation. Unfortunately right now in the situation you describe it would return an incorrectly low estimate instead for the reason mentioned above.

If we want to further improve fee estimation e.g. opting into the approach in #23074#pullrequestreview-764375424

I’m skeptical we could improve the fee estimator to consider packages without a significant rewrite. The current algorithm tracks the time between arrival in our mempool and inclusion in a block. With a package you don’t have a single arrival time anymore. And it’s unclear how a replacement of part of the package would affect our estimates.

Here is an example. A 4-txs package with a 10sats/vb feerate arrives at block height n. At block height n+2 we get a fifth transaction, bumping the feerate to 12sats/vb. At n+8 3 of the transactions get replaced, the package feerate is bumped to 14sats/vb and included at block n+9. It’s not clear how we would analyze this situation in the current framework.

There is certainly an argument for restricting ourselves to certain kinds of packages, or to packages that are relayed once (assuming package relay otherwise it’s still more complex) and never change. This way we could adapt our fee estimation framework to what’s being used on the network and maybe adapt it as usage patterns change.

To be fair this seems to me closer to a research project than something we could implement anytime soon, but maybe i’m overly pessimistic.

glozow commented at 9:28 am on September 11, 2023: member

Idk why @DrahdBot did not update but this PR conflicts with #28368

DrahtBot doesn’t detect silent merge conflicts afaik. But yeah, this PR uses the vector of entries to detect cpfp-ness in processBlock and those entries would be gone if the fee estimator updates in the background. I think we’d probably want to resolve that by making copies of the information the fee estimator needs.

If we want to further improve fee estimation e.g. opting into the approach in #23074#pullrequestreview-764375424

I’m skeptical we could improve the fee estimator to consider packages without a significant rewrite.

I agree I don’t see any simple solution for package-aware fee estimation… fwiw I was just brainstorming in that comment and it has many of the same inaccuracy problems. Perhaps with cluster mempool, we can notify the fee estimator each time a transaction’s miner score changes (which is a lot of signals but should encompass package-aware scores + timing)?

I think this PR is the right thing to do for now.

ismaelsadeeq referenced this in commit 55beb4f39b on Sep 13, 2023

ismaelsadeeq commented at 1:15 pm on September 13, 2023: member

Thanks @darosior and @glozow . I agree with your comments, hence resolved conflict with #28368 in latest push. Happy to rebased to master if this gets in first, else if #28368 gets in first feel free to cherry-pick https://github.com/bitcoin/bitcoin/commit/68f0813061e72b7b916107135af32271ed6d07a8 to resolve merge conflict.

in test/functional/feature_fee_estimation.py:423 in 12e36414dc

418+        tx = send_tx(self.wallet, node, None, low_feerate)
419+        u = {"txid": tx["txid"], "vout": 0, "value": Decimal(tx["tx"].vout[0].nValue) / COIN}
420+        send_tx(self.wallet, node, u, high_feerate)
421+        self.sync_mempools(wait=0.1, nodes=[node, miner])
422+        self.generate(miner, 1)
423+        assert node.estimaterawfee(1)["short"]["fail"]["totalconfirmed"] == 0

ismaelsadeeq commented at 1:18 pm on September 13, 2023:

nit

0       assert_equal(node.estimaterawfee(1)["short"]["fail"]["totalconfirmed"], 0)

in test/functional/feature_fee_estimation.py:431 in 12e36414dc

426+        tx = send_tx(self.wallet, node, None, high_feerate)
427+        u = {"txid": tx["txid"], "vout": 0, "value": Decimal(tx["tx"].vout[0].nValue) / COIN}
428+        send_tx(self.wallet, node, u, low_feerate)
429+        self.sync_mempools(wait=0.1, nodes=[node, miner])
430+        self.generate(miner, 1)
431+        assert node.estimaterawfee(1)["short"]["fail"]["totalconfirmed"] == 1

ismaelsadeeq commented at 1:19 pm on September 13, 2023:

nit

0    assert_equal(node.estimaterawfee(1)["short"]["fail"]["totalconfirmed"], 1)

in test/functional/feature_fee_estimation.py:440 in 12e36414dc

435+        u = {"txid": tx["txid"], "vout": 0, "value": Decimal(tx["tx"].vout[0].nValue) / COIN}
436+        send_tx(self.wallet, node, u, high_feerate)
437+        self.sync_mempools(wait=0.1, nodes=[node, miner])
438+        self.generate(miner, 1)
439+        # Decay of 0.962, truncated to 2 decimals in the RPC result
440+        assert node.estimaterawfee(1)["short"]["fail"]["totalconfirmed"] == Decimal("1.96")

ismaelsadeeq commented at 1:19 pm on September 13, 2023:

nit

0    assert_equal(node.estimaterawfee(1)["short"]["fail"]["totalconfirmed"], Decimal("1.96"))

instagibbs commented at 2:04 pm on September 26, 2023:

this seems very brittle, can you add a formula to recalculate this if something changes, or use the formula directly here?

in test/functional/feature_fee_estimation.py:468 in 12e36414dc

463+                txs.append(tx)
464+            batch_send_tx = (node.sendrawtransaction.get_request(tx["hex"]) for tx in txs)
465+            node.batch(batch_send_tx)
466+            self.sync_mempools(wait=0.1, nodes=[node, miner])
467+            self.generate(miner, 1)
468+        assert node.estimatesmartfee(2)["feerate"] == med_feerate * 1000 / COIN

ismaelsadeeq commented at 1:21 pm on September 13, 2023:

nit

0    assert_equal(node.estimatesmartfee(2)["feerate"], (med_feerate * 1000 / COIN))

ismaelsadeeq commented at 1:21 pm on September 13, 2023: member

Code review ACK 12e36414dcbd0a8ee00ab02325833baadd3c29d9

ismaelsadeeq referenced this in commit ab6443bb9a on Sep 13, 2023

ismaelsadeeq referenced this in commit caae68b94b on Sep 13, 2023

ismaelsadeeq referenced this in commit 1e5d7c3b01 on Sep 14, 2023

instagibbs commented at 2:15 pm on September 21, 2023: member

I will review

in src/policy/fees.cpp:678 in 77e426d270 outdated

673+        CFeeRate ancestor_score{entry->GetModFeesWithAncestors(), static_cast<uint32_t>(entry->GetSizeWithAncestors())};
674+
675+        if (individual_feerate > ancestor_score) {
676+            for (const CTxMemPoolEntry& parent : entry->GetMemPoolParents()) {
677+                CFeeRate parent_feerate{parent.GetFee(), static_cast<uint32_t>(parent.GetTxSize())};
678+                if (parent_feerate < individual_feerate) _removeTx(parent.GetTx().GetHash(), /* inBlock = */true);

instagibbs commented at 1:59 pm on September 26, 2023:

If we had a chain:

0A->B-C
12   1  100 sat/vbyte

Would this mean that A would be considered as not a CPFP by this logic?

if there’s a limit to this logic it should be made clear

darosior commented at 2:11 pm on September 26, 2023:

Yes. This code would only catch the most simple one-parent one-child CPFPs. Do you think we should do better? I’m a bit wary of starting to consider the child as then we’d have to consider its own ancestor feerate to avoid false positives.

instagibbs commented at 2:13 pm on September 26, 2023:

I’m fine with the limitation, but the commit message/docs should be very explicit what’s been ignored by this commit

naumenkogs commented at 1:48 pm on October 2, 2023:

I stared at the result of _removeTx for some time. false would mean it’s already been removed by another child, which is fine in the current implementation but might be confusing later (if not documented).

naumenkogs commented at 1:51 pm on October 2, 2023:

Another option is to consider 3+ chains as usual, without dropping the child. (CPFP implementations use RBF, right?) Although I guess it won’t be worse than today anyway, assuming someone really wants to trigger poor estimation on purpose?

I agree this should be better documented.

in test/functional/feature_fee_estimation.py:420 in 12e36414dc

415+
416+        # If a transaction got mined and one of its descendants has a higher ancestor
417+        # score, it does not get taken into account by the fee estimator.
418+        tx = send_tx(self.wallet, node, None, low_feerate)
419+        u = {"txid": tx["txid"], "vout": 0, "value": Decimal(tx["tx"].vout[0].nValue) / COIN}
420+        send_tx(self.wallet, node, u, high_feerate)

instagibbs commented at 2:01 pm on September 26, 2023:

please use named args

in test/functional/feature_fee_estimation.py:428 in 12e36414dc

423+        assert node.estimaterawfee(1)["short"]["fail"]["totalconfirmed"] == 0
424+
425+        # If it has descendants which have a lower ancestor score, it does though.
426+        tx = send_tx(self.wallet, node, None, high_feerate)
427+        u = {"txid": tx["txid"], "vout": 0, "value": Decimal(tx["tx"].vout[0].nValue) / COIN}
428+        send_tx(self.wallet, node, u, low_feerate)

instagibbs commented at 2:02 pm on September 26, 2023:

please use named args

in test/functional/feature_fee_estimation.py:442 in 12e36414dc

437+        self.sync_mempools(wait=0.1, nodes=[node, miner])
438+        self.generate(miner, 1)
439+        # Decay of 0.962, truncated to 2 decimals in the RPC result
440+        assert node.estimaterawfee(1)["short"]["fail"]["totalconfirmed"] == Decimal("1.96")
441+
442+        # Generate and mine packages of transactions, 80% of them are a [low fee, high fee] package

instagibbs commented at 2:05 pm on September 26, 2023:

make some of these comments a log where appropriate

in test/functional/feature_fee_estimation.py:448 in 12e36414dc

443+        # which get mined because of the child transaction. 20% are single-transaction packages with
444+        # a medium-high feerate.
445+        # Assert that we don't give the low feerate as estimate, assuming the low fee transactions
446+        # got mined on their own.
447+        for _ in range(4):
448+            txs = []  # Batch the RPCs calls.

instagibbs commented at 2:06 pm on September 26, 2023:

is batching special for some reason?

ismaelsadeeq referenced this in commit 272bf55e8e on Oct 5, 2023

ismaelsadeeq referenced this in commit bf2a0ec344 on Oct 13, 2023

DrahtBot requested review from naumenkogs on Oct 28, 2023

DrahtBot requested review from murchandamus on Oct 28, 2023

DrahtBot added the label CI failed on Dec 10, 2023

maflcko commented at 8:55 pm on January 9, 2024: member

Needs rebase, if still relevant

DrahtBot commented at 0:03 am on April 8, 2024: contributor

🤔 There hasn’t been much activity lately and the CI seems to be failing.

If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn’t been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

darosior commented at 3:14 pm on April 9, 2024: member

Let’s face it: looks like i’m not getting back to this. I think it’s still a bit concerning, but hey i’m focused on other things right now and other people are looking at fee estimation. Let’s mark this as up for brags.

Let not the weight of past deeds anchor your steps; the investments of yesterday should not dictate the journey of today. For wisdom lies in moving forward, guided by the present’s light, not shackled by the shadows of sunk costs.

Inspired by @willcl-ark

darosior closed this on Apr 9, 2024

darosior commented at 3:15 pm on April 9, 2024: member

@ismaelsadeeq you may be interested in picking this up.

achow101 added the label Up for grabs on Apr 9, 2024

fanquake removed the label Up for grabs on Jul 25, 2024

fanquake commented at 10:07 am on July 25, 2024: member

PIcked up in #30079.

bitcoin locked this on Jul 25, 2025

Detect and ignore transactions that were CPFP’d in the fee estimator #25380

Code Coverage

Reviews

Conflicts