p2p: improve TxOrphanage denial of service bounds and increase -maxorphantxs #31829

glozow wants to merge 16 commits into bitcoin:master from glozow:2025-01-orphanage-peer-dos, changing 13 files (+974 βˆ’110)
  1. glozow commented at 9:14 pm on February 9, 2025: member

    This PR is part of the orphan resolution project, see #27463.

    We want to limit the CPU work and memory used by TxOrphanage to avoid denial of service attacks. On master, this is achieved by limiting the number of transactions in this data structure to 100, and the weight of each transaction to 400KWu (the largest standard tx) [0]. We always allow new orphans, but if the addition causes us to exceed 100, we evict one randomly. This is dead simple, but has problems:

    • It makes the orphanage trivially churnable: any one peer can render it useless by spamming us with lots of orphans. It’s possible this is happening: “Looking at data from node alice on 2024-09-14 shows that we’re sometimes removing more than 100k orphans per minute. This feels like someone flooding us with orphans.” [1]
    • Effectively, opportunistic 1p1c is useless in the presence of adversaries: it is opportunistic, pairing a low-feerate transaction with a child that happens to already be in the orphanage. So if nothing can stay in the orphanage, we can’t expect 1p1c packages to propagate.
    • This number is also often insufficient for the volume of orphans we handle: historical data show that overflows are pretty common, and there are times where “it seems like [the node] forgot about the orphans and re-requested them multiple times.” [1]

    Just jacking up the -maxorphantxs number is not a good enough solution, because it doesn’t solve the churnability problem, and the effective resource bounds scale poorly.

    This PR introduces numbers for {global, per-peer} {memory usage, announcements}, representing resource limits:

    • The (constant) global announcement limit caps the number of unique (wtxid, peer) pairs in the orphanage [2]. This represents a cap on CPU, and does not change with the number of peers we have. Evictions must happen whenever this limit is reached.
    • The (variable) per-peer announcement limit is the global announcement limit divided by the number of peers. Peers are allowed to exceed this limit provided the global announcement limit has not been reached. The per-peer announcement limit decreases with more peers.
    • The (constant) per-peer memory usage reservation is the amount of orphan weight [3] reserved per peer [4]. Reservation means that peers are effectively guaranteed this amount of space. It is not a hard limit; peers are allowed to exceed their reservation provided the global usage limit has not been reached.
    • The (variable) global memory usage limit is the number of peers multiplied by the per-peer reservation [5]. As such, the global memory usage limit scales up with the number of peers we have. Evictions must happen whenever this limit is reached.
    • We introduce a “Peer DoS Score,” which is the maximum of a peer’s “CPU Score” and “Memory Score.” The CPU score is the ratio of the number of announcements from this peer to the per-peer announcement limit. The memory score is the ratio of the total usage of all orphans announced by this peer to the per-peer usage reservation. A sketch of the computation follows this list.
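
    A minimal sketch of the score computation (illustrative only: the PR computes these ratios with FeeFrac rather than floating point, and the names below are hypothetical):

      #include <algorithm>
      #include <cstddef>
      #include <cstdint>

      // A peer's DoS score is the larger of its CPU score (announcements used over
      // the per-peer announcement limit) and its memory score (orphan weight used
      // over the per-peer weight reservation).
      double PeerDoSScore(std::size_t num_announcements, int64_t usage_weight,
                          std::size_t per_peer_ann_limit, int64_t per_peer_reservation)
      {
          const double cpu_score = double(num_announcements) / double(per_peer_ann_limit);
          const double mem_score = double(usage_weight) / double(per_peer_reservation);
          return std::max(cpu_score, mem_score);
      }

      // Example: with a 24-announcement limit and a 404'000 Wu reservation, a peer
      // with 30 announcements totalling 200'000 Wu scores max(30/24, 200000/404000)
      // = 1.25, i.e. it has exceeded its allowance.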

    Eviction changes in a few ways:

    • It is triggered if any of these 3 conditions are true: (1) -maxorphantxs is exceeded, (2) the global announcement limit is exceeded, or (3) the global memory usage limit is exceeded.
    • On each iteration of the loop, instead of selecting a random orphan, we select a peer and delete one of its items. Specifically, we select the “DoSiest” peer, i.e. the one with the highest DoS score, which is the maximum of its CPU DoS score (based on announcements) and its Memory DoS score (based on tx weight). After the peer has been selected, we evict one of its announcements at random.
    • Instead of evicting orphans, we evict announcements, i.e. we don’t erase the orphan unless this peer is its only announcer. Of course, over several iterations of the loop, we may erase all of an orphan’s announcers and then the orphan itself. The purpose of this change is to prevent a peer from being able to churn another peer’s announcements. A sketch of the trimming loop follows this list.
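
    A pseudocode-level sketch of the trimming loop; the helper names are hypothetical, and the real logic lives in LimitOrphans/MaybeTrimOrphans:

      void TrimToLimits(size_t max_orphans, FastRandomContext& rng)
      {
          // Each iteration removes exactly one (wtxid, peer) announcement, so the
          // loop runs at most "global announcement limit" times.
          while (CountUniqueOrphans() > max_orphans
                 || CountAnnouncements() > GlobalAnnouncementLimit()
                 || TotalUsage() > GlobalUsageLimit()) {
              const NodeId peer = PeerWithHighestDoSScore();         // max(CPU score, memory score)
              const Wtxid wtxid = PickRandomAnnouncement(peer, rng); // random among that peer's announcements
              RemoveAnnouncement(wtxid, peer);
              if (AnnouncerCount(wtxid) == 0) EraseOrphan(wtxid);    // drop the tx only once unannounced
          }
      }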

    This PR also:

    • Bounds the number of transactions that can be in a peer’s work set by ensuring it is a subset of the peer’s announcements.
    • Halts processing at 100 transactions within EraseForBlock (sketched after this list). Theoretically, it’s possible that every transaction in the orphanage spends inputs that are in the block, and we don’t want EraseForBlock to run for too long.
    • Changes the default -maxorphantxs to the global announcement limit.
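
    A rough sketch of the EraseForBlock cap, assuming a hypothetical OrphansSpending() lookup into the outpoint-to-orphan map (the real code and exact stopping condition may differ):

      // Collect orphans that conflict with the new block, but stop once 100 have
      // been gathered so a single block connection cannot trigger unbounded work.
      std::vector<Wtxid> to_erase;
      for (const CTransactionRef& block_tx : block.vtx) {
          if (to_erase.size() >= 100) break;
          for (const CTxIn& txin : block_tx->vin) {
              for (const Wtxid& wtxid : OrphansSpending(txin.prevout)) {
                  to_erase.push_back(wtxid);
              }
          }
      }
      for (const Wtxid& wtxid : to_erase) EraseTx(wtxid);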

    This means we can receive 1p1c packages in the presence of spammy peers. It also makes the orphanage more useful and increases our download capacity without drastically increasing orphanage resource usage.

    [0]: This means the effective memory limit in orphan weight is 100 * 400KWu = 40MWu.
    [1]: https://delvingbitcoin.org/t/stats-on-orphanage-overflows/1421
    [2]: The limit is 3000, which is equivalent to one max-size ancestor package (24 transactions can be missing inputs) for each peer (default max connections is 125).
    [3]: Orphan weight is used in place of actual memory usage because something like “one maximally sized standard tx” is easier to reason about than “considering the bytes allocated for vin and vout vectors, it needs to be within N bytes…” etc. We could also consider a different formula that captures more of the memory overhead while still having an interface that is easy to reason about.
    [4]: The reservation is 404KWu, which is the maximum size of an ancestor package.
    [5]: With 125 peers, this is 50.5MWu, which is a small increase from the existing limit of 40MWu. While the actual memory usage limit is higher (this number does not include the other memory used by TxOrphanage to store the outpoints map, etc.), it is within the same ballpark as the old limit.
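
    The numbers above, spelled out (the constant names here are illustrative, not necessarily the ones used in the code):

      constexpr int     MAX_PEERS{125};                    // default maximum connections
      constexpr int     ANN_PER_PACKAGE{24};               // txs in a max-size ancestor package that can be missing inputs
      constexpr int     GLOBAL_ANN_LIMIT{MAX_PEERS * ANN_PER_PACKAGE};                // 3'000 announcements
      constexpr int64_t RESERVED_WEIGHT_PER_PEER{404'000};                            // ~max ancestor package weight, in Wu
      constexpr int64_t GLOBAL_USAGE_125_PEERS{MAX_PEERS * RESERVED_WEIGHT_PER_PEER}; // 50'500'000 Wu = 50.5 MWu
      constexpr int64_t OLD_EFFECTIVE_LIMIT{100 * 400'000};                           // 40'000'000 Wu = 40 MWu on master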

  2. glozow added the label P2P on Feb 9, 2025
  3. DrahtBot commented at 9:14 pm on February 9, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31829.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Approach ACK sipa

    If your review is incorrectly listed, please react with πŸ‘Ž to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31859 (test: Rename send_message to send_without_ping by maflcko)
    • #31628 (test: add coverage for immediate orphanage eviction case by instagibbs)
    • #30987 (Don’t zero-after-free DataStream: Faster IBD on some configurations by davidgumberg)
    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. glozow force-pushed on Feb 9, 2025
  5. DrahtBot added the label CI failed on Feb 9, 2025
  6. DrahtBot commented at 9:20 pm on February 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36925040096

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. glozow force-pushed on Feb 9, 2025
  8. [txorphanage] change type of usage to int64_t 2771b69e46
  9. glozow force-pushed on Feb 10, 2025
  10. glozow commented at 1:13 pm on February 10, 2025: member
    Rebased
  11. glozow marked this as ready for review on Feb 10, 2025
  12. DrahtBot removed the label CI failed on Feb 10, 2025
  13. in src/txorphanage.cpp:55 in ef2f44e653 outdated
    35@@ -36,9 +36,10 @@ bool TxOrphanage::AddTx(const CTransactionRef& tx, NodeId peer)
    36         return false;
    37     }
    38 
    


    instagibbs commented at 4:43 pm on February 10, 2025:

    ef2f44e653a4877d4e65fbd5a51ec83ceb96d212

    MAX_GLOBAL_ORPHAN_ANNOUNCEMENTS doesn’t occur in the PR, should be DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS?


    glozow commented at 12:04 pm on February 11, 2025:
    thanks, fixed
  14. in src/txorphanage.h:47 in ef2f44e653 outdated
    42+    unsigned int m_reserved_weight_per_peer{DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER};
    43+
    44+    /** The maximum number of announcements across all peers, representing a computational upper bound,
    45+     * i.e. the maximum number of evictions we might do at a time. There is no per-peer announcement
    46+     * limit until the global limit is reached. Also, this limit is constant regardless of how many
    47+     * peers we have: if we only have 1 peer, this is the number of orphans they may provide. As
    


    instagibbs commented at 4:50 pm on February 10, 2025:
    s/may provide/may provide without risking eviction/?

    glozow commented at 12:06 pm on February 11, 2025:
    changed
  15. in src/txorphanage.h:48 in ef2f44e653 outdated
    43+
    44+    /** The maximum number of announcements across all peers, representing a computational upper bound,
    45+     * i.e. the maximum number of evictions we might do at a time. There is no per-peer announcement
    46+     * limit until the global limit is reached. Also, this limit is constant regardless of how many
    47+     * peers we have: if we only have 1 peer, this is the number of orphans they may provide. As
    48+     * more peers are added, each peer's allocation is reduced. */
    


    instagibbs commented at 4:50 pm on February 10, 2025:
    s/allocation/protected allocation/?

    glozow commented at 12:06 pm on February 11, 2025:
    added
  16. in src/txorphanage.h:148 in ef2f44e653 outdated
    136@@ -111,7 +137,6 @@ class TxOrphanage {
    137 
    138 protected:
    139     struct OrphanTx : public OrphanTxBase {
    


    instagibbs commented at 4:51 pm on February 10, 2025:
    do we think we’ll end up using the derived class in a meaningful way later?

    glozow commented at 4:33 am on February 12, 2025:
    not entirely sure… could be cleaned up
  17. in src/txorphanage.h:247 in ef2f44e653 outdated
    233+
    234+    unsigned int GetGlobalMaxUsage() const {
    235+        return std::max<unsigned int>(m_peer_orphanage_info.size() * m_reserved_weight_per_peer, 1);
    236+    }
    237+
    238+    unsigned int GetPerPeerMaxAnnouncements() const {
    


    instagibbs commented at 5:02 pm on February 10, 2025:
    note: this is used to “normalize” the one DoS score metric against the other only. If this wasn’t used, trimming would heavily favor announcement trimming scores first.

    glozow commented at 4:40 am on February 12, 2025:
    Yes, they’re both meant to be ratios :+1:
  18. in src/txorphanage.h:196 in ef2f44e653 outdated
    182+            FeeFrac cpu_score(m_iter_list.size(), peer_max_ann);
    183+            FeeFrac mem_score(m_total_usage, peer_max_mem);
    184+            return std::max<FeeFrac>(cpu_score, mem_score);
    185+        }
    186     };
    187     std::map<NodeId, PeerOrphanInfo> m_peer_orphanage_info;
    


    instagibbs commented at 5:11 pm on February 10, 2025:

    re:global memory limits, they won’t be increased until each connected peer offers up their own orphan or announcement (which makes sense)

    would be good to have test/fuzz coverage that there isn’t some bug where this continuously grows, raising the global limits to unsafe levels? Arg in SanityCheck?


    sipa commented at 4:26 am on February 19, 2025:

    Regarding https://github.com/bitcoin/bitcoin/pull/31829/commits/2771b69e461230feb7761b0afff41b44bd3ba34f#r1949554535

    With GetGlobalMaxUsage() computing memory limits on-the-fly, is this comment still relevant?


    instagibbs commented at 4:40 pm on February 20, 2025:

    I guess my worry is that we would somehow add a regression that results in the peer set growing indefinitely in the orphanage, which would grow memory usage substantially. Here’s some suggested coverage which would hopefully uncover such an issue (unless the caller plumb forgets to EraseForPeer):

      0diff --git a/src/test/fuzz/txorphan.cpp b/src/test/fuzz/txorphan.cpp
      1index 9133323449..df0a02d75d 100644
      2--- a/src/test/fuzz/txorphan.cpp
      3+++ b/src/test/fuzz/txorphan.cpp
      4@@ -45,10 +45,15 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
      5         outpoints.emplace_back(Txid::FromUint256(uint256{i}), 0);
      6     }
      7 
      8     CTransactionRef ptx_potential_parent = nullptr;
      9 
     10+    // Peers which have offered orphans (via tx or announcement) and have not subsequently
     11+    // "disconnected" aka called EraseForPeer
     12+    std::set<NodeId> connected_peers;
     13+
     14+
     15     LIMITED_WHILE(outpoints.size() < 200'000 && fuzzed_data_provider.ConsumeBool(), 10 * DEFAULT_MAX_ORPHAN_TRANSACTIONS)
     16     {
     17         // construct transaction
     18         const CTransactionRef tx = [&] {
     19             CMutableTransaction tx_mut;
     20@@ -121,14 +126,16 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
     21 
     22                         if (add_tx) {
     23                             Assert(orphanage.UsageByPeer(peer_id) == tx_weight + total_peer_bytes_start);
     24                             Assert(orphanage.TotalOrphanUsage() == tx_weight + total_bytes_start);
     25                             Assert(tx_weight <= MAX_STANDARD_TX_WEIGHT);
     26+                            connected_peers.insert(peer_id);
     27                         } else {
     28                             // Peer may have been added as an announcer.
     29                             if (orphanage.UsageByPeer(peer_id) == tx_weight + total_peer_bytes_start) {
     30                                 Assert(orphanage.HaveTxFromPeer(wtxid, peer_id));
     31+                                connected_peers.insert(peer_id);
     32                             } else {
     33                                 // Otherwise, there must not be any change to the peer byte count.
     34                                 Assert(orphanage.UsageByPeer(peer_id) == total_peer_bytes_start);
     35                             }
     36 
     37@@ -158,10 +165,11 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
     38                         // Total bytes should not have changed. If peer was added as announcer, byte
     39                         // accounting must have been updated.
     40                         Assert(orphanage.TotalOrphanUsage() == total_bytes_start);
     41                         if (added_announcer) {
     42                             Assert(orphanage.UsageByPeer(peer_id) == tx_weight + total_peer_bytes_start);
     43+                            connected_peers.insert(peer_id);
     44                         } else {
     45                             Assert(orphanage.UsageByPeer(peer_id) == total_peer_bytes_start);
     46                         }
     47                     }
     48                 },
     49@@ -190,10 +198,11 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
     50                         Assert(!have_tx && !have_tx_and_peer && !orphanage.EraseTx(wtxid));
     51                     }
     52                 },
     53                 [&] {
     54                     orphanage.EraseForPeer(peer_id);
     55+                    connected_peers.erase(peer_id); // "DisconnectPeer"
     56                     Assert(!orphanage.HaveTxFromPeer(tx->GetWitnessHash(), peer_id));
     57                     Assert(orphanage.UsageByPeer(peer_id) == 0);
     58                 },
     59                 [&] {
     60                     // test mocktime and expiry
     61@@ -212,7 +221,8 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
     62 
     63         const bool have_tx{orphanage.HaveTx(tx->GetWitnessHash())};
     64         const bool get_tx_nonnull{orphanage.GetTx(tx->GetWitnessHash()) != nullptr};
     65         Assert(have_tx == get_tx_nonnull);
     66     }
     67-    orphanage.SanityCheck();
     68+
     69+    orphanage.SanityCheck(connected_peers.size());
     70 }
     71diff --git a/src/txorphanage.cpp b/src/txorphanage.cpp
     72index 40a2503af5..c6c156409d 100644
     73--- a/src/txorphanage.cpp
     74+++ b/src/txorphanage.cpp
     75@@ -402,11 +402,11 @@ std::vector<TxOrphanage::OrphanTxBase> TxOrphanage::GetOrphanTransactions() cons
     76         ret.push_back({o.second.tx, o.second.announcers, o.second.nTimeExpire});
     77     }
     78     return ret;
     79 }
     80 
     81-void TxOrphanage::SanityCheck() const
     82+void TxOrphanage::SanityCheck(int expected_num_peers) const
     83 {
     84     // Check that cached m_total_announcements is correct. First count when iterating through m_orphans (counting number
     85     // of announcers each), then count when iterating through peers (counting number of orphans per peer).
     86     unsigned int counted_total_announcements{0};
     87     // Check that m_total_orphan_usage is correct
     88@@ -461,10 +461,16 @@ void TxOrphanage::SanityCheck() const
     89                 return orphan_it->second.tx->GetWitnessHash() == wtxid;
     90             }) != info.m_iter_list.end());
     91         }
     92     }
     93 
     94+    // We should not be offering more global memory for orphanage than expected
     95+    if (expected_num_peers != -1) {
     96+        Assert((size_t) expected_num_peers == m_peer_orphanage_info.size());
     97+        Assert(GetGlobalMaxUsage() == std::max<int64_t>(expected_num_peers * GetPerPeerMaxUsage(), 1));
     98+    }
     99+
    100     Assert(wtxids_in_peer_map.size() == m_orphans.size());
    101     Assert(counted_total_announcements == 0);
    102 }
    103 
    104 bool TxOrphanage::NeedsTrim(unsigned int max_orphans) const
    105diff --git a/src/txorphanage.h b/src/txorphanage.h
    106index a5a6a94bec..562e9a3321 100644
    107--- a/src/txorphanage.h
    108+++ b/src/txorphanage.h
    109@@ -140,11 +140,11 @@ public:
    110         return peer_it == m_peer_orphanage_info.end() ? 0 : peer_it->second.m_iter_list.size();
    111     }
    112 
    113     /** Check consistency between PeerOrphanInfo and m_orphans. Recalculate counters and ensure they
    114      * match what is cached. */
    115-    void SanityCheck() const;
    116+    void SanityCheck(int expected_num_peers = -1) const;
    117 
    118 protected:
    119     struct OrphanTx : public OrphanTxBase {
    120     };
    
  19. in src/txorphanage.cpp:103 in ef2f44e653 outdated
    108-        it_last->second.list_pos = old_pos;
    109+        // Find this orphan iter's position in the list, and delete it.
    110+        auto& orphan_list = peer_it->second.m_iter_list;
    111+        size_t old_pos = std::distance(orphan_list.begin(), std::find(orphan_list.begin(), orphan_list.end(), it));
    112+
    113+        if (!Assume(old_pos < orphan_list.size())) continue;
    


    instagibbs commented at 5:21 pm on February 10, 2025:
    Maybe a bit too paranoid since it was just computed explicitly from the underlying list?

    glozow commented at 12:04 pm on February 11, 2025:
    removed
  20. in src/txorphanage.cpp:200 in ef2f44e653 outdated
    200-        ++nEvicted;
    201+        if (!Assume(!peer_it_heap.empty())) break;
    202+        // Find the peer with the highest DoS score, which is a fraction of {usage, announcements} used
    203+        // over the allowance. This metric causes us to naturally select peers who have exceeded
    204+        // their limits (i.e. a DoS score > 1) before peers who haven't. We may choose the same peer
    205+        // change since the last iteration of this loop.
    


    instagibbs commented at 5:29 pm on February 10, 2025:
    wording confusion: same peer change?

    glozow commented at 12:05 pm on February 11, 2025:
    fixed
  21. in src/test/orphanage_tests.cpp:247 in 95b61662e5 outdated
    242+        auto ptx = MakeTransactionSpending(/*outpoints=*/{}, det_rand);
    243+        orphanage.AddTx(ptx, dos_peer);
    244+    }
    245+    peer_usages.emplace_back(orphanage.UsageByPeer(dos_peer));
    246+
    247+    // Force an eviction. Note that no limiting has happened yet at this point. LimitOrphans may
    


    instagibbs commented at 5:44 pm on February 10, 2025:

    happened yet at this point

    Do you mean prior to LimitOrphans?


    glozow commented at 12:12 pm on February 11, 2025:
    yes, clarified
  22. in src/test/orphanage_tests.cpp:256 in 95b61662e5 outdated
    250+    orphanage.LimitOrphans(prev_count - 1, det_rand);
    251+    BOOST_CHECK(orphanage.Size() <= prev_count - 1);
    252+
    253+    // The DoS peer's orphans have been evicted, nobody else's have.
    254+    for (NodeId peer{0}; peer <= dos_peer; ++peer) {
    255+        BOOST_CHECK_EQUAL(peer == dos_peer, peer_usages.at(peer) != orphanage.UsageByPeer(peer));
    


    instagibbs commented at 5:46 pm on February 10, 2025:
    nit: would rather we check that we evicted the dos’y peer and somehow didn’t add more resources allocated to him

    glozow commented at 12:16 pm on February 11, 2025:
    Added
  23. in test/functional/p2p_opportunistic_1p1c.py:476 in 653f1bb84d outdated
    467+        peer_normal.wait_for_getdata([parent_txid_int])
    468+
    469+        self.log.info("Send another round of very large orphans from a DoSy peer")
    470+        for large_orphan in large_orphans[60:]:
    471+            peer_doser.send_and_ping(msg_tx(large_orphan))
    472+
    


    instagibbs commented at 6:04 pm on February 10, 2025:

    could do this for both cases

    0       # Something was evicted
    1        assert_greater_than(len(large_orphans), len(node.getorphantxs()))
    

    glozow commented at 12:37 pm on February 11, 2025:
    added
  24. in src/bench/txorphanage.cpp:103 in 3ce9ef7dd3 outdated
    44+}
    45+
    46+static void OrphanageEvictionMany(benchmark::Bench& bench)
    47+{
    48+    NodeId NUM_PEERS{125};
    49+    unsigned int NUM_TRANSACTIONS(DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS / NUM_PEERS);
    


    instagibbs commented at 6:15 pm on February 10, 2025:
    seems wrong, you’re sending 3000/125=24 transactions total?

    glozow commented at 12:18 pm on February 11, 2025:
    Yes, and they are each announced by every peer. This bench is to test the maximum number of transactions where every peer has 100% overlap. We call AddTx DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS times, which is the maximum before eviction would trigger. If we increase DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS, the bench will scale too.

    instagibbs commented at 3:02 pm on February 15, 2025:
    are you sure it’s being sent via every peer for this benchmark? Looks like there’s no overlap?

    glozow commented at 4:44 pm on February 20, 2025:
    You are right. That explains why it doesn’t get slower haha
  25. in src/txorphanage.cpp:276 in 33034eaa3b outdated
    275+                // items that are no longer in the orphanage. We should only do this once per peer
    276+                // per call to AddChildrenToWorkSet, so keep track of which peers we have trimmed.
    277+                // We also never need to do it more than once since evictions don't happen in this
    278+                // function.
    279+                if (orphan_work_set.size() + 1 > MAX_ORPHAN_WORK_QUEUE && !peers_workset_trimmed.contains(announcer)) {
    280+                    std::erase_if(orphan_work_set, [&](const auto& wtxid) { return m_orphans.contains(wtxid); });
    


    instagibbs commented at 8:28 pm on February 10, 2025:

    this is backwards. See added unit test:

     0diff --git a/src/test/orphanage_tests.cpp b/src/test/orphanage_tests.cpp
     1index fe0f81fdb4..dde42d9d4a 100644
     2--- a/src/test/orphanage_tests.cpp
     3+++ b/src/test/orphanage_tests.cpp
     4@@ -77,0 +78,24 @@ static CTransactionRef MakeTransactionSpending(const std::vector<COutPoint>& out
     5+// 101 output transaction
     6+static CTransactionRef MakeHugeTransactionSpending(const std::vector<COutPoint>& outpoints, FastRandomContext& det_rand)
     7+{
     8+    CKey key;
     9+    MakeNewKeyWithFastRandomContext(key, det_rand);
    10+    CMutableTransaction tx;
    11+    // If no outpoints are given, create a random one.
    12+    if (outpoints.empty()) {
    13+        tx.vin.emplace_back(Txid::FromUint256(det_rand.rand256()), 0);
    14+    } else {
    15+        for (const auto& outpoint : outpoints) {
    16+            tx.vin.emplace_back(outpoint);
    17+        }
    18+    }
    19+    // Ensure txid != wtxid
    20+    tx.vin[0].scriptWitness.stack.push_back({1});
    21+    tx.vout.resize(101);
    22+    tx.vout[0].nValue = CENT;
    23+    tx.vout[0].scriptPubKey = GetScriptForDestination(PKHash(key.GetPubKey()));
    24+    tx.vout[1].nValue = 3 * CENT;
    25+    tx.vout[1].scriptPubKey = GetScriptForDestination(WitnessV0KeyHash(key.GetPubKey()));
    26+    return MakeTransactionRef(tx);
    27+}
    28+
    29@@ -598,0 +623,23 @@ BOOST_AUTO_TEST_CASE(peer_worksets)
    30+
    31+        {
    32+            // We will fill the orphanage with a single parent and 101 children
    33+            // from that single transaction to cause potential deletion of work set
    34+            // from peer 0.
    35+            auto tx_missing_parent = MakeHugeTransactionSpending({}, det_rand);
    36+            std::vector<CTransactionRef> tx_orphans;
    37+            for (unsigned int i{0}; i < MAX_ORPHAN_WORK_QUEUE + 1; i++) {
    38+                auto tx_orphan = MakeTransactionSpending({COutPoint{tx_missing_parent->GetHash(), i}}, det_rand);
    39+                BOOST_CHECK(orphanage.AddTx(tx_orphan, /*peer=*/node0));
    40+            }
    41+
    42+            // 101 transactions in the orphanage (no trimming of orphanage yet), now
    43+            // add parent to work set, which will all be allocated to peer 0.
    44+            // work set should get trimmed exactly once down to MAX_ORPHAN_WORK_QUEUE
    45+            orphanage.AddChildrenToWorkSet(*tx_missing_parent, det_rand);
    46+            for (unsigned int i{0}; i < MAX_ORPHAN_WORK_QUEUE; i++) {
    47+                BOOST_CHECK(orphanage.GetTxToReconsider(node0));
    48+            }
    49+
    50+            // We should have emptied the work queue in MAX_ORPHAN_WORK_QUEUE steps
    51+            BOOST_CHECK(!orphanage.HaveTxToReconsider(node0));
    52+        }
    53diff --git a/src/txorphanage.cpp b/src/txorphanage.cpp
    54index 09d50a244a..2a881db669 100644
    55--- a/src/txorphanage.cpp
    56+++ b/src/txorphanage.cpp
    57@@ -276 +276 @@ void TxOrphanage::AddChildrenToWorkSet(const CTransaction& tx, FastRandomContext
    58-                    std::erase_if(orphan_work_set, [&](const auto& wtxid) { return m_orphans.contains(wtxid); });
    59+                    std::erase_if(orphan_work_set, [&](const auto& wtxid) { return !m_orphans.contains(wtxid); });
    

    glozow commented at 12:45 pm on February 11, 2025:
    Wow, very bad bit flip :facepalm: thank you

    glozow commented at 5:06 pm on February 11, 2025:
    Wrote a similar test to check that an evicted work item is the one that doesn’t exist in m_orphans anymore.
  26. in src/txorphanage.cpp:281 in 33034eaa3b outdated
    281+                    peers_workset_trimmed.insert(announcer);
    282+                }
    283+
    284+                // Add this tx to the work set. If the workset is full, even after trimming, don't
    285+                // accept any new work items until the work queue has been flushed.
    286+                if (orphan_work_set.size() < MAX_ORPHAN_WORK_QUEUE) {
    


    instagibbs commented at 8:40 pm on February 10, 2025:
    should we debug log if we’re not adding to work set? might be good to know it’s happening

    glozow commented at 4:39 am on February 12, 2025:
    Added a couple pushes ago, but gone again. After a bit of offline discussion with @mzumsande and @sipa, it seemed better to just synchronously remove wtxids from worksets when they are removed as announcements. This means that the work set is always a subset of announcement set (added this to SanityCheck). Also, potentially failing to add things to work set seemed to make this less useful.

    instagibbs commented at 12:45 pm on February 12, 2025:
    Ok! that was my suggestion offline too. Bounding announcements means we bound the workset :+1:
  27. in src/txorphanage.cpp:149 in ff24c1feeb outdated
    139@@ -140,13 +140,12 @@ void TxOrphanage::EraseForPeer(NodeId peer)
    140     if (nErased > 0) LogDebug(BCLog::TXPACKAGES, "Erased %d orphan transaction(s) from peer=%d\n", nErased, peer);
    141 }
    142 
    143-void TxOrphanage::LimitOrphans(unsigned int max_orphans, FastRandomContext& rng)
    144+unsigned int TxOrphanage::MaybeExpireOrphans()
    145 {
    146-    unsigned int nEvicted = 0;
    147+    int nErased = 0;
    


    mzumsande commented at 8:42 pm on February 10, 2025:
    nit: doesn’t really matter if unsigned int (return value) or int is used, but would be nice to make it consistent, also with the %u / %d format specifiers in the logprints.

    glozow commented at 4:37 am on February 12, 2025:
    made consistent + made both logs %u
  28. in src/txorphanage.cpp:245 in ff24c1feeb outdated
    175         ++nEvicted;
    176     }
    177+    return nEvicted;
    178+}
    179+
    180+void TxOrphanage::LimitOrphans(unsigned int max_orphans, FastRandomContext& rng)
    


    mzumsande commented at 9:12 pm on February 10, 2025:
    nit: commit msg of c929d71c0828544be509934312b6a7d11b47ea4d lacks a verb (e.g. “split”).

    glozow commented at 4:25 am on February 12, 2025:
    added
  29. in test/functional/p2p_opportunistic_1p1c.py:451 in 653f1bb84d outdated
    446+        # normal package request to time out.
    447+        self.wait_until(lambda: len(node.getorphantxs()) == num_individual_dosers)
    448+
    449+        self.log.info("Send an orphan from a non-DoSy peer. Its orphan should not be evicted.")
    450+        low_fee_parent = self.create_tx_below_mempoolminfee(self.wallet)
    451+        high_fee_child = self.wallet.create_self_transfer(utxo_to_spend=low_fee_parent["new_utxo"], fee_rate=20*FEERATE_1SAT_VB)
    


    instagibbs commented at 9:22 pm on February 10, 2025:
    the honest orphan should be as large as possible: target_vsize=100000

    glozow commented at 12:38 pm on February 11, 2025:
    done
  30. in test/functional/p2p_opportunistic_1p1c.py:500 in 653f1bb84d outdated
    495+                # runtime of this test.
    496+                peer_doser_batch.send_message(msg_tx(tx))
    497+
    498+        self.log.info("Send an orphan from a non-DoSy peer. Its orphan should not be evicted.")
    499+        low_fee_parent = self.create_tx_below_mempoolminfee(self.wallet)
    500+        high_fee_child = self.wallet.create_self_transfer(utxo_to_spend=low_fee_parent["new_utxo"], fee_rate=20*FEERATE_1SAT_VB)
    


    instagibbs commented at 9:23 pm on February 10, 2025:
    the honest orphan should be as large as possible: target_vsize=100000

    glozow commented at 12:38 pm on February 11, 2025:
    done
  31. in test/functional/p2p_opportunistic_1p1c.py:485 in 653f1bb84d outdated
    473+        self.log.info("Provide the orphan's parent. This 1p1c package should be successfully accepted.")
    474+        peer_normal.send_and_ping(msg_tx(low_fee_parent["tx"]))
    475+        assert_equal(node.getmempoolentry(orphan_tx.rehash())["ancestorcount"], 2)
    476+
    477+    @cleanup
    478+    def test_orphanage_dos_many(self):
    


    instagibbs commented at 9:26 pm on February 10, 2025:
    can we get a functional test case that covers the “protects fully sized ancestor package” scenario in p2p_orphan_handling.py?

    glozow commented at 4:38 am on February 12, 2025:
    (thanks, I took your test with just a few tweaks)
  32. instagibbs commented at 9:27 pm on February 10, 2025: member

    The resource bounds additions seem to make sense, still working through the workset change implications.

    I’ve got a minimal fuzz harness checking that the “honest” peer cannot be evicted, please feel free to take it: https://github.com/instagibbs/bitcoin/tree/2025-01-orphanage-peer-dos_greg_2

  33. in src/txorphanage.h:236 in ef2f44e653 outdated
    230+    unsigned int GetPerPeerMaxUsage() const {
    231+        return m_reserved_weight_per_peer;
    232+    }
    233+
    234+    unsigned int GetGlobalMaxUsage() const {
    235+        return std::max<unsigned int>(m_peer_orphanage_info.size() * m_reserved_weight_per_peer, 1);
    


    mzumsande commented at 9:58 pm on February 10, 2025:
    should this be int64_t instead of int in the spirit of the first commit?

    glozow commented at 4:37 am on February 12, 2025:
    fixed

    sipa commented at 2:25 pm on February 12, 2025:

    In commit “[txorphanage] when full, evict from the DoSiest peers first”

    The int64_t return type won’t actually do anything here, because std::max is instantiated for unsigned int, and also, the std::map::size() may return something smaller (particularly on 32-bit systems).

    0int64_t GetGlobalMaxUsage() const {
    1        return std::max<int64_t>(int64_t(m_peer_orphanage_info.size()) * m_reserved_weight_per_peer, 1);
    2}
    

    glozow commented at 12:57 pm on February 14, 2025:
    Thanks! Fixed
  34. in src/txorphanage.cpp:185 in ef2f44e653 outdated
    169@@ -165,13 +170,68 @@ unsigned int TxOrphanage::MaybeExpireOrphans()
    170 
    171 unsigned int TxOrphanage::MaybeTrimOrphans(unsigned int max_orphans, FastRandomContext& rng)
    172 {
    173+    // Exit early to avoid building the heap unnecessarily
    174+    if (!NeedsTrim() && m_orphans.size() <= max_orphans) return 0;
    175+
    176+    std::vector<PeerMap::iterator> peer_it_heap;
    177+    for (auto it = m_peer_orphanage_info.begin(); it != m_peer_orphanage_info.end(); ++it) peer_it_heap.push_back(it);
    178+    peer_it_heap.reserve(m_peer_orphanage_info.size());
    


    mzumsande commented at 10:35 pm on February 10, 2025:
    should call reserve before pushing entries to peer_it_heap, not after.

    glozow commented at 4:25 am on February 12, 2025:
    fixed
  35. in src/txorphanage.cpp:173 in ef2f44e653 outdated
    169@@ -165,13 +170,68 @@ unsigned int TxOrphanage::MaybeExpireOrphans()
    170 
    171 unsigned int TxOrphanage::MaybeTrimOrphans(unsigned int max_orphans, FastRandomContext& rng)
    172 {
    173+    // Exit early to avoid building the heap unnecessarily
    174+    if (!NeedsTrim() && m_orphans.size() <= max_orphans) return 0;
    


    mzumsande commented at 10:40 pm on February 10, 2025:
    Could m_orphans.size() <= max_orphans be inside NeedsTrim?

    glozow commented at 4:25 am on February 12, 2025:
    Moved into NeedsTrim
  36. glozow force-pushed on Feb 11, 2025
  37. glozow commented at 5:04 pm on February 11, 2025: member
    Thanks @instagibbs for the testing and review, added your fuzz commits and took comments. Still need to write the p2p_orphan_handling test.
  38. glozow force-pushed on Feb 11, 2025
  39. DrahtBot added the label CI failed on Feb 11, 2025
  40. DrahtBot commented at 5:08 pm on February 11, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37041607307

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  41. glozow force-pushed on Feb 11, 2025
  42. mzumsande commented at 7:33 pm on February 11, 2025: contributor

    Halfway through, some minor points below - my main conceptual question is why m_total_announcements is a meaningful metric in limiting the orphanage.

    My understanding is that m_total_orphan_usage exists to limit memory usage, and m_total_announcements to limit CPU usage - but why the number of announcements instead of number of orphans? Why would it make the situation any less DoSy if we remove an announcer but keep the orphan? Since we only assign the tx to one peer’s workset after 7426afbe62414fa575f91b4f8d3ea63bcc653e8b, more announcers for the same number of orphans doesn’t really mean any additional work.

  43. glozow commented at 10:49 pm on February 11, 2025: member

    My understanding is that m_total_orphan_usage exists to limit memory usage, and m_total_announcements to limit CPU usage - but why the number of announcements instead of number of orphans?

    Yep, to limit CPU usage. The complexity of eviction for example is bounded by the total number of announcements: in the worst case, each orphan has many announcers and the MaybeTrimOrphans loop first removes announcements until each orphan just has 1 left, and then finally can remove transactions. See comment above declaration, “The loop can run a maximum of m_max_global_announcement times”

    Why would it make the situation any less DoSy if we remove an announcer but keep the orphan?

    Perhaps I should have stated this in the OP more explicitly, but a major motivation for this eviction strategy is to prevent any peer from being able to evict any announcements of another peer, hence the per-peer limits. If we changed the eviction code to remove orphans wholesale instead of just announcements, we’d have a similar situation to today’s: an attacker can cause churn of an honest orphan by announcing it along with a lot of other orphans.

    So evicting announcements instead of orphans isn’t less DoSy, but it does make the orphanage less churnable.

  44. glozow force-pushed on Feb 12, 2025
  45. glozow force-pushed on Feb 12, 2025
  46. in test/functional/p2p_orphan_handling.py:753 in 4e17767f4b outdated
    748+            peer_doser.send_and_ping(msg_tx(large_orphan))
    749+
    750+        self.log.info("Provide the top ancestor. The whole package should be re-evaluated after enough time.")
    751+        peer_normal.send_and_ping(msg_tx(ancestor_package[0]["tx"]))
    752+
    753+        self.wait_until(lambda: node.getmempoolentry(ancestor_package[-1]["txid"])["ancestorcount"] == DEFAULT_ANCESTOR_LIMIT)
    


    instagibbs commented at 2:06 pm on February 12, 2025:
    on second thought, this will probably throw an assertion if the final child isn’t in the mempool yet? Probably need to prepend this clause with ancestor_package[-1]["txid"] in node.getrawmempool()

    glozow commented at 4:00 pm on February 13, 2025:
    ah hm, maybe we don’t need to wait? I guess if you send a ping, you don’t get a pong until all 24 are processed.

    instagibbs commented at 5:04 pm on February 13, 2025:
    I’ll defer to you, but IIUC we’ll do ~1 orphan processing per message processing step, so it might take a bit more to process all 24 from orphanage? Alternatively we could query the first tx and wait until the descendant count hits DEFAULT_ANCESTOR_LIMIT

    glozow commented at 12:52 pm on February 14, 2025:
    Hold on, I don’t see why it’d throw an assertion? ancestor_package[-1] is the last child right? Added some more comments but didn’t change the code
  47. in test/functional/p2p_orphan_handling.py:68 in 889afbadb4 outdated
    60-            self.nodes[0].bumpmocktime(LONG_TIME_SKIP)
    61             # Check that mempool and orphanage have been cleared
    62             self.wait_until(lambda: len(self.nodes[0].getorphantxs()) == 0)
    63             assert_equal(0, len(self.nodes[0].getrawmempool()))
    64+
    65+            self.restart_node(0, extra_args=["-persistmempool=0"])
    


    instagibbs commented at 2:20 pm on February 12, 2025:

    889afbadb417e9422c7c06fd074981fa62045568

    Since we’re restarting does this wipe the mocktime on the node? Doesn’t seem to affect timings, but I think it’s easier to think about the test this way.

    note that if you set it here, you also need to setmocktime any other time you restart as well


    glozow commented at 12:53 pm on February 14, 2025:
    nice, thanks - added a setmocktime() after all the restarts
  48. in src/txorphanage.h:179 in 8520f1e493 outdated
    169@@ -138,10 +170,27 @@ class TxOrphanage {
    170          * m_total_orphan_size. If a peer is removed as an announcer, even if the orphan still
    171          * remains in the orphanage, this number will be decremented. */
    172         int64_t m_total_usage{0};
    173+
    174+        /** Orphan transactions in vector for quick random eviction */
    175+        std::vector<OrphanMap::iterator> m_iter_list;
    


    sipa commented at 2:21 pm on February 12, 2025:

    In commit “[txorphanage] when full, evict from the DoSiest peers first”

    I think it would be helpful to add this variable, and the testing thereof, in a separate commit from the actual eviction changes.


    glozow commented at 12:53 pm on February 14, 2025:
    Added a commit before that one, just adding the list and sanity checking
  49. in src/txorphanage.h:191 in 8520f1e493 outdated
    181+         * If the peer is using more than the allowed for either resource, its DoS score is > 1.
    182+         * A peer having a DoS score > 1 does not necessarily mean that something is wrong, since we
    183+         * do not trim unless the orphanage exceeds global limits, but it means that this peer will
    184+         * be selected for trimming sooner. */
    185+        FeeFrac GetDoSScore(unsigned int peer_max_ann, unsigned int peer_max_mem) {
    186+            FeeFrac cpu_score(m_iter_list.size(), peer_max_ann);
    


    sipa commented at 2:26 pm on February 12, 2025:
    Huh, neat, I hadn’t considered using FeeFrac here, but it fits.
  50. in src/txorphanage.cpp:192 in 8520f1e493 outdated
    178+
    179+    // Sort peers that have the highest ratio of DoSiness first
    180+    auto compare_peer = [this](PeerMap::iterator left, PeerMap::iterator right) {
    181+        const auto max_ann{GetPerPeerMaxAnnouncements()};
    182+        const auto max_mem{GetPerPeerMaxUsage()};
    183+        return left->second.GetDoSScore(max_ann, max_mem) < right->second.GetDoSScore(max_ann, max_mem);
    


    sipa commented at 2:31 pm on February 12, 2025:

    In commit “[txorphanage] when full, evict from the DoSiest peers first”

    Using FeeFrac::operator< here, which tiebreaks by biggest denominator first in case the ratios are equal, means that if two peers are equally DoSsy, but one is that due to memory usage, and the other is that due to announcements, the announcements one will be targeted first. That’s probably fine, but perhaps worth documenting.


    glozow commented at 12:53 pm on February 14, 2025:
    good point, added a comment
  51. sipa commented at 2:35 pm on February 12, 2025: member
    Approach ACK
  52. in test/functional/p2p_opportunistic_1p1c.py:506 in acbe37029e outdated
    501+                # Don't sync with ping or wait for responses, because it dramatically increases the
    502+                # runtime of this test.
    503+                peer_doser_batch.send_message(msg_tx(tx))
    504+
    505+        # Something was evicted
    506+        self.wait_until(lambda: len(node.getorphantxs()) < batch_size * num_peers)
    


    instagibbs commented at 2:56 pm on February 12, 2025:

    this will trivially pass (and not actually ensure there was an eviction) unless you let the orphans be processed with a send_and_ping.

    It’s also not true, because no evictions will happen with just 1000 orphans

    Here’s a suggested patchset which has this subtest run in ~2m30s and actually causes evictions with default parameters: https://github.com/instagibbs/bitcoin/commit/fedea4a17b7fc4c442b0ad98b51b85ff93a55beb


    glozow commented at 12:54 pm on February 14, 2025:
    Thanks! Added. It does take a long time but that’s the only way to get evictions due to announcements in a functional test.

    glozow commented at 2:07 pm on February 14, 2025:

    Hm, one of the CI failures is a timeout for this (the other is a wallet thing). Perhaps it takes a bit too long? A few ideas:

    • Change these to batches of 200 * 10 + 101 * 10 and just do the wait_until once to make the test faster.
    • Use -maxorphantxs to reduce the limit.
    • Wait after each orphan is sent. This makes the overall test a lot longer, but makes it less likely we’ll hit the timeout.
    • Keep a lower count of txns and settle for a test that fails on master but not on this PR.

    instagibbs commented at 2:09 pm on February 14, 2025:

    Too bad :(

    edit: my guess is we’ll need to lower max orphan count to make CI runs happy


    glozow commented at 5:54 pm on February 14, 2025:
    Hm, I don’t want to manually change -maxorphantxs. But hey, this is the idea for introducing the test prior to increasing the default to 3000. At that commit, the orphanage doesn’t go past 100, so it’s definitely doing evictions even with just 1010 + 101 orphans.

    glozow commented at 6:41 pm on February 14, 2025:
    Ok, I’ve made it 101 * 30 + 101 = 3131 total. I think the wait_until for the 1p1c orphan to be in orphanage kind of achieves what we want (i.e. evictions for each of the previous orphans have already been calculated) even though we don’t explicitly wait for each peer. this is not true anymore

    glozow commented at 2:05 pm on February 19, 2025:

    fwiw, here’s what I landed on for test_orphanage_dos_many: (1) send 51 of the same orphan from 60 peers (51 orphans, 3060 announcements). sync_with_ping and wait until at least one of the orphans is in orphanage (2) send the 1p1c, which is a small orphan (not 100kvb because prior to increasing -maxorphantxs, this peer’s memory DoS score could be higher than the other peers’ CPU DoS scores and get the tx evicted. Better to compare apples to apples). (3) send 51 unique orphans from 40 peers (2040 orphans, 2040 announcements). sync_with_ping and wait until we have at least 1 of the orphans from each peer. This gives us 2091 orphans, 5100 announcements total from the DoSy peers, plus 1 normal from the 1p1c.

    Before we increase -maxorphantxs, there are evictions during (3). After we increase -maxorphantxs, there are evictions in (1) and (3). The reason I’m doing the shared orphans first is so that we get to maximum announcements more quickly, and we can expect evictions during both rounds.

  53. in test/functional/p2p_opportunistic_1p1c.py:540 in acbe37029e outdated
    535+            peer_doser_shared = node.add_p2p_connection(P2PInterface())
    536+            for orphan in shared_orphans:
    537+                peer_doser_shared.send_message(msg_tx(orphan))
    538+
    539+        # Something was evicted; the orphanage does not contain all DoS orphans + the 1p1c child
    540+        self.wait_until(lambda: len(node.getorphantxs()) < batch_size * num_peers + len(shared_orphans) + 1)
    


    instagibbs commented at 3:33 pm on February 12, 2025:
    this is a buggy assertion because no evictions will be happening, you might just front-run validation and slip through on your local machine. see https://github.com/instagibbs/bitcoin/commit/fedea4a17b7fc4c442b0ad98b51b85ff93a55beb for what I think would be a valid assertion

    glozow commented at 12:55 pm on February 14, 2025:
    Added thanks!
  54. instagibbs commented at 3:34 pm on February 12, 2025: member
    combing through tests a bit, think I spotted the CI failure cause
  55. glozow added this to the milestone 29.0 on Feb 12, 2025
  56. glozow requested review from sr-gi on Feb 13, 2025
  57. glozow requested review from stickies-v on Feb 13, 2025
  58. glozow force-pushed on Feb 14, 2025
  59. DrahtBot removed the label CI failed on Feb 14, 2025
  60. glozow force-pushed on Feb 14, 2025
  61. glozow force-pushed on Feb 14, 2025
  62. DrahtBot added the label CI failed on Feb 14, 2025
  63. in src/test/fuzz/txorphan_protected.cpp:46 in a704b8a959 outdated
    41+
    42+    // We have NUM_PEERS, of which Peer==0 is the "honest" one
    43+    // who will never exceed their reserved weight of announcement
    44+    // count, and should therefore never be evicted.
    45+    const unsigned int NUM_PEERS = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 125);
    46+    const unsigned int global_announcement_limit = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(NUM_PEERS, 1'000);
    


    instagibbs commented at 7:04 pm on February 14, 2025:
    Let’s set the upper range to DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS?

    glozow commented at 7:23 pm on February 14, 2025:
    done
  64. in src/test/fuzz/txorphan_protected.cpp:50 in a704b8a959 outdated
    45+    const unsigned int NUM_PEERS = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 125);
    46+    const unsigned int global_announcement_limit = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(NUM_PEERS, 1'000);
    47+    // This must match announcement limit (or exceed) otherwise "honest" peer can be evicted
    48+    const unsigned int global_tx_limit = global_announcement_limit;
    49+    const unsigned int per_peer_announcements = global_announcement_limit / NUM_PEERS;
    50+    const unsigned int per_peer_weight_reservation = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 4'000);
    


    instagibbs commented at 7:05 pm on February 14, 2025:
    set the upper range to DEFAULT_ANCESTOR_SIZE_LIMIT_KVB * 4000? I see no reason not to cover the full range (we could also increase the num_outs range to make larger ranges hit, based on a bool?)

    glozow commented at 7:24 pm on February 14, 2025:
    I made it DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER * 10
  65. glozow force-pushed on Feb 14, 2025
  66. glozow force-pushed on Feb 14, 2025
  67. in src/test/orphanage_tests.cpp:252 in 064af55a0a outdated
    247+    // Force an eviction. Note that no limiting has happened yet at this point (we haven't called
    248+    // LimitOrphans yet) so it may be oversize and LimitOrphans may evict more than 1 transaction.
    249+    // All evictions will be from the dos_peer's transactions.
    250+    const auto prev_count = orphanage.Size();
    251+    orphanage.LimitOrphans(prev_count - 1, det_rand);
    252+    BOOST_CHECK(orphanage.Size() <= prev_count - 1);
    


    mzumsande commented at 9:01 pm on February 14, 2025:
    This seems very cautious. The test first adds 150 txns, then another DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS (so there is a 1 to 1 relation between announcements and txns). If all txns are distinct, this should be enough to assert that Size() should shrink by 150 compared to prev_count, not just 1.

    glozow commented at 8:16 pm on February 18, 2025:
    changed to be asserting exact counts πŸ‘
  68. in test/functional/p2p_opportunistic_1p1c.py:478 in 793c80be58 outdated
    473+        self.log.info("Send another round of very large orphans from a DoSy peer")
    474+        for large_orphan in large_orphans[60:]:
    475+            peer_doser.send_and_ping(msg_tx(large_orphan))
    476+
    477+        # Something was evicted; the orphanage does not contain all large orphans + the 1p1c child
    478+        self.wait_until(lambda: len(node.getorphantxs()) < len(large_orphans) + 1)
    


    mzumsande commented at 9:26 pm on February 14, 2025:
    the numbers don’t add up for me: the test creates 100 large orphans, sends 20, sends 1 small tx, then sends another 40 large orphans, and finally asserts that there are fewer than 101 entries in the orphanage. This would also be true without any eviction.

    glozow commented at 8:15 pm on February 18, 2025:
    thanks, fixed
  69. DrahtBot removed the label CI failed on Feb 14, 2025
  70. in src/txorphanage.cpp:100 in 944f61e6d5 outdated
     96         auto peer_it = m_peer_orphanage_info.find(peer);
     97         if (Assume(peer_it != m_peer_orphanage_info.end())) {
     98             peer_it->second.m_total_usage -= tx_size;
     99+
    100+            auto& orphan_list = peer_it->second.m_iter_list;
    101+            size_t old_pos = std::distance(orphan_list.begin(), std::find(orphan_list.begin(), orphan_list.end(), it));
    


    sipa commented at 2:32 pm on February 15, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    The cost of this std::distance may be O(num_orphans_per_peer), and the peer : it->second.announcers loop around can run up to O(num_announcers_per_tx). However, since the sum of all orphan_list lengths is equal to the total number of announcements, the overall cost of EraseTx is bounded by O(total_announcements). In both MaybeExpireOrphan and EraseForBlock, the function EraseTx may be invoked once per orphan, so I think this may actually mean O(total_orphans * total_announcements), which may be millions, which could mean a concerning amount of time. Something similar may apply to the call in MaybeTrimOrphans, but it’s a bit more complicated to analyse. Does that sound right?

    One possibility to reduce this may be to batch the removals. Replace EraseTx with a function that takes a set of wtxids to remove, and just loops over all peers’ orphan_lists once, removing anything that’s in the set. That would reduce the cost to just O(total_announcements). However, it would mean always iterating over all announcements whenever an orphan is erased.

    Alternatively, the iter_lists could be replaced with sets for faster removal, but that would increase the cost of random evictions in it from O(1) to O(n). That might not actually be an improvement for MaybeTrimOrphans.

    I think the proper solution is replacing the data structures that together encode the announcement sets (announcers, m_iter_list, and optionally also m_work_set) with a single global boost::multiindex, with hashed by-wtxid index, and a ranked by-(peer,wtxid) index (which allows for fast random access).


    instagibbs commented at 4:21 pm on February 15, 2025:

    Added a few more benchmarks to get an idea of what it could look like with current PR: https://github.com/instagibbs/bitcoin/commit/ba2e3e339cafdf1b38742b2c288a18dd32c63db3

    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|        7,012,786.00 |              142.60 |    0.9% |      0.08 | `OrphanageEvictionBlockManyPeers`
    3|       27,505,341.00 |               36.36 |    0.8% |      0.30 | `OrphanageEvictionBlockOnePeer`
    4|       12,507,729.00 |               79.95 |    0.6% |      0.14 | `OrphanageEvictionManyWithManyPeers`
    5|       26,721,356.00 |               37.42 |    0.4% |      0.29 | `OrphanageEvictionManyWithOnePeer`
    6|        7,262,273.00 |              137.70 |    3.3% |      0.08 | `OrphanageEvictionPeerMany`
    7|       22,306,678.00 |               44.83 |    0.7% |      0.25 | `OrphanageEvictionPeerOne`
    

    Added EraseForBlock, EraseForPeer(in a loop), and parameterized number of peers. Looks like the std::distance work is causing most of the time since things are slower with a single peer?


    sipa commented at 4:35 pm on February 15, 2025:

    @instagibbs I believe the worst case when max_orphans == max_announcements is to have exactly one peer, and then erasing the transactions in the reverse order they appear in m_iter_list.

    That would cost n^2/2 steps in std::distance.

    When max_announcements is larger than max_orphans, the worst case is having max_announcements / max_orphans peers, and every transaction be announced by all, I think.


    instagibbs commented at 4:44 pm on February 15, 2025:

    I believe the worst case when max_orphans == max_announcements is to have exactly one peer, and then erasing the transactions in reverse order they appear in _m_list_iters.

    I swapped the order of the block txns to force it to walk the whole list for the single peer; it's about 10% slower.


    glozow commented at 6:22 pm on February 15, 2025:

    Another possible solution: replace the set<NodeId> announcers in OrphanTxBase with a std::map<NodeId, size_t> announcers, where the value is the orphan's position in the PeerOrphanInfo::m_iter_list. Previously, we had a list_pos that tracked the orphan's location in m_orphan_list. That removes the need to do std::distance.

    On the whole though, I agree a multiindex is probably the most natural data structure for orphanage.
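
    A minimal, self-contained sketch of that idea with illustrative names (not the PR's exact code): once the announcers map stores the position in the per-peer list, removing one announcement is a swap-with-back plus a position update for the moved entry, with no linear scan:

    #include <cassert>
    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    using NodeId = int64_t;

    struct Orphan {
        // value = this orphan's index in the announcing peer's iter list
        std::map<NodeId, size_t> announcers;
    };

    struct PeerOrphanInfo {
        std::vector<Orphan*> m_iter_list; // the real code holds OrphanMap::iterator
    };

    // Remove `orphan` from `peer`'s iter list in O(1) (plus map lookups).
    void EraseAnnouncement(PeerOrphanInfo& peer_info, Orphan& orphan, NodeId peer)
    {
        const size_t pos{orphan.announcers.at(peer)};
        auto& list{peer_info.m_iter_list};
        assert(pos < list.size() && list.at(pos) == &orphan);
        if (pos + 1 != list.size()) {
            // Move the last entry into the vacated slot and fix its stored position.
            std::swap(list[pos], list.back());
            list[pos]->announcers.at(peer) = pos;
        }
        list.pop_back();
        orphan.announcers.erase(peer);
    }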


    sipa commented at 6:32 pm on February 15, 2025:
    @glozow That works, I believe.

    glozow commented at 10:15 pm on February 18, 2025:

    replace set announcers in OrphanTxBase with a std::map<NodeId, size_t> announcers, where the value is the orphan’s position in the PeerOrphanInfo::m_iter_list

    Went ahead with this solution.

  71. [txorphanage] use Assert instead of Assume in SanityCheck d231a636a1
  72. [bench] TxOrphanage::EraseForBlock 032b623753
  73. in test/functional/p2p_orphan_handling.py:634 in 212e1ea50c outdated
    648-        assert_equal(len(orphanage), DEFAULT_MAX_ORPHAN_TRANSACTIONS)
    649-
    650-        self.log.info("Clearing the orphanage")
    651-        for index, parent_orphan in enumerate(parent_orphans):
    652-            peer_1.send_and_ping(msg_tx(parent_orphan))
    653-        self.wait_until(lambda: len(node.getorphantxs()) == 0)
    


    kevkevinpal commented at 3:25 am on February 16, 2025:

    Instead of removing this test, we can keep it if we restart the node with the previous max orphan amount:

    0self.restart_node(0, extra_args=["-maxorphantx=" + str(DEFAULT_MAX_ORPHAN_TRANSACTIONS)])
    

    We can probably also move DEFAULT_MAX_ORPHAN_TRANSACTIONS into test_max_orphan_amount and rename it to max_orphan_amount, since this isn't the default max orphan amount anymore.

    If we still don't want this test, we can remove DEFAULT_MAX_ORPHAN_TRANSACTIONS, since it is only used in this test.


    glozow commented at 8:15 pm on February 18, 2025:
    thanks, removed DEFAULT_MAX_ORPHAN_TRANSACTIONS
  74. glozow force-pushed on Feb 18, 2025
  75. glozow force-pushed on Feb 18, 2025
  76. in src/txorphanage.cpp:41 in 19c77223cd outdated
    35@@ -36,8 +36,12 @@ bool TxOrphanage::AddTx(const CTransactionRef& tx, NodeId peer)
    36         return false;
    37     }
    38 
    39-    auto ret = m_orphans.emplace(wtxid, OrphanTx{{tx, {peer}, Now<NodeSeconds>() + ORPHAN_TX_EXPIRE_TIME}, m_orphan_list.size()});
    40+    auto& orphan_list = m_peer_orphanage_info.try_emplace(peer).first->second.m_iter_list;
    41+    std::map<NodeId, size_t> announcer_list_pos{{peer, orphan_list.size()}};
    42+    auto ret = m_orphans.emplace(wtxid, OrphanTx{{tx, announcer_list_pos, Now<NodeSeconds>() + ORPHAN_TX_EXPIRE_TIME}, m_orphan_list.size()});
    


    sipa commented at 2:04 pm on February 18, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    Use tx, std::move(announcer_list_pos), ... to avoid an allocation.


    glozow commented at 8:58 pm on February 18, 2025:
    thanks, done
  77. in src/txorphanage.h:158 in 19c77223cd outdated
    153+        /** Orphan transactions announced by this peer. */
    154+        std::vector<OrphanMap::iterator> m_iter_list;
    155+
    156+        /** Remove the element at list_pos in m_iter_list in O(1) time by swapping the last element
    157+         * with the one at list_pos and popping the back if there are multiple elements. Returns the
    158+         * swapped element, if applicable, so that the caller can update its list_pos.
    


    sipa commented at 2:18 pm on February 18, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    Would it be possible to do the list_pos updating here, without needing to return an iterator and push that responsibility to the caller? It would mean RemoveIterAt would need to know which peer it's operating on, so it's perhaps more appropriate to have it as a TxOrphanage member function rather than a PeerOrphanInfo member function.


    glozow commented at 8:58 pm on February 18, 2025:
    good point, I’ve made it a TxOrphanage method now
  78. in src/txorphanage.cpp:112 in 19c77223cd outdated
    110@@ -104,6 +111,7 @@ int TxOrphanage::EraseTx(const Wtxid& wtxid)
    111         m_orphan_list[old_pos] = it_last;
    112         it_last->second.list_pos = old_pos;
    


    sipa commented at 2:26 pm on February 18, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    I think this line, and the 7 lines before it, can be replaced with RemoveIterAt(it->second.list_pos), especially if it can be changed to do the list_pos updating internally?

    Not very important as the code disappears in the next commit, but would make it more obviously correct.


    glozow commented at 8:45 pm on February 18, 2025:
    Maybe I’m misunderstanding, but I don’t think we can use RemoveIterAt (which is for updating the peer list) for this (which is the global TxOrphanage::m_orphan_list)?

    sipa commented at 8:48 pm on February 18, 2025:
    Oh of course; it just looked very similar.
  79. in src/bench/txorphanage.cpp:102 in 2bffbd5c9d outdated
     96@@ -97,4 +97,105 @@ static void OrphanageEraseForBlockSinglePeer(benchmark::Bench& bench)
     97     });
     98 }
     99 
    100+static void OrphanageEvictionManyPeers(benchmark::Bench& bench)
    


    sipa commented at 3:21 pm on February 18, 2025:

    In commit “[bench] TxOrphanage::LimitOrphans”

    Would it make sense to introduce this benchmark earlier (and the other ones below), so we can see what effect the previous commit has on it?


    glozow commented at 8:58 pm on February 18, 2025:
    yes, moved it up
  80. DrahtBot added the label CI failed on Feb 18, 2025
  81. [bench] TxOrphanage::LimitOrphans 123c50af7c
  82. [refactor] split LimitOrphans into MaybeExpire and MaybeTrim f579dd9184
  83. [txorphanage] add per-peer iterator list and announcements accounting
    Unused for now. Tracks the number of announcements per peer.
    56f05a73d8
  84. [txorphanage] when full, evict from the DoSiest peers first
    The max_orphans param is retained to preserve test behavior.
    DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS and DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER are
    chosen to allow 1 maximally-sized ancestor set to be resolved with each
    peer:
    * DEFAULT_MAX_PEER_CONNECTIONS x (DEFAULT_ANCESTOR_LIMIT - 1) = 3000
    * DEFAULT_ANCESTOR_SIZE_LIMIT = 404'000.
    6cbe944539
  85. [unit test] orphanage evicts dosy peers first 09ed45935b
  86. [functional test] 1p1c is accepted in the presence of orphanage DoSers
    Co-authored-by: Greg Sanders <gsanders87@gmail.com>
    40cef196dc
  87. [test] restart instead of bumpmocktime between p2p_orphan_handling subtests
    If we want to restart at all during the tests, we can't have future timestamps.
    6d2e69f34f
  88. [functional test] each peer can relay 1 ancestor package aeba8b7ac8
  89. [txorphanage] limit EraseForBlock iterations and use set instead of vec
    All transactions in the orphanage could be spending the same inputs,
    which means the vector could have a lot of duplicates.
    021fc20f9c
  90. [txorphanage] when orphans are erased, delete them from worksets c1d9651804
  91. [test] remove orphanage debug log assertions in p2p_invalid_tx.py
    Asserting debug logs does not actually test anything; test using
    getorphantxs instead.
    Also remove the test that exactly 101 orphans leads to eviction.
    f624138882
  92. Increase default -maxorphantx to 3000
    Now that TxOrphanage internally enforces more robust memory and
    computation bounds, it is safe to increase this number. Historical data
    shows that we frequently bust the 100 limit, leading to us evicting and
    redownloading the same orphans.
    6d707430e5
  93. [fuzz] Add orphan protection harness 8e3b80924e
  94. glozow force-pushed on Feb 18, 2025
  95. DrahtBot removed the label CI failed on Feb 18, 2025
  96. in src/bench/txorphanage.cpp:1 in 032b623753 outdated
    0@@ -0,0 +1,100 @@
    1+// Copyright (c) 2011-2022 The Bitcoin Core developers
    


    sipa commented at 1:12 am on February 19, 2025:

    In commit “[bench] TxOrphanage::EraseForBlock”

    Better not to have years listed than outdated/wrong ones.

  97. in src/txorphanage.h:184 in 56f05a73d8 outdated
    179@@ -168,6 +180,10 @@ class TxOrphanage {
    180 
    181     /** If there are more than max_orphans total orphans, evict randomly until that is no longer the case. */
    182     unsigned int MaybeTrimOrphans(unsigned int max_orphans, FastRandomContext& rng);
    183+
    184+    /** Remove the element at list_pos in m_iter_list in O(1) time by swapping the last element
    


    sipa commented at 1:18 am on February 19, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    Nit: I may have instigated this, but given the m_peer_orphanage_info.find(peer) call, it's really O(log n) in the number of peers, not O(1).

  98. sipa commented at 2:09 pm on February 19, 2025: member

    The changes here have surprisingly little effect on the included benchmarks:

    • [bench] TxOrphanage::LimitOrphans

      |         ns/op |       op/s | err% | total | benchmark
      |--------------:|-----------:|-----:|------:|:----------
      | 68,742,718.06 |      14.55 | 0.2% | 10.59 | `OrphanageEraseForBlockSinglePeer`
      |      8,857.40 | 112,899.93 | 0.3% | 10.86 | `OrphanageEvictionManyPeers`
      |  1,544,919.67 |     647.28 | 0.2% | 10.97 | `OrphanageWorksetManyPeers`
      | 11,444,367.17 |      87.38 | 0.1% | 11.00 | `OrphanageWorksetSinglePeer`

    • [txorphanage] when full, evict from the DoSiest peers first

      |         ns/op |       op/s | err% | total | benchmark
      |--------------:|-----------:|-----:|------:|:----------
      | 65,643,925.00 |      15.23 | 1.7% | 10.32 | `OrphanageEraseForBlockSinglePeer`
      |      8,657.19 | 115,510.91 | 0.1% | 10.83 | `OrphanageEvictionManyPeers`
      |  1,472,007.25 |     679.34 | 0.6% | 10.96 | `OrphanageWorksetManyPeers`
      | 10,034,423.77 |      99.66 | 0.3% | 11.05 | `OrphanageWorksetSinglePeer`

    • [txorphanage] limit EraseForBlock iterations and use set instead of vec

      |         ns/op |       op/s | err% | total | benchmark
      |--------------:|-----------:|-----:|------:|:----------
      | 59,924,050.19 |      16.69 | 1.1% | 10.98 | `OrphanageEraseForBlockSinglePeer`
      |      8,684.76 | 115,144.26 | 0.3% | 10.80 | `OrphanageEvictionManyPeers`
      |  1,500,899.95 |     666.27 | 0.2% | 11.10 | `OrphanageWorksetManyPeers`
      | 10,619,315.89 |      94.17 | 0.3% | 10.99 | `OrphanageWorksetSinglePeer`

    • [txorphanage] when orphans are erased, delete them from worksets

      |         ns/op |       op/s | err% | total | benchmark
      |--------------:|-----------:|-----:|------:|:----------
      | 59,404,692.72 |      16.83 | 1.6% | 10.70 | `OrphanageEraseForBlockSinglePeer`
      |      8,824.15 | 113,325.33 | 0.1% | 10.82 | `OrphanageEvictionManyPeers`
      |  1,481,787.75 |     674.86 | 0.2% | 10.96 | `OrphanageWorksetManyPeers`
      | 10,452,273.40 |      95.67 | 0.5% | 11.10 | `OrphanageWorksetSinglePeer`

    • Increase default -maxorphantx to 3000

      |         ns/op |       op/s | err% | total | benchmark
      |--------------:|-----------:|-----:|------:|:----------
      | 59,868,390.44 |      16.70 | 1.8% | 11.29 | `OrphanageEraseForBlockSinglePeer`
      |      9,275.98 | 107,805.36 | 0.3% | 10.97 | `OrphanageEvictionManyPeers`
      |  1,538,763.63 |     649.87 | 0.2% | 11.01 | `OrphanageWorksetManyPeers`
      | 10,657,836.93 |      93.83 | 0.4% | 11.03 | `OrphanageWorksetSinglePeer`
  99. glozow commented at 2:25 pm on February 19, 2025: member

    The changes here have surprisingly little effect on the included benchmarks:

    My expectation was that the time would go up with these changes, but hopefully not by too much.

    I am pretty surprised that “[txorphanage] when full, evict from the DoSiest peers first” doesn’t make OrphanageEvictionManyPeers slower, but I guess it’s because there are only 24 transactions.

  100. glozow commented at 4:43 pm on February 19, 2025: member

    Will add something like this as comments as well but here’s the thinking around these benches:

    • The EraseForBlock bench exists to test the worst case EraseForBlock time, which is every orphan conflicting with the block as much as possible (see the sketch after this list). That's (very roughly) ~2000 inputs per tx (within 400KWu), times the max number of orphans.
      • We’re kind of cheating in “limit EraseForBlock iterations and use set instead of vec” which forces it to stop at 100 orphans, even if the max number of orphans is increased.
      • There is also memory, which is what changing to a set helps with. Due to the way the loops work, we would have 2000 * (max number of orphans) iterators in the vector.
    • The workset benches exist to test the worst case AddChildrenToWorkSet time. Similarly, worst case is when all orphans spend an output of the transaction. The complexity can also increase when there are multiple announcers for the orphans.
    • The eviction bench exists to test the worst case LimitOrphans time. Note it artificially uses max_orphans=0 even though that doesn’t happen in real life. The worst case is when every orphan has been announced by every peer, and we need to remove all announcers one by one before we actually erase anything. This would have been pretty bad before we set a global announcement limit.

    So for all of these benches, we primarily want them to not blow up when we increase the DEFAULT_MAX_ORPHAN_TRANSACTIONS.
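
    For illustration, a rough sketch of how that EraseForBlock worst case could be set up in a bench (illustrative names and constants, not the actual bench file in this PR; it assumes the AddTx/EraseForBlock interfaces and the bench conventions as they exist on master): one block transaction spending ~2000 outpoints of a shared parent, and every orphan spending those same outpoints, so each block input matches every orphan:

    #include <bench/bench.h>
    #include <primitives/block.h>
    #include <primitives/transaction.h>
    #include <txorphanage.h>

    #include <vector>

    static void OrphanageEraseForBlockWorstCaseSketch(benchmark::Bench& bench)
    {
        constexpr unsigned NUM_ORPHANS{100};     // illustrative constants
        constexpr unsigned INPUTS_PER_TX{2000};  // ~41 bytes per bare input keeps each tx under 400KWu

        // A "parent" whose outputs both the block transaction and every orphan spend.
        CMutableTransaction shared_parent;
        shared_parent.vout.resize(INPUTS_PER_TX);
        const auto parent_txid{MakeTransactionRef(shared_parent)->GetHash()};

        // Block transaction spending all of the parent's outputs.
        CMutableTransaction block_tx;
        block_tx.vin.resize(INPUTS_PER_TX);
        for (unsigned j{0}; j < INPUTS_PER_TX; ++j) block_tx.vin[j].prevout = COutPoint{parent_txid, j};
        CBlock block;
        block.vtx.push_back(MakeTransactionRef(block_tx));

        // Orphans spending the same outpoints, so every block input matches every orphan.
        std::vector<CTransactionRef> orphans;
        orphans.reserve(NUM_ORPHANS);
        for (unsigned i{0}; i < NUM_ORPHANS; ++i) {
            CMutableTransaction orphan;
            orphan.vin.resize(INPUTS_PER_TX);
            for (unsigned j{0}; j < INPUTS_PER_TX; ++j) orphan.vin[j].prevout = COutPoint{parent_txid, j};
            orphan.vout.resize(1);
            orphan.vout[0].nValue = i; // make each wtxid distinct
            orphans.push_back(MakeTransactionRef(orphan));
        }

        bench.run([&] {
            TxOrphanage orphanage;
            for (const auto& tx : orphans) orphanage.AddTx(tx, /*peer=*/0);
            orphanage.EraseForBlock(block);
        });
    }
    BENCHMARK(OrphanageEraseForBlockWorstCaseSketch, benchmark::PriorityLevel::HIGH);

    The timed body re-adds the orphans each iteration since EraseForBlock removes them; the point is only that the erase loop has to touch every (block input, orphan) pair.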

  101. glozow removed this from the milestone 29.0 on Feb 20, 2025
  102. glozow added this to the milestone 30.0 on Feb 20, 2025
  103. sipa commented at 3:44 pm on February 20, 2025: member
    Sad to see this slip, but given the amount of changes and discoveries that necessitated them even in just the last week, it’s probably the right decision.
  104. glozow commented at 3:52 pm on February 20, 2025: member
    I’m sad too! Seeing the stats from https://delvingbitcoin.org/t/stats-on-orphanage-overflows/1421 made this more pressing in my opinion, but it’s not a regression. I think we can still try to consider small, obviously safe changes for v29, but this feels too big. I don’t want to risk creating new DoS problems.
  105. in src/txorphanage.cpp:339 in 021fc20f9c outdated
    332@@ -332,16 +333,19 @@ void TxOrphanage::EraseForBlock(const CBlock& block)
    333             if (itByPrev == m_outpoint_to_orphan_it.end()) continue;
    334             for (auto mi = itByPrev->second.begin(); mi != itByPrev->second.end(); ++mi) {
    335                 const CTransaction& orphanTx = *(*mi)->second.tx;
    336-                vOrphanErase.push_back(orphanTx.GetWitnessHash());
    337+                wtxids_to_erase.insert(orphanTx.GetWitnessHash());
    338+                // Stop to avoid doing too much work. If there are more orphans to erase, rely on
    339+                // expiration and evictions to clean up everything eventually.
    340+                if (wtxids_to_erase.size() >= 100) break;
    


    instagibbs commented at 6:47 pm on February 20, 2025:
    Seems like we're breaking out of only one of the 3 nested loops? Is this intended?
  106. in src/txorphanage.cpp:215 in 6cbe944539 outdated
    215+        // ratios.
    216+        std::pop_heap(peer_it_heap.begin(), peer_it_heap.end(), compare_peer);
    217+        auto it_worst_peer = peer_it_heap.back();
    218+        peer_it_heap.pop_back();
    219+
    220+        // Evict a random orphan from this peer.
    


    mzumsande commented at 7:31 pm on February 20, 2025:
    “Remove a random announcement from this peer.” seems better because we only sometimes evict an orphan.
  107. in src/txorphanage.cpp:219 in 6cbe944539 outdated
    219+
    220+        // Evict a random orphan from this peer.
    221+        size_t randompos = rng.randrange(it_worst_peer->second.m_iter_list.size());
    222+        auto it_to_evict = it_worst_peer->second.m_iter_list.at(randompos);
    223+
    224+        // Only erase this peer as an announcer, unless it is the only announcer. Otherwise peers
    


    mzumsande commented at 7:35 pm on February 20, 2025:
    This is a bit ambiguous: I first read it as "don't erase the peer as an announcer if it was the only one", but what is meant is "erase the peer as an announcer, and also remove the orphan if the peer was the only announcer", so maybe reword it.
  108. in src/txorphanage.h:243 in 6cbe944539 outdated
    242+    int64_t GetPerPeerMaxUsage() const {
    243+        return m_reserved_weight_per_peer;
    244+    }
    245+
    246+    int64_t GetGlobalMaxUsage() const {
    247+        return std::max<int64_t>(int64_t(m_peer_orphanage_info.size()) * m_reserved_weight_per_peer, 1);
    


    mzumsande commented at 7:45 pm on February 20, 2025:

    If I understand it correctly, the current logic is that we assign additional weight for all peers that have an entry in m_peer_orphanage_info, which means we don’t assign weight for peers not participating in tx relay or that do but never sent us an orphan before. But once they sent us an orphan, that weight will stay reserved until they disconnect (and can be used by other peers too), even if they never send us another orphan.

    This seems like a middle-ground between assigning each peer a fixed DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER share which may not be exceeded, and assigning a pool of DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER for each peer (whether that peer has sent us an orphan before or not) - was that the purpose of choosing to do it this way?

  109. in src/txorphanage.h:42 in 6cbe944539 outdated
    33@@ -27,7 +34,26 @@ static constexpr auto ORPHAN_TX_EXPIRE_INTERVAL{5min};
    34  * Not thread-safe. Requires external synchronization.
    35  */
    36 class TxOrphanage {
    37+    /** The usage (weight) reserved for each peer, representing the amount of memory we are willing
    38+     * to allocate for orphanage space. Note that this number is a reservation, not a limit: peers
    39+     * are allowed to exceed this reservation until the global limit is reached, and peers are
    40+     * effectively guaranteed this amount of space. Reservation is per-peer, so the global upper
    41+     * bound on memory usage scales up with more peers. */
    42+    unsigned int m_reserved_weight_per_peer{DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER};
    


    mzumsande commented at 8:30 pm on February 20, 2025:

    One possibility is that an attacker who wants to spam us with orphans can do so via other peers, without providing the orphans themselves:

    1.) Attacker A inv's a parent P to victim V but doesn't answer GETDATA (2 minutes timeout).
    2.) Within these 2 minutes, A sends P and multiple non-conflicting children C to the rest of the network - these are accepted/relayed by all other peers, so will eventually be announced to V by all of its legitimate peers.
    3.) All of V's legitimate peers will announce P and C; V will only request C (because P is in flight) and saves them as orphans / adds all of its peers as announcers.
    4.) V's orphanage reaches its limits. With 125 peers, you'd need ~25 children to reach the max announcement limit of 3000, so you might need to split the children over multiple parents because of the mempool descendant limit.
    5.) V will start evicting announcements randomly from peers, so it may also evict legitimate unrelated orphans.

    Unlike with the status quo where attackers can just spam us with orphans for free, this is not free (P and C need to be valid transactions with a sufficient fee to be relayed), but I couldn’t think of any countermeasure, so maybe it kind of sets a theoretical limit on how certain we can be that legitimate orphans are resolved?


    instagibbs commented at 10:48 pm on February 20, 2025:

    IIUC this is basically a way of slowing down the parent resolution of a target package by 2 minutes, thereby increasing the window of possible random eviction when under real cpfp traffic.

    Without this delay we can still have natural overflow if there are too many (or too large) real packages in flight; the time window here just gets a lot longer.

    Unfortunately we have no way for peers to communicate which orphans they claim are higher-paying (e.g. to preferentially evict those last). Then at least orphan package makers could try to outbid the queue. Maybe something to think about with a future sender-initiated protocol?

    Being more aggressive about timing out inbound peers (and/or txid-relay peers) when they aren't responding to getdatas could also be on the table, at the cost of bandwidth in the average case over a slow link.

    re: (2) you could also just imagine the scenario where the attacker simply relies on natural cpfp traffic and may just get lucky that it gets evicted in the two minutes. This costs the attacker nothing, though the chances of failure are likely much higher?


    mzumsande commented at 4:49 pm on February 21, 2025:

    IIUC this is basically a way of slowing down the parent resolution of a target package by 2 minutes, thereby increasing the window of possible random eviction when under real cpfp traffic.

    Not sure if I understood that right, but I think that would be a different kind of attack where the attacker knows the target package. What I meant was that the attacker crafts spam transactions (P/C) that the rest of the network relays, but that end up only in the victim's orphanage, resulting in eviction of random other orphans (about which the attacker doesn't need to know any specifics) - similar to how an attacker could today just spam orphans to trigger random eviction on master, but no longer for free.

