p2p: improve TxOrphanage denial of service bounds #31829

pull glozow wants to merge 17 commits into bitcoin:master from glozow:2025-01-orphanage-peer-dos changing 22 files +1873 −883
  1. glozow commented at 9:14 pm on February 9, 2025: member

    This PR is part of the orphan resolution project, see #27463.

    This design came from collaboration with sipa - thanks.

    We want to limit the CPU work and memory used by TxOrphanage to avoid denial of service attacks. On master, this is achieved by limiting the number of transactions in this data structure to 100, and the weight of each transaction to 400KWu (the largest standard tx) [0]. We always allow new orphans, but if the addition causes us to exceed 100, we evict one randomly. This is dead simple, but has problems:

    • It makes the orphanage trivially churnable: any one peer can render it useless by spamming us with lots of orphans. It’s possible this is happening: “Looking at data from node alice on 2024-09-14 shows that we’re sometimes removing more than 100k orphans per minute. This feels like someone flooding us with orphans.” [1]
    • Effectively, opportunistic 1p1c is useless in the presence of adversaries: it is opportunistic, pairing a low-feerate tx with a child that happens to be in the orphanage. So if nothing is able to stay in orphanages, we can’t expect 1p1cs to propagate.
    • This number is also often insufficient for the volume of orphans we handle: historical data show that overflows are pretty common, and there are times when “it seems like [the node] forgot about the orphans and re-requested them multiple times.” [1]

    Just jacking up the -maxorphantxs number is not a good enough solution, because it doesn’t solve the churnability problem, and the effective resource bounds scale poorly.

    This PR introduces numbers for {global, per-peer} {memory usage, announcements}, representing resource limits:

    • The (constant) global announcement limit caps the number of unique (wtxid, peer) pairs in the orphanage [2]. This represents a cap on CPU, and does not change with the number of peers we have. Evictions must happen whenever this limit is reached.
    • The (variable) per-peer announcement limit is the global announcement limit divided by the number of peers. Peers are allowed to exceed this limit provided the global announcement limit has not been reached. The per-peer announcement limit decreases with more peers.
    • The (constant) per-peer memory usage reservation is the amount of orphan weight [3] reserved per peer [4]. Reservation means that peers are effectively guaranteed this amount of space. It is not a limit; peers are allowed to exceed their reservation provided the global usage limit is not reached.
    • The (variable) global memory usage limit is the number of peers multiplied by the per-peer reservation [5]. As such, the global memory usage limit scales up with the number of peers we have. Evictions must happen whenever this limit is reached.
    • We introduce a “Peer DoS Score,” the maximum of a peer’s “CPU Score” and “Memory Score.” The CPU score is the ratio of the number of announcements from this peer to the per-peer announcement limit. The memory score is the ratio of the total usage of all orphans announced by this peer to the per-peer usage reservation.

    Eviction changes in a few ways:

    • It is triggered if the global announcement limit or global memory usage limit is exceeded.
    • On each iteration of the loop, instead of selecting a random orphan, we select a peer and delete 1 of its announcements. Specifically, we select the peer with the highest DoS score, which is the maximum of its CPU DoS score (based on announcements) and Memory DoS score (based on tx weight). After the peer has been selected, we evict its oldest announcement (non-reconsiderable sorted before reconsiderable).
    • Instead of evicting orphans, we evict announcements. An orphan stays in the orphanage as long as it has at least 1 announcer. Of course, over several iterations of the loop, we may erase all of an orphan’s announcers, thus erasing the orphan itself. The purpose of this change is to prevent a peer from being able to trigger eviction of another peer’s orphans.
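    The eviction procedure above can be modeled in miniature. This is a hypothetical, simplified sketch, not the PR's actual TxOrphanage code: peer state is reduced to a deque of announcement weights, and the two per-peer ratios are compared by scaling both into a common integer unit:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>

// Illustrative stand-in for per-peer orphanage state.
struct PeerInfo {
    std::deque<int64_t> announcements; // oldest first; values are orphan weights
    int64_t usage{0};                  // sum of announced orphan weights
};

// Evict announcements from the highest-scoring peer until both global limits
// hold. Returns the number of announcements evicted.
int LimitOrphans(std::map<int, PeerInfo>& peers, int64_t max_ann, int64_t reservation)
{
    // The global usage limit scales with the number of peers.
    const int64_t max_usage = std::max<int64_t>((int64_t)peers.size() * reservation, 1);
    const int64_t per_peer_ann = std::max<int64_t>(max_ann / (int64_t)peers.size(), 1);
    int evicted{0};
    while (true) {
        int64_t total_ann{0}, total_usage{0};
        for (const auto& [id, p] : peers) {
            total_ann += (int64_t)p.announcements.size();
            total_usage += p.usage;
        }
        if (total_ann <= max_ann && total_usage <= max_usage) break;
        // DoS score is max(ann/per_peer_ann, usage/reservation); multiplying
        // both ratios by per_peer_ann * reservation keeps the comparison in
        // integers: max(ann * reservation, usage * per_peer_ann).
        auto scaled_score = [&](const PeerInfo& p) {
            return std::max((int64_t)p.announcements.size() * reservation,
                            p.usage * per_peer_ann);
        };
        auto worst = std::max_element(peers.begin(), peers.end(),
            [&](const auto& a, const auto& b) {
                return scaled_score(a.second) < scaled_score(b.second);
            });
        PeerInfo& p = worst->second;
        p.usage -= p.announcements.front();
        p.announcements.pop_front(); // oldest announcement is evicted first
        ++evicted;
    }
    return evicted;
}
```

    In this model, a peer far over its allowance keeps being selected until its score drops below everyone else's, so a well-behaved peer under its reservation is never touched.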

    This PR also:

    • Reimplements TxOrphanage as a single multi-index container.
    • Effectively bounds the number of transactions that can be in a peer’s work set by ensuring it is a subset of the peer’s announcements.
    • Removes the -maxorphantxs config option, as the orphanage no longer limits by unique orphans.

    This means we can receive 1p1c packages in the presence of spammy peers. It also makes the orphanage more useful and increases our download capacity without drastically increasing orphanage resource usage.

    [0]: This means the effective memory limit in orphan weight is 100 * 400KWu = 40MWu
    [1]: https://delvingbitcoin.org/t/stats-on-orphanage-overflows/1421
    [2]: Limit is 3000, which is equivalent to one max size ancestor package (24 transactions can be missing inputs) for each peer (default max connections is 125).
    [3]: Orphan weight is used in place of actual memory usage because something like “one maximally sized standard tx” is easier to reason about than “considering the bytes allocated for vin and vout vectors, it needs to be within N bytes…” etc. We can also consider a different formula to encapsulate more of the memory overhead but still have an interface that is easy to reason about.
    [4]: The limit is 404KWu, which is the maximum size of an ancestor package.
    [5]: With 125 peers, this is 50.5MWu, which is a small increase from the existing limit of 40MWu. While the actual memory usage limit is higher (this number does not include the other memory used by TxOrphanage to store the outpoints map, etc.), this is within the same ballpark as the old limit.
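    The arithmetic in these footnotes checks out directly; the constant names below are illustrative, not the PR's identifiers:

```cpp
#include <cassert>
#include <cstdint>

// Footnote [2]: global announcement limit and default connection count.
constexpr int64_t MAX_ANNOUNCEMENTS{3000};
constexpr int64_t NUM_PEERS{125};
// Footnote [4]: per-peer weight reservation (one max size ancestor package).
constexpr int64_t RESERVED_WEIGHT_PER_PEER{404'000};

// Per-peer announcement allowance: 3000 / 125 = 24, one max size ancestor
// package's worth of missing-input transactions per peer.
constexpr int64_t per_peer_announcements = MAX_ANNOUNCEMENTS / NUM_PEERS;
// Footnote [5]: scaled global usage limit, 125 * 404KWu = 50.5MWu.
constexpr int64_t global_usage_limit = NUM_PEERS * RESERVED_WEIGHT_PER_PEER;
// Footnote [0]: old effective limit, 100 orphans * 400KWu = 40MWu.
constexpr int64_t old_effective_limit = 100 * 400'000;
```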

  2. glozow added the label P2P on Feb 9, 2025
  3. DrahtBot commented at 9:14 pm on February 9, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31829.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Approach ACK sipa

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32631 (refactor: Convert GenTxid to std::variant by marcofleon)
    • #32189 (refactor: Txid type safety (parent PR) by marcofleon)
    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
    • #28690 (build: Introduce internal kernel library by TheCharlatan)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. glozow force-pushed on Feb 9, 2025
  5. DrahtBot added the label CI failed on Feb 9, 2025
  6. DrahtBot commented at 9:20 pm on February 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36925040096

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. glozow force-pushed on Feb 9, 2025
  8. glozow force-pushed on Feb 10, 2025
  9. glozow commented at 1:13 pm on February 10, 2025: member
    Rebased
  10. glozow marked this as ready for review on Feb 10, 2025
  11. DrahtBot removed the label CI failed on Feb 10, 2025
  12. in src/txorphanage.cpp:55 in ef2f44e653 outdated
    35@@ -36,9 +36,10 @@ bool TxOrphanage::AddTx(const CTransactionRef& tx, NodeId peer)
    36         return false;
    37     }
    38 
    


    instagibbs commented at 4:43 pm on February 10, 2025:

    ef2f44e653a4877d4e65fbd5a51ec83ceb96d212

    MAX_GLOBAL_ORPHAN_ANNOUNCEMENTS doesn’t occur in the PR, should be DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS?


    glozow commented at 12:04 pm on February 11, 2025:
    thanks, fixed
  13. in src/txorphanage.h:47 in ef2f44e653 outdated
    42+    unsigned int m_reserved_weight_per_peer{DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER};
    43+
    44+    /** The maximum number of announcements across all peers, representing a computational upper bound,
    45+     * i.e. the maximum number of evictions we might do at a time. There is no per-peer announcement
    46+     * limit until the global limit is reached. Also, this limit is constant regardless of how many
    47+     * peers we have: if we only have 1 peer, this is the number of orphans they may provide. As
    


    instagibbs commented at 4:50 pm on February 10, 2025:
    s/may provide/may provide without risking eviction/?

    glozow commented at 12:06 pm on February 11, 2025:
    changed
  14. in src/txorphanage.h:48 in ef2f44e653 outdated
    43+
    44+    /** The maximum number of announcements across all peers, representing a computational upper bound,
    45+     * i.e. the maximum number of evictions we might do at a time. There is no per-peer announcement
    46+     * limit until the global limit is reached. Also, this limit is constant regardless of how many
    47+     * peers we have: if we only have 1 peer, this is the number of orphans they may provide. As
    48+     * more peers are added, each peer's allocation is reduced. */
    


    instagibbs commented at 4:50 pm on February 10, 2025:
    s/allocation/protected allocation/?

    glozow commented at 12:06 pm on February 11, 2025:
    added
  15. in src/txorphanage.h:148 in ef2f44e653 outdated
    136@@ -111,7 +137,6 @@ class TxOrphanage {
    137 
    138 protected:
    139     struct OrphanTx : public OrphanTxBase {
    


    instagibbs commented at 4:51 pm on February 10, 2025:
    do we think we’ll end up using the derived class in a meaningful way later?

    glozow commented at 4:33 am on February 12, 2025:
    not entirely sure… could be cleaned up

    glozow commented at 5:42 pm on June 4, 2025:
    this is gone now
  16. in src/txorphanage.h:247 in ef2f44e653 outdated
    233+
    234+    unsigned int GetGlobalMaxUsage() const {
    235+        return std::max<unsigned int>(m_peer_orphanage_info.size() * m_reserved_weight_per_peer, 1);
    236+    }
    237+
    238+    unsigned int GetPerPeerMaxAnnouncements() const {
    


    instagibbs commented at 5:02 pm on February 10, 2025:
    note: this is used to “normalize” the one DoS score metric against the other only. If this wasn’t used, trimming would heavily favor announcement-based scores first.

    glozow commented at 4:40 am on February 12, 2025:
    Yes, they’re both meant to be ratios :+1:
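    A made-up numeric example of the normalization point discussed here: peer A has few announcements but far exceeds its memory reservation, while peer B has many announcements yet stays within both allowances. Comparing raw counts would pick B; comparing ratios (here via cross-multiplication) picks A. The helper and the numbers are illustrative only:

```cpp
#include <cassert>
#include <cstdint>

// True iff a_n/a_d > b_n/b_d, compared without floating point.
bool FirstRatioHigher(int64_t a_n, int64_t a_d, int64_t b_n, int64_t b_d)
{
    return a_n * b_d > b_n * a_d;
}
```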
  17. in src/txorphanage.h:196 in ef2f44e653 outdated
    182+            FeeFrac cpu_score(m_iter_list.size(), peer_max_ann);
    183+            FeeFrac mem_score(m_total_usage, peer_max_mem);
    184+            return std::max<FeeFrac>(cpu_score, mem_score);
    185+        }
    186     };
    187     std::map<NodeId, PeerOrphanInfo> m_peer_orphanage_info;
    


    instagibbs commented at 5:11 pm on February 10, 2025:

    re:global memory limits, they won’t be increased until each connected peer offers up their own orphan or announcement (which makes sense)

    would be good to have test/fuzz coverage that there isn’t some bug where this continuously grows, raising the global limits to unsafe levels? Arg in SanityCheck?


    sipa commented at 4:26 am on February 19, 2025:

    Regarding https://github.com/bitcoin/bitcoin/pull/31829/commits/2771b69e461230feb7761b0afff41b44bd3ba34f#r1949554535

    With GetGlobalMaxUsage() computing memory limits on-the-fly, is this comment still relevant?


    instagibbs commented at 4:40 pm on February 20, 2025:

    I guess my worry is that we would somehow add a regression that results in the peer set growing indefinitely in the orphanage, which would grow memory usage substantially. Here’s some suggested coverage which would hopefully uncover such an issue (unless the caller plumb forgets to EraseForPeer):

      0diff --git a/src/test/fuzz/txorphan.cpp b/src/test/fuzz/txorphan.cpp
      1index 9133323449..df0a02d75d 100644
      2--- a/src/test/fuzz/txorphan.cpp
      3+++ b/src/test/fuzz/txorphan.cpp
      4@@ -45,10 +45,15 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
      5         outpoints.emplace_back(Txid::FromUint256(uint256{i}), 0);
      6     }
      7 
      8     CTransactionRef ptx_potential_parent = nullptr;
      9 
     10+    // Peers which have offered orphans (via tx or announcement) and have not subsequently
     11+    // "disconnected" aka called EraseForPeer
     12+    std::set<NodeId> connected_peers;
     13+
     14+
     15     LIMITED_WHILE(outpoints.size() < 200'000 && fuzzed_data_provider.ConsumeBool(), 10 * DEFAULT_MAX_ORPHAN_TRANSACTIONS)
     16     {
     17         // construct transaction
     18         const CTransactionRef tx = [&] {
     19             CMutableTransaction tx_mut;
     20@@ -121,14 +126,16 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
     21 
     22                         if (add_tx) {
     23                             Assert(orphanage.UsageByPeer(peer_id) == tx_weight + total_peer_bytes_start);
     24                             Assert(orphanage.TotalOrphanUsage() == tx_weight + total_bytes_start);
     25                             Assert(tx_weight <= MAX_STANDARD_TX_WEIGHT);
     26+                            connected_peers.insert(peer_id);
     27                         } else {
     28                             // Peer may have been added as an announcer.
     29                             if (orphanage.UsageByPeer(peer_id) == tx_weight + total_peer_bytes_start) {
     30                                 Assert(orphanage.HaveTxFromPeer(wtxid, peer_id));
     31+                                connected_peers.insert(peer_id);
     32                             } else {
     33                                 // Otherwise, there must not be any change to the peer byte count.
     34                                 Assert(orphanage.UsageByPeer(peer_id) == total_peer_bytes_start);
     35                             }
     36 
     37@@ -158,10 +165,11 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
     38                         // Total bytes should not have changed. If peer was added as announcer, byte
     39                         // accounting must have been updated.
     40                         Assert(orphanage.TotalOrphanUsage() == total_bytes_start);
     41                         if (added_announcer) {
     42                             Assert(orphanage.UsageByPeer(peer_id) == tx_weight + total_peer_bytes_start);
     43+                            connected_peers.insert(peer_id);
     44                         } else {
     45                             Assert(orphanage.UsageByPeer(peer_id) == total_peer_bytes_start);
     46                         }
     47                     }
     48                 },
     49@@ -190,10 +198,11 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
     50                         Assert(!have_tx && !have_tx_and_peer && !orphanage.EraseTx(wtxid));
     51                     }
     52                 },
     53                 [&] {
     54                     orphanage.EraseForPeer(peer_id);
     55+                    connected_peers.erase(peer_id); // "DisconnectPeer"
     56                     Assert(!orphanage.HaveTxFromPeer(tx->GetWitnessHash(), peer_id));
     57                     Assert(orphanage.UsageByPeer(peer_id) == 0);
     58                 },
     59                 [&] {
     60                     // test mocktime and expiry
     61@@ -212,7 +221,8 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
     62 
     63         const bool have_tx{orphanage.HaveTx(tx->GetWitnessHash())};
     64         const bool get_tx_nonnull{orphanage.GetTx(tx->GetWitnessHash()) != nullptr};
     65         Assert(have_tx == get_tx_nonnull);
     66     }
     67-    orphanage.SanityCheck();
     68+
     69+    orphanage.SanityCheck(connected_peers.size());
     70 }
     71diff --git a/src/txorphanage.cpp b/src/txorphanage.cpp
     72index 40a2503af5..c6c156409d 100644
     73--- a/src/txorphanage.cpp
     74+++ b/src/txorphanage.cpp
     75@@ -402,11 +402,11 @@ std::vector<TxOrphanage::OrphanTxBase> TxOrphanage::GetOrphanTransactions() cons
     76         ret.push_back({o.second.tx, o.second.announcers, o.second.nTimeExpire});
     77     }
     78     return ret;
     79 }
     80 
     81-void TxOrphanage::SanityCheck() const
     82+void TxOrphanage::SanityCheck(int expected_num_peers) const
     83 {
     84     // Check that cached m_total_announcements is correct. First count when iterating through m_orphans (counting number
     85     // of announcers each), then count when iterating through peers (counting number of orphans per peer).
     86     unsigned int counted_total_announcements{0};
     87     // Check that m_total_orphan_usage is correct
     88@@ -461,10 +461,16 @@ void TxOrphanage::SanityCheck() const
     89                 return orphan_it->second.tx->GetWitnessHash() == wtxid;
     90             }) != info.m_iter_list.end());
     91         }
     92     }
     93 
     94+    // We should not be offering more global memory for orphanage than expected
     95+    if (expected_num_peers != -1) {
     96+        Assert((size_t) expected_num_peers == m_peer_orphanage_info.size());
     97+        Assert(GetGlobalMaxUsage() == std::max<int64_t>(expected_num_peers * GetPerPeerMaxUsage(), 1));
     98+    }
     99+
    100     Assert(wtxids_in_peer_map.size() == m_orphans.size());
    101     Assert(counted_total_announcements == 0);
    102 }
    103 
    104 bool TxOrphanage::NeedsTrim(unsigned int max_orphans) const
    105diff --git a/src/txorphanage.h b/src/txorphanage.h
    106index a5a6a94bec..562e9a3321 100644
    107--- a/src/txorphanage.h
    108+++ b/src/txorphanage.h
    109@@ -140,11 +140,11 @@ public:
    110         return peer_it == m_peer_orphanage_info.end() ? 0 : peer_it->second.m_iter_list.size();
    111     }
    112 
    113     /** Check consistency between PeerOrphanInfo and m_orphans. Recalculate counters and ensure they
    114      * match what is cached. */
    115-    void SanityCheck() const;
    116+    void SanityCheck(int expected_num_peers = -1) const;
    117 
    118 protected:
    119     struct OrphanTx : public OrphanTxBase {
    120     };
    
  18. in src/txorphanage.cpp:103 in ef2f44e653 outdated
    108-        it_last->second.list_pos = old_pos;
    109+        // Find this orphan iter's position in the list, and delete it.
    110+        auto& orphan_list = peer_it->second.m_iter_list;
    111+        size_t old_pos = std::distance(orphan_list.begin(), std::find(orphan_list.begin(), orphan_list.end(), it));
    112+
    113+        if (!Assume(old_pos < orphan_list.size())) continue;
    


    instagibbs commented at 5:21 pm on February 10, 2025:
    Maybe a bit too paranoid since it was just computed explicitly from the underlying list?

    glozow commented at 12:04 pm on February 11, 2025:
    removed
  19. in src/txorphanage.cpp:200 in ef2f44e653 outdated
    200-        ++nEvicted;
    201+        if (!Assume(!peer_it_heap.empty())) break;
    202+        // Find the peer with the highest DoS score, which is a fraction of {usage, announcements} used
    203+        // over the allowance. This metric causes us to naturally select peers who have exceeded
    204+        // their limits (i.e. a DoS score > 1) before peers who haven't. We may choose the same peer
    205+        // change since the last iteration of this loop.
    


    instagibbs commented at 5:29 pm on February 10, 2025:
    wording confusion: same peer change?

    glozow commented at 12:05 pm on February 11, 2025:
    fixed
  20. in src/test/orphanage_tests.cpp:247 in 95b61662e5 outdated
    242+        auto ptx = MakeTransactionSpending(/*outpoints=*/{}, det_rand);
    243+        orphanage.AddTx(ptx, dos_peer);
    244+    }
    245+    peer_usages.emplace_back(orphanage.UsageByPeer(dos_peer));
    246+
    247+    // Force an eviction. Note that no limiting has happened yet at this point. LimitOrphans may
    


    instagibbs commented at 5:44 pm on February 10, 2025:

    happened yet at this point

    Do you mean prior to LimitOrphans?


    glozow commented at 12:12 pm on February 11, 2025:
    yes, clarified
  21. in src/test/orphanage_tests.cpp:256 in 95b61662e5 outdated
    250+    orphanage.LimitOrphans(prev_count - 1, det_rand);
    251+    BOOST_CHECK(orphanage.Size() <= prev_count - 1);
    252+
    253+    // The DoS peer's orphans have been evicted, nobody else's have.
    254+    for (NodeId peer{0}; peer <= dos_peer; ++peer) {
    255+        BOOST_CHECK_EQUAL(peer == dos_peer, peer_usages.at(peer) != orphanage.UsageByPeer(peer));
    


    instagibbs commented at 5:46 pm on February 10, 2025:
    nit: would rather we check that we evicted the dos’y peer and somehow didn’t add more resources allocated to him

    glozow commented at 12:16 pm on February 11, 2025:
    Added
  22. in test/functional/p2p_opportunistic_1p1c.py:476 in 653f1bb84d outdated
    467+        peer_normal.wait_for_getdata([parent_txid_int])
    468+
    469+        self.log.info("Send another round of very large orphans from a DoSy peer")
    470+        for large_orphan in large_orphans[60:]:
    471+            peer_doser.send_and_ping(msg_tx(large_orphan))
    472+
    


    instagibbs commented at 6:04 pm on February 10, 2025:

    could do this for both cases

    0       # Something was evicted
    1        assert_greater_than(len(large_orphans), len(node.getorphantxs()))
    

    glozow commented at 12:37 pm on February 11, 2025:
    added
  23. in src/bench/txorphanage.cpp:103 in 3ce9ef7dd3 outdated
    44+}
    45+
    46+static void OrphanageEvictionMany(benchmark::Bench& bench)
    47+{
    48+    NodeId NUM_PEERS{125};
    49+    unsigned int NUM_TRANSACTIONS(DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS / NUM_PEERS);
    


    instagibbs commented at 6:15 pm on February 10, 2025:
    seems wrong, you’re sending 3000/125=24 transactions total?

    glozow commented at 12:18 pm on February 11, 2025:
    Yes, and they are each announced by every peer. This bench is to test the maximum number of transactions where every peer has 100% overlap. We call AddTx DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS times, which is the maximum before eviction would trigger. If we increase DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS, the bench will scale too.

    instagibbs commented at 3:02 pm on February 15, 2025:
    are you sure it’s being sent via every peer for this benchmark? Looks like there’s no overlap?

    glozow commented at 4:44 pm on February 20, 2025:
    You are right. That explains why it doesn’t get slower haha
  24. in src/txorphanage.cpp:276 in 33034eaa3b outdated
    275+                // items that are no longer in the orphanage. We should only do this once per peer
    276+                // per call to AddChildrenToWorkSet, so keep track of which peers we have trimmed.
    277+                // We also never need to do it more than once since evictions don't happen in this
    278+                // function.
    279+                if (orphan_work_set.size() + 1 > MAX_ORPHAN_WORK_QUEUE && !peers_workset_trimmed.contains(announcer)) {
    280+                    std::erase_if(orphan_work_set, [&](const auto& wtxid) { return m_orphans.contains(wtxid); });
    


    instagibbs commented at 8:28 pm on February 10, 2025:

    this is backwards. See added unit test:

     0diff --git a/src/test/orphanage_tests.cpp b/src/test/orphanage_tests.cpp
     1index fe0f81fdb4..dde42d9d4a 100644
     2--- a/src/test/orphanage_tests.cpp
     3+++ b/src/test/orphanage_tests.cpp
     4@@ -77,0 +78,24 @@ static CTransactionRef MakeTransactionSpending(const std::vector<COutPoint>& out
     5+// 101 output transaction
     6+static CTransactionRef MakeHugeTransactionSpending(const std::vector<COutPoint>& outpoints, FastRandomContext& det_rand)
     7+{
     8+    CKey key;
     9+    MakeNewKeyWithFastRandomContext(key, det_rand);
    10+    CMutableTransaction tx;
    11+    // If no outpoints are given, create a random one.
    12+    if (outpoints.empty()) {
    13+        tx.vin.emplace_back(Txid::FromUint256(det_rand.rand256()), 0);
    14+    } else {
    15+        for (const auto& outpoint : outpoints) {
    16+            tx.vin.emplace_back(outpoint);
    17+        }
    18+    }
    19+    // Ensure txid != wtxid
    20+    tx.vin[0].scriptWitness.stack.push_back({1});
    21+    tx.vout.resize(101);
    22+    tx.vout[0].nValue = CENT;
    23+    tx.vout[0].scriptPubKey = GetScriptForDestination(PKHash(key.GetPubKey()));
    24+    tx.vout[1].nValue = 3 * CENT;
    25+    tx.vout[1].scriptPubKey = GetScriptForDestination(WitnessV0KeyHash(key.GetPubKey()));
    26+    return MakeTransactionRef(tx);
    27+}
    28+
    29@@ -598,0 +623,23 @@ BOOST_AUTO_TEST_CASE(peer_worksets)
    30+
    31+        {
    32+            // We will fill the orphanage with a single parent and 101 children
    33+            // from that single transaction to cause potential deletion of work set
    34+            // from peer 0.
    35+            auto tx_missing_parent = MakeHugeTransactionSpending({}, det_rand);
    36+            std::vector<CTransactionRef> tx_orphans;
    37+            for (unsigned int i{0}; i < MAX_ORPHAN_WORK_QUEUE + 1; i++) {
    38+                auto tx_orphan = MakeTransactionSpending({COutPoint{tx_missing_parent->GetHash(), i}}, det_rand);
    39+                BOOST_CHECK(orphanage.AddTx(tx_orphan, /*peer=*/node0));
    40+            }
    41+
    42+            // 101 transactions in the orphanage (no trimming of orphanage yet), now
    43+            // add parent to work set, which will all be allocated to peer 0.
    44+            // work set should get trimmed exactly once down to MAX_ORPHAN_WORK_QUEUE
    45+            orphanage.AddChildrenToWorkSet(*tx_missing_parent, det_rand);
    46+            for (unsigned int i{0}; i < MAX_ORPHAN_WORK_QUEUE; i++) {
    47+                BOOST_CHECK(orphanage.GetTxToReconsider(node0));
    48+            }
    49+
    50+            // We should have emptied the work queue in MAX_ORPHAN_WORK_QUEUE steps
    51+            BOOST_CHECK(!orphanage.HaveTxToReconsider(node0));
    52+        }
    53diff --git a/src/txorphanage.cpp b/src/txorphanage.cpp
    54index 09d50a244a..2a881db669 100644
    55--- a/src/txorphanage.cpp
    56+++ b/src/txorphanage.cpp
    57@@ -276 +276 @@ void TxOrphanage::AddChildrenToWorkSet(const CTransaction& tx, FastRandomContext
    58-                    std::erase_if(orphan_work_set, [&](const auto& wtxid) { return m_orphans.contains(wtxid); });
    59+                    std::erase_if(orphan_work_set, [&](const auto& wtxid) { return !m_orphans.contains(wtxid); });
    

    glozow commented at 12:45 pm on February 11, 2025:
    Wow, very bad bit flip :facepalm: thank you

    glozow commented at 5:06 pm on February 11, 2025:
    Wrote a similar test to check that an evicted work item is the one that doesn’t exist in m_orphans anymore.
  25. in src/txorphanage.cpp:281 in 33034eaa3b outdated
    281+                    peers_workset_trimmed.insert(announcer);
    282+                }
    283+
    284+                // Add this tx to the work set. If the workset is full, even after trimming, don't
    285+                // accept any new work items until the work queue has been flushed.
    286+                if (orphan_work_set.size() < MAX_ORPHAN_WORK_QUEUE) {
    


    instagibbs commented at 8:40 pm on February 10, 2025:
    should we debug log if we’re not adding to work set? might be good to know it’s happening

    glozow commented at 4:39 am on February 12, 2025:
    Added a couple pushes ago, but gone again. After a bit of offline discussion with @mzumsande and @sipa, it seemed better to just synchronously remove wtxids from worksets when they are removed as announcements. This means that the work set is always a subset of the announcement set (added this to SanityCheck). Also, potentially failing to add things to the work set seemed to make this less useful.

    instagibbs commented at 12:45 pm on February 12, 2025:
    Ok! that was my suggestion offline too. Bounding announcements means we bound the workset :+1:
  26. in src/txorphanage.cpp:149 in ff24c1feeb outdated
    139@@ -140,13 +140,12 @@ void TxOrphanage::EraseForPeer(NodeId peer)
    140     if (nErased > 0) LogDebug(BCLog::TXPACKAGES, "Erased %d orphan transaction(s) from peer=%d\n", nErased, peer);
    141 }
    142 
    143-void TxOrphanage::LimitOrphans(unsigned int max_orphans, FastRandomContext& rng)
    144+unsigned int TxOrphanage::MaybeExpireOrphans()
    145 {
    146-    unsigned int nEvicted = 0;
    147+    int nErased = 0;
    


    mzumsande commented at 8:42 pm on February 10, 2025:
    nit: doesn’t really matter if unsigned int (return value) or int is used, but would be nice to make it consistent, also with the %u / %d format specifiers in the logprints.

    glozow commented at 4:37 am on February 12, 2025:
    made consistent + made both logs %u
  27. in src/txorphanage.cpp:245 in ff24c1feeb outdated
    175         ++nEvicted;
    176     }
    177+    return nEvicted;
    178+}
    179+
    180+void TxOrphanage::LimitOrphans(unsigned int max_orphans, FastRandomContext& rng)
    


    mzumsande commented at 9:12 pm on February 10, 2025:
    nit: commit msg of c929d71c0828544be509934312b6a7d11b47ea4d lacks a verb (e.g. “split”).

    glozow commented at 4:25 am on February 12, 2025:
    added
  28. in test/functional/p2p_opportunistic_1p1c.py:451 in 653f1bb84d outdated
    446+        # normal package request to time out.
    447+        self.wait_until(lambda: len(node.getorphantxs()) == num_individual_dosers)
    448+
    449+        self.log.info("Send an orphan from a non-DoSy peer. Its orphan should not be evicted.")
    450+        low_fee_parent = self.create_tx_below_mempoolminfee(self.wallet)
    451+        high_fee_child = self.wallet.create_self_transfer(utxo_to_spend=low_fee_parent["new_utxo"], fee_rate=20*FEERATE_1SAT_VB)
    


    instagibbs commented at 9:22 pm on February 10, 2025:
    the honest orphan should be as large as possible: target_vsize=100000

    glozow commented at 12:38 pm on February 11, 2025:
    done
  29. in test/functional/p2p_opportunistic_1p1c.py:500 in 653f1bb84d outdated
    495+                # runtime of this test.
    496+                peer_doser_batch.send_message(msg_tx(tx))
    497+
    498+        self.log.info("Send an orphan from a non-DoSy peer. Its orphan should not be evicted.")
    499+        low_fee_parent = self.create_tx_below_mempoolminfee(self.wallet)
    500+        high_fee_child = self.wallet.create_self_transfer(utxo_to_spend=low_fee_parent["new_utxo"], fee_rate=20*FEERATE_1SAT_VB)
    


    instagibbs commented at 9:23 pm on February 10, 2025:
    the honest orphan should be as large as possible: target_vsize=100000

    glozow commented at 12:38 pm on February 11, 2025:
    done
  30. in test/functional/p2p_opportunistic_1p1c.py:485 in 653f1bb84d outdated
    473+        self.log.info("Provide the orphan's parent. This 1p1c package should be successfully accepted.")
    474+        peer_normal.send_and_ping(msg_tx(low_fee_parent["tx"]))
    475+        assert_equal(node.getmempoolentry(orphan_tx.rehash())["ancestorcount"], 2)
    476+
    477+    @cleanup
    478+    def test_orphanage_dos_many(self):
    


    instagibbs commented at 9:26 pm on February 10, 2025:
    can we get a functional test case that covers the “protects fully sized ancestor package” scenario in p2p_orphan_handling.py?

    glozow commented at 4:38 am on February 12, 2025:
    (thanks, I took your test with just a few tweaks)
  31. instagibbs commented at 9:27 pm on February 10, 2025: member

    The resource bounds additions seem to make sense, still working through the workset change implications.

    I’ve got a minimal fuzz harness checking that the “honest” peer cannot be evicted, please feel free to take it: https://github.com/instagibbs/bitcoin/tree/2025-01-orphanage-peer-dos_greg_2

  32. in src/txorphanage.h:236 in ef2f44e653 outdated
    230+    unsigned int GetPerPeerMaxUsage() const {
    231+        return m_reserved_weight_per_peer;
    232+    }
    233+
    234+    unsigned int GetGlobalMaxUsage() const {
    235+        return std::max<unsigned int>(m_peer_orphanage_info.size() * m_reserved_weight_per_peer, 1);
    


    mzumsande commented at 9:58 pm on February 10, 2025:
    should this be int64_t instead of int in the spirit of the first commit?

    glozow commented at 4:37 am on February 12, 2025:
    fixed

    sipa commented at 2:25 pm on February 12, 2025:

    In commit “[txorphanage] when full, evict from the DoSiest peers first”

    The int64_t return type won’t actually do anything here, because std::max is instantiated for unsigned int, and also, the std::map::size() may return something smaller (particularly on 32-bit systems).

    0int64_t GetGlobalMaxUsage() const {
    1        return std::max<int64_t>(int64_t(m_peer_orphanage_info.size()) * m_reserved_weight_per_peer, 1);
    2}
    

    glozow commented at 12:57 pm on February 14, 2025:
    Thanks! Fixed
  33. in src/txorphanage.cpp:185 in ef2f44e653 outdated
    169@@ -165,13 +170,68 @@ unsigned int TxOrphanage::MaybeExpireOrphans()
    170 
    171 unsigned int TxOrphanage::MaybeTrimOrphans(unsigned int max_orphans, FastRandomContext& rng)
    172 {
    173+    // Exit early to avoid building the heap unnecessarily
    174+    if (!NeedsTrim() && m_orphans.size() <= max_orphans) return 0;
    175+
    176+    std::vector<PeerMap::iterator> peer_it_heap;
    177+    for (auto it = m_peer_orphanage_info.begin(); it != m_peer_orphanage_info.end(); ++it) peer_it_heap.push_back(it);
    178+    peer_it_heap.reserve(m_peer_orphanage_info.size());
    


    mzumsande commented at 10:35 pm on February 10, 2025:
    should call reserve before pushing entries to peer_it_heap, not after.

    glozow commented at 4:25 am on February 12, 2025:
    fixed
  34. in src/txorphanage.cpp:173 in ef2f44e653 outdated
    169@@ -165,13 +170,68 @@ unsigned int TxOrphanage::MaybeExpireOrphans()
    170 
    171 unsigned int TxOrphanage::MaybeTrimOrphans(unsigned int max_orphans, FastRandomContext& rng)
    172 {
    173+    // Exit early to avoid building the heap unnecessarily
    174+    if (!NeedsTrim() && m_orphans.size() <= max_orphans) return 0;
    


    mzumsande commented at 10:40 pm on February 10, 2025:
    Could m_orphans.size() <= max_orphans be inside NeedsTrim?

    glozow commented at 4:25 am on February 12, 2025:
    Moved into NeedsTrim
  35. glozow force-pushed on Feb 11, 2025
  36. glozow commented at 5:04 pm on February 11, 2025: member
    Thanks @instagibbs for the testing and review, added your fuzz commits and took comments. Still need to write the p2p_orphan_handling test.
  37. glozow force-pushed on Feb 11, 2025
  38. DrahtBot added the label CI failed on Feb 11, 2025
  39. DrahtBot commented at 5:08 pm on February 11, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37041607307

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  40. glozow force-pushed on Feb 11, 2025
  41. mzumsande commented at 7:33 pm on February 11, 2025: contributor

    Halfway through, some minor points below - my main conceptual question is why m_total_announcements is a meaningful metric in limiting the orphanage.

    My understanding is that m_total_orphan_usage exists to limit memory usage, and m_total_announcements to limit CPU usage - but why the number of announcements instead of number of orphans? Why would it make the situation any less DoSy if we remove an announcer but keep the orphan? Since we only assign the tx to one peer’s workset after 7426afbe62414fa575f91b4f8d3ea63bcc653e8b, more announcers for the same number of orphans doesn’t really mean any additional work.

  42. glozow commented at 10:49 pm on February 11, 2025: member

    My understanding is that m_total_orphan_usage exists to limit memory usage, and m_total_announcements to limit CPU usage - but why the number of announcements instead of number of orphans?

    Yep, to limit CPU usage. The complexity of eviction for example is bounded by the total number of announcements: in the worst case, each orphan has many announcers and the MaybeTrimOrphans loop first removes announcements until each orphan just has 1 left, and then finally can remove transactions. See comment above declaration, “The loop can run a maximum of m_max_global_announcement times”

    Why would it make the situation any less DoSy if we remove an announcer but keep the orphan?

    Perhaps I should have stated this in the OP more explicitly, but a major motivation for this eviction strategy is to prevent any peer from being able to evict another peer’s announcements, hence the per-peer limits. If we changed the eviction code to remove orphans wholesale instead of just announcements, we’d have a similar situation to today’s: an attacker can cause churn of an honest orphan by announcing it along with a lot of other orphans.

    So evicting announcements instead of orphans isn’t less DoSy, but it does make the orphanage less churnable.
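
    A toy model of this eviction strategy (illustrative Python only, not the PR’s C++, and simplified to count only announcements rather than the full announcement/memory DoS score): announcements are evicted from the most over-limit peer first, and a transaction is only dropped once its last announcer is gone, so a peer that stays within its own limits cannot have its announcements churned out by others.

    ```python
    import random

    def trim(orphanage, max_announcements, rng=random):
        """Evict announcements, most-over-limit peer first, until the global
        announcement budget is met. An orphan is only deleted once its last
        announcer is removed."""
        # orphanage maps wtxid -> set of announcing peers
        while sum(len(peers) for peers in orphanage.values()) > max_announcements:
            by_peer = {}
            for wtxid, peers in orphanage.items():
                for peer in peers:
                    by_peer.setdefault(peer, []).append(wtxid)
            dosiest = max(by_peer, key=lambda p: len(by_peer[p]))  # most announcements
            victim = rng.choice(by_peer[dosiest])  # random announcement of that peer
            orphanage[victim].discard(dosiest)
            if not orphanage[victim]:  # last announcer gone: drop the orphan itself
                del orphanage[victim]
    ```

    Each loop iteration removes exactly one announcement, which matches the bound discussed above: the loop runs at most O(total_announcements) times.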

  43. glozow force-pushed on Feb 12, 2025
  44. glozow force-pushed on Feb 12, 2025
  45. in test/functional/p2p_orphan_handling.py:753 in 4e17767f4b outdated
    748+            peer_doser.send_and_ping(msg_tx(large_orphan))
    749+
    750+        self.log.info("Provide the top ancestor. The whole package should be re-evaluated after enough time.")
    751+        peer_normal.send_and_ping(msg_tx(ancestor_package[0]["tx"]))
    752+
    753+        self.wait_until(lambda: node.getmempoolentry(ancestor_package[-1]["txid"])["ancestorcount"] == DEFAULT_ANCESTOR_LIMIT)
    


    instagibbs commented at 2:06 pm on February 12, 2025:
    on second thought, this will probably throw an assertion if the final child isn’t in the mempool yet? Probably need to prepend this clause with ancestor_package[-1]["txid"] in node.getrawmempool()

    glozow commented at 4:00 pm on February 13, 2025:
    ah hm, maybe we don’t need to wait? I guess if you send a ping, you don’t get a pong until all 24 are processed.

    instagibbs commented at 5:04 pm on February 13, 2025:
    I’ll defer to you, but IIUC we’ll do ~1 orphan processing per message processing step, so it might take a bit more to process all 24 from orphanage? Alternatively we could query the first tx and wait until the descendant count hits DEFAULT_ANCESTOR_LIMIT

    glozow commented at 12:52 pm on February 14, 2025:
    Hold on, I don’t see why it’d throw an assertion? ancestor_package[-1] is the last child right? Added some more comments but didn’t change the code
  46. in test/functional/p2p_orphan_handling.py:69 in 889afbadb4 outdated
    60-            self.nodes[0].bumpmocktime(LONG_TIME_SKIP)
    61             # Check that mempool and orphanage have been cleared
    62             self.wait_until(lambda: len(self.nodes[0].getorphantxs()) == 0)
    63             assert_equal(0, len(self.nodes[0].getrawmempool()))
    64+
    65+            self.restart_node(0, extra_args=["-persistmempool=0"])
    


    instagibbs commented at 2:20 pm on February 12, 2025:

    889afbadb417e9422c7c06fd074981fa62045568

    Since we’re restarting does this wipe the mocktime on the node? Doesn’t seem to affect timings, but I think it’s easier to think about the test this way.

    note that if you set it here, you also need to setmocktime any other time you restart as well


    glozow commented at 12:53 pm on February 14, 2025:
    nice, thanks - added a setmocktime() after all the restarts
  47. in src/txorphanage.h:179 in 8520f1e493 outdated
    169@@ -138,10 +170,27 @@ class TxOrphanage {
    170          * m_total_orphan_size. If a peer is removed as an announcer, even if the orphan still
    171          * remains in the orphanage, this number will be decremented. */
    172         int64_t m_total_usage{0};
    173+
    174+        /** Orphan transactions in vector for quick random eviction */
    175+        std::vector<OrphanMap::iterator> m_iter_list;
    


    sipa commented at 2:21 pm on February 12, 2025:

    In commit “[txorphanage] when full, evict from the DoSiest peers first”

    I think it would be helpful to add this variable, and the testing thereof, in a separate commit from the actual eviction changes.


    glozow commented at 12:53 pm on February 14, 2025:
    Added a commit before that one, just adding the list and sanity checking
  48. in src/txorphanage.h:191 in 8520f1e493 outdated
    181+         * If the peer is using more than the allowed for either resource, its DoS score is > 1.
    182+         * A peer having a DoS score > 1 does not necessarily mean that something is wrong, since we
    183+         * do not trim unless the orphanage exceeds global limits, but it means that this peer will
    184+         * be selected for trimming sooner. */
    185+        FeeFrac GetDoSScore(unsigned int peer_max_ann, unsigned int peer_max_mem) {
    186+            FeeFrac cpu_score(m_iter_list.size(), peer_max_ann);
    


    sipa commented at 2:26 pm on February 12, 2025:
    Huh, neat, I hadn’t considered using FeeFrac here, but it fits.
  49. in src/txorphanage.cpp:192 in 8520f1e493 outdated
    178+
    179+    // Sort peers that have the highest ratio of DoSiness first
    180+    auto compare_peer = [this](PeerMap::iterator left, PeerMap::iterator right) {
    181+        const auto max_ann{GetPerPeerMaxAnnouncements()};
    182+        const auto max_mem{GetPerPeerMaxUsage()};
    183+        return left->second.GetDoSScore(max_ann, max_mem) < right->second.GetDoSScore(max_ann, max_mem);
    


    sipa commented at 2:31 pm on February 12, 2025:

    In commit “[txorphanage] when full, evict from the DoSiest peers first”

    Using FeeFrac::operator< here, which tiebreaks by biggest denominator in case the ratios are equal, means that if two peers are equally DoSy, but one is so due to memory usage and the other due to announcements, the announcements one will be targeted first. That’s probably fine, but perhaps worth documenting.


    glozow commented at 12:53 pm on February 14, 2025:
    good point, added a comment
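
    The comparison discussed above can be modeled with a toy stand-in for FeeFrac (a sketch, not Core’s actual FeeFrac implementation): ratios are compared exactly by cross-multiplication, and the equal-ratio tiebreak is implemented as described in the comment, with the bigger denominator comparing as greater.

    ```python
    from functools import total_ordering

    @total_ordering
    class Frac:
        """Toy stand-in for FeeFrac used as a DoS score: an exact num/den
        ratio compared by cross-multiplication, no floating point."""
        def __init__(self, num, den):
            self.num, self.den = num, den

        def __eq__(self, other):
            return (self.num * other.den == other.num * self.den
                    and self.den == other.den)

        def __lt__(self, other):
            a, b = self.num * other.den, other.num * self.den
            if a != b:
                return a < b
            # equal ratios: the bigger denominator compares as greater
            return self.den < other.den

    def dos_score(num_anns, max_anns, usage, max_usage):
        """A peer's DoS score: the worse of its CPU (announcement) and
        memory usage ratios, each relative to its per-peer allowance."""
        return max(Frac(num_anns, max_anns), Frac(usage, max_usage))
    ```

    With this ordering, peers can be sorted (or heapified) so that the most over-budget peer is trimmed first, and a peer under both of its limits always scores below 1.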
  50. sipa commented at 2:35 pm on February 12, 2025: member
    Approach ACK
  51. in test/functional/p2p_opportunistic_1p1c.py:506 in acbe37029e outdated
    501+                # Don't sync with ping or wait for responses, because it dramatically increases the
    502+                # runtime of this test.
    503+                peer_doser_batch.send_message(msg_tx(tx))
    504+
    505+        # Something was evicted
    506+        self.wait_until(lambda: len(node.getorphantxs()) < batch_size * num_peers)
    


    instagibbs commented at 2:56 pm on February 12, 2025:

    this will trivially pass (and not actually ensure there was an eviction) unless you let the orphans be processed with a send_and_ping.

    It’s also not true, because no evictions will happen with just 1000 orphans

    Here’s a suggested patchset which has this subtest run in ~2m30s and actually causes evictions with default parameters: https://github.com/instagibbs/bitcoin/commit/fedea4a17b7fc4c442b0ad98b51b85ff93a55beb


    glozow commented at 12:54 pm on February 14, 2025:
    Thanks! Added. It does take a long time but that’s the only way to get evictions due to announcements in a functional test.

    glozow commented at 2:07 pm on February 14, 2025:

    Hm, one of the CI failures is a timeout for this (the other is a wallet thing). Perhaps it takes a bit too long? A few ideas:

    • Change these to batches of 200 * 10 + 101 * 10 and just do the wait_until once to make the test faster.
    • Use -maxorphantxs to reduce the limit.
    • Wait after each orphan is sent. This makes the overall test a lot longer, but makes it less likely we’ll hit the timeout.
    • Keep a lower count of txns and settle for a test that fails on master but not on this PR.

    instagibbs commented at 2:09 pm on February 14, 2025:

    Too bad :(

    edit: my guess is we’ll need to lower max orphan count to make CI runs happy


    glozow commented at 5:54 pm on February 14, 2025:
    Hm, I don’t want to manually change -maxorphantxs. But that’s the point of introducing the test prior to increasing the default to 3000: at that commit, the orphanage doesn’t go past 100, so it’s definitely doing evictions even with just 1010 + 101 orphans.

    glozow commented at 6:41 pm on February 14, 2025:
    Ok, I’ve made it 101 * 30 + 101 = 3131 total. I think the wait_until for the 1p1c orphan to be in the orphanage kind of achieves what we want (i.e. evictions for each of the previous orphans have already been calculated) even though we don’t explicitly wait for each peer. (edit: this is not true anymore)

    glozow commented at 2:05 pm on February 19, 2025:

    fwiw, here’s what I landed on for test_orphanage_dos_many: (1) send 51 of the same orphan from 60 peers (51 orphans, 3060 announcements). sync_with_ping and wait until at least one of the orphans is in orphanage (2) send the 1p1c, which is a small orphan (not 100kvb because prior to increasing -maxorphantxs, this peer’s memory DoS score could be higher than the other peers’ CPU DoS scores and get the tx evicted. Better to compare apples to apples). (3) send 51 unique orphans from 40 peers (2040 orphans, 2040 announcements). sync_with_ping and wait until we have at least 1 of the orphans from each peer. This gives us 2091 orphans, 5100 announcements total from the DoSy peers, plus 1 normal from the 1p1c.

    Before we increase -maxorphantxs, there are evictions during (3). After we increase -maxorphantxs, there are evictions in (1) and (3). The reason I’m doing the shared orphans first is so that we get to maximum announcements more quickly, and we can expect evictions during both rounds.

  52. in test/functional/p2p_opportunistic_1p1c.py:540 in acbe37029e outdated
    535+            peer_doser_shared = node.add_p2p_connection(P2PInterface())
    536+            for orphan in shared_orphans:
    537+                peer_doser_shared.send_message(msg_tx(orphan))
    538+
    539+        # Something was evicted; the orphanage does not contain all DoS orphans + the 1p1c child
    540+        self.wait_until(lambda: len(node.getorphantxs()) < batch_size * num_peers + len(shared_orphans) + 1)
    


    instagibbs commented at 3:33 pm on February 12, 2025:
    this is a buggy assertion because no evictions will be happening, you might just front-run validation and slip through on your local machine. see https://github.com/instagibbs/bitcoin/commit/fedea4a17b7fc4c442b0ad98b51b85ff93a55beb for what I think would be a valid assertion

    glozow commented at 12:55 pm on February 14, 2025:
    Added thanks!
  53. instagibbs commented at 3:34 pm on February 12, 2025: member
    combing through tests a bit, think I spotted the CI failure cause
  54. glozow added this to the milestone 29.0 on Feb 12, 2025
  55. glozow requested review from sr-gi on Feb 13, 2025
  56. glozow requested review from stickies-v on Feb 13, 2025
  57. glozow force-pushed on Feb 14, 2025
  58. DrahtBot removed the label CI failed on Feb 14, 2025
  59. glozow force-pushed on Feb 14, 2025
  60. glozow force-pushed on Feb 14, 2025
  61. DrahtBot added the label CI failed on Feb 14, 2025
  62. in src/test/fuzz/txorphan_protected.cpp:46 in a704b8a959 outdated
    41+
    42+    // We have NUM_PEERS, of which Peer==0 is the "honest" one
    43+    // who will never exceed their reserved weight of announcement
    44+    // count, and should therefore never be evicted.
    45+    const unsigned int NUM_PEERS = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 125);
    46+    const unsigned int global_announcement_limit = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(NUM_PEERS, 1'000);
    


    instagibbs commented at 7:04 pm on February 14, 2025:
    Let’s set the upper range to DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS?

    glozow commented at 7:23 pm on February 14, 2025:
    done
  63. in src/test/fuzz/txorphan_protected.cpp:50 in a704b8a959 outdated
    45+    const unsigned int NUM_PEERS = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 125);
    46+    const unsigned int global_announcement_limit = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(NUM_PEERS, 1'000);
    47+    // This must match announcement limit (or exceed) otherwise "honest" peer can be evicted
    48+    const unsigned int global_tx_limit = global_announcement_limit;
    49+    const unsigned int per_peer_announcements = global_announcement_limit / NUM_PEERS;
    50+    const unsigned int per_peer_weight_reservation = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 4'000);
    


    instagibbs commented at 7:05 pm on February 14, 2025:
    set the upper range to DEFAULT_ANCESTOR_SIZE_LIMIT_KVB * 4000? See no reason not to cover the full range (we could also increase the num_outs range to make larger ranges hit, based on a bool?)

    glozow commented at 7:24 pm on February 14, 2025:
    I made it DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER * 10
  64. glozow force-pushed on Feb 14, 2025
  65. glozow force-pushed on Feb 14, 2025
  66. in src/test/orphanage_tests.cpp:252 in 064af55a0a outdated
    247+    // Force an eviction. Note that no limiting has happened yet at this point (we haven't called
    248+    // LimitOrphans yet) so it may be oversize and LimitOrphans may evict more than 1 transaction.
    249+    // All evictions will be from the dos_peer's transactions.
    250+    const auto prev_count = orphanage.Size();
    251+    orphanage.LimitOrphans(prev_count - 1, det_rand);
    252+    BOOST_CHECK(orphanage.Size() <= prev_count - 1);
    


    mzumsande commented at 9:01 pm on February 14, 2025:
    This seems very cautious. The test first adds 150 txns, then another DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS (so there is a 1 to 1 relation between announcements and txns). If all txns are distinct, this should be enough to assert that Size() should shrink by 150 compared to prev_count, not just 1.

    glozow commented at 8:16 pm on February 18, 2025:
    changed to be asserting exact counts πŸ‘
  67. in test/functional/p2p_opportunistic_1p1c.py:478 in 793c80be58 outdated
    473+        self.log.info("Send another round of very large orphans from a DoSy peer")
    474+        for large_orphan in large_orphans[60:]:
    475+            peer_doser.send_and_ping(msg_tx(large_orphan))
    476+
    477+        # Something was evicted; the orphanage does not contain all large orphans + the 1p1c child
    478+        self.wait_until(lambda: len(node.getorphantxs()) < len(large_orphans) + 1)
    


    mzumsande commented at 9:26 pm on February 14, 2025:
    the numbers don’t add up for me: the test creates 100 large orphans, sends 20, sends 1 small tx, then sends another 40 large orphans, and finally asserts that there are fewer than 101 entries in the orphanage. This would also be true without any eviction.

    glozow commented at 8:15 pm on February 18, 2025:
    thanks, fixed
  68. DrahtBot removed the label CI failed on Feb 14, 2025
  69. in src/txorphanage.cpp:100 in 944f61e6d5 outdated
     96         auto peer_it = m_peer_orphanage_info.find(peer);
     97         if (Assume(peer_it != m_peer_orphanage_info.end())) {
     98             peer_it->second.m_total_usage -= tx_size;
     99+
    100+            auto& orphan_list = peer_it->second.m_iter_list;
    101+            size_t old_pos = std::distance(orphan_list.begin(), std::find(orphan_list.begin(), orphan_list.end(), it));
    


    sipa commented at 2:32 pm on February 15, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    The cost of this std::distance may be O(num_orphans_per_peer), and the for (peer : it->second.announcers) loop around it can run up to O(num_announcers_per_tx) times. However, since the sum of all orphan_list lengths is equal to the total number of announcements, the overall cost of EraseTx is bounded by O(total_announcements). In both MaybeExpireOrphans and EraseForBlock, EraseTx may be invoked once per orphan, so I think this may actually mean O(total_orphans * total_announcements), which may be millions of steps, which could mean a concerning amount of time. Something similar may apply to the call in MaybeTrimOrphans, but it’s a bit more complicated to analyse. Does that sound right?

    One possibility to reduce this may be to batch the removals. Replace EraseTx with a function that takes a set of wtxids to remove, and just loops over all peers’ orphan_lists once, removing anything that’s in the set. That would reduce the cost to just O(total_announcements). However, it would mean always iterating over all announcements whenever an orphan is erased.

    Alternatively, the iter_lists could be replaced with sets for faster removal, but that would increase the cost of random evictions from O(1) to O(n). That might not actually be an improvement for MaybeTrimOrphans.

    I think the proper solution is replacing the data structures that together encode the announcement sets (announcers, m_iter_list, and optionally also m_work_set) with a single global boost::multiindex, with hashed by-wtxid index, and a ranked by-(peer,wtxid) index (which allows for fast random access).
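
    The batched-removal alternative could be pictured roughly like this (illustrative Python with hypothetical names, not the PR’s code): one pass over every peer’s announcement list removes all announcements of the doomed wtxids, bounding the whole batch at O(total_announcements) regardless of how many orphans are erased.

    ```python
    def erase_txs(orphans, peer_lists, wtxids_to_remove):
        """Batched version of EraseTx: filter every peer's announcement list
        once, then drop the erased orphans themselves."""
        doomed = set(wtxids_to_remove)
        for peer, lst in peer_lists.items():
            # one linear pass per peer; total work across peers is the
            # total number of announcements
            peer_lists[peer] = [w for w in lst if w not in doomed]
        for w in doomed:
            orphans.pop(w, None)
    ```

    The tradeoff noted above applies: every erase batch touches all announcements, even when only a few orphans are being removed.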


    instagibbs commented at 4:21 pm on February 15, 2025:

    Added a few more benchmarks to get an idea of what it could look like with current PR: https://github.com/instagibbs/bitcoin/commit/ba2e3e339cafdf1b38742b2c288a18dd32c63db3

    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|        7,012,786.00 |              142.60 |    0.9% |      0.08 | `OrphanageEvictionBlockManyPeers`
    3|       27,505,341.00 |               36.36 |    0.8% |      0.30 | `OrphanageEvictionBlockOnePeer`
    4|       12,507,729.00 |               79.95 |    0.6% |      0.14 | `OrphanageEvictionManyWithManyPeers`
    5|       26,721,356.00 |               37.42 |    0.4% |      0.29 | `OrphanageEvictionManyWithOnePeer`
    6|        7,262,273.00 |              137.70 |    3.3% |      0.08 | `OrphanageEvictionPeerMany`
    7|       22,306,678.00 |               44.83 |    0.7% |      0.25 | `OrphanageEvictionPeerOne`
    

    Added EraseForBlock, EraseForPeer(in a loop), and parameterized number of peers. Looks like the std::distance work is causing most of the time since things are slower with a single peer?


    sipa commented at 4:35 pm on February 15, 2025:

    @instagibbs I believe the worst case when max_orphans == max_announcements is to have exactly one peer, and then erasing the transactions in the reverse order they appear in m_iter_list.

    That would cost n^2/2 steps in std::distance.

    When max_announcements is larger than max_orphans, the worst case is having max_announcements / max_orphans peers, and every transaction be announced by all, I think.


    instagibbs commented at 4:44 pm on February 15, 2025:

    I believe the worst case when max_orphans == max_announcements is to have exactly one peer, and then erasing the transactions in the reverse order they appear in m_iter_list.

    I swapped the order of the block txns to force it to walk the whole list for the single peer, it’s about 10% slower


    glozow commented at 6:22 pm on February 15, 2025:

    Another possible solution: replace the set<NodeId> announcers in OrphanTxBase with a std::map<NodeId, size_t> announcers, where the value is the orphan’s position in PeerOrphanInfo::m_iter_list. Previously, we had a list_pos that tracked the orphan’s location in m_orphan_list. That removes the need to do std::distance.

    On the whole though, I agree a multiindex is probably the most natural data structure for orphanage.


    sipa commented at 6:32 pm on February 15, 2025:
    @glozow That works, I believe.

    glozow commented at 10:15 pm on February 18, 2025:

    replace set announcers in OrphanTxBase with a std::map<NodeId, size_t> announcers, where the value is the orphan’s position in the PeerOrphanInfo::m_iter_list

    Went ahead with this solution.
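
    The position-map idea boils down to classic swap-and-pop: each announcer’s entry stores its index into the peer’s m_iter_list, so removal is O(1) with no std::distance scan. A minimal Python sketch (hypothetical names):

    ```python
    def remove_at(iter_list, positions, idx):
        """O(1) removal: swap the element at idx with the last element,
        pop the back, and update the moved element's stored position.
        `positions` maps entry -> its index in iter_list (the list_pos)."""
        removed = iter_list[idx]
        last = len(iter_list) - 1
        if idx != last:
            iter_list[idx] = iter_list[last]
            positions[iter_list[idx]] = idx  # keep the moved entry's list_pos accurate
        iter_list.pop()
        del positions[removed]
        return removed
    ```

    This is the same trick the global m_orphan_list already uses for random eviction; keeping the index in the announcer map just extends it to the per-peer lists.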

  70. in test/functional/p2p_orphan_handling.py:635 in 212e1ea50c outdated
    648-        assert_equal(len(orphanage), DEFAULT_MAX_ORPHAN_TRANSACTIONS)
    649-
    650-        self.log.info("Clearing the orphanage")
    651-        for index, parent_orphan in enumerate(parent_orphans):
    652-            peer_1.send_and_ping(msg_tx(parent_orphan))
    653-        self.wait_until(lambda: len(node.getorphantxs()) == 0)
    


    kevkevinpal commented at 3:25 am on February 16, 2025:

    instead of removing this test we can keep it if we restart the node with the previous max orphan amount

    0self.restart_node(0, extra_args=["-maxorphantx=" + str(DEFAULT_MAX_ORPHAN_TRANSACTIONS)])
    

    and we can probably move DEFAULT_MAX_ORPHAN_TRANSACTIONS into test_max_orphan_amount and rename it to max_orphan_amount, since this isn’t the default max orphan amount anymore.

    If we still don’t want this test we can remove DEFAULT_MAX_ORPHAN_TRANSACTIONS since it is only used in this test


    glozow commented at 8:15 pm on February 18, 2025:
    thanks, removed DEFAULT_MAX_ORPHAN_TRANSACTIONS
  71. glozow force-pushed on Feb 18, 2025
  72. glozow force-pushed on Feb 18, 2025
  73. in src/txorphanage.cpp:41 in 19c77223cd outdated
    35@@ -36,8 +36,12 @@ bool TxOrphanage::AddTx(const CTransactionRef& tx, NodeId peer)
    36         return false;
    37     }
    38 
    39-    auto ret = m_orphans.emplace(wtxid, OrphanTx{{tx, {peer}, Now<NodeSeconds>() + ORPHAN_TX_EXPIRE_TIME}, m_orphan_list.size()});
    40+    auto& orphan_list = m_peer_orphanage_info.try_emplace(peer).first->second.m_iter_list;
    41+    std::map<NodeId, size_t> announcer_list_pos{{peer, orphan_list.size()}};
    42+    auto ret = m_orphans.emplace(wtxid, OrphanTx{{tx, announcer_list_pos, Now<NodeSeconds>() + ORPHAN_TX_EXPIRE_TIME}, m_orphan_list.size()});
    


    sipa commented at 2:04 pm on February 18, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    Use tx, std::move(announcer_list_pos), ... to avoid an allocation.


    glozow commented at 8:58 pm on February 18, 2025:
    thanks, done
  74. in src/txorphanage.h:158 in 19c77223cd outdated
    153+        /** Orphan transactions announced by this peer. */
    154+        std::vector<OrphanMap::iterator> m_iter_list;
    155+
    156+        /** Remove the element at list_pos in m_iter_list in O(1) time by swapping the last element
    157+         * with the one at list_pos and popping the back if there are multiple elements. Returns the
    158+         * swapped element, if applicable, so that the caller can update its list_pos.
    


    sipa commented at 2:18 pm on February 18, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    Would it be possible to do the list_pos updating here without needing to return an iteration to push that responsibility to the caller? It would mean RemoveIterAt would need to know what peer it’s operating in, so that means it’s perhaps more appropriate to have it as TxOrphanage member function rather than a PeerOrphanInfo member function.


    glozow commented at 8:58 pm on February 18, 2025:
    good point, I’ve made it a TxOrphanage method now
  75. in src/txorphanage.cpp:112 in 19c77223cd outdated
    110@@ -104,6 +111,7 @@ int TxOrphanage::EraseTx(const Wtxid& wtxid)
    111         m_orphan_list[old_pos] = it_last;
    112         it_last->second.list_pos = old_pos;
    


    sipa commented at 2:26 pm on February 18, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    I think this line, and the 7 lines before it, can be replaced with RemoveIterAt(it->second.list_pos), especially if it can be changed to do the list_pos updating internally?

    Not very important as the code disappears in the next commit, but would make it more obviously correct.


    glozow commented at 8:45 pm on February 18, 2025:
    Maybe I’m misunderstanding, but I don’t think we can use RemoveIterAt (which is for updating the peer list) for this (which is the global TxOrphanage::m_orphan_list)?

    sipa commented at 8:48 pm on February 18, 2025:
    Oh of course; it just looked very similar.
  76. in src/bench/txorphanage.cpp:102 in 2bffbd5c9d outdated
     96@@ -97,4 +97,105 @@ static void OrphanageEraseForBlockSinglePeer(benchmark::Bench& bench)
     97     });
     98 }
     99 
    100+static void OrphanageEvictionManyPeers(benchmark::Bench& bench)
    


    sipa commented at 3:21 pm on February 18, 2025:

    In commit “[bench] TxOrphanage::LimitOrphans”

    Would it make sense to introduce this benchmark earlier (and the other ones below), so we can see what effect the previous commit has on it?


    glozow commented at 8:58 pm on February 18, 2025:
    yes, moved it up
  77. DrahtBot added the label CI failed on Feb 18, 2025
  78. glozow force-pushed on Feb 18, 2025
  79. DrahtBot removed the label CI failed on Feb 18, 2025
  80. in src/bench/txorphanage.cpp:1 in 032b623753 outdated
    0@@ -0,0 +1,100 @@
    1+// Copyright (c) 2011-2022 The Bitcoin Core developers
    


    sipa commented at 1:12 am on February 19, 2025:

    In commit “[bench] TxOrphanage::EraseForBlock”

    Better not to have years listed than outdated/wrong ones.

  81. in src/txorphanage.h:184 in 56f05a73d8 outdated
    179@@ -168,6 +180,10 @@ class TxOrphanage {
    180 
    181     /** If there are more than max_orphans total orphans, evict randomly until that is no longer the case. */
    182     unsigned int MaybeTrimOrphans(unsigned int max_orphans, FastRandomContext& rng);
    183+
    184+    /** Remove the element at list_pos in m_iter_list in O(1) time by swapping the last element
    


    sipa commented at 1:18 am on February 19, 2025:

    In commit “[txorphanage] add per-peer iterator list and announcements accounting”

    Nit: I may have instigated this, but given the m_peer_orphanage_info.find(peer) call, it’s really O(log n) in the number of peers, not O(1).

  82. sipa commented at 2:09 pm on February 19, 2025: member

    The changes here have surprisingly little effect on the included benchmarks:

    • [bench] TxOrphanage::LimitOrphans
    ns/op op/s err% total benchmark
    68,742,718.06 14.55 0.2% 10.59 OrphanageEraseForBlockSinglePeer
    8,857.40 112,899.93 0.3% 10.86 OrphanageEvictionManyPeers
    1,544,919.67 647.28 0.2% 10.97 OrphanageWorksetManyPeers
    11,444,367.17 87.38 0.1% 11.00 OrphanageWorksetSinglePeer
    • [txorphanage] when full, evict from the DoSiest peers first
    ns/op op/s err% total benchmark
    65,643,925.00 15.23 1.7% 10.32 OrphanageEraseForBlockSinglePeer
    8,657.19 115,510.91 0.1% 10.83 OrphanageEvictionManyPeers
    1,472,007.25 679.34 0.6% 10.96 OrphanageWorksetManyPeers
    10,034,423.77 99.66 0.3% 11.05 OrphanageWorksetSinglePeer
    • [txorphanage] limit EraseForBlock iterations and use set instead of vec
    ns/op op/s err% total benchmark
    59,924,050.19 16.69 1.1% 10.98 OrphanageEraseForBlockSinglePeer
    8,684.76 115,144.26 0.3% 10.80 OrphanageEvictionManyPeers
    1,500,899.95 666.27 0.2% 11.10 OrphanageWorksetManyPeers
    10,619,315.89 94.17 0.3% 10.99 OrphanageWorksetSinglePeer
    • [txorphanage] when orphans are erased, delete them from worksets
    ns/op op/s err% total benchmark
    59,404,692.72 16.83 1.6% 10.70 OrphanageEraseForBlockSinglePeer
    8,824.15 113,325.33 0.1% 10.82 OrphanageEvictionManyPeers
    1,481,787.75 674.86 0.2% 10.96 OrphanageWorksetManyPeers
    10,452,273.40 95.67 0.5% 11.10 OrphanageWorksetSinglePeer
    • Increase default -maxorphantx to 3000
    ns/op op/s err% total benchmark
    59,868,390.44 16.70 1.8% 11.29 OrphanageEraseForBlockSinglePeer
    9,275.98 107,805.36 0.3% 10.97 OrphanageEvictionManyPeers
    1,538,763.63 649.87 0.2% 11.01 OrphanageWorksetManyPeers
    10,657,836.93 93.83 0.4% 11.03 OrphanageWorksetSinglePeer
  83. glozow commented at 2:25 pm on February 19, 2025: member

    The changes here have surprisingly little effect on the included benchmarks:

    My expectation was that the time would go up with these changes, but hopefully not by too much.

    I am pretty surprised that “[txorphanage] when full, evict from the DoSiest peers first” doesn’t make OrphanageEvictionManyPeers slower, but I guess it’s because there are only 24 transactions.

  84. glozow commented at 4:43 pm on February 19, 2025: member

    Will add something like this as comments as well but here’s the thinking around these benches:

    • The EraseForBlock bench exists to test the worst case EraseForBlock time, which occurs when every orphan conflicts with the block as much as possible. That’s (very roughly) ~2000 inputs per tx (within 400KWu), times the max number of orphans.
      • We’re kind of cheating in “limit EraseForBlock iterations and use set instead of vec” which forces it to stop at 100 orphans, even if the max number of orphans is increased.
      • There is also memory, which is what changing to a set helps with. Due to the way the loops work, we would have 2000 * max number of orphans iterators in the vector.
    • The workset benches exist to test the worst case AddChildrenToWorkSet time. Similarly, worst case is when all orphans spend an output of the transaction. The complexity can also increase when there are multiple announcers for the orphans.
    • The eviction bench exists to test the worst case LimitOrphans time. Note it artificially uses max_orphans=0 even though that doesn’t happen in real life. The worst case is when every orphan has been announced by every peer, and we need to remove all announcers one by one before we actually erase anything. This would have been pretty bad before we set a global announcement limit.

    So for all of these benches, we primarily want them to not blow up when we increase the DEFAULT_MAX_ORPHAN_TRANSACTIONS.

  85. glozow removed this from the milestone 29.0 on Feb 20, 2025
  86. glozow added this to the milestone 30.0 on Feb 20, 2025
  87. sipa commented at 3:44 pm on February 20, 2025: member
    Sad to see this slip, but given the amount of changes and discoveries that necessitated them even in just the last week, it’s probably the right decision.
  88. glozow commented at 3:52 pm on February 20, 2025: member
    I’m sad too! Seeing the stats from https://delvingbitcoin.org/t/stats-on-orphanage-overflows/1421 made this more pressing in my opinion, but it’s not a regression. I think we can still try to consider small, obviously safe changes for v29, but this feels too big. I don’t want to risk creating new DoS problems.
  89. in src/txorphanage.cpp:339 in 021fc20f9c outdated
    332@@ -332,16 +333,19 @@ void TxOrphanage::EraseForBlock(const CBlock& block)
    333             if (itByPrev == m_outpoint_to_orphan_it.end()) continue;
    334             for (auto mi = itByPrev->second.begin(); mi != itByPrev->second.end(); ++mi) {
    335                 const CTransaction& orphanTx = *(*mi)->second.tx;
    336-                vOrphanErase.push_back(orphanTx.GetWitnessHash());
    337+                wtxids_to_erase.insert(orphanTx.GetWitnessHash());
    338+                // Stop to avoid doing too much work. If there are more orphans to erase, rely on
    339+                // expiration and evictions to clean up everything eventually.
    340+                if (wtxids_to_erase.size() >= 100) break;
    


    instagibbs commented at 6:47 pm on February 20, 2025:
    seems like we’re leaving only one of 3 loops? is this intended?
  90. in src/txorphanage.cpp:215 in 6cbe944539 outdated
    215+        // ratios.
    216+        std::pop_heap(peer_it_heap.begin(), peer_it_heap.end(), compare_peer);
    217+        auto it_worst_peer = peer_it_heap.back();
    218+        peer_it_heap.pop_back();
    219+
    220+        // Evict a random orphan from this peer.
    


    mzumsande commented at 7:31 pm on February 20, 2025:
    “Remove a random announcement from this peer.” seems better because we only sometimes evict an orphan.
  91. in src/txorphanage.cpp:219 in 6cbe944539 outdated
    219+
    220+        // Evict a random orphan from this peer.
    221+        size_t randompos = rng.randrange(it_worst_peer->second.m_iter_list.size());
    222+        auto it_to_evict = it_worst_peer->second.m_iter_list.at(randompos);
    223+
    224+        // Only erase this peer as an announcer, unless it is the only announcer. Otherwise peers
    


    mzumsande commented at 7:35 pm on February 20, 2025:
    This is a bit ambiguous, I first read it as “don’t erase the peer as an announcer if it was the only one”, but what is meant is “erase the peer as an announcer, also remove the orphan if the peer was the only announcer”, so maybe reword it.
  92. in src/txorphanage.h:243 in 6cbe944539 outdated
    242+    int64_t GetPerPeerMaxUsage() const {
    243+        return m_reserved_weight_per_peer;
    244+    }
    245+
    246+    int64_t GetGlobalMaxUsage() const {
    247+        return std::max<int64_t>(int64_t(m_peer_orphanage_info.size()) * m_reserved_weight_per_peer, 1);
    


    mzumsande commented at 7:45 pm on February 20, 2025:

    If I understand it correctly, the current logic is that we assign additional weight for all peers that have an entry in m_peer_orphanage_info, which means we don’t assign weight for peers not participating in tx relay, or for peers that do but have never sent us an orphan. But once they have sent us an orphan, that weight stays reserved until they disconnect (and can be used by other peers too), even if they never send us another orphan.

    This seems like a middle-ground between assigning each peer a fixed DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER share which may not be exceeded, and assigning a pool DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER for each peer (whether that peer sent us an orphan before or not) - was that the purpose of choosing to do it this way?


    glozow commented at 7:45 pm on February 24, 2025:

    If I understand it correctly, …

    Correct πŸ‘ the general idea is to put a hard cap on the memory usage per peer. However, if a peer is using a lot more than the others simply because it is very useful, we don’t penalize them until we run out of space.

    We could do a RegisterPeer type of thing when the peer first connects, but I don’t see any particular reason to do that extra step right now. Perhaps we can add this in the future, if we want to give peers a different reservation depending on the type of connection. This has a nice side effect where we don’t reserve any space for useless spy peers, and save a bit of extra work for short-lived connections or ones where no transactions are relayed, though I wouldn’t say this is a motivation.

  93. in src/txorphanage.h:42 in 6cbe944539 outdated
    33@@ -27,7 +34,26 @@ static constexpr auto ORPHAN_TX_EXPIRE_INTERVAL{5min};
    34  * Not thread-safe. Requires external synchronization.
    35  */
    36 class TxOrphanage {
    37+    /** The usage (weight) reserved for each peer, representing the amount of memory we are willing
    38+     * to allocate for orphanage space. Note that this number is a reservation, not a limit: peers
    39+     * are allowed to exceed this reservation until the global limit is reached, and peers are
    40+     * effectively guaranteed this amount of space. Reservation is per-peer, so the global upper
    41+     * bound on memory usage scales up with more peers. */
    42+    unsigned int m_reserved_weight_per_peer{DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER};
    


    mzumsande commented at 8:30 pm on February 20, 2025:

    One possibility is that an attacker who wants to spam us with orphans can do so via other peers, without providing the orphans themselves:

    1.) Attacker A inv’s a parent P to victim V but doesn’t answer the GETDATA (2-minute timeout).
    2.) Within these 2 minutes, A sends P and multiple non-conflicting children C to the rest of the network - these are accepted/relayed by all other peers, so they will eventually be announced to V by all of its legitimate peers.
    3.) All of V’s legitimate peers will announce P and C; V will only request C (because P is in flight) and saves them as orphans / adds all of its peers as announcers.
    4.) V’s orphanage reaches its limits. With 125 peers, you’d need ~25 children to reach the max announcement limit of 3000, so you might need to split the children over multiple parents because of the mempool descendant limit.
    5.) V will start evicting announcements randomly from peers, so it may also evict legitimate unrelated orphans.

    Unlike with the status quo where attackers can just spam us with orphans for free, this is not free (P and C need to be valid transactions with a sufficient fee to be relayed), but I couldn’t think of any countermeasure, so maybe it kind of sets a theoretical limit on how certain we can be that legitimate orphans are resolved?


    instagibbs commented at 10:48 pm on February 20, 2025:

    IIUC this is basically a way of slowing down the parent resolution of a target package by 2 minutes, thereby increasing the window of possible random eviction when under real cpfp traffic.

    Without this delay we can still have natural overflow if there are too many (or too large) real packages in flight; the time window here just gets a lot longer.

    Unfortunately we have no way for peers to communicate which orphans they claim are higher paying (to preferentially evict those last, for example). Then at least orphan package makers could try to outbid the queue. Maybe something to think about with a future sender-initiated protocol?

    Being more aggressive about timing out inbound peers (and/or txid-relay peers) when they aren’t responding to getdatas could also be on the table, at the cost of bandwidth in the average case over a slow link.

    re:(2) you could also just imagine the scenario where the attacker simply relies on natural cpfp traffic and may just get lucky that it gets evicted in the two minutes. Costs the attacker nothing in this case, though the chances of failure are likely way higher?


    mzumsande commented at 4:49 pm on February 21, 2025:

    IIUC this is basically a way of slowing down the parent resolution of a target package by 2 minutes, thereby increasing the window of possible random eviction when under real cpfp traffic.

    Not sure if I understood that right, but I think that would be a different kind of attack where the attacker knows the target package. What I meant was that the attacker crafts spam transactions (P/C) that the rest of the network relays, but that end up only in the victim’s orphanage, resulting in eviction of random other orphans (about which the attacker doesn’t need to know any specifics) - similar to how an attacker could today just spam orphans to trigger random eviction on master, but no longer for free.


    glozow commented at 8:00 pm on February 24, 2025:

    My general impression is that, since these “spam” orphans are real, fee-paying transactions that end up in mempool, they are equally useful / equally deserve the space. I do think this represents a bound on how much total orphan volume we can handle, but I can’t really see why the victim node should evict these orphans over the others, just because they only originate from 1 peer.

    Unfortunately we have no way for peers to communicate which orphans they claim are higher paying (to preferentially evict those last, for example). Then at least orphan package makers could try to outbid the queue. Maybe something to think about with a future sender-initiated protocol?

    Even if we had a way to communicate this information, I don’t think it could be trusted. I think the only real solution to this is a sender-initiated protocol. I have half a mind to go ahead and propose a quick-and-dirty sender-initiated protocol for 1p1cs right now, but maybe that’s too short term thinking.

  94. glozow commented at 9:13 pm on March 4, 2025: member

    Following coredev discussions, I’m working on a few things:

    • given selected peer, evict by entry/announcement time instead of randomly
    • shorten expiry time (and GETDATA interval)
    • rewrite as boost multi-index
    • clarify benches
  95. instagibbs commented at 9:29 pm on March 4, 2025: member

    shorten expiry time (and GETDATA interval)

    Is this worth splitting out as its own PR?

  96. glozow commented at 9:45 pm on March 4, 2025: member
    Yes! I think multi index could also be its own PR. But wanted to give a status update here.
  97. glozow marked this as a draft on Mar 9, 2025
  98. DrahtBot added the label Needs rebase on Mar 16, 2025
  99. glozow force-pushed on May 19, 2025
  100. glozow force-pushed on May 19, 2025
  101. DrahtBot added the label CI failed on May 19, 2025
  102. DrahtBot commented at 3:27 pm on May 19, 2025: contributor

    🚧 At least one of the CI tasks failed.
    Task lint: https://github.com/bitcoin/bitcoin/runs/42487913320
    LLM reason (✨ experimental): The CI failure is due to multiple linting errors, including circular dependencies, missing include guards, and spelling mistakes.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  103. glozow force-pushed on May 19, 2025
  104. DrahtBot removed the label Needs rebase on May 19, 2025
  105. glozow force-pushed on May 20, 2025
  106. DrahtBot removed the label CI failed on May 20, 2025
  107. glozow force-pushed on May 22, 2025
  108. glozow marked this as ready for review on May 22, 2025
  109. glozow commented at 1:07 pm on May 22, 2025: member

    This is ready for review again. Main changes from February: it includes the rewrite as a boost::multi_index container, removes -maxorphantxs entirely, and drops the benches that weren’t demonstrating anything. I’ve updated the PR description.

    Rewrite and eviction changes are in the same commit. I didn’t think it made sense to first reimplement the old design as a multi_index, because the old eviction strategy requires twice as many indexes. If you are familiar with the old design and just want to see the behavior changes applied to it before comparing it with the new impl, I have a copy of the original PR here.

  110. in src/node/txorphanage_impl.h:221 in 455b4b8178 outdated
    216+            ++it;
    217+        }
    218+        return count;
    219+    }
    220+
    221+    /** Return number of announcements with this wtxid. */
    


    instagibbs commented at 5:36 pm on May 27, 2025:

    extra fn doesn’t seem worth it

     0diff --git a/src/node/txorphanage_impl.h b/src/node/txorphanage_impl.h
     1index 89974ec506..b6137845b4 100644
     2--- a/src/node/txorphanage_impl.h
     3+++ b/src/node/txorphanage_impl.h
     4@@ -205,11 +205,11 @@ class TxOrphanageImpl
     5     }
     6 
     7-    /** Return number of announcements with the same wtxid as it. */
     8-    unsigned int CountSameWtxid(Iter<ByWtxid> it) const
     9+    /** Return number of announcements with this wtxid. */
    10+    unsigned int CountWtxid(const Wtxid& wtxid) const
    11     {
    12+        auto it = m_orphans.get<ByWtxid>().lower_bound(ByWtxidView{wtxid, min_peer});
    13         if (it == m_orphans.end()) return 0;
    14 
    15         unsigned int count{0};
    16-        const auto& wtxid{it->m_tx->GetWitnessHash()};
    17         while (it != m_orphans.end() && it->m_tx->GetWitnessHash() == wtxid) {
    18             ++count;
    19@@ -218,12 +218,4 @@ class TxOrphanageImpl
    20         return count;
    21     }
    22-
    23-    /** Return number of announcements with this wtxid. */
    24-    unsigned int CountWtxid(const Wtxid& wtxid) const
    25-    {
    26-        auto it = m_orphans.get<ByWtxid>().lower_bound(ByWtxidView{wtxid, min_peer});
    27-        if (it == m_orphans.end()) return 0;
    28-        return CountSameWtxid(it);
    29-    }
    30 public:
    31     TxOrphanageImpl() = default;
    

    glozow commented at 2:36 pm on June 2, 2025:
    have consolidated the two
  111. in src/node/txorphanage_impl.h:616 in 455b4b8178 outdated
    624+
    625+            // If needs trim, then at least one peer has a DoS score higher than 1.
    626+            Assume(dos_score.fee > dos_score.size);
    627+
    628+            // Evict the oldest announcement from this peer, sorting non-reconsiderable before reconsiderable.
    629+            auto it_ann = m_orphans.get<ByPeer>().lower_bound(ByPeerView{worst_peer, false, 0});
    


    instagibbs commented at 6:26 pm on May 27, 2025:

    455b4b817884f860cd4467f0a9be4a459e89891c

    Take or leave suggestion to reduce churn in the heap, did no benchmarking but may reduce work for most DoSy peers.

     0diff --git a/src/node/txorphanage_impl.h b/src/node/txorphanage_impl.h
     1index 89974ec506..6fe7422054 100644
     2--- a/src/node/txorphanage_impl.h
     3+++ b/src/node/txorphanage_impl.h
     4@@ -623,20 +623,27 @@ public:
     5             heap_peer_dos.pop_back();
     6 
     7+            auto it_worst_peer = m_peer_orphanage_info.find(worst_peer);
     8+
     9             // If needs trim, then at least one peer has a DoS score higher than 1.
    10             Assume(dos_score.fee > dos_score.size);
    11 
    12-            // Evict the oldest announcement from this peer, sorting non-reconsiderable before reconsiderable.
    13-            auto it_ann = m_orphans.get<ByPeer>().lower_bound(ByPeerView{worst_peer, false, 0});
    14-            Assume(it_ann->m_announcer == worst_peer);
    15-            Erase<ByPeer>(it_ann, /*cleanup_outpoints_map=*/CountWtxid(it_ann->m_tx->GetWitnessHash()) == 1);
    16-            num_erased += 1;
    17-
    18-            // Unless this peer is empty (which should never happen as long as per-peer reserved usage is at least as
    19-            // large as the maximum allowed orphan size), put it back in the heap so we continue to consider evicting
    20-            // its orphans. Calculate the DoS score anew. This peer might still be the DoSiest one.
    21-            auto it_worst_peer = m_peer_orphanage_info.find(worst_peer);
    22-            if (it_worst_peer != m_peer_orphanage_info.end() && it_worst_peer->second.m_count_announcements > 0) {
    23-                heap_peer_dos.emplace_back(worst_peer, it_worst_peer->second.GetDosScore(max_ann, max_mem));
    24-                std::push_heap(heap_peer_dos.begin(), heap_peer_dos.end(), compare_score);
    25+            if (it_worst_peer != m_peer_orphanage_info.end()) {
    26+                // Avoid churn by re-entering only when worst peer's score is no longer worst
    27+                // Evict the oldest announcement from this peer, sorting non-reconsiderable before reconsiderable.
    28+                auto it_ann = m_orphans.get<ByPeer>().lower_bound(ByPeerView{worst_peer, false, 0});
    29+                const auto& next_dos_score = heap_peer_dos.empty() ? FeeFrac{0, 1} : heap_peer_dos.front().second;
    30+                while (NeedsTrim() && it_worst_peer->second.GetDosScore(max_ann, max_mem) > next_dos_score) {
    31+                    Assume(it_ann->m_announcer == worst_peer);
    32+                    Erase<ByPeer>(it_ann, /*cleanup_outpoints_map=*/CountWtxid(it_ann->m_tx->GetWitnessHash()) == 1);
    33+                    num_erased += 1;
    34+                    it_ann++; // advance to next announcement from this peer in case loop continues
    35+                }
    36+                // Unless this peer is empty (which should never happen as long as per-peer reserved usage is at least as
    37+                // large as the maximum allowed orphan size), put it back in the heap so we continue to consider evicting
    38+                // its orphans. Calculate the DoS score anew. This peer might still be the DoSiest one.
    39+                if (it_worst_peer->second.m_count_announcements > 0) {
    40+                    heap_peer_dos.emplace_back(worst_peer, it_worst_peer->second.GetDosScore(max_ann, max_mem));
    41+                    std::push_heap(heap_peer_dos.begin(), heap_peer_dos.end(), compare_score);
    42+                }
    43             }
    44         } while (!heap_peer_dos.empty() && NeedsTrim());
    

    glozow commented at 7:11 pm on June 5, 2025:
    Implemented the general idea, i.e. keep trimming until this peer wouldn’t be the next one we pop from the heap. There was some iterator invalidation going on here.
  112. in src/txorphanage.cpp:330 in aed51fe7d5 outdated
    326@@ -327,10 +327,10 @@ void TxOrphanage::SanityCheck() const
    327     // Check that cached m_total_announcements is correct
    328     unsigned int counted_total_announcements{0};
    329     // Check that m_total_orphan_usage is correct
    330-    unsigned int counted_total_usage{0};
    331+    int64_t counted_total_usage{0};
    


    instagibbs commented at 3:11 pm on May 28, 2025:

    aed51fe7d5cbcc43eba2be3cd5af666fe1d95dd7

    give motivation in commit message for the change?


    glozow commented at 4:34 pm on June 2, 2025:
    added
  113. in src/node/txorphanage.cpp:40 in 737c5127df outdated
    36@@ -37,7 +37,7 @@ bool TxOrphanage::AddTx(const CTransactionRef& tx, NodeId peer)
    37         return false;
    38     }
    39 
    40-    auto ret = m_orphans.emplace(wtxid, OrphanTx{{tx, {peer}, Now<NodeSeconds>() + ORPHAN_TX_EXPIRE_TIME}, m_orphan_list.size()});
    41+    auto ret = m_orphans.emplace(wtxid, OrphanTx{{tx, {peer}}, Now<NodeSeconds>() + ORPHAN_TX_EXPIRE_TIME, m_orphan_list.size()});
    


    instagibbs commented at 3:15 pm on May 28, 2025:

    737c5127df841e9c8037b1885284f80b0aba17dd

    might be good to note in commit message the getorphantxs is experimental, so breaking is considered ok


    glozow commented at 4:34 pm on June 2, 2025:
    added
  114. in src/test/fuzz/txdownloadman.cpp:307 in f2dcdbf700 outdated
    303@@ -305,9 +304,9 @@ FUZZ_TARGET(txdownloadman_impl, .init = initialize)
    304     // Initialize a TxDownloadManagerImpl
    305     bilingual_str error;
    306     CTxMemPool pool{MemPoolOptionsForTest(g_setup->m_node), error};
    307-    const auto max_orphan_count = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(0, 300);
    308+    const auto max_orphan_count = node::DEFAULT_MAX_ORPHAN_TRANSACTIONS;
    


    instagibbs commented at 4:02 pm on May 28, 2025:

    f2dcdbf700b3b20a315f5a6eec57c7463955fe43

    if this goes away later anyways, ignore, but CheckInvariants could just directly use node::DEFAULT_MAX_ORPHAN_TRANSACTIONS


    glozow commented at 8:45 pm on June 3, 2025:
    ignoring because the constant gets deleted
  115. in src/node/txorphanage_impl.h:77 in 455b4b8178 outdated
    72+        int64_t GetUsage()  const {
    73+            return GetTransactionWeight(*m_tx);
    74+        }
    75+    };
    76+
    77+    // Index by wtxid, then peer. Uses:
    


    instagibbs commented at 6:51 pm on May 28, 2025:

    455b4b817884f860cd4467f0a9be4a459e89891c

    nit: unsure if “uses” section adds a lot to clarity vs reading the code


    glozow commented at 1:59 pm on June 2, 2025:
    removed
  116. in src/node/txorphanage_impl.h:70 in 455b4b8178 outdated
    83+    struct ByWtxid {};
    84+    using ByWtxidView = std::tuple<Wtxid, NodeId>;
    85+    struct WtxidExtractor
    86+    {
    87+        using result_type = ByWtxidView;
    88+        result_type operator()(const Announcement& ann) const
    


    instagibbs commented at 7:08 pm on May 28, 2025:
    not sure what result_type is doing here or in ByPeerViewExtractor

    glozow commented at 12:24 pm on June 2, 2025:
    It’s a part of what boost multiindex requires for its “key extractor” concept: https://www.boost.org/doc/libs/1_88_0/libs/multi_index/doc/reference/key_extraction.html
  117. in src/node/txorphanage_impl.h:169 in 455b4b8178 outdated
    164+            return std::max<FeeFrac>(cpu_score, mem_score);
    165+        }
    166+    };
    167+    /** Store per-peer statistics. Used to determine each peer's DoS score. */
    168+    std::unordered_map<NodeId, PeerInfo> m_peer_orphanage_info;
    169+    using PeerMap = decltype(m_peer_orphanage_info);
    


    instagibbs commented at 7:16 pm on May 28, 2025:
    PeerMap is unused circa 455b4b817884f860cd4467f0a9be4a459e89891c

    glozow commented at 2:31 pm on June 2, 2025:
    removed
  118. in src/test/orphanage_tests.cpp:134 in b5fe4383ad outdated
    131 
    132     // ... and 50 that depend on other orphans:
    133     for (int i = 0; i < 50; i++)
    134     {
    135-        CTransactionRef txPrev = orphanage.RandomOrphan();
    136+        CTransactionRef txPrev = Random(orphans_added, m_rng);
    


    sipa commented at 7:21 pm on May 28, 2025:

    In commit “[prep/test] modify test to not access TxOrphanage internals”

    It seems all invocations of Random() assume that the returned value won’t be nullptr anyway, so I think they can just be replaced with:

    0CTransactionRef txPrev = orphans_added[m_rng.randrange(orphans_added.size())];
    

    glozow commented at 4:31 pm on June 2, 2025:
    done
  119. in src/node/txorphanage_impl.h:132 in 455b4b8178 outdated
    127+
    128+    /** Index from the parents' outputs to wtxids that exist in m_orphans. Used to find children of
    129+     * a transaction that can be reconsidered and to remove entries that conflict with a block.*/
    130+    std::map<COutPoint, std::set<Wtxid>> m_outpoint_to_orphan_it;
    131+
    132+    struct PeerInfo {
    


    instagibbs commented at 7:24 pm on May 28, 2025:
    nit: s/PeerInfo/PeerDoSInfo/

    glozow commented at 2:30 pm on June 2, 2025:
    done
  120. in src/node/txorphanage_impl.h:142 in 455b4b8178 outdated
    157+        * do not trim unless the orphanage exceeds global limits, but it means that this peer will
    158+        * be selected for trimming sooner. If the global announcement or global memory usage
    159+        * limits are exceeded, it must be that there is a peer whose DoS score > 1. */
    160+        FeeFrac GetDosScore(unsigned int max_peer_count, int64_t max_peer_bytes) const
    161+        {
    162+            const FeeFrac cpu_score(m_count_announcements, max_peer_count);
    


    instagibbs commented at 7:27 pm on May 28, 2025:
    probably should assert the denominators are not 0, otherwise FeeFrac comparison becomes nonsensical

    glozow commented at 2:31 pm on June 2, 2025:
    added
  121. in src/node/txorphanage_impl.h:244 in 455b4b8178 outdated
    239+    int64_t TotalOrphanUsage() const { return m_unique_orphan_bytes; }
    240+
    241+    /** Number of unique orphans */
    242+    unsigned int CountUniqueOrphans() const { return m_unique_orphans; }
    243+
    244+    /** Number of orphans from this peer */
    


    instagibbs commented at 7:39 pm on May 28, 2025:
    0    /** Number of stored orphans from this peer */
    

    glozow commented at 2:37 pm on June 2, 2025:
    done
  122. in src/node/txorphanage_impl.h:350 in 455b4b8178 outdated
    345+        // have the tx data.
    346+        if (it == m_orphans.end()) return false;
    347+        if (it->m_tx->GetWitnessHash() != wtxid) return false;
    348+
    349+        // Quit if we already have this announcement (same wtxid and peer).
    350+        if (HaveTxFromPeer(wtxid, peer)) return false;
    


    instagibbs commented at 7:54 pm on May 28, 2025:
    could use it->m_announcer == peer since you already did the lookup

    glozow commented at 2:44 pm on June 2, 2025:
    removed
  123. in src/node/txorphanage_impl.h:361 in 455b4b8178 outdated
    356+
    357+        ++m_current_sequence;
    358+        auto& peer_info = m_peer_orphanage_info.try_emplace(peer).first->second;
    359+        peer_info.Add(*ret.first);
    360+
    361+        const auto& txid = ret.first->m_tx->GetHash();
    


    instagibbs commented at 7:56 pm on May 28, 2025:
    0        const auto& txid = ptx->GetHash();
    

    glozow commented at 2:44 pm on June 2, 2025:
    done
  124. sipa commented at 8:35 pm on May 28, 2025: member
    The PR title may need updating, as -maxorphantxs is gone now.
  125. in src/node/txorphanage_impl.h:395 in 455b4b8178 outdated
    390+        auto& index_by_peer = m_orphans.get<ByPeer>();
    391+        auto it = index_by_peer.lower_bound(ByPeerView{peer, false, 0});
    392+        unsigned int num_erased{0};
    393+        while (it != index_by_peer.end() && it->m_announcer == peer) {
    394+            // Decide what will happen next before the iter is invalidated.
    395+            const bool last_item{std::next(it) == index_by_peer.end() || std::next(it)->m_announcer != peer};
    


    instagibbs commented at 8:36 pm on May 28, 2025:

    455b4b817884f860cd4467f0a9be4a459e89891c

    Little confused how this logic is necessary. The while loop will exit as soon as it is incremented to index_by_peer.end() or it->m_announcer != peer. What am I missing?

    Same with EraseAll


    glozow commented at 2:45 pm on June 2, 2025:
    right, removed
  126. in src/node/txorphanage_impl.h:411 in 455b4b8178 outdated
    406+
    407+        if (num_erased > 0) LogDebug(BCLog::TXPACKAGES, "Erased %d orphan transaction(s) from peer=%d\n", num_erased, peer);
    408+    }
    409+
    410+    /** Erase all entries with this wtxid. Return the number of announcements erased. */
    411+    unsigned int EraseAll(const Wtxid& wtxid)
    


    instagibbs commented at 8:45 pm on May 28, 2025:

    455b4b817884f860cd4467f0a9be4a459e89891c

    difference with EraseTx is pretty subtle, and only EraseTx appears to be used externally. Make this private or just subsume into EraseTx?


    glozow commented at 2:46 pm on June 2, 2025:
    made private
  127. in src/node/txorphanage_impl.h:21 in ea3a65e698 outdated
    16+#include <util/epochguard.h>
    17+#include <util/hasher.h>
    18+#include <util/result.h>
    19+#include <util/feefrac.h>
    20+
    21+#include <boost/multi_index/hashed_index.hpp>
    


    sipa commented at 8:49 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    Many of the includes here seem unused:

     0--- a/src/node/txorphanage_impl.h
     1+++ b/src/node/txorphanage_impl.h
     2@@ -5,33 +5,20 @@
     3 #ifndef BITCOIN_NODE_TXORPHANAGE_IMPL_H
     4 #define BITCOIN_NODE_TXORPHANAGE_IMPL_H
     5 
     6-#include <coins.h>
     7-#include <consensus/amount.h>
     8-#include <indirectmap.h>
     9 #include <logging.h>
    10-#include <net.h>
    11 #include <policy/policy.h>
    12 #include <primitives/transaction.h>
    13-#include <sync.h>
    14-#include <util/epochguard.h>
    15 #include <util/hasher.h>
    16 #include <util/result.h>
    17 #include <util/feefrac.h>
    18 
    19-#include <boost/multi_index/hashed_index.hpp>
    20-#include <boost/multi_index/identity.hpp>
    21 #include <boost/multi_index/indexed_by.hpp>
    22 #include <boost/multi_index/ordered_index.hpp>
    23-#include <boost/multi_index/sequenced_index.hpp>
    24 #include <boost/multi_index/tag.hpp>
    25 #include <boost/multi_index_container.hpp>
    26 
    27-#include <atomic>
    28 #include <map>
    29-#include <optional>
    30 #include <set>
    31-#include <string>
    32-#include <string_view>
    33 #include <utility>
    34 #include <vector>
    

    glozow commented at 1:58 pm on June 2, 2025:
    Fixed
  128. in src/node/txorphanage_impl.h:454 in 455b4b8178 outdated
    449+    {
    450+        auto it = m_orphans.get<ByPeer>().lower_bound(ByPeerView{peer, true, 0});
    451+        return it != m_orphans.get<ByPeer>().end() && it->m_announcer == peer && it->m_reconsider;
    452+    }
    453+
    454+    /** If there is a tx that can be reconsidered, return it. Otherwise, return a nullptr. */
    


    instagibbs commented at 8:49 pm on May 28, 2025:

    just noting this is going to get oldest-reconsidered-by-peer first

    sounds logical


    glozow commented at 12:26 pm on June 2, 2025:
    Yes, seemed like the most fair way to do it.
  129. in src/node/txorphanage_impl.h:108 in ea3a65e698 outdated
    125+    /** Total bytes used by orphans, deduplicated by wtxid. */
    126+    unsigned int m_unique_orphan_bytes{0};
    127+
    128+    /** Index from the parents' outputs to wtxids that exist in m_orphans. Used to find children of
    129+     * a transaction that can be reconsidered and to remove entries that conflict with a block.*/
    130+    std::map<COutPoint, std::set<Wtxid>> m_outpoint_to_orphan_it;
    


    sipa commented at 8:52 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    Would it be possible to use std::map<COutPoint, std::set<Iter>> here as type? That would be faster (avoiding a lookup to resolve Wtxid -> Iter) and use less memory. boost::multi_index iterator objects are stable (remain valid as long as the object they point to exists), unlike std::unordered_map.


    glozow commented at 6:01 pm on June 5, 2025:
    I went pretty far into implementing this, but realized there is a downside to this approach - the set stores entries for each announcement instead of each unique orphan. It requires us to update this map each time a new announcement is added/removed instead of just for unique ones. Perhaps worth keeping as Wtxid - what do you think?

    sipa commented at 6:14 pm on June 5, 2025:
    Ah, I see. So you could have an $\mathcal{O}(n)$ blowup factor with $n$ the number of peers that have announced that Wtxid? If so, that doesn't sound like a worthwhile tradeoff.
  130. in src/node/txorphanage_impl.h:480 in 455b4b8178 outdated
    475+            for (const auto& input : block_tx.vin) {
    476+                auto it_prev = m_outpoint_to_orphan_it.find(input.prevout);
    477+                if (it_prev != m_outpoint_to_orphan_it.end()) {
    478+                    // Copy all wtxids to wtxids_to_erase.
    479+                    std::copy(it_prev->second.cbegin(), it_prev->second.cend(), std::inserter(wtxids_to_erase, wtxids_to_erase.end()));
    480+                    for (const auto& wtxid : it_prev->second) {
    


    instagibbs commented at 8:53 pm on May 28, 2025:

    455b4b817884f860cd4467f0a9be4a459e89891c

    doesn’t the inserter above do this


    glozow commented at 2:47 pm on June 2, 2025:
    ha yes, removed the duplicate
  131. in src/node/txorphanage_impl.h:483 in 455b4b8178 outdated
    490+        }
    491+
    492+        if (num_erased != 0) {
    493+            LogDebug(BCLog::TXPACKAGES, "Erased %d orphan transaction(s) included or conflicted by block\n", num_erased);
    494+        }
    495+        return wtxids_to_erase.size();
    


    instagibbs commented at 8:56 pm on May 28, 2025:
    0        Assume(wtxids_to_erase.size() == num_erased);
    1        return wtxids_to_erase.size();
    

    glozow commented at 2:49 pm on June 2, 2025:
    added
  132. in src/node/txorphanage_impl.h:163 in ea3a65e698 outdated
    179+    {
    180+        // Update m_peer_orphanage_info and clean up entries if they point to an empty struct.
    181+        // This means peers that are not storing any orphans do not have an entry in
    182+        // m_peer_orphanage_info (they can be added back later if they announce another orphan) and
    183+        // ensures disconnected peers are not tracked forever.
    184+        auto peer_it = m_peer_orphanage_info.find(it->m_announcer);
    


    sipa commented at 9:00 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    Assume(peer_it != m_peer_orphanage_info.end()); ?


    glozow commented at 2:34 pm on June 2, 2025:
    added
  133. in src/node/txorphanage_impl.h:207 in ea3a65e698 outdated
    202+            }
    203+        }
    204+        m_orphans.get<Tag>().erase(it);
    205+    }
    206+
    207+    /** Return number of announcements with the same wtxid as it. */
    


    sipa commented at 9:04 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    This has an unstated assumption that it is the iterator the first entry in the ByWtxid index for a given wtxid.

    Given that there is only one call site (CountWtxid below), maybe it is better to inline this function there?


    glozow commented at 2:35 pm on June 2, 2025:
    done
  134. in src/node/txorphanage_impl.h:550 in 455b4b8178 outdated
    545+                    // inputs. However, we don't want to create an issue in which the assigned peer can purposefully stop us
    546+                    // from processing the orphan by disconnecting.
    547+                    const auto num_announcers{CountWtxid(wtxid)};
    548+                    if (!Assume(num_announcers > 0)) continue;
    549+                    std::advance(it, rng.randrange(num_announcers));
    550+                    if (!Assume(it->m_tx->GetWitnessHash() == wtxid)) continue;
    


    instagibbs commented at 9:06 pm on May 28, 2025:

    probably too cautious already but

    0                    if (!Assume(it->m_tx->GetWitnessHash() == wtxid)) break;
    

    glozow commented at 3:06 pm on June 2, 2025:
    done
  135. in src/node/txorphanage_impl.h:235 in ea3a65e698 outdated
    230+    TxOrphanageImpl(unsigned int max_global_ann, int64_t reserved_peer_usage) :
    231+        m_max_global_announcements{max_global_ann},
    232+        m_reserved_usage_per_peer{reserved_peer_usage}
    233+    {}
    234+
    235+    /** Number of announcements ones for the same wtxid are not de-duplicated. */
    


    sipa commented at 9:14 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    Nit: grammar parse error.


    glozow commented at 2:37 pm on June 2, 2025:
    fixed
  136. in src/node/txorphanage_impl.h:591 in 455b4b8178 outdated
    586+     * amount of announcements and space for each peer. The reserved amount is protected from eviction even if there
    587+     * are peers spamming the orphanage.
    588+     */
    589+    void LimitOrphans()
    590+    {
    591+        if (m_orphans.empty() || !NeedsTrim()) return;
    


    instagibbs commented at 9:14 pm on May 28, 2025:
    0        if (!NeedsTrim()) return;
    

    glozow commented at 3:06 pm on June 2, 2025:
    done
  137. in src/node/txorphanage_impl.h:259 in ea3a65e698 outdated
    254+    }
    255+
    256+    void SanityCheck() const
    257+    {
    258+        std::unordered_map<NodeId, PeerInfo> reconstructed_peer_info;
    259+        std::map<Wtxid, int64_t > unique_wtxids_to_usage;
    


    sipa commented at 9:17 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    Nit: after int64_t.


    glozow commented at 2:37 pm on June 2, 2025:
    removed
  138. in src/node/txorphanage_impl.h:268 in ea3a65e698 outdated
    263+            for (const auto& input : it->m_tx->vin) {
    264+                all_outpoints.insert(input.prevout);
    265+            }
    266+            unique_wtxids_to_usage.emplace(it->m_tx->GetWitnessHash(), it->GetUsage());
    267+
    268+            auto& peer_info = reconstructed_peer_info.try_emplace(it->m_announcer).first->second;
    


    sipa commented at 9:18 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    Nit: maybe auto& peer_info = reconstructed_peer_info[it->m_announcer]; ?


    glozow commented at 2:42 pm on June 2, 2025:
    done
  139. in src/node/txorphanage_impl.h:44 in ea3a65e698 outdated
    39+/** Default value for TxOrphanage::m_reserved_usage_per_peer. */
    40+static constexpr int64_t DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER{404'000};
    41+/** Default value for TxOrphanage::m_max_global_announcements. */
    42+static constexpr unsigned int DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS{3000};
    43+/** Minimum NodeId for lower_bound lookups (in practice, NodeIds start at 0). */
    44+static constexpr NodeId min_peer{std::numeric_limits<NodeId>::min()};
    


    sipa commented at 9:21 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    Style: MIN_PEER? It wasn't immediately clear when reading the code that this referred to a constant.


    glozow commented at 1:58 pm on June 2, 2025:
    makes sense, done
  140. in src/node/txorphanage_impl.h:300 in ea3a65e698 outdated
    295+    bool AddTx(const CTransactionRef& tx, NodeId peer)
    296+    {
    297+        const auto& wtxid{tx->GetWitnessHash()};
    298+        const auto& txid{tx->GetHash()};
    299+        // Quit if we already have this announcement (same wtxid and peer).
    300+        if (HaveTxFromPeer(wtxid, peer)) return false;
    


    sipa commented at 9:26 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    This could be avoided by checking the ret.first result of auto ret = m_orphans.get<ByWtxid>().emplace(tx, peer, m_current_sequence); below. The ByWtxid index is unique, so emplacement will fail if the same wtxid/peer combination already exists, I think.


    glozow commented at 2:43 pm on June 2, 2025:
    looks correct, done
  141. in src/node/txorphanage_impl.h:350 in ea3a65e698 outdated
    345+        // have the tx data.
    346+        if (it == m_orphans.end()) return false;
    347+        if (it->m_tx->GetWitnessHash() != wtxid) return false;
    348+
    349+        // Quit if we already have this announcement (same wtxid and peer).
    350+        if (HaveTxFromPeer(wtxid, peer)) return false;
    


    sipa commented at 9:27 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    This can be avoided by checking ret.first from the auto ret = m_orphans.get<ByWtxid>().emplace(ptx, peer, m_current_sequence); call below, because the ByWtxid index will fail if the same announcement already exists.


    glozow commented at 2:44 pm on June 2, 2025:
    yes, done
  142. in src/node/txorphanage_impl.h:454 in ea3a65e698 outdated
    449+    {
    450+        auto it = m_orphans.get<ByPeer>().lower_bound(ByPeerView{peer, true, 0});
    451+        return it != m_orphans.get<ByPeer>().end() && it->m_announcer == peer && it->m_reconsider;
    452+    }
    453+
    454+    /** If there is a tx that can be reconsidered, return it. Otherwise, return a nullptr. */
    


    sipa commented at 9:31 pm on May 28, 2025:

    In commit “feature: Add TxOrphanageImpl”

    Maybe note that if a tx is returned, it is also marked as non-reconsiderable.


    glozow commented at 2:46 pm on June 2, 2025:
    added in the documentation
  143. sipa commented at 9:31 pm on May 28, 2025: member
    Some comments already.
  144. in src/node/txorphanage_impl.h:631 in 455b4b8178 outdated
    626+            Assume(dos_score.fee > dos_score.size);
    627+
    628+            // Evict the oldest announcement from this peer, sorting non-reconsiderable before reconsiderable.
    629+            auto it_ann = m_orphans.get<ByPeer>().lower_bound(ByPeerView{worst_peer, false, 0});
    630+            Assume(it_ann->m_announcer == worst_peer);
    631+            Erase<ByPeer>(it_ann, /*cleanup_outpoints_map=*/CountWtxid(it_ann->m_tx->GetWitnessHash()) == 1);
    


    instagibbs commented at 9:39 pm on May 28, 2025:
    If you had MaxGlobalAnnouncements peers all announcing the same tx, it would end up being n^2 * logn work with CountWtxid calls?

    glozow commented at 11:03 am on June 1, 2025:
    Oof yes! Should be replaced with an "IsUnique" type of thing.

    glozow commented at 3:15 pm on June 2, 2025:
    Added an IsUnique to avoid this
  145. instagibbs commented at 9:45 pm on May 28, 2025: member

    reviewed through 455b4b817884f860cd4467f0a9be4a459e89891c

    apologies if later commits clear up my confusion

  146. in src/test/orphanage_tests.cpp:126 in c9a34d4cdb outdated
    121+    // Single peer: eviction is triggered if either limit is hit
    122+    {
    123+        // Test announcement limits
    124+        NodeId peer{8};
    125+        node::TxOrphanageImpl orphanage_low_ann(/*max_global_ann=*/1, /*reserved_peer_usage=*/TX_SIZE * 10);
    126+        node::TxOrphanageImpl orphanage_low_mem(/*max_global_ann=*/10, /*reserved_peer_usage=*/TX_SIZE + 1);
    


    instagibbs commented at 2:15 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    0        node::TxOrphanageImpl orphanage_low_mem(/*max_global_ann=*/10, /*reserved_peer_usage=*/TX_SIZE);
    

    glozow commented at 3:39 pm on June 2, 2025:
    done
  147. in src/test/orphanage_tests.cpp:143 in c9a34d4cdb outdated
    138+        orphanage_low_mem.AddTx(TXNS.at(1), peer);
    139+        BOOST_CHECK(orphanage_low_mem.CountAnnouncements() <= orphanage_low_mem.MaxGlobalAnnouncements());
    140+        BOOST_CHECK(orphanage_low_mem.TotalOrphanUsage() > orphanage_low_mem.MaxGlobalUsage());
    141+        BOOST_CHECK(orphanage_low_mem.NeedsTrim());
    142+
    143+        BOOST_CHECK_EQUAL(CheckNumEvictions(orphanage_low_mem), 1);
    


    instagibbs commented at 2:20 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    could run these checks just before additions to assert no-op


    glozow commented at 8:43 pm on June 3, 2025:
    added
  148. in src/test/orphanage_tests.cpp:182 in c9a34d4cdb outdated
    177+        orphanage.AddChildrenToWorkSet(*parents.at(0), det_rand);
    178+        BOOST_CHECK(orphanage.HaveTxToReconsider(peer));
    179+
    180+        // Add 1 more orphan, causing the orphanage to be oversize. child1 is evicted.
    181+        orphanage.AddTx(children.at(3), peer);
    182+        orphanage.LimitOrphans();
    


    instagibbs commented at 2:22 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    might as well use and assert return of CheckNumEvictions


    glozow commented at 3:41 pm on June 2, 2025:
    done
  149. in src/test/orphanage_tests.cpp:190 in c9a34d4cdb outdated
    185+        BOOST_CHECK(orphanage.HaveTx(children.at(2)->GetWitnessHash()));
    186+        BOOST_CHECK(orphanage.HaveTx(children.at(3)->GetWitnessHash()));
    187+
    188+        // Add 1 more... child2 is evicted.
    189+        orphanage.AddTx(children.at(4), peer);
    190+        orphanage.LimitOrphans();
    


    instagibbs commented at 2:22 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    might as well use and assert return of CheckNumEvictions


    glozow commented at 3:41 pm on June 2, 2025:
    done
  150. in src/test/orphanage_tests.cpp:203 in c9a34d4cdb outdated
    198+        orphanage.AddChildrenToWorkSet(*parents.at(3), det_rand);
    199+
    200+        orphanage.AddTx(children.at(5), peer);
    201+        orphanage.AddChildrenToWorkSet(*parents.at(5), det_rand);
    202+
    203+        orphanage.LimitOrphans();
    


    instagibbs commented at 2:23 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    might as well use and assert return of CheckNumEvictions


    glozow commented at 3:43 pm on June 2, 2025:
    done
  151. in src/test/orphanage_tests.cpp:214 in c9a34d4cdb outdated
    209+        // The first transaction returned from GetTxToReconsider is the older one, not the one that was marked for
    210+        // reconsideration earlier.
    211+        // Transactions are marked non-reconsiderable again when returned through GetTxToReconsider
    212+        BOOST_CHECK_EQUAL(orphanage.GetTxToReconsider(peer), children.at(3));
    213+        orphanage.AddTx(children.at(6), peer);
    214+        orphanage.LimitOrphans();
    


    instagibbs commented at 2:23 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    might as well use and assert return of CheckNumEvictions


    glozow commented at 3:43 pm on June 2, 2025:
    done
  152. in src/test/orphanage_tests.cpp:226 in c9a34d4cdb outdated
    221+        BOOST_CHECK_EQUAL(orphanage.GetTxToReconsider(peer), children.at(5));
    222+    }
    223+
    224+    // Multiple peers: when limit is exceeded, we choose the DoSiest peer and evict their oldest transaction.
    225+    {
    226+        NodeId peer0{0};
    


    instagibbs commented at 2:25 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    nit: s/peer0/dos_peer/


    glozow commented at 4:18 pm on June 2, 2025:
    done, peer_dosy
  153. in src/test/orphanage_tests.cpp:240 in c9a34d4cdb outdated
    235+        for (unsigned int i{0}; i < max_announcements; ++i) {
    236+            orphanage.AddTx(TXNS.at(i), peer0);
    237+            BOOST_CHECK_EQUAL(CheckNumEvictions(orphanage), 0);
    238+        }
    239+        BOOST_CHECK_EQUAL(orphanage.AnnouncementsFromPeer(peer0), max_announcements);
    240+        BOOST_CHECK_EQUAL(orphanage.AnnouncementsFromPeer(peer1), 0);
    


    instagibbs commented at 2:26 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    0        BOOST_CHECK_EQUAL(orphanage.AnnouncementsFromPeer(peer1), 0);
    1        BOOST_CHECK_EQUAL(orphanage.AnnouncementsFromPeer(peer2), 0);
    

    glozow commented at 4:19 pm on June 2, 2025:
    done
  154. in src/test/orphanage_tests.cpp:357 in c9a34d4cdb outdated
    318+        BOOST_CHECK_EQUAL(orphanage.MaxGlobalAnnouncements(), node::DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS);
    319+        BOOST_CHECK_EQUAL(orphanage.ReservedPeerUsage(), node::DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER);
    320+        BOOST_CHECK_EQUAL(orphanage.MaxGlobalUsage(), node::DEFAULT_RESERVED_ORPHAN_WEIGHT_PER_PEER * 3);
    321+        BOOST_CHECK_EQUAL(orphanage.MaxPeerAnnouncements(), node::DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS / 3);
    322+
    323+        // Number of peers didn't change.
    


    instagibbs commented at 2:32 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    EraseForPeer and EraseTx coverage here would be good. IIUC max memory goes up IFF a peer has a live orphan announcement?


    glozow commented at 4:27 pm on June 2, 2025:
    added
  155. in src/test/orphanage_tests.cpp:93 in c9a34d4cdb outdated
    89@@ -89,6 +90,244 @@ static bool EqualTxns(const std::set<CTransactionRef>& set_txns, const std::vect
    90     return true;
    91 }
    92 
    93+unsigned int CheckNumEvictions(node::TxOrphanageImpl& orphanage)
    


    instagibbs commented at 2:33 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    Single scenario that boots out 2+ txns with a single CheckNumEvictions seems apt


    glozow commented at 8:43 pm on June 3, 2025:
    added a test that has 10 evictions in 1 go
  156. in src/test/orphanage_tests.cpp:278 in c9a34d4cdb outdated
    273+            // Evictions are FIFO within a peer, so the ith transaction sent by peer0 is the one that was evicted.
    274+            BOOST_CHECK(!orphanage.HaveTxFromPeer(TXNS.at(i)->GetWitnessHash(), peer0));
    275+            BOOST_CHECK(orphanage.HaveTx(TXNS.at(i)->GetWitnessHash()));
    276+        }
    277+
    278+        // With 6 peers, each can add 10, and still only peer0's orphans are evicted.
    


    instagibbs commented at 2:34 pm on May 29, 2025:

    c9a34d4cdb5f5eca385304dc5c836960fad2a74a

    Would like test coverage for an "alternation" to deleting another peer's announcement to demonstrate heap behavior. IIUC this test is only doing peer0


    glozow commented at 7:11 pm on June 5, 2025:
    Added a test that has interleaved worst peers in a LimitOrphans call
  157. in src/test/fuzz/txorphan.cpp:248 in f6c4f1ed3e outdated
    243+
    244+    // Peer that must have orphans protected from eviction
    245+    NodeId honest_peerid{0};
    246+
    247+    // We have NUM_PEERS, of which Peer==0 is the "honest" one
    248+    // who will never exceed their reserved weight of announcement
    


    instagibbs commented at 2:37 pm on May 29, 2025:

    f6c4f1ed3e353d6bf1f4372adb63b1906d18890a

    0    // who will never exceed their reserved weight or announcement
    

    glozow commented at 3:36 pm on June 2, 2025:
    done
  158. in src/test/fuzz/txorphan.cpp:291 in f6c4f1ed3e outdated
    288+            }
    289+            // output amount or spendability will not affect txorphanage
    290+            tx_mut.vout.reserve(num_out);
    291+            for (uint32_t i = 0; i < num_out; i++) {
    292+                const auto payload_size = fuzzed_data_provider.ConsumeIntegralInRange<unsigned int>(1, 100000);
    293+                if (payload_size) {
    


    instagibbs commented at 2:41 pm on May 29, 2025:

    f6c4f1ed3e353d6bf1f4372adb63b1906d18890a

    this conditional is always taken


    glozow commented at 3:37 pm on June 2, 2025:
    changed
  159. in src/test/fuzz/txorphan.cpp:268 in f6c4f1ed3e outdated
    266+
    267+    // initial outpoints used to construct transactions later
    268+    for (uint8_t i = 0; i < 4; i++) {
    269+        outpoints.emplace_back(Txid::FromUint256(uint256{i}), 0);
    270+    }
    271+
    


    instagibbs commented at 2:43 pm on May 29, 2025:
    0
    1    // This set of wtxids are honest peer's live announcements that must be protected
    

    glozow commented at 3:36 pm on June 2, 2025:
    added
  160. in src/test/fuzz/txorphan.cpp:372 in f6c4f1ed3e outdated
    367+
    368+                    Assert(orphanage.CountAnnouncements() <= global_announcement_limit);
    369+                    Assert(orphanage.TotalOrphanUsage() <= per_peer_weight_reservation * NUM_PEERS);
    370+
    371+                    // This should never differ before and after since we aren't allowing
    372+                    // expiries and we've never exceeded the per-peer reservations.
    


    instagibbs commented at 2:51 pm on May 29, 2025:

    f6c4f1ed3e353d6bf1f4372adb63b1906d18890a

    expiries aren't a thing anymore


    glozow commented at 3:38 pm on June 2, 2025:
    that's what I meant to say, but yeah probably not helpful to mention
  161. instagibbs commented at 3:09 pm on May 29, 2025: member
    reviewed through ea3a65e698f519afee23484ce1b399e9a4c62529
  162. in src/node/txorphanage_impl.h:500 in 455b4b8178 outdated
    507+        // transactions are added first. Doing so helps avoid work when one of the orphans replaced
    508+        // an earlier one. Since we require the NodeId to match, one peer's announcement order does
    509+        // not bias how we process other peer's orphans.
    510+        auto& index_by_peer = m_orphans.get<ByPeer>();
    511+        auto it_upper = index_by_peer.upper_bound(ByPeerView{peer, true, std::numeric_limits<uint64_t>::max()});
    512+        auto it_lower = index_by_peer.lower_bound(ByPeerView{peer, false, 0});
    


    instagibbs commented at 3:46 pm on May 29, 2025:
    what if the peer sent us the first non-reconsidered orphan? would that not be considered below due to sequence number being 0?

    glozow commented at 3:05 pm on June 2, 2025:
    That element would be equal to it_lower. I guess the concern is if we skip that element in the while (rit != rit_end) loop? rit_end is one past the last element. Here's a diagram from cpp reference: https://saco-evaluator.org.za/docs/cppreference/en/cpp/iterator/rbegin.html
  163. glozow renamed this:
    p2p: improve TxOrphanage denial of service bounds and increase -maxorphantxs
    p2p: improve TxOrphanage denial of service bounds
    on Jun 1, 2025
  164. [txorphanage] change type of usage to int64_t
    Since this field holds a total number of bytes, overflow is within the
    realm of possibility. Use int64 to be safe.
    6eaba06b1a
  165. [prep/refactor] move txorphanage to node namespace and directory
    This is move-only.
    fd5ba7a3db
  166. [prep/rpc] remove entry and expiry time from getorphantxs
    Expiry is going away in a later commit.
    This is only an RPC change. Behavior of the orphanage does not change.
    Note that getorphantxs is marked experimental.
    c70097bf69
  167. glozow force-pushed on Jun 2, 2025
  168. glozow commented at 4:36 pm on June 2, 2025: member
    Thanks for the review! Just addressed most comments, still have a few more to get to
  169. glozow force-pushed on Jun 2, 2025
  170. DrahtBot added the label CI failed on Jun 2, 2025
  171. DrahtBot commented at 4:41 pm on June 2, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/43322556029 LLM reason (✨ experimental): Lint check failed due to trailing whitespace in src/test/orphanage_tests.cpp.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  172. [prep/test] modify test to not access TxOrphanage internals
    These internals should and will be private.
    3662c5b5ec
  173. [prep/config] remove -maxorphantx
    The orphanage will no longer have a maximum number of unique orphans.
    789e38ff22
  174. [prep/refactor] move DEFAULT_MAX_ORPHAN_TRANSACTIONS to txorphanage.h
    This is move only.
    b5c14b8e7a
  175. [prep/test] have TxOrphanage remember its own limits in LimitOrphans
    Move towards a model where TxOrphanage is initialized with limits that
    it remembers throughout its lifetime.
    Remove the param. Limiting by number of unique orphans will be removed
    in a later commit.
    Now that -maxorphantx is gone, this does not change the node behavior.
    The parameter is only used in tests.
    1225e06846
  176. in src/node/txorphanage.h:31 in 3da112f33b outdated
    27@@ -30,7 +28,11 @@ static const uint32_t DEFAULT_MAX_ORPHAN_TRANSACTIONS{100};
    28  * Not thread-safe. Requires external synchronization.
    29  */
    30 class TxOrphanage {
    31+    const std::unique_ptr<TxOrphanageImpl> m_impl;
    


    sipa commented at 4:53 pm on June 2, 2025:

    What do you think about using the "exposed std::unique_ptr<>" pattern (don't know if it has a name) as opposed to full-blown pimpl?

    So like TxGraph / TxGraphImpl, I'm suggesting there would be 2 files (no _impl.h), with a TxOrphanage abstract class with virtual member functions, and an implementation defined only inside the .cpp file, and a factory function std::unique_ptr<TxOrphanage> MakeTxOrphanage() or so that returns a newly-constructed TxOrphanageImpl. It avoids the boilerplate of TxOrphanage functions that forward to the impl code, and the need for a _impl.h file. A downside is that it's (probably negligibly) slower to dispatch (because virtual functions), and an inability to test the implementation beyond what the public interface offers, but I don't think that's happening here anyway?


    glozow commented at 5:02 pm on June 3, 2025:
    Sure, yes. I'll add a commit to the beginning to refactor to this structure + change all clients to do MakeTxOrphanage, and then swap out the Impls basically.

    glozow commented at 8:53 pm on June 3, 2025:
    Restructured this way, with the help of 🪄 ✨AI✨
  177. glozow force-pushed on Jun 3, 2025
  178. glozow force-pushed on Jun 3, 2025
  179. DrahtBot removed the label CI failed on Jun 3, 2025
  180. in src/test/fuzz/txorphan.cpp:232 in 3322f9301e outdated
    227@@ -228,3 +228,154 @@ FUZZ_TARGET(txorphan, .init = initialize_orphanage)
    228     }
    229     orphanage->SanityCheck();
    230 }
    231+
    232+FUZZ_TARGET(txorphan_protected, .init = initialize_orphanage)
    


    instagibbs commented at 2:37 pm on June 5, 2025:
    Had a thought: this harness could be relaxed to where the fuzzer input selects the protected peers, which could be any subset of all the peers. This would allow coverage of a scenario where no peers exceed reservation limits, and possibly stronger assertions in that case.

    glozow commented at 7:11 pm on June 5, 2025:
    Good idea, done
  181. [prep/refactor] make TxOrphanage a virtual class implemented by TxOrphanageImpl d85124825b
  182. [p2p] improve TxOrphanage DoS limits
    This is largely a reimplementation using boost::multi_index_container.
    All the same public methods are available. It has an index by outpoint,
    per-peer tracking, peer worksets, etc.
    
    A few differences:
    - Limits have changed: instead of a global limit of 100 unique orphans,
      we have a maximum number of announcements (which can include duplicate
    orphans) and a global memory limit which scales with the number of
    peers.
    - The maximum announcements limit is 100 to match the original limit,
      but this is actually a stricter limit because the announcement count
    is not de-duplicated.
    - Eviction strategy: when global limits are reached, a per-peer limit
      comes into play. While limits are exceeded, we choose the peer whose
    "DoS score" (max usage / limit ratio for announcements and memory
    limits) is highest and evict announcements by entry time, sorting
    non-reconsiderable ones before reconsiderable ones. Since announcements
    are unique by (wtxid, peer), as long as 1 announcement remains for a
    transaction, it remains in the orphanage.
    - This eviction strategy means no peer can influence the eviction of
      another peer's orphans.
    - Also, since global limits are a multiple of per-peer limits, as long
      as a peer does not exceed its limits, its orphans are protected from
    eviction.
    - Orphans no longer expire, since older announcements are generally
      removed before newer ones.
    - GetChildrenFromSamePeer returns the transactions from newest to
      oldest.
    498f1c0191
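The eviction order above hinges on the per-peer "DoS score", the worse of a peer's two usage/limit ratios. A minimal standalone sketch of how such a score could be computed and compared without floating point (`Ratio`, `RatioLess`, and `DosScore` are illustrative names, not the PR's; the PR reuses an integer fraction type for the comparison):

```cpp
#include <cassert>
#include <cstdint>

// A peer's "DoS score" is the larger of its two usage/limit ratios:
// announcement count vs. the announcement limit, and memory usage vs. the
// memory limit. Trimming targets the peer with the highest score.
struct Ratio {
    int64_t num; // usage
    int64_t den; // limit (> 0)
};

// Compare l.num/l.den < r.num/r.den by cross-multiplying, avoiding floats.
bool RatioLess(const Ratio& l, const Ratio& r)
{
    return l.num * r.den < r.num * l.den;
}

Ratio DosScore(int64_t ann_count, int64_t ann_limit, int64_t mem_usage, int64_t mem_limit)
{
    const Ratio ann{ann_count, ann_limit};
    const Ratio mem{mem_usage, mem_limit};
    return RatioLess(ann, mem) ? mem : ann;
}
```

A score above 1 (num > den) means the peer exceeds at least one of its limits, which is the condition under which trimming can select it; a peer staying within its reservation never reaches that threshold.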
  183. [cleanup] remove unused rng param from LimitOrphans 5c43a637d9
  184. [cleanup] replace TxOrphanage::Size() with CountUniqueOrphans 2d17b9fb3b
  185. [unit test] basic TxOrphanageImpl eviction and protection 5d94c6f351
  186. [unit test] strengthen GetChildrenFromSamePeer tests: results are in recency order 3b4e2eabce
  187. glozow force-pushed on Jun 5, 2025
  188. [fuzz] txorphanage_impl protection harness
    This fuzzer specifically tests protection of an honest peer's orphans.
    
    Co-authored-by: Greg Sanders <gsanders87@gmail.com>
    05e6241be6
  189. [p2p] bump DEFAULT_MAX_ORPHAN_ANNOUNCEMENTS to 3000
    For the default number of peers (125), this allows each peer to relay a
    default descendant package of transactions (up to 25-1=24 of which can
    be missing inputs) out of order.
    
    Functional tests aren't changed to check for a cap of 3000
    because it would make the runtime too long.
    
    Also deletes the now-unused DEFAULT_MAX_ORPHAN_TRANSACTIONS.
    5e86bb8b29
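The 3000 figure is just the product of the two defaults mentioned in the commit message; a compile-time check makes the arithmetic explicit (a standalone sketch, with constant names chosen here rather than taken from the codebase):

```cpp
// With the default connection limit of 125 peers, each peer can announce one
// default-sized descendant package: 25 transactions, of which all but the
// in-mempool ancestor (25 - 1 = 24) may be orphans missing inputs.
constexpr int kDefaultMaxPeerConnections{125};
constexpr int kDefaultDescendantLimit{25};
constexpr int kMaxOrphansPerPackage{kDefaultDescendantLimit - 1};

static_assert(kDefaultMaxPeerConnections * kMaxOrphansPerPackage == 3000,
              "per-peer package budget times peer count gives the global cap");
```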
  190. [prep/test] restart instead of bumpmocktime between p2p_orphan_handling subtests
    If we want to restart at all during the tests, we can't have future timestamps.
    072a77f2b4
  191. [functional test] orphan resolution works in the presence of DoSy peers
    Co-authored-by: Greg Sanders <gsanders87@gmail.com>
    0d511965c9
  192. glozow force-pushed on Jun 5, 2025
  193. DrahtBot added the label CI failed on Jun 5, 2025
  194. DrahtBot commented at 7:16 pm on June 5, 2025: contributor

    🚧 At least one of the CI tasks failed.
    Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/43568906845
    LLM reason (✨ experimental): The failure is due to a missing include of <bitset> in the source file.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  195. DrahtBot removed the label CI failed on Jun 5, 2025
  196. glozow commented at 3:54 pm on June 6, 2025: member
    All comments addressed, ready for review. Will write up some notes for a review club soon.
  197. in src/node/txorphanage.cpp:14 in 498f1c0191 outdated
     7@@ -8,68 +8,159 @@
     8 #include <logging.h>
     9 #include <policy/policy.h>
    10 #include <primitives/transaction.h>
    11+#include <util/feefrac.h>
    12 #include <util/time.h>
    13 
    14+#include <boost/multi_index/indexed_by.hpp>
    


    sipa commented at 6:00 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    Nitty McNitface: commit title undersells it a bit. Maybe “Overhaul TxOrphanage with smarter limits” or so?

  198. in src/node/txorphanage.cpp:94 in 498f1c0191 outdated
    129-    std::vector<OrphanMap::iterator> m_orphan_list;
    130+    /** Number of unique orphans by wtxid. Less than or equal to the number of entries in m_orphans. */
    131+    unsigned int m_unique_orphans{0};
    132+
    133+    /** Total bytes used by orphans, deduplicated by wtxid. */
    134+    unsigned int m_unique_orphan_bytes{0};
    


    sipa commented at 6:02 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    I see in other places int64_t is used for expressing usage (like PeerDosInfo::m_total_usage). Maybe better to pick one type for all places that involve usage? Or even introduce some type aliases for things like announcementcounts/txcounts/memusages, as that may improve readability too.

  199. in src/node/txorphanage.cpp:184 in 498f1c0191 outdated
    182     int64_t UsageByPeer(NodeId peer) const override;
    183     void SanityCheck() const override;
    184 };
    185 
    186-bool TxOrphanageImpl::AddTx(const CTransactionRef& tx, NodeId peer)
    187+/** Erase from m_orphans and update m_peer_orphanage_info.
    


    sipa commented at 6:11 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    Nit: this function documentation may be more appropriate near its declaration inline in TxOrphanageImpl, rather than here with its implementation.

    (and elsewhere below)

  200. in src/node/txorphanage.cpp:203 in 498f1c0191 outdated
    207+    Assume(peer_it != m_peer_orphanage_info.end());
    208+    if (peer_it->second.Subtract(*it)) m_peer_orphanage_info.erase(peer_it);
    209+
    210+    if (cleanup_outpoints_map) {
    211+        m_unique_orphans -= 1;
    212+        m_unique_orphan_bytes -= it->GetUsage();
    


    sipa commented at 8:30 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    Nit: if cleanup_outpoints_map also controls whether m_unique_orphans and m_unique_orphan_bytes gets updated, maybe it’s better to call it “was_last_for_wtxid” or so?

    Alternatively, it seems possible to just drop the argument, and have this function call IsUnique directly. That would remove the responsibility from the caller. The only caller that doesn’t already do this is EraseTx, but even there it’s correct I think, and only negligibly slower?

  201. in src/node/txorphanage.cpp:393 in 498f1c0191 outdated
    490+    while (it != index_by_peer.end() && it->m_announcer == peer) {
    491+        // Decide what will happen next before the iter is invalidated.
    492+        auto it_next = std::next(it);
    493+
    494+        // Delete item, cleaning up m_outpoint_to_orphan_it iff this entry is unique by wtxid.
    495+        Erase<ByPeer>(it, /*cleanup_outpoints_map=*/IsUnique(it->m_tx->GetWitnessHash()));
    


    sipa commented at 8:36 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    I think IsUnique(it) would work here too, and be more efficient? Or move into Erase, see comment there.

  202. in src/node/txorphanage.cpp:463 in 498f1c0191 outdated
    575+            while (NeedsTrim()) {
    576+                if (!Assume(it_ann->m_announcer == worst_peer)) break;
    577+                // Decide what will happen next before the iter is invalidated.
    578+                auto it_next = std::next(it_ann);
    579+
    580+                Erase<ByPeer>(it_ann, /*cleanup_outpoints_map=*/IsUnique(it_ann->m_tx->GetWitnessHash()));
    


    sipa commented at 8:37 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    IsUnique(it_ann) here too? (or move it inside Erase, see comment there).

  203. in src/node/txorphanage.cpp:499 in 498f1c0191 outdated
    635-                // Get this source peer's work set, emplacing an empty set if it didn't exist
    636-                // (note: if this peer wasn't still connected, we would have removed the orphan tx already)
    637-                std::set<Wtxid>& orphan_work_set = m_peer_orphanage_info.try_emplace(announcer).first->second.m_work_set;
    638-                // Add this tx to the work set
    639-                orphan_work_set.insert(elem->first);
    640+                const auto num_announcers{CountWtxid(wtxid)};
    


    sipa commented at 8:44 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    Calling lower_bound to initialize it, and then calling CountWtxid(wtxid) which will do the same seems superfluous.

    How about:

    auto it = index_by_wtxid.lower_bound(ByWtxidView{wtxid, MIN_PEER});
    auto it_end = index_by_wtxid.upper_bound(ByWtxidView{wtxid, MAX_PEER});
    const auto num_announcers{std::distance(it, it_end)};
    if (num_announcers == 0) continue;
    std::advance(it, rng.randrange(num_announcers));
    

    This would perhaps even allow you to get rid of CountWtxid, as all remaining calls are in Assume(CountWtxid(wtxid) > 1);.

  204. in src/node/txorphanage.cpp:541 in 498f1c0191 outdated
    696-        }
    697+    auto it = m_orphans.get<ByPeer>().lower_bound(ByPeerView{peer, true, 0});
    698+    if (it != m_orphans.get<ByPeer>().end() && it->m_announcer == peer && it->m_reconsider) {
    699+        // Flip m_reconsider. Even if this transaction stays in orphanage, it shouldn't be
    700+        // reconsidered again until there is a new reason to do so.
    701+        auto mark_reconsidered_modifier = [](auto& ann) { ann.m_reconsider = false; };
    


    sipa commented at 8:49 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    Nit: static constexpr auto mark_reconsidered_modifier = ...

  205. in src/node/txorphanage.cpp:505 in 498f1c0191 outdated
    641+                if (!Assume(num_announcers > 0)) continue;
    642+                std::advance(it, rng.randrange(num_announcers));
    643+                if (!Assume(it->m_tx->GetWitnessHash() == wtxid)) break;
    644+
    645+                // Mark this orphan as ready to be reconsidered.
    646+                auto mark_reconsidered_modifier = [](auto& ann) { ann.m_reconsider = true; };
    


    sipa commented at 8:50 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    Nit: static constexpr auto mark_reconsidered_modifier = ...

  206. in src/node/txorphanage.cpp:362 in 498f1c0191 outdated
    423+    auto it = index_by_wtxid.lower_bound(ByWtxidView{wtxid, MIN_PEER});
    424+    if (it == index_by_wtxid.end()) return 0;
    425+
    426+    unsigned int num_erased{0};
    427+    const auto& txid = it->m_tx->GetHash();
    428+    while (it != index_by_wtxid.end() && it->m_tx->GetWitnessHash() == wtxid) {
    


    sipa commented at 8:56 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    This loop could also be written as:

    auto it = index_by_wtxid.lower_bound(ByWtxidView{wtxid, MIN_PEER});
    auto it_end = index_by_wtxid.upper_bound(ByWtxidView{wtxid, MAX_PEER});
    unsigned int num_erased{0};
    while (it != it_end) {
        Erase<ByWtxid>(it++, num_erased == 0);
        ++num_erased;
    }
    

    (also applies to EraseForPeer)

  207. in src/node/txorphanage.cpp:379 in 498f1c0191 outdated
    457-    m_orphan_list.pop_back();
    458-
    459-    m_orphans.erase(it);
    460-    return 1;
    461+
    462+    return std::min<unsigned int>(1, num_erased);
    


    sipa commented at 9:01 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    If the if (it == index_by_wtxid.end()) return 0; above is kept, then I think this is equivalent to return 1;.

  208. in src/node/txorphanage.cpp:375 in 498f1c0191 outdated
    444-        // Unless we're deleting the last entry in m_orphan_list, move the last
    445-        // entry to the position we're deleting.
    446-        auto it_last = m_orphan_list.back();
    447-        m_orphan_list[old_pos] = it_last;
    448-        it_last->second.list_pos = old_pos;
    449+    if (num_erased > 0) {
    


    sipa commented at 9:01 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    If the if (it == index_by_wtxid.end()) return 0; above is kept, then I think this condition is always true.

  209. in src/node/txorphanage.cpp:353 in 498f1c0191 outdated
    393+
    394+    Assume(CountWtxid(wtxid) > 1);
    395+    return true;
    396 }
    397 
    398 int TxOrphanageImpl::EraseTx(const Wtxid& wtxid)
    


    sipa commented at 9:04 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    This function only ever returns 0 or 1. Maybe make its return type bool?

  210. in src/node/txorphanage.cpp:428 in 498f1c0191 outdated
    540+    std::vector<std::pair<NodeId, FeeFrac>> heap_peer_dos;
    541+    heap_peer_dos.reserve(m_peer_orphanage_info.size());
    542+    for (const auto& [nodeid, entry] : m_peer_orphanage_info) {
    543+        heap_peer_dos.emplace_back(nodeid, entry.GetDosScore(max_ann, max_mem));
    544+    }
    545+    auto compare_score = [](auto left, auto right) { return left.second < right.second; };
    


    sipa commented at 9:08 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    Nit: static constexpr auto compare_score =

    Also, the arguments should probably be auto&, though the copy will probably be optimized away anyway.

  211. in src/node/txorphanage.cpp:452 in 498f1c0191 outdated
    564+
    565+        // If needs trim, then at least one peer has a DoS score higher than 1.
    566+        Assume(dos_score.fee > dos_score.size);
    567+
    568+        auto it_worst_peer = m_peer_orphanage_info.find(worst_peer);
    569+        if (it_worst_peer != m_peer_orphanage_info.end()) {
    


    sipa commented at 9:09 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    Is it possible that this condition is not true?

  212. in src/node/txorphanage.cpp:468 in 498f1c0191 outdated
    580+                Erase<ByPeer>(it_ann, /*cleanup_outpoints_map=*/IsUnique(it_ann->m_tx->GetWitnessHash()));
    581+                num_erased += 1;
    582+                it_ann = it_next;
    583+
    584+                // If we erased the last orphan from this peer, it_worst_peer will be invalidated.
    585+                it_worst_peer = m_peer_orphanage_info.find(worst_peer);
    


    sipa commented at 9:14 pm on June 6, 2025:

    In commit “[p2p] improve TxOrphanage DoS limits”

    I don’t think this search is necessary: whether any announcements from the same peer remain can be determined by checking whether it_ann is end() or now refers to another peer.

    (Feel free to disregard, if you think the current code is sufficiently cleaner; I don’t think the performance difference matters)

  213. sipa commented at 9:16 pm on June 6, 2025: member
    Made it through most of the big commit.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-08 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me