This PR is part of the orphan resolution project, see #27463.
We want to limit the CPU work and memory used by TxOrphanage
to avoid denial of service attacks. On master, this is achieved by limiting the number of transactions in this data structure to 100, and the weight of each transaction to 400KWu (the largest standard tx) [0]. We always allow new orphans, but if the addition causes us to exceed 100, we evict one randomly. This is dead simple, but has problems:
- It makes the orphanage trivially churnable: any one peer can render it useless by spamming us with lots of orphans. It’s possible this is happening: “Looking at data from node alice on 2024-09-14 shows that weβre sometimes removing more than 100k orphans per minute. This feels like someone flooding us with orphans.” [1]
- Effectively, opportunistic 1p1c is useless in the presence of adversaries: it is opportunistic and pairs a low feerate tx with a child that happens to be in the orphanage. So if nothing is able to stay in orphanages, we can’t expect 1p1cs to propagate.
- This number is also often insufficient for the volume of orphans we handle: historical data show that overflows are pretty common, and there are times where “it seems like [the node] forgot about the orphans and re-requested them multiple times.” [1]
Just jacking up the -maxorphantxs
number is not a good enough solution, because it doesn’t solve the churnability problem, and the effective resource bounds scale poorly.
This PR introduces numbers for {global, per-peer} {memory usage, announcements}, representing resource limits:
- The (constant) global announcement limit is the number of unique (wtxid, peer) pairs in the orphanage [2]. This represents a cap on CPU, and does not change with the number of peers we have. Evictions must happen whenever this limit is reached.
- The (variable) per-peer announcement limit is the global announcement limit divided by the number of peers. Peers are allowed to exceed this limit provided the global announcement limit has not been reached. The per-peer announcement limit decreases with more peers.
- The (constant) per-peer memory usage reservation is the amount of orphan weight [3] reserved per peer [4]. Reservation means that peers are effectively guaranteed this amount of space. It is not a limit; peers are allowed to exceed this limit provided the global usage limit is not reached.
- The (variable) global memory usage limit is the number of peers multiplied by the per-peer reservation [5]. As such, the global memory usage limit scales up with the number of peers we have. Evictions must happen whenever this limit is reached.
- We introduce a “Peer DoS Score” which is the maximum between its “CPU Score” and “Memory Score.” The CPU score is the ratio between the number of orphans announced by this peer / peer announcement limit. The memory score is the total usage of all orphans announced by this peer / peer usage reservation.
Eviction changes in a few ways:
- It is triggered if any of these 3 conditions are true: (1)
-maxorphantxs
is exceeded (2) the global announcement limit is exceeded (3) the global memory usage limit is exceeded. - On each iteration of the loop, instead of selecting a random orphan, select a peer and delete 1 of its items. Specifically, select the “DoSiest peer” i.e. the one with the highest DoS score. This is the maximum between its CPU DoS score (based on announcements) and Memory DoS score (based on tx weight). After the peer has been selected, we evict a random orphan.
- Instead of evicting orphans, we evict announcements, i.e., we don’t erase the orphan unless this peer is its only announcer. Of course, over the course of several iteration loops, we may erase all announcers and then the orphan itself. The purpose of this change is to prevent a peer from being able to churn another peer’s announcers.
This PR also:
- Bounds the number of transactions that can be in a peer’s work set by ensuring it is a subset of the peer’s announcements.
- Halts processing at 100 transactions within
EraseForBlock
. Theoretically, it’s possible that every transaction in the orphanage spends inputs that are in the block, and we don’t wantEraseForBlock
to run for too long. - Changes the default
-maxorphantxs
to the global announcement limit.
This means we can receive 1p1c packages in the presence of spammy peers. It also makes the orphanage more useful and increases our download capacity without drastically increasing orphanage resource usage.
[0]: This means the effective memory limit in orphan weight is 100 * 400KWu = 40MWu
[1]: https://delvingbitcoin.org/t/stats-on-orphanage-overflows/1421
[2]: Limit is 3000, which is equivalent to one max size ancestor package (24 transactions can be missing inputs) for each peer (default max connections is 125).
[3]: Orphan weight is used in place of actual memory usage because something like “one maximally sized standard tx” is easier to reason about than “considering the bytes allocated for vin and vout vectors, it needs to be within N bytes…” etc. We can also consider a different formula to encapsulate more the memory overhead but still have an interface that is easy to reason about.
[4]: The limit is 404KWu, which is the maximum size of an ancestor package.
[5]: With 125 peers, this is 50.5MWu, which is a small increase from the existing limit of 40MWu. While the actual memory usage limit is higher (this number does not include the other memory used by TxOrphanage
to store the outpoints map, etc.), this is within the same ballpark as the old limit.