This replaces the transaction request logic with an encapsulated class that maintains all the state surrounding it. By keeping it stand alone, it can be easily tested (using included unit tests and fuzz tests).
The new logic is close in spirit to the current one, but much easier to specify I think:
- Always prefer outbound peers over inbound peers, as long as any are available.
- Don’t download from inbound peers during the first 2 s after announcement, to give time for announcements from outbound peers to come in.
- The peer who is the very first to announce a txid is preferred (compared to other inbounds or other outbounds), so that chains of dependent transactions announced by one peer will generally not be split out to other peers.
- After the peer that announces a txid first, requests go to uniformly selected peers (outbounds first; inbounds that are past their 2 s delay afterwards), to minimize the impact an attacker can have (as attackers can easily race the network and be first).
- A txid is never requested twice from the same peer, as long as other candidates remain (even when reannounced).
- Transactions are requested from new candidates as soon as old requests expire, or NOTFOUND is received, or invalid-witness transactions are received (i.e. this includes the functionality added by, and replaces, #18238).
One significant change is the removal of the in-flight limit of 100 txids. I think this limit is not exactly desirable (e.g. when a txid is only announced by one peer, we should request it, regardless of how many other txids we’re already waiting for), and with the changes to handle NOTFOUND immediately, there is less problems with high limits. This implementation’s performance is also only determined by the number of total transaction/peer pairs tracked (O(log n)), not by the number in-flight, removing another reason for having that limit.
A few further improvements could be investigated and relatively easily added if desirable:
- Biasing away from peers with lots of in-flight requests (but not outright limiting like now).
- Making the 2s inbound delay Poisson-distributed.
- Adding a wtxid flag to entries to make it compatible with #18044.
- Adding a time limit for the time between announcement and request.
While the implementation is non-trivial due to performance and resource usage considerations, it has a significantly simpler exact specification. This specification is documented in src/txrequest.h, and reimplemented naively in the src/test/fuzz/txrequest.cpp fuzz test. It may be a good idea to review those first to understand the desired semantics before digging into the optimized implementation.