multi-peer orphan resolution followups

glozow commented at 11:49 pm on January 15, 2025: member

Followup to #31397.

Addressing (in order): #31397 (review) #31397 (review) #31397 (review) #31397 (review) #31397 (review) #31397 (review) #31397 (review) #31658#pullrequestreview-2551617694 #31397 (review)

glozow added the label P2P on Jan 15, 2025

DrahtBot commented at 11:49 pm on January 15, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31666.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	instagibbs, marcofleon, mzumsande, dergoegge

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

glozow force-pushed on Jan 16, 2025

glozow marked this as ready for review on Jan 16, 2025

DrahtBot added the label CI failed on Jan 17, 2025

glozow force-pushed on Jan 18, 2025

glozow commented at 1:42 pm on January 18, 2025: member

Rebased for #31675 which fixes https://cirrus-ci.com/task/5200590696873984

DrahtBot removed the label CI failed on Jan 18, 2025

fanquake requested review from instagibbs on Jan 21, 2025

fanquake requested review from mzumsande on Jan 21, 2025

in src/node/txdownloadman_impl.h:197 in b0446cc347 outdated

193@@ -194,6 +194,12 @@ class TxDownloadManagerImpl {
194     /** Helper for getting deduplicated vector of Txids in vin. */
195     std::vector<Txid> GetUniqueParents(const CTransaction& tx);
196 
197+    /** If a transaction is an orphan resolution candidate, treat the unique_parents as announced by this peer; add them

instagibbs commented at 6:20 pm on January 21, 2025:

they are added as new invs IFF they are deemed good candidates for resolution

glozow commented at 11:00 pm on January 21, 2025:

thanks, edited

instagibbs commented at 11:00 pm on January 21, 2025:

Thanks for the update even though on second reading it was already ok :)

glozow commented at 11:03 pm on January 21, 2025:

🫡 I am p easy-going

in src/txrequest.h:200 in ac880a0b16 outdated

195@@ -196,7 +196,7 @@ class TxRequestTracker {
196     size_t Size() const;
197 
198     /** For some tx return all peers with non-COMPLETED announcements for its txid or wtxid. The resulting vector may contain duplicate NodeIds. */
199-    std::vector<NodeId> GetCandidatePeers(const CTransactionRef& tx) const;
200+    void GetCandidatePeers(const uint256& txhash, std::vector<NodeId>& result_peers) const;

instagibbs commented at 6:26 pm on January 21, 2025:

documentation should note that the list is appended to

glozow commented at 11:00 pm on January 21, 2025:

thanks, fixed

instagibbs commented at 6:53 pm on January 21, 2025: member

a couple documentation nits only

glozow force-pushed on Jan 21, 2025

instagibbs approved

instagibbs commented at 11:01 pm on January 21, 2025: member

ACK 47f269b3375288c53a02bdb894de83cfa39c81c4

fanquake requested review from marcofleon on Jan 22, 2025

marcofleon commented at 12:31 pm on January 22, 2025: contributor

Nice improvements in txdownloadman_impl, definitely easier to read.

code review ACK 47f269b3375288c53a02bdb894de83cfa39c81c4

in src/test/fuzz/txrequest.cpp:308 in 445bdaf45a outdated

294@@ -295,6 +295,12 @@ class Tester
295                 tracked += m_announcements[txhash][peer].m_state != State::NOTHING;
296                 inflight += m_announcements[txhash][peer].m_state == State::REQUESTED;
297                 candidates += m_announcements[txhash][peer].m_state == State::CANDIDATE;
298+
299+                std::vector<NodeId> candidate_peers;
300+                m_tracker.GetCandidatePeers(TXHASHES[txhash], candidate_peers);
301+                for (const auto& peer : candidate_peers) {

sipa commented at 3:05 pm on January 29, 2025:

In commit “[fuzz] GetCandidatePeers”

This tests that all values returned by GetCandidatePeers are in the set of expected results, but not that nothing is missing from the GetCandidatePeers result (so if GetCandidatePeers() never modified candidate_peers, no issue would be detected).

glozow commented at 11:08 pm on January 29, 2025:

I’ve changed this to assert that candidate_peers matches m_announcements[txhash].

in src/node/txdownloadman_impl.cpp:180 in 6027148b3b outdated

171@@ -172,6 +172,22 @@ void TxDownloadManagerImpl::DisconnectedPeer(NodeId nodeid)
172 
173 }
174 
175+bool TxDownloadManagerImpl::AddOrphanParentAnnouncements(const std::vector<Txid>& unique_parents, const Wtxid& wtxid, NodeId nodeid, std::chrono::microseconds now)
176+{
177+    if (auto delay{OrphanResolutionCandidate(nodeid, wtxid, unique_parents.size())}) {
178+        const auto& info = m_peer_info.at(nodeid).m_connection_info;
179+
180+        // Treat finding orphan resolution candidate as equivalent to the peer announcing all missing parents

sipa commented at 3:06 pm on January 29, 2025:

In commit “[refactor] helper function for adding orphan parents to txrequest”.

µnit: period at the end of a sentence?

glozow commented at 11:07 pm on January 29, 2025:

added

in src/txorphanage.cpp:165 in 47f269b337 outdated

170-                    LogDebug(BCLog::TXPACKAGES, "added %s (wtxid=%s) to peer %d workset\n",
171-                             tx.GetHash().ToString(), tx.GetWitnessHash().ToString(), announcer);
172-                }
173+
174+                // Select a random peer to assign orphan processing, reducing wasted work if the orphan is still missing
175+                // inputs. However, we don't want to create an issue in which the assigned peer can purposefully stop us

sipa commented at 3:14 pm on January 29, 2025:

In commit “[p2p] assign just 1 random announcer in AddChildrenToWorkSet”

How is this prevented? Just by virtue of picking a random one, so it’s unlikely to be an attacker?

glozow commented at 3:20 pm on January 29, 2025:

Indeed not really prevented, just that it isn’t trivially gameable.

sipa commented at 3:30 pm on January 29, 2025:

Should there perhaps be a preference for outbound peers here then, like in tx requesting?

instagibbs commented at 4:09 pm on January 29, 2025:

The attack scenario here would be:

honest node gives you orphan_A
adversary (sybil) announce orphan_A
someone (could be adversary) gives parent_A, enters into mempool
node adds to adversary’s workset
adversary disconnects
we don’t re-evaluate the orphan

without this PR’s change, we may be doing the re-evaluation work up to num_inbound times erroneously?

Also, making sure, are there any ways 1P1C are affected by the choice either way?

mzumsande commented at 8:46 pm on January 29, 2025:

without this PR’s change, we may be doing the re-evaluation work up to num_inbound times erroneously?

yes - I suggested this because it seemed a bit wasteful in the honest case, but it is probably no big deal, because the actual full validation will only be done once and checking inputs repeatedly shouldn’t be too much additional work.

On the other hand, I think that this attack would be really hard to pull off timing-wise (you would need to guess when other peers do orphan resolution, and even if you get that right you need to disconnect exactly at the right moment) so I’m not sure if it is practical.

glozow commented at 9:31 pm on January 29, 2025:

I wonder if a good “mitigation” (if we’re worried) is @mzumsande other suggestion, which is to process 1 thing off the peer’s workset when it disconnects? Although if trying to “attack” this way, the solution then is to just have 2 in the workset. I’d propose the same trick then, which is randomly popping from workset. That outta be good enough? I prefer something like this to potentially re-validating repeatedly.

However I think it’s very realistic to add outbound/inbound information to txorphanage (we likely want it for other stuff) so I think I shall do that.

glozow commented at 10:29 pm on January 29, 2025:

Also, making sure, are there any ways 1P1C are affected by the choice either way?

No because 1p1c doesn’t happen through the workset stuff. We pick an orphan child and immediately process it.

glozow commented at 11:08 pm on January 29, 2025:

Left as is for now

mzumsande commented at 1:20 am on January 30, 2025:

I wonder if a good “mitigation” (if we’re worried) is @mzumsande other suggestion, which is to process 1 thing off the peer’s workset when it disconnects? Although if trying to “attack” this way, the solution then is to just have 2 in the workset.

Another idea that would deal with that is to assign it just to the peer who provided the parent here (no randomness), and during disconnection we don’t do any validation but pass all oustanding workset entries on to another random announcer, if any exist.

glozow commented at 5:13 pm on January 30, 2025:

without this PR’s change, we may be doing the re-evaluation work up to num_inbound times erroneously?

yes - I suggested this because it seemed a bit wasteful in the honest case, but it is probably no big deal, because the actual full validation will only be done once and checking inputs repeatedly shouldn’t be too much additional work.

Hm. What do you mean by “the actual full validation will only be done once” btw?

mzumsande commented at 6:20 pm on January 30, 2025:

Hm. What do you mean by “the actual full validation will only be done once” btw?

I meant that aborting due to missing inputs should be fast compared to full validation (signature checks), which will only be done once, because after the tx is in the mempool all subsequent checks from other peer’s worksets will abort quickly.

glozow commented at 6:52 pm on January 30, 2025:

I meant that aborting due to missing inputs should be fast compared to full validation (signature checks), which will only be done once,

Ah ok same page 👍 was wondering if you were thinking of some other kind of caching

in src/node/txdownloadman_impl.cpp:413 in 6027148b3b outdated

409@@ -401,17 +410,8 @@ node::RejectedTxTodo TxDownloadManagerImpl::MempoolRejectedTx(const CTransaction
410                 add_extra_compact_tx &= !m_orphanage.HaveTx(wtxid);
411 
412                 auto add_orphan_reso_candidate = [&](const CTransactionRef& orphan_tx, const std::vector<Txid>& unique_parents, NodeId nodeid, std::chrono::microseconds now) {
413-                    const auto& wtxid = orphan_tx->GetWitnessHash();
414-                    if (auto delay{OrphanResolutionCandidate(nodeid, wtxid, unique_parents.size())}) {
415-                        const auto& info = m_peer_info.at(nodeid).m_connection_info;
416+                    if (AddOrphanParentAnnouncements(unique_parents, orphan_tx->GetWitnessHash(), nodeid, now)) {

mzumsande commented at 8:18 pm on January 29, 2025:

I find the flow of logic a bit convoluted:

The check whether a peer is a orphan resolution candidate happens in OrphanResolutionCandidate, which also calculates the delay and is called from AddOrphanParentAnnouncements.

I would find the folllowing kind of of flow easier to understand (no need for std::optional, functions do exactly what their name suggests they do):

0if (IsOrphanCandidate()) { // only checks if peer is a candidate
1   AddOrphanParentAnnouncements(); // delay is calculated here or in a submethod
2   m_orphanage.AddTx() / m_orphanage.AddAnnouncer();
3}

glozow commented at 11:07 pm on January 29, 2025:

You’re totally right that it’s convoluted! I’ve changed the commit to just move all the parent inv logic to OrphanResolutionCandidate instead. We don’t need 2 small helper functions where 1 always calls the other. I think this is even clearer, but let me know if you think I should still split it into IsOrphanCandidate() and AddOrphanParentAnnouncements()?

mzumsande commented at 7:14 pm on February 3, 2025:

Looks good to me now, rest is just a matter of taste.

mzumsande commented at 8:47 pm on January 29, 2025: contributor

Concept ACK, just one question / suggestion.

[refactor] move parent inv-adding to OrphanResolutionCandidate

Deduplicate the logic of adding the parents as announcements to
txrequest. The function can return a bool (indicating whether we're
attempting orphan resolution) instead of the delay.

57221ad979

[refactor] rename to OrphanResolutionCandidate to MaybeAdd* 6e4d392a75

[refactor] make GetCandidatePeers take uint256 and in-out vector

The txrequest fuzzer uses uint256s, not transactions, so it's best if
GetCandidatePeers takes that as an input.

7704139cf0

[fuzz] GetCandidatePeers c4cc61db98

multi-announcer orphan handling test fixups 18820ccf6b

[refactor] assign local variable for wtxid 32eb6dc758

[doc] how unique_parents can be empty e3bd51e4b5

pass P2PTxInvStore init args to P2PInterface init 2da46b88f0

test fix: make peer who sends MSG_TX announcement non-wtxidrelay

Otherwise, it is not meaningful to test whether the announcement is
ignored, because *all* announcements of this type are ignored.

4c1fa6b28c

[p2p] assign just 1 random announcer in AddChildrenToWorkSet 7426afbe62

glozow force-pushed on Jan 29, 2025

instagibbs commented at 2:28 pm on January 30, 2025: member

reACK 7426afbe62414fa575f91b4f8d3ea63bcc653e8b

I am -0 on “[p2p] assign just 1 random announcer in AddChildrenToWorkSet” though. That and similar changes discussed in #31666 (review) seem to complicate analysis for unclear performance gains?

DrahtBot requested review from mzumsande on Jan 30, 2025

DrahtBot requested review from marcofleon on Jan 30, 2025

marcofleon commented at 5:01 pm on January 31, 2025: contributor

reACK 7426afbe62414fa575f91b4f8d3ea63bcc653e8b

Changes since I last reviewed:

Consolidating AddOrphanParentAnnoucements and OrphanResolutionCandidate. Checked that the logic ends up being the same with the updated version.
Making the txrequest fuzz target more robust. I ran it for a bit to be sure.

I’m also neutral on the AddChildrenToWorkSet decision for now. Maybe it’s something that might be included/changed with a future orphan handling PR?

mzumsande commented at 7:46 pm on February 3, 2025: contributor

Code Review ACK 7426afbe62414fa575f91b4f8d3ea63bcc653e8b

dergoegge approved

dergoegge commented at 10:00 am on February 4, 2025: member

Code review ACK 7426afbe62414fa575f91b4f8d3ea63bcc653e8b

fanquake merged this on Feb 4, 2025

fanquake closed this on Feb 4, 2025

Abababaababababababa approved

glozow deleted the branch on Feb 5, 2025

multi-peer orphan resolution followups #31666

Code Coverage & Benchmarks

Reviews

Conflicts