net processing: Move orphan reprocessing to a global #19364
pull jnewbery wants to merge 3 commits into bitcoin:master from jnewbery:2020-06-global-orphans changing 4 files +33 −23-
jnewbery commented at 6:09 pm on June 23, 2020: memberCurrently we keep a per-peer list of orphans to reconsider after one of their parents is accepted. There’s no reason this list should be connected to the peer that provided the parent, so instead make it a global. Process that list in a separate call into net processing that does general work rather than in the context of a specific peer.
-
jnewbery force-pushed on Jun 23, 2020
-
DrahtBot added the label P2P on Jun 23, 2020
-
DrahtBot commented at 7:34 pm on June 23, 2020: member
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #19316 ([net] Cleanup logic around connection types by amitiuttarwar)
- #19315 ([tests] Allow outbound & block-relay-only connections in functional tests. by amitiuttarwar)
- #19306 (refactor: Replace RecursiveMutex with Mutex in CTxMemPool by hebasto)
- #18044 (Use wtxid for transaction relay by sdaftuar)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
-
in src/net_processing.h:77 in 0e7e11ee0e outdated
72@@ -73,6 +73,14 @@ class PeerLogicValidation final : public CValidationInterface, public NetEventsI 73 */ 74 bool SendMessages(CNode* pto) override EXCLUSIVE_LOCKS_REQUIRED(pto->cs_sendProcessing); 75 76+ /** 77+ * Process global tasks that aren't attached to a specific peer
naumenkogs commented at 8:47 am on June 24, 2020:Wondering if we should clarify that these tasks are expected to be limited to 1 second or something (perhaps “lightweight tasks”)? Just so that future devs don’t tempt to put any heavy logic in here.
jnewbery commented at 1:42 pm on June 24, 2020:Such documentation, if it’s added, should be at the level of thePeerLogicValidation
class. We don’t want anything in the message handler thread to block for a long time (or to be blocked by slow tasks in other threads). That includes anything inProcessMessages()
,SendMessages()
and (now)ProcessGlobalTasks()
.naumenkogs commented at 8:55 am on June 24, 2020: memberConcept ACK. The idea makes perfect sense from the high-level. The implementation seems safe.laanwj commented at 1:10 pm on June 24, 2020: memberDoes this new global data structure need to be limited in size in some way? At least when it was per peer, it would go away when the peer was disconnected. I’m vaguely worried that this might open up a new DoS vector.jnewbery commented at 2:22 pm on June 24, 2020: memberDoes this new global data structure need to be limited in size in some way? At least when it was per peer, it would go away when the peer was disconnected. I’m vaguely worried that this might open up a new DoS vector.
That’s the right question to be asking, but I think we’re safe:
- items are inserted into
g_orphan_work_set
at the point they would have been inserted into the per-peerorphan_work_set
. - one item is popped from that set on every message handler thread loop. In effect this means that the set is emptied very quickly. Even if the peer that provided the parent (which previously would have held the object in its
orphan_work_set
), the object will very quickly be processed fromg_orphan_work_set
.
These entries can’t persist for long in the set. I could refactor
ProcessOrphanTx()
to make it more obvious that we’ll always empty this set quickly, but I wanted to keep the diff small for this PR. Perhaps I could do that in a follow-up.Longer term, it might make sense to consolidate all orphan logic into a module, similar to how #19184 overhauls request logic.
in src/net_processing.h:82 in b957b0d098 outdated
77+ * Process global tasks that aren't attached to a specific peer 78+ * 79+ * @param[in] interrupt Interrupt condition for processing threads 80+ * @returns bool true if there's more work to do 81+ */ 82+ bool ProcessGlobalTasks(std::atomic<bool>& interrupt);
troygiorshev commented at 1:41 pm on June 29, 2020:Compiler warning
-Winconsistent-missing-override
0 bool ProcessGlobalTasks(std::atomic<bool>& interrupt) override;
jnewbery commented at 4:52 pm on June 29, 2020:Good catch. Fixed!troygiorshev commented at 1:57 pm on June 29, 2020: contributorACK b957b0d098f5dce2961d2b1a8c7bcfeb6b8e935b modulo below
Looks to me that this PR is a very natural extension of #15644. (It’s maybe in the same line of thinking as this comment there) Naively this global approach feels better.
In regards to the addition of ProcessGlobalTasks, it seems reasonable to me but I don’t think I have the background to honestly Concept ACK it. (i.e. I can’t fully grasp the impact it will have on future development). Looks like others have, so no issue here.
If I can give it a shot, the suspected DoS vector is something along the lines of a peer feeding us fake orphan transactions, therefore inflating the size of
g_orphan_work_set
and preventing us from processing any other orphaned transactions? Seems like once that peer is dealt with,g_orphan_work_set
will be cleared relatively quickly.[net processing] Add ProcessGlobalTasks
ProcessGlobalTasks does general work that isn't attached to a specific peer.
[net processing] Move orphan_work_set to a global
Orphan reprocessing should not be tied to the peer that provided the parent. Move orphan_work_set to be a global and process it in ProcessGlobalTasks()
[net processing] Don't call ProcessOrphanTx from ProcessMessages
We reprocess orphan transactions on every MessageHandler thread, so there's no need to call it in the context of an individual peer.
jnewbery commented at 4:58 pm on June 29, 2020: memberIf I can give it a shot, the suspected DoS vector is something along the lines of a peer feeding us fake orphan transactions, therefore inflating the size of g_orphan_work_set and preventing us from processing any other orphaned transactions?
I think @laanwj’s concern may have been in general about us storing unvalidated data from a peer, which is always something that we need to be careful about. It’s not so much of a concern here because the size of
orphan_work_set
is limited by the size of the orphan set, and we’ll always clearorphan_work_set
very quickly.I have a branch here: https://github.com/jnewbery/bitcoin/tree/2020-06-global-orphans2 that refactors
ProcessOrphanTx()
and makes it very obvious thatg_orphan_work_set
is emptied very quickly. I’m not sure whether it’s worth opening that as a follow-up after this is merged.jnewbery force-pushed on Jun 29, 2020hebasto commented at 5:14 pm on June 29, 2020: memberConcept ACK.in src/net_processing.cpp:2836 in 9487f082e5
2832@@ -2832,7 +2833,7 @@ void ProcessMessage( 2833 auto it_by_prev = mapOrphanTransactionsByPrev.find(COutPoint(inv.hash, i)); 2834 if (it_by_prev != mapOrphanTransactionsByPrev.end()) { 2835 for (const auto& elem : it_by_prev->second) { 2836- pfrom.orphan_work_set.insert(elem->first); 2837+ g_orphan_work_set.insert(elem->first);
jonatack commented at 3:48 am on July 1, 2020:9200126 nit: would it be better to useemplace
here to construct rather than copy?
jnewbery commented at 1:51 pm on July 1, 2020:emplace wins when you have to construct a temporary in place which is then copied/moved to the container. When you already have the object in hand (as we do here), it makes no difference - the object is just copied into the container.in src/net_processing.cpp:3682 in 9487f082e5
3673@@ -3687,6 +3674,25 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter 3674 return fMoreWork; 3675 } 3676 3677+bool PeerLogicValidation::ProcessGlobalTasks(std::atomic<bool>& interruptMsgProc) 3678+{ 3679+ // Reconsider orphan transactions that may now have parents 3680+ { 3681+ LOCK2(cs_main, g_cs_orphans); 3682+ if (!g_orphan_work_set.empty()) {
jonatack commented at 4:13 am on July 1, 2020:9200126 Do you think it would be clearer for this conditional to be a guard (return early) clause instead? e.g. etc.
0- if (!g_orphan_work_set.empty()) { 1+ if (g_orphan_work_set.empty()) return true;
jnewbery commented at 1:53 pm on July 1, 2020:That’s exactly how I’d normally structure a function like this. However, I expect we’ll want to move more housekeeping tasks intoProcessGlobalTasks()
, in which case we’ll want to continue through to the next task queue if there are not orphan txs to reconsider. I didn’t use the early exit pattern so that adding those tasks doesn’t require restructuring this function.in src/net_processing.cpp:3679 in 9487f082e5
3673@@ -3687,6 +3674,25 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter 3674 return fMoreWork; 3675 } 3676 3677+bool PeerLogicValidation::ProcessGlobalTasks(std::atomic<bool>& interruptMsgProc) 3678+{ 3679+ // Reconsider orphan transactions that may now have parents
jonatack commented at 4:17 am on July 1, 2020:9200126 A comment on the meaning of the return value logic (e.g. orphans remain in the set/more work to do) may be worth adding here.
jnewbery commented at 1:54 pm on July 1, 2020:Documentation forProcessGlobalTasks()
is by the declaration in the header file, and includes the meaning of the return value.in src/net_processing.cpp:3685 in 9487f082e5
3680+ { 3681+ LOCK2(cs_main, g_cs_orphans); 3682+ if (!g_orphan_work_set.empty()) { 3683+ std::list<CTransactionRef> removed_txn; 3684+ ProcessOrphanTx(connman, m_mempool, g_orphan_work_set, removed_txn); 3685+ for (const CTransactionRef& removedTx : removed_txn) {
jonatack commented at 4:19 am on July 1, 2020:9200126 I wanted to ask if these two variable names could be updated/improved while moving this code (e.g. toremoved_txns
andremoved_txn
), but it looks like it may be clearer to leave them as-is given the same naming is used in related functions.
jnewbery commented at 1:54 pm on July 1, 2020:Agree! This can be changed later if necessary.jonatack commented at 5:10 am on July 1, 2020: memberInteresting idea! This LGTM.
I wonder if there may have been a reason why it was implemented this way in #15644 / 866c805, so it may be good to have review by the author of the code.
To keep the commits hygienic, consider removing the call to
ProcessOrphanTx
fromProcessMessage
in the same commit that adds the new call to it fromProcessGlobalTasks
, e.g. squashing the last two commits.If you retouch, might be good to fixup
s/ProcessMessages/ProcessMessage/
in the commit message, as both function names exist in the file.Consider adding here the Doxygen comments commit from your branch.
If this moves forward, big concept ACK on the changes in https://github.com/jnewbery/bitcoin/commits/2020-06-global-orphans2.
jnewbery commented at 2:00 pm on July 1, 2020: memberI wonder if there may have been a reason why it was implemented this way in #15644 / 866c805, so it may be good to have review by the author of the code.
See #15644 (review)
To keep the commits hygienic, consider removing the call to ProcessOrphanTx from ProcessMessage in the same commit that adds the new call to it from ProcessGlobalTasks, e.g. squashing the last two commits.
I think that’d be doing too much in the same commit, but I’m happy to squash if other reviewers agree.
If you retouch, might be good to fixup s/ProcessMessages/ProcessMessage/ in the commit message, as both function names exist in the file.
I did intend to put ProcessMessages. ProcessMessage is called from ProcessMessages, and ProcessMessages is the entry point into net processing for the message handler thread, so from a high level, it makes sense to say that his change is moving orphan reconsideration from ProcessMessages into ProcessGlobalTasks.
Consider adding here the Doxygen comments commit from your branch.
I think that can stay in the follow-up PR, but again, I’m happy to add it here if other reviewers think it makes more sense to add it here.
troygiorshev commented at 4:26 am on July 2, 2020: contributorreACK 9487f082e5f4b9696e3b4fd42aafebf42617ca9c
Looking forward to the follow-up!
sipa commented at 4:51 am on July 2, 2020: memberIt took me a while to remember the context here, but this PR causes a subtle but significant change in P2P behavior.
The reason that there is a per-peer set of to-be-processed potential orphans is that this lets us pause processing for that peer until they’re all processed - as long as the orphan work set for a peer is non-empty, all that happens for that peer is steps towards emptying that set. This makes the behavior observationally equivalent (from the perspective of any single peer) to recursively processing all orphans immediately as soon as the “missing link” transaction is received.
With this change that is no longer the case - P2P messages from the peer that provides the missing link can and will be processed and responded to before processing the dependent orphans. I’m not sure that anything depends on that behavior, but I also can’t guarantee that there isn’t.
I don’t think code cleanup is a sufficient justification for such a fundamental P2P change.
jnewbery commented at 3:19 pm on July 2, 2020: memberI’m not sure that anything depends on that behavior, but I also can’t guarantee that there isn’t.
I don’t think code cleanup is a sufficient justification for such a fundamental P2P change.
For context, this is part of a wider project to move all application-layer data and processing into net processing.
The PR that introduced
orphan_work_set
and moved orphan data into the net layer also changed behaviour, as you noted in the PR description: Messages from other peers arriving in the mean time are processed normally, but other messages from the peer that gave us the parent have to wait until all orphan processing is done. ie processing the orphan_work_set is interposed with processing messages from other peers. How were you able to satisfy yourself that that change was safe, but this one isn’t? Is there a specific scenario you’re thinking of here which you’re worried about?I can’t think of any problems this might cause. If a peer announces a tx that conflicts with an orphan and also provides the parent, then in master, the orphan will be accepted into the mempool, but in this branch the conflicting tx may be accepted first. I don’t think that’s an issue since in the case of two conflicting txs we never guarantee which one will make it into the mempool. Are there other scenarios we should be considering?
Generally, if we’re worried about behaviour changes we try to reason about them and satisfy ourselves that they’re safe, and include tests if we have any doubts. I don’t see why this PR should be different.
jnewbery commented at 10:37 pm on July 2, 2020: memberLots of discussion about this on IRC today: http://www.erisian.com.au/bitcoin-core-dev/log-2020-07-02.html L325 onwards.sipa commented at 6:25 pm on July 7, 2020: memberFWIW here is a commit that moves
orphan_work_set
to net_processing (insideNodeState
) without changing semantics: https://github.com/sipa/bitcoin/commits/202007_orphan_work_set_to_npIt also makes
orphan_work_set
protected byg_cs_orphans
, as it apparently was not explicitly protected by any lock in the existing code (which was ok as only the message handler thread accessed it, but still scary).jnewbery commented at 7:20 pm on July 7, 2020: memberThis PR intended to do two things:
- Don’t do orphan reconsideration in the context of an individual peer
- Move orphan_work_set data from net into net_processing.
I think both changes are good. If everyone agreed, I don’t think it’d be a problem to do both in this PR, but since this is controversial and @sipa has concerns that (1) is correct, it makes sense to split them out and consider them separately.
The branch at https://github.com/sipa/bitcoin/commits/202007_orphan_work_set_to_np seems fine, but I hope we’ll soon have a
PeerState
struct in net_processing that isn’t guarded by cs_main. That’d be a more natural place for this. See https://github.com/jnewbery/bitcoin/tree/2020-06-cs-main-split for a demonstration. No harm in moving it toCNodeState
first, but it might be duplicate work if we’re going to move it again shortly afterwards.jnewbery closed this on Jul 7, 2020
MarcoFalke referenced this in commit 7b7cb70f4c on Sep 30, 2020sidhujag referenced this in commit 1c9fa80e11 on Sep 30, 2020DrahtBot locked this on Feb 15, 2022
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 21:12 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me