net processing: Move orphan reprocessing to a global #19364

pull jnewbery wants to merge 3 commits into bitcoin:master from jnewbery:2020-06-global-orphans changing 4 files +33 −23
  1. jnewbery commented at 6:09 pm on June 23, 2020: member
    Currently we keep a per-peer list of orphans to reconsider after one of their parents is accepted. There’s no reason this list should be connected to the peer that provided the parent, so instead make it a global. Process that list in a separate call into net processing that does general work rather than in the context of a specific peer.
  2. jnewbery force-pushed on Jun 23, 2020
  3. DrahtBot added the label P2P on Jun 23, 2020
  4. DrahtBot commented at 7:34 pm on June 23, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19316 ([net] Cleanup logic around connection types by amitiuttarwar)
    • #19315 ([tests] Allow outbound & block-relay-only connections in functional tests. by amitiuttarwar)
    • #19306 (refactor: Replace RecursiveMutex with Mutex in CTxMemPool by hebasto)
    • #18044 (Use wtxid for transaction relay by sdaftuar)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. in src/net_processing.h:77 in 0e7e11ee0e outdated
    72@@ -73,6 +73,14 @@ class PeerLogicValidation final : public CValidationInterface, public NetEventsI
    73     */
    74     bool SendMessages(CNode* pto) override EXCLUSIVE_LOCKS_REQUIRED(pto->cs_sendProcessing);
    75 
    76+    /**
    77+    * Process global tasks that aren't attached to a specific peer
    


    naumenkogs commented at 8:47 am on June 24, 2020:
    Wondering if we should clarify that these tasks are expected to be limited to 1 second or something (perhaps “lightweight tasks”)? Just so that future devs don’t tempt to put any heavy logic in here.

    jnewbery commented at 1:42 pm on June 24, 2020:
    Such documentation, if it’s added, should be at the level of the PeerLogicValidation class. We don’t want anything in the message handler thread to block for a long time (or to be blocked by slow tasks in other threads). That includes anything in ProcessMessages(), SendMessages() and (now) ProcessGlobalTasks().
  6. naumenkogs commented at 8:55 am on June 24, 2020: member
    Concept ACK. The idea makes perfect sense from the high-level. The implementation seems safe.
  7. laanwj commented at 1:10 pm on June 24, 2020: member
    Does this new global data structure need to be limited in size in some way? At least when it was per peer, it would go away when the peer was disconnected. I’m vaguely worried that this might open up a new DoS vector.
  8. jnewbery commented at 2:22 pm on June 24, 2020: member

    Does this new global data structure need to be limited in size in some way? At least when it was per peer, it would go away when the peer was disconnected. I’m vaguely worried that this might open up a new DoS vector.

    That’s the right question to be asking, but I think we’re safe:

    • items are inserted into g_orphan_work_set at the point they would have been inserted into the per-peer orphan_work_set.
    • one item is popped from that set on every message handler thread loop. In effect this means that the set is emptied very quickly. Even if the peer that provided the parent (which previously would have held the object in its orphan_work_set), the object will very quickly be processed from g_orphan_work_set.

    These entries can’t persist for long in the set. I could refactor ProcessOrphanTx() to make it more obvious that we’ll always empty this set quickly, but I wanted to keep the diff small for this PR. Perhaps I could do that in a follow-up.

    Longer term, it might make sense to consolidate all orphan logic into a module, similar to how #19184 overhauls request logic.

  9. in src/net_processing.h:82 in b957b0d098 outdated
    77+    * Process global tasks that aren't attached to a specific peer
    78+    *
    79+    * @param[in]   interrupt       Interrupt condition for processing threads
    80+    * @returns     bool            true if there's more work to do
    81+    */
    82+    bool ProcessGlobalTasks(std::atomic<bool>& interrupt);
    


    troygiorshev commented at 1:41 pm on June 29, 2020:

    Compiler warning -Winconsistent-missing-override

    0    bool ProcessGlobalTasks(std::atomic<bool>& interrupt) override;
    

    jnewbery commented at 4:52 pm on June 29, 2020:
    Good catch. Fixed!
  10. troygiorshev commented at 1:57 pm on June 29, 2020: contributor

    ACK b957b0d098f5dce2961d2b1a8c7bcfeb6b8e935b modulo below

    Looks to me that this PR is a very natural extension of #15644. (It’s maybe in the same line of thinking as this comment there) Naively this global approach feels better.

    In regards to the addition of ProcessGlobalTasks, it seems reasonable to me but I don’t think I have the background to honestly Concept ACK it. (i.e. I can’t fully grasp the impact it will have on future development). Looks like others have, so no issue here.

    If I can give it a shot, the suspected DoS vector is something along the lines of a peer feeding us fake orphan transactions, therefore inflating the size of g_orphan_work_set and preventing us from processing any other orphaned transactions? Seems like once that peer is dealt with, g_orphan_work_set will be cleared relatively quickly.

  11. [net processing] Add ProcessGlobalTasks
    ProcessGlobalTasks does general work that isn't attached to a specific
    peer.
    0ea9abf9b4
  12. [net processing] Move orphan_work_set to a global
    Orphan reprocessing should not be tied to the peer that provided
    the parent. Move orphan_work_set to be a global and process it
    in ProcessGlobalTasks()
    92001263d6
  13. [net processing] Don't call ProcessOrphanTx from ProcessMessages
    We reprocess orphan transactions on every MessageHandler thread,
    so there's no need to call it in the context of an individual peer.
    9487f082e5
  14. jnewbery commented at 4:58 pm on June 29, 2020: member

    If I can give it a shot, the suspected DoS vector is something along the lines of a peer feeding us fake orphan transactions, therefore inflating the size of g_orphan_work_set and preventing us from processing any other orphaned transactions?

    I think @laanwj’s concern may have been in general about us storing unvalidated data from a peer, which is always something that we need to be careful about. It’s not so much of a concern here because the size of orphan_work_set is limited by the size of the orphan set, and we’ll always clear orphan_work_set very quickly.

    I have a branch here: https://github.com/jnewbery/bitcoin/tree/2020-06-global-orphans2 that refactors ProcessOrphanTx() and makes it very obvious that g_orphan_work_set is emptied very quickly. I’m not sure whether it’s worth opening that as a follow-up after this is merged.

  15. jnewbery force-pushed on Jun 29, 2020
  16. hebasto commented at 5:14 pm on June 29, 2020: member
    Concept ACK.
  17. in src/net_processing.cpp:2836 in 9487f082e5
    2832@@ -2832,7 +2833,7 @@ void ProcessMessage(
    2833                 auto it_by_prev = mapOrphanTransactionsByPrev.find(COutPoint(inv.hash, i));
    2834                 if (it_by_prev != mapOrphanTransactionsByPrev.end()) {
    2835                     for (const auto& elem : it_by_prev->second) {
    2836-                        pfrom.orphan_work_set.insert(elem->first);
    2837+                        g_orphan_work_set.insert(elem->first);
    


    jonatack commented at 3:48 am on July 1, 2020:
    9200126 nit: would it be better to use emplace here to construct rather than copy?

    jnewbery commented at 1:51 pm on July 1, 2020:
    emplace wins when you have to construct a temporary in place which is then copied/moved to the container. When you already have the object in hand (as we do here), it makes no difference - the object is just copied into the container.
  18. in src/net_processing.cpp:3682 in 9487f082e5
    3673@@ -3687,6 +3674,25 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3674     return fMoreWork;
    3675 }
    3676 
    3677+bool PeerLogicValidation::ProcessGlobalTasks(std::atomic<bool>& interruptMsgProc)
    3678+{
    3679+    // Reconsider orphan transactions that may now have parents
    3680+    {
    3681+        LOCK2(cs_main, g_cs_orphans);
    3682+        if (!g_orphan_work_set.empty()) {
    


    jonatack commented at 4:13 am on July 1, 2020:

    9200126 Do you think it would be clearer for this conditional to be a guard (return early) clause instead? e.g. etc.

    0-        if (!g_orphan_work_set.empty()) {
    1+        if (g_orphan_work_set.empty()) return true;
    

    jnewbery commented at 1:53 pm on July 1, 2020:
    That’s exactly how I’d normally structure a function like this. However, I expect we’ll want to move more housekeeping tasks into ProcessGlobalTasks(), in which case we’ll want to continue through to the next task queue if there are not orphan txs to reconsider. I didn’t use the early exit pattern so that adding those tasks doesn’t require restructuring this function.
  19. in src/net_processing.cpp:3679 in 9487f082e5
    3673@@ -3687,6 +3674,25 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3674     return fMoreWork;
    3675 }
    3676 
    3677+bool PeerLogicValidation::ProcessGlobalTasks(std::atomic<bool>& interruptMsgProc)
    3678+{
    3679+    // Reconsider orphan transactions that may now have parents
    


    jonatack commented at 4:17 am on July 1, 2020:
    9200126 A comment on the meaning of the return value logic (e.g. orphans remain in the set/more work to do) may be worth adding here.

    jnewbery commented at 1:54 pm on July 1, 2020:
    Documentation for ProcessGlobalTasks() is by the declaration in the header file, and includes the meaning of the return value.
  20. in src/net_processing.cpp:3685 in 9487f082e5
    3680+    {
    3681+        LOCK2(cs_main, g_cs_orphans);
    3682+        if (!g_orphan_work_set.empty()) {
    3683+            std::list<CTransactionRef> removed_txn;
    3684+            ProcessOrphanTx(connman, m_mempool, g_orphan_work_set, removed_txn);
    3685+            for (const CTransactionRef& removedTx : removed_txn) {
    


    jonatack commented at 4:19 am on July 1, 2020:
    9200126 I wanted to ask if these two variable names could be updated/improved while moving this code (e.g. to removed_txns and removed_txn), but it looks like it may be clearer to leave them as-is given the same naming is used in related functions.

    jnewbery commented at 1:54 pm on July 1, 2020:
    Agree! This can be changed later if necessary.
  21. jonatack commented at 5:10 am on July 1, 2020: member

    Interesting idea! This LGTM.

    I wonder if there may have been a reason why it was implemented this way in #15644 / 866c805, so it may be good to have review by the author of the code.

    To keep the commits hygienic, consider removing the call to ProcessOrphanTx from ProcessMessage in the same commit that adds the new call to it from ProcessGlobalTasks, e.g. squashing the last two commits.

    If you retouch, might be good to fixup s/ProcessMessages/ProcessMessage/ in the commit message, as both function names exist in the file.

    Consider adding here the Doxygen comments commit from your branch.

    If this moves forward, big concept ACK on the changes in https://github.com/jnewbery/bitcoin/commits/2020-06-global-orphans2.

  22. jnewbery commented at 2:00 pm on July 1, 2020: member

    I wonder if there may have been a reason why it was implemented this way in #15644 / 866c805, so it may be good to have review by the author of the code.

    See #15644 (review)

    To keep the commits hygienic, consider removing the call to ProcessOrphanTx from ProcessMessage in the same commit that adds the new call to it from ProcessGlobalTasks, e.g. squashing the last two commits.

    I think that’d be doing too much in the same commit, but I’m happy to squash if other reviewers agree.

    If you retouch, might be good to fixup s/ProcessMessages/ProcessMessage/ in the commit message, as both function names exist in the file.

    I did intend to put ProcessMessages. ProcessMessage is called from ProcessMessages, and ProcessMessages is the entry point into net processing for the message handler thread, so from a high level, it makes sense to say that his change is moving orphan reconsideration from ProcessMessages into ProcessGlobalTasks.

    Consider adding here the Doxygen comments commit from your branch.

    I think that can stay in the follow-up PR, but again, I’m happy to add it here if other reviewers think it makes more sense to add it here.

  23. jonatack commented at 3:25 pm on July 1, 2020: member

    Thanks @jnewbery.

    I wanted to encourage the Doxygen commit if you weren’t sure about doing a follow-up.

    ACK 9487f082e5f4b9696e3b4fd42aafebf42617ca9c

  24. troygiorshev commented at 4:26 am on July 2, 2020: contributor

    reACK 9487f082e5f4b9696e3b4fd42aafebf42617ca9c

    Looking forward to the follow-up!

  25. sipa commented at 4:51 am on July 2, 2020: member

    It took me a while to remember the context here, but this PR causes a subtle but significant change in P2P behavior.

    The reason that there is a per-peer set of to-be-processed potential orphans is that this lets us pause processing for that peer until they’re all processed - as long as the orphan work set for a peer is non-empty, all that happens for that peer is steps towards emptying that set. This makes the behavior observationally equivalent (from the perspective of any single peer) to recursively processing all orphans immediately as soon as the “missing link” transaction is received.

    With this change that is no longer the case - P2P messages from the peer that provides the missing link can and will be processed and responded to before processing the dependent orphans. I’m not sure that anything depends on that behavior, but I also can’t guarantee that there isn’t.

    I don’t think code cleanup is a sufficient justification for such a fundamental P2P change.

  26. jnewbery commented at 3:19 pm on July 2, 2020: member

    I’m not sure that anything depends on that behavior, but I also can’t guarantee that there isn’t.

    I don’t think code cleanup is a sufficient justification for such a fundamental P2P change.

    For context, this is part of a wider project to move all application-layer data and processing into net processing.

    The PR that introduced orphan_work_set and moved orphan data into the net layer also changed behaviour, as you noted in the PR description: Messages from other peers arriving in the mean time are processed normally, but other messages from the peer that gave us the parent have to wait until all orphan processing is done. ie processing the orphan_work_set is interposed with processing messages from other peers. How were you able to satisfy yourself that that change was safe, but this one isn’t? Is there a specific scenario you’re thinking of here which you’re worried about?

    I can’t think of any problems this might cause. If a peer announces a tx that conflicts with an orphan and also provides the parent, then in master, the orphan will be accepted into the mempool, but in this branch the conflicting tx may be accepted first. I don’t think that’s an issue since in the case of two conflicting txs we never guarantee which one will make it into the mempool. Are there other scenarios we should be considering?

    Generally, if we’re worried about behaviour changes we try to reason about them and satisfy ourselves that they’re safe, and include tests if we have any doubts. I don’t see why this PR should be different.

  27. jnewbery commented at 10:37 pm on July 2, 2020: member
    Lots of discussion about this on IRC today: http://www.erisian.com.au/bitcoin-core-dev/log-2020-07-02.html L325 onwards.
  28. sipa commented at 6:25 pm on July 7, 2020: member

    FWIW here is a commit that moves orphan_work_set to net_processing (inside NodeState) without changing semantics: https://github.com/sipa/bitcoin/commits/202007_orphan_work_set_to_np

    It also makes orphan_work_set protected by g_cs_orphans, as it apparently was not explicitly protected by any lock in the existing code (which was ok as only the message handler thread accessed it, but still scary).

  29. jnewbery commented at 7:20 pm on July 7, 2020: member

    This PR intended to do two things:

    1. Don’t do orphan reconsideration in the context of an individual peer
    2. Move orphan_work_set data from net into net_processing.

    I think both changes are good. If everyone agreed, I don’t think it’d be a problem to do both in this PR, but since this is controversial and @sipa has concerns that (1) is correct, it makes sense to split them out and consider them separately.

    The branch at https://github.com/sipa/bitcoin/commits/202007_orphan_work_set_to_np seems fine, but I hope we’ll soon have a PeerState struct in net_processing that isn’t guarded by cs_main. That’d be a more natural place for this. See https://github.com/jnewbery/bitcoin/tree/2020-06-cs-main-split for a demonstration. No harm in moving it to CNodeState first, but it might be duplicate work if we’re going to move it again shortly afterwards.

  30. jnewbery closed this on Jul 7, 2020

  31. MarcoFalke referenced this in commit 7b7cb70f4c on Sep 30, 2020
  32. sidhujag referenced this in commit 1c9fa80e11 on Sep 30, 2020
  33. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me