Tidy up ProcessOrphanTx #19498

pull jnewbery wants to merge 6 commits into bitcoin:master from jnewbery:2020-07-orphan-tidy changing 2 files +55 −43
  1. jnewbery commented at 10:30 am on July 12, 2020: member
    Originally a follow-up to #19364, this simplifies the logic in ProcessOrphanTx() and removes unused variables.
  2. jnewbery commented at 10:32 am on July 12, 2020: member
    @jonatack @troygiorshev - you both concept ACKed this branch in #19364.
  3. fanquake added the label P2P on Jul 12, 2020
  4. in src/net_processing.cpp:1967 in 9c954f4a86 outdated
    1940+/**
    1941+ * Reconsider orphan transactions after a parent has been accepted to the mempool.
    1942+ *
    1943+ * @param[in]      connman          Reference to the connection manager
    1944+ * @param[in]      mempool          Reference to mempool
    1945+ * @param[in/out]  orphan_work_set  The set of orphan transactions to reconsider. Only one
    


    jonatack commented at 2:53 pm on July 12, 2020:
    2ec4eeadb nit, can mention “Reference to” here as well

    jnewbery commented at 5:16 pm on July 12, 2020:
    Meh. I think I’d prefer to remove “Reference to” from the others than add this.
  5. in src/net_processing.cpp:1965 in 9c954f4a86 outdated
    1935@@ -1925,64 +1936,67 @@ static void ProcessHeadersMessage(CNode& pfrom, CConnman* connman, ChainstateMan
    1936     return;
    1937 }
    1938 
    1939-void static ProcessOrphanTx(CConnman* connman, CTxMemPool& mempool, std::set<uint256>& orphan_work_set, std::list<CTransactionRef>& removed_txn) EXCLUSIVE_LOCKS_REQUIRED(cs_main, g_cs_orphans)
    1940+/**
    1941+ * Reconsider orphan transactions after a parent has been accepted to the mempool.
    1942+ *
    1943+ * @param[in]      connman          Reference to the connection manager
    


    jonatack commented at 3:16 pm on July 12, 2020:
    9c954f4 it looks like connman can be reference to const?

    jnewbery commented at 5:16 pm on July 12, 2020:
    sure. Done
  6. in src/net_processing.cpp:1966 in 9c954f4a86 outdated
    1935@@ -1925,64 +1936,67 @@ static void ProcessHeadersMessage(CNode& pfrom, CConnman* connman, ChainstateMan
    1936     return;
    1937 }
    1938 
    1939-void static ProcessOrphanTx(CConnman* connman, CTxMemPool& mempool, std::set<uint256>& orphan_work_set, std::list<CTransactionRef>& removed_txn) EXCLUSIVE_LOCKS_REQUIRED(cs_main, g_cs_orphans)
    1940+/**
    1941+ * Reconsider orphan transactions after a parent has been accepted to the mempool.
    1942+ *
    1943+ * @param[in]      connman          Reference to the connection manager
    1944+ * @param[in]      mempool          Reference to mempool
    


    jonatack commented at 3:18 pm on July 12, 2020:
    2ec4eead [in/out]? (mempool needs to be mutable)

    jnewbery commented at 5:16 pm on July 12, 2020:
    hmmm I’m not sure. mempool isn’t used by the caller after the function returns, so I’m not sure I’d call it an ‘out param’. Just because an argument is non-const, I don’t think that automatically makes it an out-param.
  7. jonatack commented at 3:38 pm on July 12, 2020: member
    ACK on the doxygen comments. The pure refactoring changes seem ok. I hesitate with the commits where refactoring is mixed with behavior changes: are the behavior changes desirable in themselves or ignorable side effects; are the benefits worth the review and risk.
  8. DrahtBot commented at 5:14 pm on July 12, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #20025 (validation/util: add GetTransactionFee by gzhao408)
    • #19911 (net: guard vRecvGetData with cs_vRecv and orphan_work_set with g_cs_orphans by narula)
    • #19339 (validation: re-delegate absurd fee checking from mempool to clients by gzhao408)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  9. jnewbery commented at 5:15 pm on July 12, 2020: member

    are the behavior changes desirable in themselves or ignorable side effects

    There are no observable functional changes. The behavior changes are simply in the timing of how items are popped off the orphan_work_set list. They make the control flow of the function much easier to follow. After these changes either:

    • orphan_work_set is empty and the function does nothing; or
    • the function pops one item off orphan_work_set and processes it.
  10. DrahtBot added the label Needs rebase on Jul 14, 2020
  11. jnewbery force-pushed on Jul 15, 2020
  12. jnewbery commented at 9:48 am on July 15, 2020: member
    rebased
  13. DrahtBot removed the label Needs rebase on Jul 15, 2020
  14. DrahtBot added the label Needs rebase on Jul 16, 2020
  15. jnewbery commented at 9:36 am on July 16, 2020: member
    rebased
  16. jnewbery force-pushed on Jul 16, 2020
  17. jnewbery removed the label Needs rebase on Jul 16, 2020
  18. DrahtBot added the label Needs rebase on Jul 22, 2020
  19. jnewbery commented at 10:07 am on July 23, 2020: member
    rebased
  20. jnewbery force-pushed on Jul 23, 2020
  21. DrahtBot removed the label Needs rebase on Jul 23, 2020
  22. in src/net_processing.cpp:3778 in eb30f87e09 outdated
    3776@@ -3763,10 +3777,7 @@ bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& inter
    3777     if (!pfrom->orphan_work_set.empty()) {
    3778         std::list<CTransactionRef> removed_txn;
    


    troygiorshev commented at 8:44 pm on July 23, 2020:

    This is now unused and can be removed.

    This is the only thing blocking my ACK.

    +1 to the removal of an out param! 👏


    troygiorshev commented at 8:59 pm on July 23, 2020:

    Looks like the same can be done at for AcceptToMemoryPool, here and here.

    I’ve checked all of AcceptToMemoryPool’s call sites, I think we can do the same thing and remove two more out parameters! (AcceptToMemoryPool just calls AcceptToMemoryPoolWithTime) Maybe.

    (For a Follow-up)


    jnewbery commented at 7:51 am on July 24, 2020:

    Thanks. Removed removed_txn

    Those out params are still needed in AcceptToMemoryPool. They’re used to add the removed transactions to the compact block extra transactions here: https://github.com/jnewbery/bitcoin/blob/66ae250195ab7c5c94f72fa9cf222992e1ac46c1/src/net_processing.cpp#L2946-L2947


    troygiorshev commented at 11:23 am on July 24, 2020:
    I’m suggesting that calls to AddtoCompactExtraTransactions be moved into AcceptToMemoryPoolWithTime, in a similar manner to how you moved it into ProcessOrphanTx. I’m pretty sure this is possible, whether it’s a good idea or not is something I’m less sure about.

    jnewbery commented at 12:10 pm on July 24, 2020:
    That would be a layer violation. AddToCompactExtraTransactions() is a net_processing function, adding data to a net_processing data structure. AcceptToMemoryPool() and the functions it calls are validation functions and shouldn’t call into net_processing.
  23. in src/net_processing.cpp:2044 in eb30f87e09 outdated
    2011+ * @param[in]      mempool          Mempool
    2012+ * @param[in/out]  orphan_work_set  The set of orphan transactions to reconsider. Generally only one
    2013+ *                                  orphan will be reconsidered on each call of this function. This set
    2014+ *                                  may be added to if accepting an orphan causes its children to be
    2015+ *                                  reconsidered.
    2016+ */
    


    troygiorshev commented at 8:47 pm on July 23, 2020:
    non-blocking personal-opinion nit: I would prefer the docs to be wrapped to 80 💻

    jnewbery commented at 7:39 am on July 24, 2020:
    We don’t have a project-wide style for this, although I try to keep my lines under 120 chars. 80 feels is a bit stingy to me.
  24. troygiorshev commented at 9:15 pm on July 23, 2020: contributor

    almost-ACK

    +💯 to removing old outdated comments! 👏👏

    Travis failure is unrelated

  25. jnewbery force-pushed on Jul 24, 2020
  26. jnewbery force-pushed on Jul 24, 2020
  27. jnewbery force-pushed on Jul 24, 2020
  28. jnewbery commented at 8:12 am on July 24, 2020: member
  29. troygiorshev commented at 11:26 am on July 24, 2020: contributor
    tACK 833bccc via git range-diff master eb30f87 833bccc
  30. in src/net_processing.cpp:261 in 833bccc522 outdated
    256@@ -253,12 +257,19 @@ namespace {
    257             return &(*a) < &(*b);
    258         }
    259     };
    260-    std::map<COutPoint, std::set<std::map<uint256, COrphanTx>::iterator, IteratorComparator>> mapOrphanTransactionsByPrev GUARDED_BY(g_cs_orphans);
    261 
    262-    std::vector<std::map<uint256, COrphanTx>::iterator> g_orphan_list GUARDED_BY(g_cs_orphans); //! For random eviction
    263+    /** Index from COutPoint into the mapOrphanTransactions. Used to remove
    


    amitiuttarwar commented at 2:50 am on July 25, 2020:
    would be nice if this mentioned that it’s the parents COutPoint

    jnewbery commented at 8:05 am on July 27, 2020:
    done
  31. in src/net_processing.cpp:264 in 833bccc522 outdated
    261 
    262-    std::vector<std::map<uint256, COrphanTx>::iterator> g_orphan_list GUARDED_BY(g_cs_orphans); //! For random eviction
    263+    /** Index from COutPoint into the mapOrphanTransactions. Used to remove
    264+     *  orphan transactions from the mapOrphanTransactions */
    265+    std::map<COutPoint, std::set<std::map<uint256, COrphanTx>::iterator, IteratorComparator>> mapOrphanTransactionsByPrev GUARDED_BY(g_cs_orphans);
    266+    /** Orphan transactions listed in random order for random eviction */
    


    amitiuttarwar commented at 3:02 am on July 25, 2020:

    1/3 nit on this comment, 2/3 for my understanding- these orphans aren’t listed exactly random right? AddOrphanTx adds the entries to in the order they are seen, but EraseOrphanTx introduces some randomness bc it deletes an entry by swapping out the last element (which I’m still trying to understand the point of).

    seems the point of this data structure is to enable LimitOrphanTxSize to easily choose a random orphan to erase by providing a vector interface?

    if my understanding is correct, I think it’d be clearer to say “List of orphan transactions to enable random eviction”


    jnewbery commented at 8:16 am on July 27, 2020:

    these orphans aren’t listed exactly random right?

    if my understanding is correct, I think it’d be clearer to say “List of orphan transactions to enable random eviction”

    You’re correct. I’ve now fixed the comment. Thank you!

    it deletes an entry by swapping out the last element (which I’m still trying to understand the point of)

    This is because g_orphan_list is a std::vector, which is a contiguous-memory container. It can’t have gaps in the underlying array. The normal way to erase an element from the middle of the vector is to delete it and then shift all the following elements forward by one position. That’s expensive, and would invalidate all of the list_pos values that point back to a position in the vector. It’s more efficient to copy the last element to the position of the element you want to delete, fix up the list_pos of just that COrphanTx, and then pop of the (now duplicate) last element.


    amitiuttarwar commented at 7:05 pm on July 27, 2020:

    does this sound right:

    it seems to me that list_pos is tracked for this reason. so the alternative would be if we removed list_pos & used in the built in vector erase function. the tradeoff would be less efficiency.

    the current design choice is to store this additional field on every COrphanTx to reduce an O(n) operation on the g_orphan_list vector, where n <= 100 | -maxorphantx. but I don’t have much idea of how often we erase orphans in practice. so small increase in memory for small decrease in processing time?


    jnewbery commented at 7:03 am on July 28, 2020:
    Yes, this is the only reason we store list_pos. Without it, we’d have to do a (relatively) expensive erase from the middle of the vector.
  32. in src/net_processing.cpp:164 in 833bccc522 outdated
    155+/** Guards orphan transactions and extra txs for compact blocks */
    156 RecursiveMutex g_cs_orphans;
    157+/** Map from txid to orphan transaction record. Limited by
    158+ *  -maxorphantx/DEFAULT_MAX_ORPHAN_TRANSACTIONS */
    159 std::map<uint256, COrphanTx> mapOrphanTransactions GUARDED_BY(g_cs_orphans);
    160 std::map<uint256, std::map<uint256, COrphanTx>::iterator> g_orphans_by_wtxid GUARDED_BY(g_cs_orphans);
    


    amitiuttarwar commented at 3:07 am on July 25, 2020:
    while adding comments, consider one here too? meybe: Index from wtxid into the mapOrphanTransactions to lookup orphan transactions using their witness ids.

    jnewbery commented at 8:20 am on July 27, 2020:
    Thanks. I missed this in the rebase. Added.
  33. in src/net_processing.cpp:2040 in 833bccc522 outdated
    2007+/**
    2008+ * Reconsider orphan transactions after a parent has been accepted to the mempool.
    2009+ *
    2010+ * @param[in]      connman          Connection manager
    2011+ * @param[in]      mempool          Mempool
    2012+ * @param[in/out]  orphan_work_set  The set of orphan transactions to reconsider. Generally only one
    


    amitiuttarwar commented at 3:13 am on July 25, 2020:
    after this PR, isn’t it a max of one orphan that will be reconsidered on each run of the function?

    jnewbery commented at 8:25 am on July 27, 2020:
    Yes. Fixed!
  34. amitiuttarwar commented at 3:50 am on July 25, 2020: contributor

    yay tidy! thanks for the cleanup and docs, the code makes more sense.

    is my understanding correct that there is a v. small change in behavior in ProcessOrphanTx- previously, if an orphan taken off the orphan_work_set was not accepted to mempool bc of missing additional parents & there are more orphans in the work set, the while loop would continue & process the next one. with these changes it will consider max of one orphan per run. seems like an edge case & acceptable shift, but would like to confirm my understanding.

  35. jnewbery force-pushed on Jul 27, 2020
  36. jnewbery commented at 8:55 am on July 27, 2020: member

    Thanks for the review @amitiuttarwar . I’ve addressed your review comments.

    is my understanding correct that there is a v. small change in behavior in ProcessOrphanTx- previously, if an orphan taken off the orphan_work_set was not accepted to mempool bc of missing additional parents & there are more orphans in the work set, the while loop would continue & process the next one. with these changes it will consider max of one orphan per run. seems like an edge case & acceptable shift, but would like to confirm my understanding.

    Yes, that’s exactly right. See the commit log from ProcessOrphanTx: Only ever reprocess one orphan tx:

    0    Minor changes in behavior:
    1    
    2    - exit immediately if the orphan_work_set entry is not found in
    3      mapOrphanTransactions. This can happen if the orphan is erased
    4      from mapOrpahnTransactions due to action by another peer.
    5    - exit immediately if the orphan is still an orphan (do not
    6      continue and process another).
    

    To expand a bit: this logic was extracted into its own function in #15644 to ensure that we only do one expensive transaction validation per ProcessMessages()/SendMessages() call. There were two conditions where we could process more than one item from the orphan_work_set: the transaction is no longer in the orphan pool, or the transaction still doesn’t have all of its parents (in which case AcceptToMemoryPool() exits with TX_MISSING_INPUTS before doing any expensive checks).

    I think it’s a much simpler mental model to say that we’ll process exactly one transaction from the set on every call to ProcessOrphanTx() (and it makes the functionality match the function name!) I’ve seen multiple bugs in these auxiliary work queues where we’ve either failed to make forward progress or created a livelock with network peers. Keeping the mental model of these work queues as simple as possible definitely seems worth the minimal sacrifice of having to call this function more times.

  37. jnewbery force-pushed on Jul 27, 2020
  38. troygiorshev commented at 12:30 pm on July 27, 2020: contributor
    reACK c5b4c02ee2e423dbcccc985b4f573f8ed9aafa0a via git range-diff master 833bccc c5b4c02, only comment changes
  39. amitiuttarwar commented at 7:18 pm on July 27, 2020: contributor

    I’ve seen multiple bugs in these auxiliary work queues where … created a livelock with network peers

    woah interesting, any you could point me towards?

    See the commit log from ProcessOrphanTx: Only ever reprocess one orphan tx

    oops sorry, I don’t know how I missed that. thanks for the explanation.

    alright, I’ve been considering possible ways this difference could be exploited and I can’t think of anything substantial. I’m comfortable with these changes.

    code review ACK c5b4c02ee2

  40. sipa commented at 7:27 pm on July 27, 2020: member

    I’m not worried about the only-reprocess-one-at-a-time as an attack vector - at worst, a peer could make his own transactions be processed slower. As long as there is no impact on what one peer can do w.r.t. processing of things arriving from other peers, there is no such concern.

    However, I’m slightly concerned that there could be innocent transaction patterns that result in orphan transactions being processed significantly slower. As every incoming transaction may “wake up” a number of unprocessed orphans for reprocessing, and each of these again waking up earlier ones again, it sounds like the worst case may be pretty bad.

  41. jnewbery commented at 7:21 am on July 28, 2020: member

    I’m not worried about the only-reprocess-one-at-a-time as an attack vector - at worst, a peer could make his own transactions be processed slower. As long as there is no impact on what one peer can do w.r.t. processing of things arriving from other peers, there is no such concern.

    If this is the case, then should we process the entire orphan_work_set in one pass? I’m happy to remove the ProcessOrphanTx: Only ever reprocess one orphan tx commit, but I want to understand (and document!) what the expected behaviour is. It’s currently not made obvious to the reader which circumstances will cause us to reconsider multiple transactions from the orphan set in one pass (answer: if the txid is no longer in the orphan set, we’ll continue, and if the result from AcceptToMemoryPool is TX_MISSING_INPUTS we’ll iterate again without setting the done variable).

  42. sipa commented at 6:14 pm on July 28, 2020: member

    @jnewbery It currently iterates until either one orphan is actually accepted or actually rejected, or the work set is empty.

    So it continues until it does some “real work”, as anything else (reconsidering things that still have missing inputs) is presumably fast.

  43. jnewbery commented at 10:45 am on July 29, 2020: member

    I don’t see how this could cause orphans to be processed significantly slower. We always make progress through the peer’s orphan_work_set on each ProcessMessages call, and we tell the message handler thread that there’s more work to do if the work set isn’t empty, so we’ll never sleep between calls.

    In the absolute pessimal case, all orphans start in orphan_work_set, the first 99 are still orphans after reconsideration and the 100th is also a parent to the first 99. We then reconsider the first 99 again and the 99th is a parent to the first 98, and so on. To clear this, we’d need to call ProcessOrphanTx() 100^2 / 2 = 50,000 times instead of 100 times currently. I don’t think that’s a problem since:

    • this can’t happen through innocent transaction patterns. The peer would need to construct a very particular transaction tree (including grinding to get the txids in order)
    • this doesn’t make us do any additional work. We have to call AcceptToMemoryPool() 50,000 times in each case. The only difference is that with this branch we spread those calls out over 50,000 calls to ProcessOrphanTx() instead of 100.
    • the peer that fed us the 100 transactions is therefore only hurting itself. Processing for other peers is unaffected.

    Are you thinking of a different scenario?

  44. ajtowns commented at 1:09 am on July 31, 2020: member

    Wow, big approach NACK on “tidy up” PRs that do behaviour changes.

    I think a better mental model for this sort of work queue is “always make real progress every time you look at the queue”, and the main benefit of the tidy up here is moving the “make real progress” out of the while loop, since we’re not doing many of them we’re only doing one, so having it in the loop is misleading. But we can do both: just make the trivial progress first in a loop to find a real orphan tx to attempt, then do that outside of the loop. Something like:

     0--- a/src/net_processing.cpp
     1+++ b/src/net_processing.cpp
     2@@ -2010,23 +2010,26 @@ static void ProcessHeadersMessage(CNode& pfrom, CConnman& connman, ChainstateMan
     3  *
     4  * [@param](/bitcoin-bitcoin/contributor/param/)[in]      connman          Connection manager
     5  * [@param](/bitcoin-bitcoin/contributor/param/)[in]      mempool          Mempool
     6- * [@param](/bitcoin-bitcoin/contributor/param/)[in/out]  orphan_work_set  The set of orphan transactions to reconsider. Only one orphan
     7- *                                  will be reconsidered on each call of this function. This set
     8- *                                  may be added to if accepting an orphan causes its children to
     9- *                                  be reconsidered.
    10+ * [@param](/bitcoin-bitcoin/contributor/param/)[in/out]  orphan_work_set  The set of orphan transactions to reconsider. Exactly one orphan
    11+ *                                  will be re-validated on each call of this function if there is an
    12+ *                                  orphan to process, however if any of the items in orphan_work_set
    13+ *                                  have already been removed from mapOrphanTransactions they may
    14+ *                                  also be erased. This set may be added to if accepting an orphan
    15+ *                                  causes its children to be reconsidered.
    16  */
    17 void static ProcessOrphanTx(CConnman& connman, CTxMemPool& mempool, std::set<uint256>& orphan_work_set) EXCLUSIVE_LOCKS_REQUIRED(cs_main, g_cs_orphans)
    18 {
    19     AssertLockHeld(cs_main);
    20     AssertLockHeld(g_cs_orphans);
    21 
    22-    if (orphan_work_set.empty()) return;
    23-
    24-    const uint256 orphanHash = *orphan_work_set.begin();
    25-    orphan_work_set.erase(orphan_work_set.begin());
    26-
    27-    auto orphan_it = mapOrphanTransactions.find(orphanHash);
    28+    std::map<uint256, COrphanTx>::iterator orphan_it = mapOrphanTransactions.end();
    29+    while (!orphan_work_set.empty() && orphan_it == mapOrphanTransactions.end()) {
    30+        const uint256 orphanHash = *orphan_work_set.begin();
    31+        orphan_work_set.erase(orphan_work_set.begin());
    32+        orphan_it = mapOrphanTransactions.find(orphanHash);
    33+    }
    34     if (orphan_it == mapOrphanTransactions.end()) return;
    35+    const uint256 orphanHash = orphan_it->first;
    36 
    37     const CTransactionRef porphanTx = orphan_it->second.tx;
    38     TxValidationState state;
    

    EDIT: (this is still a behaviour change in that it stops work after finding a tx that still has missing inputs. I’m fine with counting that much checking as “real work”, just don’t see doing a failing map lookup as enough progress)

  45. jnewbery commented at 9:31 am on July 31, 2020: member

    Wow, big approach NACK on “tidy up” PRs that do behaviour changes.

    I don’t understand. You’ve NACKed a behaviour change and then suggested a different behaviour change. What’s the difference?

    I agree that if a PR claimed to not change behaviour but did unintentionally, that’d be very bad, but all the behaviour changes in this PR have been fully documented and justified in the commit logs since the PR was opened. They’re observationally equivalent to the peers, and no peer can influence how the node processes other peers.

  46. DrahtBot added the label Needs rebase on Aug 6, 2020
  47. jnewbery commented at 10:43 am on August 7, 2020: member

    @sipa @ajtowns - you’ve both expressed reservations about this PR. Before I go ahead and rebase, can you answer the questions above: @sipa

    I’m slightly concerned that there could be innocent transaction patterns that result in orphan transactions being processed significantly slower.

    I don’t see how this could cause orphans to be processed significantly slower. […] Are you thinking of a different scenario? @ajtowns

    Wow, big approach NACK on “tidy up” PRs that do behaviour changes.

    all the behaviour changes in this PR have been fully documented and justified in the commit logs since the PR was opened. They’re observationally equivalent to the peers, and no peer can influence how the node processes other peers.

  48. ajtowns commented at 2:56 pm on August 11, 2020: member

    @sipa @ajtowns - you’ve both expressed reservations about this PR. Before I go ahead and rebase, can you answer the questions above:

    Wow, big approach NACK on “tidy up” PRs that do behaviour changes. all the behaviour changes in this PR have been fully documented and justified in the commit logs since the PR was opened. They’re observationally equivalent to the peers, and no peer can influence how the node processes other peers.

    The NACK is just on describing something as a “tidy up” when it’s a behaviour change; fixable either by describing the behaviour change, or moving it to a separate PR (and describing it there). “Oh, I tidied up” should mean “I moved the stuff over there” not “this code didn’t bring me joy, so I Marie Kondo’d it”

    Sorry for the delay, I wanted to check that the behaviour was actually (potentially) changing – the difference between doing the next step in ProcessOrphanTx and hence ProcessMessages directly, rather than leaving it for the next iteration in ThreadMessageHandler (which is reduced to 0.1sec away due to fMoreWork being set) is that this gives time for other peers to do things as well.

    I think with suitably well connected peers you could arrange for a set of transactions A1..A100 with A as a parent, B1..B100 each of which have each of A1..A100 as a parent, C1..C100 each of which have each of B1..B100 as a parent, D1..D100 etc. Node X provides A1..A100, and then B provides tx A. At this point A1..A100 are in its orphan_work_set, and it tries A1 then lets other peers have a go. Node Y comes along and provides B1..B100, and some random subset of A1..A100,B1..B100 get evicted from the orphanage. X then processes A2, either doing nothing, or adding B1..B100 to its work set. Then Y provides C1..C100. X processes A3, etc. Not sure if there’s a way to tweak that so the work set grows spectacularly unboundedly, but it definitely seems like a behaviour change to me, and it definitely doesn’t match my expectation of “making forward progress when working on a queue”.

  49. jnewbery commented at 8:42 pm on August 11, 2020: member

    Thanks @ajtowns . A few questions/observations about your scenario:

    • “which is reduced to 0.1sec away due to fMoreWork being set”. I think it’s the other way round. We wait for up to 0.1s if there’s no more work to do. If there is work to do, then we immediately reiterate through vNodes again.
    • “Node X provides A1..A100, and then B provides tx A”. I think the B is a typo here and Node X provides A1..A100 and also provides tx A? (I don’t see node B mentioned anywhere else)
    • "…it tries A1 then lets other peers have a go. Node Y comes along and provides B1..B100". When node X “lets other peers have a go” it returns from ProcessMessages() and we continue through the message handler loop. We can only process one message from each peer on each iteration of that loop, so node Y can’t roll mapOrphanTransactions in one iteration as you suggest here. Theoretically nodes Y1 through Y100 could do so, but their messages would need to be exactly synchronized with the message handler loop, which seems unlikely.
    • “Not sure if there’s a way to tweak that so the work set grows spectacularly unboundedly”. I don’t think so. Every iteration through the the message handler loop, each peer can theoretically add one tx to the orphan map, and node X will remove one tx from its orphan_work_set. Processing that tx from the orphan_work_set may result in more txs being added to the set.
    • The scenario you describe where other peers construct transactions that node X adds to its orphan_work_set is possible today. The slower removal of evicted orphans from the orphan_work_set doesn’t change this.
    • aside: this is a good argument to not process orphan transactions in the context of the node that provided the parent/ancestor, which I wanted to do in #19364.
    • finally, to what end is node Y carrying out this “attack” on node X. Yes, it can prevent the victim from processing messages from node X, but only as long as it can continue send transactions that get accepted to the mempool at the victim’s transaction processing capacity. I’d expect that to be many transactions per second. This doesn’t seem like a particularly useful attack, but if there are more dangerous variations, I still think the correct fix is to not process orphans in the context of the peer that provided the parent/ancestor.
  50. amitiuttarwar commented at 9:53 pm on August 11, 2020: contributor

    since its the only point of contention, maybe the ProcessOrphanTx: Only ever reprocess one orphan tx commit can be broken out into a separate PR to separate review from the rest of the changes?

    in regards to this change, I’m not sure I currently have much insight to offer on this nuanced discussion, I’m mostly trying to learn and make sure I understand the concerns/tradeoffs. this comment is mostly a biased summary, so sorry if it doesn’t feel useful, but I’m hoping it might be helpful for moving the conversation forward, maybe for other reviewers, and for my own understanding. so, I’m interested in any feedback on this take.

    an attempt to sum up some of the principles we are discussing: (A) each attempt to process an item should make forward progress in the work queue (B) an adversarial peer shouldn’t be able to significantly impact the node eg. prevent it from processing messages from other peers (C) we should simplify the mental overhead required to reason about these work queues, because their interactions are inherently complex & error-prone (D) we don’t want to negatively impact processing in normal cases

    on current master- to achieve (B) we cap at 1 expensive validation per call. and while “expensive” is always going to be a judgement call, I don’t think the existing logic has clear boundaries. if we invoke ATMP but the error is TX_MISSING_INPUTS, we attempt to process the next orphan. this helps (A), but in my opinion detracts from (C) because there are other failures that exit the loop that do even less work (like if the txn fails pre-checks). so essentially, instead of saying “exit if we do expensive work”, we are saying “exit if we do the right kind of work”. which @sipa defines here #19498 (comment) as “real work” -> orphan is actually accepted, actually rejected, or the work set is empty. which can be argued achieves (C).

    in the current proposed changes- (correct me if I’m wrong, but based on my impression..) @jnewbery is striving to improve (C) by loosening the definition of “forward progress” in (A), and makes a case that this doesn’t detract from (B) or (D). the changes define “forward progress” to be always remove exactly one orphan from the work queue (even if lookup fails or inputs still missing)

    in @ajtowns suggestion here #19498 (comment), (again.. please correct me if I’m wrong) he agrees that it would be beneficial to improve (C) but doesn’t think (A) is sufficiently addressed in the patch and proposes a definition of “forward progress” between current master & current PR- essentially, ATMP an orphan.

    👆🏽 seems like a reasonable compromise to me in improving (C) and a reasonable definition for (A). do others agree? or are there outstanding concerns with robustness / @jnewbery do you not think it achieves (C)?

    also, I haven’t yet dug into fully understanding the attack being discussed, but @ajtowns is the essential concern around a way a sophisticated attacker can grow the queue faster than it would be processed in the current proposed patch?

  51. jnewbery force-pushed on Aug 12, 2020
  52. DrahtBot removed the label Needs rebase on Aug 12, 2020
  53. jnewbery force-pushed on Aug 12, 2020
  54. jnewbery commented at 9:04 am on August 12, 2020: member

    Thanks @amitiuttarwar . Great summary.

    I’ve done as you’ve suggested and removed the ProcessOrphanTx: Only ever reprocess one orphan tx commit. I think this PR by itself is a marginal improvement, but not including #19364 and the ProcessOrphanTx: Only ever reprocess one orphan tx commit limits the benefits. The fact that @ajtowns and I can’t agree on whether node Y can interfere with node X’s processing by sending many transactions does confirm my belief that orphan handling is (unnecessarily?) complex and a simpler mental model would be beneficial here.

    If people think this PR is still worthwhile without that commit, lets go ahead and try to get it merged, and perhaps revisit orphan processing at some point in the future.

  55. jnewbery commented at 2:12 pm on August 24, 2020: member
    rebased
  56. jnewbery force-pushed on Aug 24, 2020
  57. amitiuttarwar commented at 9:53 pm on August 24, 2020: contributor

    utACK 82396bb897

    orphan processing is complex & I appreciate the efforts to simplify & document. thank you!

    as before, I still think simplifying the while loop to have a clearer definition of “forward progress” would be nice, but given the review strugs, I understand dropping from this PR.

  58. in src/net_processing.cpp:2058 in f98ffeb432 outdated
    2039@@ -2040,12 +2040,9 @@ void static ProcessOrphanTx(CConnman& connman, CTxMemPool& mempool, std::set<uin
    2040         const CTransactionRef porphanTx = orphan_it->second.tx;
    2041         const CTransaction& orphanTx = *porphanTx;
    2042         NodeId fromPeer = orphan_it->second.fromPeer;
    2043-        // Use a new TxValidationState because orphans come from different peers (and we call
    2044-        // MaybePunishNodeForTx based on the source peer from the orphan map, not based on the peer
    2045-        // that relayed the previous transaction).
    2046-        TxValidationState orphan_state;
    2047+        TxValidationState state;
    


    MarcoFalke commented at 7:37 am on August 25, 2020:

    in commit f98ffeb4323111f214234ef33:

    I’d be surprised if this commit compiles. Especially L2094


    jnewbery commented at 9:02 am on August 25, 2020:
    Good catch Marco. This was a bad rebase. I’ve fixed it and verified that all commits compile.
  59. jnewbery force-pushed on Aug 25, 2020
  60. jnewbery commented at 9:02 am on August 25, 2020: member
    Fixed a bad intermediate commit (no changes to final code)
  61. amitiuttarwar commented at 4:35 pm on August 25, 2020: contributor
    re utACK e59bbcd06f - checked that nothing changed about final state & that commits build
  62. MarcoFalke added the label Refactoring on Sep 7, 2020
  63. MarcoFalke commented at 7:14 am on September 7, 2020: member
    Concept ACK
  64. DrahtBot commented at 5:40 pm on September 7, 2020: member

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  65. DrahtBot added the label Needs rebase on Sep 7, 2020
  66. [net processing] Add doxygen comments for orphan data and function 6e8dd99ef1
  67. ProcessOrphanTx: remove useless done variable
    There is a keyword that allows us to break out of loops. Use it.
    
    There's a small change in behaviour here: if we process multiple orphans
    that are still orphans, then we'll only call mempool.check() once at the
    end, instead of after processing each tx.
    55c79a9cef
  68. ProcessOrphanTx: remove useless setMisbehaving set
    This starts empty, and is only added to if we're about to
    exit the function (so we never read from it).
    4763b51bca
  69. ProcessOrphanTx: Remove outdated commented
    Also rename orphan_state to state. Both the comment and the variable
    name are leftover from when this logic was part of ProcessMessage().
    e07c5d9423
  70. ProcessOrphanTx: Remove aliases 4fce726bd1
  71. ProcessOrphanTx: Move AddToCompactExtraTransactions call into ProcessOrphanTx 001343f4bc
  72. jnewbery force-pushed on Sep 7, 2020
  73. jnewbery commented at 7:13 pm on September 7, 2020: member
    Rebased
  74. jnewbery removed the label Needs rebase on Sep 7, 2020
  75. troygiorshev commented at 2:50 am on September 23, 2020: contributor

    ACK 001343f4bc8b22fa9e313bd2867756eb9d614fa3

    This is a rather straightforward PR now.

  76. sipa commented at 1:08 am on September 30, 2020: member
    utACK 001343f4bc8b22fa9e313bd2867756eb9d614fa3
  77. fanquake requested review from amitiuttarwar on Sep 30, 2020
  78. MarcoFalke commented at 1:53 pm on September 30, 2020: member

    ACK 001343f4bc8b22fa9e313bd2867756eb9d614fa3 🌮

    Signature:

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA512
     2
     3ACK 001343f4bc8b22fa9e313bd2867756eb9d614fa3 🌮
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
     7pUhi8gv+MYMSOeIoSOoTXDVzdEvq7uZMWyFQwE9JWb+4Gf7ArKX7dLYmeI7/Qx7b
     8hk1wi8p7ilKfr6/9qmVrlEbautGDbPXvGQ9QfUhOLl7O73UKigynh+49Sq6VfH20
     9Et7J3EcyPncqRvU+sQihvOJuXVfsd/K1LXFrxn1Yby8c++pLCmlGwzzlIexWrO+T
    10eV19tSFabJaux0U5ft4l/ZAJ+DncYjM7IQ16W/PkfE3Z/dyfrsr92ZmfZKfHdPd/
    11b8kywFsZ5vZOQAX+4rcAvyHIZyOTZaRTRiKAOTeZ251zn8VjGlhYsbPmA1mwvtZn
    12q9l5C4aTEvDFhqzlSf2kvZ5PR121SGezO5Eq4DEPyRHoK0oyybHv/D6doNOO99Da
    13hVDhdsvCSVXrMbhOSCwboufxbEbYJ5yoLYNZzV8z2WF8Rq2TmdY/l2m+fww8L+SD
    14vatKkGZQATYSd/kMzxBzH0JWLAmz0wEuixWJfaTavv/nO/PsDRDByy0/JNUHUEw2
    15UiwmxSO1
    16=BIT5
    17-----END PGP SIGNATURE-----
    

    Timestamp of file with hash bf74cbfe89c193a0e98cecd45019f0465bbd807c268f14767ecff37e07f76fe5 -

  79. MarcoFalke merged this on Sep 30, 2020
  80. MarcoFalke closed this on Sep 30, 2020

  81. jnewbery deleted the branch on Sep 30, 2020
  82. sidhujag referenced this in commit 1c9fa80e11 on Sep 30, 2020
  83. Fabcien referenced this in commit ac7a1c6996 on Nov 3, 2021
  84. Fabcien referenced this in commit 4410964ccc on Nov 3, 2021
  85. Fabcien referenced this in commit e2e197f5ef on Nov 3, 2021
  86. Fabcien referenced this in commit 062d88d5fe on Nov 3, 2021
  87. Fabcien referenced this in commit 9b3b55dccc on Nov 3, 2021
  88. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me