p2p: Request NOTFOUND transactions immediately from other outbound peers, when possible #15505

sdaftuar wants to merge 4 commits into bitcoin:master from sdaftuar:2019-02-notfound-requests, changing 5 files (+123 −44)
  1. sdaftuar commented at 8:19 pm on February 28, 2019: member

    When a peer asks for a transaction that we cannot (or will not) deliver, we send a NOTFOUND message for the given txid. However, our software currently ignores the NOTFOUND messages we receive from others – if other peers have announced the transaction, we wait until the outstanding request times out before requesting it from the next peer.

    Improve this by immediately requesting the transaction from one of the outbound peers who have recently announced it (if any).
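
    In code terms, the reaction to a NOTFOUND sketched in this PR would look roughly like the following (an illustrative fragment using the patch's names from the diffs below, not the exact committed code):

    // Context: handling a NOTFOUND for `txid` received from peer `pfrom` in
    // net_processing.cpp. Re-request from one outbound peer that has recently
    // announced the tx, rather than waiting for the in-flight request timeout.
    for (const NodeId outbound_id : g_outbound_peers) {
        if (outbound_id == pfrom->GetId()) continue; // skip the peer that sent NOTFOUND
        CNodeState* outbound_state = State(outbound_id);
        if (outbound_state->m_tx_download.m_tx_announced.count(txid)) {
            RequestTx(outbound_state, txid, nNow); // queue a fresh getdata
            break;
        }
    }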

  2. DrahtBot commented at 9:31 pm on February 28, 2019: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #16279 (Return the AcceptBlock CValidationState directly in ProcessNewBlock by TheBlueMatt)
    • #16197 (net: Use mockable time for tx download by MarcoFalke)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. in src/net_processing.cpp:853 in 059e04c03c outdated
    849+        if (peer_loc != g_outbound_peers.size()-1) {
    850+            // Swap this peer with the last one
    851+            g_outbound_peers[peer_loc] = g_outbound_peers.back();
    852+        }
    853+        g_outbound_peers.pop_back();
    854+    }
    


    jamesob commented at 10:31 pm on February 28, 2019:
    Seems like this could be a lot more succinct with vector::erase - is there some reason to do it this way?

    sdaftuar commented at 3:31 pm on March 1, 2019:

    I guess I have an aversion to vector::erase because it requires moving all the other entries. Possibly that concern doesn’t make sense for such a small vector?

    But the bigger issue here is why this is a vector at all – originally I had this as a set, but then I changed it to be a vector in order to make the logic around when to drain the notfound queue for each peer make sense. In short, I wanted to uniformly at random assign a notfound transaction to an outbound peer that announced it, but I didn’t have a good way to do that, so instead I assign it to all peers, but then have a (hacky) counter in SendMessages and each outbound peer will check that counter to decide when to drain its notfound queue.

    (I’m hoping that reviewers will give me a better idea of how to implement that logic than what I’ve got so far…)
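
    For reference, the swap-and-pop idiom under discussion removes a vector element in O(1) by overwriting it with the last element (giving up ordering), whereas vector::erase shifts every later element down by one. A minimal standalone sketch:

    #include <cstddef>
    #include <utility>
    #include <vector>

    // O(1) unordered erase: overwrite the target with the back element, then pop.
    template <typename T>
    void SwapAndPop(std::vector<T>& v, std::size_t i)
    {
        if (i != v.size() - 1) v[i] = std::move(v.back());
        v.pop_back();
    }

    // The ordered alternative, O(n) due to the shift:
    //     v.erase(v.begin() + i);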

  4. in src/net_processing.cpp:356 in 0f4ceef389 outdated
    351@@ -338,6 +352,11 @@ struct CNodeState {
    352      *   peers.
    353      */
    354     struct TxDownloadState {
    355+        /* Notifications of NOTFOUND transactions (from other peers) which
    356+         * this peer has recently announced. txid->time in micros
    


    naumenkogs commented at 11:18 pm on February 28, 2019:
    This comment seems outdated

    sdaftuar commented at 8:21 pm on March 1, 2019:
    Fixed.
  5. fanquake added the label P2P on Feb 28, 2019
  6. in src/net_processing.cpp:3186 in 0f4ceef389 outdated
    3183+                    // waiting for the download timeout to expire before
    3184+                    // re-requesting a transaction.
    3185+                    if (!AlreadyHave(inv)) {
    3186+                        for (auto it = g_outbound_peers.begin(); it != g_outbound_peers.end(); ++it) {
    3187+                            if (*it == pfrom->GetId()) continue;
    3188+                            CNodeState *state = State(*it);
    


    practicalswift commented at 7:29 am on March 1, 2019:
    This state shadows the state in the outer scope.

    sdaftuar commented at 8:22 pm on March 1, 2019:
    Fixed.
  7. in src/net_processing.cpp:748 in 0f4ceef389 outdated
    743@@ -724,6 +744,41 @@ void RequestTx(CNodeState* state, const uint256& txid, int64_t nNow) EXCLUSIVE_L
    744     peer_download_state.m_tx_process_time.emplace(process_time, txid);
    745 }
    746 
    747+// Given a txid and a peer that we'd like to consider downloading a transaction
    748+// from: add an appropriate INV nessage to get_data if it is time to request
    


    practicalswift commented at 7:31 am on March 1, 2019:
    I’ve got an important nessage for you: this should be “message”.

    sdaftuar commented at 8:22 pm on March 1, 2019:
    Fixed.
  8. in src/net_processing.cpp:751 in 0f4ceef389 outdated
    743@@ -724,6 +744,41 @@ void RequestTx(CNodeState* state, const uint256& txid, int64_t nNow) EXCLUSIVE_L
    744     peer_download_state.m_tx_process_time.emplace(process_time, txid);
    745 }
    746 
    747+// Given a txid and a peer that we'd like to consider downloading a transaction
    748+// from: add an appropriate INV nessage to get_data if it is time to request
    749+// the transaction from the peer, and update the appropriate tx download state
    750+// (both the global state and the peer's state). Send a getdata message if the
    751+// get_data vector grows too large.
    


    jamesob commented at 3:09 pm on March 1, 2019:

    Nice doc. To get doxygen to pick this up, you’ll have to do

    //! this
    /// or this
    /**
     * or this
     */
    

    sdaftuar commented at 8:22 pm on March 1, 2019:
    Done.
  9. in src/net_processing.cpp:3242 in 0f4ceef389 outdated
    3163+        CNodeState *state = State(pfrom->GetId());
    3164+        std::vector<CInv> vInv;
    3165+        vRecv >> vInv;
    3166+        if (vInv.size() <= MAX_INV_SZ) {
    3167+            for (CInv &inv : vInv) {
    3168+                if (inv.type == MSG_TX || inv.type == MSG_WITNESS_TX) {
    


    jamesob commented at 3:19 pm on March 1, 2019:

    This is just a style/readability thing, but you can avoid indenting the next ~23 lines by changing this to

    if (inv.type != MSG_TX && inv.type != MSG_WITNESS_TX) {
        continue;
    }
    

    sdaftuar commented at 8:22 pm on March 1, 2019:
    Will leave as-is – in this case I like being able to tell that the code block exactly matches the if condition, rather than matching the opposite of an if condition above it that would have caused the loop iteration to be skipped.
  10. in src/net_processing.cpp:3234 in 0f4ceef389 outdated
    3155@@ -3081,8 +3156,46 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
    3156     }
    3157 
    3158     if (strCommand == NetMsgType::NOTFOUND) {
    


    jamesob commented at 3:29 pm on March 1, 2019:
    Aid for fellow reviewers: NOTFOUND messages are assembled here and their structure exactly resembles INVs, which is why we can deserialize using CInv below.
  11. sdaftuar commented at 3:32 pm on March 1, 2019: member
    Thanks for the nits; I’ll address them later, but in the meantime I’m open to suggestions on how to better implement the random assignment of notfound transactions to the outbound peers that have announced them (I think the logic I have here should work, but it’s a bit gross).
  12. in src/net_processing.cpp:3246 in 0f4ceef389 outdated
    3166+        if (vInv.size() <= MAX_INV_SZ) {
    3167+            for (CInv &inv : vInv) {
    3168+                if (inv.type == MSG_TX || inv.type == MSG_WITNESS_TX) {
    3169+                    state->m_tx_download.m_tx_announced.erase(inv.hash);
    3170+                    auto in_flight_it = state->m_tx_download.m_tx_in_flight.find(inv.hash);
    3171+                    if (in_flight_it == state->m_tx_download.m_tx_in_flight.end()) {
    


    jamesob commented at 3:55 pm on March 1, 2019:

    IMO more clearly written as

    if (!state->m_tx_download.m_tx_in_flight.count(inv.hash)) {
    ...
    }
    

    sdaftuar commented at 8:23 pm on March 1, 2019:
    Oops, I meant to use the iterator in the erase() call, saving an extra map lookup. Just did that in the latest commit.
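
    The pattern referred to, sketched on a plain std::map (one lookup via find(), with the iterator reused by erase(), instead of a count() followed by erase(key)):

    #include <cstdint>
    #include <map>

    // Hypothetical stand-in: the real code keys on uint256 tx hashes.
    void EraseIfInFlight(std::map<uint64_t, int64_t>& tx_in_flight, uint64_t hash)
    {
        auto it = tx_in_flight.find(hash); // single lookup...
        if (it != tx_in_flight.end()) {
            tx_in_flight.erase(it);        // ...reused here; no second search
        }
    }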
  13. in src/net_processing.cpp:4010 in 0f4ceef389 outdated
    4005+            // we might impose).
    4006+            // Our algorithm is to keep a global SendMessages() counter and
    4007+            // every outbound peer gets to drain its notfound queue when the
    4008+            // counter matches its location.
    4009+            assert(g_outbound_peers.size() > 0);
    4010+            if (g_outbound_peers[not_found_counter % g_outbound_peers.size()] == pto->GetId()) {
    


    jamesob commented at 4:15 pm on March 1, 2019:

    Correct me if I’m wrong here, but isn’t this equivalent to doing something like

    if (GetRandInt(/* nMax= */g_outbound_peers.size()) == 0) {
    ...
    }
    

    ?

    If you think the potential performance difference (which IMO is probably negligible when shoulder-to-shoulder with network IO) is okay, just doing something like that would allow you to keep g_outbound_peers a set and preserve the algorithm.


    sdaftuar commented at 4:55 pm on March 1, 2019:
    Doing a 1 in N draw each time would introduce a bias toward peers that are earlier in the loop, versus ones that are later in the loop. Imagine if N=2, and we have two peers, peer0 and peer1. peer0 would be selected with probability 1/2 + 1/8 + 1/32 + … = 2/3. If N=8, I think the first peer would have a 19% chance of being selected.

    sdaftuar commented at 5:03 pm on March 1, 2019:
    I just realized that my hacky implementation here is broken too – the not_found_counter will basically increment by the number of peers we have, so if the number of peers we have is equal to the number of outbound peers we have (a common state for non-listening nodes) then we’ll always be requesting notfound’s from the first peer.

    jamesob commented at 5:46 pm on March 1, 2019:

    I’m not sure I see how there’s any bias based on the ordering of peers - the GetRandInt thing is an independent trial across peers, though of course we may end up draining multiple peers’ m_not_found per single ThreadMessageHandler loop if two peers both get 1/num_outbounds lucky. Is there a reason we’d want to avoid that?

    It seems like there might be a bias based upon which peer we select to queue the m_not_found message on (peers earlier in g_outbound_peers will get selected first more often), but I’m not sure how the order in which we flush m_not_found makes a difference.


    sdaftuar commented at 8:23 pm on March 1, 2019:
    Ok I redid this to be (hopefully) better, thanks.
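
    The bias is easy to check numerically. A small self-contained Monte Carlo sketch (not part of the patch) of the "independent 1-in-N draw per peer, in loop order" scheme reproduces the 2/3 figure for two peers:

    #include <cstdio>
    #include <random>

    int main()
    {
        std::mt19937 rng{42};
        std::bernoulli_distribution win{0.5}; // each peer's independent 1-in-2 draw
        const int trials = 1000000;
        int peer0_first = 0;
        for (int t = 0; t < trials; ++t) {
            for (;;) {                                  // one message-handler pass per round
                if (win(rng)) { ++peer0_first; break; } // peer0 drew first
                if (win(rng)) { break; }                // peer1 drew first
            }
        }
        // Prints roughly 0.667: 1/2 + 1/8 + 1/32 + ... = 2/3, not the fair 0.5.
        std::printf("peer0 selected first: %.3f\n", (double)peer0_first / trials);
        return 0;
    }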
  14. jamesob commented at 4:17 pm on March 1, 2019: member
    Feel free to ignore the nits, but curious what you think about the use of GetRandInt in lieu of the vector hackery.
  15. sdaftuar force-pushed on Mar 1, 2019
  16. sdaftuar commented at 8:25 pm on March 1, 2019: member
    @jamesob Thanks for the helpful suggestions. I rewrote this with g_outbound_peers being a set, and moving the global sequence number needed for peer selection to be in the interface between net and net_processing. (It seems to me like the idea of having synchronization between loop iterations in ThreadMessageHandler is something that is occasionally useful in net_processing, so I hope the interface change there is acceptable.)
  17. sdaftuar force-pushed on Mar 1, 2019
  18. jamesob commented at 9:35 pm on March 1, 2019: member
    Looks good! I’ll write up a related functional test this weekend unless you have a burning desire to do so.
  19. naumenkogs commented at 9:56 pm on March 1, 2019: member

    I must be misunderstanding something.

    Imagine we have 2 peers: [p1, p2]. Start with seq_num = 1.

    SendMessages(p1, 1 % 2 = 1): false
    SendMessages(p2, 1 % 2 = 1): false

    SendMessages(p1, 2 % 2 = 0): true
    SendMessages(p2, 2 % 2 = 0): true

    So p1 will always be called before p2 (and we are much more likely to request something from p1)?

  20. jamesob commented at 10:02 pm on March 1, 2019: member
    @naumenkogs it’s not the result of the modulus itself that determines whether to flush the notfound queue for a given peer; sequence_number % num_outbounds is used as an index into the set of outbounds, which then determines the peer whose turn it is to flush.
  21. sdaftuar commented at 10:04 pm on March 1, 2019: member

    Let’s say we have 2 peers, p1 and p2 in g_outbound_peers, and let’s assume the std::set comparator puts them in that order when iterating through g_outbound_peers. Let’s also say both of them have something in their respective notfound queues.

    When the sequence number is 1, then 1 % 2 == 1, so we look at g_outbound_peers.begin() + 1, and we get p2’s nodeid. When we call SendMessage on p1, the nodeid will not match, so its queue will not be drained. When we call SendMessages on p2, the condition will be true, and we’ll drain the notfound queue. On the next loop iteration, the conditions flip.

    (I’m open to simpler logic that accomplishes the same thing!)
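
    Sketched concretely (with a hypothetical helper name; std::set iterators are bidirectional, so std::advance stands in for the begin() + 1 shorthand above):

    #include <cstdint>
    #include <iterator>
    #include <set>

    using NodeId = int64_t;
    std::set<NodeId> g_outbound_peers; // the patch's set of outbound peer ids

    // True when it is this peer's turn to drain its notfound queue.
    bool IsPeersTurn(NodeId node, uint64_t sequence_number)
    {
        if (g_outbound_peers.empty()) return false;
        auto it = g_outbound_peers.begin();
        std::advance(it, sequence_number % g_outbound_peers.size());
        return *it == node;
    }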

  22. naumenkogs commented at 10:07 pm on March 1, 2019: member

    @sdaftuar @jamesob I see, you’re right, thanks.

    I was not a big fan of the extra sequence_number, but based on the discussion above and some thinking I don’t see a cleaner way to select a peer fairly. This sequence_number is also likely to be useful somewhere else in the future.

    utACK

  23. jamesob commented at 7:41 pm on March 5, 2019: member

    utACK https://github.com/bitcoin/bitcoin/pull/15505/commits/f5dbb49c8922cb26d05862b7aacf161bde8ffc5a

    I was going to write a functional test for this, but I’m not sure there’s a sensible way to do that given that it looks like we can’t force P2PDataStore clients to be outbound connections for nodes under test. (see also: #14210)

    If anyone can think of a good manual test plan, I’m happy to carry it out.

  24. DrahtBot added the label Needs rebase on May 4, 2019
  25. sdaftuar commented at 3:16 pm on June 5, 2019: member
    Closing for now.
  26. sdaftuar closed this on Jun 5, 2019

  27. sdaftuar reopened this on Jun 12, 2019

  28. [net] Add a global sequence number to SendMessages()
    Keep an incrementing counter between loop iterations in ThreadMessageHandler.
    
    This allows for synchronization between loop iterations in SendMessages.
    88d73ab65c
  29. [p2p] Maintain a set of outbound peers 7249cad634
  30. Refactor transaction getdata request logic b8ca596138
  31. Request NOTFOUND transactions immediately
    Thanks to James O'Beirne for some of the code in this patch.
    753883512a
  32. sdaftuar force-pushed on Jun 16, 2019
  33. DrahtBot removed the label Needs rebase on Jun 18, 2019
  34. sdaftuar commented at 2:28 pm on July 22, 2019: member
    Seems like not much interest here, closing for now.
  35. sdaftuar closed this on Jul 22, 2019

  36. jkczyz commented at 11:41 pm on July 30, 2019: contributor

    Rather than using a sequence number to determine which peer’s notfound queue to drain, would the following work?

    1. When a notfound message is received, add each transaction to the notfound queue of a random peer selected from those that have announced the transaction.
    2. Unconditionally drain each peer’s notfound queue.

    This would avoid the added complexity of a sequence number. It would also ensure that each transaction is requested from only one peer.
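
    Step 1 could be implemented with reservoir sampling, for example, picking uniformly among the announcers in a single pass (an illustrative sketch, not code from this PR):

    #include <cstdint>
    #include <random>
    #include <vector>

    using NodeId = int64_t;

    // Pick one peer uniformly at random from those that announced the tx:
    // the k-th candidate replaces the current pick with probability 1/k.
    NodeId PickAnnouncer(const std::vector<NodeId>& announcers, std::mt19937& rng)
    {
        NodeId pick = -1; // -1 if no peer announced the tx
        std::uint64_t seen = 0;
        for (const NodeId id : announcers) {
            ++seen;
            if (std::uniform_int_distribution<std::uint64_t>{0, seen - 1}(rng) == 0) pick = id;
        }
        return pick;
    }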

  37. sdaftuar reopened this on Oct 29, 2019

  38. fanquake renamed this:
    [p2p] Request NOTFOUND transactions immediately from other outbound peers, when possible
    p2p: Request NOTFOUND transactions immediately from other outbound peers, when possible
    on Oct 29, 2019
  39. DrahtBot commented at 7:35 pm on October 29, 2019: contributor
  40. DrahtBot added the label Needs rebase on Oct 29, 2019
  41. jnewbery commented at 8:13 pm on November 13, 2019: contributor
    @sdaftuar what do you think about @jkczyz’s suggestion for randomly selecting which peer to request the tx from? Conceptually, I think I prefer it to the sequence counter in the net/net_processing interface.
  42. sdaftuar commented at 8:28 pm on November 14, 2019: member

    Rather than using a sequence number to determine which peer’s notfound queue to drain, would the following work?

    1. When a notfound message is received, add each transaction to the notfound queue of a random peer selected from those that have announced the transaction.
    2. Unconditionally drain each peer’s notfound queue.

    The reason I tend to avoid an approach like this is because the logic for whether we actually send a message to a peer is very complex; there are many reasons why we might call SendMessages() on a peer and then ultimately not be able to send a getdata to it (maybe we have too many getdata’s in flight to them, or maybe we’re processing block requests for them, or maybe their send buffer is full blocking communication for a while, etc). Since there’s no good way to tell whether a getdata would be sent if assigned to a given peer, I think it’s more robust to just signal all of them and let each peer individually figure out if they can make the request.

  43. maflcko commented at 7:39 pm on November 15, 2019: member
    @sdaftuar Are you still working on this to get it rebased or would you prefer someone to take it over?
  44. sdaftuar commented at 8:07 pm on November 15, 2019: member
    Please consider this up for grabs.
  45. fanquake added the label Up for grabs on Nov 15, 2019
  46. fanquake commented at 3:30 am on January 5, 2020: member
    Going to close. @naumenkogs you might be interested in following up here?
  47. fanquake closed this on Jan 5, 2020

  48. bitcoin locked this on Feb 15, 2022
  49. sipa commented at 10:08 am on May 2, 2023: member
    I believe this PR’s goal has now effectively been solved by #19988.
  50. maflcko removed the label Up for grabs on May 2, 2023
  51. maflcko removed the label Needs rebase on May 2, 2023
  52. bitcoin unlocked this on May 2, 2023
  53. bitcoin locked this on May 1, 2024
