net: Better askfor request management #4831

pull laanwj wants to merge 3 commits into bitcoin:master from laanwj:2014_09_request_handling changing 8 files +458 −59
  1. laanwj commented at 9:38 am on September 3, 2014: member
    • Instead of naively planning requests every two minutes into (potentially) the far future w/ mapAlreadyAskedFor, actively keep track of requests, and which nodes offer the requested items
    • Better handles the case where nodes announce an inv and go away, or stop responding
    • Maintains state separate from CNode, which makes it easier to review and understand separately from the rest, as well as localizes changes if extension of the logic is necessary

    I also think it can serve as an example of how to ‘peel off’ a single concern from net/main.

    Uses CTimeoutCondition from @ashleyholman (#4230).

    Cases to test:

    • Announce and don’t reply to getdata, make sure retry chooses different node after REQUEST_TIMEOUT
    • Node disconnects and was the current node requested from - should immediately try to get from different node
    • Last node that had announced a certain transaction goes away - request should be discarded I’ve tested these cases using custom pynode clients, as well as by running this with full debugging on my node to see how it behaves,
    • Premature flushing if more than 1000 getdatas queued to a node (this is a very unlikely path but it needs to be tested)

    TODO:

    • Slim down debug logging (this is initially useful for testing and checking, but too much in production)
    • Make sure FlushGetdata’s handling of CNode locking is correct
    • Don’t request from nodes whose sendbuffer is full

    Known issues:

    • Doesn’t currently group getdata requests (sends a getdata request per item instead of one getdata for many invs) - wouldn’t be too difficult to fit in but I’m not sure it is worth the complexity, as usually only one inv is sent at a time (done now)

    Supercedes: #4828 #4547 #4827

  2. in src/netaskfor.cpp: in 57b0fa5ade outdated
    260+    /// Bound number of concurrent inventory requests to each node, this has
    261+    /// the indirect effect of bounding all data structures.
    262+    if (state->setAskFor.size() > MAX_SETASKFOR_SZ)
    263+        return;
    264+
    265+    LogPrint("netaskfor", "askfor %s  peer=%d\n", inv.ToString(), nodeid);
    


    rebroad commented at 4:30 am on September 4, 2014:
    “tx” instead of “netaskfor”? And should this be added to the command line syntax messages in init.cpp?

    laanwj commented at 6:29 am on September 4, 2014:
    Could be called ’tx’ but I tried to keep it as general as possible. It requests inventory items. ACK on adding it to command-line syntax.
  3. in src/netaskfor.cpp: in 57b0fa5ade outdated
    156+            invRequestsWorkQueue.begin()->first ? "retry" : "first request");
    157+    invstate.beingRequestedFrom = nodeid;
    158+
    159+    std::vector<CInv> vGetData;
    160+    vGetData.push_back(inv);
    161+    node->PushMessage("getdata", vGetData);
    


    rebroad commented at 5:01 am on September 4, 2014:

    Just one inv per getdata message?

    Why not use the old code to create a list of invs to getdata and use that one node at a time?


    laanwj commented at 6:33 am on September 4, 2014:

    It is a bit more difficult because we act per request here, not per node.

    It would be possible to accumulate getdatas per node, then submit them at once after an iteration, but as said under KNOWN ISSUES in the OP I’m not sure it’s worth the extra complexity. Anyhow it’s an obvious optimization that could be done. Working on this right now…


    rebroad commented at 7:11 am on September 4, 2014:
    I’ve added a pull request (https://github.com/laanwj/bitcoin/pull/3) to this branch that might help with this… (also makes it easier to collect stats in later commits of mine).
  4. laanwj force-pushed on Sep 4, 2014
  5. laanwj force-pushed on Sep 4, 2014
  6. in src/netaskfor.cpp: in a2afa1d17b outdated
     95+/** Handler for when a new node appears */
     96+void InitializeNode(NodeId nodeid, const CNode *pnode)
     97+{
     98+    LOCK(cs_invRequests);
     99+    /*CNodeAskForState &state = */
    100+    mapNodeAskForState.insert(std::make_pair(nodeid, CNodeAskForState())).first->second;
    


    SergioDemianLerner commented at 0:30 am on September 5, 2014:
    What is the trailing .first->second for?

    laanwj commented at 9:52 am on September 5, 2014:
    That’s a leftover from the commented code above it. Basically, the CNodeAskForState had some initial assignments here so a reference to it was stored. This is not necessary at the moment so both the commented code and the .first->second should go.
  7. SergioDemianLerner commented at 0:57 am on September 5, 2014: contributor
    This is excellent. Could we unify the handling and scheduling of transactions getdata/invs with block getdata/invs ? Because sooner or later block fetching will require a similar method to withstand malicious block invs. Maybe that can be done later on top of this patch.
  8. sipa commented at 1:01 am on September 5, 2014: member
    We are actually doing already something very similar for blocks, and the headersfirst branch extends it (#4468). It’s a bit more complicated there, as we want a moving window of block fetching to limit out-of-orderness in which blocks arrive.
  9. sipa commented at 1:08 am on September 5, 2014: member

    Big +1 on code organization: implementing independent pieces of protocol handling should definitely move to separate files, with separate data structures and separate locks (other examples that afaik could easily be converted into this style are ping/pong and alerts).

    It’s a bit unfortunate that the block fetching and tx fetching code are separated but implement similar functionality. It’s a strict improvement though as they were already separate (originally my fault I guess, as I didn’t touch the tx handling code when rewriting the block handling part), and unifying them now would probably interfere with other changes.

  10. laanwj commented at 10:08 am on September 5, 2014: member
    @SergioDemianLerner As I see it, blocks handling is much more closely bound to main/core than this (basically independent) inventory item fetcher, making it enough of a separate concern to warrant being a different module. I’m not against unifying them if it can be done sanely, of course. But block handling is essentially different especially after headers-first.
  11. rebroad commented at 11:06 am on September 5, 2014: contributor
    Malicious block invs? What is one of those?
  12. rebroad commented at 0:23 am on September 6, 2014: contributor

    Currently, the logic is such that large orphan txs are requested repeatedly and ignored repeatedly:-

    2014-09-06 00:01:46 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=1 2014-09-06 00:01:46 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=11 2014-09-06 00:01:47 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=27 2014-09-06 00:01:48 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=23 2014-09-06 00:01:49 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=12 2014-09-06 00:01:50 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=26 2014-09-06 00:01:51 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=24 2014-09-06 00:01:53 ignoring large orphan tx (size: 5057, hash: 18098a192869ef0f9128be9bf1f3bb243575f88d072bb24f918c4e4f5a894b80) peer=20

    Is it possible to make the node remember recently ignored txs so that they don’t keep being requested? This was working in the previous implementation thanks to #4542

  13. laanwj commented at 2:36 am on September 6, 2014: member

    @rebroad This should simply work with #4542? It doesn’t touch any part of the same code.

    As I see it, the logic of whether to request something is outside scope of this module. If you ask netaskfor to retrieve a transaction for you, it will do so, until being told to stop :) (or until it runs out of peers)

  14. rebroad commented at 7:25 am on September 6, 2014: contributor
    @laanwj oh.. yes, you are right. Ok, I’ll re-merge 4542 in that case (which didn’t touch large orphan txs anyway I’ve just noticed!).
  15. in src/netaskfor.cpp: in a2afa1d17b outdated
    216+                LogPrint("netaskfor", "%s: processing item %s\n", __func__, inv.ToString());
    217+                if (it != mapInvRequests.end())
    218+                {
    219+                    CInvState &invstate = it->second;
    220+                    invstate.workQueueIter = invRequestsWorkQueue.end();
    221+                    /// Pick a node to request from, if available
    


    laanwj commented at 8:21 am on September 6, 2014:

    This should skip nodes whose sendbuffer is full.

    What is the correct way to check this? This would be something like pnode->nSendSize < SendBufferSize(), but I’m not sure what lock is needed for that.


    sipa commented at 8:28 am on September 6, 2014:
    In theory you would need pnode->cs_vSend, but if you can argue that the system keeps working correctly even if the test returns the right result only most of the time, put a big comment, and use no lock…
  16. laanwj commented at 3:19 pm on September 9, 2014: member

    Looks like the timeout could indeed be reduced.

     02014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7543
     12014-09-09 15:15:27 ThreadHandleAskFor: processing item tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698
     22014-09-09 15:15:27 QueueGetdata: Requesting item tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698 from node 7543 (first request)
     32014-09-09 15:15:27 FlushGetdata: peer=7543 getdata tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698 
     42014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=6150
     52014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=633
     62014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7455
     72014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7018
     82014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=330
     92014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7471
    102014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=6306
    112014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=978
    122014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7436
    132014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7596
    142014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=7504
    152014-09-09 15:15:27 askfor tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698  peer=6215
    162014-09-09 15:15:27 Completed: tx b87fabfb19b9401f20c9bb9330db0916c4781a2d06efc914556776ae6715a698 peer=7543
    

    In by-far most cases the transaction is returned in a few seconds. At the same time, invs for it are still coming in from other nodes.

  17. sync: Add CTimeoutCondition
    A condition that can wait for a specified timeout, this is useful when
    it is known in advance that events have to be processed at some time in the
    future.
    b9e5772597
  18. net: add StartThreads and StopThreads signals
    This allows modules to maintain their own threads.
    7ea2476c0a
  19. net: Better inventory request management 9cbaaf61ad
  20. in src/netaskfor.cpp: in a2afa1d17b outdated
    327+    if (it == mapInvRequests.end())
    328+    {
    329+        std::pair<MapInvRequests::iterator, bool> ins = mapInvRequests.insert(std::make_pair(inv, CInvState()));
    330+        it = ins.first;
    331+        /// As this is the first time that this item gets announced by anyone, add it to the work queue immediately
    332+        it->second.workQueueIter = invRequestsWorkQueue.insert(std::make_pair(0, inv));
    


    laanwj commented at 9:07 am on September 10, 2014:

    There is nothing in place at the moment to make sure that data items are requested in the same order that they’re announced (std::multimap does not preserve insertion order). There probably needs to be.

    OTOH when requesting from multiple nodes (or when timeouts are involved) there is no guarantee that the response will come in the same order as the requests. So maybe it’s not an issue.

  21. laanwj force-pushed on Sep 18, 2014
  22. BitcoinPullTester commented at 10:37 am on September 18, 2014: none
    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4831_9cbaaf61ade4b91469f3d728795ec83859c25192/ for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.
  23. sipa commented at 2:20 am on September 20, 2014: member
    Going to test this.
  24. sipa commented at 3:52 am on September 29, 2014: member
    Works without problems (even in valgrind, after running for a few days). I didn’t actually check whether it fetches/relays things correctly, though.
  25. in src/netaskfor.cpp: in 9cbaaf61ad
     6+#include "config/bitcoin-config.h"
     7+#endif
     8+
     9+#include "netaskfor.h"
    10+
    11+#include <boost/thread.hpp>
    


    Diapolo commented at 8:43 am on September 29, 2014:
    Nit: This should go below our headers, so just flip this with the block below.
  26. Diapolo commented at 8:46 am on September 29, 2014: none
    Just a general question, why are most/all comments beginning with ///?
  27. fanquake commented at 8:58 am on September 29, 2014: member
    @Diapolo It’s so that they’ll be picked up by doxygen. It doesn’t recognise comments starting with //
  28. laanwj added this to the milestone 0.11.0 on Nov 3, 2014
  29. laanwj added the label P2P on Nov 3, 2014
  30. sipa commented at 4:27 pm on November 17, 2014: member
    I hope you pick this up soon after 0.10 :)
  31. laanwj removed this from the milestone 0.11.0 on May 1, 2015
  32. laanwj added this to the milestone 0.12.0 on May 1, 2015
  33. rebroad commented at 4:10 pm on June 25, 2015: contributor
    Needs rebase - is this still in progress?
  34. jgarzik commented at 6:12 pm on July 23, 2015: contributor

    Ping. Needs refresh.

    I think the general consensus is that we want this, but needs more review? Seems to have positive noises in the security discussion and on here, but no ACKs.

  35. gmaxwell commented at 8:46 am on September 6, 2015: contributor
    I tested this previously (at some version…) but it got put down after being punted out of 0.10. Seems to have been forgotten. Lets unforget it.
  36. dcousens commented at 1:25 pm on September 7, 2015: contributor
    concept ACK
  37. jgarzik commented at 5:14 pm on September 15, 2015: contributor
    concept ACK - let this not be forgotten
  38. sipa commented at 5:34 pm on September 15, 2015: member
    Perhaps this will be pushed back until after Cory’s P2P refactor?
  39. laanwj commented at 3:58 am on September 18, 2015: member
    Needs to be picked back up at some point, I’m not sure when. Certainly don’t want to interfere with Cory’s libevent work.
  40. laanwj closed this on Sep 26, 2015

  41. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-19 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me