prevent peer flooding request queue for an inv

kazcw commented at 10:39 pm on July 16, 2014: contributor

mapAlreadyAskedFor does not keep track of which peer has a request queued for a particular tx. As a result, a peer can blind a node to a tx indefinitely by sending many invs for the same tx, and then never replying to getdatas for it. Each inv received will be placed 2 minutes farther back in mapAlreadyAskedFor, so a short message containing 10 invs would render that tx unavailable for 20 minutes.

This is fixed by disallowing a peer from having more than one entry for a particular inv in mapAlreadyAskedFor at a time.

rebroad commented at 10:47 am on July 17, 2014: contributor

Nice catch

prevent peer flooding request queue for an inv

mapAlreadyAskedFor does not keep track of which peer has a request queued for a
particular tx. As a result, a peer can blind a node to a tx indefinitely by
sending many invs for the same tx, and then never replying to getdatas for it.
Each inv received will be placed 2 minutes farther back in mapAlreadyAskedFor,
so a short message containing 10 invs would render that tx unavailable for 20
minutes.

This is fixed by disallowing a peer from having more than one entry for a
particular inv in mapAlreadyAskedFor at a time.

dcd7ef782b

BitcoinPullTester commented at 7:56 am on July 20, 2014: none

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4547_dcd7ef782bd8105c3427f932206e33fa07ddeded/ for binaries and test log. This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/ Contact BlueMatt on freenode if something looks broken.

kazcw commented at 5:57 pm on July 26, 2014: contributor

This partially mitigates #1157, by at least limiting the time each transaction can be delayed to 2 minutes per peer.

sipa commented at 2:46 pm on July 27, 2014: member

Untested ACK.

jgarzik commented at 4:47 pm on July 29, 2014: contributor

ut ACK

laanwj added the label P2P on Jul 31, 2014

laanwj added the label Improvement on Jul 31, 2014

TheBlueMatt commented at 6:18 am on September 2, 2014: member

Indeed it likely should be an mruset, however it should grow with mapAskFor, which should be a lrumultimap…so maybe that can be a separate pull.

TheBlueMatt commented at 6:20 am on September 2, 2014: member

ut ACK.

laanwj commented at 8:02 am on September 2, 2014: member

Indeed it likely should be an mruset, however it should grow with mapAskFor, which should be a lrumultimap…so maybe that can be a separate pull.

~~Does it need to be a multimap at all? We already make sure that the key (time) is monotonically increasing in AskFor.~~ (edit: yes, it does, nNow is monotonically increasing but nRequestTime doesn’t need to be)

ACK after limiting the size of the set. Let’s not add any more unbounded structures to CNode.

Edit: limiting doesn’t need a mruset or limitedset at all. Something like

0    if (mapAskFor.size() > MAPASKFOR_MAX_SZ)
1        return;

at the beginning of AskFor would do.

laanwj commented at 8:10 am on September 2, 2014: member

I’m not sure how much this will really help though. It avoids ‘inv with duplicates’ from one node but It’s easy to pretend to be multiple nodes by reconnecting, or having multiple connections open even from the same host. I think we need a global limit as well, something like

0    // Don't plan too far into the future
1    if (nRequestTime > nNow + 10 * 60 * 1000000)
2        return;

rebroad commented at 2:39 am on September 3, 2014: contributor

@laanwj I agree, but also, perhaps the 2 minutes is a little high? If we change this to 1 minute this might also help.

Or better still, why not make the node determine the ideal number instead of hardcoding it?

laanwj commented at 10:09 am on September 4, 2014: member

See also #4831.

Not sure if the 2 minutes is too much, if you have a way to better determine the timeout that is welcome, but there is not really a hurry to have a transaction quickly. The smaller the timeout, the larger the chance that the first node isn’t really defunct but just having lag, and that we’ll receive it from two peers.

rebroad commented at 10:37 am on September 4, 2014: contributor

@laanwj How about keeping track of the average delay that each node takes when serving txs, and then when a tx comes in, as well as requesting it immediately from the first node to advertise it, it also deduces how long it will take to arrive from that node. After 10 seconds, it assesses which nodes also can provide it, and if any can provide it significantly sooner than the first node that advertised it, it requests it from that node instead.

laanwj commented at 11:13 am on September 4, 2014: member

@rebroad Possible, but I don’t want to make this overly complex. Mind that retries are extremely rare at 2 minutes. I’ve only had a few in a day on a normal node. By far most nodes simply send the tx when requested, even if sometimes they take their time.

I’ve also tried with a timeout of 30 seconds for a while, and retries were much more numerous. In general, the first node would still send the transaction anyway, as well as the other nodes who are asked. 10 seconds would be crazy.

The P2P network was never designed to be a low-latency network, and there is also not really a point in making it one. If anything, high-latency networks lend themselves better to privacy measures.

With the changes in #4831 all the pathological cases are avoided. A node announcing transactions then disconnecting, for example, will have no effect. The retry will always go to a node that is still connected and that has announced the transaction. This makes sure that second retries are even more infrequent.

We could add code to keep track of a running average of getdata response times, of course, and see from there.

rebroad commented at 3:27 pm on September 4, 2014: contributor

@laanwj No, I meant 10 seconds before assessing which would be the best node to request it from, not to perform the actual request (unless the currently requested node is known to take on average longer than the time it would take to request it from a different node).

jgarzik commented at 4:59 pm on September 4, 2014: contributor

Meta: The following note was actually intended for another @rebroad PR, to which @gmaxwell replied, where there was another “10 sec vs. 2 min” debate. I forgot the PR number.

Speaking generally, this debate sometimes seems to illustrate linear thinking with regards to peer interactions. In particular, I think two techniques are useful in byzantine networks:

Get impatient. When a local node requests something of a remote, and either (a) it does not receive a response after a short while, or (b) the remote responds at a punishingly slow trickle, the local node should notice this poor behavior and make the same query of another node. As such, I’m a bit sympathetic to some of @rebroad ’s goals.
Go parallel & redundant. Psychologically, engineers are resistant to this sort of solution, but a simple and effective approach may be to issue the same query in parallel, and discard useless or slow results. Yes, this increases traffic on the network, but by a quantifiable factor of 2 or 3.

10 seconds is too fast, 2 minutes is too slow. But I prefer better solutions to picking better magic numbers :)

rebroad commented at 2:18 am on September 5, 2014: contributor

@jgarzik By waiting only 10 seconds, in my experience, you’ll have received an inv from >50% of currently connected nodes. This is surely more than enough time to select the fastest peer to send the getdata request to; and certainly an improvement on the current implentation (i.e. added benefit with no disadvantages).

I.e. the “10 seconds” refers to the “short while” you mention. It’s not the same delay (delay, not size of delay) as the delay used before requesting the tx from another peer.

jgarzik commented at 3:49 am on September 5, 2014: contributor

When reasoning about this problem you cannot rely solely on today’s current average behavior.

rebroad commented at 4:15 am on September 5, 2014: contributor

@jgarzik Yes, I agree that the 10 seconds would ideally not be hardcoded either. But it is a better hardcoding than the current hardcoding, but as a continuous improvement, this hardcoding would also be removed. What would you recommend replacing the hardcoded 10 second wait with?

laanwj commented at 10:11 am on September 18, 2014: member

Closing in favor of #4831.

laanwj closed this on Sep 18, 2014

DrahtBot locked this on Sep 8, 2021

prevent peer flooding request queue for an inv #4547