This PR makes two particular attacks (and potentially others) harder to execute.
InvBlock
This behavior was described well here (page 11).
Per the current implementation, if node A receives an INV(tx) from node B, node A sends a GETDATA to B and waits for the TX message back.
Node A is likely to receive more INVs (for the same tx) from other peers, but it will not send another GETDATA unless it hears nothing back from node B for the next 2 minutes (to save bandwidth).
Thus, if B is a malicious node, it can prevent node A from getting the transaction (even if all of A’s peers have it) for 2 minutes.
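For illustration, here is a minimal, hypothetical sketch of the behavior described above (the names `TxId`, `NodeId`, `SendGetData`, `OnInv`, `CheckTimeouts`, and `GETDATA_TX_TIMEOUT` are made up for the sketch and are not the actual net_processing code): only the first announcer gets the GETDATA, and every other announcer has to wait out the 2-minute window.

```cpp
#include <chrono>
#include <cstdint>
#include <deque>
#include <map>

using TxId = uint64_t;   // stand-in for a transaction hash
using NodeId = int64_t;  // stand-in for a peer id

// The window during which no other peer is asked for the same transaction.
static constexpr auto GETDATA_TX_TIMEOUT = std::chrono::minutes{2};

struct TxRequestState {
    NodeId requested_from;                               // peer we sent GETDATA to
    std::chrono::steady_clock::time_point request_time;  // when we sent it
    std::deque<NodeId> other_announcers;                 // peers whose INVs we deferred
};

std::map<TxId, TxRequestState> g_in_flight;

void SendGetData(NodeId /*peer*/, TxId /*tx*/) { /* actual network send omitted */ }

// Called when INV(tx) arrives from `peer`.
void OnInv(NodeId peer, TxId tx, std::chrono::steady_clock::time_point now)
{
    auto it = g_in_flight.find(tx);
    if (it == g_in_flight.end()) {
        // First announcement wins: request immediately from this peer.
        SendGetData(peer, tx);
        g_in_flight.emplace(tx, TxRequestState{peer, now, {}});
        return;
    }
    // Already requested elsewhere: remember this peer, but send no further
    // GETDATA until the first request times out.
    it->second.other_announcers.push_back(peer);
}

// Called periodically: only after GETDATA_TX_TIMEOUT has elapsed do we fall
// back to the next peer that announced the transaction.
void CheckTimeouts(std::chrono::steady_clock::time_point now)
{
    for (auto& [tx, state] : g_in_flight) {
        if (now - state.request_time >= GETDATA_TX_TIMEOUT && !state.other_announcers.empty()) {
            state.requested_from = state.other_announcers.front();
            state.other_announcers.pop_front();
            SendGetData(state.requested_from, tx);
            state.request_time = now;
        }
    }
}
```

In this model, a malicious B just announces early and then stays silent, which is exactly the InvBlock described above.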
This behavior seems to be an inherent limitation of the current P2P relay protocol, and I don’t see how it can be fundamentally changed (I can see workarounds, though they involve rewriting a lot of P2P code).
What does this PR fix?
The attacks I’m looking at involve preventing A from learning the transaction for 2*N minutes. To do that, an attacker has to spin up N nodes and send N INVs simultaneously to node A (the InvBlocks will then be queued back-to-back at 2-minute intervals under the current implementation).
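For example, to keep node A in the dark for 10 minutes, an attacker needs roughly 5 such nodes whose INVs reach A before any honest announcement: each malicious peer silently burns one 2-minute request window before A falls back to the next queued announcer.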
More precisely, the two scenarios I’m looking at are:
- An attacker censors a particular transaction. By performing InvBlock from different nodes, an attacker can carry out network-wide censorship of a particular transaction (or of all transactions). The earlier an attacker finds the transaction he wants to censor, the easier the attack is to perform. As @gwillen pointed out, this is even more dangerous in the case of lightning, where transactions are known in advance.
- The topology inference described in papers 1 and 2 involves network-wide InvBlock. This fix would not mitigate this type of inference, but I believe it will make it more expensive to perform (an attacker would have to create more transactions and perform more rounds to learn the topology; the second paper itself notes that InvBlock isolation is important for the attack).
How does it work?
This PR introduces a bias toward outbound connections (they get higher priority when a node chooses which peer to request a transaction from) and randomizes the order. As per @gmaxwell’s suggestion, the GETDATA request queue is built after processing all incoming messages from all nodes.
After this fix, if the incoming messages were [I1, I2, I3, O1, O2, O3, O4], the GETDATA queue may look like [O2, O1, O3, O4, I1, I3, I2].
If {I1, I2, I3} arrived significantly earlier than the others (but by less than TX_TIMEOUT = 60 s), the GETDATA queue may look like [I2, O2, O1, O3, O4, I1, I3].
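A minimal sketch of the queue construction under a simplified model (the `Announcement` struct, `BuildGetDataQueue`, and the per-group shuffle are illustrative only, and the sketch ignores the arrival-time exception from the second example; this is not the PR’s actual code):

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

using TxId = uint64_t;   // stand-in for a transaction hash
using NodeId = int64_t;  // stand-in for a peer id

struct Announcement {
    NodeId peer;
    TxId tx;
    bool outbound;  // true if the announcing peer is an outbound connection
};

// Build the GETDATA queue only after all incoming messages have been processed:
// outbound announcers go first, and the order inside each group is randomized.
std::vector<Announcement> BuildGetDataQueue(std::vector<Announcement> pending, std::mt19937& rng)
{
    // Move outbound announcements ahead of inbound ones; `split` points at the
    // first inbound entry.
    auto split = std::stable_partition(pending.begin(), pending.end(),
                                       [](const Announcement& a) { return a.outbound; });
    // Shuffle within each group so a single (possibly malicious) peer cannot
    // reliably claim the first request slot for a given transaction.
    std::shuffle(pending.begin(), split, rng);
    std::shuffle(split, pending.end(), rng);
    return pending;
}
```

With this ordering, an inbound attacker that announces a transaction first is usually no longer the peer that receives the GETDATA, so it can no longer buy itself a guaranteed 2-minute delay.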
Other comments:
- This mitigation works better when connectivity is higher (especially outbound connectivity, because it becomes less likely that 2 GETDATAs to inbound malicious nodes are queued next to each other).