p2p: Avoid forwarding ADDR messages to SPV nodes

naumenkogs commented at 8:15 pm on October 18, 2019: member

Background

The primary method of relaying node addresses across the network is daily self-announcement and forwarding self-announcements of other nodes. To rate-limit those, we relay them only to 1 or 2 peers (depending on whether the address is reachable for us). Unfortunately, in current implementation there is a chance of choosing SPV nodes, which, to the best of my knowledge, currently do not forward those announcements.

Discussion

During the meeting we reached a soft consensus that address relay participation should not be coupled to SPV/full node definition, but rather have an explicit flag (e.g. service flag). Until that’s getting resolved, I’m suggesting a tiny patch that prevent forwarding (they still can request and learn) these announcements to SPV nodes.

Analysis

To justify the change, I decided to measure how does this relay perform in different conditions (see my script). I measured how many nodes know about the new address relayed through this protocol (omitting batching <10 rule) within 120 “waves” of relay, each happening every 30 seconds (which is roughly 1 hour).

Let’s see what if the address is reachable to 5% of the network (I guess pretty fair assumption for Onion addresses, for example). Let’s call those “exotic”. Let’s also simulate having 10% of black holes, which are indistinguishable from real nodes, but don’t forward anything.

non-forwarding SPV nodes	5%	10%	20%	30%
All nodes knowing addr (before the change)	56%	48%	30%	22%
All nodes knowing addr (after the change)	55%	54%	58%	40%

Exotic nodes knowing addr (before the change)	80%	75%	55%	50%
Exotic nodes knowing addr (after the change)	76%	77%	75%	75%

If you don’t believe it’s that bad, please make your own simulation to validate mine :)

It will become worse and worse with the growth of SPV.

These numbers can also be used to motivate the high-level change (see discussion).

Questions

Would we make SPVs more vulnerable to addrman injections? Like, would it be easier for an attacker to fill their addrmans with malicious sybil nodes? (I guess we are already vulnerable it assuming some capabilities of an attacker, but would it be noticeably worse? EthanHeilman)
Does the same apply to the old pruned nodes? (achow101 I think you mentioned that older nodes would also be excluded here). Should we forward these announcements to them, if this makes them noticeably more vulnerable?

We can forward it to these two groups IN ADDITION to forwarding to (1 or 2) peers, if there is a belief that this is might affect their security. The bandwidth overhead would be obviously very low.

P.S.

~~This experiment also means that we’re very vulnerable to just black holes when relaying exotic addresses, but I will write about it in a separate issue.~~ Imrpoved the script to consider black holes.

P.P.S.

I guess the biggest inaccuracy in my little experiment is that in practice, those exotic semi-reachable nodes are well-interconnected, so once the address makes it to one of those — it should be relayed a bit better. Maybe, I will try to measure. Anyway, this does not disregard the problem. Improved the script to take exotic inter-connectivity into account and updated the measurements.

fanquake added the label P2P on Oct 18, 2019

EthanHeilman commented at 9:45 pm on October 18, 2019: contributor

We can forward it to these two groups IN ADDITION to forwarding to (1 or 2) peers, if there is a belief that this is might affect their security. The bandwidth overhead would be obviously very low.

It is good for SPV nodes to get addrs sent to them, but it doesn’t benefit the network as much as sending addrs to a full node. Ensuring addrs go to 2 full node peers without excluding SPV nodes seems to capture the requirements well.

fanquake requested review from sdaftuar on Oct 18, 2019

practicalswift commented at 2:20 pm on October 20, 2019: contributor

Concept ACK

Thanks for working on this!

MarcoFalke commented at 7:46 pm on October 22, 2019: member

It is good for SPV nodes to get addrs sent to them

Do you mean unsolicited addrs? If so, I tend to disagree. That would make it harder to run a bandwidth efficient non-addr-relay node (e.g. an “SPV node”). I think that addr messages should only be gossiped to bip-155 nodes or network nodes (detectable via service bit). All other nodes should send a getaddr or getaddrv2 p2p message to solicit addresses.

sipa commented at 7:53 pm on October 22, 2019: member

I think it’s a bit more nuanced than that.

In an ideal world, I think light clients would participate in local IP address management. Perhaps they wouldn’t relay addresses themselves, but even if not, they should be learning from IP addresses that are gossiped around.

In reality it seems that none of them do, and obviously gossipping IP addresses to nodes that are just going to ignore them is pointless.

harding commented at 3:29 pm on November 3, 2019: contributor

I wanted to note that one SPV client I’m aware of does (or at least did) use addr messages to find nodes. E.g., see this code:

https://github.com/breadwallet/breadwallet-core/blob/494aa99ed61df24d13e45f6b335cda192c1b765c/bitcoin/BRPeer.c#L268

However, that wallet also uses BIP37 bloom filters, so I don’t know what their strategy will be going forward with regards to using the P2P protocol. Directly relevant to this PR, it looks like they don’t accept unsolicited addr messages anyway—they only accept responses to a getaddr message: https://github.com/breadwallet/breadwallet-core/blob/494aa99ed61df24d13e45f6b335cda192c1b765c/bitcoin/BRPeer.c#L282

naumenkogs commented at 6:48 pm on November 5, 2019: member

So, it seems like most of us agree that we should allow light clients to request addrs, and we should not send them unsolicited addr messages assuming they forward them.

Just wanted to clarify, it is exactly what this PR does.

naumenkogs commented at 6:09 pm on December 1, 2019: member

I updated the script to account for black holes and be more precise in general (original post is updated to reflect new numbers). The result is: current behavior is actually not as bad as I thought. I still think it’s worth merging it though, especially because it will take a while for us to figure out all addrV2 stuff. So let’s save some bandwidth and make address relay be more effective for the cost of merging one line.

EthanHeilman commented at 3:55 pm on December 2, 2019: contributor

@naumenkogs If I understand you correctly this commit now excludes sending addrs to non-forward nodes, but would send addrs to SPV clients that do forward addrs?

I looked through the commit and it wasn’t completely obvious to me, how MayHaveUsefulAddressDB excludes non-forwarding nodes?

I know MayHaveUsefulAddressDB checks the service flags and I read the service flag descriptions in the code but the flags seem obscure to me i.e. I have trouble understanding precisely what they denote. Also one imagines that a non-forwarding node may have a “useful address DB” so the name of the function call is slightly deceptive.

I would advocate for a comment explaining the intention of the call: “We are calling MayHaveUsefulAddressDB to exclude non-forwarding nodes.”

naumenkogs commented at 6:49 pm on December 2, 2019: member

@EthanHeilman

this commit now excludes sending addrs to non-forward nodes, but would send addrs to SPV clients that do forward addrs?

No. Unfortunately, right now there is no way to distinguish forwarding SPV nodes and non-forwarding SPV nodes. Based on the implementations of SPV nodes I found, none of them forward addrs, so I suggest not forwarding to all SPV nodes for now (until we progress with an explicit opt-in addr forwarding). Also we found that some implementations would still request addrs, and after this PR they would still be allowed to do so.

Fyi, service bits NODE_NETWORK | NODE_NETWORK_LIMITED currently literally mean “everything except SPV”.

Now I agree that documenting and renaming the function would be useful here, will do.

DrahtBot commented at 7:45 am on January 17, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#19860 (Improve diversification of new connections: privacy and stability by naumenkogs)
#19858 (Periodically make block-relay connections and sync headers by sdaftuar)
#19763 (net: don’t try to relay to the address’ originator by vasild)
#19724 ([net] Cleanup connection types- followups by amitiuttarwar)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

vasild commented at 4:09 pm on March 30, 2020: member

Here is a functional test that demonstrates this patch works as intended.

  0diff --git c/test/functional/p2p_addr_relay.py i/test/functional/p2p_addr_relay.py
  1new file mode 100755
  2index 000000000..7b74888f0
  3--- /dev/null
  4+++ i/test/functional/p2p_addr_relay.py
  5@@ -0,0 +1,75 @@
  6+#!/usr/bin/env python3
  7+# Copyright (c) 2017-2019 The Bitcoin Core developers
  8+# Distributed under the MIT software license, see the accompanying
  9+# file COPYING or http://www.opensource.org/licenses/mit-license.php.
 10+"""Tests address relaying to SPV nodes (aka clients aka ~(NODE_NETWORK | NODE_NETWORK_LIMITED))
 11+
 12+* set up nodeRelay (bitcoind)
 13+* connect two mini nodes to nodeRelay - node1 and node2
 14+* fire an "addr message" from node1 to nodeRelay
 15+* observe if node2 will get the address relayed through nodeRelay
 16+
 17+Results should be:
 18+
 19+Without [#17194](/bitcoin-bitcoin/17194/):
 20+node2 services=NODE_WITNESS | NODE_NETWORK_LIMITED -> relayed
 21+node2 services=NODE_WITNESS | NODE_NETWORK -> relayed
 22+node2 services=NODE_WITNESS -> relayed
 23+
 24+With [#17194](/bitcoin-bitcoin/17194/):
 25+node2 services=NODE_WITNESS | NODE_NETWORK_LIMITED -> relayed
 26+node2 services=NODE_WITNESS | NODE_NETWORK -> relayed
 27+node2 services=NODE_WITNESS -> not relayed
 28+"""
 29+from test_framework.messages import CAddress, msg_addr, NODE_NETWORK, NODE_NETWORK_LIMITED, NODE_WITNESS
 30+from test_framework.mininode import P2PInterface, mininode_lock
 31+from test_framework.test_framework import BitcoinTestFramework
 32+from test_framework.util import (
 33+    assert_equal,
 34+    wait_until,
 35+)
 36+
 37+import time
 38+
 39+class P2Paddr(P2PInterface):
 40+    def wait_for_addr(self, timeout=5):
 41+        test_function = lambda: self.last_message.get("addr")
 42+        wait_until(test_function, timeout=timeout, lock=mininode_lock)
 43+
 44+class NodeNetworkLimitedTest(BitcoinTestFramework):
 45+    def set_test_params(self):
 46+        self.setup_clean_chain = True
 47+        self.num_nodes = 1
 48+        self.extra_args = [[]]
 49+
 50+    def setup_network(self):
 51+        self.add_nodes(self.num_nodes, self.extra_args)
 52+        self.start_nodes()
 53+
 54+    def run_test(self):
 55+        nodeRelay = self.nodes[0]
 56+
 57+        node1 = nodeRelay.add_p2p_connection(P2Paddr(), services=NODE_WITNESS | NODE_NETWORK)
 58+        #node2 = nodeRelay.add_p2p_connection(P2Paddr(), services=NODE_WITNESS | NODE_NETWORK)
 59+        #node2 = nodeRelay.add_p2p_connection(P2Paddr(), services=NODE_WITNESS | NODE_NETWORK_LIMITED)
 60+        node2 = nodeRelay.add_p2p_connection(P2Paddr(), services=NODE_WITNESS)
 61+
 62+        addr = CAddress()
 63+        addr.time = int(time.time())
 64+        addr.nServices = NODE_NETWORK | NODE_WITNESS
 65+        addr.ip = "123.123.123.123"
 66+        addr.port = 8333
 67+
 68+        msg = msg_addr()
 69+        msg.addrs.append(addr)
 70+
 71+        self.log.info("Checking how nodeRelay relays the address it received from node1")
 72+
 73+        node1.send_message(msg)
 74+
 75+        node2.wait_for_addr()
 76+
 77+        nodeRelay.disconnect_p2ps()
 78+
 79+if __name__ == '__main__':
 80+    NodeNetworkLimitedTest().main()
 81diff --git c/src/net_processing.cpp i/src/net_processing.cpp
 82index 465a0c06e..795a9179f 100644
 83--- c/src/net_processing.cpp
 84+++ i/src/net_processing.cpp
 85@@ -97,13 +97,16 @@ void EraseOrphansFor(NodeId peer);
 86 /** Increase a node's misbehavior score. */
 87 void Misbehaving(NodeId nodeid, int howmuch, const std::string& message="") EXCLUSIVE_LOCKS_REQUIRED(cs_main);
 88 
 89 /** Average delay between local address broadcasts in seconds. */
 90 static constexpr unsigned int AVG_LOCAL_ADDRESS_BROADCAST_INTERVAL = 24 * 60 * 60;
 91 /** Average delay between peer address broadcasts in seconds. */
 92-static const unsigned int AVG_ADDRESS_BROADCAST_INTERVAL = 30;
 93+// Relay immediately, otherwise we may end up delaying the relay for a long
 94+// time (without upper limit, see PoissonNextSend()) which makes the
 95+// p2p_addr_relay.py test fragile because it waits for the relay.
 96+static const unsigned int AVG_ADDRESS_BROADCAST_INTERVAL = 0;
 97 /** Average delay between trickled inventory transmissions in seconds.
 98  *  Blocks and whitelisted receivers bypass this, outbound peers get half this delay. */
 99 static const unsigned int INVENTORY_BROADCAST_INTERVAL = 5;
100 /** Maximum number of inventory items to send per transmission.
101  *  Limits the impact of low-fee transaction floods. */
102 static constexpr unsigned int INVENTORY_BROADCAST_MAX = 7 * INVENTORY_BROADCAST_INTERVAL;
103@@ -1377,13 +1380,20 @@ static void RelayAddress(const CAddress& addr, bool fReachable, CConnman* connma
104     FastRandomContext insecure_rand;
105 
106     std::array<std::pair<uint64_t, CNode*>,2> best{{{0, nullptr}, {0, nullptr}}};
107     assert(nRelayNodes <= best.size());
108 
109     auto sortfunc = [&best, &hasher, nRelayNodes](CNode* pnode) {
110-        if (pnode->nVersion >= CADDR_TIME_VERSION && pnode->IsAddrRelayPeer() && MayHaveUsefulAddressDB(pnode->nServices)) {
111+        if (pnode->nVersion >= CADDR_TIME_VERSION && pnode->IsAddrRelayPeer()
112+#if 1
113+                && MayHaveUsefulAddressDB(pnode->nServices)
114+#elif 1
115+                // same as above
116+                && !pnode->fClient
117+#endif
118+                ) {
119             uint64_t hashKey = CSipHasher(hasher).Write(pnode->GetId()).Finalize();
120             for (unsigned int i = 0; i < nRelayNodes; i++) {
121                  if (hashKey > best[i].first) {
122                      std::copy(best.begin() + i, best.begin() + nRelayNodes - 1, best.begin() + i + 1);
123                      best[i] = std::make_pair(hashKey, pnode);
124                      break;

Since this PR is not getting traction for some time, if it looks like too risky behavioral change, what about making it optional under a newly introduced config option?

naumenkogs commented at 7:16 pm on March 30, 2020: member

@vasild Thanks for the test! Will integrate it into this PR.

I anticipate that there’s not enough attention not because it’s too risky, but rather just a natural thing. Maybe a lack of motivation. I would say that adding optional would make it even less attractive to review, because of new complexity.

vasild commented at 7:46 pm on March 30, 2020: member

Yes, a config knob is not very attractive either.

Btw I had to make this change in the code to make the test pass quickly and deterministically:

0-static const unsigned int AVG_ADDRESS_BROADCAST_INTERVAL = 30;
1+// Relay immediately, otherwise we may end up delaying the relay for a long
2+// time (without upper limit, see PoissonNextSend()) which makes the
3+// p2p_addr_relay.py test fragile because it waits for the relay.
4+static const unsigned int AVG_ADDRESS_BROADCAST_INTERVAL = 0;

It cannot be added to the repository in its current form, but is good enough for manual testing. Maybe I overlooked something and it can be done without requiring changes to the code?

sipa commented at 7:23 pm on July 7, 2020: member

Concept and code review ACK 644eae6558b5becfdd8031ca83d6b92f4467cf3c

Given evidence that non-full-nodes don’t do anything with unsollicited addr relays, and the fact that GETADDR still works, I think it’s fine to drop relay to them.

It’s probably worth announcing this on the bitcoin-dev mailinglist, to ask for comments, and suggest that if there are exceptions, we can work on a service flag for explicitly opting into addr relay.

DrahtBot added the label Needs rebase on Jul 10, 2020

fanquake commented at 5:05 am on July 26, 2020: member

@jnewbery @amitiuttarwar Did you want to take a look here?

MarcoFalke removed the label Needs rebase on Jul 26, 2020

DrahtBot added the label Needs rebase on Jul 26, 2020

Do not forward short (<10) ADDR messages to the nodes which won't relay them forward. 3a1629af23

naumenkogs force-pushed on Jul 27, 2020

DrahtBot removed the label Needs rebase on Jul 27, 2020

jnewbery commented at 10:31 am on July 27, 2020: member

If the goal is to ensure that ADDR gossips don’t disappear into a black hole, then I don’t think this is the correct solution.

Ensuring addrs go to 2 full node peers without excluding SPV nodes seems to capture the requirements well.

This seems like a much better solution. We lose nothing by gossiping an address to SPV peers, as long as we also gossip to two “address relay” peers.

We don’t know whether there are clients that rely on receiving address gossips. After this PR, SPV clients can still request addresses using GETADDR, but we only permit one GETADDR request per connection. If an SPV client wants to request fresh addresses, they’d need to disconnect-reconnect-GETADDR, which seems undesirable.

<aj> how about just biassing against picking peers for the 1 or 2 to relay for if those peers haven't sent us ADDR messages already, if that makes sense?

(http://www.erisian.com.au/bitcoin-core-dev/log-2019-10-17.html#l-591)

This seems even better to me. What we actually care about is whether the peer is going to relay the address to more peers, not whether it’s advertised that it’s able to serve blocks. Whether the peer has previously sent us addresses is our best heuristic for this.

naumenkogs commented at 10:43 am on July 28, 2020: member

@jnewbery I don’t have a strong opinion on the approach, indeed the goal is to make sure ADDRs don’t go into black hole. ~~Otherwise, a node with 8 outbound peers and 100 inbound SPVs will be self-announcing very poorly.~~ (wrong, see response)

We seem to have 2 options:

don’t send to SPV
make sure we choose those who sent us ADDR before

The only real difference is that the first option (current implementation 3a1629af23c101f19b371c68155411b0048409f4) might exclude some SPV implementations which handle ADDR messages (and send ADDR messages, as required per second option to reliably work), and nobody seen these clients so far. I guess we can ask on the ML whether they exist, but the lack of response won’t prove the absence :)

I don’t mind switching to the second option, but since the activity in this PR is low, I would do it after Concept ACK 2nd option from someone?

EthanHeilman commented at 3:58 pm on July 28, 2020: contributor

If someone is interested in writing the concept ACK of the second option get in touch with me, happy to review your code provide input. Otherwise I might try to write this.

jnewbery commented at 11:50 am on July 29, 2020: member

Otherwise, a node with 8 outbound peers and 100 inbound SPVs will be self-announcing very poorly.

This isn’t true. We self-announce our address to all peers on a timer:

https://github.com/bitcoin/bitcoin/blob/a76ccb01b96424ef207762b704d81b9d2497f3d2/src/net_processing.cpp#L4058-L4065

(although in practice most of the time this won’t actually result in our address being advertised since it won’t have cycled out of m_addr_known yet).

It is true that we’ll be ineffective at relaying new addresses that we hear about.

DrahtBot commented at 1:22 pm on September 3, 2020: member

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

DrahtBot added the label Needs rebase on Sep 3, 2020

adamjonas commented at 3:03 pm on December 8, 2020: member

@naumenkogs checking in on the status of the PR. Is this something you plan to continue to pursue?

mzumsande commented at 8:20 pm on June 18, 2021: member

If I understand the code at
https://github.com/bitcoin/bitcoin/blob/da69d9965a112c6421fce5649b5a18beb7513526/src/net_processing.cpp#L2783-L2786 correctly, the assumption of 120 waves of relay in the simulation is too much, because with nSince = nNow - 10 * 60, gossip relay for a given addr will just end 10 minutes (=20 waves on average) after the self-announcement, assuming synchronized clocks. So I would expect relay to be worse than the simulations suggest.

naumenkogs commented at 1:00 pm on September 15, 2021: member

Closing this because it was partially handled by #21528, but also arguments just get outdated after that PR, and the code as well.

naumenkogs closed this on Sep 15, 2021

DrahtBot locked this on Oct 30, 2022