Addrman: test-before-evict bugfix and improvements for block-relay-only peers

sdaftuar commented at 6:38 pm on October 19, 2020: member

This PR does two things:

Block-relay-only interaction with addrman.
- Calling CAddrMan::Connected() on an address that was a block-relay-only peer causes the time we report in addr messages containing that peer to be updated; particularly now that we use anchor connections with a our block-relay-only peers, this risks leaking information about those peers. So, stop this.
- Avoiding calling CAddrMan::Good() on block-relay-only peer addresses causes the addrman logic around maintaining the new and tried table to be less good, and in particular makes it so that block-relay-only peer addresses are more likely to be evicted from the addrman (for no good reason I can think of). So, mark those addresses as good when we connect.
Fix test-before-evict bug. There’s a bug where if we get a collision in the tried table with an existing address that is one of our current peers, and the connection is long-lived enough, then SelectTriedCollisions() might return that existing peer address to us as a test-before-evict connection candidate. However, our logic for new outbound connections would later prevent us from actually making a connection; the result would be that when we get a collision with a long-lived current peer, that peer’s address is likely to get evicted from the tried table. Fix this by checking to see if a test-before-evict candidate is a peer we’re currently connected to, and if so, mark it as Good().

sdaftuar commented at 6:40 pm on October 19, 2020: member

I’m still trying to verify that the bug being fixed relating to test-before-evict is in fact a bug (and I’m not just misunderstanding the logic); but if my understanding is correct I think we should fix all these issues in the next release.

DrahtBot commented at 7:21 pm on October 19, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#20233 (addrman: Make consistency checks a runtime option by jnewbery)
#20228 (addrman: Make addrman a top-level component by jnewbery)
#20158 (tree-wide: De-globalize ChainstateManager by dongcarl)
#19843 (Refactoring and minor improvement for self-advertisements by naumenkogs)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label P2P on Oct 19, 2020

fanquake requested review from amitiuttarwar on Oct 19, 2020

fanquake requested review from naumenkogs on Oct 19, 2020

MarcoFalke added this to the milestone 0.21.0 on Oct 20, 2020

naumenkogs commented at 6:31 am on October 20, 2020: member

Concept ACK.

The former issues are worth addressing. Seems like the latter bug is real, even though the chance of hitting it is rather low.

sdaftuar commented at 3:20 pm on October 25, 2020: member

I was able to verify the test-before-evict bug by modifying the tried table to be very small, making collisions with existing peers likely. The patch here appears to fix it as well.

in src/net.cpp:2590 in dafde15cc0 outdated

2586@@ -2571,7 +2587,8 @@ void CConnman::DeleteNode(CNode* pnode)
2587     assert(pnode);
2588     bool fUpdateConnectionTime = false;
2589     m_msgproc->FinalizeNode(pnode->GetId(), fUpdateConnectionTime);
2590-    if (fUpdateConnectionTime) {
2591+    if (fUpdateConnectionTime && !pnode->IsBlockOnlyConn()) {

jnewbery commented at 8:44 am on October 26, 2020:

This further splits the logic about whether to call Connected() for a node. Now there are three three places that you need to look:

Is the peer an outbound (https://github.com/bitcoin/bitcoin/blob/dafde15cc04fbeb5851fa98fec34dd9f2a7e8420/src/net_processing.cpp#L2475-L2478)
Does the peer have any misbehavior points (https://github.com/bitcoin/bitcoin/blob/dafde15cc04fbeb5851fa98fec34dd9f2a7e8420/src/net_processing.cpp#L849-L851)
Is the peer a block only connection (here)

I’d rather place this new logic in one of the existing places and eventually consolidate them, rather than add another place for people to look.

sdaftuar commented at 12:52 pm on October 26, 2020:

Agreed that this is confusing. Originally I was contemplating have code in net.cpp that wouldn’t call CAddrMan::Attempt() on block-relay-only peers, and so having this logic also be in net.cpp made more sense to me; but now I think it’s fine to move into net_processing/FinalizeNode().

jnewbery commented at 2:31 pm on October 26, 2020:

I’d suggest going further and removing fCurrentlyConnected, and then changing FinalizeNode() to check whether the peer is:

fSuccessfullyConnected
not inbound and not block relay only
has 0 misbehavior score and then directly call CAddrMan.Connected() directly, so that all the logic around calling Connected() is in one place.

If you agree, perhaps you could take a look at 1755f08b4dba34b483a401c6098d5df549219ce7 in PR #20228, which does some of that. For this PR, just putting the block relay only check inside FinalizeNode() at least doesn’t make the situation any worse.

sdaftuar commented at 4:54 pm on October 26, 2020:

I just did the simple thing of moving this to FinalizeNode(). Because there’s no good way to get a CNode from a NodeId while it’s in the process of being deleted (as far as I can tell!), I had to change the interface to accommodate this move.

jnewbery commented at 5:16 pm on October 26, 2020:

DeleteNode() is holding a reference to the CNode, so you should just be able to pass a const reference to that CNode down to FinalizeNode(). That’d be more symmetrical with the other NetEventsInterface interface functions (you could also pass a pointer like the other functions, but they should all be const references really)

in src/net_processing.cpp:2428 in dafde15cc0 outdated

2414@@ -2421,9 +2415,22 @@ void PeerManager::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDat
2415             // Get recent addresses
2416             m_connman.PushMessage(&pfrom, CNetMsgMaker(greatest_common_version).Make(NetMsgType::GETADDR));

jnewbery commented at 8:50 am on October 26, 2020:

Is there any reason to net send a getaddr message to block relay only peers? See the comment above “If we’re starting up for the first time, our addrman may be pretty empty…”. Since #17428, the first peers we connect to on startup will be anchor connections, which are block relay only peers. I can’t see any downside to asking them for addrs.

sdaftuar commented at 12:49 pm on October 26, 2020:

Well if you have anchor connections to connect to, you likely have a useful peers.dat as well (you must have been running before), so I don’t think that reasoning applies.

If we process an addr message from a peer by adding entries to addrman, we then will leak that information in at least two ways I can think of – feeler connections will be attempted to addresses we learn that are in the new table, and we sample from our addrman to respond to other peers who send us getaddr messages. I don’t know if there might be additional sources of information leak about what is in our addrman besides those, but those both seem like ways an attacker could at least probabilistically test whether two nodes on the network might be connected.

jnewbery commented at 2:23 pm on October 26, 2020:

Ah, I see that we ignore all addr messages from block-relay-only peers anyway, in the addr message processing (https://github.com/bitcoin/bitcoin/blob/d67883d01e507dd22d1281f4a4860e79d6a46a47/src/net_processing.cpp#L2549), so this seems fine.

in src/net.h:448 in dafde15cc0 outdated

444@@ -445,6 +445,10 @@ class CConnman
445     CNode* FindNode(const std::string& addrName);
446     CNode* FindNode(const CService& addr);
447 
448+    // Determine whether we're already connected to a given peer, in order to

jnewbery commented at 8:52 am on October 26, 2020:

Make this a doxygen format comment (starting with /**, so that doxygen docs can be generated from it: https://doxygen.bitcoincore.org/)

sdaftuar commented at 4:53 pm on October 26, 2020:

Done.

in src/net.cpp:1975 in dafde15cc0 outdated

1971@@ -1967,7 +1972,18 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1972             CAddrInfo addr = addrman.SelectTriedCollision();
1973 
1974             // SelectTriedCollision returns an invalid address if it is empty.
1975-            if (!fFeeler || !addr.IsValid()) {
1976+            bool test_before_evict = fFeeler && addr.IsValid();

jnewbery commented at 10:02 am on October 26, 2020:

This logic is already very convoluted, and I think this change makes it even more difficult to follow. The mainline case (connecting to a new outbound beer), currently calls SelectedTriedCollision(), then sets test_before_evict to false, and then throws away the return value from SelectedTriedCollision() in the if (!test_before_evict) block below.

The logic for feeler connections and non-feeler connections has diverged to the point where it’d make this code a lot easier to follow if it were refactored to a GetFeelerConnectionAddress() and GetOutboundConnectionAddress() function. For now, I think the following is easier to read:

 0            if (fFeeler) {
 1                // First, try to get a tried table collision address. This returns
 2                // an empty (invalid) address if there are no collisions to try.
 3                addr = addrman.SelectTriedCollision();
 4
 5                if (!addr.IsValid()) {
 6                    // No tried table collisions. Select a new table address
 7                    // for our feeler.
 8                    addr = addrman.Select(true);
 9                } else if (AlreadyConnectedToPeer(addr)) {
10                    // If test-before-evict logic would have us connect to a
11                    // peer that we're already connected to, just mark that
12                    // address as Good(). We won't be able to initiate the
13                    // connection anyway, so this avoids inadvertently evicting
14                    // a currently-connected peer.
15                    addrman.Good(addr);
16                    addr = addrman.Select(true);
17                }
18            } else {
19                // Not a feeler
20                addr = addrman.Select();
21            }

sdaftuar commented at 4:55 pm on October 26, 2020:

I’m indifferent, so taking your patch.

in src/net.cpp:1982 in dafde15cc0 outdated

1978+            // If test-before-evict logic would have us connect to a peer that
1979+            // we're already connected to, just mark that address as Good(). We
1980+            // won't be able to initiate the connection anyway, so this avoids
1981+            // inadvertently evicting a currently-connected peer.
1982+            if (test_before_evict && AlreadyConnectedToPeer(addr)) {
1983+                MarkAddressGood(addr);

jnewbery commented at 10:02 am on October 26, 2020:

No need to call this CConnman forwarding function. Just call addrman.Good() directly.

sdaftuar commented at 4:55 pm on October 26, 2020:

I took this change as part of your patch for rewriting the loop.

in src/net_processing.cpp:2421 in dafde15cc0 outdated

2414@@ -2421,9 +2415,22 @@ void PeerManager::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDat
2415             // Get recent addresses
2416             m_connman.PushMessage(&pfrom, CNetMsgMaker(greatest_common_version).Make(NetMsgType::GETADDR));
2417             pfrom.fGetAddr = true;
2418+        }
2419 
2420-            // Moves address from New to Tried table in Addrman, resolves
2421-            // tried-table collisions, etc.
2422+        if (!pfrom.IsInboundConn()) {
2423+            // For non-inbound connections, we also update the addrman to

jnewbery commented at 10:04 am on October 26, 2020:

Remove the word ‘also’. Comments have a habit of migrating around in the codebase, and joining words like also lose their meaning if additional code/comments are added around the existing comments.

sdaftuar commented at 4:55 pm on October 26, 2020:

Done.

in src/net.cpp:359 in dafde15cc0 outdated

347@@ -348,6 +348,11 @@ CNode* CConnman::FindNode(const CService& addr)
348     return nullptr;
349 }
350 
351+bool CConnman::AlreadyConnectedToPeer(const CAddress& addr)
352+{
353+    return FindNode(static_cast<CNetAddr>(addr)) || FindNode(addr.ToStringIPPort());

jnewbery commented at 10:07 am on October 26, 2020:

Note that this is inefficient (lock cs_vNodes, iterate through nodes, unlock, lock, iterate, unlock), but that was a problem with the existing code. It could be made more efficient by converting this to a ForEachNode() call with a lambda returning if the address matches the CNetAddr or string.

jnewbery commented at 10:11 am on October 26, 2020:

It might be worth adding a comment about why we search both for the CNetAddr and the ip/port string. It was added here: https://github.com/bitcoin/bitcoin/commit/9bab521df895c149579b9e64931405c56b008afb#diff-00021eed586a482abdb09d6cdada1d90115abe988a91421851960e26658bed02R1416 in the PR to add support for SOCKS5.

sdaftuar commented at 12:50 pm on October 26, 2020:

Agreed, but I didn’t want to rewrite this.

sdaftuar commented at 4:55 pm on October 26, 2020:

Care to suggest a comment? I’m not sure my understanding of when we might be looking up by CNetAddr versus string is precise enough to say the right thing here.

jnewbery commented at 5:12 pm on October 26, 2020:

I was hoping you might have a better understanding of this than me :grin: . My understanding of how hostnames/proxies work in Bitcoin Core is certainly too hazy to suggest any wording here.

I’m going to mark this as resolved. It’d be good to document why we’re doing this, but this PR isn’t changing the behavior, so it’s certainly not a blocker.

jnewbery commented at 5:13 pm on October 26, 2020:

That’s reasonable. It’s not a change in behaviour. Marking resolved.

ariard commented at 1:56 am on October 30, 2020:

If the peer connection is initiated based on hostname (e.g -seednode, -addnode) we have to resolve first the provided hostname to obtain the addr (in ConnectNode). If we succeed this resolution, we store the hostname (MaybeSetAddrName). In case of future connection also from hostname (OpenNetworkConnection with pszDest != null), for already-open connection check step, search is done on this subset (L2191).

Based on this observation, the second lookup with string appears as redundant, as each connected peer must be in findable by CNetAddr.

That said, I agree it doesn’t change behavior.

sipa commented at 2:27 am on October 30, 2020:

@ariard If there is a name proxy, we never resolve ourselves, and instead make a proxy connection to the specified name (by asking the proxy to do the resolving for us). In that case there is no CNetAddr known.

ajtowns commented at 2:53 am on October 30, 2020:

@sipa But CAddress& addr is a CNetAddr and addr.ToStringIPPort() generates a string from the CNetAddr member variables? As far as I can see there’s no way to initialise CNetAddr::m_addr to something like ::1 or 0:0:0:0:0:0:0:1 which would be disambiguated by converting to a canonical string?

sipa commented at 2:54 am on October 30, 2020:

Oh, right. What I said isn’t relevant here.

amitiuttarwar commented at 3:39 am on November 1, 2020:

@sipa, so do you agree having the two lookups seems redundant? or am I missing something?

in src/net.cpp:351 in dafde15cc0 outdated

347@@ -348,6 +348,11 @@ CNode* CConnman::FindNode(const CService& addr)
348     return nullptr;
349 }
350 
351+bool CConnman::AlreadyConnectedToPeer(const CAddress& addr)

jnewbery commented at 10:09 am on October 26, 2020:

nit: AlreadyConnectedToAddress() would be a more accurate name.

sdaftuar commented at 4:55 pm on October 26, 2020:

Agreed, done.

sdaftuar force-pushed on Oct 26, 2020

in src/net_processing.cpp:840 in 266d22a3d1 outdated

828@@ -829,7 +829,7 @@ void PeerManager::ReattemptInitialBroadcast(CScheduler& scheduler) const
829     scheduler.scheduleFromNow([&] { ReattemptInitialBroadcast(scheduler); }, delta);
830 }
831 
832-void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
833+void PeerManager::FinalizeNode(NodeId nodeid, const bool is_block_relay_only_peer, bool& fUpdateConnectionTime) {

jnewbery commented at 4:42 pm on October 26, 2020:

I think it’d be cleaner to pass in a const CNode& and get the id and block relay only state from the CNode.

sdaftuar commented at 6:28 pm on October 26, 2020:

Done.

jnewbery commented at 6:50 pm on October 26, 2020:

Sorry I wasn’t clear here, but you can also drop the NodeId argument from the function signature, and then call GetId() from within FinalizeNode(). Passing both CNode& and NodeId is somewhat redundant.

sdaftuar commented at 7:56 pm on October 26, 2020:

Done in 430c8b6c0f9381f039d83066aa2cbd70c1d1414b.

in src/net_processing.cpp:2441 in 266d22a3d1 outdated

2428+            // While we strive to not leak information about block-relay-only
2429+            // connections via the addrman, not moving an address to the tried
2430+            // table is also potentially detrimental because new-table entries
2431+            // are subject to eviction in the event of addrman collisions.  We
2432+            // mitigate the information-leak by never calling
2433+            // CAddrMan::Connected() on block-relay-only peers; see net.cpp

jnewbery commented at 4:43 pm on October 26, 2020:

This is now in FinalizeNode, not net.cpp :)

sdaftuar commented at 6:28 pm on October 26, 2020:

Oops, thanks for catching that. Fixed.

sdaftuar force-pushed on Oct 26, 2020

jnewbery commented at 5:10 pm on October 26, 2020: member

Concept ACK.

An alternative approach could be to just periodically refresh the info.nLastSuccess for all active connections. If I’m understanding the addrman code correctly, as long as we’re refreshing that value every 4 hours, we’ll never evict that peer.

Adding a StillGood() (with a better name) method to CAddrMan which simply updates nLastSuccess and calling it either globally or per-peer every three hours would avoid adding more logic to the already-too-complicated ThreadOpenConnections()

sdaftuar commented at 6:05 pm on October 26, 2020: member

An alternative approach could be to just periodically refresh the info.nLastSuccess for all active connections. If I’m understanding the addrman code correctly, as long as we’re refreshing that value every 4 hours, we’ll never evict that peer.

Adding a StillGood() (with a better name) method to CAddrMan which simply updates nLastSuccess and calling it either globally or per-peer every three hours would avoid adding more logic to the already-too-complicated ThreadOpenConnections()

I think the best thing would be if addrman just detected the collision with an existing peer immediately, and avoided the whole tried-collision logic itself. If we had a good way to do that (by changing the interface between addrman and connman) then I think that could make sense.

Short of addrman being smart enough to avoid this problem, having net.cpp detect this condition and fix it directly seems next simplest to me; relying on Good() or a StillGood() being called at the right times to avoid this problem as a consequence of how and when those variables get updated just seems unnecessarily complex…? The next time someone wonders if it’s possible for an existing peer to be evicted from the tried table, I think it’ll be much easier to reason about my proposed change than the alternative you’re suggesting.

sdaftuar force-pushed on Oct 26, 2020

jnewbery commented at 6:57 pm on October 26, 2020: member

relying on Good() or a StillGood() being called at the right times to avoid this problem as a consequence of how and when those variables get updated just seems unnecessarily complex…? The next time someone wonders if it’s possible for an existing peer to be evicted from the tried table, I think it’ll be much easier to reason about my proposed change than the alternative you’re suggesting.

I disagree, but perhaps it’s just a matter of taste. To me, making addrman always aware of existing connections and keeping nLastSuccess fresh by periodic notifications from net_processing or net seems much cleaner than only finding out at the point that we attempt to reconnect.

sdaftuar commented at 8:25 pm on October 26, 2020: member

I disagree, but perhaps it’s just a matter of taste. To me, making addrman always aware of existing connections and keeping nLastSuccess fresh by periodic notifications from net_processing or net seems much cleaner than only finding out at the point that we attempt to reconnect.

We could add an interface to addrman so that it explicitly keeps track of currently connected peers, by notifying it when we successfully make new connections and again when a peer disconnects, and then use that to prevent tried collisions with current peers from ever making it into the tried-collision list for later resolution.

jnewbery commented at 10:36 pm on October 26, 2020: member

We could add an interface to addrman so that it explicitly keeps track of currently connected peers, by notifying it when we successfully make new connections and again when a peer disconnects, and then use that to prevent tried collisions with current peers from ever making it into the tried-collision list for later resolution.

Yes, that seems like another reasonable approach.

To be clear, I’m not NACKing the approach in this PR. I wanted to offer an alternative suggestion, but what you’re doing here seems fine, and this actually cleans up and clarifies some of the code.

Thanks for being so responsive to review. I plan to do another pass of the latest changes tomorrow.

in src/net_processing.cpp:841 in 430c8b6c0f outdated

836@@ -837,7 +837,8 @@ void PeerManager::ReattemptInitialBroadcast(CScheduler& scheduler) const
837     scheduler.scheduleFromNow([&] { ReattemptInitialBroadcast(scheduler); }, delta);
838 }
839 
840-void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
841+void PeerManager::FinalizeNode(const CNode& node, bool& fUpdateConnectionTime) {
842+    NodeId nodeid = node.GetId();

jnewbery commented at 9:32 am on October 27, 2020:

nit: if you have to retouch the branch, you could make this const.

in src/net.cpp:2019 in 430c8b6c0f outdated

2018+                    // If test-before-evict logic would have us connect to a
2019+                    // peer that we're already connected to, just mark that
2020+                    // address as Good(). We won't be able to initiate the
2021+                    // connection anyway, so this avoids inadvertently evicting
2022+                    // a currently-connected peer.
2023+                    addrman.Good(addr);

jnewbery commented at 9:46 am on October 27, 2020:

This will call Good() for any address that we’re connected to, including inbound peers. That’s different from the current behavior, where we’re careful to not call Good() for inbounds. The result of that is we’ll move addresses of inbound peers to the tried table, even though we’ve never actually successfully connected out to them.

Does this open up a gap where an attacker can get addresses added to your tried table by opening lots of inbound connections to try to trigger this logic?

sdaftuar commented at 12:40 pm on October 27, 2020:

That can only happen if there’s a tried-table collision with the address of the inbound peer – so that inbound peer’s address is already in our tried-table anyway… I don’t think calling Good() on an address already in our tried-table can be all that detrimental? I think there are two cases to consider:

If we were to initiate an outbound connection to the address, it would accept it, and we would therefore mark it Good(). In this case, marking it Good() because it’s a current peer is correct.
If we initiated an outbound connection to the address, it would not accept it. In this case, it’s sort of unclear what is going on; we must have been able to connect to it as an outbound in the past in order to get it into the tried table in the first place. So perhaps the peer isn’t accepting inbound connections at this time (or is full), or there’s something funny going on with the networking/address that we’re trying to use to reach it? However, keeping it in our tried table doesn’t seem detrimental; at most, about 115 addresses in our tried table could be reserved for us by inbound peers who no longer are taking incoming connections but were in the past… An attacker could just keep listening on those addresses anyway and maintain those tried-table entries (tried-table collisions are rare anyway), so I don’t know that we’re giving an attacker much more power or reducing the costs of an eclipse attack all that much by failing to detect this case.

Another option would be to (a) change our outbound peer logic to allow feeler connections to inbound peers, and (b) change this logic to mark outbound peers’ addresses as Good() in the event of a tried table collision. I am not sure if there’s some reason that would be a bad idea, but I’m also not sure that it’s necessary.

jnewbery commented at 2:16 pm on October 27, 2020:

Ah yes, of course you’re right. Anything returned from SelectTriedCollision() must by definition already be in the tried table.

jnewbery commented at 12:06 pm on October 27, 2020: member

If you squashed the last commit (Eliminate unnecessary parameter from FinalizeNode()) with the first commit (Avoid calling CAddrMan::Connected() on block-relay-only peer addresses), or even just moved it to be the first commit, you’d avoid having to touch other lines of code (e.g. in denialofservice_tests.cpp) multiple times.

jnewbery commented at 3:05 pm on October 27, 2020: member

utACK 430c8b6c0f9381f039d83066aa2cbd70c1d1414b

I think moving the last commit to the beginning would make this easier for other reviewers. Very happy to reACK if you decide to do that while you just have my ACK.

Avoid calling CAddrMan::Connected() on block-relay-only peer addresses

Connected() updates the time we serve in addr messages, so avoid leaking
block-relay-only peer connections by avoiding these calls.

daf5553126

Call CAddrMan::Good() on block-relay-only peer addresses

Being able to invoke Good() is important for address management (new vs tried
table, tried table eviction via test-before-evict). We mitigate potential
information leaks by not calling Connected() on these peer addresses.

4fe338ab3e

Refactor test for existing peer connection into own function e8b215a086

Avoid test-before-evict evictions of current peers

Outbound peer logic prevents connecting to addresses that we're already
connected to, so prevent inadvertent eviction of current peers via
test-before-evict by checking this condition and marking current peer's
addresses as Good().

Co-authored-by: John Newbery <john@johnnewbery.com>

16d9bfc417

sdaftuar force-pushed on Oct 27, 2020

sdaftuar commented at 3:42 pm on October 27, 2020: member

Thanks for the review @jnewbery – I went ahead and squashed that last commit into the first one, so that the FinalizeNode interface changes are all together now.

jnewbery commented at 3:58 pm on October 27, 2020: member

utACK 16d9bfc4172b4f6ce24a3cd1a1cfa3933cd26751

in src/net.cpp:2022 in 16d9bfc417

2021+                    // connection anyway, so this avoids inadvertently evicting
2022+                    // a currently-connected peer.
2023+                    addrman.Good(addr);
2024+                    // Select a new table address for our feeler instead.
2025+                    addr = addrman.Select(true);
2026+                }

jonatack commented at 5:23 pm on October 28, 2020:

Commented this here for my own understanding while parsing the changed logic in 16d9bfc4172b4f6ce2.

0-                }
1+                } // Otherwise, keep the selected addr if we have a tried
2+                  // collision we're not already connected to.

jonatack commented at 5:24 pm on October 28, 2020: member

Tested ACK 16d9bfc4172b4f6ce24a3cd1a1cfa3933cd26751

sipa commented at 11:15 pm on October 28, 2020: member

I haven’t looked at the code here, but just tried to verify the test-before-evict bug you mention. I believe it is indeed there, and will result in the conncted node being forgotten. Consider the following scenario:

Addrman has a tried entry A and a new entry B that would collide with A if moved to the tried table. There is a (long term) ongoining outbound connection to A currently.
An outbound connection to B succeeds (VERSION is received) as a non-block-only connection, calling Good(B). Due to the collision, B is not marked tried, but instead recorded in the m_tried_collisions set.
ConnectionType::FEELER connections during this state have a high chance of returning A from SelectTriedCollision() now. However, OpenNetworkConnection on those will fail, as an existing connection to A exists. No addrman state changes result from this (neither nLastSuccess or nLastTry are updated).
The first time ResolveCollisions is called at least 4 hours after the connection to A was opened, and after successfully connecting to B, it will see all connections to A as over ADDRMAN_REPLACEMENT_HOURS old. Assuming the last connection to B at this point is also at least 40 minutes ago (which seems likely), B will take A’s place in the tried table.

in src/net.cpp:2006 in 16d9bfc417

1998@@ -1999,11 +1999,30 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
1999             if (nTries > 100)
2000                 break;
2001 
2002-            CAddrInfo addr = addrman.SelectTriedCollision();
2003+            CAddrInfo addr;
2004 
2005-            // SelectTriedCollision returns an invalid address if it is empty.
2006-            if (!fFeeler || !addr.IsValid()) {
2007-                addr = addrman.Select(fFeeler);

sipa commented at 0:06 am on October 29, 2020:

Very happy to see this rewritten. The logic where we always called SelectTriedCollision and then throw out the result if !fFeeler was very confusing (and inefficient…).

sipa commented at 0:08 am on October 29, 2020: member

utACK 16d9bfc4172b4f6ce24a3cd1a1cfa3933cd26751

amitiuttarwar commented at 5:05 am on October 29, 2020: contributor

approach ACK - I’ve read through the code, understand the proposed changes, and think they make sense.

the update to Connected looks good & I agree that the test-before-evict logic was bugged & this patch fixes it. I also appreciate the refactoring to ThreadOpenConnections, I’ve looked at that code before but wasn’t able to fully retain the pieces to understand the big picture, but with this patch its both fixed and makes sense to me now.

only thing holding me back from a code review ACK is wanting to better understand the full implications of Good to verify that there are no other problems by calling it for block-relay-only connections in both places. I think we are good 😏 but I’m going to continue digging to fully convince myself.

also- the travis failure is bizarre ! https://travis-ci.org/github/bitcoin/bitcoin/jobs/739321630 no error, no logs, nothing. doesn’t even look like a timeout!

update: its been restarted

in src/net_processing.cpp:859 in daf5553126 outdated

854@@ -854,7 +855,8 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
855     if (state->fSyncStarted)
856         nSyncStarted--;
857 
858-    if (misbehavior == 0 && state->fCurrentlyConnected) {
859+    if (misbehavior == 0 && state->fCurrentlyConnected && !node.IsBlockOnlyConn()) {
860+        // Note: we avoid changing visible addrman state for block-relay-only peers

ariard commented at 0:48 am on October 30, 2020:

I would mention the reference to anchors you made in PR description. Otherwise, I don’t think this change is that much valuable if we had only non-persistent block-relay peers. I don’t see how learning a past connection would significantly help an attacker to disrupt them further in the future, CAddrInfo.nTime isn’t considered by AddrMan::Select ?

Note, maybe we should rename AddrMan::Connected to HasBeenConnected, current comment “Mark an entry as currently-connected to” is confusing.

sdaftuar commented at 2:27 pm on October 30, 2020:

I would mention the reference to anchors you made in PR description. Otherwise, I don’t think this change is that much valuable if we had only non-persistent block-relay peers. I don’t see how learning a past connection would significantly help an attacker to disrupt them further in the future, CAddrInfo.nTime isn’t considered by AddrMan::Select ?

Perhaps it doesn’t, but I think if we left addrman truly unchanged by block-relay-only peers then that would definitely mean that we can’t leak topology information via addrman. I reversed my view on whether we should call Good() for these peers after realizing we didn’t have machinery in place to prevent addrman eviction and it seemed like that could happen too easily (and actually more easily for long-lived block-relay-only peer connections, because their timestamps could never update), but if we did come up with an auxiliary data structure for remembering block-relay-only peers (perhaps, just by making the anchor connections data structure more robust to temporarily losing connectivity) then I might argue again that we should switch the way we do things in the future to try to make these connections not interact with addrman whatsoever. You could imagine for instance having a separate, smaller “block-relay-addrman” that samples from the main addrman whenever it needs more addresses but otherwise has its own eviction rules and never shares its state through any externally probable interface.

So philosophically I still think no interaction should be our default way of thinking, and an exception to that is being proposed in this PR because I don’t have a better alternative right now. But I would hate for someone to later argue that it’s somehow safe to make these addrman calls because we don’t know of an information leak – the point is to isolate this part of the p2p protocol so there can’t be an information leak, even one we haven’t thought of. (Leaving this call to Connected() on these peer addresses in the first place was a big oversight in #15759, in my opinion.)

ariard commented at 2:54 am on November 1, 2020:

So philosophically I still think no interaction should be our default way of thinking, and an exception to that is being proposed in this PR because I don’t have a better alternative right now. But I would hate for someone to later argue that it’s somehow safe to make these addrman calls because we don’t know of an information leak

Right, I concede fixing the information leak in itself is valuable even if it’s not currently exploitable by an attacker to interfere with our non-persistent block-relay peer selection. This one might change in the future and turn this leak as a more exploitable, without the timestamp relation being recollected.

I agree it’s a more temporary fix waiting to remove timestamps from addr messages as discussed during last p2p meeting. Going further, even assuming a separate, smaller “block-relay-addrman”, I’m still concerned about the block-relay peers not participating in addr-relay compared to other peers, this fact which might observable in itself to break the hiddeness property of block-relay peers. E.g I was thinking about an adversary tweaking nServices to create a unique fingerprint and observe the absence of propagation on a presumed block-relay link to victim node. I guess isolating addr-relay on its own would make us all more confident, though it’s a more complex, long-term change.

ariard commented at 2:20 am on October 30, 2020: member

Code Review ACK 16d9bfc.

AFAIU this privacy fix it’s a worthy one as it masks the addr of our hidden peers among the wider set of all our known addrs. After this PR, an adversary observing our addr-relay traffic shouldn’t be able to dissociate them.

in src/net_processing.cpp:858 in 16d9bfc417

854@@ -854,7 +855,8 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
855     if (state->fSyncStarted)
856         nSyncStarted--;
857 
858-    if (misbehavior == 0 && state->fCurrentlyConnected) {
859+    if (misbehavior == 0 && state->fCurrentlyConnected && !node.IsBlockOnlyConn()) {

mzumsande commented at 4:50 pm on October 30, 2020:

This line confused me into thinking that addrman::Connected() could also be called for inbound peers on disconnection - which would be not ideal, because then our peer might be leaking the information about our long-term blocks-only connections by updating nTime on their side of the connection.

However, if I read the code correctly, fCurrentlyConnected, despite its name, actually rather tracks whether it is an sucessfully connected outbound connection, so that this is not an issue (although keeping a flag for that does not seem necessary).

jnewbery commented at 5:47 pm on October 30, 2020:

Yes, you’re exactly right. See #20187 (review).

It’d be great to consolidate the pfrom.IsInboundConn() check here too. That could easily be done as a follow up, eliminating fCurrentlyConnected entirely (and using fSuccessfullyConnected to check that the version handshake is complete).

mzumsande commented at 5:10 pm on October 30, 2020: member

Code-Review ACK 16d9bfc4172b4f6ce24a3cd1a1cfa3933cd26751.

I’m not entirely convinced that calling Good for blocks-only peers leaks strictly no information at all (e.g. it updates internal state such as nLastSuccess, which influence IsTerrible(), which influences GetAddr Response selection), but I couldn’t think of a way to use that, especially after GetAddr caching (18991).

sdaftuar commented at 5:59 pm on October 30, 2020: member

I’m not entirely convinced that calling Good for blocks-only peers leaks strictly no information at all (e.g. it updates internal state such as nLastSuccess, which influence IsTerrible(), which influences GetAddr Response selection), but I couldn’t think of a way to use that, especially after GetAddr caching (18991). @mzumsande I agree that it’s unlikely we’re leaking no information by calling Good(); my previous approach in #15759 was to try to avoid calling any addrman functions for block-relay-only peers, but my conclusion is that addrman-eviction is a worse outcome than the effects of calling Good() (which I estimate as reducing the anonymity set of our block-relay-only peers to the set of addresses in our tried table – hopefully it’s not meaningfully worse than that).

However if we were to structure the addrman differently in the future so that eviction isn’t a concern (as I mention here #20187 (review)), I might suggest that we reverse course again and avoid any interaction between block-relay-only peers and addrman at all.

amitiuttarwar commented at 7:00 pm on November 2, 2020: contributor

code review ACK 16d9bfc417

agree with the prior conversation that we can’t have 100% guarantee we’re not leaking any information, but that not evicting the peers is more important. & I spent some time and can’t figure out any concrete way to exploit any information leak, so I’m comfortable with these changes

thoughts for future improvements:

also saw some mentions along these lines, but I think the code around fUpdateConnectionTime, fCurrentlyConnected, and comment or function name of CAddrMan::Connected could be simplified and made easier to parse. Connected is only invoked when stopping or disconnecting the nodes, but the comment says “Mark an entry as currently-connected-to.” fUpdateConnectionTime is used as an in/out param when FinalizeNode could just directly call Connected. but to reiterate, these are all tangential to this PR. just thoughts that came up as I was trying to understand relevant code paths.

fanquake merged this on Nov 3, 2020

fanquake closed this on Nov 3, 2020

sidhujag referenced this in commit 7ea7d069c9 on Nov 3, 2020

sidhujag referenced this in commit 402829ed13 on Nov 3, 2020

deadalnix referenced this in commit 70bbd9b113 on Dec 21, 2021

deadalnix referenced this in commit d924bd2a89 on Dec 21, 2021

deadalnix referenced this in commit 811f79967e on Dec 21, 2021

deadalnix referenced this in commit 82d88e35d6 on Dec 21, 2021

DrahtBot locked this on Aug 16, 2022

Addrman: test-before-evict bugfix and improvements for block-relay-only peers #20187

Conflicts