#20892 (tests: Run both descriptor and legacy tests within a single test invocation by achow101)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
sipa force-pushed
on Aug 24, 2023
DrahtBot added the label
CI failed
on Aug 24, 2023
DrahtBot removed the label
CI failed
on Aug 24, 2023
sipa force-pushed
on Aug 24, 2023
glozow added the label
P2P
on Aug 24, 2023
Sjors
commented at 11:53 am on August 25, 2023:
member
Not needed for testing here, but it would still be cool to have a lock icon in the peers tab in the GUI, and reveal the session id if you click on it. cc @hebasto
Tested a manual connection between macOS and Ubuntu, found matching session ids, pfew - no man in the middle attack on my local network :-)
DrahtBot removed the label
Needs rebase
on Sep 11, 2023
in
src/protocol.h:295
in
1768aff56doutdated
290@@ -291,6 +291,9 @@ enum ServiceFlags : uint64_t {
291 // See BIP159 for details on how this is implemented.
292 NODE_NETWORK_LIMITED = (1 << 10),
293294+ // NODE_P2P_V2 means the node supports BIP324 transport
295+ NODE_P2P_V2 = (1 << 11),
theStack
commented at 5:12 pm on September 11, 2023:
in
test/functional/p2p_v2.py:36
in
be8a5dcdddoutdated
31+ assert_equal(self.nodes[1].getblockcount(), 0)
32+ with self.nodes[0].assert_debug_log(expected_msgs=[sending_handshake],
33+ unexpected_msgs=[downgrading_to_v1]):
34+ self.connect_nodes(0, 1, True)
35+ # sync_all() verifies that the block tips match
36+ self.sync_all(self.nodes[0:2])
theStack
commented at 5:19 pm on September 11, 2023:
in commit be8a5dcddd36be8a5f137baf3ebd32d590a0747d: Somewhere around here it might make sense to verify that v2 is indeed used by checking the new getpeerinfo fields (“transport_protocol_type” and “session_id”) for nodes 0 and 1.
maflcko
commented at 7:39 am on September 12, 2023:
nit: Not really a fan of hard-coding default value in the code syntax. If the default was changed, this would require changing the code syntax. It would be better to hard-code the default value itself. Even best would be to not hard-code anything anywhere, which should be possible with self.Arg<bool>(2), or similar.
1893@@ -1822,6 +1894,10 @@ void CConnman::CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
1894 }
18951896 const bool inbound_onion = std::find(m_onion_binds.begin(), m_onion_binds.end(), addr_bind) != m_onion_binds.end();
1897+ // The V2Transport transparently falls back to V1 behavior when an incoming V1 connection is
1898+ // detected, so use it whenever we signal NODE_P2P_V2.
1899+ const bool use_p2p_v2 = !!(nodeServices & NODE_P2P_V2);
maflcko
commented at 7:47 am on September 12, 2023:
in
test/functional/p2p_v2_transport.py:23
in
2aabbdf4deoutdated
18+
19+ def run_test(self):
20+ network_info = self.nodes[0].getnetworkinfo()
21+ assert_equal(int(network_info["localservices"], 16) & NODE_P2P_V2, 0)
22+ if "P2P_V2" in network_info["localservicesnames"]:
23+ raise AssertionError("Did not expect P2P_V2 to be signaled for a v1 node")
maflcko
commented at 7:58 am on September 12, 2023:
0 assert "P2P_V2" in network_info["localservicesnames"]
1
nit: Can be written shorter? Also, the full test can be removed and the two checks can be added to the p2p_v2.py test? This avoids having to spin up a node twice to call a single RPC each time.
maflcko
commented at 8:02 am on September 12, 2023:
member
left some nits on the tests, feel free to ignore.
achow101
commented at 6:41 pm on September 13, 2023:
member
Couple things while testing this out:
Would be nice to have the transport type and session id also show in the GUI’s peer info page
Could we log the connection version once the connection is established? Maybe once we receive their version message? The transport version can be figured out from the logs for outbound connections, but not for inbound.
Sjors
commented at 7:06 pm on September 13, 2023:
member
(deleted my last comment, forgot to add true to the RPC call)
in
src/net.h:1113
in
1fa8efaff7outdated
1080@@ -1077,7 +1081,11 @@ class CConnman
1081 vWhitelistedRange = connOptions.vWhitelistedRange;
1082 {
1083 LOCK(m_added_nodes_mutex);
1084- m_added_nodes = connOptions.m_added_nodes;
1085+
1086+ for (const std::string& strAddedNode : connOptions.m_added_nodes) {
1087+ // -addnode cli arg does not currently have a way to signal BIP324 support
Possibly, but at this stage my goal is to get enough code in for basic tests and experimentation, and for that, an RPC suffices. Further on, we’ll certainly want more convenient ways of interacting with this (perhaps we can treat it as a whitelist permission or so, too, but I don’t want to bikeshed about that here).
Sjors
commented at 11:15 pm on September 13, 2023:
It can wait indeed. It’s just that I have some nodes that I’d like to connect by default.
mzumsande
commented at 7:58 pm on September 14, 2023:
There are also other means of making outbound connections based on user input, like -connect or -seednode for which at some later point a decision must be made (make configurable / keep default at v1 / switch to v2 ).
There is also the possibility of extending the reconnection mechanism to switch to V2 when you connect using V1 to a node and they tell you they support V2.
in
test/functional/test_framework/test_framework.py:192
in
2aabbdf4deoutdated
188@@ -189,6 +189,8 @@ def parse_args(self):
189 parser.add_argument("--randomseed", type=int,
190 help="set a random seed for deterministically reproducing a previous test run")
191 parser.add_argument("--timeout-factor", dest="timeout_factor", type=float, help="adjust test timeouts by a factor. Setting it to 0 disables all timeouts")
192+ parser.add_argument("--v2transport", dest="v2transport", default=False, action="store_true",
I didn’t study 8672b854fb7a81a400150ef09e99a06c829b68f8 much.
Can you expand the description of 4cfbbceff35fcf22a232c66278c4ababd750d9bb a bit to explain under what circumstances PerformReconnections() is called and elaborate a bit on the various delays? Particuarly, if we tried a v2 connection and failed, how long do we wait before retrying with v1?
in
src/net.cpp:3509
in
2aabbdf4deoutdated
3513+ for (const auto& it : m_added_nodes) {
3514+ if (added_node_params.m_added_node == it.m_added_node) return false;
3515 }
35163517- m_added_nodes.push_back(strNode);
3518+ m_added_nodes.push_back(added_node_params);
jonatack
commented at 8:25 pm on September 13, 2023:
Note that both the AddNode() and RemoveAddedNode() methods become one-liners with the change in 7caaa9e (#28248), if you’d like to pull it in before this change.
jonatack
commented at 8:30 pm on September 13, 2023:
contributor
Approach ACK. Suggest reviewers consider testing this using the logging in #28248 that uncovered peer connection issues in the interactions between automatic and manual, outbound and inbound, and some that are network-specific, given that this pull adds another dimension.
hebasto
commented at 10:22 pm on September 13, 2023:
member
Would be nice to have the transport type and session id also show in the GUI’s peer info page
Could we log the connection version once the connection is established? Maybe once we receive their version message? The transport version can be figured out from the logs for outbound connections, but not for inbound.
I’ve added logging on VERACK for both inbound and outbound (but only with -debug=net for inbound, because possible spam).
@Sjors
Can you expand the description of https://github.com/bitcoin/bitcoin/commit/4cfbbceff35fcf22a232c66278c4ababd750d9bb a bit to explain under what circumstances PerformReconnections() is called and elaborate a bit on the various delays? Particuarly, if we tried a v2 connection and failed, how long do we wait before retrying with v1?
It should be pretty much immediately - the very next time a connection is attempted to be created.
in
src/net.cpp:2886
in
f204845319outdated
2882@@ -2834,6 +2883,8 @@ void CConnman::ThreadOpenAddedConnections()
2883 // Retry every 60 seconds if a connection was attempted, otherwise two seconds
2884 if (!interruptNet.sleep_for(std::chrono::seconds(tried ? 60 : 2)))
2885 return;
2886+ // See if any reconnections are desired.
2887+ PerformReconnections();
Sjors
commented at 11:18 pm on September 13, 2023:
nit: Can be written shorter by combining the call with the line below.
The general idea is that for each call to generate the dev is forced to specify which nodes are expected to be synced after the call, to make intermittent races/errors less common and assumptions easier to follow.
(Same above)
(My idea was to force devs to type invalid_call, which would make them realize that the call is considered “invalid”, but maybe that doesn’t work :sweat_smile:)
mzumsande
commented at 8:34 pm on September 14, 2023:
could return an error if we explicitely set the bool with the RPC command but don’t have activated -v2transport ourselves (instead of silently downgrading to v1).
So it seems that even though the p2p v2 handshake isn’t completed yet, the “transport_protocol_type” is already reported as “v2”. Should we also use “detecting” (or something else) for the sender side in this case?
Haven’t checked in detail, but I suspect that this happens if a peer according to addrman signals NODE_P2P_V2 and we hence try to initiate a v2 connection, but the other side only supports p2p v1? (or, for some other reason the handshake doesn’t complete).
sipa force-pushed
on Sep 15, 2023
sipa
commented at 12:06 pm on September 15, 2023:
member
What’s going on is that the code only reported “detecting” for inbound possibly-v2 connections where the 12-byte prefix hasn’t been received yet. In inbound that’s more or less correct, except that peer that send non-bitcoin traffic would also get incorrectly classified as v2.
I’ve changed it to align the transport version and the session id: only say v2 when the garbage auth packet has been completely received; before that point “detecting” will be reported.
Diff:
0--- a/src/net.cpp
1+++ b/src/net.cpp
2@@ -1637,15 +1637,14 @@ Transport::Info V2Transport::GetInfo() const noexcept
3 4 Transport::Info info;
5 6- // Report DETECTING as transport type if we don't know whether the peer is V1 or V2.
7- info.transport_type = m_recv_state ==
8- RecvState::KEY_MAYBE_V1 ? TransportProtocolType::DETECTING : TransportProtocolType::V2;
9-
10- // Do not report the session ID until the garbage authentication packet has been received and
11- // verified (confirming that the other side very likely has the same keys as us).
12+ // Do not report v2 and session ID until the garbage authentication packet has been received
13+ // and verified (confirming that the other side very likely has the same keys as us).
14 if (m_recv_state != RecvState::KEY_MAYBE_V1 && m_recv_state != RecvState::KEY &&
15 m_recv_state != RecvState::GARB_GARBTERM && m_recv_state != RecvState::GARBAUTH) {
16+ info.transport_type = TransportProtocolType::V2;
17 info.session_id = uint256(MakeUCharSpan(m_cipher.GetSessionID()));
18+ } else {
19+ info.transport_type = TransportProtocolType::DETECTING;
20 }
2122 return info;
vasild
commented at 3:14 pm on September 15, 2023:
contributor
Concept ACK
-v2transport is maybe a bit too obscure from the users’ point of view? Something like -p2pencryption may be clearer.
sipa
commented at 3:18 pm on September 15, 2023:
member
I consider the -v2transport an option just for people who want to experiment with it, before it gets enabled by default.
I worry that advertising the feature as “encryption” would result in incorrect assumptions about what it achieves (which is raising costs for large-scale observers, if deployed at scale; it doesn’t really protect privacy for your own actions against a deliberate attacker).
mzumsande
commented at 3:25 pm on September 15, 2023:
I think this means that reconnections for added nodes can be made from ThreadOpenConnections, and reconnections for automatic connections can be performed by ThreadOpenAddedConnections - which seems really weird, although I don’t think it breaks anything. I wonder if it makes sense to perform the respective reconnection attempts only by their designated threads.
I agree it’s somewhat strange, but I’m not convinced that splitting them up is better.
Pro current approach:
Slightly lower latency for reconnections with 2 threads considering them.
Simpler implementation.
Not entirely clear what to do with connections not created by either thread (like addnode oneshot, unsure if there are others). I guess they can go in the addnode thread too.
Pro split:
No interference between the two types. For example, if you’d configure lots of addnodes errorneously as V2, with the currently approach they’ll end up slowing down normal (random) peer connection establishment.
mzumsande
commented at 7:28 pm on September 25, 2023:
Ok - as pointed out below it could also help resolve problems with a blocking semaphore to have another thread trying as well.
mzumsande
commented at 3:34 pm on September 15, 2023:
contributor
Concept ACK
sipa force-pushed
on Sep 15, 2023
sipa force-pushed
on Sep 15, 2023
DrahtBot added the label
CI failed
on Sep 15, 2023
sipa force-pushed
on Sep 19, 2023
sipa
commented at 9:21 am on September 19, 2023:
member
in
src/node/connection_types.h:84
in
0c345514b8outdated
78@@ -79,4 +79,14 @@ enum class ConnectionType {
79 /** Convert ConnectionType enum to a string value */
80 std::string ConnectionTypeAsString(ConnectionType conn_type);
8182+/** Transport layer version */
83+enum class TransportProtocolType {
84+ DETECTING, //!< Incoming connection; peer could be v1 or v2
theStack
commented at 9:32 am on September 19, 2023:
0 DETECTING, //!< peer could be v1 or v2
since the latest change in V2Transport::GetInfo (introduced in #28331 (comment)) the DETECTING state is now used both for in- and outbound connections
theStack
commented at 10:19 am on September 19, 2023:
contributor
Statistical data reporting nit: apparently the garbage data is accounted for the sender side as empty type ("") in "bytessent_per_msg", while for the receiver side ("bytesrecv_per_msg") it’s not, e.g.:
(seeing this for both in- and outbound connections)
Left a few other nits below. I’ve been running this PR for several days now with several in- and outbound peers and everything seems to work just fine AFAICT. Currently also running an IBD with a single BIP324 peer (the only one that @Sjors’ DNS seed x809.seed.bitcoin.sprovoost.nl returned, which was recently added to bitcoin-seeder, see https://github.com/sipa/bitcoin-seeder/pull/102).
@jonatack:
Would be nice to have the transport type and session id also show in the GUI’s peer info page
Am updating -netinfo for bip324 / v2 p2p as well.
Looking very much forward to that :rocket:
DrahtBot removed the label
CI failed
on Sep 19, 2023
sipa force-pushed
on Sep 20, 2023
DrahtBot added the label
Needs rebase
on Sep 21, 2023
sipa force-pushed
on Sep 22, 2023
sipa
commented at 10:41 am on September 22, 2023:
member
Rebased.
@theStack I guess the fact that the handshake bytes are reported in the send statistics but not in the receive statistics is inconsistent. There is a simple solution, I think, namely adding a “message” in the ReceivedMessageComplete sense to be retrieved to transition from VERSION to APP state (but one with no payload, and reject = true), to communicate the handshake bytes to the net layer. I prefer doing this in a follow-up, though, as there may be a wider question about how to report the various types of bytes (maybe we want to classify garbage/key/version separately, and maybe we want to support reporting decoy data too).
DrahtBot removed the label
Needs rebase
on Sep 22, 2023
DrahtBot added the label
CI failed
on Sep 22, 2023
sipa force-pushed
on Sep 22, 2023
DrahtBot removed the label
CI failed
on Sep 22, 2023
DrahtBot added the label
CI failed
on Sep 23, 2023
theStack
commented at 11:39 pm on September 23, 2023:
contributor
Light ACK964a8b88ef69643181e11475f03f94cc2780772e
(still planning to look deeper at 58ba59f39c217f6835e7839c391dd680ee9d62f2 and its implications w.r.t. the discussion #28331 (review))
Tested more intensly over for the last few days with some more in- and outbound connections and everything was working fine. Unexpected disconnects and other seemingly weird behaviour patterns turned out to be wrong alarms not related to any of the new p2p v2 code, presumably the connection was just bad for outbound connection tests on my laptop (unstable/slow wifi at conferences…). Will keep testing over the following days.
@theStack I guess the fact that the handshake bytes are reported in the send statistics but not in the receive statistics is inconsistent. There is a simple solution, I think, namely adding a “message” in the ReceivedMessageComplete sense to be retrieved to transition from VERSION to APP state (but one with no payload, and reject = true), to communicate the handshake bytes to the net layer. I prefer doing this in a follow-up, though, as there may be a wider question about how to report the various types of bytes (maybe we want to classify garbage/key/version separately, and maybe we want to support reporting decoy data too).
Sounds good to me. Another easy alternative to get rid of the inconsistency would be to simply report the handshake bytes in neither side for now until we figure out how we want to report the different types of bytes. E.g.:
0index 73dbb090bc..0bd7d68e96 100644
1--- a/src/net.cpp
2+++ b/src/net.cpp
3@@ -1699,7 +1699,9 @@ std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
4 // Notify transport that bytes have been processed.
5 node.m_transport->MarkBytesSent(nBytes);
6 // Update statistics per message type.
7- node.AccountForSentBytes(msg_type, nBytes);
8+ if (!msg_type.empty()) { // don't report v2 handshake bytes for now
9+ node.AccountForSentBytes(msg_type, nBytes);
10+ }
11 nSentSize += nBytes;
12 if ((size_t)nBytes != data.size()) {
13 // could not send full message; stop sending more
(as far as I’m aware, this wouldn’t change the statistics for v1 peers as we haven’t used the empty string as reporting type anywhere before)
DrahtBot removed the label
CI failed
on Sep 25, 2023
ajtowns
commented at 12:01 pm on September 25, 2023:
struct ReconnectionInfo { ... }; std::list<ReconnectionInfo> m_reconnections; ? If you’re going to comment names for the entries, might as well actually give them names.
ajtowns
commented at 12:29 pm on September 25, 2023:
This seems a little bit complex/brittle doesn’t it?
The issue I’m thinking of is that ThreadOpenConnections will block on CSemaphoreGrant grant(*semOutbound) waiting for one to disconnect and release its grant, but the peer wanting a reconnection attempt doesn’t release its grant, so the thread here won’t be woken up. That’s okay normally though: there’s slots for 8 outbounds, 2 block relay connections and a feeler/extra block relay; so usually there will be a spare slot, so the code will continue through and arrive back to the 500ms wait, then handle reconnections again. And even if they are all used up, every minute, ThreadOpenAddedConnections will help out, unless all the slots there are also used up.
I don’t think that’s super problematic – at worst I think it just performs a bit poorly, by delaying a reconnect for a minute or until your feeler/extra block relay peer gets disconnected. But mostly it feels like gating this through semaphores is just a bit too simple for what we’re trying to achieve.
I guess I think it would be better to drop the semaphore approach and just have a queue of “open connection” work, one or two (or n) threads that attempt to open connections, and then they just wake up via a condition variable to see what work there is to do (outbound, block relay, addnode, or reconnection) and do it.
@ajtowns I agree it would be a better to have a more integrated design for managing the creation of open connections. Do you think the code changes for that are simple enough to do something like that in this PR, or should we keep it for a follow-up improvement?
ajtowns
commented at 4:07 am on September 27, 2023:
Seems better as a follow-up to me; even if the code changes ended up being simple, thinking through all the possible edge cases seems complex enough to justify its own PR.
ajtowns
commented at 1:16 pm on September 25, 2023:
Did you consider an approach where the CNode object is preserved across the reconnection attempt? I guess that would involve an annoying amount of messing with ConnectNode. It would be nice if the peer id didn’t change between the v2 and v1 connection attempts though - it’s a bit awkward to correlate the two attempts when you can’t just grep for peer=nnn
mzumsande
commented at 6:44 pm on September 25, 2023:
also, I think that this might lead to (temporary) overshoots in the number of connection subtypes that share semOutbound, for example:
Make 8th full-relay connection using v2, get disconnected and add to m_reconnections.
Make another outbound from ThreadOpenConnections()
call PerformReconnections() and make a 9th full outbound
Although this shouldn’t be too bad since EvictExtraOutboundPeers should deal with this and disconnect one of the extra connections both for block-relay and full outbound case.
I did consider reusing the CNode object, but it seems hard to do, because in general we only construct a CNode object after the connection is established. I guess it would be possible to instead leave to-be-reconnected nodes in the nodes array, but it would need more substantial changes (factoring out initialization logic from the CNode constructor, figuring out the interaction with various special states, dealing with any code that assumes any node in the array is active, …). This would however solve both of the mentioned issues (node id numbers changing on reconnect, and accounting of connection subtypes).
Just making the NodeId be reused would be a much smaller change, though I’m not convinced there is no code somewhere that assumes that distinct objects use distinct ids (and it seems possible in theory that the old object survives in m_nodes_disconnected actually while the new one is already created?).
If possible, I’d prefer to keep this for a follow-up.
ajtowns
commented at 6:04 am on September 27, 2023:
ajtowns
commented at 3:08 pm on September 25, 2023:
I wonder how much sense it really makes to have the Transport virtual class – every V2Transport has a V1Transport m_v1_fallback member, so could just use V2Transport everywhere, and set m_recv_state and m_send_state to V1 when we want v1, instead of allocating a different structure.
Fair point, when starting off with this approach I think we were assuming that downgrade from V2 to V1 would happen at the CNode level rather than inside the transports.
It’s pretty easy to address this: add a start_v1 parameter to the V2Transport constructors, and change the std::unique_ptr<Transport> to V2Transport. Even if they keep deriving from Transport, the fact that they’re final classes means no virtual function call overhead.
Still, as long as we don’t enable -v2transport by default though, the virtual class approach makes sure that no V2Transport code is active unless enabled, though at the cost of virtual function calls. I think that’s fine, but it’s also easy to change.
1598+ // we reconnect.
1599+ if (m_recv_state != RecvState::KEY) return false;
1600+ // Only when nothing has been received do we reconnect.
1601+ if (!m_recv_buffer.empty()) return false;
1602+ // Check if we've sent enough for the other side to disconnect us (if it was V1).
1603+ LOCK(m_send_mutex);
ajtowns
commented at 3:14 pm on September 25, 2023:
Should you release the m_recv_mutex lock before taking the m_send_mutex lock? Presumably it shouldn’t possibly matter by the time you’re worrying about reconnecting, though.
I don’t think this matters; the connection is already closed here, so nothing except V2Transport::ShouldReconnectV1 should be accessing these locks anymore anyway. And if m_recv_mutex is released before, an overly keen code reviewer may wonder if the receive state couldn’t change in between the release and the grab of m_send_mutex.
in
src/net.cpp:1952
in
5917b6b5d7outdated
1943@@ -1917,6 +1944,19 @@ void CConnman::DisconnectNodes()
1944 // remove from m_nodes
1945 m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
19461947+ // Add to reconnection list if appropriate. We don't reconnect right here, because
1948+ // the creation of a connection is a blocking operation (up to several seconds),
1949+ // and we don't want to hold up the socket handler thread for that long.
1950+ if (pnode->m_transport->ShouldReconnectV1()) {
1951+ reconnections_to_add.emplace_back(
1952+ /*addr_connect=*/CAddress{pnode->addr, ServiceFlags{pnode->addr.nServices & ~NODE_P2P_V2}},
ajtowns
commented at 3:19 pm on September 25, 2023:
Is it worth adding a comment that removing NODE_P2P_V2 here is what ensures it’s retried as a v1 connection in ConnectNode ?
288@@ -289,11 +289,12 @@ static RPCHelpMan addnode()
289 {
290 {"node", RPCArg::Type::STR, RPCArg::Optional::NO, "The address of the peer to connect to"},
291 {"command", RPCArg::Type::STR, RPCArg::Optional::NO, "'add' to add a node to the list, 'remove' to remove a node from the list, 'onetry' to try a connection to the node once"},
292+ {"v2transport", RPCArg::Type::BOOL, RPCArg::Default{false}, "Peer supports BIP324 v2 transport protocol"},
mzumsande
commented at 3:43 pm on September 25, 2023:
nit: would prefer “Attempt to connect with BIP324 v2 transport protocol” or something similar, we usually don’t know for sure if the peer actually supports it (and will downgrade if they don’t).
Renamed, and also renamed some related variables in other places to avoid name collisions.
mzumsande
commented at 7:50 pm on September 25, 2023:
contributor
ACK964a8b88ef69643181e11475f03f94cc2780772e (comments are mostly nits)
Reviewed the code, and also did some manual testing for the downgrade mechanism.
I have tested this on mainnet for more than a week now and not encountered any problems (one v2 peer has been connected for 11 days).
sipa force-pushed
on Sep 27, 2023
sipa
commented at 12:48 pm on September 27, 2023:
member
@theStack I’ve included your patch to not include v2 handshake bytes in the per-message-type send byte statistics, as a separate commit. I’ve also mentioned the idea as a potential follow-up in #27634.
@real-or-random I’ve added a commit to mention BIP324 in doc/bips.md.
DrahtBot added the label
CI failed
on Sep 27, 2023
sipa force-pushed
on Sep 27, 2023
theStack approved
theStack
commented at 9:25 pm on September 27, 2023:
contributor
I think we also want to mention BIP324 in the release notes, even though it’s default-off? (doesn’t necessarily have to be in this PR though).
DrahtBot requested review from ajtowns
on Sep 27, 2023
DrahtBot requested review from mzumsande
on Sep 27, 2023
DrahtBot removed the label
CI failed
on Sep 27, 2023
in
src/net.cpp:1582
in
830b23ab80outdated
1608+ LOCK(m_recv_mutex);
1609+ // Only in the very first state (where m_recv_buffer.empty() means nothing received) do
1610+ // we reconnect.
1611+ if (m_recv_state != RecvState::KEY) return false;
1612+ // Only when nothing has been received do we reconnect.
1613+ if (!m_recv_buffer.empty()) return false;
vasild
commented at 12:09 pm on September 28, 2023:
The two comments don’t quite match the two conditions - the first comment mentions m_recv_buffer.empty() which is checked in the second condition. Maybe move (where m_recv_buffer.empty() means nothing received) from the first comment to the second.
vasild
commented at 12:14 pm on September 28, 2023:
Commit 75c02891b1net: use V2Transport when NODE_P2P_V2 service flag is present is somewhat broken in isolation because it would try to connect using V2 without a fallback to V1. This is later amended in commit d22f715331net: reconnect with V1Transport under certain conditions. Maybe squash both into one commit?
I feel they’re sufficiently distinct aspects that having them in separate commits is helpful for review. Sure, the first commit on its own may not result in exactly ideal behavior, but I also don’t think it’s broken.
EDIT: reverted. Apparently the (existing) semantics related moving of semaphore grants is that the grant moves, but the associated semaphore remains.
vasild
commented at 9:52 am on September 29, 2023:
Is that necessary? I find it unexpected to leave the semaphore. Usually moved-from objects remain in some empty but valid state, e.g. as if default constructed. A semaphore grant that has a semaphore in it is not that. Did something break when you set other.sem = nullptr? I tried this patch:
and both unit tests and functional tests passed. I think the moved-from semaphore grant is not used after the move and is destroyed shortly afterwards.
It very quickly crashes on normal mainnet for me with this patch, possibly because it’s related to automatic connections which aren’t used in the functional test framework.
I do want to investigate more what’s actually necessary here, because I agree that relying on the state of a moved-from object is strange.
ThreadOpenAddedConnections creates a CSemaphoreGrant before it loops over vInfo. Maybe that causes problems for a subsequent item? This function seems to relate to manually added connections.
Most functional tests use the onetry command when they call addnode, which calls OpenNetworkConnection directly. Except for rpc_net.py which uses add, but it’s never trying more than 1 node at a time.
vasild
commented at 5:10 pm on September 29, 2023:
Hmm, so the code in CConnman::ThreadOpenAddedConnections() acquires the grant once and then opens connections to all entries in vInfo. Is this intended? Looks like a bug to me. It might as well exceed MAX_ADDNODE_CONNECTIONS. Unrelated to this PR, but should it acquire the grant inside the for (const AddedNodeInfo& info : vInfo) { loop?
mzumsande
commented at 5:19 pm on September 29, 2023:
Wouldn’t grant.TryAcquire() inside the for loop fail and break once we reach MAX_ADDNODE_CONNECTIONS?
ajtowns
commented at 5:26 pm on September 29, 2023:
@vasild Agree. The code is fine at the moment, because passing &grant into OpenNetworkConnections at most only releases our grant, without invalidating it. Changing it to invalidate the grant makes our next addnode attempt fail due to reusing an invalid object.
(There’s no bug here I think – if we hid max addnode connections, the grant.TryAcquire() will fail and we’ll break out of the loop. Moving it into the loop as CSemaphoreGrant grant(*semAddNode, /*fTry=*/tried); would work, I think)
vasild
commented at 5:48 pm on September 29, 2023:
Right, I did not see the TryAcquire() call :facepalm:, no bug in master.
Now the question that is relevant to this PR: what about moving the grant creation inside the for loop:
0--- i/src/net.cpp
1+++ w/src/net.cpp
2@@ -2861,18 +2861,18 @@ std::vector<AddedNodeInfo> CConnman::GetAddedNodeInfo() const
3 void CConnman::ThreadOpenAddedConnections()
4 {
5 AssertLockNotHeld(m_unused_i2p_sessions_mutex);
6 AssertLockNotHeld(m_reconnections_mutex);
7 while (true)
8 {
9- CSemaphoreGrant grant(*semAddnode);
10 std::vector<AddedNodeInfo> vInfo = GetAddedNodeInfo();
11 bool tried = false;
12 for (const AddedNodeInfo& info : vInfo) {
13 if (!info.fConnected) {
14- if (!grant.TryAcquire()) {
15+ CSemaphoreGrant grant(*semAddnode, /*fTry=*/true);
16+ if (!grant) {
17 // If we've used up our semaphore and need a new one, let's not wait here since while we are waiti
18 // the addednodeinfo state might change.
19 break;
20 }
and having the move semantic (more naturally) leave the moved-from object in a default-constructed state (have grant=false, sem=null)?
I’ve re-introduced the other.sem = nullptr; statements in the move constructor/assignment, and made the corresponding fix in ThreadOpenAddedConnections too. I’ve also refactored the semaphore modernization commit a bit more to get rid of the pointers and use rvalue reference instead when moving is intended (to make it clearer when a variable shouldn’t be used anymore).
@Sjors Thanks a lot for reproducing the error. It helped locate the issue, and I used it to confirm the code I just pushed fixes the problem. Would you mind adding the test as a separate PR, if possible?
Nice! I don’t think my test is usable other than for manual testing. We currently can’t mock the 60 second delay involved. We also can’t configure the maximum connection count so it would need 10 test nodes, though that’s doable. I’ll open an issue.
vasild
commented at 12:45 pm on September 28, 2023:
This will create a temporary object of type ReconnectionInfo, and will move-construct a new element in reconnections_to_add from the temporary. This defeats the purpose of emplace_back(), might as well use push_back(), or drop ReconnectionInfo{.
Switched to push_back and dropped ReconnectionInfo{.
in
src/net.h:742
in
830b23ab80outdated
738@@ -704,8 +739,12 @@ class CNode
739 // Bind address of our side of the connection
740 const CAddress addrBind;
741 const std::string m_addr_name;
742+ /** The pszDest argument provided ConnectNode(). Only used for reconnections. */
vasild
commented at 1:06 pm on September 28, 2023:
0 /** The pszDest argument provided to ConnectNode(). Only used for reconnections. */
1978@@ -1906,6 +1979,20 @@ void CConnman::DisconnectNodes()
1979 // remove from m_nodes
1980 m_nodes.erase(remove(m_nodes.begin(), m_nodes.end(), pnode), m_nodes.end());
19811982+ // Add to reconnection list if appropriate. We don't reconnect right here, because
1983+ // the creation of a connection is a blocking operation (up to several seconds),
1984+ // and we don't want to hold up the socket handler thread for that long.
1985+ if (pnode->m_transport->ShouldReconnectV1()) {
vasild
commented at 1:10 pm on September 28, 2023:
When we fail to connect using V2 and succeed to connect using V1, should we bother removing NODE_P2P_V2 for that node from addrman?
That already happens; whenever an outbound peer’s services are received from them, we overwrite their entry in addrman (see m_addrman.SetServices(pfrom.addr, nServices); line in net_processing). It’s not dependent on the actual transport used for the current connection, but I don’t think that incorporating that is helpful. If a peer lies about their supported services, they only hurt their own ability of being connected to.
Worth noting here is that if an address is already in addrman and we receive it via gossip we update its services, but in a cumulative manner - only add new services, without removing existent services from addrman:
@vasild FWIW, that’s the primary reason why we really need a v1-reconnect concept when v2 connections fail; otherwise it’d be trivial for attackers to inject the NODE_P2P_V2 flag in all addresses they relay and cause widespread inability to connect.
in
src/node/connection_types.h:88
in
830b23ab80outdated
78@@ -79,4 +79,14 @@ enum class ConnectionType {
79 /** Convert ConnectionType enum to a string value */
80 std::string ConnectionTypeAsString(ConnectionType conn_type);
8182+/** Transport layer version */
83+enum class TransportProtocolType {
84+ DETECTING, //!< Peer could be v1 or v2
85+ V1, //!< Unencrypted, plaintext protocol
86+ V2, //!< BIP324 protocol
87+};
vasild
commented at 1:22 pm on September 28, 2023:
I don’t know why I don’t see this warning when compiling, but only inside youcompleteme. Both should be using the same flags. Anyway, mentioning it here:
src/node/connection_types.h|17 col 12 error| Enum ‘ConnectionType’ uses a larger base type (‘int’, size: 4 bytes) than necessary for its value set, consider using ‘std::uint8_t’ (1 byte) as the base type to reduce its size [performance- enum-size]
Sjors
commented at 12:52 pm on September 29, 2023:
af8a90d0c27c3b0787b66094e7db6662596bed71 test seems unrelated to this commit and probably could have been introduced earlier in 9b378d3093a3c7ae3c944e469faf5bea67459b07.
af8a90d0c27c3b0787b66094e7db6662596bed71: isn’t it usually (always?) the case that either addrConnectorpszDest is provided by the caller? In which case this wouldn’t do anything, but it’s not harmful.
“in which case this wouldn’t do anything” -> what is the “this” referring to?
vasild
commented at 4:58 pm on September 29, 2023:
either addrConnect or pszDest is provided by the caller
That used to be the case before this PR. This PR changes that to pass information via both params. I do not like this as I find it confusing but I did not bother mentioning it as too minor and subjective and I did not see a clear better way. Now that this has been raised already:
This PR uses addrConnect to pass the service flags even if the address to connect to is in pszDest. Can this be improved? Maybe instead of pzsDest pass struct X { const char* dest, ServiceFlags services}? And then pass std::variant of either CAddress or X (one argument instead of two).
I agree with making a change like this, but I fear the scope of that could easily grow too large for this PR. There are several other places in the network code where this “string name or CAddress” variant concept exist, and if we’re going to introduce a variant type or something similar for that, I’d prefer for it to be done everywhere. It is also relevant for #28155 (@sr-gi) I think.
If you feel the mixing of the two argument is a problem in this PR, what about the more narrow change of adding a use_v2transport boolean argument to ConnectNode and OpenNetworkConnection instead, and ignoring the addrConnect.nServices field?
Setting it to addrConnect.nServices instead of leaving the default initialized NODE_NONE.
This PR changes that to pass information via both params. I do not like this
Ah, I didn’t notice that. No strong opinion but in that case I think we should document the pszDest and addrConnect arguments of ConnectNode and OpenNetworkConnection.
adding a use_v2transport boolean argument to ConnectNode and OpenNetworkConnection instead
Sjors
commented at 4:31 pm on September 29, 2023:
member
Reviewed down to 7ad356c3e1d7319af7f9c96a1bfa30b997c6adde.
sipa force-pushed
on Sep 29, 2023
sipa force-pushed
on Sep 29, 2023
DrahtBot added the label
CI failed
on Sep 29, 2023
DrahtBot removed the label
CI failed
on Sep 29, 2023
sipa force-pushed
on Sep 30, 2023
sipa
commented at 1:14 pm on September 30, 2023:
member
I’ve added a use_v2transport argument to OpenNetworkConnection and ConnectNode, rather than looking at the addrConnect.nServices (as not all call sites use addrConnect):
0diff --git a/src/net.cpp b/src/net.cpp
1index 8bd4aebbae..7e70a0586a 100644
2--- a/src/net.cpp
3+++ b/src/net.cpp
4@@ -439,7 +439,7 @@ static CAddress GetBindAddress(const Sock& sock)
5 return addr_bind;
6 }
7 8-CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type)
9+CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type, bool use_v2transport)
10 {
11 AssertLockNotHeld(m_unused_i2p_sessions_mutex);
12 assert(conn_type != ConnectionType::INBOUND);
13@@ -457,9 +457,6 @@ CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCo
14 }
15 }
16 17- // Use v2 transport when both us and them signal NODE_P2P_V2.
18- const bool use_v2transport(addrConnect.nServices & GetLocalServices() & NODE_P2P_V2);
19-
20 LogPrintLevel(BCLog::NET, BCLog::Level::Debug, "trying %s connection %s lastseen=%.1fhrs\n",
21 use_v2transport ? "v2" : "v1",
22 pszDest ? pszDest : addrConnect.ToStringAddrPort(),
23@@ -572,7 +569,6 @@ CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCo
24 if (!addr_bind.IsValid()) {
25 addr_bind = GetBindAddress(*sock);
26 }
27-
28 CNode* pnode = new CNode(id,
29 std::move(sock),
30 addrConnect,
31@@ -1920,7 +1916,7 @@ bool CConnman::AddConnection(const std::string& address, ConnectionType conn_typ
32 CSemaphoreGrant grant(*semOutbound, true);
33 if (!grant) return false;
34 35- OpenNetworkConnection(CAddress(), false, std::move(grant), address.c_str(), conn_type);
36+ OpenNetworkConnection(CAddress(), false, std::move(grant), address.c_str(), conn_type, /*use_v2transport=*/false);
37 return true;
38 }
39 40@@ -1958,8 +1954,7 @@ void CConnman::DisconnectNodes()
41 // and we don't want to hold up the socket handler thread for that long.
42 if (pnode->m_transport->ShouldReconnectV1()) {
43 reconnections_to_add.push_back({
44- // Removing NODE_P2P_V2 here ensures we reconnect using V1.
45- .addr_connect = CAddress{pnode->addr, ServiceFlags{pnode->addr.nServices & ~NODE_P2P_V2}},
46+ .addr_connect = pnode->addr,
47 .count_failure = pnode->m_count_failure_after_reconnect,
48 .grant = std::move(pnode->grantOutbound),
49 .destination = pnode->m_dest,
50@@ -2378,7 +2373,7 @@ void CConnman::ProcessAddrFetch()
51 CAddress addr;
52 CSemaphoreGrant grant(*semOutbound, /*fTry=*/true);
53 if (grant) {
54- OpenNetworkConnection(addr, false, std::move(grant), strDest.c_str(), ConnectionType::ADDR_FETCH);
55+ OpenNetworkConnection(addr, false, std::move(grant), strDest.c_str(), ConnectionType::ADDR_FETCH, /*use_v2transport=*/false);
56 }
57 }
58 59@@ -2481,7 +2476,7 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
60 for (const std::string& strAddr : connect)
61 {
62 CAddress addr(CService(), NODE_NONE);
63- OpenNetworkConnection(addr, false, {}, strAddr.c_str(), ConnectionType::MANUAL);
64+ OpenNetworkConnection(addr, false, {}, strAddr.c_str(), ConnectionType::MANUAL, /*use_v2transport=*/false);
65 for (int i = 0; i < 10 && i < nLoop; i++)
66 {
67 if (!interruptNet.sleep_for(std::chrono::milliseconds(500)))
68@@ -2786,7 +2781,9 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
69 // Don't record addrman failure attempts when node is offline. This can be identified since all local
70 // network connections (if any) belong in the same netgroup, and the size of `outbound_ipv46_peer_netgroups` would only be 1.
71 const bool count_failures{((int)outbound_ipv46_peer_netgroups.size() + outbound_privacy_network_peers) >= std::min(nMaxConnections - 1, 2)};
72- OpenNetworkConnection(addrConnect, count_failures, std::move(grant), /*strDest=*/nullptr, conn_type);
73+ // Use BIP324 transport when both us and them have NODE_V2_P2P set.
74+ const bool use_v2transport(addrConnect.nServices & GetLocalServices() & NODE_P2P_V2);
75+ OpenNetworkConnection(addrConnect, count_failures, std::move(grant), /*strDest=*/nullptr, conn_type, use_v2transport);
76 }
77 }
78 }
79@@ -2876,12 +2873,7 @@ void CConnman::ThreadOpenAddedConnections()
80 }
81 tried = true;
82 CAddress addr(CService(), NODE_NONE);
83-
84- if (info.m_params.m_use_v2transport) {
85- addr.nServices = ServiceFlags(addr.nServices | NODE_P2P_V2);
86- }
87-
88- OpenNetworkConnection(addr, false, std::move(grant), info.m_params.m_added_node.c_str(), ConnectionType::MANUAL);
89+ OpenNetworkConnection(addr, false, std::move(grant), info.m_params.m_added_node.c_str(), ConnectionType::MANUAL, info.m_params.m_use_v2transport);
90 if (!interruptNet.sleep_for(std::chrono::milliseconds(500)))
91 return;
92 }
93@@ -2895,7 +2887,7 @@ void CConnman::ThreadOpenAddedConnections()
94 }
95 96 // if successful, this moves the passed grant to the constructed node
97-void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFailure, CSemaphoreGrant&& grant_outbound, const char *pszDest, ConnectionType conn_type)
98+void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFailure, CSemaphoreGrant&& grant_outbound, const char *pszDest, ConnectionType conn_type, bool use_v2transport)
99 {
100 AssertLockNotHeld(m_unused_i2p_sessions_mutex);
101 assert(conn_type != ConnectionType::INBOUND);
102@@ -2917,7 +2909,7 @@ void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFai
103 } else if (FindNode(std::string(pszDest)))
104 return;
105106- CNode* pnode = ConnectNode(addrConnect, pszDest, fCountFailure, conn_type);
107+ CNode* pnode = ConnectNode(addrConnect, pszDest, fCountFailure, conn_type, use_v2transport);
108109 if (!pnode)
110 return;
111@@ -3854,7 +3846,8 @@ void CConnman::PerformReconnections()
112 item.count_failure,
113 std::move(item.grant),
114 item.destination.empty() ? nullptr : item.destination.c_str(),
115- item.conn_type);
116+ item.conn_type,
117+ /*use_v2transport=*/false);
118 }
119 }
120121diff --git a/src/net.h b/src/net.h
122index c9a0afdd97..25a9814f63 100644
123--- a/src/net.h
124+++ b/src/net.h
125@@ -1139,7 +1139,7 @@ public:
126 bool GetNetworkActive() const { return fNetworkActive; };
127 bool GetUseAddrmanOutgoing() const { return m_use_addrman_outgoing; };
128 void SetNetworkActive(bool active);
129- void OpenNetworkConnection(const CAddress& addrConnect, bool fCountFailure, CSemaphoreGrant&& grant_outbound, const char* strDest, ConnectionType conn_type) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
130+ void OpenNetworkConnection(const CAddress& addrConnect, bool fCountFailure, CSemaphoreGrant&& grant_outbound, const char* strDest, ConnectionType conn_type, bool use_v2transport) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
131 bool CheckIncomingNonce(uint64_t nonce);
132133 // alias for thread safety annotations only, not defined
134@@ -1355,7 +1355,7 @@ private:
135 bool AlreadyConnectedToAddress(const CAddress& addr);
136137 bool AttemptToEvictConnection();
138- CNode* ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
139+ CNode* ConnectNode(CAddress addrConnect, const char *pszDest, bool fCountFailure, ConnectionType conn_type, bool use_v2transport) EXCLUSIVE_LOCKS_REQUIRED(!m_unused_i2p_sessions_mutex);
140 void AddWhitelistPermissionFlags(NetPermissionFlags& flags, const CNetAddr &addr) const;
141142 void DeleteNode(CNode* pnode);
143diff --git a/src/rpc/net.cpp b/src/rpc/net.cpp
144index 8f6ec29813..8d796b8e9b 100644
145--- a/src/rpc/net.cpp
146+++ b/src/rpc/net.cpp
147@@ -299,7 +299,7 @@ static RPCHelpMan addnode()
148 {
149 {"node", RPCArg::Type::STR, RPCArg::Optional::NO, "The address of the peer to connect to"},
150 {"command", RPCArg::Type::STR, RPCArg::Optional::NO, "'add' to add a node to the list, 'remove' to remove a node from the list, 'onetry' to try a connection to the node once"},
151- {"v2transport", RPCArg::Type::BOOL, RPCArg::Default{false}, "Attempt to connect using BIP324 v2 transport protocol"},
152+ {"v2transport", RPCArg::Type::BOOL, RPCArg::Default{false}, "Attempt to connect using BIP324 v2 transport protocol (ignored for 'remove' command)"},
153 },
154 RPCResult{RPCResult::Type::NONE, "", ""},
155 RPCExamples{
156@@ -327,10 +327,7 @@ static RPCHelpMan addnode()
157 if (command == "onetry")
158 {
159 CAddress addr;
160- if (use_v2transport) {
161- addr.nServices = ServiceFlags(addr.nServices | NODE_P2P_V2);
162- }
163- connman.OpenNetworkConnection(addr, /*fCountFailure=*/false, /*grant_outbound=*/{}, node_arg.c_str(), ConnectionType::MANUAL);
164+ connman.OpenNetworkConnection(addr, /*fCountFailure=*/false, /*grant_outbound=*/{}, node_arg.c_str(), ConnectionType::MANUAL, use_v2transport);
165 return UniValue::VNULL;
166 }
in
src/net.cpp:1605
in
d5c8b42a16outdated
1600+ Transport::Info info;
1601+
1602+ // Do not report v2 and session ID until the garbage authentication packet has been received
1603+ // and verified (confirming that the other side very likely has the same keys as us).
1604+ if (m_recv_state != RecvState::KEY_MAYBE_V1 && m_recv_state != RecvState::KEY &&
1605+ m_recv_state != RecvState::GARB_GARBTERM) {
I’ve intentionally tested connecting to a node running a version that still uses the garbage authentication packet (i.e. it doesn’t have the recent BIP324 change included yet: #28525) and noticed that we report v2 and the session id too early, before the garbage authentication (now included in VERSION) is received. I think that’s fixed by the following patch:
Nice catch. Since the version packet now takes on the role of garbage authentication, we need to wait for the version packet before deciding it’s confirmed to be a v2 peer.
mzumsande
commented at 2:00 am on October 2, 2023:
I think that this is a small change in behavior (fyi @vasild because of #28331 (review)):
On master, the thread will block here if all addnode slots are taken because we call Acquire() (and not TryAcquire()) so there won’t be any sleep_for and no continuous calls to GetAddedNodeInfo() in this situation (until the grant is released because one of the existing added nodes has disconnected).
With this PR removing non-fTry acquiry of the grant, it means that we will now continuously cycle through the while loop, calling GetAddedNodeInfo() and sleeping for 2 seconds in the same situation where all addnode slots are taken.
Is this change intended / better than the behavior on master? On first glance, repeated calls to GetAddedNodeInfo() seem unnecessary when we don’t have a grant anyway, so the behavior on master seems more natural.
Nice find. I’ve adapted the code to match the old behavior (I hope; please have a look); even if we decide different behavior is desirable, this isn’t the PR to introduce it.
mzumsande
commented at 9:40 pm on October 2, 2023:
Looks good, I tested that the behavior doesn’t change anymore.
sipa force-pushed
on Oct 2, 2023
sipa
commented at 3:10 am on October 2, 2023:
member
0diff --git a/src/net.cpp b/src/net.cpp
1index 7e70a0586a..a17c215a2c 100644
2--- a/src/net.cpp
3+++ b/src/net.cpp
4@@ -1603,10 +1603,10 @@ Transport::Info V2Transport::GetInfo() const noexcept
5 6 Transport::Info info;
7 8- // Do not report v2 and session ID until the garbage authentication packet has been received
9+ // Do not report v2 and session ID until the version packet has been received
10 // and verified (confirming that the other side very likely has the same keys as us).
11 if (m_recv_state != RecvState::KEY_MAYBE_V1 && m_recv_state != RecvState::KEY &&
12- m_recv_state != RecvState::GARB_GARBTERM) {
13+ m_recv_state != RecvState::GARB_GARBTERM && m_recv_state != RecvState::VERSION) {
14 info.transport_type = TransportProtocolType::V2;
15 info.session_id = uint256(MakeUCharSpan(m_cipher.GetSessionID()));
16 } else {
17@@ -2861,11 +2861,12 @@ void CConnman::ThreadOpenAddedConnections()
18 AssertLockNotHeld(m_reconnections_mutex);
19 while (true)
20 {
21+ CSemaphoreGrant grant(*semAddnode);
22 std::vector<AddedNodeInfo> vInfo = GetAddedNodeInfo();
23 bool tried = false;
24 for (const AddedNodeInfo& info : vInfo) {
25 if (!info.fConnected) {
26- CSemaphoreGrant grant(*semAddnode, /*fTry=*/true);
27+ if (!grant) grant = CSemaphoreGrant(*semAddnode, /*fTry=*/true);
28 if (!grant) {
29 // If we've used up our semaphore and need a new one, let's not wait here since while we are waiting
30 // the addednodeinfo state might change.
DrahtBot added the label
CI failed
on Oct 2, 2023
fanquake
commented at 9:15 am on October 2, 2023:
member
This adds a way to specify a v2 peer to addnode. What about something similar for -connect= peers? Right now there is no way to specify “use v2” for addresses provided to -connect=. What about if -v2transport=1 then always try v2 first (and on failure v1) for all addnode and connect peers? Then this extra argument to addnode is not needed.
There have been several comments related to this before. The situation is certainly incomplete w.r.t. the ability to specify v1/v2 for manual connections (e.g. also the -addnode / -seednode options lacks that), but my goal with this PR is getting the minimal in to support testing. Making things convenient for wider adoption I think needs some more UX discussion (e.g. do we permit specifying IP address/ranges to be V2, or do we use a network permission, or some special suffix -addnode=IP@v2; also how does this interact with a potential future where peer pubkeys can be specified, …), which I think is better done separately.
Regarding automatically assuming v2 for all addnode… that would (for now) mean 2 connection attempts for every manual peer; I’m not sure that’s an acceptable tradeoff right now.
vasild approved
vasild
commented at 10:12 am on October 2, 2023:
contributor
ACKcd5e72d6fb82c580dbf693b14e842d6204327311 modulo the use-after-move which I think is benign but should be silenced regardless. This should resolve it (untested):
0 OpenNetworkConnection(addr, false, std::move(grant), info.m_params.m_added_node.c_str(), ConnectionTyp
1+ grant = CSemaphoreGrant{};
2 if (!interruptNet.sleep_for(std::chrono::milliseconds(500)))
DrahtBot requested review from theStack
on Oct 2, 2023
DrahtBot requested review from mzumsande
on Oct 2, 2023
6a2dbed1209b18d923da64b019f6f378d3c30695: could a comment here that we // Use temporary variable to track new reconnection candidates, because m_reconnections_mutex may be blocked [also for several seconds??? or much shorter but still don't want that] (supplementing the comment below why we don’t do the reconnect itself here either).
6a2dbed1209b18d923da64b019f6f378d3c30695ReconnectionInfo could specify use_v2transport so that PerformReconnections can be used more broadly than just v2 -> v1 downgrades. Can always be done later of course.
360@@ -361,6 +361,11 @@ class Transport {
361362 /** Return the memory usage of this transport attributable to buffered data to send. */
363 virtual size_t GetSendMemoryUsage() const noexcept = 0;
364+
365+ // 3. Miscellaneous functions.
366+
367+ /** Whether upon disconnections, a reconnect with V1 is warranted. */
368+ virtual bool ShouldReconnectV1() const noexcept = 0;
I don’t think that works, because even while ReconnectionInfo can be generic for any kind of reconnection attempt, this function and its call site are very much specific to a V2 connection downgrading to a V1.
in
src/net.cpp:1925
in
6a2dbed120outdated
1920+ // the creation of a connection is a blocking operation (up to several seconds),
1921+ // and we don't want to hold up the socket handler thread for that long.
1922+ if (pnode->m_transport->ShouldReconnectV1()) {
1923+ reconnections_to_add.push_back({
1924+ .addr_connect = pnode->addr,
1925+ .count_failure = pnode->m_count_failure_after_reconnect,
6a2dbed1209b18d923da64b019f6f378d3c30695 I’m confused about the intention of count_failure / m_count_failure_after_reconnect. If this is set to true then AddrManImpl marks the most recent failed attempt and bumps info.nAttempts.
The latter suggests that we should set this to false if we’r trying a v2 -> v1 downgrade, otherwise we’d be counting double. Even better: we should not count the failed v2 attempt if we intend to try again with v1.
Nice catch. I was clearly confused when writing this.
The situation is even simpler, because the fCountFailure argument to OpenNetworkConnection is only related to connection failure (i.e., TCP/IP failing to establish a connection) and not related to failures that happen after that point (perhaps it should, but that seems out of scope for this PR). We only ever attempt a V1 reconnection if the establishment of the V2 connection succeeded before that point, so by the time we reconnect, we already know the peer can be reached (in a TCP/IP sense), and there is no need to keep track of failure counting after that.
I’ve changed it to set fCountFailure always to false on reconnect.
6a2dbed1209b18d923da64b019f6f378d3c30695: The word count is ambigous here, because AddrManImpl literally counts failures, but I think the meaning here is “it counts / we care about it”.
See also my above comment. Depending how this is intended to work, maybe I can think of a better name.
2e0f8a6d39a15ecac60b9d463606e16ba5707bb5: I’m confused by this change. I see !fgrant calls the bool() operator which checks fHaveGrant. But if that returns false we now break out of the for loop. Wheveras we used to call TryAcquire() and continue if that worked.
Oh I see. The CSemaphoreGrant grant(*semAddnode) constructor before the loop doesn’t fTry, but uses Acquire(), which waits (unchanged behavior) and always sets grant to true.
At the end of the first iteration we set fTry, so we either get a grant for the next iteration or we don’t.
The first time we don’t get a grant we bail out of the loop. This is effectively the same as before, it’s just that we did TryAquire() at the start of each iteration (which also always would have succeed in the first iteration).
Sjors
commented at 4:46 pm on October 2, 2023:
member
tACK7b255b56698392019dd575c3bb73f6d8b1334f4b
DrahtBot requested review from vasild
on Oct 2, 2023
DrahtBot removed the label
CI failed
on Oct 2, 2023
in
doc/bips.md:52
in
7b255b5669outdated
48@@ -49,6 +49,7 @@ BIPs that are implemented by Bitcoin Core:
49 * [`BIP 173`](https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki): Bech32 addresses for native Segregated Witness outputs are supported as of **v0.16.0** ([PR 11167](https://github.com/bitcoin/bitcoin/pull/11167)). Bech32 addresses are generated by default as of **v0.20.0** ([PR 16884](https://github.com/bitcoin/bitcoin/pull/16884)).
50 * [`BIP 174`](https://github.com/bitcoin/bips/blob/master/bip-0174.mediawiki): RPCs to operate on Partially Signed Bitcoin Transactions (PSBT) are present as of **v0.17.0** ([PR 13557](https://github.com/bitcoin/bitcoin/pull/13557)).
51 * [`BIP 176`](https://github.com/bitcoin/bips/blob/master/bip-0176.mediawiki): Bits Denomination [QT only] is supported as of **v0.16.0** ([PR 12035](https://github.com/bitcoin/bitcoin/pull/12035)).
52+* [`BIP 324`](https://github.com/bitcoin/bips/blob/master/bip-0324.mediawiki): Support for the BIP324 v2 transport protocol and the associated `NODE_P2P_V2` service bit is supported as of **v26.0**, but off by default ([PR 28331](https://github.com/bitcoin/bitcoin/pull/28331)).
net: reconnect with V1Transport under certain conditions
When an outbound v2 connection is disconnected without receiving anything, but at
least 24 bytes of our pubkey were sent out (enough to constitute an invalid v1
header), add them to a queue of reconnections to be tried.
The reconnections are in a queue rather than performed immediately, because we should
not block the socket handler thread with connection creation (a blocking operation
that can take multiple seconds).
432a62c4dc
net: expose transport types/session IDs of connections in RPC and logs
0diff --git a/doc/bips.md b/doc/bips.md
1index 87b5918c72..952d289daa 100644
2--- a/doc/bips.md
3+++ b/doc/bips.md
4@@ -49,7 +49,7 @@ BIPs that are implemented by Bitcoin Core:
5 * [`BIP 173`](https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki): Bech32 addresses for native Segregated Witness outputs are supported as of **v0.16.0** ([PR 11167](https://github.com/bitcoin/bitcoin/pull/11167)). Bech32 addresses are generated by default as of **v0.20.0** ([PR 16884](https://github.com/bitcoin/bitcoin/pull/16884)).
6 * [`BIP 174`](https://github.com/bitcoin/bips/blob/master/bip-0174.mediawiki): RPCs to operate on Partially Signed Bitcoin Transactions (PSBT) are present as of **v0.17.0** ([PR 13557](https://github.com/bitcoin/bitcoin/pull/13557)).
7 * [`BIP 176`](https://github.com/bitcoin/bips/blob/master/bip-0176.mediawiki): Bits Denomination [QT only] is supported as of **v0.16.0** ([PR 12035](https://github.com/bitcoin/bitcoin/pull/12035)).
8-* [`BIP 324`](https://github.com/bitcoin/bips/blob/master/bip-0324.mediawiki): Support for the BIP324 v2 transport protocol and the associated `NODE_P2P_V2` service bit is supported as of **v26.0**, but off by default ([PR 28331](https://github.com/bitcoin/bitcoin/pull/28331)).
9+* [`BIP 324`](https://github.com/bitcoin/bips/blob/master/bip-0324.mediawiki): The v2 transport protocol specified by BIP324 and the associated `NODE_P2P_V2` service bit are supported as of **v26.0**, but off by default ([PR 28331](https://github.com/bitcoin/bitcoin/pull/28331)).
10 * [`BIP 325`](https://github.com/bitcoin/bips/blob/master/bip-0325.mediawiki): Signet test network is supported as of **v0.21.0** ([PR 18267](https://github.com/bitcoin/bitcoin/pull/18267)).
11 * [`BIP 339`](https://github.com/bitcoin/bips/blob/master/bip-0339.mediawiki): Relay of transactions by wtxid is supported as of **v0.21.0** ([PR 18044](https://github.com/bitcoin/bitcoin/pull/18044)).
12 * [`BIP 340`](https://github.com/bitcoin/bips/blob/master/bip-0340.mediawiki)
13diff --git a/src/net.cpp b/src/net.cpp
14index c4c2d55c77..6b2ef5f43d 100644
15--- a/src/net.cpp
16+++ b/src/net.cpp
17@@ -469,7 +469,7 @@ CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCo
18 const std::vector<CService> resolved{Lookup(pszDest, default_port, fNameLookup && !HaveNameProxy(), 256)};
19 if (!resolved.empty()) {
20 const CService& rnd{resolved[GetRand(resolved.size())]};
21- addrConnect = CAddress{MaybeFlipIPv6toCJDNS(rnd), addrConnect.nServices};
22+ addrConnect = CAddress{MaybeFlipIPv6toCJDNS(rnd), NODE_NONE};
23 if (!addrConnect.IsValid()) {
24 LogPrint(BCLog::NET, "Resolver returned invalid address %s for %s\n", addrConnect.ToStringAddrPort(), pszDest);
25 return nullptr;
theStack approved
theStack
commented at 11:28 pm on October 2, 2023:
contributor
re-ACK75a329103505736acb9036224da2dfa8ab038c43
DrahtBot requested review from Sjors
on Oct 2, 2023
DrahtBot requested review from mzumsande
on Oct 2, 2023
mzumsande
commented at 11:59 pm on October 2, 2023:
contributor
re-ACK75a329103505736acb9036224da2dfa8ab038c43
DrahtBot removed review request from mzumsande
on Oct 2, 2023
Sjors
commented at 9:12 am on October 3, 2023:
member
re-ACK75a329103505736acb9036224da2dfa8ab038c43
DrahtBot removed review request from Sjors
on Oct 3, 2023
fanquake merged this
on Oct 3, 2023
fanquake closed this
on Oct 3, 2023
vasild
commented at 9:36 am on October 3, 2023:
contributor
ACK75a329103505736acb9036224da2dfa8ab038c43
rsantacroce
commented at 8:13 am on October 4, 2023:
none
any clear reason why noise protocol hasn’t been considered here ?
Sjors
commented at 8:22 am on October 4, 2023:
member
rsantacroce
commented at 8:35 am on October 4, 2023:
none
Thank you, i missed that one! very good explanation now it’s the other way around why we don’t use it on stratumv2 …. this is for the other repo. thank you and thank you for the great work.
sipa added the label
Needs release note
on Oct 4, 2023
hebasto referenced this in commit
0e3de3b83e
on Oct 5, 2023
Frank-GER referenced this in commit
95e4527ea0
on Oct 13, 2023
in
test/functional/p2p_v2_transport.py:45
in
05d19fbcc1outdated
47+ self.connect_nodes(0, 1, peer_advertises_v2=True)
48+ self.generate(self.nodes[0], 5, sync_fun=lambda: self.sync_all(self.nodes[0:2]))
49+ assert_equal(self.nodes[1].getblockcount(), 5)
50+ # verify there is a v2 connection between node 0 and 1
51+ node_0_info = self.nodes[0].getpeerinfo()
52+ node_1_info = self.nodes[0].getpeerinfo()
kashifs
commented at 4:04 pm on November 10, 2023:
@sipa, am I missing something or should this line read:
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me