net: transport abstraction

sipa commented at 8:04 pm on July 26, 2023: member

This PR furthers the P2P message serialization/deserialization abstraction introduced in #16202 and #16562, in preparation for introducing the BIP324 v2 transport (making this part of #27634). However, nothing in this PR is BIP324-specific, and it contains a number of independently useful improvements.

The overall idea is to have a single object in every CNode (called m_transport) that is responsible for converting sent messages to wire bytes, and for converting received wire bytes back to messages, while having as little as possible knowledge about this conversion process in higher-level net code. To accomplish that, there is an abstract Transport class with (currently) a single V1Transport implementation.

Structurally, the above is accomplished by:

Merging the TransportDeserializer and TransportSerializer classes into a single Transport class, which encompasses both the sending and receiving side. For V1Transport these two sides are entirely separate, but this assumption doesn’t hold for the BIP324 transport where e.g. the sending encryption key depends on the DH key negotiation data received from the other side. Merging the two means a future V2Transport can handle all this interaction without callers needing to be aware.
Removing the assumption that each message is sent using a computed header followed by (unmodified) data bytes. To achieve that, the sending side of Transport mirrors what the receiver side does: callers can set a message to be sent, then ask what bytes must be sent out, and then allowing them to transition to the next message.
Adding internal locks to protect the sending and receiving state of the V1Transport implementation. I believe these aren’t strictly needed (opinions welcome) as there is no real way to use Transport objects in a multi-threaded fashion without some form of external synchronization (e.g. “get next bytes to send” isn’t meaningful to call from multiple threads at the same time without mechanism to control the order they’ll actually get sent). Still, I feel it’s cleaner to make the object responsible for its own consistency (as we definitely do not want the entire object to be under a single external GUARDED_BY, as that’d prevent simultaneous sending and receiving).
Moving the conversion of messages to bytes on the sending side from PushMessage to SocketSendData, which is needed to deal with the fact that a transport may not immediately be able to send messages.

This PR is not a refactor, though some commits are. Among the semantic changes are:

Changing the send buffer pushback mechanism to trigger based on the memory usage of the buffer rather than the amount of bytes to be sent. This is both closer to the desired behavior, and makes the buffering independent from transport details (which is why it’s included here).
When optimistic send is not applicable, the V1 message checksum calculation now runs in the net thread rather than the message handling thread. I believe that’s generally an improvement, as the message handling thread is far more computationally bottlenecked already.
The checksum calculation now runs under the CNode::cs_vSend lock, which does mean no two checksum calculations for messages sent to the same node can run in parallel, even if running in separate threads. Despite that limitation, having the checksum for non-optimistic sends moved in the net thread is still an improvement, I believe.
Statistics for per-message-type sent bytes are now updated when the bytes are actually handed to the OS rather than in PushMessage. This is because the actual serialized sizes aren’t known until they’ve gone through the transport object.

A fuzz test of the entire V1Transport is included. More elaborate rationale for each of the changes can be found in the commit messages.

DrahtBot commented at 8:04 pm on July 26, 2023: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	theStack, vasild, dergoegge
Concept ACK	jonatack
Stale ACK	vincenzopalazzo, Sjors, mzumsande

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

DrahtBot added the label P2P on Jul 26, 2023

sipa force-pushed on Jul 29, 2023

dergoegge commented at 10:41 am on July 31, 2023: member

Concept ACK

Merging the two means a future V2Transport can handle all this interaction without callers needing to be aware.

🚀

sipa force-pushed on Jul 31, 2023

jonatack commented at 9:04 pm on August 2, 2023: member

Concept ACK

in src/net.h:351 in c7720844a4 outdated

306@@ -306,29 +307,38 @@ class V1Transport final : public Transport
307         hasher.Reset();
308     }
309 
310+    bool CompleteInternal() const noexcept EXCLUSIVE_LOCKS_REQUIRED(m_cs_recv)

Sjors commented at 11:03 am on August 3, 2023:

c7720844a4357aa497362fd5b481bc1a9c27687d: I assume this separation of CompleteInternal() is because EXCLUSIVE_LOCKS_REQUIRED is only set for V1Transport rather than on Transport?

sipa commented at 7:08 pm on August 3, 2023:

It’s because I don’t want a recursive lock, and I’m introducing a caller of CompleteInternal() that already holds m_cs_recv.

in src/net.cpp:787 in f97adbf3e9 outdated

782@@ -783,6 +783,8 @@ CNetMessage V1Transport::GetMessage(const std::chrono::microseconds time, bool&
783 {
784     // Initialize out parameter
785     reject_message = false;
786+
787+    LOCK(m_cs_recv);

Sjors commented at 11:48 am on August 3, 2023:

f97adbf3e93e89b1e8ce1dc212e84ac6b2879463 this LOCK should have been moved in the previous commit. Otherwise it gives a thread safety warning.

sipa commented at 10:29 pm on August 15, 2023:

Done.

in src/net.h:279 in f97adbf3e9 outdated

278+    // 2. Sending side functions, for serializing messages into bytes to be sent over the wire.
279+    // Callers must guarantee that none of these functions are called concurrently w.r.t. one
280+    // another.
281+
282+    /** Whether the last provided message has been sent, and a new one can be provided. */
283+    virtual bool DoneSendingMessage() const noexcept = 0;

Sjors commented at 1:25 pm on August 3, 2023:

f97adbf3e93e89b1e8ce1dc212e84ac6b2879463: NoPendingSend() ? That makes it more clear this method doesn’t mark something done. Also it applies before the first message too.

sipa commented at 8:15 pm on August 3, 2023:

Actually, that name would be confusing for V2, where it’s possible that there is no pending message, but also no message can be provided (yet) because the handshake has not completed.

Perhaps CanSetNewMessageToSend() is better? (let the bikesheddening begin!)

theStack commented at 0:08 am on August 8, 2023:

FWIW, I agree that the Done prefix of the method name is quite confusing, for both the reasons Sjors lined out. CanSetNewMessageToSend() sounds much better to me (maybe even just CanSetMessageToSend, to have the full name of the method included which can be called if true is returned?).

sipa commented at 10:30 pm on August 15, 2023:

I’ve made some significant changes to the code, and DoneSendingMessage is now gone: instead the caller just tries SetMessageToSend, and it’ll fail if the message can’t be set. This avoids a virtual function call in some cases, doing both at once.

in src/net.cpp:2919 in f97adbf3e9 outdated

2922         bool optimisticSend(pnode->vSendMsg.empty());
2923 
2924-        //log total amount of bytes per message type
2925-        pnode->AccountForSentBytes(msg.m_type, nTotalSize);
2926-        pnode->nSendSize += nTotalSize;
2927+        assert(pnode->m_transport->DoneSendingMessage());

Sjors commented at 2:11 pm on August 3, 2023:

f97adbf3e93e89b1e8ce1dc212e84ac6b2879463: this new assert and the one below had me a bit worried, but they disappear in 68e48a0185751d24eecb194b8efd7028c8b590f3.

I think they’re ok here though:

The very first message it’s trivially clear this won’t be a problem: m_sending_header starts out false, m_bytes_sent at 0 and m_message_to_send.data is initialised empty.
MarkBytesSent sets these things back when (exactly) all bytes have been put in the send buffer (vSendMsg). In v1 this is always the case after the second (header only) or third call to GetBytesToSend(), after which we leave the while loop.

sipa commented at 10:30 pm on August 15, 2023:

DoneSendingMessage is gone.

in src/net.cpp:2922 in f97adbf3e9 outdated

2925-        pnode->AccountForSentBytes(msg.m_type, nTotalSize);
2926-        pnode->nSendSize += nTotalSize;
2927+        assert(pnode->m_transport->DoneSendingMessage());
2928+        pnode->m_transport->SetMessageToSend(std::move(msg));
2929+
2930+        while (true) {

Sjors commented at 2:26 pm on August 3, 2023:

f97adbf3e93e89b1e8ce1dc212e84ac6b2879463: could add a comment here:

0// In v1 transport GetBytesToSend first returns a header and next the data (if any).

sipa commented at 10:30 pm on August 15, 2023:

I’ve added a comment to this effect.

in src/net.cpp:2924 in f97adbf3e9 outdated

2927+        assert(pnode->m_transport->DoneSendingMessage());
2928+        pnode->m_transport->SetMessageToSend(std::move(msg));
2929+
2930+        while (true) {
2931+            const auto& [bytes, more, msg_type] = pnode->m_transport->GetBytesToSend();
2932+            if (bytes.empty()) break;

Sjors commented at 2:32 pm on August 3, 2023:

f97adbf3e93e89b1e8ce1dc212e84ac6b2879463: Since we’re notifying two different things about these sent bytes - and also setting nSendSize - I suggested some extra comments…

0// Update statistics per message type

sipa commented at 10:30 pm on August 15, 2023:

Done.

in src/net.cpp:2928 in f97adbf3e9 outdated

2931+            const auto& [bytes, more, msg_type] = pnode->m_transport->GetBytesToSend();
2932+            if (bytes.empty()) break;
2933+            pnode->AccountForSentBytes(msg_type, bytes.size());
2934+            pnode->nSendSize += bytes.size();
2935+            if (pnode->nSendSize > nSendBufferMaxSize) pnode->fPauseSend = true;
2936+            pnode->vSendMsg.push_back({bytes.begin(), bytes.end()});

Sjors commented at 2:33 pm on August 3, 2023:

f97adbf3e93e89b1e8ce1dc212e84ac6b2879463

0// Notify Transport that bytes have been processed

sipa commented at 10:30 pm on August 15, 2023:

Done.

in src/net.cpp:2925 in f97adbf3e9 outdated

2928+        pnode->m_transport->SetMessageToSend(std::move(msg));
2929+
2930+        while (true) {
2931+            const auto& [bytes, more, msg_type] = pnode->m_transport->GetBytesToSend();
2932+            if (bytes.empty()) break;
2933+            pnode->AccountForSentBytes(msg_type, bytes.size());

Sjors commented at 2:36 pm on August 3, 2023:

f97adbf3e93e89b1e8ce1dc212e84ac6b2879463

0// Update bytes in send buffer

(becomes “Update memory use of send buffer” in the next commit)

sipa commented at 10:31 pm on August 15, 2023:

Done (and also updated the one in the next commit).

in src/net.h:331 in f97adbf3e9 outdated

292+     */
293+    using BytesToSend = std::tuple<Span<const uint8_t>, bool, const std::string&>;
294+    /** Get bytes to send on the wire. */
295+    virtual BytesToSend GetBytesToSend() const noexcept = 0;
296+    /** Report how many bytes returned by GetBytesToSend() have been sent. No effect if 0. */
297+    virtual void MarkBytesSent(size_t bytes_sent) noexcept = 0;

Sjors commented at 2:38 pm on August 3, 2023:

f97adbf3e93e89b1e8ce1dc212e84ac6b2879463: could we just assume bytes will get sent this if GetBytesToSend() has been called? Afaik there’s no way to handle a failure anyway. But I guess it’s safer to track it explicitly.

Update: in 68e48a0185751d24eecb194b8efd7028c8b590f3 this function becomes more important, and is used to track when not all bytes are sent.

sipa commented at 7:29 pm on August 3, 2023:

In general, it’s true that the new sending interface is overly complicated for how it is used initially, and then it only becomes apparent in the next commit.

Here specifically, indeed, everything returned by GetBytesToSend is initially always marked sent, but in the next commit that changes, as the buffering responsibility is moved from vSendMsg to m_transport. I’m happy to add (possibly temporary) comments to explain what’s happening.

sipa commented at 10:31 pm on August 15, 2023:

I’ve added some more comments in various places hopefully addresses this complexity, foreshadowing that some added code will be removed in a future commit.

in src/net.cpp:832 in f97adbf3e9 outdated

833@@ -827,8 +834,46 @@ void V1Transport::prepareForTransport(CSerializedNetMsg& msg, std::vector<unsign
834     memcpy(hdr.pchChecksum, hash.begin(), CMessageHeader::CHECKSUM_SIZE);
835 
836     // serialize header
837-    header.reserve(CMessageHeader::HEADER_SIZE);

Sjors commented at 2:49 pm on August 3, 2023:

f97adbf3e93e89b1e8ce1dc212e84ac6b2879463 this reserve wasn’t useful?

vincenzopalazzo commented at 9:20 pm on August 12, 2023:

Mh looks like that header is no longer used?

sipa commented at 3:49 am on August 14, 2023:

The header argument no longer exists (as the caller doesn’t have any notion of headers anymore, that’s local to the transport implementation). Instead there is an m_header_to_send variable, but it’s reused across messages, so (repeated) reserving makes no sense.

in src/net.cpp:3014 in 1937a5fcf7 outdated

2944-            if (pnode->nSendSize > nSendBufferMaxSize) pnode->fPauseSend = true;
2945             pnode->vSendMsg.push_back({bytes.begin(), bytes.end()});
2946+            pnode->m_send_memusage += sizeof(pnode->vSendMsg.back()) + memusage::DynamicUsage(pnode->vSendMsg.back());
2947             pnode->m_transport->MarkBytesSent(bytes.size());
2948         }
2949+        if (pnode->m_send_memusage + pnode->m_transport->GetSendMemoryUsage() > nSendBufferMaxSize) pnode->fPauseSend = true;

Sjors commented at 3:13 pm on August 3, 2023:

1937a5fcf795149c44b7f4f016c05000ac3adaf9 Isn’t >m_transport->GetSendMemoryUsage() 0 since we just sent the message and cleared it?

(but that changes in the next commit 68e48a0185751d24eecb194b8efd7028c8b590f3)

sipa commented at 7:03 pm on August 3, 2023:

Yeah, it should be 0 at this point, but that won’t remain with the next commit. A temporary explanation message could be added to explain that here.

sipa commented at 10:32 pm on August 15, 2023:

I’ve added a comment to explain this better.

in src/net.cpp:3013 in 68e48a0185 outdated

2948-            pnode->AccountForSentBytes(msg_type, bytes.size());
2949-            pnode->vSendMsg.push_back({bytes.begin(), bytes.end()});
2950-            pnode->m_send_memusage += sizeof(pnode->vSendMsg.back()) + memusage::DynamicUsage(pnode->vSendMsg.back());
2951-            pnode->m_transport->MarkBytesSent(bytes.size());
2952-        }
2953+        pnode->m_send_memusage += msg.GetMemoryUsage();

Sjors commented at 4:03 pm on August 3, 2023:

68e48a0185751d24eecb194b8efd7028c8b590f3: this is much readable than the intermediate calculation introduced in 1937a5fcf795149c44b7f4f016c05000ac3adaf9. It might be worth moving that commit after here.

sipa commented at 8:00 pm on August 3, 2023:

I don’t reordering works without (even) more intermediary complexity. The issue is “net: move message serialization from PushMessage to SocketSendData” changes the data type of vSendMsg from bytes-to-be-sent to messages-to-be-sent, and the latter just don’t have a known size-on-the-wire (unless an additional API to transports is added for that, or hardcoding the V1 message encoding size rules). That’s why this PR first changes the notion of send buffer size: the old notion just doesn’t really make sense anymore after the buffer changes introduced here.

in src/net.cpp:2945 in 68e48a0185 outdated

2954+        pnode->vSendMsg.push_back(std::move(msg));
2955         if (pnode->m_send_memusage + pnode->m_transport->GetSendMemoryUsage() > nSendBufferMaxSize) pnode->fPauseSend = true;
2956 
2957-        assert(pnode->m_transport->DoneSendingMessage());
2958-
2959         // If write queue empty, attempt "optimistic write"

Sjors commented at 4:12 pm on August 3, 2023:

68e48a0185751d24eecb194b8efd7028c8b590f3: This could be a good time to document what optimistic send actually is. From 1817398b397afebcc857c40a16d201c84878cb89:

0// Because the poll/select loop may pause for 100msec before actually doing a
1// send, and we have no way to force the loop awake, try sending from the calling
2// thread if the queue is empty.

Or shorter: // which avoids a delay of up to SELECT_TIMEOUT_MILLISECONDS.

sipa commented at 10:32 pm on August 15, 2023:

Not done yet; will do on a future push.

sipa commented at 11:15 pm on August 16, 2023:

Done.

in src/net.cpp:1305 in 68e48a0185 outdated

1300@@ -1298,7 +1301,9 @@ Sock::EventsPerSock CConnman::GenerateWaitSockets(Span<CNode* const> nodes)
1301         bool select_send;
1302         {
1303             LOCK(pnode->cs_vSend);
1304-            select_send = !pnode->vSendMsg.empty();
1305+            // This relies on optimistic send to make sure the transport always has a message to
1306+            // send if there are any.

Sjors commented at 4:23 pm on August 3, 2023:

68e48a0185751d24eecb194b8efd7028c8b590f3 Is the above comment still correct? “As this only happens when optimistic write failed”

sipa commented at 10:32 pm on August 15, 2023:

The comment is gone (due to significant changes; the code also works differently now).

in src/net.cpp:938 in 68e48a0185 outdated

912+        const auto& [data, more, msg_type] = node.m_transport->GetBytesToSend();
913         int nBytes = 0;
914-        {
915+        if (!data.empty()) {
916             LOCK(node.m_sock_mutex);
917             if (!node.m_sock) {

Sjors commented at 4:41 pm on August 3, 2023:

68e48a0185751d24eecb194b8efd7028c8b590f3

0// Leave message in the transport for when the socket is available.
1// v2 transport would also require waiting for the handshake to complete

With v2 I assume we can’t even call GetBytesToSend before the handshake?

sipa commented at 7:02 pm on August 3, 2023:

We can - In fact the handshake itself is sent this way (that’s the nice part about this abstraction, the caller doesn’t know or care whether bytes being sent are on behalf of a message we’re trying to send or something else).

Sjors commented at 9:22 am on August 4, 2023:

I guess it’s not entirely clear to me whose responsibility it is to ensure the handshake has been done: #28165 (review) (I assume it will be in the main PR)

sipa commented at 12:50 pm on August 4, 2023:

My view is that anything that is different between v1 and v2 connections ought to be handled by the respective transport class.

In V2 there are multiple stages a connection goes through (pubkey, garbage+terminator, garbage authentication packet, version negotiation packet, and finally application data during which bitcoin P2P messages can be sent); this will be implemented using a finite state machine on sender and receiver side to control what state we are in. Only during the last phase (application data) can SetMessageToSend be called.

GetBytesToSend can always be called. If there is nothing to send, it’ll return empty. If there is, (at least) some part of it will be returned.

For V1, GetBytesToSend() is non-empty whenever there is a header or a payload to be sent.
For V2, GetBytesToSend() is non-empty whenever there are handshake bytes (pubkey, garbage, garbage auth, version packet) or an application packet (encoding a message) to be sent.

sipa commented at 10:33 pm on August 15, 2023:

I haven’t added a comment here as I’m not entirely sure under what scenarios if (!node.m_sock) triggers; it’s code that existed beforehand, and is untouched by this PR.

Sjors commented at 2:57 pm on August 17, 2023:

I think !node.m_sock can only happen when when we’ve disconnected in another thread, using CloseSocketDisconnect() (and during tests). That’s also what the documentation of m_sock_mutex says. cc @vasild

sipa commented at 7:04 pm on August 17, 2023:

Right, so I think !node.m_sock only happens when the node is already disconnected, or will never actually be connected to anything (in tests). I’ve added the following comment:

There is no socket in case we’ve already disconnected, or in test cases without real connections. In these cases, we bail out immediately and just leave things in the send queue and transport.

Further up, above the SetMessageToSend call, I’ve added:

This fails when there is an existing message still being sent.

which in #28196 is extended to say:

This fails when there is an existing message still being sent, or (for v2 transports) when the handshake has not yet completed.

Sjors commented at 5:51 pm on August 18, 2023:

8d1a91f6394f00b880613268a9f2f3164ef914e2: technically have_unsent_data should be set to true here, but since we just disconnected it doesn’t matter.

vasild commented at 7:30 am on August 22, 2023:

Right, if CNode::m_sock is empty shared_ptr, that means CNode::CloseSocketDisconnect() has been called and this is a one-way street, it is not coming back, so all communication structs and data for this peer can be dropped.

Or, during tests an empty shared_ptr may be supplied to the CNode constructor. My understanding is that in those tests the code that uses the socket is not supposed to be reached/executed. Other tests provide a mocked socket, that returns either fuzzed data or static/hardcoded contents upon read in order to entertain the code that receives from the socket.

Sjors commented at 5:07 pm on August 3, 2023: member

Concept ACK

Mostly happy with b7a7ed70d06ab3f994ff58e3a0c99105ee88ab6e. Minor inline suggestions and/or making sure I understand what the code is doing.

I also checked that all intermediate commits compile and ran the new fuzzer for several hours (but didn’t study it very carefully).

1937a5fcf795149c44b7f4f016c05000ac3adaf9: -maxsendbuffer help could be changed to say “Maximum per-connection memory use for the send buffer”
68e48a0185751d24eecb194b8efd7028c8b590f3: can you elaborate a bit in the commit description why nSendOffset can be dropped?

sipa commented at 9:09 pm on August 3, 2023: member

https://github.com/bitcoin/bitcoin/commit/68e48a0185751d24eecb194b8efd7028c8b590f3: can you elaborate a bit in the commit description why nSendOffset can be dropped?

In the current codebase, the send buffering (= remembering the to-be-sent bytes which we haven’t managed to send yet) is done using vSendMsg (a queue of byte arrays) + nSendOffset (the position within vSendMsg[0] up to where we’ve sent things). After this PR, vSendMsg is turned into a queue of messages, which have not yet been converted to bytes-on-the-wire; this conversion is now handled by m_transport, and it’s the transport that remembers what/how much has been sent yet.

theStack commented at 0:09 am on August 8, 2023: contributor

Concept ACK

Slowly working my way through, reviewed up to f97adbf3e93e89b1e8ce1dc212e84ac6b2879463 (commit 3/8), looks good so far. One potential code deduplication nit (though I’m not sure if it gains that much in readability): seems like for V1Transport, HaveBytesToSend() and DoneSendingMessage() are in an inverse relation (or however one would call that), so one could be expressed through the other, e.g.:

 0index 887669e32b..af0fa5c603 100644
 1--- a/src/net.cpp
 2+++ b/src/net.cpp
 3@@ -863,8 +863,7 @@ void V1Transport::SetMessageToSend(CSerializedNetMsg&& msg) noexcept
 4 
 5 bool V1Transport::HaveBytesToSend() const noexcept
 6 {
 7-    LOCK(m_cs_send);
 8-    return m_sending_header || m_bytes_sent != m_message_to_send.data.size();
 9+    return !DoneSendingMessage();
10 }
11 
12 Transport::BytesToSend V1Transport::GetBytesToSend() const noexcept

in src/net.h:266 in 8f5c65a464 outdated

266+    // 1. Receiver side functions, for deserializing the network buffer into a transport protocol
267+    // agnostic CNetMessage (message type & payload). Callers must guarantee that none of these
268+    // functions are called concurrently w.r.t. one another.
269+
270     // returns true if the current deserialization is complete
271     virtual bool Complete() const = 0;

mzumsande commented at 12:59 pm on August 11, 2023:

commit 8f5c65a464b7ed47939a055a4b65286d20a5b126: Could rename to Complete() to DeserComplete(), DoneReceivingMessage() or somthing similar to make it clearer that this function refers to deserialization side, now that both direction are in one class.

sipa commented at 10:34 pm on August 15, 2023:

I’ve added a commit that renames all receive-side functions to have “Receive” somewhere in the name.

vincenzopalazzo commented at 9:25 pm on August 12, 2023: none

Concept ACK till https://github.com/bitcoin/bitcoin/pull/28165/commits/1937a5fcf795149c44b7f4f016c05000ac3adaf9

I will continue later on it

in src/test/denialofservice_tests.cpp:90 in 68e48a0185 outdated

86@@ -87,8 +87,15 @@ BOOST_AUTO_TEST_CASE(outbound_slow_chain_eviction)
87 
88     {
89         LOCK(dummyNode1.cs_vSend);
90-        BOOST_CHECK(dummyNode1.vSendMsg.size() > 0);
91+        BOOST_CHECK(!dummyNode1.vSendMsg.empty() || dummyNode1.m_transport->HaveBytesToSend());

mzumsande commented at 5:12 pm on August 14, 2023:

Why the second condition within the context of this unit test - It’s not like it could have sent the message partially before? (same for the other Check below)

sipa commented at 10:35 pm on August 15, 2023:

It can be (and is), due to optimistic send logic triggering SocketSendData, which now moves vSendMsg objects into the transport.

sipa commented at 11:16 pm on August 16, 2023:

Actually just the second condition suffices (the entire expected message is moved to the transport). I’ve simplified the code accordingly.

in src/test/util/net.cpp:76 in 68e48a0185 outdated

71@@ -72,6 +72,12 @@ void ConnmanTestMsg::NodeReceiveMsgBytes(CNode& node, Span<const uint8_t> msg_by
72 
73 bool ConnmanTestMsg::ReceiveMsgFrom(CNode& node, CSerializedNetMsg&& ser_msg) const
74 {
75+    /* Flush out any unsent bytes from previous messages. */
76+    while (node.m_transport->HaveBytesToSend()) {

mzumsande commented at 6:15 pm on August 14, 2023:

It seems confusing and not ideal that a function called ReceiveMsgFrom has so much code dealing with the Send part, and also the side effect of flushing the send message buffer, when the goal is just to create a header for ser_msg to be able to receive that, but not to send anything. For example, it should be possible to call ReceiveMsgFrom in situations where the send buffer has unrelated contents. Maybe it’d be better to just have the relevant code (i.e. the old prepareForTransport duplicated here to extract the header instead of changing the send parts of m_transport?)

sipa commented at 10:41 pm on August 15, 2023:

It’s complicated. You’re right that duplicating the old prepareForTransport here would work, but I feel that’s not really the right approach, as it’s kind of hiding what’s really going on. It’d also be incompatible with a potential future upgrade to using v2 transports inside the affected unit and fuzz tests.

What’s really going on is that we’re using the CNode’s own sending infrastructure to construct bytes, which are then fed to the same CNode’s receive infrastructure. But there is also “normal” non-test sending logic (e.g. sending of version/verack automatically) that uses the same sending infrastucture (however, failing as there is no real socket to send anything on). The flushing here was necessary to wipe the non-test messages that enter the same transport.

I have for now tried to address your concern here by introducing a more explicit FlushSendBuffer call, and adding it where necessary to make tests pass, making it clear where and what is being flushed away, however I think the more proper solution would involve:

Somehow prevent the sending of non-test messages entirely in test connmans (rather that hacking them away after the fact).
Give the test connman dedicated per-receive-node send transports which are just used for the test messages sent there. That would make it compatible with v2 operation too.

Doing those things here feel out of scope, though.

mzumsande commented at 6:19 pm on August 14, 2023: contributor

Concept ACK

sipa force-pushed on Aug 15, 2023

sipa commented at 10:47 pm on August 15, 2023: member

I’ve made some significant changes to this PR:

DoneSendingMessage is gone (the caller can just call SetMessageToSend directly, which will fail if nothing can be sent at that time).
HaveBytesToSend is gone (the caller can just call GetBytesToSend, which will report an empty span if nothing is to be sent).
GetBytesToSend now takes a have_next_message as input, which lets its prediction for whether there are more bytes to send afterwards be more accurate (letting it take into account a future SetMessageToSend). This prediction is also tested now, through an Assume in the net code directly, and inside the transport simulation fuzz test.
A lot more comments and documentation.
More sanity checks in the added fuzz test.

DrahtBot added the label CI failed on Aug 16, 2023

in src/net.h:306 in 384c798cc3 outdated

304+     *
305+     * The bytes returned by this function act as a stream which can only be appended to. This
306+     * means that with the exception of MarkBytesSent, operations on the transport can only append
307+     * to what is being returned.
308+     */
309+    virtual BytesToSend GetBytesToSend(bool have_next_message) const noexcept = 0;

Sjors commented at 9:40 am on August 16, 2023:

384c798cc3836558463e88ec7f563b236f50bf22

The have_next_message argument is confusing and seems contradictory to the goal of separating the bytes stream from messages.

It is set by the caller when they know they have * another message ready to send.

In this commit it’s always false, so if we need it at all, maybe delay its introduction to 09972b518bed5809d37c79fb0ddef034dce27ba0? That’s a better place to explain how it relates to setting Sock::SEND (if it can’t be in its own commit).

sipa commented at 11:17 pm on August 16, 2023:

I’ve moved the introduction of have_next_message to (a separate commit in) #28196, as it’s not really necessary in this PR yet.

in src/net.h:292 in 384c798cc3 outdated

287+     *  - Span<const uint8_t> to_send: span of bytes to be sent over the wire (possibly empty).
288+     *  - bool more: whether there will be more bytes to be sent after the ones in to_send are
289+     *    all sent (as signaled by MarkBytesSent()).
290+     *  - const std::string& m_type: message type on behalf of which this is being sent.
291+     */
292+    using BytesToSend = std::tuple<Span<const uint8_t>, bool, const std::string&>;

Sjors commented at 9:45 am on August 16, 2023:

384c798cc3836558463e88ec7f563b236f50bf22 could we annotate the tuple?

0std::tuple<Span<const uint8_t> /*to_send*/, bool /*more*/, const std::string& /* m_type */>

sipa commented at 11:22 pm on August 16, 2023:

Done.

Sjors commented at 12:59 pm on August 16, 2023: member

The new approach in 384c798cc3836558463e88ec7f563b236f50bf22 is a nice improvement.

I’ll continue reviewing from 09972b518bed5809d37c79fb0ddef034dce27ba0 onwards after you’ve had a chance to comment on my more comment.

sipa force-pushed on Aug 16, 2023

DrahtBot removed the label CI failed on Aug 17, 2023

in src/net.cpp:3010 in 58cb50cdc6 outdated

3028-        // At this point, m_transport->GetSendMemoryUsage() isn't very interesting as the
3029-        // transport's message is fully flushed (and converted to byte arrays). It's still included
3030-        // here for correctness, and will become relevant in a future commit when a queued message
3031-        // inside the transport may survive PushMessage calls.
3032+        const auto& [to_send, _more, _msg_type] = pnode->m_transport->GetBytesToSend();
3033+        const bool queue_was_empty{to_send.empty() && pnode->vSendMsg.empty()};

Sjors commented at 1:32 pm on August 17, 2023:

58cb50cdc60df5d7e42466213f213cf7ab54db45: IIUC in v1 && pnode->vSendMsg.empty() is always true when GetBytesToSend() just told us there’s nothing to send? But in v2 we might be waiting for a handshake.

sipa commented at 6:32 pm on August 17, 2023:

I believe that’s true, though only due to optimistic send itself (which guarantees that as soon as something is added to vSendMsg, it’s immediately at least attempted to be moved to the transport). Without it, if there were two consecutive PushMessage without intervening SocketSendData, vSendMsg would already have something in it on the second one.

Because of that, I think it’s more “obviously correct” to do the full check here.

DrahtBot added the label Needs rebase on Aug 17, 2023

Sjors approved

Sjors commented at 3:00 pm on August 17, 2023: member

ACK d9a91ba9606f828ef3846d2877ec56313653a109 just in time for the #27981 rebase :-(

sipa force-pushed on Aug 17, 2023

DrahtBot removed the label Needs rebase on Aug 17, 2023

sipa force-pushed on Aug 18, 2023

DrahtBot added the label CI failed on Aug 18, 2023

sipa force-pushed on Aug 18, 2023

Sjors commented at 2:02 pm on August 18, 2023: member

re-ACK 07cb6c4

The rebase on #27981 looks good to me. I didn’t look at the fuzzer changes. Otherwise it’s just some extra comments.

sipa force-pushed on Aug 18, 2023

sipa commented at 2:07 pm on August 18, 2023: member

Updates since a few days ago:

Moved the introduction of the have_next_message argument of GetBytesToSend, and testing thereof, to #28196.
Rebased after merge of #27981 (sorry @Sjors, the earlier rebase was incorrect, it only looked at vSendMsg.empty() for the returned data_left in SocketSendData()).
Added a few more comments.
Some performance improvements to the fuzz test.

Sjors commented at 4:01 pm on August 18, 2023: member

(sorry @Sjors, the earlier rebase was incorrect, it only looked at vSendMsg.empty() for the returned data_left in SocketSendData()).

Is there a test you can add that would catch this difference?

sipa commented at 4:40 pm on August 18, 2023: member

@Sjors I don’t think so, or not easily. Giving the wrong “data_left” result just causes the #27981 heuristic to be wrong, but outside of some edge cases, that’s unlikely to be noticeable in practice.

in src/net.cpp:918 in 8d1a91f639 outdated

905@@ -906,36 +906,50 @@ std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
906 {
907     auto it = node.vSendMsg.begin();
908     size_t nSentSize = 0;
909-
910-    while (it != node.vSendMsg.end()) {
911-        const auto& data = *it;
912-        assert(data.size() > node.nSendOffset);
913+    bool have_unsent_data{false};

Sjors commented at 5:38 pm on August 18, 2023:

8d1a91f6394f00b880613268a9f2f3164ef914e2: data_left is more consistent with the header

Sjors commented at 6:00 pm on August 18, 2023:

Or perhaps something like return {nSentSize, !message_queue_empty || !transport_empty};

sipa commented at 3:29 pm on August 21, 2023:

Will address when I retouch.

sipa commented at 7:33 pm on August 21, 2023:

Fixed.

in src/net.cpp:928 in 8d1a91f639 outdated

919+            size_t memusage = it->GetMemoryUsage();
920+            if (node.m_transport->SetMessageToSend(*it)) {
921+                // Update memory usage of send buffer (as *it will be deleted).
922+                node.m_send_memusage -= memusage;
923+                ++it;
924+            }

Sjors commented at 5:49 pm on August 18, 2023:

8d1a91f6394f00b880613268a9f2f3164ef914e2: IIUC even though !node.vSendMsg.empty(), we don’t want else { have_unsent_data = true } here because:

SetMessageToSend may return false for other reasons (though afaik it can’t at the moment)
If it does return false, then GetBytesToSend below populates data, tries to send it and then updates have_unsent_data appropriately.

Sjors approved

Sjors commented at 5:57 pm on August 18, 2023: member

re-ACK d99269058922af5fdc14d7fcc88f132060b4b57a

vincenzopalazzo approved

vincenzopalazzo commented at 5:58 pm on August 18, 2023: none

ACK https://github.com/bitcoin/bitcoin/pull/28165/commits/d99269058922af5fdc14d7fcc88f132060b4b57a

I like the abstraction of the Transport concept, it is close to a person’s mental model of this kind of system!

DrahtBot removed the label CI failed on Aug 18, 2023

in src/test/fuzz/p2p_transport_serialization.cpp:39 in d992690589 outdated

24@@ -24,9 +25,10 @@ void initialize_p2p_transport_serialization()
25 
26 FUZZ_TARGET(p2p_transport_serialization, .init = initialize_p2p_transport_serialization)
27 {
28-    // Construct deserializer, with a dummy NodeId
29-    V1TransportDeserializer deserializer{Params(), NodeId{0}, SER_NETWORK, INIT_PROTO_VERSION};
30-    V1TransportSerializer serializer{};
31+    // Construct transports for both sides, with dummy NodeIds.
32+    V1Transport recv_transport{NodeId{0}, SER_NETWORK, INIT_PROTO_VERSION};
33+    V1Transport send_transport{NodeId{1}, SER_NETWORK, INIT_PROTO_VERSION};

vasild commented at 11:10 am on August 19, 2023:

Is it not possible to use a single transport here?

sipa commented at 5:13 pm on August 21, 2023:

I believe so, though I think it’s cleaner to use two (the test isn’t actually trying to model a node talking to itself).

in src/net.h:326 in d992690589 outdated

349-    const uint256& GetMessageHash() const;
350-    int readHeader(Span<const uint8_t> msg_bytes);
351-    int readData(Span<const uint8_t> msg_bytes);
352-
353-    void Reset() {
354+    mutable Mutex m_cs_recv; //!< Lock for receive state

vasild commented at 11:26 am on August 19, 2023:

Aren’t mutexes supposed to be named like m_foo_mutex, rather than m_cs_foo in new code? I don’t think there is written convention, feel free to ignore.

sipa commented at 7:34 pm on August 21, 2023:

Ok, renamed to m_send_mutex and m_recv_mutex.

in src/net.h:378 in d992690589 outdated

385     }
386-    void SetVersion(int nVersionIn) override
387+
388+    void SetReceiveVersion(int nVersionIn) override EXCLUSIVE_LOCKS_REQUIRED(!m_cs_recv)
389     {
390+        LOCK(m_cs_recv);

vasild commented at 11:38 am on August 19, 2023:

According to developer-notes.md annotations in function declaration (EXCLUSIVE_LOCKS_REQUIRED(!m_cs_recv)) should be combined with run-time asserts in function definitions (AssertLockNotHeld(m_cs_recv)). So, this code should be:

0void SetReceiveVersion(int nVersionIn) override EXCLUSIVE_LOCKS_REQUIRED(!m_cs_recv)
1{
2    AssertLockNotHeld(m_cs_recv);
3    LOCK(m_cs_recv);

I opened #27116 to relax that since it is redundant.

sipa commented at 7:35 pm on August 21, 2023:

I’ve added AssertLockHeld and AssertLockNotHeld everywhere. It’s easy to delete some of them later if #27116 would get accepted.

in src/net.h:309 in d992690589 outdated

303+     */
304+    using BytesToSend = std::tuple<
305+        Span<const uint8_t> /*to_send*/,
306+        bool /*more*/,
307+        const std::string& /*m_type*/
308+    >;

vasild commented at 1:55 pm on August 19, 2023:

Maybe mention something about the lifetime of to_send and msg_type since they refer to data inside the transport. IIRC will be invalidated by the next SetMessageToSend() call.

sipa commented at 7:35 pm on August 21, 2023:

Done.

in src/net.cpp:732 in d992690589 outdated

725@@ -717,7 +726,15 @@ bool CNode::ReceiveMsgBytes(Span<const uint8_t> msg_bytes, bool& complete)
726     return true;
727 }
728 
729-int V1TransportDeserializer::readHeader(Span<const uint8_t> msg_bytes)
730+V1Transport::V1Transport(const NodeId node_id, int nTypeIn, int nVersionIn) noexcept :
731+    m_node_id(node_id), hdrbuf(nTypeIn, nVersionIn), vRecv(nTypeIn, nVersionIn)
732+{
733+    std::copy(std::begin(Params().MessageStart()), std::end(Params().MessageStart()), m_magic_bytes);

vasild commented at 11:41 am on August 21, 2023:

This is unbounded copy. Maybe check that it will not cause buffer overflow? E.g.

0    const auto& src = Params().MessageStart();
1    const auto& src_begin = std::begin(src);                             
2    const auto& src_end = std::end(src);
3    assert(std::distance(src_begin, src_end) <= std::distance(std::begin(m_magic_bytes), std::end(m_magic_bytes)));
4    std::copy(src_begin, src_end, m_magic_bytes);

sipa commented at 7:35 pm on August 21, 2023:

I’ve added an assertion equivalent to this.

in src/test/fuzz/p2p_transport_serialization.cpp:119 in d992690589 outdated

111+
112+    // Put the transports in an array for by-index access.
113+    std::array<Transport*, 2> transports = {&initiator, &responder};
114+
115+    // Two deques representing in-flight bytes. inflight[i] is from transport[i] to transport[!i].
116+    std::array<std::vector<uint8_t>, 2> in_flight;

vasild commented at 3:41 pm on August 21, 2023:

nit:

0    // Two vectors representing in-flight bytes. inflight[i] is from transport[i] to transport[!i].
1    std::array<std::vector<uint8_t>, 2> in_flight;

sipa commented at 7:35 pm on August 21, 2023:

Done.

in src/test/fuzz/p2p_transport_serialization.cpp:130 in d992690589 outdated

130+    // Function to consume a message type.
131+    auto msg_type_fn = [&]() {
132+        std::string ret;
133+        while (ret.size() < CMessageHeader::COMMAND_SIZE) {
134+            char c = provider.ConsumeIntegral<char>();
135+            if (c < ' ' || c > 0x7E) break;

vasild commented at 3:50 pm on August 21, 2023:

Is this some form of locale independent isprint(3)?

If the purpose is to generate random (including invalid) message types then why not just provider.ConsumeBytesAsString(CMessageHeader::COMMAND_SIZE), or if the purpose is to generate only valid message types, then pick a random value from getAllNetMessageTypes()?

sipa commented at 7:36 pm on August 21, 2023:

No, it’s matching the message type validity criteria from CMessageHeader::IsCommandValid().

I’ve reworked this a bit (making it also select from valid messages, which is useful for a follow-up PR anyway), and added comments.

in src/test/fuzz/p2p_transport_serialization.cpp:266 in d992690589 outdated

266+        return to_recv_len > 0;
267+    };
268+
269+    // Main loop, interleaving new messages, sends, and receives.
270+    unsigned iter = 0;
271+    while (iter < 1000 && provider.remaining_bytes()) {

vasild commented at 4:14 pm on August 21, 2023:

0    LIMITED_WHILE(provider.remaining_bytes(), 1000) {

sipa commented at 7:37 pm on August 21, 2023:

Done.

in src/test/fuzz/p2p_transport_serialization.cpp:270 in d992690589 outdated

270+    unsigned iter = 0;
271+    while (iter < 1000 && provider.remaining_bytes()) {
272+        uint8_t code = provider.ConsumeIntegral<uint8_t>();
273+        // Lowest bit identifies the sender side of the operation.
274+        int side = code & 1;
275+        switch ((side >> 1) % 3) {

vasild commented at 4:21 pm on August 21, 2023:

Given that side is 0 or 1, then side >> 1 will always be 0, no? Did you mean code >> 1? I find this less cryptic:

0size_t side = provider.ConsumeBool();
1switch (provider.ConsumeIntegralInRange(0, 2)) {
2    case 0: ...
3    case 1: ...
4    case 2: ...
5}

sipa commented at 7:38 pm on August 21, 2023:

Ouch, that was a serious problem with the fuzz test.

I’ve instead replaced it with

 0    LIMITED_WHILE(provider.remaining_bytes(), 1000) {
 1        CallOneOf(provider,
 2            // (Try to) give the next message to the transport.
 3            [&] { new_msg_fn(0); },
 4            [&] { new_msg_fn(1); },
 5            // (Try to) send some bytes from the transport to the network.
 6            [&] { send_fn(0); },
 7            [&] { send_fn(1); },
 8            // (Try to) receive bytes from the network, converting to messages.
 9            [&] { recv_fn(0); },
10            [&] { recv_fn(1); }
11        );
12    }

which is hopefully as clear as your suggestion, but doesn’t waste 2 bytes of input to pick one of 6 values.

in src/test/fuzz/p2p_transport_serialization.cpp:333 in d992690589 outdated

323+    FuzzedDataProvider provider{buffer.data(), buffer.size()};
324+    XoRoShiRo128PlusPlus rng(provider.ConsumeIntegral<uint64_t>());
325+    auto t1 = MakeV1Transport(NodeId{0});
326+    auto t2 = MakeV1Transport(NodeId{1});
327+    if (!t1 || !t2) return;
328+    SimulationTest(*t1, *t2, rng, provider);

vasild commented at 4:30 pm on August 21, 2023:

Is there a reason to use pointers? The following seems to be simpler:

0    V1Transport t0{NodeId{0}, SER_NETWORK, INIT_PROTO_VERSION};
1    V1Transport t1{NodeId{1}, SER_NETWORK, INIT_PROTO_VERSION};
2    SimulationTest(t0, t1, rng, provider);

and remove MakeV1Transport()?

sipa commented at 7:40 pm on August 21, 2023:

I could do that here, but the test gets extended in #28196 with versions that let a V1 transport talk to a V2, and a V2 to a V2 (see https://github.com/bitcoin/bitcoin/blob/b23b1d0700e0cffc0d0d2c624222142f211e04be/src/test/fuzz/p2p_transport_serialization.cpp#L377L408). For consistency, I think the code here is better, though if you want I can delay the introduction to that PR.

vasild commented at 7:45 am on August 22, 2023:

It is ok as it is - there is some reason for using pointers.

in src/test/fuzz/p2p_transport_serialization.cpp:191 in d992690589 outdated

191+            expected[side].emplace_back(std::move(next_msg[side]));
192+            // Construct a new next message to send.
193+            next_msg[side] = make_msg_fn(false);
194         }
195+        // Return whether any message was added.
196+        return queued;

vasild commented at 4:32 pm on August 21, 2023:

The return value of new_msg_fn() is ignored by the caller.

sipa commented at 7:40 pm on August 21, 2023:

Done.

vasild commented at 4:50 pm on August 21, 2023: contributor

Posting review midway, reviewed up to 57e9bb0ffd (incl).

sipa force-pushed on Aug 21, 2023

in src/net.h:307 in 678809e001 outdated

314+     *
315+     * The bytes returned by this function act as a stream which can only be appended to. This
316+     * means that with the exception of MarkBytesSent, operations on the transport can only append
317+     * to what is being returned.
318+     *
319+     * Note that the m_type value returned is a reference, and calling any non-const function on

vasild commented at 8:59 am on August 22, 2023:

nit: and to_send too

0     * Note that m_type and to_send refer to data that is internal to the transport, and calling any non-const function on

sipa commented at 7:29 pm on August 22, 2023:

Done.

in src/test/fuzz/p2p_transport_serialization.cpp:138 in 678809e001 outdated

138+
139+    // Function to consume a message type.
140+    auto msg_type_fn = [&]() {
141+        uint8_t v = provider.ConsumeIntegral<uint8_t>();
142+        if (v <= 12) {
143+            // If v is in range 0..12, construct a valid (but possibly unknown) message type of

vasild commented at 9:14 am on August 22, 2023:

nit:

0        if (v <= CMessageHeader::COMMAND_SIZE) {
1            // If v is below the maximum command size, construct a valid (but possibly unknown) message type of

sipa commented at 7:30 pm on August 22, 2023:

Done. I’ve also changed the encoding a bit further to simplify it:

Only v = 0xFF is now used for free-form commands (and the length is determined by the first invalid character).
v % g_all_messages.size() can be used as index without potentially confusing lack of subtraction.

in src/test/fuzz/p2p_transport_serialization.cpp:150 in 678809e001 outdated

150+                ret += c;
151+            }
152+            return ret;
153+        } else {
154+            // Otherwise, use it as index into the list of known messages.
155+            return g_all_messages[(v - 13) % g_all_messages.size()];

vasild commented at 9:20 am on August 22, 2023:

I guess the following would achieve the same, no need to subtract 13?

0            return g_all_messages[v % g_all_messages.size()];

sipa commented at 7:30 pm on August 22, 2023:

Done.

in src/net.cpp:866 in 678809e001 outdated

970+            node.AccountForSentBytes(msg_type, nBytes);
971             nSentSize += nBytes;
972-            if (node.nSendOffset == data.size()) {
973-                node.nSendOffset = 0;
974-                node.nSendSize -= data.size();
975-                node.fPauseSend = node.nSendSize > nSendBufferMaxSize;

vasild commented at 9:35 am on August 22, 2023:

In the old code it was really strange to check whether nSendSize is too big after decrementing it. Good that this is gone now.

sipa commented at 3:04 pm on August 22, 2023:

FWIW, this line is (was) for turning fPauseSend off again after the buffer level dropped.

in src/net.cpp:939 in 678809e001 outdated

943             LOCK(node.m_sock_mutex);
944+            // There is no socket in case we've already disconnected, or in test cases without
945+            // real connections. In these cases, we bail out immediately and just leave things
946+            // in the send queue and transport.
947             if (!node.m_sock) {
948+                // Don't set data_left here here; we won't ever send anything anymore.

vasild commented at 10:32 am on August 22, 2023:

nit:

0                // Don't set data_left here; we won't ever send anything anymore.

sipa commented at 7:30 pm on August 22, 2023:

Done.

in src/net.cpp:962 in 678809e001 outdated

975-                node.fPauseSend = node.nSendSize > nSendBufferMaxSize;
976-                it++;
977-            } else {
978+            if ((size_t)nBytes != data.size()) {
979                 // could not send full message; stop sending more
980+                data_left = true;

vasild commented at 10:40 am on August 22, 2023:

more and data_left mean similar things and I find the names a bit confusing or difficult to distinguish which one means what. Maybe consider something like more_in_transport and more_in_data.

Edit: wait, they mean the same thing. I thought that data_left would be used to retry with the remaining from the data variable, but this is not the case. Evey time we set data_left to true we break from the loop, getting data out of scope and losing it. In this case the data remains in the transport.

Is it possible to use just one variable, something like the following?

 0--- i/src/net.cpp
 1+++ w/src/net.cpp
 2@@ -925,13 +925,14 @@ std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
 3             if (node.m_transport->SetMessageToSend(*it)) {
 4                 // Update memory usage of send buffer (as *it will be deleted).
 5                 node.m_send_memusage -= memusage;
 6                 ++it;
 7             }
 8         }
 9-        const auto& [data, more, msg_type] = node.m_transport->GetBytesToSend();
10+        const auto& [data, _data_left, msg_type] = node.m_transport->GetBytesToSend();
11+        data_left = _data_left;
12         int nBytes = 0;
13         if (!data.empty()) {
14             LOCK(node.m_sock_mutex);
15             // There is no socket in case we've already disconnected, or in test cases without
16             // real connections. In these cases, we bail out immediately and just leave things
17             // in the send queue and transport.
18@@ -940,13 +941,13 @@ std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
19                 break;
20             }
21             int flags = MSG_NOSIGNAL | MSG_DONTWAIT;
22 #ifdef MSG_MORE
23             // We have more to send if either the transport itself has more, or if we have more
24             // messages to send.
25-            if (more || it != node.vSendMsg.end()) {
26+            if (data_left || it != node.vSendMsg.end()) {
27                 flags |= MSG_MORE;
28             }
29 #endif
30             nBytes = node.m_sock->Send(reinterpret_cast<const char*>(data.data()), data.size(), flags);
31         }
32         if (nBytes > 0) {
33@@ -969,13 +970,15 @@ std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
34                 if (nErr != WSAEWOULDBLOCK && nErr != WSAEMSGSIZE && nErr != WSAEINTR && nErr != WSAEINPROGRESS) {
35                     LogPrint(BCLog::NET, "socket send error for peer=%d: %s\n", node.GetId(), NetworkErrorString(nErr));
36                     node.CloseSocketDisconnect();
37                 }
38             }
39             // couldn't send anything at all
40-            data_left = !data.empty();
41+            if (!data.empty()) {
42+                data_left = true;
43+            }
44             break;
45         }
46     }
47 
48     node.fPauseSend = node.m_send_memusage + node.m_transport->GetSendMemoryUsage() > nSendBufferMaxSize;
49

Sjors commented at 3:07 pm on August 22, 2023:

I would use data_in_transport instead of _data_left (see #28165 (review)).

And then drop data_left = _data_left;.

Then below:

0bool message_queue_empty = it == node.vSendMsg.end()`
1if (data_in_transport || !message_queue_empty) {
2  flags |= MSG_MORE;

And:

0// couldn't send anything at all
1if (!data.empty()) {
2  data_in_transport = true
3}

And then finally return !message_queue_empty || data_in_transport instead of data_left.

sipa commented at 3:11 pm on August 22, 2023:

I don’t find this easier to reason about, as the two variables do mean something different (one is whether more bytes follow after to_send, and the other is a return value saying whether SocketSendData didn’t send everything).

There is another (even) shorter solution, I think:

 0
 1--- i/src/net.cpp
 2+++ w/src/net.cpp
 3@@ -925,13 +925,14 @@ std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
 4             if (node.m_transport->SetMessageToSend(*it)) {
 5                 // Update memory usage of send buffer (as *it will be deleted).
 6                 node.m_send_memusage -= memusage;
 7                 ++it;
 8             }
 9         }
10         const auto& [data, more, msg_type] = node.m_transport->GetBytesToSend();
11+        data_left = !data.empty(); // will be overwritten on the next loop if we send data in full.
12         int nBytes = 0;
13         if (!data.empty()) {
14             LOCK(node.m_sock_mutex);
15             // There is no socket in case we've already disconnected, or in test cases without
16             // real connections. In these cases, we bail out immediately and just leave things
17             // in the send queue and transport.
18@@ -940,13 +941,13 @@ std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
19                 break;
20             }
21             int flags = MSG_NOSIGNAL | MSG_DONTWAIT;
22 #ifdef MSG_MORE
23             // We have more to send if either the transport itself has more, or if we have more
24             // messages to send.
25             if (more || it != node.vSendMsg.end()) {
26                 flags |= MSG_MORE;
27             }
28 #endif
29             nBytes = node.m_sock->Send(reinterpret_cast<const char*>(data.data()), data.size(), flags);
30         }
31         if (nBytes > 0) {
32@@ -969,13 +970,15 @@ std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
33                 if (nErr != WSAEWOULDBLOCK && nErr != WSAEMSGSIZE && nErr != WSAEINTR && nErr != WSAEINPROGRESS) {
34                     LogPrint(BCLog::NET, "socket send error for peer=%d: %s\n", node.GetId(), NetworkErrorString(nErr));
35                     node.CloseSocketDisconnect();
36                 }
37             }
38             // couldn't send anything at all
39             break;
40         }
41     }
42 
43     node.fPauseSend = node.m_send_memusage + node.m_transport->GetSendMemoryUsage() > nSendBufferMaxSize;

Do you think that’s better?

Sjors commented at 5:48 pm on August 22, 2023:

I’m only seeing an extra line here…? But something along those lines could be worth a shot.

sipa commented at 7:31 pm on August 22, 2023:

@Sjors It was a diff w.r.t. the code that was there before, not w.r.t. the PR as it was then.

In either case, I’ve implemented it, please have a look.

in src/net.cpp:1326 in 678809e001 outdated

1326+        bool select_send;
1327+        {
1328+            LOCK(pnode->cs_vSend);
1329+            // Sending is possible if either there are bytes to send right now, or if there will be
1330+            // once a potential message from vSendMsg is handed to the transport.
1331+            const auto& [to_send, _more, _msg_type] = pnode->m_transport->GetBytesToSend();

vasild commented at 11:00 am on August 22, 2023:

Since the last two seem to be irrelevant and unused, if there is no other reason to show some meaningful names, then:

0            const auto& [to_send, std::ignore, std::ignore] = pnode->m_transport->GetBytesToSend();

(same in CConnman::PushMessage() and in the tests)

Sjors commented at 2:56 pm on August 22, 2023:

Optionally annotate the return values if the fact that we’re ignoring them is potentially interesting / strange. [to_send, /*more=*/std::ignore, /*msg_type=*/std::ignore] = pnode->…

sipa commented at 3:14 pm on August 22, 2023:

This is not valid C++. std::ignore can be used as an argument to std::tie, but not in structured bindings. The _ in front is supposed to indicate the variables are unused, instead (which also documents their name).

vasild commented at 12:48 pm on August 23, 2023:

Ok, in C++27 then, sorry for the noise.

in src/net.cpp:3010 in 678809e001 outdated

3015     size_t nBytesSent = 0;
3016     {
3017         LOCK(pnode->cs_vSend);
3018-        bool optimisticSend(pnode->vSendMsg.empty());
3019+        const auto& [to_send, _more, _msg_type] = pnode->m_transport->GetBytesToSend();
3020+        const bool queue_was_empty{to_send.empty() && pnode->vSendMsg.empty()};

vasild commented at 12:25 pm on August 22, 2023:

This logic is used also in CConnman::GenerateWaitSockets() but negated: select_send = !to_send.empty() || !pnode->vSendMsg.empty(); is it worth introducing bool CNode::SendQueueEmpty() and using it in both places?

 0--- i/src/net.cpp
 1+++ w/src/net.cpp
 2@@ -1319,20 +1319,13 @@ Sock::EventsPerSock CConnman::GenerateWaitSockets(Span<CNode* const> nodes)
 3     for (const ListenSocket& hListenSocket : vhListenSocket) {
 4         events_per_sock.emplace(hListenSocket.sock, Sock::Events{Sock::RECV});
 5     }
 6 
 7     for (CNode* pnode : nodes) {
 8         bool select_recv = !pnode->fPauseRecv;
 9-        bool select_send;
10-        {
11-            LOCK(pnode->cs_vSend);
12-            // Sending is possible if either there are bytes to send right now, or if there will be
13-            // once a potential message from vSendMsg is handed to the transport.
14-            const auto& [to_send, _more, _msg_type] = pnode->m_transport->GetBytesToSend();
15-            select_send = !to_send.empty() || !pnode->vSendMsg.empty();
16-        }
17+        bool select_send = WITH_LOCK(pnode->cs_vSend, !pnode->SendQueueEmpty());
18         if (!select_recv && !select_send) continue;
19 
20         LOCK(pnode->m_sock_mutex);
21         if (pnode->m_sock) {
22             Sock::Event event = (select_send ? Sock::SEND : 0) | (select_recv ? Sock::RECV : 0);
23             events_per_sock.emplace(pnode->m_sock, Sock::Events{event});
24@@ -3007,14 +3000,13 @@ void CConnman::PushMessage(CNode* pnode, CSerializedNetMsg&& msg)
25         msg.data.data()
26     );
27 
28     size_t nBytesSent = 0;
29     {
30         LOCK(pnode->cs_vSend);
31-        const auto& [to_send, _more, _msg_type] = pnode->m_transport->GetBytesToSend();
32-        const bool queue_was_empty{to_send.empty() && pnode->vSendMsg.empty()};
33+        const bool queue_was_empty{pnode->SendQueueEmpty()};
34 
35         // Update memory usage of send buffer.
36         pnode->m_send_memusage += msg.GetMemoryUsage();
37         if (pnode->m_send_memusage + pnode->m_transport->GetSendMemoryUsage() > nSendBufferMaxSize) pnode->fPauseSend = true;
38         // Move message to vSendMsg queue.
39         pnode->vSendMsg.push_back(std::move(msg));

Sjors commented at 2:54 pm on August 22, 2023:

Didn’t check, but that looks like a nice simplification indeed.

sipa commented at 6:16 pm on August 22, 2023:

This works, but in #28196 one of the two call sites would be replaced with other code anyway (or require more virtual function calls than needed), so my preference is not doing this.

in src/net.cpp:915 in 678809e001 outdated

915+    return m_message_to_send.GetMemoryUsage();
916 }
917 
918 std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
919 {
920     auto it = node.vSendMsg.begin();

vasild commented at 12:48 pm on August 22, 2023:

Just a comment and a question.

Wrt the title of the commit message 69f4b340c068078cca7b68aa04360aaa7c0fd12b net: move message serialization from PushMessage to SocketSendData

By “serialization” you mean creating/serializing the header and concatenating it with the bytes of the already serialized payload? I find this a bit confusing as I would assume that once we have CSerializedNetMsg then serialization has already happened (both of the header and of the payload). It should be done by CNetMsgMaker::Make() which takes the high-level data and produces CSerializedNetMsg.

Why in the first place, even in master do we need to have the header bytes serialized in another place than the payload? Why was it not done so that CNetMsgMaker::Make() would produce both serialized header bytes plus serialized payload bytes, concatenate them and return a single byte array?

sipa commented at 7:35 pm on August 22, 2023:

Yeah, the “serialization” terminology here was weird. I think perhaps historically, the serialization (as in conversion from application objects to byte-serialized form) was perhaps done at the same time as the computation of message header for the v1 protocol, but that hasn’t been the case for a while (probably since some form of abstraction was introduced with the prospect of having separate transports).

I’ve tried to change some comments and commit titles to clarify this distinction.

in src/net.h:278 in 678809e001 outdated

277-    virtual int Read(Span<const uint8_t>& msg_bytes) = 0;
278+    virtual bool ReceivedMessageComplete() const = 0;
279+    // set the deserialization context version
280+    virtual void SetReceiveVersion(int version) = 0;
281+    /** read and deserialize data, advances msg_bytes data pointer. Return false is invalid. */
282+    virtual bool ReceivedBytes(Span<const uint8_t>& msg_bytes) = 0;

vasild commented at 1:07 pm on August 22, 2023:

s/false is invalid/false if invalid/ or even better, tell doxygen about it:

0    /**
1     * Read and deserialize data, advance msg_bytes data pointer.
2     * [@return](/bitcoin-bitcoin/contributor/return/) false if invalid.
3     */
4    virtual bool ReceivedBytes(Span<const uint8_t>& msg_bytes) = 0;

sipa commented at 7:35 pm on August 22, 2023:

Done (adding doxygen comments, a bit more elaborate than what you suggested).

in src/net.cpp:694 in 678809e001 outdated

689@@ -681,16 +690,16 @@ bool CNode::ReceiveMsgBytes(Span<const uint8_t> msg_bytes, bool& complete)
690     nRecvBytes += msg_bytes.size();
691     while (msg_bytes.size() > 0) {
692         // absorb network data
693-        int handled = m_deserializer->Read(msg_bytes);
694-        if (handled < 0) {
695+        bool handled = m_transport->ReceivedBytes(msg_bytes);
696+        if (!handled) {

vasild commented at 1:10 pm on August 22, 2023:

nit (handled is not used afterwards):

0        if (!m_transport->ReceivedBytes(msg_bytes)) {

(same in p2p_transport_serialization.cpp)

Sjors commented at 2:38 pm on August 22, 2023:

I don’t mind keeping handled around, to make it more clear that receiving 0 bytes is also “handled”.

theStack commented at 5:54 pm on August 22, 2023:

nit: if handled is decided to be kept, it should be declared as const.

sipa commented at 7:36 pm on August 22, 2023:

I’ve dropped the handled, because it was outdated anyway. It used to be a variable that contained how many bytes were processed, but since the last commit in this PR, it doesn’t have that meaning anymore at all. Just a branch with a comment is clearer.

vasild approved

vasild commented at 1:14 pm on August 22, 2023: contributor

ACK 678809e00178b33e273f0775eea9635fecaf386a

Some things may be improved, see comments below, but no blockers (thus ACK).

DrahtBot requested review from vincenzopalazzo on Aug 22, 2023

DrahtBot requested review from Sjors on Aug 22, 2023

DrahtBot removed review request from vincenzopalazzo on Aug 22, 2023