BIP324 connection support #28196

pull sipa wants to merge 10 commits into bitcoin:master from sipa:202307_bip324_transport changing 9 files +1558 −59
  1. sipa commented at 5:56 pm on August 1, 2023: member

    This is part of #27634.

    This implements the BIP324 v2 transport (which implements all of what the BIP calls transport layer and application layer), though in a non-exposed way. It is tested through an extensive fuzz test, which verifies that v2 transports can talk to v2 transports, and v1 transports can talk to v2 transports, and a unit test that exercises a number of unusual scenarios. The transport is functionally complete, including:

    • Autodetection of incoming V1 connections.
    • Garbage, both sending and receiving.
    • Short message type IDs, both sending and receiving.
    • Ignore packets (receiving only, but tested in a unit test).
    • Session IDs are visible in getpeerinfo output (for manual comparison).

    Things that are not included, left for future PRs, are:

    • Actually using the v2 transport for connections.
    • Support for the NODE_P2P_V2 service flag.
    • Retrying downgrade to V1 when attempted outbound V2 connections immediately fail.
    • P2P functional and unit tests
  2. DrahtBot commented at 5:57 pm on August 1, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK theStack, mzumsande, naumenkogs
    Concept ACK Sjors
    Stale ACK vincenzopalazzo, vasild

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. sipa marked this as a draft on Aug 1, 2023
  4. sipa renamed this:
    BIP324 transport support
    BIP324 connection support
    on Aug 1, 2023
  5. sipa force-pushed on Aug 1, 2023
  6. DrahtBot added the label CI failed on Aug 1, 2023
  7. sipa force-pushed on Aug 1, 2023
  8. sipa force-pushed on Aug 1, 2023
  9. sipa force-pushed on Aug 1, 2023
  10. DrahtBot removed the label CI failed on Aug 1, 2023
  11. sipa force-pushed on Aug 15, 2023
  12. DrahtBot added the label CI failed on Aug 15, 2023
  13. sipa force-pushed on Aug 15, 2023
  14. sipa force-pushed on Aug 16, 2023
  15. sipa force-pushed on Aug 16, 2023
  16. sipa force-pushed on Aug 16, 2023
  17. sipa force-pushed on Aug 17, 2023
  18. DrahtBot removed the label CI failed on Aug 17, 2023
  19. DrahtBot added the label Needs rebase on Aug 17, 2023
  20. sipa force-pushed on Aug 17, 2023
  21. sipa force-pushed on Aug 17, 2023
  22. DrahtBot removed the label Needs rebase on Aug 17, 2023
  23. sipa force-pushed on Aug 18, 2023
  24. DrahtBot added the label CI failed on Aug 18, 2023
  25. sipa force-pushed on Aug 18, 2023
  26. sipa force-pushed on Aug 18, 2023
  27. DrahtBot removed the label CI failed on Aug 18, 2023
  28. sipa force-pushed on Aug 19, 2023
  29. sipa force-pushed on Aug 21, 2023
  30. sipa force-pushed on Aug 21, 2023
  31. sipa force-pushed on Aug 22, 2023
  32. DrahtBot added the label CI failed on Aug 22, 2023
  33. in test/functional/test_framework/test_framework.py:598 in 163d93304d outdated
    594@@ -595,12 +595,12 @@ def connect_nodes(self, a, b):
    595         # * Must have a verack message before anything else
    596         self.wait_until(lambda: sum(peer['version'] != 0 for peer in from_connection.getpeerinfo()) == from_num_peers)
    597         self.wait_until(lambda: sum(peer['version'] != 0 for peer in to_connection.getpeerinfo()) == to_num_peers)
    598-        self.wait_until(lambda: sum(peer['bytesrecv_per_msg'].pop('verack', 0) == 24 for peer in from_connection.getpeerinfo()) == from_num_peers)
    599-        self.wait_until(lambda: sum(peer['bytesrecv_per_msg'].pop('verack', 0) == 24 for peer in to_connection.getpeerinfo()) == to_num_peers)
    600+        self.wait_until(lambda: sum(peer['bytesrecv_per_msg'].pop('verack', 0) >= 21 for peer in from_connection.getpeerinfo()) == from_num_peers)
    


    Sjors commented at 1:21 pm on August 23, 2023:
    163d93304dea08bf8f2ea79d79cabceec1a13d96: I assume this is because of the short message ID’s? Can you mention that in the commit message, so it’s easy to find with git blame?

    sipa commented at 1:52 pm on August 23, 2023:

    This commit will disappear eventually, when integration is done more properly.

    But in short, yes, this is because when short IDs are in use, messages have a 21 byte overhead (on top of the payload) in V2 vs. 24 bytes in V1.

  34. in src/net.cpp:1442 in 9c31593786 outdated
    1434@@ -1434,6 +1435,20 @@ size_t V2Transport::GetSendMemoryUsage() const noexcept
    1435     return sizeof(m_send_buffer) + memusage::DynamicUsage(m_send_buffer);
    1436 }
    1437 
    1438+Span<const std::byte> V2Transport::GetSessionID() const noexcept
    1439+{
    1440+    if (m_use_v1) return m_v1_fallback.GetSessionID();
    1441+    LOCK(m_recv_mutex);
    1442+    if (m_recv_state == RecvState::V1) return m_v1_fallback.GetSessionID();
    


    Sjors commented at 1:30 pm on August 23, 2023:

    9c3159378658c4b1535ab6c546512f7d1b2b3979: what’s the difference between these two scenarios? m_use_v1 vs m_recv_state == RecvState::V1?

    Why not return {}? Unsafe / bad / not great?


    sipa commented at 2:01 pm on August 23, 2023:

    The difference between these scenarios: nothing, but it’s possible that the m_use_v1 atomic gets set in between it being checked here and the lock being grabbed. So the m_use_v1 is there as an optimization that almost always works, but the RecvState::V1 check is necessary to make it always work. For other Transport member functions the m_use_v1 is introduced in a separate commit, which hopefully explains it.

    There’s also nothing wrong with return {}; here, but I thought it’d be more obviously correct to explicitly call the V1Transport function.

  35. Sjors commented at 1:36 pm on August 23, 2023: member

    Started doing review, out of order. Some quick initial questions and remarks…

    I tend to get spontaneous v2 connections eventually, both inbound and outbound. But are there any known reachable mainnet nodes folks can test against? (testnet and signet is fine too I suppose)

    I would be useful to have at least one log message in the lifetime of a peer to indicate it’s a v2. E.g. [net] Added v2 connection peer=… (haven’t checked if we already know it’s v2 at that point)

    Do I assume correctly that 0 refers to size of the message payload, i.e. ignoring the (short) message id (what was a header in v1)? [net] sending verack (0 bytes)

  36. sipa force-pushed on Aug 24, 2023
  37. sipa commented at 0:28 am on August 24, 2023: member
    I’ve dropped the last two commits here (which enabled rudimentary testing), as better testing is possible with #28331 now.
  38. DrahtBot removed the label CI failed on Aug 24, 2023
  39. sipa force-pushed on Aug 24, 2023
  40. sipa marked this as ready for review on Aug 24, 2023
  41. sipa force-pushed on Aug 28, 2023
  42. sipa commented at 1:14 am on August 28, 2023: member
    Added a commit adding unit tests for V2Transport, running through a number of valid and invalid scenarios (including overly-long garbage, decoy packets, ignore bits and non-empty garbage authentication, non-empty version packets, and bit errors).
  43. sipa force-pushed on Aug 28, 2023
  44. DrahtBot added the label CI failed on Aug 28, 2023
  45. DrahtBot removed the label CI failed on Aug 28, 2023
  46. in src/net.cpp:1483 in f41a1a0ae4 outdated
    1180+    CNetMessage msg{std::move(ret)};
    1181+    msg.m_raw_message_size = m_recv_decode_buffer.size() + BIP324Cipher::EXPANSION;
    1182+    if (msg_type) {
    1183+        reject_message = false;
    1184+        msg.m_type = std::move(*msg_type);
    1185+        msg.m_time = time;
    


    mzumsande commented at 11:53 pm on August 28, 2023:
    In V1Transport::GetReceivedMessage, we call RandAddEvent() to harvest entropy from the time and checksum of the received messages. Should something similar be done for V2?

    sipa commented at 3:02 am on August 29, 2023:
    Good idea. Done (in V2Transport::ProcessReceivedPacket, which sees the authentication tags).
  47. sipa force-pushed on Aug 29, 2023
  48. DrahtBot added the label CI failed on Aug 29, 2023
  49. in src/net.cpp:1293 in aa1da1a0ef outdated
    1093+                m_recv_state = RecvState::APP_READY;
    1094+            }
    1095+            break;
    1096+        default:
    1097+            // Any other state is invalid (this function should not have been called).
    1098+            assert(false);
    


    vincenzopalazzo commented at 12:08 pm on August 29, 2023:
    maybe this is for a followup PR, but I think here it is better to print the status of the state, this makes trivial to debug the problem if happens.

    sipa commented at 12:09 pm on August 29, 2023:
    No, this code is unreachable. If it gets hit, it would be obvious what is wrong.

    vincenzopalazzo commented at 12:20 pm on August 29, 2023:
    I see now your point, ok make sense
  50. vincenzopalazzo approved
  51. vincenzopalazzo commented at 12:18 pm on August 29, 2023: none

    ACK https://github.com/bitcoin/bitcoin/pull/28196/commits/d9344fd5c77e69a544c2d80aa6cbd20827f2e169

    Does CI failure look like an unlucky run? or unrelated

     0 node2 2023-08-29T03:35:33.769000Z [httpworker.2] [wallet/wallet.h:832] [WalletLogPrintf] [w1] AddToWallet e930f400504216afbf78e3760d4f6730547258b8a453514d7b2b5bccd2432f9a 
     1 node2 2023-08-29T03:35:33.769921Z [http] [httpserver.cpp:241] [http_request_cb] [http] Received a POST request for /wallet/w1 from 127.0.0.1:49136 
     2 node2 2023-08-29T03:35:33.769971Z [httpworker.3] [rpc/request.cpp:179] [parse] [rpc] ThreadRPCServer method=getwalletinfo user=__cookie__ 
     3 node2 2023-08-29T03:35:33.770827Z [http] [httpserver.cpp:241] [http_request_cb] [http] Received a POST request for /wallet/w1 from 127.0.0.1:49136 
     4 node2 2023-08-29T03:35:33.770859Z [httpworker.0] [rpc/request.cpp:179] [parse] [rpc] ThreadRPCServer method=listtransactions user=__cookie__ 
     5 test  2023-08-29T03:35:33.772000Z TestFramework (ERROR): Assertion failed 
     6                                   Traceback (most recent call last):
     7                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 131, in main
     8                                       self.run_test()
     9                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/wallet_backwards_compatibility.py", line 201, in run_test
    10                                       assert txs[3]["abandoned"]
    11                                   AssertionError
    12 test  2023-08-29T03:35:33.773000Z TestFramework (DEBUG): Closing down network thread 
    
  52. DrahtBot removed the label CI failed on Aug 29, 2023
  53. sipa force-pushed on Aug 29, 2023
  54. in src/net.cpp:1020 in f7c0be500a outdated
    939+    AssertLockNotHeld(m_recv_mutex);
    940+    LOCK(m_recv_mutex);
    941+    return m_recv_state == RecvState::APP_READY;
    942+}
    943+
    944+void V2Transport::SetReceiveVersion(int nVersionIn) noexcept
    


    mzumsande commented at 8:38 pm on August 29, 2023:
    Looks like SetReceiveVersion is never used anywhere, neither for V1 nor V2, because the version is set in the respective constructors. Should we just remove it from the Transport interface?

    sipa commented at 8:38 pm on August 30, 2023:
    Added a commit to remove it.
  55. sipa force-pushed on Aug 29, 2023
  56. in src/net.cpp:1171 in f7c0be500a outdated
    1047+{
    1048+    AssertLockHeld(m_recv_mutex);
    1049+    if (m_recv_buffer.size() == BIP324Cipher::LENGTH_LEN) {
    1050+        // Length descriptor received.
    1051+        m_recv_len = m_cipher.DecryptLength(MakeByteSpan(m_recv_buffer));
    1052+        if (m_recv_len > MAX_SIZE + 13 || m_recv_len > MAX_PROTOCOL_MESSAGE_LENGTH + 13) {
    


    mzumsande commented at 8:53 pm on August 29, 2023:
    could put some of the magic numbers (13 here and in a few other places like SetMessageToSend, 12 in GetMessageType()) into constants or maybe use CMessageHeader::COMMAND_SIZE.

    sipa commented at 8:38 pm on August 30, 2023:
    I’ve added CMessageHeader::COMMAND_SIZE in a bunch of places.
  57. in src/net.cpp:1310 in 8a2e9f6fb9 outdated
    1290+            // Unknown short message id.
    1291+            return std::nullopt;
    1292+        }
    1293+    }
    1294+
    1295+    if (contents.size() < 13) return std::nullopt; // Long encoding needs at least 13 bytes
    


    theStack commented at 2:06 am on August 30, 2023:

    nit-like: For the long encoding processing, it might be worthwhile to advance for one byte in the contents span first, to work with the remaining 12 bytes and avoid the plus ones everywhere below (especially for the array indices in the while loops)?

    0    contents = contents.subspan(1);
    1    if (contents.size() < 12) return std::nullopt; // Long encoding needs at least 12 bytes (after the short message id)
    

    sipa commented at 3:20 pm on August 30, 2023:
    Good idea, I’ve rewritten some part of the code here using that approach. Also added more comments.
  58. in src/net.cpp:1374 in 8a2e9f6fb9 outdated
    1356+    if (short_message_id) {
    1357+        contents.resize(1 + msg.data.size());
    1358+        contents[0] = *short_message_id;
    1359+        std::copy(msg.data.begin(), msg.data.end(), contents.begin() + 1);
    1360+    } else {
    1361+        contents.resize(13 + msg.data.size());
    


    theStack commented at 2:12 am on August 30, 2023:

    nit: though logically identical with the current code, could be explicit about the fill value of the resize (in contrast to the short message case above, not all of the new elments after the resize are overwritten):

    0        contents.resize(13 + msg.data.size(), 0);
    

    sipa commented at 3:20 pm on August 30, 2023:
    Done, and added a comment.
  59. theStack commented at 2:28 am on August 30, 2023: contributor

    Concept ACK

    Left some nits regarding short message encoding below, planning to do a deeper review within the next days.

  60. sipa force-pushed on Aug 30, 2023
  61. in src/net.h:345 in 040ea05dae outdated
    327@@ -320,7 +328,7 @@ class Transport {
    328      * Note that m_type and to_send refer to data that is internal to the transport, and calling
    329      * any non-const function on this object may invalidate them.
    330      */
    


    vasild commented at 12:58 pm on August 30, 2023:

    nit: tell doxygen about the parameter (and maybe about the return value):

    0/**
    1 * Title sentence up to first dot.
    2 * Further elaborate description. Many sentences whatever.
    3 * [@param](/bitcoin-bitcoin/contributor/param/)[in] have_next_message Controls whether the "more"...
    4 * [@return](/bitcoin-bitcoin/contributor/return/) The bytes returned by this function...
    5 */
    

    sipa commented at 7:07 pm on August 31, 2023:
    Done.
  62. in src/net.h:467 in 040ea05dae outdated
    472+         * connection aborts. */
    473+        GARBAUTH,
    474+
    475+        /** Version packet.
    476+         *
    477+         * A packet is received, and decrypted/verified. If that succeeds, its contents is
    


    vasild commented at 1:24 pm on August 30, 2023:
    0         * A packet is received, and decrypted/verified. If that succeeds, the state becomes APP and the decrypted contents is
    

    sipa commented at 7:14 pm on August 31, 2023:
    Done.
  63. in src/net.h:435 in 040ea05dae outdated
    440+     *  and ignored by receivers. If extensions are defined, they can change what is sent as long
    441+     *  as an empty version packet contents is interpreted as no extensions present. */
    442+    static constexpr std::array<std::byte, 0> VERSION_CONTENTS = {};
    443+
    444+    /** State type that defines the contents of the receive buffer. */
    445+    enum class RecvState {
    


    vasild commented at 1:31 pm on August 30, 2023:
    0Enum 'RecvState' uses a larger base type ('int', size: 4 bytes) than necessary for its value set, consider using 'std::uint8_t' (1 byte) as the base type to reduce its size [performance-enum-size]
    

    (same for SendState)


    sipa commented at 7:14 pm on August 31, 2023:
    Done.
  64. in src/net.h:443 in 040ea05dae outdated
    439+    /** Contents of the version packet to send. BIP324 stipulates this is supposed to be empty,
    440+     *  and ignored by receivers. If extensions are defined, they can change what is sent as long
    441+     *  as an empty version packet contents is interpreted as no extensions present. */
    442+    static constexpr std::array<std::byte, 0> VERSION_CONTENTS = {};
    443+
    444+    /** State type that defines the contents of the receive buffer. */
    


    vasild commented at 1:42 pm on August 30, 2023:

    It would be useful to have a diagram describing the possible state transitions of this state machine:

    0    /** State type that defines the contents of the receive buffer. Possible transitions:
    1     * KEY_MAYBE_V1 --> V1                                                   *--------<---------*
    2     *            |                                                          v                  |
    3     *            *---> KEY --> GARB_GARBTERM --> GARBAUTH --> VERSION --> APP --> APP_READY -->*
    4     */
    

    (same for SendState)


    sipa commented at 7:14 pm on August 31, 2023:
    Done.
  65. in src/net.cpp:1088 in faf5af3f6b outdated
    1002+    AssertLockNotHeld(m_send_mutex);
    1003+    // We still have to determine if this is a v1 or v2 connection. The bytes being received could
    1004+    // be the beginning of either a v1 packet (network magic + "version\x00"), or of a v2 public key.
    1005+    assert(m_recv_buffer.size() <= m_v1_prefix.size());
    1006+    if (!std::equal(m_recv_buffer.begin(), m_recv_buffer.end(), m_v1_prefix.begin())) {
    1007+        // Mismatch with v1 prefix, so we can assume a v2 connection.
    


    Sjors commented at 2:21 pm on August 30, 2023:
    377ae15917dc460f589614ec33f567b523181dc3: do we want to be a bit more robust here and also check against all / some known other network magic values? Especially if we move away from being on port 8333 all the time. Ultimately the v2 handshake will fail anyway if e.g. a Signet node connects to us, but seems nicer to detect (and log) it.

    sipa commented at 7:40 pm on September 1, 2023:

    I’ve toyed with this idea already actually, and think it’s quite doable. We could instead of just looking at the first 12 bytes, wait until we have the first 16, and if the 12 bytes after the first 4 match “version\x00\x00\x00\x00\x00” but the magic doesn’t match, disconnect instead of treating it as V2.

    I don’t think it’s strictly needed, as the 64+ bytes sent by the initiator should cause instant disconnection anyway by any v1 receiver, even ones of the wrong network.

    WDYT? This PR, or for a follow-up?


    Sjors commented at 2:43 pm on September 4, 2023:
    No strong preference here, but I imagine it’s useful early on for debugging to tell the difference between a truly failed v2 connection and some other network failing to make a v1 connection. Though I have no idea how often the latter actually happens these days.

    sipa commented at 3:42 am on September 6, 2023:
    I’ve added a commit for detecting wrong-network V1 incoming connections, and log + disconnect them.
  66. in src/net.cpp:1152 in faf5af3f6b outdated
    1009+        // Transition the sender to KEY state (if not already).
    1010+        LOCK(m_send_mutex);
    1011+        assert(m_send_state == SendState::KEY_MAYBE_V1 || m_send_state == SendState::KEY);
    1012+        m_send_state = SendState::KEY;
    1013+    } else if (m_recv_buffer.size() == m_v1_prefix.size()) {
    1014+        // Full match with the v2 prefix, so fall back to v1 behavior.
    


    Sjors commented at 2:25 pm on August 30, 2023:
    377ae15917dc460f589614ec33f567b523181dc3: v1 prefix

    sipa commented at 7:30 pm on September 1, 2023:
    Fixed (you’re not the first one to notice).
  67. sipa force-pushed on Aug 30, 2023
  68. sipa commented at 5:27 pm on August 30, 2023: member

    For anyone wanting to review the inner commits of this PR, I recommend starting with the state definitions in “net: add V2Transport class with subset of BIP324 functionality”: https://github.com/bitcoin/bitcoin/blob/1a8892201ef8ca6192348d15e25e50d1eabea986/src/net.h#L433L507

    The idea is that both the sender side and receiver side of V2Transport are state machines that are transitioned through, with each state corresponding to the meaning of what is in the respective send/receive buffers. The receive state is changed by receiving bytes, or extracting completed messages. The send state is changed by receiving bytes, sending bytes, or being given a message to send.

    Later commits then extend this state machine with additional features, like detecting V1 connections, and sending garbage. If useful, I’m happy to split things up further (e.g. breaking out incoming garbage detection, or skipping of decoy packets, into separate commits), but I’d like to hear about this approach from reviewers first.

  69. in src/net.h:443 in de705988c3 outdated
    438+    /** Contents of the version packet to send. BIP324 stipulates this is supposed to be empty,
    439+     *  and ignored by receivers. If extensions are defined, they can change what is sent as long
    440+     *  as an empty version packet contents is interpreted as no extensions present. */
    441+    static constexpr std::array<std::byte, 0> VERSION_CONTENTS = {};
    442+
    443+    /** State type that defines the contents of the receive buffer. */
    


    mzumsande commented at 6:08 pm on August 30, 2023:
    nit: this initially confused me a little; maybe stress that it defines what we are expecting to retrieve from the receive buffer but not necessarily the current content of the receive buffer (which may be empty if we haven’t received anything yet, or contain other data if the peer didn’t follow the protocol) - this seems like a small difference to the send buffer, where it actually contains the given info when it’s in the status.

    sipa commented at 8:38 pm on August 30, 2023:
    Expanded the comment a bit.
  70. in src/net.cpp:1112 in de705988c3 outdated
    966+        // These three states all involve decoding a packet. Process the length descriptor first,
    967+        // followed by the ciphertext.
    968+        if (m_recv_buffer.size() < BIP324Cipher::LENGTH_LEN) {
    969+            return BIP324Cipher::LENGTH_LEN - m_recv_buffer.size();
    970+        } else {
    971+            return BIP324Cipher::EXPANSION + m_recv_len - m_recv_buffer.size();
    


    mzumsande commented at 6:46 pm on August 30, 2023:
    maybe add a comment that BIP324Cipher::EXPANSION includes BIP324Cipher::LENGTH_LEN - I was wondering at first whether that needed to be accounted for too here until I looked it up.

    sipa commented at 8:38 pm on August 30, 2023:
    Done.
  71. in src/net.cpp:1071 in de705988c3 outdated
    1066+            return false;
    1067+        }
    1068+        // Feed the last 4 bytes of the Poly1305 authentication tag (and its timing) into our RNG.
    1069+        RandAddEvent(ReadLE32(m_recv_buffer.data() + m_recv_buffer.size() - 4));
    1070+
    1071+        // At this point we have a valid packed decrypted into m_recv_decode_buffer. Depending on
    


    mzumsande commented at 6:55 pm on August 30, 2023:
    typo: packet

    sipa commented at 8:38 pm on August 30, 2023:
    Fixed.
  72. in src/net.cpp:1259 in de705988c3 outdated
    1077+            m_recv_state = RecvState::VERSION;
    1078+            m_recv_garbage = {};
    1079+            break;
    1080+        case RecvState::VERSION:
    1081+            if (!ignore) {
    1082+                // Version message received; transition to application phase. The contents is
    


    mzumsande commented at 6:56 pm on August 30, 2023:
    typo: content

    sipa commented at 8:38 pm on August 30, 2023:
    BIP324 calls it contents.

    mzumsande commented at 8:48 pm on August 31, 2023:
    okay then, can be resolved.
  73. mzumsande commented at 8:03 pm on August 30, 2023: contributor
    Concept ACK - I reviewed only the first two commits so far in-depth.
  74. sipa force-pushed on Aug 30, 2023
  75. DrahtBot added the label CI failed on Aug 31, 2023
  76. DrahtBot removed the label CI failed on Aug 31, 2023
  77. in src/net.h:435 in 9d496e2d53 outdated
    431+     *  as an empty version packet contents is interpreted as no extensions present. */
    432+    static constexpr std::array<std::byte, 0> VERSION_CONTENTS = {};
    433+
    434+    /** State type that defines the current contents of the receive buffer and/or how the next
    435+     *  received bytes added to it will be interpreted. */
    436+    enum class RecvState {
    


    vasild commented at 7:40 am on August 31, 2023:

    From your comment #28196 (comment)

    … both the sender side and receiver side of V2Transport are state machines that are transitioned through, with each state corresponding to the meaning of what is in the respective send/receive buffers. The receive state is changed by receiving bytes, or extracting completed messages. The send state is changed by receiving bytes, sending bytes, or being given a message to send.

    That would be useful to have as a comment in the source code, somewhere around RecvState or SendState.


    sipa commented at 7:15 pm on August 31, 2023:
    I’ve added an overall comment above enum class RecvState.
  78. in src/net.h:303 in 9d496e2d53 outdated
    298@@ -300,7 +299,8 @@ class Transport {
    299      *  - Span<const uint8_t> to_send: span of bytes to be sent over the wire (possibly empty).
    300      *  - bool more: whether there will be more bytes to be sent after the ones in to_send are
    301      *    all sent (as signaled by MarkBytesSent()).
    302-     *  - const std::string& m_type: message type on behalf of which this is being sent.
    303+     *  - const std::string& m_type: message type on behalf of which this is being sent
    304+     *    ("" for bytes that are not on behalf of any message).
    


    vasild commented at 8:02 am on August 31, 2023:

    What about using std::optional to represent “not on behalf of any message” instead of ""?

    0    using BytesToSend = std::tuple<
    1        Span<const uint8_t> /*to_send*/,
    2        bool /*more*/,
    3        const std::optional<std::string>& /*m_type*/
    4    >;
    5...
    6    /** Type of the message being sent. */
    7    std::optional<std::string> m_send_type GUARDED_BY(m_send_mutex);
    

    sipa commented at 5:45 pm on August 31, 2023:
    The m_type value is really only used to control the breakdown statistics of bytes sent/received per message type in the getpeerinfo RPC. We need string categories there, so it seems to me really the question is what category name to use for bytes not assignable to a specific message type. "" seems to be as good as any, so all this suggestion would entail is having a more complex m_type in BytesToSend, and logic in the RPC code to turn std::nullopt into "". I don’t think that improves much.
  79. in src/net.h:441 in 9d496e2d53 outdated
    423+class V2Transport final : public Transport
    424+{
    425+public:
    426+    static constexpr uint32_t MAX_GARBAGE_LEN = 4095;
    427+
    428+private:
    


    vasild commented at 9:43 am on August 31, 2023:
    Consider this: https://google.github.io/styleguide/cppguide#Declaration_Order, personally I find it easier to read the interface if that order is followed. Or at least I would avoid two public: sections.

    sipa commented at 7:07 pm on August 31, 2023:
    I’ve merged the two public: sections.
  80. in src/net.h:555 in 9d496e2d53 outdated
    551+    std::array<uint8_t, 12> m_v1_prefix = {0, 0, 0, 0, 0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x00};
    552+
    553+    /** Lock for receiver-side fields. */
    554+    mutable Mutex m_recv_mutex;
    555+    /** In {GARBAUTH, VERSION, APP}, the decrypted packet length, if
    556+     *  m_recv_buffer.size() >= BIP324::LENGTH_LEN). Unspecified otherwise. */
    


    vasild commented at 9:47 am on August 31, 2023:

    typo and an extra ):

    0     *  m_recv_buffer.size() >= BIP324Cipher::LENGTH_LEN. Unspecified otherwise. */
    

    sipa commented at 7:15 pm on August 31, 2023:
    Fixed.
  81. in src/net.h:564 in 9d496e2d53 outdated
    560+    /** During GARBAUTH, the garbage received during GARB_GARBTERM. */
    561+    std::vector<uint8_t> m_recv_garbage GUARDED_BY(m_recv_mutex);
    562+    /** Buffer to put decrypted contents in, for converting to CNetMessage. */
    563+    std::vector<uint8_t> m_recv_decode_buffer GUARDED_BY(m_recv_mutex);
    564+    /** Deserialization type. */
    565+    int m_recv_type GUARDED_BY(m_recv_mutex);
    


    vasild commented at 9:58 am on August 31, 2023:

    Can be const?

    0    const int m_recv_type GUARDED_BY(m_recv_mutex);
    

    Same for m_recv_version.


    sipa commented at 7:15 pm on August 31, 2023:
    Done. Also dropped the GUARDED_BY(m_recv_mutex) which is not needed for immutable variables.
  82. in src/net.h:639 in 9d496e2d53 outdated
    595+public:
    596+
    597+    /** Construct a V2 transport with securely generated random keys. */
    598+    V2Transport(NodeId nodeid, bool initiating, int type_in, int version_in) noexcept;
    599+    /** Construct a V2 transport with specified keys and garbage (test use only). */
    600+    V2Transport(NodeId nodeid, bool initiating, int type_in, int version_in, const CKey& key, Span<const std::byte> ent32, Span<const uint8_t> garbage) noexcept;
    


    vasild commented at 10:05 am on August 31, 2023:
    In some places uint8_t is used and in some other std::byte. I guess they are used interchangeably? Maybe for consistency use just one of them everywhere.

    sipa commented at 5:57 pm on August 31, 2023:

    The conflict is due to the fact that most of the cryptography code (at least the part used by V2Transport, directly and indirectly) has been converted to be entirely Span<std::byte>-based, while the network/serialization code is still mostly (Span<uint8_t> and (uint8_t* data, size_t size)-based). In V2Transport we interact with both, which means some silly conversions.

    I’ve opted not to try to convert the network/serialize code here to Span<std::byte> (as that’s a bigger, and probably longer-term effort, though I do expect that to happen eventually), and as a result the internal V2Transport variables that are closer to that side also use uint8_t*. They’re indeed interchangeable (even by the C++ spec: specifically only char, unsigned char, and std::byte pointers are allowed to be used to access data of any data type), but until the whole codebase this interacts with is std::byte-based, there will be some conversions.

  83. in src/net.cpp:1096 in 9d496e2d53 outdated
    1029+        return m_v1_prefix.size() - m_recv_buffer.size();
    1030+    case RecvState::KEY:
    1031+        // As long as we have not received the other side's public key, don't receive more than
    1032+        // that (64 bytes), as garbage follows, and locating the garbage terminator requires the
    1033+        // key exchange first.
    1034+        return EllSwiftPubKey::size() - m_recv_buffer.size();
    


    vasild commented at 10:59 am on August 31, 2023:
    It is not immediately obvious whether these subtractions can go negative. Should this be handled? Or if it cannot happen, then an assert/assume?

    sipa commented at 7:16 pm on August 31, 2023:
    It’s impossible to hit, because GetMaxBytesToProcess only lets in as much as needed to reach the end of the buffer. I’ve added comments and Assumes.
  84. in src/net.h:571 in 9d496e2d53 outdated
    567+    int m_recv_version GUARDED_BY(m_recv_mutex);
    568+    /** Current receiver state. */
    569+    RecvState m_recv_state GUARDED_BY(m_recv_mutex);
    570+
    571+    /** Lock for sending-side fields. */
    572+    mutable Mutex m_send_mutex;
    


    vasild commented at 11:39 am on August 31, 2023:

    Maybe expand the comments with something like this:

    To avoid deadlocks, if both m_recv_mutex and m_send_mutex have to be locked at the same time, always lock m_recv_mutex first. I.e. when locking m_recv_mutex, make sure that m_send_mutex is not already locked by the calling thread.


    sipa commented at 7:17 pm on August 31, 2023:
    It turns out that we actually have locking annotations for that. I’ve added ACQUIRED_BEFORE(m_send_mutex) to m_recv_mutex, and ACQUIRED_AFTER(m_recv_mutex) to m_send_mutex, together with a shortened version of your suggested comment.
  85. in src/net.cpp:1106 in 9d496e2d53 outdated
    1101+
    1102+        // Initialize the ciphers.
    1103+        std::array<std::byte, EllSwiftPubKey::size()> ellswift_data;
    1104+        std::copy(m_recv_buffer.begin(), m_recv_buffer.end(), UCharCast(ellswift_data.data()));
    1105+        LOCK(m_send_mutex);
    1106+        m_cipher.Initialize(EllSwiftPubKey{ellswift_data}, m_initiating);
    


    vasild commented at 11:47 am on August 31, 2023:
    This is doing an extra copy of the key - first from the buffer to the array and then from the array to ~m_cipher.m_pubkey~ EllSwiftPubKey::m_pubkey. Is it possible to avoid that? Is it possible to initialize m_cipher.m_pubkey directly from the buffer? I think it is ok to take vector in EllSwiftPubKey and document that its size must be EllSwiftPubKey::size() and assert that it is indeed.

    sipa commented at 7:17 pm on August 31, 2023:
    Fixed in an extra preparatory commit (allowing Span<const std::byte> argument for EllSwiftPubKey constructor.
  86. in src/net.cpp:1109 in 9d496e2d53 outdated
    1104+        std::copy(m_recv_buffer.begin(), m_recv_buffer.end(), UCharCast(ellswift_data.data()));
    1105+        LOCK(m_send_mutex);
    1106+        m_cipher.Initialize(EllSwiftPubKey{ellswift_data}, m_initiating);
    1107+
    1108+        // Switch receiver state to GARB_GARBTERM.
    1109+        m_recv_state = RecvState::GARB_GARBTERM;
    


    vasild commented at 11:52 am on August 31, 2023:

    It would be good to enforce the state machine allowed transitions - in all places that change the state, assert that such transition is allowed. In this case: assert(m_recv_state == RecvState::KEY). Or maybe have m_recv_state be a class that enforces correctness internally and is then called like:

    0m_recv_state.ChangeTo(RecvState::GARB_GARBTERM);
    

    sipa commented at 7:18 pm on August 31, 2023:
    A wrapper felt like overkill to me, but I’ve added SetReceiveState and SetSendState functions for effecting state transitions, which enforce the allowed ones.
  87. in src/net.cpp:1192 in 9d496e2d53 outdated
    1144+            // Garbage terminator received. Switch to receiving garbage authentication packet.
    1145+            m_recv_garbage = std::move(m_recv_buffer);
    1146+            m_recv_garbage.resize(m_recv_garbage.size() - BIP324Cipher::GARBAGE_TERMINATOR_LEN);
    1147+            m_recv_buffer.clear();
    1148+            m_recv_state = RecvState::GARBAUTH;
    1149+        } else if (m_recv_buffer.size() == MAX_GARBAGE_LEN + BIP324Cipher::GARBAGE_TERMINATOR_LEN) {
    


    vasild commented at 12:24 pm on August 31, 2023:
    What about changing that == to >=? Otherwise, if it somehow happens that the buffer is larger, then this safety check will never catch it and it will keep receiving “forever”.

    sipa commented at 7:12 pm on August 31, 2023:

    This is something I learned in college, though I’ve never seen it repeated elsewhere: when you have conditional code, make the conditional as narrow as necessary. The reason is that in general, if you let conditionals apply to states that shouldn’t be hit at all, they’ll hide the issue; the alternative generally leads to more obvious failures.

    I think it applies here: if we’d use >=, and we ever ended up in a > state, that would imply we’re not actually testing for the garbage terminator after every byte, possibly leading to hard-to-discover connection failures. With just ==, it’ll keep receiving forever, likely stalling, or crashing, with an obvious too-large receive buffer.

    Of course, we can do even better: add an Assume that the receive buffer size never exceeds MAX_GARBAGE_LEN + BIP324Cipher::GARBAGE_TERMINATOR_LEN in this state, which is what I’ve added here.

  88. in src/net.cpp:1148 in 9d496e2d53 outdated
    1143+        if (MakeByteSpan(m_recv_buffer).last(BIP324Cipher::GARBAGE_TERMINATOR_LEN) == m_cipher.GetReceiveGarbageTerminator()) {
    1144+            // Garbage terminator received. Switch to receiving garbage authentication packet.
    1145+            m_recv_garbage = std::move(m_recv_buffer);
    1146+            m_recv_garbage.resize(m_recv_garbage.size() - BIP324Cipher::GARBAGE_TERMINATOR_LEN);
    1147+            m_recv_buffer.clear();
    1148+            m_recv_state = RecvState::GARBAUTH;
    


    vasild commented at 12:28 pm on August 31, 2023:
    Would be nice if at the start of the function, or just before m_recv_state = ... there is something like assert(current state is xyz);

    sipa commented at 7:14 pm on August 31, 2023:
    Done, at the beginning of all ProcessReceived* functions.
  89. in src/net.cpp:1225 in 9d496e2d53 outdated
    1162+    if (m_recv_buffer.size() == BIP324Cipher::LENGTH_LEN) {
    1163+        // Length descriptor received.
    1164+        m_recv_len = m_cipher.DecryptLength(MakeByteSpan(m_recv_buffer));
    1165+        if (m_recv_len > MAX_SIZE + 1 + CMessageHeader::COMMAND_SIZE ||
    1166+            m_recv_len > MAX_PROTOCOL_MESSAGE_LENGTH + 1 + CMessageHeader::COMMAND_SIZE) {
    1167+            LogPrint(BCLog::NET, "V2 transport error: packet too large (%u bytes), peer=%d\n", m_recv_len, m_nodeid);
    


    vasild commented at 12:33 pm on August 31, 2023:
    0        m_recv_len = m_cipher.DecryptLength(MakeByteSpan(m_recv_buffer));
    1        const auto max = std::min(MAX_SIZE, MAX_PROTOCOL_MESSAGE_LENGTH) + 1 /* what is this? */ + CMessageHeader::COMMAND_SIZE;
    2        if (m_recv_len > max) {
    3            LogPrint(BCLog::NET, "V2 transport error: packet too large (%u bytes > %u), peer=%d\n", m_recv_len, max, m_nodeid);
    

    sipa commented at 7:14 pm on August 31, 2023:
    Done (using a static constexpr size_t constant with comments higher up).
  90. in src/net.cpp:1228 in 9d496e2d53 outdated
    1165+        if (m_recv_len > MAX_SIZE + 1 + CMessageHeader::COMMAND_SIZE ||
    1166+            m_recv_len > MAX_PROTOCOL_MESSAGE_LENGTH + 1 + CMessageHeader::COMMAND_SIZE) {
    1167+            LogPrint(BCLog::NET, "V2 transport error: packet too large (%u bytes), peer=%d\n", m_recv_len, m_nodeid);
    1168+            return false;
    1169+        }
    1170+    } else if (m_recv_buffer.size() > BIP324Cipher::LENGTH_LEN && m_recv_buffer.size() == m_recv_len + BIP324Cipher::EXPANSION) {
    


    vasild commented at 12:40 pm on August 31, 2023:

    Is it possible that this branch executes before the one above? If the buffer is less than LENGTH_LEN (3) and then on the next invocation of this method, it is greater than LENGTH_LEN (3), equal to EXPANSION (20)?

    Edit: that’s not possible currently. It relies on

    1. this method being called only if m_recv_state is one of GARBAUTH, VERSION, APP and
    2. GetMaxBytesToProcess() working correctly

    that’s a bit remote from the point of view inside this function. Maybe add assert or assume to ensure that?


    sipa commented at 7:12 pm on August 31, 2023:
    I’m not sure how I’d assert for it, but I’ve added a comment.
  91. in src/net.cpp:1518 in 9d496e2d53 outdated
    1373+    } else {
    1374+        // Initialize with zeroes, and then write the message type string starting at offset 0.
    1375+        // This means contents[0] and the unused positions in contents[1..13] remain 0x00.
    1376+        contents.resize(1 + CMessageHeader::COMMAND_SIZE + msg.data.size(), 0);
    1377+        std::copy(msg.m_type.begin(), msg.m_type.end(), reinterpret_cast<char*>(contents.data() + 1));
    1378+        std::copy(msg.data.begin(), msg.data.end(), contents.begin() + 1 + CMessageHeader::COMMAND_SIZE);
    


    vasild commented at 1:16 pm on August 31, 2023:

    The cast seems unnecessary?

    0        std::copy(msg.m_type.begin(), msg.m_type.end(), contents.data() + 1);
    1        std::copy(msg.data.begin(), msg.data.end(), contents.begin() + 1 + CMessageHeader::COMMAND_SIZE);
    

    sipa commented at 7:13 pm on August 31, 2023:
    Done.
  92. vasild commented at 1:28 pm on August 31, 2023: contributor
    Reviewed up to 1a8892201e (incl). Posting review midway. Mostly minor stuff in the comments below.
  93. in src/net.h:572 in c3902f3f1d outdated
    539@@ -538,6 +540,9 @@ class V2Transport final : public Transport
    540     const bool m_initiating;
    541     /** NodeId (for debug logging). */
    542     const NodeId m_nodeid;
    543+    /** Whether the send/receive states are V1. This is an optimization allowing fallback to
    544+     *  (typically) work without the m_cs_send or m_cs_recv locks. */
    


    mzumsande commented at 5:58 pm on August 31, 2023:
    name changed: m_recv_mutex / m_send_mutex - same a few lines above for the V1 states.

    sipa commented at 8:50 pm on August 31, 2023:
    Resolved by dropping the commit.
  94. in src/net.h:573 in c3902f3f1d outdated
    539@@ -538,6 +540,9 @@ class V2Transport final : public Transport
    540     const bool m_initiating;
    541     /** NodeId (for debug logging). */
    542     const NodeId m_nodeid;
    543+    /** Whether the send/receive states are V1. This is an optimization allowing fallback to
    544+     *  (typically) work without the m_cs_send or m_cs_recv locks. */
    545+    std::atomic<bool> m_use_v1{false};
    


    mzumsande commented at 6:18 pm on August 31, 2023:
    Does the m_use_v1 optimization result in a meaningful speedup? My first thought was that the V1 functions are so similar to the respective V2 functions in terms of locking (have analogous m_recv_mutex/m_send_mutex locks for analogous functions) so if there would be lock contention at the V2 level that m_use_v1 avoids, we’d now just have the same contention for the analogous locks one level below at V1Transport instead, resulting in no real performance improvement.

    sipa commented at 8:37 pm on August 31, 2023:
    That sounds right; there will really only be contention on one of the two levels, and uncontended mutex grabs are in the 10s of nanonseconds I believe. I’ll just drop this commit.
  95. sipa force-pushed on Aug 31, 2023
  96. sipa commented at 7:21 pm on August 31, 2023: member
    I’ve addressed most of @vasild’s comments above, leading to some substantial changes in the “net: add V2Transport class with subset of BIP324 functionality” commit.
  97. sipa force-pushed on Aug 31, 2023
  98. in src/net.h:572 in 8bd5aa1a10 outdated
    568+    /** NodeId (for debug logging). */
    569+    const NodeId m_nodeid;
    570+    /** Encapsulate a V1Transport to fall back to. */
    571+    V1Transport m_v1_fallback;
    572+    /** V1 prefix to look for (4-byte network magic + "version\x00"; magic will be filled in). */
    573+    std::array<uint8_t, 12> m_v1_prefix = {0, 0, 0, 0, 0x76, 0x65, 0x72, 0x73, 0x69, 0x6f, 0x6e, 0x00};
    


    vasild commented at 1:14 pm on September 1, 2023:

    Maybe this is a bit more readable and easy to verify that it is correct:

    0    std::array<uint8_t, 12> m_v1_prefix = {0, 0, 0, 0, 'v', 'e', 'r', 's', 'i', 'o', 'n', 0x00};
    

    sipa commented at 5:45 pm on September 1, 2023:
    Fixed.
  99. in src/net.cpp:1085 in 8bd5aa1a10 outdated
    1140+    AssertLockNotHeld(m_send_mutex);
    1141+    Assume(m_recv_state == RecvState::KEY_MAYBE_V1);
    1142+    // We still have to determine if this is a v1 or v2 connection. The bytes being received could
    1143+    // be the beginning of either a v1 packet (network magic + "version\x00"), or of a v2 public key.
    1144+    assert(m_recv_buffer.size() <= m_v1_prefix.size());
    1145+    if (!std::equal(m_recv_buffer.begin(), m_recv_buffer.end(), m_v1_prefix.begin())) {
    


    vasild commented at 1:26 pm on September 1, 2023:
    Here we assume that a public key will never start with magic+“version”. Can it? If the pubkey is completely random bytes, then the chance of that happening is 1 in 25612 = 296.

    sipa commented at 5:15 pm on September 1, 2023:
    Indeed, exactly. See also footnote 9 in BIP324.
  100. in src/net.cpp:1053 in 8bd5aa1a10 outdated
    1046+
    1047+void V2Transport::SetSendState(SendState send_state) noexcept
    1048+{
    1049+    AssertLockHeld(m_send_mutex);
    1050+    // No-op if no change is desired.
    1051+    if (send_state == m_send_state) return;
    


    vasild commented at 1:31 pm on September 1, 2023:
    Is that allowed only for KEY_GARB -> KEY_GARB for sender? And no X -> X for the receive state?

    sipa commented at 5:13 pm on September 1, 2023:
    It’s possible to outlaw those X -> X transitions, but I don’t think that accomplishes much. Conceptually, staying in the same state is always allowed; it just so happens that in many cases the current code never invokes this function in cases where no state change is needed.

    sipa commented at 7:14 pm on September 8, 2023:
    For posterity, the ability to transition from a state to itself is gone entirely.
  101. in src/net.cpp:1152 in 8bd5aa1a10 outdated
    1147+        SetReceiveState(RecvState::KEY); // Convert to KEY state, leaving received bytes around.
    1148+        // Transition the sender to KEY_GARB state (if not already).
    1149+        LOCK(m_send_mutex);
    1150+        SetSendState(SendState::KEY_GARB);
    1151+    } else if (m_recv_buffer.size() == m_v1_prefix.size()) {
    1152+        // Full match with the v2 prefix, so fall back to v1 behavior.
    


    vasild commented at 1:34 pm on September 1, 2023:
    0        // Full match with the v1 prefix, so fall back to v1 behavior.
    

    sipa commented at 6:13 pm on September 1, 2023:
    Fixed.
  102. in src/net.cpp:1157 in 8bd5aa1a10 outdated
    1152+        // Full match with the v2 prefix, so fall back to v1 behavior.
    1153+        LOCK(m_send_mutex);
    1154+        Span<const uint8_t> feedback{m_recv_buffer};
    1155+        bool ret = m_v1_fallback.ReceivedBytes(feedback); // Feed already received bytes to v1 transport.
    1156+        assert(feedback.empty());
    1157+        assert(ret);
    


    vasild commented at 2:09 pm on September 1, 2023:

    This looks too dangerous/fragile to me. feedback is constructed from m_recv_buffer which is provided by the peer and if ReceivedBytes() does not like it, then we will crash. Earlier we have checked that its size equals m_v1_prefix.size() and even earlier we have checked that the first bytes of m_recv_buffer and m_v1_prefix equal. Or if m_v1_fallback has been used before, it may dislike a legit magic+“version”.

    Maybe construct feedback from m_v1_prefix which has been constructed by us? I know they are equal, but still… @sdaftuar had some concerns elsewhere about asserts in the networking code: #27374 (review). Following that should all asserts in this PR be replaced by dropping the connection? If there is a rare bug in e.g. V2Transport is it better to crash the node or close the connection and continue operation?


    sipa commented at 7:48 pm on September 1, 2023:

    Maybe construct feedback from m_v1_prefix which has been constructed by us? I know they are equal, but still…

    I think that’s overkill. I’ve added some comments explaining why the v1 fallback should always accept this prefix.

    … concerns elsewhere about asserts in the networking code …

    I have changed all the asserts to Assumes. I didn’t do so initially, because I wasn’t aware that Assumes are treated like assertions in fuzz tests, and wanted to make sure the fuzz tests actually had a chance of triggering them. As that doesn’t seem to be the case, there is little reason to keep asserts.

    I’ve also gone over the code added in this PR, trying to find possible places where actually dangerous (e.g. out of bounds access) behavior is guarded by assertions (as those are worthwhile to keep as assertions in production, crashing is better than possibly remotely-triggerable OOB-based UB), but I can’t actually find any. Nearly code I can find that accesses buffers is in conditional branches that test actually the right buffer sizes beforehand.

    Changing some/all of them to instead disconnect would require a bunch of code changes, and because of the above, I don’t think it’s worth it. Keeping a connection open in a weird state, where it’ll very likely not do anything and time out isn’t all that bad compared to disconnecting in case of a bug (and may actually be easier to notice).

  103. in src/test/fuzz/p2p_transport_serialization.cpp:348 in 8bd5aa1a10 outdated
    343+    key_data.resize(32);
    344+    CKey key;
    345+    key.Set(key_data.begin(), key_data.end(), true);
    346+    if (!key.IsValid()) return {};
    347+    // Construct garbage
    348+    size_t garb_len = provider.ConsumeIntegralInRange<size_t>(0, 4095);
    


    vasild commented at 3:39 pm on September 1, 2023:
    0    size_t garb_len = provider.ConsumeIntegralInRange<size_t>(0, V2Transport::MAX_GARBAGE_LEN);
    

    sipa commented at 6:12 pm on September 1, 2023:
    Fixed.
  104. in src/test/fuzz/p2p_transport_serialization.cpp:364 in 8bd5aa1a10 outdated
    359+    // Retrieve entropy
    360+    auto ent = provider.ConsumeBytes<std::byte>(32);
    361+    ent.resize(32);
    362+    // Use as entropy SHA256(ent || garbage). This prevents a situation where the fuzzer manages to
    363+    // include the garbage terminator (which is a function of both ellswift keys) in the garbage.
    364+    // This is entremely unlikely (~2^-116) with random keys/garbage, but the fuzzer can choose
    


    vasild commented at 3:43 pm on September 1, 2023:
    0    // This is extremely unlikely (~2^-116) with random keys/garbage, but the fuzzer can choose
    

    sipa commented at 6:12 pm on September 1, 2023:
    Fixed.
  105. in src/net.cpp:1346 in 8bd5aa1a10 outdated
    1321+        } else if (m_recv_state == RecvState::GARBAUTH || m_recv_state == RecvState::VERSION ||
    1322+            m_recv_state == RecvState::APP) {
    1323+            // During states where a packet is being received, as much as is expected but never
    1324+            // more than MAX_RESERVE_AHEAD bytes in addition to what is received so far.
    1325+            size_t alloc_add = std::min(max_read, msg_bytes.size() + MAX_RESERVE_AHEAD);
    1326+            m_recv_buffer.reserve(m_recv_buffer.size() + alloc_add);
    


    vasild commented at 3:53 pm on September 1, 2023:

    Why do this? To reduce the amount of m_recv_buffer resizes?

    How did you choose 250000?

    When would max_read be larger than msg_bytes.size() + MAX_RESERVE_AHEAD?


    sipa commented at 5:39 pm on September 1, 2023:

    Why do this? To reduce the amount of m_recv_buffer resizes?

    Most std::vector implementations will double their memory when they resize beyond their bounds. That’s a problem, because without explicitly controlling the preallocation to prevent that it means peers can trivially cause multiple megabytes of allocated but unused memory, without even needing to send bytes to fill them.

    I’ve added some extra comments to the code.

    How did you choose 250000?

    Same as V1Transport (see V1Transport::readData). I just realized it’s 256 * 1024 there, I’ll use the same here.

    When would max_read be larger than msg_bytes.size() + MAX_RESERVE_AHEAD?

    For example, when after receiving the first 1000 bytes of a 2 MB packet. max_read will be ~2000000, but msg_bytes.size() + MAX_RESERVE_AHEAD will be 251000.

  106. in src/test/net_tests.cpp:1282 in 8bd5aa1a10 outdated
    1277+    /** Schedule an encrypted packet with specified message type and payload to be sent to
    1278+     *  transport (only after ReceiveKey). */
    1279+    void SendMessage(std::string mtype, Span<const uint8_t> payload)
    1280+    {
    1281+        // Construct contents consisting of 0x00 + 12-byte message type + payload.
    1282+        std::vector<uint8_t> contents(13 + payload.size());
    


    vasild commented at 4:16 pm on September 1, 2023:
    Would be nice to avoid the magic numbers 12 and 13.

    sipa commented at 6:34 pm on September 1, 2023:
    Fixed (in the code; I’ve left them in the comments as I think making the actual values known is also useful for the reader).
  107. in src/test/net_tests.cpp:1340 in 8bd5aa1a10 outdated
    1321+        tester.SendVersion();
    1322+        ret = tester.Interact();
    1323+        BOOST_CHECK(ret && ret->empty());
    1324+        tester.ReceiveGarbage();
    1325+        tester.ReceiveVersion();
    1326+        auto msg_data_1 = g_insecure_rand_ctx.randbytes<uint8_t>(InsecureRandRange(100000));
    


    vasild commented at 4:19 pm on September 1, 2023:
    Since this unit test uses random numbers, it may happen that if it fails/crashes, then it does not do that if run again. How would one go to reproduce a test failure?

    sipa commented at 7:44 pm on September 1, 2023:
    The test RNG seed is printed out, and can be provided back using the RANDOM_CTX_SEED env variable. The ellswift keys used in the unit test come from the high-quality RNG and not the test RNG though, but I don’t expect those to actually influence the kind of failures this type of test would trigger.

    vasild commented at 2:45 pm on September 4, 2023:

    Right, I see now. To print the seed one has to use -printtoconsole=1:

    0$ ./src/test/test_bitcoin --run_test=net_tests/v2transport_test -- -printtoconsole=1
    1Running 1 test case...
    22023-09-04T13:44:27Z Seed: Setting random seed for current tests to RANDOM_CTX_SEED=e99d422b1c8107ef36cbf5bdb18326ebfbe0e3d311d31f0b5a74fb50ada6d18e
    3...
    

    -printtoconsole=1 seems to not be used on CI: https://cirrus-ci.com/task/5714938337951744?logs=ci#L3390

    What about non-deterministic failure that happens on CI?


    sipa commented at 4:19 pm on September 4, 2023:
    @vasild Heh, that sounds like a problem we may want to fix in general outside this PR.

    vasild commented at 11:47 am on September 6, 2023:

    (offtopic)

    in multiprocess, i686, DEBUG the seed it printed: https://cirrus-ci.com/task/5770424366137344?logs=ci#L3855

    0/bin/bash: line 1: 25146 Aborted                 (core dumped) test/test_bitcoin --catch_system_errors=no -l test_suite -t "$( cat test/net_tests.cpp | grep -E "(BOOST_FIXTURE_TEST_SUITE\\(|BOOST_AUTO_TEST_SUITE\\()" | cut -d '(' -f 2 | cut -d ',' -f 1 | cut -d ')' -f 1 )" -- DEBUG_LOG_OUT > "$TEST_LOGFILE" 2>&1
    1...
    2test/net_tests.cpp(1311): Entering test case "v2transport_test"
    32023-09-06T04:19:04.842541Z [test] [test/util/random.cpp:31] [Seed] Seed: Setting random seed for current tests to RANDOM_CTX_SEED=c36f0537baafdf3bc691d89186094681ace7c42292794ea4d00fa784c3b4957b
    4...
    52023-09-06T04:19:05.182858Z [test] [net.cpp:1237] [ProcessReceivedPacketBytes] [net]net.cpp:1129 ProcessReceivedKeyBytes: Assertion `m_recv_buffer.size() <= OFFSET + MATCH.size()' failed.
    

    In previous releases, qt5 dev package and depends packages, DEBUG it is not: https://cirrus-ci.com/task/4785261947650048?logs=ci#L3042

    0bash -c ' DIR_UNIT_TEST_DATA=/ci_container_base/ci/scratch/qa-assets/unit_test_data/ LD_LIBRARY_PATH=/ci_container_base/depends/x86_64-pc-linux-gnu/lib /ci_container_base/ci/scratch/out/bin/test_bitcoin --catch_system_errors=no -l test_suite'
    1...
    2test/net_tests.cpp(1311): Entering test case "v2transport_test"
    3net.cpp:1129 ProcessReceivedKeyBytes: Assertion `m_recv_buffer.size() <= OFFSET + MATCH.size()' failed.
    4/ci_container_base/ci/test/06_script_b.sh: line 170: 22342 Aborted                 bash -c "${TEST_RUNNER_ENV} DIR_UNIT_TEST_DATA=${DIR_UNIT_TEST_DATA} LD_LIBRARY_PATH=${DEPENDS_DIR}/${HOST}/lib ${BASE_OUTDIR}/bin/test_bitcoin --catch_system_errors=no -l test_suite"
    

    @MarcoFalke do you have an idea if this is intentional? Any downsides to also use DEBUG_LOG_OUT in previous releases, qt5 dev package and depends packages, DEBUG?


    maflcko commented at 12:11 pm on September 6, 2023:
    Yeah, could be an oversight in RUN_UNIT_TESTS_SEQUENTIAL. Should be easy to fix there.

    maflcko commented at 12:13 pm on September 6, 2023:

    The basic idea is that the two files must specify the same config:

    0$ git grep catch_syst src/Makefile.test.include ./ci/
    1ci/test/06_script_b.sh:  bash -c "${TEST_RUNNER_ENV} DIR_UNIT_TEST_DATA=${DIR_UNIT_TEST_DATA} LD_LIBRARY_PATH=${DEPENDS_DIR}/${HOST}/lib ${BASE_OUTDIR}/bin/test_bitcoin --catch_system_errors=no -l test_suite"
    2src/Makefile.test.include:      $(TEST_BINARY) --catch_system_errors=no -l test_suite -t "$$(\
    

    sipa commented at 8:38 pm on September 8, 2023:
    In #28433 I’m making part of the unit tests’ key material be generated by the test RNG (as a side-effect of other changes, but it’s relevant here).

    maflcko commented at 3:17 pm on September 10, 2023:

    The basic idea is that the two files must specify the same config:

    FWIW, I won’t be working on this fix. Maybe something to keep in mind for the cmake migration and fix there?


    vasild commented at 7:50 am on September 13, 2023:
    Opened #28466 so that this does not get forgotten.
  108. vasild approved
  109. vasild commented at 4:22 pm on September 1, 2023: contributor

    ACK 8bd5aa1a1084c5f4c34cba85506661338e8e91ea (I need to re-review the unit test from the last commit more carefully)

    Non-blocker comments below.

  110. DrahtBot requested review from vincenzopalazzo on Sep 1, 2023
  111. DrahtBot removed review request from vincenzopalazzo on Sep 1, 2023
  112. DrahtBot requested review from vincenzopalazzo on Sep 1, 2023
  113. in src/net.h:636 in fb3ec8fd9e outdated
    582+
    583+public:
    584+    static constexpr uint32_t MAX_GARBAGE_LEN = 4095;
    585+
    586+    /** Construct a V2 transport with securely generated random keys. */
    587+    V2Transport(NodeId nodeid, bool initiating, int type_in, int version_in) noexcept;
    


    Sjors commented at 6:10 pm on September 1, 2023:

    fb3ec8fd9e96f6c2d16db02fd2d683ea097a90ff: Maybe repeat here, in the public interface, that initiating means Whether we are the initiator side..

    I’m a bit confused about the other two arguments. These refer to our internal DataStream types and versions? Not something related to the v2 protocol?


    sipa commented at 7:31 pm on September 1, 2023:

    I’ve added doxygen comments about the parameters.

    The type/version thing will hopefully go away everywhere soon (see #25284).

  114. in src/net.h:425 in fb3ec8fd9e outdated
    418@@ -417,6 +419,187 @@ class V1Transport final : public Transport
    419     size_t GetSendMemoryUsage() const noexcept override EXCLUSIVE_LOCKS_REQUIRED(!m_send_mutex);
    420 };
    421 
    422+class V2Transport final : public Transport
    423+{
    424+private:
    425+    /** Contents of the version packet to send. BIP324 stipulates this is supposed to be empty,
    


    Sjors commented at 6:27 pm on September 1, 2023:
    fb3ec8fd9e96f6c2d16db02fd2d683ea097a90ff: stipulates that senders should leave this empty, and receivers should ignore it.

    sipa commented at 7:36 pm on September 1, 2023:
    Done; that’s more concise.
  115. in src/net.h:507 in fb3ec8fd9e outdated
    502+        /** Public key.
    503+         *
    504+         * This is the initial state. The public key is sent out. When the receiver
    505+         * receives the other side's public key and transitions to GARB_GARBTERM, the sender state
    506+         * becomes KEY_GARBTERM_GARBAUTH_VERSION. The key is left in the send buffer when this
    507+         * happens, because it may not have been fully sent out yet. */
    


    Sjors commented at 7:02 pm on September 1, 2023:
    fb3ec8fd9e96f6c2d16db02fd2d683ea097a90ff: why not remain in the KEY state until we’ve completely sent out the public key? And only then transition to GARBTERM_GARBAUTH_VERSION?

    sipa commented at 7:38 pm on September 1, 2023:
    There’s nothing wrong with that approach, but it’s a bit more complex I think. Because it’d mean the transition can happen under two possible events (the last bytes of the peer’s key are received while our key is sent already, and the last bytes of our key get sent while all bytes of the peer’s key are received already).

    Sjors commented at 3:00 pm on September 4, 2023:

    I found the description quite hard to understand. Conceptually it seems easier to understand “We sent our key AND we have their key”, i.e. it’s waiting for both to finish.

    Though I might be understanding that wrong, given #28196 (comment)


    sipa commented at 4:10 pm on September 4, 2023:

    @Sjors I’ve coded it up (removing the key and garbage from the send buffer after sending them, so the next state only has garbage terminator, garbage authentication, version): https://github.com/sipa/bitcoin/commits/202309_bip324_split_sendstate. Having done so, the code complexity increase isn’t too bad, but there is one significant other downside I had not anticipated: it makes GetBytesToSend and MarkBytesSent additionally grab the receive lock (instead of just the send lock). On that basis alone, I think the current approach is better.

    Though I might be understanding that wrong, given #28196 (comment)

    Perhaps the difficulty lies here: the garbage is sent before the other side’s key is received, but the garbage terminator (and authentication) are only sent afterwards. At the time the garbage is sent (after sending our key, but potentially before receiving the other side’s key), the encryption ciphers are not initialized yet, and thus a decoy packet can’t be sent yet.


    Sjors commented at 5:30 pm on September 4, 2023:
    Perhaps a more natural border would be KEY_GARB -> GARBTERM_GARBAUTH_VERSION then? But that sounds like what you implemented in the branch. Maybe just leave it as is, I might revisit this after I’ve looked at more of the code.

    sipa commented at 6:16 pm on September 4, 2023:
    Right, that’s exactly what my branch there does: GARBTERM_GARBAUTH_VERSION instead of KEY_GARB_GARBTERM_GARBAUTH_VERSION. The issue is that to determine if the transition can be made when sending out the last of KEY_GARB there we need to check if the other side’s key has been received already, necessitating m_recv_mutex in MarkBytesSent (and in GetBytesToSend for the more result; the latter being something I only noticed because the fuzz test failed without).

    Sjors commented at 8:05 pm on September 4, 2023:
    I suppose the big conceptual difference between our send and receive state machines is that the latter has a clearly defined moment when the key exchange is complete, i.e. once it received the public key bytes. That’s when it performs a Diff-Hellman exchange and sets m_cipher. From the send state machine point of view this happens in the background at some unpredictable point. So it does make sense that’s represented by a state transition there (to KEY_GARBTERM_GARBAUTH_VERSION).

    sipa commented at 5:29 pm on September 5, 2023:
    Marking this as addressed because of #28196 (comment)
  116. Sjors commented at 7:06 pm on September 1, 2023: member

    Studied the state machine a bit, which makes sense to me, except the KEY -> KEY_GARBTERM_GARBAUTH_VERSION transition.

    Running the code (using the main PR). I also looked at the fallback mechanism.

    General protocol question: why are garbage and decoy packets distinct concepts in BIP324? That is, why can’t garbage just be a single instance of a decoy package?

  117. DrahtBot removed review request from vincenzopalazzo on Sep 1, 2023
  118. DrahtBot requested review from vincenzopalazzo on Sep 1, 2023
  119. DrahtBot removed review request from vincenzopalazzo on Sep 1, 2023
  120. DrahtBot requested review from vincenzopalazzo on Sep 1, 2023
  121. DrahtBot removed review request from vincenzopalazzo on Sep 1, 2023
  122. DrahtBot requested review from vincenzopalazzo on Sep 1, 2023
  123. sipa force-pushed on Sep 1, 2023
  124. sipa commented at 7:53 pm on September 1, 2023: member

    Addressed a number of comments by @vasild and @Sjors.

    I’ve also removed the Erlay messages from the short message id list, after observing that having them there makes no difference: if peers send us these, the messages will be ignored, whether they’re in the list or not. After discussing with @mzumsande I think it may also be better to remove them from the BIP for now, but it’s nice to realize that they can be dropped from the code regardless of what happens to the BIP.

  125. sipa commented at 7:57 pm on September 1, 2023: member

    @Sjors

    General protocol question: why are garbage and decoy packets distinct concepts in BIP324? That is, why can’t garbage just be a single instance of a decoy package?

    Good question. The answer is that decoy messages can’t be used yet when garbage is sent, because the keys haven’t been exchanged yet, so packets in general are not available yet. See also BIP324 footnote 8.

  126. in src/net.cpp:1046 in ab9d292e3f outdated
    1041+    Assume(m_recv_state == RecvState::KEY);
    1042+    if (m_recv_buffer.size() == EllSwiftPubKey::size()) {
    1043+        // Other side's key has been fully received.
    1044+
    1045+        // Initialize the ciphers.
    1046+        EllSwiftPubKey ellswift(MakeByteSpan(m_recv_buffer).first(EllSwiftPubKey::size()));
    


    theStack commented at 2:05 am on September 2, 2023:

    nit: as m_recv_buffer has exactly the size of an ellswift pubkey at this point (ensured by the if a few lines above), the .first call isn’t needed:

    0        EllSwiftPubKey ellswift(MakeByteSpan(m_recv_buffer));
    

    sipa commented at 3:39 am on September 2, 2023:
    Nice, done.
  127. in src/net.cpp:1246 in ab9d292e3f outdated
    1131+        if (!ret) {
    1132+            LogPrint(BCLog::NET, "V2 transport error: packet decryption failure (%u bytes), peer=%d\n", m_recv_len, m_nodeid);
    1133+            return false;
    1134+        }
    1135+        // Feed the last 4 bytes of the Poly1305 authentication tag (and its timing) into our RNG.
    1136+        RandAddEvent(ReadLE32(m_recv_buffer.data() + m_recv_buffer.size() - 4));
    


    theStack commented at 2:06 am on September 2, 2023:

    ~nit:~

    0        RandAddEvent(ReadLE32(m_recv_buffer.end() - 4));
    

    (EDIT: oh nevermind, end() returns an iterator, not a pointer, so this won’t work.)


    sipa commented at 3:39 am on September 2, 2023:
    Sadly, indeed.
  128. in src/net.cpp:1479 in ab9d292e3f outdated
    1244+    Assume(m_recv_state == RecvState::APP_READY);
    1245+    Span<const uint8_t> contents{m_recv_decode_buffer};
    1246+    auto msg_type = GetMessageType(contents);
    1247+    CDataStream ret(m_recv_type, m_recv_version);
    1248+    CNetMessage msg{std::move(ret)};
    1249+    msg.m_raw_message_size = m_recv_decode_buffer.size() + BIP324Cipher::EXPANSION;
    


    theStack commented at 2:19 am on September 2, 2023:
    Should the size of the 3-byte length descriptor also be added here? (at least that would come close to what happens for v1, where HEADER_SIZE is accounted for).

    sipa commented at 3:39 am on September 2, 2023:
    That’s included in EXPANSION. I’ve added a comment to clarify that.
  129. in src/net.h:425 in ab9d292e3f outdated
    418@@ -417,6 +419,194 @@ class V1Transport final : public Transport
    419     size_t GetSendMemoryUsage() const noexcept override EXCLUSIVE_LOCKS_REQUIRED(!m_send_mutex);
    420 };
    421 
    422+class V2Transport final : public Transport
    423+{
    424+private:
    425+    /** Contents of the version packet to send. BIP324 that senders should leave this empty, and
    


    theStack commented at 2:24 am on September 2, 2023:

    missing verb after “BIP324”:

    0    /** Contents of the version packet to send. BIP324 stipulates that senders should leave this empty, and
    

    sipa commented at 3:39 am on September 2, 2023:
    Fixed.
  130. in src/net.cpp:974 in ab9d292e3f outdated
    969+{
    970+    AssertLockHeld(m_send_mutex);
    971+    // No-op if no change is desired.
    972+    if (send_state == m_send_state) return;
    973+    // Enforce allowed state transitions.
    974+    switch (send_state) {
    


    theStack commented at 2:31 am on September 2, 2023:
    nit: in Set{Send,Receive}State, at least for my brain the state transition checks would be slightly easier to grasp if the current state is switch/case-d and the allowed next state(s) are Assumed (i.e. asserting “where can I go to?” rather than “where could I have come from?”).

    sipa commented at 3:40 am on September 2, 2023:
    Agreed, changed. It’s also a bit simpler now actually.
  131. theStack commented at 2:41 am on September 2, 2023: contributor
    Reviewed up to 4810d69306ee214063386f9bfd40f89525482e8b so far, left some non-blocking nits on the way.
  132. sipa force-pushed on Sep 2, 2023
  133. maflcko added the label P2P on Sep 4, 2023
  134. in src/net.h:320 in d67be9ad38 outdated
    319-     * to what is being returned.
    320+     * @param[in] have_next_message controls whether the "more" return value indicates more bytes
    321+     *            to be sent before (have_next_message=false) or after (have_next_message=true) a
    322+     *            potential SetMessageToSend immediately afterwards. It is set by the caller when
    323+     *            they know they have another message ready to send. The have_next_message
    324+     *            argument only affects this "more" return value and nothing else.
    


    Sjors commented at 6:04 pm on September 4, 2023:

    d67be9ad3836bd19cc8ea71a0b99c7d60682dd32: alternative explanation, which I think is more clear (but perhaps wrong):

    have_next_message: the caller knows it has another message ready to send. The have_next_message argument (only) affects the “more” return value, which is used to set the MSG_MORE TCP flag and to trigger optimistic send. With v2 transport more can be false even if vSendMsg has more messages left. This happens before the handshake is complete.

    I found this part confusing: “indicates more bytes to be sent before (have_next_message=false) or after (have_next_message=true) a potential SetMessageToSend immediately afterwards”


    sipa commented at 10:26 pm on September 4, 2023:
    I’ve rephrased things a bit, but I don’t like documenting what the caller will be doing with that value, that’s not what the interface cares about.
  135. in src/net.cpp:3609 in d67be9ad38 outdated
    3012@@ -3006,7 +3013,7 @@ void CConnman::PushMessage(CNode* pnode, CSerializedNetMsg&& msg)
    3013     size_t nBytesSent = 0;
    3014     {
    3015         LOCK(pnode->cs_vSend);
    3016-        const auto& [to_send, _more, _msg_type] = pnode->m_transport->GetBytesToSend();
    3017+        const auto& [to_send, more, _msg_type] = pnode->m_transport->GetBytesToSend(true);
    


    Sjors commented at 6:13 pm on September 4, 2023:

    d67be9ad3836bd19cc8ea71a0b99c7d60682dd32 maybe add a comment:

    0// Check if the transport still has unsent bytes,
    1// and indicate to it that we're about to give it a message to send.
    

    /*have_next_message=*/true


    sipa commented at 10:26 pm on September 4, 2023:
    Done.
  136. in src/net.cpp:3695 in d67be9ad38 outdated
    3026+        // If there was nothing to send before, and there is now (predicted by the "more" value
    3027+        // returned by the GetBytesToSend call above), attempt "optimistic write":
    3028         // because the poll/select loop may pause for SELECT_TIMEOUT_MILLISECONDS before actually
    3029         // doing a send, try sending from the calling thread if the queue was empty before.
    3030-        if (queue_was_empty) {
    3031+        if (queue_was_empty && more) {
    


    Sjors commented at 6:19 pm on September 4, 2023:
    d67be9ad3836bd19cc8ea71a0b99c7d60682dd32: an alternative approach could be to have the v2 transport tell us that it’s blocked (in the pre-handshake scenario): && !blocked.

    sipa commented at 6:49 pm on September 4, 2023:

    So the alternative I considered here is to make GetBytesToSend return a tristate “more”:

    • definitely: the transport itself has more to send after what it’s returning now
    • maybe: the transport has nothing to send after what it’s returning now, but will if you give it another message
    • definitely not: nothing can be sent after what it’s returning now, even if there were another message.

    But I believe that this does not result in simpler code (neither inside GetBytesToSend, nor in SocketSendBytes).


    sipa commented at 3:43 am on September 6, 2023:
    I’ve added an explanation to the GetBytesToSend comment in the have_next_message commit that explains the behavior in terms of a tristate, as it may help some readers who have not followed the discussion here.
  137. in src/net.cpp:1994 in d67be9ad38 outdated
    1332-            const auto& [to_send, _more, _msg_type] = pnode->m_transport->GetBytesToSend();
    1333-            select_send = !to_send.empty() || !pnode->vSendMsg.empty();
    1334+            // once a potential message from vSendMsg is handed to the transport. GetBytesToSend
    1335+            // determines both of these in a single call.
    1336+            const auto& [to_send, more, _msg_type] = pnode->m_transport->GetBytesToSend(!pnode->vSendMsg.empty());
    1337+            select_send = !to_send.empty() || more;
    


    Sjors commented at 6:35 pm on September 4, 2023:

    d67be9ad3836bd19cc8ea71a0b99c7d60682dd32: if we were to switch to an approach of asking if the transport is blocked, this would change to:

    0select_send = !to_send.empty() || (!blocked && !pnode->vSendMsg.empty());
    

    Not sure if that’s really more clear.

  138. Sjors commented at 6:41 pm on September 4, 2023: member
    Some thoughts on d67be9ad3836bd19cc8ea71a0b99c7d60682dd32. It might make sense to directly track if we’re blocked from sending pending the handshake, instead of the indirect approach of setting more. But I’m not convinced that would make things easier.
  139. in src/test/net_tests.cpp:1016 in 59f1dd1395 outdated
    1014+/** A class for scenario-based tests of V2Transport
    1015+ *
    1016+ * Each V2TransportTester encapsulates a V2Transport (the one being tested), and can be told to
    1017+ * interact with it. To do so, it also encapsulates a BIP324Cipher to act as the other side. A
    1018+ * second V2Transport is not used, as doing so would not permit scenarios that involve sending
    1019+ * invalid data, or ones scenarios using BIP324 features that are not implemented on the sending
    


    mzumsande commented at 7:50 pm on September 4, 2023:
    typo: either “ones” or “scenarios”

    sipa commented at 7:29 pm on September 8, 2023:
    Done in #28433.
  140. in src/net.cpp:1116 in 25b0668c72 outdated
    1033+
    1034+void V2Transport::ProcessReceivedKey() noexcept
    1035+{
    1036+    AssertLockHeld(m_recv_mutex);
    1037+    AssertLockNotHeld(m_send_mutex);
    1038+    Assume(m_recv_state == RecvState::KEY);
    


    Sjors commented at 7:56 pm on September 4, 2023:
    25b0668c72f762df474c29e678e4baf69cff506c: maybe also Assume(m_recv_buffer.size() <= EllSwiftPubKey::size())

    sipa commented at 10:26 pm on September 4, 2023:
    Sure, done.
  141. in src/net.cpp:1163 in 25b0668c72 outdated
    1035+{
    1036+    AssertLockHeld(m_recv_mutex);
    1037+    AssertLockNotHeld(m_send_mutex);
    1038+    Assume(m_recv_state == RecvState::KEY);
    1039+    if (m_recv_buffer.size() == EllSwiftPubKey::size()) {
    1040+        // Other side's key has been fully received.
    


    Sjors commented at 7:59 pm on September 4, 2023:
    25b0668c72f762df474c29e678e4baf69cff506c // , and can now be Diffie–Hellman combined with our key.

    sipa commented at 10:26 pm on September 4, 2023:
    Done.
  142. in src/net.cpp:995 in 25b0668c72 outdated
    918+    m_recv_state{RecvState::KEY},
    919+    m_send_state{SendState::KEY}
    920+{
    921+    // Initialize the send buffer with ellswift pubkey.
    922+    m_send_buffer.resize(EllSwiftPubKey::size());
    923+    std::copy(std::begin(m_cipher.GetOurPubKey()), std::end(m_cipher.GetOurPubKey()), MakeWritableByteSpan(m_send_buffer).begin());
    


    Sjors commented at 8:18 pm on September 4, 2023:
    25b0668c72f762df474c29e678e4baf69cff506c: is this the right moment to set garbage to send? Since we might receive the other public key any time, at which point the receiver sets the garbage terminator.

    sipa commented at 2:54 pm on September 5, 2023:
    I think so. For responders, the setting of garbage could be delayed until leaving the MAYBE_V1 state, but it’s more convenient to just put it there at initialization time (otherwise we need to buffer it somewhere else when provided for testing purposes, necessitating a separate field etc.).
  143. in src/net.cpp:1147 in 25b0668c72 outdated
    1045+        m_cipher.Initialize(ellswift, m_initiating);
    1046+
    1047+        // Switch receiver state to GARB_GARBTERM.
    1048+        SetReceiveState(RecvState::GARB_GARBTERM);
    1049+        m_recv_buffer.clear();
    1050+
    


    Sjors commented at 8:21 pm on September 4, 2023:
    25b0668c72f762df474c29e678e4baf69cff506c: this is the very last possible moment to insert some more garbage, but it seems better to have done that at initialisation time.

    sipa commented at 2:55 pm on September 5, 2023:
    No, this is too late. Garbage needs to be sent immediately after sending the key, and possibly before the other side’s key arrives. Its purpose is obscuring the “always sending 64 bytes exactly” pattern. After the other side’s key has been received, we can use decoy packets for traffic shaping (which is a cheaper and more flexible mechanism, but isn’t available before the ciphers are initialized).
  144. in src/net.cpp:1195 in 25b0668c72 outdated
    1085+            m_recv_buffer.clear();
    1086+            SetReceiveState(RecvState::GARBAUTH);
    1087+        } else if (m_recv_buffer.size() == MAX_GARBAGE_LEN + BIP324Cipher::GARBAGE_TERMINATOR_LEN) {
    1088+            // We've reached the maximum length for garbage + garbage terminator, and the
    1089+            // terminator still does not match. Abort.
    1090+            LogPrint(BCLog::NET, "V2 transport error: missing garbage terminator, peer=%d\n", m_nodeid);
    


    Sjors commented at 8:24 pm on September 4, 2023:
    25b0668c72f762df474c29e678e4baf69cff506c: could add fLogIPs handling, but that can wait for a followup.

    sipa commented at 3:13 pm on September 5, 2023:
    Want to suggest some code? I’ve tried to mimick the amount of logging the V1Transport has. If it’s not V2-specific, it’s probably better for a follow-up that adds it to all transports.

    Sjors commented at 7:48 pm on September 7, 2023:

    I used this in #27826, but it requires having access to the whole CNode&, not just m_nodeid:

    0fLogIPs ? strprintf(" peeraddr=%s", peer.addr.ToStringAddrPort()) : ""
    

    Can wait for followup (and at some point should be a helper function).

  145. in src/net.cpp:1232 in 25b0668c72 outdated
    1100+    Assume(m_recv_state == RecvState::GARBAUTH || m_recv_state == RecvState::VERSION ||
    1101+           m_recv_state == RecvState::APP);
    1102+
    1103+    // The maximum permitted contents length for a packet.
    1104+    static constexpr size_t MAX_CONTENTS_LEN =
    1105+        1 + CMessageHeader::COMMAND_SIZE + // The maximum length for encoding the message type.
    


    Sjors commented at 8:29 pm on September 4, 2023:
    25b0668c72f762df474c29e678e4baf69cff506c: what does 1 refer to? The short encoding byte?

    sipa commented at 2:57 pm on September 5, 2023:
    It refers to the 0x00 that precedes the long encoding.

    sipa commented at 3:13 pm on September 5, 2023:
    I’ve expanded the comment.
  146. in src/net.cpp:1104 in 25b0668c72 outdated
    1011+        return 1;
    1012+    case RecvState::GARBAUTH:
    1013+    case RecvState::VERSION:
    1014+    case RecvState::APP:
    1015+        // These three states all involve decoding a packet. Process the length descriptor first,
    1016+        // followed by the ciphertext.
    


    Sjors commented at 8:33 pm on September 4, 2023:

    25b0668c72f762df474c29e678e4baf69cff506c

    0// Process the length descriptor first, so we don't process
    1// bytes that belong to the next message or a decoy. Then
    2// process the known-length ciphertext.
    

    sipa commented at 5:09 pm on September 5, 2023:
    I’ve expanded this comment a bit.
  147. sipa commented at 8:33 pm on September 4, 2023: member

    The recent discussion with @Sjors here made me realize that the send state machine is actually overkill, and could be replaced with a single boolean: are the ciphers initialized. It starts off false, gets set to true when the other side’s key arrives. Whenever the send buffer is fully sent, it is wiped (without state change). A new message can be provided whenever the send buffer is empty and the keys are initialized. Stuff just gets concatenated to whatever is in the send buffer. There is no need for states that keep track of what’s in there - we don’t actually care about that.

    EDIT: the above works, until the V1 fallback detection is added, which complicates things enough that some kind of state machine on the sender side seems better. Going to leave things as-is for now.

  148. in src/net.cpp:1261 in 25b0668c72 outdated
    1143+            break;
    1144+        case RecvState::VERSION:
    1145+            if (!ignore) {
    1146+                // Version message received; transition to application phase. The contents is
    1147+                // ignored, but can be used for future extensions.
    1148+                SetReceiveState(RecvState::APP);
    


    Sjors commented at 8:42 pm on September 4, 2023:
    25b0668c72f762df474c29e678e4baf69cff506c: so what happens if we receive a version message with ignore set? We just get stuck? (unless they send it again with ignore unset) Or should we return false in that case (resulting in a disconnect)?

    sipa commented at 2:58 pm on September 5, 2023:
    Well, if it has the ignore bit set, it is not the version packet but a decoy. So in this case, the peer simply hasn’t sent us the version message yet, and we still have to wait for it. It is identical to them not having sent anything at all: in both cases we need to wait until they do.
  149. sipa force-pushed on Sep 4, 2023
  150. sipa commented at 9:53 pm on September 4, 2023: member

    Ok, in the light of the above I have made some changes still. The former KEY_GARB_GARBTERM_GARBAUTH_VERSION, APP, and APP_READY states have all been merged into a READY state, and some other states have been renamed too.

    My biggest takeaway here is really that the send state shouldn’t (try to) correspond to the contents of what’s in the send buffer, but to what can be added to it and sent from it.

  151. sipa force-pushed on Sep 4, 2023
  152. sipa force-pushed on Sep 5, 2023
  153. in src/net.cpp:3695 in c51369a186 outdated
    3029+        // If there was nothing to send before, and there is now (predicted by the "more" value
    3030+        // returned by the GetBytesToSend call above), attempt "optimistic write":
    3031         // because the poll/select loop may pause for SELECT_TIMEOUT_MILLISECONDS before actually
    3032         // doing a send, try sending from the calling thread if the queue was empty before.
    3033-        if (queue_was_empty) {
    3034+        if (queue_was_empty && more) {
    


    naumenkogs commented at 8:16 am on September 5, 2023:

    c51369a186e563fb83452fed055d9f28775c644d

    Isn’t more always gonna be true here? At least in this commit :)


    sipa commented at 3:10 pm on September 5, 2023:
    Indeed it is. I’ve added a comment.
  154. in src/net.cpp:1204 in 5cb433948b outdated
    1085+            // terminator still does not match. Abort.
    1086+            LogPrint(BCLog::NET, "V2 transport error: missing garbage terminator, peer=%d\n", m_nodeid);
    1087+            return false;
    1088+        }
    1089+    }
    1090+    return true;
    


    naumenkogs commented at 8:31 am on September 5, 2023:
    5cb433948b050913d22914905c7db510e9ef48d2 What would return true mean here? What if it happens?

    sipa commented at 12:30 pm on September 5, 2023:
    That would mean there is more garbage/garbage terminator to receive still. Perhaps the name of the function is confusing; it doesn’t mean all the garbage has been received, it’s just processing received garbage bytes.

    sipa commented at 3:10 pm on September 5, 2023:
    I’ve added (empty) branches for these cases with comments.
  155. in src/net.cpp:1228 in 5cb433948b outdated
    1106+        m_recv_len = m_cipher.DecryptLength(MakeByteSpan(m_recv_buffer));
    1107+        if (m_recv_len > MAX_CONTENTS_LEN) {
    1108+            LogPrint(BCLog::NET, "V2 transport error: packet too large (%u bytes), peer=%d\n", m_recv_len, m_nodeid);
    1109+            return false;
    1110+        }
    1111+    } else if (m_recv_buffer.size() > BIP324Cipher::LENGTH_LEN && m_recv_buffer.size() == m_recv_len + BIP324Cipher::EXPANSION) {
    


    naumenkogs commented at 8:46 am on September 5, 2023:

    5cb433948b050913d22914905c7db510e9ef48d2

    could both branches be skipped? What would that indicate then?


    sipa commented at 12:31 pm on September 5, 2023:
    It would mean the packet is not fully received yet.

    sipa commented at 3:10 pm on September 5, 2023:
    I’ve added empty branches with comments.
  156. in src/net.h:487 in 5cb433948b outdated
    450+         * becomes GARB_GARBTERM. */
    451+        KEY,
    452+
    453+        /** Garbage and garbage terminator.
    454+         *
    455+         * Whenever a byte is received, the last 16 bytes are compared with the expected garbage
    


    naumenkogs commented at 9:01 am on September 5, 2023:

    5cb433948b050913d22914905c7db510e9ef48d2 Should be extra careful with not keeping 4kb in memory waiting here i guess… After staring at the code for a few minutes I struggle to find this logic :) So perhaps worth adding a comment for it?

    This question possibly applies to the whole transport thing. Or maybe I’m just confused with how it goes through spans in the first place…


    vasild commented at 11:57 am on September 5, 2023:
    Later communication is subject to ping timeout, e.g. if the peer starts sending us a new block, sends 1MB and stops any sends to us, the peer will eventually be disconnected due to ping timeout. I guess this handshake should be subject to the same. Is it? I guess that is in some later PR which will engage the V2 transport.

    sipa commented at 12:36 pm on September 5, 2023:

    Or maybe I’m just confused with how it goes through spans in the first place…

    The overall logic is that when bytes are received, they are given (as a Span) to ReceivedBytes. That function then processes these bytes in a loop, consisting of:

    • Calling GetMaxBytesToProcess
    • Chopping (up to) that number of bytes from the input Span, and concatenating them to m_recv_buffer.
    • Calling the appropriate function to process whatever is in m_recv_buffer, depending on the state:
      • ProcessReceivedKey if they are ellswift pubkey bytes
      • ProcessReceivedGarbage if they are garbage + garbage terminator
      • ProcessReceivedPacket if they are packet bytes (whether from the garbage authentication packet, version packet, or an application packet).

    I’m happy to make any changes to make this clearer. Suggestions:

    • Adding the above description as a comment in ReceivedBytes
    • Inlining GetMaxBytesToProcess in ReceivedBytes
    • Renaming the ProcessReceived* functions to ProcessReceived*Bytes

    What do you think would help?


    sipa commented at 3:12 pm on September 5, 2023:

    @naumenkogs

    I’ve renamed the functions to ProcessReceived*Bytes, and added comments. Please have a look.

    Regarding the 4 kB, it’s wiped in ProcessReceivedGarbageBytes now: when the full garbage + terminator has been processed, m_recv_buffer.clear(); is called. @vasild

    That’s all done by CNode and the higher-level network receive logic, not the transport. There is nothing V2-specific about it; if we don’t receive stuff for a while, the connection will be considered inactive and closed, regardless of what transport is in use.


    naumenkogs commented at 6:18 am on September 6, 2023:

    Regarding the 4 kB, it’s wiped in ProcessReceivedGarbageBytes now: when the full garbage + terminator has been processed, m_recv_buffer.clear(); is called.

    What will happen if we’re at 3.99 Kb of garbage and no terminator? Am i right that the caller would be responsible for terminating this after a timeout?


    sipa commented at 11:37 am on September 6, 2023:
    Yes, if the connection times out, the CNode along with its m_transport member is automatically deleted. This is no different than with V1 transports today (which also have a send and receive buffer that uses memory until deleted).
  157. in src/net.cpp:1094 in a69b92b7c2 outdated
    1067+        SetReceiveState(RecvState::KEY); // Convert to KEY state, leaving received bytes around.
    1068+        // Transition the sender to AWAITING_KEY state (if not already).
    1069+        LOCK(m_send_mutex);
    1070+        SetSendState(SendState::AWAITING_KEY);
    1071+    } else if (m_recv_buffer.size() == m_v1_prefix.size()) {
    1072+        // Full match with the v1 prefix, so fall back to v1 behavior.
    


    naumenkogs commented at 11:45 am on September 5, 2023:

    a69b92b7c2faf0f301e88e5c6e9eef8f8dbab24a

    Why not handle a partial match too? At least a comment about how it’s resolved at the caller site.


    sipa commented at 12:37 pm on September 5, 2023:
    Similar to your other comments on these functions, it means not enough has been received, and it’ll be dealt with on a future call to this function, when either there is a full match, or a definite mismatch.

    sipa commented at 3:12 pm on September 5, 2023:
    I’ve added empty branches with comments.
  158. in src/net.cpp:1309 in 256eb4ba2f outdated
    1304 
    1305     while (!msg_bytes.empty()) {
    1306         // Decide how many bytes to copy from msg_bytes to m_recv_buffer.
    1307         size_t max_read = GetMaxBytesToProcess();
    1308+        // Reserve space in the buffer.
    1309+        if (m_recv_state == RecvState::KEY_MAYBE_V1 || m_recv_state == RecvState::KEY ||
    


    naumenkogs commented at 11:54 am on September 5, 2023:

    256eb4ba2f6e3827ec5920a5ed4616636fd59310

    nit: switch will look better here i think? And it would also force to do something about the extra state (like in a potential else here)


    sipa commented at 3:12 pm on September 5, 2023:
    I’ve changed it to a switch case.
  159. naumenkogs commented at 12:01 pm on September 5, 2023: member

    Reviewed everything 4f1900596940c7a2c7f641f2ffaa4ef51aded97a, see comments.

    As for the unit tests, it would be nice to know what other coverage could be useful here. In the form of a todo or something. Or do you consider it close to complete?

  160. sipa commented at 12:44 pm on September 5, 2023: member

    As for the unit tests, it would be nice to know what other coverage could be useful here. In the form of a todo or something. Or do you consider it close to complete?

    I consider the fuzz tests (p2p_transport_bidirectional_v2 and p2p_transport_bidirectional_v1v2) to be the most important tests in this PR, as they test the direct functional requirement we have: can V2Transports talk to other V2Transports and V1Transports, if everyone is honest. The unit tests are there to cover the cases that are not testable by the fuzz test (like making sure errors cause disconnect, and receiving-side handling of BIP324 aspects that are unimplemented on the sender side like decoy packets).

    So, yes, I consider this close to complete - but I’m obviously happy to hear ideas for more tests. A lot more becomes testable once integrated though (#28331), as functional tests aren’t really possible before.

  161. in src/net.cpp:1440 in 4f19005969 outdated
    1435+{
    1436+    AssertLockNotHeld(m_send_mutex);
    1437+    LOCK(m_send_mutex);
    1438+    if (m_send_state == SendState::V1) return m_v1_fallback.SetMessageToSend(msg);
    1439+    if (m_send_state != SendState::READY) return false;
    1440+    if (!m_send_buffer.empty()) return false;
    


    vasild commented at 1:30 pm on September 5, 2023:

    nit: both conditions can be combined, they are not terribly long:

    0    if (m_send_state != SendState::READY || !m_send_buffer.empty()) return false;
    

    sipa commented at 3:12 pm on September 5, 2023:
    Done. Also added a comment.
  162. in src/test/net_tests.cpp:1317 in 4f19005969 outdated
    1312+{
    1313+    // A mostly normal scenario, testing a transport in initiator mode.
    1314+    for (int i = 0; i < 10; ++i) {
    1315+        V2TransportTester tester(true);
    1316+        auto ret = tester.Interact();
    1317+        BOOST_CHECK(ret && ret->empty());
    


    vasild commented at 1:44 pm on September 5, 2023:
    I changed something in the test (to make it fail deliberately) and it caused a flood of error reports from boost. I am not sure how useful is this. Does it make sense to continue the test if one of these checks fails? If not, then consider replacing BOOST_CHECK() with BOOST_REQUIRE() which would stop the test on the first failure.

    sipa commented at 3:13 pm on September 5, 2023:
    I’ve changed some of the BOOST_CHECKs into BOOST_REQUIREs.
  163. vasild approved
  164. vasild commented at 1:45 pm on September 5, 2023: contributor

    ACK 4f1900596940c7a2c7f641f2ffaa4ef51aded97a

    Would be happy to re-review if further improvements arise from the discussions :)

  165. DrahtBot removed review request from vincenzopalazzo on Sep 5, 2023
  166. DrahtBot requested review from vincenzopalazzo on Sep 5, 2023
  167. DrahtBot removed review request from vincenzopalazzo on Sep 5, 2023
  168. DrahtBot requested review from vincenzopalazzo on Sep 5, 2023
  169. Sjors commented at 2:00 pm on September 5, 2023: member
    (midway review)
  170. DrahtBot removed review request from vincenzopalazzo on Sep 5, 2023
  171. DrahtBot requested review from vincenzopalazzo on Sep 5, 2023
  172. DrahtBot removed review request from vincenzopalazzo on Sep 5, 2023
  173. DrahtBot requested review from vincenzopalazzo on Sep 5, 2023
  174. DrahtBot removed review request from vincenzopalazzo on Sep 5, 2023
  175. DrahtBot requested review from vincenzopalazzo on Sep 5, 2023
  176. DrahtBot removed review request from vincenzopalazzo on Sep 5, 2023
  177. DrahtBot requested review from vincenzopalazzo on Sep 5, 2023
  178. DrahtBot removed review request from vincenzopalazzo on Sep 5, 2023
  179. DrahtBot requested review from vincenzopalazzo on Sep 5, 2023
  180. DrahtBot removed review request from vincenzopalazzo on Sep 5, 2023
  181. DrahtBot requested review from vincenzopalazzo on Sep 5, 2023
  182. sipa force-pushed on Sep 5, 2023
  183. in src/net.cpp:1335 in 70298caec3 outdated
    1330+        case RecvState::KEY_MAYBE_V1:
    1331+        case RecvState::KEY:
    1332+        case RecvState::GARB_GARBTERM:
    1333+            // During the initial states (key/garbage), allocate once to fit the maximum (4111
    1334+            // bytes).
    1335+            m_recv_buffer.reserve(MAX_GARBAGE_LEN + BIP324Cipher::GARBAGE_TERMINATOR_LEN);
    


    theStack commented at 4:01 pm on September 5, 2023:

    Is it intended that those .reserve methods could potentially be called multiple times for the same receive state? Maybe it’s more a theoretical thought, but let’s say we are in RecvState::KEY_MAYBE_V1 and for whatever reason (e.g. slow connection), the key is received in multiple small chunks of 16 bytes, i.e. V2Transport::ReceivedBytes is called four times with spans of size 16. This line would then also be executed 4 times, as the receive state never changed in-between.

    To phrase it differently: if a TCP-peer sends us N bytes in one call, is there any guarantee that we receive those N bytes here all at once? (I don’t think so.)

    On the other hand, I assume calling reserve repeatedly is not really expensive (I guess it immediately returns if the passed size is equal to the currently reserved size), so it’s probably not a big deal.


    sipa commented at 4:08 pm on September 5, 2023:

    That’s a good point. reserve, when it is a no-op, is fast, and I’m not worried about calling reserve repeatedly in this case.

    However, when say someone announces to us a 4 MB packet, and then starts sending the bytes one by one, the currently-implemented logic will first preallocate 256 KiB of buffer space, then when a byte arrives, we’ll reallocate to (256 KiB + 1 B), and so on. That could actually amount to a substantial amount of reallocation work.

    I’ll change this to only calling reserve when the current buffer size is actually too little for the bytes received right now.


    sipa commented at 5:09 pm on September 5, 2023:
    Done.

    vasild commented at 9:01 am on September 6, 2023:

    if a TCP-peer sends us N bytes in one call, is there any guarantee that we receive those N bytes here all at once?

    No.

  184. sipa force-pushed on Sep 5, 2023
  185. in src/net.cpp:922 in ff64d9e6dd outdated
    917+/** List of short messages as defined in BIP324, in order.
    918+ *
    919+ * Only message types that are actually implemented in this codebase need to be listed, as other
    920+ * messages get ignored anyway - whether we know how to decode them or not.
    921+ */
    922+const std::string V2_MESSAGE_IDS[] = {
    


    instagibbs commented at 7:34 pm on September 5, 2023:

    short of mechanically generating these directly off the spec, I think statically asserting this list is 29 long would be good for hand-checking the bip

    bip and implementation-related question: how do we keep the message ids in sync? The BIP has erlay-related messages allocated, but they aren’t implemented here. If in a future PR these were added, wouldn’t the two versions be out of sync? Newer peer would send erlay-related messages with message ids(or perhaps something we actually use!) and older peer would simply ignore them. Is there a coordination mechanism for this?


    sipa commented at 6:41 pm on September 6, 2023:

    The two lists need to match in all implemented messages. My thinking was that if messages get added after the Erlay ones in the BIP, before Erlay being implemented in this codebase, we’d introduce dummies for the Erlay ones.

    But it’s probably clearer to do that right now already, so I’ve done so here. I’ve also added an assert that the total length is 33.

  186. sipa force-pushed on Sep 6, 2023
  187. DrahtBot added the label CI failed on Sep 6, 2023
  188. in src/test/net_tests.cpp:1336 in 2a5a7ffff7 outdated
    1331+        tester.SendMessage("tx", msg_data_2); // 12-character encoded message type
    1332+        ret = tester.Interact();
    1333+        BOOST_REQUIRE(ret && ret->size() == 3 &&
    1334+                      (*ret)[0] && (*ret)[0]->m_type == "cmpctblock" && Span{(*ret)[0]->m_recv} == MakeByteSpan(msg_data_1) &&
    1335+                      !(*ret)[1] &&
    1336+                      (*ret)[2] && (*ret)[2]->m_type == "tx" && Span{(*ret)[2]->m_recv} == MakeByteSpan(msg_data_2));
    


    vasild commented at 10:52 am on September 6, 2023:
    If this fails then it is not clear from the message which condition was false. In general, assert(A && B) is equivalent to assert(A); assert(B); but the latter gives more descriptive message in case of failure.

    sipa commented at 12:54 pm on September 6, 2023:
    Done.
  189. in src/net.cpp:1129 in 2a5a7ffff7 outdated
    1124+        // they receive our uniformly random key and garbage, but detecting this case specially
    1125+        // means we can log it.
    1126+        static constexpr std::array<uint8_t, 12> MATCH = {'v', 'e', 'r', 's', 'i', 'o', 'n', 0, 0, 0, 0, 0};
    1127+        static constexpr size_t OFFSET = sizeof(CMessageHeader::MessageStartChars);
    1128+        if (!m_initiating) {
    1129+            Assume(m_recv_buffer.size() <= OFFSET + MATCH.size());
    


    vasild commented at 11:26 am on September 6, 2023:

    This looks wrong, the condition on the if on line 1116 means that m_recv_buffer is exactly 64 bytes. Here we assert that it is less than 4 + 12 = 16. I think this is causing the CI failure.

    Edit: I guess it should be:

    0Assume(m_recv_buffer.size() >= OFFSET + MATCH.size());
    

    sipa commented at 12:54 pm on September 6, 2023:
    Fixed. I added a test, and noticed there was more wrong with this logic.
  190. sipa force-pushed on Sep 6, 2023
  191. sipa force-pushed on Sep 6, 2023
  192. in src/net.cpp:1017 in c23640d313 outdated
    936+
    937+void V2Transport::SetReceiveState(RecvState recv_state) noexcept
    938+{
    939+    AssertLockHeld(m_recv_mutex);
    940+    // No-op if no change is desired.
    941+    if (recv_state == m_recv_state) return;
    


    instagibbs commented at 2:26 pm on September 6, 2023:
    in what cases do we want to set the state again? Might be good to Assume here too if it’s not expected

    sipa commented at 6:39 pm on September 6, 2023:
    We set the state any time the state changes. The conditions for that are in the ProcessReceived*Bytes functions. I’m not sure I understand the question here.

    instagibbs commented at 7:04 pm on September 6, 2023:

    to rephrase, this?

    0    if (recv_state == m_recv_state) { Assume(false); return; }
    

    sipa commented at 7:14 pm on September 6, 2023:

    Oh, I see, you meant “tell me, when do we set the state again to the same state?”, not “tell me again, when do we set the state?”.

    This wasn’t the case in earlier iterations of this PR, but currently we indeed never set a state to its current value again. Enforced that by just dropping the “if (recv_state == m_recv_state)” conditional entirely (for both receiving and sending).

  193. in src/net.cpp:1021 in c23640d313 outdated
    1016+                  m_cipher.GetSendGarbageTerminator().end(),
    1017+                  MakeWritableByteSpan(m_send_buffer).last(BIP324Cipher::GARBAGE_TERMINATOR_LEN).begin());
    1018+        // Construct garbage authentication packet in the send buffer.
    1019+        m_send_buffer.resize(m_send_buffer.size() + BIP324Cipher::EXPANSION);
    1020+        m_cipher.Encrypt(
    1021+            {},
    


    instagibbs commented at 2:28 pm on September 6, 2023:
    could we annotate these args for reading clarity?

    sipa commented at 6:38 pm on September 6, 2023:
    Done.
  194. instagibbs commented at 3:16 pm on September 6, 2023: member
    light review, comparing implementation with the BIP, did not review the tests
  195. DrahtBot removed the label CI failed on Sep 6, 2023
  196. sipa force-pushed on Sep 6, 2023
  197. sipa force-pushed on Sep 6, 2023
  198. sipa force-pushed on Sep 6, 2023
  199. DrahtBot added the label CI failed on Sep 6, 2023
  200. sipa force-pushed on Sep 6, 2023
  201. DrahtBot removed the label CI failed on Sep 7, 2023
  202. naumenkogs commented at 8:32 am on September 7, 2023: member
    ACK 15ea0ce587a334b647119fe65de2b305720c3eba
  203. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  204. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  205. DrahtBot requested review from vasild on Sep 7, 2023
  206. in src/net.cpp:959 in 15ea0ce587 outdated
    954+    "",
    955+    "",
    956+    ""
    957+};
    958+
    959+static_assert(std::size(V2_MESSAGE_IDS) == 33);
    


    vasild commented at 10:52 am on September 7, 2023:

    nit:

    0const std::string V2_MESSAGE_IDS[] = {...};
    1static_assert(std::size(V2_MESSAGE_IDS) == 33);
    

    What the following instead:

    0std::array<std::string, 33> V2_MESSAGE_IDS[] = {...};
    

    sipa commented at 1:11 pm on September 7, 2023:
    Good idea, done.
  207. vasild approved
  208. vasild commented at 12:23 pm on September 7, 2023: contributor

    ACK 15ea0ce587a334b647119fe65de2b305720c3eba

    Coverage report for modified lines by this PR and not covered by tests:

    0./src/test/test_bitcoin
    1./src/qt/test/test_bitcoin-qt
    2./test/functional/test_runner.py
    3FUZZ=p2p_transport_bidirectional_v2   ./src/test/fuzz/fuzz -max_total_time=1800 /tmp/empty
    4FUZZ=p2p_transport_bidirectional_v1v2 ./src/test/fuzz/fuzz -max_total_time=1800 /tmp/empty
    
  209. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  210. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  211. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  212. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  213. net: add have_next_message argument to Transport::GetBytesToSend()
    Before this commit, there are only two possibly outcomes for the "more" prediction
    in Transport::GetBytesToSend():
    * true: the transport itself has more to send, so the answer is certainly yes.
    * false: the transport has nothing further to send, but if vSendMsg has more message(s)
             left, that still will result in more wire bytes after the next
             SetMessageToSend().
    
    For the BIP324 v2 transport, there will arguably be a third state:
    * definitely not: the transport has nothing further to send, but even if vSendMsg has
                      more messages left, they can't be sent (right now). This happens
                      before the handshake is complete.
    
    To implement this, we move the entire decision logic to the Transport, by adding a
    boolean to GetBytesToSend(), called have_next_message, which informs the transport
    whether more messages are available. The return values are still true and false, but
    they mean "definitely yes" and "definitely no", rather than "yes" and "maybe".
    c3fad1f29d
  214. net: remove unused Transport::SetReceiveVersion 5f4b2c6d79
  215. crypto: Spanify EllSwiftPubKey constructor dc2d7eb810
  216. net: add V2Transport class with subset of BIP324 functionality
    This introduces a V2Transport with a basic subset of BIP324 functionality:
    * no ability to send garbage (but receiving is supported)
    * no ability to send decoy packets (but receiving them is supported)
    * no support for short message id encoding (neither encoding or decoding)
    * no waiting until 12 non-V1 bytes have been received
    * (and thus) no detection of V1 connections on the responder side
      (on the sender side, detecting V1 is not supported either, but that needs
      to be dealt with at a higher layer, by reconnecting)
    13a7f01557
  217. net: make V2Transport auto-detect incoming V1 and fall back to it 8da8642062
  218. net: add short message encoding/decoding support to V2Transport 0be752d9f8
  219. net: make V2Transport send uniformly random number garbage bytes 3ffa5fb49e
  220. net: make V2Transport preallocate receive buffer space 297c888997
  221. test: add unit tests for V2Transport 91e1ef8684
  222. net: detect wrong-network V1 talking to V2Transport db9888feec
  223. in src/net.cpp:1090 in 6227597da1 outdated
    1010+    std::copy(std::begin(Params().MessageStart()), std::end(Params().MessageStart()), v1_prefix.begin());
    1011+    Assume(m_recv_buffer.size() <= v1_prefix.size());
    1012+    if (!std::equal(m_recv_buffer.begin(), m_recv_buffer.end(), v1_prefix.begin())) {
    1013+        // Mismatch with v1 prefix, so we can assume a v2 connection.
    1014+        SetReceiveState(RecvState::KEY); // Convert to KEY state, leaving received bytes around.
    1015+        // Transition the sender to AWAITING_KEY state (if not already).
    


    theStack commented at 12:54 pm on September 7, 2023:

    small nit:

    0        // Transition the sender to AWAITING_KEY state.
    

    To my understanding, no-op state transitions are not possible anymore (otherwise, one of the assertions in SetSendState would fail), and we could only hit this path if the sender state is MAYBE_V1.


    sipa commented at 1:08 pm on September 7, 2023:
    Agree. Will address if I push again.

    sipa commented at 7:29 pm on September 8, 2023:
    Done in #28433.
  224. in src/test/fuzz/p2p_transport_serialization.cpp:345 in 7371d7a1a5 outdated
    340+{
    341+    // Retrieve key
    342+    auto key_data = provider.ConsumeBytes<unsigned char>(32);
    343+    key_data.resize(32);
    344+    CKey key;
    345+    key.Set(key_data.begin(), key_data.end(), true);
    


    theStack commented at 12:56 pm on September 7, 2023:
    potential follow-up: after rebase, could now use ConsumePrivateKey (#28419 was merged today).

    sipa commented at 1:09 pm on September 7, 2023:
    Done.
  225. in src/net.cpp:930 in 7371d7a1a5 outdated
    925+
    926+V2Transport::V2Transport(NodeId nodeid, bool initiating, int type_in, int version_in, const CKey& key, Span<const std::byte> ent32) noexcept :
    927+    m_cipher{key, ent32}, m_initiating{initiating}, m_nodeid{nodeid},
    928+    m_recv_type{type_in}, m_recv_version{version_in},
    929+    m_recv_state{RecvState::KEY},
    930+    m_send_state{SendState::AWAITING_KEY}
    


    theStack commented at 1:04 pm on September 7, 2023:
    follow-up consideration: the two V2Transport constructors have a lot of code in common, is it somehow possible to deduplicate their implementation (e.g. with another private constructor)?

    sipa commented at 1:11 pm on September 7, 2023:

    I’ve considered that, but I like that the “production” constructor here can use the default BIP324Cipher constructor (leaving the random key generation there). Unfortunately that does mean some duplication, because the other constructor can’t be written that way.

    Maybe the solution is only having explicit constructors on BIP324Cipher and doing the random key generation here. Perhaps something for a follow-up?


    vasild commented at 9:26 am on September 8, 2023:
    I played with this some time ago. I used a private constructor that is called by the other two public constructors (see delegating constructor). One problem was that BIP324Cipher is not movable, so I involved std::forward to pass the arguments from the public ctors to the private one which would construct the BIP324Cipher object. It worked. Also the private ctor had to have different arguments than the public ones, so I added a dummy nullptr_t first arg (:disappointed:). Then in a later commit of this PR the two public ctors diverged and were not identical anymore, so I decided to ditch the idea.

    sipa commented at 7:29 pm on September 8, 2023:
    Done in #28433.
  226. theStack approved
  227. theStack commented at 1:07 pm on September 7, 2023: contributor

    ACK 15ea0ce587a334b647119fe65de2b305720c3eba

    Left some nitty follow-up material below.

  228. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  229. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  230. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  231. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  232. sipa force-pushed on Sep 7, 2023
  233. sipa commented at 2:09 pm on September 7, 2023: member

    Relevant changes since 15ea0ce587a334b647119fe65de2b305720c3eba (ignoring rebase):

     0--- a/src/net.cpp
     1+++ b/src/net.cpp
     2@@ -919,7 +920,7 @@ namespace {
     3  * Only message types that are actually implemented in this codebase need to be listed, as other
     4  * messages get ignored anyway - whether we know how to decode them or not.
     5  */
     6-const std::string V2_MESSAGE_IDS[] = {
     7+const std::array<std::string, 33> V2_MESSAGE_IDS = {
     8     "", // 12 bytes follow encoding the message type like in V1
     9     NetMsgType::ADDR,
    10     NetMsgType::BLOCK,
    11@@ -956,8 +957,6 @@ const std::string V2_MESSAGE_IDS[] = {
    12     ""
    13 };
    14 
    15-static_assert(std::size(V2_MESSAGE_IDS) == 33);
    16-
    17 class V2MessageMap
    18 {
    19     std::unordered_map<std::string, uint8_t> m_map;
    20--- a/src/test/fuzz/p2p_transport_serialization.cpp
    21+++ b/src/test/fuzz/p2p_transport_serialization.cpp
    22@@ -339,10 +339,7 @@ template<typename RNG>
    23 std::unique_ptr<Transport> MakeV2Transport(NodeId nodeid, bool initiator, RNG& rng, FuzzedDataProvider& provider)
    24 {
    25     // Retrieve key
    26-    auto key_data = provider.ConsumeBytes<unsigned char>(32);
    27-    key_data.resize(32);
    28-    CKey key;
    29-    key.Set(key_data.begin(), key_data.end(), true);
    30+    auto key = ConsumePrivateKey(provider);
    31     if (!key.IsValid()) return {};
    32     // Construct garbage
    33     size_t garb_len = provider.ConsumeIntegralInRange<size_t>(0, V2Transport::MAX_GARBAGE_LEN);
    
  234. theStack approved
  235. theStack commented at 2:23 pm on September 7, 2023: contributor
    re-ACK db9888feec48c6220a2fcf92865503bbbdab02a4
  236. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  237. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  238. DrahtBot requested review from naumenkogs on Sep 7, 2023
  239. DrahtBot requested review from vasild on Sep 7, 2023
  240. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  241. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  242. in src/net.cpp:1353 in 297c888997 outdated
    1348+                // more than MAX_RESERVE_AHEAD bytes in addition to what is received so far.
    1349+                // This means attackers that want to cause us to waste allocated memory are limited
    1350+                // to MAX_RESERVE_AHEAD above the largest allowed message contents size, and to
    1351+                // MAX_RESERVE_AHEAD more than they've actually sent us.
    1352+                size_t alloc_add = std::min(max_read, msg_bytes.size() + MAX_RESERVE_AHEAD);
    1353+                m_recv_buffer.reserve(m_recv_buffer.size() + alloc_add);
    


    Sjors commented at 8:15 pm on September 7, 2023:
    297c8889975a18258d6cc39b1ec1e94fed6630fb: since the code worked before this commit, what happens we don’t call reserve? Does that result in de worst case where the operating system allocates one byte at a time?

    sipa commented at 8:31 pm on September 7, 2023:
    Most std::vector implementations double their memory usage whenever they don’t have enough, I believe. So without this code, you’d actually get fewer reallocations for large messages actually, but also more easily wasted (allocated but unused) memory.
  243. Sjors commented at 8:22 pm on September 7, 2023: member

    Concept ACK.

    Overall this looks pretty solid.

    Commit c3fad1f29df093e8fd03d70eb43f25ee9d531bf7 looks good; there’s no ideal way to do this, but at least it’s now pretty well documented.

    The state machine looks good too.

    Otherwise I only shallowly reviewed up to db9888feec48c6220a2fcf92865503bbbdab02a4, and without looking at test and fuzz code.

    To better ensure decoy packages work, maybe you could add a (debug) rpc method to send decoy package to a given peer? Though that can wait for a followup.

  244. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  245. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  246. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  247. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  248. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  249. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  250. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  251. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  252. sipa commented at 8:37 pm on September 7, 2023: member

    To better ensure decoy packages work, maybe you could add a (debug) rpc method to send decoy package to a given peer? Though that can wait for a followup.

    That’s not possible in this PR, as V2Transport is not hooked up to anything except tests. This would be a possibility in #28331. Another way for testing decoys is through BIP324 support in the p2p functional test code, see #24748.

  253. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  254. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  255. in src/net.cpp:1504 in 3ffa5fb49e outdated
    1497@@ -1490,7 +1498,10 @@ void V2Transport::MarkBytesSent(size_t bytes_sent) noexcept
    1498 
    1499     m_send_pos += bytes_sent;
    1500     Assume(m_send_pos <= m_send_buffer.size());
    1501-    if (m_send_pos == m_send_buffer.size()) {
    1502+    // Only wipe the buffer when everything is sent in the READY state. In the AWAITING_KEY state
    1503+    // we still need the garbage that's in the send buffer to construct the garbage authentication
    1504+    // packet.
    1505+    if (m_send_state == SendState::READY && m_send_pos == m_send_buffer.size()) {
    


    mzumsande commented at 9:27 pm on September 7, 2023:
    While this shortcut works, I wonder if it would be conceptually clearer to not have this special case and instead save the garbage in a m_send_garbage vector, similar to the existing m_recv_garbage on the receive side.

    sipa commented at 11:07 pm on September 7, 2023:
    I’ll do this if I make further changes to the PR, or as a follow-up. It would also simplify something else: getting rid of the “don’t send in MABE_V1 state” exception, by only putting stuff in the send buffer when leaving that state.

    sipa commented at 7:29 pm on September 8, 2023:
    Done in #28433.
  256. mzumsande commented at 10:58 pm on September 7, 2023: contributor

    Code Review ACK db9888feec48c6220a2fcf92865503bbbdab02a4

    I reviewed the code again and did a little bit of manual mutation testing (checking that the fuzz test fails if I break things). The nits can be ignored.

  257. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  258. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  259. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  260. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  261. DrahtBot removed review request from vincenzopalazzo on Sep 7, 2023
  262. DrahtBot requested review from vincenzopalazzo on Sep 7, 2023
  263. naumenkogs commented at 9:02 am on September 8, 2023: member
    ACK db9888feec48c6220a2fcf92865503bbbdab02a4
  264. DrahtBot removed review request from naumenkogs on Sep 8, 2023
  265. DrahtBot removed review request from vincenzopalazzo on Sep 8, 2023
  266. DrahtBot requested review from vincenzopalazzo on Sep 8, 2023
  267. fanquake commented at 9:11 am on September 8, 2023: member

    Outstanding/followup comments:

    #28196 (review) #28196 (review) #28196 (review)

  268. DrahtBot removed review request from vincenzopalazzo on Sep 8, 2023
  269. DrahtBot requested review from vincenzopalazzo on Sep 8, 2023
  270. fanquake merged this on Sep 8, 2023
  271. fanquake closed this on Sep 8, 2023

  272. vasild commented at 9:31 am on September 8, 2023: contributor
    ACK db9888feec48c6220a2fcf92865503bbbdab02a4
  273. Frank-GER referenced this in commit 087e38b9a5 on Sep 8, 2023
  274. fanquake referenced this in commit 8f7b9eb871 on Sep 11, 2023
  275. fanquake referenced this in commit 372e7b6510 on Sep 16, 2023
  276. Frank-GER referenced this in commit 4bbfc7f0b9 on Sep 19, 2023
  277. Frank-GER referenced this in commit d3181ea3ca on Sep 19, 2023
  278. fanquake referenced this in commit 0f89e86516 on Mar 19, 2024
  279. bitcoin locked this on Sep 12, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me