test/BIP324: functional tests for v2 P2P encryption #24748

pull stratospher wants to merge 15 commits into bitcoin:master from stratospher:p2p-encryption-test changing 15 files +752 −66
  1. stratospher commented at 8:14 pm on April 3, 2022: contributor

    This PR introduces support for v2 P2P encryption(BIP 324) in the existing functional test framework and adds functional tests for the same.

    commits overview

    1. introduces a new class EncryptedP2PState to store the keys, functions for performing the initial v2 handshake and encryption/decryption.

    2. this class is used by P2PConnection in inbound/outbound connections to perform the initial v2 handshake before the v1 version handshake. Only after the initial v2 handshake is performed do application layer P2P messages(version, verack etc..) get exchanged. (in a v2 connection)

      • v2_state is the object of class EncryptedP2PState in P2PConnection used to store its keys, session-id etc.
      • a node advertising support for v2 P2P is different from a node actually supporting v2 P2P (differ when false advertisement of services occur)
        • introduce a boolean variable supports_v2_p2p in P2PConnection to denote if it supports v2 P2P.
        • introduce a boolean variable advertises_v2_p2p to denote whether P2PConnection which mimics peer behaviour advertises V2 P2P support. Default option is False.
      • In the test framework, you can create Inbound and Outbound connections to TestNode
        1. During Inbound Connections, P2PConnection is the initiator [TestNode <——— P2PConnection]
          • Case 1:
            • if the TestNode advertises/signals v2 P2P support (means self.nodes[i] set up with "-v2transport=1"), different behaviour will be exhibited based on whether:
              1. P2PConnection supports v2 P2P
              2. P2PConnection does not support v2 P2P
            • In a real world scenario, the initiator node would intrinsically know if they support v2 P2P based on whatever code they choose to run. However, in the test scenario where we mimic peer behaviour, we have no way of knowing if P2PConnection should support v2 P2P or not. So supports_v2_p2p boolean variable is used as an option to enable support for v2 P2P in P2PConnection.
            • Since the TestNode advertises v2 P2P support (using “-v2transport=1”), our initiator P2PConnection would send:
              1. (if the P2PConnection supports v2 P2P) ellswift + garbage bytes to initiate the connection
              2. (if the P2PConnection does not support v2 P2P) version message to initiate the connection
          • Case 2:
            • if the TestNode doesn’t signal v2 P2P support; P2PConnection being the initiator would send version message to initiate a connection.
        2. During Outbound Connections [TestNode ——–> P2PConnection]
          • initiator TestNode would send:
            • (if the P2PConnection advertises v2 P2P) ellswift + garbage bytes to initiate the connection
            • (if the P2PConnection advertises v2 P2P) version message to initiate the connection
          • Suppose P2PConnection advertises v2 P2P support when it actually doesn’t support v2 P2P (false advertisement scenario)
            • TestNode sends ellswift + garbage bytes
            • P2PConnection receives but can’t process it and disconnects.
            • TestNode then tries using v1 P2P and sends version message
            • P2PConnection receives/processes this successfully and they communicate on v1 P2P
    3. the encrypted P2P messages follow a different format - 3 byte length + 1-13 byte message_type + payload + 16 byte MAC

    4. includes support for testing decoy messages and v2 connection downgrade(using false advertisement - when a v2 node makes an outbound connection to a node which doesn’t support v2 but is advertised as v2 by some malicious intermediary)

    run the tests

    • functional test - test/functional/p2p_v2_encrypted.py test/functional/p2p_v2_earlykeyresponse.py

    I’m also super grateful to @ dhruv for his really valuable feedback on this branch. Also written a more elaborate explanation here - https://github.com/stratospher/blogosphere/blob/main/integration_test_bip324.md

  2. DrahtBot added the label Build system on Apr 3, 2022
  3. DrahtBot added the label P2P on Apr 3, 2022
  4. DrahtBot added the label RPC/REST/ZMQ on Apr 3, 2022
  5. DrahtBot added the label Upstream on Apr 3, 2022
  6. DrahtBot added the label Utils/log/libs on Apr 3, 2022
  7. stratospher force-pushed on Apr 4, 2022
  8. DrahtBot commented at 1:31 am on April 5, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK mzumsande, theStack, naumenkogs, glozow
    Concept ACK jonatack, brunoerg
    Approach ACK sr-gi

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29279 (test: p2p: check disconnect due to lack of desirable service flags by theStack)
    • #29016 (RPC: add new listmempooltransactions by niftynei)
    • #28463 (p2p: Increase inbound capacity for block-relay only connections by mzumsande)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  9. jonatack commented at 7:03 am on April 7, 2022: contributor
    Concept ACK
  10. brunoerg commented at 12:54 pm on April 8, 2022: contributor
    Concept ACK
  11. DrahtBot added the label Needs rebase on Apr 8, 2022
  12. stratospher marked this as a draft on Oct 3, 2022
  13. stratospher commented at 5:24 pm on October 3, 2022: contributor
    converting this PR into a draft. I’ll push the updated version which includes the new spec changes in BIP 324 soon.
  14. stratospher force-pushed on Dec 16, 2022
  15. DrahtBot removed the label Needs rebase on Dec 16, 2022
  16. stratospher marked this as ready for review on Dec 16, 2022
  17. stratospher commented at 4:09 pm on December 16, 2022: contributor

    I’ve updated the PR to reflect the new spec changes in the BIP. It’s built on top of #24545’s 78c3ccc. Only the last 19 commits belong to this PR.

    UPDATE:

    • rebased on #24545’s f97a1a8
    • rebased on #24545’s a7fdbf6
  18. DrahtBot added the label Needs rebase on Dec 19, 2022
  19. stratospher force-pushed on Feb 6, 2023
  20. DrahtBot removed the label Needs rebase on Feb 6, 2023
  21. DrahtBot added the label Needs rebase on Feb 15, 2023
  22. fanquake commented at 3:22 pm on February 16, 2023: member
    Moved to draft given it’s based on multiple other PRs.
  23. fanquake marked this as a draft on Feb 16, 2023
  24. stratospher force-pushed on Feb 21, 2023
  25. DrahtBot removed the label Needs rebase on Feb 21, 2023
  26. DrahtBot added the label Needs rebase on Feb 22, 2023
  27. stratospher force-pushed on Mar 23, 2023
  28. stratospher force-pushed on Mar 26, 2023
  29. fanquake referenced this in commit 61d59fed74 on Jun 30, 2023
  30. sidhujag referenced this in commit 421153c7ad on Jul 1, 2023
  31. sipa commented at 6:07 pm on August 1, 2023: member
    @stratospher Rebasing on #28331 would let you revive this.
  32. maflcko removed the label Build system on Aug 1, 2023
  33. maflcko removed the label RPC/REST/ZMQ on Aug 1, 2023
  34. maflcko removed the label P2P on Aug 1, 2023
  35. maflcko removed the label Upstream on Aug 1, 2023
  36. maflcko removed the label Utils/log/libs on Aug 1, 2023
  37. stratospher force-pushed on Sep 10, 2023
  38. stratospher commented at 6:04 am on September 10, 2023: contributor

    Rebased on #28331. 2 things which affected the tests when rebasing were:

    1. TestNode can send garbage now
    2. since v1 reconnections are now attempted in a queue instead of immediately, there was a latency issue introduced in the tests in add_outbound_p2p_connection()
  39. in test/functional/test_framework/p2p.py:593 in e18b22d34e outdated
    597@@ -598,6 +598,11 @@ def wait_for_disconnect(self, timeout=60):
    598         test_function = lambda: not self.is_connected
    599         self.wait_until(test_function, timeout=timeout, check_connected=False)
    600 
    601+    def wait_for_reconnect(self, timeout=60):
    


    stratospher commented at 6:13 am on September 10, 2023:

    e18b22d: in an outbound connection, TestNode —–> P2PConnection where P2PConnection doesn’t actually support v2 P2P but is advertised to support v2 P2P (false advertisement), we need to wait for the TestNode to reconnect using v1 P2P.

    This wait_for_reconnect() function won’t work always since race conditions are possible. Anyone has a better idea on how this can be done?


    sipa commented at 3:26 pm on September 10, 2023:

    What is the race condition? I expect you should always still observe a v2 connection, a v2 disconnect, and a v1 connect.

    FWIW, the reason why the reconnect is done asynchronously is to avoid making a P2P connection (which can take seconds, or up to a minute) inside the normal network thread (as that’d prevent us from receiving anything from any other peer at the same time).


    stratospher commented at 4:13 pm on September 10, 2023:

    ideally, first wait_for_connect waits on v2 connection and the wait_for_connect a few lines below waits for v1 connection.

    it’s possible that the v2 reconnection happens quickly and the first wait_for_connect detects a v2 connection directly. Then we’d be stuck waiting for disconnect and connect (in the lines below) to happen (even though it has already happened).


    mzumsande commented at 4:05 am on October 9, 2023:

    Maybe it would work just not checking for the racy intermediate states, but instead waiting until we have received the “version” message via v1? Something like

    0def test_function():
    1    if not (self.is_connected and self.last_message.get('version') and self.v2_state is None):
    2        return False
    3    return True
    4self.wait_until(test_function, timeout=timeout, check_connected=False  )
    

    stratospher commented at 2:17 pm on October 9, 2023:
    good idea! that should work.
  40. DrahtBot removed the label Needs rebase on Sep 10, 2023
  41. Brock124590 approved
  42. stratospher force-pushed on Sep 10, 2023
  43. DrahtBot added the label CI failed on Sep 10, 2023
  44. DrahtBot added the label Needs rebase on Sep 11, 2023
  45. stratospher force-pushed on Sep 18, 2023
  46. DrahtBot removed the label Needs rebase on Sep 18, 2023
  47. stratospher force-pushed on Sep 20, 2023
  48. DrahtBot added the label Needs rebase on Sep 21, 2023
  49. stratospher force-pushed on Oct 6, 2023
  50. DrahtBot removed the label Needs rebase on Oct 6, 2023
  51. cacrowley approved
  52. stratospher force-pushed on Oct 7, 2023
  53. in src/net.cpp:1917 in 4084fe0c4c outdated
    1912@@ -1913,7 +1913,11 @@ bool CConnman::AddConnection(const std::string& address, ConnectionType conn_typ
    1913     CSemaphoreGrant grant(*semOutbound, true);
    1914     if (!grant) return false;
    1915 
    1916-    OpenNetworkConnection(CAddress(), false, std::move(grant), address.c_str(), conn_type, /*use_v2transport=*/false);
    1917+    CAddress addr(CService(), NODE_NONE);
    1918+    if (use_v2transport) {
    


    mzumsande commented at 5:29 pm on October 8, 2023:
    Updating the address is no longer necessary with the latest net code, just passing the bool should be enough.

    stratospher commented at 2:15 pm on October 9, 2023:
    thanks! done.
  54. in test/functional/test_framework/v2_p2p.py:101 in a7478eab51 outdated
     96@@ -18,3 +97,143 @@ def v2_ecdh(priv, ellswift_theirs, ellswift_ours, initiating):
     97         else:
     98             # Responding, place their public key encoding first.
     99             return TaggedHash("bip324_ellswift_xonly_ecdh", ellswift_theirs + ellswift_ours + ecdh_point_x32)
    100+
    101+    def initiate_v2_handshake(self, garbage_len=random.randrange(4096)):
    


    mzumsande commented at 2:50 am on October 9, 2023:

    Here, and in respond_v2_handshake: This should probably be something like

    0def initiate_v2_handshake(self, garbage_len=None):
    1    if garbage_len is None:
    2        garbage_len = random.randrange(4096)
    

    Because default args are determined when the function is defined (and not when it’s called), otherwise

    1. the provided --randomseed isn’t used, making runs non-deterministic.
    2. multiple calls to initiate_v2_handshake within one program run will all use the same garbage_len.

    stratospher commented at 2:15 pm on October 9, 2023:
    interesting, done. i’ve removed garbage_len from the function parameter list since the call sites wouldn’t need to set garbage_len value.

    mzumsande commented at 8:03 pm on October 10, 2023:
    Thanks! The fix made it into the wrong commit (should be in “Construct class to handle v2 P2P protocol functions”, it’s one commit later).

    stratospher commented at 9:27 am on October 12, 2023:
    oops, fixed.
  55. mzumsande commented at 4:10 am on October 9, 2023: contributor

    Concept ACK

    just started reviewing, some initial comments

  56. stratospher force-pushed on Oct 9, 2023
  57. in test/functional/test_framework/v2_p2p.py:138 in 5f851714cf outdated
    133+        # Send garbage terminator
    134+        msg_to_send = self.peer['send_garbage_terminator']
    135+        # Optionally send decoy packets after garbage terminator.
    136+        aad = self.sent_garbage
    137+        for decoy_content_len in [random.randint(1, 100) for _ in range(10)]:
    138+            msg_to_send += self.v2_enc_packet(decoy_content_len * b'\x00', aad=aad)
    


    mzumsande commented at 10:40 pm on October 9, 2023:
    I think that all the decoy packets should pass ignore=True as an arg to v2_enc_packet.

    stratospher commented at 9:27 am on October 12, 2023:
    done. true! logs had lines like these: [net] non-version message before version handshake. Message "" from peer=0.
  58. DrahtBot removed the label CI failed on Oct 10, 2023
  59. in src/net.h:1212 in 9810e96c74 outdated
    1208@@ -1209,13 +1209,14 @@ class CConnman
    1209      * @param[in]   address     Address of node to try connecting to
    1210      * @param[in]   conn_type   ConnectionType::OUTBOUND, ConnectionType::BLOCK_RELAY,
    1211      *                          ConnectionType::ADDR_FETCH or ConnectionType::FEELER
    1212+     * @param[in]   use_v2transport  Set to true if node supports BIP324 v2 transport protocol
    


    mzumsande commented at 6:46 pm on October 10, 2023:
    I’d prefer to explain what it actually does - (attempt to) connect using the v2 transport protocol.

    stratospher commented at 9:27 am on October 12, 2023:
    done.
  60. in src/rpc/net.cpp:360 in 9810e96c74 outdated
    356@@ -357,6 +357,7 @@ static RPCHelpMan addconnection()
    357         {
    358             {"address", RPCArg::Type::STR, RPCArg::Optional::NO, "The IP address and port to attempt connecting to."},
    359             {"connection_type", RPCArg::Type::STR, RPCArg::Optional::NO, "Type of connection to open (\"outbound-full-relay\", \"block-relay-only\", \"addr-fetch\" or \"feeler\")."},
    360+            {"v2transport", RPCArg::Type::BOOL, RPCArg::Default{false}, "Node supports BIP324 v2 transport protocol"},
    


    mzumsande commented at 6:47 pm on October 10, 2023:
    same here: connect with v2 transport protocol; we might choose to not pass v2transport=True with this rpc even if the node supports v2 transport.

    stratospher commented at 9:27 am on October 12, 2023:
    done.
  61. in src/rpc/net.cpp:405 in 9810e96c74 outdated
    388@@ -388,11 +389,12 @@ static RPCHelpMan addconnection()
    389     } else {
    390         throw JSONRPCError(RPC_INVALID_PARAMETER, self.ToString());
    391     }
    392+    bool use_v2transport = !request.params[2].isNull() && request.params[2].get_bool();
    


    mzumsande commented at 6:50 pm on October 10, 2023:
    could return an error if -v2transport for the node is not enabled but use_v2transport is chosen (similar to the way it’s done in addnode).

    stratospher commented at 9:27 am on October 12, 2023:
    done.
  62. in src/net.cpp:1836 in 9810e96c74 outdated
    1879@@ -1880,7 +1880,7 @@ void CConnman::CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    1880     RandAddEvent((uint32_t)id);
    1881 }
    1882 
    1883-bool CConnman::AddConnection(const std::string& address, ConnectionType conn_type)
    1884+bool CConnman::AddConnection(const std::string& address, ConnectionType conn_type, bool use_v2transport = false)
    


    mzumsande commented at 6:59 pm on October 10, 2023:
    9810e96c74380a3087b2b576508581380019872f: Commit message needs an update (remove the service bit flags part)

    stratospher commented at 9:27 am on October 12, 2023:
    done.
  63. in test/functional/test_framework/v2_p2p.py:183 in 5f851714cf outdated
    178+                response = response[1:]
    179+        # disconnect since garbage terminator was not seen after 4 KiB of garbage.
    180+        return processed_length, False
    181+
    182+    def initialize_v2_transport(self, ecdh_secret):
    183+        """Return a peer object with various BIP324 derived keys and ciphers."""
    


    mzumsande commented at 8:56 pm on October 10, 2023:
    nit: this function doesn’t return anything, it sets the peer class member.

    stratospher commented at 9:28 am on October 12, 2023:
    done.
  64. mzumsande commented at 10:27 pm on October 10, 2023: contributor
    halfway through with the review, these are my comments so far.
  65. in test/functional/test_framework/v2_p2p.py:230 in 5f851714cf outdated
    225+        """
    226+        if len(response) < LENGTH_FIELD_LEN:
    227+            return 0, b""
    228+        enc_contents_len = response[:LENGTH_FIELD_LEN]
    229+        response = response[LENGTH_FIELD_LEN:]
    230+        contents_len = int.from_bytes(self.peer['recv_L'].crypt(enc_contents_len), 'little')
    


    mzumsande commented at 4:03 pm on October 11, 2023:
    I have little knowledge of cryptography, but I think that something is wrong here. Doesn’t crypt change the internal state of the stream cipher, so if we abort below because we haven’t received the message in full, but then call crypt again on the same first 3 bytes of the now larger response, we will get an incorrect result?

    stratospher commented at 9:56 am on October 12, 2023:

    nice catch. you’re right!

    i’ve stored contents_len in a class variable so as to not perform the decryption of 3 bytes length again. initially, contents_len is -1, after the 3 bytes length gets processed - contents_len is updated, when the whole packet is processed contents_len is reset to -1 again.

    i’m open to other solutions.

    i wish there was a cleaner way for writing this function with reads like this but we’re working with P2PConnection(subclass of asyncio.Protocol) where no read is possible(no “waiting” for data to be ready) and we just have a data_received() callback when data is received.

  66. fanquake referenced this in commit 4a5aae9330 on Oct 12, 2023
  67. stratospher force-pushed on Oct 12, 2023
  68. DrahtBot added the label CI failed on Oct 12, 2023
  69. Frank-GER referenced this in commit 0a38a7b9bc on Oct 13, 2023
  70. DrahtBot added the label Needs rebase on Nov 1, 2023
  71. stratospher force-pushed on Nov 3, 2023
  72. stratospher commented at 10:32 am on November 3, 2023: contributor
    Rebased on master.
  73. DrahtBot removed the label CI failed on Nov 3, 2023
  74. DrahtBot removed the label Needs rebase on Nov 3, 2023
  75. DrahtBot added the label CI failed on Nov 4, 2023
  76. DrahtBot removed the label CI failed on Nov 4, 2023
  77. achow101 referenced this in commit 962ea5c525 on Nov 7, 2023
  78. stratospher force-pushed on Nov 8, 2023
  79. stratospher commented at 4:39 am on November 8, 2023: contributor
    Updated since #28374 is merged.
  80. stratospher marked this as ready for review on Nov 8, 2023
  81. DrahtBot added the label CI failed on Nov 8, 2023
  82. maflcko commented at 9:35 am on November 8, 2023: member

    CI:

    0�[0m�[0;31mp2p_v2_earlykeyresponse.py                             | ✖ Failed  | 1 s
    
  83. DrahtBot added the label Needs rebase on Nov 8, 2023
  84. in test/functional/test_framework/v2_p2p.py:81 in 63463aa9be outdated
    77+        - see section #overall-handshake-pseudocode in BIP 324
    78+    - encrypt/decrypt v2 P2P messages.
    79+        - see section #overall_packet_encryption_and_decryption_pseudocode in BIP 324
    80+    """
    81+    def __init__(self, **kwargs):
    82+        self.initiating = kwargs['initiating']  # True if initiator
    


    theStack commented at 4:37 pm on November 9, 2023:

    nit: could avoid kwargs here (unless there is a good reason to also allow other keyword arguments)

    0    def __init__(self, *, initiating):
    1        self.initiating = initiating  # True if initiator
    

    stratospher commented at 6:27 am on November 11, 2023:
    done.
  85. in test/functional/test_framework/v2_p2p.py:21 in 63463aa9be outdated
    17 
    18+logger = logging.getLogger("TestFramework.v2_p2p")
    19+
    20+MAGIC_BYTES = {
    21+    "regtest": b"\xfa\xbf\xb5\xda"   # regtest
    22+}
    


    theStack commented at 4:44 pm on November 9, 2023:
    nit: there’s already a MAGIC_BYTES dictionary in p2p.py, could use that? (though it seems there’s some circular inclusion going on, not sure how to properly resolve that….)

    mzumsande commented at 7:45 pm on November 10, 2023:
    I agree that would be nice, in particular so that we can enable p2p_dos_header_tree.py (which uses testnet instead of regtest) in combination with v2 transport. One possibility would be to move the MAGIC_BYTES constants from p2p.py into another file (e.g. test_framework/messages.py), change all includes, pass net to EncryptedP2PState, and change V1_PREFIX into a local variable using it)

    stratospher commented at 6:27 am on November 11, 2023:
    nice! included MAGIC_BYTES in messages.py

    mzumsande commented at 4:52 am on November 13, 2023:
    Thanks! I think you should actually use net by making both callers from p2p.py pass it to EncryptedP2PState, the default value of ‘regtest’ should be dropped in my opinion. Could also change all existing imports for MAGIC_BYTES (there are a few in other tests) to avoid indirect imports.

    stratospher commented at 8:06 am on November 15, 2023:
    sounds much better! added a new commit.
  86. in test/functional/test_framework/v2_p2p.py:108 in 63463aa9be outdated
     99@@ -18,3 +100,147 @@ def v2_ecdh(priv, ellswift_theirs, ellswift_ours, initiating):
    100         else:
    101             # Responding, place their public key encoding first.
    102             return TaggedHash("bip324_ellswift_xonly_ecdh", ellswift_theirs + ellswift_ours + ecdh_point_x32)
    103+
    104+    def initiate_v2_handshake(self):
    105+        """Initiator begins the v2 handshake by sending its ellswift bytes and garbage"""
    106+        self.privkey_ours, self.ellswift_ours = ellswift_create()
    107+        garbage_len = random.randrange(4096)
    108+        self.sent_garbage = os.urandom(garbage_len)
    


    theStack commented at 4:46 pm on November 9, 2023:
    nit: could deduplicate code by using a helper method like generate_garbage (or even generate_keypair_and_garbage) that is used both in initiate_v2_handshake and respond_v2_handshake? Could also put the magic number in a MAX_GARBAGE_LEN constant (that’s the naming we use in the C++ codebase), though that’s one less than the number passed to randrange here.

    stratospher commented at 6:27 am on November 11, 2023:
    true! done.
  87. in test/functional/test_framework/v2_p2p.py:124 in 63463aa9be outdated
    119+                return b""
    120+            self.received_prefix += byte
    121+            if self.received_prefix[-1] != V1_PREFIX[len(self.received_prefix) - 1]:
    122+                self.privkey_ours, self.ellswift_ours = ellswift_create()
    123+                garbage_len = random.randrange(4096)
    124+                self.sent_garbage = os.urandom(garbage_len)
    


    theStack commented at 4:51 pm on November 9, 2023:

    Since we are only in test land here, it’s probably fine to also use seed-determined randomness for garbage as well (i.e. random.randrange)? (I’ve used os.urandom in the past for a PR and got critical feedback about that: #25625 (review), which I agree with)

    0                self.sent_garbage = random.randrange(garbage_len)
    

    stratospher commented at 6:27 am on November 11, 2023:
    makes sense. done.
  88. in test/functional/test_framework/v2_p2p.py:188 in 63463aa9be outdated
    158+        processed_length = len(received_garbage)
    159+        for i in range(4096):
    160+            if received_garbage[-16:] == self.peer['recv_garbage_terminator']:
    161+                # Receive, decode, and ignore version packet.
    162+                # This includes skipping decoys and authenticating the received garbage.
    163+                aad = received_garbage[:-16]
    


    theStack commented at 5:05 pm on November 9, 2023:

    nit: as received_garbage is ensured to have a maximum size of 16 bytes a few lines above, the slicing isn’t needed here.

    0            if received_garbage == self.peer['recv_garbage_terminator']:
    1                # Receive, decode, and ignore version packet.
    2                # This includes skipping decoys and authenticating the received garbage.
    3                aad = received_garbage
    

    stratospher commented at 6:27 am on November 11, 2023:
    but there’s also an else condition below where if the received_garbage doesn’t match the garbage terminator, we keep appending 1 byte to received_garbage until there’s a match with garbage terminator/we’ve exceed max garbage limit. so even though initially received_garbage length is 16 bytes, it can take values from 17 bytes, …, 17 + 4095 bytes.

    theStack commented at 2:23 pm on November 11, 2023:
    Oh right, I missed the else-branch and assumed received_garbage wouldn’t change in the loop (which wouldn’t make much sense), nevermind.
  89. in test/functional/test_framework/test_node.py:722 in 1b656a1cf6 outdated
    719+            self.addconnection('%s:%d' % (address, port), connection_type, advertise_v2_p2p)
    720 
    721         p2p_conn.p2p_connected_to_node = False
    722-        p2p_conn.peer_accept_connection(connect_cb=addconnection_callback, connect_id=p2p_idx + 1, net=self.chain, timeout_factor=self.timeout_factor, **kwargs)()
    723+        if advertise_v2_p2p:
    724+            kwargs['services'] = kwargs['services']|NODE_P2P_V2 if 'services' in kwargs else P2P_SERVICES|NODE_P2P_V2
    


    theStack commented at 5:25 pm on November 9, 2023:

    readability nit:

    0            kwargs['services'] = (kwargs['services']|NODE_P2P_V2) if 'services' in kwargs else (P2P_SERVICES|NODE_P2P_V2)
    

    stratospher commented at 6:28 am on November 11, 2023:
    done.
  90. in test/functional/test_framework/p2p.py:858 in 09638be021 outdated
    854@@ -853,9 +855,11 @@ def send_blocks_and_test(self, blocks, node, *, success=True, force_send=False,
    855 
    856         reject_reason = [reject_reason] if reject_reason else []
    857         with node.assert_debug_log(expected_msgs=reject_reason):
    858+            if is_decoy:  # since decoy messages don't get sent - no need to wait for response
    


    theStack commented at 5:30 pm on November 9, 2023:

    nit:

    0            if is_decoy:  # since decoy messages are ignored by the recipient - no need to wait for response
    

    (or “don’t get processed” or something like that?)


    stratospher commented at 6:28 am on November 11, 2023:
    done.
  91. in test/functional/p2p_v2_encrypted.py:23 in f9a34e3335 outdated
    18+    assert_equal,
    19+    assert_greater_than,
    20+    check_node_connections,
    21+)
    22+
    23+REKEY_INTERVAL = 224
    


    theStack commented at 5:36 pm on November 9, 2023:
    nit: could import this constant from test_framework.crypto

    stratospher commented at 6:28 am on November 11, 2023:
    done.
  92. theStack commented at 5:39 pm on November 9, 2023: contributor

    Concept ACK

    Did only a rough first review round so far, left some comments on the way, mostly nits.

    It seems like the newly introduced functional test p2p_v2_encrypted.py could be merged with the already existing p2p_v2_transport.py (here or in a follow-up), or is it fundamentally different?

  93. stratospher force-pushed on Nov 11, 2023
  94. DrahtBot removed the label Needs rebase on Nov 11, 2023
  95. stratospher commented at 6:32 am on November 11, 2023: contributor

    @theStack, @mzumsande thank you for the reviews! I’ve rebased and addressed your comments.

    It seems like the newly introduced functional test p2p_v2_encrypted.py could be merged with the already existing p2p_v2_transport.py (here or in a follow-up), or is it fundamentally different?

    yes! can be done in a follow up. only difference is p2p_v2_encrypted.py tests TestNode <–> P2PInterface behaviour and p2p_v2_transport.py tests TestNode <–> TestNode behaviour. So maybe in separate functions within p2p_v2_transport.py? @maflcko, I still need to address the CI failure. https://api.cirrus-ci.com/v1/task/6171895557521408/logs/ci.log

  96. theStack commented at 5:29 pm on November 14, 2023: contributor

    I’ve noticed that the test p2p_v2_encrypted.py fails if the node under test sends a small amount of garbage (in the range of 0-19 bytes). Can be reproduced by applying the following simple patch:

     0diff --git a/src/net.cpp b/src/net.cpp
     1index 2682035912..09af15a9b7 100644
     2--- a/src/net.cpp
     3+++ b/src/net.cpp
     4@@ -951,7 +951,7 @@ std::vector<uint8_t> GenerateRandomGarbage() noexcept
     5 {
     6     std::vector<uint8_t> ret;
     7     FastRandomContext rng;
     8-    ret.resize(rng.randrange(V2Transport::MAX_GARBAGE_LEN + 1));
     9+    ret.resize(19); // fails in range 0-19, works with 20+
    10     rng.fillrand(MakeWritableByteSpan(ret));
    11     return ret;
    12 }
    

    Apparently the expected VERACK is not received from the node. Didn’t dig deeper yet. For finding the cause, I think some logger.debug(...) messages in v2_p2p.py might be helpful.

  97. stratospher force-pushed on Nov 15, 2023
  98. stratospher commented at 8:07 am on November 15, 2023: contributor

    I’ve noticed that the test p2p_v2_encrypted.py fails if the node under test sends a small amount of garbage (in the range of 0-19 bytes).

    thanks for catching that @theStack! before the drop garbage authentication packet change in #28525, recvbuf needed 16+20+20 bytes to authenticate the handshake. now it only needs 16+20 bytes. i’ve fixed it now.

  99. stratospher commented at 8:26 am on November 15, 2023: contributor

    https://api.cirrus-ci.com/v1/task/6171895557521408/logs/ci.log

    i wasn’t able to reproduce this error - tried it locally on the docker container, setting randomseed and running it in loop. also confused how it could have happened.

    this is what happens inside p2p_v2_earlykeyresponse.py test

    • we start an inbound connection to the TestNode (TestNode <—- P2PInterface).
    • v2 connection starts by sending 64 bytes ellswift (TestNode <–64 bytes– P2PInterface). this is sent in 2 parts:
      • TestNode <–4 bytes magic bytes– P2PInterface - TestNode doesn’t send us a response
      • TestNode <–remaining 60 bytes– P2PInterface - TestNode responds back since there’s a mismatch from V1_PREFIX.

    What usually happens:

    1. During inbound connection creation, TestNode::add_p2p_connection() calls P2PConnection::peer_connect() where an object of EncryptedP2PState class is created - this object v2_state contains the function EncryptedP2PState::initiate_v2_handshake() to send ellswift bytes
    2. then the asyncio code to create a connection happens and connection_made() is called when the connection is opened
      • inside connection_made(), 64 bytes ellswift is sent using EncryptedP2PState::initiate_v2_handshake()
    3. When the TestNode sends a response, P2PConnection::data_received() callback will detect it

    What happens in test:

    • there are 2 variables:
    1. send_net_magic - initially true. after we send magic bytes, send_net_magic_bytes is set to false (basically just a switch to send 64 bytes in 2 parts)
    2. can_data_be_received - initally false - data can’t be received until mismatch from V1_PREFIX occurs. after we send remaining mismatched 60 bytes, can_data_be_received is set to true.
    • since response will be detected on TestNode, we add an assertion to make sure send_net_magic = false (meaning magic bytes already sent) and can_data_be_received = true (meaning mismatched 60 bytes also sent). TestNode wouldn’t send a response in the intermediate states.
    • so the possible states are:
    1. initially send_net_magic = true, can_data_be_received = false (Assertion would fail in data_received() but shouldn’t happen)
    2. after connection_made()’s custom EncryptedP2PState::initiate_v2_handshake() (TestNode <–4 bytes magic bytes– P2PInterface) send_net_magic = false, can_data_be_received = false (Assertion would fail in data_received() but shouldn’t happen)
    3. after p2p_v2_earlykeyresponse.py test when we call custom EncryptedP2PState::initiate_v2_handshake() explicitly (TestNode <–remaining 60 bytes– P2PInterface) send_net_magic = false, can_data_be_received = true (Assertion passes in data_received())
  100. DrahtBot removed the label CI failed on Nov 15, 2023
  101. in test/functional/p2p_v2_earlykeyresponse.py:55 in b3c0a54085 outdated
    50+        self.v2_state = None
    51+
    52+    def connection_made(self, *args, **kwargs):
    53+        """Custom implementation so that 64 bytes ellswift is sent in 2 parts during `initial_v2_handshake()`"""
    54+        self.v2_state = TestEncryptedP2PState()
    55+        super(PeerEarlyKey, self).connection_made(*args, **kwargs)
    


    mzumsande commented at 10:30 pm on November 15, 2023:
    Is it on purpose to use super() above and provide args here and below? Not an expert on this kind of stuff, but this suggests that providing args is not necessary in python3.

    stratospher commented at 4:29 am on November 20, 2023:
    you’re right! done.
  102. in test/functional/p2p_v2_earlykeyresponse.py:39 in b3c0a54085 outdated
    34+            1. when `send_network_magic` = True, send first 4 bytes of ellswift (matches network magic bytes)
    35+            2. when `send_network_magic` = False, send remaining 60 bytes of ellswift
    36+        """
    37+        if self.send_net_magic:
    38+            self.privkey_ours, self.ellswift_ours = ellswift_create()
    39+            self.sent_garbage = os.urandom(garbage_len)
    


    mzumsande commented at 11:04 pm on November 15, 2023:
    one more instance of os.urandom (random.randbytes(garbage_len) should work)

    stratospher commented at 4:30 am on November 20, 2023:
    done.
  103. mzumsande commented at 11:38 pm on November 15, 2023: contributor

    I had a look at p2p_v2_earlykeyresponse.py and it would fail the handshake rarely (but the test still passes because the error is thrown in the p2p thread!). I could reproduce it by changing GenerateRandomGarbage() similar to TheStack above, but this time to large values: ret.resize(V2Transport::MAX_GARBAGE_LEN-19); or larger. I haven’t analyzed any further so far.

    [Edit]: Well, it is expected that the handshake fails, because we manipulated our key we’ll never find the expected garbage terminator. But it seems confusing to fail with a stack trace / uncaught exception in this case ("Fatal error: protocol.data_received() call failed.") when this is actually expected to happen.

  104. mzumsande commented at 4:44 am on November 16, 2023: contributor
    Ah, I think if we have an unhandled exception like in my previous post, running the test directly (python3 p2p_v2_earlykeyresponse.py) will show success, but if we use the test_runner (python3 test_runner.py p2p_v2_earlykeyresponse.py) it will show failure, probably because things were written to stderr. So we need to do something about the case I mentioned above.
  105. stratospher force-pushed on Nov 20, 2023
  106. stratospher commented at 4:29 am on November 20, 2023: contributor

    But it seems confusing to fail with a stack trace / uncaught exception in this case (“Fatal error: protocol.data_received() call failed.”) when this is actually expected to happen.

     02023-11-18T13:52:32.976000Z TestFramework (INFO): PRNG seed is: 8699047777059301467
     12023-11-18T13:52:32.977000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_70zlmcgr
     22023-11-18T13:52:33.535000Z TestFramework (INFO): Sending ellswift bytes in parts to ensure that response from responder is received only when
     32023-11-18T13:52:33.536000Z TestFramework (INFO): ellswift bytes have a mismatch from the 16 bytes(network magic followed by "version\x00\x00\x00\x00\x00")
     42023-11-18T13:52:33.536000Z TestFramework (INFO): Sending first 4 bytes of ellswift which match network magic
     52023-11-18T13:52:33.536000Z TestFramework (INFO): If a response is received, assertion failure would happen in our custom data_received() function
     62023-11-18T13:52:33.587000Z TestFramework (INFO): Sending remaining ellswift and garbage which are different from V1_PREFIX. Since a response is
     72023-11-18T13:52:33.587000Z TestFramework (INFO): expected now, our custom data_received() function wouldn't result in assertion failure
     8Fatal error: protocol.data_received() call failed.
     9protocol: <__main__.PeerEarlyKey object at 0x7f0d6f6c1060>
    10transport: <_SelectorSocketTransport fd=10 read=polling write=<idle, bufsize=0>>
    11Traceback (most recent call last):
    12  File "/usr/lib/python3.10/asyncio/selector_events.py", line 876, in _read_ready__data_received
    13    self._protocol.data_received(data)
    14  File "/home/stratospher/code/bitcoin/test/functional/p2p_v2_earlykeyresponse.py", line 60, in data_received
    15    super(PeerEarlyKey, self).data_received(t)
    16  File "/home/stratospher/code/bitcoin/test/functional/test_framework/p2p.py", line 297, in data_received
    17    self.v2_handshake()
    18  File "/home/stratospher/code/bitcoin/test/functional/test_framework/p2p.py", line 284, in v2_handshake
    19    raise ValueError("invalid v2 mac tag in handshake authentication")
    20ValueError: invalid v2 mac tag in handshake authentication
    212023-11-18T13:52:33.673000Z TestFramework.p2p (WARNING): Connection lost to 127.0.0.1:13882 due to invalid v2 mac tag in handshake authentication
    222023-11-18T13:52:33.693000Z TestFramework (INFO): successful disconnection when MITM happens in the key exchange phase
    232023-11-18T13:52:33.744000Z TestFramework (INFO): Stopping nodes
    242023-11-18T13:52:33.948000Z TestFramework (INFO): Cleaning up /tmp/bitcoin_func_test_70zlmcgr on exit
    252023-11-18T13:52:33.948000Z TestFramework (INFO): Tests successful
    

    this happened because of the mac tag mismatch in the v2 handshake’s authentication. it only got displayed sometimes because the test would finish before the v2 handshake authentication happened. since we don’t need to do v2 handshake authentication/v2 handshake for this test, and just be able to detect when data is received, i’ve removed the P2PInterface::data_received() call which performs the handshake. whenever data is received, PeerEarlyKey::data_received() would detect it and do nothing with the received bytes.

    I could reproduce it by changing GenerateRandomGarbage() similar to TheStack above, but this time to large values: ret.resize(V2Transport::MAX_GARBAGE_LEN-19); or larger.

    oh! i’m not able to reproduce this by changing to large values. Could this be related to the above test log?

  107. mzumsande commented at 6:46 pm on November 20, 2023: contributor

    is this the test log you observed?

    Yes. Only in ~1% of runs with the previous version, but after changing the garbage size in net.cpp to the max I would get it consistently. I think the reason is that usually bitcoind would terminate the connection (because the python side would wait for more bytes that wouldn’t come), but when the garbage was close to the max, the python side would terminate it and throwing the exception.

    i’ve removed the P2PInterface::data_received() call which performs the handshake

    Thanks, that should fix this issue.

  108. in test/functional/p2p_v2_encrypted.py:102 in 7eac63705b outdated
    86+
    87+            if i:
    88+                # check if node1 connected to node0 (but not to node0's p2p connection directly)
    89+                # gets blocks produced by node0's p2p connection
    90+                self.log.info("Check if blocks produced by node0's p2p connection is received by node0")
    91+                peer6.send_blocks_and_test(test_blocks, node0, success=True) # node0's tip advances
    


    mzumsande commented at 11:13 pm on November 20, 2023:

    One possible problem I encountered when running the p2p_invalid_messages.py test_resource_exhaustion subtest with v2 that but could affect many tests intermittently, also this spot:

    I think that there could be a race between the python main thread of functional tests and the p2p thread: For example, this command 1) builds messages, 2) encrypts them and 3) sends them, all within the main thread. If during step 2) the bitcoind node would decide to send out a ping, the p2p thread will also encrypt a message (the pong answer) and send it. It doesn’t wait for the other thread. This isn’t a problem with v1 (if the messages are sent in different orders, nothing may break), but with v2 I think the connection would break down because the ChaCha stream cipher is stateful.

    So I think that maybe we need some kind of synchronization / locking mechanism?

    I haven’t though much about a solution yet, just wanted to share this problem.


    theStack commented at 10:41 am on November 24, 2023:

    Good point indeed. To execute the steps build/encrypt/send-message atomically, I think introducing a new lock and acquiring it in the send_message method should do the trick? I.e.

     0diff --git a/test/functional/test_framework/p2p.py b/test/functional/test_framework/p2p.py
     1index a70682a768..59707471a6 100755
     2--- a/test/functional/test_framework/p2p.py
     3+++ b/test/functional/test_framework/p2p.py
     4@@ -164,6 +164,7 @@ class P2PConnection(asyncio.Protocol):
     5         # The underlying transport of the connection.
     6         # Should only call methods on this from the NetworkThread, c.f. call_soon_threadsafe
     7         self._transport = None
     8+        self._send_lock = threading.Lock()
     9         self.v2_state = None  # EncryptedP2PState object needed for v2 p2p connections
    10         self.supports_v2_p2p = False  # set if the connection supports v2 p2p
    11         self.reconnect = False  # set if reconnection needs to happen
    12@@ -371,9 +372,10 @@ class P2PConnection(asyncio.Protocol):
    13 
    14         This method takes a P2P payload, builds the P2P header and adds
    15         the message to the send buffer to be sent over the socket."""
    16-        tmsg = self.build_message(message, is_decoy)
    17-        self._log_message("send", message)
    18-        return self.send_raw_message(tmsg)
    19+        with self._send_lock:
    20+            tmsg = self.build_message(message, is_decoy)
    21+            self._log_message("send", message)
    22+            return self.send_raw_message(tmsg)
    23 
    24     def send_raw_message(self, raw_message_bytes):
    25         if not self.is_connected:
    

    It might be also good to classify the methods build_message and send_raw_message as private then (that is, prefixing them with an underscore) to express that they shouldn’t be called from the outside anymore, as this would cause problems for v2 connections after handshake completion. Currently only p2p_invalid_messages.py is calling those methods, so that test needs to be reworked for v2 support (didn’t look deeper yet though, can maybe also done in a follow-up).


    stratospher commented at 11:09 am on November 24, 2023:

    +1, good catch! when i tried this, just running self.test_resource_exhaustion() in p2p_invalid messages.py took too much time.

    1. in v1, without locks - 0m22.265s
    2. in v2, without locks - won’t work
    3. in v1, with locks - 0m22.101s
    4. in v2, with locks - 8m45.916s

    theStack commented at 2:08 pm on November 24, 2023:

    @stratospher: Seems like the long run-time in case 4. is primarily caused by the slow ChaCha20 implementation in Python. On my machine, encrypting 4MB of data (as done in test_resource_exhaustion, but repeated 80 times) takes already more than half a minute:

    0$ ipython 
    1Python 3.10.13 (main, Oct  5 2023, 16:21:31) [Clang 13.0.0 ]
    2Type 'copyright', 'credits' or 'license' for more information
    3IPython 8.13.2 -- An enhanced Interactive Python. Type '?' for help.
    4
    5In [1]: from test.functional.test_framework.crypto.bip324_cipher import aead_chacha20_poly1305_encrypt
    6
    7In [2]: %time _ = aead_chacha20_poly1305_encrypt(key=bytes([1]*32), nonce=bytes([2]*12), aad=bytes(), plaintext=bytes([0xcc]*4000000))
    8CPU times: user 35.9 s, sys: 40 ms, total: 36 s
    9Wall time: 35.8 s
    

    I guess we can’t do that much about that (unless someone sees a way to optimize our python cryptography, without introducing new dependencies like numpy or calling an external library?), other than reducing the number of loops in test_resource_exhaustion if p2p-v2 is used.


    mzumsande commented at 4:48 pm on November 24, 2023:

    so that test needs to be reworked for v2 support (didn’t look deeper yet though, can maybe also done in a follow-up)

    Yes, multiple tests need an adjustment for running with python v2 P2P. I have a WIP branch at https://github.com/mzumsande/bitcoin/tree/tmp_bip324_fixalltests for that which has helped me find this concurrency issue and other issues I mentioned above, but will only open a PR after this PR is merged.

    Seems like the long run-time in case 4. is primarily caused by the slow ChaCha20 implementation in Python.

    I agree. I think that his particular subtest maybe could be skipped with v2 because its goal is to stress the node by bombarding it with lots of messages in a short time, not the test framework. But also, feature_block.py and feature_maxuploadtarget.py, become very slow on my computer for the same reason and we might have to only run them with v1 transport.


    stratospher commented at 7:10 pm on November 24, 2023:

    I guess we can’t do that much about that (unless someone sees a way to optimize our python cryptography, without introducing new dependencies like numpy or calling an external library?), other than reducing the number of loops in test_resource_exhaustion if p2p-v2 is used.

    makes sense.

    i’ve introduced a new lock in this commit.

  109. stratospher force-pushed on Nov 24, 2023
  110. in test/functional/test_framework/test_node.py:202 in b9538c7067 outdated
    197@@ -197,6 +198,10 @@ def start(self, extra_args=None, *, cwd=None, stdout=None, stderr=None, env=None
    198         """Start the node."""
    199         if extra_args is None:
    200             extra_args = self.extra_args
    201+        if "-v2transport=1" in extra_args:
    202+            self.advertise_v2_p2p = True
    


    mzumsande commented at 5:21 pm on November 28, 2023:
    Now that #28805 is merged, self.use_v2transport from test_node could be used instead of introducing self.advertise_v2_p2p.

    stratospher commented at 4:18 pm on November 29, 2023:
    yay! done.
  111. in test/functional/test_framework/p2p.py:262 in 9a6b684bec outdated
    257+
    258+            # if we're the responder, read ellswift bytes till the first mismatch from 12 bytes V1_PREFIX in
    259+            # `respond_v2_handshake()`.
    260+            # `complete_handshake()` reads the remaining `64 - length` ellswift bytes afterwards from recvbuf.
    261+            # if we're the initiator, length = 0 and 64 bytes ellswift is read from recvbuf in `complete_handshake()`
    262+            if len(self.recvbuf) < 64 - length:
    


    mzumsande commented at 6:32 pm on November 28, 2023:
    Why 64 - length here instead of 64? I realize that we have already read some bytes in respond_v2_handshake but I think we haven’t removed these from self.recvbuf yet, so wouldn’t this need to be at least 64 bytes to proceed?

    stratospher commented at 4:19 pm on November 29, 2023:
    good catch! done.
  112. in test/functional/test_framework/p2p.py:273 in 9a6b684bec outdated
    268+            self.recvbuf = self.recvbuf[64:]
    269+
    270+        # `self.v2_state.peer` is instantiated only after shared ECDH secret/BIP324 derived keys and ciphers
    271+        # is derived in `complete_handshake()`.
    272+        # so `authenticate_handshake()` which uses the BIP324 derived ciphers gets called after `complete_handshake()`.
    273+        if self.v2_state.peer:
    


    mzumsande commented at 6:52 pm on November 28, 2023:
    Is there a way we could get to this spot without self.v2_state.peer? I think not (it should have been present from the beginning or created in complete_handshake), so this line could be dropped or changed to an assert?

    stratospher commented at 4:19 pm on November 29, 2023:

    you’re right! i’ve made it an assert.

    i vaguely remember adding this in some older version of the branch because when TestNode sends transport layer packets(64 bytes ellswift, x bytes garbage, 16 bytes garbage terminator, optional decoy packets, 20 bytes transport version packet) to our python P2PInterface, it’s not necessary that we receive all these bytes at once on the wire/have access to these bytes at once on self.recvbuf. Looking at this again - even without the “if self.v2_state.peer”, the code paths for when 64 + 16 + 20 bytes is received or when 64 bytes, delay, 16 + 20 bytes is received works fine!

    EDIT: the code path in complete_handshake returning 0, b"" to receive rest of (64-x) bytes doesn’t work currently because it would result in an assertion failure here in case 64 bytes ellswift were received in smaller chunks. added an else condition in case that happens.

  113. in test/functional/test_framework/p2p.py:275 in 9a6b684bec outdated
    270+        # `self.v2_state.peer` is instantiated only after shared ECDH secret/BIP324 derived keys and ciphers
    271+        # is derived in `complete_handshake()`.
    272+        # so `authenticate_handshake()` which uses the BIP324 derived ciphers gets called after `complete_handshake()`.
    273+        if self.v2_state.peer:
    274+            # authenticate v2 handshake
    275+            if len(self.recvbuf) < 16 + 20:
    


    mzumsande commented at 7:10 pm on November 28, 2023:
    could add a comment about the meaning of the magic numbers

    stratospher commented at 4:19 pm on November 29, 2023:
    added!
  114. achow101 referenced this in commit 30a0557829 on Nov 28, 2023
  115. in test/functional/test_framework/p2p.py:572 in ae8070c58f outdated
    559@@ -553,6 +560,11 @@ def on_verack(self, message):
    560 
    561     def on_version(self, message):
    562         assert message.nVersion >= MIN_P2P_VERSION_SUPPORTED, "Version {} received. Test framework only supports versions greater than {}".format(message.nVersion, MIN_P2P_VERSION_SUPPORTED)
    563+        # reconnection using v1 P2P has happened since version message can be processed, previously queued messages are sent using v1 P2P here
    564+        if self.reconnect:
    565+            while self.queue_messages:
    566+                message = self.queue_messages.pop(0)
    567+                self.send_message(message)
    


    mzumsande commented at 8:29 pm on November 28, 2023:
    Should we at some point set self.reconnect to False again (maybe here?), so that future possible disconnections that would be unrelated will be handled correctly?

    stratospher commented at 4:19 pm on November 29, 2023:
    good point! done.
  116. mzumsande commented at 9:18 pm on November 28, 2023: contributor
    Another round of review - looking good!
  117. stratospher force-pushed on Nov 29, 2023
  118. stratospher commented at 4:20 pm on November 29, 2023: contributor
    thanks @mzumsande for the really great feedback! I’ve rebased on master since #28805 is merged and updated the PR to address your comments.
  119. in test/functional/test_framework/v2_p2p.py:60 in 95d57de31f outdated
    56+}
    57+
    58+
    59+def GetShortIDFromMessageType(msgtype):
    60+    """Returns 1-byte short message type ID for the P2P message"""
    61+    msgtype_to_shortid = dict(map(reversed, SHORTID.items()))
    


    theStack commented at 4:55 pm on November 29, 2023:
    nit: as this dictionary is only dependent on the constant SHORTID dict, could place it in the global namespace (e.g. as MSGTYPE_TO_SHORTID directly below SHORTID), so it only has to be calculated once if the module is loaded, and not on each GetShortIDFromMessageType call again

    stratospher commented at 5:08 am on November 30, 2023:
    great idea! done.
  120. in test/functional/test_framework/v2_p2p.py:61 in 95d57de31f outdated
    57+
    58+
    59+def GetShortIDFromMessageType(msgtype):
    60+    """Returns 1-byte short message type ID for the P2P message"""
    61+    msgtype_to_shortid = dict(map(reversed, SHORTID.items()))
    62+    return msgtype_to_shortid[msgtype].to_bytes(1, 'big') if msgtype in msgtype_to_shortid else b"\x00"
    


    theStack commented at 4:58 pm on November 29, 2023:

    a little more pythonic (see https://docs.python.org/3/library/stdtypes.html#dict.get):

    0    return msgtype_to_shortid.get(msgtype, 0).to_bytes(1, 'big')
    

    stratospher commented at 5:10 am on November 30, 2023:
    interesting! i’ve kept a MSGTYPE_TO_SHORTID dictionary and used this construct in build_message.
  121. in test/functional/test_framework/v2_p2p.py:107 in 95d57de31f outdated
    102+    def generate_keypair_and_garbage(self):
    103+        """Generates ellswift keypair and 4095 bytes garbage at max"""
    104+        self.privkey_ours, self.ellswift_ours = ellswift_create()
    105+        garbage_len = random.randrange(MAX_GARBAGE_LEN + 1)
    106+        self.sent_garbage = random.randbytes(garbage_len)
    107+        logger.debug("sending %d bytes of garbage data" % garbage_len)
    


    theStack commented at 5:11 pm on November 29, 2023:

    nit:

    0        logger.debug(f"sending {garbage_len} bytes of garbage data")
    

    stratospher commented at 5:10 am on November 30, 2023:
    done.
  122. in test/functional/test_framework/p2p.py:325 in 07be9af558 outdated
    337+                    else:
    338+                        # a short id
    339+                        if shortid in SHORTID:
    340+                            msgtype = SHORTID[shortid]
    341+                        else:
    342+                            msgtype = "unknown-" + str(shortid)
    


    theStack commented at 5:35 pm on November 29, 2023:

    nit: shorter, using dict’s .get method again:

    0                        msgtype = SHORTID.get(shortid, f"unknown-{shortid}")
    

    stratospher commented at 5:10 am on November 30, 2023:
    yay! looks pythonic! done.
  123. in test/functional/test_framework/p2p.py:393 in 985791c38f outdated
    399-        tmsg += h[:4]
    400-        tmsg += data
    401-        return tmsg
    402+        if self.supports_v2_p2p:
    403+            msgtype = message.msgtype
    404+            data = message.serialize()
    


    theStack commented at 5:40 pm on November 29, 2023:
    nit: to deduplicate, could move these two assignments above the if, as they are identical in both branches

    stratospher commented at 5:10 am on November 30, 2023:
    done.
  124. DrahtBot added the label CI failed on Nov 29, 2023
  125. in test/functional/p2p_v2_encrypted.py:58 in a1109bfc25 outdated
    53+
    54+    def run_test(self):
    55+        node0, node1 = self.nodes[0], self.nodes[1]
    56+        self.log.info("Check inbound connections to v2 TestNode from v2 P2PConnection is v2")
    57+        peer1 = node0.add_p2p_connection(P2PInterface(), wait_for_verack=True, supports_v2_p2p=True)
    58+        assert peer1.supports_v2_p2p
    


    theStack commented at 6:09 pm on November 29, 2023:

    to verify each connection’s transport protocol version, could additionally take the node’s perspective into account via getpeerinfo(), e.g.:

     0diff --git a/test/functional/p2p_v2_encrypted.py b/test/functional/p2p_v2_encrypted.py
     1index 177c3ed42f..8f5c4bb925 100755
     2--- a/test/functional/p2p_v2_encrypted.py
     3+++ b/test/functional/p2p_v2_encrypted.py
     4@@ -56,23 +56,28 @@ class P2PEncrypted(BitcoinTestFramework):
     5         self.log.info("Check inbound connections to v2 TestNode from v2 P2PConnection is v2")
     6         peer1 = node0.add_p2p_connection(P2PInterface(), wait_for_verack=True, supports_v2_p2p=True)
     7         assert peer1.supports_v2_p2p
     8+        assert_equal(node0.getpeerinfo()[-1]["transport_protocol_type"], "v2")
     9 
    10         self.log.info("Check inbound connection to v2 TestNode from v1 P2PConnection is v1")
    11         peer2 = node0.add_p2p_connection(P2PInterface(), wait_for_verack=True, supports_v2_p2p=False)
    12         assert not peer2.supports_v2_p2p
    13+        assert_equal(node0.getpeerinfo()[-1]["transport_protocol_type"], "v1")
    14 
    15         self.log.info("Check outbound connection from v2 TestNode to v1 P2PConnection advertised as v1 is v1")
    16         peer3 = node0.add_outbound_p2p_connection(P2PInterface(), p2p_idx=0, supports_v2_p2p=False, advertise_v2_p2p=False)
    17         assert not peer3.supports_v2_p2p
    18+        assert_equal(node0.getpeerinfo()[-1]["transport_protocol_type"], "v1")
    19 
    20         # v2 TestNode performs downgrading here
    21         self.log.info("Check outbound connection from v2 TestNode to v1 P2PConnection advertised as v2 is v1")
    22         peer4 = node0.add_outbound_p2p_connection(P2PInterface(), p2p_idx=1, supports_v2_p2p=False, advertise_v2_p2p=True)
    23         assert not peer4.supports_v2_p2p
    24+        assert_equal(node0.getpeerinfo()[-1]["transport_protocol_type"], "v1")
    25 
    26         self.log.info("Check outbound connection from v2 TestNode to v2 P2PConnection advertised as v2 is v2")
    27         peer5 = node0.add_outbound_p2p_connection(P2PInterface(), p2p_idx=2, supports_v2_p2p=True, advertise_v2_p2p=True)
    28         assert peer5.supports_v2_p2p
    29+        assert_equal(node0.getpeerinfo()[-1]["transport_protocol_type"], "v2")
    30 
    31         self.log.info("Check if version is sent and verack is received in inbound/outbound connections")
    32         assert_equal(len(node0.getpeerinfo()), 5)  # check if above 5 connections are present in node0's getpeerinfo()
    33@@ -88,6 +93,7 @@ class P2PEncrypted(BitcoinTestFramework):
    34             # Add v2 P2P connection to node0
    35             peer6 = node0.add_p2p_connection(P2PDataStore(), supports_v2_p2p=True)
    36             assert peer6.supports_v2_p2p
    37+            assert_equal(node0.getpeerinfo()[-1]["transport_protocol_type"], "v2")
    38 
    39             if i:
    40                 # check if node1 connected to node0 (but not to node0's p2p connection directly)
    41@@ -115,6 +121,7 @@ class P2PEncrypted(BitcoinTestFramework):
    42         self.restart_node(0, ["-v2transport=0"])
    43         peer7 = node0.add_p2p_connection(P2PInterface(), wait_for_verack=True, supports_v2_p2p=True)
    44         assert not peer7.supports_v2_p2p
    45+        assert_equal(node0.getpeerinfo()[-1]["transport_protocol_type"], "v1")
    46         check_node_connections(node=node0, num_in=1, num_out=0)
    

    stratospher commented at 5:10 am on November 30, 2023:
    good suggestion! done.
  126. theStack commented at 6:22 pm on November 29, 2023: contributor
    Left some comments below from the latest review round, mostly nits. I think another small check to add in the newly introduced functional test could be to verify that a P2PConnection’s session id matches the one from the node, as returned via getpeerinfo().
  127. in test/functional/test_framework/test_node.py:666 in 47e72daf65 outdated
    664             kwargs['dstaddr'] = '127.0.0.1'
    665 
    666         p2p_conn.p2p_connected_to_node = True
    667-        p2p_conn.peer_connect(**kwargs, send_version=send_version, net=self.chain, timeout_factor=self.timeout_factor)()
    668+        if self.use_v2transport:
    669+            kwargs['services'] = kwargs['services']|NODE_P2P_V2 if 'services' in kwargs else P2P_SERVICES|NODE_P2P_V2
    


    mzumsande commented at 6:37 pm on November 29, 2023:

    Ah, now that use_v2transport with the correct restart behavior is used, CI fails because it is necessary to fix the test_service_flags subtest from rpc_net.py already in this PR, to something like

    0if self.options.v2transport:
    1    assert_equal(['UNKNOWN[2^4]', 'P2P_V2', 'UNKNOWN[2^63]'], self.nodes[0].getpeerinfo()[-1]['servicesnames'])
    2else:
    3    assert_equal(['UNKNOWN[2^4]', 'UNKNOWN[2^63]'], self.nodes[0].getpeerinfo()[-1]['servicesnames'])
    

    stratospher commented at 5:11 am on November 30, 2023:
    missed that! i’ve updated the subtest in rpc_net.py.

    theStack commented at 2:35 pm on December 24, 2023:

    more-pythonic-nit:

    0            kwargs['services'] = kwargs.get('services', P2P_SERVICES) | NODE_P2P_V2
    

    (also in the add_outbound_p2p_connection method below)


    stratospher commented at 4:53 pm on January 2, 2024:
    nice! done.
  128. stratospher force-pushed on Nov 30, 2023
  129. stratospher force-pushed on Nov 30, 2023
  130. stratospher force-pushed on Nov 30, 2023
  131. DrahtBot removed the label CI failed on Nov 30, 2023
  132. stratospher commented at 7:11 am on November 30, 2023: contributor

    thanks @theStack, @mzumsande! I’ve updated the PR to address your comments. Also made this change to MSGTYPE_TO_SHORTID construction because of lint-python.py failure in the CI

     0diff --git a/test/functional/test_framework/v2_p2p.py b/test/functional/test_framework/v2_p2p.py
     1index fe44351a13..b4efd1ac58 100644
     2--- a/test/functional/test_framework/v2_p2p.py
     3+++ b/test/functional/test_framework/v2_p2p.py
     4@@ -55,7 +55,7 @@ SHORTID = {
     5 }
     6 
     7 # Dictionary which contains short message type ID for the P2P message
     8-MSGTYPE_TO_SHORTID = dict(map(reversed, SHORTID.items()))
     9+MSGTYPE_TO_SHORTID = {msgtype: shortid for shortid, msgtype in SHORTID.items()}
    10 
    11 
    12 class EncryptedP2PState:
    
  133. in test/functional/test_framework/p2p.py:169 in bc26a862b0 outdated
    162@@ -163,6 +163,7 @@ def __init__(self):
    163         # The underlying transport of the connection.
    164         # Should only call methods on this from the NetworkThread, c.f. call_soon_threadsafe
    165         self._transport = None
    166+        self._send_lock = threading.Lock()
    


    mzumsande commented at 10:43 pm on December 6, 2023:
    One thing to consider is that now we have two locks for p2p, we need to worry about deadlocks. There is an implied lock order, on_message locks p2p_lock, and can call messages (e.g. on_inv) that will call send_message and lock _send_lock. So we must make sure never to take p2p_lock after _send_lock. I don’t think that this is currently possible, but maybe we should add a comment about this.

    stratospher commented at 4:01 am on December 15, 2023:
    good find! i’ve added a comment.

    sr-gi commented at 6:41 pm on January 12, 2024:

    Would it be worth changing how p2p_lock is acquired to prevent this? Instead of just grabbing it we could have a helper method being part of the P2PConnection that returns it as long as _send_lock is not being held, and fails otherwise. Something along the lines of:

    0def p2p_lock(self):
    1    assert not self._send_lock.locked()
    2    return p2p_lock
    

    If you make this change it may also be worth renaming p2p_lock to _p2p_lock to discourage it from being imported straightaway (and even comment that it shouldn’t).

    Given this will affect several parts of the tests suite it may be worth doing it as a followup


    stratospher commented at 2:55 pm on January 27, 2024:

    @mzumsande, @sr-gi, I played with @sr-gi’s idea on this branch and wondering about 2 things:

    1. can deadlocks happen? i noticed places in the test where _send_lock was acquired and then p2p_lock (and deadlocks didn’t happen). am i missing something?
      1. MainThread acquires _send_lock
      2. NetworkThread acquires p2p_lock
      3. NetworkThread would wait for MainThread to release _send_lock
      4. MainThread would release _send_lock at some point of time right? (since it doesn’t need to wait for p2p_lock)
    2. do we need a separate _send_lock for every P2PConnection? benchmarks look slightly faster with a global _send_lock (like p2p_lock).

    Given this will affect several parts of the tests suite it may be worth doing it as a followup

    +1, but supposing deadlocks are possible - in the follow up:

    1. _send_lock acquired because of send_message() and then p2p_lock acquired because of wait_until() is harmless for us
    2. _send_lock acquired because of send_message() and then p2p_lock acquired because of on_message() is the situation we need to be concerned about

    sr-gi commented at 6:24 pm on January 29, 2024:

    do we need a separate _send_lock for every P2PConnection? benchmarks look slightly faster with a global _send_lock (like p2p_lock).

    Umm, that sounds a bit strange to me. My intuition would be that having a global lock should be slower, especially if multiple connections are run at the same time, given they would be racing to acquire that lock every time a message needs to be sent (?)

    w.r.t wether or not a deadlock is possible, I think @mzumsande may be able to tell you better given he was the one pointing them out initially, but if we want to enforce a certain ordering of acquiring them I think my suggestion (or something on those lines) may be worth adding. I’m even happy to open a PR for that


    mzumsande commented at 9:03 pm on February 16, 2024:

    Something along the lines of:

    0def p2p_lock(self):
    1    assert not self._send_lock.locked()
    2    return p2p_lock
    

    I think that this is not right (I tried it out, and it asserts everywhere :sweat_smile:): To prevent deadlocks, the lock order must be preserved only within each thread:

    • It’s possible that a given thread first takes p2p_lock, and then send_lock. See for example: on_message -> p2p_lock taken -> call on_inv -> call send_message -> _send_lock taken
    • What must be disallowed is that any thread can do the reverse: First take _send_lock, and then take p2p_lock, because it would lead to a deadlock
    • I don’t think that this is possible because I looked at the one spot where send_lock is taken (it calls self.build_message and self.send_raw_message, neither of which take p2p_lock)
    • But it is perfectly allowed that Thread 1 take _send_lock, and immediately after Thread 2 takes p2p_lock (which would trigger the assert).

    sr-gi commented at 9:42 pm on February 16, 2024:
    You’re right. I don’t think there is any easy way of keeping track of who’s thread is holding what lock in Python though, so I don’t think there a straightforward way of addressing this :(
  134. mzumsande commented at 10:47 pm on December 6, 2023: contributor

    ACK f479a99cc92c7966266804e8c33338cebbc44fbc

    I have reviewed the code. I did some testing / gained more confidence that it works well by having a branch where, with some adjustments, the python v2 p2p module is used for the entire functional test suite.

  135. DrahtBot requested review from jonatack on Dec 6, 2023
  136. DrahtBot requested review from brunoerg on Dec 6, 2023
  137. DrahtBot requested review from theStack on Dec 6, 2023
  138. in test/functional/test_framework/v2_p2p.py:62 in 1e07058714 outdated
    58+# Dictionary which contains short message type ID for the P2P message
    59+MSGTYPE_TO_SHORTID = {msgtype: shortid for shortid, msgtype in SHORTID.items()}
    60+
    61 
    62 class EncryptedP2PState:
    63+    """A class for performing v2 P2P protocol functions on P2PConnection:
    


    sipa commented at 6:32 pm on December 7, 2023:

    In commit “[test] Construct class to handle v2 P2P protocol functions”

    It would be helpful to explain what the return value of the various functions in this class means. In several cases the return value seems to be bytes-that-should-be-sent-to-the-peer, but there are exceptions (-1 meaning downgrade for respond_v2_handshake, and v2_ecdh returning the shared secret).


    stratospher commented at 4:01 am on December 15, 2023:
    definitely useful to have. done.
  139. in test/functional/test_framework/v2_p2p.py:138 in 1e07058714 outdated
    133+        self.initialize_v2_transport(ecdh_secret)
    134+        # Send garbage terminator
    135+        msg_to_send = self.peer['send_garbage_terminator']
    136+        # Optionally send decoy packets after garbage terminator.
    137+        aad = self.sent_garbage
    138+        for decoy_content_len in [random.randint(1, 100) for _ in range(10)]:
    


    sipa commented at 6:55 pm on December 7, 2023:

    In commit “[test] Construct class to handle v2 P2P protocol functions”

    It seems a bit strange to always send exactly 10 decoy packets before the version packet. Maybe randomize that too?


    stratospher commented at 4:01 am on December 15, 2023:
    changed it to send 0-9 number of decoy packets randomly.
  140. in test/functional/test_framework/v2_p2p.py:225 in 1e07058714 outdated
    220+        Returns:
    221+            1. length - length of packet processed in order to update recvbuf.
    222+                      - return 0 if only part of packet is received. (recvbuf not updated since decryption not done yet)
    223+                      - return -1 if there's a MAC tag mismatch and disconnect.
    224+            2. decrypted packet contents
    225+                     - return b"" if only part of packet is received/MAC tag mismatch.
    


    sipa commented at 7:22 pm on December 7, 2023:

    In commit “[test] Construct class to handle v2 P2P protocol functions”

    As I understand it, this function does not distinguish between “valid packet with 0-byte contents” and “decoy packet”, which I believe is incorrect (the version packet can have 0-byte contents, but if it’s a decoy, we should still listen for more packets). This isn’t caught because Bitcoin Core’s V2Transport implementation never sends decoys, but if it did, I believe the tests would break.

    My suggestion would be to return None instead of b"" for decoys.


    stratospher commented at 4:01 am on December 15, 2023:
    good catch! changed it to None.
  141. in test/functional/test_framework/v2_p2p.py:168 in 1e07058714 outdated
    163+                    aad = b""
    164+                    if length == -1:
    165+                        return processed_length, False
    166+                    elif length == 0:
    167+                        return 0, True
    168+                    # currently version packet has empty content while decoy packets don't
    


    sipa commented at 7:28 pm on December 7, 2023:

    In commit “[test] Construct class to handle v2 P2P protocol functions”

    I don’t think we should be hardcoding this assumption. Decoy packets with 0-byte contents are very much correct and possibly useful. See my other comment on v2_receive_packet about distinguishing decoys from 0-byte content packets.


    stratospher commented at 4:01 am on December 15, 2023:
    right! done.
  142. in test/functional/test_framework/p2p.py:217 in e378d8af14 outdated
    210@@ -199,8 +211,13 @@ def connection_made(self, transport):
    211         assert not self._transport
    212         logger.debug("Connected & Listening: %s:%d" % (self.dstaddr, self.dstport))
    213         self._transport = transport
    214+        # in an inbound connection to the TestNode, P2PConnection is the initiator. [TestNode <---- P2PConnection]
    215+        # ellswift is generated and sent immediately to begin the initial v2 handshake.
    216+        if self.supports_v2_p2p and self.v2_state.initiating and not self.v2_state.tried_v2_handshake:
    217+            ellswift_and_garbage_data = self.v2_state.initiate_v2_handshake()
    


    sipa commented at 7:30 pm on December 7, 2023:

    In commit “[test] Introduce EncryptedP2PState object in P2PConnection”

    Nit: I’d find it a bit cleaner to not give this variable such an accurate name for reasons of abstractions. The code here shouldn’t care about what it is that the EncryptedP2PState tells us to send; it should just be sent. Maybe call it “send_handshake_bytes” ?


    stratospher commented at 4:01 am on December 15, 2023:
    oh makes sense to s/ellswift_and_garbage_data/send_handshake_bytes in p2p.py. didn’t change the name in a test where we send ellswift in parts.
  143. sipa commented at 7:37 pm on December 7, 2023: member
    Halfway through.
  144. in test/functional/test_framework/v2_p2p.py:162 in 1e07058714 outdated
    157+                # Receive, decode, and ignore version packet.
    158+                # This includes skipping decoys and authenticating the received garbage.
    159+                aad = received_garbage[:-16]
    160+                while not self.tried_v2_handshake:
    161+                    length, contents = self.v2_receive_packet(response, aad=aad)
    162+                    processed_length += length
    


    theStack commented at 10:37 pm on December 7, 2023:
    If the length returned by v2_receive_packet(...) is -1 due to a MAC tag mismatch, processed_length is decreased by one here, which is very likely unintended. Currently this is not a problem (the call-site in the later introduced method P2PConnection.v2_handshake throws an exception then anyways and doesn’t do anything with the returned length), but it’s probably better if the processed_length increase happens after the length-checks and returns a few lines below.

    stratospher commented at 4:01 am on December 15, 2023:
    done.
  145. in test/functional/test_framework/v2_p2p.py:120 in 1e07058714 outdated
    115+            byte = response.read(1)
    116+            # return b"" if we need to receive more bytes
    117+            if not byte:
    118+                return b""
    119+            self.received_prefix += byte
    120+            v1_prefix = MAGIC_BYTES[self.net] + b'version\x00\x00\x00\x00\x00'
    


    theStack commented at 10:41 pm on December 7, 2023:
    nit: as the v1_prefix only depends on self.net and never changes, could set it once before the loop (or even in the constructor and store it as member, if we’d ever need it somewhere else), rather than repeatedly on each iteration.

    stratospher commented at 4:01 am on December 15, 2023:
    nice, i’ve kept it outside the loop since we don’t need it anywhere else for now.
  146. in test/functional/test_framework/p2p.py:327 in 74187afea1 outdated
    331+                    self.recvbuf = self.recvbuf[msglen:]
    332+                    shortid = msg[0]
    333+                    if shortid == 0:
    334+                        # a string command
    335+                        msgtype = msg[1:13].rstrip(b'\x00')
    336+                        msg = msg[13:]  # msg is set to be payload
    


    theStack commented at 11:53 pm on December 7, 2023:
    nit: could check that msg has the required minimum length (13) and raise an error if it doesn’t, to avoid a potential index out-of-bounds exception (even though this is more theoretical now, as bitcoind wouldn’t send such an invalid message).

    stratospher commented at 4:01 am on December 15, 2023:
    done.
  147. in test/functional/test_framework/p2p.py:310 in 74187afea1 outdated
    322+                if self.supports_v2_p2p:
    323+                    if not self.v2_state.tried_v2_handshake:
    324+                        return
    325+                    # v2 P2P messages are read
    326+                    if len(self.recvbuf) < 3 + 1 + 16:
    327+                        return
    


    theStack commented at 11:56 pm on December 7, 2023:
    Is this early check needed? I think v2_receive_packet already takes care of these “only part of packet is received” cases and indicates this with a corresponding return code.

    stratospher commented at 4:02 am on December 15, 2023:
    true, replaced it with a check using v2_receive_packet return values which are anyways computed.
  148. DrahtBot requested review from theStack on Dec 8, 2023
  149. stratospher force-pushed on Dec 15, 2023
  150. stratospher commented at 4:17 am on December 15, 2023: contributor
    thanks for the reviews @mzumsande, @sipa, @theStack! i’ve updated the PR to address your comments. sorry for the delay since i was away.
  151. in test/functional/test_framework/v2_p2p.py:260 in d4d9f67304 outdated
    257+        Returns:
    258+        1. int - length of packet processed in order to update recvbuf.
    259+               - return 0 if only part of packet is received. (recvbuf not updated since decryption not done yet)
    260+               - return -1 if there's a MAC tag mismatch and disconnect.
    261+        2. bytes - decrypted packet contents
    262+                 - return b"" if only part of packet is received/MAC tag mismatch.
    


    theStack commented at 2:58 pm on December 24, 2023:
    nit: IIUC, b"" is also returned if the decrypted package content is empty. As call-sites currently use the length integer return value (0) to check for the “only part of packet is received” condition, this isn’t a problem, but maybe the description can be improved to avoid confusion.

    stratospher commented at 4:54 pm on January 2, 2024:
    added a description and also an extra assert to make sure we don’t receive b"" as an application layer message.
  152. theStack approved
  153. theStack commented at 3:04 pm on December 24, 2023: contributor
    ACK ad0ae3d2128d04ff4f62a4bf612286d153dc314b
  154. DrahtBot requested review from mzumsande on Dec 24, 2023
  155. stratospher force-pushed on Jan 2, 2024
  156. in test/functional/test_framework/p2p.py:183 in f74c95758e outdated
    179@@ -174,16 +180,22 @@ def peer_connect_helper(self, dstaddr, dstport, net, timeout_factor):
    180         self.recvbuf = b""
    181         self.magic_bytes = MAGIC_BYTES[net]
    182 
    183-    def peer_connect(self, dstaddr, dstport, *, net, timeout_factor):
    184+    def peer_connect(self, dstaddr, dstport, *, net, timeout_factor, supports_v2_p2p=False):
    


    mzumsande commented at 10:09 pm on January 8, 2024:
    here, and in peer_accept_connection: I think the default values aren’t necessary and could be dropped. They are not used right now, and it seems better if future callers are forced make a decision on whether to use v2 or not!

    stratospher commented at 2:22 pm on January 20, 2024:
    nice! done. removed default value for reconnect in peer_accept_connection too since that is also always passed from add_outbound_p2p_connection
  157. sipa commented at 4:21 pm on January 10, 2024: member

    Since the merge of #29058 I think the first commit here should be dropped.

    EDIT: nevermind, I confused addconnection with addnode (and the latter already has a v2transport argument).

  158. in test/functional/test_framework/v2_p2p.py:202 in 76abc719f2 outdated
    199+                    # decoy packets have contents = None. v2 handshake is complete only when version packet
    200+                    # (can be empty with contents = b"") with contents != None is received.
    201+                    if contents is not None:
    202+                        self.tried_v2_handshake = True
    203+                        return processed_length, True
    204+                    response = response[length:]
    


    mzumsande commented at 4:52 pm on January 10, 2024:

    Trying to wrap my head around this: Let’s say we receive the garbage and then receive one or more decoy packages. Then we cut the response here, but if no version package has been received so far, we can still return 0, True earlier in the while loop and try again later. But then the next time this function is called, wouldn’t authentication fail because we’ve already removed the garbage terminator from the response?

    Not a big deal though - because bitcoind doesn’t send any decoy messages currently, this cannot be triggered.


    mzumsande commented at 4:29 pm on January 12, 2024:
    just want to add that if this turned out annoying to fix (I didn’t see an immediate trivial fix but haven’t looked hard), I’d be fine with just throwing an assert / error if decoy packages are received in this PR (since bitcoind shouldn’t send those anyway).

    stratospher commented at 2:22 pm on January 20, 2024:
    great catch! i’ve changed the authenticate_handshake function to first deal with garbage terminator detection before processing decoy packets and version packet. hopefully this is not a concern now?
  159. in test/functional/test_framework/p2p.py:324 in 7da31532af outdated
    320@@ -321,6 +321,8 @@ def _on_data(self):
    321                         return
    322                     self.recvbuf = self.recvbuf[msglen:]
    323 
    324+                    if msg is None:  # reject decoy messages
    


    mzumsande commented at 7:50 pm on January 10, 2024:
    nit: reject sounds like we actively do something, ignore seems better.

    stratospher commented at 2:22 pm on January 20, 2024:
    done.
  160. in test/functional/test_framework/test_node.py:745 in 689318cd83 outdated
    740@@ -741,7 +741,10 @@ def addconnection_callback(address, port):
    741             p2p_conn.wait_until(lambda: p2p_conn.message_count["version"] == 1, check_connected=False)
    742             p2p_conn.wait_until(lambda: not p2p_conn.is_connected, check_connected=False)
    743         else:
    744-            p2p_conn.wait_for_connect()
    745+            if reconnect:
    746+                p2p_conn.wait_for_reconnect()
    


    mzumsande commented at 9:00 pm on January 10, 2024:
    I think that this should be moved outside of the if connection_type == "feeler": clause and be done before: After all the v2 reconnection stuff logically comes first, and then all the existing v1 logic still applies to the downgraded v1 connection - for example, the reconnection logic should also work if add_outbound_p2p_connection specified a feeler connection - in which case we currently wouldn’t call wait_for_reconnect.

    stratospher commented at 2:23 pm on January 20, 2024:
    true! done.
  161. in test/functional/test_framework/v2_p2p.py:203 in 76abc719f2 outdated
    192+                    length, contents = self.v2_receive_packet(response, aad=aad)
    193+                    aad = b""
    194+                    if length == -1:
    195+                        return processed_length, False
    196+                    elif length == 0:
    197+                        return 0, True
    


    sipa commented at 1:49 pm on January 11, 2024:

    In commit “[test] Construct class to handle v2 P2P protocol functions”

    This needs to be return processed_length, True, as there may be been decoys before (I can trigger a failure in p2p_v2_encrypted.py by making Bitcoin Core send large decoys before the version packet). Code to trigger it:

     0--- a/src/net.cpp
     1+++ b/src/net.cpp
     2@@ -1130,6 +1130,15 @@ bool V2Transport::ProcessReceivedKeyBytes() noexcept
     3                   m_cipher.GetSendGarbageTerminator().end(),
     4                   MakeWritableByteSpan(m_send_buffer).last(BIP324Cipher::GARBAGE_TERMINATOR_LEN).begin());
     5 
     6+        unsigned decoys = GetRand<unsigned>(10);
     7+        while (decoys--) {
     8+            unsigned decoysize = GetRand<unsigned>(4000000);
     9+            m_send_buffer.resize(m_send_buffer.size() + BIP324Cipher::EXPANSION + decoysize);
    10+            std::vector<std::byte> decoy_data(decoysize);
    11+            m_cipher.Encrypt(decoy_data, MakeByteSpan(m_send_garbage), true, MakeWritableByteSpan(m_send_buffer).last(BIP324Cipher::EXPANSION + decoysize));
    12+            ClearShrink(m_send_garbage);
    13+        }
    14+
    15         // Construct version packet in the send buffer, with the sent garbage data as AAD.
    

    I think it would be even cleaner to change v2_receive_packet to also return contents = None in case no complete packet was received. In that case, the whole elif length == 0: branch can go away.


    stratospher commented at 2:23 pm on January 20, 2024:

    nice catch! done. i’ve changed authenticate_handshake function a bit as discussed in #24748 (review).

    I think it would be even cleaner to change v2_receive_packet to also return contents = None in case no complete packet was received. In that case, the whole elif length == 0: branch can go away.

    i think we need the elif length == 0: now so that we can exit out of the while not self.tried_v2_handshake: loop in case we don’t have sufficient bytes and need more bytes to be received on recvbuf?

  162. in test/functional/test_framework/v2_p2p.py:265 in 76abc719f2 outdated
    252+        return enc_plaintext_len + aead_ciphertext
    253+
    254+    def v2_receive_packet(self, response, aad=b''):
    255+        """Decrypt a BIP324 packet
    256+
    257+        Returns:
    


    sipa commented at 1:56 pm on January 11, 2024:

    In “In commit “[test] Construct class to handle v2 P2P protocol functions”

    Nit: see above, I think it would be cleaner to return (0, None) in case no complete packet is present, and (-1, None) in case of error. That would make the description:

    • int: number of bytes consumed (or -1 if error)
    • bytes: contents of decrypted non-decoy packet if any (or None otherwise)

    Since the callers don’t care about the distinction between “no packet processed” or “decoy packet processed” for what to do with the returned contents, I think this would let you remove some branches.


    stratospher commented at 2:24 pm on January 20, 2024:
    much simpler! done.
  163. in test/functional/test_framework/p2p.py:215 in f74c95758e outdated
    210@@ -199,8 +211,13 @@ def connection_made(self, transport):
    211         assert not self._transport
    212         logger.debug("Connected & Listening: %s:%d" % (self.dstaddr, self.dstport))
    213         self._transport = transport
    214+        # in an inbound connection to the TestNode, P2PConnection is the initiator. [TestNode <---- P2PConnection]
    215+        # ellswift is generated and sent immediately to begin the initial v2 handshake.
    


    sipa commented at 2:02 pm on January 11, 2024:

    In commit “[test] Introduce EncryptedP2PState object in P2PConnection”:

    Nit: probably unnecessary to go into the detail of what gets sent, as that’s the responsibility of v2_p2p. Maybe just say “The initial handshake is sent immediately”.


    stratospher commented at 2:24 pm on January 20, 2024:
    done.
  164. glozow added this to the milestone 27.0 on Jan 11, 2024
  165. in test/functional/test_framework/p2p.py:167 in f74c95758e outdated
    161@@ -159,6 +162,9 @@ def __init__(self):
    162         # The underlying transport of the connection.
    163         # Should only call methods on this from the NetworkThread, c.f. call_soon_threadsafe
    164         self._transport = None
    165+        self.v2_state = None  # EncryptedP2PState object needed for v2 p2p connections
    166+        self.supports_v2_p2p = False  # set if the connection supports v2 p2p
    167+        self.queue_messages = []  # queue messages to send after initial v2 handshake
    


    sr-gi commented at 7:10 pm on January 11, 2024:
    nit: queued / queue of messages / messages queued

    stratospher commented at 2:25 pm on January 20, 2024:
    removed this variable as discussed in #24748 (review).
  166. in test/functional/test_framework/p2p.py:166 in f74c95758e outdated
    161@@ -159,6 +162,9 @@ def __init__(self):
    162         # The underlying transport of the connection.
    163         # Should only call methods on this from the NetworkThread, c.f. call_soon_threadsafe
    164         self._transport = None
    165+        self.v2_state = None  # EncryptedP2PState object needed for v2 p2p connections
    166+        self.supports_v2_p2p = False  # set if the connection supports v2 p2p
    


    sr-gi commented at 7:13 pm on January 11, 2024:

    I think this field is redundant. This will always be True if v2_state is set, and False otherwise, so we could drop it and just check against whether v2_state is None.

    If you think that’s too verbose, you could also create a helper method:

    0def supports_v2_p2p(self):
    1    return self.v2_state is not None
    

    stratospher commented at 2:24 pm on January 20, 2024:
    oh i like this! done.
  167. in test/functional/test_framework/p2p.py:220 in f74c95758e outdated
    216+        if self.supports_v2_p2p and self.v2_state.initiating and not self.v2_state.tried_v2_handshake:
    217+            send_handshake_bytes = self.v2_state.initiate_v2_handshake()
    218+            self.send_raw_message(send_handshake_bytes)
    219         if self.on_connection_send_msg:
    220-            self.send_message(self.on_connection_send_msg)
    221+            self.queue_messages.append(self.on_connection_send_msg) if self.supports_v2_p2p else self.send_message(self.on_connection_send_msg)
    


    sr-gi commented at 7:46 pm on January 11, 2024:
    nit: feels like this is something that you could do on __init__() instead of creating an empty array. It’s not a big deal though, so feel free to disregard

    sr-gi commented at 7:06 pm on January 12, 2024:
    Giving this a second look, is this queue necessary? The current use of it is adding the on_connection_send_msg and setting that to None, meaning that the queue will have at most a single message. Not sure if this is intended to be used in the future, but I don’t this it is used atm

    stratospher commented at 2:25 pm on January 20, 2024:

    hmm you’re right. i used the idea of a queue initially because i liked the abstraction implied by queuing unsent messages to be sent after v2 handshake/reconnection.

    since on_connection_send_msg is always the version message, i’ve changed the code to work with not clearing on_connection_send_msg before v2 handshake/reconnection is complete.

  168. in test/functional/test_framework/p2p.py:237 in 6a7fe356d0 outdated
    230@@ -231,12 +231,66 @@ def connection_lost(self, exc):
    231         self.recvbuf = b""
    232         self.on_close()
    233 
    234+    # v2 handshake method
    235+    def v2_handshake(self):
    236+        """v2 handshake performed before P2P messages are exchanged (see BIP324). P2PConnection is an initiator
    237+        (in inbound connections to TestNode) and responder(in outbound connections from TestNode).
    


    sr-gi commented at 8:02 pm on January 11, 2024:
    nit: space missing after responder, also I guess add “the” before to match the previous suggestion if you happen to take it

    stratospher commented at 2:25 pm on January 20, 2024:
    done.
  169. in test/functional/test_framework/p2p.py:236 in 6a7fe356d0 outdated
    230@@ -231,12 +231,66 @@ def connection_lost(self, exc):
    231         self.recvbuf = b""
    232         self.on_close()
    233 
    234+    # v2 handshake method
    235+    def v2_handshake(self):
    236+        """v2 handshake performed before P2P messages are exchanged (see BIP324). P2PConnection is an initiator
    


    sr-gi commented at 8:02 pm on January 11, 2024:
    nit: P2PConnection is “the” initiator?

    stratospher commented at 2:25 pm on January 20, 2024:
    done.
  170. in test/functional/test_framework/p2p.py:243 in 6a7fe356d0 outdated
    238+        Performed by:
    239+            * initiator using `initiate_v2_handshake()`, `complete_handshake()` and `authenticate_handshake()`
    240+            * responder using `respond_v2_handshake()`, `complete_handshake()` and `authenticate_handshake()`
    241+
    242+        `initiate_v2_handshake()` is immediately done by the initiator when the connection is established in
    243+        `connection_made()`. the rest of the initial v2 handshake functions are handled here.
    


    sr-gi commented at 8:05 pm on January 11, 2024:
    nit: caps in “the”

    stratospher commented at 2:25 pm on January 20, 2024:
    done.
  171. in test/functional/test_framework/p2p.py:246 in 6a7fe356d0 outdated
    241+
    242+        `initiate_v2_handshake()` is immediately done by the initiator when the connection is established in
    243+        `connection_made()`. the rest of the initial v2 handshake functions are handled here.
    244+        """
    245+        if not self.v2_state.peer:
    246+            length = 0
    


    sr-gi commented at 8:17 pm on January 11, 2024:

    Is this necessary?

    self.v2_state.received_prefix is initialized to b"", whose length is zero. Therefore, and given you are only using this as offset for self.recvbuf, I think you could just dump the variable and do (in line 266):

    0response = self.v2_state.complete_handshake(BytesIO(self.recvbuf[len(self.v2_state.received_prefix):]))
    

    Or define the variable if you want a shorter call but just right before using it, without having to set it to zero first.


    stratospher commented at 2:26 pm on January 20, 2024:
    makes sense! i’ve changed respond_handshake and complete_handshake to return length consumed too.
  172. in test/functional/test_framework/p2p.py:250 in 6a7fe356d0 outdated
    245+        if not self.v2_state.peer:
    246+            length = 0
    247+            if not self.v2_state.initiating and not self.v2_state.sent_garbage:
    248+                # if the responder hasn't sent garbage yet, the responder is still reading ellswift bytes
    249+                send_handshake_bytes = self.v2_state.respond_v2_handshake(BytesIO(self.recvbuf))
    250+                length = len(self.v2_state.received_prefix)
    


    sipa commented at 3:44 pm on January 12, 2024:

    In commit “[test] Perform initial v2 handshake”:

    I think it’s not very clean to reach into self.v2_state’s internals to figure out how much respond_v2_handshake has consumed. It would be better if the function just returned how much was consumed (if anything).


    stratospher commented at 2:24 pm on January 20, 2024:
    makes sense! I’ve changed respond_v2_handshake and complete_v2_handshake to return the bytes consumed info too.
  173. in test/functional/test_framework/p2p.py:276 in 6a7fe356d0 outdated
    271+        # is derived in `complete_handshake()`.
    272+        # so `authenticate_handshake()` which uses the BIP324 derived ciphers gets called after `complete_handshake()`.
    273+        assert self.v2_state.peer
    274+        # at least 16 bytes garbage terminator and 20 bytes empty transport version packet
    275+        # is required to authenticate v2 handshake
    276+        if len(self.recvbuf) < 16 + 20:
    


    sipa commented at 4:17 pm on January 12, 2024:

    In commit “[test] Perform initial v2 handshake”:

    I don’t think this conditional is necessary. authenticate_handshake itself can figure out what it needs.


    stratospher commented at 2:24 pm on January 20, 2024:
    true! done.
  174. in test/functional/p2p_v2_encrypted.py:94 in c79372470c outdated
    89+            assert peer6.supports_v2_p2p
    90+            assert_equal(node0.getpeerinfo()[-1]["transport_protocol_type"], "v2")
    91+
    92+            if i:
    93+                # check if node1 connected to node0 (but not to node0's p2p connection directly)
    94+                # gets blocks produced by node0's p2p connection
    


    sr-gi commented at 7:48 pm on January 12, 2024:

    I’m really struggling to understand what is going on here based on these comments (and the ones bellow at L98-99).

    To my understanding, what we are trying to test here is: given a peer of node0 (peer6) that provides some blocks to it (decoy on the first iteration and regular on the second) check if after connecting node0 and node1 they do end up sharing the same tip. However, I feel the comments are hard to understand


    stratospher commented at 2:26 pm on January 20, 2024:
    yes! i’ve simplified the comments. hopefully it’s better now?

    sr-gi commented at 4:11 pm on January 23, 2024:
    Yes! This reads much clearer now :)
  175. in test/functional/p2p_v2_encrypted.py:56 in c79372470c outdated
    51+        block.solve()
    52+        return block
    53+
    54+    def run_test(self):
    55+        node0, node1 = self.nodes[0], self.nodes[1]
    56+        self.log.info("Check inbound connections to v2 TestNode from v2 P2PConnection is v2")
    


    sr-gi commented at 8:15 pm on January 12, 2024:
    nit: connection (for consistency with the rest)

    stratospher commented at 2:26 pm on January 20, 2024:
    done.
  176. in test/functional/p2p_v2_encrypted.py:114 in c79372470c outdated
    109+            self.disconnect_nodes(0, 1)
    110+
    111+        self.log.info("Check the connections opened as expected")
    112+        check_node_connections(node=node0, num_in=4, num_out=2)
    113+
    114+        self.log.info("Check inbound connections to v1 TestNode from v2 P2PConnection is v1")
    


    sr-gi commented at 8:16 pm on January 12, 2024:

    nit: connection (for consistency with the rest).

    Also, it may be worth moving this right peer2 so inbounds and outbounds are grouped


    stratospher commented at 2:26 pm on January 20, 2024:

    done.

    Also, it may be worth moving this right peer2 so inbounds and outbounds are grouped

    didn’t follow. i’ve renamed peer7 to peer1 since this is first peer in restarted node. is that what you meant?


    sr-gi commented at 6:21 pm on January 22, 2024:

    Sorry, I meant right after peer2, referring to grouping the connections in the test by either inbound or outbound.

    peer1 does v2 from v2 peer2 does v2 from v1 peer3 does v2 to v1 […] the rest are all x to y and then peer7 goes back to peer1 does v1 from v2

    Certainly not a big deal though


    stratospher commented at 6:30 am on January 24, 2024:
    oh that way! the last test was for v1 behaviour and peer7 needed a v1 TestNode while all the others needed a v2 TestNode which is why it was kept separately in the end.
  177. sr-gi commented at 8:52 pm on January 12, 2024: member
    First pass, untested, will do a more thorough review next week
  178. DrahtBot added the label CI failed on Jan 14, 2024
  179. glozow added the label Tests on Jan 18, 2024
  180. in test/functional/test_framework/v2_p2p.py:232 in 42203f84bb outdated
    213+    def initialize_v2_transport(self, ecdh_secret):
    214+        """Sets the peer object with various BIP324 derived keys and ciphers."""
    215+        peer = {}
    216+        salt = b'bitcoin_v2_shared_secret' + MAGIC_BYTES[self.net]
    217+        for name, length in (('initiator_L', 32), ('initiator_P', 32), ('responder_L', 32), ('responder_P', 32),
    218+                             ('garbage_terminators', 32), ('session_id', 32)):
    


    sr-gi commented at 3:38 pm on January 18, 2024:
    Why is 32 here being passed in every tuple if it really is used as a constant?

    stratospher commented at 6:41 am on January 24, 2024:
    true! removed.
  181. in test/functional/test_framework/v2_p2p.py:236 in 42203f84bb outdated
    217+        for name, length in (('initiator_L', 32), ('initiator_P', 32), ('responder_L', 32), ('responder_P', 32),
    218+                             ('garbage_terminators', 32), ('session_id', 32)):
    219+            peer[name] = hkdf_sha256(salt=salt, ikm=ecdh_secret, info=name.encode('utf-8'), length=length)
    220+        peer['initiator_garbage_terminator'] = peer['garbage_terminators'][:16]
    221+        peer['responder_garbage_terminator'] = peer['garbage_terminators'][16:]
    222+        del peer['garbage_terminators']
    


    sr-gi commented at 3:55 pm on January 18, 2024:
    Why is garbage_terminators being actively deleted here instead of letting the deletion be managed by the interpreter once we go out of context (and peer is deleted as a whole)?

    sr-gi commented at 3:56 pm on January 18, 2024:

    I initially thought this had to do with:

    # To achieve forward secrecy we must wipe the key material used to initialize the ciphers

    But turns out we are not doing it for the key material.


    sr-gi commented at 4:04 pm on January 18, 2024:

    We could even get rid of this whole re-assignment and just set it in the latter if/else:

    0self.peer['send_garbage_terminator'] = peer['garbage_terminators'][:16]
    1self.peer['recv_garbage_terminator'] = peer['garbage_terminators'][16:]
    

    stratospher commented at 6:41 am on January 24, 2024:
    nice! done.
  182. in test/functional/test_framework/v2_p2p.py:87 in 42203f84bb outdated
    82+        encrypt/decrypt v2 P2P messages using v2_enc_packet() and v2_receive_packet().
    83+    """
    84+    def __init__(self, *, initiating, net):
    85+        self.initiating = initiating  # True if initiator
    86+        self.net = net
    87+        self.peer = {}  # object with various BIP324 derived keys and ciphers
    


    sr-gi commented at 4:13 pm on January 18, 2024:
    I wonder whether it may be worth defining a Peer class for this and move the peer-building logic of initialize_v2_transport here. I don’t think this would make a huge difference, but it would make accessing the elements of peer cleaner for the callers. e.g. self.peer.send_L instead of self.peer["send_L"]

    stratospher commented at 6:45 am on January 24, 2024:
    right, not much of a difference - leaving it as a follow up if desired.
  183. stratospher force-pushed on Jan 20, 2024
  184. DrahtBot removed the label CI failed on Jan 20, 2024
  185. stratospher force-pushed on Jan 21, 2024
  186. stratospher force-pushed on Jan 21, 2024
  187. DrahtBot added the label CI failed on Jan 21, 2024
  188. DrahtBot removed the label CI failed on Jan 22, 2024
  189. stratospher force-pushed on Jan 22, 2024
  190. in test/functional/test_framework/p2p.py:252 in 885fac3a13 outdated
    247+                # if the responder hasn't sent garbage yet, the responder is still reading ellswift bytes
    248+                # reads ellswift bytes till the first mismatch from 12 bytes V1_PREFIX
    249+                length, send_handshake_bytes = self.v2_state.respond_v2_handshake(BytesIO(self.recvbuf))
    250+                self.recvbuf = self.recvbuf[length:]
    251+                if send_handshake_bytes == -1:
    252+                    self.supports_v2_p2p = False
    


    mzumsande commented at 7:49 pm on January 22, 2024:
    line should be removed now that supports_v2_p2p is a property, it would probably throw an AttributeError if it was hit (which it isn’t)

    stratospher commented at 6:35 am on January 24, 2024:
    good catch! done.
  191. in src/rpc/net.cpp:411 in c950386eaa outdated
    407     NodeContext& node = EnsureAnyNodeContext(request.context);
    408     CConnman& connman = EnsureConnman(node);
    409 
    410-    const bool success = connman.AddConnection(address, conn_type);
    411+    if (use_v2transport && !(connman.GetLocalServices() & NODE_P2P_V2)) {
    412+        throw JSONRPCError(RPC_INVALID_PARAMETER, "Error: v2transport requested but not enabled (see -v2transport)");
    


    naumenkogs commented at 8:20 am on January 23, 2024:
    c950386eaaf41ec096606de81a3a60b278b156f0 nit: since v2transport is named exactly the same in addconnection and as an init flag, perhaps either specify where exactly to see, or have extended description in both places (“adding v2transport connections requires v2transport init flag to be set.”). Current version is kinda confusing.

    stratospher commented at 6:36 am on January 24, 2024:
    done. liked the simpler language.
  192. in test/functional/test_framework/v2_p2p.py:132 in 9b6425ce0f outdated
    129+        """
    130+        return self.generate_keypair_and_garbage()
    131+
    132+    def respond_v2_handshake(self, response):
    133+        """Responder begins the v2 handshake by sending its ellswift bytes and garbage. However, the responder
    134+        sends this after having received at least one byte that mismatches 16-byte v1_prefix.
    


    naumenkogs commented at 8:29 am on January 23, 2024:
    9b6425ce0fb74b40e12f92b704a41feec23cd754 nit: “at least one byte” is confusing… what if it’s half-a-byte? Sounds like that would be ignored, but i assume it won’t be.

    stratospher commented at 4:43 pm on January 23, 2024:

    i took the text from the BIP here. On the python side, data_received() asyncio function only gets called when a bytes object is received.

    (thinking about the c++ side)


    sipa commented at 1:58 pm on January 24, 2024:
    TCP/IP presents a stream interface that sends/receives bytes. You can only receive multiples of bytes.
  193. in test/functional/test_framework/v2_p2p.py:147 in 9b6425ce0f outdated
    144+            byte = response.read(1)
    145+            # return b"" if we need to receive more bytes
    146+            if not byte:
    147+                return len(self.received_prefix), b""
    148+            self.received_prefix += byte
    149+            if self.received_prefix[-1] != v1_prefix[len(self.received_prefix) - 1]:
    


    naumenkogs commented at 8:47 am on January 23, 2024:

    9b6425ce0fb74b40e12f92b704a41feec23cd754

    nit: might be cleaner this way

     0        received_prefix_len = 0
     1        while received_prefix_len < 16:
     2            byte = response.read(1)
     3            # return b"" if we need to receive more bytes
     4            if not byte:
     5                return received_prefix_len, b""
     6            received_prefix_len += len(byte)
     7            if byte != v1_prefix[received_prefix_len - 1]:
     8                return received_prefix_len, self.generate_keypair_and_garbage()
     9        # return -1 to decide v1 only after all 16 bytes processed
    10        return received_prefix_len, -1
    

    stratospher commented at 4:55 pm on January 23, 2024:

    respond_v2_handshake() is supposed keep reading bytes from bitcoind and shoot out ellswift bytes when first mismatch from 16 bytes V1_PREFIX happens. Let’s say the mismatch happens after reading 3 bytes - we still need to store the 3 bytes read somewhere (currently stored in self.received_prefix) since it’s actually the first 3 bytes of ellswift which bitcoind was sending us.

    we need this 3 bytes later in complete_handshake() to read the remaining 64-3 bytes and to compute ellswift which bitcoind sent us.

  194. in test/functional/test_framework/p2p.py:301 in 9a20502755 outdated
    313-                h = sha256(th)
    314-                if checksum != h[:4]:
    315-                    raise ValueError("got bad checksum " + repr(self.recvbuf))
    316-                self.recvbuf = self.recvbuf[4+12+4+4+msglen:]
    317+                if self.supports_v2_p2p:
    318+                    if not self.v2_state.tried_v2_handshake:
    


    naumenkogs commented at 9:10 am on January 23, 2024:

    9a2050275573eae31c5dc14fefedb612e951d0b8

    do you mind adding a comment for what happened here?


    stratospher commented at 6:37 am on January 24, 2024:

    good find! replaced it with an else pathway in data_received(). this was kept to not perform parsing/deserialising of P2P messages inside _on_data() in case v2_handshake() wasn’t over.

    done.

  195. in test/functional/test_framework/p2p.py:315 in 9a20502755 outdated
    327+                    self.recvbuf = self.recvbuf[msglen:]
    328+
    329+                    assert msg  # application layer messages(which aren't decoy messages) are non-empty
    330+                    shortid = msg[0]
    331+                    if shortid == 0:
    332+                        # a string command
    


    naumenkogs commented at 9:13 am on January 23, 2024:
    what does this mean?

    stratospher commented at 6:40 am on January 24, 2024:
    added more comments. meant it as the ASCII message type like in v1.
  196. in test/functional/test_framework/p2p.py:228 in 125e24a940 outdated
    222@@ -221,14 +223,15 @@ def connection_made(self, transport):
    223         if self.supports_v2_p2p and self.v2_state.initiating and not self.v2_state.tried_v2_handshake:
    224             send_handshake_bytes = self.v2_state.initiate_v2_handshake()
    225             self.send_raw_message(send_handshake_bytes)
    226-        if self.on_connection_send_msg and not self.supports_v2_p2p:
    227+        if self.on_connection_send_msg and not self.supports_v2_p2p and not self.reconnect:
    


    naumenkogs commented at 9:21 am on January 23, 2024:
    why on_connection_send_msg is not sent here on reconnect (the new condition)?

    stratospher commented at 5:17 am on January 24, 2024:

    (originally because on_connection_send_msg would be end up being unused in v2 connection anyways and wanted to keep the on_connection_send_msg send during the v1 reconnection phase.)

    interesting question! you’re talking about a less common scenario where initiator node is preparing for v2 connection thinking responder node is v2 but v1 version message from responder node gets sent before v2 ellswift bytes from initiator node. if this happens the initiator node wouldn’t attempt a reconnection using v1 since it’s receive buffer is not empty.


    naumenkogs commented at 8:20 am on January 24, 2024:
    I think it deserves a comment :)

    mzumsande commented at 7:07 pm on January 24, 2024:

    The python code only supports the “reconnection” use case in one direction: TestNode connects to the P2PConnection with v2, and reconnects with v1 without anything breaking (e.g. timeouts) on the python side. The reverse direction (P2PConnection actively does reconnections itself for outgoing connections) is not supported and I don’t think it needs to be. So P2PConnection is receiving an inbound connection, and here it must wait to send on_connection_send_msg with sending until the version is received so it doesn’t send it out in the v2 attempt. Maybe a comment could just point that in the reconnection case, on_connection_send_msg is sent in on_version, not here.

    I don’t understand why in the existing v1 master code, the test framework, when receiving an inbound connection, sends out the version message in connection_made instead of doing it in on_version (similar to the way it is done in bitcoind, see here). After all, the usual protocol is that the initiator sends its version, and the responder processes it and responds. But that is unrelated to this PR.


    stratospher commented at 5:53 am on January 25, 2024:

    The reverse direction (P2PConnection actively does reconnections itself for outgoing connections) is not supported and I don’t think it needs to be.

    +1 since we only really care if TestNode/bitcoind (and not P2PConnection) does reconnection logic.

    I don’t understand why in the existing v1 master code, the test framework, when receiving an inbound connection, sends out the version message in connection_made instead of doing it in on_version (similar to the way it is done in bitcoind, see here). After all, the usual protocol is that the initiator sends its version, and the responder processes it and responds. But that is unrelated to this PR.

    i guess support for P2PConnection receiving an inbound connection/TestNode initiating outbound connections was only added later in the test framework when addconnection RPC was introduced. Also the concept of who’s initiating wasn’t relevant until now and we don’t have access to that information in the inside functions.

    maybe we can move initiating from class EncryptedP2PState to class P2PConnection instead.

    (EDIT: added a comment here)

  197. naumenkogs commented at 10:40 am on January 23, 2024: member
    Some low-level feedback for now, mostly nits. I will do high-level review against the BIP shortly.
  198. in test/functional/test_framework/p2p.py:296 in a5e0d0b3ab outdated
    292 
    293     def data_received(self, t):
    294         """asyncio callback when data is read from the socket."""
    295         if len(t) > 0:
    296             self.recvbuf += t
    297+            if self.supports_v2_p2p and not self.v2_state.tried_v2_handshake:
    


    sr-gi commented at 3:46 pm on January 23, 2024:

    I wonder if it’d be worth also checking that len(self.recvbuf) > 15 to avoid calling v2_handshake when we potentially don’t have enough data to decide whether this a v1 or v2 connection. This potentially will call v2_handshake 16 times, and process byte arrays of length from 1..16.

    In practice I don’t think this matters, given the logic never triggers: this happens when the python node is the responder, and the Core node will never send a half-baked magic_bytes + version (it may partially happen randomly, but its rather unlikely).

    Still, it feels a small enough change in case this ends up being reused.


    stratospher commented at 6:19 am on January 24, 2024:
    hmm other than the ellswift bytes, v2_handshake() also needs to read garbage bytes (could be received as separate chunks with size < 16 bytes) and other things like garbage terminator, decoy messages and version packet. i’m leaning towards keeping the interface generic and processing as we receive bytes.

    sr-gi commented at 10:04 pm on January 24, 2024:
    Oh, I thought the recvbuf was a growing-only buffer that was not cleared until the handshake was over.
  199. sr-gi commented at 4:09 pm on January 23, 2024: member

    Approach ACK

    I think this looks good overall, I left some potential improvements/questions but feel free to disregard them given this is just for testing

  200. [rpc/net] Allow v2 p2p support in addconnection
    This test-only RPC is required when a TestNode initiates
    an outbound v2 p2p connection. Add a new arg `v2transport`
    so that the node can attempt v2 connections.
    4487b80517
  201. [test/crypto] Add ECDH
    Co-authored-by: Pieter Wuille <pieter.wuille@gmail.com>
    595ad4b168
  202. [test] Move MAGIC_BYTES to messages.py
    This avoids circular dependency happening when importing MAGIC_BYTES.
    Before,
    	p2p.py <--import for EncryptedP2PState-- v2_p2p.py
    	  |					    ^
    	  |				            |
    	  └---------import for MAGIC_BYTES----------┘
    Now, MAGIC_BYTES are kept separately in messages.py
    
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    8d6c848a48
  203. [test] Construct class to handle v2 P2P protocol functions
    The class `EncryptedP2PState` stores the 4 32-byte keys, session id,
    garbage terminators, whether it's an initiator/responder, whether the
    initial handshake has been completed etc.. It also contains functions
    to perform the v2 handshake and to encrypt/decrypt p2p v2 messages.
    
    - In an inbound connection to TestNode, P2PConnection is the initiator
    and `initiate_v2_handshake()`, `complete_handshake()`, `authenticate_handshake()`
    are called on it. [ TestNode <----------------- P2PConnection ]
    
    - In an outbound connection from TestNode, P2PConnection is the responder
    and `respond_v2_handshake()`, `complete_handshake()`, `authenticate_handshake()`
    are called on it. [ TestNode -----------------> P2PConnection ]
    b89fa59e71
  204. stratospher force-pushed on Jan 24, 2024
  205. in test/functional/p2p_v2_earlykeyresponse.py:14 in fa4d9d5b39 outdated
     9+from test_framework.crypto.ellswift import ellswift_create
    10+from test_framework.p2p import P2PInterface
    11+from test_framework.v2_p2p import EncryptedP2PState
    12+
    13+
    14+class TestEncryptedP2PState(EncryptedP2PState):
    


    naumenkogs commented at 8:36 am on January 24, 2024:
    The goal of this class is mostly to test the tests, right? Commenting that more clearly could help.

    stratospher commented at 5:55 am on January 25, 2024:
    done.
  206. in test/functional/p2p_v2_earlykeyresponse.py:81 in fa4d9d5b39 outdated
    76+        self.log.info('Sending remaining ellswift and garbage which are different from V1_PREFIX. Since a response is')
    77+        self.log.info('expected now, our custom data_received() function wouldn\'t result in assertion failure')
    78+        ellswift_and_garbage_data = peer1.v2_state.initiate_v2_handshake()
    79+        peer1.send_raw_message(ellswift_and_garbage_data)
    80+        peer1.wait_for_disconnect(timeout=5)
    81+        self.log.info('successful disconnection when MITM happens in the key exchange phase')
    


    naumenkogs commented at 8:41 am on January 24, 2024:
    This comes unexpectedly. This is merely an unintentional side-effect of saving LOC in this test, right?

    stratospher commented at 3:33 pm on January 30, 2024:
    true, will clean it up in this branch.
  207. naumenkogs commented at 8:55 am on January 24, 2024: member

    ACK fa4d9d5b39d5652de1d83564f19b1a0749863b16

    There are many more tests possible, right? E.g. sending more than 4095 bytes of garbage. I assume the goal here wasn’t to cover everything?

  208. DrahtBot requested review from mzumsande on Jan 24, 2024
  209. DrahtBot requested review from sr-gi on Jan 24, 2024
  210. DrahtBot requested review from theStack on Jan 24, 2024
  211. [test] Introduce EncryptedP2PState object in P2PConnection
    Instantiate this object when the connection supports v2 P2P transport
    protocol.
    
    - When a P2PConnection is opened, perform initiate_v2_handshake() if the
    connection is an initiator. application layer messages are only sent after
    the initial v2 handshake is over (for both initiator and responder).
    a049d1bd08
  212. [test] Perform initial v2 handshake 05bddb20f5
  213. [test] Read v2 P2P messages 5b91fb14ab
  214. [test] Use lock for sending P2P messages in test framework
    Messages are built, encrypted and sent over the socket in v2
    connections. If a race condition happens between python's main
    thread and p2p thread with both of them trying to send a message,
    it's possible that the messages get encrypted with wrong keystream.
    
    Messages are built and sent over the socket in v1 connections.
    So there's no problem if messages are sent in the wrong order.
    
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    Co-authored-by: theStack <sebastian.falbesoner@gmail.com>
    bb7bffed79
  215. [test] Build v2 P2P messages a94e350ac0
  216. [test] Reconnect using v1 P2P when v2 P2P terminates due to magic byte mismatch
    - When a v2 TestNode makes an outbound connection to a P2PInterface node
    which doesn't support v2 but is advertised as v2 by some malicious
    intermediary, the TestNode sends 64 bytes ellswift. The v1 node doesn't
    understand this and disconnects. Then the v2 TestNode reconnects by
    sending a v1/version message.
    382894c3ac
  217. [test] Allow inbound and outbound connections supporting v2 P2P protocol
    - Add an optional `supports_v2_p2p` parameter to specify if the inbound
    and outbound connections support v2 P2P protocol.
    - In the `addconnection_callback` which gets called when creating
    outbound connections, call the `addconnection` RPC with v2 P2P protocol
    support enabled.
    8c054aa04d
  218. [test] Ignore BIP324 decoy messages
    Also allow P2PConnection::send_message() to send decoy messages for
    writing tests.
    4115cf9956
  219. [test] Add functional tests to test v2 P2P behaviour ba737358a3
  220. [test] Check whether v2 TestNode performs downgrading ffe6a56d75
  221. [test] Add functional test to test early key response behaviour in BIP 324
    - A node initiates a v2 connection by sending 64 bytes ellswift
    - In BIP 324 "The responder waits until one byte is received which does not match the
      V1_PREFIX (16 bytes consisting of the network magic followed by "version\x00\x00\x00\x00\x00".)"
    - It's possible that the 64 bytes ellswift sent by an initiator starts with a prefix of V1_PREFIX
    - Example form of 64 bytes ellswift could be:
    	4 bytes network magic + 60 bytes which aren't prefixed with remaining V1_PREFIX
    - We test this behaviour:
    	- when responder receives 4 byte network magic -> no response received by initiator
    	- when first mismatch happens -> response received by initiator
    bc9283c441
  222. stratospher force-pushed on Jan 25, 2024
  223. stratospher commented at 5:59 am on January 25, 2024: contributor

    There are many more tests possible, right? E.g. sending more than 4095 bytes of garbage. I assume the goal here wasn’t to cover everything?

    yes! goal here was to just introduce support for v2 in the test framework. we need more tests after this.

  224. in test/functional/test_framework/v2_p2p.py:119 in b89fa59e71 outdated
    116+    def generate_keypair_and_garbage(self):
    117+        """Generates ellswift keypair and 4095 bytes garbage at max"""
    118+        self.privkey_ours, self.ellswift_ours = ellswift_create()
    119+        garbage_len = random.randrange(MAX_GARBAGE_LEN + 1)
    120+        self.sent_garbage = random.randbytes(garbage_len)
    121+        logger.debug(f"sending {garbage_len} bytes of garbage data")
    


    theStack commented at 7:25 pm on January 25, 2024:
    nit: not a big deal, but I feel this message would fit more to the call-site, when the garbage is actually passed to the transport layer (before .send_raw_message()).

    stratospher commented at 2:53 pm on January 31, 2024:
    didn’t include this in #29356 because we’d need to call it in 2 places (before initiate_v2_handshake, respond_v2_handshake) and also calculate garbage_len there again.

    stratospher commented at 3:20 pm on May 17, 2024:
    done in #29431 actually
  225. mzumsande commented at 7:50 pm on January 25, 2024: contributor
    Code Review ACK bc9283c4415a932ec1eeb70ca2aa4399c80437b3
  226. DrahtBot requested review from naumenkogs on Jan 25, 2024
  227. in test/functional/test_framework/v2_p2p.py:87 in b89fa59e71 outdated
    83+        encrypt/decrypt v2 P2P messages using v2_enc_packet() and v2_receive_packet().
    84+    """
    85+    def __init__(self, *, initiating, net):
    86+        self.initiating = initiating  # True if initiator
    87+        self.net = net
    88+        self.peer = {}  # object with various BIP324 derived keys and ciphers
    


    theStack commented at 8:54 pm on January 25, 2024:
    nit: the name self.peer doesn’t say a lot, maybe something like self.key_material would be more descriptive? (though it’s strictly speaking more than that, as it contains also stuff derived from the keys like garbage terminators and session id)

    sr-gi commented at 10:25 pm on January 25, 2024:
  228. in test/functional/test_framework/p2p.py:236 in 05bddb20f5 outdated
    231@@ -232,13 +232,62 @@ def connection_lost(self, exc):
    232         self.recvbuf = b""
    233         self.on_close()
    234 
    235+    # v2 handshake method
    236+    def v2_handshake(self):
    


    theStack commented at 9:03 pm on January 25, 2024:
    nit: could name this method in a way that it’s more obvious that this is called in the asyncio callback context, e.g. _on_data_v2_handshake.

    stratospher commented at 2:50 pm on January 31, 2024:
    done in #29356
  229. in test/functional/test_framework/p2p.py:597 in ffe6a56d75 outdated
    589@@ -590,6 +590,13 @@ def wait_for_disconnect(self, timeout=60):
    590         test_function = lambda: not self.is_connected
    591         self.wait_until(test_function, timeout=timeout, check_connected=False)
    592 
    593+    def wait_for_reconnect(self, timeout=60):
    594+        def test_function():
    595+            if not (self.is_connected and self.last_message.get('version') and self.v2_state is None):
    596+                return False
    597+            return True
    


    theStack commented at 9:27 pm on January 25, 2024:

    nit: a bit shorter

    0            return self.is_connected and self.last_message.get('version') and not self.supports_v2_p2p
    

    stratospher commented at 2:50 pm on January 31, 2024:
    done in #29356
  230. theStack approved
  231. theStack commented at 9:34 pm on January 25, 2024: contributor

    Code-review ACK bc9283c4415a932ec1eeb70ca2aa4399c80437b3

    Went through another review round and only found some minor nits, mostly naming suggestions. If you retouch (or follow-up material):

    • should use multi-line imports (see style guidelines in test/functional/README.md)
    • commit 382894c3acd2dbf3e4198814f547c75b6fb17706 has a leading space in the commit subject
  232. naumenkogs commented at 9:49 am on January 29, 2024: member
    ACK bc9283c4415a932ec1eeb70ca2aa4399c80437b3
  233. DrahtBot removed review request from naumenkogs on Jan 29, 2024
  234. glozow commented at 12:33 pm on January 29, 2024: member

    ACK bc9283c4415a932ec1eeb70ca2aa4399c80437b3

    Did some light code review of the P2PConnection functions, mutation testing of EncryptedP2PState, and manually changing other functional tests to use v2 connections.

  235. achow101 merged this on Jan 29, 2024
  236. achow101 closed this on Jan 29, 2024

  237. in src/rpc/net.cpp:405 in 4487b80517 outdated
    401@@ -401,11 +402,16 @@ static RPCHelpMan addconnection()
    402     } else {
    403         throw JSONRPCError(RPC_INVALID_PARAMETER, self.ToString());
    404     }
    405+    bool use_v2transport = !request.params[2].isNull() && request.params[2].get_bool();
    


    maflcko commented at 8:06 am on January 30, 2024:
    nit: Please don’t encode the default value in the syntax of the source code. Otherwise, it is hard to change the default value. Also, it is confusing to have to change two places to change one value. You could just use self.Arg<bool>(2).

    stratospher commented at 2:49 pm on January 31, 2024:
    done in #29356
  238. maflcko commented at 2:29 pm on January 30, 2024: member

    The test fails for me:

     0146/294 - p2p_v2_earlykeyresponse.py failed, Duration: 1 s
     1
     2stdout:
     32024-01-30T13:33:10.051000Z TestFramework (INFO): PRNG seed is: 5518845862592986813
     42024-01-30T13:33:10.052000Z TestFramework (INFO): Initializing test directory /tmp/test_runner_₿_🏃_20240130_142948/p2p_v2_earlykeyresponse_133
     52024-01-30T13:33:10.370000Z TestFramework (INFO): Sending ellswift bytes in parts to ensure that response from responder is received only when
     62024-01-30T13:33:10.370000Z TestFramework (INFO): ellswift bytes have a mismatch from the 16 bytes(network magic followed by "version\x00\x00\x00\x00\x00")
     72024-01-30T13:33:10.371000Z TestFramework (INFO): Sending first 4 bytes of ellswift which match network magic
     82024-01-30T13:33:10.371000Z TestFramework (INFO): If a response is received, assertion failure would happen in our custom data_received() function
     92024-01-30T13:33:10.429000Z TestFramework (INFO): Sending remaining ellswift and garbage which are different from V1_PREFIX. Since a response is
    102024-01-30T13:33:10.429000Z TestFramework (INFO): expected now, our custom data_received() function wouldn't result in assertion failure
    112024-01-30T13:33:10.480000Z TestFramework.p2p (WARNING): Connection lost to 127.0.0.1:12596 due to 
    122024-01-30T13:33:10.518000Z TestFramework (INFO): successful disconnection when MITM happens in the key exchange phase
    132024-01-30T13:33:10.519000Z TestFramework (INFO): Stopping nodes
    142024-01-30T13:33:10.625000Z TestFramework (INFO): Cleaning up /tmp/test_runner_₿_🏃_20240130_142948/p2p_v2_earlykeyresponse_133 on exit
    152024-01-30T13:33:10.625000Z TestFramework (INFO): Tests successful
    16
    17
    18stderr:
    19Fatal error: protocol.data_received() call failed.
    20protocol: <__main__.PeerEarlyKey object at 0x7fe7925dccb0>
    21transport: <_SelectorSocketTransport fd=10 read=polling write=<idle, bufsize=0>>
    22Traceback (most recent call last):
    23  File "python3.12/asyncio/selector_events.py", line 1023, in _read_ready__data_received
    24    self._protocol.data_received(data)
    25  File "test/functional/p2p_v2_earlykeyresponse.py", line 60, in data_received
    26    assert self.v2_state.can_data_be_received and not self.v2_state.send_net_magic
    27AssertionError
    
  239. stratospher commented at 3:32 pm on January 30, 2024: contributor

    hmm weird, looking into it.

    i also have a WIP branch which uses this test file and adds more v2 tests for sending excess garbage bytes, wrong garbage terminator, incorrect ellswift and non-empty version packet.

  240. sr-gi commented at 9:34 pm on January 30, 2024: member

    WIP branch

    Is this consistently failing for you (or can you at least reproduce it)? I can see it also failing on some of the current open PRs but I can not get my head around why (nor can I reproduce it).

    This should only happen if PeerEarlyKey receives any data from the node before sending anything at all, or after sending the network magic but before sending the remaining initial bytes + garbage.

    To my understanding, the node should not have been able to parse enough data to be replying (with either v1 or v2) 😕

    In case you can reproduce it, it may be useful to log what data_received is receiving

  241. mzumsande commented at 11:06 pm on January 30, 2024: contributor

    The test fails for me:

    see #29352 for a fix (and a way to reproduce it).

  242. achow101 referenced this in commit 4b66877197 on Jan 31, 2024
  243. glozow referenced this in commit 4de84557d6 on Feb 6, 2024
  244. fanquake referenced this in commit 5c6d900a27 on Feb 27, 2024
  245. achow101 referenced this in commit bef99176e6 on Mar 12, 2024
  246. stratospher deleted the branch on May 17, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me