Add BIP324: v2 P2P Transport Protocol #1378

pull dhruv wants to merge 1 commits into bitcoin:master from dhruv:bip324 changing 9 files +1296 −1
  1. dhruv commented at 7:35 pm on October 7, 2022: contributor

    Supersedes #1024

    Link to rendered version for your convenience: https://github.com/dhruv/bips/blob/bip324/bip-0324.mediawiki

    Open questions:

    • The BIP text currently includes a table for 1-byte message type IDs. The process for updating this table atomically to assign new IDs when new message types are being proposed is likely going to be an annoyance. What are better ways to handle ongoing allocation?
  2. dhruv force-pushed on Oct 7, 2022
  3. in README.mediawiki:990 in 72e3a6cf0a outdated
    985+| Peer Services
    986+| Version 2 P2P Encrypted Transport Protocol
    987+| Dhruv Mehta, Tim Ruffing, Jonas Schnelli, Pieter Wuille
    988+| Standard
    989+| Draft
    990+|-
    


    ajtowns commented at 0:31 am on October 8, 2022:
    The style="background-color: #ffffcf" line needs to be after bip-324, not before it.

    dhruv commented at 0:40 am on October 8, 2022:
    That fixed it. Thanks, @ajtowns!
  4. dhruv force-pushed on Oct 8, 2022
  5. in bip-0324.mediawiki:558 in 245d21cf2f outdated
    553+|-
    554+!+40
    555+|<code>GETCFHEADERS</code>||<code>CFHEADERS</code>||<code>GETCFCHECKPT</code>||<code>CFCHECKPT</code>
    556+|-
    557+!+44
    558+|<code>WTXIDRELAY</code>||<code>ADDRV2</code>||<code>SENDADDRV2</code>||(undefined)
    


    ajtowns commented at 1:18 am on October 8, 2022:
    VERSION, VERACK,SENDHEADERS, WTXIDRELAY, and SENDADDRV2 are only sent at most once per connection; wouldn’t it make sense to reserve single byte message types for things that will give a meaningful benefit from compression? (EDIT: GETADDR too, arguably)

    dhruv commented at 2:33 am on October 8, 2022:
    There are 210 additional message type IDs still available to allocate. In the future, if we did come close to using them up, a future upgrade could interpret 0xFF as “the next byte(or more than one) is also a part of the message type ID”. Given that, it made sense to allocate an ID to all message types still in use.
  6. in bip-0324.mediawiki:557 in 245d21cf2f outdated
    541+|-
    542+!+24
    543+|<code>GETDATA</code>||<code>GETHEADERS</code>||<code>HEADERS</code>||<code>INV</code>
    544+|-
    545+!+28
    546+|<code>MEMPOOL</code>||<code>MERKLEBLOCK</code>||<code>NOTFOUND</code>||<code>PING</code>
    


    ajtowns commented at 1:22 am on October 8, 2022:
    The BIP-61 REJECT message isn’t supported by bitcoin core since 0.19.0, but does seem to still be supported by btcd eg, so maybe would be worth including? On the other hand, maybe there’s no point making messages that are only useful for debugging more efficient by a few bytes?
  7. in bip-0324.mediawiki:545 in 245d21cf2f outdated
    529+|-
    530+!+8
    531+|(8 byte string)||(9 byte string)||(10 byte string)||(11 byte string)
    532+|-
    533+!+12
    534+|(12 byte string)||<code>ADDR</code>||<code>BLOCK</code>||<code>BLOCKTXN</code>
    


    ajtowns commented at 1:23 am on October 8, 2022:
    Is there any reason to expect clients supporting this BIP to use ADDR rather than ADDRV2?
  8. in bip-0324.mediawiki:523 in 245d21cf2f outdated
    507+| <code>message_type</code> || ''1..13'' || either a one byte ID or an ASCII string prefixed with a length byte
    508+|-
    509+| <code>message_payload</code> || <code>message_length</code> || message payload
    510+|}
    511+
    512+If the first byte of <code>message_type</code> is in the range ''1..12'', it is interpreted as the number of ASCII bytes that follow for the message type. If it is in the range ''13..255'', it is interpreted as a message type ID. This structure results in smaller messages than the v1 protocol as most messages sent/received will have a message type ID.<ref name="smaller_messages">'''How do the length between v1 and v2 compare?''' For messages that use the 1-byte short message type ID, v2 packets use 3 bytes less per message than v1.</ref>
    


    ajtowns commented at 1:31 am on October 8, 2022:

    Updating this table seems like it might be annoying (both if proposals made at the same time both choose the same byte, and if a proposal using just an ascii string wants to add support for a one byte id, while remaining compatible with nodes that aren’t aware of the new id) – it seems like it would serialize adding features in a similar way to having to bump the protocol version number to indicate support for a feature.

    Did you consider a more generic approach, like just running the protocol through a compression stage prior to encryption, and letting that automatically reduce commonly used message types into a shorter byte stream?


    dhruv commented at 2:35 am on October 8, 2022:
    In the time that I worked on it, we did not consider such a generic approach. However, I imagine such an approach might come with new dependencies and an increased surface area?

    sipa commented at 3:06 am on October 8, 2022:

    I don’t think we really considered that possibility in this context.

    Generic compression may be useful, but for it to be useful for tiny messages, I think we’ll need some kind of training data or dictionary approach; generic compression without that really only is beneficial after sufficient amounts of data.

    For example zstd compression have standard tooling for constructing dictionary files based on training data. I could imagine an extension that uses that, either with a single dictionary, or a small per-message-type one. But that doesn’t help us around the message coordination (either because the training data will become outdated with new messages, and switching to another dictionary is a protocol break; or, because new per-message dicts need some form of signalling… like this table).


    ajtowns commented at 6:16 am on October 9, 2022:

    I agree a custom (compression) dictionary doesn’t seem workable; I was thinking of having a compression stream over the entire connection lifetime, and invoking flush after each message, or perhaps when the uncompressed send buffer is empty.

    I guess updating the mapping as a whole (this bip specifies table version 0, then later define table version 1 mapping 13..255 to some particular set of messages, then version 2 later provides some other mapping), and listing out the table versions you both support either using the VERSION message, or via some new message prior to VERACK, and then using the highest version that you both support once VERACK is completed would work. Perhaps message type 0 could be used for VERSION, since there’s presumably no way it can be changed while retaining compatibility?


    sipa commented at 12:36 pm on October 9, 2022:
    An alternative is adding a feature later where the sender just announces their mapping once (“I will use byte <byte> for message type <string>), possibly so that any byte not listed retains its original assignment. That means a one-time up-to-3KiB cost for a fully custom set, but removes all coordination issues.

    ajtowns commented at 1:11 pm on October 9, 2022:
    If you’re mapping 44=WTXIDRELAY then the receiver presumably either knows how to interpret WTXIDRELAY, and can map it to a one or two byte enum MSG_WTXIDRELAY, or doesn’t know how to interpret WTXIDRELAY and can map it to the same one/two byte enum as MSG_INVALID or MSG_IGNORE. So really this would be an up to 3kB message to announce your mappings but only 250 to 500 bytes per peer to actually track the mapping.
  9. notmandatory cross-referenced this on Oct 14, 2022 from issue Socratic Seminar #1 Topics by notmandatory
  10. dhruv cross-referenced this on Oct 20, 2022 from issue BIP324: Cipher suite by dhruv
  11. dhruv cross-referenced this on Oct 21, 2022 from issue BIP324: Enable v2 P2P encrypted transport by dhruv
  12. ProofOfKeags commented at 10:49 pm on October 26, 2022: contributor
    Since TLS isn’t being used is there any way to recover “virtual host” functionality? This is necessary for addressing multiple nodes behind the same NAT which is increasingly common these days.
  13. bitschmidty cross-referenced this on Nov 1, 2022 from issue Newsletters: add 224 (2022-11-02) by harding
  14. yojoe commented at 2:51 pm on November 6, 2022: none

    The Transport layer specification says:

    Given a newly-established connection (typically TCP/IP) between two v2 P2P nodes, …

    What does “typically TCP/IP” mean? Will UDP based connections be supported by BIP-324 as well?

    I’m asking, because I stumbled over this obfs4-like protocol today which supports both, TCP and UDP: https://lib.rs/crates/sosistab sosistab seems to have similar goals as BIP-324 and seems to be in active development for over a year now. Maybe there are some learnings/takeaways for BIP-324 here? Especially the following aspect of sosistap might of interest for Bitcoin:

    Avoids last-mile congestive collapse but works around lossy links. Shamelessly unfair in permanently congested WANs — but that’s really their problem, not yours.

    Any thoughts?

  15. dhruv commented at 3:52 pm on November 6, 2022: contributor
    @yojoe The encrypted protocol we are proposing requires state and the order of packets is important. So we require a stream interface and can’t use UDP.
  16. sipa commented at 4:32 pm on November 6, 2022: member
    @dhruv @yojoe We should probably clarify that BIP324 assumes a reliable stream interface to run on top of, as the statefulnes of the Bitcoin P2P needs the same, so that’s what we’re targetting. The text says “typically” TCP/IP, because it can also run over whatever else provides that (in particular, Tor or I2P also do, and are used as layers to run the P2P protocol over).
  17. yojoe commented at 4:44 pm on November 6, 2022: none

    BIP324 assumes a reliable stream interface to run on top of

    So BIP324 “could” run on QUIC and similar protocols, which use UDP instead of TCP to provide reliable streams over multiplexed connections?

    With quite a lot of P2P applications moving to QUIC (UDP) right now (at least in my perception) and keeping TCP only as a fallback, I think two or three sentences on “Why not QUIC?” would help the BIP324 spec.

  18. sipa commented at 3:05 am on November 7, 2022: member

    @ProofOfKeags

    Since TLS isn’t being used is there any way to recover “virtual host” functionality? This is necessary for addressing multiple nodes behind the same NAT which is increasingly common these days.

    You can forward different port numbers to different machines on the NAT, if that’s what you want. The P2P protocol already relays full IP/port combinations, and since version 23.0 Bitcoin Core also can keep track of different ports on the same IP as separate entities.

    If you’re referring to some kind of (v)host functionality, as it exists in HTTP(S), where the client sends the server an actual hostname it’s trying to connect to, so that multiple hostnames can be served from a single IP: no; I think that would be a significant complication for little/no gain. The P2P protocol has no notion of hostnames right now, and adding something like that would require changes not just to the transport protocol but also to IP address relay to convey these. It’d also require running a trusted service on the NAT to properly dispatch to what’s behind it, so you’d have to wonder if that same entity couldn’t just run a single node for all users behind it (who could run their own nodes just connected to that one).

  19. sipa commented at 3:12 am on November 7, 2022: member

    @yojoe

    So BIP324 “could” run on QUIC and similar protocols, which use UDP instead of TCP to provide reliable streams over multiplexed connections?

    In theory, yes. I don’t think there is that much to gain from QUIC, as we don’t care about latency of connection set-up (we have very long-lived connections in general), integrated TLS encryption, or multiple parallel datastreams. I could imagine making use of the latter, but that’d need a much deeper redesign of the whole P2P protocol to make use of. In BIP324 we’re just looking at improving the transport layer of the protocol.

  20. in bip-0324.mediawiki:357 in 245d21cf2f outdated
    353+        if peer.received_prefix[-1] != V1_PREFIX[len(peer.received_prefix) - 1]:
    354+            peer.privkey_ours, peer.ellswift_ours = ellswift_create(initiating=False)
    355+            peer.sent_garbage = rand_bytes(garbage_len)
    356+            send(peer, ellswift_Y + peer.sent_garbage)
    357+            return
    358+    use_v1_protocol()
    


    dergoegge commented at 7:05 pm on November 26, 2022:

    Am I correct in assuming that this inbound downgrade logic only exists because service flags are additive and v1 clients will make connection to addresses that have the v2 service bit?

    Could it make more sense to introduce a new BIP155 network type instead of using the service flags? All the current network types essentially define a separate transport/network protocol (IPv{4,6}, Tor, I2P, CJDNS) while the current service flags are more about what type of application data a node serves (blocks, witnesses, compact filters, bloom filters).


    sipa commented at 5:05 pm on December 1, 2022:

    So your proposal is something like:

    • Add new BIP155 network types TRANSV2_IPV4, TRANSV2_IPV6 (and perhaps TRANSV2_TOR_V3, etc), corresponding to existing V1 transports on those networks (with similar IP/hash encoding).
    • Allow nodes to support just v1, just v2, or both if done on separate ports.
    • Drop the ability to create a v2 listener that can detect inbound v1 connections on the same port.

    I do agree that’s somewhat cleaner as service flags aren’t a perfect match, but using separate networks has downsides too (roughly in decreasing order of severity):

    • To avoid creating a heightened partition risk between v1 and v2, I think we need to expect that most nodes support both, at least on inbound, probably for a long time. And if that is the case, the introduction of new networks means effectively a doubling of addrv2 data rumoured on the network, as well as a duplication of addrman entries (or equivalent in other implementations).
    • V2 address propagation will initially be hampered by being treated as unknown network by old nodes (a bootstrapping problem every new network has).
    • Requiring every node to listen on two ports if they want to support v2 also means that in some instances, upgrading needs system administrator intervention to open ports.
    • Special logic is needed so that if a node knows about a peer’s IP in both V1 and V2 networks, it can prefer the V2 one.
    • Anti-DOS logic would need to be special cased to treat V1 and corresponding V2 networks as the same (as clearly control over a netrange in one implies control over the same netrange in the other).

    So I think that using service flags is a nice workaround for these issues, as it naturally implies the V1 and V2 “entry” to a node will be thought of as a single entity for the purpose of address relay, even to old nodes. It does mean listeners for now need auto-detection logic to be able to serve both V1 and V2 on the same port, but I think that’s mostly an implementation issue.

    The way to think about the service flags is that it is a hint to try V2 connections to a node. If the service flags incorrectly contains V2, the responder will almost certainly immediately disconnect, which the initiator interprets as a “retry as V1” per BIP324. Once a connection is established, using either V1 or V2, the initiator’s knowledge of the responder’s service flags will be corrected using the VERSION message, so even if the flags incorrectly say V1, the initiator can then still decide to reconnect as V2 (and if not, they will the next time a connection is established).

    So it’s just a hint, and not a disaster if it’s wrong in either direction. It’s an optimistic optimization over just not having any service flag at all, and make every V2 initiator always try V2 first and fall back to V1 on every connection. Given a very effective P2P attacker which can “or in” the V2 flag to every rumoured service flag, the service flag approach effectively becomes identical to not having the flags in the first place.

  21. bitcoin deleted a comment on Nov 27, 2022
  22. manda2020panda approved
  23. adiabat cross-referenced this on Dec 22, 2022 from issue NIP-04 considered harmful by adiabat
  24. dr-orlovsky commented at 10:03 pm on December 28, 2022: contributor

    Bitcoin P2P connections are asymmetric: one peer connects to the other peer, and that other peer must be known by some id. For now, the connection happens by the network address (which serves that id). The proposed wire protocol doesn’t provide any means of authenticating other peer, which opens possibilities for MiTs, and argues that this increases privacy. I see two contradictions:

    • the connecting peer knows the peer it connects to; no anonymity here is gained;
    • not authenticating the remote peer keeps the possibility for MiT attacks from the current wire protocol.

    I think that “exchange-known” key scheme (like in Noise_XK) is more adequate for bitcoin P2P than “exchange-exchange”, as in the current proposal (and like in Noise_XX).

  25. sipa commented at 10:19 pm on December 28, 2022: member

    @dr-orlovsky The proposal isn’t to not authenticate at all; it’s to leave authentication for a future improvement, in cases where it is desired.

    As the text argues, in a Bitcoin P2P setting, authentication is only relevant in cases where the initiator makes a deliberate connection to a peer with whom they have some out-of-band trust relation, and doesn’t matter for automatic connections. That optionality makes it undesirable to integrate in the key exchange (as that would make it mandatory), but nothing prevents it from being done separately, in a post-key-exchange, optional step when desired (that’s easy to do, but comes at the cost of an extra roundtrip, which many transport protocols prioritize; our P2P protocol connections are so long-lived this doesn’t matter).

    And this isn’t just a matter of avoiding a bit of complexity: using something like Noise_XK would require all listening nodes to have a known public key. I think that’s something we actively don’t want, as it would introduce a notion of identities to the protocol that must be known for it to operate at all.

    FWIW, we’re also working on a protocol for an optional authentication scheme without discoverable identities, and with the property that a MitM can’t learn anything about the desired/provided public/private keys. That will take time, as we’d want academic scrutiny before actually proposing it. There are already tangible benefits to deploying opportunistic encryption without authentication at all, so we don’t feel like waiting for that is desirable either. It’s also only one of a possible number of approaches that could be taken w.r.t. authentication.

  26. dr-orlovsky commented at 0:39 am on December 29, 2022: contributor

    authentication is only relevant in cases where the initiator makes a deliberate connection to a peer with whom they have some out-of-band trust relation, and doesn’t matter for automatic connections

    That is exactly the point I am trying to make: I think that even for automatic connections it might be desirable to know the identity of the peer. The requirement to always have a remote pubkey is not that hard to meet: the remote peers are listed either by the config or via P2P messages, and both of them can require to provide the remote identities alongside network address, as it is done in LN.

    At the same time I agree that the abstraction of authentication from encryption is a good thing to have. What is not good is to build a new P2P stack for bitcoin without paying attention to authentication.

  27. sipa commented at 0:45 am on December 29, 2022: member

    The requirement to always have a remote pubkey is not that hard to meet: the remote peers are listed either by the config or via P2P messages, and both of them can require to provide the remote identities alongside network address, as it is done in LN.

    It’s not hard; it’s just a bad idea. Bitcoin nodes don’t, and shouldn’t, have identities.

  28. dr-orlovsky commented at 0:56 am on December 29, 2022: contributor
    Isn’t IP address an identity? Or Onion address? Or any other networking address? Having public key next to it is just a second factor: it doesn’t add to the leak of a privacy or doesn’t make the node “more identifiable” at all; it is just a second factor ensuring that the existing identity is not stolen by a MitM
  29. sipa commented at 1:10 am on December 29, 2022: member

    An IP can be used as an identity, and in some cases it is, and that’s a bad idea. It’s that scenario where adding authentication makes sense, e.g. people configuring manual IP addresses of their own nodes or people they know.

    But in the case of automatic connections, the IP addresses are just endpoints. They’re not identities in the sense that they’re not trusted (nodes verify all incoming data), there is no reputation, and there is no cost to creating a new one. Effectively, every node is equally (dis)trusted, and adding cryptographic identities for that is just pointless. Furthermore, it’s risky; e.g. you need to make sure that nodes use separate identities for each IP they are reachable on (which the node might not even be aware of) or you’re instantly adding a fingerprinting vector.

    And when restricted to just the manual connection use case, non-discoverable identities suffice: all you need is a way to verify you’re connected to the party you intended to talk to; it doesn’t require telling others who you are, greatly reducing those risks.

  30. dr-orlovsky commented at 1:18 am on December 29, 2022: contributor

    I think there is terminological (actually semantic) difference.

    There is an identity for a reputation (state that persists before connections) – and there is an “identity” as a “second factor” to ensure that there is no forgery of some network addresses controlled by authorities (DNS, NATs etc).

    For sure I am not arguing about the first case (reputation, permanent identity).

    The point I am trying to address is that bitcoin P2P protocol probably shouldn’t allow connecting to remote peers by DNS or IP addresses without ensuring that the peer behind that address is the peer which was advertised via P2P. And static public key servers that purpose (and probably it is incorrect to call it “permanent identity”, but its validation against an expected value is still an authentication).

  31. sipa commented at 1:28 am on December 29, 2022: member

    The point I am trying to address is that bitcoin P2P protocol probably shouldn’t allow connecting to remote peers by DNS or IP addresses without ensuring that the peer behind that address is the peer which was advertised via P2P. And static public key servers that purpose (and probably it is incorrect to call it “permanent identity”, but its validation against an expected value is still an authentication).

    I disagree. There is no point in ensuring anything, because there is nothing to ensure: for automatic connections, every node is equally good.

    In any case, I believe I understand your argument, but we fundamentally disagree about what qualities the P2P network should have.

  32. bitcoin deleted a comment on Dec 29, 2022
  33. dr-orlovsky commented at 9:52 am on December 29, 2022: contributor

    I’d like to understand better why you think that P2P network should have a quality that for automatic connections, every node is equally good (i.e. the rational behind that). I am not trying to persuade you, I am asking for better rationale - I think others may have similar questions and it will be nice to clarify the approach better (including the text of the document).

    From my understanding, Bitcoin P2P network has a quality of eclipse attack. Connecting to peers which are guaranteed to be the peers known to be well-connected - or outside of the scope of a vulnerable network segment (for instance being in US for a Chinese node) is a way to mitigate that risk. If these nodes are not authenticated to be the nodes a peer tried to connect to (i.e. proving their network address was not stolen and they are just fakes run by a Chinese government) would make eclipse attacks more probable/simple. Or you mean that such nodes would be authenticated as a layer above this spec, while all other nodes (“automatically connected”) can be fakes without increasing chances of eclipse?

  34. yojoe commented at 3:28 pm on December 29, 2022: none

    I’d like to understand better why you think that P2P network should have a quality that for automatic connections, every node is equally good (i.e. the rational behind that).

    Bitcoin is special among P2P networks in regards to eclipse attacks as they can be easily/cheaply detected here. All you need to do is periodically check the chain tip hash via other low-bandwidth, out-of-band channel(s). Either manually or you script it. It would be really hard for an adversary to eclipse all your information channels and stop you from learning that another (and potentially heavier) chain tip exists. Alternatively, you can already manually and explicitly add a few trusted “nodes” to the mix, e.g. Tor, I2P or CJDNS, whose address schemes are all “public key” based and don’t rely on centralised authorities like DNS. So if you worry about eclipse attacks, you can already detect and somewhat counter them.

    But the main point is that “public key based identities” and “authenticated connections” are not the silver bullet against eclipse attacks in TRUSTLESS, PUBLIC, OPEN NETWORKS.

    Connecting to peers which are guaranteed to be the peers known to be well-connected

    That would be a “web of trust” and not a “trustless” topology.

    or outside of the scope of a vulnerable network segment (for instance being in US for a Chinese node) is a way to mitigate that risk

    AFAIK that’s exactly what the clearnet (IP4/6) peer selection algorithm already considers.

  35. dr-orlovsky approved
  36. dr-orlovsky commented at 4:09 pm on December 29, 2022: contributor
    I’ve got all answers to my questions, thank you. ACK
  37. kallewoof commented at 2:03 am on January 4, 2023: member
    I don’t see any un-addressed concerns regarding this pull request and will merge this soon unless someone pipes up. Thanks.
  38. dhruv commented at 2:16 am on January 4, 2023: contributor
    @kallewoof we have a couple small changes coming up
  39. kallewoof commented at 2:59 am on January 4, 2023: member
    It’s helpful if you change to draft when you don’t want it merged so maintainers know to skip over it.
  40. dhruv force-pushed on Jan 4, 2023
  41. dhruv commented at 3:37 am on January 4, 2023: contributor

    @kallewoof I’m sorry about that – I was unclear about the workflow on the bips repo but I understand now. I have pushed up the latest changes and the working group (@sipa, @real-or-random and I) agrees it’s rfm.

    Latest push:

    • New rekeying mechanism that uses ChaCha20 keystream bytes and does not rely on SHA256 - reducing one dependency for the BIP324 requirements as well as a small efficiency gain. The PRs in Bitcoin Core are already using this new mechanism.
    • Changed rekey interval to 224 messages instead of 256 messages.
    • Erlay message types in the message type id table
    • Reference code changes for terminator derivation
    • Changes to footnotes/rationale
    • Grammar/spelling
  42. kallewoof commented at 4:56 am on January 4, 2023: member

    No worries!

    Maybe mark BIP-151 as “Replaced” in README.mediawiki, similar to BIP-1 replaced by BIP-2?

  43. Add BIP324 32af2c9dc2
  44. dhruv force-pushed on Jan 4, 2023
  45. dhruv commented at 4:47 pm on January 4, 2023: contributor

    Maybe mark BIP-151 as “Replaced” in README.mediawiki, similar to BIP-1 replaced by BIP-2? @kallewoof Done.

  46. kallewoof merged this on Jan 4, 2023
  47. kallewoof closed this on Jan 4, 2023

  48. edwargix cross-referenced this on Jan 8, 2023 from issue Socratic Seminar 12 Reading List (Jan 2023) by 0xBEEFCAF3

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 01:10 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me