Hi y'all,

In case you weren't already tired of all the recent dev list chatter re post
quantum cryptography, here's another!

When the topic of Bitcoin transitioning to a post quantum world is brought
up, the discussion typically focuses on the consensus layer re swapping out
vulnerable signature schemes. However, the consensus layer isn't the only
area of Bitcoin that relies on cryptography that would be broken in the face
of a powerful quantum computer! That's right, I'm talking about BIP 324, the
peer to peer encryption BIP for Bitcoin.

Like everything else on the Internet today, BIP 324 uses ECDH to allow two
connecting peers to derive a shared secret known only to them, which is then
used to encrypt all traffic between them. As ECDH relies on Elliptic Curve
cryptography, an adversary with a future quantum computer would be able to
eavesdrop on a p2p handshake transcript, then derive the private keys behind
the ephemeral ECDH public keys, permitting it to decrypt all traffic. It's
actually worse than that, as adversaries can collect all encrypted p2p
Bitcoin traffic today, with the hope of being able to decrypt it all at a
future date. This is commonly referred to as the "harvest now, decrypt
later" (HNDL) strategy [11].

Compared to a consensus change, which requires widespread market agreement
and coordination to achieve, upgrading BIP 324 to be post quantum resistant
is much lower hanging fruit, worth pursuing immediately.

Last week I started thinking a bit about this topic, brushing up on the
latest literature/techniques, and stumbled onto a few key design questions.
The goal of this post isn't to propose a new concrete p2p encryption BIP.
Instead, I want to start discussion on the various design tradeoffs that
came up as I was researching this p2p encryption transition.

## PQ BIP 324 Design Questions

1. Do we want to pursue a hybrid KEM (key encapsulation mechanism), or go
   with a pure PQ KEM?

2. Is it still a key requirement that the initial handshake be
   indistinguishable from a random byte string?

   2a. If yes to the above, then should we go with
       classical-then-pq-upgrade, or a one shot hybrid obfuscated KEM?


## A Brief Intro to KEMs + ML-KEM

First, let's introduce the new primitive we have to work with: ML-KEM
(Module-Lattice-Based Key-Encapsulation Mechanism) [1][2]. As it says on the
tin, ML-KEM is a lattice based Key-Encapsulation Mechanism. The term KEM
might sound unfamiliar to those comfortable with ECDH, but ECDH can actually
be viewed as a KEM itself.

A KEM has 3 algorithms:
  * KeyGen() -> {priv, pub}
     * Generates a private/public key pair.

  * Encaps(pub) -> {secret, capsule}
     * Generates a new secret value, and a "capsule", which only the holder
       of the private key matching pub can use to obtain the secret value.

  * Decaps(priv, capsule) -> secret
     * Uses the private key to extract the secret from the capsule.


If you squint a bit, then you'll see that ECDH is a KEM, and a rather
elegant one at that:
  * KeyGen() -> {k, k*G}
      * Normal EC key generation.

  * Encaps(pub) -> {secret = x*pub, capsule = x*G}
      * The core ECDH routine. The ephemeral public key is actually the
        "capsule". The resulting secret is the ECDH output of the remote
        party's KEM public key and the local ephemeral secret x.

  * Decaps(priv, capsule) -> secret = priv * capsule
      * The receiver completes the key exchange using the ephemeral public
        key and their own private key.

ECIES is another flavor of EC based KEM.
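
To make the "ECDH is a KEM" mapping above a bit more concrete, here's a
minimal Python sketch of the three algorithms over secp256k1. A few
assumptions on my part: hashing the shared x coordinate with SHA-256 is just
an illustrative choice (it is not how BIP 324 derives its secret), the helper
names are made up, and the affine double-and-add arithmetic is not constant
time, so treat it strictly as a toy.

```python
import hashlib
import secrets

# secp256k1 domain parameters (public constants).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Affine point addition; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def point_mul(k, pt):
    """Double-and-add scalar multiplication (toy, NOT constant time)."""
    acc = None
    while k:
        if k & 1:
            acc = point_add(acc, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return acc


def _secret_from_point(pt):
    # Illustrative choice only: hash the x coordinate of the shared point.
    return hashlib.sha256(pt[0].to_bytes(32, "big")).digest()


def keygen():
    """KeyGen() -> (k, k*G)."""
    priv = secrets.randbelow(N - 1) + 1
    return priv, point_mul(priv, G)


def encaps(pub):
    """Encaps(pub) -> (secret = x*pub, capsule = x*G)."""
    x, capsule = keygen()
    return _secret_from_point(point_mul(x, pub)), capsule


def decaps(priv, capsule):
    """Decaps(priv, capsule) -> secret = priv*capsule."""
    return _secret_from_point(point_mul(priv, capsule))
```

Usage is just `priv, pub = keygen()`, then `secret, capsule = encaps(pub)` on
the sender side and `decaps(priv, capsule)` on the receiver side, with both
ending up holding the same 32 bytes.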

One thing worth noting is that AFAICT, so far in the NIST PQC world [4],
there is no known non-interactive key exchange protocol like we enjoy today
with ECDH. IIUC, the reason is that lattice based schemes are derived from
the LWE [3] problem, whose security is predicated on using "noise" to hide a
secret value. For these cryptosystems, usually a type of "hint" is sent to
make everything work out nicely like in ECDH. However, in the stricter
non-interactive setting (no messages sent), this doesn't map cleanly.

As a result, ML-KEM looks more like a hybrid encryption protocol (Alice
encrypts a shared secret to Bob using asymmetric lattice crypto).

## To Hybrid KEM, Or Not to Hybrid KEM

This brings us to our first design question....

Should we use a hybrid KEM or a pure post quantum one?

A hybrid KEM would keep the existing ECDH, _also_ do ML-KEM, then securely
combine the resulting secrets (there's some subtlety there, see [6][7]) into
a final secret value for encryption. A hybrid KEM is attractive as an
encryption channel derived from such a KEM is secure if _any_ of the
combined schemes is secure. This permits schemes to hedge a bit, as hey,
maybe the PQ stuff is actually broken in the future but ECDH isn't. If it's
the other way around, then your encryption scheme is still secure.

### Pure ML-KEM P2P Encrypted Handshake

If we opt to not use a hybrid scheme, then the Elligator layer can be
dropped altogether. Instead, the ~1.1 KB (1184 bytes for ML-KEM-768)
encapsulation key is sent, keeping the trailing garbage+terminator intact.

The initial handshake would look something like:
 * Alice -> Bob: alice_encaps || initiator_garbage
    * Alice generates an ML-KEM key pair, and sends the encapsulation key
      to Bob.

 * Bob -> Alice: ml_kem_capsule || responder_garbage ||
   responder_garbage_terminator || first_encrypted_packet
    * Bob uses Alice's encapsulation key to encapsulate a random secret, and
      sends the capsule over to Alice. He can also encrypt the first message
      at this point.

 * Alice -> Bob: initiator_garbage_terminator || first_encrypted_packet
    * Alice decapsulates the shared secret, and can now also start to
      encrypt messages.

We'd then replace `v2_ecdh` with something like a `v3_mlkem` that derives
the final shared secret based on the sent/received transcript up until that
point:
  * `sha256_tagged("bip324_ml_kem", ml_kem_secret, alice_encaps,
    ml_kem_capsule)`

### Hybrid ML-KEM P2P Encrypted Handshake

If we want to use a hybrid combiner, then alongside the normal ellswift
keys, the ML-KEM-768 encapsulation key is also sent:

 * Alice -> Bob: ellswift_alice || alice_encaps || initiator_garbage
 * Bob -> Alice: ellswift_bob || ml_kem_capsule || responder_garbage ||
   responder_garbage_terminator || first_encrypted_packet
 * Alice -> Bob: initiator_garbage_terminator || first_encrypted_packet

Then, following the guidelines of [7], we'd replace `v2_ecdh` with something
like `v3_hybrid_shared_secret`:
  * `sha256_tagged("bip324_ellswift_xonly_ecdh_mlkem_768", ml_kem_ss,
    ecdh_point_x32, alice_encaps, ml_kem_capsule, ellswift_alice,
    ellswift_bob)`
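
As a rough illustration of the two derivations sketched in this post, here's
what they might look like in Python. The assumptions here are mine:
`sha256_tagged` is taken to be a BIP 340-style tagged hash over the plain
concatenation of its inputs, and the length checks use the ML-KEM-768 sizes
(1184 byte encapsulation key, 1088 byte ciphertext, 32 byte shared secret).
Since every input has a fixed length, plain concatenation is unambiguous. The
tags and function names are the placeholder ones from above, not anything
final.

```python
import hashlib

# ML-KEM-768 sizes from FIPS 203, plus the BIP 324 field sizes.
MLKEM768_EK_LEN = 1184
MLKEM768_CT_LEN = 1088
SHARED_SECRET_LEN = 32
ELLSWIFT_LEN = 64


def sha256_tagged(tag: str, *parts: bytes) -> bytes:
    """BIP 340-style tagged hash over the concatenation of parts (assumed)."""
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + b"".join(parts)).digest()


def _check(name: str, value: bytes, length: int) -> None:
    # Fixed lengths keep the concatenated transcript unambiguous.
    if len(value) != length:
        raise ValueError(f"{name}: expected {length} bytes, got {len(value)}")


def v3_mlkem(ml_kem_secret, alice_encaps, ml_kem_capsule):
    """Pure PQ variant: bind the KEM secret to the handshake transcript."""
    _check("ml_kem_secret", ml_kem_secret, SHARED_SECRET_LEN)
    _check("alice_encaps", alice_encaps, MLKEM768_EK_LEN)
    _check("ml_kem_capsule", ml_kem_capsule, MLKEM768_CT_LEN)
    return sha256_tagged("bip324_ml_kem", ml_kem_secret, alice_encaps,
                         ml_kem_capsule)


def v3_hybrid_shared_secret(ml_kem_ss, ecdh_point_x32, alice_encaps,
                            ml_kem_capsule, ellswift_alice, ellswift_bob):
    """Hybrid variant: both secrets plus everything sent on the wire."""
    _check("ml_kem_ss", ml_kem_ss, SHARED_SECRET_LEN)
    _check("ecdh_point_x32", ecdh_point_x32, SHARED_SECRET_LEN)
    _check("alice_encaps", alice_encaps, MLKEM768_EK_LEN)
    _check("ml_kem_capsule", ml_kem_capsule, MLKEM768_CT_LEN)
    _check("ellswift_alice", ellswift_alice, ELLSWIFT_LEN)
    _check("ellswift_bob", ellswift_bob, ELLSWIFT_LEN)
    return sha256_tagged("bip324_ellswift_xonly_ecdh_mlkem_768", ml_kem_ss,
                         ecdh_point_x32, alice_encaps, ml_kem_capsule,
                         ellswift_alice, ellswift_bob)
```

Note that both sides need to feed the fields in the same (initiator first)
order, otherwise they'd end up with different keys.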

## PQ/Hybrid Obfuscated KEMs

At this point, those that are familiar with BIP 324 will recognize that both
the pure PQ and hybrid versions render the ElligatorSwift usage pretty much
useless. ElligatorSwift encodes a 32-byte public key as a 64-byte value
which is indistinguishable from a uniformly distributed bitstream. In
isolation, this means that the initial BIP 324 handshake just looks like
random bytes to a 3rd party observer. However, with the introduction of
ML-KEM, the ML-KEM encapsulation key is sent in plaintext over the wire. An
ML-KEM key has identifiable structure, as it's a giant vector of polynomial
coefficients mod 3329, which is easily recognizable over the wire.
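
To get a feel for how recognizable that structure is, here's a small Python
sketch of a passive distinguisher. It assumes the FIPS 203 layout of an
ML-KEM-768 encapsulation key as I understand it: 768 coefficients packed as
little-endian 12-bit values (1152 bytes), followed by a 32 byte seed. The
"key" used for testing is a structure-only stand-in (random coefficients below
3329), not the output of a real ML-KEM implementation, and the function names
are made up.

```python
import os
import random

Q = 3329                # ML-KEM modulus
K = 3                   # module rank for ML-KEM-768
N_COEFFS = 256 * K      # 768 coefficients in the encoded vector
PACKED_LEN = 384 * K    # 1152 bytes of 12-bit packed coefficients
EK_LEN = PACKED_LEN + 32  # 1184 bytes once the 32 byte seed is appended


def unpack12(data: bytes) -> list:
    """Unpack little-endian 12-bit values, two per three bytes."""
    out = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        out.append(b0 | ((b1 & 0x0F) << 8))
        out.append((b1 >> 4) | (b2 << 4))
    return out


def pack12(coeffs: list) -> bytes:
    """Inverse of unpack12; expects an even number of values below 4096."""
    out = bytearray()
    for c0, c1 in zip(coeffs[0::2], coeffs[1::2]):
        out += bytes([c0 & 0xFF, (c0 >> 8) | ((c1 & 0x0F) << 4), c1 >> 4])
    return bytes(out)


def looks_like_mlkem768_ek(blob: bytes) -> bool:
    """True if every 12-bit field in the packed part is a valid residue."""
    if len(blob) != EK_LEN:
        return False
    return all(c < Q for c in unpack12(blob[:PACKED_LEN]))


def fake_structured_key(rng: random.Random) -> bytes:
    """Structure-only stand-in for an encapsulation key (NOT a real key)."""
    coeffs = [rng.randrange(Q) for _ in range(N_COEFFS)]
    return pack12(coeffs) + bytes(rng.randrange(256) for _ in range(32))


# Chance that uniformly random bytes pass the check: each 12-bit field must
# land below 3329 out of 4096 possible values, for all 768 fields.
false_positive_rate = (Q / 4096) ** N_COEFFS
```

The false positive rate works out to (3329/4096)^768, i.e. something an
observer can treat as zero, so a single packet is enough to tell a plaintext
ML-KEM key apart from a random byte string.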

Luckily, there's an ML-KEM analogue to ElligatorSwift, called Kemeleon
[8][9][10]! In a similar fashion to ElligatorSwift, it takes an ML-KEM
public key, then encodes it as one giant integer, utilizing rejection
sampling. Kemeleon applies this mapping both to the encapsulation keys, and
also to the capsule ciphertext that encrypts the shared secret. The ML-KEM
keys end up being a bit smaller, while the ciphertexts map to a larger
value. Another tradeoff is that Kemeleon key generation is ~3x slower than
normal ML-KEM key generation.
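
The "one giant integer plus rejection sampling" idea can be illustrated with
a few lines of Python. To be clear, this is a simplified sketch of the general
technique as I understand it, not a conforming implementation of the Kemeleon
encodings from [8]: it only handles a bare coefficient vector, the function
names are made up, and random coefficients stand in for re-running a real
ML-KEM KeyGen on rejection.

```python
import random

Q = 3329  # ML-KEM modulus


def encode_vector(coeffs):
    """Accumulate coefficients into one integer; None means 'rejected'."""
    n = len(coeffs)
    bits = (Q**n - 1).bit_length()  # bits needed for the largest value
    r = 0
    for c in reversed(coeffs):
        r = r * Q + c               # r = sum(coeffs[i] * Q**i)
    # r is uniform in [0, Q**n). Keeping only values whose top bit is clear
    # leaves r uniform over all (bits - 1)-bit strings.
    if r >> (bits - 1):
        return None
    # Caveat: when (bits - 1) is not a multiple of 8 the spare high bits of
    # the last byte are zero here; a real encoding has to deal with that.
    return r.to_bytes((bits - 1 + 7) // 8, "little")


def decode_vector(blob, n):
    """Recover the n coefficients from the accumulated integer."""
    r = int.from_bytes(blob, "little")
    coeffs = []
    for _ in range(n):
        r, c = divmod(r, Q)
        coeffs.append(c)
    return coeffs


def sample_encodable(n, rng):
    """Retry until a vector encodes; stand-in for re-running KeyGen."""
    tries = 0
    while True:
        tries += 1
        coeffs = [rng.randrange(Q) for _ in range(n)]
        blob = encode_vector(coeffs)
        if blob is not None:
            return coeffs, blob, tries
```

The retry loop is where the extra key generation cost comes from: every
rejected candidate means generating a fresh key and trying again.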

One thing to note here is that Kemeleon's "looks random" property isn't
quite on the same footing as ElligatorSwift's. ElligatorSwift is
statistically indistinguishable from random, since every 512-bit string is a
valid encoding. Kemeleon's indistinguishability is computational, resting on
a Module-LWE style assumption. So if you naively concatenate an
ElligatorSwift key and a Kemeleon key, the pair is only as obfuscated as the
weakest visible half. This asymmetry is what motivates the OEINC
construction discussed below.

This brings us to our second design question....

Do we still want to ensure that the BIP-324 handshake is indistinguishable
from a pseudorandom bytestream from the very first message?

Assuming yes, then AFAICT, we have three classes of options here:
  1. Retain the existing BIP-324 outer ElligatorSwift handshake, but use
     ML-KEM within that initial encrypted transport to upgrade to a PQ
     shared secret.

  2. Use the Outer Encrypts Inner Nested Combiner (OEINC - "OINK") from [8].

  3. Attempt to adapt Drivel from [8] into the Bitcoin p2p setting.

### Classical Encrypted Channel Upgrades to PQ

With the first option, we simply use one KEM right after the other. So BIP
324 v2 would be mostly unchanged, then we _upgrade_ to BIP 324 v3 within v2.

A sketch of this would be something like:
  * Phase 0: normal BIP 324 handshake
  * Phase 1: negotiation of PQ KEM scheme over the encrypted channel
     * Can be optional, if we just pick a set PQ KEM scheme.
     * From this point until the rekey in Phase 4, no Bitcoin p2p message
       should be sent, as the channel isn't PQC protected yet.
  * Phase 2: do normal ML-KEM within the ElligatorSwift derived encrypted
    transport
     1. Alice sends the encapsulation key
     2. Bob generates a secret, encapsulates it using the encapsulation
        key, and sends the capsule back
     3. Both sides then hold a PQ shared secret, ss_PQ
  * Phase 3: both sides use a hybrid combiner like the one sketched out
    above to derive a new set of transport keys
  * Phase 4: both sides rekey, switching over to the new transport keys
The upside of this option is that the outer part of BIP 324 remains
unchanged, then with another round trip, we're able to upgrade the
encryption keys to PQ hybrid security. The downside is that the very first
messages sent aren't PQ protected from the start, but a quantum adversary
wouldn't be able to decrypt the actual Bitcoin p2p messages (as we wait to
send those until the upgrade). The handshake still looks like just random
bytes.

### Outer Encrypts Inner Nested Combiner

For the second option, [8] (with talk video [9] and slides [10]) describes
an OEINC scheme where the outer KEM encrypts the inner KEM: the ciphertext
of the inner KEM is encrypted using a key derived from the outer KEM's
shared secret. The two KEM ciphertexts and the two derived keys are then fed
into a hybrid combiner to derive a final shared secret.

Unlike the classical-then-pq-upgrade that establishes a classical channel,
then uses that to upgrade to a PQ channel, OEINC is a special hybrid
combiner that achieves a similar output but in one fell swoop. It defines a
special KEM, which can then be used as the KEM in the very first handshake I
sketched out.

A sketch of this KEM looks something like:
  * Setup:
    * The outer KEM is BIP 324's ElligatorSwift-encoded secp256k1 DHKEM.
       * It serves as the outer KEM because its on-wire encoding is
         statistically indistinguishable from random.
    * The inner KEM is ML-Kemeleon.

  * KeyGen():
    * (kem_secret_outer, kem_pubkey_outer) = outKEM.KeyGen()
    * (kem_secret_inner, kem_pubkey_inner) = inKEM.KeyGen()
    * combined_pubkey = (kem_pubkey_outer, kem_pubkey_inner)
    * combined_secret = (kem_secret_outer, kem_secret_inner)

  * Encaps(combined_pubkey):
    * (kem_pubkey_outer, kem_pubkey_inner) = combined_pubkey
    * (shared_secret_outer, capsule_outer) = outKEM.Encaps(kem_pubkey_outer)
    * (encrypt_key_1, encrypt_key_2) = KDF(shared_secret_outer)
    * (shared_secret_inner, capsule_inner) = inKEM.Encaps(kem_pubkey_inner)
    * encrypted_capsule_inner = encrypt(encrypt_key_1, capsule_inner)
    * combined_capsule = capsule_outer || encrypted_capsule_inner
    * combined_shared_secret = combine(encrypt_key_2, shared_secret_inner,
      combined_capsule)

  * Decaps(combined_secret, combined_capsule):
    * (kem_secret_outer, kem_secret_inner) = combined_secret
    * (capsule_outer, encrypted_capsule_inner) = combined_capsule
    * shared_secret_outer = outKEM.Decaps(kem_secret_outer, capsule_outer)
    * (encrypt_key_1, encrypt_key_2) = KDF(shared_secret_outer)
    * capsule_inner = decrypt(encrypt_key_1, encrypted_capsule_inner)
    * shared_secret_inner = inKEM.Decaps(kem_secret_inner, capsule_inner)
    * combined_shared_secret = combine(encrypt_key_2, shared_secret_inner,
      combined_capsule)
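
To check that the data flow above hangs together, here's a runnable Python
sketch of the combiner with stand-in primitives. Big caveats: `ToyDhKem` is a
finite-field Diffie-Hellman placeholder used for both the outer and inner KEM
(so not ElligatorSwift, not ML-Kemeleon, and not secure), the KDF, the XOR
"encryption" and the final `combine` are simple SHAKE/SHA-256 placeholders of
my choosing, and all the names are hypothetical. The only thing it's meant to
show is the order of operations.

```python
import hashlib
import secrets


class ToyDhKem:
    """Stand-in KEM (finite-field DH). Illustrative only, NOT secure."""
    P = 2**255 - 19
    G = 2
    SIZE = 32  # byte length of public keys and capsules

    def keygen(self):
        sk = secrets.randbelow(self.P - 2) + 1
        return sk, pow(self.G, sk, self.P).to_bytes(self.SIZE, "big")

    def encaps(self, pk):
        x, capsule = self.keygen()
        ss = pow(int.from_bytes(pk, "big"), x, self.P)
        return hashlib.sha256(ss.to_bytes(self.SIZE, "big")).digest(), capsule

    def decaps(self, sk, capsule):
        ss = pow(int.from_bytes(capsule, "big"), sk, self.P)
        return hashlib.sha256(ss.to_bytes(self.SIZE, "big")).digest()


def kdf(shared_secret_outer):
    """Split the outer secret into an encryption key and a combiner key."""
    okm = hashlib.shake_256(b"oeinc-kdf" + shared_secret_outer).digest(64)
    return okm[:32], okm[32:]


def xor_stream(key, data):
    """Placeholder one-time 'encryption': XOR with a SHAKE-256 keystream."""
    stream = hashlib.shake_256(b"oeinc-enc" + key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def combine(encrypt_key_2, shared_secret_inner, combined_capsule):
    return hashlib.sha256(
        encrypt_key_2 + shared_secret_inner + combined_capsule).digest()


def oeinc_keygen(out_kem, in_kem):
    sk_out, pk_out = out_kem.keygen()
    sk_in, pk_in = in_kem.keygen()
    return (sk_out, sk_in), (pk_out, pk_in)


def oeinc_encaps(out_kem, in_kem, combined_pubkey):
    pk_out, pk_in = combined_pubkey
    ss_out, capsule_out = out_kem.encaps(pk_out)
    k1, k2 = kdf(ss_out)
    ss_in, capsule_in = in_kem.encaps(pk_in)
    # The inner capsule only ever appears on the wire encrypted under k1.
    combined_capsule = capsule_out + xor_stream(k1, capsule_in)
    return combine(k2, ss_in, combined_capsule), combined_capsule


def oeinc_decaps(out_kem, in_kem, combined_secret, combined_capsule):
    sk_out, sk_in = combined_secret
    capsule_out = combined_capsule[:out_kem.SIZE]
    encrypted_capsule_in = combined_capsule[out_kem.SIZE:]
    k1, k2 = kdf(out_kem.decaps(sk_out, capsule_out))
    capsule_in = xor_stream(k1, encrypted_capsule_in)
    ss_in = in_kem.decaps(sk_in, capsule_in)
    return combine(k2, ss_in, combined_capsule)
```

With real primitives the structure would stay the same, only the capsule
sizes and the `encrypt`/`combine` building blocks would change.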


This is done, rather than just sending the two encapsulated secrets plainly
as I outlined above, in order to achieve a stronger security notion. The
issue with this is that although ciphertext uniformity (for the encapsulated
secrets) is achieved, the two public keys sent are random looking, but not
in a uniform manner. In practice, this might not really matter much AFAICT
(a theoretical adversary would be able to distinguish the Elligator half
from the Kemeleon half).

### Drivel: PQ-Obfuscated Authentication

The biggest issue with Drivel as a fit for BIP 324 is that it expects the
initiator to already know a long term static public key for the responder.
In the case of BIP 324, only ephemeral keys are exchanged, so there are no
long term public keys known to either side.

To get around this, we could extend BIP 155 (or more likely make a new BIP,
given size limits) to include a signed OKEM key. However, that would
introduce authentication into the mix, which explicitly wasn't a design goal
of BIP 324.

With that caveat in mind, here's the construction itself. Drivel [8]
combines the OEINC scheme with another layer that out-of-the-box assumes an
asymmetric protocol between a designated client and server. The client uses
an existing OEINC KEM public key published by the server to then encrypt a
fresh new ephemeral KEM key.

-----

So there we have it. Before drafting a concrete v3 transport, we need to
decide if we want a hybrid KEM, or are fine with a pure PQ KEM. Then we need
to decide if we want to attempt to maintain the current property where the
p2p handshake transcript is indistinguishable from random. If yes, then that
forces another series of decisions re how to construct/compose an obfuscated
KEM from available primitives.

At a glance, the route of classical-then-pq-upgrade seems to be the
simplest. BIP 324 stays as is, then we run ML-KEM within that. The ML-KEM
keys are encrypted, so there's no need to sprinkle in the layer of Kemeleon.

If we want a nice combined protocol, then we should investigate the OEINC
route. It's more data to send as part of the initial handshake, but we still
keep ElligatorSwift and use that as the outer KEM.

If for some reason we're concerned with a future adversary gaining a
distinguisher for Kemeleon, then maybe we need to bite the bullet and also
roll out a full blown PQ authentication protocol alongside everything.

One thing worth flagging for any of the byte-0 designs (where PQ material is
sent in the clear on the very first flight, like the hybrid and OEINC
sketches above): ML-KEM-768 makes the responder do real work before it can
decide if a connection is even legit. Today, the responder only needs the
first 64 bytes of an ElligatorSwift share before it can derive the shared
secret. With ML-KEM-768, the responder has to read and validate a 1184 byte
encapsulation key before running Encaps, and FIPS 203 mandates input checks
on every Encaps and Decaps. In a permissionless P2P network, that's a
meaningful change in inbound DoS surface, and probably calls for stricter
handshake byte limits, tighter timeouts, and possibly some form of stateless
cookie/puzzle if handshake floods become a real problem. The
classical-then-pq-upgrade path sidesteps most of this since the PQ material
only shows up after the v2 channel is up.

With all that said, after the above design decisions are addressed, there
aren't too many concrete blockers here w.r.t. rolling this out. Of course
the development (e.g. selecting/creating a library for ML-KEM and maybe
ML-Kemeleon) and upgrade will take some time. But unlike the consensus
layer, p2p encryption doesn't require the widespread market agreement that
an actual soft fork does. BIP 324 is a much shorter walk to PQ than the
consensus layer, and serves as a sort of PQ warm up before the bigger soft
fork is tackled.


-- Laolu

[1]: https://en.wikipedia.org/wiki/ML-KEM
[2]: https://csrc.nist.gov/pubs/fips/203/final
[3]: https://en.wikipedia.org/wiki/Learning_with_errors
[4]: This statement ignores Isogeny based crypto, and also SWOOSH [5] as it requires 200 KB pubkeys
[5]: https://eprint.iacr.org/2023/271
[6]: https://eprint.iacr.org/2018/024
[7]: https://eprint.iacr.org/2020/1364
[8]: https://eprint.iacr.org/2024/1086
[9]: https://www.youtube.com/watch?v=CvFCYUq5rGg
[10]: https://csrc.nist.gov/csrc/media/Presentations/2025/kemeleon/images-media/kemeleon.pdf
[11]: https://en.wikipedia.org/wiki/Harvest_now,_decrypt_later
