BIP Draft: Formosa — Themed mnemonic sentences for generating deterministic keys #2108

pull Yuri-SVB wants to merge 11 commits into bitcoin:master from Yuri-SVB:master changing 1 files +422 −0
  1. Yuri-SVB commented at 8:16 PM on February 28, 2026: none

    Proposes mnemonic sentences, instead of individual words, as a forwards- and backwards-compatible expansion of BIP39, submitted as a Bitcoin Improvement Proposal.

  2. Formosa as BIP
    Proposes mnemonic *sentences*, instead of individual words, as a forwards- and backwards-compatible expansion of BIP39, submitted as a Bitcoin Improvement Proposal.
    ea51d9b4b1
  3. in bip.mediawiki:4 in ea51d9b4b1 outdated
       0 | @@ -0,0 +1,224 @@
       1 | +<pre>
       2 | +  BIP: ?
       3 | +  Layer: Applications
       4 | +  Title: Formosa --- Themed mnemonic sentences for generating deterministic keys
    


    murchandamus commented at 7:24 PM on March 2, 2026:

    Title is limited to 50 characters


    Yuri-SVB commented at 9:44 PM on March 23, 2026:

    New title: Encoding seed as themed mnemonic sentences

  4. in bip.mediawiki:14 in ea51d9b4b1
       9 | +  Status: Draft
      10 | +  Type: Standards Track
      11 | +  Created: 2021-12-10
      12 | +  License: BSD-2-Clause
      13 | +  Requires: BIP-0032, BIP-0039
      14 | +  Post-History: https://www.toptal.com/cryptocurrency/formosa-crypto-wallet-management
    


    murchandamus commented at 7:27 PM on March 2, 2026:

    As we are now following BIP3 for the BIP Process, the Preamble is formatted slightly differently:

      Authors: Yuri S Villas Boas <yuri@t3infosecurity.com>
               André Fidencio Gonçalves <andre7c4@gmail.com>
      Status: Draft
      Type: Specification
      Assigned: ?
      License: BSD-2-Clause
      Requires: 32, 39
      Discussion: https://gnusha.org/pi/bitcoindev/jQqInjh7VTC5byefTzENidJjigvRqf5Y7UvbrWjKPJykvhdlLETeglGE3zoAiVAxUyAXU8uWHsHEjJ0MHqqPTy4prgaIhgMyIrD9c6ZUuE0=@pm.me/#t
                  https://gnusha.org/pi/bitcoindev/F4cs-RJRQYBXhjoS9fc_cUc93yLrkQS5DNQAeFRHrLEQ5bScCjKSnaqN-IcXb16fxqO053muqFCx8_GzzKN5XCGCIHD9Ir1_baI5voKYfOo=@pm.me/
                  https://www.toptal.com/cryptocurrency/formosa-crypto-wallet-management
    
  5. murchandamus commented at 7:31 PM on March 2, 2026: member

    Hi Yuri, thank you for your submission. I see that your proposal was posted to the mailing list in 2023. Since then, we deployed BIP3 as a new BIP Process, so there are a few formatting changes that would be needed to the preamble. I would also suggest that you add a link to the prior discussion to the Discussion header.

    At first glance, your document appears to be missing a Specification, a Rationale, and a Backwards Compatibility section. Please refer to BIP3 for more information.

  6. murchandamus added the label New BIP on Mar 4, 2026
  7. murchandamus added the label PR Author action required on Mar 4, 2026
  8. murchandamus renamed this:
    Formosa as BIP
    BIP Draft: Formosa — Themed mnemonic sentences for generating deterministic keys
    on Mar 16, 2026
  9. murchandamus commented at 6:11 PM on March 16, 2026: member

    Hi @Yuri-SVB, I haven’t given this document a full review yet, because the initial submission has some formatting issues. If you are still working on this, please update your submission to meet the formatting requirements.

  10. Yuri-SVB commented at 4:35 PM on March 23, 2026: none

    Hello, Murchandamus. Thank you for your attention, and thank you for remembering my earlier attempt from 3 years ago! I believe the requirements are met now.

  11. Update bip.mediawiki
    Co-authored-by: Mark "Murch" Erhardt <murch@murch.one>
    3166be9419
  12. Update bip.mediawiki
    Satisfying requirement of title in fewer than 50 characters.
    738dac9c16
  13. Yuri-SVB commented at 5:50 PM on March 28, 2026: none

    > Hi @Yuri-SVB, I haven’t given this document a full review yet, because the initial submission has some formatting issues. If you are still working on this, please update your submission to meet the formatting requirements.

    Hello, Murch! Could you confirm all the remaining formatting requirements were met? Thank you!

  14. murchandamus commented at 5:05 AM on March 29, 2026: member

    Hey Yuri, sorry for not getting around to looking at this yet. The preamble looks much better. I’m afraid I’m gonna be afk next week, so I will not be able to give this a full read until I’m back the week after.

  15. murchandamus removed the label PR Author action required on Mar 29, 2026
  16. Yuri-SVB commented at 4:54 PM on April 1, 2026: none

    > Hey Yuri, sorry for not getting around to looking at this yet. The preamble looks much better. I’m afraid I’m gonna be afk next week, so I will not be able to give this a full read until I’m back the week after.

    Hello, Murch. No problem. I hope this compilation of references on Formosa (what I call this BIP39 expansion) can be of help.

  17. in bip.mediawiki:93 in 738dac9c16
      88 | +|  128  |  4 |   132  |  4  |     24      |      12       |
      89 | +|  160  |  5 |   165  |  5  |     30      |      15       |
      90 | +|  192  |  6 |   198  |  6  |     36      |      18       |
      91 | +|  224  |  7 |   231  |  7  |     42      |      21       |
      92 | +|  256  |  8 |   264  |  8  |     48      |      24       |
      93 | +</pre>
    


    murchandamus commented at 7:41 PM on April 14, 2026:

    I’m not completely opposed to a text-only presentation, but wanted to point out that Mediawiki does include table formatting, and most readers of the BIP would probably see the rendered version. When using table formatting, I think it would be possible to skip over the abbreviations to label the table and it would still fit.
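The arithmetic behind the quoted table rows can be sketched in a few lines of Python. This is a minimal sketch, not code from the PR: the helper name `formosa_sizes` and the dictionary keys are hypothetical, the column meanings are taken from the later commit in this thread that spells out the table labels, and the checksum rule (one bit per 32 bits of entropy) is the BIP-0039 rule that the quoted rows appear to follow.

```python
def formosa_sizes(entropy_bits: int) -> dict:
    """Derive one table row from the initial entropy length (hypothetical helper)."""
    if entropy_bits % 32 != 0 or not 128 <= entropy_bits <= 256:
        raise ValueError("entropy must be 128..256 bits in steps of 32")
    checksum_bits = entropy_bits // 32       # same rule as BIP-0039
    total_bits = entropy_bits + checksum_bits
    sentences = total_bits // 33             # one sentence per 33-bit block
    return {
        "entropy_bits": entropy_bits,
        "checksum_bits": checksum_bits,
        "total_bits": total_bits,
        "sentences": sentences,
        "words_six_word_theme": 6 * sentences,  # six words per themed sentence
        "words_bip39": 3 * sentences,           # three 11-bit words per sentence
    }


# Print every row for the five entropy lengths shown in the diff excerpt.
for bits in range(128, 257, 32):
    print(formosa_sizes(bits))
```

Because every total is a multiple of 33, the integer division by 33 is exact for all five rows.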


  18. in bip.mediawiki:79 in 738dac9c16
      74 | +BIP-0039 is a special case where each sentence contains three 11-bit fields
      75 | +indexing a single 2048-word list (3 x 11 = 33).
      76 | +
      77 | +The following table describes the relation between the initial entropy
      78 | +length (ENT), the checksum length (CS), the number of 33-bit sentences (S),
      79 | +and the length of the generated mnemonic sentence (MS) in words. The word
    


    murchandamus commented at 7:46 PM on April 14, 2026:

    It’s slightly confusing that you speak about multiple sentences that together compose a single mnemonic sentence. Perhaps it would be better to use distinct terms, i.e., to use a different term for sentences or for the mnemonic sentence. I’m not convinced it’s the right suggestion, but perhaps, S sentences make one “mnemonic story” with MS words?


  19. in bip.mediawiki:101 in 738dac9c16
      96 | +
      97 | +# Initialize an empty sentence array with one slot per category.
      98 | +# For each category in the theme's ''filling order'':
      99 | +## Extract <code>BIT_LENGTH</code> bits from the current position in the bit stream.
     100 | +## Interpret them as an unsigned integer index.
     101 | +## If the category is ''led by'' another category, look up the appropriate sub-list from the leading category's mapping using the already-selected leading word. Otherwise, use the category's total word list.
    


    murchandamus commented at 8:22 PM on April 14, 2026:

    I don’t understand what you mean by “if the category is led by”
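One way to read the quoted steps is sketched below. It is only an illustration under stated assumptions: the toy theme, its words, the dictionary layout, and the function name `encode_sentence` are all hypothetical and are not the schema of the reference implementation. A category that is "led by" another one picks its word from a sub-list selected by the word already chosen for the leading category; a category without a leader uses its full word list.

```python
# Hypothetical two-category toy theme (2 bits per sentence), for illustration only.
TOY_THEME = {
    "filling_order": ["VERB", "SUBJECT"],   # order in which bits are consumed
    "natural_order": ["SUBJECT", "VERB"],   # order in which words are read
    "categories": {
        "VERB": {"bit_length": 1, "led_by": None, "words": ["guard", "forge"]},
        "SUBJECT": {
            "bit_length": 1,
            "led_by": "VERB",
            # Sub-list is chosen by the already-selected leading word.
            "sublists": {
                "guard": ["knight", "dragon"],
                "forge": ["smith", "dwarf"],
            },
        },
    },
}


def encode_sentence(bits: str, theme: dict) -> list[str]:
    """Map a bit string to one sentence, following the filling order."""
    chosen: dict[str, str] = {}
    position = 0
    for name in theme["filling_order"]:
        category = theme["categories"][name]
        width = category["bit_length"]
        index = int(bits[position:position + width], 2)  # unsigned integer index
        position += width
        if category["led_by"] is None:
            candidates = category["words"]
        else:
            leading_word = chosen[category["led_by"]]
            candidates = category["sublists"][leading_word]
        chosen[name] = candidates[index]
    if position != len(bits):
        raise ValueError("bit string length does not match the theme")
    # Present the words in reading order rather than in filling order.
    return [chosen[name] for name in theme["natural_order"]]


print(encode_sentence("11", TOY_THEME))
```

The leading category must appear earlier in the filling order than any category it leads, otherwise the lookup of the leading word would fail.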


  20. in bip.mediawiki:129 in 738dac9c16
     124 | +   - the wordlist is sorted which allows for more efficient lookup of the code words
     125 | +     (i.e. implementations can use binary search instead of linear search)
     126 | +
     127 | +d) first-letters uniqueness
     128 | +   - the wordlist is created in such a way that it's enough to type the first two
     129 | +     letters to unambiguously identify the word
    


    murchandamus commented at 8:29 PM on April 14, 2026:

    This formatting is odd. Did you intend to make those code blocks?

    <img width="754" height="499" alt="Image" src="https://github.com/user-attachments/assets/c67d7b88-2640-494f-a005-68d13d1afc99" />


  21. murchandamus commented at 8:36 PM on April 14, 2026: member

    This reads already pretty well, although the specification could be presented in a more technical manner. It seems a bit light on the Rationale. It would be preferable if there were a Backwards Compatibility section instead of the mention in the Abstract.

    I think an example of a Formosa-encoded seed could help illustrate what you are trying to do, I was firmly expecting to see one until I got to the end.

  22. murchandamus added the label PR Author action required on Apr 14, 2026
  23. Formosa: address PR #2108 review feedback
    Restructure the draft to follow BIP-3 conventions and resolve the issues
    raised by reviewers in https://github.com/bitcoin/bips/pull/2108:
    
    - Introduce explicit Specification section with a Terminology subsection
      that distinguishes 'word', 'category', 'theme', 'sentence' and
      'mnemonic' / 'mnemonic story', removing the ambiguity of using
      'sentence' at two different scales.
    - Replace the unclear 'if the category is led by another category'
      wording with an explicit LED_BY field description and a step-by-step
      algorithm that covers both the leaderless and led cases.
    - Reflow the theme-property list (previously a/b/c/d/e split by an
      intervening paragraph) into a single numbered list so it renders as a
      list rather than as code blocks.
    - Add a dedicated Rationale section covering the 33-bit sentence size,
      themed sentences, free-form theme schema, the LED_BY mechanism, the
      re-encoding-through-BIP-39 design, and why custom themes are
      discouraged.
    - Add a dedicated Backwards Compatibility section describing
      compatibility at the mnemonic, entropy, and seed levels.
    - Add a worked Example section showing a 128-bit entropy being encoded
      into a 4-sentence mnemonic story under a small illustrative theme,
      including bit splitting, FILLING_ORDER vs NATURAL_ORDER, and the
      LED_BY lookup.
    - Tighten the Abstract and Motivation; clarify that BIP-39 is itself a
      Formosa theme.
    f5b0a1e942
  24. Formosa: spell out abbreviated table labels
    Reviewer on PR #2108 asked for no abbreviations in table labels. Replace:
    
    - ENT / CS / S / MS column headers with 'Initial entropy bits',
      'Checksum bits', 'Total bits', 'Number of sentences', 'Mnemonic
      words (6-word theme)' and 'Mnemonic words (BIP-0039)'.
    - 'List size / Bits / Chars to identify / Density (bits/char)' with
      'Wordlist size / Bits per word / Characters to identify / Density
      (bits per character)'.
    - ADJ. with ADJECTIVE in the example bit-assignment diagram, and the
      surrounding narrative ENT/MS uses with the spelled-out forms.
    
    The accompanying formulas now use the expanded names too, so the
    algorithm description and the table column headers stay consistent.
    ac185147e0
  25. Formosa: rebuild Example on the real medieval_fantasy theme
    Replace the previous hypothetical 5-category example with one that
    mirrors the medieval_fantasy theme actually shipped at
    https://github.com/Yuri-SVB/formosa/tree/master/src/mnemonic/themes,
    including:
    
    - the real 6 categories with their actual BIT_LENGTHs
      (VERB=5, SUBJECT=6, OBJECT=6, ADJECTIVE=5, WILDCARD=6, PLACE=5,
      summing to 33);
    - the real FILLING_ORDER and NATURAL_ORDER;
    - the real lead tree (VERB → SUBJECT; SUBJECT → OBJECT and WILDCARD;
      OBJECT → ADJECTIVE; WILDCARD → PLACE), showing that a single
      leader can have several dependent categories;
    - a 33-bit block whose decoded indices (28, 32, 63, 27, 46, 29)
      pick existing words and existing sub-list entries: VERB[28]
      =unveil, SUBJECT_under_unveil[32]=king, OBJECT_under_king[63]
      =wine, ADJECTIVE_under_wine[27]=sweet, WILDCARD_under_king[46]
      =queen, PLACE_under_queen[29]=throne_room, yielding the sentence
      'king unveil sweet wine queen throne_room'.
    
    This keeps the worked example faithful to the reference
    implementation rather than to a fabricated theme, so that anyone can
    reproduce the encoding by parsing medieval_fantasy.json.
    621fa45042
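The bit splitting described in this commit message can be sketched as follows. This is a hedged illustration, not the reference implementation: it assumes that the category order listed in the message is the FILLING_ORDER and that bits are consumed most-significant-first, and the helper names `split_block` and `join_block` are hypothetical. The word lookups themselves (for example VERB[28] = unveil) require medieval_fantasy.json and are not reproduced here.

```python
# Bit lengths as listed in the commit message; the order is assumed to be the filling order.
BIT_LENGTHS = [
    ("VERB", 5), ("SUBJECT", 6), ("OBJECT", 6),
    ("ADJECTIVE", 5), ("WILDCARD", 6), ("PLACE", 5),
]
TOTAL_BITS = sum(width for _, width in BIT_LENGTHS)  # 33


def split_block(block: int) -> dict[str, int]:
    """Split one 33-bit block into per-category indices (MSB-first, an assumption)."""
    if not 0 <= block < (1 << TOTAL_BITS):
        raise ValueError("block must fit in 33 bits")
    indices = {}
    remaining = TOTAL_BITS
    for name, width in BIT_LENGTHS:
        remaining -= width
        indices[name] = (block >> remaining) & ((1 << width) - 1)
    return indices


def join_block(indices: dict[str, int]) -> int:
    """Inverse of split_block: pack per-category indices into one 33-bit block."""
    block = 0
    for name, width in BIT_LENGTHS:
        if not 0 <= indices[name] < (1 << width):
            raise ValueError(f"index for {name} does not fit in {width} bits")
        block = (block << width) | indices[name]
    return block


# Indices quoted in the commit message.
example = {"VERB": 28, "SUBJECT": 32, "OBJECT": 63,
           "ADJECTIVE": 27, "WILDCARD": 46, "PLACE": 29}
block = join_block(example)
print(f"{block:033b}")
print(split_block(block))
```

Since packing and splitting are exact inverses, every 33-bit pattern corresponds to exactly one tuple of indices under these assumptions.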
  26. Formosa: explain LED_BY as a primitive next-word predictor
    Add a paragraph to the LED_BY rationale clarifying that a Formosa theme
    behaves as a primitive language model (next-word predictor): each LED_BY relation
    skews the conditional distribution over the next word so that probability
    mass falls only on the 2^BIT_LENGTH words compatible with the already-
    chosen leader, and zero elsewhere. The theme designer plays the role of
    training data, hand-curating which combinations are semantically coherent.
    This framing makes explicit why themes produce sentences that 'sound right'
    while still covering all 2^33 bit patterns of a sentence.
    2d87a3cbe5
  27. Cite the companion project Mooncake (https://github.com/T3-Infosec/mooncake)
    which builds on this property by rendering each Formosa category as an
    on-screen table whose rows and columns are permuted per input session.
    
    Combined with the randomized-indexation property,
    an attacker watching only the screen still learns nothing without also
    recovering the press sequence.
    
    Add a Rationale paragraph explaining a further benefit of splitting the
    vocabulary into several short wordlists (32-128 entries each): such tables
    fit on a mobile-device screen and admit input via on-screen lookup, which
    a single 2048-word list does not.
    
    The randomized indexation:
    
    - defeats pure key-logging (keystrokes alone don't reveal words; the
      attacker also needs the session permutation),
    - raises the bar for shoulder surfing (as with key-logging, only the keys
      AND the session's permutation together suffice; either alone is uninformative).
    
    This gives an operational, security-focused argument for the
    many-small-lists design that complements the existing memorization and
    information-density arguments.
    
    Formosa: document Mooncake's volume-key input on mobile
    
    Add a paragraph to the Mooncake rationale describing the proposed mobile
    input mechanism: reuse of the volume-up / volume-down keys as a two-button
    binary selector. Because every Formosa category is sized 2^BIT_LENGTH and
    the on-screen table is laid out in rows, sub-rows and columns whose counts
    are powers of two, narrowing to a single cell takes exactly BIT_LENGTH
    presses (5 for a 32-entry category, 6 for 64, 7 for 128). The per-category
    press count is invariant, and therefore uninformative, and equal to the bits of
    entropy encoded; the 'one bit per press' bound matches the existing
    side-channel argument.
    
    Add three concrete reasons why volume-key input on mobile resists visual
    shoulder surfing better than an on-screen keyboard:
    
    - Subtler input motions: a single finger pressing a side rocker, much
      harder to read from a distance than multi-finger taps on a glass
      keyboard.
    - Easy occlusion with the second hand: both volume keys are on one edge
      of the device, so the free hand (or the holding hand's thumb) can
      cover them without obscuring the screen for the user.
    - Pocket input via headphone volume buttons: because the protocol is
      purely binary, headphone volume controls are sufficient, letting the
      user keep the buttons in a pocket while operating it by feel and
      removing the input motion from the observer's field of view entirely.
    000a7401d9
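The two-button selection over a per-session permuted table, as described in this commit message, might look roughly like the sketch below. It is a simplification under stated assumptions: the table is treated as a flat list halved on each press rather than as rows, sub-rows and columns; the press labels "U" and "D", the placeholder words, and the function names are hypothetical; and the seeded `random.Random` is used only to make the illustration reproducible, whereas a real application would presumably need a cryptographically secure source.

```python
import random


def session_table(words: list[str], rng: random.Random) -> list[str]:
    """Return a per-session permutation of a category's words."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    return shuffled


def select_by_presses(table: list[str], presses: str) -> str:
    """Narrow a power-of-two table to one cell with exactly log2(size) presses."""
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    if len(presses) != size.bit_length() - 1:
        raise ValueError("need exactly BIT_LENGTH presses")
    low, high = 0, size
    for press in presses:
        middle = (low + high) // 2
        if press == "U":        # hypothetical: volume up keeps the first half
            high = middle
        elif press == "D":      # hypothetical: volume down keeps the second half
            low = middle
        else:
            raise ValueError("presses must be 'U' or 'D'")
    return table[low]


# A 32-entry category (BIT_LENGTH = 5) with placeholder words.
words = [f"word{i:02d}" for i in range(32)]
table_a = session_table(words, random.Random(1))
table_b = session_table(words, random.Random(2))
# The same five presses address the same cell position in both sessions.
print(select_by_presses(table_a, "UDDUD"), select_by_presses(table_b, "UDDUD"))
```

In this model the press sequence identifies only a cell position, so mapping it back to a word requires the session's permutation as well, which is the property the commit message relies on.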
  28. murchandamus removed the label PR Author action required on Apr 27, 2026
  29. Update bip.mediawiki
    Fixed typo from "dektop"  to "desktop"
    Fixed agreement of number from "Those of a mobile device" to "Those of mobile devices"
    38c7dfd754
  30. in bip.mediawiki:51 in 38c7dfd754
      46 | +the mental associations that aid long-term retention.
      47 | +
      48 | +Formosa builds upon BIP-0039 by organizing mnemonic words into themed sentences
      49 | +with syntactic roles (e.g., subject, verb, adjective, object, place). Each
      50 | +sentence draws vocabulary from a coherent semantic domain --- medieval fantasy,
      51 | +science fiction, nature, finance, or any custom theme --- enabling the user to
    


    murchandamus commented at 4:27 PM on April 29, 2026:

    The triple hyphen doesn’t get rendered as a special character in Mediawiki markup, so perhaps just use em dashes:

    sentence draws vocabulary from a coherent semantic domain — medieval fantasy,
    science fiction, nature, finance, or any custom theme — enabling the user to
    
  31. in bip.mediawiki:4 in 38c7dfd754 outdated
       0 | @@ -0,0 +1,422 @@
       1 | +<pre>
       2 | +  BIP: ?
       3 | +  Layer: Applications
       4 | +  Title: Encoding seed as themed mnemonic sentences
    


    murchandamus commented at 4:44 PM on April 29, 2026:

    “Formosa” has better memorability. E.g., the following has 50 characters:

      Title: Formosa—Seed encoding per themed mnemonic stories
    

    Yuri-SVB commented at 10:50 PM on April 29, 2026:

    Thank you! 'Formosa' alludes to 'format', since it's a format for passwords / entropy arrays.

  32. murchandamus commented at 4:47 PM on April 29, 2026: member

    Good improvements, this reads great. I’m gonna look into a number assignment. It would probably be good if some wallet developers that have worked with BIP39 reviewed it, too.

  33. murchandamus added the label Needs number assignment on Apr 29, 2026
  34. Update bip.mediawiki
    Substituted triple hyphen for —
    
    Co-authored-by: Murch <murch@murch.one>
    923faa4880
  35. Update bip.mediawiki
    Updated title to mention Formosa and be more self-explanatory.
    
    Co-authored-by: Murch <murch@murch.one>
    08df954e5f
  36. Yuri-SVB commented at 10:52 PM on April 29, 2026: none

    > Good improvements, this reads great. I’m gonna look into a number assignment. It would probably be good if some wallet developers that have worked with BIP39 reviewed it, too.

    Do you have someone in mind? Would you like me to invite a wallet developer?

  37. codeswot commented at 2:59 AM on May 5, 2026: none

    I was one of the first people to try out Formosa, and I have worked with it to some extent in other apps, notably Mooncake, a wallet app with side-channel attack protection (I used Formosa there and loved it). I also ported Formosa from Python to Dart at some point. While my involvement makes me biased, I am providing this review from the perspective of an implementer to verify the protocol's stability and compatibility.

    BIP-39 Compatibility: The Formosa mapping function is a reversible transformation. It is a strict 1:1 mapping of bits to sentence structures. I can confirm that a seed generated via Formosa can be exported to any standard BIP-39 wallet (e.g., Trezor, Ledger, Coldcard) without modification. The mnemonic sentence acts as an encoding layer for the underlying BIP-39 word list, not an alternative cryptographic standard.
    
    Entropy Density: The proposal maintains full 128-bit security by strictly mapping the sentence structures to the defined bit-length. There is no reduction in entropy; the 'themed' words are simply a semantic overlay for the underlying integer values.
    
    Defense-in-Depth: The inclusion of the Mooncake module in our reference implementation demonstrates that the Formosa format allows for UI-level side-channel mitigations (specifically against shoulder surfing and screen capture) that are difficult to implement with standard, non-structured word lists.

    I have verified that the implementation treats the mnemonic as a deterministic derivation of the seed. I am happy to provide test vectors or answer any questions the maintainers have regarding the wallet-import behavior.
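The reversibility claim in this review can be illustrated at the bit level. The sketch below relies on the statement in the draft that BIP-0039 is the special case with three 11-bit fields per 33-bit sentence; the function names are hypothetical, the most-significant-first ordering is an assumption, and the lookup of the resulting indices in the BIP-0039 wordlist is omitted.

```python
def to_bip39_indices(block: int) -> list[int]:
    """Regroup one 33-bit sentence block into three 11-bit word indices."""
    if not 0 <= block < (1 << 33):
        raise ValueError("block must fit in 33 bits")
    return [(block >> shift) & 0x7FF for shift in (22, 11, 0)]


def from_bip39_indices(indices: list[int]) -> int:
    """Inverse regrouping: three 11-bit indices back into one 33-bit block."""
    if len(indices) != 3:
        raise ValueError("expected exactly three indices")
    block = 0
    for index in indices:
        if not 0 <= index < 2048:
            raise ValueError("each index must fit in 11 bits")
        block = (block << 11) | index
    return block


# Round trip on an arbitrary 33-bit value.
sample = 0b101010101010101010101010101010101
indices = to_bip39_indices(sample)
print(indices, from_bip39_indices(indices) == sample)
```

Both directions only regroup the same 33 bits, so no entropy is added or lost by switching between the themed and the BIP-0039 presentation in this model.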


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-09 19:10 UTC
