BIP 39: potential entropy distribution issue report #1831

okba14 wants to merge 1 commit into bitcoin:master from okba14:patch-2, changing 1 file (+83 −0)
  1. okba14 commented at 5:11 pm on April 18, 2025: none
    Potential Weakness in BIP-39 Mnemonic Entropy Distribution Across Multiple Languages
  2. Create weakness.md
    Potential Weakness in BIP-39 Mnemonic Entropy Distribution Across Multiple Languages
    eaacdcba78
  3. okba14 commented at 5:14 pm on April 18, 2025: none

    Hi maintainers 👋

    I initially attempted to submit this issue via GitHub Issues but could not proceed due to repository contribution restrictions. As recommended in the contributing guidelines, I am submitting my research via this pull request instead.

    Looking forward to your feedback. Thank you for your time and review!

    Best regards,
    Okba [ GUIAR OQBA ] techokba@gmail.com

  4. jonatack commented at 5:41 pm on April 18, 2025: member

    Thank you for your report.

    Does this pertain to the BIP39 reference implementation (https://github.com/trezor/python-mnemonic)?

    For more context, see:

    Note that BIP39 replacements exist, such as SLIP-0039 (https://github.com/satoshilabs/slips/blob/master/slip-0039.md) and BIP93.

    Perhaps add your feedback to https://github.com/bitcoin/bips/wiki/Comments:BIP-0039 if applicable to the BIP39 spec or its reference implementation.

    Closing this PR, as it isn’t a change that is intended to be merged. Feel free to continue the discussion here. Thanks!

  5. jonatack renamed this:
    Create weakness.md
    BIP 39: potential entropy distribution issue report
    on Apr 18, 2025
  6. jonatack closed this on Apr 18, 2025

  7. okba14 commented at 6:03 pm on April 18, 2025: none

    Thank you for your response and for the helpful references.

    Yes, my findings are primarily related to how the BIP-39 standard is implemented and used in practice, including but not limited to the python-mnemonic reference implementation. The statistical anomalies I observed could point to either potential implementation issues or broader concerns regarding entropy usage in mnemonic generation across different platforms and languages.

    I understand this PR doesn’t fit the criteria for a direct merge, and I appreciate the clarification. I will follow your suggestion and contribute my observations to the BIP-0039 Comments Wiki to help improve awareness and discussion around this topic.

    Thank you again for your time and for maintaining such an important repository.

    Best regards, Okba [ GUIAR OQBA ] Security Researcher

  8. jonatack commented at 8:05 pm on April 18, 2025: member

    @okba14 in that comments wiki, I see that you added this note:

    📌 Final Note from the Researcher:

    This report is submitted as an initiative to raise security awareness and improve the implementation quality of the BIP-39 standard across different languages and platforms. All observations are based on scientific analysis and public data without compromising any privacy or breaching any system.

    I welcome all your feedback and comments, and look forward to collaborating with the developer and research community to ensure a more secure environment for blockchain technology users.

    Best regards, Okba [GUIAR OQBA] Cybersecurity Researcher
    

    It seems to be missing some content that you wished to add. Let me know if I can help.

  9. okba14 commented at 9:07 pm on April 18, 2025: none

    Hi @jonatack, thank you again for your follow-up and close attention.

    You’re absolutely right — the note I added to the wiki was intentionally concise, but I now realize it may have created the impression that something was missing. In fact, I was initially being cautious not to overload the space with technical details before confirming the appropriateness of such content.

    As a cybersecurity researcher, my intention is to shed light on a potentially serious entropy-related issue that I encountered while analyzing the generation and recovery of BIP-39 mnemonic phrases. In short, while testing wallet recovery mechanisms, I was struck by the surprisingly high number of successful wallet restorations using algorithmically generated mnemonic phrases — far beyond what should statistically occur under ideal entropy assumptions.
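For scale, the statistical expectation mentioned above can be put into numbers with a rough sketch. The guess count and the number of in-use seeds below are hypothetical and deliberately generous; only the 128 bits of entropy behind a 12-word phrase come from BIP-39 itself.

```python
from fractions import Fraction

# Per BIP-39, a 12-word mnemonic encodes 128 bits of entropy plus a 4-bit checksum.
ENTROPY_BITS_12_WORDS = 128


def expected_hits(guesses: int, seeds_in_use: int,
                  entropy_bits: int = ENTROPY_BITS_12_WORDS) -> Fraction:
    """Expected number of uniformly random mnemonics that match a seed in use.

    Assumes every seed in use was itself drawn uniformly at random.
    """
    return Fraction(guesses * seeds_in_use, 2 ** entropy_bits)


# Hypothetical inputs (assumptions for illustration, not measurements).
guesses = 10 ** 12        # one trillion generated mnemonics
seeds_in_use = 10 ** 9    # one billion wallets ever created

hits = expected_hits(guesses, seeds_in_use)
print(f"expected matches: {float(hits):.2e}")  # on the order of 1e-18
```

Under these assumptions the expected number of matches is far below one, so thousands of recoveries would indicate non-uniform generation somewhere (or a flaw in the measurement), not chance.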

    I’ve documented the full methodology, findings, and statistical outcomes in a structured report, including:

    Scripted generation and validation of mnemonics using bip_utils, web3.py, and solana.rpc.api

    Observed frequency-based patterns across different languages and positions within the 12/24-word phrases

    A sample set of empty but valid wallets generated during testing (ready to share securely)

    Statistical summaries showing thousands of real wallet recoveries, with balances in some cases
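The scripts themselves are not shown in this thread. As a point of reference, a minimal stdlib-only sketch of mnemonic generation and checksum validation as defined in BIP-39 might look like the following. It works with word-list indices (0 to 2047) and omits the word list itself; the function names are illustrative and do not come from bip_utils.

```python
import hashlib
import secrets


def entropy_to_indices(entropy: bytes) -> list[int]:
    """Map BIP-39 entropy to word-list indices (11 bits per word)."""
    if len(entropy) not in (16, 20, 24, 28, 32):
        raise ValueError("entropy must be 128, 160, 192, 224 or 256 bits")
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32  # checksum length: ENT / 32 bits
    # The checksum is the first cs_bits bits of SHA-256(entropy).
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    combined = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    return [(combined >> (11 * (n_words - 1 - i))) & 0x7FF
            for i in range(n_words)]


def indices_are_valid(indices: list[int]) -> bool:
    """Check that a sequence of word indices carries a correct checksum."""
    n_words = len(indices)
    if n_words not in (12, 15, 18, 21, 24):
        return False
    if any(not 0 <= i < 2048 for i in indices):
        return False
    total_bits = n_words * 11
    cs_bits = total_bits // 33
    combined = 0
    for i in indices:
        combined = (combined << 11) | i
    # Drop the checksum bits, then re-derive the indices from the entropy.
    entropy = (combined >> cs_bits).to_bytes((total_bits - cs_bits) // 8, "big")
    return entropy_to_indices(entropy) == list(indices)


# A securely generated 12-word mnemonic always validates.
fresh = entropy_to_indices(secrets.token_bytes(16))
print(indices_are_valid(fresh))
```

For 12 words the checksum is 4 bits, so only 1 in 16 arbitrary 12-word sequences passes validation. A valid checksum says nothing about whether the resulting wallet has ever been used.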

    Given the sensitivity of this topic (especially with real wallets being unintentionally accessed), I wanted to ensure responsible disclosure practices and community input before publishing further.

    I would greatly appreciate your guidance on how best to proceed:

    Would you recommend submitting the full report as a supplemental document?

    Or would it be more appropriate to open a new structured discussion in the BIP39 Comments Wiki?

    Again, thank you for your support and consideration. My sole intent is to contribute meaningfully to the improvement and awareness of potential implementation-level risks in widely-used crypto standards like BIP-39.

    Best regards, Okba [GUIAR OQBA] Cybersecurity Researcher techokba@gmail.com

  10. bitcoin deleted a comment on Apr 18, 2025
  11. murchandamus commented at 11:21 pm on April 18, 2025: contributor
    @okba14: Do you think your findings could be explained by some users manually picking their own words from the word list to create mnemonics rather than generating them randomly?
  12. okba14 commented at 5:16 am on April 19, 2025: none

    @murchandamus : Thank you for your input!

    While it’s technically possible for advanced users to manually select words from the BIP39 word list and import them into wallets that support custom mnemonics, the majority of mainstream wallets do not allow users to manually craft their mnemonic phrases—they’re usually generated randomly using secure entropy sources.

    In my analysis, I considered the possibility of handcrafted mnemonics, but the patterns observed appear too frequent and systematic to be fully explained by manual selection. The entropy distribution and word frequency suggest potential bias or deviation in the generation process, possibly at the wallet level.
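A claim of word-frequency bias can be checked with a chi-square test against the uniform distribution over the 2048 word-list positions. The sketch below is an assumption about how such a check could be set up, not the method used in the report; the sample sizes and the artificially biased sample are made up for illustration.

```python
import math
import secrets
from collections import Counter

WORDLIST_SIZE = 2048  # each BIP-39 word list has 2048 entries


def chi_square_z(indices: list[int], bins: int = WORDLIST_SIZE) -> float:
    """Chi-square statistic against uniformity, as an approximate z-score.

    For large degrees of freedom the chi-square distribution is roughly
    normal with mean dof and variance 2 * dof.
    """
    expected = len(indices) / bins
    counts = Counter(indices)
    stat = sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(bins))
    dof = bins - 1
    return (stat - dof) / math.sqrt(2 * dof)


SAMPLES = 200_000

# Word indices as a sound generator would produce them.
uniform = [secrets.randbelow(WORDLIST_SIZE) for _ in range(SAMPLES)]

# Hypothetical flawed generator: 10% of draws come from the first half only.
biased = [
    secrets.randbelow(WORDLIST_SIZE // 2) if secrets.randbelow(10) == 0
    else secrets.randbelow(WORDLIST_SIZE)
    for _ in range(SAMPLES)
]

print(f"uniform z-score: {chi_square_z(uniform):+.1f}")
print(f"biased  z-score: {chi_square_z(biased):+.1f}")
```

A z-score within a few units of zero is consistent with uniform generation, while a large positive value indicates bias. Running the same test on the word frequencies of the recovered wallets would show whether the observed deviation is statistically meaningful.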

    Still, I agree it’s a factor worth keeping in mind when interpreting the findings.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-07 15:10 UTC
