BIP39: Adds Russian word list #432

pull farazdagi wants to merge 2 commits into bitcoin:master from farazdagi:master changing 2 files +2068 −0
  1. farazdagi commented at 3:20 pm on August 12, 2016: none
    I’ve tried to follow guidelines defined in other languages.
  2. BIP39: Adds Russian word list 6332230d63
  3. in bip-0039/russian.txt: in 6332230d63 outdated
    192+буфет
    193+бухта
    194+бушлат
    195+бывать
    196+быль
    197+быть
    


    UdjinM6 commented at 4:02 pm on August 12, 2016:
    “бывать” and “быть” are too similar imo and I would exclude both of them tbh
  4. in bip-0039/russian.txt: in 6332230d63 outdated
    237+весло
    238+весна
    239+весть
    240+ветвь
    241+ветер
    242+ветка
    


    UdjinM6 commented at 4:05 pm on August 12, 2016:
    “ветвь” and “ветка” are very close synonyms, should probably pick only one?
  5. in bip-0039/russian.txt: in 6332230d63 outdated
    464+досада
    465+доска
    466+доход
    467+доцент
    468+дочь
    469+дошлый
    


    UdjinM6 commented at 4:11 pm on August 12, 2016:
    “дошлый” is a rare, little-used word, probably not a good candidate
  6. in bip-0039/russian.txt: in 6332230d63 outdated
    524+жизнь
    525+жилой
    526+жилье
    527+житель
    528+жить
    529+жрать
    


    UdjinM6 commented at 4:18 pm on August 12, 2016:
    “жрать” is a rather vulgar form of “eat”; I would remove it
  7. in bip-0039/russian.txt: in 6332230d63 outdated
    702+клуб
    703+клык
    704+ключ
    705+клятва
    706+книга
    707+книжка
    


    UdjinM6 commented at 4:25 pm on August 12, 2016:
    “книга” and “книжка” are too close imo, I would remove “книжка”
  8. in bip-0039/russian.txt: in 6332230d63 outdated
    846+лодка
    847+ложь
    848+лозунг
    849+локоть
    850+ломать
    851+лондон
    


    UdjinM6 commented at 4:33 pm on August 12, 2016:
    “лондон” is the name of the city (London), should be removed imo
  9. in bip-0039/russian.txt: in 6332230d63 outdated
    946+мораль
    947+морда
    948+море
    949+мороз
    950+моряк
    951+москвич
    


    UdjinM6 commented at 4:36 pm on August 12, 2016:
    “москвич” is either the name of a Russian automobile brand or it means “someone who lives in Moscow”. Either way it’s not a good candidate imo.
  10. in bip-0039/russian.txt: in 6332230d63 outdated
    1182+пальто
    1183+память
    1184+панель
    1185+паника
    1186+парень
    1187+париж
    


    UdjinM6 commented at 4:51 pm on August 12, 2016:
    “париж” is the name of the city “Paris”, should be removed imo

    cryply commented at 6:56 pm on May 12, 2020:
    It is important to have words that are distinct, easy to type and easy to remember, and париж is good in that sense. This is not a vocabulary of Russian words but a list of mnemonic words in Russian.
  11. in bip-0039/russian.txt: in 6332230d63 outdated
    1411+реформа
    1412+рецепт
    1413+речь
    1414+решать
    1415+решение
    1416+решить
    


    UdjinM6 commented at 4:57 pm on August 12, 2016:
    “решать” and “решить” are too close imo
  12. in bip-0039/russian.txt: in 6332230d63 outdated
    1420+ритм
    1421+рифма
    1422+робкий
    1423+родитель
    1424+родной
    1425+рожа
    


    UdjinM6 commented at 4:58 pm on August 12, 2016:
    “рожа” is a vulgar form of “face”, could be removed probably
  13. in bip-0039/russian.txt: in 6332230d63 outdated
    1428+роль
    1429+роман
    1430+ронять
    1431+роса
    1432+рослый
    1433+россия
    


    UdjinM6 commented at 4:59 pm on August 12, 2016:
    “россия” is the name of the country “Russia”, not sure if it’s a good candidate here
  14. in bip-0039/russian.txt: in 6332230d63 outdated
    1507+сестра
    1508+сеть
    1509+сечение
    1510+сжечь
    1511+сзади
    1512+сибирь
    


    UdjinM6 commented at 5:02 pm on August 12, 2016:
    “сибирь” is the name of the region in Russia - “Siberia”, probably not a good candidate
  15. in bip-0039/russian.txt: in 6332230d63 outdated
    1781+узор
    1782+уйма
    1783+указ
    1784+уклон
    1785+укол
    1786+украина
    


    UdjinM6 commented at 5:10 pm on August 12, 2016:
    “украина” is the name of the country - “Ukraine”, probably not a good candidate
  16. in bip-0039/russian.txt: in 6332230d63 outdated
    1785+укол
    1786+украина
    1787+уксус
    1788+улица
    1789+улыбка
    1790+ум
    


    UdjinM6 commented at 5:10 pm on August 12, 2016:
    “ум” is too short

    cryply commented at 6:57 pm on May 12, 2020:
    Why is length a problem? What matters most is whether one will make a mistake while entering the word list.
  17. in bip-0039/russian.txt: in 6332230d63 outdated
    1849+фонд
    1850+фонтан
    1851+форма
    1852+фото
    1853+фраза
    1854+франция
    


    UdjinM6 commented at 5:12 pm on August 12, 2016:
    “франция” is the name of the country - “France”, probably not a good candidate
  18. in bip-0039/russian.txt: in 6332230d63 outdated
    1889+царство
    1890+царь
    1891+цветок
    1892+целиком
    1893+целое
    1894+целый
    


    UdjinM6 commented at 5:14 pm on August 12, 2016:
    “целое” and “целый” are probably too close
  19. in bip-0039/russian.txt: in 6332230d63 outdated
    2004+энергия
    2005+эпизод
    2006+эпоха
    2007+эскиз
    2008+эссе
    2009+эстония
    


    UdjinM6 commented at 5:18 pm on August 12, 2016:
    “эстония” is the name of the country - “Estonia”, probably not a good candidate
  20. in bip-0039/russian.txt: in 6332230d63 outdated
    2033+язык
    2034+яйцо
    2035+якобы
    2036+якорь
    2037+январь
    2038+япония
    


    UdjinM6 commented at 5:19 pm on August 12, 2016:
    “япония” is the name of the country - Japan, probably not a good candidate
  21. UdjinM6 commented at 5:24 pm on August 12, 2016: contributor

    Also

    итак когда кроме кстати куда либо ловко между наверх назад налево нигде никак нынче однажды около откуда отнюдь отсюда оттого оттуда плохо полтора помимо поперек почему против путем пятеро пяток пять ранее сбоку сверху сегодня сейчас сзади слегка смело снизу снова совсем сорок сразу также твой теперь тогда тоже точно триста туго туда уйма целиком четыре явно якобы ярко ясно

    None of the words above fit the noun/verb/adj criteria, so they should be removed or mentioned in the criteria imo. There are also some “numeric-like” words such as “первый”, “тысяча” etc. which I’m not sure about either, but they are probably ok.

  22. farazdagi commented at 5:42 pm on August 12, 2016: none
    @UdjinM6 Thanks for the comments, I will go through them today and push an updated list to this PR.
  23. luke-jr added the label Proposed BIP modification on Aug 12, 2016
  24. Manual cleanup a59cc3e1ac
  25. farazdagi commented at 4:08 am on August 14, 2016: none

    I’ve spent a considerable amount of time manually going through the word list and:

    • applying all suggestions made above (thanks again @UdjinM6)
    • making sure that only nouns/verbs/adjectives are used (mostly nouns)
    • making sure that words are distinct enough from each other (improved Levenshtein distance)

    Please review and let me know if there are any issues left.
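The "distinct enough" criterion mentioned above can be checked mechanically. The following is a minimal sketch, not the script used for this PR (the thread does not show one): `levenshtein` and `close_pairs` are hypothetical helper names, and the threshold passed to `close_pairs` is an assumption chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def close_pairs(words, max_distance=1):
    """Return sorted (w1, w2, distance) triples with distance <= max_distance."""
    words = sorted(set(words))
    result = []
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            distance = levenshtein(w1, w2)
            if distance <= max_distance:
                result.append((w1, w2, distance))
    return result


# Words taken from the review comments earlier in this thread.
sample = ["книга", "книжка", "решать", "решить", "ветер"]
for w1, w2, distance in close_pairs(sample, max_distance=2):
    print(w1, w2, distance)
```

The pairwise loop is quadratic, which is still cheap for a list of 2048 words.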

  26. UdjinM6 commented at 11:41 am on August 14, 2016: contributor

    Very nice! IMO the list looks much better now 👍

    PS. And btw, thanks for submitting this PR!

  27. luke-jr commented at 6:01 pm on August 14, 2016: member
  28. farazdagi cross-referenced this on Aug 19, 2016 from issue Extended Keys + Remind Details + Login + Complete Transaction w/o unlocking by farazdagi
  29. Bohdat commented at 11:21 am on September 5, 2016: none
    Here are some very similar words I have found (listed in pairs): арка арфа, банк танк, бард барс, батон бутон, бинт бунт, бочка точка, брак брат, букет буфет, вахта шахта, весть честь, взвод вывод, взор узор, влияние слияние, волк воля, волк толк, вход уход, глава слава, гном гром, губа гуща, губа шуба, дата хата, день тень, диск риск, дума душа, душа суша, жара фара, задор затор, замок зарок, игла игра, имение умение, кабель кафель, кабель табель, капля цапля, катер шатер, козел котел, койка кошка, конверт концерт, корнет корсет, кубок кусок, куча туча, лента рента, лечение течение, магия мафия, метр мэтр, модель модуль, мост рост, народ наряд, нация рация, нейлон нейрон, нива ниша, нить шить, нога нота, норма форма, нота рота, олень осень, оплата уплата, ответ отчет, паек парк, пакт факт, пальто сальто, певец перец, пена цена, петь путь, петь сеть, пила пища, пила сила, план плац, плита элита, повар товар, пруд труд, пугать ругать, путь суть, река рука, сбруя струя, сеть суть, слон стон, смена стена, сосед сосуд, удав удар, хобот хохот, цинк цирк, чадо чудо, челнок чеснок, штаб штат
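Every pair listed above consists of two words of equal length that differ in a single letter. A list like that could be produced automatically along the following lines. This is only a sketch under that assumption; `one_letter_apart` is a hypothetical name, and nothing in the thread says how the pairs were actually found.

```python
from collections import defaultdict
from itertools import combinations


def one_letter_apart(words):
    """Return sorted pairs of equal-length words differing in exactly one position."""
    buckets = defaultdict(set)
    for word in set(words):
        for i in range(len(word)):
            # Mask position i; words sharing a masked form differ only at that position.
            buckets[(i, word[:i], word[i + 1:])].add(word)
    pairs = set()
    for group in buckets.values():
        for w1, w2 in combinations(sorted(group), 2):
            pairs.add((w1, w2))
    return sorted(pairs)


sample = ["банк", "танк", "волк", "воля", "толк", "шахта"]
for w1, w2 in one_letter_apart(sample):
    print(w1, w2)
```

Bucketing by masked form avoids comparing every word with every other word, although for 2048 words either approach is fast.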
  30. voisine commented at 0:51 am on September 13, 2016: contributor

    this needs to be NFKD normalized, which you can do with the following perl script:

    #!/usr/bin/perl

    use Unicode::Normalize;
    use strict;
    use warnings;
    use open qw(:std :utf8);

    # Read lines from stdin (or from files named as arguments) and print each in NFKD form.
    while (<>) {
        print NFKD("$_");
    }
    
  31. greenaddress commented at 7:54 pm on September 14, 2016: contributor
    reviewed the words - looked OK. The list of words is also sorted so that’s great.
  32. dabura667 commented at 11:37 pm on September 14, 2016: none

    NFKD normalization needed.

    Be sure to re-sort after normalization.

    The Japanese list forgot to do so :-( (oops!)
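Why re-sorting matters can be shown with a small sketch. It assumes that "sorted" means ordered by Unicode code point (equivalent to byte order of the UTF-8 encoding), and `normalize_and_sort` is a hypothetical helper name. NFKD splits й (U+0439) into и (U+0438) plus a combining breve (U+0306), which moves such words relative to their neighbours.

```python
import unicodedata


def normalize_and_sort(words):
    """NFKD-normalize every word, then sort by Unicode code point."""
    normalized = [unicodedata.normalize("NFKD", w) for w in words]
    # Python compares str values code point by code point,
    # which gives the same order as comparing the UTF-8 bytes.
    return sorted(normalized)


# Escapes are used so the precomposed form is unambiguous in the source.
iyul = "\u0438\u044e\u043b\u044c"  # июль
yod = "\u0439\u043e\u0434"         # йод, with й as the single code point U+0439

before = [iyul, yod]
print(before == sorted(before))      # order of the precomposed list
after = normalize_and_sort(before)
print([w.encode("unicode_escape").decode("ascii") for w in after])
```

In this example the precomposed list is already in order, but after normalization йод sorts before июль, so a list that is normalized without being re-sorted ends up out of order.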

  33. jonathancross commented at 2:58 pm on March 30, 2017: contributor
    Ping @farazdagi – Seems this still needs to be normalized?
  34. Sjors commented at 10:40 am on June 30, 2017: member

    A general observation about adding more languages to BIP 39 is that English now has broad wallet support. If a new language is only supported by a small number of wallets, this could lead to (unintended) vendor lock-in.

    If someone writes down their mnemonic and puts it in a vault, they should be able to take it out 50 years later and have a reasonable chance of finding software that can still import it.

    Perhaps getting BIP 39 (or something similar) recognized as an ISO standard would be a good step towards durability, before adding more languages.

  35. in bip-0039/russian.txt:17 in a59cc3e1ac
    12+аврал
    13+автор
    14+агат
    15+агент
    16+агрегат
    17+адажио
    


    ValleZ commented at 8:02 pm on January 7, 2018:
    адажио is quite a rare word, is it okay to use it here?
  36. nym-zone referenced this in commit 8aaa6f37e8 on Jan 7, 2018
  37. nym-zone referenced this in commit 234c66cd5d on Jan 7, 2018
  38. dabura667 commented at 6:59 am on January 8, 2018: none

    @Sjors BIP39 states

    The conversion of the mnemonic sentence to a binary seed is completely independent from generating the sentence. This results in rather simple code; there are no constraints on sentence structure and clients are free to implement their own wordlists

    And

    software must compute a checksum for the mnemonic sentence using a wordlist and issue a warning if it is invalid.

    Which means “If you can’t verify the checksum (or don’t know the wordlist), show a warning, but ALLOW THE SEED TO BE GENERATED”

    But almost every single wallet used their “developer common sense” which states “if a checksum exists, always check it, and always fail loudly and stop everything”… which makes sense.

    The fault lies with BIP39, which was written in a way that contradicts developer common sense.

    But to be honest, Electrum supports all BIP39 wordlists, because it actually follows the BIP, and if it doesn’t recognize the wordlist, it shows a warning but generates the wallet anyway. I have recovered many wallets using Electrum.

    Ironically, Electrum’s developer pointed out this contradiction, the authors ignored it, Thomas asked to have his name removed because of this and other problems, and now Electrum is the only wallet that implements BIP39 correctly in this aspect.
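The warn-but-continue behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Electrum's or any other wallet's code: `checksum_ok` and `mnemonic_to_seed` are hypothetical names, and the caller is assumed to have already mapped the words to 11-bit indices whenever the wordlist is known. The key-stretching parameters (PBKDF2 with HMAC-SHA512, 2048 iterations, salt "mnemonic" plus passphrase, NFKD-normalized input, 64-byte output) come from BIP 39.

```python
import hashlib
import unicodedata
import warnings


def checksum_ok(indices):
    """Check the BIP 39 checksum given the 11-bit index of each word."""
    total_bits = 11 * len(indices)
    if total_bits % 33 != 0:
        raise ValueError("word count must be a multiple of 3")
    checksum_bits = total_bits // 33  # one checksum bit per 32 entropy bits
    entropy_bits = total_bits - checksum_bits
    value = 0
    for index in indices:
        value = (value << 11) | index
    entropy = (value >> checksum_bits).to_bytes(entropy_bits // 8, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - checksum_bits)
    return (value & ((1 << checksum_bits) - 1)) == expected


def mnemonic_to_seed(mnemonic, passphrase="", indices=None):
    """Derive the 64-byte seed; warn (but do not fail) on checksum problems."""
    if indices is None:
        warnings.warn("unknown wordlist: checksum not verified")
    elif not checksum_ok(indices):
        warnings.warn("mnemonic checksum is invalid")
    # The seed depends only on the normalized text, never on the wordlist.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


# "abandon" is index 0 and "about" is index 3 in the English list.
words = ["abandon"] * 11 + ["about"]
seed = mnemonic_to_seed(" ".join(words), indices=[0] * 11 + [3])
print(seed.hex())
```

Because the wordlist is used only for the checksum, a wallet that does not know a given list can still derive the same seed, which is the point being made about Electrum.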

  39. nym-zone commented at 8:37 am on January 8, 2018: contributor

    At nym-zone/easyseed@234c66c, I have created a Unicode NFKD-normalized and binary-sorted russian.txt from farazdagi/bips@a59cc3e as modified by approximately the following command:

    uconv -f utf-8 -t utf-8 -x '::nfkd;' < russian.txt | \
    	sort -s > normalized/russian.txt
    

    (I originally forgot to force the "C" locale for sort(1); but I later checked, and found it did not make a difference for this list in my environment. It did make a difference for the proposed Ukrainian and Czech lists.)

    The result has been confirmed to not have a leading BOM, and to have a final line terminated with ‘\n’ (#622). I did not yet examine the source for these issues.

    SHA-256 hash for the resulting russian.txt: a8d7b9d8bdd3816eddd2aeb98718ad586d8e7dd8c364a944c072cdf3cd6bcb05
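The properties checked above (no BOM, final newline, NFKD form, binary sort order) can be verified with a short script. The sketch below is an assumption about how such a check might look, not the tooling used in nym-zone/easyseed; `check_wordlist` and `sha256_hex` are hypothetical names. The expected count of 2048 words comes from BIP 39.

```python
import hashlib
import unicodedata


def check_wordlist(data: bytes):
    """Return a list of problems found in raw wordlist bytes (empty if none)."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("leading BOM")
    if not data.endswith(b"\n"):
        problems.append("final line not terminated with \\n")
    words = data.decode("utf-8").split("\n")
    if words and words[-1] == "":
        words.pop()  # drop the empty piece after the final newline
    if len(words) != 2048:
        problems.append(f"expected 2048 words, found {len(words)}")
    if any(unicodedata.normalize("NFKD", w) != w for w in words):
        problems.append("not NFKD-normalized")
    if words != sorted(words):
        problems.append("not sorted by code point")
    if len(set(words)) != len(words):
        problems.append("duplicate words")
    return problems


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of the raw file bytes, for comparison with a published hash."""
    return hashlib.sha256(data).hexdigest()


# A one-word "file" containing a precomposed й, for illustration only.
example = "\u0439\u043e\u0434\n".encode("utf-8")
print(check_wordlist(example))
print(sha256_hex(example))
```

For a real file, the bytes would be read in binary mode (for example `open(path, "rb").read()`) so that the BOM and newline checks see the data exactly as stored.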

  40. nym-zone commented at 9:00 am on January 8, 2018: contributor

    @Sjors:

    A general observation about adding more languages to BIP 39 is that English now has broad wallet support. If a new language is only supported by a small number of wallets, this could lead to (unintended) vendor lock-in.

    If someone writes down their mnemonic and puts it in a vault, they should be able to take it out 50 years later and have a reasonable chance of finding software that can still import it.

    The answer to vendor lock-in is independent implementations. BIP 39’s simplicity facilitates that. In ten days of occasional side-work, I have written a BIP 39 implementation with extensive self-tests which generates mnemonics in any language for which a wordlist is available in the BIP repository. It can output a BIP 32 xprv extended master private key for wallet restoration (although this feature is not yet documented in the manpage). Restoration to xprv from a user-input mnemonic in any language will be added in the near future. This is written in standard C/mostly standard POSIX. Anybody with technical competence who urgently needed to restore a wallet could whip up a barebones/no-tests/no-checksum-check/no-manpage mnemonic-to-xprv tool as a little afternoon project.
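To give a sense of how small the seed-to-xprv step is, here is a bare-bones sketch in Python rather than C. It is not taken from easyseed; `base58check` and `seed_to_xprv` are hypothetical names. The constants (HMAC key "Bitcoin seed", mainnet private version bytes 0x0488ADE4, the secp256k1 group order) are from BIP 32, and the input is assumed to be the 64-byte seed produced by the BIP 39 PBKDF2 step.

```python
import hashlib
import hmac

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
# Order of the secp256k1 group; a valid private key lies strictly between 0 and this.
CURVE_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def base58check(payload: bytes) -> str:
    """Base58 encoding with a 4-byte double-SHA-256 checksum appended."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    number = int.from_bytes(data, "big")
    encoded = ""
    while number:
        number, remainder = divmod(number, 58)
        encoded = B58_ALPHABET[remainder] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def seed_to_xprv(seed: bytes) -> str:
    """Serialize the BIP 32 master private key derived from a seed."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    key, chain_code = digest[:32], digest[32:]
    if not 0 < int.from_bytes(key, "big") < CURVE_ORDER:
        raise ValueError("seed yields an invalid master key")
    payload = (
        bytes.fromhex("0488ade4")  # version: mainnet private key
        + b"\x00"                  # depth 0 (master)
        + b"\x00" * 4              # parent fingerprint
        + b"\x00" * 4              # child number
        + chain_code
        + b"\x00" + key            # private key with 0x00 prefix
    )
    return base58check(payload)


# Seed of BIP 32 test vector 1, used here only as sample input.
print(seed_to_xprv(bytes.fromhex("000102030405060708090a0b0c0d0e0f")))
```

Combined with the PBKDF2 step, this covers the whole path from a written mnemonic to an importable key without needing any wordlist.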

    I have C code on my disk with copyright dates from almost 40 years ago—actually, if memory serves, the oldest date I have seen in my platform’s source tree is exactly 1978. Likewise, I expect that my freely available C11 code will compile with minimal changes for decades to come.

    When such tools are available and easy to produce ab initio, where is the vendor lock-in? Wallets don’t need multi-language support to restore from an xprv.

    I am glad to see new languages being proposed and added. The important part is to get the wordlist right before it’s carved into the standard, baked into implementations, and used for wallets containing actual people’s actual money. That is important.

  41. nym-zone referenced this in commit c7d698a35f on Jan 11, 2018
  42. ZilvinasKucinskas commented at 10:23 pm on April 19, 2018: none

    So is it ok to implement this Russian wordlist in the wallet?

    What are the community’s rules for accepting a language into BIP39?

  43. dabura667 commented at 10:55 pm on April 19, 2018: none

    You can implement any wordlist you want, and Electrum will properly recover it. (Though it will not detect checksum errors)

    Other wallets are poorly implemented.

  44. DonaldTsang cross-referenced this on Dec 24, 2018 from issue Binary Lists by DonaldTsang
  45. DonaldTsang cross-referenced this on Aug 22, 2019 from issue BIP39: Russian wordlist added by 3sGgpQ8H
  46. DonaldTsang commented at 1:40 am on August 22, 2019: none
  47. luke-jr closed this on Jul 2, 2021


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-24 02:10 UTC