BIP39: Added ukrainian wordlist #442

Pull request: Bohdat wants to merge 5 commits into bitcoin:master from Bohdat:master, changing 2 files (+2049 −0)
  1. Bohdat commented at 12:09 pm on September 5, 2016: none
  2. Added ukrainian wordlist b705eda943
  3. luke-jr added the label Proposed BIP modification on Sep 5, 2016
  4. luke-jr commented at 8:10 pm on September 5, 2016: member
  5. voisine commented at 0:53 am on September 13, 2016: contributor

    This needs to be NFKD-normalized, which you can do with the following Perl script:

    #!/usr/bin/perl
    # Print each input line in Unicode Normalization Form KD (NFKD).
    use Unicode::Normalize;
    use strict;
    use warnings;
    use open qw(:std :utf8);   # decode/encode the standard streams as UTF-8

    while (<>) {
        print NFKD($_);
    }
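    For anyone checking the result without Perl, Python's standard-library `unicodedata` module applies the same normalization. This is a small illustration, not part of the PR, and the sample word is chosen only for the example: under NFKD, Ukrainian й (U+0439) and ї (U+0457) each decompose into a base letter plus a combining mark, so a normalized word can contain more code points than letters.

```python
import unicodedata

word = "йод"  # sample word chosen for illustration
nfkd = unicodedata.normalize("NFKD", word)

# й (U+0439) -> и (U+0438) + COMBINING BREVE (U+0306)
print([f"U+{ord(c):04X}" for c in nfkd])  # ['U+0438', 'U+0306', 'U+043E', 'U+0434']
print(len(word), len(nfkd))               # 3 4

# ї (U+0457) -> і (U+0456) + COMBINING DIAERESIS (U+0308)
print(unicodedata.normalize("NFKD", "ї") == "\u0456\u0308")  # True

# NFC recomposes the original letters, so the round trip is lossless here
print(unicodedata.normalize("NFC", nfkd) == word)  # True
```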
    
  6. Normalized under NFKD b991ce522f
  7. voisine commented at 5:46 pm on September 13, 2016: contributor
    Looks good to me, but I’d like a second Ukrainian speaker to go over the list and verify that it meets the wordlist criteria before ACKing.
  8. greenaddress commented at 7:45 pm on September 13, 2016: contributor

    We reviewed the words (Ukrainian speaker) and they look OK; however, the list doesn’t seem to be sorted (run sort on it; export LANG=C first if you don’t have it set).

    A sorted list allows faster lookup (binary search), so we think it is worth doing.
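    A sketch of why LANG=C matters, in Python with hypothetical helper names and sample words that are not taken from the PR: in the C locale, sort compares raw bytes, and for UTF-8 text byte order is the same as code-point order, which is also how Python compares `str` values. A list in that order can be searched with `bisect`. One consequence for Ukrainian is that words starting with є, і, ї, or ґ (U+0454, U+0456, U+0457, U+0491) sort after words starting with я (U+044F).

```python
from bisect import bisect_left

def is_c_sorted(words):
    """True if words are in byte order, as produced by `LC_ALL=C sort`."""
    keys = [w.encode("utf-8") for w in words]
    return all(a <= b for a, b in zip(keys, keys[1:]))

def word_index(sorted_words, word):
    """Binary-search a byte-ordered wordlist; return the index or -1."""
    # For str, Python compares code points, which matches UTF-8 byte order.
    i = bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i] == word:
        return i
    return -1

words = ["абетка", "вода", "пісня", "яблуко", "ґудзик"]  # illustrative sample
print(is_c_sorted(words))          # True: ґ (U+0491) sorts after я (U+044F)
print(word_index(words, "пісня"))  # 2
print(word_index(words, "кіт"))    # -1
```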

  9. Sorted with sort command fb81332187
  10. zerko commented at 4:50 pm on September 23, 2016: none
    It doesn’t look like the words are uniquely identifiable by their first four letters.
  11. slush0 commented at 9:38 pm on September 23, 2016: contributor

    A script for validating all BIP39-defined rules (such as uniqueness of the first four letters) is here: https://github.com/trezor/python-mnemonic/blob/master/test_mnemonic.py

    It may need fixes for UTF-8 (possibly a slight rewrite for Python 3, which handles Unicode much better), but a wordlist must pass these tests before it can be added to the BIP.
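    The first-four-letters rule can be checked with a few lines of Python. This is a hedged sketch, not test_mnemonic.py itself: `prefix_collisions` is a hypothetical helper, and counting letters after NFC composition (so that й and ї count as one letter even though NFKD stores each as two code points) is an assumption about how the rule should apply to this list.

```python
import unicodedata
from collections import defaultdict

def prefix_collisions(words, n=4):
    """Map each shared n-letter prefix to the words that share it.

    Assumption: letters are counted after NFC composition, so й and ї
    count as one letter each even though NFKD stores them as two code points.
    """
    groups = defaultdict(list)
    for w in words:
        groups[unicodedata.normalize("NFC", w)[:n]].append(w)
    return {prefix: ws for prefix, ws in groups.items() if len(ws) > 1}

sample = ["абетка", "абетковий", "пісня", "пісок"]  # illustrative words
print(prefix_collisions(sample))  # {'абет': ['абетка', 'абетковий']}
```

    Slicing raw NFKD code points instead would make "йога" and "йогурт" collide (both start with и, U+0306, о, г), which is why the letter-counting choice matters for this alphabet.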

  12. slush0 commented at 10:01 pm on September 23, 2016: contributor

    Okay, I ran test_mnemonic.py (with Python 3, no problems) and it produced this list of duplicates: http://pastebin.com/ztBqDT9q

    There were some other minor errors too; this needs some work.
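    Besides the prefix rule, some structural problems are easy to catch automatically. The sketch below uses a hypothetical helper and does not reproduce the checks test_mnemonic.py actually performs; it verifies that a list has exactly 2048 entries, which BIP39 requires because each word encodes 11 bits (2^11 = 2048), and that there are no empty entries or exact duplicates.

```python
def basic_wordlist_problems(words):
    """Return human-readable problems; an empty list means these checks passed.

    Checks: exactly 2048 entries, no empty entries, no exact duplicates.
    """
    problems = []
    if len(words) != 2048:
        problems.append(f"expected 2048 words, found {len(words)}")
    seen = set()
    for i, w in enumerate(words):
        if not w:
            problems.append(f"empty entry at index {i}")
        elif w in seen:
            problems.append(f"duplicate word {w!r} at index {i}")
        seen.add(w)
    return problems

for problem in basic_wordlist_problems(["вода", "пісня", "вода", ""]):
    print(problem)
```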

  13. Fixed to pass verification script 563de78686
  14. Replaced some words; reduced number of verbs 152fc5937a
  15. nym-zone referenced this in commit 8aaa6f37e8 on Jan 7, 2018
  16. nym-zone referenced this in commit 08a05b40e7 on Jan 7, 2018
  17. nym-zone commented at 9:10 am on January 8, 2018: contributor

    At nym-zone/easyseed@08a05b4, I have created a fixed ukrainian.txt that is NFKD-normalized and binary-sorted and corrects one technical bug.

    The ukrainian.txt from Bohdat/bips@152fc59 contains a trailing space (0x20) followed by a tab (0x09, '\t') after the word at 0-based index 1393 (line 1394), just before the newline '\n'. The problem first surfaced as a failure in easyseed’s extensive internal self-tests; I then used cmp(1) and hex dumps to find the difference between the wordlist in my source tree and the wordlist printed on stdout by easyseed -W -P -l uk.

    The following commands pinpoint the problem:

    $ grep -E '[[:space:]]$' ukrainian.txt | hd
    00000000  d0 bf d1 96 d1 81 d0 bd  d1 8f 20 09 0a           |.......... ..|
    0000000d
    $ echo "\"`grep -En '[[:space:]]$' ukrainian.txt`\""
    "1394:пісня 	"
    

    (@dabura667, you may want to add this to your punch list of technical checks.)
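    A check for this class of bug could look like the following Python sketch (the helper name is hypothetical). Like `grep -En '[[:space:]]$'`, it reports 1-based line numbers of lines that end in whitespace, not counting the newline itself; unlike grep in the C locale, `str.rstrip()` also treats non-ASCII Unicode whitespace as whitespace.

```python
def trailing_whitespace_lines(text):
    """Return (1-based line number, line) for each line ending in whitespace.

    The newline separators are removed first, so only whitespace before the
    newline is reported, as with grep -En '[[:space:]]$'.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if line != line.rstrip():
            hits.append((lineno, line))
    return hits

sample = "вода\nпісня \t\nяблуко\n"  # illustrative three-line wordlist
for lineno, line in trailing_whitespace_lines(sample):
    # prints: 2 'пісня \t' d0 bf d1 96 d1 81 d0 bd d1 8f 20 09
    print(lineno, repr(line), line.encode("utf-8").hex(" "))
```

    The hex output matches the bytes in the hd dump above, minus the final 0a.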

    It is fixed with the following command:

    $ sed -E -e 's/[[:space:]]+$//' < ukrainian.txt > ukfix1/uk_fixed0.txt
    

    After verifying that this command made no other changes, I normalized and sorted the list:

    $ ls -l ukrainian.txt ukfix1/uk_fixed0.txt
    -rw-r--r-- 1 user user 24550 Jan  7 21:26 ukfix1/uk_fixed0.txt
    -rw-r--r-- 1 user user 24552 Jan  7 20:31 ukrainian.txt
    $ diff -u3 ukrainian.txt ukfix1/uk_fixed0.txt
    [...showing only the desired line changed...]
    $ uconv -f utf-8 -t utf-8 -x '::nfkd;' < uk_fixed0.txt | \
    	LC_ALL=C LANG=C sort -s > uk_fixed1.txt
    $ mv -i uk_fixed1.txt ../../easyseed/wordlist/ukrainian.txt
    mv: overwrite '../../easyseed/wordlist/ukrainian.txt'? y
    

    SHA-256 hash for the resulting ukrainian.txt:

    612ee29e1fa13dc38c9e1b31c7ef980db8f3c8dd30f1c9377170d1b10e895dc9
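    For reference, the three shell steps (sed, uconv, sort) can be approximated in Python as below. This is a sketch with a hypothetical function name, not the procedure used above, and it is not guaranteed to reproduce the hash above: that would require the same input file and identical NFKD output from Python's `unicodedata` and the installed `uconv`, which may implement different Unicode versions.

```python
import hashlib
import unicodedata

def rebuild_wordlist(raw: bytes) -> bytes:
    """Approximate: sed -E 's/[[:space:]]+$//' | uconv -x '::nfkd;' | LC_ALL=C sort -s"""
    text = raw.decode("utf-8")
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string after the final newline
    # 1. strip trailing whitespace, 2. NFKD-normalize each line
    cleaned = [unicodedata.normalize("NFKD", line.rstrip()) for line in lines]
    # 3. sort by raw UTF-8 bytes, as the C locale does
    encoded = sorted(line.encode("utf-8") for line in cleaned)
    return b"".join(line + b"\n" for line in encoded)

raw = "пісня \t\nабетка\nйод\n".encode("utf-8")  # illustrative input
fixed = rebuild_wordlist(raw)
# Byte order puts абетка (d0 b0) before й -> и + U+0306 (d0 b8) before пісня (d0 bf)
print(fixed.decode("utf-8").split("\n"))
print(hashlib.sha256(fixed).hexdigest())
```

    Running it a second time on its own output leaves the bytes unchanged, which is a cheap sanity check that a list is already clean, normalized, and sorted.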
    
  18. nym-zone cross-referenced this on Jan 8, 2018 from issue Czech wordlist for BIP0039 by zizelevak
  19. nym-zone referenced this in commit c7d698a35f on Jan 11, 2018
  20. ldz1 cross-referenced this on Nov 14, 2018 from issue Remove uncompleted bip39 wordlists. by ldz1
  21. DonaldTsang cross-referenced this on Dec 24, 2018 from issue Binary Lists by DonaldTsang
  22. kittyandrew commented at 10:01 pm on June 14, 2021: none

    Hello, I had almost started working on my own list for this when I found this PR. Can someone tell me what exactly needs to be fixed, updated, or reviewed here?

    Follow-up question: is there any preference among nouns, verbs, and adjectives? There are at least a few special words that translate to “and”, “or”, “there”, etc.

    Edit: In addition, there are many closely related words and different forms of the same word; I’m working on those.

    Edit 2: I will probably have to create a new PR in the end, because the current contributor is inactive.

  23. luke-jr closed this on Jul 2, 2021


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 01:10 UTC
