BIP39: Added ukrainian wordlist #442
Bohdat wants to merge 5 commits into bitcoin:master from Bohdat:master, changing 2 files (+2049 −0).
Bohdat commented at 12:09 pm on September 5, 2016: none
-
Added ukrainian wordlist b705eda943
-
luke-jr added the label Proposed BIP modification on Sep 5, 2016
-
voisine commented at 0:53 am on September 13, 2016: contributor
This needs to be NFKD-normalized, which you can do with the following Perl script:
```perl
#!/usr/bin/perl
# Print every input line in Unicode Normalization Form KD (NFKD).
use Unicode::Normalize;
use strict;
use warnings;
use open qw(:std :utf8);  # treat STDIN/STDOUT/STDERR as UTF-8

while (<>) {
    print NFKD("$_");
}
```
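To check whether a list is already NFKD-normalized without rewriting it, Python 3.8+ offers unicodedata.is_normalized. Below is a minimal sketch. The helper name non_nfkd_lines is hypothetical and is not part of any BIP39 tooling.

```python
import unicodedata


def non_nfkd_lines(lines):
    """Return (1-based line number, line) for each line that is not in NFKD form."""
    return [
        (n, line)
        for n, line in enumerate(lines, start=1)
        if not unicodedata.is_normalized("NFKD", line)
    ]


# Composed "й" (U+0439) decomposes under NFKD into "и" (U+0438) plus a
# combining breve (U+0306), so only the composed spelling is flagged.
composed = "\u0439\u043e\u0433\u043e"  # "його"
decomposed = unicodedata.normalize("NFKD", composed)
flagged = non_nfkd_lines([composed, decomposed])
```

Ukrainian "й" and "ї" both decompose under NFKD. A list typed on a normal keyboard will therefore usually fail this check until it has been passed through a filter like the script above.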
-
Normalized under NFKD b991ce522f
-
voisine commented at 5:46 pm on September 13, 2016: contributor
Looks good to me, but I'd like a second Ukrainian speaker to go over the list and verify that it meets the word list criteria before ACKing.
-
greenaddress commented at 7:45 pm on September 13, 2016: contributor
We reviewed the words (Ukrainian speaker) and they look OK. However, the list doesn't seem to be sorted (run sort on it, and export LANG=C first if you don't have it set).
A sorted list allows faster lookup (binary search), and we think it is worth doing.
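The benefit of a byte-sorted list can be shown with a short Python 3 sketch. The function names and the sample words are illustrative only; they are not taken from the submitted list. UTF-8 preserves code point order, so the C-locale order produced by sort matches Python's default string comparison, and bisect can search the list directly.

```python
from bisect import bisect_left


def is_byte_sorted(words):
    """True if words are in strictly increasing UTF-8 byte order.

    This is the order `LC_ALL=C sort` produces. The strict "<" also rejects
    duplicate entries.
    """
    encoded = [w.encode("utf-8") for w in words]
    return all(a < b for a, b in zip(encoded, encoded[1:]))


def word_index(words, word):
    """Binary-search a byte-sorted list; return the word's index, or -1."""
    # Python compares str by code point, which agrees with UTF-8 byte order.
    i = bisect_left(words, word)
    return i if i < len(words) and words[i] == word else -1


# Note that "ґ" (U+0491) sorts after "я" in byte order, unlike the
# Ukrainian alphabet, where it follows "г".
sample = ["абрикос", "вечір", "пісня", "ґанок"]
```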
-
Sorted with sort command fb81332187
-
zerko commented at 4:50 pm on September 23, 2016: none
It doesn't look like the words are uniquely identifiable by their first four letters.
-
slush0 commented at 9:38 pm on September 23, 2016: contributor
A script for validating all the rules defined by BIP39 (such as uniqueness of the first four letters) is here: https://github.com/trezor/python-mnemonic/blob/master/test_mnemonic.py
It may need fixes for UTF-8 (possibly a slight rewrite for Python 3, which handles Unicode much better), but the list must pass these tests before it can be added to the BIP.
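Here is a minimal sketch of the first-four-letters rule. It is an illustration, not the implementation in test_mnemonic.py, and prefix_collisions is a hypothetical name. It assumes that "letters" should be counted after NFC composition. Without that step, an NFKD-stored word such as "йога" begins with "и" plus a combining breve, and two distinct words could share a four-code-point prefix while differing in their fourth visible letter.

```python
import unicodedata
from collections import defaultdict


def prefix_collisions(words, n=4):
    """Map each n-letter prefix shared by 2+ words to those words.

    Assumption: letters are counted after NFC composition, so a decomposed
    "и" + U+0306 counts as a single letter "й".
    """
    groups = defaultdict(list)
    for w in words:
        prefix = unicodedata.normalize("NFC", w)[:n]
        groups[prefix].append(w)
    return {p: ws for p, ws in groups.items() if len(ws) > 1}


# "робота" and "робот" cannot both be in the list: they share "робо".
clashes = prefix_collisions(["робота", "робот", "пісня"])
```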
-
slush0 commented at 10:01 pm on September 23, 2016: contributor
Okay, I ran test_mnemonic.py (with Python 3, without problems) and it produced this list of duplicates: http://pastebin.com/ztBqDT9q
There were some other minor errors as well, so this needs some work.
-
Fixed to pass verification script 563de78686
-
Replaced some words; reduced number of verbs 152fc5937a
-
nym-zone referenced this in commit 8aaa6f37e8 on Jan 7, 2018
-
nym-zone referenced this in commit 08a05b40e7 on Jan 7, 2018
-
nym-zone commented at 9:10 am on January 8, 2018: contributor
At nym-zone/easyseed@08a05b4, I have created a bugfixed ukrainian.txt which is NFKD-normalized and binary-sorted, and fixes one technical bug.

The ukrainian.txt from Bohdat/bips@152fc59 contains a trailing space (0x20) then tab (0x09, '\t') after the word at original index 1393 (1-based line number 1394), before the newline ('\n'). The problem was first identified by failure of easyseed's extensive internal self-tests, followed by examination with cmp(1) and hex dumps to diagnose the difference between the wordlist in my source tree and the wordlist printed on stdout by easyseed -W -P -l uk.

The following commands pinpoint the problem:
```
$ grep -E '[[:space:]]$' ukrainian.txt | hd
00000000  d0 bf d1 96 d1 81 d0 bd  d1 8f 20 09 0a           |.......... ..|
0000000d
$ echo "\"`grep -En '[[:space:]]$' ukrainian.txt`\""
"1394:пісня "
```
(@dabura667, you may want to add that to your punch-list of technical checks.)
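The same check can be written as a small Python 3 sketch for a test suite. The name trailing_whitespace is hypothetical. It assumes the file has been read as UTF-8 and split on "\n", for example with open(path, encoding="utf-8").read().split("\n"), so the lines carry no newline characters.

```python
import re

# Any run of whitespace (space, tab, ...) at the very end of a line.
TRAILING_WS = re.compile(r"\s+$")


def trailing_whitespace(lines):
    """Return (1-based line number, trailing whitespace) for each offending line."""
    found = []
    for n, line in enumerate(lines, start=1):
        m = TRAILING_WS.search(line)
        if m:
            found.append((n, m.group()))
    return found


# A line shaped like the one reported above: the word, a space, then a tab.
report = trailing_whitespace(["пісня", "пісня \t", "сон"])
```

Reporting the exact whitespace characters, and not just the line number, makes a space-then-tab sequence like the one in the hex dump visible at a glance.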
It is fixed with the following command:
```
$ sed -E -e 's/[[:space:]]+$//' < ukrainian.txt > ukfix1/uk_fixed0.txt
```
After verification that this command made no other changes, the list is normalized and sorted:
```
$ ls -l ukrainian.txt ukfix1/uk_fixed0.txt
-rw-r--r-- 1 user user 24550 Jan 7 21:26 ukfix1/uk_fixed0.txt
-rw-r--r-- 1 user user 24552 Jan 7 20:31 ukrainian.txt
$ diff -u3 ukrainian.txt ukfix1/uk_fixed0.txt
[...showing only the desired line changed...]
$ uconv -f utf-8 -t utf-8 -x '::nfkd;' < uk_fixed0.txt | \
    LC_ALL=C LANG=C sort -s > uk_fixed1.txt
$ mv -i uk_fixed1.txt ../../easyseed/wordlist/ukrainian.txt
mv: overwrite '../../easyseed/wordlist/ukrainian.txt'? y
```
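Here is a rough Python 3 analogue of the uconv | sort pipeline, useful in environments without ICU's uconv. It is a sketch, and normalize_and_sort is a hypothetical name. Normalization has to come before sorting: decomposing "й" into "и" plus U+0306 changes where the word falls in byte order.

```python
import unicodedata


def normalize_and_sort(lines):
    """NFKD-normalize each line, then sort by UTF-8 bytes (C-locale order).

    Python's sort is stable, like `sort -s`.
    """
    normalized = [unicodedata.normalize("NFKD", line) for line in lines]
    return sorted(normalized, key=lambda s: s.encode("utf-8"))


# Composed, "йога" (U+0439...) sorts after "ирис"; once decomposed, its
# second code point U+0306 is lower than "р" (U+0440), so it sorts before.
result = normalize_and_sort(["ирис", "йога", "абрикос"])
```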
SHA-256 hash for the resulting ukrainian.txt:
```
612ee29e1fa13dc38c9e1b31c7ef980db8f3c8dd30f1c9377170d1b10e895dc9
```
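To compare a local copy against a published digest, hash the file's exact bytes. A single trailing space or tab changes the result. The sketch below uses the standard hashlib module; the helper names are hypothetical. sha256_lines assumes the file layout used here: one word per line, UTF-8, with a final newline after the last word.

```python
import hashlib


def sha256_file(path):
    """Hex SHA-256 of a file's raw bytes, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_lines(words):
    """Hex SHA-256 of the file that would hold `words`, one per line.

    Assumes UTF-8 and a trailing newline after the last word.
    """
    data = "".join(w + "\n" for w in words).encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

Usage: sha256_file("ukrainian.txt") should equal the digest posted for the fixed list if the local copy is byte-identical.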
-
nym-zone cross-referenced this on Jan 8, 2018 from issue Czech wordlist for BIP0039 by zizelevak
-
nym-zone referenced this in commit c7d698a35f on Jan 11, 2018
-
ldz1 cross-referenced this on Nov 14, 2018 from issue Remove uncompleted bip39 wordlists. by ldz1
-
DonaldTsang cross-referenced this on Dec 24, 2018 from issue Binary Lists by DonaldTsang
-
kittyandrew commented at 10:01 pm on June 14, 2021: none
Hello, I was about to start working on my own list for this when I found this PR. Can someone tell me what exactly needs to be fixed/updated/reviewed here?
Follow-up question: are there any preferences regarding nouns, verbs, and adjectives? There are at least a few special words that translate to "and", "or", "there", etc.
Edit: In addition, there are many closely related words and different forms of the same word; I'm working on those.
Edit 2: I will probably have to create a new PR in the end, because the current contributor is inactive.
-
luke-jr commented at 9:29 pm on July 2, 2021: member
-
luke-jr closed this on Jul 2, 2021
This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 05:10 UTC
More mirrored repositories can be found on mirror.b10c.me