Czech wordlist for BIP0039 #493
zizelevak wants to merge 15 commits into bitcoin:master from zizelevak:master, changing 2 files (+2064 −0)
zizelevak commented at 7:34 am on January 30, 2017: contributor
Consensus set of words after discussion in the Czech community in a Facebook group
-
Create Czech.txt e3ec9cf9db
-
Rename Czech.txt to czech.txt 5be84b01f1
-
Update bip-0039-wordlists.md 469a12b918
-
Update bip-0039-wordlists.md 9eefdc9151
-
Update czech.txt
Words are sorted according to the English alphabet (Czech sorting differs for “ch”)
-
Update bip-0039-wordlists.md 875f35a07a
-
Update czech.txt 1a3ec3a459
-
Update czech.txt 7ae1049232
-
Update czech.txt 92a07ee121
-
luke-jr added the label Proposed BIP modification on Jan 30, 2017
-
slush0 commented at 0:23 am on January 31, 2017: contributor
The wordlist passed our unit test in python-mnemonic. I especially like that the words don’t use diacritics. ACK from me.
-
paveljanik commented at 6:34 am on January 31, 2017: contributor
Is “nynfa” a Czech word? orchest? peleton?
-
Update czech.txt dab9d97b44
-
Update czech.txt f4691796c6
-
zizelevak commented at 7:19 am on January 31, 2017: contributor
@paveljanik Thanks, you found two mistakes. The correct forms are “nymfa” and “orchestr”. I fixed both. Peleton is OK according to the Institute of the Czech Language: http://ssjc.ujc.cas.cz/
-
paveljanik commented at 7:25 am on January 31, 2017: contributor
-
zizelevak commented at 7:34 am on January 31, 2017: contributor
@paveljanik We both use databases from the Institute of the Czech Language. You use only the handbook; I use the Dictionary of the Written Language, and my database is more complex. Meaning of peleton: the peloton (from French, originally meaning ‘platoon’) is the main group or pack of riders in a road bicycle race. Edit: this post now explains peleton (an older version of it explained nymfa).
-
xHire commented at 7:41 am on January 31, 2017: none
@zizelevak More complex may not always be correct, see for yourself: http://prirucka.ujc.cas.cz/?slovo=peloton – even the referenced SSJČ states that “peleton” is an incorrect form. By the way, Prirucka is currently the best normative reference available.
-
Update czech.txt b9f09aae63
-
zizelevak commented at 7:51 am on January 31, 2017: contributor
@xHire @paveljanik Peleton was removed, Piksla was added
-
paveljanik commented at 9:16 pm on February 10, 2017: contributor
-
paveljanik commented at 9:18 pm on February 10, 2017: contributor
ch after c?
-
paveljanik commented at 9:37 pm on February 10, 2017: contributor
What about limitace -> limit?
-
paveljanik commented at 9:38 pm on February 10, 2017: contributor
mincovna -> mince?
-
paveljanik commented at 9:53 pm on February 10, 2017: contributor
motivace -> motiv, motorka -> motor?
-
paveljanik commented at 9:54 pm on February 10, 2017: contributor
moudrost -> moudro
-
paveljanik commented at 9:56 pm on February 10, 2017: contributor
nikl? [nykl]: not found ;-) Is this a good word?
-
paveljanik commented at 9:57 pm on February 10, 2017: contributor
normativ -> norma
-
paveljanik commented at 9:57 pm on February 10, 2017: contributor
novotvar -> novota?
-
paveljanik commented at 9:59 pm on February 10, 2017: contributor
odolnost -> odolat
-
paveljanik commented at 9:59 pm on February 10, 2017: contributor
okovy -> okov
-
paveljanik commented at 10:01 pm on February 10, 2017: contributor
otrhanec -> otrhat
-
paveljanik commented at 10:07 pm on February 10, 2017: contributor
popisek -> popis
-
paveljanik commented at 10:09 pm on February 10, 2017: contributor
robotika -> robot (Remember Karel Čapek’s word ;-)
-
paveljanik commented at 10:13 pm on February 10, 2017: contributor
spornost -> spor
-
paveljanik commented at 10:14 pm on February 10, 2017: contributor
tankista -> tank?
-
zizelevak commented at 10:45 pm on February 10, 2017: contributor
changed bariera -> smaragd
ch after c? - The wordlist is sorted by the English alphabet, not by the Czech alphabet. That makes it simpler to implement this wordlist in wallet software.
“limit” - cannot be used, it is in the English wordlist
“mince” - cannot be used, it is in the French wordlist
“motivace” - I think it’s more suitable than “motiv”. Both words are of foreign origin, but “motivace” has the Czech suffix -ace.
“motor” - cannot be used, it is in the English wordlist
“moudro” - I don’t agree. “Moudrost” is 50 times more frequent than “moudro” according to the Czech corpus SYN2005.
“nikl” - it’s OK; it is a metal, the chemical element with atomic number 28
“norma” - cannot be used, it is in the Spanish wordlist
changed novotvar -> novota
“odolat” - not possible, it is similar to another word in the wordlist, “odvolat”
“okov” - it is part of a well; “okovy” means shackles. I prefer “okovy”, I think it’s more common.
changed otrhanec -> otrhat
“popis” - not possible, it is similar to another word in the wordlist, “dopis”
“robot” - cannot be used, it is in the English wordlist
“tank” - cannot be used, it is in the English wordlist
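The “ch after c?” exchange comes down to two different orderings. As a rough illustration only, the sketch below compares plain code-point order (the “English alphabet” order) with a simplified Czech collation in which the digraph “ch” ranks between “h” and “i”. The function `czech_key` and the four sample words are assumptions made for this example; they are not taken from the pull request, and the simplified collation only handles diacritic-free lowercase words.

```python
def czech_key(word: str) -> list:
    # Simplified Czech collation for diacritic-free lowercase words only:
    # the digraph "ch" counts as one letter ranked between "h" and "i".
    alphabet = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
    rank = {letter: i for i, letter in enumerate(alphabet)}
    key, i = [], 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(rank["ch"])
            i += 2
        else:
            key.append(rank[word[i]])
            i += 1
    return key


# Illustrative words, not necessarily present in czech.txt.
words = ["hrad", "chata", "citron", "cihla"]

english_order = sorted(words)                # plain code-point order
czech_order = sorted(words, key=czech_key)   # "ch" sorts after "h"

print(english_order)  # ['chata', 'cihla', 'citron', 'hrad']
print(czech_order)    # ['cihla', 'citron', 'hrad', 'chata']
```

With the English ordering, wallet software can compare words with an ordinary string comparison and needs no locale-specific collation.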
-
zizelevak closed this on Feb 10, 2017
-
zizelevak reopened this on Feb 10, 2017
-
Update czech.txt 702d651aa5
-
Update czech.txt 8a2a06e94c
-
paveljanik commented at 6:47 am on February 11, 2017: contributor
@zizelevak Thanks for updates! Can you now please squash?
I’m fine with the list now. Great work, BTW! 👍
-
xHire commented at 7:33 am on February 11, 2017: none
Ah, thanks for the reminder (via notification). I also had some comments/questions and will post them later today (probably in the evening)! :c)
-
xHire commented at 8:57 pm on February 11, 2017: none
I divide my comment into several groups to make it easier to work with. One thing to write first: while I make comments about some words, at the end I also provide a buffer of alternatives in case it is unclear which words could replace the problematic ones. (By the way, comments at the beginning of each section are mostly for non-Czech speaking followers.)
- Lhota (spelled as a name)
Infrequently used/known words
In this list I include also (not only) words that I have never heard of. :c) It doesn’t mean I can’t just look them up, but I suppose they are so rare that they might not be so good to be in this dictionary.
- falzum
- gondola
- karfiol
- luneta
- nefrit
- nestor
- normativ
- opuka
- pagoda
- ponton
- rytec
- sahel
- sutana
Words with different diacritics
There are some words in Czech that have dual spelling—one time without diacritics, other time with diacritics. Those I list below sound to me quite forced.
- chlor
- chrom
- folklor
- globus
- kasino
- kastrol
- lahev
- mixer → mixovat
- naftalen → nafta (but it’s probably already taken in another dictionary I suppose)
- ozon
- tampon
- vitamin → vitalita
Plural forms
As stated in README, words should be in their base forms which means they should be in singular.
- holinky → holinka
- lenilky → lentilka
- piliny → pilina
Words with (optional) spacing
So called „příslovečné spřežky“—words which are allowed to have their preposition transformed into prefix. Below are those I suppose are not so much more common in a single word form or might be confusing or don’t sound so well to me.
- nadlouho
- nadrobno
- natrvalo
- natvrdo
- navenek
- zdaleka
Minor modification proposals
So as to make some words sound more natural or clearer.
- dominant → dominanta
- hltan → hltat
- kmit → kmitat
- logicky → logika
- mulat → mula
- muzika → muzikant
- naposled → naposledy
- pasivum → pasivita
- roup → roupice
- vespod → vespodu
Forced forms of words
Technically correct, but (mostly) uncommon word forms (forced into these just to make them lose their diacritics).
- drtivost
- kalnost
- (kluzkost)
- (ladnost)
- levnost
- mokrost → mokro
- movitost
- suknice → sukno (suknice is really so old ;c))
More like informal forms and/or not sounding neutral
(I know some are (probably) correct, they just don’t sound ideal to me.)
- fabrika
- fanda
- fara → farnost
- fiflena
- fixa → fixace
- glejt
- hafan
- hezoun
- kafe
- machr
- marodka
- mejdan
- nimrod
- piksla
- smola
- spratek
- (tatarka)
- vatra
- vloni → vloha
- (zrzek)
Words I simply don’t like here ;c)
(Or I couldn’t decide in which other group to put them.)
- euro (not a Czech word and might also become deprecated soon)
- flirt → flamendr
- lepra
- (limitace)
- minibar → miniatura/minimalista (just wow, why to choose the word “minibar” among all those words prefixed with mini- :-D)
- (mocensky)
- nahota → nahodile/nahoru
- nevina
- (oktet)
- onkolog
- podle (more often is a preposition IMO)
- sekvoje (because it’s commonly spelled a tiny bit differently)
- sklivec
- tankista → tanker (if not already used elsewhere)
- tavenina → tavidlo
- tenor → teror/terorista
- tunika
- varan
Alternatives
I counted 64 words without a suggestion in the upper lists. The words below are already checked not to collide on prefix level and many also on a single letter difference level. I have 58 of them which is 6 words short… So I put 6 or so words above into parentheses to indicate their lower priority. ;c)
- borka
- celistvost
- deflace
- destilace
- dioda
- displej
- epopej
- firma
- fukar
- holokaust
- horda
- kabriolet
- kapybara
- klima/klimatizace
- kosmonaut
- kotoul
- kotrmelec
- kropit/tropit
- lamela
- litr
- lodivod
- lokomotiva
- loterie
- mela/melasa
- mydlinky
- nanometr
- nektarinka
- nora
- nutrie
- orangutan
- parabola
- peloton
- periskop
- pikolitr
- ponynka
- prahora (prahory? in this case, I’m tending to call it a plurale tantum, although it’s (strictly speaking) not the case)
- pranostika
- pruh/prut
- rakovina
- relevance
- rotoped
- rydlo
- seschnout
- sinusoida
- spokojenost
- stranou
- surikata
- tempo
- tiskopis
- titrace
- tranzistor
- traverza
- trend
- utiskovat
- vodivost
- vyrvat
- ziskovost
- zkontrolovat/zkonfiskovat
- zmutovat
I’m looking forward to hearing your opinion! :c)
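The “single letter difference level” check mentioned in the Alternatives section can be sketched roughly as follows. This is only an illustration under assumptions: the function name `within_one_edit` is made up here, and “one letter difference” is interpreted as at most one substitution, insertion, or deletion. The thread does not say how the check was actually performed.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insertion or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) word
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # exactly one substituted letter
    return a[i:] == b[i + 1:]          # exactly one extra letter in `b`


# Word pairs that appear elsewhere in this thread.
print(within_one_edit("popis", "dopis"))     # True (substitution)
print(within_one_edit("odolat", "odvolat"))  # True (insertion)
print(within_one_edit("mokro", "mokrost"))   # False (two letters apart)
```

Running such a check over every pair in a 2048-word list is cheap enough to do by brute force.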
-
zizelevak commented at 2:39 am on February 12, 2017: contributor
@xHire
Lhota - removed

falzum - removed
gondola - removed
karfiol - removed
luneta - removed
nefrit - removed
nestor - removed
normativ - removed
opuka - removed
pagoda - removed
ponton - removed
rytec - removed
sahel - removed
sutana - removed

chlor - removed
chrom - removed
folklor - removed
globus - removed
kasino - removed
kastrol - removed
lahev - removed
mixer → mixovat - OK, changed
naftalen → nafta - removed
ozon - removed
tampon - removed
vitamin → vitalita - OK, changed

holinky → holinka - OK, changed
lenilky → lentilka - OK, changed
piliny → pilina - OK, changed

nadlouho - removed
nadrobno - removed
natrvalo - removed
natvrdo - removed
navenek - keep it
zdaleka - keep it

dominant → dominanta - no, dominanta is too long
hltan → hltat - OK, changed
kmit → kmitat - OK, changed
logicky → logika - OK, changed
mulat → mula - no, mula is in the Spanish wordlist
muzika → muzikant - OK, changed
naposled → naposledy - no, naposledy is too long
pasivum → pasivita - OK, changed
roup → roupice - no, I can’t find roupice in the dictionary and I have never heard it
vespod → vespodu - OK, changed

drtivost - removed drtivost and zdrtit, added drtit
kalnost - removed
(kluzkost) - keep it
(ladnost) - keep it
levnost - removed
mokrost → mokro - OK, changed
movitost - removed
suknice → sukno - OK, changed

fabrika - removed
fanda - removed
fara → farnost - removed (farnost collides with marnost)
fiflena - removed
fixa → fixace - OK, changed
glejt - keep it, it’s an older word, not informal
hafan - removed
hezoun - removed
kafe - removed
machr - removed
marodka - removed
mejdan - removed
nimrod - removed
piksla - removed
smola - removed
spratek - removed
(tatarka) - removed
vatra - removed
vloni → vloha - removed
(zrzek) - changed to zrzavost

euro - removed
flirt → flamendr - keep it, flamendr is 6 times less frequent
lepra - removed
(limitace) - removed
minibar → miniatura/minimalista - keep it, your words are too long
(mocensky) - removed
nahota → nahodile/nahoru - changed
nevina - keep it
(oktet) - removed
onkolog - removed
podle - removed
sekvoje (because it’s commonly spelled a tiny bit differently)
sklivec - removed
tankista → tanker - changed
tavenina → tavidlo - keep it, I have never heard “tavidlo”
tenor → teror/terorista - keep it, I try to avoid words related to terrorism
tunika - removed
varan - keep it

borka - it’s very uncommon
celistvost - no, too many letters, 8 is the max limit
deflace - OK, added
destilace - no, too many letters, 8 is the max limit
dioda - OK, added
displej - OK, added
epopej - OK, added
firma - no, in the Spanish wordlist
fukar - OK, added
holokaust - no, too many letters, 8 is the max limit
horda - OK, added
kabriolet - no, too many letters, 8 is the max limit
kapybara - OK, added
klima - OK, added
kosmonaut - no, too many letters, 8 is the max limit
kotoul - OK, added
kotrmelec - no, too many letters, 8 is the max limit
kropit - OK, added
lamela - OK, added
litr - no, similar to “lotr”
lodivod - OK, added
lokomotiva - no, too many letters, 8 is the max limit
loterie - no, in the French wordlist
melasa - OK, added
mydlinky - no, it’s plural, and the singular “mydlinka” is uncommon
nanometr - OK, added
nektarinka - no, too many letters, 8 is the max limit
nora - OK, added
nutrie - OK, added
orangutan - no, too many letters, 8 is the max limit
parabola - no, in the Italian wordlist
peloton - OK, added
periskop - OK, added
pikolitr - no, it’s very uncommon
ponynka - no, I have never heard this word and did not find it in the dictionary
prahory - OK, added. The plural is OK, it’s the name of a geological epoch
pranostika - no, too many letters, 8 is the max limit
prut - OK, added
rakovina - OK, added
relevance - no, too many letters, 8 is the max limit
rotoped - OK, added
rydlo - OK, added
seschnout - no, too many letters, 8 is the max limit
sinusoida - no, too many letters, 8 is the max limit
spokojenost - no, too many letters, 8 is the max limit
stranou - no, first 4 letters same as “strach”
surikata - OK, added
tempo - no, in the Italian wordlist
tiskopis - OK, added
titrace - no, it’s very uncommon
tranzistor - no, too many letters, 8 is the max limit
traverza - OK, added
trend - no, in the English wordlist
utiskovat - no, too many letters, 8 is the max limit
vodivost - OK, added
vyrvat - OK, added
ziskovost - no, too many letters, 8 is the max limit
zkontrolovat/zkonfiskovat - no, too many letters, 8 is the max limit
zmutovat - OK, added

Added my new words: dynamit, reflex, arogance, abdikace, linoleum, zhatit, svodidlo, hematom, manko, lomcovat, termoska, glazura, amputace, anulovat, hluchota, aspirace, sediment, rarita, skafandr, reputace, unavit, sponzor, mahagon, rubrika, veranda, koalice
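The rejection reasons in this reply follow a few mechanical rules: at most 8 letters, not present in another language’s wordlist, and the first 4 letters not shared with a word already in the list. A minimal sketch of such a triage is below. The function `triage`, the tiny `accepted` excerpt and the `foreign` sets are assumptions for illustration; the real check would load the full BIP-0039 wordlists, and the thread does not show the author’s own test code.

```python
MAX_LEN = 8  # limit stated in the reply above


def triage(candidate, accepted, foreign):
    """Return a rejection reason, or None if the candidate passes these checks."""
    if len(candidate) > MAX_LEN:
        return "too many letters"
    for language, words in foreign.items():
        if candidate in words:
            return f"in {language} wordlist"
    for word in accepted:
        if word[:4] == candidate[:4]:
            return f"first 4 letters same as {word!r}"
    return None


# Tiny hypothetical excerpts; the examples themselves come from the reply above.
accepted = ["strach", "lotr"]
foreign = {"english": {"trend"}, "italian": {"tempo"}}

for candidate in ["celistvost", "trend", "tempo", "stranou", "dioda"]:
    print(candidate, "->", triage(candidate, accepted, foreign))
```

Judgement calls such as “very uncommon” or “I have never heard this word” are not mechanical and are left out of the sketch.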
-
Update czech.txt b7f682f702
-
luke-jr commented at 3:53 am on March 6, 2017: member
What’s the status of this?
-
zizelevak commented at 6:53 am on March 6, 2017: contributor
@luke-jr The original wordlist was checked by @slush0 and passed all tests. Some members advised changing some words that were problematic for linguistic reasons. I dealt with all requests. I think the current wordlist should be checked by @slush0 again (I am sure it will pass his tests, because I ran my own tests), and then it is ready to publish.
-
paveljanik commented at 7:30 am on March 6, 2017: contributor
@zizelevak What about aspirace and kapybara?
-
zizelevak commented at 7:41 am on March 6, 2017: contributor
@paveljanik In the Czech frequency corpus, aspirace has a frequency of 433 points and kapybara has 16 points. Aspirace is quite a common Czech word. Kapybara is not very frequent, but the list contains tens of words with a lower frequency than kapybara. I will keep both words in the list.
-
jonathancross commented at 2:15 am on November 7, 2017: contributor
Friendly ping @slush0
-
nym-zone referenced this in commit 8aaa6f37e8 on Jan 7, 2018
-
nym-zone referenced this in commit ba25dfac56 on Jan 7, 2018
-
nym-zone commented at 9:15 am on January 8, 2018: contributor
I have created a Unicode NFKD-normalized and sorted czech.txt from zizelevak/bips@b7f682f, as modified by approximately the following command:
    uconv -f utf-8 -t utf-8 -x '::nfkd;' < czech.txt | \
      LC_ALL=C LANG=C sort -s > normalized/czech.txt
The result has been confirmed to not have any leading BOM, and to have a final line terminated with '\n' (#622). I did not yet examine the source for these issues. I did examine the source to confirm that no lines had any trailing whitespace (see nym-zone/easyseed@08a05b4, #442).
SHA-256 hash for the resulting czech.txt:
    195136b3ba0f3099a9df625e0963f4efb56625b91c3a76bc5b4a9466a26880f7
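For readers without `uconv`, the same steps (NFKD normalization, byte-order sort, SHA-256) can be approximated in Python. This is a sketch, not the procedure used in the comment above: the function name `normalize_wordlist` is an assumption, blank lines are dropped (which the shell pipeline would not do), and no claim is made that its output reproduces the hash quoted above.

```python
import hashlib
import unicodedata


def normalize_wordlist(text: str) -> bytes:
    """NFKD-normalize each word, sort by raw UTF-8 bytes, newline-terminate every line."""
    words = [unicodedata.normalize("NFKD", w) for w in text.split("\n") if w]
    # Sorting on the encoded bytes mimics the byte order of `LC_ALL=C sort`.
    words.sort(key=lambda w: w.encode("utf-8"))
    return "".join(w + "\n" for w in words).encode("utf-8")


# Hypothetical two-word input standing in for the real czech.txt.
data = normalize_wordlist("zhatit\nabdikace\n")
digest = hashlib.sha256(data).hexdigest()
print(data)    # b'abdikace\nzhatit\n'
print(digest)  # 64 hexadecimal characters
```

To process the real file, read it with `open("czech.txt", encoding="utf-8").read()` and write the returned bytes in binary mode so no newline translation occurs.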
-
nym-zone referenced this in commit c7d698a35f on Jan 11, 2018
-
slush0 commented at 0:27 am on August 2, 2018: contributor
Sorry guys, I completely missed this. If the wordlist still passes the bip39 tests, I’m fine with it (I didn’t test it myself).
-
DonaldTsang cross-referenced this on Dec 24, 2018 from issue Binary Lists by DonaldTsang
-
DonaldTsang commented at 1:41 am on August 22, 2019: none
Cross-reference https://github.com/bitcoin/bips/pull/753
-
DonaldTsang cross-referenced this on Aug 22, 2019 from issue Polish wordlist for BIP0039 by p2w34
-
luke-jr merged this on Sep 19, 2019
-
luke-jr closed this on Sep 19, 2019
-
tevador commented at 5:42 pm on November 16, 2021: none
I’m not sure how this was missed, but the order of words in the wordlist is not alphabetical. The word “svetr” should come after “svazek”.
The fact that the list looks sorted but is not may cause subtle bugs in applications using binary search to look up code words.
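The binary-search concern can be reproduced with a small sketch. The helper names `unsorted_pairs` and `bisect_lookup` are made up for this example, and the four-word excerpt is hypothetical: only the relative position of “svetr” before “svazek” comes from the comment above, while “sval” and “svitek” are invented neighbours.

```python
from bisect import bisect_left


def unsorted_pairs(words):
    """Return adjacent pairs that are out of code-point order."""
    return [(a, b) for a, b in zip(words, words[1:]) if a > b]


def bisect_lookup(words, word):
    """Binary-search lookup; only correct if `words` is truly sorted."""
    i = bisect_left(words, word)
    return i if i < len(words) and words[i] == word else None


# Hypothetical excerpt with "svetr" placed before "svazek".
excerpt = ["sval", "svetr", "svazek", "svitek"]

print(unsorted_pairs(excerpt))           # [('svetr', 'svazek')]
print(bisect_lookup(excerpt, "svazek"))  # None, although the word is present
print(excerpt.index("svazek"))           # 2, a linear scan still finds it
```

Applications that build a dictionary from word to index, or scan linearly, are unaffected by the ordering; only lookups that assume a sorted list can miss valid words.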
-
iancoleman cross-referenced this on Nov 16, 2021 from issue Czech list is not alphabetically ordered by iancoleman
This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 03:10 UTC