Czech wordlist for BIP0039 #493

pull zizelevak wants to merge 15 commits into bitcoin:master from zizelevak:master changing 2 files +2064 −0
  1. zizelevak commented at 7:34 am on January 30, 2017: contributor
    Consensus set of words, agreed after discussion in the Czech community in a Facebook group.
  2. Create Czech.txt e3ec9cf9db
  3. Rename Czech.txt to czech.txt 5be84b01f1
  4. Update bip-0039-wordlists.md 469a12b918
  5. Update bip-0039-wordlists.md 9eefdc9151
  6. Update czech.txt
     Words are sorted according to the English alphabet (Czech sorting differs in how it treats “ch”)
    af07fb993e
  7. Update bip-0039-wordlists.md 875f35a07a
  8. Update czech.txt 1a3ec3a459
  9. Update czech.txt 7ae1049232
  10. Update czech.txt 92a07ee121
  11. luke-jr added the label Proposed BIP modification on Jan 30, 2017
  12. luke-jr commented at 5:13 pm on January 30, 2017: member
  13. slush0 commented at 0:23 am on January 31, 2017: contributor
    The wordlist passed our unit test in python-mnemonic. I especially like that the words don’t use diacritics. ACK from me.
  14. paveljanik commented at 6:34 am on January 31, 2017: contributor
    Is “nynfa” a Czech word? “orchest”? “peleton”?
  15. Update czech.txt dab9d97b44
  16. Update czech.txt f4691796c6
  17. zizelevak commented at 7:19 am on January 31, 2017: contributor
    @paveljanik Thanks, you found two mistakes. The correct forms are “nymfa” and “orchestr”, and I have fixed both. “Peleton” is OK according to the Institute of the Czech Language: http://ssjc.ujc.cas.cz/
  18. paveljanik commented at 7:25 am on January 31, 2017: contributor
  19. zizelevak commented at 7:34 am on January 31, 2017: contributor
    @paveljanik We both use databases from the Institute of the Czech Language. You use only the handbook; I use the Dictionary of the Written Language, so my database is more comprehensive. Meaning of peleton: The peloton (from French, originally meaning ‘platoon’) is the main group or pack of riders in a road bicycle race. Edit: this post now explains peleton (an older version explained nymfa).
  20. xHire commented at 7:41 am on January 31, 2017: none
    @zizelevak More complex is not always more correct; see for yourself at http://prirucka.ujc.cas.cz/?slovo=peloton. Even the SSJČ you referenced states that “peleton” is an incorrect form. By the way, Prirucka is currently the best normative reference available.
  21. Update czech.txt b9f09aae63
  22. zizelevak commented at 7:51 am on January 31, 2017: contributor
    @xHire @paveljanik Peleton was removed and piksla was added.
  23. paveljanik commented at 9:16 pm on February 10, 2017: contributor
  24. paveljanik commented at 9:18 pm on February 10, 2017: contributor
    ch after c?
  25. paveljanik commented at 9:37 pm on February 10, 2017: contributor
    What about limitace -> limit?
  26. paveljanik commented at 9:38 pm on February 10, 2017: contributor
    mincovna -> mince?
  27. paveljanik commented at 9:53 pm on February 10, 2017: contributor
    motivace -> motiv, motorka -> motor?
  28. paveljanik commented at 9:54 pm on February 10, 2017: contributor
    moudrost -> moudro
  29. paveljanik commented at 9:56 pm on February 10, 2017: contributor
    nikl? [nykl]: not found ;-) Is this a good word?
  30. paveljanik commented at 9:57 pm on February 10, 2017: contributor
    normativ -> norma
  31. paveljanik commented at 9:57 pm on February 10, 2017: contributor
    novotvar -> novota?
  32. paveljanik commented at 9:59 pm on February 10, 2017: contributor
    odolnost -> odolat
  33. paveljanik commented at 9:59 pm on February 10, 2017: contributor
    okovy -> okov
  34. paveljanik commented at 10:01 pm on February 10, 2017: contributor
    otrhanec -> otrhat
  35. paveljanik commented at 10:07 pm on February 10, 2017: contributor
    popisek -> popis
  36. paveljanik commented at 10:09 pm on February 10, 2017: contributor
    robotika -> robot (Remember Karel Čapek’s word ;-)
  37. paveljanik commented at 10:13 pm on February 10, 2017: contributor
    spornost -> spor
  38. paveljanik commented at 10:14 pm on February 10, 2017: contributor
    tankista -> tank?
  39. zizelevak commented at 10:45 pm on February 10, 2017: contributor

    @paveljanik

    changed bariera –> smaragd

    ch after c? The wordlist is sorted by the English alphabet, not the Czech alphabet. That makes it simpler to implement in wallet software.
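To make the “ch after c?” question concrete: in Czech alphabetical order the digraph “ch” counts as a letter of its own that comes after “h”, whereas a plain code-point sort (which is what English order amounts to for a diacritic-free list) puts “ch” between “cg” and “ci”. The sketch below is a simplified illustration, not a full Czech collation: it handles lowercase ASCII only, `czech_key` is a hypothetical helper, and the sample words are illustrative rather than taken from the final list.

```python
# Simplified Czech ordering: "ch" is one letter that sorts after "h".
# Handles lowercase ASCII only, which is all a diacritic-free list needs;
# letters with diacritics are deliberately not modelled.
CZECH_ORDER = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {letter: rank for rank, letter in enumerate(CZECH_ORDER)}


def czech_key(word: str) -> list[int]:
    """Hypothetical sort key that treats the digraph "ch" as a single letter."""
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            key.append(RANK[word[i]])
            i += 1
    return key


words = ["ihned", "hrad", "cukr", "chlor"]  # illustrative sample words

# Code-point order (what the wordlist uses): "ch" sits between "cg" and "ci".
print(sorted(words))                 # ['chlor', 'cukr', 'hrad', 'ihned']
# Czech order: "ch" comes after every word starting with "h".
print(sorted(words, key=czech_key))  # ['cukr', 'hrad', 'chlor', 'ihned']
```

A plain `sorted()` or `LC_ALL=C sort` needs no locale data at all, which is presumably what makes English order the simpler choice for wallet software.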

    limit - cannot be used, it is already in the English wordlist

    mince - cannot be used, it is already in the French wordlist

    “Motivace” - I think it is more suitable than “motiv”. Both words have a foreign origin, but “motivace” has the Czech suffix -ace.

    motor - cannot be used, it is already in the English wordlist

    “moudro” - I don’t agree. “Moudrost” is 50 times more frequent than “moudro” according to the Czech corpus SYN2005.

    nikl - it’s OK, it is a metal (nickel), the chemical element with atomic number 28

    norma - cannot be used, it is already in the Spanish wordlist

    changed Novotvar –> novota

    “odolat” - not possible, it is too similar to another word in the wordlist, “odvolat”

    “okov” - this is part of a well; “okovy” means shackles. I prefer “okovy”, I think it is more common.

    changed otrhanec –> otrhat

    “popis” - not possible, it is too similar to another word in the wordlist, “dopis”

    “robot” - cannot be used, it is already in the English wordlist

    “tank” - cannot be used, it is already in the English wordlist
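Several replies in this thread reject a word for being too similar to one already in the list (odolat/odvolat, popis/dopis, and later litr/lotr). The thread does not say how zizelevak's checker measures similarity, so the sketch below assumes that “similar” means a Levenshtein edit distance of at most 1 (one inserted, deleted, or substituted letter). `edit_distance` and `near_duplicates` are hypothetical helpers, not part of any BIP tooling.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters match)
            ))
        prev = cur
    return prev[-1]


def near_duplicates(words: list[str], max_distance: int = 1) -> list[tuple[str, str]]:
    """Every pair of words that are at most max_distance edits apart."""
    return [
        (a, b)
        for a, b in combinations(words, 2)
        # Length difference is a lower bound on the distance, so skip early.
        if abs(len(a) - len(b)) <= max_distance
        and edit_distance(a, b) <= max_distance
    ]


# Word pairs mentioned in this thread, plus one unrelated word.
candidates = ["dopis", "odvolat", "popis", "odolat", "lotr", "litr", "dioda"]
print(near_duplicates(candidates))
# [('dopis', 'popis'), ('odvolat', 'odolat'), ('lotr', 'litr')]
```

Comparing all pairs is quadratic, about two million pairs for 2048 words. The length check discards most of them cheaply, because words whose lengths differ by more than `max_distance` cannot be within that distance.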

  40. zizelevak closed this on Feb 10, 2017

  41. zizelevak reopened this on Feb 10, 2017

  42. Update czech.txt 702d651aa5
  43. Update czech.txt 8a2a06e94c
  44. paveljanik commented at 6:47 am on February 11, 2017: contributor

    @zizelevak Thanks for updates! Can you now please squash?

    I’m fine with the list now. Great work, BTW! 👍

  45. xHire commented at 7:33 am on February 11, 2017: none
    Ah, thanks for the reminder (via notification). I also have some comments/questions and will post them later today (probably in the evening)! :c)
  46. xHire commented at 8:57 pm on February 11, 2017: none

    I have divided my comment into several groups to make it easier to work with. One thing first: besides commenting on individual words, at the end I also provide a buffer of alternatives, in case it is unclear which words could replace the problematic ones. (By the way, the comments at the beginning of each section are mostly for non-Czech-speaking followers.)

    • Lhota (spelled as a name)

    Infrequently used/known words

    This list includes (among others) words that I have never heard of. :c) That doesn’t mean I can’t look them up, but I suppose they are rare enough that they might not be good choices for this dictionary.

    • falzum
    • gondola
    • karfiol
    • luneta
    • nefrit
    • nestor
    • normativ
    • opuka
    • pagoda
    • ponton
    • rytec
    • sahel
    • sutana

    Words with different diacritics

    Some Czech words have two accepted spellings, one without diacritics and one with them. The diacritic-free spellings listed below sound quite forced to me.

    • chlor
    • chrom
    • folklor
    • globus
    • kasino
    • kastrol
    • lahev
    • mixer → mixovat
    • naftalen → nafta (but I suppose it’s already taken in another dictionary)
    • ozon
    • tampon
    • vitamin → vitalita

    Plural forms

    As stated in the README, words should be in their base forms, which means they should be singular.

    • holinky → holinka
    • lentilky → lentilka
    • piliny → pilina

    Words with (optional) spacing

    So-called „příslovečné spřežky“ (adverbial compounds): words whose preposition is allowed to be merged into them as a prefix. Below are those that I suppose are not much more common in the single-word form, or that might be confusing or don’t sound good to me.

    • nadlouho
    • nadrobno
    • natrvalo
    • natvrdo
    • navenek
    • zdaleka

    Minor modification proposals

    So as to make some words sound more natural or clearer.

    • dominant → dominanta
    • hltan → hltat
    • kmit → kmitat
    • logicky → logika
    • mulat → mula
    • muzika → muzikant
    • naposled → naposledy
    • pasivum → pasivita
    • roup → roupice
    • vespod → vespodu

    Forced forms of words

    Technically correct, but (mostly) uncommon word forms (forced into these just to make them lose their diacritics).

    • drtivost
    • kalnost
    • (kluzkost)
    • (ladnost)
    • levnost
    • mokrost → mokro
    • movitost
    • suknice → sukno (suknice is really old-fashioned ;c))

    Informal forms and/or words that don’t sound neutral

    (I know some are (probably) correct, they just don’t sound ideal to me.)

    • fabrika
    • fanda
    • fara → farnost
    • fiflena
    • fixa → fixace
    • glejt
    • hafan
    • hezoun
    • kafe
    • machr
    • marodka
    • mejdan
    • nimrod
    • piksla
    • smola
    • spratek
    • (tatarka)
    • vatra
    • vloni → vloha
    • (zrzek)

    Words I simply don’t like here ;c)

    (Or I couldn’t decide in which other group to put them.)

    • euro (not a Czech word and might also become deprecated soon)
    • flirt → flamendr
    • lepra
    • (limitace)
    • minibar → miniatura/minimalista (just wow, why choose the word “minibar” among all the words prefixed with mini- :-D)
    • (mocensky)
    • nahota → nahodile/nahoru
    • nevina
    • (oktet)
    • onkolog
    • podle (more often a preposition IMO)
    • sekvoje (because it’s commonly spelled a tiny bit differently)
    • sklivec
    • tankista → tanker (if not already used elsewhere)
    • tavenina → tavidlo
    • tenor → teror/terorista
    • tunika
    • varan

    Alternatives

    I counted 64 words without a suggested replacement in the lists above. The words below have already been checked not to collide at the prefix level, and many also at the single-letter-difference level. I have 58 of them, which is 6 words short… So I put 6 or so of the words above in parentheses to indicate their lower priority. ;c)

    • borka
    • celistvost
    • deflace
    • destilace
    • dioda
    • displej
    • epopej
    • firma
    • fukar
    • holokaust
    • horda
    • kabriolet
    • kapybara
    • klima/klimatizace
    • kosmonaut
    • kotoul
    • kotrmelec
    • kropit/tropit
    • lamela
    • litr
    • lodivod
    • lokomotiva
    • loterie
    • mela/melasa
    • mydlinky
    • nanometr
    • nektarinka
    • nora
    • nutrie
    • orangutan
    • parabola
    • peloton
    • periskop
    • pikolitr
    • ponynka
    • prahora (prahory? In this case I’m inclined to call it a plurale tantum, although strictly speaking it isn’t one)
    • pranostika
    • pruh/prut
    • rakovina
    • relevance
    • rotoped
    • rydlo
    • seschnout
    • sinusoida
    • spokojenost
    • stranou
    • surikata
    • tempo
    • tiskopis
    • titrace
    • tranzistor
    • traverza
    • trend
    • utiskovat
    • vodivost
    • vyrvat
    • ziskovost
    • zkontrolovat/zkonfiskovat
    • zmutovat

    I’m looking forward to hearing your opinion! :c)

  47. zizelevak commented at 2:39 am on February 12, 2017: contributor

    @xHire Lhota - removed

    • falzum - removed
    • gondola - removed
    • karfiol - removed
    • luneta - removed
    • nefrit - removed
    • nestor - removed
    • normativ - removed
    • opuka - removed
    • pagoda - removed
    • ponton - removed
    • rytec - removed
    • sahel - removed
    • sutana - removed

    • chlor - removed
    • chrom - removed
    • folklor - removed
    • globus - removed
    • kasino - removed
    • kastrol - removed
    • lahev - removed
    • mixer → mixovat - OK, changed
    • naftalen → nafta - removed
    • ozon - removed
    • tampon - removed
    • vitamin → vitalita - OK, changed

    • holinky → holinka - OK, changed
    • lentilky → lentilka - OK, changed
    • piliny → pilina - OK, changed

    • nadlouho - removed
    • nadrobno - removed
    • natrvalo - removed
    • natvrdo - removed
    • navenek - keep it
    • zdaleka - keep it

    • dominant → dominanta - no, dominanta is too long
    • hltan → hltat - OK, changed
    • kmit → kmitat - OK, changed
    • logicky → logika - OK, changed
    • mulat → mula - no, mula is in the Spanish wordlist
    • muzika → muzikant - OK, changed
    • naposled → naposledy - no, naposledy is too long
    • pasivum → pasivita - OK, changed
    • roup → roupice - no, I didn’t find roupice in the dictionary and have never heard it
    • vespod → vespodu - OK, changed

    • drtivost - removed drtivost and zdrtit, added drtit
    • kalnost - removed
    • (kluzkost) - keep it
    • (ladnost) - keep it
    • levnost - removed
    • mokrost → mokro - OK, changed
    • movitost - removed
    • suknice → sukno - OK, changed

    • fabrika - removed
    • fanda - removed
    • fara → farnost - removed (farnost collides with marnost)
    • fiflena - removed
    • fixa → fixace - OK, changed
    • glejt - keep it, it’s an older word, not informal
    • hafan - removed
    • hezoun - removed
    • kafe - removed
    • machr - removed
    • marodka - removed
    • mejdan - removed
    • nimrod - removed
    • piksla - removed
    • smola - removed
    • spratek - removed
    • (tatarka) - removed
    • vatra - removed
    • vloni → vloha - removed
    • (zrzek) - changed to zrzavost

    • euro - removed
    • flirt → flamendr - keep it, flamendr is 6 times less frequent
    • lepra - removed
    • (limitace) - removed
    • minibar → miniatura/minimalista - keep it, your words are too long
    • (mocensky) - removed
    • nahota → nahodile/nahoru - changed
    • nevina - keep it
    • (oktet) - removed
    • onkolog - removed
    • podle - removed
    • sekvoje (because it’s commonly spelled a tiny bit differently)
    • sklivec - removed
    • tankista → tanker - changed
    • tavenina → tavidlo - keep it, I have never heard “tavidlo”
    • tenor → teror/terorista - keep it, I try to avoid words related to terrorism
    • tunika - removed
    • varan - keep it

    • borka - it’s very uncommon
    • celistvost - no, too many letters, 8 is the max limit
    • deflace - OK, added
    • destilace - no, too many letters, 8 is the max limit
    • dioda - OK, added
    • displej - OK, added
    • epopej - OK, added
    • firma - no, in the Spanish wordlist
    • fukar - OK, added
    • holokaust - no, too many letters, 8 is the max limit
    • horda - OK, added
    • kabriolet - no, too many letters, 8 is the max limit
    • kapybara - OK, added
    • klima - OK, added
    • kosmonaut - no, too many letters, 8 is the max limit
    • kotoul - OK, added
    • kotrmelec - no, too many letters, 8 is the max limit
    • kropit - OK, added
    • lamela - OK, added
    • litr - no, too similar to “lotr”
    • lodivod - OK, added
    • lokomotiva - no, too many letters, 8 is the max limit
    • loterie - no, in the French wordlist
    • melasa - OK, added
    • mydlinky - no, it’s plural, and the singular “mydlinka” is uncommon
    • nanometr - OK, added
    • nektarinka - no, too many letters, 8 is the max limit
    • nora - OK, added
    • nutrie - OK, added
    • orangutan - no, too many letters, 8 is the max limit
    • parabola - no, in the Italian wordlist
    • peloton - OK, added
    • periskop - OK, added
    • pikolitr - no, it’s very uncommon
    • ponynka - no, I have never heard this word and didn’t find it in the dictionary
    • prahory - OK, added; the plural is OK, it’s the name of a geological epoch
    • pranostika - no, too many letters, 8 is the max limit
    • prut - OK, added
    • rakovina - OK, added
    • relevance - no, too many letters, 8 is the max limit
    • rotoped - OK, added
    • rydlo - OK, added
    • seschnout - no, too many letters, 8 is the max limit
    • sinusoida - no, too many letters, 8 is the max limit
    • spokojenost - no, too many letters, 8 is the max limit
    • stranou - no, its first 4 letters are the same as “strach”
    • surikata - OK, added
    • tempo - no, in the Italian wordlist
    • tiskopis - OK, added
    • titrace - no, it’s very uncommon
    • tranzistor - no, too many letters, 8 is the max limit
    • traverza - OK, added
    • trend - no, in the English wordlist
    • utiskovat - no, too many letters, 8 is the max limit
    • vodivost - OK, added
    • vyrvat - OK, added
    • ziskovost - no, too many letters, 8 is the max limit
    • zkontrolovat/zkonfiskovat - no, too many letters, 8 is the max limit
    • zmutovat - OK, added
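Two of the rules applied in this reply, the 8-letter maximum and “first 4 letters same as strach”, are mechanical and easy to check automatically. BIP-0039 itself says the wordlist should be built so that typing the first four letters is enough to identify a word unambiguously; the 8-letter cap is the limit quoted in this thread, and the lowercase-ASCII check reflects the no-diacritics property praised earlier in the thread rather than a BIP rule (an assumption). This is a minimal sketch with hypothetical names, not the checker zizelevak actually used.

```python
MAX_LEN = 8     # length limit quoted in this thread
PREFIX_LEN = 4  # BIP-0039: first four letters should identify a word


def check_words(words: list[str]) -> list[str]:
    """Report words that break the length, prefix or plain-ASCII rules."""
    problems = []
    first_with_prefix: dict[str, str] = {}
    for word in words:
        # Assumption: the list should contain only lowercase a-z letters.
        if not (word.isascii() and word.isalpha() and word.islower()):
            problems.append(f"{word}: not plain lowercase ASCII")
        if len(word) > MAX_LEN:
            problems.append(f"{word}: {len(word)} letters, limit is {MAX_LEN}")
        # Words shorter than PREFIX_LEN are compared as a whole.
        prefix = word[:PREFIX_LEN]
        if prefix in first_with_prefix:
            problems.append(
                f"{word}: same first {PREFIX_LEN} letters as {first_with_prefix[prefix]}"
            )
        else:
            first_with_prefix[prefix] = word
    return problems


sample = ["strach", "stranou", "destilace", "dioda", "láhev"]
for problem in check_words(sample):
    print(problem)
# stranou: same first 4 letters as strach
# destilace: 9 letters, limit is 8
# láhev: not plain lowercase ASCII
```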

    Added my new words: dynamit, reflex, arogance, abdikace, linoleum, zhatit, svodidlo, hematom, manko, lomcovat, termoska, glazura, amputace, anulovat, hluchota, aspirace, sediment, rarita, skafandr, reputace, unavit, sponzor, mahagon, rubrika, veranda, koalice

  48. Update czech.txt b7f682f702
  49. luke-jr commented at 3:53 am on March 6, 2017: member
    What’s the status of this?
  50. zizelevak commented at 6:53 am on March 6, 2017: contributor
    @luke-jr The original wordlist was checked by @slush0 and passed all tests. Some members advised changing certain words that were problematic for linguistic reasons, and I have dealt with all the requests. I think the current wordlist should be checked by @slush0 again (I am sure it will pass his tests, because I ran my own tests), and then it is ready for publication.
  51. paveljanik commented at 7:30 am on March 6, 2017: contributor
    @zizelevak what about aspirace and kapybara?
  52. zizelevak commented at 7:41 am on March 6, 2017: contributor
    @paveljanik In the Czech frequency corpus, aspirace has a frequency of 433 points and kapybara has 16 points. Aspirace is quite a common Czech word. Kapybara is not very frequent, but the list contains tens of words with lower frequency than kapybara. I will keep both words in the list.
  53. zizelevak commented at 1:37 pm on July 14, 2017: contributor
    @luke-jr I think that this version is final.
  54. luke-jr commented at 5:18 am on July 26, 2017: member
    @slush0 Can you re-ACK the changed version?
  55. jonathancross commented at 2:15 am on November 7, 2017: contributor
    Friendly ping @slush0
  56. nym-zone referenced this in commit 8aaa6f37e8 on Jan 7, 2018
  57. nym-zone referenced this in commit ba25dfac56 on Jan 7, 2018
  58. nym-zone commented at 9:15 am on January 8, 2018: contributor

    I have created a Unicode NFKD-normalized and sorted czech.txt from zizelevak/bips@b7f682f, as modified by approximately the following command:

    uconv -f utf-8 -t utf-8 -x '::nfkd;' < czech.txt | \
      LC_ALL=C LANG=C sort -s > normalized/czech.txt
    

    The result has been confirmed to have no leading BOM and to have its final line terminated with '\n' (#622). I have not yet examined the source for these issues. I did examine the source to confirm that no lines had any trailing whitespace (see nym-zone/easyseed@08a05b4, #442).
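The three file-level properties mentioned here (no leading BOM, a final '\n', no trailing whitespace) can be checked on the raw bytes. Reading bytes rather than decoded text matters: Python's `utf-8-sig` codec, for example, would silently strip a BOM before any check ran. This is an illustrative sketch with a hypothetical function name, not the tooling nym-zone used.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def file_hygiene(data: bytes) -> list[str]:
    """List BOM, missing-final-newline and trailing-whitespace problems."""
    issues = []
    if data.startswith(UTF8_BOM):
        issues.append("leading BOM")
    if data and not data.endswith(b"\n"):
        issues.append("last line not terminated with \\n")
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        # bytes.rstrip() removes ASCII whitespace, including a CR from CRLF.
        if line != line.rstrip():
            issues.append(f"trailing whitespace on line {lineno}")
    return issues


# Usage (path is an assumption): read raw bytes so a BOM is not decoded away.
# from pathlib import Path
# print(file_hygiene(Path("czech.txt").read_bytes()))
print(file_hygiene(b"abdikace\nabeceda\n"))  # []
print(file_hygiene(UTF8_BOM + b"abdikace"))  # ['leading BOM', 'last line not terminated with \\n']
```

Because `\r` counts as whitespace, a file with CRLF line endings is reported as having trailing whitespace on every line, which is usually what you want for a wordlist.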

    SHA-256 hash for the resulting czech.txt:

    195136b3ba0f3099a9df625e0963f4efb56625b91c3a76bc5b4a9466a26880f7
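For readers without `uconv`, here is a rough Python equivalent of the pipeline above: NFKD-normalize every line, sort by UTF-8 bytes (which is what `LC_ALL=C sort` compares, and which matches code-point order because UTF-8 preserves it), then hash the result. It is a sketch that assumes one word per line; it has not been run against zizelevak/bips@b7f682f, so whether it reproduces the hash above is untested.

```python
import hashlib
import unicodedata


def normalize_and_sort(text: str) -> bytes:
    """NFKD-normalize each line, then sort by UTF-8 bytes like `LC_ALL=C sort`."""
    lines = [
        unicodedata.normalize("NFKD", line).encode("utf-8")
        for line in text.splitlines()  # assumes one word per line
    ]
    # Bytewise order equals code-point order for UTF-8; sort() is stable,
    # like `sort -s`, and keeps duplicates.
    lines.sort()
    # Every line, including the last, ends with "\n".
    return b"".join(line + b"\n" for line in lines)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Usage (untested against the real file; path is an assumption):
# from pathlib import Path
# normalized = normalize_and_sort(Path("czech.txt").read_text(encoding="utf-8"))
# print(sha256_hex(normalized))

print(normalize_and_sort("zebra\nabc\ncaf\u00e9\n"))
# b'abc\ncafe\xcc\x81\nzebra\n'
```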
    
  59. nym-zone referenced this in commit c7d698a35f on Jan 11, 2018
  60. slush0 commented at 0:27 am on August 2, 2018: contributor
    Sorry guys, I completely missed this. If the wordlist still passes bip39 tests, I’m fine with that (I didn’t test it myself).
  61. zizelevak commented at 6:27 am on August 2, 2018: contributor
    @slush0 @luke-jr Every version (including the last one) was checked by my program, which verifies the BIP 0039 rules, including no collisions with other BIP 0039 dictionaries.
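The cross-dictionary rule applied throughout this thread (limit, motor, robot and tank in the English list, mince in the French list, norma in the Spanish list) amounts to a set intersection. A sketch, assuming the other wordlists are available as one-word-per-line UTF-8 files; the inline lists below are tiny stand-ins containing only words the thread names as collisions, not the real files. Words are NFKD-normalized on load so they match the form of nym-zone's normalized file; whether two words differing only in accents should also count as a collision is not settled in the thread. All function names are hypothetical.

```python
import unicodedata
from pathlib import Path


def load_wordlist(path: Path) -> list[str]:
    """Read a one-word-per-line UTF-8 file and NFKD-normalize each word."""
    text = path.read_text(encoding="utf-8")
    return [unicodedata.normalize("NFKD", w) for w in text.split()]


def cross_list_collisions(
    candidate: list[str], others: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Map each other wordlist's name to the candidate words it also contains."""
    cand = set(candidate)
    result = {}
    for name, words in others.items():
        overlap = sorted(cand.intersection(words))
        if overlap:
            result[name] = overlap
    return result


# Tiny inline stand-ins holding only words this thread names as collisions.
sample_others = {
    "english": ["limit", "motor", "robot", "tank"],
    "french": ["mince"],
    "spanish": ["norma"],
}
sample_candidate = ["limit", "mince", "motivace", "norma", "nikl"]
print(cross_list_collisions(sample_candidate, sample_others))
# {'english': ['limit'], 'french': ['mince'], 'spanish': ['norma']}
```

Against real files, `others` could be built with `load_wordlist` over each wordlist file in the repository, assuming they sit together in one directory (for example a `bip-0039/` folder); that layout is an assumption, not something stated in this thread.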
  62. DonaldTsang cross-referenced this on Dec 24, 2018 from issue Binary Lists by DonaldTsang
  63. DonaldTsang commented at 1:41 am on August 22, 2019: none
  64. DonaldTsang cross-referenced this on Aug 22, 2019 from issue Polish wordlist for BIP0039 by p2w34
  65. luke-jr merged this on Sep 19, 2019
  66. luke-jr closed this on Sep 19, 2019

  67. tevador commented at 5:42 pm on November 16, 2021: none

    I’m not sure how this was missed, but the order of words in the wordlist is not alphabetical. The word “svetr” should come after “svazek”.

    The fact that the list looks sorted but is not may cause subtle bugs in applications using binary search to look up code words.
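The failure mode described here is easy to reproduce. Binary search (Python's `bisect`, in this sketch) assumes its input is sorted; on the two-word fragment “svetr”, “svazek” in the reported order it misses both words. In BIP-0039 a word's position in the file is its 11-bit value, so an implementation cannot simply re-sort the list either; a dictionary from word to file position avoids both problems. The helper names are hypothetical.

```python
from bisect import bisect_left
from typing import Optional


def out_of_order(words: list[str]) -> list[tuple[int, str, str]]:
    """(index, word, next_word) wherever a word sorts after its successor."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(words, words[1:]))
        if a > b
    ]


def binary_lookup(words: list[str], word: str) -> Optional[int]:
    """Binary search; only correct when `words` is sorted."""
    i = bisect_left(words, word)
    if i < len(words) and words[i] == word:
        return i
    return None


fragment = ["svetr", "svazek"]  # two-word fragment in the reported order
print(out_of_order(fragment))             # [(0, 'svetr', 'svazek')]
print(binary_lookup(fragment, "svazek"))  # None, although the word is present
print(binary_lookup(fragment, "svetr"))   # None as well

# Safer: map each word to its position in the file; no ordering assumed.
index = {word: pos for pos, word in enumerate(fragment)}
print(index["svazek"])  # 1
```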

  68. iancoleman cross-referenced this on Nov 16, 2021 from issue Czech list is not alphabetically ordered by iancoleman

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-28 02:10 UTC
