Adding Polish wordlist to BIP39 #1037

pull KarolTrzeszczkowski wants to merge 4 commits into bitcoin:master from KarolTrzeszczkowski:master changing 2 files +2067 −0
  1. KarolTrzeszczkowski commented at 8:32 pm on November 16, 2020: none

    Words chosen using the following rules:

    1. Words are 4-8 letters long.
    2. Words can be uniquely determined typing the first 4 letters.
    3. Special Polish characters like ‘ą’, ‘ę’, ‘ć’, etc. are considered equal to ‘a’, ‘e’, ‘c’, etc. for the purpose of identifying a word. Therefore, there is no need to use a Polish keyboard to enter the mnemonic: an application with the Polish wordlist will be able to identify each word after its first 4 characters have been typed, even if accented characters have been replaced with their unaccented equivalents.
    4. All words are in basic form.
    5. No personal names or geographical names.
    6. No very similar words with only 2 letters of difference.
    7. Words are sorted according to the English alphabet, ignoring diacritics.
    8. No words already used in other language mnemonic sets (English, Italian, French, Spanish, Czech).
    9. Built from the most popular Polish words, based on the Open frequency dictionary of lexemes.
    10. Words include negative and bad things as those are easier to remember.
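Rules 1 to 3 above lend themselves to a mechanical check. The following is a minimal sketch, not the script used for this PR: the names `POLISH_FOLD`, `fold`, and `check_rules` are hypothetical. It assumes an explicit translation table for Polish letters, because ‘ł’ has no Unicode decomposition and would survive a plain NFD-based accent strip.

```python
# Hypothetical helpers for illustration; the PR itself ships no validation script.

# Explicit map: 'ł' has no Unicode decomposition, so NFD stripping alone misses it.
POLISH_FOLD = str.maketrans("ąćęłńóśźż", "acelnoszz")


def fold(word: str) -> str:
    """Lowercase a word and replace Polish letters with ASCII equivalents."""
    return word.lower().translate(POLISH_FOLD)


def check_rules(words):
    """Return a list of human-readable violations of rules 1-3."""
    problems = []
    seen_prefixes = {}  # folded 4-letter prefix -> first word that used it
    for w in words:
        # Rule 1: words are 4-8 letters long.
        if not 4 <= len(w) <= 8:
            problems.append(f"{w}: length {len(w)} outside 4-8")
        # Rules 2 and 3: the first 4 letters, ignoring diacritics, must be unique.
        prefix = fold(w)[:4]
        if prefix in seen_prefixes:
            problems.append(
                f"{w}: shares prefix '{prefix}' with {seen_prefixes[prefix]}"
            )
        else:
            seen_prefixes[prefix] = w
    return problems
```

For example, `check_rules(["łapa", "lapa"])` reports one prefix clash, since both words fold to `lapa`.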

    Unlike #753, this wordlist is based on a dictionary of popular words and does not include any words used in other language mnemonic sets. It also differs by using Polish symbols. Please consider merging it.

  2. Add Polish rules decription to wordlists.md b5022e9034
  3. Upload polish.txt wod list 3f34351f1b
  4. bitmover-studio commented at 8:44 pm on November 18, 2020: contributor

    Hello, I have created a similar list https://github.com/bitcoin/bips/pull/998/ (still not approved). I just used the same script to check your list.

    Your list is very good. The Levenshtein distance is greater than 1 in every word comparison, and I found no errors in the other rules.

    The only word that I would change is this one:

    mama

    It duplicates a word from the Spanish list, mamá.

    As most software won’t be able to distinguish between mama and mamá, I would change this one.

    Great work!

  5. Remove accent-stripped conflicts
    Words like mama conflict with Spanish mamá. This commit remove all such words.
    667e05c967
  6. KarolTrzeszczkowski commented at 1:28 am on November 20, 2020: none

    Thank you for the nice words and catching the collision!

    I was able to identify more such word collisions and I removed them: faraon, interes, ironia, legion, mama, tabu, teoria, tunel
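A collision search of this kind can be sketched as follows. This is an illustration under stated assumptions, not the author's actual method: `strip_accents` and `collisions` are hypothetical names, and the sample words in the usage note are placeholders apart from the mama/mamá pair mentioned above.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Lowercase a word and drop diacritics so 'mamá' and 'mama' compare equal."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 'ł' has no Unicode decomposition, so it needs an explicit replacement.
    return stripped.replace("ł", "l")


def collisions(candidate, existing):
    """Return candidate words that match an existing word once accents are removed."""
    taken = {strip_accents(w) for w in existing}
    return [w for w in candidate if strip_accents(w) in taken]
```

Running `collisions(["mama", "żaba"], ["mamá"])` returns `["mama"]`. In practice `existing` would be the union of all wordlists already merged.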

  7. GitHub-pepe commented at 2:00 am on November 20, 2020: none
    3f34351f1b4891dd3ec1865d188978d4739933ea
  8. p2w34 commented at 9:35 pm on November 29, 2020: none

    Being called to the blackboard by seeing my PR referenced I feel obliged to share some of my thoughts. Here is how I see it:

    • #753 was created by manually selecting the words from the well-respected dictionary (https://sjp.pl/slownik/odmiany/); I strongly believe that manual selection is way better than using any list sorted by the frequency of usage; it is not true that the more frequently a particular word is used, the better it fits
    • I have commented multiple times on the idea of avoiding words already used in other word lists; so as not to repeat myself – it brings more cons than pros and should be dropped
    • I do not think that creating yet another version of the Polish word list was necessary; especially without trying to first comment on the existing PR – what purpose does it serve?
    • Last but not least – the sad truth is that, like other PRs, this effort is wasted as well, and this PR will most likely never get merged. The time and energy spent on it could surely have been put to better use. So as not to repeat myself, see my comment in #998
  9. KarolTrzeszczkowski commented at 9:18 pm on December 2, 2020: none
    @luke-jr Could you please take a look at my pull request?
  10. michaelfolkson commented at 1:00 pm on December 28, 2020: contributor

    As I understand it, there are two competing PRs to add a Polish wordlist currently open: this one and #753.

    I don’t speak Polish and afaik Luke and the BIP 39 authors don’t either. Before we ask one of the BIP authors to ACK this (which is needed to merge it) we are going to need Polish speaker(s) who ideally understand BIP 39 to look over this and judge which PR should be merged (if any).

    This PR looks high quality to me but I am neither a Polish speaker nor a BIP author.

  11. michaelfolkson commented at 1:02 pm on December 28, 2020: contributor
    This PR from one of the BIP 39 authors is also potentially relevant: https://github.com/bitcoin/bips/pull/1047
  12. Merge branch 'master' into master fd1f08762a
  13. tkowalczyk commented at 8:25 pm on December 28, 2020: none

    Please consider this PR; it looks promising and will definitely be valuable for the community.

    The code in this PR is not complicated, so I believe it will not have a bad impact on the project or its efficiency and security.

  14. KrzychuLSK commented at 8:31 pm on December 28, 2020: none
    Great idea! It will be very valuable for the community! I’m a native Polish speaker, so for me this wordlist will be perfect.
  15. cornl1 commented at 8:38 pm on December 28, 2020: none
    Looks really good to me. It may have a positive impact on the Polish community, especially on newcomers.
  16. Wojtekop commented at 8:51 pm on December 28, 2020: none
    A Polish wordlist will be amazing. It will help every native Polish speaker like me.
  17. p2w34 commented at 9:30 pm on December 28, 2020: none

    NACK from my side.

    One does not have to spend more than one minute to find words that are considered offensive. I was also struck by the incorrect order of words at the end of the list. The chosen set of words looks strange to me. I am under the impression the list was generated automatically, without really trying to polish it. Not to mention that the proper approach would be to select all the words manually. And what I really cannot understand is the list of comments above: are they just quick comments (like doing a favor?), made without putting in the effort to at least read the list? Last but not least, I still do not understand the reason behind this list when another PR had already been created.

    I am not impressed, it does not look good, hence the NACK.

  18. KarolTrzeszczkowski commented at 9:41 pm on December 28, 2020: none

    @p2w34 I explained in the description that I included offensive words as they are loaded with emotions and easy to remember. Seed words are private so there is no reason to avoid them. If it is required, I will remove them.

    Could you point me directly to the incorrect order? Thank you.

    The reason I created this list was because you refused to include feedback from other people and I didn’t like your choices of words at all. They are mostly weird and not memorable at all. Judging from your attitude and how proud you are of your work, I expect you’d refuse my feedback as you refused to include other people’s feedback.

  19. p2w34 commented at 9:57 pm on December 28, 2020: none

    The reason I created this list was because you refused to include feedback from other people and I didn’t like your choices of words at all. They are mostly weird and not memorable at all. Judging from your attitude and how proud you are of your work, I expect you’d refuse my feedback as you refused to include other people’s feedback.

    The only reason I am spending my time being involved in various discussions here is that I am worried about the quality of the word lists. And I cannot say that I am having a good time - on the contrary. I may make comments which sound harsh but I do this only when absolutely necessary. All the comments made in another PR with the Polish list that I created were addressed. And yes, you got me right - I am proud of my work.

    As my final comment, I repeat myself: I am of the opinion that BIP0039 should not be continued in its current form. The problem of word lists should be approached separately, in a more holistic manner. This is, however, to be decided by the BIP0039 maintainers. Or one may simply try to write a new proposal.

  20. KarolTrzeszczkowski commented at 10:02 pm on December 28, 2020: none
    @p2w34 Could you point me to the ordering error you mentioned?
  21. KarolTrzeszczkowski commented at 10:54 pm on December 28, 2020: none
    I don’t think the author of the competing PR should leave a NACK here and lie about an ordering error that would have been found in the initial algorithmic check performed by @bitmover-studio. It’s clear that it’s nothing but an ego battle having nothing to do with the quality of the proposed wordlist.
  22. p2w34 commented at 11:11 pm on December 28, 2020: none

    I don’t think the author of a competing PR should leave a NACK for a competing PR and lie about an ordering error

    There are words starting with ł that are placed at the end of the list, instead of being together with l. Ideally, the algorithmic checks you mention should be done by a native.

    It’s clear that it’s nothing but an ego battle having nothing to do with the quality of the proposed wordlist.

    Again, it is not.
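The misplaced ‘ł’ words are what a plain code-point sort produces, since ‘ł’ (U+0142) sorts after ‘z’. A minimal sketch of a diacritic-insensitive order check follows; the names `FOLD`, `sort_key`, and `out_of_order` are hypothetical and the sample words are illustrative, not taken from the list.

```python
# Hypothetical helpers for illustration only.

# Map Polish letters to their ASCII base letters before comparing.
FOLD = str.maketrans("ąćęłńóśźż", "acelnoszz")


def sort_key(word: str) -> str:
    """Key that orders words by the English alphabet, ignoring diacritics."""
    return word.lower().translate(FOLD)


def out_of_order(words):
    """Return adjacent pairs whose folded forms are not in ascending order."""
    return [
        (a, b)
        for a, b in zip(words, words[1:])
        if sort_key(a) > sort_key(b)
    ]
```

With the sample `["las", "zupa", "łapa"]`, `out_of_order` returns `[("zupa", "łapa")]`, while `sorted(words, key=sort_key)` places `łapa` first. A check like this only catches the problem if it folds ‘ł’ explicitly.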

  23. KarolTrzeszczkowski commented at 11:15 pm on December 28, 2020: none

    There are words starting with ł that are placed at the end of the list, instead of being together with l

    You are right. I am sorry for this accusation. It’s super weird that my algorithm left it out and bitmover’s check didn’t catch it. I’m sorry once again. I will fix it.

  24. ZenulAbidin commented at 1:30 am on January 4, 2021: none

    There are words starting with ł that are placed at the end of the list, instead of being together with l

    You are right. I am sorry for this accusation. It’s super weird that my algorithm left it out and bitmover’s check didn’t catch it. I’m sorry once again. I will fix it.

    I checked the latest revision of your wordlist (which should be this one, correct me if I’m wrong) using my tool bip39validator, and my log output (https://paste.ubuntu.com/p/Jwc83KJ8ZB/) says all words are <= 8 chars, have no accents, are unique within the first 4 letters, and have a Levenshtein distance of at least 2 from every other word. Those are the default parameters it runs with.

    Those are three of the four major checks that a BIP39 wordlist should be tested against, but you currently have to make the fourth check by hand: ensuring there are no words in this list that are similar to words in other (merged) languages’ lists.

    I should mention that I am not a Polish speaker either.
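The pairwise distance check described above can be sketched as below. This is a generic dynamic-programming Levenshtein implementation written for illustration; it is not the code of bip39validator, and the names `levenshtein` and `too_close` are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if letters match
            ))
        prev = cur
    return prev[-1]


def too_close(words, min_distance=2):
    """Return every pair of words closer than min_distance edits."""
    return [
        (a, b)
        for a, b in combinations(words, 2)
        if levenshtein(a, b) < min_distance
    ]
```

For a 2048-word list this compares roughly two million pairs, which is slow but workable in pure Python for a one-off check. An empty result from `too_close(words)` corresponds to the "distance of at least 2" condition.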

  25. luke-jr added the label Proposed BIP modification on Feb 3, 2021
  26. luke-jr closed this on Jul 2, 2021


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-27 09:10 UTC
