BIP39 Add German Wordlist #1071

pull SebastianFloKa wants to merge 7 commits into bitcoin:master from SebastianFloKa:bip39-add-german-wordlist-(3rd-attempt) changing 2 files +2065 −0
  1. SebastianFloKa commented at 10:52 am on February 22, 2021: none

    The BIP-0039 German Wordlist is based on spelling rules defined in the “German Duden” and was checked against several quality criteria by native speakers. Words were selected and checked manually to ensure they are sufficiently common and positive. Tools were used to ensure sufficient Levenshtein distance between words, to prevent conflicts with other BIP-0039 wordlists, and to eliminate homophones inside the wordlist.

    There was a first attempt (#721) and a second attempt (#942) at a BIP-0039 German Wordlist. This third attempt intends to combine the requirements of the Bitcoin community in the German-speaking area with the must-have requirements for BIP-0039 wordlists, such as Levenshtein distance and no homophones.

    Special considerations:

    1. Words can be uniquely determined by typing the first 4 characters.
    2. Words contain between 3 and 8 letters.
    3. No words with 1 letter of difference (no Levenshtein distance by substitution, addition or permutation lower than 2)
    4. No words already used in other official BIP-0039-Wordlists
    5. No accents or special characters. No Ä, Ö, Ü, ß
    6. All-Caps in order to address nouns not written in lowercase in German and keep number of characters to 26 (A-Z) only.
    7. Orthography based on German spelling reform of 2006 and based on the German Duden 2021
    8. Only singular nouns and plural tantum nouns (if no singular exists).
    9. If a homophone for a word exists, only one of these words is allowed in the wordlist under condition that using grammatical gender ensures unambiguous spelling.
    10. No offensive words and no words implying negative, sad or bad feelings.
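
    The purely structural rules above (1, 2, 5 and 6) lend themselves to an automated check. The following is a minimal sketch, not the tooling actually used for this PR; the function name `check_structure` and its parameters are hypothetical and chosen only for illustration.

```python
import string


def check_structure(words, prefix_len=4, min_len=3, max_len=8):
    """Return (word, problem) tuples for violations of the structural rules.

    Hypothetical helper: checks A-Z only, length 3 to 8, and that typing the
    first `prefix_len` letters (or the whole word, if shorter) matches no
    other entry in the list.
    """
    allowed = set(string.ascii_uppercase)
    problems = []
    for word in words:
        if not set(word) <= allowed:
            problems.append((word, "contains characters outside A-Z"))
        if not min_len <= len(word) <= max_len:
            problems.append((word, f"length not between {min_len} and {max_len}"))
        key = word[:prefix_len]
        clashes = [other for other in words if other != word and other.startswith(key)]
        if clashes:
            problems.append((word, f"typing {key!r} also matches {', '.join(clashes)}"))
    return problems


if __name__ == "__main__":
    # Small illustrative sample, not the real wordlist.
    sample = ["OSTERN", "OSTEREI", "ZUG", "ZUGRIFF", "HOLZ"]
    for word, problem in check_structure(sample):
        print(word, "->", problem)
```

    Note that a word shorter than four letters (such as ZUG) is treated as its own prefix, so it is flagged whenever a longer entry starts with it.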
  2. BIP39 Add German Wordlist c10c822448
  3. bip-0039 special considerations german wordlist 7d41aa6b39
  4. SebastianFloKa cross-referenced this on Feb 22, 2021 from issue Add German word list for BIP0039 by DavidMStraub
  5. SebastianFloKa cross-referenced this on Feb 22, 2021 from issue Adding BIP-39 wordlist in German (2nd try) by cr
  6. SebastianFloKa commented at 11:47 am on February 22, 2021: none

    Thanks @DavidMStraub for starting with the first attempt and @cr for the second attempt regarding a BIP-0039 German Wordlist. Hope you will join this PR, whose main difference is the implementation of Levenshtein distance (addition, substitution & permutation not lower than 2).

    Supplementary to the basic requirements some more considerations:

    • This proposal follows @DavidMStraub’s requirement of nominative nouns. On top of that, countries, cities, persons, names etc. were excluded.
    • @thomasklemm requested to change to more commonly used words, this should be the case now.
    • @cr requested to avoid collision with other released BIP-0039-Wordlists which is taken into consideration.
    • In order to bring a cultural specialty into BIP-0039, the proposal is written in all-caps. Writing nouns in lower-case letters conflicts with the conventions of the German language. Studies also show that the readability of handwritten text in all-caps is significantly better, so this lowers the risk of losing money. A positive side effect is that the number of used characters is reduced from 52 to 26. This is an advantage not only for self-filled cold wallets.
    • Going the extra mile, even Levenshtein “addition” collisions of fewer than 3 letters at the beginning of a word were avoided by excluding words with a related meaning (for example, Lanze & Pflanze and Sekt & Insekt are in the list; Mut & Unmut are not).
    • @rodasmith made some requirements for avoiding homophones. The current list even went beyond by excluding words completely from the list if a homophone exists as a noun with same genus (Miene&Mine, Verse&Ferse, Hund&Hunt, Graph&Graf, etc.) Basis: https://de.wiktionary.org/wiki/Verzeichnis:Deutsch/Homophone
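
    The distance requirement mentioned above (addition, substitution and permutation not lower than 2) can be sketched as a pairwise check. This is an illustration only, under the assumption that “permutation” means swapping two adjacent letters, which corresponds to the optimal string alignment variant of the Damerau-Levenshtein distance; the names `osa_distance` and `close_pairs` are hypothetical and not taken from the author’s tools.

```python
from itertools import combinations


def osa_distance(a, b):
    """Optimal string alignment distance between two words.

    Counts insertions, deletions, substitutions and swaps of two adjacent
    letters as one edit each (assumed reading of "permutation").
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion ("addition")
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition, e.g. ANGLE vs. ANGEL.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def close_pairs(words, min_distance=2):
    """Return all pairs of words whose distance is below `min_distance`."""
    return [(a, b) for a, b in combinations(words, 2)
            if osa_distance(a, b) < min_distance]


if __name__ == "__main__":
    # Small illustrative sample, not the real wordlist.
    print(close_pairs(["KEGEL", "PEGEL", "LANZE", "PFLANZE", "HOLZ"]))
```

    The stricter rule for related words such as Mut & Unmut depends on meaning and would still need manual review; a check like this can only list candidate pairs.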
  7. in bip-0039/german.txt:28 in 7d41aa6b39 outdated
    23+ACHTUNG
    24+ACKERBAU
    25+ADAPTER
    26+ADDITION
    27+ADJEKTIV
    28+ADJUDANT
    


    thomasklemm commented at 12:05 pm on February 22, 2021:

    Spelled w/ t instead of d.

    ADJUTANT
    

    I would have written it the same way, wasn’t sure though, googled it and found it’s spelled differently.


    SebastianFloKa commented at 1:36 pm on February 22, 2021:
    Will be accepted of course. Unfortunately it mainly shows that the spell checker used didn’t work properly - I now quickly tested some other spell checkers and interestingly they come to different conclusions for certain words. Will evaluate later which spell checker works best for our requirements.

    thomasklemm commented at 9:05 pm on February 22, 2021:
    Went through the rest of the list word by word @SebastianFloKa, didn’t find anything else, thinking this one was an outlier.

    SebastianFloKa commented at 9:31 pm on March 2, 2021:
    @thomasklemm thanks a lot for your effort, really appreciated. I used a different spellchecker now and one more typo is “Avocado”, which will be removed. We can put all of your highlighted words on a “preliminary replacement list”. That means I won’t respond to each comment. A list of optional words will then be posted below in order to decide which ones should be replaced.

    SebastianFloKa commented at 1:52 pm on April 8, 2021:
    Word replaced
  8. in bip-0039/german.txt:55 in 7d41aa6b39 outdated
    50+ALPAKA
    51+ALPEN
    52+ALPHABET
    53+ALTERTUM
    54+ALTSTADT
    55+ALTWAGEN
    


    thomasklemm commented at 12:08 pm on February 22, 2021:
    Not too commonly used, 300k Google results vs 10m+ for “Neuwagen” (which Altwagen would be the opposite of)

    SebastianFloKa commented at 1:52 pm on April 8, 2021:
    Word replaced
  9. in bip-0039/german.txt:319 in 7d41aa6b39 outdated
    314+BRISANZ
    315+BROTLAIB
    316+BRUCH
    317+BRUNNEN
    318+BRUST
    319+BRUTZEIT
    


    thomasklemm commented at 12:13 pm on February 22, 2021:
    Read BROTZEIT, anyone else?

    DivineDominion commented at 6:22 am on March 10, 2021:
    No, but I can see how capitalization makes word recognition worse esp at low font sizes here

    SebastianFloKa commented at 1:52 pm on April 8, 2021:
    Word replaced
  10. in bip-0039/german.txt:328 in 7d41aa6b39 outdated
    323+BUG
    324+BUMERANG
    325+BUSFAHRT
    326+BUSREISE
    327+BUSSARD
    328+BUTAN
    


    thomasklemm commented at 12:16 pm on February 22, 2021:

    Bhutan would be pronounced the same, right? If so, maybe we should remove the word.

    I also spelled it “Buthan” when googling since I wasn’t sure (just leaving those comments so others could chime in if that happened to them too)


    DivineDominion commented at 6:23 am on March 10, 2021:
    👍 makes sense. Not sure if folk usually know their way around flammable gas :)

    SebastianFloKa commented at 1:52 pm on April 8, 2021:
    Word replaced
  11. in bip-0039/german.txt:400 in 7d41aa6b39 outdated
    395+DORSCH
    396+DOSIERER
    397+DOZENT
    398+DRACHE
    399+DRAHT
    400+DRALL
    


    thomasklemm commented at 12:20 pm on February 22, 2021:

    SebastianFloKa commented at 9:31 pm on March 2, 2021:
    Intended was the word “der Drall” as in rotation: https://www.duden.de/rechtschreibung/Drall

    SebastianFloKa commented at 1:50 pm on April 8, 2021:
    OK, will replace it (URANERZ)
  12. in bip-0039/german.txt:311 in 7d41aa6b39 outdated
    306+BRATEN
    307+BRAUEREI
    308+BREMSE
    309+BRENNGUT
    310+BRETT
    311+BREZE
    


    thomasklemm commented at 12:23 pm on February 22, 2021:
    BREZEL
    

    Different word variants exist, I think it would be fine to pick “Brezel”, that’s how Wikipedia lists it: https://de.wikipedia.org/wiki/Brezel


    SebastianFloKa commented at 9:32 pm on March 2, 2021:
    You prefer to keep the word written as “Brezel” or to replace it completely?

    DivineDominion commented at 6:24 am on March 10, 2021:
    +1 for using the non-regional “Brezel”

    SebastianFloKa commented at 1:53 pm on April 8, 2021:
    Word replaced
  13. in bip-0039/german.txt:588 in 7d41aa6b39 outdated
    583+FENSTER
    584+FERIEN
    585+FERKEL
    586+FESTUNG
    587+FETT
    588+FETUS
    


    thomasklemm commented at 12:27 pm on February 22, 2021:
    Listed as “Fötus” in Wikipedia, which feels more commonly used to me, but we can’t use that spelling due to the Umlaut. Both spellings seem to be fine, but it might be hard for a user who selects “o” as the second character, expecting to enter “foetus”, and doesn’t get a matching word. Replace with another word?

    SebastianFloKa commented at 2:42 pm on April 8, 2021:
    word replaced
  14. in bip-0039/german.txt:632 in 7d41aa6b39 outdated
    627+FRAKTUR
    628+FRAU
    629+FREGATTE
    630+FREITAG
    631+FREQUENZ
    632+FRESKE
    


    thomasklemm commented at 12:32 pm on February 22, 2021:
    FRESKO
    

    Masculine version seems to be used predominantly here, didn’t know there’s a feminine version.


    SebastianFloKa commented at 9:32 pm on March 2, 2021:
    OK, let’s go for “Fresko” unless you’d like to replace it completely.

    thomasklemm commented at 2:02 pm on April 8, 2021:
    @SebastianFloKa “Fresko” sounds good and is already replaced, this thread can be marked as resolved too.

    SebastianFloKa commented at 2:41 pm on April 8, 2021:
    Word changed
  15. in bip-0039/german.txt:758 in 7d41aa6b39 outdated
    753+GRUND
    754+GRUPPE
    755+GULASCH
    756+GULLY
    757+GUMMI
    758+GUMPE
    


    thomasklemm commented at 12:36 pm on February 22, 2021:
    Is “Gumpe” common enough? Didn’t know that one. Doesn’t mean anything, just leaving this comment so others could react. Written as it’s spelled, can’t really be written wrongly, so it’s fine even if it would be less known.

    DivineDominion commented at 6:25 am on March 10, 2021:
    Never heard of that. Sounds like something local, much like “Pömpel” and “Puschen”

    SebastianFloKa commented at 1:53 pm on April 8, 2021:
    Word replaced
  16. thomasklemm commented at 12:44 pm on February 22, 2021: none

    Great work @SebastianFloKa, thanks for opening this PR with an alternative list. I know that a lot of work has gone into it already from #721, and again thanks to everyone who participated in the previous two attempts for a German wordlist. Hope some of you native German speakers could go through the list here too and leave some comments!

    Reviewed the wordlist until line 1000 so far, going through the rest later.

  17. rodasmith commented at 6:29 pm on February 22, 2021: none
    ACK. This list does not include any homophones. LGTM
  18. in bip-0039/german.txt:1006 in 7d41aa6b39 outdated
    1001+KNOSPE
    1002+KNOTEN
    1003+KOBOLD
    1004+KOBRA
    1005+KOCHKURS
    1006+KODEX
    


    thomasklemm commented at 8:02 pm on February 22, 2021:
    “Kodex” could also be written “Codex”: https://de.wikipedia.org/wiki/Kodex

    SebastianFloKa commented at 1:53 pm on April 8, 2021:
    Word replaced
  19. in bip-0039/german.txt:1022 in 7d41aa6b39 outdated
    1017+KOMPASS
    1018+KONDOR
    1019+KONFEKT
    1020+KONSUL
    1021+KONTO
    1022+KONUS
    


    thomasklemm commented at 8:03 pm on February 22, 2021:
    “Kegel” would be more commonly known than “Konus”?

    SebastianFloKa commented at 9:33 pm on March 2, 2021:
    “Kegel” creates some levenshtein errors such as “Pegel” and therefore can’t be used. But anyway, let’s put “Konus” on our replacement list.

    SebastianFloKa commented at 1:54 pm on April 8, 2021:
    Word replaced
  20. in bip-0039/german.txt:1137 in 7d41aa6b39 outdated
    1132+LOTION
    1133+LOTSE
    1134+LOTTO
    1135+LUFTWEG
    1136+LUKE
    1137+LUNKER
    


    thomasklemm commented at 8:08 pm on February 22, 2021:

    Wouldn’t have known what “Lunker” is, never heard of that: https://de.wikipedia.org/wiki/Lunker

    Maybe we could use “Klunker” (or a different word)?


    SebastianFloKa commented at 1:54 pm on April 8, 2021:
    Word replaced
  21. in bip-0039/german.txt:1168 in 7d41aa6b39 outdated
    1163+MANN
    1164+MANTEL
    1165+MAPPE
    1166+MARACUJA
    1167+MARDER
    1168+MARILLE
    


    thomasklemm commented at 8:12 pm on February 22, 2021:
    “Marille” is Austrian and Bavarian for “Aprikose”, which we also have in the list above. Just mentioning, guessing it’s well known and fine. https://de.wikipedia.org/wiki/Aprikose

    SebastianFloKa commented at 1:54 pm on April 8, 2021:
    Word replaced
  22. in bip-0039/german.txt:1196 in 7d41aa6b39 outdated
    1191+MENGE
    1192+MENISKUS
    1193+MENSCH
    1194+MERIDIAN
    1195+MERKMAL
    1196+MERLIN
    


    thomasklemm commented at 8:14 pm on February 22, 2021:
    Only name I saw so far in the list?

    SebastianFloKa commented at 9:33 pm on March 2, 2021:
    Had the bird “Merlin” in mind: https://www.duden.de/rechtschreibung/Merlin_Falke_Vogel But OK to put it on our replacement list.

    SebastianFloKa commented at 1:54 pm on April 8, 2021:
    Word replaced
  23. in bip-0039/german.txt:1202 in 7d41aa6b39 outdated
    1197+MESSER
    1198+METALL
    1199+METEOR
    1200+METHODE
    1201+METZGER
    1202+MEUTE
    


    thomasklemm commented at 8:15 pm on February 22, 2021:
    Google lists mostly the band “Meute” on the first page (when I’m googling), maybe “Meuterei”?

    SebastianFloKa commented at 9:34 pm on March 2, 2021:
    https://www.duden.de/rechtschreibung/Meute compared to other words the Duden lists “Meute” (group of people) as not too uncommon (level 2 of 5). But we can put it on our replacement list.

    SebastianFloKa commented at 1:54 pm on April 8, 2021:
    Word replaced
  24. in bip-0039/german.txt:1214 in 7d41aa6b39 outdated
    1209+MIMIK
    1210+MINERAL
    1211+MINIGOLF
    1212+MINUSPOL
    1213+MINZE
    1214+MIRAKEL
    


    thomasklemm commented at 8:21 pm on February 22, 2021:

    Haven’t used “Mirakel” in the sense of “Wunder” IRL so far, but would have spelled it correctly: https://de.wikipedia.org/wiki/Mirakel

    One might associate it though with “Miracle Whip” (with a c), which my relatives pronounce as “Mirakel Whip”.

    Maybe we can use “Wunder”? The other words with “under” (Flunder/hundert) don’t clash phonetically.


    SebastianFloKa commented at 9:34 pm on March 2, 2021:
    “Wunder” has at least a Levenshtein addition collision with “Wunde”. But if you prefer “Wunder” over “Wunde” (more positive) we just need to double check if there will be other collisions.

    SebastianFloKa commented at 1:55 pm on April 8, 2021:
    Word replaced
  25. in bip-0039/german.txt:1761 in 7d41aa6b39 outdated
    1756+THEKE
    1757+THEMA
    1758+THEORIE
    1759+THERMIK
    1760+THRON
    1761+TIDEHUB
    


    thomasklemm commented at 8:32 pm on February 22, 2021:
    Wikipedia lists “Tidehub” as “Tidenhub”, both are fine: https://de.wikipedia.org/wiki/Tidenhub

    SebastianFloKa commented at 9:36 pm on March 2, 2021:
    Agree with “Tidenhub”. Do you want to go for this or prefer to replace it completely?

    SebastianFloKa commented at 1:55 pm on April 8, 2021:
    Word replaced
  26. in bip-0039/german.txt:1798 in 7d41aa6b39 outdated
    1793+TRAMPEL
    1794+TRANSIT
    1795+TRAPEZ
    1796+TRATSCH
    1797+TRAUM
    1798+TREBE
    


    thomasklemm commented at 8:33 pm on February 22, 2021:
    Haven’t heard this word before: “auf der Trebe sein” https://www.duden.de/rechtschreibung/Trebe Little risk of spelling it wrong though

    SebastianFloKa commented at 1:55 pm on April 8, 2021:
    Word replaced
  27. in bip-0039/german.txt:1946 in 7d41aa6b39 outdated
    1941+WELS
    1942+WELTALL
    1943+WENDUNG
    1944+WERKTAG
    1945+WESEN
    1946+WESIR
    


    thomasklemm commented at 8:41 pm on February 22, 2021:

    Phonetically similar to “Visier”, which should be more commonly used. “Visier” isn’t in the list yet. https://www.duden.de/rechtschreibung/Visier

    Wikipedia has two more spellings for “Wesir”: https://de.wikipedia.org/wiki/Wesir


    SebastianFloKa commented at 1:55 pm on April 8, 2021:
    Word replaced
  28. in bip-0039/german.txt:2034 in 7d41aa6b39 outdated
    2029+ZUMUTUNG
    2030+ZUNAHME
    2031+ZUNFT
    2032+ZUSATZ
    2033+ZUSCHLAG
    2034+ZUSEHER
    


    thomasklemm commented at 8:47 pm on February 22, 2021:

    “Zuschauer” is more commonly used according to Duden, “Zuseher” seems to be mostly Austrian:

    I also read “zu sehr” when reading it, anyone else?


    DivineDominion commented at 6:27 am on March 10, 2021:
    Sounds oddly formal, Zuschauer would be ok

    SebastianFloKa commented at 1:56 pm on April 8, 2021:
    Word replaced
  29. in bip-0039/german.txt:1140 in 7d41aa6b39 outdated
    1135+LUFTWEG
    1136+LUKE
    1137+LUNKER
    1138+LUNTE
    1139+LURCH
    1140+LUV
    


    thomasklemm commented at 8:51 pm on February 22, 2021:

    Not sure about this one, pronounced [lu:f]. Not too commonly used (https://www.duden.de/rechtschreibung/Luv), though it should be known. https://de.wikipedia.org/wiki/Luv_und_Lee

    When reading as [luf] it can easily become “Luft” or so


    SebastianFloKa commented at 1:56 pm on April 8, 2021:
    Word replaced
  30. in bip-0039/german.txt:1415 in 7d41aa6b39 outdated
    1410+PLAKETTE
    1411+PLANUNG
    1412+PLASTIK
    1413+PLATIN
    1414+PLENUM
    1415+PLEUEL
    


    thomasklemm commented at 8:56 pm on February 22, 2021:
    “Pleuel” seems to be engineering vocabulary, would have spelled it correctly without knowing what it is: https://de.wikipedia.org/wiki/Pleuel

    SebastianFloKa commented at 1:56 pm on April 8, 2021:
    Word replaced
  31. in bip-0039/german.txt:1671 in 7d41aa6b39 outdated
    1666+SPRITZE
    1667+SPROSSE
    1668+SPRUNG
    1669+SPUCKE
    1670+SPULE
    1671+SPUND
    


    thomasklemm commented at 8:58 pm on February 22, 2021:

    Doesn’t seem commonly used, more as “Jungspund” (10x the Google results), but that’s too long.


    SebastianFloKa commented at 9:52 pm on April 7, 2021:
    It is mainly used in the context of barrels, yes. @thomasklemm can / shall we leave it or do you prefer to exchange it?

    thomasklemm commented at 9:11 am on April 8, 2021:
    @SebastianFloKa Fine either way, we can leave it in.
  32. thomasklemm commented at 9:01 pm on February 22, 2021: none
    Looked through the rest of the list, very good work IMO 👍 Just some minor notes on some words.
  33. thomasklemm commented at 9:15 pm on February 22, 2021: none

    Very well-prepared word selection @SebastianFloKa, LGTM 👍 Went through the entire word list word by word, left minor comments on a few words.

    If you have the chance and especially if you’re a native German speaker, please jump in for a review too.

  34. SebastianFloKa commented at 9:39 pm on March 2, 2021: none

    Checking with https://www.korrekturen.de/rechtschreibpruefung.shtml, the following words (besides the already mentioned ones: “Gumpe”, “Tidehub”, “Trebe” & “Zuseher”) are marked even though they are all properly listed in https://www.duden.de/. Among other reasons, this seems to be partly related to words being more common in Austria or Switzerland. I personally think it’s good to have some words from different parts of the German-speaking region as long as they are understood everywhere - open to discussion.

    Allrad Bauchweh Gemahl Kapriole Kassier Kubik Oktagon Petersil Vorkehr Zuhause

    @thomasklemm in particular, and maybe @neox5 wants to have a look as well: Shall we replace all of the above words, or would you say we can / should keep some of them?

  35. SebastianFloKa commented at 10:03 pm on March 2, 2021: none

    If we replace all the words highlighted by @thomasklemm (except for “Fresko” & “Tidenhub”) as well as all 10 words marked by the spellchecker (Allrad, …. , Zuhause), there are 31 words to be replaced in the next loop. Because the “special considerations” changed over the 2 1/2 years of work on this project, many words fell out of the list over time. Therefore a big “rerun” against Levenshtein collisions, other wordlists, homophones etc. was made and a number of acceptable words were found. Here are 31 proposals:

    BART BEIN BLECH BUSCH FUNKE GELD HALLO HARZ HECHT HOLZ KREUZ KURS LIEBE LUST MUSE NATUR PORTO PROBE PUMPE PUNKT RASEN REIHE REST RIND RITUS RUHM STROH TALER TREUE WANNE WUNDER

    Because the words are interdependent, it might be necessary to have some backup words:

    AKKU ALGE BELAG BUNDHOSE DEMO DOKU ENDE EURO FANG FIEBER MATHE NETTO PORE SEEHUND SOLD SPESEN VIEH WEITE WOGE VISIER

    So if you prefer to replace some other words from the initial list with the above backup words, that is fine as well.

  36. TZocker commented at 8:17 pm on March 8, 2021: none

    Suggestions:

    Amsel Ostern Fernweh Simulant Fern Walnuss Lorbeere Misteln Wichtel Holz Zunge Zug Mettigel Maihock Mai Kraut Wurst

  37. DivineDominion commented at 6:31 am on March 10, 2021: none
    If replacements are needed, I’d like to suggest Drossel, so we have all of the well-known bird names of “Amsel, Drossel, Fink und Star”. Would definitely prefer these over nautical vocabulary :)
  38. nisc commented at 7:22 pm on March 10, 2021: none

    Guys thank you for your service, but I can’t hide that I’m mostly following this conversation because it reliably makes me giggle.

    PS: After thinking it through, I would probably not include any of the Breze* words. There are too many regional variations, which will lead at least to confusion, but maybe even to emotion and anger (“why did they dare to include this inferior spelling in my seed phrase?”).

  39. SebastianFloKa commented at 4:48 pm on April 2, 2021: none
    Thanks @TZocker @nisc @DivineDominion for joining and for your input. A new proposal incorporating your suggestions, along with the initial ones from @thomasklemm, will follow soon.
  40. SebastianFloKa commented at 7:05 pm on April 4, 2021: none

    Would definitely prefer these over nautical vocabulary

    @DivineDominion Are there any specific words you’d like to see replaced? Not sure which ones are meant by “nautical”.

  41. DivineDominion commented at 8:12 am on April 6, 2021: none
    @SebastianFloKa “Luv” and maybe “Tidehub”, as pointed out by others in comments above
  42. SebastianFloKa commented at 4:17 pm on April 6, 2021: none

    @TZocker I checked your proposals against the criteria:

    Suggestions:

    • Amsel –> NOK - Levenshtein substitution collision with AMPEL
    • Ostern –> NOK - Identification by first 4 letters not ensured against “Osterei”
    • Fernweh –> OK - we have “Heimweh” already, but OK to go for Fernweh as well.
    • Simulant –> OK
    • Fern –> Not a noun
    • Walnuss –> OK
    • Lorbeere –> OK, typically used as plural, but OK.
    • Misteln –> NOK - plural not singular / singular with Levenshtein collision
    • Wichtel –> NOK - Levenshtein substitution collision with Wachtel
    • Holz –> OK - already in proposal list above
    • Zunge –> NOK - Levenshtein substitution collision with Junge
    • Zug –> NOK - Levenshtein addition first 3 letters collision (Zugriff, Zugzwang, etc.)
    • Mettigel –> NOK - not listed in the German Duden
    • Maihock –> NOK - not listed in the German Duden
    • Mai –> NOK - Levenshtein addition first 3 letters collision
    • Kraut –> NOK - Levenshtein substitution collision with Kraft
    • Wurst –> NOK - Levenshtein substitution collision with Durst

    @DivineDominion Besides “Amsel” (see “Ampel” above), “Star” also shows a Levenshtein substitution collision (“Stau”). “Fink” is already in the list, “Drossel” is OK to add. Will remove “Luv” & “Tidehub” completely.

    @nisc all “Breze*” words will be removed

  43. Update bip-0039/german.txt
    Co-authored-by: Thomas Klemm <github@tklemm.eu>
    7541786375
  44. Update german.txt
    Improvement loop mainly based on feedback of @thomasklemm but also @TZocker & @DivineDominion & @nisc
    c9b4386128
  45. neox5 commented at 4:59 pm on April 6, 2021: none

    Suggestions: Daumen Nagel Schrift Orange Triangel

    If you could share your tools for checking, I would do the checks by myself! So you don’t have to do all the work by yourself 😉

  46. thomasklemm commented at 9:18 am on April 8, 2021: none

    @SebastianFloKa Thanks for incorporating all the feedback into the word list. IMO it’s really good work, has had many iterations already and can get merged.

    @SebastianFloKa You should see a “Resolve conversation” button next to each individual conversation and can close the ones that are now resolved (only the PR author and repo maintainers seem to see it according to https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/commenting-on-a-pull-request#resolving-conversations, so I can’t mark my own comments as resolved).

    To all German native speakers reading this PR: It would be really good if you can take the time to go through the list and leave your comments or a 👍 on the PR.

  47. thomasklemm cross-referenced this on Apr 8, 2021 from issue Allowing initiator of a conversation to resolve it by thomasklemm
  48. nisc approved
  49. SebastianFloKa commented at 1:47 pm on April 8, 2021: none

    @neox5 thanks

    Suggestions:

    • Daumen - NOK - Levenshtein substitution collision with Gaumen
    • Nagel - NOK - Levenshtein substitution collision with Nadel & first 4 letters same as Nagetier
    • Schrift - NOK - first 4 letters same as Schrank
    • Orange - NOK - collision with existing word in BIP0039 English Wordlist

    Regarding the tool:

    • There are actually many separate tools and it would require significant redesign to make them usable for others (background understanding of abbreviations). But as far as I know, there are ready-to-use solutions shared in other wordlist conversations.

    • Attached a list of backupwords that probably fulfill the requirements in case you want to replace certain words - tbc in individual cases. 210408-BIP39-German-Wordlist_backupwords.txt

    • Will replace “Zunahme” (close to Zuname) with “Zufluss” & long word “Artistik” with shorter “Artikel”
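
    The collision of “Orange” with the English wordlist mentioned above is the kind of check that is easy to script. A minimal sketch follows, assuming the other wordlists have already been loaded into memory as lists of strings; the function name `cross_list_collisions` and the two-word English excerpt are for illustration only and are not part of the tools referred to in this comment.

```python
def cross_list_collisions(candidates, other_lists):
    """Map each candidate to the names of other wordlists that contain it.

    Comparison is case-insensitive because the German proposal is all-caps
    while the existing lists are lowercase. Accent folding is not handled.
    """
    hits = {}
    for name, words in other_lists.items():
        normalized = {w.strip().lower() for w in words}
        for candidate in candidates:
            if candidate.lower() in normalized:
                hits.setdefault(candidate, []).append(name)
    return hits


if __name__ == "__main__":
    # Tiny illustrative excerpt; a real check would load every released list.
    others = {"english": ["abandon", "orange"]}
    print(cross_list_collisions(["DAUMEN", "ORANGE"], others))
```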

  50. thomasklemm approved
  51. Update german.txt
    "Zunahme" too close to "Zuname" and other improvements
    63b71073c9
  52. b068931cc450442b63f5b3d276ea4297 commented at 3:34 pm on April 10, 2021: none
    I had already finished a draft of it independently in December and unfortunately only now managed to publish it. However, it also contains words that appear in other word lists. Maybe the comparison helps anyway: https://github.com/dys2p/wordlists-de/blob/main/de_2048.md
  53. b068931cc450442b63f5b3d276ea4297 commented at 4:07 pm on April 10, 2021: none
    I just had a quick look at the list, you could shorten some words even more e.g. GOLDADER -> GOLD REISFELD -> REIS or REISE REICHTUM -> REICH
  54. SebastianFloKa commented at 9:32 pm on April 10, 2021: none

    @b068931cc450442b63f5b3d276ea4297 Hi, thanks for participating.

    Unfortunately you are right, your list contains many collisions with other BIP0039 Wordlists (278 collisions in total). Besides this it contains

    • 7 collisions regarding the unambiguousness of the first 4 letters (baum/baumarkt, dringend/drinnen, etc.),
    • some words for which a homophone with the same genus exists (Leib/Laib, Lied/Lid, etc.)
    • and, the main point, many Levenshtein errors. I haven’t checked in detail but they show up quite obviously (bett/brett, bezug/bezog, anzug/abzug, etc.)

    I just had a quick look at the list, you could shorten some words even more e.g.

    • GOLDADER –> GOLD –> NOK - Levenshtein substitution error with GELD
    • REISFELD –> REIS –> NOK - Levenshtein addition error with PREIS
    • REISFELD –> or REISE –> NOK - collision with our “extra mile requirement” to avoid Levenshtein addition on the first letters if too similar regarding the meaning of a word (ABREISE)
    • REICHTUM -> REICH –> NOK - Levenshtein substitution error with TEICH

    Question: Do you have the ability to filter your list for singular nouns only and share it? Then a countercheck might make sense.

  55. SebastianFloKa commented at 11:32 am on April 11, 2021: none
    @b068931cc450442b63f5b3d276ea4297 I filtered your proposal for singular nouns, ran the spell checker (Puzzel, Rinnsaal), excluded some negative words, excluded words mentioned in other wordlists, excluded homophones, excluded collisions concerning first 4 letters-rule & excluded words with levenshtein errors: 210411-BIP39-German-Wordlist_backupwords.txt In a next step we can replace some lower quality words from the current list with some of your words.
  56. b068931cc450442b63f5b3d276ea4297 commented at 5:14 pm on April 11, 2021: none
    @SebastianFloKa Thank you. In which form do we want to do this best and why do you actually only want nouns?
  57. Update german.txt
    Improvement loop related to input of @b068931cc450442b63f5b3d276ea4297: replacing uncommon / difficult words + reducing "homophone risky words" + reducing amount of words starting with AB***
    Thanks for approving if OK or leaving comments if NOK. Also @thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith
    ea11524399
  58. SebastianFloKa commented at 1:53 pm on April 12, 2021: none

    @SebastianFloKa Thank you. In which form do we want to do this best

    See latest proposal - leave comments in case you disagree or approve if OK.

    and why do you actually only want nouns?

    There are advantages for brainwallets, but these can be neglected as brainwallets aren’t recommended except for special situations. Mainly it’s the same reason why the effort regarding Levenshtein distance is made: to reduce room for misinterpretation, which might cause loss of money.

    • Somebody aware of this special consideration would very likely recognize if a wrong word that violates this structure (adjective, verb, etc.) was written down accidentally.
    • Somebody not aware of this special consideration might get cautious if a “non singular noun” occurs whereas the other 23 words are singular nouns (even if a few words in the wordlist exist as verbs etc. as well). Not guaranteed that the error will be recognized, of course, but chances are increased.
    • Same for reconstructing certain words from a partly damaged wallet: It significantly reduces choices if you need to search for a singular noun only.

    @thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith @b068931cc450442b63f5b3d276ea4297 Thumbs up (or approve changes) in case you agree to the latest modifications in the list (or leave comments if not) and we could then go for a third-party check concerning Levenshtein distance etc.
  59. thomasklemm approved
  60. b068931cc450442b63f5b3d276ea4297 commented at 5:37 pm on April 12, 2021: none

    @SebastianFloKa I think that there should be another fundamental discussion about whether it makes sense to omit verbs and adjectives or not. The other word lists also work with adjectives and verbs and the omission only unnecessarily restricts the possible words.

    Somebody aware of this special consideration would very likely recognize if a wrong word that violates this structure (adjective, verb, etc.) was written down accidentally. Somebody not aware of this special consideration might get cautious if a “non singular noun” occurs whereas the other 23 words are singular nouns (even if a few words in the wordlist exist as verbs etc. as well). Not guaranteed that the error will be recognized, of course, but chances are increased.

    By making the words as familiar as possible and known to everyone, you probably also reduce the risk of people making mistakes. Words like kurz, lang, rot, blau, laufen, gehen, stehen are known to elementary school students, while words like akazie, amnestie, anagramm, annexion and anode (these are only examples) are far less known.

    Same for reconstructing certain words from a partly damaged wallet: It significantly reduces choices if you need to search for a singular noun only.

    This is not true, because the used words are always n of 2048 possible words of the list (if only this list was used). So it doesn’t matter if there were only nouns or nouns, verbs and adjectives. @thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith What do you guys say, should only nouns be used or also adjectives and verbs in their base form?

  61. rodasmith commented at 7:21 pm on April 12, 2021: none

    I’m satisfied with the outcome of the earlier conversation in #942 that concluded to use nouns only, avoiding confusion around capitalization. Here’s an excerpt from that conversation:

    One of the advantages of not having inflections would be for the “standard user” to reduce the risk of accidentally misinterpreting a word. Example: Your seed starts with “FUCHS, GELB, LAUFEN” and then follows an “ANGLE” (1st person singular). Some people might either overlook this and assume the noun “ANGEL” is intended, or at least be confused about whether there might be a typo (no matter if all caps or not). Another constellation could be when two more expected words are 1 Levenshtein addition or subtraction away. Example: “FANGE” (1st person singular) with the expected “FANG” (noun) or “FANGEN” (verb) alongside. This is the advantage I see behind reducing the number of word classes (and limiting to infinitives): it reduces the risk of misinterpretation when copying, writing down, communicating or reading a seed.

    The decision seemed good then and I don’t see any reason to revisit it.
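
For readers who want to check the examples in the excerpt, here is a minimal sketch of the plain Levenshtein distance in Python. The function names `levenshtein` and `close_pairs` are illustrative only and are not taken from any tool used in this pull request. The sketch assumes the classic definition with insertions, deletions and substitutions, so a swap of two neighbouring letters (ANGLE/ANGEL) counts as 2 edits and would only be flagged by a transposition-aware variant.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def close_pairs(words, min_distance=2):
    """Return all word pairs whose distance is below min_distance."""
    words = sorted(words)
    return [(a, b)
            for i, a in enumerate(words)
            for b in words[i + 1:]
            if levenshtein(a, b) < min_distance]


if __name__ == "__main__":
    # Hypothetical candidates taken from the examples in the excerpt.
    print(close_pairs(["FANG", "FANGE", "FANGEN", "ANGEL"]))
    print(levenshtein("ANGLE", "ANGEL"))
```

With these candidates, FANG/FANGE and FANGE/FANGEN are reported as too close, while FANG/FANGEN are two edits apart.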

  62. rodasmith approved
  63. b068931cc450442b63f5b3d276ea4297 commented at 7:52 pm on April 12, 2021: none

    to use nouns only, avoiding confusion around capitalization.

    Capitalization is not a problem because the words in the lists are always all lowercase (except @SebastianFloKa who writes everything in capital letters). I think this is also an important point that should be discussed again. I do not share the opinion of @SebastianFloKa:

    I think there’s a misunderstanding. I meant to say “all caps” instead of “capital letters”, sorry for this. I think our intentions are very close together as I also don’t want people to mix lower case and upper case - agree 100%. But writing nouns in lower case is quite uncommon for German speakers whereas filling out templates in upper case (all caps) is much more common (official documents etc.). From your example:

    klage farbe anzahl initiative stieg banane seide holt gesagt ahnen

    KLAGE FARBE ANZAHL INITIATIVE STIEG BANANE SEIDE HOLT GESAGT AHNEN

    All previous lists are in lowercase and with nouns, verbs and adjectives.

  64. TZocker commented at 10:18 pm on April 12, 2021: none

    @SebastianFloKa I don’t think it is all that decisive whether verbs etc. are used as well; we should follow the other BIPs. That gives us more alternatives. Levenshtein collisions will rule out a lot. Mnemonic sentences would then be possible…

    Sorry, when I made my suggestion I had not understood the Levenshtein point. Sorry… I would also follow your remark about Lorbeere and use the plural there.

    I would rather put the emphasis on reducing the vocabulary to the level of a 12-year-old, and avoid Germanized loanwords such as trainer/viper etc. so as not to conflict with other BIPs. Likewise prefer native animals, and cut down on archaic terms.

    Words like Zwinger and Ritze should perhaps still be replaced (ambiguity).

    Kind regards

  65. SebastianFloKa commented at 9:28 pm on April 19, 2021: none

    This is not true, because the used words are always n of 2048 possible words of the list (if only this list was used). So it doesn’t matter if there were only nouns or nouns, verbs and adjectives.

    Somebody reconstructing a partially destroyed wallet will appreciate fewer choices of word categories once it comes to guessing hard-to-read words (e.g. after a house fire etc.). Not a super-important advantage, true, but mentioned anyway.

    we should follow the other BIPs.

    Well, simply doing the same thing would mean an enormous amount of Levenshtein errors (English wordlist) or unintended 9 letters per word (Italian wordlist) etc., so you probably mean to focus on the positive progress of other wordlists. It was the need of the people that the authors of the BIP39 mnemonic seed had in mind when designing this solution rather than accepting current realities of that time. I’m therefore convinced (but in the end it’s their decision) that the authors value the people’s cultural background in orthography (like writing nouns in “capital latin letters”) more than simply following given structures from other language lists that happen not to have such a background (Latin languages & English use “lowercase latin characters” for all). Asian wordlists for example deviate from “lowercase latin characters” for exactly that reason.

    Q: Is there any advantage for people (people, not for the IT behind it, which can handle capital letters) in the German language area to write words in the more uncommon “all lowercase” that we might not have taken into consideration yet?

    Q: Do you require to have adjectives and/or verbs in the list or is it because we might not find sufficient easy nouns? I’m asking because I’m generally open to adjectives & verbs (were included in first two proposals 2 years ago), just had the impression it has quite some advantages for the community / users to go for nouns only. And particularly when taking levenshtein into consideration many verbs fall apart anyway. Example: “leben”: kleben, loben, heben, weben, geben, beben, Leber, Segen, etc.

    About certain words:

    Foreign words: @TZocker Picking out all foreign words is almost impossible; even Onkel is actually a foreign word. Words sounding too “foreignish” were already eliminated.

    Trainer: a borderline word. I would have said this is still OK, as it is listed in the Duden as very common. But TBD.

    Viper: listed in the Duden as “mittelhochdeutsch”, pronounced the German way (differently from English) and with a Latin background long ago. For me OK, TBD.

    Zwinger / Ritze: I am not aware of an inappropriate meaning, particularly for Zwinger, but OK to discuss/change.

    Generally: Q: Isn’t it acceptable if once in a while somebody has to look up a word because he/she doesn’t remember its exact meaning or definition, if he/she is really interested in it? Even including verbs & adjectives, it’s impossible to ensure that really everybody will know the exact meaning of every single word in the list. The quality of words is a subjective topic. Example: I would have said Akazie is less risky to spell incorrectly than e.g. Pyjama from your list. But generally it’s true, your list contains fewer uncommon words. Therefore the proposal was and is to highlight our “no-go words” and try to replace them. Thanks to the supplementary words from @b068931cc450442b63f5b3d276ea4297 we even have some more backup words to work with. I will update the backup-words list soon. Actually, 3 of the words you mentioned are also among the ten worst words in the current proposal, based on my subjective perspective. 210419-10 worst words in german wordlist.txt But if there are too many more, yes, we might have to think about adjectives & verbs.

    Proposal: @b068931cc450442b63f5b3d276ea4297 Could you and the others imagine that we keep going through the critical words step by step, filter out the really unacceptable ones and try to replace them with better ones from your list + my backup list?

  66. b068931cc450442b63f5b3d276ea4297 commented at 7:23 pm on April 20, 2021: none

    Somebody reconstructing a partially destroyed wallet will appreciate fewer choices of word categories once it comes to guessing hard-to-read words (e.g. after a house fire etc.). Not a super-important advantage, true, but mentioned anyway.

    No, the word category does not play a role but only the word list (incl. used characters and the length of the words).

    Well, simply doing the same thing would mean an enormous amount of Levenshtein errors (English wordlist) or unintended 9 letters per word (Italian wordlist) etc., so you probably mean to focus on the positive progress of other wordlists. It was the need of the people that the authors of the BIP39 mnemonic seed had in mind when designing this solution rather than accepting current realities of that time. I’m therefore convinced (but in the end it’s their decision) that the authors value the people’s cultural background in orthography (like writing nouns in “capital latin letters”) more than simply following given structures from other language lists that happen not to have such a background (Latin languages & English use “lowercase latin characters” for all). Asian wordlists for example deviate from “lowercase latin characters” for exactly that reason.

    Q: Is there any advantage for people (people, not for the IT behind it, which can handle capital letters) in the German language area to write words in the more uncommon “all lowercase” that we might not have taken into consideration yet?

    According to my understanding, the advantage is that you don’t have to worry about upper and lower case and therefore write everything in lower case in such lists. This is the case with the other bip39 lists, with diceware lists like https://www.eff.org/deeplinks/2016/07/new-wordlists-random-passphrases and many other projects.

    Q: Do you require to have adjectives and/or verbs in the list or is it because we might not find sufficient easy nouns? I’m asking because I’m generally open to adjectives & verbs (were included in first two proposals 2 years ago), just had the impression it has quite some advantages for the community / users to go for nouns only. And particularly when taking levenshtein into consideration many verbs fall apart anyway. Example: “leben”: kleben, loben, heben, weben, geben, beben, Leber, Segen, etc

    Both. Including them in the list matches the word-choice conventions of the other languages, and it enlarges the pool from which we can pick words that most 12-year-olds know.

    Proposal: We go through the list as it is, mark the words we consider critical/inappropriate, and add adjectives and advertisements. From this we then select the best, write everything in lower case and are done?

    Do you use https://bip39validator.readthedocs.io/en/latest/running.html for the tests?

  67. nisc approved
  68. luke-jr commented at 7:35 pm on April 25, 2021: member
  69. luke-jr added the label Proposed BIP modification on Apr 25, 2021
  70. b068931cc450442b63f5b3d276ea4297 commented at 7:50 pm on April 25, 2021: none
    Neither the list nor the discussion about it is closed from my point of view. If this list gets merged in this form, it would be a missed opportunity.
  71. SebastianFloKa commented at 8:45 pm on April 25, 2021: none

    Thanks @luke-jr and the other BIP39 authors and maintainers, and “welcome”. @b068931cc450442b63f5b3d276ea4297 No worries, “proposed BIP modification” doesn’t mean it’s merged.

    No, the word category does not play a role but only the word list (incl. used characters and the length of the words).

    Have you ever tried to recover a partially destroyed wallet (e.g. after a fire) where e.g. the first letter of a word is illegible, as well as some at the end or in the center? A normal user doesn’t have a tool to filter for words with certain letters at certain positions. That means the user will have to guess possible words. So it’s easier for him to search for a noun only instead of nouns, verbs and adjectives. It’s not a must-have or the most important feature, but a small advantage.

    Do you use https://bip39validator.readthedocs.io/en/latest/running.html for the tests?

    no, running my own - but this might be good to work with.

    We go through the list as it is, mark the words we consider critical/inappropriate, and add adjectives and advertisements. From this we then select the best, write everything in lower case and are done?

    What do you mean by advertisement? Generally it is OK to go through the list and pick out inappropriate words, of course. As for lower case, I’m personally not convinced yet, and I am not sure about the others. It feels very strange for people from the German language area to write nouns in lower case, plus the other reasons (people write more legibly in all caps etc.). This will also be part of the BIP39 authors’ decision later. I’m fine to continue step by step (as we have been doing for years now), just let me replace the 10 words mentioned above with other nouns first (I need a bit of time) and then go through the list again.

  72. b068931cc450442b63f5b3d276ea4297 commented at 7:34 am on April 29, 2021: none

    Thank you.

    Have you ever tried to recover a partially destroyed wallet (e.g. after a fire) where e.g. the first letter of a word is illegible, as well as some at the end or in the center? A normal user doesn’t have a tool to filter for words with certain letters at certain positions. That means the user will have to guess possible words. So it’s easier for him to search for a noun only instead of nouns, verbs and adjectives. It’s not a must-have or the most important feature, but a small advantage.

    I haven’t, but whether it’s a noun, verb or adjective doesn’t matter at all, since it is 1 of the 2048 that are in the list.

    Sorry, I meant verbs and wanted to write an example with werben/Werbung (advertisement) first. With your list, I have already submitted as a pull request what I would remove and what I would add if necessary.

    I am currently working on another list, which could help if we want to add verbs and adjectives.

    For me it feels strange to see and write everything in capital letters. Even when we write normally, most of the letters in any ordinary sentence are lowercase. I also find the comparison with application forms a bit far-fetched: I think everyone writes far more lowercase in letters, messengers and everywhere else, and finds it rather strange when someone writes everything in capital letters with caps lock on.

  73. Update german.txt
    Eliminated the words with highest complexity and replaced with simpler ones.
    d97608bf86
  74. SebastianFloKa commented at 2:34 pm on May 4, 2021: none

    I haven’t, but whether it’s a noun, verb or adjective doesn’t matter at all, since it is 1 of the 2048 that are in the list.

    Of course each word in the list is 1 of 2048, but in my example the “word pool” for the user is not the list but all words. Let’s have an example: A steel wallet went through a house fire and some words are not completely readable anymore. For one word, the second letter is readable as “L”, the third is “A”, the fourth is “T”, and the first letter and the ending are unknown (?LATT???). The user has two options: A) go through the complete list line by line and check whether each word might fit. Or, the much more realistic scenario, B) “guess” which word could be meant. In our case the noun “BLATT” might come to mind, and you will check in the wordlist directly under “B” whether this is a possible solution. If verbs & adjectives are also included, there are more choices to look up and it will be more time-consuming to figure out which one is intended: “glatt, platt, flattern, etc.”. Again: this is only a minor advantage in favor of “nouns only”, supplementary to the other ones mentioned before (so this alone wouldn’t justify going for nouns only).

    The expectation of limiting complexity to a certain age (e.g. 12-year-olds) sounds nice, but I couldn’t find a source for a correlation between “age” and “words”, which means it will remain our subjective decision which words to accept.

    Having a few words at a 16-year-old level would statistically mean that every once in a while a newly created wallet includes one or a few words that the user would need to look up (if they are even interested). So far we said this disadvantage is worth all the advantages gained by nouns-only; it makes sense to go through the history of this discussion to get an understanding. But if the community disagrees and requests many words to be replaced rather than only a few, I’m open to reworking the list accordingly, of course.

    What’s your positions on this? Or do you want a survey? @thomasklemm @TZocker @DivineDominion @nisc @neox5 @rodasmith @b068931cc450442b63f5b3d276ea4297

  75. b068931cc450442b63f5b3d276ea4297 commented at 10:25 am on May 5, 2021: none

    If I can still read “?LATT???” from the letters I open the list with the 2048 words, press Ctrl+F and enter “LATT”. It really doesn’t matter to which word category the word belongs. I don’t have to go through line by line, and even if I do, it’s easier than picking out a much larger number of nouns from the Duden, for example.
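
The Ctrl+F lookup described above can be sketched as a small Python filter. The function name `candidates`, the pattern syntax ('?' for one illegible letter, '*' for an illegible run of unknown length) and the sample words are assumptions made for illustration; the sample is not the proposed wordlist.

```python
import re


def candidates(wordlist, pattern):
    """Return the words that fit a partially legible word.

    In `pattern`, '?' stands for exactly one illegible letter and
    '*' for an illegible run of unknown length (possibly empty).
    """
    regex = "".join(
        "[A-Z]" if ch == "?" else "[A-Z]*" if ch == "*" else re.escape(ch)
        for ch in pattern.upper()
    )
    compiled = re.compile(regex)
    return [w for w in wordlist if compiled.fullmatch(w.upper())]


if __name__ == "__main__":
    # Hypothetical excerpts: a mixed list and a nouns-only list.
    mixed = ["BLATT", "GLATT", "PLATT", "FLATTERN", "LATTE", "BLUME"]
    nouns_only = ["BLATT", "LATTE", "BLUME"]
    print(candidates(mixed, "?LATT*"))       # several words remain
    print(candidates(nouns_only, "?LATT*"))  # only BLATT remains
```

Either way the search runs over the 2048 list entries only; the word class affects how many of them survive the filter, not how the filter works.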

    I see no advantages but many disadvantages in choosing a list of nouns only.

    The 12 years was just an example. The simpler and more widespread the words are, the better. You can also look at “basic vocabulary” and “extended basic vocabulary”, just like the linguistic levels A-B.

    So far we said this disadvantage is worth all the advantages gained by nouns-only …

    No, I think you are the only one who says/writes that.

  76. nisc commented at 7:10 am on May 6, 2021: none

    If I can still read “?LATT???” from the letters I open the list with the 2048 words, press Ctrl+F and enter “LATT”.

    I think it’s a tough call. Most people today wouldn’t know how BIP39 works and that there’s a pre-defined list of 2048 words, with each word in the 24-word mnemonic representing 11 bits of a 256+8 bit seed (“What is a Bit?”).
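
The arithmetic behind “256+8” can be written out as a short Python sketch. It follows the BIP39 rule that the checksum has one bit per 32 bits of initial entropy and that every word encodes 11 bits; in BIP39 terms the 256 bits are the entropy and the extra 8 bits are the checksum. The function name `mnemonic_layout` is made up for this illustration.

```python
BITS_PER_WORD = 11  # 2**11 == 2048 entries per wordlist


def mnemonic_layout(entropy_bits: int) -> dict:
    """Return checksum size, total bits and word count for an entropy size."""
    if entropy_bits % 32 != 0 or not 128 <= entropy_bits <= 256:
        raise ValueError("entropy must be 128 to 256 bits, in steps of 32")
    checksum_bits = entropy_bits // 32
    total_bits = entropy_bits + checksum_bits
    return {
        "entropy_bits": entropy_bits,
        "checksum_bits": checksum_bits,
        "total_bits": total_bits,
        "words": total_bits // BITS_PER_WORD,
    }


if __name__ == "__main__":
    for size in (128, 160, 192, 224, 256):
        print(mnemonic_layout(size))
```

For 256 bits of entropy this gives an 8-bit checksum, 264 bits in total and 24 words.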

    Other people wouldn’t realize that there’s a pattern, i.e., that the seed only includes nouns.

    I slightly prefer the nouns only version. I think more people see the only-nouns pattern than the 264 bits.

    In the end it really doesn’t matter too much, though. If people lose a lot of money, they’ll seek help. Someone will be able to explain it to them.

  77. bitcoin deleted a comment on Jun 13, 2021
  78. bitcoin deleted a comment on Jun 13, 2021
  79. phuong1143 approved
  80. luke-jr closed this on Jul 2, 2021

  81. ngima cross-referenced this on Dec 17, 2021 from issue Add German dictionary by ngima
  82. peterhgruber commented at 7:53 pm on January 8, 2022: none

    Thanks for the effort. Two considerations:

    1. I strongly advise all lowercase. I understand that german nouns in lowercase might look unfamiliar, but uppercase has distinct and really bad disadvantages. First, words in all caps are much harder to read (as our mind reads more word contours than individual letters) and second from a practical point of view writing all caps e.g. on an iPhone is a hassle.
    2. Is there really such a necessity for excluding words that appear on wordlists in other languages? This leads to choosing “Fotograf” over “Foto” (I assume). If it were necessary (as e.g. many wallet apps have no setting for the language), then one would need to be stricter, i.e. exclude all words that have an identical counterpart in any word list when only considering the first four letters (thus excluding “Fotograf” as well).
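
The stricter rule from point 2 (no shared first four letters across languages) could be checked with a sketch like the following. The function name `prefix_collisions`, the list names and the sample words are hypothetical; they do not claim anything about the contents of the official wordlists.

```python
from collections import defaultdict


def prefix_collisions(lists, prefix_len=4):
    """Map each prefix shared across languages to the entries using it."""
    by_prefix = defaultdict(set)
    for language, words in lists.items():
        for word in words:
            key = word.lower()[:prefix_len]
            by_prefix[key].add((language, word.lower()))
    # Keep only prefixes that occur in more than one language.
    return {
        prefix: sorted(entries)
        for prefix, entries in by_prefix.items()
        if len({language for language, _ in entries}) > 1
    }


if __name__ == "__main__":
    # Hypothetical excerpts, not real wordlist contents.
    sample = {
        "german_draft": ["fotograf", "blatt"],
        "other_list": ["foto", "blu"],
    }
    print(prefix_collisions(sample))
```

Here “fotograf” and “foto” collide on the prefix “foto”, which is the case the comment describes.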
  83. joshuakraemer commented at 3:20 pm on October 29, 2022: none
    Thanks as well! I, as a German, would much prefer all uppercase instead of all lowercase. All lowercase doesn’t conform to the rules of orthography, and traditionally uppercase letters are used if only one case is allowed (e.g. in forms or crosswords). Word contours will be wrong with all lowercase, as nouns are normally written with a capital at the beginning. Anyway, in the case of this word list, correctly reading every single letter is probably more important than quickly reading whole words. Maybe all uppercase is even advantageous for this purpose.
  84. PeterTheOne commented at 11:07 am on October 30, 2023: none
    Why exclude umlauts and ß? They are part of the language. They could of course be considered equal to their non-umlaut counterparts (äöü -> aou and ß -> ss), as is the case in other languages’ wordlists. It just seems like an arbitrary constraint.
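
The folding suggested in this comment (äöü -> aou and ß -> ss) can be sketched in a few lines of Python. The function name `fold_german` is illustrative. The sketch assumes the mapping given in the comment rather than the ae/oe/ue transliteration, and it composes the input with NFC first so that an umlaut typed as base letter plus combining diaeresis is folded as well. Whether folding creates collisions between distinct words would still need to be checked against the actual list.

```python
import unicodedata

# Mapping from the comment above: umlauts lose their dots, ß becomes ss.
_FOLD = str.maketrans({
    "ä": "a", "ö": "o", "ü": "u", "ß": "ss",
    "Ä": "A", "Ö": "O", "Ü": "U", "ẞ": "SS",
})


def fold_german(word: str) -> str:
    """Fold umlauts and ß to plain A-Z letters."""
    composed = unicodedata.normalize("NFC", word)
    return composed.translate(_FOLD)


if __name__ == "__main__":
    for word in ("Straße", "grün", "KÖNIG"):
        print(word, "->", fold_german(word))
```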

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-24 05:10 UTC
