These are some of the restrictions we followed to create the Spanish wordlist:
- It is possible to uniquely identify a word by typing only its first four characters or fewer.
- Spanish words have special characters (such as á, é, ü, ñ, etc.) that are not available on every keyboard. It is possible to type 'a' instead of 'á', 'n' instead of 'ñ', etc., and still identify the word by its first four characters. This means that, for example, 'caña' and 'canal' cannot both be in the wordlist.
- Some combinations of letters sound similar. No two words in the list may differ only in those combinations of letters. For example, 'hola' and 'ola' cannot both be in the wordlist, and neither can 'quiosco' and 'kiosko'.
- Only the standard forms of verbs (infinitive), nouns and adjectives are in the list.
- The more common words are preferred over less common words.
- Shorter, simpler words are preferred over longer words. For example, 'libro' is preferred over 'librería'.
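
The first two rules above can be checked mechanically. The following is a minimal sketch, not the project's actual test code: the function names `strip_accents` and `prefix_collisions` are hypothetical, and it assumes that Unicode decomposition is an acceptable way to map accented letters to their plain equivalents.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Map accented letters to their base letter (á -> a, ñ -> n, ü -> u)."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks left behind by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prefix_collisions(words, length=4):
    """Return groups of words sharing the same accent-stripped prefix.

    Hypothetical helper: words shorter than `length` are compared whole.
    """
    groups = {}
    for word in words:
        key = strip_accents(word.lower())[:length]
        groups.setdefault(key, []).append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}


# 'caña' and 'canal' both reduce to the prefix 'cana', so they collide.
print(prefix_collisions(["caña", "canal", "libro"]))
```

An empty result means every word in the candidate list can be identified by its first four unaccented characters.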
There are more restrictions. The full set of restrictions is defined in Python here: https://github.com/Y75QMO/Tests-for-BIP39-Spanish-wordlist.
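
As an illustration only, and not taken from the repository linked above, the similar-sound rule could be approximated by reducing each word to a rough "sound key" and looking for duplicates. The substitutions below (silent 'h', 'qu' and hard 'c' treated as 'k') are assumptions derived from the two examples given in the list; a real check would need a fuller set of Spanish spelling rules. The names `sound_key` and `sound_collisions` are hypothetical.

```python
import re


def sound_key(word: str) -> str:
    """Reduce a word to a crude phonetic key (illustrative rules only)."""
    key = word.lower()
    key = key.replace("qu", "k")            # 'qu' sounds like 'k'
    key = re.sub(r"c(?![eih])", "k", key)   # hard 'c' (not before e, i, h)
    key = re.sub(r"(?<!c)h", "", key)       # silent 'h', but keep 'ch'
    return key


def sound_collisions(words):
    """Return groups of words that share the same sound key."""
    groups = {}
    for word in words:
        groups.setdefault(sound_key(word), []).append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Both example pairs from the rules collapse to a single key each.
print(sound_collisions(["hola", "ola", "quiosco", "kiosko", "libro"]))
```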
We scanned more than 600,000 words with software and manually reviewed more than 40,000 of them one by one. Choosing 2,048 words that follow all the rules is not an easy task, and more than one valid list is possible. Any help with or review of the wordlist is appreciated.