How the wordlist is produced:
- Download and uncompress https://dumps.wikimedia.org/idwiki/latest/idwiki-latest-pages-articles.xml.bz2 (the version used when producing the wordlist is 2017-121-30).
- Count the words in all the articles in the uncompressed XML dump using this script: https://github.com/perlancar/perl-WordLists-ID-Common/blob/master/devscripts/count-words-in-mediawiki-articles . The result is https://raw.githubusercontent.com/perlancar/perl-WordLists-ID-Common/master/devdata/words.txt .
- Curate the words manually (mostly by removing non-Indonesian words). The result is https://raw.githubusercontent.com/perlancar/perl-WordLists-ID-Common/master/devdata/words-curated.txt . You can diff these two wordlist text files to see what changed.
- Generate the BIP39 Indonesian wordlist using this script: https://github.com/perlancar/perl-WordList-ID-BIP39/blob/master/devscripts/gen-wordlist . The script picks the most frequent words in words-curated.txt that are not already in the English, Spanish, French, or Italian BIP39 wordlists.
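
The counting and selection steps above can be illustrated with a minimal Python sketch. It is not the actual Perl scripts linked above. The function names `count_words` and `pick_wordlist`, the ASCII-letters-only word pattern, and the final alphabetical sort are assumptions made for illustration. The default size of 2048 is the length of a standard BIP39 wordlist. The sketch takes plain article text as input and skips the MediaWiki XML parsing entirely.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, compared case-insensitively.
WORD_RE = re.compile(r"[a-z]+")


def count_words(texts):
    """Count word occurrences across an iterable of article texts."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


def pick_wordlist(counts, excluded, size=2048):
    """Pick the `size` most frequent words that are not in `excluded`.

    `excluded` stands in for the union of the other-language BIP39
    wordlists. The result is sorted alphabetically (an assumption).
    """
    picked = []
    # most_common() yields (word, count) pairs, highest count first.
    for word, _count in counts.most_common():
        if word in excluded:
            continue
        picked.append(word)
        if len(picked) == size:
            break
    return sorted(picked)
```

To use it, pass the article texts to `count_words`, then pass the resulting counter and a set of already-used words to `pick_wordlist`. The manual curation step has no equivalent here. In the real process it happens between counting and selection.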