This addresses task number 1 in issue #17020, "Read ASNs from a file instead of hardcoding".
The function 'lookup_asn' has been replaced, in a drop-in manner, by a tab-separated value (.tsv) file loader class and its supporting functions. Two files were added. The first, 'ansdecode.py', contains a class that loads a .tsv file and has member functions that replace the existing ASN lookups, which are performed by parsing http://www.team-cymru.com/IP-ASN-mapping.html. The second file is the data file, ip2asn.tsv by default. Each line of this file has the format rangeStart\trangeEnd\tasnNumber\t...\n. Only the first 3 columns of the TSV are loaded; additional columns can be included, but they only make importing less efficient.
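To illustrate the range lookup described above, here is a minimal sketch of a loader that reads only the first three tab-separated columns and answers lookups with a binary search. The class name `AsnTable`, its `lookup` method, and the sample rows are hypothetical and are not taken from 'ansdecode.py'. The sketch also assumes that the range columns hold textual IP addresses, and the sample uses documentation-reserved addresses and AS numbers rather than real data.

```python
import bisect
import io
import ipaddress


class AsnTable:
    """Hypothetical range table: maps an IP address to an ASN."""

    def __init__(self, lines):
        rows = []
        for lineno, line in enumerate(lines, start=1):
            line = line.rstrip("\n")
            if not line:
                continue
            cols = line.split("\t")
            # Only the first three columns are used; extra columns are ignored.
            if len(cols) < 3:
                raise ValueError(f"line {lineno}: expected >= 3 columns: {line!r}")
            start = ipaddress.ip_address(cols[0])
            end = ipaddress.ip_address(cols[1])
            if start.version != end.version or int(start) > int(end):
                raise ValueError(f"line {lineno}: invalid range: {line!r}")
            rows.append((start.version, int(start), int(end), int(cols[2])))
        # Sort by (IP version, range start) so lookups can use bisect.
        rows.sort()
        self._rows = rows
        self._keys = [(version, start) for version, start, _, _ in rows]

    def lookup(self, ip):
        """Return the ASN whose range contains ip, or None if there is none."""
        addr = ipaddress.ip_address(ip)
        # Find the last range that starts at or before this address.
        i = bisect.bisect_right(self._keys, (addr.version, int(addr))) - 1
        if i < 0:
            return None
        version, start, end, asn = self._rows[i]
        if version == addr.version and start <= int(addr) <= end:
            return asn
        return None


# Illustrative rows only (documentation address ranges and AS numbers).
SAMPLE = (
    "192.0.2.0\t192.0.2.255\t64496\tZZ\texample-a\n"
    "198.51.100.0\t198.51.100.255\t64497\tZZ\texample-b\n"
    "2001:db8::\t2001:db8::ffff\t64498\tZZ\texample-c\n"
)
table = AsnTable(io.StringIO(SAMPLE))
```

With a real file, the same class could be constructed from an open file handle, for example `AsnTable(open("ip2asn.tsv"))`, since it only iterates over lines.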
Advantages
- Faster: on an Intel Xeon 2699v4 running Linux, run time was reduced from ~2:40 to ~2:05
- Can be used without an internet connection
- Auditable data format
- Asserts on class load, with ample information to identify broken or corrupt lines in the input file
- IPv4/IPv6 detection is included and could replace the manual definitions in makeseeds.py
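
As a rough sketch of the IPv4/IPv6 detection mentioned in the last point, the standard-library `ipaddress` module can classify an address literal without hand-written patterns. The helper name `ip_version` is hypothetical, and how 'ansdecode.py' actually performs the detection is not shown here.

```python
import ipaddress


def ip_version(host):
    """Return 4 or 6 for an IP literal, or None if host is not an IP address."""
    try:
        # Square brackets around IPv6 literals (e.g. "[::1]") are stripped first.
        return ipaddress.ip_address(host.strip("[]")).version
    except ValueError:
        return None
```

Anything that is not an IP literal, such as an onion hostname, yields None and would still need separate handling.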
Disadvantages
- The unmodified, uncompressed ASN database is 26MB (https://iptoasn.com/; the data is public domain)
Notes
- ansdecode.py can be merged into makeseeds.py at the cost of additional file length
- A class is used deliberately, to make testing and auditing of the IP hex string to integer conversion possible
- The TSV file could be converted to a CSV easily
- The TSV file could be downloaded at runtime instead of being included