net: 0.19 seeds update #16999

pull laanwj wants to merge 7 commits into bitcoin:master from laanwj:2019_10_seeds_update changing 3 files +1519 −2502
  1. laanwj commented at 5:31 pm on September 30, 2019: member
    • contrib: Improve makeseeds script
    • net: 0.19 hardcoded seeds update

    Sources:

    Output:

    Initial: IPv4 418690, IPv6 55861, Onion 2747
    Skip entries with invalid address: IPv4 418690, IPv6 55861, Onion 2747
    After removing duplicates: IPv4 409220, IPv6 54028, Onion 2717
    Skip entries from suspicious hosts: IPv4 409219, IPv6 54028, Onion 2717
    Enforce minimal number of blocks: IPv4 106719, IPv6 46342, Onion 2621
    Require service bit 1: IPv4 106384, IPv6 46241, Onion 2542
    Require minimum uptime: IPv4 5300, IPv6 1153, Onion 201
    Require a known and recent user agent: IPv4 4642, IPv6 1060, Onion 141
    Filter out hosts with multiple bitcoin ports: IPv4 4642, IPv6 1060, Onion 141
    Look up ASNs and limit results, both per ASN and globally: IPv4 464, IPv6 48, Onion 141
    
  2. laanwj commented at 5:32 pm on September 30, 2019: member
    Only 48 IPv6 seeds might be slightly concerning, dunno. It looks like they’re nearly all eliminated in the ASN filter step (introduced in #15840).
  3. DrahtBot added the label P2P on Sep 30, 2019
  4. DrahtBot added the label Scripts and tools on Sep 30, 2019
  5. DrahtBot added the label Validation on Sep 30, 2019
  6. laanwj removed the label Validation on Sep 30, 2019
  7. Sjors commented at 7:36 pm on September 30, 2019: member

    The script output table might be easier to read if you put the numbers first (ideally padded). The suspicious hosts check removes suspiciously few entries :-)

    Can you print a breakdown of the IPv6 ASNs and node count?

  8. fanquake added this to the milestone 0.19.0 on Sep 30, 2019
  9. laanwj commented at 5:19 am on October 1, 2019: member

    The suspicious hosts check removes suspiciously few entries :-)

    Yea, only 16 “suspicious hosts” are defined in the script, which were added a long time ago. TBH I don’t really think it makes much sense to maintain such a list in the repository (would make sense to pass it in as an external file, I think).

    (But, this is not the time for a large discussion about this script, that can be done at any time between releases, the point now is to get a list of nodes that is acceptable for merge and is at least more recent than the current one that was last updated for 0.17)

    Can you print a breakdown of the IPv6 ASNs and node count?

    Will have a look at that later.

  10. laanwj commented at 9:00 am on October 1, 2019: member

    It looks like the problem is not the ASN sifting itself, but the way the limiting works. There used to be a limit of 512 IPv4 peers, and no limit on IPv6 or Tor peers (because there are only so few of the latter). However the PR to include the IPv6 addresses in the ASN sifting also included them in the 512 limit. It’s split more or less according to how many IPv4/IPv6 there are.

    Will make limits per-network instead.
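
A minimal sketch of what per-network limiting could look like, as opposed to one shared cap across IPv4 and IPv6. This is not the code from the PR: the function name, the dict layout (`net`, `ip` keys) and the injected `lookup_asn` callable are assumptions made for illustration, and the defaults only mirror the numbers mentioned in this thread (2 per ASN, 512 per network).

```python
import collections

def limit_per_asn_and_net(ips, lookup_asn, max_per_asn=2, max_per_net=512):
    """Keep at most max_per_asn entries per ASN and max_per_net per network.

    `ips` is a list of dicts with 'net' and 'ip' keys (assumed layout).
    `lookup_asn(net, ip)` is a caller-supplied, hypothetical function that
    returns an ASN or None; onion entries are never looked up.
    """
    result = []
    net_count = collections.defaultdict(int)
    asn_count = collections.defaultdict(int)
    for ip in ips:
        net = ip['net']
        if net_count[net] >= max_per_net:
            continue  # this network already reached its own quota
        if net in ('ipv4', 'ipv6'):
            asn = lookup_asn(net, ip['ip'])
            if asn is None or asn_count[asn] >= max_per_asn:
                continue  # unknown ASN, or ASN already represented enough
            asn_count[asn] += 1
        net_count[net] += 1
        result.append(ip)
    return result
```

Because each network keeps its own counter, a large IPv4 input can no longer use up the quota that IPv6 entries would otherwise fill.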

  11. contrib: makeseeds: Improve logging and filtering
    - Change regular expression to cover recent versions, as well as
      subversions with custom uacomment, and improve readability.
    - Vary uptime requirements per network (onions are allowed to have less
      uptime, to make sure we get enough of them)
    - Add deduplication step (to allow simple concatentation of multiple seeds files).
    - Log of number of nodes (per network) after every step.
    301c2b1ab5
  12. contrib: makeseeds: Factor out ASN lookup 3314d87966
  13. contrib: makeseeds: dedup by ip,port
    Handle the multiple ports per IP case (as that's a criterion later).
    c254a9ef69
  14. contrib: makeseeds: Limit per network, instead of total ed76299bea
  15. contrib: makeseeds: More fancy output 801d341f3a
  16. net: 0.19 hardcoded seeds update 3b09f2b9d9
  17. laanwj force-pushed on Oct 1, 2019
  18. laanwj commented at 9:40 am on October 1, 2019: member

    Looks better now:

      IPv4   IPv6  Onion Pass
    418690  55861   2747 Initial
    418690  55861   2747 Skip entries with invalid address
    410342  54085   2718 After removing duplicates
    410341  54085   2718 Skip entries from suspicious hosts
    107151  46381   2622 Enforce minimal number of blocks
    106814  46278   2543 Require service bit 1
      5381   1166    202 Require minimum uptime
      4690   1068    141 Require a known and recent user agent
      4655   1062    141 Filter out hosts with multiple bitcoin ports
       512    110    141 Look up ASNs and limit results per ASN and per net
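
A padded, numbers-first layout like the one above can be produced with fixed-width formatting. The sketch below is illustrative only: `net_counts` and `format_row` are hypothetical helper names, not functions from makeseeds.py, and the entry layout (a dict with a `net` key, or None for unparsable lines) is an assumption.

```python
import collections

NETS = ('ipv4', 'ipv6', 'onion')

def net_counts(ips):
    """Count entries per network; entries that failed to parse (None) are ignored."""
    counts = collections.Counter(ip['net'] for ip in ips if ip is not None)
    return [counts[net] for net in NETS]

def format_row(counts, label):
    """Right-align each count in a 6-character column, then append the pass label."""
    return ' '.join('%6d' % c for c in counts) + ' ' + label

print(format_row([418690, 55861, 2747], 'Initial'))
```

Calling `format_row(net_counts(ips), 'Some pass')` after each filtering step would give one table row per step.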
    
  19. Sjors commented at 11:48 am on October 1, 2019: member

    A max of 2 seeds per ASN seems a bit strict. E.g. AS33915 contains all Dutch Vodafone mobile (not reachable) and Ziggo cable users (long living IP). It’s fine for IPv4 because we still get the max of 512 for that net.

    Maybe make it 10 for IPv6? The odds of randomly connecting to two nodes on the same ASN still seems negligible.

    Should we bump MIN_BLOCKS as well?

  20. laanwj commented at 12:18 pm on October 1, 2019: member

    Maybe make it 10 for IPv6? The odds of randomly connecting to two nodes on the same ASN still seems negligible.

    Should we bump MIN_BLOCKS as well?

    All good suggestions, but I’d say, post-0.19. I’ve already had to do much more maintenance of this script than intended just before a release.

  21. fanquake requested review from TheBlueMatt on Oct 1, 2019
  22. Sjors commented at 2:44 pm on October 1, 2019: member

    ACK 3b09f2b

    Something to consider when we do improve the script, is that it’s not very deterministic. E.g. I ran the script on sipa’s data from a few days later plus my data (uploaded above), and 400 of the resulting nodes are different. Even though our numbers are similar at each step:

      IPv4   IPv6  Onion Pass
    418727  55869   2746 Initial
    418727  55869   2746 Skip entries with invalid address
    410413  54115   2718 After removing duplicates
    410412  54115   2718 Skip entries from suspicious hosts
    107223  46411   2622 Enforce minimal number of blocks
    106885  46308   2543 Require service bit 1
      5376   1164    202 Require minimum uptime
      4685   1066    141 Require a known and recent user agent
      4648   1060    141 Filter out hosts with multiple bitcoin ports
       512    116    141 Look up ASNs and limit results per ASN and per net
    

    I tried moving the deduplication step after the uptime and sort steps (commit). I ran it, committed, fetched a slightly newer version of sipa’s seed, and ran it again. Now only 140 nodes are different, which is still not great.

  23. MarcoFalke commented at 3:54 pm on October 1, 2019: member

    Something to consider when we do improve the script, is that it’s not very deterministic.

    See also:

  24. jonasschnelli commented at 4:23 pm on October 1, 2019: contributor
    Looks good. ACK 801d341f3a4b00633aa135407752d21ba868e37b - I just did some random manual checks against my data source (https://bitcointools.jonasschnelli.ch/dnsseed.dump.tar.gz) and it seems that a lot of the new IPs are not listed in my seeder dump. Though I guess that is okay. Did some manual checks with telnet.
  25. in contrib/seeds/makeseeds.py:180 in 301c2b1ab5 outdated
    176-    # Skip entries with valid address.
    177+    print('Initial: %s' % (ip_stats(ips)), file=sys.stderr)
    178+    # Skip entries with invalid address.
    179     ips = [ip for ip in ips if ip is not None]
    180+    print('Skip entries with invalid address: %s' % (ip_stats(ips)), file=sys.stderr)
    181+    # Skip duplicattes (in case multiple seeds files were concatenated)
    


    jkczyz commented at 6:12 pm on October 1, 2019:
    s/duplicattes/duplicates
  26. in contrib/seeds/makeseeds.py:126 in 3314d87966 outdated
    120@@ -121,6 +121,31 @@ def filtermultiport(ips):
    121         hist[ip['sortkey']].append(ip)
    122     return [value[0] for (key,value) in list(hist.items()) if len(value)==1]
    123 
    124+def lookup_asn(net, ip):
    125+    '''
    126+    Look up the asn for an IP (4 or 6) address by querying cymry.com, or None
    


    jkczyz commented at 6:17 pm on October 1, 2019:

    s/cymry/cymru

    nit: s/asn/ASN

  27. TheBlueMatt commented at 6:40 pm on October 1, 2019: member

    Only looked at the address txt file (I’ll let other folks look at the python changes and the equivalence between the txt file and the compiled-in seeds), and found the following groups with more than two IPs per ASN. 212.33.204.190:8333 should likely be removed as it is RPKI-Invalid (max-length 20, but announced as a 22), so it isn’t gonna be accessible from a number of networks. The 2002:: IPs are 6to4 and likely also shouldn’t be included (except maybe as the corresponding IPv4 addresses).

    174 38.143.66.107:8333
    174 [2001:550:3d05:156::100]:8333
    174 [2602:ffb6:4:739e:f816:3eff:fe00:c2b3]:8333
    4134 14.18.140.45:5559
    4134 180.97.80.213:8333
    4134 182.150.55.96:8333
    4134 222.186.43.66:8333
    6939 [2001:470:1f1d:61f:cd1a::109]:42434
    6939 [2001:470:5:41e::3001]:8333
    6939 [2002:2f5a:562a::2f5a:562a]:8333
    6939 [2002:b6ff:3dca::b6ff:3dca]:28364
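
For the 6to4 point, Python's standard `ipaddress` module can tell whether an address falls in 2002::/16 and recover the IPv4 address embedded in it. The helper below is a small sketch; `embedded_ipv4` is a hypothetical name and it is not part of makeseeds.py.

```python
import ipaddress

def embedded_ipv4(host):
    """Return the IPv4 address embedded in a 6to4 (2002::/16) address, else None."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return None  # not a parsable IP address (for example an onion name)
    if isinstance(addr, ipaddress.IPv6Address):
        # IPv6Address.sixtofour is None for anything outside 2002::/16
        return addr.sixtofour
    return None

for host in ('2002:2f5a:562a::2f5a:562a', '2002:b6ff:3dca::b6ff:3dca',
             '2001:470:5:41e::3001'):
    print(host, '->', embedded_ipv4(host))
```

A filter built on this could either drop such entries or replace them with the corresponding IPv4 address, as suggested above.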
    
  28. jkczyz commented at 8:55 pm on October 1, 2019: contributor
    Reviewed Python changes. Some minor typos and nits but otherwise looks like reasonable improvements.
  29. fanquake added the label Waiting for author on Oct 1, 2019
  30. laanwj commented at 6:42 am on October 2, 2019: member

    I’m not going to fix up comment typos here, feel free to do that later.

    Something to consider when we do improve the script, is that it’s not very deterministic.

    The deduplication step randomizes the input order (by putting it into a dict, which uses hashing with a random seed in python).

    This is intentional, because the input (potentially) consists of concatenated nodes files from different sources. If it only picks the first valid nodes it’d effectively ignore your input.

    Sure, a deterministic shuffling could be done.
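
A minimal sketch of what deterministic shuffling might look like, assuming entries are dicts with `ip` and `port` keys: duplicates are dropped by (ip, port), and the survivors are ordered by a salted hash of that pair, so the result does not depend on the order in which the seed files were concatenated. The function name and the `salt` parameter are hypothetical.

```python
import hashlib

def dedup_and_shuffle(ips, salt=b'seeds'):
    """Drop duplicate (ip, port) pairs, then order entries by a salted hash."""
    unique = {}
    for ip in ips:
        # Keep the first entry seen for each (ip, port) pair
        unique.setdefault((ip['ip'], ip['port']), ip)

    def sort_key(pair):
        host, port = pair
        return hashlib.sha256(salt + ('%s:%d' % (host, port)).encode()).digest()

    # Hash order looks arbitrary but is reproducible for a given salt
    return [unique[pair] for pair in sorted(unique, key=sort_key)]
```

Changing the salt gives a different but still reproducible order. Note that which duplicate survives still depends on input order, which matters if the copies carry different statistics.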

    and found the following groups with more than two IPs per ASN.

    Wait, you’re saying that the two IPs per ASN filtering isn’t only non-deterministic, it doesn’t even work?
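
One way to check this independently of the script is to group an "ASN address" listing by ASN and report every ASN above the limit. This is only a sketch: `asns_over_limit` is a hypothetical helper, and the two-column input format is assumed from the listing quoted in this thread.

```python
import collections

def asns_over_limit(lines, max_per_asn=2):
    """Group 'ASN address' lines by ASN and return the groups above the limit."""
    groups = collections.defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        asn, addr = parts
        groups[int(asn)].append(addr)
    return {asn: addrs for asn, addrs in groups.items() if len(addrs) > max_per_asn}

listing = """\
174 38.143.66.107:8333
174 [2001:550:3d05:156::100]:8333
174 [2602:ffb6:4:739e:f816:3eff:fe00:c2b3]:8333
4134 14.18.140.45:5559
4134 180.97.80.213:8333
4134 182.150.55.96:8333
4134 222.186.43.66:8333
""".splitlines()

for asn, addrs in sorted(asns_over_limit(listing).items()):
    print(asn, len(addrs))
```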

  31. contrib: Remove invalid nodes from seeds list 0218171a24
  32. laanwj commented at 6:53 am on October 2, 2019: member

    Removed these in a new commit as suggested by TheBlueMatt:

    174 38.143.66.107:8333
    174 [2001:550:3d05:156::100]:8333
    174 [2602:ffb6:4:739e:f816:3eff:fe00:c2b3]:8333
    4134 14.18.140.45:5559
    4134 180.97.80.213:8333
    4134 182.150.55.96:8333
    4134 222.186.43.66:8333
    6939 [2001:470:1f1d:61f:cd1a::109]:42434
    6939 [2001:470:5:41e::3001]:8333
    6939 [2002:2f5a:562a::2f5a:562a]:8333
    6939 [2002:b6ff:3dca::b6ff:3dca]:28364
    
  33. fanquake removed the label Waiting for author on Oct 2, 2019
  34. Sjors commented at 10:04 am on October 2, 2019: member
    ACK 0218171. I also checked that chainparamsseeds.h is generated from nodes_main.txt. Sounds like we should look at this script a bit more outside release moments :-)
  35. laanwj referenced this in commit 27322cd161 on Oct 2, 2019
  36. laanwj merged this on Oct 2, 2019
  37. laanwj closed this on Oct 2, 2019

  38. laanwj referenced this in commit 95bde34a71 on Nov 3, 2020
  39. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 12:12 UTC
