net: improve and address issues in makeseeds.py #24818

pull RF5 wants to merge 1 commits into bitcoin:master from RF5:master changing 1 files +53 −30
  1. RF5 commented at 10:50 am on April 10, 2022: contributor

    This PR attempts to address some of the areas of improvement raised in #17020 . Concretely, my proposed change is fairly minor but addresses the following changes to makeseeds.py:

    • Increase max seeds per ASN for IPv6 to 10 as recommended here, while keeping max seeds per ASN for IPv4 at 2.
    • Bump MIN_BLOCKS to 730000.
    • Improved script clarity: added function types and more docs to functions, added progress indicator when performing ASN lookup, and change string formatting to better align with bitcoin python style guidelines

    With the different ASN limits for IPv4 and IPv6, and the new minimum block requirement, the current stats look look like:

     0  IPv4   IPv6  Onion Pass
     1470689  73238      0 Initial
     2470689  73238      0 Skip entries with invalid address
     3470689  73238      0 After removing duplicates
     4470688  73238      0 Skip entries from suspicious hosts
     5  6098   1676      0 Enforce minimal number of blocks
     6  5252   1443      0 Require service bit 1
     7  3812    898      0 Require minimum uptime
     8  3738    877      0 Require a known and recent user agent
     9  3715    869      0 Filter out hosts with multiple bitcoin ports
    10   512    512      0 Look up ASNs and limit results per ASN and per net
    

    The new ASN max seeds of 10 allows for 512 IPv6 addresses to be included, up from the ~150 that was filtered by the previous version.

    While there is more to do for #17020 , these changes I think are fairly isolated from the rest and should make it a bit easier for others to get up to speed with what the functions in the script do.

  2. fanquake added the label P2P on Apr 10, 2022
  3. fanquake added the label Scripts and tools on Apr 10, 2022
  4. DrahtBot commented at 9:57 pm on April 11, 2022: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24824 (net: create IP to ASN database from file - makeseeds.py by russeree)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. laanwj commented at 8:13 pm on April 13, 2022: member
    Thanks for working on this. Concept ACK Code review ACK 30aa28ec962ee585b4fa39650354a4944cc69a9c (I’m not sure about some of the mypy annotations see below) The progress indicator is neat!
  6. in contrib/seeds/makeseeds.py:118 in 30aa28ec96 outdated
    112@@ -107,25 +113,26 @@ def parseline(line):
    113         'sortkey': sortkey,
    114     }
    115 
    116-def dedup(ips):
    117-    '''deduplicate by address,port'''
    118+def dedup(ips: List[Dict]) -> List[Dict]:
    


    laanwj commented at 8:20 pm on April 13, 2022:
    Hrm I don’t think the types are correct here. ips is a list of (ip, port) tuples, not dicts. Was this validated using mypy?

    RF5 commented at 8:59 am on April 14, 2022:

    Thanks for the mypy recommendation. I did some more tests (with mypy and good old print statements) and ips in dedup is indeed a list of dictionaries – of the same format that parseline() produces as output. I do believe that the typing for ips is correct here.

    However, testing with mypy did bring up a few other places where I forgot to indicate possible None return types. See the commit below the fixes it.

    Thank you for your time reviewing this :)


    laanwj commented at 6:05 pm on April 14, 2022:
    Indeed, seems I was wrong about the input! But the return type of dedup is a list of tuples, right?

    RF5 commented at 6:17 pm on April 14, 2022:

    I don’t believe so, each value in the list of returned values from dedup is one of the dictionaries from the list ips. Testing with mypy and print statements also seems to confirm that the returned format is the same as the input – a list of dicts.

    But if you think I am confused or have royally stuffed up my tests, please let me know so I can hopefully fix things.


    laanwj commented at 6:25 pm on April 14, 2022:
    Oh, you’re right. It returns d.values which are the values, not the items or keys of the dict.
  7. laanwj commented at 6:25 pm on April 14, 2022: member
    ACK after squashing commits.
  8. improve clarity and up max ipv6 ASNs c457fb144c
  9. RF5 commented at 9:27 pm on April 14, 2022: contributor
    Squashed, thanks again for the help reviewing.
  10. laanwj commented at 8:30 am on April 15, 2022: member
    Concept and code review ACK c457fb144cc3f76db46d8167744f3865af371ed7
  11. laanwj merged this on Apr 15, 2022
  12. laanwj closed this on Apr 15, 2022

  13. jonatack commented at 2:23 pm on April 15, 2022: member
    Post-merge ACK, modulo suggestions in #24863.
  14. sidhujag referenced this in commit aca9b4db3b on Apr 18, 2022
  15. Fabcien referenced this in commit b0a0bdf184 on Oct 5, 2022
  16. DrahtBot locked this on Apr 15, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me