contrib: Bugfix for checking bad dns seeds without casting in makeseeds.py #26681

pull yusufsahinhamza wants to merge 1 commits into bitcoin:master from yusufsahinhamza:fix-makeseeds-bug changing 1 files +6 āˆ’3
  1. yusufsahinhamza commented at 4:50 pm on December 10, 2022: contributor
    • Since seed lines comes with str type, comparing good column directly with 0 (int type) in the if statement was not working at all. This is fixed by casting int type to the values in the good column of seeds text file.
    • Lines that starts with comment in the seeds text file are now ignored.
    • If statement for checking bad seeds are moved to the top of the parseline function as if a seed is bad; there is no point of going forward from there.

    Since this bug-fix eliminates bad seeds over 550k in the first place, in my case; particular job for parsing all seeds speed is up by 600% and whole script’s speed is up by %30.

    Note that stats in the terminal are not going to include bad seeds after this fix, which would be the same if this bug were never there before.

  2. Fix checking bad dns seeds without casting
    Since seed lines comes with 'str' type, comparing it directly with 0
    ('int' type) in the if statement was not working at all. This is fixed
    by casting 'int' type to the values in the 'good' column of seeds text file.
    
    Lines that starts with comment in the seeds text file are now ignored.
    
    If statement for checking bad seeds are moved to the top of the 'parseline'
    function as if seed is bad, there is no point of going forward from there.
    3cc989da5c
  3. DrahtBot commented at 4:50 pm on December 10, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK achow101, jonatack

    If your review is incorrectly listed, please react with šŸ‘Ž to this comment and the bot will ignore it on the next update.

  4. yusufsahinhamza renamed this:
    net, contrib: Fix checking bad dns seeds without casting
    net, contrib: Fix checking bad dns seeds without casting in makeseeds.py
    on Dec 11, 2022
  5. yusufsahinhamza commented at 12:13 pm on December 11, 2022: contributor
    If we still want to see the number of total seeds on the terminal, we can simply print “lines length -1 (comment for the column names)”. And it could look like that: Loading and parsing DNS seedsā€¦Done, 559142 seeds loaded.
  6. mzumsande commented at 2:47 pm on December 11, 2022: contributor
    (sorry, got confused, pls ignore)
  7. yusufsahinhamza renamed this:
    net, contrib: Fix checking bad dns seeds without casting in makeseeds.py
    net, contrib: Bugfix for checking bad dns seeds without casting in `makeseeds.py`
    on Dec 26, 2022
  8. achow101 commented at 7:33 pm on April 19, 2023: member
    ACK 3cc989da5c750e740705131bed05bbf93bfdf169
  9. in contrib/seeds/makeseeds.py:49 in 3cc989da5c
    45@@ -46,10 +46,16 @@ def parseline(line: str) -> Union[dict, None]:
    46     """ Parses a line from `seeds_main.txt` into a dictionary of details for that line.
    47     or `None`, if the line could not be parsed.
    48     """
    49+    if line.startswith('#'):
    


    jonatack commented at 9:40 pm on April 19, 2023:

    It looks like this will only be true for the first line of the file (see https://github.com/sipa/bitcoin-seeder/blob/master/main.cpp#L372), unless a user manually adds # characters to the start of lines.

    0     if line.startswith('#'):
    1         # Ignore line that starts with comment
    2+        print(line.split(), file=sys.stderr)
    3         return None
    
     0Loading asmap database "asmap-filled.dat"ā€¦Done.
     1Loading and parsing DNS seedsā€¦['#', 'address', 'good', 'lastSuccess', '%(2h)', '%(8h)', '%(1d)', '%(7d)', '%(30d)', 'blocks', 'svcs', 'version']
     2Done.
     3  IPv4   IPv6  Onion Pass                                               
     4  3981   1114      0 Initial
     5  3981   1114      0 Skip entries with invalid address
     6  3981   1114      0 After removing duplicates
     7  3955   1108      0 Enforce minimal number of blocks
     8  3955   1108      0 Require service bit 1
     9  2805    799      0 Require minimum uptime
    10  1923    579      0 Require a known and recent user agent
    11  1923    579      0 Filter out hosts with multiple bitcoin ports
    12   512    170      0 Look up ASNs and limit results per ASN and per net
    

    fanquake commented at 9:46 pm on April 19, 2023:

    It looks like this will never be true, unless a user manually adds # characters

    The first line in seeds_main.txt is # addresses, so it’s going to be hit at least once.


    achow101 commented at 9:47 pm on April 19, 2023:
    It catches the first line for me.

    jonatack commented at 9:48 pm on April 19, 2023:
    Yes. Updated my comment to mention the first line and result.

    jonatack commented at 9:52 pm on April 19, 2023:

    It may make more sense to leave out adding the ‘#’ conditional (it doesn’t change the result) and instead do this (the cast to int doesn’t work on “address”):

    0     # Skip bad results.
    1-    if int(sline[1]) == 0:
    2+    if sline[1] == '0':
    3         return None
    

    the final commit would be the following:

     0--- a/contrib/seeds/makeseeds.py
     1+++ b/contrib/seeds/makeseeds.py
     2@@ -51,6 +51,9 @@ def parseline(line: str) -> Union[dict, None]:
     3     if len(sline) < 11:
     4         # line too short to be valid, skip it.
     5         return None
     6+    # Skip bad results.
     7+    if sline[1] == '0':
     8+        return None
     9     m = PATTERN_IPV4.match(sline[0])
    10     sortkey = None
    11     ip = None
    12@@ -84,9 +87,6 @@ def parseline(line: str) -> Union[dict, None]:
    13         sortkey = ip
    14         ipstr = m.group(1)
    15         port = int(m.group(6))
    16-    # Skip bad results.
    17-    if sline[1] == 0:
    18-        return None
    19     # Extract uptime %.
    20     uptime30 = float(sline[7][:-1])
    21     # Extract Unix timestamp of last success.
    

    yusufsahinhamza commented at 11:42 pm on April 19, 2023:
    Yes, i’ve meant to ignore the first line (the comment for column names) in seeds_main.txt by putting the "#" conditional. @jonatack I’ve thought about comparing it in str but then decided to cast with int because, seems like the good column meant to be 0 and 1 which is integer, so i rather use casting and have the "#" conditional.

    jonatack commented at 11:49 pm on April 19, 2023:
    @yusufsahinhamza I cherry-picked your commit here as-is into #27488 to start using it for generating the hardcoded seeds in the v25 release.
  10. in contrib/seeds/makeseeds.py:57 in 3cc989da5c
    52     sline = line.split()
    53     if len(sline) < 11:
    54         # line too short to be valid, skip it.
    55         return None
    56+    # Skip bad results.
    57+    if int(sline[1]) == 0:
    


    jonatack commented at 9:45 pm on April 19, 2023:

    Verified that this fix works with the following diff:

    0     # Skip bad results.
    1     if int(sline[1]) == 0:
    2+        # print(line.split(), file=sys.stderr)
    3         return None
    

    The result of running the script:

    before

     0Loading asmap database "asmap-filled.dat"ā€¦Done.
     1Loading and parsing DNS seedsā€¦Done.
     2  IPv4   IPv6  Onion Pass                                               
     3 99253  23129      0 Initial
     4 99253  23129      0 Skip entries with invalid address
     5 99253  23129      0 Remove duplicates
     6  8850   2230      0 Enforce minimal number of blocks
     7  7608   1800      0 Require service bit 1
     8  2933    832      0 Require minimum uptime
     9  2884    816      0 Require a known and recent user agent
    10  2860    814      0 Filter out hosts with multiple bitcoin ports
    11   512    299      0 Look up ASNs and limit results per ASN and per net
    

    after

     0Loading asmap database "asmap-filled.dat"ā€¦Done.
     1Loading and parsing DNS seedsā€¦Done.
     2  IPv4   IPv6  Onion Pass                                               
     3  3981   1114      0 Initial
     4  3981   1114      0 Skip entries with invalid address
     5  3981   1114      0 After removing duplicates
     6  3955   1108      0 Enforce minimal number of blocks
     7  3955   1108      0 Require service bit 1
     8  2805    799      0 Require minimum uptime
     9  1923    579      0 Require a known and recent user agent
    10  1923    579      0 Filter out hosts with multiple bitcoin ports
    11   512    170      0 Look up ASNs and limit results per ASN and per net
    
  11. jonatack approved
  12. jonatack commented at 9:47 pm on April 19, 2023: contributor

    ACK 3cc989da5c750e740705131bed05bbf93bfdf169

    Edit: modulo #26681 (review)

  13. fanquake renamed this:
    net, contrib: Bugfix for checking bad dns seeds without casting in `makeseeds.py`
    contrib: Bugfix for checking bad dns seeds without casting in `makeseeds.py`
    on Apr 20, 2023
  14. DrahtBot added the label Scripts and tools on Apr 20, 2023
  15. fanquake merged this on Apr 20, 2023
  16. fanquake closed this on Apr 20, 2023

  17. bitcoin deleted a comment on Apr 20, 2023
  18. sidhujag referenced this in commit 31b2aa23ce on Apr 21, 2023
  19. bitcoin locked this on Apr 19, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-30 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me