contrib: document asmap-tool commands more thoroughly #32110

pull jurraca wants to merge 3 commits into bitcoin:master from jurraca:doc-diff-addrs changing 1 files +71 −0
  1. jurraca commented at 10:20 am on March 21, 2025: contributor

    This README was a little sparse in my opinion, and was missing a mention of the diff-addrs command.

    The README updates add background and examples for each command, split in two sections (encode/decode and diff/diff-addrs). This is intended to help people know how and when to run the commands available in the asmap-tool.py script.

    However, I could use some confirmation on the behavior of the --fill flag. It’s true that files generated with this flag set cannot be used to diff files after the fact, but i don’t quite follow what the fill flag does to make that true. sipa could you maybe provide some insight?

  2. DrahtBot commented at 10:20 am on March 21, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32110.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK fjahr, brunoerg, laanwj
    Stale ACK sipa

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Scripts and tools on Mar 21, 2025
  4. laanwj added the label Docs on Mar 21, 2025
  5. sipa commented at 7:44 pm on March 21, 2025: member

    @jurraca

    The asmap binary format effectively just encodes a mapping from ip ranges to AS numbers by encoding them as a sequence of match instructions, conditional jumps, and return values.

    A whole lot of ranges are unmapped, however, i.e., there just is no AS number for them. What the --fill option does is allow the encoder to map these unmapped ranges to anything, picking whatever is most efficient to encode. For example, if 123.45.0.0/24 and 123.45.2.0/23 map to AS12345, but 123.45.1.0/24 is unmapped, it may turn the whole thing into 123.45.0.0/22 AS12345, because it can be done by just looking at the top 22 bits, rather than the top 23 or 24 bits.

    Doing this results in a non-trivial reduction of the size of the asmap file, as there are often multiple mappings to the same AS number in a small range, with some unmapped holes in between.

    However, it also has a downside for comparability. Because the fillings are chosen in function on whatever just makes the encoding simplest, small changes in the (actual) mapped ranges may translate occasionally to big changes in what the unmapped ranges get assigned. So looking at the diff between two filled asmap files, without ignoring the unmapped ranges in them, will give a distorted view.

  6. jurraca force-pushed on Mar 21, 2025
  7. jurraca commented at 8:53 pm on March 21, 2025: contributor
    thanks for the explanation! Seems like the tradeoff could be a bit dangerous if there are a lot of unmapped ranges. Updated the --fill section to reflect the tradeoff a bit better.
  8. yancyribbens commented at 5:26 pm on March 23, 2025: contributor
    This was pretty interesting to read more about. It looks like ASmap can provide protection from certain types of threat actors, such as those that can allocate a large number of addresses from a single RIR effectively creating a type of Sybil attack. However, it seems to me there’s another attack vector, such as a botnet that infects numerous systems across many RIR’s that this wouldn’t be effective against.
  9. in contrib/asmap/README.md:17 in 1ed54152a2 outdated
     8@@ -9,4 +9,67 @@ Example usage:
     9 python3 asmap-tool.py encode /path/to/input.file /path/to/output.file
    10 python3 asmap-tool.py decode /path/to/input.file /path/to/output.file
    11 python3 asmap-tool.py diff /path/to/first.file /path/to/second.file
    12+python3 asmap-tool.py diff-addrs /path/to/first.file /path/to/second.file addrs.file
    13+```
    14+
    15+### Encoding and Decoding
    16+
    17+ASmap file are somewhat large (~30M) in text form, and should be encoded
    


    sipa commented at 5:55 pm on March 23, 2025:
    Typo: files are, or a file is
  10. in contrib/asmap/README.md:24 in 1ed54152a2 outdated
    19+
    20+The `encode` command takes an ASmap and an output file.
    21+
    22+The `--fill`/`-f` flag further reduces the size of the output file
    23+by assuming an AS assignment for an unmapped network if an adjacent network is assigned.
    24+This procedure is lossy, in the sense that we trade network-to-AS accuracy
    


    sipa commented at 5:57 pm on March 23, 2025:
    Instead of accuracy, maybe say “This procedure is lossy because it loses information about which ranges were unassigned”? Arguably, no accuracy is lost, because all real IP addresses are still mapped to their correct AS.
  11. in contrib/asmap/README.md:27 in 1ed54152a2 outdated
    22+The `--fill`/`-f` flag further reduces the size of the output file
    23+by assuming an AS assignment for an unmapped network if an adjacent network is assigned.
    24+This procedure is lossy, in the sense that we trade network-to-AS accuracy
    25+for a file size reduction. If many ranges in an ASmap are unmapped,
    26+this can potentially result in many arbitrary reassignments,
    27+defeating the purpose of providing an accurate ASmap.
    


    sipa commented at 5:57 pm on March 23, 2025:

    I don’t think it defeats the purpose. It’s just removing information one normally does not care about.

    I’d just say “However, it does interfere with the ability to compute meaningful diffs”.


    jurraca commented at 4:43 pm on March 24, 2025:
    I mean specifically in the case where many ranges are unmapped by accident, not just ranges we expect to be unmapped. For example, if your ASmap was generated incorrectly and only has half the ranges it should, then --fill would be reassigning many ranges to an incorrect AS, right?

    sipa commented at 4:44 pm on March 24, 2025:
    Well, yes, but then the asmap file would already be incorrect before filling too.

    jurraca commented at 5:12 pm on March 24, 2025:

    the impact on the node will be different though, if a peer is unmapped, it will just be another peer to use, if it is mapped incorrectly, it may be rejected when it was fine to connect to. I’d have to run tests to get an idea of impact, but mostly it seems like a potential “footgun” for users.

    How about fa52e60 ?


    sipa commented at 9:49 pm on March 24, 2025:
    Well if the asmap is reasonably up to date, I suspect there will basically be ~no peers in unmapped portions. And if there are, if unfilled it’ll use /16 grouping, and if filled it’ll use an adjacent ASN. I couldn’t say which would be better in general among those.
  12. in contrib/asmap/asmap-tool.py:94 in 1ed54152a2 outdated
    90@@ -91,15 +91,15 @@ def main():
    91 
    92     parser_encode = subparsers.add_parser("encode", help="convert asmap data to binary format")
    93     parser_encode.add_argument('-f', '--fill', dest="fill", default=False, action="store_true",
    94-                               help="permit reassigning undefined network ranges arbitrarily to reduce size")
    95+                               help="permit reassigning undefined network ranges arbitrarily to reduce size. The decoded version of a file created with this flag cannot be used for diffs.")
    


    sipa commented at 5:59 pm on March 23, 2025:
    Sure it can be, it’s just less meaningful.

    jurraca commented at 4:44 pm on March 24, 2025:
    yea i’ll revert this change, don’t think it’s relevant here after all.
  13. in contrib/asmap/asmap-tool.py:102 in 1ed54152a2 outdated
     99                                help="output binary asmap file; default is stdout")
    100 
    101     parser_decode = subparsers.add_parser("decode", help="convert asmap data to text format")
    102     parser_decode.add_argument('-f', '--fill', dest="fill", default=False, action="store_true",
    103-                               help="permit reassigning undefined network ranges arbitrarily to reduce length")
    104+                               help="permit reassigning undefined network ranges arbitrarily to reduce length. The resulting file cannot be used for diffs.")
    


    sipa commented at 5:59 pm on March 23, 2025:
    Same here.
  14. sipa commented at 6:02 pm on March 23, 2025: member

    @yancyribbens

    However, it seems to me there’s another attack vector, such as a botnet that infects numerous systems across many RIR’s that this wouldn’t be effective against.

    Of course, but the theory is that getting access to many reachable IP addresses in many different network is far more expensive than in one or a few.

  15. jurraca force-pushed on Mar 24, 2025
  16. sipa commented at 7:55 pm on March 24, 2025: member
    ACK fa52e606de7a12921619b56a631cd194e2366cc1
  17. brunoerg commented at 1:58 pm on March 25, 2025: contributor
    Concept ACK
  18. in contrib/asmap/README.md:48 in 75b145eef5 outdated
    15+### Comparing ASmaps
    16+
    17+AS control of IP networks changes frequently, therefore it can be useful to get
    18+the changes between to ASmaps via the `diff` and `diff_addrs` commands.
    19+
    20+`diff` takes two ASmap files, and returns a file detailing the changes
    


    brunoerg commented at 2:01 pm on March 25, 2025:
    75b145eef57053f24bc58531163408271d17eb59: nit: I think it would be great to mention that diff may take a while to execute. In my last try it took around 2 minutes and doesn’t show anything until the end.
  19. in contrib/asmap/README.md:51 in 75b145eef5 outdated
    46+```
    47+bitcoin-cli getnodeaddresses 0 > addrs.json
    48+```
    49+or pipe in the address data via stdin:
    50+```
    51+python3 asmap-tool.py diff-addrs path/to/first.file path/to/second.file < $(bitcoin-cli getnodeaddresses 0)
    


    brunoerg commented at 2:03 pm on March 25, 2025:
    75b145eef57053f24bc58531163408271d17eb59: It is diff_addrs, not diff-addrs.

    jurraca commented at 2:29 pm on March 25, 2025:
    fixed in a799327
  20. jurraca force-pushed on Mar 25, 2025
  21. sipa commented at 2:37 pm on March 25, 2025: member
    Perhaps it’s worth mentioning that asmap.py runs about 5x faster under pypy3 than under python3?
  22. brunoerg approved
  23. brunoerg commented at 4:27 pm on March 25, 2025: contributor

    ACK d85b980dfbd4eea7ed2968ab5cd0309576fbbcf9

    happy to re-ack in case more things are addressed

  24. DrahtBot requested review from sipa on Mar 25, 2025
  25. jurraca force-pushed on Mar 25, 2025
  26. brunoerg approved
  27. brunoerg commented at 4:49 pm on March 25, 2025: contributor

    reACK 7ed3df06f7d0c18ef070bd04bdae3b5ddacd52fd

    d85b980dfbd4eea7ed2968ab5cd0309576fbbcf9..7ed3df06f7d0c18ef070bd04bdae3b5ddacd52fd only adds an information about pypy3

  28. jurraca commented at 4:50 pm on March 25, 2025: contributor
    added a note about python3 vs pypy3 at the top, and also in the diffsection aff84d1.
  29. sipa commented at 5:15 pm on March 25, 2025: member

    ACK 7ed3df06f7d0c18ef070bd04bdae3b5ddacd52fd

    EDIT: please address comments by @laanwj below

  30. in contrib/asmap/README.md:20 in 7ed3df06f7 outdated
    15+depending on the amount of data involved and your machine specs.
    16+Consider using `pypy3` for a faster run time.
    17+
    18+### Encoding and Decoding
    19+
    20+ASmap files are somewhat large (~30M) in text form, and should be encoded
    


    laanwj commented at 2:02 pm on March 26, 2025:
    nit: 30MB

    jurraca commented at 8:04 pm on March 26, 2025:
  31. in contrib/asmap/README.md:21 in 7ed3df06f7 outdated
    16+Consider using `pypy3` for a faster run time.
    17+
    18+### Encoding and Decoding
    19+
    20+ASmap files are somewhat large (~30M) in text form, and should be encoded
    21+to binary before being used with Bitcoin Core.
    


    laanwj commented at 2:04 pm on March 26, 2025:
    “need to be” instead of “should be” (it’s not only because of the size, AFAIK bitcoin core just can’t parse text files)

    jurraca commented at 8:04 pm on March 26, 2025:
  32. in contrib/asmap/README.md:80 in 7ed3df06f7 outdated
    75+```
    76+bitcoin-cli getnodeaddresses 0 > addrs.json
    77+```
    78+or pipe in the address data via stdin:
    79+```
    80+python3 asmap-tool.py diff_addrs path/to/first.file path/to/second.file < $(bitcoin-cli getnodeaddresses 0)
    


    laanwj commented at 2:26 pm on March 26, 2025:

    i don’t think this command is correct if you want to pipe in via stdin, as $() puts the data on the command line? do you mean

    0bitcoin-cli getnodeaddresses 0 | python3 asmap-tool.py diff_addrs path/to/first.file path/to/second.file
    

    jurraca commented at 8:04 pm on March 26, 2025:
    thanks, I could have sworn i took this from existing documentation but now I can’t find it, and piping in doesn’t work in either case in fact. fixed in d7a3388.
  33. jurraca force-pushed on Mar 26, 2025
  34. fjahr commented at 8:09 pm on March 26, 2025: contributor
    Concept ACK
  35. fjahr commented at 8:14 pm on March 26, 2025: contributor
    @jurraca I think you should remove the tag of sipa in the description (just change to without the @), otherwise github is a bit spammy because the description is included in the merge commit. (Or maybe that is fixed now but that’s something that used to happen)
  36. laanwj approved
  37. laanwj commented at 7:42 am on March 27, 2025: member
    Thanks, LGTM now ACK 704573e016ac40e7cae00a84a9ba532a760e043e
  38. DrahtBot requested review from brunoerg on Mar 27, 2025
  39. DrahtBot requested review from fjahr on Mar 27, 2025
  40. brunoerg approved
  41. brunoerg commented at 12:01 pm on March 27, 2025: contributor
    reACK 704573e016ac40e7cae00a84a9ba532a760e043e
  42. in contrib/asmap/README.md:20 in 704573e016 outdated
    15+depending on the amount of data involved and your machine specs.
    16+Consider using `pypy3` for a faster run time.
    17+
    18+### Encoding and Decoding
    19+
    20+ASmap files are somewhat large (~30MB) in text form, and need to be encoded
    


    fjahr commented at 3:04 pm on March 27, 2025:
    nit: Well, the ones we make with kartograf are that large. If someone else makes one based on their personal BGP dump or they don’t use all the sources in kartograf they may be smaller. But it’s not a big deal, the point is that people understand that we want to reduce the size. I would have just left the specific number out. But I would say you leave as is unless you have to retouch.

    jurraca commented at 7:51 pm on March 27, 2025:
    yea fair – mentioning the data size doesn’t make sense. removed in 6afffba
  43. in contrib/asmap/README.md:64 in 704573e016 outdated
    59+# 220.157.65.0/24 was AS9723
    60+216.151.172.0/23 AS400080 # was unassigned
    61+2001:470:49::/48 AS20205 # was AS6939
    62+# 2001:678:bd0::/48 was AS207631
    63+2001:67c:308::/48 AS26496 # was unassigned
    64+```
    


    fjahr commented at 3:19 pm on March 27, 2025:
    nit: I find the placement of this example block a bit weird or I misunderstand it’s purpose. Unless there is something I overlooked I think it might make more sense if it was right below the bullet list of the different output states, with a statement on the top saying something like “Part of an output file could look like this for example:”. Right now, because the --ignore-unassigned part is right above it, I thought the block would demonstrate something about this flag but the block includes such entries and I think it doesn’t require an additional example. So I would suggest either moving the block up above --ignore-unassigned or possibly it could also be removed (because the bullet list is already explaining it weel). But may also be ignored, especially if I missed something about the intention of it.

    jurraca commented at 7:51 pm on March 27, 2025:
    good point thanks! fixed in 67d5cc2
  44. fjahr commented at 3:20 pm on March 27, 2025: contributor

    Code review ACK 704573e016ac40e7cae00a84a9ba532a760e043e

    My comments are not critical but I can quickly re-review if you decide to address them.

  45. contrib: (asmap) add diff-addrs example to README
    this command exists and works as expected.
    e047b1deca
  46. contrib: (asmap) add documentation on diff and diff-addrs commands 67d5cc2a06
  47. contrib: (asmap) add docs about encode and decode commands 6afffba34e
  48. jurraca force-pushed on Mar 27, 2025
  49. fjahr commented at 7:58 pm on March 27, 2025: contributor

    re-ACK 6afffba34e086de4cf0bb86729e12d116c1dcc9b

    Just addressed my comments.

  50. DrahtBot requested review from laanwj on Mar 27, 2025
  51. DrahtBot requested review from brunoerg on Mar 27, 2025
  52. brunoerg approved
  53. brunoerg commented at 9:43 pm on March 27, 2025: contributor
    reACK 6afffba34e086de4cf0bb86729e12d116c1dcc9b
  54. laanwj approved
  55. laanwj commented at 11:54 am on March 28, 2025: member
    re-ACK 6afffba34e086de4cf0bb86729e12d116c1dcc9b

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-28 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me