Embed default ASMap as binary dump header file #28792

pull fjahr wants to merge 13 commits into bitcoin:master from fjahr:2023-10-asmap-in-source changing 32 files +430 −189
  1. fjahr commented at 10:13 pm on November 4, 2023: contributor

    This is a step towards making asmap usage more accessible. Currently an asmap file needs to be acquired by there user from some location or the user needs to generate one themselves. Then they need to move the file to the right place in datadir or pass the path to the file as -asmap=PATH in order to use the asmap feature.

    The change here allows for builds to embed asmap data into the bitcoind binary which makes it possible to use the feature without handling of the asmap file by the user. If the user starts bitcoind with -asmap the embedded data will be used for bucketing of nodes.

    The data lives in the repository at src/node/data/ip_asn.dat and can be replaced with a new version at any time. The idea is that the data should be updated with every release. By default the data at that location is embedded into the binary but there is also a build option to prevent this (-DWITH_EMBEDDED_ASMAP=NO). In this case the original behavior of the -asmap option is maintained.

  2. DrahtBot commented at 10:13 pm on November 4, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28792.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sipa
    Concept ACK Sjors, jonatack

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32822 (fuzz: Make process_message(s) more deterministic by maflcko)
    • #32699 (docs: adds correct updated documentation links by Zeegaths)
    • #30595 (kernel: Introduce initial C header API by TheCharlatan)
    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
    • #29307 (util: explicitly close all AutoFiles that have been written by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label CI failed on Nov 4, 2023
  4. fjahr force-pushed on Nov 4, 2023
  5. fjahr force-pushed on Nov 4, 2023
  6. fjahr force-pushed on Nov 4, 2023
  7. sipa commented at 0:24 am on November 5, 2023: member
    Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).
  8. DrahtBot removed the label CI failed on Nov 5, 2023
  9. maflcko commented at 5:32 pm on November 5, 2023: member
    How much data would this add per year to the repo, assuming updates happen?
  10. sipa commented at 5:59 pm on November 5, 2023: member
    @maflcko The asmap file added here is around 1.6 MiB, so assuming we update for every 6 months, probably a bit over 3 MiB per year (gzip compresses asmap files only ~5%).
  11. luke-jr commented at 4:17 pm on November 8, 2023: member
    Why is the option of loading it from a file even in dev builds, not considered?
  12. fjahr commented at 7:08 pm on November 9, 2023: contributor

    Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).

    Yeah, that would work as well. I have drafted that approach here: https://github.com/fjahr/bitcoin/commit/0f9109fc49b8add9c057dcacd537250d4539ae57 Details can change of course (like keeping the raw file in init) but if people prefer that approach, I will pull this into here.

  13. fjahr commented at 7:11 pm on November 9, 2023: contributor

    Why is the option of loading it from a file even in dev builds, not considered?

    It is considered. Loading from a file is the only option for dev builds if we go the other route mentioned in the description: “The alternative approach is not having the data in the source code but only adding it during the build phase of a release.” But this also comes with the aforementioned downsides.

  14. willcl-ark commented at 9:56 am on December 6, 2023: member

    A general question about this approach: do most contemporary OSes handle loading large amounts of (potentially) unused data into physical memory well-enough, that they would (likely) not load this extra 1.5-3MB of data in the case that the -asmap is not being used? Perhaps this will be handled by the demand paging system? If not, it seems to be a bit of a shame to increase binary size by 20% in all cases.

    It seems to make most sense to me to take the path suggested by @sipa, including the .dat file in the repo, and having it converted to a .h at compile time. It might also be nice to have a configuration flag to disable the embedded header creation for faster builds.

  15. DrahtBot added the label Needs rebase on Dec 14, 2023
  16. fjahr force-pushed on Dec 14, 2023
  17. DrahtBot removed the label Needs rebase on Dec 15, 2023
  18. DrahtBot added the label CI failed on Jan 16, 2024
  19. fjahr renamed this:
    Embedding ASMap files as binary dump header file
    Embed default ASMap as binary dump header file
    on Aug 27, 2024
  20. fjahr force-pushed on Aug 28, 2024
  21. fjahr marked this as ready for review on Aug 28, 2024
  22. fjahr force-pushed on Aug 28, 2024
  23. fjahr commented at 11:48 pm on August 28, 2024: contributor

    I am picking this back up since we have come a long way in terms of tooling outside of Bitcoin Core and lately I was just waiting for cmake and v28 feature freeze. I will suggest an early merge in the cycle for v29 so more people are encouraged to use one steady file over a longer period of time.

    I am now using the suggested approach of building the header file at compile time and I added the functionality for both autotools and cmake (the autotools one has existed for a while, I can also remove it if people think it’s unnecessary). Of course this is rebased to make cmake available. First time I am editing the cmake build system so there are probably things to improve… And there is also a command for asmap-tool.py to generate the header file.

    I am adding the latest (1724248800) asmap file from asmap-data, the corresponding creation process can be reviewed in the issue and PR.

    I have not added a config option to skip the asmap header file but I will look into that next.

  24. DrahtBot removed the label CI failed on Aug 29, 2024
  25. Sjors commented at 9:13 am on August 29, 2024: member

    I am now using the suggested approach of building the header file at compile time

    I think the suggestion was to not include the header itself in that case.

    the autotools one has existed for a while, I can also remove it if people think it’s unnecessary

    Maybe keep it in until #30664 is merged.

  26. fjahr force-pushed on Aug 29, 2024
  27. fjahr commented at 6:35 pm on August 29, 2024: contributor
    Dropped the autotools build support after the discussion on IRC today (plan is to remove autotools in 24h) and rebased for good measure to have latest CMake fixups included.
  28. in src/init.cpp:1264 in e5a25fd708 outdated
    1265-                InitError(strprintf(_("Could not parse asmap file %s"), fs::quoted(fs::PathToString(asmap_path))));
    1266-                return false;
    1267+
    1268+            if (fs::exists(asmap_path)) {
    1269+                asmap = DecodeAsmap(asmap_path);
    1270+                if (asmap.size() == 0) {
    


    jonatack commented at 6:46 pm on August 29, 2024:

    nit, here and line 1271

    0                if (asmap.empty()) {
    

    fjahr commented at 8:48 pm on August 30, 2024:
    done
  29. fjahr commented at 6:51 pm on August 29, 2024: contributor

    I am now using the suggested approach of building the header file at compile time

    I think the suggestion was to not include the header itself in that case.

    I see. I am indifferent between having them in the repo or not but I am not sure what the arguments are for doing one or the other.

    We currently have two cases that seem comparable where we have the raw files as part of the repo and then make header files out of it in one way or another:

    • The test/data/* files that are built at compile time and the headers are not included in the repo
    • The chainparamsseeds that are built on demand for a release using the python script in contrib. This header file is included in the repo.

    We’ll have the raw file in the repo for sure but I am indifferent between using the build system or the python tool to make the header and I have implemented both which is good for testing anyway, I guess. I am also indifferent between having the header file in the repo or not. Conceptually the chainparamseeds seems closer to what we would do with an embedded asmap file: we basically also refresh it once per (point) release. So I guess that’s why I did it this way, because I was looking at chainparamsees as the reference case.

    I am curious if there are arguments for or against having the header file in the repo that I am currently not seeing. But even if not I am happy to go with what reviewers prefer because I just don’t have a preference right now. Maybe we’ll want to update the chainparamseeds process too if there is an argument for it.

  30. in contrib/asmap/asmap-tool.py:132 in e5a25fd708 outdated
    127@@ -127,6 +128,20 @@ def main():
    128                                    help="address file containing getnodeaddresses output to use in the comparison "
    129                                    "(make sure to set the count parameter to zero to get all node addresses, "
    130                                    "e.g. 'bitcoin-cli getnodeaddresses 0 > addrs.json')")
    131+
    132+    parser_gen_header = subparsers.add_parser("gen_header",
    


    jonatack commented at 7:06 pm on August 29, 2024:

    Might be good to add an example of these new commands and arguments to /contrib/asmap/README.md (and also mention the asmap tooling in /contrib/README.md).

    A higher level doc of overall steps to follow may be helpful, too, unless there is one already that I have forgotten about and overlooked.


    fjahr commented at 8:52 pm on August 30, 2024:

    I have expanded /contrib/asmap/README.md and /contrib/README.md and also added more details to the -asmap help.

    It sounds good to me to add a doc/asmap.md file with a higher level view but I would like to leave it as a follow-up that @jurraca might want to take on soon. He is currently working on getting https://asmap.org in shape and that doc would probably be a condensed version of that.

  31. jonatack commented at 7:08 pm on August 29, 2024: member
    First look at e5a25fd708858204cb6060393dc59d1dfed68acb
  32. Sjors commented at 10:11 am on August 30, 2024: member

    Concept ACK @fjahr if cmake can generate src/init/ip_asn.h from ip_asn.dat and it’s relative quick, then it should be sufficient to only include the .dat file. It saves space in the repo.

    If on the other hand you have to manually call a Python tool to produce the header file, then it would be too much inconvenience for everyone who wants to build Bitcoin Core from source.

    I just tried deleting src/init/ip_asn.h and running cmake again. It seems to take just a fraction of a second, but not sure how to measure that.

    ip_asn.h is also not human readable. It’s not useful to compare the diff between changes. If it were, that could be argument in favor of including it.

    Can you expand the documentation for -asmap to explain how it behaves? I haven’t studied the code in detail yet, but do we now always turn the -asmap feature on, using the default map? Or is it off by default and can be used with -asmap=1 for the default map and -asmap=somefile.dat for a custom / up to date map?

  33. fjahr force-pushed on Aug 30, 2024
  34. fjahr force-pushed on Aug 30, 2024
  35. DrahtBot added the label CI failed on Aug 30, 2024
  36. DrahtBot commented at 8:54 pm on August 30, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29496076146

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  37. fjahr commented at 9:09 pm on August 30, 2024: contributor

    I have addressed the latest feedback from @Sjors and @jonatack and rebased again because of the CMake situation and because I overhauled a lot of the changes anyway.

    In these latest changes I have

    • Re-organized the commits, I think review should be a bit easier like this
    • Expanded documentation and tests
    • Made it possible to build with the embedded ASMap data or without
    • Removed the header file from the repo
  38. fjahr commented at 9:24 pm on August 30, 2024: contributor

    Can you expand the documentation for -asmap to explain how it behaves? I haven’t studied the code in detail yet, but do we now always turn the -asmap feature on, using the default map? Or is it off by default and can be used with -asmap=1 for the default map and -asmap=somefile.dat for a custom / up to date map?

    I have expanded the help text for -asmap but I want to expand on it a bit here to explain my reasoning.

    Usage of the embedded asmap data is off by default right now. I am open to throw an on-by-default PR out there as a follow-up but I don’t want that conceptual discussion to hold up review of the implementation PR here. My feeling is that most people will be more comfortable with the v2-transport model: Implement as an off-by-default option for one release and then enable by default in the next version if there are no issues discovered that block it.

    On the -asmap functionality: If a file path is given (-asmap=foo.map), the file is either found and opened, otherwise we error. If no path is given (-asmap or -asmap=1) we first look for a file in the default file location. If there is no file in the default location and the build includes the embedded asmap data, we use the embedded data. I was going back and forth on that last part, i.e. using the embedded data instead of the file in the default location when both are present. But I opted for this way because 1. we are not breaking any workflows that were working before. If someone puts their file in the default location and expects it to be used, it’s still going to be used. And 2. dropping an asmap file in the default location can still be the most comfortable way of updating your asmap data, for example if you want to use the latest file from asmap-data instead of the embedded one.

  39. fjahr force-pushed on Aug 30, 2024
  40. fjahr force-pushed on Sep 1, 2024
  41. fjahr force-pushed on Sep 1, 2024
  42. fjahr force-pushed on Sep 1, 2024
  43. fjahr force-pushed on Sep 1, 2024
  44. Sjors commented at 6:42 am on September 2, 2024: member

    Usage of the embedded asmap data is off by default right now.

    That makes sense.

    If a file path is given (-asmap=foo.map), the file is either found and opened, otherwise we error. If no path is given (-asmap or -asmap=1) we first look for a file in the default file location. If there is no file in the default location and the build includes the embedded asmap data, we use the embedded data.

    I think that’s a good approach.

    CI is unhappy :-(

  45. fjahr commented at 9:31 am on September 2, 2024: contributor

    CI is unhappy :-(

    Yeah, I’ve been trying to fix that but it’s tricky because I can’t reproduce it locally. I guess I need to ask someone with more CMake experience.

  46. in cmake/script/WriteIPASN.cmake:14 in b20f0ef642 outdated
     8+execute_process(
     9+    COMMAND xxd -i "${RAW_FILE}"
    10+    COMMAND sed "s/unsigned char .*/static const unsigned char ip_asn[] = {/"
    11+    COMMAND sed "s/^unsigned int .*_len/static const unsigned int ip_asn_len/"
    12+    OUTPUT_VARIABLE HEX_DUMP
    13+)
    


    maflcko commented at 9:34 am on September 2, 2024:
    missing COMMAND_ERROR_IS_FATAL? Otherwise, a failure is silently ignored, no?

    fjahr commented at 9:53 am on September 2, 2024:
    Added, let’s see how it goes, I didn’t know the function could fail silently.
  47. in cmake/script/WriteIPASN.cmake:5 in b20f0ef642 outdated
    0@@ -0,0 +1,16 @@
    1+file(WRITE "${GENERATED_FILE}" "#ifndef BITCOIN_INIT_IP_ASN_H\n")
    2+file(APPEND "${GENERATED_FILE}" "#define BITCOIN_INIT_IP_ASN_H\n")
    3+file(APPEND "${GENERATED_FILE}" "/**\n")
    4+file(APPEND "${GENERATED_FILE}" " * ASMap data, mapping IP prefixes to ASNs,\n")
    5+file(APPEND "${GENERATED_FILE}" " * AUTOGENERATED by contrib/asmap/asmap-tool.py\n")
    


    maflcko commented at 9:38 am on September 2, 2024:
    Seems a bit odd to implement this both in Python and cmake (by calling xxd). Wouldn’t it be easier to just implement it once and require that to be called before use in the other place?

    fjahr commented at 12:04 pm on September 2, 2024:
    Now using the python script from cmake, so marking as resolved.
  48. maflcko commented at 9:38 am on September 2, 2024: member
    I presume the CI fails because xxd is missing?
  49. fjahr force-pushed on Sep 2, 2024
  50. fjahr force-pushed on Sep 2, 2024
  51. fjahr commented at 9:59 am on September 2, 2024: contributor

    I presume the CI fails because xxd is missing?

    Maybe but it would really surprise me because the failures are in Ubuntu 22.04 environments and I tried reproduce the error in a fresh Ubuntu 22.04 VM and it seemed xxd was included from the start.

  52. fjahr commented at 10:01 am on September 2, 2024: contributor

    Sounds like it can’t find the raw file…

    0CMake Error at /ci_container_base/cmake/script/WriteIPASN.cmake:8 (execute_process):
    1  execute_process error getting child return code: No such file or directory
    
  53. maflcko commented at 10:02 am on September 2, 2024: member

    Maybe but it would really surprise me because the failures are in Ubuntu 22.04 environments and I tried reproduce the error in a fresh Ubuntu 22.04 VM and it seemed xxd was included from the start.

    Are you sure? Note that the CI doesn’t use a desktop VM image, but a minimal docker image. I get:

    0# xxd
    1bash: xxd: command not found
    
  54. fjahr commented at 10:03 am on September 2, 2024: contributor

    Note that the CI doesn’t use a desktop VM image, but a minimal docker image

    Ah, then I guess that was my mistake

  55. fjahr force-pushed on Sep 2, 2024
  56. fjahr force-pushed on Sep 2, 2024
  57. fjahr commented at 12:04 pm on September 2, 2024: contributor

    Looks not using xxd fixed it, thanks @maflcko !

    Seems a bit odd to implement this both in Python and cmake (by calling xxd). Wouldn’t it be easier to just implement it once and require that to be called before use in the other place?

    Yeah, I first implemented it in python, then autotools and finally cmake. I kept the python version around because I wasn’t sure which approach we might actually want to go (see #28792 (comment)) and because I thought there might be some possible testing scenarios using it. But it doesn’t matter now since we needed an xxd-free approach to work on all environments and implementing the functionality in cmake directly seemed cumbersome, so I am now just using the python script from cmake as well, leaving us with just the python implementation.

  58. fjahr force-pushed on Sep 2, 2024
  59. in contrib/asmap/asmap-tool.py:225 in 39729a910a outdated
    220+ */
    221+unsigned char ip_asn[] = {{
    222+    {hex_data}
    223+}};
    224+
    225+unsigned int ip_asn_len = {len(data)};
    


    maflcko commented at 12:23 pm on September 2, 2024:
    Using a mutable C-style array seems a bit fragile. IIUC, starting with C++17, using inline constexpr std::array ip_asn{...}; should allow the compiler to optimize the layout, as well as enforce immutability. Moreover, std::array drops the need for ip_asn_len, because the size is embedded in the type of the array. Unrelated, you can change the 0x... above to std::byte{0x...} and then use MakeUCharSpan instead of legacy C-Style reinterpret cast and static cast in the code that uses the array.

    fjahr commented at 3:56 pm on September 2, 2024:
    done
  60. Sjors commented at 1:08 pm on September 2, 2024: member

    so I am now just using the python script from cmake as well, leaving us with just the python implementation

    Mmm, is Python already a build requirement though? Afaik it’s only a test requirement. So if the header can only be generated using a Python script, then that’s an argument for including it in the source code.

  61. maflcko commented at 1:16 pm on September 2, 2024: member

    if the header can only be generated using a Python script, then that’s an argument for including it in the source code.

    It is possible to generate it in native cmake as well. See for example cmake/script/GenerateHeaderFromJson.cmake. Note that previously the “cmake” generation used xxd, which isn’t a build-requirement either.

    Edit: To clarify: I am against reverting to including the data twice. 1.6 MB on every bump is bad enough. Twice that for no strong reason seems annoying to anyone cloning the repo in the future.

  62. Sjors commented at 1:35 pm on September 2, 2024: member

    The current version of cmake/script/GenerateHeaderFromJson.cmake in this PR calls asmap-tool.py: https://github.com/bitcoin/bitcoin/pull/28792/commits/07f05d5d3c773106fb75a9a25ae970ba0c6cf3a0#diff-cdce11cce5bf019b8a9c82d69718aefd2cca265cbaa45f7e0cebc497f60ddf25R25

    Is there a way to do this without relying either on Python or xxd?

    Twice that for no strong reason seems annoying to anyone cloning the repo in the future.

    Agreed.

  63. fjahr commented at 2:22 pm on September 2, 2024: contributor

    To clarify: I am against including the data twice. 1.6 MB on every bump is bad enough. Twice that for no strong reason seems annoying to anyone cloning the repo in the future.

    Now I’m confused, the header file isn’t included anymore since a while ago so the data is included only once already, or am I missing something? Are you just saying that shouldn’t be changed back?

  64. fjahr force-pushed on Sep 2, 2024
  65. fjahr commented at 3:57 pm on September 2, 2024: contributor
    Switched back to a native approach in cmake without python or xxd.
  66. fjahr force-pushed on Sep 2, 2024
  67. DrahtBot removed the label CI failed on Sep 2, 2024
  68. in cmake/script/GenerateHeaderAsmap.cmake:6 in b557dc5976 outdated
    0@@ -0,0 +1,33 @@
    1+# Copyright (c) 2023-present The Bitcoin Core developers
    2+# Distributed under the MIT software license, see the accompanying
    3+# file COPYING or https://opensource.org/license/mit/.
    4+
    5+file(READ ${RAW_FILE} hex_content HEX)
    6+string(REGEX MATCHALL "([A-Za-z0-9][A-Za-z0-9])" bytes "${hex_content}")
    


    maflcko commented at 12:57 pm on September 5, 2024:
    Could rebase and use the existing generate_header_from_raw, which should do the same thing, basically?

    fjahr commented at 10:00 am on September 13, 2024:
    Seems like there is still some tinkering going on on that one, I will give it a try when that has settled down.

    fjahr commented at 9:34 am on September 20, 2024:
    I am now using that function. I thought there might be issue with the missing header guards but the generated files are not listed. There is now also no more comment in the generated file.

    maflcko commented at 9:56 am on September 20, 2024:

    missing header guard

    That was somewhat intentional, I guess. A header guard is really only useful for module headers to avoid compile errors if the module is included several times transitively/recursively.

    However, for data-holding headers, the include should only ever happen in a single place in the whole program. Otherwise it seems too easy to bloat the binary. So a compile error in that case is actually intentional and useful.

  69. DrahtBot added the label CI failed on Sep 12, 2024
  70. DrahtBot removed the label CI failed on Sep 15, 2024
  71. DrahtBot added the label Needs rebase on Sep 17, 2024
  72. fjahr force-pushed on Sep 20, 2024
  73. fjahr commented at 9:32 am on September 20, 2024: contributor
    Rebased to incorporate latest relevant changes to the build system and AutoFile. This now also builds on #30901 so please ignore the first commit or leave your review of it in that PR.
  74. DrahtBot removed the label Needs rebase on Sep 20, 2024
  75. fjahr commented at 10:02 pm on September 20, 2024: contributor
    CI failures (timeouts) seem unrelated, should be ready for review as I am hoping #30901 will get merged quickly.
  76. DrahtBot added the label Needs rebase on Nov 6, 2024
  77. fjahr force-pushed on Nov 6, 2024
  78. DrahtBot removed the label Needs rebase on Nov 7, 2024
  79. sipa commented at 2:39 pm on December 2, 2024: member

    Concept ACK.

    It would be nice if the asmap data blob didn’t need to be converted to std::vector<bool> at runtime anymore, because that implies using memory twice (the blob itself, plus the decoded vector, despite containing effectively the same data). To improve upon that, I don’t think it’s hard to make the asmap interpreter code operate on a std::span<const std::byte> instead of a std::vector<bool>. Do you feel like looking into that, or do you want me to do so as a follow-up?

  80. fjahr commented at 2:07 pm on December 13, 2024: contributor

    It would be nice if the asmap data blob didn’t need to be converted to std::vector<bool> at runtime anymore, because that implies using memory twice (the blob itself, plus the decoded vector, despite containing effectively the same data). To improve upon that, I don’t think it’s hard to make the asmap interpreter code operate on a std::span<const std::byte> instead of a std::vector<bool>. Do you feel like looking into that, or do you want me to do so as a follow-up?

    That makes sense and I am giving it a try right now. Will follow-up here when it’s done!

  81. DrahtBot added the label Needs rebase on Dec 17, 2024
  82. fjahr force-pushed on Jan 15, 2025
  83. DrahtBot commented at 8:44 pm on January 15, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35676029008

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  84. DrahtBot added the label CI failed on Jan 15, 2025
  85. fjahr force-pushed on Jan 15, 2025
  86. fjahr force-pushed on Jan 15, 2025
  87. DrahtBot removed the label CI failed on Jan 16, 2025
  88. fjahr commented at 4:13 pm on January 16, 2025: contributor

    Rebased with an updated #30901 and implemented @sipa ’s suggestion here: #28792 (comment) This took me a bit more time than I anticipated and a couple of times I wasn’t sure which approach to go with. This is why I have kept this part as a separate commit for now and I am looking for approach ACKs on that last part in particular first before I squash it into the appropriate previous commits.

    The most interesting pieces IMO:

    • Dual member approach in NetGroupManager with m_asmap and m_loaded_asmap (approach suggested by @sipa )
    • Usage of named constructors
    • Checksum calculation: I wrote a new serializer but didn’t use it because it didn’t feel we should add it just for this. Converting also feels ugly but seems to be the easiest approach. Or maybe I am overlooking something.
  89. in src/util/asmap.cpp:234 in 6cf4930d15 outdated
    233 {
    234-    LogInfo("Opened asmap data (%zu bytes) from embedded byte array\n", data.size());
    235-    return Decode(data);
    236+    HashWriter asmap_hasher;
    237+
    238+    // Asmap used to be a bool vector, so in order to keep the checksum
    


    sipa commented at 9:25 pm on January 16, 2025:
    Do we care about keeping the checksum calculation compatible? For almost all users, this PR being merged will mean a new asmap database anyway, so they’ll incur the price of addrman invalidation that brings anyway.

    fjahr commented at 6:36 pm on January 20, 2025:
    I guess you are right, I probably got a bit too caught up in this thinking that there should be an easier way to keep it consistent. I have removed that backwards compatibility code and updated the test.
  90. DrahtBot removed the label Needs rebase on Jan 17, 2025
  91. fjahr force-pushed on Jan 20, 2025
  92. in src/test/fuzz/asmap.cpp:17 in f2e673f944 outdated
    26-    true, true, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, // Match 0x00
    27-    true, true, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, // Match 0x00
    28-    true, true, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, // Match 0x00
    29-    true, true, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true,         // Match 0xFF
    30-    true, true, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true          // Match 0xFF
    31+static const std::vector<std::byte> IPV4_PREFIX_ASMAP = {
    


    sipa commented at 3:28 pm on January 29, 2025:

    This seems to be the prefix it’s trying to match itself, not code for matching it.

    I suggest: static const auto = "fb03ec0fb03fc0fe00fb03ec0fb03fc0fe00fb03ec0fb0fffffeff"_hex_v;


    fjahr commented at 8:29 pm on February 13, 2025:
    done
  93. in contrib/asmap/asmap-tool.py:229 in 97bee93a9c outdated
    224+
    225+        file_content = f"""#include <cstddef>
    226+#include <span>
    227+
    228+namespace node::data {{
    229+inline constexpr std::byte detail_ip_asn_raw[] {{
    


    sipa commented at 3:36 pm on January 29, 2025:
    This could be a lot smaller by using the ""_hex user-defined literal (if you include util/strencodings.h); it’d create an std::array<std::byte, N> instead of a C-array, but the std::span constructor below just work just as well on it.

    fjahr commented at 8:36 pm on February 13, 2025:
    This is neat but I found it to be not as straight forward as it seemed initially. Aside from the additional include I found that I needed to bump template-depth and constexpr-steps in order to make this work at compile time with clang. I am still a bit unclear on what the trade-offs are for this and I will open a separate PR with this so that this discussion doesn’t extend the scope here further.

    sipa commented at 8:51 pm on February 13, 2025:
    Yeah, given that the file isn’t committed its size doesn’t matter that much. And it may be that constexpr evaluating a huge amount of data is non-trivial for the compiler, unsure. This isn’t that important.

    fjahr commented at 8:59 pm on February 13, 2025:
    Opened #31861 so it can be discussed there :)
  94. in contrib/asmap/asmap-tool.py:12 in 97bee93a9c outdated
     8@@ -9,6 +9,7 @@
     9 import json
    10 import math
    11 from collections import defaultdict
    12+from pathlib import Path
    


    sipa commented at 3:39 pm on January 29, 2025:
    This new gen_header command doesn’t seem to be used by the build system; what is its point?

    fjahr commented at 8:29 pm on February 13, 2025:
    It was a left-over of a previous approach and I left it in initially in case it might be interesting for testing, experiments etc. just because it was already there. But the PR has grown in scope a lot and it’s becoming annoying to deal with for me and reviewers, so I dropped it now.
  95. sipa commented at 3:55 pm on January 29, 2025: member

    Concept ACK.

    I’d recommend integrating the last commit into the rest of the PR, making the std::vector<bool> -> std::span<std::byte> for the interpreter change first, and then introducing the embedded asmap data afterwards.

  96. DrahtBot added the label Needs rebase on Feb 12, 2025
  97. fjahr force-pushed on Feb 13, 2025
  98. fjahr commented at 8:37 pm on February 13, 2025: contributor

    Addressed @sipa ’s feedback and rebased on the latest version of #30901.

    The python tool command has been dropped and the commit have been reorganized as well.

  99. DrahtBot removed the label Needs rebase on Feb 13, 2025
  100. sipa commented at 3:24 pm on February 14, 2025: member
    Will review again soon, but a quick question: would it make sense to provide the ability for a user (who doesn’t necessarily have easy access to the source code) to extract the asmap data from a running node? For example, to be able to run asmap.py diff on it. It could be an RPC command that just spits out the data in textual format, or a REST endpoint that directly provides it in binary.
  101. DrahtBot added the label Needs rebase on Feb 14, 2025
  102. jonatack commented at 3:58 pm on February 28, 2025: member
    Concept ACK, will review after rebase.
  103. fjahr force-pushed on Mar 3, 2025
  104. DrahtBot removed the label Needs rebase on Mar 3, 2025
  105. fjahr commented at 6:44 pm on March 3, 2025: contributor

    Rebased the previous conflict and #30901 has been merged now as well 🥳

    I also added the -DWITH_EMBEDDED_ASMAP=NO to one of the CI jobs.

    would it make sense to provide the ability for a user (who doesn’t necessarily have easy access to the source code) to extract the asmap data from a running node?

    Let me address that in a follow-up. I think it’s a nice-to-have since the necessary data for diffing should always be available in https://github.com/asmap/asmap-data when we follow the workflow we propose but yeah, one less step of indirection and having the ability to do it all with the data that’s in the repo is definitely better.

  106. fjahr force-pushed on Mar 29, 2025
  107. fjahr commented at 10:46 pm on March 29, 2025: contributor
    Rebased to get all the latest cmake changes and swapped the included mapping data to the latest asmap run. This used the --fill feature in encoding which leads to additional space savings. This is why the embedded data has gone down significantly from ~1.7MB to ~1.4MB.
  108. in src/test/fuzz/asmap.cpp:34 in d07856e1b1 outdated
    32+    std::vector<std::byte> asmap_vec = ipv6 ? IPV6_PREFIX_ASMAP : IPV4_PREFIX_ASMAP;
    33     for (int i = 0; i < asmap_size; ++i) {
    34+        uint8_t byte = buffer[1 + i];
    35         for (int j = 0; j < 8; ++j) {
    36-            asmap.push_back((buffer[1 + i] >> j) & 1);
    37+            asmap_vec.push_back(static_cast<std::byte>((byte >> j) & 1));
    


    sipa commented at 6:31 pm on April 10, 2025:

    In commit “refactor: Use span of bytes for asmap”

    I don’t think this is what we want to happen: it’s converting every byte of input data into 8 bytes that are each 0x00 or 0x01. This commit is also changing the meaning of interpretation, which makes this a bit confusing.

    This should just be:

    0for (int i = 0; i < asmap_size; ++i) {
    1    asmap_vec.push_back(buffer[1 + i]);
    2}
    

    I think.


    fjahr commented at 10:52 pm on April 22, 2025:
    Yeah, that’s right, fixed it.
  109. in src/netgroup.h:66 in d07856e1b1 outdated
    60@@ -52,7 +61,10 @@ class NetGroupManager {
    61     bool UsingASMap() const;
    62 
    63 private:
    64-    /** Compressed IP->ASN mapping, loaded from a file when a node starts.
    65+    /** Compressed IP->ASN mapping.
    66+     *
    67+     * Data may beloaded from a file when a node starts or embedded in the
    


    sipa commented at 6:32 pm on April 10, 2025:

    In commit “refactor: Use span of bytes for asmap”

    Nit: typo be loaded


    fjahr commented at 10:52 pm on April 22, 2025:
    Fixed
  110. in src/netgroup.h:19 in d07856e1b1 outdated
    14@@ -15,9 +15,18 @@
    15  */
    16 class NetGroupManager {
    17 public:
    18-    explicit NetGroupManager(std::vector<bool> asmap)
    19-        : m_asmap{std::move(asmap)}
    20-    {}
    21+    static NetGroupManager WithEmbeddedAsmap(std::span<const std::byte> asmap) {
    


    sipa commented at 6:34 pm on April 10, 2025:

    In commit “refactor: Use span of bytes for asmap”

    I think the fact that this commit both changes the interpretation from individual-bools to bits-of-byte while also adding support for span-based embedded asmap file makes it a bit harder to review.

    Would it be possible to separate these into two commits (probably first switching the interpretation to be bit-of-std::byte based but leaving everything in std::vector<std::byte>), and then in a second commit add support for embedded span-based ones?

  111. DrahtBot added the label Needs rebase on Apr 11, 2025
  112. fjahr force-pushed on Apr 22, 2025
  113. fjahr commented at 11:03 pm on April 22, 2025: contributor

    Rebased and addressed @sipa ’s feedback. Also made a few further improvements: Added some missing includes, dropped some docs that have been added in #32110 already (silent merge conflict), unified asmap version/checksum naming (I went with version because that’s what we have been communicating to users already).

    Would it be possible to separate these into two commits (probably first switching the interpretation to be bit-of-std::byte based but leaving everything in std::vectorstd::byte), and then in a second commit add support for embedded span-based ones?

    Done, I split the commit. I hope it makes review easier to have these changes separate. The downside is that a lot of lines are now touched in two commits but I guess it will be ok since many of them are easy to grasp.

  114. DrahtBot removed the label Needs rebase on Apr 22, 2025
  115. achow101 added this to the milestone 30.0 on Apr 25, 2025
  116. in src/netgroup.cpp:91 in 828cf86a03 outdated
    90-    std::vector<bool> ip_bits(128);
    91+    std::vector<std::byte> ip_bits(16);
    92     if (address.HasLinkedIPv4()) {
    93         // For lookup, treat as if it was just an IPv4 address (IPV4_IN_IPV6_PREFIX + IPv4 bits)
    94         for (int8_t byte_i = 0; byte_i < 12; ++byte_i) {
    95-            for (uint8_t bit_i = 0; bit_i < 8; ++bit_i) {
    


    sipa commented at 1:21 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    Simpler:

     0--- a/src/netgroup.cpp
     1+++ b/src/netgroup.cpp
     2@@ -90,18 +90,18 @@ uint32_t NetGroupManager::GetMappedAS(const CNetAddr& address) const
     3     if (address.HasLinkedIPv4()) {
     4         // For lookup, treat as if it was just an IPv4 address (IPV4_IN_IPV6_PREFIX + IPv4 bits)
     5         for (int8_t byte_i = 0; byte_i < 12; ++byte_i) {
     6-            ip_bits[byte_i] = static_cast<std::byte>(IPV4_IN_IPV6_PREFIX[byte_i]);
     7+            ip_bits[byte_i] = std::byte(IPV4_IN_IPV6_PREFIX[byte_i]);
     8         }
     9         uint32_t ipv4 = address.GetLinkedIPv4();
    10         for (int i = 0; i < 4; ++i) {
    11-            ip_bits[12 + i] = static_cast<std::byte>((ipv4 >> (24 - i * 8)) & 0xFF);
    12+            ip_bits[12 + i] = std::byte(ipv4 >> (24 - i * 8));
    13         }
    14     } else {
    15         // Use all 128 bits of the IPv6 address otherwise
    16         assert(address.IsIPv6());
    17         auto addr_bytes = address.GetAddrBytes();
    18         for (int8_t byte_i = 0; byte_i < 16; ++byte_i) {
    19-            ip_bits[byte_i] = static_cast<std::byte>(addr_bytes[byte_i]);
    20+            ip_bits[byte_i] = std::byte(addr_bytes[byte_i]);
    21         }
    22     }
    23     uint32_t mapped_as = Interpret(m_asmap, ip_bits);
    

    fjahr commented at 8:31 pm on May 8, 2025:
    Done
  117. in src/test/fuzz/asmap.cpp:32 in 828cf86a03 outdated
    32+    std::vector<std::byte> asmap_vec = ipv6 ? IPV6_PREFIX_ASMAP : IPV4_PREFIX_ASMAP;
    33     for (int i = 0; i < asmap_size; ++i) {
    34-        for (int j = 0; j < 8; ++j) {
    35-            asmap.push_back((buffer[1 + i] >> j) & 1);
    36-        }
    37+        asmap_vec.push_back(static_cast<std::byte>(buffer[1 + i]));
    


    sipa commented at 1:25 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    Simpler:

    0asmap_vec.push_back(std::byte(buffer[1 + i]));
    

    fjahr commented at 8:30 pm on May 8, 2025:
    Taken
  118. in src/test/fuzz/asmap.cpp:34 in 828cf86a03 outdated
    34-        for (int j = 0; j < 8; ++j) {
    35-            asmap.push_back((buffer[1 + i] >> j) & 1);
    36-        }
    37+        asmap_vec.push_back(static_cast<std::byte>(buffer[1 + i]));
    38     }
    39+    std::vector<std::byte> asmap(asmap_vec);
    


    sipa commented at 1:26 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    I don’t think this is necessary? The code above could operate on asmap instead of asmap_vec directly?


    fjahr commented at 8:30 pm on May 8, 2025:
    Hm, right, that’s a left-over from previous iterations. Done.
  119. in src/test/fuzz/asmap_direct.cpp:34 in 828cf86a03 outdated
    30@@ -31,19 +31,21 @@ FUZZ_TARGET(asmap_direct)
    31     if (buffer.size() - sep_pos - 1 > 128) return; // At most 128 bits in IP address
    32 
    33     // Checks on asmap
    34-    std::vector<bool> asmap(buffer.begin(), buffer.begin() + sep_pos);
    35+    std::vector<std::byte> asmap(reinterpret_cast<const std::byte*>(buffer.data()),
    


    sipa commented at 1:37 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    This is not correct. The code (and comment) above indicate that buffer is being interpreted using 1 bit of asmap code per byte. The line here is interpreting it as 1 byte per byte.

    You need some conversion function, I think:

     0namespace {
     1
     2std::vector<std::byte> BitsToBytes(std::span<const uint8_t> input) noexcept
     3{
     4    std::vector<std::byte> ret;
     5    uint8_t next_byte{0};
     6    int next_byte_bits{0};
     7    for (uint8_t val : input) {
     8        next_byte |= (val & 1) << (next_byte_bits++);
     9        if (next_byte_bits == 8) {
    10            ret.push_back(std::byte(next_byte));
    11            next_byte = 0;
    12            next_byte_bits = 0;
    13        }
    14    }
    15    return ret;
    16}
    17
    18} // namespace
    19
    20FUZZ_TARGET(asmap_direct)
    21{
    22    ...
    23    auto asmap = BitsToBytes(std::span{buffer}.first(sep_pos));
    24    ...
    25    auto addr = BitsToBytes(std::span{buffer}.subspan(sep_pos + 1));
    26    ...
    27}
    

    fjahr commented at 8:31 pm on May 8, 2025:
    Taken but this code skipped the last byte if the input wasn’t a multiple of 8, so I added that.
  120. in src/test/fuzz/asmap_direct.cpp:38 in 828cf86a03 outdated
    36+                                 reinterpret_cast<const std::byte*>(buffer.data() + sep_pos));
    37     if (SanityCheckASMap(asmap, buffer.size() - 1 - sep_pos)) {
    38         // Verify that for valid asmaps, no prefix (except up to 7 zero padding bits) is valid.
    39-        std::vector<bool> asmap_prefix = asmap;
    40-        while (!asmap_prefix.empty() && asmap_prefix.size() + 7 > asmap.size() && asmap_prefix.back() == false) {
    41+        std::vector<std::byte> asmap_prefix = asmap;
    


    sipa commented at 1:40 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    This doesn’t make sense anymore. You can’t slice off individual bits from an std::byte.


    fjahr commented at 8:33 pm on May 8, 2025:
    Right, I adapted this part to skip the last byte since the BitsToBytes function would pad again, but otherwise I think this now functions as the original fuzz test again.
  121. in src/util/asmap.cpp:25 in 828cf86a03 outdated
    21@@ -21,15 +22,14 @@ namespace {
    22 
    23 constexpr uint32_t INVALID = 0xFFFFFFFF;
    24 
    25-uint32_t DecodeBits(std::vector<bool>::const_iterator& bitpos, const std::vector<bool>::const_iterator& endpos, uint8_t minval, const std::vector<uint8_t> &bit_sizes)
    26+uint32_t DecodeBits(size_t& bitpos, const std::vector<std::byte>& data, uint8_t minval, const std::vector<uint8_t>& bit_sizes)
    


    sipa commented at 1:49 am on April 26, 2025:

    “In commit “refactor: Operate on bytes instead of bits in Asmap code”:

    I think it would be helpful to have a bit getter helper function:

    0inline bool GetBit(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
    1{
    2    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
    3}
    

    And then for example:

    0    ...
    1    bit = GetBit(data, bitpos++);
    2    ...
    

    and

    0    ...
    1    if (GetBit(ip, ip.size() * 8 - bits)) {
    2        --bits;
    3    }
    4    ...
    

    fjahr commented at 8:30 pm on May 8, 2025:
    Good idea. Done, also made a BE version of it.
  122. in src/test/fuzz/asmap_direct.cpp:36 in c1c6943ea9 outdated
    32@@ -33,19 +33,19 @@ FUZZ_TARGET(asmap_direct)
    33     // Checks on asmap
    34     std::vector<std::byte> asmap(reinterpret_cast<const std::byte*>(buffer.data()),
    35                                  reinterpret_cast<const std::byte*>(buffer.data() + sep_pos));
    36-    if (SanityCheckASMap(asmap, buffer.size() - 1 - sep_pos)) {
    37+    if (SanityCheckASMap(std::span<const std::byte>(asmap), buffer.size() - 1 - sep_pos)) {
    


    sipa commented at 1:58 am on April 26, 2025:

    In commit “refactor: Use span instead of vector for data in util/asmap”

    No need for the explicit conversion:

    0-if (SanityCheckASMap(std::span<const std::byte>(asmap), buffer.size() - 1 - sep_pos)) {
    1+if (SanityCheckASMap(asmap, buffer.size() - 1 - sep_pos)) {
    

    Also twice the same further in this function.


    fjahr commented at 8:30 pm on May 8, 2025:
    Done
  123. fjahr force-pushed on May 8, 2025
  124. sipa commented at 3:47 pm on June 9, 2025: member
    To convince myself that with all the bit/byte order changes, no semantics of the actual interpretation are affected, I wrote some known-correct unit tests for the asmap interpreter, added before the changes in this PR, and maintained throughout: https://github.com/sipa/bitcoin/commits/pr28792. Feel free to grab any part of it.
  125. in src/init.cpp:1494 in 84a8ad67a8 outdated
    1494+                    } else {
    1495+                        InitError(strprintf(_("Could not parse asmap file in default location %s"), fs::quoted(fs::PathToString(asmap_path))));
    1496+                    }
    1497+                    return false;
    1498+                }
    1499+                node.netgroupman = std::make_unique<NetGroupManager>(NetGroupManager::WithLoadedAsmap(asmap));
    


    sipa commented at 7:41 pm on June 9, 2025:

    In commit “init, net: Implement usage of binary-embedded asmap data”

    Nit: use std::move(asmap) here to avoid a copy (of the entire loaded asmap blob).


    fjahr commented at 3:44 pm on June 11, 2025:
    Done
  126. sipa commented at 7:45 pm on June 9, 2025: member
    ACK 61714f390e74f49d0f37d7f96cc0645c8215728a
  127. DrahtBot requested review from Sjors on Jun 9, 2025
  128. DrahtBot requested review from jonatack on Jun 9, 2025
  129. fanquake commented at 9:53 am on June 10, 2025: member

    Can you update the PR description to reflect the current state of the PR, it seems like it describes this as one of multiple approaches, and mostly talks about a different approach?

    Looking through the changes here, I can’t see any documentation about where the .dat comes from, who’s responsible for generating it (what’s required determinism wise), how often it needs updating (who’s resposible for that, as well as maintaining any relevant tooling, and where that lives), how any of this effects the release process etc.

    If we are going to bundle the .dat, and use it by default in releases, can you elaborate on why we need a build option to disable the embedding, but not the asmap feature entirely; this seems to just complicate the settings/usage handling, i.e we need to handle the case where someone has disabled the embedding (built their own binary), but then passed the -asamp option, but not provided an asmap?

  130. fanquake commented at 3:25 pm on June 10, 2025: member

    Some other questions:

    • How quickly does the data degrade? Im guessing not so fast that there could be any adverse effects within the supported lifetime of a binary?
    • Is the expectation that we’d backport updates to any embedded data for every new (point) release?
  131. jurraca commented at 10:43 am on June 11, 2025: contributor

    How quickly does the data degrade?

    ran some numbers, counting by the number of addresses within networks, and diff’ing month to month, the ASmap from Dec 1 to Apr 1 drifted about 5%:

     0IPv4 drift from Dec 1
     1Jan 1: 1.40%
     2Feb 1: 3.72%
     3Mar 1: 4.90%
     4Apr 1: 5.23%
     5
     6IPv6 drift from Dec 1
     7Jan 1: 2.04%
     8Feb 1: 4.29%
     9Mar 1: 5.23%
    10Apr 1: 5.74%
    

    Method: running python3 contrib/asmap/asmap-tool.py diff map1.txt map2.txt which outputs the total count of addresses changed between the two files. I was running ASmap continually during those months but unfortunately don’t have files past Apr 1. Would be worth considering drift for a given node’s peerset against the ASmap. @0xB10C had previously shared some data which measured approximately 0.5% drift per month. (edit: this was the % of unmapped addresses in the peerset)

    EDIT: using diff_addrs docs to compare two ASmaps from March 23rd to June 11 with my node’s peers, the AS assignment drift is about 10%:

    0ASmap 1: 2025-03-23
    1ASmap 2: 2025-06-11
    2getnodeaddresses output from 2025-03-25
    3Summary: 5,092 (10.13%) of 50,260 addresses were reassigned (migrations=450, assignments=0, unassignments=4642)
    4getnodeaddresses output from 2025-06-10
    5Summary: 5,221 (10.37%) of 50,347 addresses were reassigned (migrations=420, assignments=0, unassignments=4801)
    

    This seems strangely high, and doesn’t match up with our previous estimates, so something may be off with my ASmaps…

  132. fjahr commented at 12:56 pm on June 11, 2025: contributor

    Can you update the PR description to reflect the current state of the PR, it seems like it describes this as one of multiple approaches, and mostly talks about a different approach?

    Right, fixed.

    Looking through the changes here, I can’t see any documentation about where the .dat comes from, who’s responsible for generating it (what’s required determinism wise), how often it needs updating (who’s resposible for that, as well as maintaining any relevant tooling, and where that lives), how any of this effects the release process etc.

    At a high level the description of the creation process as in practice at https://github.com/asmap/asmap-data here is still pretty much up-to-date: https://delvingbitcoin.org/t/asmap-creation-process/548

    I will add documentation on this here but I need to think about wording and placement a bit more. Similar to how we don’t share hard rules for when a PR is ready for merge I think it might be harmful to be overly specific with the description of the process here. Generally, the process is designed to share a lot of the characteristics of Core PRs/workflows to reduce cognitive overhead.

    What’s not spelled out on delving and won’t be in any documentation but should be said: Currently I am the sole maintainer of the data repo as in anyone can run the initiate the process and open a PR with the results but I am the only one who can merge stuff into asmap-data and kartograf currently. I am happy to share the access or move the repos if that is requested so there is some redundancy. The main question for me is if this should rather be shared with Bitcoin Core maintainers or rather regular contributors that are knowledgeable in the ASMap topic. I think I brought this up on IRC in the past but maybe now is a good time to revisit that question.

    If we are going to bundle the .dat, and use it by default in releases,

    The usage of asmap with the embedded asmap data is off by default and would be until we decide to change that (at least one release cycle), following the model of v2-transport. However, the .dat in the repository is embedded in builds by default, just to make sure we are on the same page.

    can you elaborate on why we need a build option to disable the embedding, but not the asmap feature entirely; this seems to just complicate the settings/usage handling, i.e we need to handle the case where someone has disabled the embedding (built their own binary), but then passed the -asamp option, but not provided an asmap?

    There was some interest expressed in this build option in the past though I am not sure anymore where and when that was. Might have just been the argument that this makes it easier for forks like Knots to build without the data. That and probably that embedding ~1.5MB of “obscure” data might scare some people in principle. I found the code overhead to be manageable but I am always happy to remove code and reduce the scope!

    EDIT: This was probably also before the data was checked into the repository. When discussing other solutions this build option would have been more important so that users could build when they had no file at hand.

    In the case where someone uses -asamp without a path to a file and the build does not have asmap data embedded the behavior falls back to the current one: The node looks for the file in the default location (in the datadir) and if it doesn’t find it, it returns an init error. There is a functional test that covers this behavior and one of the CI tasks builds without the embedded asmap. I wrote a bit more nuance on the parameter behavior here as well: #28792 (comment)

    How quickly does the data degrade? Im guessing not so fast that there could be any adverse effects within the supported lifetime of a binary?

    This is a difficult question to answer because part of the data is outdated by the time the map is created since part of the internet routing table is extremely volatile. In 2024 there were ~200k IPV4 BGP announcements daily for example. @jurraca has been building additional tooling and creating statistics to show how much volatility there really is and how much of it is relevant for the bitcoin network (deferring to his posts). Last but not least there is the dimension of legal entities vs. ASNs. Part of the volatility will harmless since the the same legal entity has just changed their ASN or shifted prefixes to a subsidiary etc. Unfortunately, this part is hard to quantify without considerable manual effort.

    But the guiding belief for this effort has been that even an badly outdated ASMap file should still be far better that the current default since prefixes don’t discriminate at all and it is not unusual for a node to have 2-3 their peers hosted at AWS or Hetzner. This was demonstrated in some of @virtu’s research though I can’t find a link to the presentation at the moment. Part of the relevant findings are teased here: #16599 (comment)

    Is the expectation that we’d backport updates to any embedded data for every new (point) release?

    I would recommend this, yes. I can’t think of a reason not to do it and it should be an easy commit to review because all it does is change the embedded .dat file to the same data that is in master so this should add considerable overhead afaict.

  133. tests: add unit test vectors for asmap interpreter f384d057ee
  134. refactor: Operate on bytes instead of bits in Asmap code 29055ae951
  135. refactor: Modernize logging in util/asmap.cpp 932c5b26f9
  136. refactor: Unify asmap version calculation and naming
    Calculate the asmap version only in one place: A dedicated function in util/asmap.
    
    The version was also referred to as asmap checksum in several places. To avoid confusion call it asmap version everywhere.
    2aa6d44687
  137. refactor: Use span instead of vector for data in util/asmap
    This prevents holding the asmap data in memory twice.
    e83be80c18
  138. build: Add embedded asmap data
    The data embedded is from the latest ASMap file from the asmap-data
    repository: https://github.com/fjahr/asmap-data/blob/main/1742572800_asmap.dat
    0249b0951e
  139. build: Generate ip_asn.dat.h during build process
    This can be disabled with -DWITH_EMBEDDED_ASMAP=OFF.
    8c1dad51c3
  140. common: Use fallback in GetPathArg if the path is 1
    Otherwise -asmap=1, for example, would be interpreted as a path "1".
    0818235f23
  141. init, net: Implement usage of binary-embedded asmap data 961dc29922
  142. doc: Expand documentation on asmap feature and tooling 072c993e0f
  143. refactor: Add AutoFile::size 0a0315abc9
  144. CI: Use without embedded asmap build option in one ci job 6699dd4b6c
  145. fjahr force-pushed on Jun 11, 2025
  146. fjahr commented at 3:45 pm on June 11, 2025: contributor

    To convince myself that with all the bit/byte order changes, no semantics of the actual interpretation are affected, I wrote some known-correct unit tests for the asmap interpreter, added before the changes in this PR, and maintained throughout: https://github.com/sipa/bitcoin/commits/pr28792. Feel free to grab any part of it.

    Nice, I have added this as the first commit to show that it should pass on each of the following commits (with minor edits).

  147. fjahr force-pushed on Jun 13, 2025
  148. fjahr commented at 8:57 pm on June 13, 2025: contributor
    I have pushed an initial text for the asmap data documentation that should answer most of the questions @fanquake asked. I avoided answering the “who is responsible” part directly since the process is open for anyone to contribute just like contributing to bitcoin core is. IMO this is similar to how release-process.md describes the steps that need to be completed for the release but there is no prescription for who should perform these steps and it is also not spelled out explicitly that mainters need to merge the PRs that complete these steps.
  149. in doc/asmap-data.md:5 in 2495279f83 outdated
    0@@ -0,0 +1,58 @@
    1+# Embedded ASMap data
    2+
    3+## Background
    4+
    5+The ASMap feature (available via `-asmap`) makes it possible to use a peers AS Number (an ISP/hoster identifier)
    


    sipa commented at 11:01 pm on June 14, 2025:

    “peers AS Number” -> “peer’s AS number”

    Maybe add “(ASN)” as that term is used further.

  150. in doc/asmap-data.md:8 in 2495279f83 outdated
    0@@ -0,0 +1,58 @@
    1+# Embedded ASMap data
    2+
    3+## Background
    4+
    5+The ASMap feature (available via `-asmap`) makes it possible to use a peers AS Number (an ISP/hoster identifier)
    6+in netgroup bucketing in order to ensure a higher diversity in the peer
    7+set. By default the buckets are formed based on IP prefixes but this does not
    8+prevent having multiple connections to peers at the same large-scale hoster,
    


    sipa commented at 11:02 pm on June 14, 2025:
    “having multiple connections to peers” -> “having connections dominated by peers” (multiple isn’t so much a problem on itself)
  151. in doc/asmap-data.md:26 in 2495279f83 outdated
    21+Sourcing the data from a single trusted source is problematic as well.
    22+
    23+The [Kartograf](https://github.com/asmap/kartograf) tool was created to
    24+deal with these uncertainties as good as possible. The mapping data is sourced from RPKI, IRR and
    25+Routeviews. The former two are themselves used as security mechanisms to
    26+protect against BGP security issues, which why they are considered more secure and
    


    sipa commented at 11:09 pm on June 14, 2025:
    “which why” -> “which is why”
  152. in doc/asmap-data.md:27 in 2495279f83 outdated
    22+
    23+The [Kartograf](https://github.com/asmap/kartograf) tool was created to
    24+deal with these uncertainties as good as possible. The mapping data is sourced from RPKI, IRR and
    25+Routeviews. The former two are themselves used as security mechanisms to
    26+protect against BGP security issues, which why they are considered more secure and
    27+their data takes precedent. The latter is a trusted collector of BGP traffic
    


    sipa commented at 11:10 pm on June 14, 2025:
    “precedent” -> “precedence”
  153. in doc/asmap-data.md:31 in 2495279f83 outdated
    26+protect against BGP security issues, which why they are considered more secure and
    27+their data takes precedent. The latter is a trusted collector of BGP traffic
    28+and only used for IP space that is not covered by RPKI and IRR.
    29+
    30+The process in which the Kartograf project parses, processes and merges these
    31+data sources are deterministic. Given the raw download files from these
    


    sipa commented at 11:10 pm on June 14, 2025:
    “are” -> “is”
  154. in doc/asmap-data.md:33 in 2495279f83 outdated
    28+and only used for IP space that is not covered by RPKI and IRR.
    29+
    30+The process in which the Kartograf project parses, processes and merges these
    31+data sources are deterministic. Given the raw download files from these
    32+different sources, anyone can build their own map file and verify the content
    33+matches with other users results. Before the map is usable by Bitcoin Core
    


    sipa commented at 11:10 pm on June 14, 2025:
    “users” -> “users'”
  155. in doc/asmap-data.md:37 in 2495279f83 outdated
    32+different sources, anyone can build their own map file and verify the content
    33+matches with other users results. Before the map is usable by Bitcoin Core
    34+it needs to be encoded as well. This is done using `asmap-tool.py` in `contrib/asmap`
    35+and this step is deterministic as well.
    36+
    37+When it comes to obtaining the initial input data the high volatility remains
    


    sipa commented at 11:11 pm on June 14, 2025:
    comma after “data”
  156. in doc/asmap-data.md:54 in 2495279f83 outdated
    49+or an explicit schedule.
    50+
    51+## Release process
    52+
    53+As an upcoming release approaches the embedded ASMap data should be updated
    54+by replacing the `ip_asn.dat` in a newer ASMap file from the asmap-data
    


    sipa commented at 11:12 pm on June 14, 2025:
    “in” -> “to”
  157. fjahr force-pushed on Jun 15, 2025
  158. fjahr commented at 12:01 pm on June 15, 2025: contributor
    Addressed @sipa ’s comments (all taken) on the asmap data documentation.
  159. doc: Add initial asmap data documentation 071483f649
  160. in doc/asmap-data.md:18 in ccbefd52a1 outdated
    13+available to users when they are unable to provide a file.
    14+
    15+## Data sourcing and tools
    16+
    17+ASMap is a mapping of IP prefix to ASN, essentially a snapshot of the
    18+internet routing table at a some point in time. Due to the high volatility
    


    maflcko commented at 7:51 am on June 16, 2025:
    In doc/asmap-data.md: “snapshot of the internet routing table at a some point in time.”
    a some → some [extra “a” makes the phrase ungrammatical]
    
    In doc/asmap-data.md under “Release process”: “replacing the ip_asn.dat to a newer ASMap file”
    to → with [wrong preposition]
    
    In doc/asmap-data.md under “Release process”: “obtaining a fresh for use in the upcoming release.”
    missing noun → “obtaining a fresh map for use in the upcoming release.” [omitted object makes it incomprehensible]
    

    fjahr commented at 11:21 am on June 16, 2025:
    All fixed
  161. fjahr force-pushed on Jun 16, 2025
  162. in doc/asmap-data.md:42 in 071483f649
    37+When it comes to obtaining the initial input data, the high volatility remains
    38+a challenge if users don't want to trust a single creator of used ASMap file.
    39+To overcome this, multiple users can start the download process at the exact
    40+same time which leads to a high likelihood that their downloaded data will be
    41+similar enough that they receive the same output at the end of the process.
    42+This process is regularly coordinated at the [asmap-data](https://github.com/asmap/asmap-data)
    


    sipa commented at 3:29 pm on June 24, 2025:

    See https://github.com/asmap/asmap-data/issues/25

    The repository description there says:

    This repository is strictly used for demonstration purposes to help with conceptual discussions about ASMap in the Bitcoin Core release process. Any data uploaded here should be treated with extreme caution and is not inteded for production use.

    which doesn’t seem to match the expectations around the repository as used here.


    fjahr commented at 10:50 pm on June 24, 2025:
    Thanks, I am addressing this in https://github.com/asmap/asmap-data/pull/26
  163. sipa commented at 5:24 pm on June 24, 2025: member
    Code review ACK 071483f64932e806ac8280d8c16e9bdf6be23eb3

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-30 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me