build: Embedded ASMap [3/3]: Build binary dump header file #28792

pull fjahr wants to merge 11 commits into bitcoin:master from fjahr:2023-10-asmap-in-source changing 28 files +501 −242
  1. fjahr commented at 10:13 pm on November 4, 2023: contributor

    Depends on #33878. This is the third in a tripled of PRs that implement the necessary changes for embedding of asmap data into the binary. This last part add the initial asmap data, implements the build changes and adds further documentation.

    Currently an asmap file needs to be acquired by there user from some location or the user needs to generate one themselves. Then they need to move the file to the right place in datadir or pass the path to the file as -asmap=PATH in order to use the asmap feature. The change here allows for builds to embed asmap data into the bitcoind binary which makes it possible to use the feature without handling of the asmap file by the user. If the user starts bitcoind with -asmap the embedded data will be used for bucketing of nodes.

    The data lives in the repository at src/node/data/ip_asn.dat and can be replaced with a new version at any time. The idea is that the data should be updated with every release. By default the data at that location is embedded into the binary but there is also a build option to prevent this (-DWITH_EMBEDDED_ASMAP=OFF). In this case the original behavior of the -asmap option is maintained.

  2. DrahtBot commented at 10:13 pm on November 4, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/28792.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK Sjors, jonatack, laanwj, hodlinator
    Stale ACK sipa

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34374 (kernel: use structured logging and simplify logging interface by stickies-v)
    • #33974 (cmake: Check dependencies after build option interaction by hebasto)
    • #33966 (refactor: disentangle miner startup defaults from runtime options by Sjors)
    • #31260 (scripted-diff: Type-safe settings retrieval by ryanofsky)
    • #17783 (common: Disallow calling IsArgSet() on ALLOW_LIST options by ryanofsky)
    • #17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
    • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
    • #17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label CI failed on Nov 4, 2023
  4. fjahr force-pushed on Nov 4, 2023
  5. fjahr force-pushed on Nov 4, 2023
  6. fjahr force-pushed on Nov 4, 2023
  7. sipa commented at 0:24 am on November 5, 2023: member
    Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).
  8. DrahtBot removed the label CI failed on Nov 5, 2023
  9. maflcko commented at 5:32 pm on November 5, 2023: member
    How much data would this add per year to the repo, assuming updates happen?
  10. sipa commented at 5:59 pm on November 5, 2023: member
    @maflcko The asmap file added here is around 1.6 MiB, so assuming we update for every 6 months, probably a bit over 3 MiB per year (gzip compresses asmap files only ~5%).
  11. luke-jr commented at 4:17 pm on November 8, 2023: member
    Why is the option of loading it from a file even in dev builds, not considered?
  12. fjahr commented at 7:08 pm on November 9, 2023: contributor

    Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).

    Yeah, that would work as well. I have drafted that approach here: https://github.com/fjahr/bitcoin/commit/0f9109fc49b8add9c057dcacd537250d4539ae57 Details can change of course (like keeping the raw file in init) but if people prefer that approach, I will pull this into here.

  13. fjahr commented at 7:11 pm on November 9, 2023: contributor

    Why is the option of loading it from a file even in dev builds, not considered?

    It is considered. Loading from a file is the only option for dev builds if we go the other route mentioned in the description: “The alternative approach is not having the data in the source code but only adding it during the build phase of a release.” But this also comes with the aforementioned downsides.

  14. willcl-ark commented at 9:56 am on December 6, 2023: member

    A general question about this approach: do most contemporary OSes handle loading large amounts of (potentially) unused data into physical memory well-enough, that they would (likely) not load this extra 1.5-3MB of data in the case that the -asmap is not being used? Perhaps this will be handled by the demand paging system? If not, it seems to be a bit of a shame to increase binary size by 20% in all cases.

    It seems to make most sense to me to take the path suggested by @sipa, including the .dat file in the repo, and having it converted to a .h at compile time. It might also be nice to have a configuration flag to disable the embedded header creation for faster builds.

  15. DrahtBot added the label Needs rebase on Dec 14, 2023
  16. fjahr force-pushed on Dec 14, 2023
  17. DrahtBot removed the label Needs rebase on Dec 15, 2023
  18. DrahtBot added the label CI failed on Jan 16, 2024
  19. fjahr renamed this:
    Embedding ASMap files as binary dump header file
    Embed default ASMap as binary dump header file
    on Aug 27, 2024
  20. fjahr force-pushed on Aug 28, 2024
  21. fjahr marked this as ready for review on Aug 28, 2024
  22. fjahr force-pushed on Aug 28, 2024
  23. fjahr commented at 11:48 pm on August 28, 2024: contributor

    I am picking this back up since we have come a long way in terms of tooling outside of Bitcoin Core and lately I was just waiting for cmake and v28 feature freeze. I will suggest an early merge in the cycle for v29 so more people are encouraged to use one steady file over a longer period of time.

    I am now using the suggested approach of building the header file at compile time and I added the functionality for both autotools and cmake (the autotools one has existed for a while, I can also remove it if people think it’s unnecessary). Of course this is rebased to make cmake available. First time I am editing the cmake build system so there are probably things to improve… And there is also a command for asmap-tool.py to generate the header file.

    I am adding the latest (1724248800) asmap file from asmap-data, the corresponding creation process can be reviewed in the issue and PR.

    I have not added a config option to skip the asmap header file but I will look into that next.

  24. DrahtBot removed the label CI failed on Aug 29, 2024
  25. Sjors commented at 9:13 am on August 29, 2024: member

    I am now using the suggested approach of building the header file at compile time

    I think the suggestion was to not include the header itself in that case.

    the autotools one has existed for a while, I can also remove it if people think it’s unnecessary

    Maybe keep it in until #30664 is merged.

  26. fjahr force-pushed on Aug 29, 2024
  27. fjahr commented at 6:35 pm on August 29, 2024: contributor
    Dropped the autotools build support after the discussion on IRC today (plan is to remove autotools in 24h) and rebased for good measure to have latest CMake fixups included.
  28. in src/init.cpp:1264 in e5a25fd708 outdated
    1265-                InitError(strprintf(_("Could not parse asmap file %s"), fs::quoted(fs::PathToString(asmap_path))));
    1266-                return false;
    1267+
    1268+            if (fs::exists(asmap_path)) {
    1269+                asmap = DecodeAsmap(asmap_path);
    1270+                if (asmap.size() == 0) {
    


    jonatack commented at 6:46 pm on August 29, 2024:

    nit, here and line 1271

    0                if (asmap.empty()) {
    

    fjahr commented at 8:48 pm on August 30, 2024:
    done
  29. fjahr commented at 6:51 pm on August 29, 2024: contributor

    I am now using the suggested approach of building the header file at compile time

    I think the suggestion was to not include the header itself in that case.

    I see. I am indifferent between having them in the repo or not but I am not sure what the arguments are for doing one or the other.

    We currently have two cases that seem comparable where we have the raw files as part of the repo and then make header files out of it in one way or another:

    • The test/data/* files that are built at compile time and the headers are not included in the repo
    • The chainparamsseeds that are built on demand for a release using the python script in contrib. This header file is included in the repo.

    We’ll have the raw file in the repo for sure but I am indifferent between using the build system or the python tool to make the header and I have implemented both which is good for testing anyway, I guess. I am also indifferent between having the header file in the repo or not. Conceptually the chainparamseeds seems closer to what we would do with an embedded asmap file: we basically also refresh it once per (point) release. So I guess that’s why I did it this way, because I was looking at chainparamsees as the reference case.

    I am curious if there are arguments for or against having the header file in the repo that I am currently not seeing. But even if not I am happy to go with what reviewers prefer because I just don’t have a preference right now. Maybe we’ll want to update the chainparamseeds process too if there is an argument for it.

  30. in contrib/asmap/asmap-tool.py:132 in e5a25fd708 outdated
    127@@ -127,6 +128,20 @@ def main():
    128                                    help="address file containing getnodeaddresses output to use in the comparison "
    129                                    "(make sure to set the count parameter to zero to get all node addresses, "
    130                                    "e.g. 'bitcoin-cli getnodeaddresses 0 > addrs.json')")
    131+
    132+    parser_gen_header = subparsers.add_parser("gen_header",
    


    jonatack commented at 7:06 pm on August 29, 2024:

    Might be good to add an example of these new commands and arguments to /contrib/asmap/README.md (and also mention the asmap tooling in /contrib/README.md).

    A higher level doc of overall steps to follow may be helpful, too, unless there is one already that I have forgotten about and overlooked.


    fjahr commented at 8:52 pm on August 30, 2024:

    I have expanded /contrib/asmap/README.md and /contrib/README.md and also added more details to the -asmap help.

    It sounds good to me to add a doc/asmap.md file with a higher level view but I would like to leave it as a follow-up that @jurraca might want to take on soon. He is currently working on getting https://asmap.org in shape and that doc would probably be a condensed version of that.

  31. jonatack commented at 7:08 pm on August 29, 2024: member
    First look at e5a25fd708858204cb6060393dc59d1dfed68acb
  32. Sjors commented at 10:11 am on August 30, 2024: member

    Concept ACK @fjahr if cmake can generate src/init/ip_asn.h from ip_asn.dat and it’s relative quick, then it should be sufficient to only include the .dat file. It saves space in the repo.

    If on the other hand you have to manually call a Python tool to produce the header file, then it would be too much inconvenience for everyone who wants to build Bitcoin Core from source.

    I just tried deleting src/init/ip_asn.h and running cmake again. It seems to take just a fraction of a second, but not sure how to measure that.

    ip_asn.h is also not human readable. It’s not useful to compare the diff between changes. If it were, that could be argument in favor of including it.

    Can you expand the documentation for -asmap to explain how it behaves? I haven’t studied the code in detail yet, but do we now always turn the -asmap feature on, using the default map? Or is it off by default and can be used with -asmap=1 for the default map and -asmap=somefile.dat for a custom / up to date map?

  33. fjahr force-pushed on Aug 30, 2024
  34. fjahr force-pushed on Aug 30, 2024
  35. DrahtBot added the label CI failed on Aug 30, 2024
  36. DrahtBot commented at 8:54 pm on August 30, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29496076146

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  37. fjahr commented at 9:09 pm on August 30, 2024: contributor

    I have addressed the latest feedback from @Sjors and @jonatack and rebased again because of the CMake situation and because I overhauled a lot of the changes anyway.

    In these latest changes I have

    • Re-organized the commits, I think review should be a bit easier like this
    • Expanded documentation and tests
    • Made it possible to build with the embedded ASMap data or without
    • Removed the header file from the repo
  38. fjahr commented at 9:24 pm on August 30, 2024: contributor

    Can you expand the documentation for -asmap to explain how it behaves? I haven’t studied the code in detail yet, but do we now always turn the -asmap feature on, using the default map? Or is it off by default and can be used with -asmap=1 for the default map and -asmap=somefile.dat for a custom / up to date map?

    I have expanded the help text for -asmap but I want to expand on it a bit here to explain my reasoning.

    Usage of the embedded asmap data is off by default right now. I am open to throw an on-by-default PR out there as a follow-up but I don’t want that conceptual discussion to hold up review of the implementation PR here. My feeling is that most people will be more comfortable with the v2-transport model: Implement as an off-by-default option for one release and then enable by default in the next version if there are no issues discovered that block it.

    On the -asmap functionality: If a file path is given (-asmap=foo.map), the file is either found and opened, otherwise we error. If no path is given (-asmap or -asmap=1) we first look for a file in the default file location. If there is no file in the default location and the build includes the embedded asmap data, we use the embedded data. I was going back and forth on that last part, i.e. using the embedded data instead of the file in the default location when both are present. But I opted for this way because 1. we are not breaking any workflows that were working before. If someone puts their file in the default location and expects it to be used, it’s still going to be used. And 2. dropping an asmap file in the default location can still be the most comfortable way of updating your asmap data, for example if you want to use the latest file from asmap-data instead of the embedded one.

  39. fjahr force-pushed on Aug 30, 2024
  40. fjahr force-pushed on Sep 1, 2024
  41. fjahr force-pushed on Sep 1, 2024
  42. fjahr force-pushed on Sep 1, 2024
  43. fjahr force-pushed on Sep 1, 2024
  44. Sjors commented at 6:42 am on September 2, 2024: member

    Usage of the embedded asmap data is off by default right now.

    That makes sense.

    If a file path is given (-asmap=foo.map), the file is either found and opened, otherwise we error. If no path is given (-asmap or -asmap=1) we first look for a file in the default file location. If there is no file in the default location and the build includes the embedded asmap data, we use the embedded data.

    I think that’s a good approach.

    CI is unhappy :-(

  45. fjahr commented at 9:31 am on September 2, 2024: contributor

    CI is unhappy :-(

    Yeah, I’ve been trying to fix that but it’s tricky because I can’t reproduce it locally. I guess I need to ask someone with more CMake experience.

  46. in cmake/script/WriteIPASN.cmake:14 in b20f0ef642 outdated
     8+execute_process(
     9+    COMMAND xxd -i "${RAW_FILE}"
    10+    COMMAND sed "s/unsigned char .*/static const unsigned char ip_asn[] = {/"
    11+    COMMAND sed "s/^unsigned int .*_len/static const unsigned int ip_asn_len/"
    12+    OUTPUT_VARIABLE HEX_DUMP
    13+)
    


    maflcko commented at 9:34 am on September 2, 2024:
    missing COMMAND_ERROR_IS_FATAL? Otherwise, a failure is silently ignored, no?

    fjahr commented at 9:53 am on September 2, 2024:
    Added, let’s see how it goes, I didn’t know the function could fail silently.
  47. in cmake/script/WriteIPASN.cmake:5 in b20f0ef642 outdated
    0@@ -0,0 +1,16 @@
    1+file(WRITE "${GENERATED_FILE}" "#ifndef BITCOIN_INIT_IP_ASN_H\n")
    2+file(APPEND "${GENERATED_FILE}" "#define BITCOIN_INIT_IP_ASN_H\n")
    3+file(APPEND "${GENERATED_FILE}" "/**\n")
    4+file(APPEND "${GENERATED_FILE}" " * ASMap data, mapping IP prefixes to ASNs,\n")
    5+file(APPEND "${GENERATED_FILE}" " * AUTOGENERATED by contrib/asmap/asmap-tool.py\n")
    


    maflcko commented at 9:38 am on September 2, 2024:
    Seems a bit odd to implement this both in Python and cmake (by calling xxd). Wouldn’t it be easier to just implement it once and require that to be called before use in the other place?

    fjahr commented at 12:04 pm on September 2, 2024:
    Now using the python script from cmake, so marking as resolved.
  48. maflcko commented at 9:38 am on September 2, 2024: member
    I presume the CI fails because xxd is missing?
  49. fjahr force-pushed on Sep 2, 2024
  50. fjahr force-pushed on Sep 2, 2024
  51. fjahr commented at 9:59 am on September 2, 2024: contributor

    I presume the CI fails because xxd is missing?

    Maybe but it would really surprise me because the failures are in Ubuntu 22.04 environments and I tried reproduce the error in a fresh Ubuntu 22.04 VM and it seemed xxd was included from the start.

  52. fjahr commented at 10:01 am on September 2, 2024: contributor

    Sounds like it can’t find the raw file…

    0CMake Error at /ci_container_base/cmake/script/WriteIPASN.cmake:8 (execute_process):
    1  execute_process error getting child return code: No such file or directory
    
  53. maflcko commented at 10:02 am on September 2, 2024: member

    Maybe but it would really surprise me because the failures are in Ubuntu 22.04 environments and I tried reproduce the error in a fresh Ubuntu 22.04 VM and it seemed xxd was included from the start.

    Are you sure? Note that the CI doesn’t use a desktop VM image, but a minimal docker image. I get:

    0# xxd
    1bash: xxd: command not found
    
  54. fjahr commented at 10:03 am on September 2, 2024: contributor

    Note that the CI doesn’t use a desktop VM image, but a minimal docker image

    Ah, then I guess that was my mistake

  55. fjahr force-pushed on Sep 2, 2024
  56. fjahr force-pushed on Sep 2, 2024
  57. fjahr commented at 12:04 pm on September 2, 2024: contributor

    Looks not using xxd fixed it, thanks @maflcko !

    Seems a bit odd to implement this both in Python and cmake (by calling xxd). Wouldn’t it be easier to just implement it once and require that to be called before use in the other place?

    Yeah, I first implemented it in python, then autotools and finally cmake. I kept the python version around because I wasn’t sure which approach we might actually want to go (see #28792 (comment)) and because I thought there might be some possible testing scenarios using it. But it doesn’t matter now since we needed an xxd-free approach to work on all environments and implementing the functionality in cmake directly seemed cumbersome, so I am now just using the python script from cmake as well, leaving us with just the python implementation.

  58. fjahr force-pushed on Sep 2, 2024
  59. in contrib/asmap/asmap-tool.py:225 in 39729a910a outdated
    220+ */
    221+unsigned char ip_asn[] = {{
    222+    {hex_data}
    223+}};
    224+
    225+unsigned int ip_asn_len = {len(data)};
    


    maflcko commented at 12:23 pm on September 2, 2024:
    Using a mutable C-style array seems a bit fragile. IIUC, starting with C++17, using inline constexpr std::array ip_asn{...}; should allow the compiler to optimize the layout, as well as enforce immutability. Moreover, std::array drops the need for ip_asn_len, because the size is embedded in the type of the array. Unrelated, you can change the 0x... above to std::byte{0x...} and then use MakeUCharSpan instead of legacy C-Style reinterpret cast and static cast in the code that uses the array.

    fjahr commented at 3:56 pm on September 2, 2024:
    done
  60. Sjors commented at 1:08 pm on September 2, 2024: member

    so I am now just using the python script from cmake as well, leaving us with just the python implementation

    Mmm, is Python already a build requirement though? Afaik it’s only a test requirement. So if the header can only be generated using a Python script, then that’s an argument for including it in the source code.

  61. maflcko commented at 1:16 pm on September 2, 2024: member

    if the header can only be generated using a Python script, then that’s an argument for including it in the source code.

    It is possible to generate it in native cmake as well. See for example cmake/script/GenerateHeaderFromJson.cmake. Note that previously the “cmake” generation used xxd, which isn’t a build-requirement either.

    Edit: To clarify: I am against reverting to including the data twice. 1.6 MB on every bump is bad enough. Twice that for no strong reason seems annoying to anyone cloning the repo in the future.

  62. Sjors commented at 1:35 pm on September 2, 2024: member

    The current version of cmake/script/GenerateHeaderFromJson.cmake in this PR calls asmap-tool.py: https://github.com/bitcoin/bitcoin/pull/28792/commits/07f05d5d3c773106fb75a9a25ae970ba0c6cf3a0#diff-cdce11cce5bf019b8a9c82d69718aefd2cca265cbaa45f7e0cebc497f60ddf25R25

    Is there a way to do this without relying either on Python or xxd?

    Twice that for no strong reason seems annoying to anyone cloning the repo in the future.

    Agreed.

  63. fjahr commented at 2:22 pm on September 2, 2024: contributor

    To clarify: I am against including the data twice. 1.6 MB on every bump is bad enough. Twice that for no strong reason seems annoying to anyone cloning the repo in the future.

    Now I’m confused, the header file isn’t included anymore since a while ago so the data is included only once already, or am I missing something? Are you just saying that shouldn’t be changed back?

  64. fjahr force-pushed on Sep 2, 2024
  65. fjahr commented at 3:57 pm on September 2, 2024: contributor
    Switched back to a native approach in cmake without python or xxd.
  66. fjahr force-pushed on Sep 2, 2024
  67. DrahtBot removed the label CI failed on Sep 2, 2024
  68. in cmake/script/GenerateHeaderAsmap.cmake:6 in b557dc5976 outdated
    0@@ -0,0 +1,33 @@
    1+# Copyright (c) 2023-present The Bitcoin Core developers
    2+# Distributed under the MIT software license, see the accompanying
    3+# file COPYING or https://opensource.org/license/mit/.
    4+
    5+file(READ ${RAW_FILE} hex_content HEX)
    6+string(REGEX MATCHALL "([A-Za-z0-9][A-Za-z0-9])" bytes "${hex_content}")
    


    maflcko commented at 12:57 pm on September 5, 2024:
    Could rebase and use the existing generate_header_from_raw, which should do the same thing, basically?

    fjahr commented at 10:00 am on September 13, 2024:
    Seems like there is still some tinkering going on on that one, I will give it a try when that has settled down.

    fjahr commented at 9:34 am on September 20, 2024:
    I am now using that function. I thought there might be issue with the missing header guards but the generated files are not listed. There is now also no more comment in the generated file.

    maflcko commented at 9:56 am on September 20, 2024:

    missing header guard

    That was somewhat intentional, I guess. A header guard is really only useful for module headers to avoid compile errors if the module is included several times transitively/recursively.

    However, for data-holding headers, the include should only ever happen in a single place in the whole program. Otherwise it seems too easy to bloat the binary. So a compile error in that case is actually intentional and useful.

  69. DrahtBot added the label CI failed on Sep 12, 2024
  70. DrahtBot removed the label CI failed on Sep 15, 2024
  71. DrahtBot added the label Needs rebase on Sep 17, 2024
  72. fjahr force-pushed on Sep 20, 2024
  73. fjahr commented at 9:32 am on September 20, 2024: contributor
    Rebased to incorporate latest relevant changes to the build system and AutoFile. This now also builds on #30901 so please ignore the first commit or leave your review of it in that PR.
  74. DrahtBot removed the label Needs rebase on Sep 20, 2024
  75. fjahr commented at 10:02 pm on September 20, 2024: contributor
    CI failures (timeouts) seem unrelated, should be ready for review as I am hoping #30901 will get merged quickly.
  76. DrahtBot added the label Needs rebase on Nov 6, 2024
  77. fjahr force-pushed on Nov 6, 2024
  78. DrahtBot removed the label Needs rebase on Nov 7, 2024
  79. sipa commented at 2:39 pm on December 2, 2024: member

    Concept ACK.

    It would be nice if the asmap data blob didn’t need to be converted to std::vector<bool> at runtime anymore, because that implies using memory twice (the blob itself, plus the decoded vector, despite containing effectively the same data). To improve upon that, I don’t think it’s hard to make the asmap interpreter code operate on a std::span<const std::byte> instead of a std::vector<bool>. Do you feel like looking into that, or do you want me to do so as a follow-up?

  80. fjahr commented at 2:07 pm on December 13, 2024: contributor

    It would be nice if the asmap data blob didn’t need to be converted to std::vector<bool> at runtime anymore, because that implies using memory twice (the blob itself, plus the decoded vector, despite containing effectively the same data). To improve upon that, I don’t think it’s hard to make the asmap interpreter code operate on a std::span<const std::byte> instead of a std::vector<bool>. Do you feel like looking into that, or do you want me to do so as a follow-up?

    That makes sense and I am giving it a try right now. Will follow-up here when it’s done!

  81. DrahtBot added the label Needs rebase on Dec 17, 2024
  82. fjahr force-pushed on Jan 15, 2025
  83. DrahtBot commented at 8:44 pm on January 15, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35676029008

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  84. DrahtBot added the label CI failed on Jan 15, 2025
  85. fjahr force-pushed on Jan 15, 2025
  86. fjahr force-pushed on Jan 15, 2025
  87. DrahtBot removed the label CI failed on Jan 16, 2025
  88. fjahr commented at 4:13 pm on January 16, 2025: contributor

    Rebased with an updated #30901 and implemented @sipa ’s suggestion here: #28792 (comment) This took me a bit more time than I anticipated and a couple of times I wasn’t sure which approach to go with. This is why I have kept this part as a separate commit for now and I am looking for approach ACKs on that last part in particular first before I squash it into the appropriate previous commits.

    The most interesting pieces IMO:

    • Dual member approach in NetGroupManager with m_asmap and m_loaded_asmap (approach suggested by @sipa )
    • Usage of named constructors
    • Checksum calculation: I wrote a new serializer but didn’t use it because it didn’t feel we should add it just for this. Converting also feels ugly but seems to be the easiest approach. Or maybe I am overlooking something.
  89. in src/util/asmap.cpp:234 in 6cf4930d15 outdated
    233 {
    234-    LogInfo("Opened asmap data (%zu bytes) from embedded byte array\n", data.size());
    235-    return Decode(data);
    236+    HashWriter asmap_hasher;
    237+
    238+    // Asmap used to be a bool vector, so in order to keep the checksum
    


    sipa commented at 9:25 pm on January 16, 2025:
    Do we care about keeping the checksum calculation compatible? For almost all users, this PR being merged will mean a new asmap database anyway, so they’ll incur the price of addrman invalidation that brings anyway.

    fjahr commented at 6:36 pm on January 20, 2025:
    I guess you are right, I probably got a bit too caught up in this thinking that there should be an easier way to keep it consistent. I have removed that backwards compatibility code and updated the test.
  90. DrahtBot removed the label Needs rebase on Jan 17, 2025
  91. fjahr force-pushed on Jan 20, 2025
  92. in src/test/fuzz/asmap.cpp:17 in f2e673f944 outdated
    26-    true, true, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, // Match 0x00
    27-    true, true, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, // Match 0x00
    28-    true, true, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, // Match 0x00
    29-    true, true, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true,         // Match 0xFF
    30-    true, true, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true          // Match 0xFF
    31+static const std::vector<std::byte> IPV4_PREFIX_ASMAP = {
    


    sipa commented at 3:28 pm on January 29, 2025:

    This seems to be the prefix it’s trying to match itself, not code for matching it.

    I suggest: static const auto = "fb03ec0fb03fc0fe00fb03ec0fb03fc0fe00fb03ec0fb0fffffeff"_hex_v;


    fjahr commented at 8:29 pm on February 13, 2025:
    done
  93. in contrib/asmap/asmap-tool.py:229 in 97bee93a9c outdated
    224+
    225+        file_content = f"""#include <cstddef>
    226+#include <span>
    227+
    228+namespace node::data {{
    229+inline constexpr std::byte detail_ip_asn_raw[] {{
    


    sipa commented at 3:36 pm on January 29, 2025:
    This could be a lot smaller by using the ""_hex user-defined literal (if you include util/strencodings.h); it’d create an std::array<std::byte, N> instead of a C-array, but the std::span constructor below just work just as well on it.

    fjahr commented at 8:36 pm on February 13, 2025:
    This is neat but I found it to be not as straight forward as it seemed initially. Aside from the additional include I found that I needed to bump template-depth and constexpr-steps in order to make this work at compile time with clang. I am still a bit unclear on what the trade-offs are for this and I will open a separate PR with this so that this discussion doesn’t extend the scope here further.

    sipa commented at 8:51 pm on February 13, 2025:
    Yeah, given that the file isn’t committed its size doesn’t matter that much. And it may be that constexpr evaluating a huge amount of data is non-trivial for the compiler, unsure. This isn’t that important.

    fjahr commented at 8:59 pm on February 13, 2025:
    Opened #31861 so it can be discussed there :)
  94. in contrib/asmap/asmap-tool.py:12 in 97bee93a9c outdated
     8@@ -9,6 +9,7 @@
     9 import json
    10 import math
    11 from collections import defaultdict
    12+from pathlib import Path
    


    sipa commented at 3:39 pm on January 29, 2025:
    This new gen_header command doesn’t seem to be used by the build system; what is its point?

    fjahr commented at 8:29 pm on February 13, 2025:
    It was a left-over of a previous approach and I left it in initially in case it might be interesting for testing, experiments etc. just because it was already there. But the PR has grown in scope a lot and it’s becoming annoying to deal with for me and reviewers, so I dropped it now.
  95. sipa commented at 3:55 pm on January 29, 2025: member

    Concept ACK.

    I’d recommend integrating the last commit into the rest of the PR, making the std::vector<bool> -> std::span<std::byte> for the interpreter change first, and then introducing the embedded asmap data afterwards.

  96. DrahtBot added the label Needs rebase on Feb 12, 2025
  97. fjahr force-pushed on Feb 13, 2025
  98. fjahr commented at 8:37 pm on February 13, 2025: contributor

    Addressed @sipa ’s feedback and rebased on the latest version of #30901.

    The python tool command has been dropped and the commit have been reorganized as well.

  99. DrahtBot removed the label Needs rebase on Feb 13, 2025
  100. sipa commented at 3:24 pm on February 14, 2025: member
    Will review again soon, but a quick question: would it make sense to provide the ability for a user (who doesn’t necessarily have easy access to the source code) to extract the asmap data from a running node? For example, to be able to run asmap.py diff on it. It could be an RPC command that just spits out the data in textual format, or a REST endpoint that directly provides it in binary.
  101. DrahtBot added the label Needs rebase on Feb 14, 2025
  102. jonatack commented at 3:58 pm on February 28, 2025: member
    Concept ACK, will review after rebase.
  103. fjahr force-pushed on Mar 3, 2025
  104. DrahtBot removed the label Needs rebase on Mar 3, 2025
  105. fjahr commented at 6:44 pm on March 3, 2025: contributor

    Rebased the previous conflict and #30901 has been merged now as well 🥳

    I also added the -DWITH_EMBEDDED_ASMAP=NO to one of the CI jobs.

    would it make sense to provide the ability for a user (who doesn’t necessarily have easy access to the source code) to extract the asmap data from a running node?

    Let me address that in a follow-up. I think it’s a nice-to-have since the necessary data for diffing should always be available in https://github.com/asmap/asmap-data when we follow the workflow we propose but yeah, one less step of indirection and having the ability to do it all with the data that’s in the repo is definitely better.

  106. fjahr force-pushed on Mar 29, 2025
  107. fjahr commented at 10:46 pm on March 29, 2025: contributor
    Rebased to get all the latest cmake changes and swapped the included mapping data to the latest asmap run. This used the --fill feature in encoding which leads to additional space savings. This is why the embedded data has gone down significantly from ~1.7MB to ~1.4MB.
  108. in src/test/fuzz/asmap.cpp:34 in d07856e1b1 outdated
    32+    std::vector<std::byte> asmap_vec = ipv6 ? IPV6_PREFIX_ASMAP : IPV4_PREFIX_ASMAP;
    33     for (int i = 0; i < asmap_size; ++i) {
    34+        uint8_t byte = buffer[1 + i];
    35         for (int j = 0; j < 8; ++j) {
    36-            asmap.push_back((buffer[1 + i] >> j) & 1);
    37+            asmap_vec.push_back(static_cast<std::byte>((byte >> j) & 1));
    


    sipa commented at 6:31 pm on April 10, 2025:

    In commit “refactor: Use span of bytes for asmap”

    I don’t think this is what we want to happen: it’s converting every byte of input data into 8 bytes that are each 0x00 or 0x01. This commit is also changing the meaning of interpretation, which makes this a bit confusing.

    This should just be:

    0for (int i = 0; i < asmap_size; ++i) {
    1    asmap_vec.push_back(buffer[1 + i]);
    2}
    

    I think.


    fjahr commented at 10:52 pm on April 22, 2025:
    Yeah, that’s right, fixed it.
  109. in src/netgroup.h:66 in d07856e1b1 outdated
    60@@ -52,7 +61,10 @@ class NetGroupManager {
    61     bool UsingASMap() const;
    62 
    63 private:
    64-    /** Compressed IP->ASN mapping, loaded from a file when a node starts.
    65+    /** Compressed IP->ASN mapping.
    66+     *
    67+     * Data may beloaded from a file when a node starts or embedded in the
    


    sipa commented at 6:32 pm on April 10, 2025:

    In commit “refactor: Use span of bytes for asmap”

    Nit: typo be loaded


    fjahr commented at 10:52 pm on April 22, 2025:
    Fixed
  110. in src/netgroup.h:24 in d07856e1b1 outdated
    14@@ -15,9 +15,18 @@
    15  */
    16 class NetGroupManager {
    17 public:
    18-    explicit NetGroupManager(std::vector<bool> asmap)
    19-        : m_asmap{std::move(asmap)}
    20-    {}
    21+    static NetGroupManager WithEmbeddedAsmap(std::span<const std::byte> asmap) {
    


    sipa commented at 6:34 pm on April 10, 2025:

    In commit “refactor: Use span of bytes for asmap”

    I think the fact that this commit both changes the interpretation from individual-bools to bits-of-byte while also adding support for span-based embedded asmap file makes it a bit harder to review.

    Would it be possible to separate these into two commits (probably first switching the interpretation to be bit-of-std::byte based but leaving everything in std::vector<std::byte>), and then in a second commit add support for embedded span-based ones?

  111. DrahtBot added the label Needs rebase on Apr 11, 2025
  112. fjahr force-pushed on Apr 22, 2025
  113. fjahr commented at 11:03 pm on April 22, 2025: contributor

    Rebased and addressed @sipa ’s feedback. Also made a few further improvements: Added some missing includes, dropped some docs that have been added in #32110 already (silent merge conflict), unified asmap version/checksum naming (I went with version because that’s what we have been communicating to users already).

    Would it be possible to separate these into two commits (probably first switching the interpretation to be bit-of-std::byte based but leaving everything in std::vectorstd::byte), and then in a second commit add support for embedded span-based ones?

    Done, I split the commit. I hope it makes review easier to have these changes separate. The downside is that a lot of lines are now touched in two commits but I guess it will be ok since many of them are easy to grasp.

  114. DrahtBot removed the label Needs rebase on Apr 22, 2025
  115. achow101 added this to the milestone 30.0 on Apr 25, 2025
  116. in src/netgroup.cpp:91 in 828cf86a03 outdated
    90-    std::vector<bool> ip_bits(128);
    91+    std::vector<std::byte> ip_bits(16);
    92     if (address.HasLinkedIPv4()) {
    93         // For lookup, treat as if it was just an IPv4 address (IPV4_IN_IPV6_PREFIX + IPv4 bits)
    94         for (int8_t byte_i = 0; byte_i < 12; ++byte_i) {
    95-            for (uint8_t bit_i = 0; bit_i < 8; ++bit_i) {
    


    sipa commented at 1:21 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    Simpler:

     0--- a/src/netgroup.cpp
     1+++ b/src/netgroup.cpp
     2@@ -90,18 +90,18 @@ uint32_t NetGroupManager::GetMappedAS(const CNetAddr& address) const
     3     if (address.HasLinkedIPv4()) {
     4         // For lookup, treat as if it was just an IPv4 address (IPV4_IN_IPV6_PREFIX + IPv4 bits)
     5         for (int8_t byte_i = 0; byte_i < 12; ++byte_i) {
     6-            ip_bits[byte_i] = static_cast<std::byte>(IPV4_IN_IPV6_PREFIX[byte_i]);
     7+            ip_bits[byte_i] = std::byte(IPV4_IN_IPV6_PREFIX[byte_i]);
     8         }
     9         uint32_t ipv4 = address.GetLinkedIPv4();
    10         for (int i = 0; i < 4; ++i) {
    11-            ip_bits[12 + i] = static_cast<std::byte>((ipv4 >> (24 - i * 8)) & 0xFF);
    12+            ip_bits[12 + i] = std::byte(ipv4 >> (24 - i * 8));
    13         }
    14     } else {
    15         // Use all 128 bits of the IPv6 address otherwise
    16         assert(address.IsIPv6());
    17         auto addr_bytes = address.GetAddrBytes();
    18         for (int8_t byte_i = 0; byte_i < 16; ++byte_i) {
    19-            ip_bits[byte_i] = static_cast<std::byte>(addr_bytes[byte_i]);
    20+            ip_bits[byte_i] = std::byte(addr_bytes[byte_i]);
    21         }
    22     }
    23     uint32_t mapped_as = Interpret(m_asmap, ip_bits);
    

    fjahr commented at 8:31 pm on May 8, 2025:
    Done
  117. in src/test/fuzz/asmap.cpp:32 in 828cf86a03 outdated
    32+    std::vector<std::byte> asmap_vec = ipv6 ? IPV6_PREFIX_ASMAP : IPV4_PREFIX_ASMAP;
    33     for (int i = 0; i < asmap_size; ++i) {
    34-        for (int j = 0; j < 8; ++j) {
    35-            asmap.push_back((buffer[1 + i] >> j) & 1);
    36-        }
    37+        asmap_vec.push_back(static_cast<std::byte>(buffer[1 + i]));
    


    sipa commented at 1:25 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    Simpler:

    0asmap_vec.push_back(std::byte(buffer[1 + i]));
    

    fjahr commented at 8:30 pm on May 8, 2025:
    Taken
  118. in src/test/fuzz/asmap.cpp:34 in 828cf86a03 outdated
    34-        for (int j = 0; j < 8; ++j) {
    35-            asmap.push_back((buffer[1 + i] >> j) & 1);
    36-        }
    37+        asmap_vec.push_back(static_cast<std::byte>(buffer[1 + i]));
    38     }
    39+    std::vector<std::byte> asmap(asmap_vec);
    


    sipa commented at 1:26 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    I don’t think this is necessary? The code above could operate on asmap instead of asmap_vec directly?


    fjahr commented at 8:30 pm on May 8, 2025:
    Hm, right, that’s a left-over from previous iterations. Done.
  119. in src/test/fuzz/asmap_direct.cpp:34 in 828cf86a03 outdated
    30@@ -31,19 +31,21 @@ FUZZ_TARGET(asmap_direct)
    31     if (buffer.size() - sep_pos - 1 > 128) return; // At most 128 bits in IP address
    32 
    33     // Checks on asmap
    34-    std::vector<bool> asmap(buffer.begin(), buffer.begin() + sep_pos);
    35+    std::vector<std::byte> asmap(reinterpret_cast<const std::byte*>(buffer.data()),
    


    sipa commented at 1:37 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    This is not correct. The code (and comment) above indicate that buffer is being interpreted using 1 bit of asmap code per byte. The line here is interpreting it as 1 byte per byte.

    You need some conversion function, I think:

     0namespace {
     1
     2std::vector<std::byte> BitsToBytes(std::span<const uint8_t> input) noexcept
     3{
     4    std::vector<std::byte> ret;
     5    uint8_t next_byte{0};
     6    int next_byte_bits{0};
     7    for (uint8_t val : input) {
     8        next_byte |= (val & 1) << (next_byte_bits++);
     9        if (next_byte_bits == 8) {
    10            ret.push_back(std::byte(next_byte));
    11            next_byte = 0;
    12            next_byte_bits = 0;
    13        }
    14    }
    15    return ret;
    16}
    17
    18} // namespace
    19
    20FUZZ_TARGET(asmap_direct)
    21{
    22    ...
    23    auto asmap = BitsToBytes(std::span{buffer}.first(sep_pos));
    24    ...
    25    auto addr = BitsToBytes(std::span{buffer}.subspan(sep_pos + 1));
    26    ...
    27}
    

    fjahr commented at 8:31 pm on May 8, 2025:
    Taken but this code skipped the last byte if the input wasn’t a multiple of 8, so I added that.
  120. in src/test/fuzz/asmap_direct.cpp:38 in 828cf86a03 outdated
    36+                                 reinterpret_cast<const std::byte*>(buffer.data() + sep_pos));
    37     if (SanityCheckASMap(asmap, buffer.size() - 1 - sep_pos)) {
    38         // Verify that for valid asmaps, no prefix (except up to 7 zero padding bits) is valid.
    39-        std::vector<bool> asmap_prefix = asmap;
    40-        while (!asmap_prefix.empty() && asmap_prefix.size() + 7 > asmap.size() && asmap_prefix.back() == false) {
    41+        std::vector<std::byte> asmap_prefix = asmap;
    


    sipa commented at 1:40 am on April 26, 2025:

    In commit “refactor: Operate on bytes instead of bits in Asmap code”

    This doesn’t make sense anymore. You can’t slice off individual bits from an std::byte.


    fjahr commented at 8:33 pm on May 8, 2025:
    Right, I adapted this part to skip the last byte since the BitsToBytes function would pad again, but otherwise I think this now functions as the original fuzz test again.
  121. in src/util/asmap.cpp:25 in 828cf86a03 outdated
    21@@ -21,15 +22,14 @@ namespace {
    22 
    23 constexpr uint32_t INVALID = 0xFFFFFFFF;
    24 
    25-uint32_t DecodeBits(std::vector<bool>::const_iterator& bitpos, const std::vector<bool>::const_iterator& endpos, uint8_t minval, const std::vector<uint8_t> &bit_sizes)
    26+uint32_t DecodeBits(size_t& bitpos, const std::vector<std::byte>& data, uint8_t minval, const std::vector<uint8_t>& bit_sizes)
    


    sipa commented at 1:49 am on April 26, 2025:

    “In commit “refactor: Operate on bytes instead of bits in Asmap code”:

    I think it would be helpful to have a bit getter helper function:

    0inline bool GetBit(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
    1{
    2    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
    3}
    

    And then for example:

    0    ...
    1    bit = GetBit(data, bitpos++);
    2    ...
    

    and

    0    ...
    1    if (GetBit(ip, ip.size() * 8 - bits)) {
    2        --bits;
    3    }
    4    ...
    

    fjahr commented at 8:30 pm on May 8, 2025:
    Good idea. Done, also made a BE version of it.
  122. in src/test/fuzz/asmap_direct.cpp:36 in c1c6943ea9 outdated
    32@@ -33,19 +33,19 @@ FUZZ_TARGET(asmap_direct)
    33     // Checks on asmap
    34     std::vector<std::byte> asmap(reinterpret_cast<const std::byte*>(buffer.data()),
    35                                  reinterpret_cast<const std::byte*>(buffer.data() + sep_pos));
    36-    if (SanityCheckASMap(asmap, buffer.size() - 1 - sep_pos)) {
    37+    if (SanityCheckASMap(std::span<const std::byte>(asmap), buffer.size() - 1 - sep_pos)) {
    


    sipa commented at 1:58 am on April 26, 2025:

    In commit “refactor: Use span instead of vector for data in util/asmap”

    No need for the explicit conversion:

    0-if (SanityCheckASMap(std::span<const std::byte>(asmap), buffer.size() - 1 - sep_pos)) {
    1+if (SanityCheckASMap(asmap, buffer.size() - 1 - sep_pos)) {
    

    Also twice the same further in this function.


    fjahr commented at 8:30 pm on May 8, 2025:
    Done
  123. fjahr force-pushed on May 8, 2025
  124. sipa commented at 3:47 pm on June 9, 2025: member
    To convince myself that with all the bit/byte order changes, no semantics of the actual interpretation are affected, I wrote some known-correct unit tests for the asmap interpreter, added before the changes in this PR, and maintained throughout: https://github.com/sipa/bitcoin/commits/pr28792. Feel free to grab any part of it.
  125. in src/init.cpp:1494 in 84a8ad67a8 outdated
    1494+                    } else {
    1495+                        InitError(strprintf(_("Could not parse asmap file in default location %s"), fs::quoted(fs::PathToString(asmap_path))));
    1496+                    }
    1497+                    return false;
    1498+                }
    1499+                node.netgroupman = std::make_unique<NetGroupManager>(NetGroupManager::WithLoadedAsmap(asmap));
    


    sipa commented at 7:41 pm on June 9, 2025:

    In commit “init, net: Implement usage of binary-embedded asmap data”

    Nit: use std::move(asmap) here to avoid a copy (of the entire loaded asmap blob).


    fjahr commented at 3:44 pm on June 11, 2025:
    Done
  126. sipa commented at 7:45 pm on June 9, 2025: member
    ACK 61714f390e74f49d0f37d7f96cc0645c8215728a
  127. DrahtBot requested review from Sjors on Jun 9, 2025
  128. DrahtBot requested review from jonatack on Jun 9, 2025
  129. fanquake commented at 9:53 am on June 10, 2025: member

    Can you update the PR description to reflect the current state of the PR, it seems like it describes this as one of multiple approaches, and mostly talks about a different approach?

    Looking through the changes here, I can’t see any documentation about where the .dat comes from, who’s responsible for generating it (what’s required determinism wise), how often it needs updating (who’s resposible for that, as well as maintaining any relevant tooling, and where that lives), how any of this effects the release process etc.

    If we are going to bundle the .dat, and use it by default in releases, can you elaborate on why we need a build option to disable the embedding, but not the asmap feature entirely; this seems to just complicate the settings/usage handling, i.e we need to handle the case where someone has disabled the embedding (built their own binary), but then passed the -asamp option, but not provided an asmap?

  130. fanquake commented at 3:25 pm on June 10, 2025: member

    Some other questions:

    • How quickly does the data degrade? Im guessing not so fast that there could be any adverse effects within the supported lifetime of a binary?
    • Is the expectation that we’d backport updates to any embedded data for every new (point) release?
  131. jurraca commented at 10:43 am on June 11, 2025: contributor

    How quickly does the data degrade?

    ran some numbers, counting by the number of addresses within networks, and diff’ing month to month, the ASmap from Dec 1 to Apr 1 drifted about 5%:

     0IPv4 drift from Dec 1
     1Jan 1: 1.40%
     2Feb 1: 3.72%
     3Mar 1: 4.90%
     4Apr 1: 5.23%
     5
     6IPv6 drift from Dec 1
     7Jan 1: 2.04%
     8Feb 1: 4.29%
     9Mar 1: 5.23%
    10Apr 1: 5.74%
    

    Method: running python3 contrib/asmap/asmap-tool.py diff map1.txt map2.txt which outputs the total count of addresses changed between the two files. I was running ASmap continually during those months but unfortunately don’t have files past Apr 1. Would be worth considering drift for a given node’s peerset against the ASmap. @0xB10C had previously shared some data which measured approximately 0.5% drift per month. (edit: this was the % of unmapped addresses in the peerset)

    EDIT: using diff_addrs docs to compare two ASmaps from March 23rd to June 11 with my node’s peers, the AS assignment drift is about 10%:

    0ASmap 1: 2025-03-23
    1ASmap 2: 2025-06-11
    2getnodeaddresses output from 2025-03-25
    3Summary: 5,092 (10.13%) of 50,260 addresses were reassigned (migrations=450, assignments=0, unassignments=4642)
    4getnodeaddresses output from 2025-06-10
    5Summary: 5,221 (10.37%) of 50,347 addresses were reassigned (migrations=420, assignments=0, unassignments=4801)
    

    This seems strangely high, and doesn’t match up with our previous estimates, so something may be off with my ASmaps…

  132. fjahr commented at 12:56 pm on June 11, 2025: contributor

    Can you update the PR description to reflect the current state of the PR, it seems like it describes this as one of multiple approaches, and mostly talks about a different approach?

    Right, fixed.

    Looking through the changes here, I can’t see any documentation about where the .dat comes from, who’s responsible for generating it (what’s required determinism wise), how often it needs updating (who’s resposible for that, as well as maintaining any relevant tooling, and where that lives), how any of this effects the release process etc.

    At a high level the description of the creation process as in practice at https://github.com/asmap/asmap-data here is still pretty much up-to-date: https://delvingbitcoin.org/t/asmap-creation-process/548

    I will add documentation on this here but I need to think about wording and placement a bit more. Similar to how we don’t share hard rules for when a PR is ready for merge I think it might be harmful to be overly specific with the description of the process here. Generally, the process is designed to share a lot of the characteristics of Core PRs/workflows to reduce cognitive overhead.

    What’s not spelled out on delving and won’t be in any documentation but should be said: Currently I am the sole maintainer of the data repo as in anyone can run the initiate the process and open a PR with the results but I am the only one who can merge stuff into asmap-data and kartograf currently. I am happy to share the access or move the repos if that is requested so there is some redundancy. The main question for me is if this should rather be shared with Bitcoin Core maintainers or rather regular contributors that are knowledgeable in the ASMap topic. I think I brought this up on IRC in the past but maybe now is a good time to revisit that question.

    If we are going to bundle the .dat, and use it by default in releases,

    The usage of asmap with the embedded asmap data is off by default and would be until we decide to change that (at least one release cycle), following the model of v2-transport. However, the .dat in the repository is embedded in builds by default, just to make sure we are on the same page.

    can you elaborate on why we need a build option to disable the embedding, but not the asmap feature entirely; this seems to just complicate the settings/usage handling, i.e we need to handle the case where someone has disabled the embedding (built their own binary), but then passed the -asamp option, but not provided an asmap?

    There was some interest expressed in this build option in the past though I am not sure anymore where and when that was. Might have just been the argument that this makes it easier for forks like Knots to build without the data. That and probably that embedding ~1.5MB of “obscure” data might scare some people in principle. I found the code overhead to be manageable but I am always happy to remove code and reduce the scope!

    EDIT: This was probably also before the data was checked into the repository. When discussing other solutions this build option would have been more important so that users could build when they had no file at hand.

    In the case where someone uses -asamp without a path to a file and the build does not have asmap data embedded the behavior falls back to the current one: The node looks for the file in the default location (in the datadir) and if it doesn’t find it, it returns an init error. There is a functional test that covers this behavior and one of the CI tasks builds without the embedded asmap. I wrote a bit more nuance on the parameter behavior here as well: #28792 (comment)

    How quickly does the data degrade? Im guessing not so fast that there could be any adverse effects within the supported lifetime of a binary?

    This is a difficult question to answer because part of the data is outdated by the time the map is created since part of the internet routing table is extremely volatile. In 2024 there were ~200k IPV4 BGP announcements daily for example. @jurraca has been building additional tooling and creating statistics to show how much volatility there really is and how much of it is relevant for the bitcoin network (deferring to his posts). Last but not least there is the dimension of legal entities vs. ASNs. Part of the volatility will harmless since the the same legal entity has just changed their ASN or shifted prefixes to a subsidiary etc. Unfortunately, this part is hard to quantify without considerable manual effort.

    But the guiding belief for this effort has been that even an badly outdated ASMap file should still be far better that the current default since prefixes don’t discriminate at all and it is not unusual for a node to have 2-3 their peers hosted at AWS or Hetzner. This was demonstrated in some of @virtu’s research though I can’t find a link to the presentation at the moment. Part of the relevant findings are teased here: #16599 (comment)

    Is the expectation that we’d backport updates to any embedded data for every new (point) release?

    I would recommend this, yes. I can’t think of a reason not to do it and it should be an easy commit to review because all it does is change the embedded .dat file to the same data that is in master so this should add considerable overhead afaict.

  133. fjahr force-pushed on Jun 11, 2025
  134. fjahr commented at 3:45 pm on June 11, 2025: contributor

    To convince myself that with all the bit/byte order changes, no semantics of the actual interpretation are affected, I wrote some known-correct unit tests for the asmap interpreter, added before the changes in this PR, and maintained throughout: https://github.com/sipa/bitcoin/commits/pr28792. Feel free to grab any part of it.

    Nice, I have added this as the first commit to show that it should pass on each of the following commits (with minor edits).

  135. fjahr force-pushed on Jun 13, 2025
  136. fjahr commented at 8:57 pm on June 13, 2025: contributor
    I have pushed an initial text for the asmap data documentation that should answer most of the questions @fanquake asked. I avoided answering the “who is responsible” part directly since the process is open for anyone to contribute just like contributing to bitcoin core is. IMO this is similar to how release-process.md describes the steps that need to be completed for the release but there is no prescription for who should perform these steps and it is also not spelled out explicitly that mainters need to merge the PRs that complete these steps.
  137. in doc/asmap-data.md:5 in 2495279f83 outdated
    0@@ -0,0 +1,58 @@
    1+# Embedded ASMap data
    2+
    3+## Background
    4+
    5+The ASMap feature (available via `-asmap`) makes it possible to use a peers AS Number (an ISP/hoster identifier)
    


    sipa commented at 11:01 pm on June 14, 2025:

    “peers AS Number” -> “peer’s AS number”

    Maybe add “(ASN)” as that term is used further.

  138. in doc/asmap-data.md:8 in 2495279f83 outdated
    0@@ -0,0 +1,58 @@
    1+# Embedded ASMap data
    2+
    3+## Background
    4+
    5+The ASMap feature (available via `-asmap`) makes it possible to use a peers AS Number (an ISP/hoster identifier)
    6+in netgroup bucketing in order to ensure a higher diversity in the peer
    7+set. By default the buckets are formed based on IP prefixes but this does not
    8+prevent having multiple connections to peers at the same large-scale hoster,
    


    sipa commented at 11:02 pm on June 14, 2025:
    “having multiple connections to peers” -> “having connections dominated by peers” (multiple isn’t so much a problem on itself)
  139. in doc/asmap-data.md:26 in 2495279f83 outdated
    21+Sourcing the data from a single trusted source is problematic as well.
    22+
    23+The [Kartograf](https://github.com/asmap/kartograf) tool was created to
    24+deal with these uncertainties as good as possible. The mapping data is sourced from RPKI, IRR and
    25+Routeviews. The former two are themselves used as security mechanisms to
    26+protect against BGP security issues, which why they are considered more secure and
    


    sipa commented at 11:09 pm on June 14, 2025:
    “which why” -> “which is why”
  140. in doc/asmap-data.md:27 in 2495279f83 outdated
    22+
    23+The [Kartograf](https://github.com/asmap/kartograf) tool was created to
    24+deal with these uncertainties as good as possible. The mapping data is sourced from RPKI, IRR and
    25+Routeviews. The former two are themselves used as security mechanisms to
    26+protect against BGP security issues, which why they are considered more secure and
    27+their data takes precedent. The latter is a trusted collector of BGP traffic
    


    sipa commented at 11:10 pm on June 14, 2025:
    “precedent” -> “precedence”
  141. in doc/asmap-data.md:31 in 2495279f83 outdated
    26+protect against BGP security issues, which why they are considered more secure and
    27+their data takes precedent. The latter is a trusted collector of BGP traffic
    28+and only used for IP space that is not covered by RPKI and IRR.
    29+
    30+The process in which the Kartograf project parses, processes and merges these
    31+data sources are deterministic. Given the raw download files from these
    


    sipa commented at 11:10 pm on June 14, 2025:
    “are” -> “is”
  142. in doc/asmap-data.md:33 in 2495279f83 outdated
    28+and only used for IP space that is not covered by RPKI and IRR.
    29+
    30+The process in which the Kartograf project parses, processes and merges these
    31+data sources are deterministic. Given the raw download files from these
    32+different sources, anyone can build their own map file and verify the content
    33+matches with other users results. Before the map is usable by Bitcoin Core
    


    sipa commented at 11:10 pm on June 14, 2025:
    “users” -> “users'”
  143. in doc/asmap-data.md:37 in 2495279f83 outdated
    32+different sources, anyone can build their own map file and verify the content
    33+matches with other users results. Before the map is usable by Bitcoin Core
    34+it needs to be encoded as well. This is done using `asmap-tool.py` in `contrib/asmap`
    35+and this step is deterministic as well.
    36+
    37+When it comes to obtaining the initial input data the high volatility remains
    


    sipa commented at 11:11 pm on June 14, 2025:
    comma after “data”
  144. in doc/asmap-data.md:54 in 2495279f83 outdated
    49+or an explicit schedule.
    50+
    51+## Release process
    52+
    53+As an upcoming release approaches the embedded ASMap data should be updated
    54+by replacing the `ip_asn.dat` in a newer ASMap file from the asmap-data
    


    sipa commented at 11:12 pm on June 14, 2025:
    “in” -> “to”
  145. fjahr force-pushed on Jun 15, 2025
  146. fjahr commented at 12:01 pm on June 15, 2025: contributor
    Addressed @sipa ’s comments (all taken) on the asmap data documentation.
  147. in doc/asmap-data.md:18 in ccbefd52a1 outdated
    13+available to users when they are unable to provide a file.
    14+
    15+## Data sourcing and tools
    16+
    17+ASMap is a mapping of IP prefix to ASN, essentially a snapshot of the
    18+internet routing table at a some point in time. Due to the high volatility
    


    maflcko commented at 7:51 am on June 16, 2025:
    In doc/asmap-data.md: “snapshot of the internet routing table at a some point in time.”
    a some → some [extra “a” makes the phrase ungrammatical]
    
    In doc/asmap-data.md under “Release process”: “replacing the ip_asn.dat to a newer ASMap file”
    to → with [wrong preposition]
    
    In doc/asmap-data.md under “Release process”: “obtaining a fresh for use in the upcoming release.”
    missing noun → “obtaining a fresh map for use in the upcoming release.” [omitted object makes it incomprehensible]
    

    fjahr commented at 11:21 am on June 16, 2025:
    All fixed
  148. fjahr force-pushed on Jun 16, 2025
  149. in doc/asmap-data.md:43 in 071483f649 outdated
    37+When it comes to obtaining the initial input data, the high volatility remains
    38+a challenge if users don't want to trust a single creator of used ASMap file.
    39+To overcome this, multiple users can start the download process at the exact
    40+same time which leads to a high likelihood that their downloaded data will be
    41+similar enough that they receive the same output at the end of the process.
    42+This process is regularly coordinated at the [asmap-data](https://github.com/asmap/asmap-data)
    


    sipa commented at 3:29 pm on June 24, 2025:

    See https://github.com/asmap/asmap-data/issues/25

    The repository description there says:

    This repository is strictly used for demonstration purposes to help with conceptual discussions about ASMap in the Bitcoin Core release process. Any data uploaded here should be treated with extreme caution and is not inteded for production use.

    which doesn’t seem to match the expectations around the repository as used here.


    fjahr commented at 10:50 pm on June 24, 2025:
    Thanks, I am addressing this in https://github.com/asmap/asmap-data/pull/26
  150. sipa commented at 5:24 pm on June 24, 2025: member
    Code review ACK 071483f64932e806ac8280d8c16e9bdf6be23eb3
  151. in doc/asmap-data.md:44 in 071483f649 outdated
    38+a challenge if users don't want to trust a single creator of used ASMap file.
    39+To overcome this, multiple users can start the download process at the exact
    40+same time which leads to a high likelihood that their downloaded data will be
    41+similar enough that they receive the same output at the end of the process.
    42+This process is regularly coordinated at the [asmap-data](https://github.com/asmap/asmap-data)
    43+project. If enough participants have joined the effort (5 or more is recommended) and a majority of the
    


    fanquake commented at 1:13 pm on June 25, 2025:

    If enough participants have joined the effort (5 or more is recommended)

    Just noting that of the last three additions to the repo (https://github.com/asmap/asmap-data/pull/23, https://github.com/asmap/asmap-data/pull/21, https://github.com/asmap/asmap-data/pull/19), it seems like none of them reached 5: I can see 3, 3, and 3 participants.


    fjahr commented at 4:37 pm on June 25, 2025:

    The reviewers of the PR are only ACKing that the actual data from the collaborative run was used and that the data was encoded correctly by reproducing the file hash. This is the part I am referring to a little further below:

    0Files will not be merged to the repository
    1without at least two additional reviewers confirming that the process described
    2above was followed as expected and that the encoding step yielded the same
    3file hash.
    

    The 5 participants is the number I am looking for in the collaborative run which can be seen in the issues that preceed the PRs, they should always be linked in the description of the PRs. For example this one https://github.com/asmap/asmap-data/issues/20 had 5 (preceeded the PR 21 you linked), and this one had 9 https://github.com/asmap/asmap-data/issues/22 (preceeded PR 23).

    I think can make the distinction more clear by spelling out which part is in the issue and which part is in the PR. Do you think that would make it clearer? I just try to strike a balance so that we don’t have to edit this document too frequently.

    The 5 participant minimum is kind of arbitrarily picked, I admit, but I try to get to what I would say is comparable to 3 ACKs. More would be nice but I wasn’t confident we would get more people involved on a regular basis and majority of 5 is 3 so basically this is a minimum of 3 ACKs on the input data. The mismatches are not comparable to NACKs because there is no right or wrong there (assuming no malice), they are just different because of timing or connection problems. On the mechanical encoding review part in the PR similarly I am looking for 2-3 ACKs.

  152. DrahtBot added the label Needs rebase on Jul 3, 2025
  153. fjahr force-pushed on Jul 5, 2025
  154. fjahr commented at 10:12 pm on July 5, 2025: contributor

    Rebased

    Also moving the conversation here that I started on IRC last week: I suggested some sharing of access to the tools and data repositories in the asmap github org in case of emergency and asked for feedback. @sipa suggested that instead the https://github.com/asmap/asmap-data could be moved to the bitcoin-core org instead. I now think this is actually even better because that repository holds data that is encoded particularly for use in Bitcoin Core while the rest of the tooling in the asmap org are more independent of this use case and could be used in other contexts.

    I would suggest we move the repository over once the PR is merged or very close to it, since it still needs some additional review. I guess I would still be able to help with administering the asmap-data repository after the move, similar to how @dergoegge administers the qa-assets repo for example, so there should be no additional workload for maintainers caused by this afaict.

    For completeness, here is a link to the (very brief) original conversation: https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-06-26#1131356;

  155. DrahtBot removed the label Needs rebase on Jul 5, 2025
  156. DrahtBot added the label Needs rebase on Jul 19, 2025
  157. fjahr force-pushed on Jul 20, 2025
  158. fjahr commented at 9:56 pm on July 20, 2025: contributor
    Rebased and I shaved off 4 easy to review commits from here and put them into #33026. Maybe splitting this off helps to move things along a bit faster.
  159. DrahtBot removed the label Needs rebase on Jul 20, 2025
  160. DrahtBot added the label Needs rebase on Jul 25, 2025
  161. fjahr force-pushed on Jul 25, 2025
  162. DrahtBot removed the label Needs rebase on Jul 25, 2025
  163. DrahtBot added the label CI failed on Jul 25, 2025
  164. DrahtBot commented at 11:39 pm on July 25, 2025: contributor

    🚧 At least one of the CI tasks failed. Task no wallet, libbitcoinkernel: https://github.com/bitcoin/bitcoin/runs/46745894874 LLM reason (✨ experimental): Build failure caused by incorrect constructor invocation of ‘NetGroupManager’ in test code.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  165. fjahr force-pushed on Jul 31, 2025
  166. fjahr force-pushed on Jul 31, 2025
  167. DrahtBot removed the label CI failed on Aug 1, 2025
  168. fjahr force-pushed on Aug 18, 2025
  169. fjahr commented at 4:34 pm on August 18, 2025: contributor
    Updated the map to that of the latest run: https://github.com/asmap/asmap-data/pull/30
  170. achow101 removed this from the milestone 30.0 on Aug 18, 2025
  171. achow101 added this to the milestone 31.0 on Aug 18, 2025
  172. fjahr commented at 3:41 pm on September 15, 2025: contributor

    Updated the map to that of the latest run: asmap/asmap-data#30

    FYI for reviewers, new maps are available in asmap-data and will continue to be updated over the next few months but I don’t plan to update the map data that is included here for at least a while. This is by choice: If this had been merged for v30 then this map data might have been included and it would have been the default until the release of v31. So keeping the data as-is kind of simulates that and allows for an easier analysis if there are noticeable signs of “decaying” data. But of course newer (and older) maps are always available in asmap-data for further testing. If you are looking for very old maps to test with you may have to check out the history of that repository since we remove maps that are outdated after a while.

  173. laanwj commented at 11:32 am on October 22, 2025: member
    Concept ACK
  174. TheBlueMatt commented at 3:02 pm on November 2, 2025: contributor

    I couldn’t really find a better place to leave this comment, even though its clearly not really quite targeted at this PR itself (which is awesome!), so I’ll leave it here and y’all can yell at me for not finding the right place later.

    I saw in the core-dev writeups that there appears to be some intent to build asmap files based on only RPKI data (if that’s wrong, please skip the rest of this comment). I fear that that is a particularly bad idea.

    While RPKI deployment has been quite successful, it is used primarily by larger ISPs and cloud service providers, with the long-tail of providers largely not using it. In seeking to use asmap data in Bitcoin Core, part of the goal is to improve diversity of connections across different ISPs to avoid a single large ISP or cloud service provider from being able to dominate connections and address tables. If we use an asmap built only from RPKI data, that (I assume? at least in part?) implies that we treat these large cloud providers each as separate entities while the long-tail of ISPs that we’d really like to bias towards will be treated as only one or a handful of ISPs [1].

    Historically I’ve heard many complaints that ASMap data isn’t deterministic when generated from non-RPKI data and that this prevents inclusion, but I find this argument incredibly lazy. Yes, the data isn’t deterministic when built from public looking glasses or native BGP feeds, and comparing two such resulting files to determine if the differences are “valid” isn’t really all that practical, but I don’t see why this is such a blocker. As long as any differences are small (under 0.1% of addresses, I’d bet), the ability to maliciously craft an ASMap such that the network biases towards your own addresses shouldn’t be all that practical. Further, if someone is trying to maliciously bias ASMap results, its rater trivial even with RPKI - generally RIRs don’t place very strict limits on ASN availability (there are still tons available!) so its rather easy to just request a hundred or two ASNs and split your address space across those. Worse, with an RPKI-only ASMap the impact of such activities is made greater as you can ensure that your prefixes are included as separate ISPs, giving you an edge over the long-tail ISPs that may not be.

    [1] this, of course, isn’t quite true, if we’re worried about eclipse attacks the long-tail of ISPs is likely connected to through T1s or other large carriers, and biasing towards the long-tail may end up giving large carriers an edge in their ability to be MiTM. This is, however, the price we pay for ISP diversity, which has much better non-MiTM sybil resistance which IMO is a much more likely threat model.

  175. sipa commented at 3:37 pm on November 2, 2025: member
    @TheBlueMatt FYI, the repository with asmap construction scripts is https://github.com/asmap/kartograf. My understanding (which may be incomplete) is that RPKI is used where available, but supplemented with other sources.
  176. fjahr commented at 3:51 pm on November 2, 2025: contributor

    I saw in the core-dev writeups that there appears to be some intent to build asmap files based on only RPKI data (if that’s wrong, please skip the rest of this comment). I fear that that is a particularly bad idea.

    You mean the asmap session probably and I think that’s a misunderstand of the notes. I don’t think they are released yet but maybe you can point me to the line you are referring to and we can amend that to be better understandable. I thought there might a btc transcripts PR already but I couldn’t find it so far. The notes are best effort and I reviewed them but only briefly so it wouldn’t surprise me if something slipped through that can be interpretted differently than it is intended.

    There is not intent to change what we have been doing for the coordinated runs in asmap-data, which is RPKI + RIR IRRs + BGP announcements from Routeviews (in order of priority). Someone could use Kartograf to build a RPKI-only map but we are working pretty hard to make sure we can use all the sources that make sense so get the best possible view of the network.

  177. fjahr commented at 3:55 pm on November 2, 2025: contributor

    I was misled by https://github.com/bitcointranscripts/bitcointranscripts/pull/593/files#diff-e25cced7d45bde5a991f8ac638162335638ea1ba39b21a5cb76d5e5e31b31345R9-R10

    Ok, I see, that could have been more clear as either “depends on rpki-client” (the only dependency outside of python) or “depends on RPKI, IRR and Routeviews downloads” or something. I will clarify in the PR.

  178. fjahr commented at 4:03 pm on November 2, 2025: contributor
    This should clarify it: https://github.com/bitcointranscripts/bitcointranscripts/pull/594 Sorry for triggering a false alarm with the notes :D
  179. DrahtBot added the label Needs rebase on Nov 4, 2025
  180. fjahr force-pushed on Nov 5, 2025
  181. DrahtBot removed the label Needs rebase on Nov 6, 2025
  182. in cmake/script/GenerateHeaderFromRaw.cmake:21 in e00deb2cfa outdated
    17@@ -18,6 +18,5 @@ ${formatted_bytes}
    18 };
    19 
    20 inline constexpr std::span ${raw_source_basename}{detail_${raw_source_basename}_raw};
    21-}
    22-")
    23+}")
    


    hodlinator commented at 3:59 pm on November 11, 2025:
    Why change this? Various tools expect an empty line before EOF (although I admit it’s probably rarely relevant for these generated files).

    fjahr commented at 9:27 pm on November 15, 2025:
    I can’t remember encountering such a tool but primarily I was going by our style guide, which I thought had mentioned that we don’t do new lines before EOF but I guess it doesn’t (anymore?) since I couldn’t find it right now. At least we have InsertNewlineAtEOF: false in our .clang-format.

    hodlinator commented at 7:48 pm on November 17, 2025:

    Good point regarding .clang-format, though that value from 13f36c020f0329b5e975282b45292fdf2a495e31 comes from clang-format defaults rather than a conscious decision on our part. (It will not remove newlines at EOF if they exist though, https://releases.llvm.org/16.0.0/tools/clang/docs/ClangFormatStyleOptions.html#insertnewlineateof).

    GitHub displays a red warning symbol in diffs when one removes the empty newline at ends of files.

    git apply diff.patch fails for…

     0diff --git a/src/util/asmap.cpp b/src/util/asmap.cpp
     1index 267bc652d5..3ebce045d6 100644
     2--- a/src/util/asmap.cpp
     3+++ b/src/util/asmap.cpp
     4@@ -335,6 +335,7 @@ std::vector<std::byte> DecodeAsmap(fs::path path)
     5 
     6     // Read entire file into memory
     7     std::vector<std::byte> buffer(length);
     8+    foo
     9     file.read(buffer);
    10 
    11     if (!CheckAsmap(buffer)) {
    

    ….if there is no newline after the ending open brace.

    0₿ git apply diff.patch
    1error: corrupt patch at line 12
    

    Tried with git apply --inaccurate-eof diff.patch and git apply --whitespace=fix diff.patch but they produce the same error.

    Please consider it a nit though.


    hodlinator commented at 9:41 am on November 18, 2025:
    Related: #33896

    hodlinator commented at 12:17 pm on January 12, 2026:
    (Since the .clang-format rule change in #33896 was merged in 2 days without any resistance I’d like to repeat my suggestion to exclude the change to cmake/script/GenerateHeaderFromRaw.cmake in this PR which makes it generate files that are in violation of that rule).
  183. in src/test/fuzz/asmap_direct.cpp:15 in e00deb2cfa
    11@@ -12,6 +12,24 @@
    12 
    13 #include <cassert>
    14 
    15+std::vector<std::byte> BitsToBytes(std::span<const uint8_t> input) noexcept
    


    hodlinator commented at 4:01 pm on November 11, 2025:

    A variant of BitsToBytes from src/merkleblock.cpp already exists, but it seems okay to make more modern divisionless version (albeit dividing by 8 probably gets optimized as 3 right shifts). I understand not wanting to touch & reuse consensus code for now.

    nit: Clearer parameter name?

    0std::vector<std::byte> BitsToBytes(std::span<const uint8_t> bits) noexcept
    

    fjahr commented at 2:48 pm on November 16, 2025:
  184. in src/util/asmap.h:16 in e00deb2cfa
    12 #include <cstdint>
    13+#include <span>
    14 #include <vector>
    15 
    16-uint32_t Interpret(const std::vector<bool> &asmap, const std::vector<bool> &ip);
    17+uint32_t Interpret(const std::span<const std::byte>& asmap, const std::span<const std::byte>& ip);
    


    hodlinator commented at 4:05 pm on November 11, 2025:

    nit: We tend to drop const...& for spans, similar to how we don’t use const int&.

    0uint32_t Interpret(std::span<const std::byte> asmap, std::span<const std::byte> ip);
    

    As the style in:

    https://github.com/bitcoin/bitcoin/blob/e00deb2cfa60e846ea3de48bed35c7d3992db4bd/src/test/fuzz/asmap_direct.cpp#L15

    (One can always add const for the function definition in the .cpp file, as it is the concern of the implementation).


    fjahr commented at 8:51 pm on November 16, 2025:
  185. in ci/test/00_setup_env_native_nowallet_libbitcoinkernel.sh:16 in e00deb2cfa
    12@@ -13,4 +13,4 @@ export PACKAGES="python3-zmq python3-pip clang-17 llvm-17 libc++abi-17-dev libc+
    13 export PIP_PACKAGES="--break-system-packages pycapnp"
    14 export DEP_OPTS="NO_WALLET=1 CC=clang-17 CXX='clang++-17 -stdlib=libc++'"
    15 export GOAL="install"
    16-export BITCOIN_CONFIG="-DREDUCE_EXPORTS=ON -DBUILD_UTIL_CHAINSTATE=ON -DBUILD_KERNEL_LIB=ON -DBUILD_KERNEL_TEST=ON -DBUILD_SHARED_LIBS=ON"
    17+export BITCOIN_CONFIG="-DREDUCE_EXPORTS=ON -DBUILD_UTIL_CHAINSTATE=ON -DBUILD_KERNEL_LIB=ON -DBUILD_KERNEL_TEST=ON -DBUILD_SHARED_LIBS=ON -DWITH_EMBEDDED_ASMAP=NO"
    


    hodlinator commented at 4:10 pm on November 11, 2025:

    nit: NO might work, but isn’t OFF a clearer inversion of ON?

    0export BITCOIN_CONFIG="-DREDUCE_EXPORTS=ON -DBUILD_UTIL_CHAINSTATE=ON -DBUILD_KERNEL_LIB=ON -DBUILD_KERNEL_TEST=ON -DBUILD_SHARED_LIBS=ON -DWITH_EMBEDDED_ASMAP=OFF"
    

    fjahr commented at 10:50 pm on November 16, 2025:

    One might say that “NO” is literally the inversion of “ON” 😜

    Changed in the latest push 🙂


    hodlinator commented at 12:37 pm on November 21, 2025:

    PR description nit:

    0- -DWITH_EMBEDDED_ASMAP=NO
    1+ -DWITH_EMBEDDED_ASMAP=OFF
    

    hodlinator commented at 8:14 am on November 25, 2025:

    I was pointing to this sentence in the PR description at the top, where it’s still using “NO” rather than “OFF”, as was changed in the code:

    By default the data at that location is embedded into the binary but there is also a build option to prevent this (-DWITH_EMBEDDED_ASMAP=NO).

    (You can respond by jumping here: #28792 (review)).


    fjahr commented at 1:17 pm on November 25, 2025:
    Ah, I get it now, I guess it was late yesterday. Seems like @fanquake already fixed it.
  186. in src/netgroup.cpp:95 in e00deb2cfa outdated
    94         for (int8_t byte_i = 0; byte_i < 12; ++byte_i) {
    95-            for (uint8_t bit_i = 0; bit_i < 8; ++bit_i) {
    96-                ip_bits[byte_i * 8 + bit_i] = (IPV4_IN_IPV6_PREFIX[byte_i] >> (7 - bit_i)) & 1;
    97-            }
    98+            ip_bits[byte_i] = std::byte(IPV4_IN_IPV6_PREFIX[byte_i]);
    99         }
    


    hodlinator commented at 7:00 pm on November 11, 2025:

    nit: Could replace fors with copy_n():

     0--- a/src/netgroup.cpp
     1+++ b/src/netgroup.cpp
     2@@ -89,9 +89,8 @@ uint32_t NetGroupManager::GetMappedAS(const CNetAddr& address) const
     3     std::vector<std::byte> ip_bits(16);
     4     if (address.HasLinkedIPv4()) {
     5         // For lookup, treat as if it was just an IPv4 address (IPV4_IN_IPV6_PREFIX + IPv4 bits)
     6-        for (int8_t byte_i = 0; byte_i < 12; ++byte_i) {
     7-            ip_bits[byte_i] = std::byte(IPV4_IN_IPV6_PREFIX[byte_i]);
     8-        }
     9+        std::copy_n(std::as_bytes(std::span{IPV4_IN_IPV6_PREFIX}).begin(),
    10+                    IPV4_IN_IPV6_PREFIX.size(), ip_bits.begin());
    11         uint32_t ipv4 = address.GetLinkedIPv4();
    12         for (int i = 0; i < 4; ++i) {
    13             ip_bits[12 + i] = std::byte((ipv4 >> (24 - i * 8)) & 0xFF);
    14@@ -100,9 +99,9 @@ uint32_t NetGroupManager::GetMappedAS(const CNetAddr& address) const
    15         // Use all 128 bits of the IPv6 address otherwise
    16         assert(address.IsIPv6());
    17         auto addr_bytes = address.GetAddrBytes();
    18-        for (int8_t byte_i = 0; byte_i < 16; ++byte_i) {
    19-            ip_bits[byte_i] = std::byte(addr_bytes[byte_i]);
    20-        }
    21+        assert(addr_bytes.size() == ip_bits.size());
    22+        std::copy_n(std::as_bytes(std::span{addr_bytes}).begin(),
    23+                    addr_bytes.size(), ip_bits.begin());
    24     }
    25     uint32_t mapped_as = Interpret(m_asmap, ip_bits);
    26     return mapped_as;
    

    fjahr commented at 2:39 pm on November 16, 2025:
    Addressed in #33878
  187. in src/test/fuzz/asmap.cpp:33 in e00deb2cfa
    33     for (int i = 0; i < asmap_size; ++i) {
    34-        for (int j = 0; j < 8; ++j) {
    35-            asmap.push_back((buffer[1 + i] >> j) & 1);
    36-        }
    37+        asmap.push_back(std::byte(buffer[1 + i]));
    38     }
    


    hodlinator commented at 7:22 pm on November 11, 2025:

    nit: Could replace for with:

    0    std::ranges::copy(std::as_bytes(buffer.subspan(1, asmap_size)), std::back_inserter(asmap));
    

    fjahr commented at 2:45 pm on November 16, 2025:
  188. in src/test/fuzz/asmap_direct.cpp:49 in e00deb2cfa
    45@@ -28,22 +46,24 @@ FUZZ_TARGET(asmap_direct)
    46     }
    47     if (!sep_pos_opt) return; // Needs exactly 1 separator
    48     const size_t sep_pos{sep_pos_opt.value()};
    49-    if (buffer.size() - sep_pos - 1 > 128) return; // At most 128 bits in IP address
    50+    const size_t ip_len = buffer.size() - sep_pos - 1;
    


    hodlinator commented at 8:00 pm on November 11, 2025:

    nit: Align with style of prior line (sep_pos)?

    0    const size_t ip_len{buffer.size() - sep_pos - 1};
    

    fjahr commented at 2:59 pm on November 16, 2025:
  189. in src/test/fuzz/asmap_direct.cpp:53 in e00deb2cfa
    51+    if (ip_len > 128) return; // At most 128 bits in IP address
    52 
    53     // Checks on asmap
    54-    std::vector<bool> asmap(buffer.begin(), buffer.begin() + sep_pos);
    55-    if (SanityCheckASMap(asmap, buffer.size() - 1 - sep_pos)) {
    56+    auto asmap = BitsToBytes(std::span{buffer}.first(sep_pos));
    


    hodlinator commented at 8:58 pm on November 11, 2025:

    nit: buffer is already a span:

     0--- a/src/test/fuzz/asmap_direct.cpp
     1+++ b/src/test/fuzz/asmap_direct.cpp
     2@@ -50,11 +50,11 @@ FUZZ_TARGET(asmap_direct)
     3     if (ip_len > 128) return; // At most 128 bits in IP address
     4 
     5     // Checks on asmap
     6-    auto asmap = BitsToBytes(std::span{buffer}.first(sep_pos));
     7+    auto asmap = BitsToBytes(buffer.first(sep_pos));
     8     if (SanityCheckASMap(asmap, ip_len)) {
     9         // Verify that for valid asmaps, no prefix (except up to 7 zero padding bits) is valid.
    10         for (size_t prefix_len = sep_pos - 1; prefix_len > 0; --prefix_len) {
    11-            auto prefix = BitsToBytes(std::span{buffer}.first(prefix_len));
    12+            auto prefix = BitsToBytes(buffer.first(prefix_len));
    13             // We have to skip the prefixes of the same length as the original
    14             // asmap, since they will contain some zero padding bits in the last
    15             // byte.
    16@@ -63,7 +63,7 @@ FUZZ_TARGET(asmap_direct)
    17         }
    18 
    19         // No address input should trigger assertions in interpreter
    20-        auto addr = BitsToBytes(std::span{buffer}.subspan(sep_pos + 1));
    21+        auto addr = BitsToBytes(buffer.subspan(sep_pos + 1));
    22         (void)Interpret(asmap, addr);
    23     }
    24 }
    

    fjahr commented at 3:01 pm on November 16, 2025:
  190. in src/util/asmap.cpp:148 in e00deb2cfa
    146+bool SanityCheckASMap(const std::span<const std::byte>& asmap, int bits)
    147 {
    148-    const std::vector<bool>::const_iterator begin = asmap.begin(), endpos = asmap.end();
    149-    std::vector<bool>::const_iterator pos = begin;
    150+    size_t pos{0};
    151+    size_t endpos{asmap.size() * 8};
    


    hodlinator commented at 10:07 pm on November 11, 2025:

    nit: So much happening in this function, please restore const:

    0    const size_t endpos{asmap.size() * 8};
    

    fjahr commented at 3:07 pm on November 16, 2025:
  191. in src/util/asmap.cpp:38 in e00deb2cfa
    34+inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
    35+{
    36+    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
    37+}
    38+
    39+uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint8_t minval, const std::vector<uint8_t>& bit_sizes)
    


    hodlinator commented at 10:15 am on November 12, 2025:

    nit: Could avoid some static initialization+deinitialization CPU cycles:

    0uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint8_t minval, const std::span<const uint8_t> bit_sizes)
    
     0--- a/src/util/asmap.cpp
     1+++ b/src/util/asmap.cpp
     2@@ -35,7 +35,7 @@ inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
     3     return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
     4 }
     5 
     6-uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint8_t minval, const std::vector<uint8_t>& bit_sizes)
     7+uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte> data, uint8_t minval, const std::span<const uint8_t> bit_sizes)
     8 {
     9     uint32_t val = minval;
    10     bool bit;
    11@@ -70,29 +70,29 @@ enum class Instruction : uint32_t
    12     DEFAULT = 3,
    13 };
    14 
    15-const std::vector<uint8_t> TYPE_BIT_SIZES{0, 0, 1};
    16 Instruction DecodeType(size_t& bitpos, const std::span<const std::byte>& data)
    17 {
    18+    constexpr uint8_t TYPE_BIT_SIZES[]{0, 0, 1};
    19     return Instruction(DecodeBits(bitpos, data, 0, TYPE_BIT_SIZES));
    20 }
    21 
    22-const std::vector<uint8_t> ASN_BIT_SIZES{15, 16, 17, 18, 19, 20, 21, 22, 23, 24};
    23 uint32_t DecodeASN(size_t& bitpos, const std::span<const std::byte>& data)
    24 {
    25+    constexpr uint8_t ASN_BIT_SIZES[]{15, 16, 17, 18, 19, 20, 21, 22, 23, 24};
    26     return DecodeBits(bitpos, data, 1, ASN_BIT_SIZES);
    27 }
    28 
    29 
    30-const std::vector<uint8_t> MATCH_BIT_SIZES{1, 2, 3, 4, 5, 6, 7, 8};
    31 uint32_t DecodeMatch(size_t& bitpos, const std::span<const std::byte>& data)
    32 {
    33+    constexpr uint8_t MATCH_BIT_SIZES[]{1, 2, 3, 4, 5, 6, 7, 8};
    34     return DecodeBits(bitpos, data, 2, MATCH_BIT_SIZES);
    35 }
    36 
    37 
    38-const std::vector<uint8_t> JUMP_BIT_SIZES{5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30};
    39 uint32_t DecodeJump(size_t& bitpos, const std::span<const std::byte>& data)
    40 {
    41+    constexpr uint8_t JUMP_BIT_SIZES[]{5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30};
    42     return DecodeBits(bitpos, data, 17, JUMP_BIT_SIZES);
    43 }
    44 
    

    (Also removes data from file-global scope).


    fjahr commented at 4:11 pm on November 16, 2025:

    Addressed in #33878

    (Also removes data from file-global scope).

    Not a problem because of the anonymous namespace I think.

  192. in src/util/asmap.cpp:130 in e00deb2cfa
    145             matchlen = std::bit_width(match) - 1;
    146             if (bits < matchlen) break; // Not enough input bits
    147             for (uint32_t bit = 0; bit < matchlen; bit++) {
    148-                if ((ip[ip.size() - bits]) != ((match >> (matchlen - 1 - bit)) & 1)) {
    149+                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
    150                     return default_asn;
    


    hodlinator commented at 8:51 am on November 13, 2025:

    Side note: the documentation below indicates that a failure should be returned if no default ASN has been set, but the current code (also on master) returns zero: https://github.com/bitcoin/bitcoin/blob/e00deb2cfa60e846ea3de48bed35c7d3992db4bd/contrib/asmap/asmap.py#L158-L165

    I think it would be good to update the documentation or the code to match, but understand if others don’t want to do it in the context of this PR.


    fjahr commented at 3:06 pm on November 16, 2025:
    I have been working on improved documentation. Please check the new commit in #33878 to see if that is fixing this completely.
  193. in src/test/fuzz/asmap.cpp:20 in e00deb2cfa outdated
    30-    true, true, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, // Match 0x00
    31-    true, true, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, // Match 0x00
    32-    true, true, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true,         // Match 0xFF
    33-    true, true, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true          // Match 0xFF
    34-};
    35+static const auto IPV4_PREFIX_ASMAP = "fb03ec0fb03fc0fe00fb03ec0fb03fc0fe00fb03ec0fb0fffffeff"_hex_v;
    


    hodlinator commented at 9:28 am on November 13, 2025:

    remark in 35e6c5bdd467e8342ed606f338c3b4a363c2ef35 - “refactor: Operate on bytes instead of bits in Asmap code”:

    Endian-inversion on a bit-level

    The old vector<bool> data on master converts to "dfc0...."_hex (df being the bit-endian-inversion of fb, and c0 being the inversion of 03). But the old code would endian-invert each byte while loading from disk or from a byte-buffer:

    https://github.com/bitcoin/bitcoin/blob/8f7e02a08626a3141b64df09dfeb38d7d9e2974c/src/util/asmap.cpp#L212-L215

    https://github.com/bitcoin/bitcoin/blob/8f7e02a08626a3141b64df09dfeb38d7d9e2974c/src/test/netbase_tests.cpp#L633-L635

    https://github.com/bitcoin/bitcoin/blob/8f7e02a08626a3141b64df09dfeb38d7d9e2974c/src/test/addrman_tests.cpp#L56-L58

    And since we want to remove those loops in the new version and just consume the bytes, the “asmap-runtime” now needs to endian-invert each byte internally on the fly when interpreting the individual bits (GetBitLE).

    So it makes sense to invert the data in test/fuzz/asmap.cpp.

    Notable: fb03ec0fb03fc0fe00fb03ec0fb03fc0fe00fb03ec0fb0fffffeff exactly matches the first 27 bytes of src/test/data/asmap.raw.

  194. in src/test/fuzz/asmap_direct.cpp:33 in e00deb2cfa outdated
    28+    if (next_byte_bits) ret.push_back(std::byte(next_byte));
    29+
    30+    return ret;
    31+}
    32+
    33 FUZZ_TARGET(asmap_direct)
    


    hodlinator commented at 1:50 pm on November 13, 2025:

    How about switching to using raw bytes instead and dropping BitsToBytes?

     0FUZZ_TARGET(asmap_direct)
     1{
     2    FuzzedDataProvider provider{buffer.data(), buffer.size()};
     3    const int ip_len{provider.ConsumeIntegralInRange<int>(0, 128)}; // At most 128 bits in IP address
     4    const auto asmap_size{static_cast<int64_t>(provider.remaining_bytes()) - ((ip_len + 7) / 8)};
     5    if (asmap_size < 0) return;
     6
     7    // Checks on asmap
     8    const auto asmap{provider.ConsumeBytes<std::byte>(asmap_size)};
     9    if (SanityCheckASMap(asmap, ip_len)) {
    10        // Verify that for valid asmaps, no prefix (except up to 7 zero padding bits) is valid.
    11        auto asmap_prefix{asmap};
    12        // We have to skip the prefixes of the same length as the original
    13        // asmap, since they will contain some zero padding bits in the last
    14        // byte.
    15        asmap_prefix.pop_back();
    16        while (!asmap_prefix.empty()) {
    17            assert(!SanityCheckASMap(asmap_prefix, ip_len));
    18            for (int bit = 0; bit < 8; ++bit) {
    19                asmap_prefix.back() &= ~std::byte{uint8_t(1 << bit)};
    20                assert(!SanityCheckASMap(asmap_prefix, ip_len));
    21            }
    22            asmap_prefix.pop_back();
    23        }
    24        // No address input should trigger assertions in interpreter
    25        const auto addr{provider.ConsumeRemainingBytes<std::byte>()};
    26        (void)Interpret(asmap, addr);
    27    }
    28}
    

    fjahr commented at 2:58 pm on November 16, 2025:
    Probably not a bad idea in principle but the nature of the test changes with your suggested code, so I would like to manage the PR scope a bit and keep this for a follow-up.
  195. in src/netgroup.cpp:17 in e00deb2cfa
    15+uint256 NetGroupManager::GetAsmapVersion() const
    16 {
    17     if (!m_asmap.size()) return {};
    18-
    19-    return (HashWriter{} << m_asmap).GetHash();
    20+    return AsmapVersion(m_asmap);
    


    hodlinator commented at 8:02 pm on November 13, 2025:

    In 3da238bf0585fb2ed93e7976b8b2e27fa84536e3 “refactor: Unify asmap version calculation and naming”: If we keep naming the called function AsmapVersion() rather than AsmapChecksum(), it should probably be the one checking for zero size and returning {}/zeroes, rather than the if-statement above.

     0uint256 NetGroupManager::GetAsmapVersion() const
     1{
     2    return AsmapVersion(m_asmap);
     3}
     4
     5uint256 AsmapVersion(const std::span<const std::byte>& data)
     6{
     7    if (data.empty()) return {};
     8
     9    HashWriter asmap_hasher;
    10    asmap_hasher << data;
    11    return asmap_hasher.GetHash();
    12}
    

    fjahr commented at 3:17 pm on November 16, 2025:
    Done in #33878
  196. in src/util/asmap.cpp:217 in e00deb2cfa
    213+    if (!SanityCheckASMap(data, 128)) {
    214+        LogInfo("Sanity check of asmap data failed\n");
    215+        return {};
    216+    }
    217+    return data;
    218+}
    


    hodlinator commented at 8:17 pm on November 13, 2025:

    Seems more useful to return a bool instead of the input / empty. Could make it more like SanityCheckASMap, just with default bit-length.

    0bool CheckAsmap(const std::span<const std::byte>& data)
    1{
    2    if (!SanityCheckASMap(data, 128)) {
    3        LogWarning("Sanity check of asmap data failed\n");
    4        return false;
    5    }
    6    return true;
    7}
    

    Also, DecodeAsmap() could be changed to call CheckAsmap() since it’s also using 128.


    fjahr commented at 4:38 pm on November 16, 2025:
  197. in src/netgroup.h:92 in e00deb2cfa
    89+     * asmap data.
    90+     */
    91+    const std::span<const std::byte>  m_asmap;
    92+    std::vector<std::byte> m_loaded_asmap;
    93+
    94+    explicit NetGroupManager(std::span<const std::byte> asmap, std::vector<std::byte> loaded_asmap)
    


    hodlinator commented at 8:34 pm on November 13, 2025:

    The vector would be better as an r-value here and in WithLoadedAsmap() to make clear that we expect to take ownership (https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f18-for-will-move-from-parameters-pass-by-x-and-stdmove-the-parameter).

    0    explicit NetGroupManager(std::span<const std::byte> asmap, std::vector<std::byte>&& loaded_asmap)
    

    Also, m_asmap(asmap.empty() ? std::span<const std::byte>() : asmap) seems overly verbose.

     0diff --git a/src/init.cpp b/src/init.cpp
     1index afc317ea74..21c0e140a5 100644
     2--- a/src/init.cpp
     3+++ b/src/init.cpp
     4@@ -1598,8 +1598,8 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
     5                         InitError(strprintf(_("Could not read embedded asmap data")));
     6                         return false;
     7                     }
     8-                    node.netgroupman = std::make_unique<NetGroupManager>(NetGroupManager::WithEmbeddedAsmap(asmap));
     9                     asmap_version = AsmapVersion(asmap);
    10+                    node.netgroupman = std::make_unique<NetGroupManager>(NetGroupManager::WithEmbeddedAsmap(asmap));
    11                     LogInfo("Opened asmap data (%zu bytes) from embedded byte array\n", asmap.size());
    12                 #else
    13                     // If there is no embedded data, fail and report the default
    14diff --git a/src/netgroup.h b/src/netgroup.h
    15index 8040656bb9..588fb4ca75 100644
    16--- a/src/netgroup.h
    17+++ b/src/netgroup.h
    18@@ -20,9 +20,8 @@ public:
    19         return NetGroupManager(asmap, {});
    20     }
    21 
    22-    static NetGroupManager WithLoadedAsmap(std::vector<std::byte> loaded_asmap) {
    23-        std::span<const std::byte> asmap_span(loaded_asmap);
    24-        return NetGroupManager(asmap_span, std::move(loaded_asmap));
    25+    static NetGroupManager WithLoadedAsmap(std::vector<std::byte>&& asmap) {
    26+        return NetGroupManager(std::span{asmap}, std::move(asmap));
    27     }
    28 
    29     static NetGroupManager NoAsmap() {
    30@@ -89,9 +88,9 @@ private:
    31     const std::span<const std::byte>  m_asmap;
    32     std::vector<std::byte> m_loaded_asmap;
    33 
    34-    explicit NetGroupManager(std::span<const std::byte> asmap, std::vector<std::byte> loaded_asmap)
    35-        : m_asmap(asmap.empty() ? std::span<const std::byte>() : asmap),
    36-          m_loaded_asmap(std::move(loaded_asmap))
    37+    explicit NetGroupManager(std::span<const std::byte> embedded_asmap, std::vector<std::byte>&& loaded_asmap)
    38+        : m_asmap{embedded_asmap},
    39+          m_loaded_asmap{std::move(loaded_asmap)}
    40     {}
    41 };
    42 
    43diff --git a/src/test/fuzz/util/net.h b/src/test/fuzz/util/net.h
    44index 537df1d133..acb9170913 100644
    45--- a/src/test/fuzz/util/net.h
    46+++ b/src/test/fuzz/util/net.h
    47@@ -239,7 +239,7 @@ public:
    48     if (!SanityCheckASMap(std::span<std::byte>(asmap), 128)) {
    49         return NetGroupManager::NoAsmap();
    50     }
    51-    return NetGroupManager::WithLoadedAsmap(asmap);
    52+    return NetGroupManager::WithLoadedAsmap(std::move(asmap));
    53 }
    54 
    55 inline CSubNet ConsumeSubNet(FuzzedDataProvider& fuzzed_data_provider) noexcept
    

    fjahr commented at 4:00 pm on November 16, 2025:
  198. in src/util/asmap.cpp:108 in e00deb2cfa
    117     uint32_t default_asn = 0;
    118     uint32_t jump, match, matchlen;
    119     Instruction opcode;
    120-    while (pos != endpos) {
    121-        opcode = DecodeType(pos, endpos);
    122+    while (pos < asmap.size() * 8) {
    


    hodlinator commented at 8:02 pm on November 14, 2025:

    nit: Would be nice to retain endpos here in Interpret() as done in SanityCheckASMap() to keep them more similar.

     0--- a/src/util/asmap.cpp
     1+++ b/src/util/asmap.cpp
     2@@ -101,11 +101,12 @@ uint32_t DecodeJump(size_t& bitpos, const std::span<const std::byte>& data)
     3 uint32_t Interpret(const std::span<const std::byte>& asmap, const std::span<const std::byte>& ip)
     4 {
     5     size_t pos{0};
     6+    const size_t endpos{asmap.size() * 8};
     7     uint8_t bits = ip.size() * 8;
     8     uint32_t default_asn = 0;
     9     uint32_t jump, match, matchlen;
    10     Instruction opcode;
    11-    while (pos < asmap.size() * 8) {
    12+    while (pos != endpos) {
    13         opcode = DecodeType(pos, asmap);
    14         if (opcode == Instruction::RETURN) {
    15             default_asn = DecodeASN(pos, asmap);
    16@@ -115,7 +116,7 @@ uint32_t Interpret(const std::span<const std::byte>& asmap, const std::span<cons
    17             jump = DecodeJump(pos, asmap);
    18             if (jump == INVALID) break; // Jump offset straddles EOF
    19             if (bits == 0) break; // No input bits left
    20-            if (int64_t{jump} >= static_cast<int64_t>(asmap.size() * 8 - pos)) break; // Jumping past EOF
    21+            if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
    22             if (GetBitBE(ip, bits)) {
    23                 pos += jump;
    24             }
    

    fjahr commented at 3:04 pm on November 16, 2025:
  199. in test/functional/feature_asmap.py:21 in e00deb2cfa outdated
    30@@ -31,7 +31,7 @@
    31 
    32 DEFAULT_ASMAP_FILENAME = 'ip_asn.map' # defined in src/init.cpp
    33 ASMAP = 'src/test/data/asmap.raw' # path to unit test skeleton asmap
    34-VERSION = 'fec61fa21a9f46f3b17bdcd660d7f4cd90b966aad3aec593c99b35f0aca15853'
    35+VERSION = 'bafc9da308f45179443bd1d22325400ac9104f741522d003e3fac86700f68895'
    


    hodlinator commented at 9:44 am on November 15, 2025:

    remark in 0b13a009d259adff3468ad5111295a80584ee51b “refactor: Use span instead of vector for data in util/asmap”: When will I get used to vectors being serialized with size and spans being serialized without size, leading to different hashes? :)

    Would be nice to point out in the commit message: “The version hash changes due to span’s being serialized without their size-prefix (unlike vectors).”


    fjahr commented at 10:54 pm on November 16, 2025:
    Missed the bit about the commit message initially, now also addressed in #33878
  200. in src/init.cpp:1560 in e00deb2cfa
    1555@@ -1544,33 +1556,65 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    1556     ApplyArgsManOptions(args, peerman_opts);
    1557 
    1558     {
    1559-
    1560-        // Read asmap file if configured
    1561-        std::vector<bool> asmap;
    1562+        // Read asmap file or embedded data if configured and initialize
    1563+        // Netgroupman with or without it
    


    hodlinator commented at 9:56 am on November 15, 2025:
    meganit in 0b13a009d259adff3468ad5111295a80584ee51b “refactor: Use span instead of vector for data in util/asmap”: This comment should probably change in one of the later commits, like f5c8e41733fe91b6f7c66056a0a40c9b8ab86257 “init, net: Implement usage of binary-embedded asmap data”.

    fjahr commented at 3:20 pm on November 16, 2025:
    Removed the embedding mention in #33878
  201. in src/init.cpp:1581 in e00deb2cfa outdated
    1591+                    // on if it was passed or the default location
    1592+                    if (asmap_file_set) {
    1593+                        InitError(strprintf(_("Could not parse asmap file %s"), fs::quoted(fs::PathToString(asmap_path))));
    1594+                    } else {
    1595+                        InitError(strprintf(_("Could not parse asmap file in default location %s"), fs::quoted(fs::PathToString(asmap_path))));
    1596+                    }
    


    hodlinator commented at 12:29 pm on November 15, 2025:
    Why does it matter whether it was from the default location or not?

    fjahr commented at 9:28 pm on November 16, 2025:
    It’s just clearer where the error is come from. A user may have forgotten that they had a file in the default location and this could save them a bit of error chasing. But this is pretty much irrelevant now since it looks like the default location will be removed shortly, see #33770

    hodlinator commented at 8:05 pm on November 17, 2025:
    Ah, wasn’t on my radar, cheers! Does simplify things a bit.
  202. in src/init.cpp:1562 in e00deb2cfa
    1560-        // Read asmap file if configured
    1561-        std::vector<bool> asmap;
    1562+        // Read asmap file or embedded data if configured and initialize
    1563+        // Netgroupman with or without it
    1564+        assert(!node.netgroupman);
    1565+        uint256 asmap_version;
    


    hodlinator commented at 12:31 pm on November 15, 2025:
    nit: Could move asmap_version into the if-block since it’s not used outside it.

    fjahr commented at 3:21 pm on November 16, 2025:
    Done in #33878

    hodlinator commented at 8:07 pm on November 17, 2025:
    Thanks! (I believe that’s fixed in this PR - 093980af8e4e594716b9219931fd441266c1fcd4 “init, net: Implement usage of binary-embedded asmap data”)
  203. in doc/asmap-data.md:7 in e00deb2cfa
    0@@ -0,0 +1,58 @@
    1+# Embedded ASMap data
    2+
    3+## Background
    4+
    5+The ASMap feature (available via `-asmap`) makes it possible to use a peer's AS Number (ASN), an ISP/hoster identifier,
    6+in netgroup bucketing in order to ensure a higher diversity in the peer
    7+set. By default the buckets are formed based on IP prefixes but this does not
    


    hodlinator commented at 12:36 pm on November 15, 2025:

    nit: Make it clearer that we are not talking about ASMap here:

    0set. When not using this, the default behavior is to have the buckets formed based on IP prefixes but this does not
    

    fjahr commented at 10:50 pm on November 16, 2025:
    Done
  204. in src/util/asmap.h:18 in e00deb2cfa
    15 
    16-uint32_t Interpret(const std::vector<bool> &asmap, const std::vector<bool> &ip);
    17+uint32_t Interpret(const std::span<const std::byte>& asmap, const std::span<const std::byte>& ip);
    18 
    19-bool SanityCheckASMap(const std::vector<bool>& asmap, int bits);
    20+bool SanityCheckASMap(const std::span<const std::byte>& asmap, int bits);
    


    hodlinator commented at 12:40 pm on November 15, 2025:
    meganit: We use Asmap casing in the other functions, would be nice to make this SanityCheckAsmap. Maybe as an added commit to #33026.

    fjahr commented at 9:40 pm on November 16, 2025:
  205. in src/netgroup.h:95 in e00deb2cfa
    92+    std::vector<std::byte> m_loaded_asmap;
    93+
    94+    explicit NetGroupManager(std::span<const std::byte> asmap, std::vector<std::byte> loaded_asmap)
    95+        : m_asmap(asmap.empty() ? std::span<const std::byte>() : asmap),
    96+          m_loaded_asmap(std::move(loaded_asmap))
    97+    {}
    


    hodlinator commented at 1:21 pm on November 15, 2025:

    nit: Could also remove stray space and add sanity check in ctor:

    0    const std::span<const std::byte> m_asmap;
    1    std::vector<std::byte> m_loaded_asmap;
    2
    3    explicit NetGroupManager(std::span<const std::byte> asmap, std::vector<std::byte> loaded_asmap)
    4        : m_asmap(asmap.empty() ? std::span<const std::byte>() : asmap),
    5          m_loaded_asmap(std::move(loaded_asmap))
    6    {
    7        assert(m_loaded_asmap.empty() || m_asmap.data() == m_loaded_asmap.data());
    8    }
    

    fjahr commented at 4:04 pm on November 16, 2025:
  206. in src/netgroup.h:90 in e00deb2cfa
    87+     * thread-safe. m_asmap can either point to m_loaded_asmap which holds
    88+     * data loaded from an external file at runtime or it can point to embedded
    89+     * asmap data.
    90+     */
    91+    const std::span<const std::byte>  m_asmap;
    92+    std::vector<std::byte> m_loaded_asmap;
    


    hodlinator commented at 1:31 pm on November 15, 2025:

    Would be good to declare these public:

    0    NetGroupManager(const NetGroupManager&) = delete;
    1    NetGroupManager(NetGroupManager&&) = default;
    2    NetGroupManager& operator=(const NetGroupManager&) = delete;
    3    NetGroupManager& operator=(NetGroupManager&&) = delete;
    

    Otherwise with the auto-generated l-value versions there naively copy the span, making it point to the other NetGroupManager instance’s vector data which might go out of scope.


    hodlinator commented at 1:41 pm on November 15, 2025:

    Noticed in 0b13a009d259adff3468ad5111295a80584ee51b “refactor: Use span instead of vector for data in util/asmap” that you remove the const from the vector.

    0    const std::vector<std::byte> m_loaded_asmap;
    

    For some reason adding it back leads to AddrManImpl::Check() failures when running feature_asmap.py - any idea why?


    fjahr commented at 3:35 pm on November 16, 2025:
    Addressed in #33878

    fjahr commented at 3:54 pm on November 16, 2025:
    I didn’t fully validate this yet but it looks to me like with the const the vector can not be moved anymore, it is copied instead. This means that when the ctor WithLoadedAsmap tries to move it, it is copied instead and then the span pointer is dangling.

    hodlinator commented at 3:58 pm on November 21, 2025:

    Close, fixed the feature_asmap.py failure by implementing special logic in the move-ctor. But I recommend keeping it as a non-const member so that it can be moved from instead of copied using the default move-ctor.

     0--- a/src/netgroup.h
     1+++ b/src/netgroup.h
     2@@ -17,7 +17,10 @@
     3 class NetGroupManager {
     4 public:
     5     NetGroupManager(const NetGroupManager&) = delete;
     6-    NetGroupManager(NetGroupManager&&) = default;
     7+    NetGroupManager(NetGroupManager&& other)
     8+        : m_loaded_asmap{other.m_loaded_asmap},
     9+          m_asmap{m_loaded_asmap.empty() ? other.m_asmap : m_loaded_asmap}
    10+    {}
    11     NetGroupManager& operator=(const NetGroupManager&) = delete;
    12     NetGroupManager& operator=(NetGroupManager&&) = delete;
    13 
    14@@ -90,12 +93,12 @@ private:
    15      * data loaded from an external file at runtime or it can point to embedded
    16      * asmap data.
    17      */
    18+    const std::vector<std::byte> m_loaded_asmap;
    19     const std::span<const std::byte> m_asmap;
    20-    std::vector<std::byte> m_loaded_asmap;
    21 
    22     explicit NetGroupManager(std::span<const std::byte> embedded_asmap, std::vector<std::byte>&& loaded_asmap)
    23-        : m_asmap{embedded_asmap},
    24-          m_loaded_asmap{std::move(loaded_asmap)}
    25+        : m_loaded_asmap{std::move(loaded_asmap)},
    26+          m_asmap{embedded_asmap}
    27     {
    28         assert(m_loaded_asmap.empty() || m_asmap.data() == m_loaded_asmap.data());
    29     }
    
  207. hodlinator commented at 2:13 pm on November 15, 2025: contributor

    Reviewed e00deb2cfa60e846ea3de48bed35c7d3992db4bd

    Seems like this PR has evolved a bit since before CMake. Pretty good shape now (using CMake byte header generation instead of Python script etc).

    Left a bunch of nits but also some more in-depth suggestions & questions.

    Skipped aspects addressed in other PR with similar commits (https://github.com/bitcoin/bitcoin/pull/33026#pullrequestreview-3441618331).

  208. fjahr renamed this:
    Embed default ASMap as binary dump header file
    Embedded ASMap [3/3]: Build binary dump header file
    on Nov 15, 2025
  209. fjahr commented at 9:03 pm on November 15, 2025: contributor

    I am taking this PR back into draft status temporarily because I have been slicing it into 3 parts thanks to the suggestions of @hodlinator . The first part is #33026 which features some minor prep work and is ready for review right now. The second part is #33878 which can be reviewed now but has not addressed the latest feedback from today. There is one new commit in there which I hope is interesting: it adds further documentation to asmap.cpp to make it more accessible. This PR here will become the third part but I will need some time to address the latest comments and only then push an updated version to not rebase on the predecessor PRs too often.

    I have also resurrected a tracking issue for asmap at #33879 since there have also been some further spin-off PRs that should hopefully make it into the same with the embedding.

  210. fjahr marked this as a draft on Nov 15, 2025
  211. fjahr commented at 9:43 pm on November 16, 2025: contributor
    All the feedback that applied to #33878 has now been addressed there. I am marking these comments as resolved and if there are further notes on these please move the conversation over there. Thanks!
  212. fjahr renamed this:
    Embedded ASMap [3/3]: Build binary dump header file
    build: Embedded ASMap [3/3]: Build binary dump header file
    on Nov 16, 2025
  213. DrahtBot added the label Build system on Nov 16, 2025
  214. fjahr force-pushed on Nov 16, 2025
  215. fjahr force-pushed on Nov 16, 2025
  216. DrahtBot added the label CI failed on Nov 16, 2025
  217. DrahtBot commented at 10:58 pm on November 16, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/actions/runs/19413285446/job/55537763402 LLM reason (✨ experimental): Lint failure: ruff flagged Python f-strings without placeholders (F541) causing the CI to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  218. fjahr commented at 10:59 pm on November 16, 2025: contributor
    This is now in sync with #33026 and #33878 again so technically ready for review but of course the other PRs should be checked out first if you haven’t yet.
  219. fjahr marked this as ready for review on Nov 16, 2025
  220. DrahtBot removed the label CI failed on Nov 17, 2025
  221. in doc/asmap-data.md:52 in 0ae539dbff outdated
    47+without at least two additional reviewers confirming that the process described
    48+above was followed as expected and that the encoding step yielded the same
    49+file hash. New files are created on an ongoing basis but without any central planning
    50+or an explicit schedule.
    51+
    52+## Release process
    


    fanquake commented at 10:10 am on November 18, 2025:

    a new creation process

    What is the “new creation process”:

    • Who is involved (and has the permissions needed to merge into the required repos).
    • How many total people are required?
    • Can it happen at anytime, or does it require coordination days ahead of time?
    • What should happen if the process creates data that isn’t usable?

    fjahr commented at 11:32 pm on November 19, 2025:

    I think this is just poor wording on my part, I don’t mean a completely new process there, I mean the process that is described in this document can be run again at any time so that there is a new map created that can be used for a release. If that answers your question then I will amend that part of the document to be more clear.

    I will still give short answer to the questions in case above doesn’t resolve them. As mentioned previously here: https://github.com/bitcoin/bitcoin/pull/28792/files/0ae539dbff83229b1512f92067522424125229ed#r2167155199 I am hesitant to write the specifics of the process in very specific details for similar reasons as we don’t write anywhere explicitly that “a PR will get merged when it has 3 ACKs” or so.

    Who is involved (and has the permissions needed to merge into the required repos).

    Anyone can be involved similar to how PRs here can be opened and reviewed by anyone, see for example the latest run here: https://github.com/asmap/asmap-data/issues/34. There is nothing special about the people participating. We just tag those that asked for it for their convenience so they don’t miss a new run. I am the only one who has permission to merge new maps into that repo. I have signaled several times that (for example here and here) I am happy to move the repo into the bitcoin-core org or share maintainership of the repo with core maintainers, whatever you think works best. I didn’t push for this since review here was slow for a long time. Happy to tackle this soon if you want to get it done before the embedding merge.

    How many total people are required?

    I answered this here nothing really has changed. I would add that the tooling also let’s a single person build a map but the numbers I came up with are just designed to roughly match core’s processes of distributing trust among multiple participants.

    The 5 participant minimum is kind of arbitrarily picked, I admit, but I try to get to what I would say is comparable to 3 ACKs. More would be nice but I wasn’t confident we would get more people involved on a regular basis and majority of 5 is 3 so basically this is a minimum of 3 ACKs on the input data. The mismatches are not comparable to NACKs because there is no right or wrong there (assuming no malice), they are just different because of timing or connection problems. On the mechanical encoding review part in the PR similarly I am looking for 2-3 ACKs.

    Can it happen at anytime, or does it require coordination days ahead of time?

    For convenience we currently schedule the runs several days in advance so people can set things up ahead of time and don’t miss a spontaneous event. But if you can have 5 people online with the tooling setup you can launch it at any point basically.

    I want to highlight also here that there has been great progress in the performance of kartograf because of the work of @jurraca . In my home which has good but not crazy connection speed and on my modern laptop the run now takes about 12 minutes while previously it took much much longer. This means a spontaneous run is much more feasible than it used to be.

    There are also efforts to make it more convenient to do the runs as a cron job or similar so regularly scheduled runs become even less of a hassle.

    What should happen if the process creates data that isn’t usable?

    This should be an exception, while we had some misses in the past most runs where successful and we are still researching better ways of optimizing the tools to increase the chances. But they will never go away completely. That being said, the options are either to run again or not and use an older version. The former would probably be used if there is a motivated group that wants to create a map that is as fresh as possible while the latter is what we do with our monthly runs. If it fails we just wait for the next one in a month because there is no pressure.

  222. in CMakeLists.txt:130 in 0ae539dbff outdated
    126@@ -127,6 +127,8 @@ if(WITH_ZMQ)
    127   find_package(ZeroMQ 4.0.0 MODULE REQUIRED)
    128 endif()
    129 
    130+option(WITH_EMBEDDED_ASMAP "Embed default ASMap data." ON)
    


    fanquake commented at 10:26 am on November 18, 2025:
    I can’t remember where we landed on making this is a build option. Has anyone asked for the ability to turn this off at build time, as opposed to just at runtime (if it becomes the default). I can imagine that could be asked for, for someone wanting to do a from-source build without any arbitrary blobs. In that case, what is the process for that person to recreate, (after the fact), the ip_asn.dat file that will exist in this repo?

    fjahr commented at 10:57 pm on November 19, 2025:

    I can’t remember where we landed on making this is a build option. Has anyone asked for the ability to turn this off at build time, as opposed to just at runtime (if it becomes the default)

    We didn’t really land anywhere. My last response on it was here and then the discussion didn’t continue, I think. Similar to the default location discussion I am not going to die on this hill, I just want to give users the options they might want to get the most out of asmap or for asmap to not get in their way if they don’t want it (yet). Something I could do is pull this part out of here into a separate, optional follow-up PR on top of this one. Then the embedding can be reviewed and potentially merged without it while the other PR gives users that want this option a venue to speak up and if they do we can still add this before a release. Wdyt?

    I can imagine that could be asked for, for someone wanting to do a from-source build without any arbitrary blobs. In that case, what is the process for that person to recreate, (after the fact), the ip_asn.dat file that will exist in this repo?

    Based on my suggested process for creating the maps and selecting one for the release the asmap-data repo should hold the map that was selected. So the easiest way to get the file is to get the map directly from there. It’s been discussed already that this repo will move into the bitcoin-core org when embedding is merged. @sipa has also suggested to have an export functionality in core nodes that have been built with the map, which I have planned to do as a quick follow-up to this PR.

  223. fanquake referenced this in commit 509dc91db1 on Nov 19, 2025
  224. DrahtBot added the label Needs rebase on Nov 19, 2025
  225. fjahr force-pushed on Nov 19, 2025
  226. fjahr commented at 11:47 pm on November 19, 2025: contributor
    Rebased since #33026 was merged, of course still based on #33878.
  227. DrahtBot removed the label Needs rebase on Nov 20, 2025
  228. DrahtBot added the label Needs rebase on Nov 20, 2025
  229. fjahr force-pushed on Nov 20, 2025
  230. fjahr commented at 2:55 pm on November 20, 2025: contributor
    Only rebased again
  231. DrahtBot removed the label Needs rebase on Nov 20, 2025
  232. in src/CMakeLists.txt:289 in 1dcd2e8a98
    285@@ -286,6 +286,13 @@ target_link_libraries(bitcoin_node
    286     $<TARGET_NAME_IF_EXISTS:libevent::pthreads>
    287     $<TARGET_NAME_IF_EXISTS:USDT::headers>
    288 )
    289+if (WITH_EMBEDDED_ASMAP)
    


    hodlinator commented at 1:00 pm on November 21, 2025:

    nit: Convention seems to be

    0if(WITH_EMBEDDED_ASMAP)
    

    fjahr commented at 1:00 am on November 25, 2025:
    Done
  233. hodlinator commented at 2:24 pm on November 21, 2025: contributor

    Concept ACK 1dcd2e8a983406d12009e41bac5dbaeb12f330be

    I do understand the distaste at adding 1MB+ binary blobs into a reproducible build. From that perspective, it would be cleaner to include the ASMap blob as a separate file included in our Bitcoin Core releases.

    But having a separate file opens up the can of worms of where to install it on various operating systems and distributions, how to encourage package maintainers to include and ensure it ends up in the right locations, and making Bitcoin Core search an ever changing list of directories to check for ASMap file existence (or requiring always specifying -noasmap or -asmap=/path/on/distro/x/asmap.dat). That would hamper broad ASMap adoption, especially if we want to make it default-on in the future.

    Might be good to add a blurb about the consequences of not embedding to the PR description.

    Generation times unaffected

    OFF

    0$ ccache -C && rm -rf build && time cmake -B build --preset dev-mode -DENABLE_IPC=OFF -DBUILD_GUI=OFF -GNinja -DWITH_EMBEDDED_ASMAP=OFF
    1
    2real	0m18.115s
    3real	0m18.091s
    4real	0m18.027s
    

    ON

    0$ ccache -C && rm -rf build && time cmake -B build --preset dev-mode -DENABLE_IPC=OFF -DBUILD_GUI=OFF -GNinja -DWITH_EMBEDDED_ASMAP=ON
    1
    2real	0m18.050s
    3real	0m17.949s
    4real	0m17.995s
    

    Compile times fairly unaffected

    OFF

    0$ ccache -C && rm -rf build && cmake -B build --preset dev-mode -DENABLE_IPC=OFF -DBUILD_GUI=OFF -GNinja -DWITH_EMBEDDED_ASMAP=OFF && \
    1time cmake --build build -t bitcoind
    2
    3real	2m28.611s
    4real	2m27.686s
    5real	2m26.425s
    

    ON

    0$ ccache -C && rm -rf build && cmake -B build --preset dev-mode -DENABLE_IPC=OFF -DBUILD_GUI=OFF -GNinja -DWITH_EMBEDDED_ASMAP=ON && \
    1time cmake --build build -t bitcoind
    2
    3real	2m29.730s
    4real	2m29.345s
    5real	2m30.751s
    6real	2m28.495s
    7real	2m26.334s
    
  234. DrahtBot added the label Needs rebase on Nov 21, 2025
  235. fjahr force-pushed on Nov 25, 2025
  236. fjahr commented at 1:00 am on November 25, 2025: contributor

    Addressed comments and rebased after #33770 was merged including the latest changes in #33878. This PR now necessarily re-adds the possibility to set a bool option for -asmap to make it possible to use asmap without an external file. I have done this a bit differently now and I think it’s better.

    Re @hodlinator #28792 (review): I don’t think I understand your comment. I am pretty sure I have addressed your previous comment to use OFF but the file was renamed in the meantime which I pulled in here with my previous rebase. This is probably why I can’t answer to your comment inline. Can you clarify if this isn’t addressed yet somehow or if I am misunderstanding something? Thanks!

  237. fjahr force-pushed on Nov 25, 2025
  238. DrahtBot added the label CI failed on Nov 25, 2025
  239. DrahtBot commented at 1:04 am on November 25, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/actions/runs/19654544240/job/56288400840 LLM reason (✨ experimental): CI failed due to a lint error from ruff: an f-string without placeholders in the Python tests (feature_asmap).

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  240. DrahtBot removed the label Needs rebase on Nov 25, 2025
  241. DrahtBot removed the label CI failed on Nov 25, 2025
  242. hodlinator commented at 10:27 am on January 12, 2026: contributor

    Re @hodlinator … Can you clarify if this isn’t addressed yet somehow or if I am misunderstanding something? Thanks!

    Update to the main thread in case others are waiting for me to respond to the above - I responded in the thread: #28792 (review) and the issue was resolved.

    Other reviewers: please have a look at the PR that should be merged ahead of this one: https://github.com/bitcoin/bitcoin/pull/33878

  243. DrahtBot added the label Needs rebase on Jan 20, 2026
  244. refactor: Operate on bytes instead of bits in Asmap code
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    fa41fc6a1a
  245. refactor: Unify asmap version calculation and naming
    Calculate the asmap version only in one place: A dedicated function in util/asmap.
    
    The version was also referred to as asmap checksum in several places. To avoid confusion call it asmap version everywhere.
    385c34a052
  246. refactor: Use span instead of vector for data in util/asmap
    This prevents holding the asmap data in memory twice.
    
    The version hash changes due to spans being serialized without their size-prefix (unlike vectors).
    cf4943fdcd
  247. doc: Add more extensive docs to asmap implementation
    Also makes minor improvement on the python implementation documentation.
    79e97d45c1
  248. refactor: Simplify Interpret asmap function
    This aligns it more with SanityCheckAsmap and reduces variable scope.
    
    Also unify asmap casing in SanityCheckAsmap function name.
    
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    4fec726c4d
  249. build: Add embedded asmap data
    The data embedded is from the latest ASMap file from the asmap-data
    repository: https://github.com/asmap/asmap-data/blob/main/1755187200_asmap.dat
    3e52804c67
  250. build: Generate ip_asn.dat.h during build process
    This can be disabled with -DWITH_EMBEDDED_ASMAP=OFF.
    7d08a59a91
  251. init, net: Implement usage of binary-embedded asmap data 6a01e25443
  252. doc: Expand documentation on asmap feature and tooling c518145d47
  253. ci: Use without embedded asmap build option in one ci job a38fd8e4e3
  254. fjahr force-pushed on Jan 21, 2026
  255. fjahr commented at 8:43 am on January 21, 2026: contributor
    Rebased
  256. DrahtBot removed the label Needs rebase on Jan 21, 2026
  257. DrahtBot added the label CI failed on Jan 21, 2026
  258. doc: Add initial asmap data documentation 9b48054832
  259. in doc/asmap-data.md:39 in aa6fc82c8f
    34+matches with other users' results. Before the map is usable by Bitcoin Core
    35+it needs to be encoded as well. This is done using `asmap-tool.py` in `contrib/asmap`
    36+and this step is deterministic as well.
    37+
    38+When it comes to obtaining the initial input data, the high volatility remains
    39+a challenge if users don't want to trust a single creator of used ASMap file.
    


    maflcko commented at 11:14 am on January 26, 2026:
    creator of used ASMap file -> creator of the used ASMap file [clarifies noun phrase; original phrasing is awkward and can impede comprehension]
    

    fjahr commented at 11:55 am on January 26, 2026:
    Fixed
  260. fjahr force-pushed on Jan 26, 2026
  261. sedited commented at 12:46 pm on January 26, 2026: contributor

    I feel apprehensive to be commenting this so late in this discussion, but also compelled to voice my concerns around embedding the data in the binary. My initial reaction a few years ago to this was that adding something that can only go stale over time, doesn’t seem well suited for embedding - especially since many users tend to run old clients. From reading a bunch of prior discussions my impression since then is that the autonomous system data remains remarkably robust over time, meaning the data is unlikely to go stale quickly. Given that dumps exist now going back a few years, how does a map from say five years ago hold up now? I believe it would be valuable to have this additional data as a motivation for this PR.

    Is there really that much complexity involved with shipping an external dump file? I feel like this naturally fits into /share, which we already ship and is commonly part of a bitcoin installation. In my view not doing so out of fear of deployment complexity pulls the recent direction into using multiple specialized binaries, some of which are installed in a separate /libexec directory, into question. Is there another motivation for it outside of users potentially missing to install the data file in an expected directory?

  262. DrahtBot removed the label CI failed on Jan 26, 2026
  263. jurraca commented at 5:06 pm on January 26, 2026: contributor

    While I don’t have a 5 year old ASmap, we can use the data in https://github.com/asmap/asmap-data/ which goes back about 2 years to get a sense of the map “decay”. There are some gaps in the data, but diff’ing each ASmap against the latest one from January 8 2026, and looking only at IPv4, it looks like:

    y-axis “diff” is in percentage, i.e. the % of IPv4 addresses that have changed AS assignment between this ASmap and the latest one. x-axis is the last two years by month. Blue is unfilled, red is filled. I can’t explain the outliers for October and November for the filled maps. We switched to adding both filled and unfilled last year, since the ASmap that Core embeds will be filled.

     0Diff 2024/1704463200_asmap_unfilled.dat
     1date Fri Jan  5 02:00:00 PM WET 2024
     2Summary
     3IPv4: 169354 entries with 457126643 (2^28.77) addresses changed
     4IPv6: 76246 entries with 4206462542595703067617859911811074 (2^111.70) addresses changed
     5
     6Diff 2024/1706536800_asmap_unfilled.dat
     7date Mon Jan 29 02:00:00 PM WET 2024
     8Summary
     9IPv4: 165843 entries with 463591644 (2^28.79) addresses changed
    10IPv6: 74985 entries with 4165872077080205881347881190817794 (2^111.68) addresses changed
    11
    12Diff 2024/1710770400_asmap_unfilled.dat
    13date Mon Mar 18 02:00:00 PM WET 2024
    14Summary
    15IPv4: 159417 entries with 449513388 (2^28.74) addresses changed
    16IPv6: 70986 entries with 3736118957815016883858751409356808 (2^111.53) addresses changed
    17
    18Diff 2024/1724248800_asmap_unfilled.dat
    19date Wed Aug 21 03:00:00 PM WEST 2024
    20Summary
    21IPv4: 133017 entries with 372739692 (2^28.47) addresses changed
    22IPv6: 60304 entries with 2974030463145239874385482782605310 (2^111.20) addresses changed
    23
    24Diff 2024/1730210400_asmap_unfilled.dat
    25date Tue Oct 29 02:00:00 PM WET 2024
    26Summary
    27IPv4: 122352 entries with 346273742 (2^28.37) addresses changed
    28IPv6: 57377 entries with 2426292002339195161349943971545077 (2^110.90) addresses changed
    29
    30Diff 2024/1733234400_asmap_unfilled.dat
    31date Tue Dec  3 02:00:00 PM WET 2024
    32Summary
    33IPv4: 117097 entries with 339829015 (2^28.34) addresses changed
    34IPv6: 54523 entries with 2476550026428340323531252209549299 (2^110.93) addresses changed
    35
    36Diff 2025/1755187200_asmap_unfilled.dat
    37date Thu Aug 14 05:00:00 PM WEST 2025
    38Summary
    39IPv4: 59660 entries with 153487021 (2^27.19) addresses changed
    40IPv6: 21323 entries with 966165905693022508351155643351022 (2^109.57) addresses changed
    41
    42Diff 2025/1757606400_asmap_unfilled.dat
    43date Thu Sep 11 05:00:00 PM WEST 2025
    44Summary
    45IPv4: 52192 entries with 143403367 (2^27.10) addresses changed
    46IPv6: 17846 entries with 818296502522348220948416577863669 (2^109.33) addresses changed
    47
    48Diff 2025/1762444800_asmap_unfilled.dat
    49date Thu Nov  6 04:00:00 PM WET 2025
    50Summary
    51IPv4: 34900 entries with 112374663 (2^26.74) addresses changed
    52IPv6: 11095 entries with 588018644402064840046959300444161 (2^108.86) addresses changed
    53
    54Diff 2025/1764864000_asmap_unfilled.dat
    55date Thu Dec  4 04:00:00 PM WET 2025
    56Summary
    57IPv4: 25888 entries with 84135112 (2^26.33) addresses changed
    58IPv6: 7177 entries with 358403593593551172531041177960448 (2^108.14) addresses changed
    

    output by this bash script, which you can tweak and run to reproduce (the year paths and filenames are hardcoded).

     0#!/bin/bash
     1
     2summary_file="summary.txt"
     3
     4for file in 2024/*_unfilled.dat; do
     5    if [[ -f "$file" ]]; then
     6	filename=$(basename "$file")
     7	timestamp=$(echo "$filename" | cut -d'_' -f1)
     8        readable_date=$(date -d @"$timestamp")
     9        echo "Diff $file" >> "$summary_file"
    10	echo "date $readable_date" >> "$summary_file"
    11        python3 ~/code/bitcoin/contrib/asmap/asmap-tool.py diff "$file" 2026/1767888000_asmap_unfilled.dat | tail -n 3 >> "$summary_file"
    12        echo "" >> "$summary_file"
    13    fi
    14done
    

    Will dig around for some older data. cc @fjahr if you have older ASmaps lying around.

  264. fjahr commented at 9:52 am on January 27, 2026: contributor

    I feel apprehensive to be commenting this so late in this discussion, but also compelled to voice my concerns around embedding the data in the binary. My initial reaction a few years ago to this was that adding something that can only go stale over time, doesn’t seem well suited for embedding - especially since many users tend to run old clients.

    The overaching motivation here is that we think using asmap will help all users individually and the network as a whole. Because of that we want to turn it on by default eventually so even the users that don’t change anything in their configs can get the benefit. This seems to be only possible to do reliably by embedding the data in the binary.

    From reading a bunch of prior discussions my impression since then is that the autonomous system data remains remarkably robust over time, meaning the data is unlikely to go stale quickly. Given that dumps exist now going back a few years, how does a map from say five years ago hold up now? I believe it would be valuable to have this additional data as a motivation for this PR. @jurraca posted some analysis already. 5 years seems excessive to me, given that our releases reach EOL after 18 months. Looking back two years seems fine in terms of time horizon and we also don’t have older data because the kartograf project only got stable and to a comparable feature set of today about two years ago. So any comparison with older files or even non-kartograf generated files could be misleading. It would seem backwards to me to test and (upon success) give a thumbs up for using unsupported bitcoin core versions, even if it’s just from the asmap perspective. @jurraca and me are very happy to run more analysis and show the data if it convinces people further but it would be helpful if there was some input of what concerns there are exactly. The map is just a snapshot of the global internet routing table, it may be more interesting to post some research from that level but I am not sure people are interested to invest the time and study it. We can also look at the data on the level of how networks containing bitcoin nodes change, on a shorter timeframe we have already done that and we could take 2 year old bitnodes snapshots and look how they are affected. But it is debatable how valuable that information is since hopefully nodes are being run in a much broader range of networks in the future and asmap should work the same way then.

    Is there really that much complexity involved with shipping an external dump file?

    It should be less complex from what I can tell, we could basically throw away all the changes here. But we couldn’t turn asmap on by default.

    I feel like this naturally fits into /share, which we already ship and is commonly part of a bitcoin installation.

    I only spent a short time looking at it but it doesn’t seem like any data there is loaded at runtime so I don’t see how it is a natural fit. Can you point me to the files in there that are already used in the same way asmap files would be used? It seems like rather the files there are either embedded into the builds too (but I don’t have a good overview of the QT code) or they are just support scripts.

    In my view not doing so out of fear of deployment complexity pulls the recent direction into using multiple specialized binaries, some of which are installed in a separate /libexec directory, into question. Is there another motivation for it outside of users potentially missing to install the data file in an expected directory?

    Sure, the users that use these specialized binaries directly will certainly be capable to use external asmap files. But it would be just a power user feature then and if that is fine I don’t think we need to do anything, we could just keep the feature as is and let people download files from https://github.com/asmap/asmap-data/. If users have to touch these files manually anyway, they might as well get the freshest one.

  265. hodlinator commented at 10:12 am on January 27, 2026: contributor

    Is there really that much complexity involved with shipping an external dump file?

    It should be less complex from what I can tell, we could basically throw away all the changes here. But we couldn’t turn asmap on by default.

    A possible alternative to embedding could be to by default require an -asmap=path/to/file CLI arg / setting on mainnet, while still allowing running with -asmap= (explicitly unsetting) but in that case emit a warning on startup so distributions/package maintainers aren’t let off the hook.

    I also think that it probably shouldn’t be an all-or-nothing thing. Maybe use ASMap for 2/3 of outbound node slots by default and non-ASMap for the rest?

  266. sedited commented at 10:38 am on January 27, 2026: contributor

    Re #28792 (comment)

    I only spent a short time looking at it but it doesn’t seem like any data there is loaded at runtime so I don’t see how it is a natural fit.

    We don’t load anything from /share currently, but I don’t see a difference between starting a binary from /libexec and loading from /share at startup in that respect. There are plenty of applications that do load static data from /share.

    But we couldn’t turn asmap on by default.

    Am I right that the concern is that even if we’d always attempt to load from some static resource path, we would have to fail silently if the data file cannot be found, meaning we’d still not be able to ship the feature to all users?

    5 years seems excessive to me, given that our releases reach EOL after 18 months.

    I do think this is a slight shift in the guarantees of the software. AFAIK we don’t currently ship a feature that decreases in usefulness over time on an already running node. As I already said, I largely do not share this concern. The data @jurraca posted above should be enough to address those reservations (which seems to indicate a potential linear regression?).

  267. fjahr commented at 12:59 pm on January 28, 2026: contributor

    @sedited

    We don’t load anything from /share currently, but I don’t see a difference between starting a binary from /libexec and loading from /share at startup in that respect. There are plenty of applications that do load static data from /share.

    Maybe there are applications that load data from share that they put there themselves, I don’t have a lot of experience with using share or such applications. But my understanding from some research is that data put in share usually exists so the OS can access and use them, too (hence the name). So for example icons for applications to discplay in different UI elements of the OS GUI. The asmap data isn’t used by the OS so it doesn’t seem like a good fit conceptually.

    Am I right that the concern is that even if we’d always attempt to load from some static resource path, we would have to fail silently if the data file cannot be found, meaning we’d still not be able to ship the feature to all users?

    I am inexperienced with using share folders in different OS, if you say it’s guaranteed that in every OS/distro we are planning to support some form of a share folder exists and it is always accessible to bitcoin core at runtime and that the maintenance burden to ensure that this always holds holds in the future is also not higher than the maintenance burden of embedding the data, then I would say it’s a viable alternative. I honestly have no idea how to verify this.

    AFAIK we don’t currently ship a feature that decreases in usefulness over time on an already running node.

    At least we ship the hardcoded seeds as embedded data and I would expect that data to decrease in usefulness at roughly the same rate. And I would say the usefulness of the software as a whole decreases as we publish security advisories to which that software is still vulnerable.

    which seems to indicate a potential linear regression?

    I would expect it to flatten more over time but it is also possible that this effect is offset by registration of new IPv6 blocks and moves from IPv4 to IPv6.

  268. fjahr commented at 1:39 pm on January 28, 2026: contributor

    @hodlinator

    A possible alternative to embedding could be to by default require an -asmap=path/to/file CLI arg / setting on mainnet, while still allowing running with -asmap= (explicitly unsetting) but in that case emit a warning on startup so distributions/package maintainers aren’t let off the hook.

    It sounds like the burden would still fall on users having to get an asmap file from somewhere quite often. Since we won’t always be able to reach these users ourselves, there would be an increased chance of those users using a badly outdated or even malicious asmap file they got from somewhere. And this would be a degraded UX.

    Possibly the asmap files from asmap-data could contain a signature by the maintainers which would be checked by bitcoin core to verify it’s a verified asmap and otherwise there is a warning. I don’t think that’s a good idea but that would move this into a bit more secure territory, similar to how we have the assumeutxo hashes. Of course only embedding the hashes doesn’t work because we want people to use the latest version if possible…

    I also think that it probably shouldn’t be an all-or-nothing thing. Maybe use ASMap for 2/3 of outbound node slots by default and non-ASMap for the rest?

    There are certainly ways we could investigate improving the bucketing logic before we enable asmap by default. We had discussions on this in early 2023 based on virtu’s simulations for example. But I would like to pick up that thread when we have embedding actually merged and off-by-default.

  269. hodlinator commented at 4:27 pm on January 28, 2026: contributor

    @fjahr

    It sounds like the burden would still fall on users having to get an asmap file from somewhere quite often.

    I was thinking we would include the latest “official” ASMap file in release packages:

    • Windows installer would put it in a subdirectory close to the executables.
    • Linux/UNIX distros would be encouraged (not required) to put it in one of <10 standard locations: /usr/share/, /local/share/, etc
    • Mac app dir would contain it in a subdirectory.

    A less heavy-handed approach than my earlier comment #28792 (comment) (which required explicit -asmap=... to run on mainnet). Would be:

    1. If ASMap is enabled (or later default-on):
      • On Windows/Mac: Bitcoin Core looks for an ASMap file next to the binaries.
      • On Linux/UNIX we check the list of expected standard locations /usr/share/, /local/share/, etc. More esoteric distros like Nix/Guix would still use -asmap=....
    2. If an ASMap is not found:
      • If it was requested via -asmap=..., we output a fatal init-error.
      • If it was default-on, we emit a warning. This encourages distros to set it up to work, although they could also opt to explicitly disable it in the worst case, something that is less of an issue with the approach embedding in the executables/shared libraries, but they could still chose to exclude it from the build.

    Agree that some kind of scheme to validate asmap-files would be good.


    I also think that it probably shouldn’t be an all-or-nothing thing. Maybe use ASMap for 2/3 of outbound node slots by default and non-ASMap for the rest?

    […] I would like to pick up that thread when we have embedding actually merged and off-by-default.

    :+1:

  270. sipa commented at 4:38 pm on January 28, 2026: member

    @sedited

    My initial reaction a few years ago to this was that adding something that can only go stale over time, doesn’t seem well suited for embedding - especially since many users tend to run old clients.

    That’s a fair concern, but my (largely unsubstantiated) belief is that even a 5-year old asmap file would be far more accurate than the default /16-based bucketing we use now. And I don’t think this is something we can exactly measure: even if ranges’s ASN assignment changes over time, that does not imply that the inaccuracy caused by that exposes users more to attacks than /16 bucketing does.

    Is there really that much complexity involved with shipping an external dump file? I feel like this naturally fits into /share, which we already ship and is commonly part of a bitcoin installation. In my view not doing so out of fear of deployment complexity pulls the recent direction into using multiple specialized binaries, some of which are installed in a separate /libexec directory, into question. Is there another motivation for it outside of users potentially missing to install the data file in an expected directory?

    I’m not sure I’ve ever even considered that option - possibly because I’ve always thought of (and frequently used) Bitcoin Core as a standalone bitcoind / bitcoin-cli that can be run from anywhere. So from that perspective, the only options are built-in to the binary, or expecting the user to put it in the datadir or otherwise manually provide a location. I think built-in is the far less error-prone option out of those.

    If we accept that some “installation” for usage is inevitably going to be necessary(*) with the libexec/ move, then putting the asmap data in a /share-like path seems pretty reasonable from a filesystem layout perspective.

    If we’d go that route, I’d aim for a:

    • If -noasmap specified, fall back to /16 bucketing like today.
    • If -asmap=PATH, specified, use that path if it exists and fail otherwise
    • If $DATADIR/asmap.dat exists, use that.
    • If $INSTALLPATH/share/asmap.dat exists, use that.
    • (Possibly?) If $(dirname argv[0])/asmap.dat exists, use that, so someone can have a bitcoind and asmap.dat in the same directly, and it’ll just work?
    • Otherwise, fail.

    So a user cannot inadvertently end up in a situation where they are not using proper asmap data.

    However, I don’t see how this addresses the reasoning for not wanting built-in asmap data: data in /share will go stale just as much as the binary goes out of stale, if we’re working in the “installed environment” setting, as /bin/bitcoind and /share/asmap.dat will just be updated simultaneously? In the “manual/uninstalled” environment, we’d expect people to have asmap.dat files in a manual path anyway.

    (*) Is that “installed environment” really compatible with the movement towards reproducible and increasingly static builds that work on as many distinct systems as possible? Because how can this neutral, run-anywhere, reproducible, single binary know what the OS’s equivalent /share path would be? Does it just try a number of fixed typical locations? What about guix/nix like systems that install binaries in a versioned paths?

  271. darosior commented at 5:19 pm on January 28, 2026: member

    Embedding the ASMap in the binary is a tradeoff that is only worth it if we are going to enable it by default. Correct? If so, i think it is necessary that we have buy-in on the direction prior to embedding the binary. Clearly there is interest, but i’m not sure there is broad agreement yet (see for instance #16599 (comment)).

    To be clear i’m not objecting to this PR, i just want to make sure that we don’t end up with a half-shipped feature where we eat the costs but don’t reap the benefits because we didn’t set a clear direction from the get go and it turns out to be more controversial than anticipated.

  272. fjahr commented at 8:14 pm on January 28, 2026: contributor

    Embedding the ASMap in the binary is a tradeoff that is only worth it if we are going to enable it by default. Correct?

    I think lowering the friction to use the feature is a win either way but I don’t have to motivate myself to review the code so I guess my opinion doesn’t count much ;)

    If so, i think it is necessary that we have buy-in on the direction prior to embedding the binary. Clearly there is interest, but i’m not sure there is broad agreement yet (see for instance #16599 (comment)).

    I would expect that there is agreement that the asmap data will be helpful to create better peer diversity even if the bucketing or connection logic might have to be tweaked to prevent adverse effects when all nodes in the network are using asmap at some point.

    On the linked comment in particular, to me it sounds more like something that should be tackled on the connection side because it just shows a general problem we will always encounter when we try to create peer diversity through discrimination of certain peers based on any data, not just asmap. If I get a new IPv6 /32 and launch thousands of nodes in it, we would start to see the same effect under the current default bucketing logic, right? And since IPv4 is getting more and more scarce it would be totally plausible (if not inevitable) that the big cloud hosters just decide to move all of their cheaper servers in one IPv6 /32 each and IPv4 will be only be available via a very costly upgrade (actually AWS already charges a small amount it seems). So this is not as unrealistic as it may seem.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-01-31 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me