refactor, docs: Embedded ASMap [2/3]: Refactor asmap internals and add documentation #33878

pull fjahr wants to merge 5 commits into bitcoin:master from fjahr:2025-11-asmap-slice-2 changing 19 files +352 −208
  1. fjahr commented at 8:50 pm on November 15, 2025: contributor

    This is a second slice carved out of #28792. It contains the following changes that are crucial for the embedding of asmap data which is added the following PR in the series (probably this will remain in #28792).

    The changes are:

    • Modernizes and simplifies the asmap code by operating on std::byte instead of bits
    • Unifies asmap version calculation and naming (previously it was called version and checksum interchangeably)
    • Operate on a span rather than a vector in the asmap internal to prevent holding the asmap data in memory twice
    • Add more extensive documentation to the asmap implementation
    • Unify asmap casing in implemetation function names

    The first three commits were already part of #28792, the others are new.

    The documentation commit came out of feedback gathered at the latest CoreDev. The primary input for the documentation was the documentation that already existed in the Python implementation (contrib/asmap/asmap.py) but there are several other comments as well. Please note: I have also asked several LLMs to provide suggestions on how to explain pieces of the implementation and better demonstrate how the parts work together. I have copied bits and pieces that I liked but everything has been edited further by me and obviously all mistakes here are my own.

  2. DrahtBot commented at 8:50 pm on November 15, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33878.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #33920 (Export embedded ASMap RPC by fjahr)
    • #33631 (init: Split file path handling out of -asmap option by fjahr)
    • #31260 (scripted-diff: Type-safe settings retrieval by ryanofsky)
    • #28792 (build: Embedded ASMap [3/3]: Build binary dump header file by fjahr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. fjahr commented at 9:06 pm on November 15, 2025: contributor
    @sipa would love to get your eyes on the documentation commit before anyone else gets confused by the mistakes I might have made :)
  4. in src/util/asmap.cpp:25 in 93205dd90f outdated
    20@@ -21,140 +21,235 @@
    21 #include <utility>
    22 #include <vector>
    23 
    24+/*
    25+ * ASMap (Autonomous System Map) Implementation
    


    sipa commented at 10:05 pm on November 15, 2025:

    All the documentation changes in this commit look correct to me.

    One thing that may be worth mentioning early on is that it’s a bit-packed format, i.e., the entire compressed mapping is treated as a sequence of bits, packed together at 8 per byte, which are concatenated encodings of instructions.


    fjahr commented at 9:46 pm on November 16, 2025:
    Thank you for the quick feedback! Added a small paragraph on the bit-packed format in the introduction comment at the top.
  5. DrahtBot added the label CI failed on Nov 15, 2025
  6. fjahr force-pushed on Nov 16, 2025
  7. fjahr commented at 9:45 pm on November 16, 2025: contributor
    All the feedback from #28792 that applied to the commits here has been addressed here now.
  8. fjahr force-pushed on Nov 16, 2025
  9. fjahr force-pushed on Nov 16, 2025
  10. DrahtBot removed the label CI failed on Nov 17, 2025
  11. laanwj commented at 12:47 pm on November 17, 2025: member

    The first three PRs were already part of #28792, the others are new.

    i’m sure you mean the first three commits? 😄

  12. in src/netgroup.cpp:87 in c2274746d1 outdated
    82@@ -81,31 +83,28 @@ std::vector<unsigned char> NetGroupManager::GetGroup(const CNetAddr& address) co
    83 uint32_t NetGroupManager::GetMappedAS(const CNetAddr& address) const
    84 {
    85     uint32_t net_class = address.GetNetClass();
    86-    if (m_asmap.size() == 0 || (net_class != NET_IPV4 && net_class != NET_IPV6)) {
    87+    if (m_asmap.empty() || (net_class != NET_IPV4 && net_class != NET_IPV6)) {
    88         return 0; // Indicates not found, safe because AS0 is reserved per RFC7607.
    89     }
    90-    std::vector<bool> ip_bits(128);
    


    laanwj commented at 2:23 pm on November 17, 2025:
    Isn’t this ip_bytes now? (though bits is technically still valid ofc)

    hodlinator commented at 8:42 pm on November 17, 2025:
    Agree with the rename suggestion, would make assert(addr_bytes.size() == ip_bits.size()); less surprising below.

    fjahr commented at 3:20 pm on November 19, 2025:
    Renamed
  13. in src/util/asmap.cpp:207 in db54e01640 outdated
    202@@ -203,10 +203,8 @@ std::vector<bool> DecodeAsmap(fs::path path)
    203         LogWarning("Failed to open asmap file from disk");
    204         return bits;
    205     }
    206-    file.seek(0, SEEK_END);
    207-    int length = file.tell();
    


    waketraindev commented at 3:19 pm on November 17, 2025:
    Bundling the implementation of AutoFile::size here doesn’t seem strictly necessary. It looks like you could rely on the length value return by tell() and avoid depending on Streams refactoring. That might help keep this PR a bit more focused on the asmap changes without pulling in broader Streams updates or file-size-related changes elsewhere.

    fjahr commented at 0:13 am on November 19, 2025:
    The commit is here because this PR depends on #33026 and the commit is coming from there, so the discussion on these commits should be held there. I just responded there, so marking this as resolved here.
  14. in src/netgroup.cpp:96 in c2274746d1 outdated
     96         for (int8_t byte_i = 0; byte_i < 12; ++byte_i) {
     97-            for (uint8_t bit_i = 0; bit_i < 8; ++bit_i) {
     98-                ip_bits[byte_i * 8 + bit_i] = (IPV4_IN_IPV6_PREFIX[byte_i] >> (7 - bit_i)) & 1;
     99-            }
    100+            ip_bits[byte_i] = std::byte(IPV4_IN_IPV6_PREFIX[byte_i]);
    101         }
    


    hodlinator commented at 5:59 pm on November 17, 2025:
    In c2274746d1b3d1a2712ef8bd567d25f5aa5cf5bb “refactor: Operate on bytes instead of bits in Asmap code”: The for doing ip_bits[byte_i] = std::byte(IPV4_IN_IPV6_PREFIX[byte_i]); can now be removed since we have copy_n doing the same thing above.

    fjahr commented at 2:56 pm on November 19, 2025:
    Ah, rebase fail I guess, done.
  15. in src/test/netbase_tests.cpp:631 in c2274746d1 outdated
    637-    }
    638+        "63dc33d28f757a4a5e15d6a08"_hex_v};
    639 
    640     // Construct NetGroupManager with this data.
    641-    NetGroupManager netgroup{std::move(asmap_bits)};
    642+    NetGroupManager netgroup{std::move(ASMAP_DATA)};
    


    hodlinator commented at 6:48 pm on November 17, 2025:

    In c2274746d1b3d1a2712ef8bd567d25f5aa5cf5bb “refactor: Operate on bytes instead of bits in Asmap code”: It’s unfortunate that you need to make the hex data a vector again when the later span commit will accept the std::array.

    You could do:

    0NetGroupManager netgroup{std::vector(ASMAP_DATA.begin(), ASMAP_DATA.end())};
    

    In this commit and avoid changing the type of ASMAP_DATA.


    fjahr commented at 3:16 pm on November 19, 2025:
    That works, done.

    hodlinator commented at 9:38 pm on November 19, 2025:

    I meant you could could have this in the intermediate commit:

    0NetGroupManager netgroup{std::vector(ASMAP_DATA.begin(), ASMAP_DATA.end())};
    

    But then once we reach the commit adding embedding support change to:

    0auto netgroup{NetGroupManager::WithEmbeddedAsmap(ASMAP_DATA)};
    

    …ending up going directly from std::array -> std::span.


    If you prefer the current way I would still avoid auto and change:

    0- auto asmap_vec{std::vector(ASMAP_DATA.begin(), ASMAP_DATA.end())};
    1+ std::vector asmap_vec(ASMAP_DATA.begin(), ASMAP_DATA.end());
    2  NetGroupManager netgroup{asmap_vec};
    

    But no big deal.

    Nice to see 1/3 merged!


    fjahr commented at 10:28 pm on November 19, 2025:
    Ah, I see, thanks for clarifying! Done.

    hodlinator commented at 8:55 am on November 20, 2025:
    Thank you! In adfd800626a7910a603c34ab733fc1e91d438bc5 the ASMAP_DATA type is still unnecessarily changed from constexpr auto to static const auto.
  16. in src/util/asmap.cpp:215 in c2274746d1 outdated
    221-        for (int bit = 0; bit < 8; ++bit) {
    222-            bits.push_back((cur_byte >> bit) & 1);
    223-        }
    224-    }
    225-    if (!SanityCheckASMap(bits, 128)) {
    226+    file.seek(0, SEEK_SET);
    


    hodlinator commented at 6:53 pm on November 17, 2025:
    In c2274746d1b3d1a2712ef8bd567d25f5aa5cf5bb “refactor: Operate on bytes instead of bits in Asmap code”: Line seems like a leftover from a merge?

    fjahr commented at 3:15 pm on November 19, 2025:
    Indeed, removed.
  17. in src/bench/addrman.cpp:1 in 4c8e4eee00 outdated


    hodlinator commented at 7:09 pm on November 17, 2025:

    nit in 4c8e4eee00f16bace634221277c694ee5dda23f8: Sorry, accidentally injected possessive apostrophe, “span’s”, commit message should be:

    The version hash changes due to spans being serialized without their size-prefix (unlike vectors).


    fjahr commented at 3:15 pm on November 19, 2025:
    Fixed
  18. in src/util/asmap.cpp:310 in 0196f70a8c outdated
    304@@ -203,6 +305,10 @@ bool SanityCheckASMap(const std::span<const std::byte>& asmap, int bits)
    305     return false; // Reached EOF without RETURN instruction
    306 }
    307 
    308+/**
    309+ * Provides a safe interface for validating ASMap data before use. Validates
    310+ * the data and returns it if valid, empty span otherwise.
    


    hodlinator commented at 7:28 pm on November 17, 2025:
    In 0196f70a8c99b9c03c81e879cd9f05560fc20543: Could replace second sentence with “Returns true if the data is valid for 128 bits long inputs” or just remove it (no longer returns span).

    fjahr commented at 3:15 pm on November 19, 2025:
    Done
  19. in src/util/asmap.h:16 in eaf5e12d2b outdated
    12 #include <cstdint>
    13+#include <span>
    14 #include <vector>
    15 
    16-uint32_t Interpret(const std::vector<bool> &asmap, const std::vector<bool> &ip);
    17+uint32_t Interpret(std::span<const std::byte> asmap, std::span<const std::byte> ip);
    


    hodlinator commented at 8:19 pm on November 17, 2025:

    fjahr commented at 3:15 pm on November 19, 2025:
    Ok, I hope I got them all now.
  20. in src/util/asmap.cpp:131 in eaf5e12d2b
    138+    // A jump instruction, encoded as [1,0], inspects the next unused bit in the input
    139+    // and either continues execution (if 0), or skips a specified number of bits (if 1).
    140+    // It is followed by an integer using jump encoding.
    141     JUMP = 1,
    142+    // A match instruction, encoded as [1,1,0], inspects 1 or more of the next unused bits
    143+    // in the input. If they all match, execution continues. If not, the default ASN is returned.
    


    hodlinator commented at 8:23 pm on November 17, 2025:

    Related: #28792 (review)

    Thanks for documenting the C++ implementation!

    nit:

    0    // in the input. If they all match, execution continues. If not, the default ASN is returned (or 0 if unset).
    

    Could also correct the Python side:

     0--- a/contrib/asmap/asmap.py
     1+++ b/contrib/asmap/asmap.py
     2@@ -157,7 +157,7 @@ class _Instruction(Enum):
     3     JUMP = 1
     4     # A match instruction, encoded as [1,1,0] inspects 1 or more of the next unused bits
     5     # in the input with its argument. If they all match, execution continues. If they do
     6-    # not, failure is returned. If a default instruction has been executed before, instead
     7+    # not, 0 is returned. If a default instruction has been executed before, instead
     8     # of failure the default instruction's argument is returned. It is followed by an
     9     # integer in match encoding, and a subprogram. That value is at least 2 bits and at
    10     # most 9 bits. An n-bit value signifies matching (n-1) bits in the input with the lower
    

    fjahr commented at 3:15 pm on November 19, 2025:
    Done. And I just clarified that failure means 0 in python, otherwise there would have been more edits necessary to make this clear, for example the line below your suggestion still talks about a failure too.
  21. hodlinator commented at 8:30 pm on November 17, 2025: contributor

    Reviewed eaf5e12d2b52d66a75a68cf94d328d0ec9048986

    Thanks for peeling off this PR as I suggested!

  22. fjahr commented at 0:10 am on November 19, 2025: contributor

    The first three PRs were already part of #28792, the others are new.

    i’m sure you mean the first three commits? 😄

    Yepp, fixed :)

  23. DrahtBot added the label Needs rebase on Nov 19, 2025
  24. fjahr force-pushed on Nov 19, 2025
  25. fjahr force-pushed on Nov 19, 2025
  26. DrahtBot added the label CI failed on Nov 19, 2025
  27. fjahr commented at 3:22 pm on November 19, 2025: contributor
    Rebased since #33026 was merged 🥳 and addressed all comments. Thanks for the reviews!
  28. DrahtBot removed the label Needs rebase on Nov 19, 2025
  29. refactor: Operate on bytes instead of bits in Asmap code adfd800626
  30. refactor: Unify asmap version calculation and naming
    Calculate the asmap version only in one place: A dedicated function in util/asmap.
    
    The version was also referred to as asmap checksum in several places. To avoid confusion call it asmap version everywhere.
    d278d6193b
  31. refactor: Use span instead of vector for data in util/asmap
    This prevents holding the asmap data in memory twice.
    
    The version hash changes due to spans being serialized without their size-prefix (unlike vectors).
    6b9ce8cff5
  32. doc: Add more extensive docs to asmap implementation
    Also makes minor improvement on the python implementation documentation.
    8a356b70b7
  33. asmap: Unify asmap casing in SanityCheckAsmap function name b41f5a29f7
  34. fjahr force-pushed on Nov 19, 2025
  35. DrahtBot removed the label CI failed on Nov 19, 2025
  36. in src/util/asmap.h:22 in d278d6193b outdated
    17@@ -17,5 +18,7 @@ bool SanityCheckASMap(const std::vector<std::byte>& asmap, int bits);
    18 
    19 /** Read asmap from provided binary file */
    20 std::vector<std::byte> DecodeAsmap(fs::path path);
    21+/** Calculate the asmap version, a checksum identifying the asmap being used. */
    22+uint256 AsmapVersion(const std::vector<std::byte>& data);
    


    hodlinator commented at 9:15 am on November 20, 2025:
    meganit in d278d6193b8d754eb108bc3ccc256260085b32c9 “refactor: Unify asmap version calculation and naming”: As uint256 AsmapVersion(const std::vector<std::byte>& data) is new in this PR, it could be introduced as uint256 AsmapVersion(std::span<const std::byte> data) from the beginning to avoid line churn.
  37. in src/util/asmap.h:23 in 6b9ce8cff5 outdated
    21 
    22-/** Read asmap from provided binary file */
    23+/** Read and check asmap from provided binary file */
    24 std::vector<std::byte> DecodeAsmap(fs::path path);
    25+/** Check asmap from embedded data */
    26+bool CheckAsmap(std::span<const std::byte> data);
    


    hodlinator commented at 9:35 am on November 20, 2025:

    in 6b9ce8cff58d7214fca51a8024ec4efb8694e059 on CheckAsmap:

    1. Justification for existence: I dislike default parameter values, so appreciate a distinct function from SanityCheckASMap.
    2. Comment: Nothing makes it specialized for only embedded data?
    3. Location: Maybe it could be added directly after SanityCheckASMap in both .h + .cpp since they are so closely related? (Could place it before but that would put it between Interpret and SanityCheckASMap which are nice to have after each other as they are also closely related).
    4. Naming: Could make it more descriptive - something like CheckStandardAsmap or CheckIPV6Asmap?
    5. Usage: Could be used in more places where we have SanityCheckAsmap(asmap, 128)
       0₿ git grep CheckAsmap
       1src/test/fuzz/asmap.cpp:    if (!SanityCheckAsmap(asmap, 128)) return;
       2src/test/fuzz/asmap_direct.cpp:    if (SanityCheckAsmap(asmap, ip_len)) {
       3src/test/fuzz/asmap_direct.cpp:            assert(!SanityCheckAsmap(prefix, ip_len));
       4src/test/fuzz/util/net.h:    if (!SanityCheckAsmap(std::span<std::byte>(asmap), 128)) {
       5src/util/asmap.cpp:    // - should have been caught by SanityCheckAsmap below
       6src/util/asmap.cpp:bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
       7src/util/asmap.cpp:bool CheckAsmap(const std::span<const std::byte> data)
       8src/util/asmap.cpp:    if (!SanityCheckAsmap(data, 128)) {
       9src/util/asmap.cpp:    if (!CheckAsmap(buffer)) {
      10src/util/asmap.h:bool SanityCheckAsmap(std::span<const std::byte> asmap, int bits);
      11src/util/asmap.h:bool CheckAsmap(std::span<const std::byte> data);
      
  38. in src/test/addrman_tests.cpp:27 in 6b9ce8cff5 outdated
    23@@ -24,7 +24,7 @@ using namespace std::literals;
    24 using node::NodeContext;
    25 using util::ToString;
    26 
    27-static NetGroupManager EMPTY_NETGROUPMAN{{}};
    28+static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
    


    hodlinator commented at 9:46 am on November 20, 2025:

    nit in 6b9ce8cff58d7214fca51a8024ec4efb8694e059:

      0diff --git a/src/bench/addrman.cpp b/src/bench/addrman.cpp
      1index 1e93b9c395..2f474c2411 100644
      2--- a/src/bench/addrman.cpp
      3+++ b/src/bench/addrman.cpp
      4@@ -24,7 +24,7 @@
      5 static constexpr size_t NUM_SOURCES = 64;
      6 static constexpr size_t NUM_ADDRESSES_PER_SOURCE = 256;
      7 
      8-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
      9+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
     10 static constexpr uint32_t ADDRMAN_CONSISTENCY_CHECK_RATIO{0};
     11 
     12 static std::vector<CAddress> g_sources;
     13diff --git a/src/test/addrman_tests.cpp b/src/test/addrman_tests.cpp
     14index c9a0ebebf0..f1e54416cd 100644
     15--- a/src/test/addrman_tests.cpp
     16+++ b/src/test/addrman_tests.cpp
     17@@ -24,7 +24,7 @@ using namespace std::literals;
     18 using node::NodeContext;
     19 using util::ToString;
     20 
     21-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
     22+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
     23 static const bool DETERMINISTIC{true};
     24 
     25 static int32_t GetCheckRatio(const NodeContext& node_ctx)
     26@@ -584,7 +584,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket_legacy)
     27 // 101.8.0.0/16 AS8
     28 BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
     29 {
     30-    NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
     31+    auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
     32 
     33     CAddress addr1 = CAddress(ResolveService("250.1.1.1", 8333), NODE_NONE);
     34     CAddress addr2 = CAddress(ResolveService("250.1.1.1", 9999), NODE_NONE);
     35@@ -637,7 +637,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
     36 
     37 BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
     38 {
     39-    NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
     40+    auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
     41 
     42     CAddress addr1 = CAddress(ResolveService("250.1.2.1", 8333), NODE_NONE);
     43     CAddress addr2 = CAddress(ResolveService("250.1.2.1", 9999), NODE_NONE);
     44@@ -714,7 +714,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
     45 
     46 BOOST_AUTO_TEST_CASE(addrman_serialization)
     47 {
     48-    NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
     49+    auto netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
     50 
     51     const auto ratio = GetCheckRatio(m_node);
     52     auto addrman_asmap1 = std::make_unique<AddrMan>(netgroupman, DETERMINISTIC, ratio);
     53diff --git a/src/test/fuzz/asmap.cpp b/src/test/fuzz/asmap.cpp
     54index fec0f3f50a..6ed5aa7cd9 100644
     55--- a/src/test/fuzz/asmap.cpp
     56+++ b/src/test/fuzz/asmap.cpp
     57@@ -42,6 +42,6 @@ FUZZ_TARGET(asmap)
     58         memcpy(&ipv4, addr_data, addr_size);
     59         net_addr.SetIP(CNetAddr{ipv4});
     60     }
     61-    NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
     62+    auto netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
     63     (void)netgroupman.GetMappedAS(net_addr);
     64 }
     65diff --git a/src/test/fuzz/p2p_handshake.cpp b/src/test/fuzz/p2p_handshake.cpp
     66index 75e14865a4..a56e1e5f1f 100644
     67--- a/src/test/fuzz/p2p_handshake.cpp
     68+++ b/src/test/fuzz/p2p_handshake.cpp
     69@@ -48,7 +48,7 @@ FUZZ_TARGET(p2p_handshake, .init = ::initialize)
     70     chainman.ResetIbd();
     71 
     72     node::Warnings warnings{};
     73-    NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
     74+    auto netgroupman{NetGroupManager::NoAsmap()};
     75     AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
     76     auto peerman = PeerManager::make(connman, addrman,
     77                                      /*banman=*/nullptr, chainman,
     78diff --git a/src/test/fuzz/process_message.cpp b/src/test/fuzz/process_message.cpp
     79index 5ac796ae2c..30c121d7a2 100644
     80--- a/src/test/fuzz/process_message.cpp
     81+++ b/src/test/fuzz/process_message.cpp
     82@@ -77,7 +77,7 @@ FUZZ_TARGET(process_message, .init = initialize_process_message)
     83     chainman.DisableNextWrite();
     84 
     85     node::Warnings warnings{};
     86-    NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
     87+    auto netgroupman{NetGroupManager::NoAsmap()};
     88     AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
     89     auto peerman = PeerManager::make(connman, addrman,
     90                                      /*banman=*/nullptr, chainman,
     91diff --git a/src/test/netbase_tests.cpp b/src/test/netbase_tests.cpp
     92index 5086f21ff4..4d355142e2 100644
     93--- a/src/test/netbase_tests.cpp
     94+++ b/src/test/netbase_tests.cpp
     95@@ -325,7 +325,7 @@ BOOST_AUTO_TEST_CASE(subnet_test)
     96 
     97 BOOST_AUTO_TEST_CASE(netbase_getgroup)
     98 {
     99-    NetGroupManager netgroupman{NetGroupManager::NoAsmap()}; // use /16
    100+    auto netgroupman{NetGroupManager::NoAsmap()}; // use /16
    101     BOOST_CHECK(netgroupman.GetGroup(ResolveIP("127.0.0.1")) == std::vector<unsigned char>({0})); // Local -> !Routable()
    102     BOOST_CHECK(netgroupman.GetGroup(ResolveIP("257.0.0.1")) == std::vector<unsigned char>({0})); // !Valid -> !Routable()
    103     BOOST_CHECK(netgroupman.GetGroup(ResolveIP("10.0.0.1")) == std::vector<unsigned char>({0})); // RFC1918 -> !Routable()
    
  39. in src/util/asmap.cpp:65 in b41f5a29f7
    61+ * Extract a single bit from byte array using big-endian bit ordering.
    62+ * Used for IP addresses to match network byte order conventions.
    63+ */
    64+inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
    65+{
    66+    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
    


    hodlinator commented at 10:43 am on November 20, 2025:

    Here’s a reduced version:

    0    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> ((7 + bitpos) % 8)) & 1;
    

    I’m surprised by the way the shift amount changes for different bit positions. Would have expected something closer to this for big endian…

    0    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - (bitpos % 8))) & 1;
    

    ….but it fails the unit tests.

    Godbolt experiment https://godbolt.org/z/6rjEzxhPM outputs:

     0PR bit shift expression (adfd8006...): (7 - ((bytes.size() * 8 - bitpos) % 8))
     1PR bit shift reduced: (7 + bitpos) % 8
     2Expected BE bit shift: 7 - (bitpos % 8)
     3
     4bitpos:  0, PR: 7, reduced: 7, expected: 7
     5bitpos:  1, PR: 0, reduced: 0, expected: 6
     6bitpos:  2, PR: 1, reduced: 1, expected: 5
     7bitpos:  3, PR: 2, reduced: 2, expected: 4
     8bitpos:  4, PR: 3, reduced: 3, expected: 3
     9bitpos:  5, PR: 4, reduced: 4, expected: 2
    10bitpos:  6, PR: 5, reduced: 5, expected: 1
    11bitpos:  7, PR: 6, reduced: 6, expected: 0
    12bitpos:  8, PR: 7, reduced: 7, expected: 7
    13bitpos:  9, PR: 0, reduced: 0, expected: 6
    14bitpos: 10, PR: 1, reduced: 1, expected: 5
    15bitpos: 11, PR: 2, reduced: 2, expected: 4
    16bitpos: 12, PR: 3, reduced: 3, expected: 3
    17bitpos: 13, PR: 4, reduced: 4, expected: 2
    18bitpos: 14, PR: 5, reduced: 5, expected: 1
    19bitpos: 15, PR: 6, reduced: 6, expected: 0
    

    However, when looking at the usage in Interpret(), we see if we would encounter JUMP as the first instruction, our bitpos is sort of out of bounds by one.

    https://github.com/bitcoin/bitcoin/blob/b41f5a29f7d56da8fd157770ad29b5776c3684c6/src/util/asmap.cpp#L184

    https://github.com/bitcoin/bitcoin/blob/b41f5a29f7d56da8fd157770ad29b5776c3684c6/src/util/asmap.cpp#L201

    If we instead at the callsite make sure to send in in-bounds values - GetBitBE(ip, bits - 1), it turns out we can simplify GetBitBE quite a bit.

     0--- a/src/util/asmap.cpp
     1+++ b/src/util/asmap.cpp
     2@@ -48,8 +48,8 @@ namespace {
     3 constexpr uint32_t INVALID = 0xFFFFFFFF;
     4 
     5 /**
     6- * Extract a single bit from byte array using little-endian bit ordering.
     7- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
     8+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
     9+ * Used for ASMap data.
    10  */
    11 inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
    12 {
    13@@ -57,12 +57,12 @@ inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
    14 }
    15 
    16 /**
    17- * Extract a single bit from byte array using big-endian bit ordering.
    18+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
    19  * Used for IP addresses to match network byte order conventions.
    20  */
    21 inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
    22 {
    23-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
    24+    return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
    25 }
    26 
    27 /**
    28@@ -198,7 +198,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
    29             if (jump == INVALID) break; // Jump offset straddles EOF
    30             if (bits == 0) break; // No input bits left
    31             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
    32-            if (GetBitBE(ip, bits)) {  // Check next IP bit (big-endian)
    33+            if (GetBitBE(ip, bits - 1)) {  // Check next IP bit (big-endian)
    34                 pos += jump;  // Bit = 1: skip to right subtree
    35             }
    36             // Bit = 0: fall through to left subtree
    37@@ -213,7 +213,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
    38             matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
    39             if (bits < matchlen) break; // Not enough input bits
    40             for (uint32_t bit = 0; bit < matchlen; bit++) {
    41-                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
    42+                if (GetBitBE(ip, bits - 1) != ((match >> (matchlen - 1 - bit)) & 1)) {
    43                     return default_asn;  // Pattern mismatch - use default
    44                 }
    45                 bits--;
    

    The similarity we get between the shifting in this new GetBitBE and GetBitLE show that we have the same bit ordering, just different byte-ordering.

    https://github.com/bitcoin/bitcoin/blob/b41f5a29f7d56da8fd157770ad29b5776c3684c6/src/util/asmap.cpp#L54-L56

    That’s why I also updated the comments in the diff above.

  40. in src/util/asmap.cpp:186 in b41f5a29f7
    212-    uint8_t bits = ip.size();
    213+    size_t pos{0};
    214+    const size_t endpos{asmap.size() * 8};
    215+    uint8_t bits = ip.size() * 8;
    216     uint32_t default_asn = 0;
    217     uint32_t jump, match, matchlen;
    


    hodlinator commented at 1:43 pm on November 20, 2025:

    nit: Here are some changes that bring the Interpret() implementation even closer to SanityCheckAsmap() while also decreasing variable scope, making the code more readable. Maybe something to include in the last commit which aligns the casing of SanityCheckAsmap?

     0--- a/src/util/asmap.cpp
     1+++ b/src/util/asmap.cpp
     2@@ -183,18 +183,16 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
     3     const size_t endpos{asmap.size() * 8};
     4     uint8_t bits = ip.size() * 8;
     5     uint32_t default_asn = 0;
     6-    uint32_t jump, match, matchlen;
     7-    Instruction opcode;
     8     while (pos < endpos) {
     9-        opcode = DecodeType(pos, asmap);
    10+        Instruction opcode = DecodeType(pos, asmap);
    11         if (opcode == Instruction::RETURN) {
    12             // Found leaf node - return the ASN
    13-            default_asn = DecodeASN(pos, asmap);
    14-            if (default_asn == INVALID) break; // ASN straddles EOF
    15-            return default_asn;
    16+            uint32_t asn = DecodeASN(pos, asmap);
    17+            if (asn == INVALID) break; // ASN straddles EOF
    18+            return asn;
    19         } else if (opcode == Instruction::JUMP) {
    20             // Binary branch: if IP bit is 1, jump forward; else continue
    21-            jump = DecodeJump(pos, asmap);
    22+            uint32_t jump = DecodeJump(pos, asmap);
    23             if (jump == INVALID) break; // Jump offset straddles EOF
    24             if (bits == 0) break; // No input bits left
    25             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
    26@@ -208,11 +206,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
    27             // The match value encodes both length and pattern:
    28             // - highest set bit position determines length (bit_width - 1)
    29             // - lower bits contain the pattern to compare
    30-            match = DecodeMatch(pos, asmap);
    31+            uint32_t match = DecodeMatch(pos, asmap);
    32             if (match == INVALID) break; // Match bits straddle EOF
    33-            matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
    34+            int matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
    35             if (bits < matchlen) break; // Not enough input bits
    36-            for (uint32_t bit = 0; bit < matchlen; bit++) {
    37+            for (int bit = 0; bit < matchlen; bit++) {
    38                 if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
    39                     return default_asn;  // Pattern mismatch - use default
    40                 }
    
  41. in src/util/asmap.cpp:63 in b41f5a29f7
    59+
    60+/**
    61+ * Extract a single bit from byte array using big-endian bit ordering.
    62+ * Used for IP addresses to match network byte order conventions.
    63+ */
    64+inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
    


    hodlinator commented at 2:29 pm on November 20, 2025:

    nit: In other cases we have

    • DecodeBits(size_t& bitpos, const std::span<const std::byte>& data... and
    • DecodeType(size_t& bitpos, const std::span<const std::byte>& data)

    So it might be nice to have the same argument order here:

    0inline bool GetBitLE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept
    1{
    2    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
    3}
    4
    5/**
    6 * Extract a single bit from byte array using big-endian bit ordering.
    7 * Used for IP addresses to match network byte order conventions.
    8 */
    9inline bool GetBitBE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept
    

    Taking it one step further, using the observation that we only inspect a bit once, we could perform incrementing and decrementing within these functions as well. This also is influenced by the other comment investigating GetBitBE.

     0 constexpr uint32_t INVALID = 0xFFFFFFFF;
     1 
     2 /**
     3- * Extract a single bit from byte array using little-endian bit ordering.
     4- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
     5+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
     6+ * Used for ASMap data.
     7  */
     8-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
     9+inline bool DecodeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
    10 {
    11-    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
    12+    const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
    13+    ++bitpos;
    14+    return bit;
    15 }
    16 
    17 /**
    18- * Extract a single bit from byte array using big-endian bit ordering.
    19+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
    20  * Used for IP addresses to match network byte order conventions.
    21  */
    22-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
    23+inline bool DecodeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
    24 {
    25-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
    26+    --bitpos;
    27+    return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
    28 }
    29 
    30 /**
    31@@ -88,8 +91,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
    32         // Read continuation bit to determine if we're in this class
    33         if (bit_sizes_it + 1 != bit_sizes.end()) {  // Unless we're in the last class
    34             if (bitpos >= data.size() * 8) break;
    35-            bit = GetBitLE(data, bitpos);
    36-            bitpos++;
    37+            bit = DecodeBitLE(bitpos, data);
    38         } else {
    39             bit = 0;  // Last class has no continuation bit
    40         }
    41@@ -101,8 +103,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
    42             // Decode the position within this class in big endian
    43             for (int b = 0; b < *bit_sizes_it; b++) {
    44                 if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
    45-                bit = GetBitLE(data, bitpos);
    46-                bitpos++;
    47+                bit = DecodeBitLE(bitpos, data);
    48                 val += bit << (*bit_sizes_it - 1 - b);  // Big-endian within the class
    49             }
    50             return val;
    51@@ -198,11 +199,10 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
    52             if (jump == INVALID) break; // Jump offset straddles EOF
    53             if (bits == 0) break; // No input bits left
    54             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
    55-            if (GetBitBE(ip, bits)) {  // Check next IP bit (big-endian)
    56+            if (DecodeBitBE(bits, ip)) {  // Check next IP bit (big-endian)
    57                 pos += jump;  // Bit = 1: skip to right subtree
    58             }
    59             // Bit = 0: fall through to left subtree
    60-            bits--;
    61         } else if (opcode == Instruction::MATCH) {
    62             // Compare multiple IP bits against a pattern
    63             // The match value encodes both length and pattern:
    64@@ -213,10 +213,9 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
    65             matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
    66             if (bits < matchlen) break; // Not enough input bits
    67             for (uint32_t bit = 0; bit < matchlen; bit++) {
    68-                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
    69+                if (DecodeBitBE(bits, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
    70                     return default_asn;  // Pattern mismatch - use default
    71                 }
    72-                bits--;
    73             }
    74             // Pattern matched - continue execution
    75         } else if (opcode == Instruction::DEFAULT) {
    76@@ -260,8 +259,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
    77                 // Nothing to execute anymore
    78                 if (endpos - pos > 7) return false; // Excessive padding
    79                 while (pos != endpos) {
    80-                    if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
    81-                    ++pos;
    82+                    if (DecodeBitLE(pos, asmap)) return false; // Nonzero padding bit
    83                 }
    84                 return true; // Sanely reached EOF
    85             } else {
    

    hodlinator commented at 7:21 pm on November 20, 2025:
    (Maybe ConsumeBit(LE|BE) would be more appropriate than DecodeBit(LE|BE)).

    hodlinator commented at 12:04 pm on November 21, 2025:

    Couldn’t stop thinking about this. It felt weird to implement reverse byte order in GetBitBE but then also start bits from the end of ip data and count backwards. Found a way that I think simplifies things further, making the order we consume bytes between LE&BE implementations the same, but having different bit-order. Diff against b41f5a29f7d56da8fd157770ad29b5776c3684c6:

      0--- a/src/util/asmap.cpp
      1+++ b/src/util/asmap.cpp
      2@@ -48,21 +48,25 @@ namespace {
      3 constexpr uint32_t INVALID = 0xFFFFFFFF;
      4 
      5 /**
      6- * Extract a single bit from byte array using little-endian bit ordering.
      7- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
      8+ * Extract a single bit from byte array using little-endian bit ordering (LSB first).
      9+ * Used for ASMap data.
     10  */
     11-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
     12+inline bool ConsumeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
     13 {
     14-    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
     15+    const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
     16+    ++bitpos;
     17+    return bit;
     18 }
     19 
     20 /**
     21- * Extract a single bit from byte array using big-endian bit ordering.
     22+ * Extract a single bit from byte array using big-endian bit ordering (MSB first).
     23  * Used for IP addresses to match network byte order conventions.
     24  */
     25-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
     26+inline bool ConsumeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
     27 {
     28-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
     29+    const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (7 - (bitpos % 8))) & 1;
     30+    ++bitpos;
     31+    return bit;
     32 }
     33 
     34 /**
     35@@ -88,8 +92,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
     36         // Read continuation bit to determine if we're in this class
     37         if (bit_sizes_it + 1 != bit_sizes.end()) {  // Unless we're in the last class
     38             if (bitpos >= data.size() * 8) break;
     39-            bit = GetBitLE(data, bitpos);
     40-            bitpos++;
     41+            bit = ConsumeBitLE(bitpos, data);
     42         } else {
     43             bit = 0;  // Last class has no continuation bit
     44         }
     45@@ -101,8 +104,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
     46             // Decode the position within this class in big endian
     47             for (int b = 0; b < *bit_sizes_it; b++) {
     48                 if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
     49-                bit = GetBitLE(data, bitpos);
     50-                bitpos++;
     51+                bit = ConsumeBitLE(bitpos, data);
     52                 val += bit << (*bit_sizes_it - 1 - b);  // Big-endian within the class
     53             }
     54             return val;
     55@@ -181,7 +183,8 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
     56 {
     57     size_t pos{0};
     58     const size_t endpos{asmap.size() * 8};
     59-    uint8_t bits = ip.size() * 8;
     60+    uint8_t ip_bit{0};
     61+    const uint8_t ip_bits_end = ip.size() * 8;
     62     uint32_t default_asn = 0;
     63     uint32_t jump, match, matchlen;
     64     Instruction opcode;
     65@@ -196,13 +199,12 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
     66             // Binary branch: if IP bit is 1, jump forward; else continue
     67             jump = DecodeJump(pos, asmap);
     68             if (jump == INVALID) break; // Jump offset straddles EOF
     69-            if (bits == 0) break; // No input bits left
     70+            if (ip_bit == ip_bits_end) break; // No input bits left
     71             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
     72-            if (GetBitBE(ip, bits)) {  // Check next IP bit (big-endian)
     73+            if (ConsumeBitBE(ip_bit, ip)) {  // Check next IP bit (big-endian)
     74                 pos += jump;  // Bit = 1: skip to right subtree
     75             }
     76             // Bit = 0: fall through to left subtree
     77-            bits--;
     78         } else if (opcode == Instruction::MATCH) {
     79             // Compare multiple IP bits against a pattern
     80             // The match value encodes both length and pattern:
     81@@ -211,12 +213,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
     82             match = DecodeMatch(pos, asmap);
     83             if (match == INVALID) break; // Match bits straddle EOF
     84             matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
     85-            if (bits < matchlen) break; // Not enough input bits
     86+            if ((ip_bits_end - ip_bit) < matchlen) break; // Not enough input bits
     87             for (uint32_t bit = 0; bit < matchlen; bit++) {
     88-                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
     89+                if (ConsumeBitBE(ip_bit, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
     90                     return default_asn;  // Pattern mismatch - use default
     91                 }
     92-                bits--;
     93             }
     94             // Pattern matched - continue execution
     95         } else if (opcode == Instruction::DEFAULT) {
     96@@ -260,8 +261,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
     97                 // Nothing to execute anymore
     98                 if (endpos - pos > 7) return false; // Excessive padding
     99                 while (pos != endpos) {
    100-                    if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
    101-                    ++pos;
    102+                    if (ConsumeBitLE(pos, asmap)) return false; // Nonzero padding bit
    103                 }
    104                 return true; // Sanely reached EOF
    105             } else {
    
  42. hodlinator commented at 3:12 pm on November 20, 2025: contributor

    Reviewed b41f5a29f7d56da8fd157770ad29b5776c3684c6

    Sorry for not finding these earlier, getting closer to A-C-K.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-11-22 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me