refactor, docs: Embedded ASMap [2/3]: Refactor asmap internals and add documentation

fjahr commented at 8:50 pm on November 15, 2025: contributor

This is a second slice carved out of #28792. It contains the following changes that are crucial for the embedding of asmap data which is added the following PR in the series (probably this will remain in #28792).

The changes are:

Modernizes and simplifies the asmap code by operating on std::byte instead of bits
Unifies asmap version calculation and naming (previously it was called version and checksum interchangeably)
Operate on a span rather than a vector in the asmap internal to prevent holding the asmap data in memory twice
Add more extensive documentation to the asmap implementation
Unify asmap casing in implemetation function names

The first three commits were already part of #28792, the others are new.

The documentation commit came out of feedback gathered at the latest CoreDev. The primary input for the documentation was the documentation that already existed in the Python implementation (contrib/asmap/asmap.py) but there are several other comments as well. Please note: I have also asked several LLMs to provide suggestions on how to explain pieces of the implementation and better demonstrate how the parts work together. I have copied bits and pieces that I liked but everything has been edited further by me and obviously all mistakes here are my own.

DrahtBot commented at 8:50 pm on November 15, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33878.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	hodlinator, sipa, sedited

If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#34448 (ci, iwyu: Fix warnings in src/util and treat them as errors by hebasto)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

“+ # not, failure (represented by 0) is returned. If a default instruction has been executed before, instead” -> “+ # not, failure (represented by 0) is returned. If a default instruction has been executed before, the default instruction’s argument is returned instead.” [The added line ends with “instead”, leaving the sentence split awkwardly across lines; rewording makes the meaning explicit and grammatically complete.]
“+ * Lastly, bit_sizes[k] bits encoding in big endian the position within that class” -> “+ * Lastly, bit_sizes[k] bits encode in big-endian the position within that class.” [The original lacks a verb (“encoding” used incorrectly), making the sentence ungrammatical; inserting “encode” fixes it and clarifies intent.]

Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

DecodeBits(bitpos, data, 0, TYPE_BIT_SIZES) in src/util/asmap.cpp
DecodeBits(bitpos, data, 1, ASN_BIT_SIZES) in src/util/asmap.cpp
DecodeBits(bitpos, data, 2, MATCH_BIT_SIZES) in src/util/asmap.cpp
DecodeBits(bitpos, data, 17, JUMP_BIT_SIZES) in src/util/asmap.cpp

2026-01-20

fjahr commented at 9:06 pm on November 15, 2025: contributor

@sipa would love to get your eyes on the documentation commit before anyone else gets confused by the mistakes I might have made :)

in src/util/asmap.cpp:25 in 93205dd90f outdated

20@@ -21,140 +21,235 @@
21 #include <utility>
22 #include <vector>
23 
24+/*
25+ * ASMap (Autonomous System Map) Implementation

sipa commented at 10:05 pm on November 15, 2025:

All the documentation changes in this commit look correct to me.

One thing that may be worth mentioning early on is that it’s a bit-packed format, i.e., the entire compressed mapping is treated as a sequence of bits, packed together at 8 per byte, which are concatenated encodings of instructions.

fjahr commented at 9:46 pm on November 16, 2025:

Thank you for the quick feedback! Added a small paragraph on the bit-packed format in the introduction comment at the top.

DrahtBot added the label CI failed on Nov 15, 2025

fjahr force-pushed on Nov 16, 2025

fjahr commented at 9:45 pm on November 16, 2025: contributor

All the feedback from #28792 that applied to the commits here has been addressed here now.

fjahr force-pushed on Nov 16, 2025

DrahtBot removed the label CI failed on Nov 17, 2025

laanwj commented at 12:47 pm on November 17, 2025: member

The first three PRs were already part of #28792, the others are new.

i’m sure you mean the first three commits? 😄

in src/netgroup.cpp:87 in c2274746d1 outdated

82@@ -81,31 +83,28 @@ std::vector<unsigned char> NetGroupManager::GetGroup(const CNetAddr& address) co
83 uint32_t NetGroupManager::GetMappedAS(const CNetAddr& address) const
84 {
85     uint32_t net_class = address.GetNetClass();
86-    if (m_asmap.size() == 0 || (net_class != NET_IPV4 && net_class != NET_IPV6)) {
87+    if (m_asmap.empty() || (net_class != NET_IPV4 && net_class != NET_IPV6)) {
88         return 0; // Indicates not found, safe because AS0 is reserved per RFC7607.
89     }
90-    std::vector<bool> ip_bits(128);

laanwj commented at 2:23 pm on November 17, 2025:

Isn’t this ip_bytes now? (though bits is technically still valid ofc)

hodlinator commented at 8:42 pm on November 17, 2025:

Agree with the rename suggestion, would make assert(addr_bytes.size() == ip_bits.size()); less surprising below.

fjahr commented at 3:20 pm on November 19, 2025:

Renamed

in src/util/asmap.cpp:207 in db54e01640 outdated

202@@ -203,10 +203,8 @@ std::vector<bool> DecodeAsmap(fs::path path)
203         LogWarning("Failed to open asmap file from disk");
204         return bits;
205     }
206-    file.seek(0, SEEK_END);
207-    int length = file.tell();

waketraindev commented at 3:19 pm on November 17, 2025:

Bundling the implementation of AutoFile::size here doesn’t seem strictly necessary. It looks like you could rely on the length value return by tell() and avoid depending on Streams refactoring. That might help keep this PR a bit more focused on the asmap changes without pulling in broader Streams updates or file-size-related changes elsewhere.

fjahr commented at 0:13 am on November 19, 2025:

The commit is here because this PR depends on #33026 and the commit is coming from there, so the discussion on these commits should be held there. I just responded there, so marking this as resolved here.

in src/netgroup.cpp:96 in c2274746d1 outdated

 96         for (int8_t byte_i = 0; byte_i < 12; ++byte_i) {
 97-            for (uint8_t bit_i = 0; bit_i < 8; ++bit_i) {
 98-                ip_bits[byte_i * 8 + bit_i] = (IPV4_IN_IPV6_PREFIX[byte_i] >> (7 - bit_i)) & 1;
 99-            }
100+            ip_bits[byte_i] = std::byte(IPV4_IN_IPV6_PREFIX[byte_i]);
101         }

hodlinator commented at 5:59 pm on November 17, 2025:

In c2274746d1b3d1a2712ef8bd567d25f5aa5cf5bb “refactor: Operate on bytes instead of bits in Asmap code”: The for doing ip_bits[byte_i] = std::byte(IPV4_IN_IPV6_PREFIX[byte_i]); can now be removed since we have copy_n doing the same thing above.

fjahr commented at 2:56 pm on November 19, 2025:

Ah, rebase fail I guess, done.

in src/test/netbase_tests.cpp:631 in c2274746d1 outdated

637-    }
638+        "63dc33d28f757a4a5e15d6a08"_hex_v};
639 
640     // Construct NetGroupManager with this data.
641-    NetGroupManager netgroup{std::move(asmap_bits)};
642+    NetGroupManager netgroup{std::move(ASMAP_DATA)};

hodlinator commented at 6:48 pm on November 17, 2025:

In c2274746d1b3d1a2712ef8bd567d25f5aa5cf5bb “refactor: Operate on bytes instead of bits in Asmap code”: It’s unfortunate that you need to make the hex data a vector again when the later span commit will accept the std::array.

You could do:

0NetGroupManager netgroup{std::vector(ASMAP_DATA.begin(), ASMAP_DATA.end())};

In this commit and avoid changing the type of ASMAP_DATA.

fjahr commented at 3:16 pm on November 19, 2025:

That works, done.

hodlinator commented at 9:38 pm on November 19, 2025:

I meant you could could have this in the intermediate commit:

0NetGroupManager netgroup{std::vector(ASMAP_DATA.begin(), ASMAP_DATA.end())};

But then once we reach the commit adding embedding support change to:

0auto netgroup{NetGroupManager::WithEmbeddedAsmap(ASMAP_DATA)};

…ending up going directly from std::array -> std::span.

If you prefer the current way I would still avoid auto and change:

0- auto asmap_vec{std::vector(ASMAP_DATA.begin(), ASMAP_DATA.end())};
1+ std::vector asmap_vec(ASMAP_DATA.begin(), ASMAP_DATA.end());
2  NetGroupManager netgroup{asmap_vec};

But no big deal.

Nice to see 1/3 merged!

fjahr commented at 10:28 pm on November 19, 2025:

Ah, I see, thanks for clarifying! Done.

hodlinator commented at 8:55 am on November 20, 2025:

Thank you! In adfd800626a7910a603c34ab733fc1e91d438bc5 the ASMAP_DATA type is still unnecessarily changed from constexpr auto to static const auto.

fjahr commented at 11:47 pm on November 24, 2025:

Removed the change from constexpr.

in src/util/asmap.cpp:215 in c2274746d1 outdated

221-        for (int bit = 0; bit < 8; ++bit) {
222-            bits.push_back((cur_byte >> bit) & 1);
223-        }
224-    }
225-    if (!SanityCheckASMap(bits, 128)) {
226+    file.seek(0, SEEK_SET);

hodlinator commented at 6:53 pm on November 17, 2025:

In c2274746d1b3d1a2712ef8bd567d25f5aa5cf5bb “refactor: Operate on bytes instead of bits in Asmap code”: Line seems like a leftover from a merge?

fjahr commented at 3:15 pm on November 19, 2025:

Indeed, removed.

in src/bench/addrman.cpp:1 in 4c8e4eee00 outdated

hodlinator commented at 7:09 pm on November 17, 2025:

nit in 4c8e4eee00f16bace634221277c694ee5dda23f8: Sorry, accidentally injected possessive apostrophe, “span’s”, commit message should be:

The version hash changes due to spans being serialized without their size-prefix (unlike vectors).

fjahr commented at 3:15 pm on November 19, 2025:

Fixed

in src/util/asmap.cpp:310 in 0196f70a8c outdated

304@@ -203,6 +305,10 @@ bool SanityCheckASMap(const std::span<const std::byte>& asmap, int bits)
305     return false; // Reached EOF without RETURN instruction
306 }
307 
308+/**
309+ * Provides a safe interface for validating ASMap data before use. Validates
310+ * the data and returns it if valid, empty span otherwise.

hodlinator commented at 7:28 pm on November 17, 2025:

In 0196f70a8c99b9c03c81e879cd9f05560fc20543: Could replace second sentence with “Returns true if the data is valid for 128 bits long inputs” or just remove it (no longer returns span).

fjahr commented at 3:15 pm on November 19, 2025:

Done

in src/util/asmap.h:16 in eaf5e12d2b outdated

12 #include <cstdint>
13+#include <span>
14 #include <vector>
15 
16-uint32_t Interpret(const std::vector<bool> &asmap, const std::vector<bool> &ip);
17+uint32_t Interpret(std::span<const std::byte> asmap, std::span<const std::byte> ip);

hodlinator commented at 8:19 pm on November 17, 2025:

Continuing #28792 (review)

We still mix std::span and const std::span& here in:

https://github.com/bitcoin/bitcoin/blob/eaf5e12d2b52d66a75a68cf94d328d0ec9048986/src/util/asmap.h#L16-L25

fjahr commented at 3:15 pm on November 19, 2025:

Ok, I hope I got them all now.

in src/util/asmap.cpp:131 in eaf5e12d2b

138+    // A jump instruction, encoded as [1,0], inspects the next unused bit in the input
139+    // and either continues execution (if 0), or skips a specified number of bits (if 1).
140+    // It is followed by an integer using jump encoding.
141     JUMP = 1,
142+    // A match instruction, encoded as [1,1,0], inspects 1 or more of the next unused bits
143+    // in the input. If they all match, execution continues. If not, the default ASN is returned.

hodlinator commented at 8:23 pm on November 17, 2025:

Related: #28792 (review)

Thanks for documenting the C++ implementation!

nit:

0    // in the input. If they all match, execution continues. If not, the default ASN is returned (or 0 if unset).

Could also correct the Python side:

 0--- a/contrib/asmap/asmap.py
 1+++ b/contrib/asmap/asmap.py
 2@@ -157,7 +157,7 @@ class _Instruction(Enum):
 3     JUMP = 1
 4     # A match instruction, encoded as [1,1,0] inspects 1 or more of the next unused bits
 5     # in the input with its argument. If they all match, execution continues. If they do
 6-    # not, failure is returned. If a default instruction has been executed before, instead
 7+    # not, 0 is returned. If a default instruction has been executed before, instead
 8     # of failure the default instruction's argument is returned. It is followed by an
 9     # integer in match encoding, and a subprogram. That value is at least 2 bits and at
10     # most 9 bits. An n-bit value signifies matching (n-1) bits in the input with the lower

fjahr commented at 3:15 pm on November 19, 2025:

Done. And I just clarified that failure means 0 in python, otherwise there would have been more edits necessary to make this clear, for example the line below your suggestion still talks about a failure too.

hodlinator commented at 8:30 pm on November 17, 2025: contributor

Reviewed eaf5e12d2b52d66a75a68cf94d328d0ec9048986

Thanks for peeling off this PR as I suggested!

fjahr commented at 0:10 am on November 19, 2025: contributor

The first three PRs were already part of #28792, the others are new.

i’m sure you mean the first three commits? 😄

Yepp, fixed :)

DrahtBot added the label Needs rebase on Nov 19, 2025

fjahr force-pushed on Nov 19, 2025

DrahtBot added the label CI failed on Nov 19, 2025

fjahr commented at 3:22 pm on November 19, 2025: contributor

Rebased since #33026 was merged 🥳 and addressed all comments. Thanks for the reviews!

DrahtBot removed the label Needs rebase on Nov 19, 2025

fjahr force-pushed on Nov 19, 2025

DrahtBot removed the label CI failed on Nov 19, 2025

in src/util/asmap.h:22 in d278d6193b outdated

17@@ -17,5 +18,7 @@ bool SanityCheckASMap(const std::vector<std::byte>& asmap, int bits);
18 
19 /** Read asmap from provided binary file */
20 std::vector<std::byte> DecodeAsmap(fs::path path);
21+/** Calculate the asmap version, a checksum identifying the asmap being used. */
22+uint256 AsmapVersion(const std::vector<std::byte>& data);

hodlinator commented at 9:15 am on November 20, 2025:

meganit in d278d6193b8d754eb108bc3ccc256260085b32c9 “refactor: Unify asmap version calculation and naming”: As uint256 AsmapVersion(const std::vector<std::byte>& data) is new in this PR, it could be introduced as uint256 AsmapVersion(std::span<const std::byte> data) from the beginning to avoid line churn.

fjahr commented at 11:47 pm on November 24, 2025:

I actually slightly prefer to keep it as is, because if I do this earlier I also have to change the tests because the hash result changes too with the span usage. I think it makes more sense to have this in the later commit where there is a much bigger overall benefit of introducing span evident that is actually what justifies the hash result being changed.

hodlinator commented at 8:23 am on November 25, 2025:

Ah, makes sense too. :+1:

in src/util/asmap.h:23 in 6b9ce8cff5 outdated

21 
22-/** Read asmap from provided binary file */
23+/** Read and check asmap from provided binary file */
24 std::vector<std::byte> DecodeAsmap(fs::path path);
25+/** Check asmap from embedded data */
26+bool CheckAsmap(std::span<const std::byte> data);

hodlinator commented at 9:35 am on November 20, 2025:

in 6b9ce8cff58d7214fca51a8024ec4efb8694e059 on CheckAsmap:

Justification for existence: I dislike default parameter values, so appreciate a distinct function from SanityCheckASMap.
Comment: Nothing makes it specialized for only embedded data?
Location: Maybe it could be added directly after SanityCheckASMap in both .h + .cpp since they are so closely related? (Could place it before but that would put it between Interpret and SanityCheckASMap which are nice to have after each other as they are also closely related).
Naming: Could make it more descriptive - something like CheckStandardAsmap or CheckIPV6Asmap?

Usage: Could be used in more places where we have SanityCheckAsmap(asmap, 128)

 0₿ git grep CheckAsmap
 1src/test/fuzz/asmap.cpp:    if (!SanityCheckAsmap(asmap, 128)) return;
 2src/test/fuzz/asmap_direct.cpp:    if (SanityCheckAsmap(asmap, ip_len)) {
 3src/test/fuzz/asmap_direct.cpp:            assert(!SanityCheckAsmap(prefix, ip_len));
 4src/test/fuzz/util/net.h:    if (!SanityCheckAsmap(std::span<std::byte>(asmap), 128)) {
 5src/util/asmap.cpp:    // - should have been caught by SanityCheckAsmap below
 6src/util/asmap.cpp:bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
 7src/util/asmap.cpp:bool CheckAsmap(const std::span<const std::byte> data)
 8src/util/asmap.cpp:    if (!SanityCheckAsmap(data, 128)) {
 9src/util/asmap.cpp:    if (!CheckAsmap(buffer)) {
10src/util/asmap.h:bool SanityCheckAsmap(std::span<const std::byte> asmap, int bits);
11src/util/asmap.h:bool CheckAsmap(std::span<const std::byte> data);

fjahr commented at 11:48 pm on November 24, 2025:

Renamed to CheckStandardAsmap and using it in a two more places, but keeping it out of asmap_direct fuzzer because we are testing the bits there explicitly. Moved it below below the sanity check in .h which is the same location as .cpp

in src/test/addrman_tests.cpp:27 in 6b9ce8cff5 outdated

23@@ -24,7 +24,7 @@ using namespace std::literals;
24 using node::NodeContext;
25 using util::ToString;
26 
27-static NetGroupManager EMPTY_NETGROUPMAN{{}};
28+static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};

hodlinator commented at 9:46 am on November 20, 2025:

nit in 6b9ce8cff58d7214fca51a8024ec4efb8694e059:

  0diff --git a/src/bench/addrman.cpp b/src/bench/addrman.cpp
  1index 1e93b9c395..2f474c2411 100644
  2--- a/src/bench/addrman.cpp
  3+++ b/src/bench/addrman.cpp
  4@@ -24,7 +24,7 @@
  5 static constexpr size_t NUM_SOURCES = 64;
  6 static constexpr size_t NUM_ADDRESSES_PER_SOURCE = 256;
  7 
  8-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
  9+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
 10 static constexpr uint32_t ADDRMAN_CONSISTENCY_CHECK_RATIO{0};
 11 
 12 static std::vector<CAddress> g_sources;
 13diff --git a/src/test/addrman_tests.cpp b/src/test/addrman_tests.cpp
 14index c9a0ebebf0..f1e54416cd 100644
 15--- a/src/test/addrman_tests.cpp
 16+++ b/src/test/addrman_tests.cpp
 17@@ -24,7 +24,7 @@ using namespace std::literals;
 18 using node::NodeContext;
 19 using util::ToString;
 20 
 21-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
 22+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
 23 static const bool DETERMINISTIC{true};
 24 
 25 static int32_t GetCheckRatio(const NodeContext& node_ctx)
 26@@ -584,7 +584,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket_legacy)
 27 // 101.8.0.0/16 AS8
 28 BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
 29 {
 30-    NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
 31+    auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
 32 
 33     CAddress addr1 = CAddress(ResolveService("250.1.1.1", 8333), NODE_NONE);
 34     CAddress addr2 = CAddress(ResolveService("250.1.1.1", 9999), NODE_NONE);
 35@@ -637,7 +637,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
 36 
 37 BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
 38 {
 39-    NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
 40+    auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
 41 
 42     CAddress addr1 = CAddress(ResolveService("250.1.2.1", 8333), NODE_NONE);
 43     CAddress addr2 = CAddress(ResolveService("250.1.2.1", 9999), NODE_NONE);
 44@@ -714,7 +714,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
 45 
 46 BOOST_AUTO_TEST_CASE(addrman_serialization)
 47 {
 48-    NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
 49+    auto netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
 50 
 51     const auto ratio = GetCheckRatio(m_node);
 52     auto addrman_asmap1 = std::make_unique<AddrMan>(netgroupman, DETERMINISTIC, ratio);
 53diff --git a/src/test/fuzz/asmap.cpp b/src/test/fuzz/asmap.cpp
 54index fec0f3f50a..6ed5aa7cd9 100644
 55--- a/src/test/fuzz/asmap.cpp
 56+++ b/src/test/fuzz/asmap.cpp
 57@@ -42,6 +42,6 @@ FUZZ_TARGET(asmap)
 58         memcpy(&ipv4, addr_data, addr_size);
 59         net_addr.SetIP(CNetAddr{ipv4});
 60     }
 61-    NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
 62+    auto netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
 63     (void)netgroupman.GetMappedAS(net_addr);
 64 }
 65diff --git a/src/test/fuzz/p2p_handshake.cpp b/src/test/fuzz/p2p_handshake.cpp
 66index 75e14865a4..a56e1e5f1f 100644
 67--- a/src/test/fuzz/p2p_handshake.cpp
 68+++ b/src/test/fuzz/p2p_handshake.cpp
 69@@ -48,7 +48,7 @@ FUZZ_TARGET(p2p_handshake, .init = ::initialize)
 70     chainman.ResetIbd();
 71 
 72     node::Warnings warnings{};
 73-    NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
 74+    auto netgroupman{NetGroupManager::NoAsmap()};
 75     AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
 76     auto peerman = PeerManager::make(connman, addrman,
 77                                      /*banman=*/nullptr, chainman,
 78diff --git a/src/test/fuzz/process_message.cpp b/src/test/fuzz/process_message.cpp
 79index 5ac796ae2c..30c121d7a2 100644
 80--- a/src/test/fuzz/process_message.cpp
 81+++ b/src/test/fuzz/process_message.cpp
 82@@ -77,7 +77,7 @@ FUZZ_TARGET(process_message, .init = initialize_process_message)
 83     chainman.DisableNextWrite();
 84 
 85     node::Warnings warnings{};
 86-    NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
 87+    auto netgroupman{NetGroupManager::NoAsmap()};
 88     AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
 89     auto peerman = PeerManager::make(connman, addrman,
 90                                      /*banman=*/nullptr, chainman,
 91diff --git a/src/test/netbase_tests.cpp b/src/test/netbase_tests.cpp
 92index 5086f21ff4..4d355142e2 100644
 93--- a/src/test/netbase_tests.cpp
 94+++ b/src/test/netbase_tests.cpp
 95@@ -325,7 +325,7 @@ BOOST_AUTO_TEST_CASE(subnet_test)
 96 
 97 BOOST_AUTO_TEST_CASE(netbase_getgroup)
 98 {
 99-    NetGroupManager netgroupman{NetGroupManager::NoAsmap()}; // use /16
100+    auto netgroupman{NetGroupManager::NoAsmap()}; // use /16
101     BOOST_CHECK(netgroupman.GetGroup(ResolveIP("127.0.0.1")) == std::vector<unsigned char>({0})); // Local -> !Routable()
102     BOOST_CHECK(netgroupman.GetGroup(ResolveIP("257.0.0.1")) == std::vector<unsigned char>({0})); // !Valid -> !Routable()
103     BOOST_CHECK(netgroupman.GetGroup(ResolveIP("10.0.0.1")) == std::vector<unsigned char>({0})); // RFC1918 -> !Routable()

fjahr commented at 11:48 pm on November 24, 2025:

Done

in src/util/asmap.cpp:65 in b41f5a29f7

61+ * Extract a single bit from byte array using big-endian bit ordering.
62+ * Used for IP addresses to match network byte order conventions.
63+ */
64+inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
65+{
66+    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;

hodlinator commented at 10:43 am on November 20, 2025:

Here’s a reduced version:

0    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> ((7 + bitpos) % 8)) & 1;

I’m surprised by the way the shift amount changes for different bit positions. Would have expected something closer to this for big endian…

0    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - (bitpos % 8))) & 1;

….but it fails the unit tests.

Godbolt experiment https://godbolt.org/z/6rjEzxhPM outputs:

 0PR bit shift expression (adfd8006...): (7 - ((bytes.size() * 8 - bitpos) % 8))
 1PR bit shift reduced: (7 + bitpos) % 8
 2Expected BE bit shift: 7 - (bitpos % 8)
 3
 4bitpos:  0, PR: 7, reduced: 7, expected: 7
 5bitpos:  1, PR: 0, reduced: 0, expected: 6
 6bitpos:  2, PR: 1, reduced: 1, expected: 5
 7bitpos:  3, PR: 2, reduced: 2, expected: 4
 8bitpos:  4, PR: 3, reduced: 3, expected: 3
 9bitpos:  5, PR: 4, reduced: 4, expected: 2
10bitpos:  6, PR: 5, reduced: 5, expected: 1
11bitpos:  7, PR: 6, reduced: 6, expected: 0
12bitpos:  8, PR: 7, reduced: 7, expected: 7
13bitpos:  9, PR: 0, reduced: 0, expected: 6
14bitpos: 10, PR: 1, reduced: 1, expected: 5
15bitpos: 11, PR: 2, reduced: 2, expected: 4
16bitpos: 12, PR: 3, reduced: 3, expected: 3
17bitpos: 13, PR: 4, reduced: 4, expected: 2
18bitpos: 14, PR: 5, reduced: 5, expected: 1
19bitpos: 15, PR: 6, reduced: 6, expected: 0

However, when looking at the usage in Interpret(), we see if we would encounter JUMP as the first instruction, our bitpos is sort of out of bounds by one.

https://github.com/bitcoin/bitcoin/blob/b41f5a29f7d56da8fd157770ad29b5776c3684c6/src/util/asmap.cpp#L184

https://github.com/bitcoin/bitcoin/blob/b41f5a29f7d56da8fd157770ad29b5776c3684c6/src/util/asmap.cpp#L201

If we instead at the callsite make sure to send in in-bounds values - GetBitBE(ip, bits - 1), it turns out we can simplify GetBitBE quite a bit.

 0--- a/src/util/asmap.cpp
 1+++ b/src/util/asmap.cpp
 2@@ -48,8 +48,8 @@ namespace {
 3 constexpr uint32_t INVALID = 0xFFFFFFFF;
 4 
 5 /**
 6- * Extract a single bit from byte array using little-endian bit ordering.
 7- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
 8+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
 9+ * Used for ASMap data.
10  */
11 inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
12 {
13@@ -57,12 +57,12 @@ inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
14 }
15 
16 /**
17- * Extract a single bit from byte array using big-endian bit ordering.
18+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
19  * Used for IP addresses to match network byte order conventions.
20  */
21 inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
22 {
23-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
24+    return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
25 }
26 
27 /**
28@@ -198,7 +198,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
29             if (jump == INVALID) break; // Jump offset straddles EOF
30             if (bits == 0) break; // No input bits left
31             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
32-            if (GetBitBE(ip, bits)) {  // Check next IP bit (big-endian)
33+            if (GetBitBE(ip, bits - 1)) {  // Check next IP bit (big-endian)
34                 pos += jump;  // Bit = 1: skip to right subtree
35             }
36             // Bit = 0: fall through to left subtree
37@@ -213,7 +213,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
38             matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
39             if (bits < matchlen) break; // Not enough input bits
40             for (uint32_t bit = 0; bit < matchlen; bit++) {
41-                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
42+                if (GetBitBE(ip, bits - 1) != ((match >> (matchlen - 1 - bit)) & 1)) {
43                     return default_asn;  // Pattern mismatch - use default
44                 }
45                 bits--;

The similarity we get between the shifting in this new GetBitBE and GetBitLE show that we have the same bit ordering, just different byte-ordering.

https://github.com/bitcoin/bitcoin/blob/b41f5a29f7d56da8fd157770ad29b5776c3684c6/src/util/asmap.cpp#L54-L56

That’s why I also updated the comments in the diff above.

in src/util/asmap.cpp:186 in b41f5a29f7

212-    uint8_t bits = ip.size();
213+    size_t pos{0};
214+    const size_t endpos{asmap.size() * 8};
215+    uint8_t bits = ip.size() * 8;
216     uint32_t default_asn = 0;
217     uint32_t jump, match, matchlen;

hodlinator commented at 1:43 pm on November 20, 2025:

nit: Here are some changes that bring the Interpret() implementation even closer to SanityCheckAsmap() while also decreasing variable scope, making the code more readable. Maybe something to include in the last commit which aligns the casing of SanityCheckAsmap?

 0--- a/src/util/asmap.cpp
 1+++ b/src/util/asmap.cpp
 2@@ -183,18 +183,16 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
 3     const size_t endpos{asmap.size() * 8};
 4     uint8_t bits = ip.size() * 8;
 5     uint32_t default_asn = 0;
 6-    uint32_t jump, match, matchlen;
 7-    Instruction opcode;
 8     while (pos < endpos) {
 9-        opcode = DecodeType(pos, asmap);
10+        Instruction opcode = DecodeType(pos, asmap);
11         if (opcode == Instruction::RETURN) {
12             // Found leaf node - return the ASN
13-            default_asn = DecodeASN(pos, asmap);
14-            if (default_asn == INVALID) break; // ASN straddles EOF
15-            return default_asn;
16+            uint32_t asn = DecodeASN(pos, asmap);
17+            if (asn == INVALID) break; // ASN straddles EOF
18+            return asn;
19         } else if (opcode == Instruction::JUMP) {
20             // Binary branch: if IP bit is 1, jump forward; else continue
21-            jump = DecodeJump(pos, asmap);
22+            uint32_t jump = DecodeJump(pos, asmap);
23             if (jump == INVALID) break; // Jump offset straddles EOF
24             if (bits == 0) break; // No input bits left
25             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
26@@ -208,11 +206,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
27             // The match value encodes both length and pattern:
28             // - highest set bit position determines length (bit_width - 1)
29             // - lower bits contain the pattern to compare
30-            match = DecodeMatch(pos, asmap);
31+            uint32_t match = DecodeMatch(pos, asmap);
32             if (match == INVALID) break; // Match bits straddle EOF
33-            matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
34+            int matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
35             if (bits < matchlen) break; // Not enough input bits
36-            for (uint32_t bit = 0; bit < matchlen; bit++) {
37+            for (int bit = 0; bit < matchlen; bit++) {
38                 if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
39                     return default_asn;  // Pattern mismatch - use default
40                 }

fjahr commented at 11:48 pm on November 24, 2025:

Done

in src/util/asmap.cpp:63 in b41f5a29f7

59+
60+/**
61+ * Extract a single bit from byte array using big-endian bit ordering.
62+ * Used for IP addresses to match network byte order conventions.
63+ */
64+inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept

hodlinator commented at 2:29 pm on November 20, 2025:

nit: In other cases we have

DecodeBits(size_t& bitpos, const std::span<const std::byte>& data... and
DecodeType(size_t& bitpos, const std::span<const std::byte>& data)
…

So it might be nice to have the same argument order here:

0inline bool GetBitLE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept
1{
2    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
3}
4
5/**
6 * Extract a single bit from byte array using big-endian bit ordering.
7 * Used for IP addresses to match network byte order conventions.
8 */
9inline bool GetBitBE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept

Taking it one step further, using the observation that we only inspect a bit once, we could perform incrementing and decrementing within these functions as well. This also is influenced by the other comment investigating GetBitBE.

 0 constexpr uint32_t INVALID = 0xFFFFFFFF;
 1 
 2 /**
 3- * Extract a single bit from byte array using little-endian bit ordering.
 4- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
 5+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
 6+ * Used for ASMap data.
 7  */
 8-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
 9+inline bool DecodeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
10 {
11-    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
12+    const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
13+    ++bitpos;
14+    return bit;
15 }
16 
17 /**
18- * Extract a single bit from byte array using big-endian bit ordering.
19+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
20  * Used for IP addresses to match network byte order conventions.
21  */
22-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
23+inline bool DecodeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
24 {
25-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
26+    --bitpos;
27+    return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
28 }
29 
30 /**
31@@ -88,8 +91,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
32         // Read continuation bit to determine if we're in this class
33         if (bit_sizes_it + 1 != bit_sizes.end()) {  // Unless we're in the last class
34             if (bitpos >= data.size() * 8) break;
35-            bit = GetBitLE(data, bitpos);
36-            bitpos++;
37+            bit = DecodeBitLE(bitpos, data);
38         } else {
39             bit = 0;  // Last class has no continuation bit
40         }
41@@ -101,8 +103,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
42             // Decode the position within this class in big endian
43             for (int b = 0; b < *bit_sizes_it; b++) {
44                 if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
45-                bit = GetBitLE(data, bitpos);
46-                bitpos++;
47+                bit = DecodeBitLE(bitpos, data);
48                 val += bit << (*bit_sizes_it - 1 - b);  // Big-endian within the class
49             }
50             return val;
51@@ -198,11 +199,10 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
52             if (jump == INVALID) break; // Jump offset straddles EOF
53             if (bits == 0) break; // No input bits left
54             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
55-            if (GetBitBE(ip, bits)) {  // Check next IP bit (big-endian)
56+            if (DecodeBitBE(bits, ip)) {  // Check next IP bit (big-endian)
57                 pos += jump;  // Bit = 1: skip to right subtree
58             }
59             // Bit = 0: fall through to left subtree
60-            bits--;
61         } else if (opcode == Instruction::MATCH) {
62             // Compare multiple IP bits against a pattern
63             // The match value encodes both length and pattern:
64@@ -213,10 +213,9 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
65             matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
66             if (bits < matchlen) break; // Not enough input bits
67             for (uint32_t bit = 0; bit < matchlen; bit++) {
68-                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
69+                if (DecodeBitBE(bits, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
70                     return default_asn;  // Pattern mismatch - use default
71                 }
72-                bits--;
73             }
74             // Pattern matched - continue execution
75         } else if (opcode == Instruction::DEFAULT) {
76@@ -260,8 +259,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
77                 // Nothing to execute anymore
78                 if (endpos - pos > 7) return false; // Excessive padding
79                 while (pos != endpos) {
80-                    if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
81-                    ++pos;
82+                    if (DecodeBitLE(pos, asmap)) return false; // Nonzero padding bit
83                 }
84                 return true; // Sanely reached EOF
85             } else {

hodlinator commented at 7:21 pm on November 20, 2025:

(Maybe ConsumeBit(LE|BE) would be more appropriate than DecodeBit(LE|BE)).

hodlinator commented at 12:04 pm on November 21, 2025:

Couldn’t stop thinking about this. It felt weird to implement reverse byte order in GetBitBE but then also start bits from the end of ip data and count backwards. Found a way that I think simplifies things further, making the order we consume bytes between LE&BE implementations the same, but having different bit-order. Diff against b41f5a29f7d56da8fd157770ad29b5776c3684c6:

  0--- a/src/util/asmap.cpp
  1+++ b/src/util/asmap.cpp
  2@@ -48,21 +48,25 @@ namespace {
  3 constexpr uint32_t INVALID = 0xFFFFFFFF;
  4 
  5 /**
  6- * Extract a single bit from byte array using little-endian bit ordering.
  7- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
  8+ * Extract a single bit from byte array using little-endian bit ordering (LSB first).
  9+ * Used for ASMap data.
 10  */
 11-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
 12+inline bool ConsumeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
 13 {
 14-    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
 15+    const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
 16+    ++bitpos;
 17+    return bit;
 18 }
 19 
 20 /**
 21- * Extract a single bit from byte array using big-endian bit ordering.
 22+ * Extract a single bit from byte array using big-endian bit ordering (MSB first).
 23  * Used for IP addresses to match network byte order conventions.
 24  */
 25-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
 26+inline bool ConsumeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
 27 {
 28-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
 29+    const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (7 - (bitpos % 8))) & 1;
 30+    ++bitpos;
 31+    return bit;
 32 }
 33 
 34 /**
 35@@ -88,8 +92,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
 36         // Read continuation bit to determine if we're in this class
 37         if (bit_sizes_it + 1 != bit_sizes.end()) {  // Unless we're in the last class
 38             if (bitpos >= data.size() * 8) break;
 39-            bit = GetBitLE(data, bitpos);
 40-            bitpos++;
 41+            bit = ConsumeBitLE(bitpos, data);
 42         } else {
 43             bit = 0;  // Last class has no continuation bit
 44         }
 45@@ -101,8 +104,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
 46             // Decode the position within this class in big endian
 47             for (int b = 0; b < *bit_sizes_it; b++) {
 48                 if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
 49-                bit = GetBitLE(data, bitpos);
 50-                bitpos++;
 51+                bit = ConsumeBitLE(bitpos, data);
 52                 val += bit << (*bit_sizes_it - 1 - b);  // Big-endian within the class
 53             }
 54             return val;
 55@@ -181,7 +183,8 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
 56 {
 57     size_t pos{0};
 58     const size_t endpos{asmap.size() * 8};
 59-    uint8_t bits = ip.size() * 8;
 60+    uint8_t ip_bit{0};
 61+    const uint8_t ip_bits_end = ip.size() * 8;
 62     uint32_t default_asn = 0;
 63     uint32_t jump, match, matchlen;
 64     Instruction opcode;
 65@@ -196,13 +199,12 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
 66             // Binary branch: if IP bit is 1, jump forward; else continue
 67             jump = DecodeJump(pos, asmap);
 68             if (jump == INVALID) break; // Jump offset straddles EOF
 69-            if (bits == 0) break; // No input bits left
 70+            if (ip_bit == ip_bits_end) break; // No input bits left
 71             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
 72-            if (GetBitBE(ip, bits)) {  // Check next IP bit (big-endian)
 73+            if (ConsumeBitBE(ip_bit, ip)) {  // Check next IP bit (big-endian)
 74                 pos += jump;  // Bit = 1: skip to right subtree
 75             }
 76             // Bit = 0: fall through to left subtree
 77-            bits--;
 78         } else if (opcode == Instruction::MATCH) {
 79             // Compare multiple IP bits against a pattern
 80             // The match value encodes both length and pattern:
 81@@ -211,12 +213,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
 82             match = DecodeMatch(pos, asmap);
 83             if (match == INVALID) break; // Match bits straddle EOF
 84             matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
 85-            if (bits < matchlen) break; // Not enough input bits
 86+            if ((ip_bits_end - ip_bit) < matchlen) break; // Not enough input bits
 87             for (uint32_t bit = 0; bit < matchlen; bit++) {
 88-                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
 89+                if (ConsumeBitBE(ip_bit, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
 90                     return default_asn;  // Pattern mismatch - use default
 91                 }
 92-                bits--;
 93             }
 94             // Pattern matched - continue execution
 95         } else if (opcode == Instruction::DEFAULT) {
 96@@ -260,8 +261,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
 97                 // Nothing to execute anymore
 98                 if (endpos - pos > 7) return false; // Excessive padding
 99                 while (pos != endpos) {
100-                    if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
101-                    ++pos;
102+                    if (ConsumeBitLE(pos, asmap)) return false; // Nonzero padding bit
103                 }
104                 return true; // Sanely reached EOF
105             } else {

fjahr commented at 11:49 pm on November 24, 2025:

Thanks, I have applied this latest patch, I think that also resolves your earlier comment that was in a different comment thread.

hodlinator commented at 8:43 am on November 25, 2025:

Thank you for taking the patch! :handshake:

Agreed, it builds on #33878 (review)

hodlinator commented at 3:12 pm on November 20, 2025: contributor

Reviewed b41f5a29f7d56da8fd157770ad29b5776c3684c6

Sorry for not finding these earlier, getting closer to A-C-K.

fjahr force-pushed on Nov 24, 2025

fjahr commented at 11:50 pm on November 24, 2025: contributor

Addressed the latest review comments from @hodlinator , thanks for the detailed patch suggestions and rationale, co-authorship added!

in src/test/fuzz/util/net.h:239 in 825d677a86

234@@ -234,9 +235,11 @@ class FuzzedSock : public Sock
235 
236 [[nodiscard]] inline NetGroupManager ConsumeNetGroupManager(FuzzedDataProvider& fuzzed_data_provider) noexcept
237 {
238-    std::vector<bool> asmap = ConsumeRandomLengthBitVector(fuzzed_data_provider);
239-    if (!SanityCheckASMap(asmap, 128)) asmap.clear();
240-    return NetGroupManager(asmap);
241+    std::vector<std::byte> asmap{ConsumeRandomLengthByteVector<std::byte>(fuzzed_data_provider)};
242+    if (!CheckStandardAsmap(std::span<std::byte>(asmap))) {

hodlinator commented at 9:01 am on November 25, 2025:

nit: Seems to compile fine without this on Clang:

0    if (!CheckStandardAsmap(asmap)) {

fjahr commented at 4:34 pm on December 31, 2025:

done

in src/util/asmap.cpp:217 in dc0c4310ba outdated

221@@ -214,19 +222,18 @@ std::vector<std::byte> DecodeAsmap(fs::path path)
222     }
223     int64_t length{file.size()};
224     LogInfo("Opened asmap file %s (%d bytes) from disk", fs::quoted(fs::PathToString(path)), length);
225-

hodlinator commented at 9:04 am on November 25, 2025:

meganit in dc0c4310bae257236d3cd6a06de8071276d4548b: Empty line is removed only to be re-added in next commit.

fjahr commented at 4:34 pm on December 31, 2025:

done

hodlinator approved

hodlinator commented at 10:25 pm on November 25, 2025: contributor

ACK 825d677a863d153d4c466288397da773ee2bb2f5

Operating on std::span<std::byte> rather than std::vector<bool>& enables for efficient use of asmap data embedded in the binary (#28792).

I have thoroughly reviewed and contributed patches for this refactoring up to the point of having a conflict of interest in getting it merged. :)

To build more confidence of the bool -> byte transition, I developed a C++ test with more readable asmap input data - cf34bb9806e95116d4b65b6dc80dc11a1f723a51 (https://github.com/hodlinator/bitcoin/tree/pr/33878_suggestions2 for interleaved branch). Not sure whether it’s worth including in some form.

fjahr commented at 10:51 pm on November 25, 2025: contributor

To build more confidence of the bool -> byte transition, I developed a C++ test with more readable asmap input data - https://github.com/bitcoin/bitcoin/commit/cf34bb9806e95116d4b65b6dc80dc11a1f723a51 (https://github.com/hodlinator/bitcoin/tree/pr/33878_suggestions2 for interleaved branch). Not sure whether it’s worth including in some form.

Thanks, looks like a very nice visualization at least. I will consider including it and addressing your other comments until I have to retouch here. If that doesn’t happen there are follow-ups already planned where these could be included. Thanks again for the reviews :)

in src/netgroup.h:19 in 87f72d9ed0 outdated

15  * Netgroup manager
16  */
17 class NetGroupManager {
18 public:
19-    explicit NetGroupManager(std::vector<bool> asmap)
20+    explicit NetGroupManager(std::vector<std::byte> asmap)

sedited commented at 1:13 pm on December 29, 2025:

In commit 87f72d9ed0e43387c183ab232674a292ab82f6e0

Could enforce moving explicitly by passing std::vector<std::byte>&& (though I realize this is changed again in a later commit).

fjahr commented at 4:34 pm on December 31, 2025:

done

in src/addrman.cpp:159 in b89ffbc56a outdated

155@@ -156,7 +156,7 @@ void AddrManImpl::Serialize(Stream& s_) const
156      * * for each new bucket:
157      *   * number of elements
158      *   * for each element: index in the serialized "all new addresses"
159-     * * asmap checksum
160+     * * asmap version

sedited commented at 9:55 pm on December 29, 2025:

In commit b89ffbc56a431bb662407d97f1b62eb99050eeeb

Is version really clearer? In the asmap-data repository the entries seem to be versioned by their unix timestamp, that seems clearer to me and also conveys a temporal ordering between data dumps.

fjahr commented at 4:34 pm on December 31, 2025:

I think the version/checksum and timestamps are two different things. My goal here was just to not have two different names (version and checksum) for the same thing in the code base, for clarity, greppability etc. The timestamps are important outside of core because we do these coordinated launches in the asmap-data project and the timestamp ties the data back to one of these events. It might be that there are other means of getting an asmap in the future where people don’t care so much about the exact timestamp and they don’t care about it, but that’s just theoretical. The version here is a hash, so a data integrity check and I think that is important to have even though we currently only print it. We currently don’t have a concept to get the timestamp of a file into bitcoin core. We could add a header to the file similar to what we have done with the assumeutxo dumps but even then the timestamp could be lying. Other than assumeutxo we don’t have anything to compare and check against either the hash or the timestamp and I guess that’s ok, or rather, unavoidable.

Sorry for rambling a bit. At second thought, maybe your point was rather that we should use checksum instead of version? I think I could get behind that if we say the timestamp is the “version” and the “checksum” is the integrity check :)

in src/util/asmap.cpp:76 in dc0c4310ba outdated

71@@ -71,35 +72,33 @@ enum class Instruction : uint32_t
72     DEFAULT = 3,
73 };
74 
75-const std::vector<uint8_t> TYPE_BIT_SIZES{0, 0, 1};
76-Instruction DecodeType(size_t& bitpos, const std::vector<std::byte>& data)
77+constexpr uint8_t TYPE_BIT_SIZES[]{0, 0, 1};
78+Instruction DecodeType(size_t& bitpos, const std::span<const std::byte>& data)

sedited commented at 10:21 pm on December 29, 2025:

In commit dc0c4310bae257236d3cd6a06de8071276d4548b

Is there a reason for passing these spans by reference?

fjahr commented at 4:34 pm on December 31, 2025:

nope, done

in src/netgroup.h:94 in dc0c4310ba outdated

91+     * thread-safe. m_asmap can either point to m_loaded_asmap which holds
92+     * data loaded from an external file at runtime or it can point to embedded
93+     * asmap data.
94+     */
95+    const std::span<const std::byte> m_asmap;
96+    std::vector<std::byte> m_loaded_asmap;

sedited commented at 10:23 pm on December 29, 2025:

In commit dc0c4310bae257236d3cd6a06de8071276d4548b

You mention that this dual ownership structure is done to avoid holding the asmap in memory twice. Where is this currently the case? Does that only happen in test code at the moment?

fjahr commented at 4:34 pm on December 31, 2025:

It was the case an earlier version of the code. I guess referencing something that isn’t actually there anymore isn’t something we should be doing in docs, even if it’s implicitly. But in this case I think there should be some documentation in this regard so it clears up why we need m_asmap and m_loaded_asmap here.

sedited commented at 12:23 pm on January 2, 2026:

I’m more asking why keep both of them at all. Seems a bit too complicated to me. Requiring the map to be moved in seems good enough, no?

sipa commented at 1:18 pm on January 2, 2026:

In the next PR, where a default asmap data blob becomes part of the binary, we have a constant compile-time array for that default map. Loading it into an std::vector as well means doubling its memory usage (2x ~1.5 Mb). However, to support loading a custom asmap at startup, the vector (with ownership) is needed too.

sipa commented at 1:19 pm on January 2, 2026:

In the next PR, where a default asmap data blob becomes part of the binary, we have a constant compile-time array for that default map. Loading it into an std::vector as well means doubling its memory usage (2x ~1.5 Mb). However, to support loading a custom asmap at startup, the vector (with ownership) is needed too.

fjahr force-pushed on Dec 31, 2025

fjahr commented at 4:36 pm on December 31, 2025: contributor

Addressed comments by @hodlinator and @sedited

DrahtBot added the label CI failed on Dec 31, 2025

DrahtBot commented at 5:50 pm on December 31, 2025: contributor

🚧 At least one of the CI tasks failed. Task test max 6 ancestor commits: https://github.com/bitcoin/bitcoin/actions/runs/20623068864/job/59228690078 LLM reason (✨ experimental): CI failed due to uncommitted changes in the git index blocking the rebase step after a successful build.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

fjahr force-pushed on Jan 1, 2026

maflcko removed the label CI failed on Jan 1, 2026

hodlinator approved

hodlinator commented at 12:30 pm on January 3, 2026: contributor

re-ACK c5251f39a90ff63bed417b0113ba4fdbc7ea78d0

Changes since my first ACK (https://github.com/bitcoin/bitcoin/pull/33878#pullrequestreview-3503770177) are only minor adjustments based off that review and another (https://github.com/bitcoin/bitcoin/pull/33878#pullrequestreview-3615845587):

Made NetGroupManager-ctor temporarily take r-value
Avoided span-references
Cleaned up whitespace across commits

sedited approved

sedited commented at 10:02 am on January 4, 2026: contributor

ACK c5251f39a90ff63bed417b0113ba4fdbc7ea78d0

DrahtBot added the label Needs rebase on Jan 20, 2026

refactor: Operate on bytes instead of bits in Asmap code

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>

fa41fc6a1a

refactor: Unify asmap version calculation and naming

Calculate the asmap version only in one place: A dedicated function in util/asmap.

The version was also referred to as asmap checksum in several places. To avoid confusion call it asmap version everywhere.

385c34a052

refactor: Use span instead of vector for data in util/asmap

This prevents holding the asmap data in memory twice.

The version hash changes due to spans being serialized without their size-prefix (unlike vectors).

cf4943fdcd

doc: Add more extensive docs to asmap implementation

Also makes minor improvement on the python implementation documentation.

79e97d45c1

refactor: Simplify Interpret asmap function

This aligns it more with SanityCheckAsmap and reduces variable scope.

Also unify asmap casing in SanityCheckAsmap function name.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>

4fec726c4d

fjahr force-pushed on Jan 20, 2026

DrahtBot removed the label Needs rebase on Jan 21, 2026

fjahr commented at 8:25 am on January 21, 2026: contributor

Rebased

hodlinator approved

hodlinator commented at 9:22 am on January 21, 2026: contributor

re-ACK 4fec726c4d352daf2fb4a7e5ed463e44c8815ddb

Changes since previous ACK (https://github.com/bitcoin/bitcoin/pull/33878#pullrequestreview-3624323262):

Resolved conflict with fabf8d1c5bdb6a944762792cdc762caa6c1a760b

DrahtBot requested review from sedited on Jan 21, 2026

sipa commented at 5:01 pm on February 2, 2026: member

ACK 4fec726c4d352daf2fb4a7e5ed463e44c8815ddb

sedited approved

sedited commented at 5:08 pm on February 2, 2026: contributor

Re-ACK 4fec726c4d352daf2fb4a7e5ed463e44c8815ddb

sedited merged this on Feb 2, 2026

sedited closed this on Feb 2, 2026

sedited referenced this in commit 925759d172 on Feb 23, 2026

sedited referenced this in commit 8640523843 on Feb 23, 2026

hebasto commented at 1:50 pm on February 23, 2026: member

Could changes in src/test/addrman_tests.cpp cause failures when running cross-compiled tests on Windows?

See:

maflcko commented at 2:06 pm on February 23, 2026: member

Could changes in src/test/addrman_tests.cpp cause failures when running cross-compiled tests on Windows?

See:
* https://github.com/bitcoin/bitcoin/actions/runs/22304756945/job/64521461506?pr=31507

* https://github.com/bitcoin/bitcoin/actions/runs/22237824844/job/64344924186?pr=33974

See:

https://github.com/bitcoin/bitcoin/actions/runs/22304756945/job/64521461506?pr=31507#step:8:652

0./test/txospenderindex_tests.cpp(13): Entering test suite "txospenderindex_tests"
1./test/txospenderindex_tests.cpp(15): Entering test case "txospenderindex_initial_sync"
2./test/txospenderindex_tests.cpp(60): fatal error: in "txospenderindex_tests/txospenderindex_initial_sync": critical check tx_spender->has_value() has failed
3./test/txospenderindex_tests.cpp(15): Leaving test case "txospenderindex_initial_sync"; testing time: 236038us
4./test/txospenderindex_tests.cpp(13): Leaving test suite "txospenderindex_tests"; testing time: 236068us

hebasto commented at 3:00 pm on February 23, 2026: member

Could changes in src/test/addrman_tests.cpp cause failures when running cross-compiled tests on Windows? See:
0* https://github.com/bitcoin/bitcoin/actions/runs/22304756945/job/64521461506?pr=31507
1
2* https://github.com/bitcoin/bitcoin/actions/runs/22237824844/job/64344924186?pr=33974

See:

https://github.com/bitcoin/bitcoin/actions/runs/22304756945/job/64521461506?pr=31507#step:8:652

0./test/txospenderindex_tests.cpp(13): Entering test suite "txospenderindex_tests"
1./test/txospenderindex_tests.cpp(15): Entering test case "txospenderindex_initial_sync"
2./test/txospenderindex_tests.cpp(60): fatal error: in "txospenderindex_tests/txospenderindex_initial_sync": critical check tx_spender->has_value() has failed
3./test/txospenderindex_tests.cpp(15): Leaving test case "txospenderindex_initial_sync"; testing time: 236038us
4./test/txospenderindex_tests.cpp(13): Leaving test suite "txospenderindex_tests"; testing time: 236068us

Thanks! It’s already tracked in #34637.

maflcko commented at 9:02 am on February 25, 2026: member

Could changes in src/test/addrman_tests.cpp cause failures when running cross-compiled tests on Windows? See:
0* https://github.com/bitcoin/bitcoin/actions/runs/22304756945/job/64521461506?pr=31507
1
2* https://github.com/bitcoin/bitcoin/actions/runs/22237824844/job/64344924186?pr=33974

See: https://github.com/bitcoin/bitcoin/actions/runs/22304756945/job/64521461506?pr=31507#step:8:652

0./test/txospenderindex_tests.cpp(13): Entering test suite "txospenderindex_tests"
1./test/txospenderindex_tests.cpp(15): Entering test case "txospenderindex_initial_sync"
2./test/txospenderindex_tests.cpp(60): fatal error: in "txospenderindex_tests/txospenderindex_initial_sync": critical check tx_spender->has_value() has failed
3./test/txospenderindex_tests.cpp(15): Leaving test case "txospenderindex_initial_sync"; testing time: 236038us
4./test/txospenderindex_tests.cpp(13): Leaving test suite "txospenderindex_tests"; testing time: 236068us

Thanks! It’s already tracked in #34637.

I guess it may be easier to spot if fail-fast behavior was used. Discussion in https://github.com/bitcoin/bitcoin/issues/34666

hodlinator commented at 1:42 pm on February 25, 2026: contributor

Ah, I see now that the other unit test run also failed in txospenderindex_tests - https://github.com/bitcoin/bitcoin/actions/runs/22304756945/job/64521461506?pr=31507#step:8:652

For the record, the end of the output was:

0 ./test/addrman_tests.cpp(1172): Leaving test case "addrman_size"; testing time: 4390us
1
2*** 1 failure is detected in the test module "Bitcoin Core Test Suite"
3./test/addrman_tests.cpp(50): Leaving test suite "addrman_tests"; testing time: 227097us
4Leaving test module "Bitcoin Core Test Suite"; testing time: 85011678us
5Command '['./bin/test_bitcoin.exe', '-l', 'test_suite']' returned non-zero exit status 201.

(addrman_size is the last test case in the addrman_tests suite).

Was looking into exit code 201, it comes from Boost Test https://www.boost.org/doc/libs/1_87_0/boost/cstdlib.hpp

0const int exit_test_failure = 201;

Agree it would be easier to spot the actual error if the test_bitcoin failed quicker.

refactor, docs: Embedded ASMap [2/3]: Refactor asmap internals and add documentation #33878

Code Coverage & Benchmarks

Reviews

Conflicts

LLM Linter (✨ experimental)