fjahr
commented at 10:13 pm on November 4, 2023:
contributor
This is one of two options for how we may embed ASMap files into Bitcoin Core releases. This approach here uses a binary dump of the file which is then committed to the source code as a header file. The alternative approach is not having the data in the source code but only adding it during the build phase of a release. I initially favored the second approach but it seems harder to achieve than I originally expected and I have also started to see the downsides of this approach as well. For example, we would not have the asmap data available in dev builds, meaning those who want to use asmap with dev builds would always need to provide a file as input. Also, opening a PR for new ASMap data before the first RC allows for earlier testing of the data and the PR itself is the natural place where reviewers can give feedback. If the asmap data is only included during the build phase potential problems can only be discovered while testing RC1 and a separate issue would need to be opened to coordinate a fix. I had also not realized that the binary dump header file approach here is very similar to the embedding of the seeds.
I am still planning to demo how the embedding would work during the build phase, but since I have started to like the approach shown here a lot more lately, I thought I would demo it and hear people’s opinions on it.
Currently, the binary dump header file could simply be generated using xxd -i path/to/ip_asn.map init/ip_asn.h. But I would also be open to adding an equivalent to makeseeds.py to make the process more convenient if people would prefer that.
DrahtBot
commented at 10:13 pm on November 4, 2023:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#31868 ([IBD] specialize block serialization by l0rinc)
#30595 (kernel: Introduce initial C header API by TheCharlatan)
#29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
#29307 (util: explicitly close all AutoFiles that have been written by vasild)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
DrahtBot added the label
CI failed
on Nov 4, 2023
fjahr force-pushed
on Nov 4, 2023
fjahr force-pushed
on Nov 4, 2023
fjahr force-pushed
on Nov 4, 2023
sipa
commented at 0:24 am on November 5, 2023:
member
Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).
DrahtBot removed the label
CI failed
on Nov 5, 2023
maflcko
commented at 5:32 pm on November 5, 2023:
member
How much data would this add per year to the repo, assuming updates happen?
sipa
commented at 5:59 pm on November 5, 2023:
member
@maflcko The asmap file added here is around 1.6 MiB, so assuming we update for every 6 months, probably a bit over 3 MiB per year (gzip compresses asmap files only ~5%).
luke-jr
commented at 4:17 pm on November 8, 2023:
member
Why is the option of loading it from a file even in dev builds, not considered?
fjahr
commented at 7:08 pm on November 9, 2023:
contributor
Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).
fjahr
commented at 7:11 pm on November 9, 2023:
contributor
Why is the option of loading it from a file even in dev builds, not considered?
It is considered. Loading from a file is the only option for dev builds if we go the other route mentioned in the description: “The alternative approach is not having the data in the source code but only adding it during the build phase of a release.” But this also comes with the aforementioned downsides.
willcl-ark
commented at 9:56 am on December 6, 2023:
member
A general question about this approach: do most contemporary OSes handle loading large amounts of (potentially) unused data into physical memory well-enough, that they would (likely) not load this extra 1.5-3MB of data in the case that the -asmap is not being used? Perhaps this will be handled by the demand paging system? If not, it seems to be a bit of a shame to increase binary size by 20% in all cases.
It seems to make most sense to me to take the path suggested by @sipa, including the .dat file in the repo, and having it converted to a .h at compile time. It might also be nice to have a configuration flag to disable the embedded header creation for faster builds.
DrahtBot added the label
Needs rebase
on Dec 14, 2023
fjahr force-pushed
on Dec 14, 2023
DrahtBot removed the label
Needs rebase
on Dec 15, 2023
DrahtBot added the label
CI failed
on Jan 16, 2024
fjahr renamed this:
Embedding ASMap files as binary dump header file
Embed default ASMap as binary dump header file
on Aug 27, 2024
fjahr force-pushed
on Aug 28, 2024
fjahr marked this as ready for review
on Aug 28, 2024
fjahr force-pushed
on Aug 28, 2024
fjahr
commented at 11:48 pm on August 28, 2024:
contributor
I am picking this back up since we have come a long way in terms of tooling outside of Bitcoin Core and lately I was just waiting for cmake and v28 feature freeze. I will suggest an early merge in the cycle for v29 so more people are encouraged to use one steady file over a longer period of time.
I am now using the suggested approach of building the header file at compile time and I added the functionality for both autotools and cmake (the autotools one has existed for a while, I can also remove it if people think it’s unnecessary). Of course this is rebased to make cmake available. First time I am editing the cmake build system so there are probably things to improve… And there is also a command for asmap-tool.py to generate the header file.
fjahr
commented at 6:35 pm on August 29, 2024:
contributor
Dropped the autotools build support after the discussion on IRC today (plan is to remove autotools in 24h) and rebased for good measure to have latest CMake fixups included.
in
src/init.cpp:1264
in
e5a25fd708outdated
1265- InitError(strprintf(_("Could not parse asmap file %s"), fs::quoted(fs::PathToString(asmap_path))));
1266- return false;
1267+
1268+ if (fs::exists(asmap_path)) {
1269+ asmap = DecodeAsmap(asmap_path);
1270+ if (asmap.size() == 0) {
fjahr
commented at 6:51 pm on August 29, 2024:
contributor
I am now using the suggested approach of building the header file at compile time
I think the suggestion was to not include the header itself in that case.
I see. I am indifferent between having them in the repo or not but I am not sure what the arguments are for doing one or the other.
We currently have two cases that seem comparable where we have the raw files as part of the repo and then make header files out of it in one way or another:
The test/data/* files that are built at compile time and the headers are not included in the repo
The chainparamsseeds that are built on demand for a release using the python script in contrib. This header file is included in the repo.
We’ll have the raw file in the repo for sure but I am indifferent between using the build system or the python tool to make the header and I have implemented both which is good for testing anyway, I guess. I am also indifferent between having the header file in the repo or not. Conceptually the chainparamseeds seems closer to what we would do with an embedded asmap file: we basically also refresh it once per (point) release. So I guess that’s why I did it this way, because I was looking at chainparamsees as the reference case.
I am curious if there are arguments for or against having the header file in the repo that I am currently not seeing. But even if not I am happy to go with what reviewers prefer because I just don’t have a preference right now. Maybe we’ll want to update the chainparamseeds process too if there is an argument for it.
in
contrib/asmap/asmap-tool.py:132
in
e5a25fd708outdated
127@@ -127,6 +128,20 @@ def main():
128 help="address file containing getnodeaddresses output to use in the comparison "
129 "(make sure to set the count parameter to zero to get all node addresses, "
130 "e.g. 'bitcoin-cli getnodeaddresses 0 > addrs.json')")
131+
132+ parser_gen_header = subparsers.add_parser("gen_header",
Might be good to add an example of these new commands and arguments to /contrib/asmap/README.md (and also mention the asmap tooling in /contrib/README.md).
A higher level doc of overall steps to follow may be helpful, too, unless there is one already that I have forgotten about and overlooked.
I have expanded /contrib/asmap/README.md and /contrib/README.md and also added more details to the -asmap help.
It sounds good to me to add a doc/asmap.md file with a higher level view but I would like to leave it as a follow-up that @jurraca might want to take on soon. He is currently working on getting https://asmap.org in shape and that doc would probably be a condensed version of that.
jonatack
commented at 7:08 pm on August 29, 2024:
member
First look at e5a25fd708858204cb6060393dc59d1dfed68acb
Sjors
commented at 10:11 am on August 30, 2024:
member
Concept ACK@fjahr if cmake can generate src/init/ip_asn.h from ip_asn.dat and it’s relative quick, then it should be sufficient to only include the .dat file. It saves space in the repo.
If on the other hand you have to manually call a Python tool to produce the header file, then it would be too much inconvenience for everyone who wants to build Bitcoin Core from source.
I just tried deleting src/init/ip_asn.h and running cmake again. It seems to take just a fraction of a second, but not sure how to measure that.
ip_asn.h is also not human readable. It’s not useful to compare the diff between changes. If it were, that could be argument in favor of including it.
Can you expand the documentation for -asmap to explain how it behaves? I haven’t studied the code in detail yet, but do we now always turn the -asmap feature on, using the default map? Or is it off by default and can be used with -asmap=1 for the default map and -asmap=somefile.dat for a custom / up to date map?
fjahr force-pushed
on Aug 30, 2024
fjahr force-pushed
on Aug 30, 2024
DrahtBot added the label
CI failed
on Aug 30, 2024
DrahtBot
commented at 8:54 pm on August 30, 2024:
contributor
Make sure to run all tests locally, according to the documentation.
The failure may happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
fjahr
commented at 9:09 pm on August 30, 2024:
contributor
I have addressed the latest feedback from @Sjors and @jonatack and rebased again because of the CMake situation and because I overhauled a lot of the changes anyway.
In these latest changes I have
Re-organized the commits, I think review should be a bit easier like this
Expanded documentation and tests
Made it possible to build with the embedded ASMap data or without
Removed the header file from the repo
fjahr
commented at 9:24 pm on August 30, 2024:
contributor
Can you expand the documentation for -asmap to explain how it behaves? I haven’t studied the code in detail yet, but do we now always turn the -asmap feature on, using the default map? Or is it off by default and can be used with -asmap=1 for the default map and -asmap=somefile.dat for a custom / up to date map?
I have expanded the help text for -asmap but I want to expand on it a bit here to explain my reasoning.
Usage of the embedded asmap data is off by default right now. I am open to throw an on-by-default PR out there as a follow-up but I don’t want that conceptual discussion to hold up review of the implementation PR here. My feeling is that most people will be more comfortable with the v2-transport model: Implement as an off-by-default option for one release and then enable by default in the next version if there are no issues discovered that block it.
On the -asmap functionality: If a file path is given (-asmap=foo.map), the file is either found and opened, otherwise we error. If no path is given (-asmap or -asmap=1) we first look for a file in the default file location. If there is no file in the default location and the build includes the embedded asmap data, we use the embedded data. I was going back and forth on that last part, i.e. using the embedded data instead of the file in the default location when both are present. But I opted for this way because 1. we are not breaking any workflows that were working before. If someone puts their file in the default location and expects it to be used, it’s still going to be used. And 2. dropping an asmap file in the default location can still be the most comfortable way of updating your asmap data, for example if you want to use the latest file from asmap-data instead of the embedded one.
fjahr force-pushed
on Aug 30, 2024
fjahr force-pushed
on Sep 1, 2024
fjahr force-pushed
on Sep 1, 2024
fjahr force-pushed
on Sep 1, 2024
fjahr force-pushed
on Sep 1, 2024
Sjors
commented at 6:42 am on September 2, 2024:
member
Usage of the embedded asmap data is off by default right now.
That makes sense.
If a file path is given (-asmap=foo.map), the file is either found and opened, otherwise we error. If no path is given (-asmap or -asmap=1) we first look for a file in the default file location. If there is no file in the default location and the build includes the embedded asmap data, we use the embedded data.
I think that’s a good approach.
CI is unhappy :-(
fjahr
commented at 9:31 am on September 2, 2024:
contributor
CI is unhappy :-(
Yeah, I’ve been trying to fix that but it’s tricky because I can’t reproduce it locally. I guess I need to ask someone with more CMake experience.
in
cmake/script/WriteIPASN.cmake:14
in
b20f0ef642outdated
8+execute_process(
9+ COMMAND xxd -i "${RAW_FILE}"
10+ COMMAND sed "s/unsigned char .*/static const unsigned char ip_asn[] = {/"
11+ COMMAND sed "s/^unsigned int .*_len/static const unsigned int ip_asn_len/"
12+ OUTPUT_VARIABLE HEX_DUMP
13+)
maflcko
commented at 9:34 am on September 2, 2024:
missing COMMAND_ERROR_IS_FATAL? Otherwise, a failure is silently ignored, no?
maflcko
commented at 9:38 am on September 2, 2024:
Seems a bit odd to implement this both in Python and cmake (by calling xxd). Wouldn’t it be easier to just implement it once and require that to be called before use in the other place?
Now using the python script from cmake, so marking as resolved.
maflcko
commented at 9:38 am on September 2, 2024:
member
I presume the CI fails because xxd is missing?
fjahr force-pushed
on Sep 2, 2024
fjahr force-pushed
on Sep 2, 2024
fjahr
commented at 9:59 am on September 2, 2024:
contributor
I presume the CI fails because xxd is missing?
Maybe but it would really surprise me because the failures are in Ubuntu 22.04 environments and I tried reproduce the error in a fresh Ubuntu 22.04 VM and it seemed xxd was included from the start.
fjahr
commented at 10:01 am on September 2, 2024:
contributor
Sounds like it can’t find the raw file…
0CMake Error at /ci_container_base/cmake/script/WriteIPASN.cmake:8 (execute_process):
1 execute_process error getting child return code: No such file or directory
maflcko
commented at 10:02 am on September 2, 2024:
member
Maybe but it would really surprise me because the failures are in Ubuntu 22.04 environments and I tried reproduce the error in a fresh Ubuntu 22.04 VM and it seemed xxd was included from the start.
Are you sure? Note that the CI doesn’t use a desktop VM image, but a minimal docker image. I get:
0# xxd
1bash: xxd: command not found
fjahr
commented at 10:03 am on September 2, 2024:
contributor
Note that the CI doesn’t use a desktop VM image, but a minimal docker image
Ah, then I guess that was my mistake
fjahr force-pushed
on Sep 2, 2024
fjahr force-pushed
on Sep 2, 2024
fjahr
commented at 12:04 pm on September 2, 2024:
contributor
Seems a bit odd to implement this both in Python and cmake (by calling xxd). Wouldn’t it be easier to just implement it once and require that to be called before use in the other place?
Yeah, I first implemented it in python, then autotools and finally cmake. I kept the python version around because I wasn’t sure which approach we might actually want to go (see #28792 (comment)) and because I thought there might be some possible testing scenarios using it. But it doesn’t matter now since we needed an xxd-free approach to work on all environments and implementing the functionality in cmake directly seemed cumbersome, so I am now just using the python script from cmake as well, leaving us with just the python implementation.
fjahr force-pushed
on Sep 2, 2024
in
contrib/asmap/asmap-tool.py:225
in
39729a910aoutdated
maflcko
commented at 12:23 pm on September 2, 2024:
Using a mutable C-style array seems a bit fragile. IIUC, starting with C++17, using inline constexpr std::array ip_asn{...}; should allow the compiler to optimize the layout, as well as enforce immutability. Moreover, std::array drops the need for ip_asn_len, because the size is embedded in the type of the array. Unrelated, you can change the 0x... above to std::byte{0x...} and then use MakeUCharSpan instead of legacy C-Style reinterpret cast and static cast in the code that uses the array.
Sjors
commented at 1:08 pm on September 2, 2024:
member
so I am now just using the python script from cmake as well, leaving us with just the python implementation
Mmm, is Python already a build requirement though? Afaik it’s only a test requirement. So if the header can only be generated using a Python script, then that’s an argument for including it in the source code.
maflcko
commented at 1:16 pm on September 2, 2024:
member
if the header can only be generated using a Python script, then that’s an argument for including it in the source code.
It is possible to generate it in native cmake as well. See for example cmake/script/GenerateHeaderFromJson.cmake. Note that previously the “cmake” generation used xxd, which isn’t a build-requirement either.
Edit: To clarify: I am against reverting to including the data twice. 1.6 MB on every bump is bad enough. Twice that for no strong reason seems annoying to anyone cloning the repo in the future.
Sjors
commented at 1:35 pm on September 2, 2024:
member
Is there a way to do this without relying either on Python or xxd?
Twice that for no strong reason seems annoying to anyone cloning the repo in the future.
Agreed.
fjahr
commented at 2:22 pm on September 2, 2024:
contributor
To clarify: I am against including the data twice. 1.6 MB on every bump is bad enough. Twice that for no strong reason seems annoying to anyone cloning the repo in the future.
Now I’m confused, the header file isn’t included anymore since a while ago so the data is included only once already, or am I missing something? Are you just saying that shouldn’t be changed back?
fjahr force-pushed
on Sep 2, 2024
fjahr
commented at 3:57 pm on September 2, 2024:
contributor
Switched back to a native approach in cmake without python or xxd.
fjahr force-pushed
on Sep 2, 2024
DrahtBot removed the label
CI failed
on Sep 2, 2024
in
cmake/script/GenerateHeaderAsmap.cmake:6
in
b557dc5976outdated
0@@ -0,0 +1,33 @@
1+# Copyright (c) 2023-present The Bitcoin Core developers
2+# Distributed under the MIT software license, see the accompanying
3+# file COPYING or https://opensource.org/license/mit/.
4+
5+file(READ ${RAW_FILE} hex_content HEX)
6+string(REGEX MATCHALL "([A-Za-z0-9][A-Za-z0-9])" bytes "${hex_content}")
maflcko
commented at 12:57 pm on September 5, 2024:
Could rebase and use the existing generate_header_from_raw, which should do the same thing, basically?
fjahr
commented at 10:00 am on September 13, 2024:
Seems like there is still some tinkering going on on that one, I will give it a try when that has settled down.
I am now using that function. I thought there might be issue with the missing header guards but the generated files are not listed. There is now also no more comment in the generated file.
maflcko
commented at 9:56 am on September 20, 2024:
missing header guard
That was somewhat intentional, I guess. A header guard is really only useful for module headers to avoid compile errors if the module is included several times transitively/recursively.
However, for data-holding headers, the include should only ever happen in a single place in the whole program. Otherwise it seems too easy to bloat the binary. So a compile error in that case is actually intentional and useful.
DrahtBot added the label
CI failed
on Sep 12, 2024
DrahtBot removed the label
CI failed
on Sep 15, 2024
DrahtBot added the label
Needs rebase
on Sep 17, 2024
fjahr force-pushed
on Sep 20, 2024
fjahr
commented at 9:32 am on September 20, 2024:
contributor
Rebased to incorporate latest relevant changes to the build system and AutoFile. This now also builds on #30901 so please ignore the first commit or leave your review of it in that PR.
DrahtBot removed the label
Needs rebase
on Sep 20, 2024
fjahr
commented at 10:02 pm on September 20, 2024:
contributor
CI failures (timeouts) seem unrelated, should be ready for review as I am hoping #30901 will get merged quickly.
DrahtBot added the label
Needs rebase
on Nov 6, 2024
fjahr force-pushed
on Nov 6, 2024
DrahtBot removed the label
Needs rebase
on Nov 7, 2024
sipa
commented at 2:39 pm on December 2, 2024:
member
Concept ACK.
It would be nice if the asmap data blob didn’t need to be converted to std::vector<bool> at runtime anymore, because that implies using memory twice (the blob itself, plus the decoded vector, despite containing effectively the same data). To improve upon that, I don’t think it’s hard to make the asmap interpreter code operate on a std::span<const std::byte> instead of a std::vector<bool>. Do you feel like looking into that, or do you want me to do so as a follow-up?
fjahr
commented at 2:07 pm on December 13, 2024:
contributor
It would be nice if the asmap data blob didn’t need to be converted to std::vector<bool> at runtime anymore, because that implies using memory twice (the blob itself, plus the decoded vector, despite containing effectively the same data). To improve upon that, I don’t think it’s hard to make the asmap interpreter code operate on a std::span<const std::byte> instead of a std::vector<bool>. Do you feel like looking into that, or do you want me to do so as a follow-up?
That makes sense and I am giving it a try right now. Will follow-up here when it’s done!
DrahtBot added the label
Needs rebase
on Dec 17, 2024
fjahr force-pushed
on Jan 15, 2025
DrahtBot
commented at 8:44 pm on January 15, 2025:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot added the label
CI failed
on Jan 15, 2025
fjahr force-pushed
on Jan 15, 2025
fjahr force-pushed
on Jan 15, 2025
DrahtBot removed the label
CI failed
on Jan 16, 2025
fjahr
commented at 4:13 pm on January 16, 2025:
contributor
Rebased with an updated #30901 and implemented @sipa ’s suggestion here: #28792 (comment) This took me a bit more time than I anticipated and a couple of times I wasn’t sure which approach to go with. This is why I have kept this part as a separate commit for now and I am looking for approach ACKs on that last part in particular first before I squash it into the appropriate previous commits.
The most interesting pieces IMO:
Dual member approach in NetGroupManager with m_asmap and m_loaded_asmap (approach suggested by @sipa )
Usage of named constructors
Checksum calculation: I wrote a new serializer but didn’t use it because it didn’t feel we should add it just for this. Converting also feels ugly but seems to be the easiest approach. Or maybe I am overlooking something.
in
src/util/asmap.cpp:234
in
6cf4930d15outdated
233 {
234- LogInfo("Opened asmap data (%zu bytes) from embedded byte array\n", data.size());
235- return Decode(data);
236+ HashWriter asmap_hasher;
237+
238+ // Asmap used to be a bool vector, so in order to keep the checksum
Do we care about keeping the checksum calculation compatible? For almost all users, this PR being merged will mean a new asmap database anyway, so they’ll incur the price of addrman invalidation that brings anyway.
I guess you are right, I probably got a bit too caught up in this thinking that there should be an easier way to keep it consistent. I have removed that backwards compatibility code and updated the test.
DrahtBot removed the label
Needs rebase
on Jan 17, 2025
fjahr force-pushed
on Jan 20, 2025
in
src/test/fuzz/asmap.cpp:17
in
f2e673f944outdated
This could be a lot smaller by using the ""_hex user-defined literal (if you include util/strencodings.h); it’d create an std::array<std::byte, N> instead of a C-array, but the std::span constructor below just work just as well on it.
This is neat but I found it to be not as straight forward as it seemed initially. Aside from the additional include I found that I needed to bump template-depth and constexpr-steps in order to make this work at compile time with clang. I am still a bit unclear on what the trade-offs are for this and I will open a separate PR with this so that this discussion doesn’t extend the scope here further.
Yeah, given that the file isn’t committed its size doesn’t matter that much. And it may be that constexpr evaluating a huge amount of data is non-trivial for the compiler, unsure. This isn’t that important.
It was a left-over of a previous approach and I left it in initially in case it might be interesting for testing, experiments etc. just because it was already there. But the PR has grown in scope a lot and it’s becoming annoying to deal with for me and reviewers, so I dropped it now.
sipa
commented at 3:55 pm on January 29, 2025:
member
Concept ACK.
I’d recommend integrating the last commit into the rest of the PR, making the std::vector<bool> -> std::span<std::byte> for the interpreter change first, and then introducing the embedded asmap data afterwards.
DrahtBot added the label
Needs rebase
on Feb 12, 2025
fjahr force-pushed
on Feb 13, 2025
fjahr
commented at 8:37 pm on February 13, 2025:
contributor
Addressed @sipa ’s feedback and rebased on the latest version of #30901.
The python tool command has been dropped and the commit have been reorganized as well.
DrahtBot removed the label
Needs rebase
on Feb 13, 2025
sipa
commented at 3:24 pm on February 14, 2025:
member
Will review again soon, but a quick question: would it make sense to provide the ability for a user (who doesn’t necessarily have easy access to the source code) to extract the asmap data from a running node? For example, to be able to run asmap.py diff on it. It could be an RPC command that just spits out the data in textual format, or a REST endpoint that directly provides it in binary.
DrahtBot added the label
Needs rebase
on Feb 14, 2025
jonatack
commented at 3:58 pm on February 28, 2025:
member
Concept ACK, will review after rebase.
fjahr force-pushed
on Mar 3, 2025
DrahtBot removed the label
Needs rebase
on Mar 3, 2025
fjahr
commented at 6:44 pm on March 3, 2025:
contributor
Rebased the previous conflict and #30901 has been merged now as well 🥳
I also added the -DWITH_EMBEDDED_ASMAP=NO to one of the CI jobs.
would it make sense to provide the ability for a user (who doesn’t necessarily have easy access to the source code) to extract the asmap data from a running node?
Let me address that in a follow-up. I think it’s a nice-to-have since the necessary data for diffing should always be available in https://github.com/asmap/asmap-data when we follow the workflow we propose but yeah, one less step of indirection and having the ability to do it all with the data that’s in the repo is definitely better.
fjahr force-pushed
on Mar 29, 2025
fjahr
commented at 10:46 pm on March 29, 2025:
contributor
Rebased to get all the latest cmake changes and swapped the included mapping data to the latest asmap run. This used the --fill feature in encoding which leads to additional space savings. This is why the embedded data has gone down significantly from ~1.7MB to ~1.4MB.
in
src/test/fuzz/asmap.cpp:34
in
d07856e1b1outdated
I don’t think this is what we want to happen: it’s converting every byte of input data into 8 bytes that are each 0x00 or 0x01. This commit is also changing the meaning of interpretation, which makes this a bit confusing.
This should just be:
0for (int i =0; i < asmap_size; ++i) {
1 asmap_vec.push_back(buffer[1+ i]);
2}
60@@ -52,7 +61,10 @@ class NetGroupManager {
61 bool UsingASMap() const;
6263 private:
64- /** Compressed IP->ASN mapping, loaded from a file when a node starts.
65+ /** Compressed IP->ASN mapping.
66+ *
67+ * Data may beloaded from a file when a node starts or embedded in the
I think the fact that this commit both changes the interpretation from individual-bools to bits-of-byte while also adding support for span-based embedded asmap file makes it a bit harder to review.
Would it be possible to separate these into two commits (probably first switching the interpretation to be bit-of-std::byte based but leaving everything in std::vector<std::byte>), and then in a second commit add support for embedded span-based ones?
DrahtBot added the label
Needs rebase
on Apr 11, 2025
refactor: Operate on bytes instead of bits in Asmap code828cf86a03
refactor: Modernize logging in util/asmap.cpp3e61557e62
refactor: Unify asmap version calculation and naming
Calculate the asmap version only in one place: A dedicated function in util/asmap.
The version was also referred to as asmap checksum in several places. To avoid confusion call it asmap version everywhere.
791345eceb
refactor: Use span instead of vector for data in util/asmap
This prevents holding the asmap data in memory twice.
c1c6943ea9
build: Add embedded asmap data
The data embedded is from the latest ASMap file from the asmap-data
repository: https://github.com/fjahr/asmap-data/blob/main/1742572800_asmap.dat
dd44a5de4c
build: Generate ip_asn.dat.h during build process
This can be disabled with -DWITH_EMBEDDED_ASMAP=OFF.
7550deece0
common: Use fallback in GetPathArg if the path is 1
Otherwise -asmap=1, for example, would be interpreted as a path "1".
8852dde7a5
init, net: Implement usage of binary-embedded asmap dataea9bd2006f
doc: Expand documentation on asmap feature and tooling9d88d99e1c
refactor: Add AutoFile::sizee407395190
CI: Use without embedded asmap build option in one ci job56df667f01
fjahr force-pushed
on Apr 22, 2025
fjahr
commented at 11:03 pm on April 22, 2025:
contributor
Rebased and addressed @sipa ’s feedback. Also made a few further improvements: Added some missing includes, dropped some docs that have been added in #32110 already (silent merge conflict), unified asmap version/checksum naming (I went with version because that’s what we have been communicating to users already).
Would it be possible to separate these into two commits (probably first switching the interpretation to be bit-of-std::byte based but leaving everything in std::vectorstd::byte), and then in a second commit add support for embedded span-based ones?
Done, I split the commit. I hope it makes review easier to have these changes separate. The downside is that a lot of lines are now touched in two commits but I guess it will be ok since many of them are easy to grasp.
DrahtBot removed the label
Needs rebase
on Apr 22, 2025
achow101 added this to the milestone 30.0
on Apr 25, 2025
in
src/netgroup.cpp:91
in
828cf86a03outdated
90- std::vector<bool> ip_bits(128);
91+ std::vector<std::byte> ip_bits(16);
92 if (address.HasLinkedIPv4()) {
93 // For lookup, treat as if it was just an IPv4 address (IPV4_IN_IPV6_PREFIX + IPv4 bits)
94 for (int8_t byte_i = 0; byte_i < 12; ++byte_i) {
95- for (uint8_t bit_i = 0; bit_i < 8; ++bit_i) {
In commit “refactor: Operate on bytes instead of bits in Asmap code”
This is not correct. The code (and comment) above indicate that buffer is being interpreted using 1 bit of asmap code per byte. The line here is interpreting it as 1 byte per byte.
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-05-02 06:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me