bench: reduces unnecessary big vector definition #32787
pull sr-gi wants to merge 1 commits into bitcoin:master from sr-gi:20250620-fix-rollingbloom-bench changing 1 files +1 −1-
sr-gi commented at 6:34 pm on June 20, 2025: memberThe data vector used in the benchmark was defined to hold 32 bytes, instead of 4, while the writing functions work with 32-bit values.
-
bench: reduces unnecessary big vector definition
data was defined to hold 32 bytes, instead of 4, while the writing functions work with 32-bit values
-
DrahtBot added the label Tests on Jun 20, 2025
-
DrahtBot commented at 6:34 pm on June 20, 2025: contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage & Benchmarks
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32787.
Reviews
See the guideline for information on the review process.
Type Reviewers ACK furszy, davidgumberg User requested bot ignore TheCharlatan If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
-
furszy commented at 6:42 pm on June 20, 2025: member
ACK 2a50c61e8d41be425f9f43ea62d1329258ee0126
Guess could have used
sizeof(uint32_t)
too. But seems fine anyway. -
TheCharlatan approved
-
TheCharlatan commented at 7:05 pm on June 20, 2025: contributorACK 2a50c61e8d41be425f9f43ea62d1329258ee0126
-
davidgumberg commented at 11:29 pm on June 20, 2025: contributor
-
in src/bench/rollingbloom.cpp:17 in 2a50c61e8d
13@@ -14,7 +14,7 @@ 14 static void RollingBloom(benchmark::Bench& bench) 15 { 16 CRollingBloomFilter filter(120000, 0.000001); 17- std::vector<unsigned char> data(32); 18+ std::vector<unsigned char> data(4);
maflcko commented at 8:35 am on June 21, 2025:nack. The previous code is correct.
Usually rolling bloom filters are used to insert (w)txids, which are of size 256 bits, which is 32 bytes. There is no place in the codebase that inserts 4 bytes.
Changing this will change the goal of the benchmark (you can also see this in https://corecheck.dev/bitcoin/bitcoin/pulls/32787).
If this is too confusing, it could make sense to document it somehow, but I think changing the size is wrong.
l0rinc commented at 11:56 am on June 22, 2025:During a quick IBD we seem to be getting these values:
- 18/34 bytes - https://github.com/bitcoin/bitcoin/blob/ed060e01e756fc8d4c1c590e79f9555ae86c433f/src/net_processing.cpp#L1104
- 32 bytes - https://github.com/bitcoin/bitcoin/blob/ed060e01e756fc8d4c1c590e79f9555ae86c433f/src/node/txdownloadman_impl.cpp#L103
I was also wondering where the
4
was coming from, wasn’t obvious from the description.
Also note that the return value of https://github.com/bitcoin/bitcoin/blob/ed060e01e756fc8d4c1c590e79f9555ae86c433f/src/bench/rollingbloom.cpp#L25 is ignored so it may be eliminated in the benchmark.
assert(!filter.contains(data))
also wouldn’t work since it’s a probabilistic data structure and will fail for a large--min-time
- but if you touch again please considerankerl::nanobench::doNotOptimizeAway(!filter.contains(data))
instead.
TheCharlatan commented at 7:11 pm on June 22, 2025:it could make sense to document it somehow, but I think changing the size is wrong.
Yes, egg on the face, should get a small comment, since I and others missed it.
sr-gi commented at 3:30 pm on June 23, 2025:Usually rolling bloom filters are used to insert (w)txids, which are of size 256 bits, which is 32 bytes. There is no place in the codebase that inserts 4 bytes.
I see. I missed that the value to be inserted into the filter was hashed before being inserted; therefore, the leading zeros matter.
Wouldn’t it make more sense to define cont as
uint256
to avoid any confusion? @l0rinc I think if we do that, it may make more sense to just open another PR and a comment to that one explaining why we are using a 32-bit value even though the filter write is 32 bytes (and then ditch this PR).maflcko changes_requestedfanquake marked this as a draft on Jun 23, 2025
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-07-01 00:12 UTC
More mirrored repositories can be found on mirror.b10c.me