fuzz: speed up addrman #30688

pull brunoerg wants to merge 1 commits into bitcoin:master from brunoerg:2024-08-fuzz-addrman changing 1 files +3 −2
  1. brunoerg commented at 9:20 AM on August 21, 2024: contributor

    There is no need to iterate 10'000 times and call Add 10'000 times at once (we only add 1000 addresses at once in practice). These values were introduced in 214d9055acdd72189a2f415477ce472ca8db4191.

    I see an avg of 40 exec/s on master for this target and 115 exec/s on this branch.

  2. DrahtBot commented at 9:20 AM on August 21, 2024: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30688.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK Eunovo

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Tests on Aug 21, 2024
  4. maflcko commented at 9:45 AM on August 21, 2024: member

    we only add 1000 addresses at once in practice

    Are you referring to MAX_ADDR_TO_SEND? If yes, it could make sense to clarify this in the description or even in a code comment, given that the variable name seems to indicate it is only used for sending, not for processing/adding.

  5. brunoerg commented at 11:11 AM on August 21, 2024: contributor

    Are you referring to MAX_ADDR_TO_SEND? If yes, it could make sense to clarify this in the description or even in a code comment, given that the variable name seems to indicate it is only used for sending, not for processing/adding.

    Yes, the name is a little bit confusing since it's the maximum number of addresses permitted in an ADDR message. So we use it for sending and receiving.

    if (vAddr.size() > MAX_ADDR_TO_SEND)
    {
        Misbehaving(*peer, strprintf("%s message size = %u", msg_type, vAddr.size()));
        return;
    }
    
  6. brunoerg force-pushed on Aug 21, 2024
  7. in src/test/fuzz/addrman.cpp:250 in 232bf91d0c outdated
     246 | @@ -247,7 +247,8 @@ FUZZ_TARGET(addrman, .init = initialize_addrman)
     247 |              },
     248 |              [&] {
     249 |                  std::vector<CAddress> addresses;
     250 | -                LIMITED_WHILE(fuzzed_data_provider.ConsumeBool(), 10000) {
     251 | +                // Maximum number of addresses permitted in an ADDR message.
    


    maflcko commented at 11:33 AM on August 21, 2024:

    nit, could clarify the link and where the value is taken from?

                    // Maximum number of addresses permitted in an ADDR message. (MAX_ADDR_TO_SEND)
    

    brunoerg commented at 1:17 PM on August 21, 2024:

    Sounds good, I'll address it.


    brunoerg commented at 1:18 PM on August 21, 2024:

    done

  8. brunoerg force-pushed on Aug 21, 2024
  9. brunoerg commented at 1:18 PM on August 21, 2024: contributor

    Force-pushed addressing #30688 (review)

  10. Eunovo approved
  11. Eunovo commented at 7:31 AM on August 29, 2024: contributor

    Code Review ACK

  12. Eunovo commented at 11:26 AM on August 29, 2024: contributor

    Tested ACK I tested this using the method below but I'm a little uncomfortable with the magic number in the test. It might be worth it to expose the MAX_ADDR_TO_SEND and use in tests.

    Ran FUZZ=addrman build_fuzz/src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/addrman for master and FUZZ=addrman src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/addrman for the current PR for 5mins each and got the following output (last few lines shown):

    On master

    [#6076](/bitcoin-bitcoin/6076/)   REDUCE cov: 3887 ft: 21925 corp: 1246/57Mb lim: 1013425 exec/s: 27 rss: 1044Mb L: 801/1013406 MS: 1 EraseBytes-
    [#6547](/bitcoin-bitcoin/6547/)   NEW    cov: 3887 ft: 21926 corp: 1247/58Mb lim: 1013425 exec/s: 29 rss: 1044Mb L: 716612/1013406 MS: 1 CMP- DE: "\000\377\377SSSSSSSSSSSSS"-
    [#6959](/bitcoin-bitcoin/6959/)   REDUCE cov: 3887 ft: 21926 corp: 1247/58Mb lim: 1013425 exec/s: 29 rss: 1044Mb L: 696/1013406 MS: 2 CopyPart-EraseBytes-
    [#7484](/bitcoin-bitcoin/7484/)   NEW    cov: 3887 ft: 21933 corp: 1248/58Mb lim: 1013425 exec/s: 29 rss: 1044Mb L: 262394/1013406 MS: 5 ChangeBit-PersAutoDict-PersAutoDict-ChangeASCIIInt-ChangeByte- DE: "y\000\000\000"-"y\000\000\000"-
    [#7885](/bitcoin-bitcoin/7885/)   REDUCE cov: 3887 ft: 21933 corp: 1248/58Mb lim: 1013425 exec/s: 30 rss: 1044Mb L: 42585/1013406 MS: 1 EraseBytes-
    [#7922](/bitcoin-bitcoin/7922/)   REDUCE cov: 3887 ft: 21933 corp: 1248/58Mb lim: 1013425 exec/s: 30 rss: 1044Mb L: 436055/1013406 MS: 2 ChangeBit-EraseBytes-
    [#7988](/bitcoin-bitcoin/7988/)   REDUCE cov: 3887 ft: 21933 corp: 1248/58Mb lim: 1013425 exec/s: 30 rss: 1044Mb L: 64/1013406 MS: 1 EraseBytes-
    [#8192](/bitcoin-bitcoin/8192/)   pulse  cov: 3887 ft: 21933 corp: 1248/58Mb lim: 1013425 exec/s: 30 rss: 1044Mb
    [#8350](/bitcoin-bitcoin/8350/)   REDUCE cov: 3887 ft: 21933 corp: 1248/58Mb lim: 1013425 exec/s: 29 rss: 1044Mb L: 119516/1013406 MS: 2 InsertByte-EraseBytes-
    [#8358](/bitcoin-bitcoin/8358/)   REDUCE cov: 3887 ft: 21933 corp: 1248/58Mb lim: 1013425 exec/s: 29 rss: 1044Mb L: 2385/1013406 MS: 3 ChangeASCIIInt-PersAutoDict-EraseBytes- DE: "\377\377\377\377"-
    [#8446](/bitcoin-bitcoin/8446/)   NEW    cov: 3887 ft: 21934 corp: 1249/58Mb lim: 1013425 exec/s: 29 rss: 1044Mb L: 122257/1013406 MS: 3 ShuffleBytes-InsertRepeatedBytes-CopyPart-
    [#8767](/bitcoin-bitcoin/8767/)   REDUCE cov: 3887 ft: 21934 corp: 1249/58Mb lim: 1013425 exec/s: 30 rss: 1044Mb L: 119/1013406 MS: 1 EraseBytes-
    [#8768](/bitcoin-bitcoin/8768/)   REDUCE cov: 3887 ft: 21934 corp: 1249/58Mb lim: 1013425 exec/s: 30 rss: 1044Mb L: 222/1013406 MS: 1 EraseBytes-
    [#8993](/bitcoin-bitcoin/8993/)   REDUCE cov: 3887 ft: 21934 corp: 1249/58Mb lim: 1013425 exec/s: 30 rss: 1044Mb L: 107/1013406 MS: 5 InsertRepeatedBytes-CopyPart-EraseBytes-ChangeByte-EraseBytes-
    

    On this PR

    [#20159](/bitcoin-bitcoin/20159/)  REDUCE cov: 3884 ft: 21207 corp: 1202/37Mb lim: 1048551 exec/s: 70 rss: 750Mb L: 66/1042462 MS: 1 EraseBytes-
    [#20465](/bitcoin-bitcoin/20465/)  REDUCE cov: 3884 ft: 21207 corp: 1202/37Mb lim: 1048551 exec/s: 71 rss: 750Mb L: 192/1042462 MS: 1 EraseBytes-
    [#20537](/bitcoin-bitcoin/20537/)  REDUCE cov: 3884 ft: 21207 corp: 1202/37Mb lim: 1048551 exec/s: 71 rss: 750Mb L: 30270/1042462 MS: 2 ChangeByte-EraseBytes-
    [#20721](/bitcoin-bitcoin/20721/)  NEW    cov: 3884 ft: 21216 corp: 1203/38Mb lim: 1048551 exec/s: 71 rss: 750Mb L: 320476/1042462 MS: 4 InsertByte-ChangeByte-ChangeASCIIInt-CMP- DE: "\000\000\177\324\242\177>\247"-
    [#21069](/bitcoin-bitcoin/21069/)  REDUCE cov: 3884 ft: 21216 corp: 1203/38Mb lim: 1048551 exec/s: 71 rss: 750Mb L: 119/1042462 MS: 3 ChangeBit-ShuffleBytes-EraseBytes-
    [#21131](/bitcoin-bitcoin/21131/)  REDUCE cov: 3884 ft: 21216 corp: 1203/38Mb lim: 1048551 exec/s: 71 rss: 750Mb L: 84506/1042462 MS: 2 ShuffleBytes-EraseBytes-
    [#21309](/bitcoin-bitcoin/21309/)  NEW    cov: 3884 ft: 21217 corp: 1204/38Mb lim: 1048551 exec/s: 71 rss: 750Mb L: 500634/1042462 MS: 3 PersAutoDict-CrossOver-CopyPart- DE: "-connect"-
    [#21330](/bitcoin-bitcoin/21330/)  NEW    cov: 3884 ft: 21218 corp: 1205/38Mb lim: 1048551 exec/s: 72 rss: 750Mb L: 214/1042462 MS: 1 CMP- DE: "\000\000RP\000rq="-
    [#21377](/bitcoin-bitcoin/21377/)  REDUCE cov: 3884 ft: 21218 corp: 1205/38Mb lim: 1048551 exec/s: 72 rss: 750Mb L: 120595/1042462 MS: 2 ChangeBit-EraseBytes-
    [#21630](/bitcoin-bitcoin/21630/)  REDUCE cov: 3884 ft: 21218 corp: 1205/38Mb lim: 1048551 exec/s: 72 rss: 750Mb L: 51001/1042462 MS: 3 ChangeBit-InsertRepeatedBytes-EraseBytes-
    [#21687](/bitcoin-bitcoin/21687/)  REDUCE cov: 3884 ft: 21218 corp: 1205/38Mb lim: 1048551 exec/s: 72 rss: 750Mb L: 26311/1042462 MS: 2 ChangeByte-EraseBytes-
    

    Which shows a more than 2x increase in the exec/s on my machine.

    NB I use timeout 300s to run both commands for exactly 300s

  13. brunoerg requested review from maflcko on Aug 29, 2024
  14. in src/test/fuzz/addrman.cpp:239 in a8ce65c30f outdated
     235 | @@ -236,7 +236,7 @@ FUZZ_TARGET(addrman, .init = initialize_addrman)
     236 |          }
     237 |      }
     238 |      AddrManDeterministic& addr_man = *addr_man_ptr;
     239 | -    LIMITED_WHILE(fuzzed_data_provider.ConsumeBool(), 10000) {
     240 | +    LIMITED_WHILE(fuzzed_data_provider.ConsumeBool(), 100) {
    


    maflcko commented at 6:11 AM on August 30, 2024:

    Is this still enough to cover the code?

    I see a small drop in coverage due to this.


    brunoerg commented at 5:18 PM on August 30, 2024:

    I think so. I do not see any reason for not.

    I didn't generate a report to see the difference, but I think that it might be related to any specific case related to collision that was reached due to the frequency of addrs being added, so I expect this to find same but taking a little bit more time? Anyway, I'm leaving it running to make sure it can achieve same coverage.


    brunoerg commented at 3:51 PM on September 18, 2024:
  15. in src/test/fuzz/addrman.cpp:251 in a8ce65c30f outdated
     246 | @@ -247,7 +247,8 @@ FUZZ_TARGET(addrman, .init = initialize_addrman)
     247 |              },
     248 |              [&] {
     249 |                  std::vector<CAddress> addresses;
     250 | -                LIMITED_WHILE(fuzzed_data_provider.ConsumeBool(), 10000) {
     251 | +                // Maximum number of addresses permitted in an ADDR message (MAX_ADDR_TO_SEND).
     252 | +                LIMITED_WHILE(fuzzed_data_provider.ConsumeBool(), 1000) {
    


    l0rinc commented at 11:44 AM on August 30, 2024:

    Was wondering where most time is spent here, a quick profiling on master reveals: <img width="600" alt="image" src="https://github.com/user-attachments/assets/0d7d6c4f-b4bc-482a-9760-1660a6659e35"> i.e. serialization seems to take most time - not sure there's anything we can do about that here.

    But checked how much time is spent reallocating the vector instead, i.e.:

    static void AddrManAdd2(benchmark::Bench& bench)
    {
        bench.run([&] {
            AddrMan addr_man{EMPTY_NETGROUPMAN, /*deterministic=*/true, ADDRMAN_CONSISTENCY_CHECK_RATIO};
            // Add one address to the new table
            CService addr = Lookup("250.3.1.1", 8333, false).value();
            addr_man.Add({CAddress(addr, NODE_NONE)}, addr, std::chrono::seconds{1234});
    
            std::vector<CAddress> addresses;
            //  addresses.reserve(10000);
            for(int i = 0; i < 10000; ++i) {
                addresses.emplace_back(addr, NODE_NONE);
            }
            addr_man.Add(addresses, addr, std::chrono::seconds{1234});
            ankerl::nanobench::doNotOptimizeAway(addr_man.Size());
        });
    }
    

    which reveals a small, but measurable gain if we add a reserve to vector: | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 1,116,935.10 | 895.31 | 2.8% | 0.37 | AddrManAdd2

    vs

            std::vector<CAddress> addresses;
            addresses.reserve(10000);
    

    | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 971,714.29 | 1,029.11 | 0.5% | 0.35 | AddrManAdd2

    note: push_back vs emplace_back doesn't seem to make a difference here.


    brunoerg commented at 5:20 PM on August 30, 2024:

    Why should we reserve this number if we don't know the exact size? I mean, for example, reserving 10,000 but pushing only 2 addresses.


    l0rinc commented at 6:31 PM on August 30, 2024:

    Instead of a limited while, could we generate a random count and allocate and iterate based on that?


    maflcko commented at 7:40 AM on September 2, 2024:

    Instead of a limited while, could we generate a random count and allocate and iterate based on that?

    LIMITED_WHILE(fuzzed_data_provider.ConsumeBool(), ... is used intentionally to break out of the while loop when the fuzz input is exhausted. ConsumeBool will return false when no input bytes are available.


    l0rinc commented at 2:26 PM on September 2, 2024:

    I meant generating the size directly to be able to reserve, i.e.

    auto count = fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 1000);
    std::vector<CAddress> addresses;
    addresses.reserve(count);
    for (size_t i = 0; i < count; ++i) {
        addresses.push_back(ConsumeAddress(fuzzed_data_provider));
    }
    

    like we seem to be doing in many other places.


    maflcko commented at 3:17 PM on September 2, 2024:

    I meant generating the size directly to be able to reserve, i.e.

    Yes, I understood what you meant. My reply still holds. I don't see the benefit of creating possibly up to 1000 constant addresses when the fuzz input is exhausted. The bitdeque code you link to isn't a loop, so it looks unrelated. For a loop of up to 32, I can see how both ways are acceptable. And in the golumb_rice fuzz target the ConsumeRandomLengthByteVector may similarly return up to 512 empty vectors.

    In practise, I doubt that any of this matters, but this isn't a linear problem of whether or not a call to reserve can be used or not. This affects the fuzz engine minimization algorithm, as well as the search algorithm (and possibly speed).

  16. bitcoin deleted a comment on Aug 30, 2024
  17. fuzz: speed up addrman
    There is no need to iterate 10'000 times and
    call `Add` 10'000 times at once (we only add
    1000 addresses at once in practice). These values
    were introduced in 214d9055acdd72189a2f415477ce472ca8db4191.
    de85ebeff4
  18. brunoerg force-pushed on Sep 18, 2024
  19. brunoerg commented at 3:51 PM on September 18, 2024: contributor

    Force-pushed increasing the number of iterations from 100 to 1000 (the original value is 10'000). With this value, I could see faster execs (115 exec/s) and same coverage.

  20. brunoerg commented at 4:33 PM on September 19, 2024: contributor

    Just provided the fuzz inputs for testing: https://github.com/brunoerg/qa-assets/pull/1

  21. Eunovo commented at 4:56 AM on September 25, 2024: contributor

    I retested this PR @https://github.com/bitcoin/bitcoin/pull/30688/commits/de85ebeff481c4afa1b6b3ea9470333692e4f099 against master using the qa-assets https://github.com/bitcoin-core/qa-assets from and https://github.com/brunoerg/qa-assets/pull/1

    This PR vs Master on https://github.com/bitcoin-core/qa-assets

    [#2419155](/bitcoin-bitcoin/2419155/)        REDUCE cov: 3972 ft: 22225 corp: 1718/24Mb lim: 964552 exec/s: 90 rss: 1015Mb L: 94/949713 MS: 4 CopyPart-ChangeASCIIInt-ChangeBinInt-EraseBytes-
    

    Master

    [#2762232](/bitcoin-bitcoin/2762232/)        REDUCE cov: 4008 ft: 22596 corp: 1684/30Mb lim: 883365 exec/s: 65 rss: 1032Mb L: 568/868887 MS: 1 EraseBytes-
    

    On https://github.com/brunoerg/qa-assets/pull/1

    [#4862507](/bitcoin-bitcoin/4862507/)        REDUCE cov: 3969 ft: 22094 corp: 1675/25Mb lim: 979714 exec/s: 87 rss: 987Mb L: 1895/936269 MS: 1 EraseBytes-
    

    Master

    [#1164418](/bitcoin-bitcoin/1164418/)        REDUCE cov: 4005 ft: 22237 corp: 1418/33Mb lim: 880321 exec/s: 65 rss: 981Mb L: 4738/805664 MS: 2 ChangeByte-EraseBytes-
    

    I ran the fuzz tests for several hours, while this PR is always faster, there is still a clear drop in coverage, even on the qa-assets provided in https://github.com/brunoerg/qa-assets/pull/1

  22. marcofleon commented at 4:41 PM on October 24, 2024: contributor

    @Eunovo I'm not seeing that reduction in coverage when I run the target. What sanitizers are you using? Also, what does the coverage say if you do -runs=1? (instead of letting it run for a while)

  23. brunoerg marked this as a draft on Oct 25, 2024
  24. brunoerg commented at 3:53 PM on October 25, 2024: contributor

    I just moved it to draft since I'm doing more experiments yet.

  25. maflcko commented at 3:24 PM on January 22, 2025: member

    Are you still working on this?

  26. brunoerg commented at 4:35 PM on January 24, 2025: contributor

    Closing for now. Focusing on other things.

  27. brunoerg closed this on Jan 24, 2025

  28. bitcoin locked this on Jan 24, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-02 03:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me