addrman: Make consistency checks a runtime option #20233

pull jnewbery wants to merge 4 commits into bitcoin:master from jnewbery:2020-10-addrman-sanity changing 11 files +113 −104
  1. jnewbery commented at 9:42 pm on October 23, 2020: member

    CAddrMan has internal consistency checks. Currently, these are only run when the program is compiled with the DEBUG_ADDRMAN option. This option is not enabled on any of our CI builds, and it’s likely that no-one is running them at all.

    This PR makes consistency checks a (hidden) runtime option that can be enabled with -checkaddrman, where -checkaddrman=n will result in the consistency checks running every n operations (similar to -checkmempool=n). We set the ratio to 1/100 for our unit tests, and leave it disabled by default for all networks. Additionally, a consistency check failure now asserts, rather than logging and continuing. This matches the behavior of CTxMemPool and TxRequestTracker, where a failed consistency check asserts.

  2. jnewbery renamed this:
    addrman: make sanity checks a runtime option
    addrman: Make sanity checks a runtime option
    on Oct 23, 2020
  3. DrahtBot added the label P2P on Oct 23, 2020
  4. DrahtBot added the label RPC/REST/ZMQ on Oct 23, 2020
  5. in src/test/util/setup_common.cpp:177 in b155d8aeda outdated
    171@@ -172,7 +172,9 @@ TestingSetup::TestingSetup(const std::string& chainName, const std::vector<const
    172 
    173     m_node.banman = MakeUnique<BanMan>(GetDataDir() / "banlist.dat", nullptr, DEFAULT_MISBEHAVING_BANTIME);
    174     m_node.connman = MakeUnique<CConnman>(0x1337, 0x1337, *m_node.addrman); // Deterministic randomness for tests.
    175-    m_node.peerman = MakeUnique<PeerManager>(chainparams, *m_node.connman, m_node.banman.get(), *m_node.scheduler, *m_node.chainman, *m_node.mempool);
    176+    m_node.peerman = MakeUnique<PeerManager>(chainparams, *m_node.connman, *m_node.addrman,
    177+                                             m_node.banman.get(), *m_node.scheduler, *m_node.chainman,
    178+                                             *m_node.mempool);
    


    Empact commented at 11:50 pm on October 23, 2020:
    nit: other cases line break after banman

    jnewbery commented at 4:06 pm on October 29, 2020:
    I don’t it matters where the line break is, as long as the line isn’t too long.
  6. MarcoFalke removed the label RPC/REST/ZMQ on Oct 24, 2020
  7. DrahtBot commented at 8:59 am on October 24, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #22627 ([addrman] De-duplicate Add() function by amitiuttarwar)
    • #21878 (Make all networking code mockable by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  8. practicalswift commented at 9:03 am on October 24, 2020: contributor

    Concept ACK

    Should we run with -checkaddrman in at least one of the Travis jobs?

  9. jnewbery commented at 10:28 am on October 24, 2020: member

    Should we run with -checkaddrman in at least one of the Traivs jobs?

    -checkaddrman is enabled by default for regtest, so is on for all Travis jobs (although our functional tests probably don’t exercise addrman very much)

  10. jnewbery marked this as a draft on Oct 24, 2020
  11. jnewbery force-pushed on Oct 24, 2020
  12. jnewbery commented at 5:11 pm on October 24, 2020: member
    The consistency check fails on one of the unit tests. I’ve converted this PR to draft status while I work out why it’s failing.
  13. practicalswift commented at 9:56 pm on October 24, 2020: contributor

    -checkaddrman is enabled by default for regtest, so is on for all Travis jobs (although our functional tests probably don’t exercise addrman very much)

    Oh, I missed the “by default on regtest” in the PR description. Sorry about that :)

    Defaulting to sanity checking in regtest makes perfect sense! Even more Concept ACK :)

  14. jnewbery renamed this:
    addrman: Make sanity checks a runtime option
    addrman: Make consistency checks a runtime option
    on Oct 25, 2020
  15. jnewbery force-pushed on Oct 25, 2020
  16. jnewbery marked this as ready for review on Oct 25, 2020
  17. jnewbery commented at 10:00 am on October 25, 2020: member
    I’ve fixed the failing unit tests.
  18. jnewbery commented at 2:47 pm on October 25, 2020: member
    This makes some of the functional tests run very slowly. I’ll do some comparisons and maybe disable consistency checks on the slowest tests.
  19. jnewbery commented at 12:02 pm on October 26, 2020: member
    Marking as draft until #20228 is merged.
  20. jnewbery marked this as a draft on Oct 26, 2020
  21. ajtowns commented at 2:57 am on January 6, 2021: member

    This seems to make rpc_net and p2p_getaddr_caching tests run extremely slowly:

    rpc_net.py                                       | ✓ Passed  | 1372 s
    p2p_getaddr_caching.py                           | ✓ Passed  | 1367 s
    

    I don’t think there’s any need for this to be based on #20228 – passing a “check_addrman” from init to CConnman seems to work fine, see https://github.com/ajtowns/bitcoin/commits/202101-addrman-check eg.

    More generally, I’m not sure that these consistency checks are that useful to enable at runtime/CI rather than just using them as an aid when making changes to addrman internals.

  22. jnewbery commented at 9:30 am on January 6, 2021: member

    This seems to make rpc_net and p2p_getaddr_caching tests run extremely slowly.

    Yes, see my comment at #20233 (comment). I was planning on addressing that after 20228 was merged, probably by disabling the checks for those tests, or perhaps by changing the consistency check configuration to be a ratio, like the mempool consistency checks:

    https://github.com/bitcoin/bitcoin/blob/68196a891056f98c1df0debacf09fb2ea4682a43/src/txmempool.h#L491

    I don’t think there’s any need for this to be based on #20228 – passing a “check_addrman” from init to CConnman seems to work fine, see ajtowns/bitcoin@202101-addrman-check (commits) eg.

    Sure, we could continue to pass function calls and (now) ctor parameters through CConnman, but I don’t see why that is desirable. Connman uses addrman, but there’s no reason for it to have an addrman. Addrman is also used by peerman and the RPCs. Connman’s resposibility is to open and maintain connections. Storing and maintaining our map of accessible nodes on the network is a separate responsibility.

    There was also some discussion here: http://www.erisian.com.au/bitcoin-core-dev/log-2020-12-16.html#l-585 about being able to manage addrman more directly. Again, in that case it’d be easier if RPC directly held a handle to the addrman instead of passing messages through connman.

    More generally, I’m not sure that these consistency checks are that useful to enable at runtime/CI rather than just using them as an aid when making changes to addrman internals.

    I think somewhere that they’re definitely useful would be in fuzz testing, to ensure that there’s no way to violate addrman’s invariants. See the usage of txrequest’s sanity check here:

    https://github.com/bitcoin/bitcoin/blob/68196a891056f98c1df0debacf09fb2ea4682a43/src/test/fuzz/txrequest.cpp#L373

    which is called after every loop in the fuzz tester.

  23. ajtowns commented at 6:24 am on January 7, 2021: member

    Connman uses addrman, but there’s no reason for it to have an addrman.

    There’s no point having addrman without a connman outside of unit tests (in which case the test can create an addrman directly), and connman uses addrman. That’s enough reason for connman to be the thing that owns it, and for anything that wants it to access it via connman. I don’t think there’s anything you could do with node.addrman that you couldn’t do equally well with node.connman.GetAddrman().

    But more importantly, it’s easy to separate the two questions (should connman own the addrman instance and should debug_addrman be used more often) and judge them independently, so we should.

    I think somewhere that they’re definitely useful would be in fuzz testing, […]

    Yeah; I agree in principle. But while I think txrequest’s checks are reasonably optimised to be run frequently with modest data sets, I think addrman’s is very heavy weight and only really works for pretty small ones. Maybe it would make sense to have the automatic internal checks (Good() surrounds Good_() by invocations to Check()) only happen via compile-time changes so you can use them when debugging, but still have Check_() compiled unconditionally so that the fuzzer and unit tests call Check() themselves.

    Any idea how much this PR currently slows down the fuzzer? Doing the checks irregularly might be enough either way though – perhaps especially if how irregular it is adjust dynamically (100% for the first x checks, 50% for the next x checks, etc?)

    Hmm, actually, at least as it stands, perhaps addrman checks should be a regtest-only option, rather than a regtest-by-default option? Aren’t those checks always too heavyweight to run when addrman is populated from mainnet/testnet?

  24. jnewbery commented at 9:55 am on January 7, 2021: member

    But more importantly, it’s easy to separate the two questions (should connman own the addrman instance and should debug_addrman be used more often) and judge them independently, so we should.

    The aim of this PR is to add the ability to run internal consistency checks in addrman. If that can be done without adding more unnecessary coupling between components, then we should do that. There’s no good reason that connman should be forwarding constructor parameters through to addrman, so I don’t want to add that.

    This design of both connman and peerman holding a handle to another component is exactly the same as we use for banman, with the added bonus that banman is internally an optional component (although there’s no exposed configuration option to disable it). That design ensures very clean separation between components with no unexpected dependencies creeping in, and no boilerplate forwarding code in connman.

    Changing the constuctor parameters is changing the public interface of addrman, so there are two choices:

    1. keep it in connman and update connman’s ctor interface, adding yet more coupling between components
    2. split out addrman and then add the ctor parameter

    (2) is more work now, but it’s a net removal of boilerplate code (-24 in #20228), and is a better design for the long term.

  25. MarcoFalke referenced this in commit 1999baac30 on Mar 30, 2021
  26. jnewbery force-pushed on Mar 30, 2021
  27. jnewbery commented at 10:54 am on March 30, 2021: member
    Rebased on master. I’m leaving this as draft since I still need to update it to disable addrman consistency checks for certain tests.
  28. jnewbery force-pushed on Jun 2, 2021
  29. vasild commented at 11:43 am on July 16, 2021: member
    Concept ACK
  30. jnewbery force-pushed on Jul 18, 2021
  31. jnewbery force-pushed on Jul 19, 2021
  32. jnewbery marked this as ready for review on Jul 19, 2021
  33. jnewbery commented at 10:49 am on July 19, 2021: member

    Addrman corruption is getting more attention recently, so I’ve rebased on master, with some small changes:

    • the option is now an integer, where setting -checkaddrman=n will cause the consistency checks to be run every n operations (similar to -checkmempool=n).

    • the consistency check ratio is set to 100 for unit tests. On my machine, running the addrman_tests took:

      • with no consistency checks: 0.65s
      • with consistency check ratio set to 100: 0.71s
      • with consistency check ratio set to 1: 7.75s

      Running consistency checks on 1% of operations seems to give good increased coverage without impacting test runtime too much

    • consistency checks are disabled by default for all networks, including regtest. That can be changed in a follow-up PR if people think it should be enabled by default for regtest.

  34. jonatack commented at 11:01 am on July 19, 2021: member
    Concept ACK to facilitate using this check, and as recompiling after defining DEBUG_ADDRMAN is very slow. Perhaps use uint32 rather than int32.
  35. in src/init.cpp:1164 in 9d5575e27c outdated
    1160@@ -1160,7 +1161,8 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    1161     const bool ignores_incoming_txs{args.GetBoolArg("-blocksonly", DEFAULT_BLOCKSONLY)};
    1162 
    1163     assert(!node.addrman);
    1164-    node.addrman = std::make_unique<CAddrMan>();
    1165+    int32_t check_addrman = std::min<int32_t>(std::max<int32_t>(args.GetArg("-checkaddrman", DEFAULT_ADDRMAN_CONSISTENCY_CHECKS), 0), 1000000);
    


    MarcoFalke commented at 11:09 am on July 19, 2021:

    jnewbery commented at 1:05 pm on July 19, 2021:
    Nice. Done.
  36. in src/addrman.h:754 in 9d5575e27c outdated
    738-        const int err = Check_();
    739+
    740+        int err = Check_();
    741         if (err) {
    742             LogPrintf("ADDRMAN CONSISTENCY CHECK FAILED!!! err=%i\n", err);
    743+            assert(false);
    


    MarcoFalke commented at 11:11 am on July 19, 2021:
    Nice. Verbosely complaining to developers about bugs makes them less likely to get missed.

    MarcoFalke commented at 11:27 am on July 19, 2021:
    Unrelated improvement: Maybe just simplify the if foo return -xx; with assert(!foo); in the cpp file?

    vasild commented at 12:03 pm on July 19, 2021:
    0            std::abort();
    

    jnewbery commented at 1:05 pm on July 19, 2021:
    @MarcoFalke I left the cpp file unchanged in case we want to later add some kind of recovery code instead of simply asserting on failure. Changing the if foo return -xx; to assert(!foo) would involve making changes in more places. @vasild - What’s the advantage of using std::abort? I see lots more examples of assert(false) in the code than std::abort().

    vasild commented at 3:04 pm on July 19, 2021:
    assert() calls abort(), so it is the same thing in practice (modulo the “strange” message assertion failed: false). Feel free to ignore or use assert(0) or assert(nullptr).

    ajtowns commented at 5:26 am on July 20, 2021:
    assert(false && "addrman consistency check failed"); perhaps? Or assert(err == 0) or Assume(err == 0)

    MarcoFalke commented at 5:29 am on July 20, 2021:
    #22500 ;)

    jnewbery commented at 9:30 am on July 21, 2021:
    Thanks @MarcoFalke. I’ve added the two commits from that PR into this branch.
  37. MarcoFalke approved
  38. jnewbery force-pushed on Jul 19, 2021
  39. MarcoFalke commented at 2:07 pm on July 19, 2021: member
    review ACK a343f9079c30e1dce6f13852ff84bab7dab3fcde
  40. jamesob commented at 2:12 pm on July 19, 2021: member
    Concept ACK, seems like an obvious win. Will review soon.
  41. in src/addrman.h:488 in a343f9079c outdated
    484@@ -482,7 +485,7 @@ class CAddrMan
    485         mapAddr.clear();
    486     }
    487 
    488-    CAddrMan()
    489+    CAddrMan(int32_t addrman_check=DEFAULT_ADDRMAN_CONSISTENCY_CHECKS) : m_consistency_check(addrman_check)
    


    vasild commented at 3:29 pm on July 19, 2021:
    nit: clang-format, spaces around =

    jnewbery commented at 10:33 am on July 20, 2021:
    done
  42. in src/addrman.h:736 in a343f9079c outdated
    735     {
    736-#ifdef DEBUG_ADDRMAN
    737         AssertLockHeld(cs);
    738-        const int err = Check_();
    739+
    740+        int err = Check_();
    


    vasild commented at 3:31 pm on July 19, 2021:
    Why remove const?

    jnewbery commented at 10:33 am on July 20, 2021:
    added back
  43. in src/init.cpp:505 in a343f9079c outdated
    501@@ -502,6 +502,7 @@ void SetupServerArgs(ArgsManager& argsman)
    502     argsman.AddArg("-checkblocks=<n>", strprintf("How many blocks to check at startup (default: %u, 0 = all)", DEFAULT_CHECKBLOCKS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    503     argsman.AddArg("-checklevel=<n>", strprintf("How thorough the block verification of -checkblocks is: %s (0-4, default: %u)", Join(CHECKLEVEL_DOC, ", "), DEFAULT_CHECKLEVEL), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    504     argsman.AddArg("-checkblockindex", strprintf("Do a consistency check for the block tree, chainstate, and other validation data structures occasionally. (default: %u, regtest: %u)", defaultChainParams->DefaultConsistencyChecks(), regtestChainParams->DefaultConsistencyChecks()), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    505+    argsman.AddArg("-checkaddrman", strprintf("Do consistency checks on the addrman before and after every <n> operations (default: %u)", DEFAULT_ADDRMAN_CONSISTENCY_CHECKS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    


    vasild commented at 3:37 pm on July 19, 2021:
    0    argsman.AddArg("-checkaddrman", strprintf("Do consistency checks on the addrman every <n> operations. Use 0 to disable. (default: %u)", DEFAULT_ADDRMAN_CONSISTENCY_CHECKS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    

    jnewbery commented at 10:33 am on July 20, 2021:
    done
  44. in src/addrman.cpp:468 in a343f9079c outdated
    464 int CAddrMan::Check_()
    465 {
    466     AssertLockHeld(cs);
    467 
    468+    if (m_consistency_check == 0) return 0;
    469+    if (GetRand(m_consistency_check) >= 1) return 0;
    


    vasild commented at 3:58 pm on July 19, 2021:
    This could result in “sometimes fails, sometimes succeeds” tests on CI. A developer should always get the failure if using -checkaddrman=1. If this is undesirable, then if (++counter % m_consistency_check == 0) can be considered to get the checks at deterministic times.

    MarcoFalke commented at 4:09 pm on July 19, 2021:
    The same is true for CTxMemPool::check. At least here we could use insecure_rand, which should be deterministic if initialized deterministically.

    vasild commented at 7:33 am on July 20, 2021:
    Or set g_mock_deterministic_tests to true for all tests :eyes: :bomb:

    jnewbery commented at 8:24 am on July 20, 2021:

    At least here we could use insecure_rand, which should be deterministic if initialized deterministically.

    This has the unfortunate effect that the checks are run on the same operations for every run of the unit tests. Having them picked at random means that eventually they’re run at every operation in the unit tests. Taking entropy from insecure_rand at different points would also invalidate the tests that have hard-coded values based on the determinstic insecure_rand.


    jnewbery commented at 10:35 am on July 20, 2021:
    Leaving this as is for now. The GetRand() call doesn’t seem to result in a performance degradation from my tests: #20233 (comment).
  45. vasild approved
  46. vasild commented at 4:03 pm on July 19, 2021: member

    ACK a343f9079c30e1dce6f13852ff84bab7dab3fcde

    This results in about 8% slowdown of ./src/test/test_bitcoin on my machine (approx 6m10s -> 6m45s).

  47. jnewbery commented at 5:11 pm on July 19, 2021: member

    This results in about 8% slowdown of ./src/test/test_bitcoin on my machine (approx 6m10s -> 6m45s).

    Hmmm, that’s rather worrying. I’d expect the only test for this to have a material impact on would be the addrman tests, which take less than a second to run. Perhaps the calls to GetRand() are expensive enough that even with consistency checks disabled, this has a negative effect on performance.

  48. MarcoFalke commented at 5:20 pm on July 19, 2021: member
    GetRand seems unneeded wasteful anyway (it calls SeedFast on every call). Have you seen #20233 (review) ?
  49. jnewbery commented at 5:30 pm on July 19, 2021: member

    GetRand seems unneeded wasteful anyway (it calls SeedFast on every call). Have you seen #20233 (comment) ?

    I see it. I’ll try this with insecure_rand tomorrow. Thanks for the tip!

  50. DrahtBot added the label Needs rebase on Jul 20, 2021
  51. in src/addrman.h:730 in a343f9079c outdated
    728@@ -723,22 +729,19 @@ class CAddrMan
    729     CAddrInfo SelectTriedCollision_() EXCLUSIVE_LOCKS_REQUIRED(cs);
    730 
    731     //! Consistency check
    732-    void Check()
    


    vasild commented at 7:40 am on July 20, 2021:

    Would it be a good idea to

    1. do the check unconditionally at the end of Unserialize() (as if m_consistency_check == 1) and
    2. if it fails, to treat it as disk corruption - throw std::ios_base::failure instead of abort()?

    jnewbery commented at 8:20 am on July 20, 2021:
    Seems reasonable. This can be done as a follow-up if desired.
  52. jnewbery commented at 10:30 am on July 20, 2021: member

    This results in about 8% slowdown of ./src/test/test_bitcoin on my machine (approx 6m10s -> 6m45s). @vasild how many samples did you take? There’s some variance in the time taken to run the entire test suite, so you’ll need to run it several times to ensure that this isn’t just noise. Here are my results:

    Returning immediately from _Check():

    0→ time `for i in {1..5}; do ./src/test/test_bitcoin; done`
    1
    2[...]
    3
    4real	4m1.543s
    5user	2m28.929s
    6sys	0m25.260s
    

    And calling GetRand(m_consistency_check) in _Check():

    0→ time `for i in {1..5}; do ./src/test/test_bitcoin; done`
    1
    2[...]
    3
    4real	4m0.947s
    5user	2m33.306s
    6sys	0m21.416s
    

    So for me, there’s essentially no difference.

  53. jnewbery force-pushed on Jul 20, 2021
  54. DrahtBot removed the label Needs rebase on Jul 20, 2021
  55. jnewbery commented at 9:31 am on July 21, 2021: member
    Added the two commits from PR #22500, to resolve the various suggestions in this thread: #20233 (review).
  56. jnewbery force-pushed on Jul 21, 2021
  57. jnewbery commented at 9:41 am on July 21, 2021: member
    @MarcoFalke has just told me that he no longer believes that #22500 is the right approach, so I’ve removed those two commits again.
  58. lsilva01 approved
  59. lsilva01 commented at 11:17 pm on July 21, 2021: contributor
    Code Review ACK and Tested ACK https://github.com/bitcoin/bitcoin/pull/20233/commits/b251f68ba7069fdb11cb66c0b6b8acdcce667fc9 on Ubuntu 20.04 using the runtime argument -checkaddrman=100 on mainnet. Unit tests (including --run_test=addrman_tests) have run successfully.
  60. in src/addrman.cpp:468 in b251f68ba7 outdated
    464 {
    465     AssertLockHeld(cs);
    466 
    467+    // Run consistency checks 1 in m_consistency_check_ratio times if enabled
    468+    if (m_consistency_check_ratio == 0) return 0;
    469+    if (GetRand(m_consistency_check_ratio) >= 1) return 0;
    


    sipa commented at 7:27 pm on July 22, 2021:
    Perhaps use insecure_rand.randrange(m_consistency_check_ratio)? Probably orders of magnitude faster…

    MarcoFalke commented at 7:45 pm on July 22, 2021:
    See previous discussion: #20233 (review)

    jnewbery commented at 8:01 pm on July 22, 2021:
    I think I was mistaken about this causing problems for the unit tests. I’ll try again with insecure_rand tomorrow.

    jnewbery commented at 10:01 am on July 23, 2021:
    Done.
  61. in src/test/addrman_tests.cpp:37 in 21112bd5df outdated
    33@@ -34,7 +34,7 @@ class CAddrManTest : public CAddrMan
    34     //! Ensure that bucket placement is always the same for testing purposes.
    35     void MakeDeterministic()
    36     {
    37-        nKey.SetNull();
    38+        nKey = uint256{1};
    


    mzumsande commented at 11:11 pm on July 22, 2021:
    The same MakeDeterministic() also exists in net_tests.cpp and should be changed there too to prevent net_tests/caddrdb_read from failing (though I’m not sure why these tests are there, looks to me like they should be integrated into addrman_tests.cpp or removed if redundant).

    jnewbery commented at 10:03 am on July 23, 2021:

    Oh good spot! There’s also a deterministic addrman in the fuzzers.

    I’ve added a deterministic argument to the CAddrMan ctor (similar to TxRequestTracker) to consolidate these into one place, so that the tests can just ask for a deterministic addrman rather than constructing and addrman and then updating its internals.

    I agree that those addrman tests in net_tests.cpp should be moved to addrman_tests.cpp, but that can be done in another PR.

  62. jnewbery force-pushed on Jul 23, 2021
  63. jnewbery commented at 10:04 am on July 23, 2021: member
    Thanks for the recent review comments @sipa @MarcoFalke @mzumsande. All comments should now be addressed.
  64. in src/addrman.cpp:440 in 6818d5e82f outdated
    464 {
    465     AssertLockHeld(cs);
    466 
    467+    // Run consistency checks 1 in m_consistency_check_ratio times if enabled
    468+    if (m_consistency_check_ratio == 0) return 0;
    469+    if (insecure_rand.randrange(m_consistency_check_ratio) >= 1) return 0;
    


    MarcoFalke commented at 1:52 pm on July 23, 2021:

    This is still true, right?

    This has the unfortunate effect that the checks are run on the same operations for every run of the unit tests. Having them picked at random means that eventually they’re run at every operation in the unit tests.

    From #20233 (review)


    jnewbery commented at 2:54 pm on July 23, 2021:
    Yes, I think this is true, but I’m not sure how big a problem it is. Before this PR we weren’t running the consistency tests at all in the unit tests.
  65. in src/addrman.h:494 in 6818d5e82f outdated
    488@@ -486,9 +489,15 @@ class CAddrMan
    489         mapAddr.clear();
    490     }
    491 
    492-    CAddrMan()
    493+    CAddrMan(bool deterministic = false,
    494+             int32_t consistency_check_ratio = DEFAULT_ADDRMAN_CONSISTENCY_CHECKS)
    495+        : m_consistency_check_ratio(consistency_check_ratio)
    


    vasild commented at 10:14 am on July 26, 2021:

    nit: initialize insecure_rand here:

    0        : insecure_rand(deterministic), m_consistency_check_ratio(consistency_check_ratio)          
    

    and remove insecure_rand = FastRandomContext(true); from below.

    Also, given that Clear() (called from the constructor) contains nKey = insecure_rand.rand256(); maybe also remove nKey = uint256{1}; from below.


    jnewbery commented at 3:49 pm on July 26, 2021:

    I’ve moved the insecure_rand to the initializer list as suggested.

    I’m not going to remove the nKey assignment in this PR. I’d like to remove the Clear() method from CAddrMan in a future PR. The only place it’s used in the product code is if peers.dat deserialization fails, we’ll call Clear(). I think it’s better just to instantiate a new CAddrMan object in that case, since it’s definitely possible to imagine new data being added to CAddrMan and not added to the Clear() method. It wasn’t previously possible to instantiate a new CAddrMan since it was a global initialized before main(), but now that its lifetime is managed to context.node, we can just reset the unique ptr to a new instance.


    vasild commented at 4:35 pm on July 26, 2021:
    Maybe also adding explicit would make sense.

    jnewbery commented at 8:46 am on July 27, 2021:
    Made the ctor explicit.
  66. vasild approved
  67. vasild commented at 12:43 pm on July 26, 2021: member

    ACK 6818d5e82f0a2ce5a7092c5a2b9d0760eca60866

    @vasild how many samples did you take?

    Two from the PR and two with master and the variance seemed low. However when I run this again five times I see the timing wildly varies between 5min and 8min for both PR and master. Btw that is a debug build, not meant to be a performance benchmark but rather how long would it take on the dev’s machines and on CI (which usually run debug builds).

  68. jnewbery force-pushed on Jul 26, 2021
  69. jnewbery commented at 4:00 pm on July 26, 2021: member

    Thanks for the review @vasild! I’ve taken your suggestion of setting insecure_rand in the initializer list.

    I’ve also removed the default args for the CAddrMan constructor, since default args are evil and can hide subtle bugs.

  70. in src/bench/addrman.cpp:85 in 24515fee60 outdated
    81@@ -82,7 +82,7 @@ static void AddrManAdd(benchmark::Bench& bench)
    82 
    83 static void AddrManSelect(benchmark::Bench& bench)
    84 {
    85-    CAddrMan addrman;
    86+   CAddrMan addrman(/* deterministic */ false, /* consistency_checks */ 0);
    


    vasild commented at 4:17 pm on July 26, 2021:
    nit: here and below - indentation of 3 spaces

    jnewbery commented at 8:45 am on July 27, 2021:
    oops. Fixed.
  71. in src/test/fuzz/connman.cpp:28 in 24515fee60 outdated
    24@@ -25,7 +25,7 @@ FUZZ_TARGET_INIT(connman, initialize_connman)
    25 {
    26     FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()};
    27     SetMockTime(ConsumeTime(fuzzed_data_provider));
    28-    CAddrMan addrman;
    29+    CAddrMan addrman(/* deterministic */ false, /* consistency_checks */ 0);
    


    vasild commented at 4:32 pm on July 26, 2021:

    In the previous incarnation of this PR which did not pass consistency_checks explicitly: CAddrMan addrman(/* deterministic */ false) it would have equaled to DEFAULT_ADDRMAN_CONSISTENCY_CHECKS.

    In the latest incarnation (24515fee60) 0 is passed explicitly.

    This contains the assumption that DEFAULT_ADDRMAN_CONSISTENCY_CHECKS is 0 which is true now, but could change in the future.

    I think it is good to keep using DEFAULT_ADDRMAN_CONSISTENCY_CHECKS as default value instead of 0. That is, either call CAddrMan addrman(/* deterministic */ false, DEFAULT_ADDRMAN_CONSISTENCY_CHECKS) in all places or restore the default second argument, which defaults to DEFAULT_ADDRMAN_CONSISTENCY_CHECKS.


    jnewbery commented at 8:49 am on July 27, 2021:

    It’s the other way around. The previous version implicitly assumed that the DEFAULT_ADDRMAN_CONSISTENCY_CHECKS was 0. This line is explicitly saying that we want an addrman without consistency checks.

    If we used DEFAULT_ADDRMAN_CONSISTENCY_CHECKS explicitly in the call sites and that value changes in future (unlikely I think, since we don’t want these debug checks enabled in production), then all call sites would have to be checked to make sure that the change in behavior is what is wanted. For example, we certainly wouldn’t want to enable the consistency checks in benchmarks, since it’d totally skew the measurements.


    vasild commented at 1:28 pm on July 27, 2021:
    I see, makes sense, thanks for the elaboration!
  72. jnewbery deleted a comment on Jul 26, 2021
  73. jnewbery force-pushed on Jul 27, 2021
  74. vasild approved
  75. vasild commented at 1:29 pm on July 27, 2021: member
    ACK 0f959391190d973f09ea8ba5d4f34a87e2164040
  76. vasild commented at 10:54 am on July 30, 2021: member

    Before this PR, it was possible to run all unit and fuzz tests with checks enabled on every step (when compiled with DEBUG_ADDRMAN). One had to recompile, but changing the source code was not necessary.

    With this PR, addrman unit tests are run with checks on every 100th step, other unit tests and fuzz tests have the checks disabled. To change that (to enable the checks or get them more often than once per 100), a modification of the source code is necessary on the following lines:

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/addrman_tests.cpp#L25

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/fuzz/addrman.cpp#L29

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/fuzz/connman.cpp#L28

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/fuzz/data_stream.cpp#L24

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/fuzz/deserialize.cpp#L202

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/net_tests.cpp#L38

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/net_tests.cpp#L122

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/net_tests.cpp#L139

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/net_tests.cpp#L153

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/net_tests.cpp#L169

    https://github.com/bitcoin/bitcoin/blob/0f959391190d973f09ea8ba5d4f34a87e2164040/src/test/util/setup_common.cpp#L196

    What about replacing those hardcoded 0s (no checks) with a constant DEFAULT_ADDRMAN_CONSISTENCY_CHECKS_IN_TESTS that can be easily changed in order to enable the checks? Or maybe have two constants ..._IN_ADDRMAN_TESTS=100 and ..._IN_TESTS=0.

    Or replace the 0s with something like m_args.GetArg("-checkaddrman", 0) so that the checks can be enabled without modifying the source code, e.g.:

    0./src/test/test_bitcoin --run_test="net_tests/*" -- -checkaddrman=5
    
  77. DrahtBot added the label Needs rebase on Aug 2, 2021
  78. jnewbery force-pushed on Aug 2, 2021
  79. jnewbery force-pushed on Aug 2, 2021
  80. jnewbery commented at 1:09 pm on August 2, 2021: member

    Rebased and taken the fix from #22601.

    What about replacing those hardcoded 0s (no checks) with a constant DEFAULT_ADDRMAN_CONSISTENCY_CHECKS_IN_TESTS that can be easily changed in order to enable the checks? Or maybe have two constants …_IN_ADDRMAN_TESTS=100 and …_IN_TESTS=0.

    Or replace the 0s with something like m_args.GetArg("-checkaddrman", 0) so that the checks can be enabled without modifying the source code. @vasild - This can be done in a follow-up PR if desired. This PR has already gone through multiple rounds of review, so I’d like to settle on a final version of it now.

  81. DrahtBot removed the label Needs rebase on Aug 2, 2021
  82. theStack commented at 8:31 pm on August 2, 2021: member

    Concept ACK

    This approach seems to resolve a lot of potential headache caused by the fact that barely anyone ever builds with DEBUG_ADDRMAN (see #21940 (comment) for an example where this caused problems) and we don’t have a CI instance running with it. The only slight drawback I could think of this solution is that the (released) bitcoind binary is a little bit larger compared to the current compile-time flag approach, containing code that the average user will never run. OTOH the fact that with this the tests can trigger regular addrman consistency checks with the default build outweights this by far :)

  83. in src/addrman.h:126 in 7883160288 outdated
    121@@ -119,7 +122,7 @@ class CAddrInfo : public CAddress
    122  *        tried ones) is evicted from it, back to the "new" buckets.
    123  *    * Bucket selection is based on cryptographic hashing, using a randomly-generated 256-bit key, which should not
    124  *      be observable by adversaries.
    125- *    * Several indexes are kept for high performance. Defining DEBUG_ADDRMAN will introduce frequent (and expensive)
    126+ *    * Several indexes are kept for high performance. Setting m_consistency_check_ratio will introduce (expensive)
    127  *      consistency checks for the entire data structure.
    


    jonatack commented at 4:16 pm on August 4, 2021:

    7883160 suggestion

    0- *    * Several indexes are kept for high performance. Setting m_consistency_check_ratio will introduce (expensive)
    1- *      consistency checks for the entire data structure.
    2+ *    * Several indexes are kept for high performance. Setting m_consistency_check_ratio with the -checkaddrman
    3+        configuration option will introduce (expensive) consistency checks for the entire data structure.
    

    jnewbery commented at 3:16 pm on August 5, 2021:
    done
  84. in src/init.cpp:505 in 7883160288 outdated
    500@@ -501,6 +501,7 @@ void SetupServerArgs(ArgsManager& argsman)
    501     argsman.AddArg("-checkblocks=<n>", strprintf("How many blocks to check at startup (default: %u, 0 = all)", DEFAULT_CHECKBLOCKS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    502     argsman.AddArg("-checklevel=<n>", strprintf("How thorough the block verification of -checkblocks is: %s (0-4, default: %u)", Join(CHECKLEVEL_DOC, ", "), DEFAULT_CHECKLEVEL), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    503     argsman.AddArg("-checkblockindex", strprintf("Do a consistency check for the block tree, chainstate, and other validation data structures occasionally. (default: %u, regtest: %u)", defaultChainParams->DefaultConsistencyChecks(), regtestChainParams->DefaultConsistencyChecks()), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    504+    argsman.AddArg("-checkaddrman", strprintf("Do consistency checks on the addrman every <n> operations. Use 0 to disable. (default: %u)", DEFAULT_ADDRMAN_CONSISTENCY_CHECKS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    505     argsman.AddArg("-checkmempool=<n>", strprintf("Run checks every <n> transactions (default: %u, regtest: %u)", defaultChainParams->DefaultConsistencyChecks(), regtestChainParams->DefaultConsistencyChecks()), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    


    jonatack commented at 4:19 pm on August 4, 2021:

    7883160 missng =<n>? (and optional suggestions for the descriptions)

    0-    argsman.AddArg("-checkaddrman", strprintf("Do consistency checks on the addrman every <n> operations. Use 0 to disable. (default: %u)", DEFAULT_ADDRMAN_CONSISTENCY_CHECKS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    1-    argsman.AddArg("-checkmempool=<n>", strprintf("Run checks every <n> transactions (default: %u, regtest: %u)", defaultChainParams->DefaultConsistencyChecks(), regtestChainParams->DefaultConsistencyChecks()), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    2+    argsman.AddArg("-checkaddrman=<n>", strprintf("Run addrman consistency checks every <n> operations. Use 0 to disable. (default: %u)", DEFAULT_ADDRMAN_CONSISTENCY_CHECKS), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    3+    argsman.AddArg("-checkmempool=<n>", strprintf("Run mempool consistency checks every <n> transactions (default: %u, regtest: %u)", defaultChainParams->DefaultConsistencyChecks(), regtestChainParams->DefaultConsistencyChecks()), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
    

    jnewbery commented at 3:17 pm on August 5, 2021:
    It’s the same default for both mainnet and regtest (disabled).

    jnewbery commented at 11:04 am on August 11, 2021:
    Done
  85. in src/test/net_tests.cpp:108 in 7883160288 outdated
    100@@ -105,7 +101,6 @@ BOOST_AUTO_TEST_CASE(cnode_listen_port)
    101 BOOST_AUTO_TEST_CASE(caddrdb_read)
    102 {
    103     CAddrManUncorrupted addrmanUncorrupted;
    104-    addrmanUncorrupted.MakeDeterministic();
    


    jonatack commented at 4:21 pm on August 4, 2021:

    579584d Should the 4 CAddrMan ctors that follow pass deterministic true instead of false?

    0-    CAddrMan addrman1(/* deterministic */ false, /* consistency_checks */ 0);
    1+    CAddrMan addrman1(/* deterministic */ true, /* consistency_checks */ 0);
    

    jnewbery commented at 3:18 pm on August 5, 2021:
    I don’t think so. They weren’t deterministic before.
  86. in src/bench/addrman.cpp:118 in 7883160288 outdated
    116 
    117-    std::vector<CAddrMan> addrmans(bench.epochs() * bench.epochIterations());
    118-    for (auto& addrman : addrmans) {
    119-        FillAddrMan(addrman);
    120+    std::vector<std::unique_ptr<CAddrMan>> addrmans(addrman_count);
    121+    for (size_t i{0}; i < addrman_count; i++) {
    


    jonatack commented at 4:23 pm on August 4, 2021:

    579584d nit, prefer prefix iterator (idem in addrman_tests.cpp)

    0    for (size_t i{0}; i < addrman_count; ++i) {
    

    jnewbery commented at 3:21 pm on August 5, 2021:
    Fixed here. I don’t think I’ve added any i++ in addrman_tests.cpp.
  87. in src/bench/addrman.cpp:119 in 7883160288 outdated
    117-    std::vector<CAddrMan> addrmans(bench.epochs() * bench.epochIterations());
    118-    for (auto& addrman : addrmans) {
    119-        FillAddrMan(addrman);
    120+    std::vector<std::unique_ptr<CAddrMan>> addrmans(addrman_count);
    121+    for (size_t i{0}; i < addrman_count; i++) {
    122+        addrmans[i] = std::make_unique<CAddrMan>(/* deterministic */ false, /* consistency_checks */ 0);
    


    jonatack commented at 4:24 pm on August 4, 2021:
    579584d for benchmarks would it be preferable to pass deterministic true?

    jnewbery commented at 3:22 pm on August 5, 2021:
    Perhaps, but that’s a change in behaviour. Maybe we can do it in a follow up PR.
  88. in src/bench/addrman.cpp:115 in 7883160288 outdated
    111@@ -112,10 +112,12 @@ static void AddrManGood(benchmark::Bench& bench)
    112      * we want to do the same amount of work in every loop iteration. */
    113 
    114     bench.epochs(5).epochIterations(1);
    115+    size_t addrman_count = bench.epochs() * bench.epochIterations();
    


    jonatack commented at 4:24 pm on August 4, 2021:
    579584d nit, can be const

    jnewbery commented at 3:23 pm on August 5, 2021:
    done
  89. in src/addrman.h:708 in 7883160288 outdated
    696@@ -691,6 +697,9 @@ class CAddrMan
    697     //! Holds addrs inserted into tried table that collide with existing entries. Test-before-evict discipline used to resolve these collisions.
    698     std::set<int> m_tried_collisions;
    699 
    700+    /** Perform consistency checks every m_consistency_check_ratio operations (if non-zero). */
    701+    const int32_t m_consistency_check_ratio;
    


    jonatack commented at 4:29 pm on August 4, 2021:

    7883160 negative values would be meaningless and value is clamped in range 0…1000000; could be unsigned

     0--- a/src/addrman.h
     1+++ b/src/addrman.h
     2@@ -27,7 +27,7 @@ 
     3 /** Default for -checkaddrman */
     4 /** Default for -checkaddrman */
     5-static constexpr int32_t DEFAULT_ADDRMAN_CONSISTENCY_CHECKS{0};
     6+static constexpr uint32_t DEFAULT_ADDRMAN_CONSISTENCY_CHECKS{0};
     7 
     8@@ -495,7 +495,7 @@ public: 
     9-    explicit CAddrMan(bool deterministic, int32_t consistency_check_ratio)
    10+    explicit CAddrMan(bool deterministic, uint32_t consistency_check_ratio)
    11
    12@@ -704,7 +704,7 @@ private:
    13     /** Perform consistency checks every m_consistency_check_ratio operations (if non-zero). */
    14-    const int32_t m_consistency_check_ratio;
    15+    const uint32_t m_consistency_check_ratio;
    16
    17--- a/src/init.cpp
    18+++ b/src/init.cpp
    19@@ -1165,7 +1165,7 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    20 
    21     assert(!node.addrman);
    22-    auto check_addrman = std::clamp<int32_t>(args.GetArg("-checkaddrman", DEFAULT_ADDRMAN_CONSISTENCY_CHECKS), 0, 1000000);
    23+    auto check_addrman = std::clamp<uint32_t>(args.GetArg("-checkaddrman", DEFAULT_ADDRMAN_CONSISTENCY_CHECKS), 0, 1000000);
    

    jnewbery commented at 3:24 pm on August 5, 2021:
    Since we’re already clamping the value, I don’t think it matters.
  90. jonatack commented at 4:51 pm on August 4, 2021: member

    ACK 7883160288226ddc9a682e48647efef051a6286e code review, rebased to current master, and at each commit debug-built with CPPFLAGS="-DDEBUG_ADDRMAN" and ran net_tests, addrman_tests, and ./src/bench/bench_bitcoin -filter=AddrManGood. A number of suggestions below for this PR or follow-ups. Happy to re-ACK if any are taken.

    Noting here these follow-ups in the comments that look worthwhile:

  91. jonatack commented at 4:54 pm on August 4, 2021: member
    In 0d2d0de the commit message states “addrman_tests fail when consistency checks are enabled”…I was unable to reproduce this failure in the preceding commits. Is it still the case, should the message be updated, can the commit be dropped?
  92. in src/init.cpp:1164 in 7883160288 outdated
    1159@@ -1159,7 +1160,8 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    1160     const bool ignores_incoming_txs{args.GetBoolArg("-blocksonly", DEFAULT_BLOCKSONLY)};
    1161 
    1162     assert(!node.addrman);
    1163-    node.addrman = std::make_unique<CAddrMan>();
    1164+    auto check_addrman = std::clamp<int32_t>(args.GetArg("-checkaddrman", DEFAULT_ADDRMAN_CONSISTENCY_CHECKS), 0, 1000000);
    1165+    node.addrman = std::make_unique<CAddrMan>(/* determinstic */ false, /* consistency_checks */ check_addrman);
    


    jonatack commented at 4:58 pm on August 4, 2021:

    78831602

    0$ test/lint/lint-spelling.sh 
    1src/init.cpp:1164: determinstic ==> deterministic
    

    jnewbery commented at 3:29 pm on August 5, 2021:
    fixed
  93. in src/test/addrman_tests.cpp:259 in 0d2d0dee48 outdated
    255@@ -256,24 +256,27 @@ BOOST_AUTO_TEST_CASE(addrman_new_collisions)
    256 
    257     CNetAddr source = ResolveIP("252.2.2.2");
    258 
    259-    BOOST_CHECK_EQUAL(addrman.size(), 0U);
    260+    uint32_t addrs{0};
    


    amitiuttarwar commented at 10:39 pm on August 4, 2021:
    nit: the variable name addrs makes me think its a vector since that’s a common naming pattern in the codebase, so I’d prefer if this was something like num_addrs.

    jnewbery commented at 4:07 pm on August 5, 2021:
    done!
  94. in src/test/fuzz/addrman.cpp:29 in 7883160288 outdated
    25@@ -26,7 +26,7 @@ FUZZ_TARGET_INIT(addrman, initialize_addrman)
    26 {
    27     FuzzedDataProvider fuzzed_data_provider(buffer.data(), buffer.size());
    28     SetMockTime(ConsumeTime(fuzzed_data_provider));
    29-    CAddrMan addr_man(/* deterministic */ true);
    30+    CAddrMan addr_man(/* deterministic */ true, /* consistency_checks */ 0);
    


    amitiuttarwar commented at 11:12 pm on August 4, 2021:

    Could it make sense to enable the consistency checks here?

    I saw the conversation here #20233 (comment), where you said:

    I think somewhere that they’re definitely useful would be in fuzz testing, to ensure that there’s no way to violate addrman’s invariants.

    So wondering why you landed on disabling.


    jnewbery commented at 3:29 pm on August 5, 2021:
    Added here and to the other fuzzers.
  95. amitiuttarwar commented at 11:24 pm on August 4, 2021: contributor

    light code review ACK 7883160288

    I think its great that, with this PR, the addrman unit tests would run the consistency checks, and it’s easier for developers to manually run or to potentially incorporate into the functional tests.

  96. Add missing const to CAddrMan::Check_()
    Also: Always compile the function signature to avoid similar issues in
    the future.
    ee458d84fc
  97. DrahtBot added the label Needs rebase on Aug 5, 2021
  98. [addrman] Add deterministic argument to CAddrMan ctor
    Removes the need for tests to update nKey and insecure_rand after constructing
    a CAddrMan.
    fa9710f62c
  99. [tests] Make deterministic addrman use nKey = 1
    addrman_tests fail when consistency checks are enabled, since the tests
    set the deterministic test addrman's nKey value to zero, which is an
    invalid value. Change this so that deterministic addrman's nKey value is
    set to 1.
    
    This requires updating a few tests that are using magic values derived
    from nKey being set to 0.
    10aac24145
  100. jnewbery force-pushed on Aug 5, 2021
  101. jnewbery commented at 4:15 pm on August 5, 2021: member
    Rebased and addressed review comments from @jonatack and @amitiuttarwar
  102. DrahtBot removed the label Needs rebase on Aug 5, 2021
  103. jnewbery commented at 5:17 pm on August 5, 2021: member

    In 0d2d0de the commit message states “addrman_tests fail when consistency checks are enabled”…I was unable to reproduce this failure in the preceding commits. Is it still the case, should the message be updated, can the commit be dropped? @jonatack - I think it’s still true. Try enabling DEBUG_ADDRMAN and running the addrman tests here: https://github.com/jnewbery/bitcoin/tree/2021-08-fail-addrman-tests

    The first commit fixes the build so it’s possible to build with DEBUG_ADDRMAN. The second commit aborts if Check() fails, rather than just logging.

    Without the Make deterministic addrman use nKey = 1 commit, then when we make addrman checks a runtime option and add the assert, then the tests would fail.

  104. jonatack commented at 5:48 pm on August 5, 2021: member
    Thanks @jnewbery, I’ll have a look (once the guix build finishes ⏰)
  105. amitiuttarwar commented at 6:55 pm on August 6, 2021: contributor
    reACK 00fd0891ce
  106. jonatack commented at 10:14 am on August 7, 2021: member

    ACK 00fd0891ce799084fdd137fafc0522c27fbfb429 per git range-diff 03826ae 7883160 00fd089 modulo the addrman fuzzer crash

     0SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior serialize.h:205:93 in 
     1fuzz: ./addrman.h:754: void CAddrMan::Check() const: Assertion `false' failed.
     2==62677== ERROR: libFuzzer: deadly signal
     3    [#0](/bitcoin-bitcoin/0/) 0x55f7ef82d671 in __sanitizer_print_stack_trace (/home/jon/projects/bitcoin/bitcoin/src/test/fuzz/fuzz+0x1000671)
     4    [#1](/bitcoin-bitcoin/1/) 0x55f7ef775008 in fuzzer::PrintStackTrace() fuzzer.o
     5    [#2](/bitcoin-bitcoin/2/) 0x55f7ef758ef3 in fuzzer::Fuzzer::CrashCallback() fuzzer.o
     6    [#3](/bitcoin-bitcoin/3/) 0x7fe5896ab13f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1413f)
     7    [#4](/bitcoin-bitcoin/4/) 0x7fe589317ce0 in __libc_signal_restore_set signal/../sysdeps/unix/sysv/linux/internal-signals.h:86:3
     8    [#5](/bitcoin-bitcoin/5/) 0x7fe589317ce0 in raise signal/../sysdeps/unix/sysv/linux/raise.c:48:3
     9    [#6](/bitcoin-bitcoin/6/) 0x7fe589301536 in abort stdlib/abort.c:79:7
    10    [#7](/bitcoin-bitcoin/7/) 0x7fe58930140e in __assert_fail_base assert/assert.c:92:3
    11    [#8](/bitcoin-bitcoin/8/) 0x7fe589310661 in __assert_fail assert/assert.c:101:3
    12    [#9](/bitcoin-bitcoin/9/) 0x55f7ef86ed17 in CAddrMan::Check() const ./addrman.h:754:13
    13    [#10](/bitcoin-bitcoin/10/) 0x55f7ef866d2c in CAddrMan::Select(bool) const ./addrman.h:601:9
    14    [#11](/bitcoin-bitcoin/11/) 0x55f7ef8638e1 in addrman_fuzz_target(Span<unsigned char const>) test/fuzz/addrman.cpp:306:26
    15    [#12](/bitcoin-bitcoin/12/) 0x55f7ef85b9d5 in void std::__invoke_impl<void, void (*&)(Span<unsigned char const>), Span<unsigned char const> >(std::__invoke_other, void (*&)(Span<unsigned char const>), Span<unsigned char const>&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    16    [#13](/bitcoin-bitcoin/13/) 0x55f7ef85b89d in std::enable_if<is_invocable_r_v<void, void (*&)(Span<unsigned char const>), Span<unsigned char const> >, void>::type std::__invoke_r<void, void (*&)(Span<unsigned char const>), Span<unsigned char const> >(void (*&)(Span<unsigned char const>), Span<unsigned char const>&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:110:2
    17    [#14](/bitcoin-bitcoin/14/) 0x55f7ef85b608 in std::_Function_handler<void (Span<unsigned char const>), void (*)(Span<unsigned char const>)>::_M_invoke(std::_Any_data const&, Span<unsigned char const>&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:291:9
    18    [#15](/bitcoin-bitcoin/15/) 0x55f7f078934d in std::function<void (Span<unsigned char const>)>::operator()(Span<unsigned char const>) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:622:14
    19    [#16](/bitcoin-bitcoin/16/) 0x55f7f0788ff8 in LLVMFuzzerTestOneInput test/fuzz/fuzz.cpp:91:5
    20    [#17](/bitcoin-bitcoin/17/) 0x55f7ef75a783 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) fuzzer.o
    21    [#18](/bitcoin-bitcoin/18/) 0x55f7ef759c8a in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) fuzzer.o
    22    [#19](/bitcoin-bitcoin/19/) 0x55f7ef75c084 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) fuzzer.o
    23    [#20](/bitcoin-bitcoin/20/) 0x55f7ef75c299 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) fuzzer.o
    24    [#21](/bitcoin-bitcoin/21/) 0x55f7ef74bae7 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) fuzzer.o
    25    [#22](/bitcoin-bitcoin/22/) 0x55f7ef7757c2 in main (/home/jon/projects/bitcoin/bitcoin/src/test/fuzz/fuzz+0xf487c2)
    26    [#23](/bitcoin-bitcoin/23/) 0x7fe589302d09 in __libc_start_main csu/../csu/libc-start.c:308:16
    27    [#24](/bitcoin-bitcoin/24/) 0x55f7ef722969 in _start (/home/jon/projects/bitcoin/bitcoin/src/test/fuzz/fuzz+0xef5969)
    28
    29NOTE: libFuzzer has rudimentary signal handlers.
    30      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
    31SUMMARY: libFuzzer: deadly signal
    32MS: 0 ; base unit: 0000000000000000000000000000000000000000
    330x0,0x0,0x0,0x0,0x0,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0xe1,0xdc,0xdc,0xdc,0xdc,0xdc,0xdc,0xdc,0x23,0x23,0x23,0xbb,0x1,0x0,0x0,0x23,0x27,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x40,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x0,0x0,0x0,0x0,0x1,0x0,0x0,0x0,0x0,0x0,0xf,0xda,0x3f,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x23,0x23,0x0,0x0,0x0,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0xff,0xff,0xff,0x1f,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0xf4,0xf4,0xf4,0xf4,0xf4,0xf4,0xf4,0xf4,0xf4,0xb4,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x23,0x23,0x23,0x23,0x23,0x23,0xe1,0xdc,0xdc,0xdc,0xdc,0x32,0xa3,0x1,0x0,0x0,0x0,0x0,0x0,0x0,0x41,0x0,0xff,0x0,0x0,0xff,0x0,0x0,0x2b,
    34\x00\x00\x00\x00\x00#############\xe1\xdc\xdc\xdc\xdc\xdc\xdc\xdc###\xbb\x01\x00\x00#'#############@#################\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x0f\xda?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00##\x00\x00\x00########\xff\xff\xff\x1f####################\xf4\xf4\xf4\xf4\xf4\xf4\xf4\xf4\xf4\xb4\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff######\xe1\xdc\xdc\xdc\xdc2\xa3\x01\x00\x00\x00\x00\x00\x00A\x00\xff\x00\x00\xff\x00\x00+
    35artifact_prefix='./'; Test unit written to ./crash-d20c926b04d9e4e2657d97d3c5e77bde0e9a9daa
    36Base64: AAAAAAAjIyMjIyMjIyMjIyMj4dzc3Nzc3NwjIyO7AQAAIycjIyMjIyMjIyMjIyMjQCMjIyMjIyMjIyMjIyMjIyMjAAAAAAEAAAAAAA/aPwAAAAAAAAAAAAAAIyMAAAAjIyMjIyMjI////x8jIyMjIyMjIyMjIyMjIyMjIyMjI/T09PT09PT09LT///////////////////8jIyMjIyPh3Nzc3DKjAQAAAAAAAEEA/wAA/wAAKw==
    

    Don’t hesitate to reconsider taking the suggestion at #20233 (review).

    Without the Make deterministic addrman use nKey = 1 commit, then when we make addrman checks a runtime option and add the assert, then the tests would fail.

    Got it. You are right–I dropped that commit and the addrman unit tests fail.

  107. jnewbery force-pushed on Aug 11, 2021
  108. jnewbery commented at 11:11 am on August 11, 2021: member

    I’ve disabled the consistency checks again in the addrman fuzz test. I think the problem is that the fuzzer will call Clear(), and subsequent calls to Check() may fail, since Clear() may leave the CAddrMan in a slightly inconsistent state.

    I have a branch that removes Clear() at https://github.com/jnewbery/bitcoin/tree/2021-08-remove-addrman-clear, which I’ll open after this is merged, and then re-enable consistency checks in the fuzzer.

  109. [addrman] Make addrman consistency checks a runtime option
    Currently addrman consistency checks are a compile time option, and are not
    enabled in our CI. It's unlikely anyone is running these consistency checks.
    
    Make them a runtime option instead, where users can enable addrman
    consistency checks every n operations (similar to mempool tests). Update
    the addrman unit tests to do internal consistency checks every 100
    operations (checking on every operations causes the test runtime to
    increase by several seconds).
    
    Also assert on a failed addrman consistency check to terminate program
    execution.
    a4d78546b0
  110. jnewbery force-pushed on Aug 12, 2021
  111. jonatack commented at 5:02 pm on August 12, 2021: member

    ACK a4d78546b0858602c60c03fdf8b35ca666ab2e56 per git diff 00fd089 a4d7854, tested by adding logging similar to #22479 and running with -checkaddrman=<n> for various values 0/1/10/100 etc, tested the updated docs with bitcoind -help-debug | grep -A2 "checkaddrman\|checkmempool" and verified rebased on master that compiling with CPPFLAGS="-DDEBUG_ADDRMAN" no longer causes the build to error.

    0src/addrman.cpp
    1@@ -438,6 +438,7 @@ int CAddrMan::Check_() const
    2     // Run consistency checks 1 in m_consistency_check_ratio times if enabled
    3     if (m_consistency_check_ratio == 0) return 0;
    4     if (insecure_rand.randrange(m_consistency_check_ratio) >= 1) return 0;
    5+    LogPrintf("Check addrman\n");
    
    0$ ./src/bitcoind -help-debug | grep -A2 "checkaddrman\|checkmempool"
    1  -checkaddrman=<n>
    2       Run addrman consistency checks every <n> operations. Use 0 to disable.
    3       (default: 0)
    4--
    5  -checkmempool=<n>
    6       Run mempool consistency checks every <n> transactions. Use 0 to disable.
    7       (default: 0, regtest: 1)
    
  112. mzumsande commented at 9:39 pm on August 12, 2021: member
    Code-review ACK a4d78546b0858602c60c03fdf8b35ca666ab2e56 Also did some testing with different values of -checkaddrman and it worked as expected.
  113. in src/test/addrman_tests.cpp:876 in 10aac24145 outdated
    877+    BOOST_CHECK(addrman.Add(CAddress(addr36, NODE_NONE), source));
    878+    addrman.Good(addr36);
    879 
    880-    BOOST_CHECK(addrman.size() == 23);
    881-    BOOST_CHECK(addrman.SelectTriedCollision().ToString() == "250.1.1.19:0");
    882+    BOOST_CHECK(addrman.size() == 36);
    


    theStack commented at 8:04 am on August 13, 2021:
    0    BOOST_CHECK_EQUAL(addrman.size(), 36);
    
  114. in src/test/addrman_tests.cpp:904 in 10aac24145 outdated
    907-    BOOST_CHECK(!addrman.Add(CAddress(addr23, NODE_NONE), source));
    908-    addrman.Good(addr23);
    909-    BOOST_CHECK(addrman.size() == 33);
    910+    BOOST_CHECK(!addrman.Add(CAddress(addr36, NODE_NONE), source));
    911+    addrman.Good(addr36);
    912+    BOOST_CHECK(addrman.size() == 59);
    


    theStack commented at 8:05 am on August 13, 2021:
    0    BOOST_CHECK_EQUAL(addrman.size(), 59);
    
  115. theStack approved
  116. theStack commented at 8:23 am on August 13, 2021: member

    Code-review ACK a4d78546b0858602c60c03fdf8b35ca666ab2e56

    Agree with vasild’s idea of introducing constants for the consistency_checks CAddrMan ctor param in unit tests, which can be done in a follow-up: #20233 (comment)

    Also left two yocto-nits below regarding unit test assertion style, probably also best picked up in a follow-up.

  117. fanquake commented at 9:02 am on August 13, 2021: member
    The various nits/changes mentioned can be done in a followup. Either separately, or potentially as part of the Clear() branch.
  118. fanquake referenced this in commit 803ef70fd9 on Aug 13, 2021
  119. mzumsande commented at 9:11 am on August 13, 2021: member

    I’ve disabled the consistency checks again in the addrman fuzz test. I think the problem is that the fuzzer will call Clear(), and subsequent calls to Check() may fail, since Clear() may leave the CAddrMan in a slightly inconsistent state.

    I have a branch that removes Clear() at https://github.com/jnewbery/bitcoin/tree/2021-08-remove-addrman-clear, which I’ll open after this is merged, and then re-enable consistency checks in the fuzzer.

    Not sure if this is sufficient. Since #22493, we have a fuzz test that deserializes from random data, and Unserialize() doesn’t seem to be designed to prevent the finer inconsistencies from occurring that Check() checks for. I verified that the fuzz inputs from #22503 and #22519 still crash when running on this PR with consistency_check_ratio=1 (#22504 doesn’t for some reason) `

  120. fanquake closed this on Aug 13, 2021

  121. jnewbery deleted the branch on Aug 13, 2021
  122. sidhujag referenced this in commit 1416e23ca6 on Aug 15, 2021
  123. vasild commented at 12:24 pm on October 27, 2021: member

    … replace the 0s with something like m_args.GetArg("-checkaddrman", 0) so that the checks can be enabled without modifying the source code.

    @vasild - This can be done in a follow-up PR …

    Done in #23373.

  124. MarcoFalke referenced this in commit d0bf9bb6a5 on Jan 17, 2022
  125. fanquake deleted a comment on Mar 31, 2022
  126. Fabcien referenced this in commit ffb9f52db9 on Oct 17, 2022
  127. Fabcien referenced this in commit 470b579bbb on Oct 17, 2022
  128. Fabcien referenced this in commit 88622d9c02 on Oct 17, 2022
  129. DrahtBot locked this on Mar 31, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me