p2p: track AddrMan totals by network and table, improve precision of adding fixed seeds #26847

pull mzumsande wants to merge 4 commits into bitcoin:master from mzumsande:202212_addrman_tracknets changing 8 files +182 βˆ’62
  1. mzumsande commented at 9:45 pm on January 8, 2023: contributor

    AddrMan currently doesn’t track the number of its entries by network, it only knows the total number of addresses. This PR makes AddrMan keep track of these numbers, which would be helpful for multiple things:

    1. Allow to specifically add fixed seeds to AddrMan of networks where we don’t have any addresses yet - even if AddrMan as a whole is not empty (partly fixing #26035). This is in particular helpful if the user abruptly changes -onlynet settings (such that addrs that used to be reachable are no longer and vice versa), in which case they currently could get stuck and not find any outbound peers. The second commit of this PR implements this.
    2. (Future work): Add logic for automatic connection management with respect to networks - such as making attempts to have at least one connection to each reachable network as suggested here. This would involve requesting an address from a particular network from AddrMan, and expanding its corresponding function AddrMan::Select() to do this requires internal knowledge of the current number of addresses for each network and table to avoid getting stuck in endless loops.
    3. (Future work): Perhaps display the totals to users. At least I would find this helpful to debug, the existing option (./bitcoin-cli -addrinfo) is rather indirect by doing the aggregation itself in each call, doesn’t distinguish between new and tried, and being based on AddrMan::GetAddr() it’s also subject to a quality filter which we probably don’t want in this spot.
  2. DrahtBot commented at 9:45 pm on January 8, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK naumenkogs, stratospher, vasild, achow101
    Concept ACK amitiuttarwar, brunoerg, sipa

    If your review is incorrectly listed, please react with πŸ‘Ž to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26261 (p2p: cleanup LookupIntern, Lookup and LookupHost by brunoerg)
    • #26114 (net: Make AddrFetch connections to fixed seeds by mzumsande)
    • #25284 (consensus: Remove dependency on ADDRV2_FORMAT by MarcoFalke)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Jan 8, 2023
  4. amitiuttarwar commented at 9:56 pm on January 8, 2023: contributor
    approach ACK. relatively simple patch to ameliorate the issue of address discovery when switching networks
  5. in src/addrman.cpp:1104 in db92c1bb6c outdated
    1104+    for (auto net : map_counts) {
    1105+        cnt_new += net.second.first;
    1106+        cnt_tried += net.second.second;
    1107+    }
    1108+    if (cnt_new != nNew || cnt_tried != nTried) {
    1109+        return -20;
    


    naumenkogs commented at 8:11 am on January 9, 2023:
    Perhaps it’s a good opportunity to start using a named variable? :)

    mzumsande commented at 7:41 pm on January 9, 2023:
    Do you mean getting rid of the return codes and return something like error strings instead? Hmm, not sure if this wouldn’t better be a separate PR.

    naumenkogs commented at 8:29 am on January 10, 2023:
    I was thinking not touching the legacy code, but for the newly added -20 create an enum. And the legacy codes could be replaced later on. This is not very important to me, feel free to resolve if you don’t like the idea.

    mzumsande commented at 10:19 pm on January 11, 2023:
    ok - I think refactoring the codes might make sense, but if so I’d prefer to do this everywhere at once (and in a separate refactoring PR)
  6. in src/addrman.cpp:990 in db92c1bb6c outdated
    985+    }
    986+    if (auto it = map_counts.find(*net); it != map_counts.end()) {
    987+        auto entry = it->second;
    988+        if (!in_new.has_value()) {
    989+            return entry.first + entry.second;
    990+        } else if (in_new == true) {
    


    naumenkogs commented at 8:13 am on January 9, 2023:
    Here and above, maybe use return (in_new == true) ? entry.first : entry.second;?

    mzumsande commented at 7:41 pm on January 9, 2023:
    Changed it to the ternary operator.
  7. in src/addrman.cpp:1104 in db92c1bb6c outdated
    1103+    int cnt_tried{0};
    1104+    for (auto net : map_counts) {
    1105+        cnt_new += net.second.first;
    1106+        cnt_tried += net.second.second;
    1107+    }
    1108+    if (cnt_new != nNew || cnt_tried != nTried) {
    


    naumenkogs commented at 8:15 am on January 9, 2023:
    Given this dependency, should we drop nNew and nTried altogether?

    mzumsande commented at 7:45 pm on January 9, 2023:
    I think that’s not so straightforward - nNew and nTried are part of the serialization after all. Also, I don’t like that then we’d need to build the sum over all networks every time we needed nNew or nTried (as the check code does).

    naumenkogs commented at 8:39 am on January 10, 2023:

    Also, I don’t like that then we’d need to build the sum over all networks every time we needed nNew or nTried (as the check code does).

    Well, I don’t think something being inside the Check is a good justification β€”Β it’s the other way around, Check is needed to make sure those values are sane. Furthermore, the sum over all networks is 5 addition operations, right?

    However, I agree that dropping these values might be not trivial due to serialization. Not sure I’ll find time to prototype that, but maybe someone will :) Otherwise feel free to close this suggestion.


    naumenkogs commented at 8:39 am on January 10, 2023:
    Also, see this :)

    mzumsande commented at 10:23 pm on January 11, 2023:
    I’ve changed the check such that it no longer compares with nNew and nTried but does a recount, see below for more context (of course the relation is still there…).
  8. in src/net.h:976 in 85dc0c9734 outdated
    968@@ -969,6 +969,12 @@ class CConnman
    969     void RecordBytesRecv(uint64_t bytes);
    970     void RecordBytesSent(uint64_t bytes) EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex);
    971 
    972+    /**
    973+     Return networks for which we have no addresses in addrman and therefore
    974+     may require loading fixed seeds.
    975+     */
    976+    std::vector<Network> GetFixedSeedNetworks() const;
    


    naumenkogs commented at 8:20 am on January 9, 2023:
    I wish for which we have no addresses was captured by the function name somehow. Perhaps GetRelevantFixedSeedNetworks?

    mzumsande commented at 7:46 pm on January 9, 2023:
    Renamed it, pretty sure I even used that name in an earlier iteration.
  9. brunoerg commented at 5:05 pm on January 9, 2023: contributor
    Concept ACK
  10. mzumsande force-pushed on Jan 9, 2023
  11. in src/addrman.cpp:1098 in 4b06a304db outdated
    1094@@ -1065,6 +1095,15 @@ int AddrManImpl::CheckAddrman() const
    1095     if (nKey.IsNull())
    1096         return -16;
    1097 
    1098+    int cnt_new{0};
    


    sipa commented at 9:58 pm on January 9, 2023:
    Nit: maybe use size_t for these two variables, as the values being summed are size_t as well. I realize you can’t actually build an addrman with enough entries to exceed int, but I feel it’s better to be consistent.

    mzumsande commented at 8:03 pm on January 10, 2023:
    That would get me a compiler warning comparison of integer expressions of different signedness, since nNew and nTried are int. But I’ll change the sanity check to compare with the actual totals instead of nNew and nTried, as suggested by you below.
  12. in src/addrman.cpp:981 in 4b06a304db outdated
    976+
    977+    if (!net.has_value()) {
    978+        if (!in_new.has_value()) {
    979+            return nNew + nTried;
    980+        } else {
    981+            return (in_new == true) ? nNew : nTried;
    


    sipa commented at 10:03 pm on January 9, 2023:
    Nit: return *in_new ? nNew : nTried; ?

    mzumsande commented at 10:12 pm on January 11, 2023:
    done
  13. in src/addrman.cpp:989 in 4b06a304db outdated
    984+    if (auto it = map_counts.find(*net); it != map_counts.end()) {
    985+        auto entry = it->second;
    986+        if (!in_new.has_value()) {
    987+            return entry.first + entry.second;
    988+        } else {
    989+            return (in_new == true) ? entry.first : entry.second;
    


    sipa commented at 10:03 pm on January 9, 2023:
    Nit: return *in_new ? entry.first : entry.second;?

    mzumsande commented at 10:12 pm on January 11, 2023:
    done
  14. in src/addrman.cpp:1100 in 4b06a304db outdated
    1094@@ -1065,6 +1095,15 @@ int AddrManImpl::CheckAddrman() const
    1095     if (nKey.IsNull())
    1096         return -16;
    1097 
    1098+    int cnt_new{0};
    1099+    int cnt_tried{0};
    1100+    for (auto net : map_counts) {
    


    sipa commented at 10:05 pm on January 9, 2023:
    It’d be good if the sanity check here actually recomputed the totals per-network (by iterating over the actual addrman entries).

    mzumsande commented at 10:17 pm on January 11, 2023:
    I changed this to actually recompute the totals - while doing so I noticed that it is possible that it’s possible that there are “empty” entries (0 new and 0 tried) in m_map_counts if an addr of some network was first added and later removed again from. That’s not a problem in my opinion, but it means that the size of the map is not necessarily reliable. The changed check should work with this, potentially also adding an entry empty to local_counts.
  15. in src/addrman_impl.h:221 in 4b06a304db outdated
    216@@ -215,6 +217,9 @@ class AddrManImpl
    217     /** Reference to the netgroup manager. netgroupman must be constructed before addrman and destructed after. */
    218     const NetGroupManager& m_netgroupman;
    219 
    220+    /** Number of entries in addrman per network and new/tried table. */
    221+    std::unordered_map<Network, std::pair<size_t, size_t>> map_counts GUARDED_BY(cs);
    


    sipa commented at 10:05 pm on January 9, 2023:
    Nit: naming convention? m_map_counts

    mzumsande commented at 10:11 pm on January 11, 2023:
    renamed
  16. sipa commented at 10:07 pm on January 9, 2023: member
    Concept/approach ACK
  17. mzumsande force-pushed on Jan 11, 2023
  18. in src/addrman.h:106 in 03a094324a outdated
    101@@ -102,6 +102,13 @@ class AddrMan
    102     //! Return the number of (unique) addresses in all tables.
    103     size_t size() const;
    104 
    105+    /**
    106+    * Return the number of (unique) addresses, optionally filtered by network
    


    naumenkogs commented at 8:09 am on January 12, 2023:
    What do you mean by unique here? How they could not be unique?

    mzumsande commented at 3:24 pm on January 12, 2023:
    I copied it from the line above - it means that addresses with multiple occurrences (they can be in up to 8 different spots in the new tables) are counted only once.
  19. in src/addrman.cpp:1107 in 03a094324a outdated
    1102+    // doesn't have if addrs from a network were being added and then removed again in the past.
    1103+    if (m_map_counts.size() < local_counts.size()) {
    1104+        return -20;
    1105+    }
    1106+    for (const auto& [net, count] : m_map_counts) {
    1107+        if (local_counts[net].first != count.first || local_counts[net].second != count.second) {
    


    naumenkogs commented at 8:20 am on January 12, 2023:
    This is probably too crazy, but I was thinking what if these values overflow (from ++) and are not equal anymore due to UB? I’m actually not sure what’s our attitude towards such risks across the code.

    mzumsande commented at 4:05 pm on January 12, 2023:
    Yes, It definitely shouldn’t be possible and would mean that there is a serious bug in addrman, but I think that within the Check() code, we definitely want to abort. After all, Check() doesn’t normally run in production (DEFAULT_ADDRMAN_CONSISTENCY_CHECKS == 0) and needs to be activated by the user.
  20. in src/addrman.cpp:973 in c75ca24338 outdated
    969@@ -962,6 +970,28 @@ std::optional<AddressPosition> AddrManImpl::FindAddressEntry_(const CAddress& ad
    970     }
    971 }
    972 
    973+size_t AddrManImpl::size_(std::optional<Network> net, std::optional<bool> in_new) const
    


    vasild commented at 2:12 pm on January 13, 2023:
    nit: should be Size_() according to doc/developer-notes.md (and the wrapper Size()). I know there is already an existent size() which you probably wanted to be consistent with. Maybe rename that to match the coding style?

    mzumsande commented at 10:26 pm on January 16, 2023:
    Yes, I had used lower case for consistency. I now changed it to upper case, and added a commit that removes the old size() to always use the new Size() as suggested by Amiti. I made that an extra commit because it involved renaming a lot of existing call sites.
  21. in src/addrman.cpp:1019 in c75ca24338 outdated
    1015@@ -986,6 +1016,7 @@ int AddrManImpl::CheckAddrman() const
    1016 
    1017     std::unordered_set<int> setTried;
    1018     std::unordered_map<int, int> mapNew;
    1019+    std::unordered_map<Network, std::pair<size_t, size_t>> local_counts;
    


    vasild commented at 2:16 pm on January 13, 2023:

    nit: Feel free to ignore. It is ok as it is, but I find code like the following a bit unreadable:

    0x = it->second;
    1x.first = 5;
    2x.second = 6;
    3// or it->second.first = 000;
    

    Consider:

    0struct NewTriedCount {
    1    size_t n_new;
    2    size_t n_tried;
    3};
    4std::unordered_map<Network, NewTriedCount> local_counts;
    

    Just to give more meaningful names to those obscure first and second.


    amitiuttarwar commented at 11:14 pm on January 13, 2023:

    +1 to named struct, esp since both values are counts. when intending to increment tried count, it’s easier to accidentally increment “x.first” than it would be “x.new”. during review I was regularly reminding myself “first = new & second = tried”, and named vars would completely remove that mental overhead.

    clarification: I realized this comment is referring to the code in CheckAddrman, I was suggesting the named struct would be used for the member m_map_counts. @vasild were you thinking the same, or just thinking of updating here?


    vasild commented at 1:46 pm on January 14, 2023:
    m_map_counts too. The less first/second the better.

    mzumsande commented at 7:40 pm on January 16, 2023:
    I like the suggestion and changed it to use the struct in both spots.
  22. in src/addrman.cpp:1120 in c75ca24338 outdated
    1117@@ -1074,6 +1118,15 @@ size_t AddrManImpl::size() const
    1118     return vRandom.size();
    1119 }
    1120 
    1121+size_t AddrManImpl::size(std::optional<Network> net, std::optional<bool> in_new) const
    1122+{
    1123+    LOCK(cs);
    1124+    Check();
    1125+    auto ret = size_(net, in_new);
    1126+    Check();
    


    vasild commented at 2:18 pm on January 13, 2023:
    Is Check() needed before and after, given that size_() is a const method?

    mzumsande commented at 7:48 pm on January 16, 2023:
    It’s not needed, but this pattern is also used for other const methods (GetAddr(), Select()), plus we have a few mutables in AddrMan - since Check() is only run if the user enables it, I guess it’s better to invoke it once too often rather than not?
  23. in src/test/addrman_tests.cpp:1009 in c75ca24338 outdated
    1004+    // add two ipv4 addresses, one to tried and new
    1005+    CAddress addr1{CAddress(ResolveService("250.1.1.1", 8333), NODE_NONE)};
    1006+    BOOST_CHECK(addrman->Add({addr1}, source));
    1007+    BOOST_CHECK(addrman->Good(addr1));
    1008+    CAddress addr2{CAddress(ResolveService("250.1.1.1", 8333), NODE_NONE)};
    1009+    BOOST_CHECK(addrman->Add({CAddress(ResolveService("250.1.1.2", 8333), NODE_NONE)}, source));
    


    vasild commented at 2:28 pm on January 13, 2023:

    addr2 is unused. Probably you meant:

    0    CAddress addr2{CAddress(ResolveService("250.1.1.2", 8333), NODE_NONE)};
    1    BOOST_CHECK(addrman->Add({addr2}, source));
    

    addr1, addr2 and addr3 below can be const.

  24. in src/test/addrman_tests.cpp:1005 in c75ca24338 outdated
    1000+    BOOST_CHECK_EQUAL(addrman->size(/*net=*/NET_IPV4, /*in_new=*/std::nullopt), 0U);
    1001+    BOOST_CHECK_EQUAL(addrman->size(/*net=*/std::nullopt, /*in_new=*/true), 0U);
    1002+    BOOST_CHECK_EQUAL(addrman->size(/*net=*/NET_IPV4, /*in_new=*/false), 0U);
    1003+
    1004+    // add two ipv4 addresses, one to tried and new
    1005+    CAddress addr1{CAddress(ResolveService("250.1.1.1", 8333), NODE_NONE)};
    


    vasild commented at 2:33 pm on January 13, 2023:
    nit: CAddress addr1{CAddress(...)}; is equivalent to: CAddress addr1{...};
  25. in src/net.cpp:1669 in c75ca24338 outdated
    1669-                // manual intervention will be needed (either delete peers.dat after
    1670-                // configuration change or manually add some reachable peer using addnode),
    1671-                // see <https://github.com/bitcoin/bitcoin/issues/26035> for details.
    1672+                // In case previously unreachable networks become reachable
    1673+                // (e.g. in case of -onlynet changes by the user), fixed seeds will
    1674+                // be loaded only for networks for which no we have no addressses.
    


    vasild commented at 3:05 pm on January 13, 2023:

    nit:

    0                // be loaded only for networks for which we have no addressses.
    
  26. in src/net.cpp:1671 in c75ca24338 outdated
    1673+                // (e.g. in case of -onlynet changes by the user), fixed seeds will
    1674+                // be loaded only for networks for which no we have no addressses.
    1675                 seed_addrs.erase(std::remove_if(seed_addrs.begin(), seed_addrs.end(),
    1676-                                               [](const CAddress& addr) { return !IsReachable(addr); }),
    1677-                                seed_addrs.end());
    1678+                                                [fixed_seed_networks](const CAddress& addr) { return std::find(fixed_seed_networks.begin(), fixed_seed_networks.end(), addr.GetNetwork()) == fixed_seed_networks.end(); }),
    


    vasild commented at 3:10 pm on January 13, 2023:

    A vector is usually not cheap to copy. Better capture it by reference:

    0                                                [&fixed_seed_networks](const CAddress& addr) { return std::find(fixed_seed_networks.begin(), fixed_seed_networks.end(), addr.GetNetwork()) == fixed_seed_networks.end(); }),
    

    vasild commented at 3:15 pm on January 13, 2023:

    This will iterate fixed_seed_networks for every seed address: O(num_addresses * num_networks). I know the list of networks is tiny and this can probably be ignored wrt performance, but it can easily be made O(num_addresses). Also, if std::unordered_set is used, the code will be shorter:

    0std::find(fixed_seed_networks.begin(), fixed_seed_networks.end(), addr.GetNetwork()) == fixed_seed_networks.end();
    

    would become:

    0fixed_seed_networks.count(addr.GetNetwork()) == 0;
    

    mzumsande commented at 9:51 pm on January 16, 2023:
    changed it to unordered_set as suggested, thanks!
  27. in src/net.cpp:1589 in c75ca24338 outdated
    1584+        if (IsReachable(net) && addrman.size(net, std::nullopt) == 0) {
    1585+            networks.push_back(net);
    1586+        }
    1587+    }
    1588+    return networks;
    1589+}
    


    vasild commented at 3:32 pm on January 13, 2023:
    This does not need to be a method of CConnman because it does not access any of its members (other than addrman). Would it be more appropriate to have it as AddrMan method? Then the public interface of AddrMan need not be changed by this PR. Edit: I mean no need to add this AddrMan method size(std::optional<Network> net, std::optional<bool> in_new).

    mzumsande commented at 9:17 pm on January 16, 2023:
    I’m not sure if this belongs into AddrMan because of the IsReachable - I think the information which networks are reachable belongs into net, and possibly at some later point into CConnman (I know vfLimited is currently a global, but this might change). I don’t think it makes sense that AddrMan should know or deal with this information, so I think I prefer leaving this out of AddrMan and reducing the change of the public interface by combining the size functions as suggested by Amiti below.

    vasild commented at 12:58 pm on January 30, 2023:
    What about a standalone function?

    mzumsande commented at 3:32 pm on February 1, 2023:
    I’d be ok with that, but then we’d have to pass it addrman, which is a member of CConnman, so I’m not sure why that is better.
  28. vasild commented at 3:34 pm on January 13, 2023: contributor
    Concept ACK
  29. in src/addrman.cpp:602 in 03a094324a outdated
    598@@ -592,6 +599,7 @@ bool AddrManImpl::AddSingle(const CAddress& addr, const CNetAddr& source, std::c
    599         pinfo = Create(addr, source, &nId);
    600         pinfo->nTime = std::max(NodeSeconds{0s}, pinfo->nTime - time_penalty);
    601         nNew++;
    602+        m_map_counts[pinfo->GetNetwork()].first++;
    


    amitiuttarwar commented at 10:51 pm on January 13, 2023:

    any reason you opted to add this here vs within the Create function?

    my guess is to keep it close to nNew++, but to imo it’d be better to move the nNew logic into Create instead of adding more external dependencies. it’s nbd because Create is only called from here, but imo would better match expectations / reduce room for accidental divergence in the future


    mzumsande commented at 9:01 pm on January 16, 2023:
    Yes, I wanted to keep these together. I will add a commit that moves both of these lines into Create.
  30. in src/addrman.h:108 in 03a094324a outdated
    101@@ -102,6 +102,13 @@ class AddrMan
    102     //! Return the number of (unique) addresses in all tables.
    103     size_t size() const;
    104 
    105+    /**
    106+    * Return the number of (unique) addresses, optionally filtered by network
    107+    * and/or addrman table (new if true, tried if false).
    108+    * If any of the optional args are nullopt, don't filter with respect to it.
    109+    */
    


    amitiuttarwar commented at 0:41 am on January 14, 2023:

    I think using doxygen guidelines makes this comment a bit easier to parse

    0/**
    1* Return size information about addrman.
    2*
    3* [@param](/bitcoin-bitcoin/contributor/param/)[in] net 	 		Select addresses only from specified network (nullopt = all)
    4* [@param](/bitcoin-bitcoin/contributor/param/)[in] in_new 		Select addresses only from one table (true = new, false = tried, nullopt = both)
    5*
    6* [@return](/bitcoin-bitcoin/contributor/return/) 					Number of unique addresses that match specified options.
    7*/
    

    mzumsande commented at 9:00 pm on January 16, 2023:
    done
  31. in src/addrman.cpp:991 in 03a094324a outdated
    986+        if (!in_new.has_value()) {
    987+            return entry.first + entry.second;
    988+        } else {
    989+            return *in_new ? entry.first : entry.second;
    990+        }
    991+    }
    


    amitiuttarwar commented at 1:08 am on January 14, 2023:

    style suggestion to consider. mainly, reversing the logic blocks in the if statements to reduce the ! indirections of nested logic.

     0    if (!net) {
     1        if (in_new.has_value()) {
     2            return *in_new ? nNew : nTried;
     3        } else {
     4            return nNew + nTried;
     5        }
     6    }
     7
     8    if (auto it = m_map_counts.find(*net); it != m_map_counts.end()) {
     9        auto entry = it->second;
    10        if (in_new.has_value()) {
    11            return *in_new ? entry.first : entry.second;
    12        } else {
    13            return entry.first + entry.second;
    14        }
    15    }
    

    amitiuttarwar commented at 1:09 am on January 14, 2023:
    also just noting that you use nNew + nTried to represent size of unique addresses in addrman. this breaks from the convention of using vRandom.size(), but makes sense to me.

    mzumsande commented at 9:04 pm on January 16, 2023:

    I took your suggestion wrt to the order.

    Yes, these need to be identical, but I actually changed it to use the old vRandom.size() used in size() so that there isn’t a change from the status quo when the two size() functions are merged as you suggest below.

  32. in src/addrman_impl.h:221 in 03a094324a outdated
    216@@ -215,6 +217,9 @@ class AddrManImpl
    217     /** Reference to the netgroup manager. netgroupman must be constructed before addrman and destructed after. */
    218     const NetGroupManager& m_netgroupman;
    219 
    220+    /** Number of entries in addrman per network and new/tried table. */
    221+    std::unordered_map<Network, std::pair<size_t, size_t>> m_map_counts GUARDED_BY(cs);
    


    amitiuttarwar commented at 1:18 am on January 14, 2023:
    naming: including the word map in the variable name seems like a hungarian notation thing. I believe we’re moving away from this convention in the codebase? what do you think about something like m_network_counts?

    mzumsande commented at 7:42 pm on January 16, 2023:
    done
  33. in src/net.h:976 in c75ca24338 outdated
    968@@ -969,6 +969,12 @@ class CConnman
    969     void RecordBytesRecv(uint64_t bytes);
    970     void RecordBytesSent(uint64_t bytes) EXCLUSIVE_LOCKS_REQUIRED(!m_total_bytes_sent_mutex);
    971 
    972+    /**
    973+     Return networks for which we have no addresses in addrman and therefore
    974+     may require loading fixed seeds.
    975+     */
    976+    std::vector<Network> GetRelevantFixedSeedNetworks() const;
    


    amitiuttarwar commented at 2:31 am on January 14, 2023:
    naming: “fixed seeds” seems odd to put in the function name because that’s the caller - doesn’t really have to do with the logic of the function. I think the functionality would better be captured with something like GetReachableEmptyNetworks()

    mzumsande commented at 7:51 pm on January 16, 2023:
    renamed, thanks
  34. amitiuttarwar commented at 2:46 am on January 14, 2023: contributor

    code looks good. I went through all the addrman touch points to the new & tried tables to double check nothing was missed. tried my best to think through edge cases of what could go wrong and don’t have anything :)

    left a bunch of review comments that are all style related, no blockers.

    another style related suggestion: what do you think about consolidating the two size() functions? adding default params to the newly introduced function means you can remove the old one & none* of the call sites need to change. I tried it out here. [*] none of the bitcoind callers. one test call site failed because of the shift from vRandom.size to nNew + nTried. the test is checking a corrupted addrman, and I don’t think the size is testing anything meaningful there, so I removed that check.

  35. vasild commented at 1:56 pm on January 14, 2023: contributor

    … of what could go wrong and don’t have anything …

    Oh, well.

    When nothing can possibly go wrong, it will.

    :)

    I guess nNew and m_map_count[].first could go out of sync in the future. It is good that now they are always updated next to each other (or within 1-2 lines). Further enforcing of this would be to add a new method that modifies both and only change them through this method. E.g.

    0void IncNew(ssize_t delta, Network net)
    1{
    2    nNew += delta;
    3    m_map_count[net].first += delta;
    4}
    

    Or ditch nNew and sum m_map_count[].first whenever we would read nNew, via a new method like size_t CountNew() const? Performance considerations?

  36. amitiuttarwar commented at 8:50 pm on January 14, 2023: contributor

    ah, just to clarify, I meant I couldn’t identify anything that could go wrong in the current version of the code. ofc things can always go awry in the future, which is why code style is useful to help reduce the chances of that happening :)

    I guess nNew and m_map_count[].first could go out of sync in the future. It is good that now they are always updated next to each other (or within 1-2 lines). Further enforcing of this would be to add a new method that modifies both and only change them through this method.

    agreed that the counts of nNew (and also nTried) could go out of sync with the totals in m_map_count. it can also be problematic if these values are in sync with each other, but out of sync with the actual values stored on the new / tried tables & other addrman structures.

    the code added to CheckAddrman are helpful because they offer a programatic testing against this inadvertently changing in the future. I don’t think having specific “accessor” methods makes the most sense because we already have functions that offer logical coupling of creating & removing addrs on addrman*. also, it would require 4 additional functions (increment & decrement / new & tried).

    [*] for eg. other than unserialization, all additions to the new table go through Addrman::Add, which calls AddSingle which uses Create as a helper to construct the AddrInfo object & start persisting to internal data structs. my comment here suggested moving incrementing both counts inside the Create function.

    Or ditch nNew and sum m_map_count[].first whenever we would read nNew, via a new method like size_t CountNew() const? Performance considerations?

    this one I’m more open to. martin pointed out some difficulties in that approach here, and I personally think its nbd either way.

  37. mzumsande force-pushed on Jan 16, 2023
  38. mzumsande force-pushed on Jan 16, 2023
  39. mzumsande commented at 11:00 pm on January 16, 2023: contributor

    Thanks a lot for the reviews! I think I have addressed all comments and implemented most of the suggestions - let me know if I missed something!

    another style related suggestion: what do you think about consolidating the two size() functions? adding default params to the newly introduced function means you can remove the old one & none* of the call sites need to change. I tried it out here. [*] none of the bitcoind callers. one test call site failed because of the shift from vRandom.size to nNew + nTried. the test is checking a corrupted addrman, and I don’t think the size is testing anything meaningful there, so I removed that check.

    I did that, and combined it with @vasild’s suggestion to use upper case notation Size() as per the developer notes. This involved a lot of renaming of existing call sites, so I did it in an extra commit at the end. Since I switched back to use vRandom.size to keep existing behavior, the corrupted AddrMan test doesn’t need to be changed anymore.

    Or ditch nNew and sum m_map_count[].first whenever we would read nNew, via a new method like size_t CountNew() const? Performance considerations?

    this one I’m more open to. martin pointed out some difficulties in that approach here, and I personally think its nbd either way.

    Yes, since it’s part of the serialization we can’t completely do without it. Of course, we could decide to just use nNew and nTried locally in Unserialize(), and rebuild it from the sum from m_network_counts everywhere else. I would suspect that there is no major difference performance-wise because the maximum number of networks is very small - I could try to benchmark it.

  40. mzumsande force-pushed on Jan 16, 2023
  41. in src/addrman.cpp:985 in ce95448e42 outdated
    980+        } else {
    981+            return vRandom.size();
    982+        }
    983+    }
    984+    if (auto it = m_network_counts.find(*net); it != m_network_counts.end()) {
    985+        auto entry = it->second;
    


    naumenkogs commented at 9:10 am on January 17, 2023:
    nit: I’d rename this into network_count. entry made me think it’s about one particular addr

    mzumsande commented at 11:09 pm on January 26, 2023:
    renamed to net_count for brevity
  42. in src/addrman.cpp:1107 in ce95448e42 outdated
    1102+    // doesn't have if addrs from a network were being added and then removed again in the past.
    1103+    if (m_network_counts.size() < local_counts.size()) {
    1104+        return -20;
    1105+    }
    1106+    for (const auto& [net, count] : m_network_counts) {
    1107+        if (local_counts[net].n_new != count.n_new || local_counts[net].n_tried != count.n_tried) {
    


    naumenkogs commented at 9:34 am on January 17, 2023:
    local_count[net] could be called on the non-existent key, but that’s fine and will return 0, right?

    naumenkogs commented at 9:39 am on January 17, 2023:
    perhaps a test for this case would be great…

    mzumsande commented at 6:39 pm on January 18, 2023:

    local_count[net] could be called on the non-existent key, but that’s fine and will return 0, right?

    yes, that’s correct - it will create a entry in local_counts with 0, and then return 0. I think that’s actually already happening in some unit test I think - but I will try to add a more explicit test soon!


    naumenkogs commented at 11:26 am on January 19, 2023:
    Oh, I should have tried to break it and see whether there is a such test. Explicit is better of course.

    mzumsande commented at 11:08 pm on January 26, 2023:

    For this to happen, we need to have an address in AddrMan for a network at one point in time, which later gets removed, so that we have an entry in m_network_counts that has only zeroes, but no entry in local_counts. This happens in the addrman_tests/remove_invalid unit test which tampers with the serialized stream, so that an address gets added and then removed during Deserialization - if you break CheckAddrman() by adding an assert at this spot, this subtest will fail.

    I think it could also happen theoretically if the last remaining address from network A is being kicked out by an address from network B due to a collision in the new tables. But this doesn’t seem trivial to write a unit test for, since addresses from different networks are typically from different groups, so I’m not sure if it’s easily possible to write an explicit test for this.

  43. addrman: add function to return size by network and table
    For now, the new functionality will be used in the context of
    querying fixed seeds. Other possible applications for
    future changes is the use in the context of making automatic
    connections to specific networks, or making more detailed info
    about addrman accessible via rpc.
    d35595a78a
  44. net: Load fixed seeds from reachable networks for which we don't have addresses
    Previously, we'd only load fixed seeds if we'd not
    know any addresses at all. This change makes it possible
    to change -onlynet abruptly, e.g. from -onlynet=onion to
    -onlynet=i2p and still find peers.
    c77c877a8e
  45. addrman, refactor: move count increment into Create()
    Create() is only called in one spot, so this doesn't
    change behavior.
    
    Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
    4885d6f197
  46. addrman, refactor: combine two size functions
    The functionality of the old size() is covered by the new Size()
    when no arguments are specified, so this does not change behavior.
    
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    80f39c99ef
  47. mzumsande force-pushed on Jan 26, 2023
  48. naumenkogs commented at 7:23 am on January 27, 2023: member
    utACK 80f39c9
  49. fanquake requested review from amitiuttarwar on Jan 27, 2023
  50. fanquake removed review request from amitiuttarwar on Jan 27, 2023
  51. fanquake requested review from ajtowns on Jan 27, 2023
  52. stratospher commented at 10:00 pm on January 27, 2023: contributor
    ACK 80f39c9
  53. in src/addrman.h:109 in 80f39c99ef
    106+    *
    107+    * @param[in] net              Select addresses only from specified network (nullopt = all)
    108+    * @param[in] in_new           Select addresses only from one table (true = new, false = tried, nullopt = both)
    109+    * @return                     Number of unique addresses that match specified options.
    110+    */
    111+    size_t Size(std::optional<Network> net = {}, std::optional<bool> in_new = {}) const;
    


    vasild commented at 1:44 pm on January 30, 2023:

    nit, if you re-touch: usually std::nullopt is used instead of {} and I think it is more readable, e.g.:

    https://github.com/bitcoin/bitcoin/blob/79e18ebc81f112f989ff50fb6f5a6a17db9b6758/src/streams.h#L236


    mzumsande commented at 6:05 pm on January 31, 2023:
    will do if I need to re-touch for another reason.

    mzumsande commented at 3:42 pm on February 1, 2023:
    done in #27015
  54. vasild approved
  55. vasild commented at 1:46 pm on January 30, 2023: contributor
    ACK 80f39c99ef2d30e3e2d8dbc068d25cf92aa32344
  56. achow101 commented at 8:55 pm on January 31, 2023: member
    ACK 80f39c99ef2d30e3e2d8dbc068d25cf92aa32344
  57. achow101 merged this on Jan 31, 2023
  58. achow101 closed this on Jan 31, 2023

  59. sidhujag referenced this in commit 6df4c23cbc on Feb 1, 2023
  60. mzumsande commented at 4:48 am on February 1, 2023: contributor
    I just noticed that this can lead to intermittent failures of the load_addrman_corrupted unit test, while checking the size in https://github.com/bitcoin/bitcoin/blob/ba3d32715f985314eb1fdb006bfc4127f8d516a7/src/test/addrman_tests.cpp#L951 the internal checks fail (that weren’t executed for Size() before this PR, and now are executed in some runs). I think it might be best to just remove this line from the test, since this is a situation where we purposefully load a corrupted peers.dat - seems a good thing that the Addrman Check fails then… I will open a PR tomorrow.
  61. fanquake referenced this in commit 2d5acc901d on Feb 1, 2023
  62. sidhujag referenced this in commit 4732901cd5 on Feb 1, 2023
  63. mzumsande deleted the branch on Oct 13, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 19:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me