net: Continuous ASMap health check #27581

pull fjahr wants to merge 4 commits into bitcoin:master from fjahr:2023-05-asmap-health-check changing 11 files +98 −14
  1. fjahr commented at 12:16 pm on May 5, 2023: contributor

    There are certain statistics we can collect by running all our known clearnet addresses against the ASMap file. This could show issues with a maliciously manipulated file or with an old file that has decayed with time.

    This is just a proof of concept for now. My idea currently is to run the analysis once per day and print the results to logs if an ASMap file is used.

  2. DrahtBot commented at 12:16 pm on May 5, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK mzumsande, brunoerg, achow101
    Concept ACK pablomartin4btc, virtu

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28690 (build: Introduce internal kernel library by TheCharlatan)
    • #27600 (net: Add new permission forceinbound to evict a random unprotected connection if all slots are otherwise full by pinheadmz)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on May 5, 2023
  4. in src/net_processing.cpp:1972 in ce356cac72 outdated
    1967+        std::transform(v4_addrs.begin(), v4_addrs.end(), std::back_inserter(clearnet_addrs),
    1968+            [](const CAddress& addr) { return static_cast<CNetAddr>(addr); });
    1969+        std::transform(v6_addrs.begin(), v6_addrs.end(), std::back_inserter(clearnet_addrs),
    1970+            [](const CAddress& addr) { return static_cast<CNetAddr>(addr); });
    1971+        m_connman.ASMapHealthCheck(clearnet_addrs);
    1972+    }
    


    dergoegge commented at 1:14 pm on May 5, 2023:
    I think this belongs in net.cpp and shouldn’t use the block height.

    dergoegge commented at 1:15 pm on May 5, 2023:
    (ignore if you were going to change this once out of draft anyway)

    fjahr commented at 1:20 pm on May 5, 2023:

    I wasn’t happy with the location myself but I need more time to think about where it would fit in best, so happy to take suggestions. I just tried to put it somewhere where it requires the least amount of new dependencies and shipped it to get conceptual feedback.

    and shouldn’t use the block height.

    What do you suggest to use instead of block height?


    dergoegge commented at 1:32 pm on May 5, 2023:

    happy to take suggestions.

    I think it would make sense for the opencon thread (ThreadOpenConnections) to handle this.

    What do you suggest to use instead of block height?

    0if (m_last_asmap_health_check + 24h < NodeClock::now()) {
    1    // Do health check
    2}
    

    fjahr commented at 7:50 pm on May 24, 2023:
    Sounds good, I am using this now.
  5. DrahtBot added the label CI failed on May 5, 2023
  6. in src/netgroup.cpp:118 in ce356cac72 outdated
    109@@ -109,3 +110,23 @@ uint32_t NetGroupManager::GetMappedAS(const CNetAddr& address) const
    110     uint32_t mapped_as = Interpret(m_asmap, ip_bits);
    111     return mapped_as;
    112 }
    113+
    114+void NetGroupManager::ASMapHealthCheck(const std::vector<CNetAddr> clearnet_addrs) const {
    115+    std::set<uint32_t> clearnet_asns{};
    116+    int unmapped_count{0};
    117+
    118+    for (auto addr : clearnet_addrs) {
    


    pablomartin4btc commented at 3:03 pm on May 5, 2023:

    I think you need to declare addr as const (tidyCI test is failing due to that).

    0    for (const auto &addr : clearnet_addrs) {
    

    fjahr commented at 7:50 pm on May 24, 2023:
    done
  7. pablomartin4btc commented at 3:03 pm on May 5, 2023: member
    Concept ACK.
  8. fjahr force-pushed on May 24, 2023
  9. fjahr force-pushed on May 24, 2023
  10. fjahr marked this as ready for review on May 24, 2023
  11. fjahr commented at 7:57 pm on May 24, 2023: contributor
    I think this is ready for review. I have included the feedback so far and I think the current extent of the checks and stats is good enough as a first step. I am currently still going through all the possible cases we could see an ASMap file be harmful to a node and this may lead to some additional checks and stats being logged in the future but I need some more time for this and it can be done as a follow-up.
  12. DrahtBot removed the label CI failed on May 25, 2023
  13. virtu commented at 3:16 pm on June 8, 2023: contributor
    Concept ACK.
  14. in src/net.cpp:1648 in 9b1d32e203 outdated
    1641@@ -1642,6 +1642,20 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
    1642     const bool dnsseed = gArgs.GetBoolArg("-dnsseed", DEFAULT_DNSSEED);
    1643     bool add_fixed_seeds = gArgs.GetBoolArg("-fixedseeds", DEFAULT_FIXEDSEEDS);
    1644 
    1645+    // Run the ASMap Health check and log results once every 24h.
    1646+    if (m_netgroupman.UsingASMap() && m_last_asmap_health_check + 24h < NodeClock::now()) {
    1647+        m_last_asmap_health_check = NodeClock::now();
    1648+        const std::vector<CAddress> v4_addrs{GetAddresses(0, 0, Network::NET_IPV4)};
    


    brunoerg commented at 5:06 pm on June 8, 2023:

    nit:

    0        const std::vector<CAddress> v4_addrs{GetAddresses(/*max_addresses=*/0, /*max_pct=*/0, Network::NET_IPV4)};
    

    fjahr commented at 11:46 am on June 12, 2023:
    done

    brunoerg commented at 6:04 pm on June 12, 2023:
    I think you forgot this. (in case you have to touch the code again)

    fjahr commented at 6:10 pm on June 12, 2023:
    Hm, I did do it, not sure why it got lost…
  15. brunoerg commented at 5:15 pm on June 8, 2023: contributor

    Concept ACK, but I’m not conviced about the approach.

    In 9b1d32e203435a62caef7079fe8c49f755b43e3dNet: Add continuous ASMap health check logging”: You’re basically fetching all addresses (ipv4/ipv6) from addrman and doing a sanity check every 24hs. Considering we don’t change asmap file frequently (we need to restart the node to do it), do we really need to do it for all addresses every 24hs?

    Perhaps we could do it when:

    • Node is starting (it happens in this current implementation because of OpenThreadConnection) - this is the only situation where asmap file can be changed
    • Every time we are going to add an address to addrman’s new table?
  16. brunoerg commented at 6:35 pm on June 8, 2023: contributor

    My idea currently is to run the analysis once per day

    Instead of creating m_last_asmap_health_check and putting the logic into ThreadOpenConnection. Why not creating ASMapHealthCheck in CConnman:

     0void CConnman::ASMapHealthCheck()
     1{
     2    const std::vector<CAddress> v4_addrs{GetAddresses(0, 0, Network::NET_IPV4)};
     3    const std::vector<CAddress> v6_addrs{GetAddresses(0, 0, Network::NET_IPV6)};
     4    std::vector<CNetAddr> clearnet_addrs;
     5    clearnet_addrs.reserve(v4_addrs.size() + v6_addrs.size());
     6    std::transform(v4_addrs.begin(), v4_addrs.end(), std::back_inserter(clearnet_addrs),
     7        [](const CAddress& addr) { return static_cast<CNetAddr>(addr); });
     8    std::transform(v6_addrs.begin(), v6_addrs.end(), std::back_inserter(clearnet_addrs),
     9        [](const CAddress& addr) { return static_cast<CNetAddr>(addr); });
    10    m_netgroupman.ASMapHealthCheck(clearnet_addrs);
    11}
    

    and then you could do in CConnman::Start something like:

    0if (m_netgroupman.UsingASMap()) {
    1    scheduler.scheduleFromNow([this] { ASMapHealthCheck(); }, ASMAP_HEALTH_CHECK);
    2}
    

    being ASMAP_HEALTH_CHECK the period, e.g.:

    0static constexpr std::chrono::hours ASMAP_HEALTH_CHECK{24};
    

    Obs: I didn’t test it, just a suggestion that came to my mind.

  17. in src/netgroup.cpp:124 in 9b1d32e203 outdated
    119+        uint32_t asn = GetMappedAS(addr);
    120+        if (asn == 0) {
    121+            ++unmapped_count;
    122+            continue;
    123+        }
    124+        clearnet_asns.insert(GetMappedAS(addr));
    


    brunoerg commented at 7:36 pm on June 8, 2023:

    To avoid calling GetMappedAS again

    0        clearnet_asns.insert(asn);
    

    fjahr commented at 11:45 am on June 12, 2023:
    done
  18. fjahr force-pushed on Jun 12, 2023
  19. fjahr commented at 11:51 am on June 12, 2023: contributor
    @brunoerg Thanks for the review! I like the idea of using the scheduler and I applied that change. However, I am unsure about tying the check to adding new addresses to the new table. It seems arbitrary to put it there, I would also like to run it on removals if we were going this route. From looking at some debug logs from a node I resynced the day before, I see changes in the address tables happening every couple of minutes. So we would still need to track the elapsed time to not run the check more often than necessary. So I think just using the 24h scheduler is cleaner and just as good.
  20. DrahtBot added the label CI failed on Jun 12, 2023
  21. in test/functional/feature_asmap.py:41 in 93808b9b7b outdated
    37@@ -38,8 +38,12 @@ def expected_messages(filename):
    38 
    39 class AsmapTest(BitcoinTestFramework):
    40     def set_test_params(self):
    41-        self.num_nodes = 1
    42-        self.extra_args = [["-checkaddrman=1"]]  # Do addrman checks on all operations.
    43+        self.num_nodes = 2
    


    brunoerg commented at 2:42 pm on June 12, 2023:
    Why did you add one more node? It doesn’t seem to have an usage and removing it doesn’t make the test fails.

    fjahr commented at 5:17 pm on June 12, 2023:
    It was need for the old approach, but not anymore I guess
  22. brunoerg commented at 2:49 pm on June 12, 2023: contributor

    So we would still need to track the elapsed time to not run the check more often than necessary. So I think just using the 24h scheduler is cleaner and just as good.

    I agree.

  23. in test/functional/feature_asmap.py:120 in 93808b9b7b outdated
    114@@ -111,6 +115,15 @@ def test_empty_asmap(self):
    115         self.node.assert_start_raises_init_error(extra_args=['-asmap'], expected_msg=msg)
    116         os.remove(self.default_asmap)
    117 
    118+    def test_asmap_health_check(self):
    119+        self.log.info('Test bitcoind -asmap logs ASMap Health Check with basic stats')
    120+        self.stop_node(0)
    


    brunoerg commented at 2:54 pm on June 12, 2023:
    Since test_asmapp_health_check is called after test_empty_asmap, node0 is not running, so we don’t need to stop it here again.

    fjahr commented at 5:16 pm on June 12, 2023:
    done
  24. in src/netgroup.h:47 in 93808b9b7b outdated
    40@@ -41,6 +41,16 @@ class NetGroupManager {
    41      */
    42     uint32_t GetMappedAS(const CNetAddr& address) const;
    43 
    44+    /**
    45+     *  Analyze and print current health of ASMap based buckets.
    46+     */
    47+    void ASMapHealthCheck(const std::vector<CNetAddr> clearnet_addrs) const;
    


    brunoerg commented at 3:01 pm on June 12, 2023:
    0    void ASMapHealthCheck(const std::vector<CNetAddr>& clearnet_addrs) const;
    

    fjahr commented at 5:16 pm on June 12, 2023:
    done
  25. in src/netgroup.cpp:118 in 93808b9b7b outdated
    109@@ -109,3 +110,23 @@ uint32_t NetGroupManager::GetMappedAS(const CNetAddr& address) const
    110     uint32_t mapped_as = Interpret(m_asmap, ip_bits);
    111     return mapped_as;
    112 }
    113+
    114+void NetGroupManager::ASMapHealthCheck(const std::vector<CNetAddr> clearnet_addrs) const {
    115+    std::set<uint32_t> clearnet_asns{};
    116+    int unmapped_count{0};
    117+
    118+    for (const auto &addr : clearnet_addrs) {
    


    brunoerg commented at 3:01 pm on June 12, 2023:

    nit:

    0    for (const auto& addr : clearnet_addrs) {
    

    fjahr commented at 5:16 pm on June 12, 2023:
    done
  26. fjahr force-pushed on Jun 12, 2023
  27. fjahr force-pushed on Jun 12, 2023
  28. fjahr force-pushed on Jun 12, 2023
  29. fjahr commented at 7:22 pm on June 12, 2023: contributor
    Addressed the feedback and rebased, which seems to have fixed the CI issues finally.
  30. DrahtBot removed the label CI failed on Jun 12, 2023
  31. in src/netgroup.h:45 in b87045da22 outdated
    40@@ -41,6 +41,16 @@ class NetGroupManager {
    41      */
    42     uint32_t GetMappedAS(const CNetAddr& address) const;
    43 
    44+    /**
    45+     *  Analyze and print current health of ASMap based buckets.
    


    brunoerg commented at 5:22 pm on August 10, 2023:

    nit:

    0     *  Analyze and log current health of ASMap based buckets.
    

    fjahr commented at 3:48 pm on August 15, 2023:
    Thanks, I will change this when I have to retouch.

    fjahr commented at 10:29 pm on September 29, 2023:
    done
  32. brunoerg approved
  33. brunoerg commented at 2:45 pm on August 15, 2023: contributor
    crACK b87045da22adec42cba28d1c9e9b910ab6cf5380
  34. in src/netgroup.cpp:127 in b87045da22 outdated
    122+            continue;
    123+        }
    124+        clearnet_asns.insert(asn);
    125+    }
    126+
    127+    LogPrint(BCLog::NET, "ASMap Health Check: %i clearnet peers are mapped to %i ASNs with %i peers being unmapped\n", clearnet_addrs.size(), clearnet_asns.size(), unmapped_count);
    


    mzumsande commented at 6:18 pm on September 13, 2023:
    Why not make this log unconditional, since it’s being run only once every 24h and not remotely triggerable? It’s not practical to run with BCLog::NET for extended periods of time for most users, considering how spammy it is.

    fjahr commented at 10:22 am on September 15, 2023:
    Makes sense to me, I didn’t think about the spammyness.

    fjahr commented at 10:29 pm on September 29, 2023:
    done
  35. in src/net.cpp:3735 in b87045da22 outdated
    2921@@ -2916,6 +2922,19 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& address) const
    2922     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2923 }
    2924 
    2925+void CConnman::ASMapHealthCheck()
    2926+{
    2927+    const std::vector<CAddress> v4_addrs{GetAddresses(/*max_addresses=*/ 0, /*max_pct=*/ 0, Network::NET_IPV4)};
    


    mzumsande commented at 6:20 pm on September 13, 2023:
    Just noting that this means that addresses for which IsTerrible() is true (e.g. not seen in 30 days) will be excluded from the statistics. These addresses are not treated any different when selecting peers for new automatic connections.

    fjahr commented at 10:19 am on September 15, 2023:
    Ah, hm, that doesn’t sound ideal to me. Do you know what the rationale is for excluding them from GetAddr() in all cases? I don’t see that mentioned in GetAddr() header file doc comment and GetAddr_() just says “Filter for quality”. Git blaming my way back to the original point it was added seems tedious at first look due to the many refactors on this code…

    mzumsande commented at 8:57 am on September 19, 2023:
    Mostly historical reasons, GetAddr() was initially only used to answer GETADDR messages, and only later used for reporting purposes such as getnodeaddresses. #27511 has some more discussion on this. I don’t think it is strictly necessary to do anything about it here, but one way to deal with it would be to just add a bool to GetAddr() that lets callers pick whether to filter out Terrible addresses or not, and use it to include all addresses in the ASMap check.

    fjahr commented at 10:30 pm on September 29, 2023:
    I have implemented this per our offline conversation. Keeping this conversation unresolved though so others can chime in if they have a different opinion on this topic.
  36. DrahtBot added the label Needs rebase on Sep 21, 2023
  37. fjahr force-pushed on Sep 29, 2023
  38. fjahr commented at 10:30 pm on September 29, 2023: contributor
    Rebased and addressed feedback from @brunoerg and @mzumsande . After some offline discussion with @mzumsande I have added the possibility to use GetAddr() unfiltered, i.e. with isTerrible() = true addresses. Otherwise, the results are not as expressive as I would have liked but happy to bikeshed this if people have different opinions :)
  39. DrahtBot removed the label Needs rebase on Sep 29, 2023
  40. fjahr force-pushed on Sep 30, 2023
  41. DrahtBot added the label CI failed on Sep 30, 2023
  42. DrahtBot added the label Needs rebase on Oct 3, 2023
  43. fjahr force-pushed on Oct 3, 2023
  44. fjahr commented at 3:49 pm on October 3, 2023: contributor
    Rebased
  45. DrahtBot removed the label Needs rebase on Oct 3, 2023
  46. maflcko commented at 3:43 pm on October 4, 2023: member
    CI times out
  47. net: Optionally include terrible addresses in GetAddr results e16f420547
  48. fuzz: Let fuzzers use filter options in GetAddr/GetAddresses b8843d37ae
  49. test: Add tests for unfiltered GetAddr usage 28d7e55dff
  50. fjahr force-pushed on Oct 4, 2023
  51. fjahr commented at 4:09 pm on October 4, 2023: contributor

    CI times out

    Doesn’t really look related? One failed after 120m, one after 16s and most succeeded. I also don’t see any relation to test changes here. Pushed a new rebase to see if the issues persist…

  52. DrahtBot removed the label CI failed on Oct 4, 2023
  53. in src/net.cpp:3311 in 8f7dfe77b8 outdated
    3304@@ -3305,6 +3305,12 @@ bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
    3305     // Dump network addresses
    3306     scheduler.scheduleEvery([this] { DumpAddresses(); }, DUMP_PEERS_INTERVAL);
    3307 
    3308+    // Run the ASMap Health check once and then schedule it to run every 24h.
    3309+    if (m_netgroupman.UsingASMap()) {
    3310+        ASMapHealthCheck();
    3311+        scheduler.scheduleFromNow([this] { ASMapHealthCheck(); }, ASMAP_HEALTH_CHECK_INTERVAL);
    


    mzumsande commented at 11:54 pm on November 30, 2023:
    This is not continuous, it only schedules it once more after the initial execution. I think that you meant to use scheduleEvery.

    fjahr commented at 9:12 pm on December 2, 2023:
    You are right, I pushed a fix.
  54. net: Add continuous ASMap health check logging 3ea54e5db7
  55. fjahr force-pushed on Dec 2, 2023
  56. mzumsande commented at 6:16 pm on December 4, 2023: contributor
    ACK 3ea54e5db7d53da5afa321e1800c29aa269dd3b3
  57. DrahtBot requested review from pablomartin4btc on Dec 4, 2023
  58. DrahtBot requested review from brunoerg on Dec 4, 2023
  59. brunoerg approved
  60. brunoerg commented at 8:32 pm on December 4, 2023: contributor

    crACK 3ea54e5db7d53da5afa321e1800c29aa269dd3b3

    code lgtm, will test in practice soon!

  61. achow101 commented at 4:15 pm on December 6, 2023: member
    ACK 3ea54e5db7d53da5afa321e1800c29aa269dd3b3
  62. achow101 merged this on Dec 6, 2023
  63. achow101 closed this on Dec 6, 2023

  64. pablomartin4btc approved
  65. pablomartin4btc commented at 4:23 pm on December 6, 2023: member
    ACK 3ea54e5db7d53da5afa321e1800c29aa269dd3b3

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me