Add unit tests for eviction logic #19966

issue practicalswift openend this issue on September 17, 2020
  1. practicalswift commented at 5:13 am on September 17, 2020: contributor

    The eviction logic is one of the most important parts in our code base: a bug in the eviction logic could allow for attacker triggered network partitioning.

    Unfortunately the eviction logic is currently not tested properly.

    The unit tests and fuzz tests don’t cover the eviction logic at all.

    The functional tests cover some parts of the eviction logic but the coverage is spotty/incomplete.

    Unfortunately CConnman::AttemptToEvictConnection is currently somewhat tricky to test.

    To solve this I suggest simplifying CConnman::AttemptToEvictConnection() by introducing a new function …

    0[[nodiscard]] Optional<NodeEvictionCandidate> SelectNodeToEvict(std::vector<NodeEvictionCandidate> vEvictionCandidates)
    

    … which would be called from the now much simplified CConnman::AttemptToEvictConnection.

    SelectNodeToEvict would contain all eviction selection logic and could thus be tested trivially.

  2. jonatack commented at 5:56 am on September 17, 2020: member
    Concept ACK on improving the design testability and coverage here.
  3. practicalswift commented at 12:26 pm on September 17, 2020: contributor

    The functional tests cover some parts of the eviction logic but the coverage is spotty/incomplete.

    In order to get a rough idea of the quality of the current eviction logic testing I performed some basic mutation testing of this code.

    All tests (including p2p_eviction.py) pass successfully also in the presence of these mutations:

     0$ git diff
     1diff --git a/src/net.cpp b/src/net.cpp
     2index e35d05cec..e138a4a13 100644
     3--- a/src/net.cpp
     4+++ b/src/net.cpp
     5@@ -826,43 +826,36 @@ static bool ReverseCompareNodeMinPingTime(const NodeEvictionCandidate &a, const
     6
     7 static bool ReverseCompareNodeTimeConnected(const NodeEvictionCandidate &a, const NodeEvictionCandidate &b)
     8 {
     9-    return a.nTimeConnected > b.nTimeConnected;
    10+    return false;
    11 }
    12
    13 static bool CompareLocalHostTimeConnected(const NodeEvictionCandidate &a, const NodeEvictionCandidate &b)
    14 {
    15-    if (a.m_is_local != b.m_is_local) return b.m_is_local;
    16-    return a.nTimeConnected > b.nTimeConnected;
    17+    return false;
    18 }
    19
    20 static bool CompareNetGroupKeyed(const NodeEvictionCandidate &a, const NodeEvictionCandidate &b) {
    21-    return a.nKeyedNetGroup < b.nKeyedNetGroup;
    22+    return false;
    23 }
    24
    25 static bool CompareNodeBlockTime(const NodeEvictionCandidate &a, const NodeEvictionCandidate &b)
    26 {
    27     // There is a fall-through here because it is common for a node to have many peers which have not yet relayed a block.
    28     if (a.nLastBlockTime != b.nLastBlockTime) return a.nLastBlockTime < b.nLastBlockTime;
    29-    if (a.fRelevantServices != b.fRelevantServices) return b.fRelevantServices;
    30-    return a.nTimeConnected > b.nTimeConnected;
    31+    return false;
    32 }
    33
    34 static bool CompareNodeTXTime(const NodeEvictionCandidate &a, const NodeEvictionCandidate &b)
    35 {
    36     // There is a fall-through here because it is common for a node to have more than a few peers that have not yet relayed txn.
    37     if (a.nLastTXTime != b.nLastTXTime) return a.nLastTXTime < b.nLastTXTime;
    38-    if (a.fRelayTxes != b.fRelayTxes) return b.fRelayTxes;
    39-    if (a.fBloomFilter != b.fBloomFilter) return a.fBloomFilter;
    40-    return a.nTimeConnected > b.nTimeConnected;
    41+    return false;
    42 }
    43
    44 // Pick out the potential block-relay only peers, and sort them by last block time.
    45 static bool CompareNodeBlockRelayOnlyTime(const NodeEvictionCandidate &a, const NodeEvictionCandidate &b)
    46 {
    47-    if (a.fRelayTxes != b.fRelayTxes) return a.fRelayTxes;
    48-    if (a.nLastBlockTime != b.nLastBlockTime) return a.nLastBlockTime < b.nLastBlockTime;
    49-    if (a.fRelevantServices != b.fRelevantServices) return b.fRelevantServices;
    50-    return a.nTimeConnected > b.nTimeConnected;
    51+    return false;
    52 }
    53
    54 //! Sort an array by the specified comparator, then erase the last K elements.
    55@@ -870,8 +863,6 @@ template<typename T, typename Comparator>
    56 static void EraseLastKElements(std::vector<T> &elements, Comparator comparator, size_t k)
    57 {
    58     std::sort(elements.begin(), elements.end(), comparator);
    59-    size_t eraseSize = std::min(k, elements.size());
    60-    elements.erase(elements.end() - eraseSize, elements.end());
    61 }
    62
    63 /** Try to find a connection to evict when the node is full.
    64$ make
    65$ make check
    66$ echo $?
    670
    68$ test/functional/test_runner.py
    69$ echo $?
    700
    
  4. practicalswift commented at 6:15 am on September 18, 2020: contributor

    The small patch taking us to the suggested unit testable SelectNodeToEvict could look something like this:

      0diff --git a/src/net.cpp b/src/net.cpp
      1index 79e7f88c2..10c1246e4 100644
      2--- a/src/net.cpp
      3+++ b/src/net.cpp
      4@@ -874,43 +874,7 @@ static void EraseLastKElements(std::vector<T> &elements, Comparator comparator,
      5     elements.erase(elements.end() - eraseSize, elements.end());
      6 }
      7 
      8-/** Try to find a connection to evict when the node is full.
      9- *  Extreme care must be taken to avoid opening the node to attacker
     10- *   triggered network partitioning.
     11- *  The strategy used here is to protect a small number of peers
     12- *   for each of several distinct characteristics which are difficult
     13- *   to forge.  In order to partition a node the attacker must be
     14- *   simultaneously better at all of them than honest peers.
     15- */
     16-bool CConnman::AttemptToEvictConnection()
     17-{
     18-    std::vector<NodeEvictionCandidate> vEvictionCandidates;
     19-    {
     20-        LOCK(cs_vNodes);
     21-
     22-        for (const CNode* node : vNodes) {
     23-            if (node->HasPermission(PF_NOBAN))
     24-                continue;
     25-            if (!node->IsInboundConn())
     26-                continue;
     27-            if (node->fDisconnect)
     28-                continue;
     29-            bool peer_relay_txes = false;
     30-            bool peer_filter_not_null = false;
     31-            if (node->m_tx_relay != nullptr) {
     32-                LOCK(node->m_tx_relay->cs_filter);
     33-                peer_relay_txes = node->m_tx_relay->fRelayTxes;
     34-                peer_filter_not_null = node->m_tx_relay->pfilter != nullptr;
     35-            }
     36-            NodeEvictionCandidate candidate = {node->GetId(), node->nTimeConnected, node->nMinPingUsecTime,
     37-                                               node->nLastBlockTime, node->nLastTXTime,
     38-                                               HasAllDesirableServiceFlags(node->nServices),
     39-                                               peer_relay_txes, peer_filter_not_null, node->addr, node->nKeyedNetGroup,
     40-                                               node->m_prefer_evict, node->addr.IsLocal()};
     41-            vEvictionCandidates.push_back(candidate);
     42-        }
     43-    }
     44-
     45+[[nodiscard]] Optional<NodeEvictionCandidate> SelectNodeToEvict(std::vector<NodeEvictionCandidate> vEvictionCandidates) {
     46     // Protect connections with certain characteristics
     47 
     48     // Deterministically select 4 peers to protect by netgroup.
     49@@ -948,7 +912,7 @@ bool CConnman::AttemptToEvictConnection()
     50     total_protect_size -= initial_size - vEvictionCandidates.size();
     51     EraseLastKElements(vEvictionCandidates, ReverseCompareNodeTimeConnected, total_protect_size);
     52 
     53-    if (vEvictionCandidates.empty()) return false;
     54+    if (vEvictionCandidates.empty()) return nullopt;
     55 
     56     // If any remaining peers are preferred for eviction consider only them.
     57     // This happens after the other preferences since if a peer is really the best by other criteria (esp relaying blocks)
     58@@ -979,8 +943,56 @@ bool CConnman::AttemptToEvictConnection()
     59     // Reduce to the network group with the most connections
     60     vEvictionCandidates = std::move(mapNetGroupNodes[naMostConnections]);
     61 
     62     // Disconnect from the network group with the most connections
     63-    NodeId evicted = vEvictionCandidates.front().id;
     64+    return vEvictionCandidates.front();
     65+}
     66+
     67+/** Try to find a connection to evict when the node is full.
     68+ *  Extreme care must be taken to avoid opening the node to attacker
     69+ *   triggered network partitioning.
     70+ *  The strategy used here is to protect a small number of peers
     71+ *   for each of several distinct characteristics which are difficult
     72+ *   to forge.  In order to partition a node the attacker must be
     73+ *   simultaneously better at all of them than honest peers.
     74+ */
     75+bool CConnman::AttemptToEvictConnection()
     76+{
     77+    std::vector<NodeEvictionCandidate> vEvictionCandidates;
     78+    {
     79+        LOCK(cs_vNodes);
     80+
     81+        for (const CNode* node : vNodes) {
     82+            if (node->HasPermission(PF_NOBAN))
     83+                continue;
     84+            if (!node->IsInboundConn())
     85+                continue;
     86+            if (node->fDisconnect)
     87+                continue;
     88+            bool peer_relay_txes = false;
     89+            bool peer_filter_not_null = false;
     90+            if (node->m_tx_relay != nullptr) {
     91+                LOCK(node->m_tx_relay->cs_filter);
     92+                peer_relay_txes = node->m_tx_relay->fRelayTxes;
     93+                peer_filter_not_null = node->m_tx_relay->pfilter != nullptr;
     94+            }
     95+            NodeEvictionCandidate candidate = {node->GetId(), node->nTimeConnected, node->nMinPingUsecTime,
     96+                                               node->nLastBlockTime, node->nLastTXTime,
     97+                                               HasAllDesirableServiceFlags(node->nServices),
     98+                                               peer_relay_txes, peer_filter_not_null, node->addr, node->nKeyedNetGroup,
     99+                                               node->m_prefer_evict, node->addr.IsLocal()};
    100+            vEvictionCandidates.push_back(candidate);
    101+        }
    102+    }
    103+
    104+    const Optional<NodeEvictionCandidate> node_to_evict = SelectNodeToEvict(vEvictionCandidates);
    105+    if (!node_to_evict) {
    106+        return false;
    107+    }
    108+    const NodeId evicted = node_to_evict->id;
    109     LOCK(cs_vNodes);
    110     for (CNode* pnode : vNodes) {
    111         if (pnode->GetId() == evicted) {
    
  5. fanquake added the label Tests on Sep 18, 2020
  6. amitiuttarwar commented at 4:46 pm on September 18, 2020: contributor

    strong concept ACK on improving eviction testing 🙌🏽 . its critical logic, hard to reason about, and lacking automated testing.

    I’m trying to wrap my head around this strategy, is this right so far? -

    • the main idea is to separate AttemptToEvictConnection into two parts of 1. selecting eviction candidates 2. selecting which node to evict from the candidates
    • the proposed strategy is to extract # 2 into SelectNodeToEvict.
    • this enables sanity testing SelectNodeToEvict (as done in #19972) by constructing NodeEvictionCandidates with fuzzed inputs

    this seems like a great start. do you think this would enable us to write unit tests as well?

  7. practicalswift commented at 8:19 am on September 19, 2020: contributor

    I’m trying to wrap my head around this strategy, is this right so far? -

    • the main idea is to separate AttemptToEvictConnection into two parts of 1. selecting eviction candidates 2. selecting which node to evict from the candidates
    • the proposed strategy is to extract # 2 into SelectNodeToEvict.
    • this enables sanity testing SelectNodeToEvict (as done in #19972) by constructing NodeEvictionCandidates with fuzzed inputs

    Exactly! :)

    And as you point out the fuzzing harness is only meant as basic sanity testing to make sure the basic hygiene routine is covered – that no ASan or UBSan warnings are triggered no matter what input is fed, and some very basic sanity assertions. The important testing of the eviction logic is expected to be done in unit tests.

    this seems like a great start. do you think this would enable us to write unit tests as well?

    Absolutely! :)

    I suggest introducing something along the lines of:

    0void AssertIsEvicted(const NodeId expected_to_be_evicted, std::vector<NodeEvictionCandidate> eviction_candidates);
    1void AssertIsNotEvicted(const NodeId expected_not_to_be_evicted, std::vector<NodeEvictionCandidate> eviction_candidates);
    

    These could be used to test that the “correct node” is evicted/not evicted (whichever makes more sense) given some synthetic NodeEvictionCandidate:s carefully constructed to have different characteristics that are relevant from an eviction policy perspective.

  8. ariard commented at 0:08 am on September 22, 2020: member

    Concept ACK. I think covering heuristic stubs correctness (e.g CompareLocalHostTimeConnected) is a good step forward. These stubs aren’t straightforward to understand so unit test coverage would be great.

    I can’t think of issue as splitting AttemptToEvictConnection between SelectNodeToEvict/AttemptToEvictCandidates.

  9. practicalswift commented at 8:46 am on November 25, 2020: contributor
    This issue is addressed in #20477 (“test/net: Add unit testing of node eviction logic”).
  10. MarcoFalke closed this on Dec 16, 2020

  11. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me