Cache responses to GETADDR to prevent topology leaks #18991

pull naumenkogs wants to merge 5 commits into bitcoin:master from naumenkogs:2020-05-addr-response-caching changing 11 files +183 −13
  1. naumenkogs commented at 1:23 am on May 17, 2020: member

    This is a very simple code change with a big p2p privacy benefit.

    It’s currently trivial to scrape any reachable node’s AddrMan (a database of all nodes known to them along with the timestamps). We do have a limit of one GETADDR per connection, but a spy can disconnect and reconnect even from the same IP, and send GETADDR again and again.

    Since we respond with at most 1,000 random records, depending on the AddrMan size it probably takes up to 100 requests for a spy to make sure they have scraped (almost) everything. I even have a script for that. It is totally doable within a couple of minutes.
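
    (Back-of-the-envelope with illustrative numbers, not from the PR: if AddrMan holds N records and each response is ~1,000 of them drawn at random, a given record is missed by one response with probability about 1 − 1000/N, so roughly N(1 − 1000/N)^k records remain unseen after k requests. With N ≈ 30,000, 100 requests leave about 3% unseen.)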

    Then, with some extra protocol knowledge a spy can infer the direct peers of the victim, and other topological stuff.

    I suggest caching responses to GETADDR on a daily basis, so that an attacker gets at most 1,000 records per day and can’t track changes in real time. I will be following up with more improvements to addr relay privacy, but this one alone is very effective. And simple!

    I doubt any real software reconnects to get new addrs from a given peer, so we shouldn’t be cutting anyone off. I also believe it doesn’t have any negative implications for the overall topology quality. And the records being “outdated” by at most a day doesn’t break any honest assumptions either.
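
    A minimal sketch of the mechanism, with hypothetical names standing in for the real CAddress/AddrMan types (the PR’s actual code is discussed in the review comments below):

    #include <chrono>
    #include <string>
    #include <vector>

    // Hypothetical stand-ins for CAddress and addrman.GetAddr().
    using Addr = std::string;
    std::vector<Addr> ScrapeAddrMan() { return {"203.0.113.1:8333", "198.51.100.7:8333"}; }

    struct AddrResponseCache {
        std::vector<Addr> addrs;
        std::chrono::steady_clock::time_point next_update{};  // epoch: first call refreshes
    };

    std::vector<Addr> GetAddrResponse(AddrResponseCache& cache)
    {
        const auto now = std::chrono::steady_clock::now();
        if (cache.next_update < now) {
            // Refresh at most ~daily: a spy that disconnects, reconnects and
            // re-sends GETADDR keeps seeing the same records instead of fresh samples.
            cache.addrs = ScrapeAddrMan();
            cache.next_update = now + std::chrono::hours{24};
        }
        return cache.addrs;
    }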

  2. naumenkogs commented at 1:24 am on May 17, 2020: member
  3. fanquake added the label P2P on May 17, 2020
  4. naumenkogs force-pushed on May 17, 2020
  5. naumenkogs force-pushed on May 17, 2020
  6. tryphe commented at 2:24 am on May 17, 2020: contributor

    Concept ACK

    Suggestion: Instead of modifying CConnman and net.h, add all these changes into CAddrMan and add a new function CAddrMan::GetAddrCache, so the functionality is local to that class, since there’s a lot of similar stuff going on in there.

    Other thoughts: I wonder what implications this might have in advertising new peers, i.e. suppose the 24 hour window rolls over right before your daemon sees a new peer. Nobody would be able to get that peer from you for at least 24 hours. But I’m not sure if that’s anything to be worried about, because everyone’s 24 hour window will expire at different times, so some peers would always have the chance to advertise it soon.

  7. naumenkogs commented at 2:44 am on May 17, 2020: member

    Suggestion: Instead of modifying CConnman and net.h, add all these changes into CAddrMan and add a new function CAddrMan::GetAddrCache, so the functionality is local to that class, since there’s a lot of similar stuff going on in there.

    To me it feels like CConnman is the perfect place for this cache. It’s really something that makes sense only in the p2p context. We would never use this cache for any local stuff. But let’s see what other reviewers think about it?

    I wonder what implications this might have in advertising new peers, i.e. suppose the 24 hour window rolls over right before your daemon sees a new peer. Nobody would be able to get that peer from you for at least 24 hours. But I’m not sure if that’s anything to be worried about, because everyone’s 24 hour window will expire at different times, so some peers would always have the chance to advertise it soon.

    True, but I think the role of this aspect in advertising new nodes is minor. Nodes don’t send many GETADDRs, and one would have to be lucky to get that new node randomly. I think the primary method of advertising new peers is them calling AdvertiseLocal and forwarding unsolicited ADDR messages. And that remains unaffected by this change.

    In addition, in the future I’m planning to suggest doing AdvertiseLocal for feeler connections or other random connections; it should help both privacy and better address relay.

  8. naumenkogs force-pushed on May 17, 2020
  9. DrahtBot commented at 3:32 am on May 17, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #17428 (p2p: Try to preserve outbound block-relay-only connections during restart by hebasto)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  10. practicalswift commented at 5:25 am on May 17, 2020: contributor
    Concept ACK: nice simple idea - that’s the best kind of idea! :)
  11. practicalswift commented at 5:30 am on May 17, 2020: contributor

    Would it make sense to have one cache per network type so that the answer cannot be used to fingerprint a node across network types?

    (I know there are other ways to do such fingerprinting, but no reason to add yet another one :))

  12. naumenkogs commented at 3:09 pm on May 17, 2020: member

    @practicalswift good observation re: cross-network fingerprinting! To be clear, my PR as it stands doesn’t introduce another fingerprinting vector, but just makes it a little easier for a spy. Previously they would have to make 100 requests and compare across nodes; now they’d have to make just one.

    But it would indeed be nice not to make it easier, and ideally even mitigate the fingerprinting you mention too. For starters, we may have 2 caches: one for regular (ipv4/v6) requesters and one for non-regular (onion) requesters, based on where the requester is located. I will put together a commit for that. It could be further improved when we integrate addrv2.

    This would be another big improvement: cross-node fingerprinting via GETADDR would take days or months instead of just a couple of hours at most (a spy currently still has to ask a lot of nodes to map Tor nodes to regular nodes).

  13. naumenkogs force-pushed on May 17, 2020
  14. naumenkogs force-pushed on May 17, 2020
  15. DrahtBot added the label Needs rebase on May 21, 2020
  16. MarcoFalke commented at 12:14 pm on May 21, 2020: member
    What effect does this have on nodes that explicitly want to fan out onion addresses on the ipv4 net? Do they not exist?
  17. naumenkogs commented at 6:50 am on May 22, 2020: member

    What effect does this have on nodes that explicitly want to fan out onion addresses on the ipv4 net? Do they not exist?

    I don’t touch that behaviour at all! I introduce two separate caches, but both of them are filled the same way as before. The separation is based on who requests (onion or regular), not on the type of AddrMan records.

    So, if this relay was possible before, it still is. If we previously already had unsatisfactory onion-to-ipv4 relay, then it remains, or maybe gets a little worse. But that should be addressed in a separate PR.

  18. in src/net.cpp:2541 in 7c72f28e3d outdated
    2488+            cache->m_update_addr_response = PoissonNextSend(current_time, AVG_ADDRESS_RESPONSE_UPDATE_INTERVAL);
    2489         }
    2490-        return m_addrs_response_cache;
    2491+        return cache->m_addrs_response_cache;
    2492     } else {
    2493         return addrman.GetAddr();
    


    ariard commented at 7:12 am on May 22, 2020:

    A way to simplify code IMO:

    if (use_cache == ADDR_CACHE_NONE) {
        return addrman.GetAddr();
    }
    CachedAddrResponse* cache;
    if (use_cache == ADDR_CACHE_REGULAR) {
        cache = &m_cached_addr_response_regular;
    } else {
        cache = &m_cached_addr_response_other;
    }
    const std::chrono::microseconds current_time = GetTime<std::chrono::microseconds>();
    if (cache->m_update_addr_response < current_time) {
        cache->m_addrs_response_cache = addrman.GetAddr();
        cache->m_update_addr_response = PoissonNextSend(current_time, AVG_ADDRESS_RESPONSE_UPDATE_INTERVAL);
    }
    return cache->m_addrs_response_cache;
    

    naumenkogs commented at 6:52 am on May 25, 2020:
    Agree.
  19. in src/net.h:427 in 0bbcf38b5b outdated
    419@@ -416,6 +420,10 @@ class CConnman
    420     std::atomic<NodeId> nLastNodeId{0};
    421     unsigned int nPrevNodeCount{0};
    422 
    423+    // Cache responses to addr requests to minimize privacy leak.
    


    ariard commented at 7:39 am on May 22, 2020:
    Unless you think it’s too serious for now, I would document the full privacy leak in this comment.

    naumenkogs commented at 8:25 am on May 25, 2020:
    Adding more info without going into too many specifics.
  20. in src/net.h:69 in 0bbcf38b5b outdated
    63@@ -64,6 +64,8 @@ static const int MAX_ADDNODE_CONNECTIONS = 8;
    64 static const int MAX_BLOCKS_ONLY_CONNECTIONS = 2;
    65 /** Maximum number of feeler connections */
    66 static const int MAX_FEELER_CONNECTIONS = 1;
    67+/** Average delay between updating address responses */
    68+static constexpr std::chrono::hours AVG_ADDRESS_RESPONSE_UPDATE_INTERVAL{24};
    


    ariard commented at 7:48 am on May 22, 2020:

    What’s your rationale for settling on 24h on average? You announce this fix prevents tracking real-time changes in the local node topology. I think to be effective you need to define “real-time” further with regard to expected topology changes, which for outbound peers is only a stale tip or them falling offline?

    Or, said differently: is natural outbound full-relay peer rotation faster than 24 hours, rotating a reasonable subset before the next spy query?


    naumenkogs commented at 7:33 am on May 25, 2020:

    I would say 24h is pretty arbitrary, but reasonable. Let me explain.

    Upper bound: Shouldn’t be too long, so that the returned records are still relevant for the GETADDR requestor. Although currently IsTerrible is fine with everything from the last 30 days, we also should make sure that nodes in the response are not dead. I can’t find the data, but I remember reading that churn among reachable nodes (the ones we care about) is low, so 1 day should be fine.

    Lower bound: Too frequent cache updates can leak privacy again. For example, think about the attack where you map the onion and ipv4 identities of the same node by correlating AddrMans. Ideally, the responses in the two caches do not intersect.

    The main event for both bounds is probably a feeler (because it updates the nTime of the feeler’s addr record). Feelers every 2 minutes update 720 records a day, so that’s the same order as a GETADDR response (1,000 records).

    Same order means that not many of the new records, or updates, will be missed in the cached response (the upper-bound concern). It also means that the caches are updated at the same pace as the AddrMan updates happen (the lower-bound concern).

    I think any reasoning beyond that requires substantial research into attacks.


    ariard commented at 0:19 am on June 2, 2020:

    We should dissociate network churn, i.e. how many peers fall offline after some period X, from peer rotation, i.e. changes among our connections’ demography over the same X. I think for the counter-measure to be efficient you need rotation to be high enough to blur an adversary’s guess of your peer topology.

    Obviously, a higher rotation target is likely to come with higher churn, and therefore decrease the validity of returned addresses. But I think it’s okay to try more connections compared to leaking your topology; the goals don’t have the same value.

    I agree, it sounds hard to guess the sweet spot without further research; you should comment on the trade-off and invite further investigation.

    With regard to the bounds, I think it makes sense to consider that after a feeler cycle of a day you may have a more accurate view of the network, and therefore should update your cache. An average delay of 36h would sound correct to me too, even making the counter-measure a bit more efficient.


    glowang commented at 3:07 pm on June 2, 2020:
    As for rotation of outbound peers, I don’t know if we actually have any mechanism to do that programmatically. This idea was discussed here, but the PR was closed as further research is needed. As for rotation of inbound peers, does anyone know whether that even exists and how it is done?

    naumenkogs commented at 12:25 pm on June 3, 2020:

    Inbound rotation is something I’ve never heard of :)

    In practice it just means dropping an existing connection someone made to us… It doesn’t feel useful at first glance, unless that peer was malicious or broken. An inbound connection can’t be immediately replaced, so this seems like a lose-lose w.r.t. security. We can drop a spy, true, but right now that is solved by slower relays to inbounds etc.

    We “rotate” inbound peers if we’re at capacity (117 inbounds or whatnot), but that’s a completely different story. That’s called eviction.

  21. in src/net.h:258 in 7c72f28e3d outdated
    252@@ -253,9 +253,17 @@ class CConnman
    253     void SetServices(const CService &addr, ServiceFlags nServices);
    254     void MarkAddressGood(const CAddress& addr);
    255     void AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddress& addrFrom, int64_t nTimePenalty = 0);
    256+
    257+    // Addr responses stored in different caches per network prevent
    258+    // cross-network node identity.
    


    ariard commented at 7:59 am on May 22, 2020:

    Please explain how cross-network node identification would operate, i.e. a spy connecting first on the regular network, querying the cache, reconnecting on the other network, querying the cache again, and testing for equality.

    Going further, shouldn’t you cache based on the address announced? Isn’t this a fingerprint for multi-homed nodes?


    naumenkogs commented at 8:38 am on May 25, 2020:

    Adding more documentation.

    Going further, shouldn’t you cache based on the address announced? Isn’t this a fingerprint for multi-homed nodes?

    What makes you think so?


    ariard commented at 11:55 pm on June 1, 2020:
    If you announce 2 different IPv4 addresses, requestor_network is going to be the same (NET_IPV4) for both and hit the same cache ?

    naumenkogs commented at 3:59 am on June 2, 2020:
    Oh, I see, you mean own address announced. That’s pretty crazy, I’m not even sure our software currently can do this.
  22. ariard commented at 8:20 am on May 22, 2020: member

    On network topology, the change I can foresee is a decrease in freshness of received addresses, and therefore a nudge for bootstrapping nodes towards connecting to older nodes? By older I mean the ones we were aware of before the AVG_ADDRESS_RESPONSE_UPDATE_INTERVAL period. I don’t say this side-effect is bad (it may even be good to favor connections to “mature” nodes), but maybe it should be documented.

    To me it feels like CConnman is the perfect place for this cache. It’s really something that makes sense only in the p2p context. We would never use this cache for any local stuff. But let’s see what other reviewers think about it?

    I think we should avoid introducing a timer in AddrMan; it makes it harder to reason about. And agreed, the cache is only needed due to the untrusted nature of queries in this context.

    I doubt any real software reconnects to get new addrs from a given peer, so we shouldn’t be cutting anyone off.

    In case of doubt, add a release note? But even if there is such software, I would qualify it as broken, or indistinguishable from a spy.

  23. ariard commented at 8:25 am on May 22, 2020: member
    With regard to fingerprinting, I think we have a more revealing descriptor, i.e. the address, and changing that one would force clearing the cache, so this doesn’t introduce a more severe one?
  24. naumenkogs commented at 8:41 am on May 25, 2020: member

    On network topology, the change I can foresee is a decrease in freshness of received addresses, and therefore a nudge for bootstrapping nodes towards connecting to older nodes?

    I talk about this here.

    With regard to fingerprinting, I think we have a more revealing descriptor, i.e. the address, and changing that one would force clearing the cache, so this doesn’t introduce a more severe one?

    Can you elaborate? Not sure I’m following.

  25. naumenkogs force-pushed on May 25, 2020
  26. naumenkogs force-pushed on May 25, 2020
  27. DrahtBot removed the label Needs rebase on May 25, 2020
  28. in src/net_processing.cpp:3216 in 328788ceae outdated
    3212+        CConnman::AddrCacheNetwork use_cache;
    3213+        Network peer_network = pfrom->addr.GetNetwork();
    3214+        if (peer_network == NET_IPV4 || peer_network == NET_IPV6) {
    3215+            use_cache = CConnman::ADDR_CACHE_REGULAR;
    3216+        } else {
    3217+            use_cache = CConnman::ADDR_CACHE_OTHER;
    


    glowang commented at 2:46 pm on May 28, 2020:

    My understanding is that you want to use ADDR_CACHE_OTHER for privacy-preserving networks, so that the caches returned from these networks are not mapped 1-1 with caches returned from identifying networks like ipv4, is that right?

    If this is the case, I believe NET_TEREDO peers and any peers with NET_UNKNOWN will also hit the else case and use the same cache as Tor’s. Do NET_TEREDO and NET_UNKNOWN networks have the same privacy level as Tor’s?


    naumenkogs commented at 3:06 pm on May 29, 2020:

    My understanding is that you want to use ADDR_CACHE_OTHER for privacy-preserving networks, so that the caches returned from these networks are not mapped 1-1 with caches returned from identifying networks like ipv4, is that right?

    Yes. I mean, ideally not just private/identifying networks. Ideally, if a node is hosted on Tor AND i2p AND ipv4, none of them should be mapped together. But allocating a new cache for every network is probably overkill for such a niche use-case.

    That’s why I leave 2 caches, keeping in mind that ipv4 + ipv6 is one big group, and everything else can be grouped together. So, I guess that answers your question?

    If other reviewers point out some other usecase, I’ll be happy to discuss more.

  29. in src/net.cpp:2484 in 328788ceae outdated
    2480+            cache = &m_cached_addr_response_regular;
    2481+        } else {
    2482+            cache = &m_cached_addr_response_other;
    2483+        }
    2484+        const std::chrono::microseconds current_time = GetTime<std::chrono::microseconds>();
    2485+        if (cache->m_update_addr_response < current_time) {
    


    glowang commented at 3:24 pm on May 28, 2020:
    If you initialized m_update_addr_response to 0 with std::chrono::microseconds m_update_addr_response{0};, wouldn’t this clause always be true?

    naumenkogs commented at 3:22 pm on May 29, 2020:
    It is true on the first-ever iteration, but then cache->m_update_addr_response = PoissonNextSend(current_time, AVG_ADDRESS_RESPONSE_UPDATE_INTERVAL); in the conditional branch executes and makes it non-zero right away, so it will be true again only after AVG_ADDRESS_RESPONSE_UPDATE_INTERVAL.
  30. in src/net.h:443 in bff16049e4 outdated
    440+    struct CachedAddrResponse {
    441+        std::vector<CAddress> m_addrs_response_cache;
    442+        std::chrono::microseconds m_update_addr_response{0};
    443+    };
    444+
    445+    // Keep a separate cache for other net (onion etc.) peers so that
    


    sipa commented at 4:59 pm on May 29, 2020:
    Why not one for every network? 1000 addresses per network is still only a tiny amount of memory.
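
    (Rough estimate: an in-memory CAddress is a few dozen bytes, assuming a 16-byte IP plus port, service flags and a timestamp, so 1,000 cached addresses come to roughly 30-50 KB per network, and one cache per Network value stays well under half a megabyte in total.)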
  31. naumenkogs force-pushed on May 30, 2020
  32. naumenkogs commented at 4:26 pm on May 30, 2020: member
    After the numerous requests, I introduced a cache per Network. I think we should make it more fine-grained when the addrv2 proposal goes through.
  33. naumenkogs force-pushed on May 30, 2020
  34. ariard commented at 0:26 am on June 2, 2020: member

    Can you elaborate? Not sure I’m following.

    I was thinking we should have a clear idea of the node descriptors leveraged for fingerprinting, like your announced address, your mempool, your chain tip or cached blocks… Thus when you introduce a new one like your cache: if we already have a more severe one, like the address, and updating the address would also clear the cache, it’s okay to introduce it.

  35. naumenkogs commented at 4:03 am on June 2, 2020: member

    I was thinking we should have a clear idea of the node descriptors leveraged for fingerprinting, like your announced address, your mempool, your chain tip or cached blocks…

    Oh, I see. Well, I would say currently it’s the worst case: AddrMan is very identifiable and easy to scrape. With caches, it becomes at least very hard to scrape. Anything beyond that would be interesting follow-up research, but probably out of scope for this PR. All we need to know is that this PR improves privacy for free, and nobody seems to argue with that.

  36. in src/net.cpp:2488 in a6a5b97520 outdated
    2485+
    2486+        // const std::chrono::microseconds current_time = GetTime<std::chrono::microseconds>();
    2487+        // if (cache->m_update_addr_response < current_time) {
    2488+        //     cache->m_addrs_response_cache = addrman.GetAddr();
    2489+        //     cache->m_update_addr_response = PoissonNextSend(current_time, AVG_ADDRESS_RESPONSE_UPDATE_INTERVAL);
    2490+        // }
    


    luke-jr commented at 7:31 pm on June 2, 2020:
    What is all this for?

    naumenkogs commented at 1:55 am on June 3, 2020:
    Oh, oops, me being sloppy.
  37. in src/net.h:260 in a6a5b97520 outdated
    253@@ -251,7 +254,12 @@ class CConnman
    254     void SetServices(const CService &addr, ServiceFlags nServices);
    255     void MarkAddressGood(const CAddress& addr);
    256     void AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddress& addrFrom, int64_t nTimePenalty = 0);
    257-    std::vector<CAddress> GetAddresses();
    258+
    259+    // Cache is used to minimize topology leaks, so it should
    260+    // be used for all non-trusted calls, for example, p2p.
    261+    // NET_INTERNAL represents a non-malicious call (either from RPC) or
    


    luke-jr commented at 7:32 pm on June 2, 2020:
    I think this should probably be a whitelist flag, not an assumption that internal nodes are non-malicious
  38. luke-jr commented at 7:34 pm on June 2, 2020: member

    To be clear, my PR as it stands doesn’t introduce another fingerprinting vector, but just makes it a little easier for a spy. Previously they would have to make 100 requests and compare across nodes; now they’d have to make just one.

    For the record, I think you’re wrong: it did introduce a new fingerprinting vector. Previously, you couldn’t realistically fingerprint non-listening peers.

  39. naumenkogs commented at 1:59 am on June 3, 2020: member

    For the record, I think you’re wrong: it did introduce a new fingerprinting vector. Previously, you couldn’t realistically fingerprint non-listening peers.

    How would you fingerprint it after my PR? Nodes do not respond to GETADDR received on outbound connections.

  40. naumenkogs force-pushed on Jun 3, 2020
  41. naumenkogs force-pushed on Jun 3, 2020
  42. naumenkogs force-pushed on Jun 3, 2020
  43. naumenkogs force-pushed on Jun 3, 2020
  44. naumenkogs force-pushed on Jun 3, 2020
  45. naumenkogs force-pushed on Jun 4, 2020
  46. DrahtBot added the label Needs rebase on Jun 4, 2020
  47. naumenkogs force-pushed on Jun 5, 2020
  48. DrahtBot removed the label Needs rebase on Jun 5, 2020
  49. in test/functional/p2p_getaddr_caching.py:1 in cd83106ce2 outdated
    0@@ -0,0 +1,107 @@
    1+#!/usr/bin/env python3
    


    luke-jr commented at 5:56 pm on June 7, 2020:
    File needs +x permission
  50. luke-jr changes_requested
  51. naumenkogs force-pushed on Jun 9, 2020
  52. luke-jr referenced this in commit eae8a397cc on Jun 9, 2020
  53. luke-jr referenced this in commit c260f4a589 on Jun 9, 2020
  54. luke-jr referenced this in commit 92d8a782de on Jun 9, 2020
  55. luke-jr referenced this in commit ab12a67439 on Jun 9, 2020
  56. laanwj added this to the "Blockers" column in a project

  57. luke-jr commented at 7:42 am on June 14, 2020: member

    Seems to have an intermittent test error :(

    2020-06-14T07:39:48.638000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_rr8f1aik
    2020-06-14T07:39:49.959000Z TestFramework (INFO): Create connection that sends and requests addr messages
    2020-06-14T07:39:50.061000Z TestFramework (INFO): Fill peer AddrMan with a lot of records
    2020-06-14T07:39:50.127000Z TestFramework (INFO): Send many addr requests within short time to receive same response
    2020-06-14T07:39:51.105000Z TestFramework (INFO): After time passed, see a new response to addr request
    2020-06-14T07:39:51.308000Z TestFramework (ERROR): Assertion failed
    Traceback (most recent call last):
      File "test/functional/test_framework/test_framework.py", line 112, in main
        self.run_test()
      File "test/functional/p2p_getaddr_caching.py", line 103, in run_test
        assert(set(responses[0]) != set(last_addr_receiver.received_addrs))
    AssertionError
    
  58. naumenkogs force-pushed on Jun 14, 2020
  59. naumenkogs force-pushed on Jun 14, 2020
  60. naumenkogs commented at 1:51 pm on June 14, 2020: member
    @luke-jr good catch, thank you! I think the Poisson timer varied too much; I made it more straightforward.
  61. luke-jr commented at 8:11 pm on June 14, 2020: member

    Looking at the [older] code, I see two possible causes (and no clear good solutions):

    1. Not enough addresses to ensure the new set is distinct
    2. PoissonNextSend may be choosing a >3 day delay.
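
    (For scale: if PoissonNextSend draws an exponentially distributed delay, which is what a Poisson process implies for inter-event gaps, then with a 24h mean P(delay > 72h) = e^−3 ≈ 5%, so cause 2 would bite in roughly one run out of twenty.)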
  62. luke-jr commented at 8:38 pm on June 14, 2020: member
    Regarding the new version: you’ve switched from a 60+% chance of <24h to only times between 24h and 27h. I guess that’s fine, but might it be more fingerprintable?
  63. naumenkogs commented at 5:17 am on June 15, 2020: member

    @luke-jr I’m pretty sure the problem was (2), because I did a bunch of experiments (like running the tests 1,000 times before and after my recent force-push).

    Regarding the interval, I think we should just target an average of ~24h (my last commit had a sloppy 25.5h; making it 24h now) as we discussed before.

    A. Not sure why Poisson(24h) would have a 60% chance of <24h? (less important)
    B. Not sure why a high chance of shorter intervals makes it less fingerprintable? I think shorter actually makes it more fingerprintable, because an attacker can collect their data faster. (more important)
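
    (On point A: an exponentially distributed delay with mean 24h, which is what PoissonNextSend approximates, gives P(X < 24h) = 1 − e^−1 ≈ 63%; that is where the 60+% figure comes from.)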

  64. naumenkogs force-pushed on Jun 15, 2020
  65. luke-jr referenced this in commit 0a38689e3b on Jun 15, 2020
  66. luke-jr referenced this in commit c217d9b968 on Jun 15, 2020
  67. luke-jr referenced this in commit c19faa0f77 on Jun 15, 2020
  68. luke-jr referenced this in commit 9b23a01a2a on Jun 15, 2020
  69. luke-jr referenced this in commit b3d53b566d on Jun 15, 2020
  70. in src/net.h:258 in d1312914a0 outdated
    251@@ -251,7 +252,12 @@ class CConnman
    252     void SetServices(const CService &addr, ServiceFlags nServices);
    253     void MarkAddressGood(const CAddress& addr);
    254     void AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddress& addrFrom, int64_t nTimePenalty = 0);
    255-    std::vector<CAddress> GetAddresses();
    256+
    257+    // Cache is used to minimize topology leaks, so it should
    258+    // be used for all non-trusted calls, for example, p2p.
    259+    // A non-malicious call (either from RPC) or a whitelisted node
    


    mzumsande commented at 10:12 pm on June 15, 2020:
    nit: move the ) to after “whitelisted node”
  71. in test/functional/p2p_getaddr_caching.py:26 in d1312914a0 outdated
    23+    wait_until
    24+)
    25+import time
    26+
    27+ADDRS = []
    28+for i in range(1000):
    


    mzumsande commented at 9:12 pm on June 16, 2020:
    Since all addresses go into the same bucket, I think that only 64 of them will be kept in addrman, 14 of which will be returned in response to each GETADDR - so creating 1000 addresses instead of fewer does not add randomness to the GETADDR responses - though it’s fine of course if you want to test the 1000-address limit for ADDR.

    naumenkogs commented at 7:22 am on June 17, 2020:
    Nice catch! Let’s keep it this way just in case, e.g., the bucketing logic is changed in the future.

    pinheadmz commented at 2:12 pm on July 1, 2020:
    Does it make sense to start off the node with > 1000 addresses? If I understand this test correctly, after the cache timeout the only thing that changes is the sequence of the same 1000 addresses in the message. I’m not sure if it’s worth it, but we could also ensure that different addresses were sent after the timeout.

    jnewbery commented at 5:30 pm on July 4, 2020:

    I think that @mzumsande is right. Because these go into the same bucket, only a small number of them will be kept in addrman, and only 23% of those will be returned by CAddrMan::GetAddr(). I haven’t verified the numbers 64 and 14, but that sounds right to me.

    I suspect that if we changed this to load >4000 addresses into addrman, the test would fail, because responses to GETADDR messages would be different (see my comments about how the cache interacts with vAddrToSend). We should make that change, and also fix the code so it does what we’re expecting!
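
    (Sanity check on those numbers, assuming a new-table bucket holds 64 entries and CAddrMan::GetAddr() returns 23% of addrman: 0.23 × 64 ≈ 14.7, matching the “64 kept, 14 returned” estimate above.)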

  72. mzumsande commented at 9:23 pm on June 16, 2020: member

    Concept ACK. Looks like a nice privacy gain without serious downsides.

    nit: src/test/fuzz/net_permissions.cpp could also be extended for PF_ADDR

  73. naumenkogs commented at 7:38 am on June 17, 2020: member
    Addressed feedback from @mzumsande: added PF_ADDR to the fuzzer and fixed the comment.
  74. naumenkogs force-pushed on Jun 17, 2020
  75. laanwj commented at 11:22 am on June 18, 2020: member

    I don’t touch that behaviour at all! I introduce two separate caches, but both of them are filled the same way as before. The separation is based on who requests (onion or regular), not the type of AddrMan records.

    Just a question: How do you recognize an incoming ONION connection? They look like incoming IPv4 connections that tend to come from 127.0.0.1. I think until #8973 is implemented, which would use a dedicated incoming P2P port for connections coming from Tor, it’s not possible to reliably distinguish them. Though maybe “comes from localhost” is good enough in practical cases, I don’t know.

    But as I see you currently use pfrom.addr.GetNetwork(); which will return IPv4 in this case?

  76. naumenkogs commented at 12:11 pm on June 18, 2020: member

    @laanwj good point! I actually don’t have much hands-on experience with Tor inbounds. I can’t verify what you say atm, but yeah, if they look like 127.0.0.1, they will be grouped with “comes from localhost” until #8973.

    I think it’s fine to merge it as-is, and then when #8973 comes the grouping can be improved.

  77. fjahr commented at 10:34 am on June 25, 2020: member

    I am not sure I understand how big the privacy gain is from this. Couldn’t the spy just take the ~1000 records we send them and then send all of them a GETADDR and still get a very good view of the topology of the network within a few minutes? Granted, they may still be missing a few of our connections after that. But then maybe a simpler approach would be to keep a few connections ‘secret’ and never announce them?

    Also, is it generally considered bad if a node wants to know about as many peers as possible? If I was a merchant or a miner/pool I might want to optimize my network view for non-malicious reasons if there is any (even very small) indication that I might make more money or be a little more secure.

  78. naumenkogs commented at 11:03 am on June 25, 2020: member

    Couldn’t the spy just take the ~1000 records we send them and then send all of them a GETADDR and still get a very good view of the topology of the network within a few minutes?

    I don’t know how to exploit what you’re describing :) Like, yeah, you send them all a GETADDR, but then what?

    I’m talking about a different set of attacks here. One example is mapping two virtual identities (Tor and ipv4) of the same node together. If you can scrape all their AddrMan records and compare timings, it’s very easy to tell whether they’re the same physical node. After my change, it’s much harder, because the two sets of records may not overlap at all. I have other attacks in mind, with which one can tell the direct peers of a node by looking at all the AddrMan records of a victim.

    Also, is it generally considered bad if a node wants to know about as many peers as possible?

    Well, if you want more security, you can query 100 random nodes x 1000 addrs and get the same full AddrMan list :) This is by all means better than relying on scraping full AddrMans from the first 8 peers you have. Maybe someone can implement this, but I’m not convinced it’s very useful :) Getting 8 x 1000 addrs at the beginning and catching up via other means later seems good enough. Furthermore, if one of your 8 peers fills your AddrMan with trash, you’re in trouble…

  79. MarcoFalke added the label Review club on Jun 25, 2020
  80. in src/net.cpp:2533 in 8e6ad248bf outdated
    2529@@ -2529,7 +2530,7 @@ std::vector<CAddress> CConnman::GetAddresses(Optional<Network> requestor_network
    2530 {
    2531     if (requestor_network) {
    2532         const std::chrono::microseconds current_time = GetTime<std::chrono::microseconds>();
    2533-        const Network network = requestor_network.get();
    2534+        const Network network = requestor_network.get_value_or(NET_UNROUTABLE);
    


    achow101 commented at 6:10 pm on June 25, 2020:
    This seems unnecessary. get_value_or only matters if the Optional is nullopt and a nullopt can’t reach this code path.
  81. fjahr commented at 5:16 pm on June 28, 2020: member

    I don’t know how to exploit what you’re describing :) Like, yeah, you send them all a GETADDR, but then what?

    I’m talking about a different set of attacks here. One example is mapping two virtual identities (Tor and ipv4) of the same node together. If you can scrape all their AddrMan records and compare timings, it’s very easy to tell whether they’re the same physical node. After my change, it’s much harder, because the two sets of records may not overlap at all. I have other attacks in mind, with which one can tell the direct peers of a node by looking at all the AddrMan records of a victim.

    I was going by the original description, which did not mention decorrelation between networks, I think. So I took that as not the primary goal of this PR. I do see network decorrelation as an improvement. I meant the other attacks that could be launched when an attacker has all the AddrMan records of the victim. Maybe I have a wrong sense of the typical network view of a node and it is more incomplete than I imagine, but my idea was that if the victim has a pretty complete view of the network, then launching the same attack with records collected from the other peers should not be that much different. Sure, it would be a bit less efficient, but that was basically my question: how much more protected is my privacy when the attacks are launched with the complete set of the victim’s peers vs. launching the same attack with a set of peers collected from the whole network.

    Based on that, it may depend on whether there are other solutions that could be compared with this one. For example, what if nodes ran getaddr more often, so that everyone has a mostly complete view of the network? Then fingerprinting is hard because everyone’s network view is the same. Of course, the downside is more traffic. Another solution could be keeping a small set of peers secret. I am not sure if that is better, I would just like to discuss it. :)

    Well, if you want more security, you can query 100 random nodes x 1000 addrs and get the same full AddrMan list :) This is by all means better than relying on scraping full AddrMans from first 8 peers you have. Maybe someone can implement this, but I’m not convinced it’s very useful :) Getting 8 x 1000 addrs at the beginning and catching up via other means later seems good enough. Furthermore, if one of your 8 peers fills your AddrMan with trash, you’re in trouble…

    True, it would make sense to ask more peers rather than just scraping one completely. I guess that was more of a devil’s advocate question :) I just wanted to mention that I think there is a small chance that some network participant calls getaddr multiple times without malicious intentions.

  82. naumenkogs commented at 6:10 am on June 29, 2020: member

    how much more protected is my privacy when the attacks are launched with the complete set of the victim’s peers vs. launching the same attack with a set of peers collected from the whole network.

    I see the confusion! In my opinion (and it seems to be the threat model other people share), just a list of all the nodes in the network (or all the nodes a victim knows) is not private info. Sure, it does help attacks too, but I doubt it’s practical to keep this list secret over time.

    This is what my original post says:

    Then, with some extra protocol knowledge a spy can infer the direct peers of the victim, and other topological stuff.

    What is dangerous is exposing the immediate peers of victims in the network. And to learn that, a spy would use somewhat more sophisticated observations based on the AddrMan. For example: constantly scrape it, detect a new node N in the list of a given target node T, and make a good guess that N just connected to T (because other nodes’ AddrMans don’t contain N).

    This is just one new example of an attack I never mentioned before! I have a couple of them in mind; I was just trying to avoid going into detail… So in this attack example, “keeping a small set of peers secret” won’t help. Better AddrMan sync would help more, but still not as much as caching (and yeah, bandwidth costs).

  83. achow101 commented at 7:24 pm on June 29, 2020: member
    ACK 52d22a3c235bfed8a772b7cccbe470efdfa10291
  84. in src/net.h:259 in 52d22a3c23 outdated
    251@@ -251,7 +252,12 @@ class CConnman
    252     void SetServices(const CService &addr, ServiceFlags nServices);
    253     void MarkAddressGood(const CAddress& addr);
    254     void AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddress& addrFrom, int64_t nTimePenalty = 0);
    255-    std::vector<CAddress> GetAddresses();
    256+
    257+    // Cache is used to minimize topology leaks, so it should
    258+    // be used for all non-trusted calls, for example, p2p.
    259+    // A non-malicious call (from RPC or a whitelisted node)
    260+    // should avoid using the cache by passing nullopt.
    


    andrewtoth commented at 3:05 am on June 30, 2020:
    Can you elaborate on why a whitelisted node should avoid using the cache? If there are no real downsides to caching for 24 hours then this shouldn’t matter, right?

    naumenkogs commented at 12:13 pm on June 30, 2020:

    Some hypothetical examples which might make sense in a trusted setting (which doesn’t really make sense in a public trust-minimized setting):

    • 2 remote nodes exchanging each other’s AddrMan and comparing, to make sure no poisoning is happening
    • one node learns the full AddrMan (or more than 1 cached list) from its trusted peer

    I don’t have a strong justification for this feature, but I thought it might be useful. Interested in other opinions!


    andrewtoth commented at 1:55 pm on June 30, 2020:
    Yeah, I don’t see any downsides to maintaining the current behaviour for trusted callers. I was just confused at seeing it in the implementation with no mention of it in the PR description or discussion.
  85. andrewtoth commented at 3:07 am on June 30, 2020: contributor
    Concept ACK.
  86. in src/net.h:260 in 52d22a3c23 outdated
    256+
    257+    // Cache is used to minimize topology leaks, so it should
    258+    // be used for all non-trusted calls, for example, p2p.
    259+    // A non-malicious call (from RPC or a whitelisted node)
    260+    // should avoid using the cache by passing nullopt.
    261+    std::vector<CAddress> GetAddresses(Optional<Network> requestor_network);
    


    jnewbery commented at 9:05 pm on June 30, 2020:

    It feels to me like this would be more natural as an overloaded function rather than passing an optional:

    • GetAddresses() returns an uncached vector of addresses from addrman as now
    • GetAddresses(Network) returns the cached vector of addresses for that network.

    Why? Because the logic for the two cases is entirely separate in the implementation, it avoids packing and unpacking the Network in an optional, and doing this means that some of the call sites don’t need to be changed.


    naumenkogs commented at 8:27 am on July 1, 2020:
    Agreed, will update.
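
    For reference, a minimal sketch of the overload split agreed on here (shapes assumed from the discussion, not the merged code):

    #include <vector>

    // Assumed shapes, for illustration only.
    enum Network { NET_IPV4, NET_IPV6, NET_ONION };
    struct CAddress {};

    class CConnman
    {
    public:
        //! Uncached: for trusted callers (RPC, peers with PF_ADDR permission).
        std::vector<CAddress> GetAddresses();
        //! Cached per requestor network: for untrusted p2p GETADDR handling.
        std::vector<CAddress> GetAddresses(Network requestor_network);
    };

    Call sites would then read connman->GetAddresses() for trusted callers and connman->GetAddresses(pfrom.addr.GetNetwork()) in the GETADDR handler.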
  87. in src/net.cpp:2532 in 52d22a3c23 outdated
    2525@@ -2525,9 +2526,20 @@ void CConnman::AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddres
    2526     addrman.Add(vAddr, addrFrom, nTimePenalty);
    2527 }
    2528 
    2529-std::vector<CAddress> CConnman::GetAddresses()
    2530+std::vector<CAddress> CConnman::GetAddresses(Optional<Network> requestor_network)
    2531 {
    2532-    return addrman.GetAddr();
    2533+    if (requestor_network) {
    2534+        const std::chrono::microseconds current_time = GetTime<std::chrono::microseconds>();
    


    jnewbery commented at 9:07 pm on June 30, 2020:
    This is a good place for auto. The return type of GetTime<std::chrono::microseconds>() is obvious, so there’s no need to type it out.
  88. in src/net.h:450 in 52d22a3c23 outdated
    434+    // Addr responses stored in different caches
    435+    // per network prevent cross-network node identification.
    436+    // If a node for example is multi-homed under Tor and IPv6,
    437+    // a single cache (or no cache at all) would let an attacker
    438+    // to easily detect that it is the same node by comparing responses.
    439+    std::map<Network, CachedAddrResponse> m_addr_response_caches;
    


    jnewbery commented at 9:09 pm on June 30, 2020:
    What are the additional memory requirements for this structure? Could you add that to the comment?
  89. in src/net_processing.cpp:3341 in 52d22a3c23 outdated
    3337+            Network peer_network = pfrom.addr.GetNetwork();
    3338+            vAddr = connman->GetAddresses(peer_network);
    3339+        }
    3340         FastRandomContext insecure_rand;
    3341         for (const CAddress &addr : vAddr) {
    3342             if (!banman->IsBanned(addr)) {
    


    jnewbery commented at 9:21 pm on June 30, 2020:
    Perhaps this filtering out of banned addresses shouldn’t happen if we’re sending from the cache. If an address in the cache is banned during the cache’s lifetime, subsequent GETADDR requests will receive a different response, which could be a marginal privacy leak.

    jnewbery commented at 2:22 pm on July 1, 2020:
    In fact, I think this filtering out of banned addresses should move to CConnman::GetAddresses(). The only other client of that function is getnodeaddresses in rpc/net, and that probably doesn’t want to know about banned addresses either.
  90. in src/net_processing.cpp:3337 in 52d22a3c23 outdated
    3333+        std::vector<CAddress> vAddr;
    3334+        if (pfrom.HasPermission(PF_ADDR)) {
    3335+            vAddr = connman->GetAddresses(nullopt);
    3336+        } else {
    3337+            Network peer_network = pfrom.addr.GetNetwork();
    3338+            vAddr = connman->GetAddresses(peer_network);
    


    jnewbery commented at 9:23 pm on June 30, 2020:
    Merge these two lines? No need for a local peer_network variable (the compiled code will be exactly the same, but creating a local variable that won’t be used again signals the wrong intent in the code)
  91. in test/functional/p2p_getaddr_caching.py:7 in 52d22a3c23 outdated
    0@@ -0,0 +1,107 @@
    1+#!/usr/bin/env python3
    2+# Copyright (c) 2020 The Bitcoin Core developers
    3+# Distributed under the MIT software license, see the accompanying
    4+# file COPYING or http://www.opensource.org/licenses/mit-license.php.
    5+"""
    6+Test addr response caching
    7+"""
    


    jnewbery commented at 9:23 pm on June 30, 2020:
    nit: docstrings should be one line if there’s only one line of text. Join these lines.
  92. in test/functional/p2p_getaddr_caching.py:78 in 52d22a3c23 outdated
    73+        N = 5
    74+        cur_mock_time = int(time.time())
    75+        for i in range(N):
    76+            addr_receiver = self.nodes[0].add_p2p_connection(AddrReceiver())
    77+            with self.nodes[0].assert_debug_log([
    78+                    "received: getaddr (0 bytes) peer={}".format(i+1),
    


    jnewbery commented at 9:24 pm on June 30, 2020:
    nit: spaces around +. Same below in line 95.
  93. in test/functional/p2p_getaddr_caching.py:32 in 52d22a3c23 outdated
    27+ADDRS = []
    28+for i in range(1000):
    29+    addr = CAddress()
    30+    addr.time = int(time.time())
    31+    addr.nServices = NODE_NETWORK | NODE_WITNESS
    32+    third_octet = int(i / 256) % 256
    


    jnewbery commented at 9:27 pm on June 30, 2020:
    prefer (i >> 8) % 256 to avoid having to convert between ints and floats (and to better signal that you’re removing the bottom 8 bits for the third section of the IP address). Better yet: i >> 8, since your loop doesn’t go over 256^2.
  94. in test/functional/p2p_getaddr_caching.py:65 in 52d22a3c23 outdated
    60+        addr_source = self.nodes[0].add_p2p_connection(P2PInterface())
    61+        msg_send_addrs = msg_addr()
    62+
    63+        self.log.info('Fill peer AddrMan with a lot of records')
    64+        msg_send_addrs.addrs = ADDRS
    65+        with self.nodes[0].assert_debug_log([
    


    jnewbery commented at 9:52 pm on June 30, 2020:
    I have a general distaste for asserting on log contents (logs aren’t part of a stable API, so they can change between releases, and it’s annoying if that causes tests to break). I suggest just removing the assert_debug_log().
  95. in test/functional/p2p_getaddr_caching.py:77 in 52d22a3c23 outdated
    72+        self.log.info('Send many addr requests within short time to receive same response')
    73+        N = 5
    74+        cur_mock_time = int(time.time())
    75+        for i in range(N):
    76+            addr_receiver = self.nodes[0].add_p2p_connection(AddrReceiver())
    77+            with self.nodes[0].assert_debug_log([
    


    jnewbery commented at 9:53 pm on June 30, 2020:
    Again, there’s no need for this assert_debug_log(). You’re already asserting on the contents of the response.
  96. in test/functional/p2p_getaddr_caching.py:82 in 52d22a3c23 outdated
    77+            with self.nodes[0].assert_debug_log([
    78+                    "received: getaddr (0 bytes) peer={}".format(i+1),
    79+            ]):
    80+                addr_receiver.send_and_ping(msg_getaddr())
    81+                # Trigger response
    82+                cur_mock_time += 5 * 60
    


    jnewbery commented at 9:56 pm on June 30, 2020:
    Why is this change in mocktime between the send and the wait_until? Presumably it can go afterwards just as easily? Splitting up the send and the wait_until makes this marginally more difficult to follow.

    naumenkogs commented at 9:09 am on July 4, 2020:
    See AVG_ADDRESS_BROADCAST_INTERVAL; this code is necessary to bypass it (same below). Maybe I don’t understand your suggestion.

    jnewbery commented at 4:22 pm on July 4, 2020:
    Ah, you’re right. I misunderstood how the ADDR response is batched/delayed.
  97. in test/functional/p2p_getaddr_caching.py:84 in 52d22a3c23 outdated
    79+            ]):
    80+                addr_receiver.send_and_ping(msg_getaddr())
    81+                # Trigger response
    82+                cur_mock_time += 5 * 60
    83+                self.nodes[0].setmocktime(cur_mock_time)
    84+                wait_until(addr_receiver.received_addr, timeout=30, lock=mininode_lock)
    


    jnewbery commented at 9:56 pm on June 30, 2020:
    no need for custom timeouts here.
  98. in test/functional/p2p_getaddr_caching.py:100 in 52d22a3c23 outdated
     95+                "received: getaddr (0 bytes) peer={}".format(N+1),
     96+        ]):
     97+            last_addr_receiver.send_and_ping(msg_getaddr())
     98+            # Trigger response
     99+            cur_mock_time += 5 * 60
    100+            self.nodes[0].setmocktime(cur_mock_time)
    


    jnewbery commented at 9:57 pm on June 30, 2020:
    I don’t think this setmocktime is necessary.

    jnewbery commented at 4:23 pm on July 4, 2020:
    As above, this is incorrect. Disregard.
  99. in src/test/fuzz/net_permissions.cpp:27 in 52d22a3c23 outdated
    22@@ -23,6 +23,7 @@ void test_one_input(const std::vector<uint8_t>& buffer)
    23                                                                                              NetPermissionFlags::PF_FORCERELAY,
    24                                                                                              NetPermissionFlags::PF_NOBAN,
    25                                                                                              NetPermissionFlags::PF_MEMPOOL,
    26+                                                                                             NetPermissionFlags::PF_ADDR,
    


    jnewbery commented at 9:58 pm on June 30, 2020:
    This change can be squashed into commit Add whitelist flag enabling non-cached addr sharing
  100. jnewbery commented at 9:59 pm on June 30, 2020: member

    Concept ACK, but I think I need a bit more time to think about the implications.

    I’ve left some style suggestions in the meantime.

  101. pinheadmz commented at 1:46 pm on July 1, 2020: member

    Dunno if this has been mentioned already, but if you rebase to include #19192 you can add a help description of the new net permission addr.

    https://github.com/bitcoin/bitcoin/blob/bb588669f9e84011969b67f807f12c3480489955/src/net_permissions.cpp#L11-L17

  102. willcl-ark commented at 2:09 pm on July 1, 2020: member

    tACK.

    PR passed tests with MacOS 10.15 on both this branch and cherry-picked on top of master at 2af56d6d5c387c3208d3d5aae8d428a3d610446f.

    I like the general approach of the PR: there seems to be little danger associated with a rolling ADDR cache, and certainly some possible privacy upside if the assumptions in the first post (e.g. being able to infer direct peers) are correct.

  103. in src/net.cpp:2531 in 52d22a3c23 outdated
    2525@@ -2525,9 +2526,20 @@ void CConnman::AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddres
    2526     addrman.Add(vAddr, addrFrom, nTimePenalty);
    2527 }
    2528 
    2529-std::vector<CAddress> CConnman::GetAddresses()
    2530+std::vector<CAddress> CConnman::GetAddresses(Optional<Network> requestor_network)
    2531 {
    2532-    return addrman.GetAddr();
    2533+    if (requestor_network) {
    


    pinheadmz commented at 2:33 pm on July 1, 2020:
    This might be an insane edge case, but if pfrom.addr.GetNetwork() is NET_UNROUTABLE, the value will be 0 (and falsey?) and break the intent of this logic. I’m not sure how an attacker could use an unroutable address to break your cache, though, so it’s probably moot.

    naumenkogs commented at 2:40 pm on July 1, 2020:
    Are you sure Optional(0) would be false? It would be useful to double-check the optional spec, but I expect that any value present (including 0) gives true.

    pinheadmz commented at 3:21 pm on July 1, 2020:
    Confirmed, you are right. Thanks!
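
    A quick standalone check of this (using std::optional; Optional in the codebase was an alias for boost::optional at the time, which I assume has the same engaged-vs-value semantics):

    #include <cassert>
    #include <optional>

    enum Network { NET_UNROUTABLE = 0, NET_IPV4 = 1 };

    int main()
    {
        std::optional<Network> net = NET_UNROUTABLE;  // engaged, holding the value 0
        assert(net);                      // operator bool tests engagement, not the value
        assert(*net == NET_UNROUTABLE);   // the contained value is still 0
        std::optional<Network> empty;     // disengaged
        assert(!empty);                   // false only when no value is present
    }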
  104. rajarshimaitra commented at 3:03 pm on July 1, 2020: contributor
    tACK. Compiled and tests passing. I need to think more about attack surfaces, but so far, given the above discussion from more competent people, it seems to be fine. And the gains are rational. My only thought so far is that there is replication of addresses happening in the cache per network type. How much memory does this cache consume? It seems to be at least 3 times (ipv4/ipv6/local+tor) whatever it was before? Is there any measurement of the number?
  105. troygiorshev commented at 5:03 pm on July 1, 2020: contributor

    ACK 52d22a3c235bfed8a772b7cccbe470efdfa10291

    Reviewed, tested.

    Without checking, I’m not worried about the memory usage for m_addr_response_caches.

    I need to look further at the justification of 24h and 1000 cached addresses.

    I think this change may have larger (positive) consequences than described in the description, though I would like to make that more precise.

    Some interesting, if slightly outdated papers on this topic:

  106. Emzy commented at 6:19 pm on July 1, 2020: contributor

    My concern would be nodes that change their IP address often. For example, in Germany it is often the case that the ISP will give you a new IPv4 address and a new IPv6 prefix every 24 hours.

    I think these nodes would get fewer incoming connections, because it will take between 0 and 24 hours for their new address to reach the P2P network and the DNS seeds.

    It could be that this is not relevant, because dial-in users probably don’t allow incoming connections most of the time. But it would further discourage running a node at home.

  107. naumenkogs force-pushed on Jul 4, 2020
  108. naumenkogs force-pushed on Jul 4, 2020
  109. in src/net.cpp:2534 in a5131685d1 outdated
    2527@@ -2527,7 +2528,24 @@ void CConnman::AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddres
    2528 
    2529 std::vector<CAddress> CConnman::GetAddresses()
    2530 {
    2531-    return addrman.GetAddr();
    2532+    std::vector<CAddress> addresses;
    2533+    for (CAddress addr: addrman.GetAddr()) {
    2534+        if (!m_banman->IsBanned(addr)) {
    2535+            addresses.push_back(addr);
    


    jnewbery commented at 4:41 pm on July 4, 2020:

    This is less efficient than the current code because you’re creating a new vector and individually copying elements to it. This is also going to involve a lot of reallocations, since addresses can potentially grow as large as 2500.

    Instead, assign addresses to be the return value from GetAddr() (which should be a move because of RVO), and then use the remove-erase idiom to delete the unwanted elements from addresses. See https://github.com/bitcoin/bitcoin/blob/5ec19df687c2a429dd8872d640ce88fdc06ded9b/src/net.cpp#L917 for an example of that in our codebase.


    naumenkogs commented at 7:53 am on July 7, 2020:
    Right, I was considering this option, but for some reason I thought it was even more expensive… Now that I re-read this explainer, it actually makes sense. Thanks!
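
    For reference, a self-contained sketch of the remove-erase approach described above (the stubs stand in for addrman.GetAddr() and m_banman->IsBanned(); names are illustrative):

    #include <algorithm>
    #include <string>
    #include <vector>

    using CAddress = std::string;  // stand-in
    bool IsBannedStub(const CAddress& addr) { return addr == "banned"; }
    std::vector<CAddress> GetAddrManAddressesStub() { return {"a", "banned", "b"}; }

    std::vector<CAddress> GetAddresses()
    {
        // Take the whole vector by value (a move thanks to RVO), then
        // remove-erase the banned entries in place in a single pass.
        std::vector<CAddress> addresses = GetAddrManAddressesStub();
        addresses.erase(std::remove_if(addresses.begin(), addresses.end(),
                                       [](const CAddress& a) { return IsBannedStub(a); }),
                        addresses.end());
        return addresses;
    }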
  110. in src/net.h:256 in a5131685d1 outdated
    252@@ -252,6 +253,11 @@ class CConnman
    253     void MarkAddressGood(const CAddress& addr);
    254     void AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddress& addrFrom, int64_t nTimePenalty = 0);
    255     std::vector<CAddress> GetAddresses();
    256+    // Cache is used to minimize topology leaks, so it should
    


    jnewbery commented at 4:42 pm on July 4, 2020:
    nit: prefer to use doxygen-style comments for function declaration comments.
  111. in src/net.h:259 in a5131685d1 outdated
    252@@ -252,6 +253,11 @@ class CConnman
    253     void MarkAddressGood(const CAddress& addr);
    254     void AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddress& addrFrom, int64_t nTimePenalty = 0);
    255     std::vector<CAddress> GetAddresses();
    256+    // Cache is used to minimize topology leaks, so it should
    257+    // be used for all non-trusted calls, for example, p2p.
    258+    // A non-malicious call (from RPC or a whitelisted node)
    259+    // should avoid using the cache by passing nullopt.
    


    jnewbery commented at 4:44 pm on July 4, 2020:
    This comment about nullopt is outdated
  112. in src/net.h:445 in a5131685d1 outdated
    434+    // Addr responses stored in different caches
    435+    // per network prevent cross-network node identification.
    436+    // If a node for example is multi-homed under Tor and IPv6,
    437+    // a single cache (or no cache at all) would let an attacker
    438+    // to easily detect that it is the same node by comparing responses.
    439+    // The used memory equals to 1000 CAddress records (or around 32 bytes) per
    


    jnewbery commented at 4:49 pm on July 4, 2020:
    Where did you get 1000 from? The maximum number of CAddress objects returned by CAddrMan::GetAddr() is ADDRMAN_GETADDR_MAX (2500)

    naumenkogs commented at 8:40 am on July 7, 2020:
    You are right, the 1000 limit is applied later: the receiver rejects ADDR messages larger than 1,000 records.
  113. in src/net.cpp:2550 in a5131685d1 outdated
    2544+    if (m_addr_response_caches.find(requestor_network) == m_addr_response_caches.end() ||
    2545+        m_addr_response_caches[requestor_network].m_update_addr_response < current_time) {
    2546+        m_addr_response_caches[requestor_network].m_addrs_response_cache = GetAddresses();
    2547+        m_addr_response_caches[requestor_network].m_update_addr_response = current_time + std::chrono::hours(21) + GetRandMillis(std::chrono::hours(6));
    2548+    }
    2549+    return m_addr_response_caches[requestor_network].m_addrs_response_cache;
    


    jnewbery commented at 4:56 pm on July 4, 2020:
    It’s a bit of a shame that the return has to copy the vector rather than e.g. returning a const reference. But it’s a lot better than the current code, which iterates through the entire AddrMan on each GetAddresses() call. A follow-up PR could potentially make this more efficient.
  114. in src/net_processing.cpp:3488 in a5131685d1 outdated
    3336         FastRandomContext insecure_rand;
    3337         for (const CAddress &addr : vAddr) {
    3338-            if (!banman->IsBanned(addr)) {
    3339-                pfrom.PushAddress(addr, insecure_rand);
    3340-            }
    3341+            pfrom.PushAddress(addr, insecure_rand);
    


    jnewbery commented at 5:13 pm on July 4, 2020:

    Well this is fascinating. Because we call PushAddress() serially with the elements of the vector in order, and each call after vAddrToSend is full evicts a random element, repeatedly sending GETADDR will result in ADDR responses that leak the entire m_addr_response_caches[requestor_network].m_addrs_response_cache and the order of elements in that vector. I don’t think that’s a problem, because the cache is basically public data, but it’s an interesting outcome of the way vAddrToSend is constructed.

    We should probably limit CachedAddrResponse.m_addrs_response_cache to 1000 elements, possibly by adding a max_elements value to the CAddrMan::GetAddr() function. Sorry to keep asking you to go deeper into CAddrMan!
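
    A simplified sketch of that eviction behavior (approximating CNode::PushAddress with stand-in types; not the project’s exact code):

        #include <cstddef>
        #include <random>
        #include <vector>

        constexpr std::size_t MAX_ADDR_TO_SEND = 1000;

        void PushAddressSketch(std::vector<int>& vAddrToSend, int addr, std::mt19937& rng)
        {
            if (vAddrToSend.size() >= MAX_ADDR_TO_SEND) {
                // Buffer full: overwrite a uniformly random slot, so each later
                // element displaces an earlier one with probability 1/1000.
                std::uniform_int_distribution<std::size_t> pick(0, vAddrToSend.size() - 1);
                vAddrToSend[pick(rng)] = addr;
            } else {
                vAddrToSend.push_back(addr);
            }
        }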


    naumenkogs commented at 10:25 am on July 7, 2020:
    Can we discuss this a bit more? I don’t mind adding that code, I just don’t see the motivation. As you point out, this particular leak doesn’t matter, because the cache is public data. Why limit it to 1000? Maybe another motivation is saving 300 KB of RAM?

    naumenkogs commented at 10:46 am on July 7, 2020:
    As someone points out, there’s a slight non-useful inconsistency. Do you think we should address it within this PR?

    jnewbery commented at 4:11 pm on July 7, 2020:

    I just don’t see the motivation

    As far as I can tell, the intent for this PR was that for each network, we’d create a cache of 1000 addresses, and serve the same list of 1000 addresses (less banned/discouraged) in response to each GETADDR request. Indeed, that’s what the functional test is testing for. Instead, we create a cache of 2500, and serve a random 1000 of them to each GETADDR request, with the following distribution:

    [figure: per-index probability that a cached address appears in the 1000-address ADDR response]

    (i.e. the last address record in the cache always appears in the ADDR response, whereas each of the first 1000 appears with probability ~0.22, since it must survive 1500 subsequent random evictions from the 1000-slot vector: (999/1000)^1500 ≈ 0.22)

    That’s very unexpected! The fact that neither the author nor any of the reviewers realised that is motivation enough to make the code easier to understand.
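
    A quick self-contained simulation (illustrative only, not project code) that reproduces this distribution by pushing a 2500-entry cache through a 1000-slot buffer with random eviction:

        #include <cstdio>
        #include <random>
        #include <vector>

        int main()
        {
            constexpr int kCache = 2500, kBuf = 1000, kTrials = 10000;
            std::mt19937 rng{42};
            std::vector<int> hits(kCache, 0); // how often each source index survives
            for (int t = 0; t < kTrials; ++t) {
                std::vector<int> buf;
                for (int i = 0; i < kCache; ++i) {
                    if (static_cast<int>(buf.size()) >= kBuf) {
                        // random eviction once the buffer is full
                        buf[std::uniform_int_distribution<int>(0, kBuf - 1)(rng)] = i;
                    } else {
                        buf.push_back(i);
                    }
                }
                for (int i : buf) ++hits[i];
            }
            // Expect ~0.22 for index 0 ((999/1000)^1500) and 1.0 for the last index.
            std::printf("P(first)=%.3f P(last)=%.3f\n",
                        hits[0] / double(kTrials), hits[kCache - 1] / double(kTrials));
        }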


    naumenkogs commented at 4:39 pm on July 7, 2020:
    Right, I guess I’ll put together a separate commit just to address 2500/1000 confusion we had even before this PR

    jnewbery commented at 4:51 pm on July 7, 2020:

    I’ll put together a separate commit just to address 2500/1000 confusion

    That’d be great. I think it can be a separate PR. My suggestion: remove the ADDRMAN_GETADDR_MAX_PCT and ADDRMAN_GETADDR_MAX constants from addrman and make it the caller’s responsibility to set max records/percentage values. Net processing can use 1000/23 and rpc/net can set no max, potentially exposing all address records in addrman to the client.
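
    A hedged sketch of what that caller-supplied-limits interface might look like (hypothetical names, simplified types; treating 0 as “no limit” is an assumption, not the project’s API):

        #include <algorithm>
        #include <cstddef>
        #include <vector>

        // The caller supplies both caps instead of addrman hard-coding them.
        std::vector<int> GetAddrSketch(const std::vector<int>& all,
                                       std::size_t max_pct, std::size_t max_addresses)
        {
            std::size_t n = all.size();
            if (max_pct != 0) n = std::min(n, all.size() * max_pct / 100);
            if (max_addresses != 0) n = std::min(n, max_addresses);
            return std::vector<int>(all.begin(), all.begin() + static_cast<std::ptrdiff_t>(n));
        }

        // Net processing would call GetAddrSketch(records, /*max_pct=*/23, /*max_addresses=*/1000);
        // rpc/net would call GetAddrSketch(records, 0, 0) to expose all records.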


    sipa commented at 7:40 pm on July 7, 2020:
    @jnewbery I think that’s the right approach.

    naumenkogs commented at 8:45 am on July 8, 2020:
    Added the basic fix and updated the test, leaving moving responsibilities for a future PR.
  115. naumenkogs force-pushed on Jul 7, 2020
  116. DrahtBot added the label Needs rebase on Jul 7, 2020
  117. naumenkogs force-pushed on Jul 8, 2020
  118. DrahtBot removed the label Needs rebase on Jul 8, 2020
  119. in src/net_processing.cpp:2446 in 8934f6051a outdated
    2442@@ -2443,7 +2443,7 @@ void ProcessMessage(
    2443         vRecv >> vAddr;
    2444 
    2445         // Don't want addr from older versions unless seeding
    2446-        if (pfrom.nVersion < CADDR_TIME_VERSION && connman->GetAddressCount() > 1000)
    2447+        if (pfrom.nVersion < CADDR_TIME_VERSION && connman->GetAddressCount() > ADDRMAN_GETADDR_MAX)
    


    jnewbery commented at 2:00 pm on July 8, 2020:
    I think it’d be more appropriate to use MAX_ADDR_TO_SEND as the constant here (and below)
  120. in src/addrman.h:160 in 8934f6051a outdated
    156@@ -157,7 +157,7 @@ class CAddrInfo : public CAddress
    157 #define ADDRMAN_GETADDR_MAX_PCT 23
    158 
    159 //! the maximum number of nodes to return in a getaddr call
    160-#define ADDRMAN_GETADDR_MAX 2500
    161+#define ADDRMAN_GETADDR_MAX 1000
    


    jnewbery commented at 2:21 pm on July 8, 2020:
    Maybe add a static_assert in net.h that MAX_ADDR_TO_SEND is the same as ADDRMAN_GETADDR_MAX (with a TODO to remove ADDRMAN_GETADDR_MAX)

    ariard commented at 0:16 am on August 3, 2020:
    I think the commit message could be clearer: it doesn’t remove the limit on AddrMan queries, rather it bounds it to the maximum size of an ADDR message?
  121. in src/net.h:427 in 8934f6051a outdated
    423@@ -416,6 +424,25 @@ class CConnman
    424     std::atomic<NodeId> nLastNodeId{0};
    425     unsigned int nPrevNodeCount{0};
    426 
    427+    // Cache responses to addr requests to minimize privacy leak.
    


    jnewbery commented at 2:23 pm on July 8, 2020:
    nit: please doxygen-format these comments too

    jnewbery commented at 9:51 am on July 16, 2020:
    and the one below please!
  122. in test/functional/p2p_getaddr_caching.py:76 in 8934f6051a outdated
    63+        msg_send_addrs = msg_addr()
    64+        self.log.info('Fill peer AddrMan with a lot of records')
    65+        addrs = gen_addrs(10000)
    66+        for i in range(10):
    67+            msg_send_addrs.addrs = addrs[i * 1000:(i + 1) * 1000]
    68+            addr_source.send_and_ping(msg_send_addrs)
    


    jnewbery commented at 2:39 pm on July 8, 2020:
    This doesn’t appear to fill the addrman with 10,000 address records. If I run the getnodeaddresses RPC after this repeatedly and keep a set of all of the address records returned, I only get to around 3500 (it varies on each test run).

    naumenkogs commented at 8:38 pm on July 8, 2020:
    Sure, but my goal here was to make it more than 1,000; just in case. Any particular suggestion?

    jnewbery commented at 8:59 pm on July 8, 2020:

    Because of the limit of returning 23% of records in GetAddr(), there need to be ~4350 records (1000 / 0.23 ≈ 4348) for the cache to be filled with 1000 records.

    I haven’t looked into how CAddress records in ADDR messages are saved to AddrMan, so I can’t offer any insight into why only ~3500 are being saved here.


    naumenkogs commented at 9:29 pm on July 8, 2020:

    Sorry, I’m still not sure what we’re trying to accomplish here… I think as long as we use 1,000 and not 2,500 for the max cache size, any actual cache size (less than or equal to 1,000) is fine.

    I just chose the 10,000 value so that it’s big enough, just to hit some potential corner case. I don’t see a real reason to go beyond that (for example, to guarantee a cache with the full 1,000 records). I can probably achieve that, but right now I just don’t see the motivation.


    jnewbery commented at 5:27 am on July 9, 2020:

    The motivation is that the tests should test the limits of the new functionality. The cache can have up to 1000 entries in it. What’s the behavior like when that cache is full?

    The description of this PR says:

    cache responses to GETADDR on a daily basis, so that an attacker gets at most 1,000 records per day

    The test should test that an attacker is actually only able to get 1,000 records per day.


    naumenkogs commented at 8:52 am on July 9, 2020:

    I updated the test and I think these 2 checks are sort of sufficient here:

    1. All responses during the day are the same.
    2. They contain fewer than 1,000 records.

    I think (1) is much more interesting than (2), and the second part of the test confirms that the cache behind (1) lasts only a day.

    I’ll take a look later at why the cache size isn’t exactly 1,000, but I still think it’s not that substantial.


    naumenkogs commented at 1:31 pm on July 14, 2020:

    Alright, so the issue is that we map every source of ADDRs to 64 buckets with 64 slots each, so one source can add at most 64×64 = 4096 records. This results in at most 4096 × 0.23 ≈ 942 records returned from AddrMan.

    As far as I know, there is currently no good way to work around this per-source limit while testing, so leaving it as is for now. As I mention above, I think this is not critical for this test.


    jnewbery commented at 4:48 pm on July 14, 2020:
    ok, sounds reasonable. It’s a shame that our testing framework won’t allow us to fully test the edges of this, but that shouldn’t hold this PR up.
  123. jnewbery commented at 2:39 pm on July 8, 2020: member
    Concept ACK doing the simple thing here and fixing the addrman interface properly in a separate PR.
  124. naumenkogs force-pushed on Jul 9, 2020
  125. naumenkogs force-pushed on Jul 9, 2020
  126. DrahtBot added the label Needs rebase on Jul 9, 2020
  127. naumenkogs force-pushed on Jul 14, 2020
  128. in src/net.cpp:2534 in 18b568a8ac outdated
    2528@@ -2528,7 +2529,22 @@ void CConnman::AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddres
    2529 
    2530 std::vector<CAddress> CConnman::GetAddresses()
    2531 {
    2532-    return addrman.GetAddr();
    2533+    std::vector<CAddress> addresses = addrman.GetAddr();
    2534+    addresses.erase(std::remove_if(addresses.begin(), addresses.end(),
    2535+                    [this](CAddress const &addr){return m_banman->IsDiscouraged(addr) || m_banman->IsBanned(addr);}),
    


    jnewbery commented at 3:42 pm on July 14, 2020:
    This should check whether m_banman exists before dereferencing.
  129. naumenkogs force-pushed on Jul 15, 2020
  130. naumenkogs commented at 8:48 am on July 15, 2020: member
    I believe I resolved all the outstanding feedback (some things would be better done in a follow-up); looking for code ACKs now :)
  131. naumenkogs force-pushed on Jul 15, 2020
  132. jnewbery commented at 9:54 am on July 15, 2020: member
    netbase_tests/netpermissions_test is failing: check strings.size() == 6U has failed [7 != 6]
  133. in src/net.cpp:980 in 79bf91f88f outdated
    976@@ -977,6 +977,7 @@ void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
    977         if (gArgs.GetBoolArg("-whitelistrelay", DEFAULT_WHITELISTRELAY)) NetPermissions::AddFlag(permissionFlags, PF_RELAY);
    978         NetPermissions::AddFlag(permissionFlags, PF_MEMPOOL);
    979         NetPermissions::AddFlag(permissionFlags, PF_NOBAN);
    980+        NetPermissions::AddFlag(permissionFlags, PF_ADDR);
    


    MarcoFalke commented at 10:23 am on July 15, 2020:
    why extend this further? legacy should die
  134. MarcoFalke added the label Needs release note on Jul 15, 2020
  135. DrahtBot removed the label Needs rebase on Jul 15, 2020
  136. naumenkogs force-pushed on Jul 15, 2020
  137. naumenkogs force-pushed on Jul 15, 2020
  138. naumenkogs force-pushed on Jul 15, 2020
  139. naumenkogs force-pushed on Jul 15, 2020
  140. in src/net.h:261 in fe7d2fbc63 outdated
    254@@ -251,6 +255,13 @@ class CConnman
    255     void MarkAddressGood(const CAddress& addr);
    256     void AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddress& addrFrom, int64_t nTimePenalty = 0);
    257     std::vector<CAddress> GetAddresses();
    258+    /**
    259+     * Cache is used to minimize topology leaks, so it should
    260+     * be used for all non-trusted calls, for example, p2p.
    261+     * A non-malicious call (from RPC or a whitelisted peer) should
    


    MarcoFalke commented at 4:37 pm on July 15, 2020:
    whitelist is deprecated, please mention the exact permission flag. See also #19474
  141. naumenkogs force-pushed on Jul 15, 2020
  142. DrahtBot added the label Needs rebase on Jul 16, 2020
  143. naumenkogs force-pushed on Jul 16, 2020
  144. DrahtBot removed the label Needs rebase on Jul 16, 2020
  145. in src/net.h:445 in 8ff4c6754d outdated
    440+    // Addr responses stored in different caches
    441+    // per network prevent cross-network node identification.
    442+    // If a node for example is multi-homed under Tor and IPv6,
    443+    // a single cache (or no cache at all) would let an attacker
    444+    // to easily detect that it is the same node by comparing responses.
    445+    // The used memory equals to 2500 CAddress records (or around 80 bytes) per
    


    jnewbery commented at 9:51 am on July 16, 2020:
    This is incorrect. It should be 1000 CAddress records.
  146. in test/functional/p2p_getaddr_caching.py:72 in 8ff4c6754d outdated
    62+        self.log.info('Create connection that sends and requests addr messages')
    63+        addr_source = self.nodes[0].add_p2p_connection(P2PInterface())
    64+
    65+        msg_send_addrs = msg_addr()
    66+        self.log.info('Fill peer AddrMan with a lot of records')
    67+        total_addrs = 10000
    


    jnewbery commented at 9:56 am on July 16, 2020:
    I think it’s worth a comment here explaining #18991 (review), otherwise people reading this test will expect there to be 10,000 entries in addrman.
  147. in test/functional/p2p_getaddr_caching.py:23 in 8ff4c6754d outdated
    18+from test_framework.test_framework import BitcoinTestFramework
    19+from test_framework.util import (
    20+    assert_equal,
    21+    wait_until
    22+)
    23+import time
    


    jnewbery commented at 9:57 am on July 16, 2020:
    Place standard library imports before local project imports.
  148. in test/functional/p2p_getaddr_caching.py:84 in 8ff4c6754d outdated
    79+            addr_receiver.send_and_ping(msg_getaddr())
    80+            # Trigger response
    81+            cur_mock_time += 5 * 60
    82+            self.nodes[0].setmocktime(cur_mock_time)
    83+            wait_until(addr_receiver.received_addr, lock=mininode_lock)
    84+            responses.append(addr_receiver.received_addrs)
    


    jnewbery commented at 10:00 am on July 16, 2020:
    You should really take mininode_lock before reading this field (or better - add a getter method to AddrReceiver that takes the lock internally and returns a list)
  149. in test/functional/p2p_getaddr_caching.py:100 in 8ff4c6754d outdated
     95+        # Trigger response
     96+        cur_mock_time += 5 * 60
     97+        self.nodes[0].setmocktime(cur_mock_time)
     98+        wait_until(last_addr_receiver.received_addr, lock=mininode_lock)
     99+        # new response is different
    100+        assert(set(responses[0]) != set(last_addr_receiver.received_addrs))
    


    jnewbery commented at 10:00 am on July 16, 2020:
    as above, this requires mininode_lock
  150. jnewbery commented at 10:01 am on July 16, 2020: member

    utACK 8ff4c6754dce3b679d928265eb04a6b98fe9a8d7

    A few minor comments inline.

  151. naumenkogs force-pushed on Jul 16, 2020
  152. jnewbery commented at 4:55 pm on July 21, 2020: member
    Changes look good but p2p_permissions.py seems to be broken.
  153. naumenkogs force-pushed on Jul 22, 2020
  154. jnewbery commented at 9:38 am on July 23, 2020: member
    utACK 130655a09054e7fe4388a4383ed320f4ce7de103
  155. jnewbery commented at 5:34 pm on July 23, 2020: member
    @naumenkogs - I have a branch at https://github.com/jnewbery/bitcoin/tree/2020-07-addrman-get that allows addresses to be added to AddrMan manually over RPC. With that branch, I’m very easily able to add thousands of addresses, which would make it possible to test the edge conditions of this PR. It conflicts with this branch, so I’m not planning to open it as a PR yet, but perhaps you’d like to take it after this gets merged.
  156. in test/functional/p2p_blocksonly.py:71 in 130655a090 outdated
    64@@ -65,10 +65,10 @@ def run_test(self):
    65         second_peer = self.nodes[0].add_p2p_connection(P2PInterface())
    66         peer_1_info = self.nodes[0].getpeerinfo()[0]
    67         assert_equal(peer_1_info['whitelisted'], True)
    68-        assert_equal(peer_1_info['permissions'], ['noban', 'forcerelay', 'relay', 'mempool', 'download'])
    69+        assert_equal(peer_1_info['permissions'], ['noban', 'forcerelay', 'relay', 'mempool', "download"])
    70         peer_2_info = self.nodes[0].getpeerinfo()[1]
    71         assert_equal(peer_2_info['whitelisted'], True)
    72-        assert_equal(peer_2_info['permissions'], ['noban', 'forcerelay', 'relay', 'mempool', 'download'])
    73+        assert_equal(peer_2_info['permissions'], ['noban', 'forcerelay', 'relay', 'mempool', "download"])
    


    MarcoFalke commented at 6:10 pm on July 23, 2020:
    why is this changed?
  157. in src/net.cpp:2534 in 130655a090 outdated
    2527@@ -2528,7 +2528,24 @@ void CConnman::AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddres
    2528 
    2529 std::vector<CAddress> CConnman::GetAddresses()
    2530 {
    2531-    return addrman.GetAddr();
    2532+    std::vector<CAddress> addresses = addrman.GetAddr();
    2533+    if (m_banman) {
    2534+        addresses.erase(std::remove_if(addresses.begin(), addresses.end(),
    2535+                        [this](CAddress const &addr){return m_banman->IsDiscouraged(addr) || m_banman->IsBanned(addr);}),
    


    MarcoFalke commented at 6:12 pm on July 23, 2020:
                        [this](const CAddress& addr){return m_banman->IsDiscouraged(addr) || m_banman->IsBanned(addr);}),
    

    Any reason to use this style, as opposed to the project’s style?

  158. in test/functional/p2p_getaddr_caching.py:89 in 130655a090 outdated
    84+            addr_receiver = self.nodes[0].add_p2p_connection(AddrReceiver())
    85+            addr_receiver.send_and_ping(msg_getaddr())
    86+            # Trigger response
    87+            cur_mock_time += 5 * 60
    88+            self.nodes[0].setmocktime(cur_mock_time)
    89+            wait_until(addr_receiver.addr_received, lock=mininode_lock)
    


    MarcoFalke commented at 6:14 pm on July 23, 2020:
            addr_receiver.wait_until(addr_receiver.addr_received)
    
  159. in test/functional/p2p_getaddr_caching.py:104 in 130655a090 outdated
     99+        last_addr_receiver = self.nodes[0].add_p2p_connection(AddrReceiver())
    100+        last_addr_receiver.send_and_ping(msg_getaddr())
    101+        # Trigger response
    102+        cur_mock_time += 5 * 60
    103+        self.nodes[0].setmocktime(cur_mock_time)
    104+        wait_until(last_addr_receiver.addr_received, lock=mininode_lock)
    


    MarcoFalke commented at 6:14 pm on July 23, 2020:
    same
  160. MarcoFalke commented at 6:15 pm on July 23, 2020: member
    left some style-nits, didn’t review. (Feel free to ignore, ofc)
  161. hebasto commented at 7:53 am on July 24, 2020: member
    Concept ACK.
  162. Move filtering banned addrs inside GetAddresses() ded742bc5b
  163. Remove useless 2500 limit on AddrMan queries 7cc0e8101f
  164. naumenkogs force-pushed on Jul 24, 2020
  165. naumenkogs commented at 3:06 pm on July 24, 2020: member
    Addressed nits suggested by @MarcoFalke
  166. naumenkogs force-pushed on Jul 24, 2020
  167. naumenkogs force-pushed on Jul 24, 2020
  168. naumenkogs force-pushed on Jul 24, 2020
  169. hebasto commented at 3:49 am on July 25, 2020: member
    @naumenkogs @sipa Would these changes affect the reference bitcoin-seeder?
  170. MarcoFalke commented at 6:44 am on July 25, 2020: member
    re-run ci
  171. MarcoFalke closed this on Jul 25, 2020

  172. MarcoFalke reopened this on Jul 25, 2020

  173. naumenkogs commented at 9:14 am on July 27, 2020: member
    @hebasto I think there might be a very limited effect: seeders might be getting/serving slightly outdated timestamps. But the effect is bounded, because the cache lifetime is only 24h, while addrs are considered non-terrible (see IsTerrible) as long as they’re less than 30 days old. It was somewhat discussed at the review club; sipa was also there.
  174. jonatack commented at 9:59 am on July 27, 2020: member
    @naumenkogs My unresolved worry with this change is expressed in the second review club log here, lines 347-370. WDYT?
  175. naumenkogs commented at 10:39 am on July 27, 2020: member

    @jonatack Since we currently don’t have a good way to measure all those AddrMan properties, nor a high-level “roadmap” for moving forward, I think we should focus on the outstanding problems.

    There is a problem with AddrMan privacy. This PR clearly improves in that direction. How does it affect other properties? Not much; the only consequence is lists being a bit more outdated. I have plans to improve on this property in the future: for example, I’m thinking of self-announcements to feelers. Also, see #17194.

    If you see the current state of AddrMan as a trade-off, I think it’s inadequate: the privacy loss is too high/dangerous, and the benefit w.r.t. freshness etc. is too low.

    I think this is sufficient to merge this PR.

  176. jonatack commented at 10:53 am on July 27, 2020: member
    I’m onboard with improved privacy; what I’m still working out are the consequences of adversely affecting discovery of newly online peers or ones who change IP address frequently. The former case is one I’m observing closely nowadays using a custom peer connections dashboard, notably WRT peer discovery and inbound evictions. Do you have a branch for the self-announcements to feelers?
  177. naumenkogs commented at 11:01 am on July 27, 2020: member

    @jonatack no, I don’t have that branch, it’s only an idea right now.

    So your concern is that some peers would be more easily deprioritized and maybe evicted more often? Yeah, it’s a valid one, but again:

    • I don’t think this PR makes it much worse. There are probably easy ways to affect that stuff already.
    • I don’t think making AddrMan fully public is the right solution to the problem you’re describing anyway.

    I’m interested in exploring this problem more, but I think it can be done separately.

  178. EthanHeilman commented at 4:29 pm on July 28, 2020: contributor

    I’m onboard with improved privacy; what I’m still working out are the consequences of adversely affecting discovery of newly online peers or ones who change IP address frequently.

    Can we build some sort of simulation to test what the effects of this are? If getaddr is the main way that new IPs are propagated, then this would seriously hurt new IP propagation. If, as I suspect, getaddr isn’t particularly important for propagating new IPs, then this makes some sense.

    With regard to fingerprinting, I think we have a more revealing descriptor, i.e. the address, and changing that one would force cleaning the cache, so it doesn’t introduce a more severe one?

    @ariard raises a really important issue here. If the cache value returned is always the same within a 24-hour period, then within that period I can always re-identify a node by checking whether it returns the exact same 1000 addresses. Without this PR I would have to make a few hundred connections to re-identify a node. Maybe this fingerprinting attack isn’t a concern due to how bitcoin is used?

  179. naumenkogs commented at 7:48 am on July 29, 2020: member

    Without this PR I would have to make a few hundred connections to re-identify a node. Maybe this fingerprinting attack isn’t a concern due to how bitcoin is used?

    1. True, but I wouldn’t rely on this “protection” at all, because creating 100 connections (from the same source IP) is easy.
    2. Your argument also reminded me that this PR protects against mapping the IP<->Tor identities of the same node via the 100-connections method you mention. After this PR, there will be separate caches for every network, so an attacker would have to spend weeks to achieve the same precision.
  180. EthanHeilman commented at 2:56 pm on July 29, 2020: contributor

    After this PR, there will be separate caches for every net, so an attacker would have to spend weeks to achieve the same precision.

    That seems like a pretty big win, although do we know how private Bitcoin nodes on Tor are?

  181. naumenkogs commented at 3:18 pm on July 29, 2020: member

    That seems like a pretty big win, although do we know how private Bitcoin nodes on Tor are?

    There is a decent number of them (a couple thousand listening all the time, and more non-listening, I guess), but I think we don’t really know how many of them have configured everything properly :)

    There are maybe still a couple of places in the p2p protocol where a spy can correlate IP<->Tor (this needs more research). Tor nodes are probably prone to Sybil attacks too (even more than regular nodes), which potentially enables even worse spying.

    Nonetheless I think fixing this issue is helpful for them, although this is not the primary motivation for this change.

  182. fanquake requested review from jnewbery on Jul 30, 2020
  183. fanquake requested review from MarcoFalke on Jul 30, 2020
  184. fanquake requested review from sipa on Jul 30, 2020
  185. in src/net.h:440 in 20de7928de outdated
    435+    struct CachedAddrResponse {
    436+        std::vector<CAddress> m_addrs_response_cache;
    437+        std::chrono::microseconds m_update_addr_response{0};
    438+    };
    439+
    440+    // Addr responses stored in different caches
    


    jnewbery commented at 10:43 am on July 30, 2020:
    Can you make this a doxygen style comment please :pray:
  186. jnewbery commented at 10:51 am on July 30, 2020: member
    Code review ACK 20de7928de545316320e6d82c89151159b12378f
  187. Cache responses to addr requests
    Prevents a spy from scraping victim's AddrMan by
    reconnecting and re-requesting addrs.
    acd6135b43
  188. Add addr permission flag enabling non-cached addr sharing cf1569e074
  189. Test addr response caching 3bd67ba5a4
  190. naumenkogs force-pushed on Jul 30, 2020
  191. jnewbery commented at 11:43 am on July 30, 2020: member
    reACK 3bd67ba5a4ef3c20ef1f873b63c9f53a6c8608b6
  192. in src/net.cpp:2546 in acd6135b43 outdated
    2538@@ -2539,6 +2539,17 @@ std::vector<CAddress> CConnman::GetAddresses()
    2539     return addresses;
    2540 }
    2541 
    2542+std::vector<CAddress> CConnman::GetAddresses(Network requestor_network)
    2543+{
    2544+    const auto current_time = GetTime<std::chrono::microseconds>();
    2545+    if (m_addr_response_caches.find(requestor_network) == m_addr_response_caches.end() ||
    2546+        m_addr_response_caches[requestor_network].m_update_addr_response < current_time) {
    


    promag commented at 1:58 pm on August 2, 2020:

    acd6135b43941fa51d52f5fcdb2ce944280ad01e

    Use iterator instead:

    auto i = m_addr_response_caches.find(requestor_network);
    if (i == m_addr_response_caches.end() || i->m_update_addr_response < current_time)
    

    naumenkogs commented at 10:37 am on August 11, 2020:
    It doesn’t really work because it requires “i->second” :(
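
    For reference, a sketch of the iterator variant with the required ->second (simplified stand-in types for the PR’s map of Network to CachedAddrResponse; not the merged code):

        #include <chrono>
        #include <map>

        enum class Network { IPV4, IPV6, ONION };
        struct CachedAddrResponse {
            std::chrono::microseconds m_update_addr_response{0};
        };

        std::map<Network, CachedAddrResponse> m_addr_response_caches;

        bool CacheExpired(Network net, std::chrono::microseconds now)
        {
            // Single map lookup; note the ->second the original suggestion was missing.
            auto it = m_addr_response_caches.find(net);
            return it == m_addr_response_caches.end() ||
                   it->second.m_update_addr_response < now;
        }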
  193. in src/net.cpp:2548 in acd6135b43 outdated
    2543+{
    2544+    const auto current_time = GetTime<std::chrono::microseconds>();
    2545+    if (m_addr_response_caches.find(requestor_network) == m_addr_response_caches.end() ||
    2546+        m_addr_response_caches[requestor_network].m_update_addr_response < current_time) {
    2547+        m_addr_response_caches[requestor_network].m_addrs_response_cache = GetAddresses();
    2548+        m_addr_response_caches[requestor_network].m_update_addr_response = current_time + std::chrono::hours(21) + GetRandMillis(std::chrono::hours(6));
    


    promag commented at 2:04 pm on August 2, 2020:

    acd6135b43941fa51d52f5fcdb2ce944280ad01e

    I think you could explain this expiration duration. Looks like it could be a lot less and still prevent spying?


    jnewbery commented at 1:12 pm on August 3, 2020:
    The idea is to prevent peers scraping our addrman. The shorter the cache duration, the less time it would take a peer to get an (almost) complete scan of addrman.

    naumenkogs commented at 5:26 pm on August 3, 2020:

    But yeah, it’s hard to justify 24h vs 1h cache without a good analytical model of the whole addr relay.

    Right now the cache lifetime is 0, so it takes essentially no time to scrape it. If the cache lasted 1h, scraping would take about 100h (because ~100 requests are usually needed). However, during those 100h the sensitive data might already be overwritten, so an attacker might not get useful info. It feels like 100h is too short, but maybe 1000h would be fine then?

    On the other hand, a shorter lifetime would keep addr records a bit fresher (if we consider the GETADDR relay aspect), so that’s the benefit/tradeoff. But again, how much does the freshness of GETADDR responses matter? Especially since with the ~24h cache (the diff’s 21h plus a uniformly random 0–6h, 24h on average) the records are at most about a day outdated, while we look at 1 week when considering IsTerrible(). That’s why I don’t think it’s that big of a concern or a tradeoff to consider.

  194. in src/net.h:436 in acd6135b43 outdated
    431+     * Attack example: scraping addrs in real-time may allow an attacker
    432+     * to infer new connections of the victim by detecting new records
    433+     * with fresh timestamps (per self-announcement).
    434+     */
    435+    struct CachedAddrResponse {
    436+        std::vector<CAddress> m_addrs_response_cache;
    


    promag commented at 2:07 pm on August 2, 2020:

    acd6135b43941fa51d52f5fcdb2ce944280ad01e

    nit, no need for m_ prefix in structs (?).


    jnewbery commented at 4:57 pm on August 2, 2020:
    I assumed that all member variables (of classes and structs) should have the m_ prefix.

    naumenkogs commented at 7:12 am on August 3, 2020:
    Yeah, I’ve got the same impression.

    promag commented at 10:49 am on August 3, 2020:
    See some recently added structs. Nit anyway.

    naumenkogs commented at 10:41 am on August 11, 2020:
    I see e.g. TxDownloadState has m_ prefix :)
  195. promag commented at 2:12 pm on August 2, 2020: member
    Code review ACK 3bd67ba5a4ef3c20ef1f873b63c9f53a6c8608b6.
  196. in src/net.cpp:2535 in ded742bc5b outdated
    2529@@ -2530,7 +2530,13 @@ void CConnman::AddNewAddresses(const std::vector<CAddress>& vAddr, const CAddres
    2530 
    2531 std::vector<CAddress> CConnman::GetAddresses()
    2532 {
    2533-    return addrman.GetAddr();
    2534+    std::vector<CAddress> addresses = addrman.GetAddr();
    2535+    if (m_banman) {
    2536+        addresses.erase(std::remove_if(addresses.begin(), addresses.end(),
    


    ariard commented at 0:04 am on August 3, 2020:
    nit: I think that’s a slight performance regression, as you will iterate twice over the non-erased vector elements. Likely not to matter, as it’s not a hot path.

    jnewbery commented at 10:15 am on August 3, 2020:
    No. remove-erase removes unwanted elements in a single pass.

    ariard commented at 12:31 pm on August 3, 2020:

    Previously, in net_processing, we iterated over vAddr and filtered out discouraged/banned elements before pushing them into vAddrToSend. So that’s a single pass.

    Now, in net.cpp we iterate over every element of addresses, then in net_processing we iterate again over vAddr to push elements into vAddrToSend. So that’s two passes over the valid set?


    jnewbery commented at 1:21 pm on August 3, 2020:
    Ah, ok. Yes, you’re right that for peers with “addr” permissions, we’ll iterate through the list twice (although the list will be max size 1000 instead of 2500, so still fewer items). However, in the common case, we’ll only iterate through the cached list once, and not have to call IsDiscouraged() or IsBanned() on that list.
  197. in src/net.h:437 in acd6135b43 outdated
    432+     * to infer new connections of the victim by detecting new records
    433+     * with fresh timestamps (per self-announcement).
    434+     */
    435+    struct CachedAddrResponse {
    436+        std::vector<CAddress> m_addrs_response_cache;
    437+        std::chrono::microseconds m_update_addr_response{0};
    


    ariard commented at 0:19 am on August 3, 2020:
    It sounds like you’re comparing a verb, m_update_addr_response, to a noun, current_time. IMO, m_cache_expiration_time would be more meaningful.
  198. in src/net.h:448 in acd6135b43 outdated
    443+     * If a node for example is multi-homed under Tor and IPv6,
    444+     * a single cache (or no cache at all) would let an attacker
    445+     * to easily detect that it is the same node by comparing responses.
    446+     * The used memory equals to 1000 CAddress records (or around 32 bytes) per
    447+     * distinct Network (up to 5) we have/had an inbound peer from,
    448+     * resulting in at most ~160 KB.
    


    ariard commented at 0:37 am on August 3, 2020:

    You may underscore that the cache is global and not per-peer, and thus uniform for whoever queries during the cache period. I think a per-peer cache would be worse, unless you first had a global one and announced a random subset of it.

    That said, an area of research could be to tie the cache to the subnet/ASN of requestors, to nudge their connections towards a more diversified graph. Or to sort/cap them by subnet. But such a mechanism could likely be double-edged :)

  199. ariard commented at 0:45 am on August 3, 2020: member

    Code Review ACK 3bd67ba

    Mostly minors; I think the commit messages/comments could be more explicit about assumptions and summarize the fruitful PR/IRC discussions, but that can be done as a follow-up.

  200. naumenkogs commented at 7:21 am on August 3, 2020: member

    Thanks for the reviews! There are currently Code Review ACKs from ariard, jnewbery and promag on the up-to-date commit, several Concept ACKs from before, and several tested ACKs on the version (before the minor refactoring) that we discussed at the review club.

    The latest suggestions seem to be a couple of nits and discussion points, so for the sake of review efficiency let’s not address them for now :)

  201. jnewbery commented at 10:16 am on August 3, 2020: member

    The latest suggestions seem to be a couple of nits and discussion points, so for the sake of review efficiency let’s not address them for now :)

    I agree. I think this is ready for merge now. Minor review points can be fixed in a follow-up.

  202. promag commented at 12:49 pm on August 3, 2020: member
    Don’t forget #18991 (review), seems it wasn’t previously discussed.
  203. laanwj merged this on Aug 3, 2020
  204. laanwj closed this on Aug 3, 2020

  205. laanwj commented at 1:06 pm on August 3, 2020: member

    Don’t forget #18991 (comment), seems it wasn’t previously discussed.

    Sorry, hadn’t noticed this before merging. As it’s a question, I think discussion can simply continue here, and if further changes are necessary it can be in a follow-up PR?

  206. laanwj removed this from the "Blockers" column in a project

  207. sidhujag referenced this in commit 810e528e95 on Aug 3, 2020
  208. sipa commented at 8:13 pm on August 3, 2020: member
    Random thought: there is a (rather uncommon, I suspect) scenario in which this definitely worsens node identity discoverability: if a peer has multiple listening addresses associated with one network. I don’t think this is a big concern, but it’s also fairly easy to address: index the cache map not just by Network, but also by the local socket address (see getsockname).
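
    A hedged sketch of what that composite key might look like (stand-in types; hypothetical, not necessarily how a follow-up would implement it):

        #include <map>
        #include <string>
        #include <utility>
        #include <vector>

        enum class Network { IPV4, IPV6, ONION };
        struct CachedAddrResponse { std::vector<std::string> m_addrs_response_cache; };

        // Key by (requestor network, local address from getsockname()) so a
        // multi-homed listener serves a distinct cache per local address.
        using AddrCacheKey = std::pair<Network, std::string>;
        std::map<AddrCacheKey, CachedAddrResponse> m_addr_response_caches;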
  209. naumenkogs commented at 7:16 am on August 4, 2020: member

    scenario in which this definitely worsens node identity discoverability: if a peer has multiple listening addresses associated with one network

    You probably mean makes it easier to discover […]. True, although again, it was already trivial (just make 100 GETADDR requests, even from the same host) and compare timestamps, so I wouldn’t say this degradation is a big deal at all. I believe @ariard raised this concern somewhere above.

    But yeah, it would be great to address this issue… It would be another good improvement over the pre-cache code, and seems like a good candidate for a follow-up. @sipa should we bound the number of caches in this case? If getsockname for some reason returns too many values, we shouldn’t blow up memory. I’m just not sure how valid that concern is. The alternative is simply a new cache every time.

  210. laanwj referenced this in commit bd00d3b1f2 on Aug 12, 2020
  211. luke-jr referenced this in commit 3d36fb5efe on Aug 15, 2020
  212. luke-jr referenced this in commit 5bbf638596 on Aug 15, 2020
  213. luke-jr referenced this in commit aa50bfbc28 on Aug 15, 2020
  214. laanwj referenced this in commit c0c409dcd3 on Sep 21, 2020
  215. sidhujag referenced this in commit eec709c2bf on Sep 21, 2020
  216. laanwj removed the label Needs release note on Jan 13, 2021
  217. Fabcien referenced this in commit 5c9fac955c on Sep 7, 2021
  218. deadalnix referenced this in commit fe421f64f6 on Sep 7, 2021
  219. deadalnix referenced this in commit 2d21afd643 on Sep 7, 2021
  220. deadalnix referenced this in commit 37bf68c1c4 on Sep 7, 2021
  221. deadalnix referenced this in commit d1958e8341 on Sep 7, 2021
  222. DrahtBot locked this on Aug 16, 2022
