Evict outbound peers if tip is stale #11534

pull sdaftuar wants to merge 5 commits into bitcoin:master from sdaftuar:2017-10-stale-tip-eviction changing 6 files +397 −1
  1. sdaftuar commented at 2:31 pm on October 20, 2017: member

    This builds on #11490 and is related to the work there, but I’ve separated this for review in order to not hold up #11490 (which I think is done).

    This pr attempts to implement a strategy suggested by @gmaxwell, of choosing a peer for outbound eviction if our tip has not been updated in a while. I’d like to suggest this for consideration in 0.15.0.2 as well, as it is designed to mitigate p2p disruption in the event that all our outbound peers stop relaying blocks to us. @theuni If you have a chance to review, I could use feedback on the first new commit here (Add hacky accessors for manipulating connman peers in tests). I did the quickest thing I could in order to get a unit test working, but I’m pretty sure I violated all sorts of design goals, so I could use some guidance about the right way to add peers to connman in a unit test.

  2. Disconnecting from bad outbound peers in IBD
    When in IBD, we'd like to use all our outbound peers to help us
    sync the chain.  Disconnect any outbound peers whose headers have
    insufficient work.
    e9d58023ae
  3. Permit disconnection of outbound peers on bad/slow chains
    Currently we have no rotation of outbound peers.  If an outbound peer
    stops serving us blocks, or is on a consensus-incompatible chain with
    less work than our tip (but otherwise valid headers), then we will never
    disconnect that peer, even though that peer is using one of our 8
    outbound connection slots.  Because we rely on our outbound peers to
    find an honest node in order to reach consensus, allowing an
    incompatible peer to occupy one of those slots is undesirable,
    particularly if it is possible for all such slots to be occupied by such
    peers.
    
    Protect against this by always checking to see if a peer's best known
    block has less work than our tip, and if so, set a 20 minute timeout --
    if the peer is still not known to have caught up to a chain with as much
    work as ours after 20 minutes, then send a single getheaders message,
    wait 2 more minutes, and if a better header hasn't been received by then,
    disconnect that peer.
    
    Note:
    
    - we do not require that our peer sync to the same tip as ours, just an
    equal or greater work tip.  (Doing otherwise would risk partitioning the
    network in the event of a chain split, and is also unnecessary.)
    
    - we pick 4 of our outbound peers and do not subject them to this logic,
    to be more conservative. We don't wish to permit temporary network
    issues (or an attacker) to excessively disrupt network topology.
    9a70054f65
  4. Add unit test for outbound peer eviction 39e0c8e1e7
  5. Add hacky accessors for manipulating connman peers in tests 6cd3ff9731
  6. fanquake added the label P2P on Oct 20, 2017
  7. sdaftuar force-pushed on Oct 20, 2017
  8. in src/net.cpp:2889 in 2ea6f2cf07 outdated
    2877@@ -2862,3 +2878,15 @@ uint64_t CConnman::CalculateKeyedNetGroup(const CAddress& ad) const
    2878 
    2879     return GetDeterministicRandomizer(RANDOMIZER_ID_NETGROUP).Write(vchNetGroup.data(), vchNetGroup.size()).Finalize();
    2880 }
    2881+
    2882+void CConnman::AddToVNodes(CNode &node)
    


    promag commented at 8:56 pm on October 20, 2017:

    Is this only meant for tests?

    Otherwise replace the following code with a call to this function? https://github.com/bitcoin/bitcoin/blob/ff92fbf24739a022eb677daab03c87c5e6971094/src/net.cpp#L1140-L1143


    sdaftuar commented at 9:27 am on October 21, 2017:
    Yeah this is only for the unit test; I’m hoping someone is able to suggest a better way of doing this than what I did here to get the test working.

    promag commented at 8:27 am on October 24, 2017:
    @sdaftuar see 6f6e7e0 for an alternative that doesn’t change the production interface.

    theuni commented at 5:14 pm on October 24, 2017:

    I’m ok with leaving the hacks for now, though please make them private and a friend to the boost test functions.

    After we merge #11457 (and the follow-up for addrman: https://github.com/theuni/bitcoin/tree/move-addrdb), I believe we’ll have enough broken out to create 2 CConnman instances and run them against each-other for these tests. At that point, we can remove the test hacks.


    sdaftuar commented at 2:32 pm on October 26, 2017:
    @promag Thanks for the patch, I’m going to include it in #11560
  9. Disconnect an outbound peer if tip is stale c071f62e21
  10. sdaftuar force-pushed on Oct 23, 2017
  11. in src/net_processing.cpp:3412 in c071f62e21
    3407+                NodeId worst_peer = -1;
    3408+                int64_t oldest_block_announcement = GetTime();
    3409+                int64_t worst_peer_connect_time = 0;
    3410+                for (auto it = mapNodeState.begin(); it != mapNodeState.end(); ++it) {
    3411+                    int64_t connect_time = 0;
    3412+                    if (connman->ForNode(it->first, [&](CNode *pnode){
    


    theuni commented at 7:59 pm on October 24, 2017:

    I wonder if we might turn this logic around: rather than marking for disconnection, mark for protection from disconnection. For example, each time we get a new header that extends our tip, we could do:

    0connman->ResetOutboundEvictionTimer(id, current_time + new_timeout);
    

    Then when we need a connection slot, if any timer has expred, whichever has been expired the longest gets dropped. That saves us from having to care about what type of node (inbound/outbound/oneshot/manual/etc.) this is, and just lets outbound eviction do its thing if the criteria are met.

  12. in src/net_processing.cpp:131 in c071f62e21
    126@@ -127,6 +127,12 @@ namespace {
    127     /** Number of outbound peers with m_protect_from_disconnect. */
    128     int g_outbound_peers_with_protect_from_disconnect = 0;
    129 
    130+    /** When to next check whether our tip is stale. */
    131+    int64_t g_tip_stale_check_time = 0;
    


    theuni commented at 8:03 pm on October 24, 2017:
    Move these into PeerLogicValidation ?
  13. sdaftuar commented at 4:56 pm on October 26, 2017: member
    Closing in favor of #11560
  14. sdaftuar closed this on Oct 26, 2017

  15. laanwj referenced this in commit 2f959a5874 on Nov 2, 2017
  16. codablock referenced this in commit f544e6a642 on Sep 26, 2019
  17. codablock referenced this in commit 58cb7e38f4 on Sep 29, 2019
  18. barrystyle referenced this in commit 0e69e92f08 on Jan 22, 2020
  19. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me