[P2P] Timeout for headers sync #10345

pull sdaftuar wants to merge 2 commits into bitcoin:master from sdaftuar:2017-05-timeout-headers-sync changing 2 files +42 −1
  1. sdaftuar commented at 7:55 pm on May 5, 2017: member

    This first commit delays requesting blocks from peers unless their tip has more than nMinimumChainWork (our anti-DoS threshold). This speeds up initial headers sync by delaying block downloads from starting until we’re closer to being caught up.

    The second commit implements a timeout on the initial headers sync: when we send the initial getheaders to a peer, estimate the number of headers we expect to receive, and set a timeout (15 minutes + 1 ms / header).

    This is a pretty naive algorithm, but I think the worst case behavior (eg if we’re bottlenecked by our own bandwidth) is that we cycle through peers while downloading headers, at most every 15 minutes – which I think isn’t too awful, since those 15 minute windows should be making progress at least, and this bounds the time a bad peer can interfere with headers sync.

    FYI I kicked off a fresh node with this patch, and it took me about 1 minute to have the whole headers chain.

    Implements #9068, though this isn’t the exact approach @sipa suggested in that issue.

  2. fanquake added the label P2P on May 5, 2017
  3. gmaxwell commented at 8:16 am on May 6, 2017: contributor
    Concept ACK. I suppose this timeout could later be made dynamic like has been proposed block transfer timeouts.
  4. Delay parallel block download until chain has sufficient work
    nMinimumChainWork is an anti-DoS threshold; wait until we have a proposed
    tip with more work than that before downloading blocks towards that tip.
    e2652002b6
  5. in src/net_processing.cpp:3220 in 1ac0cb8d18 outdated
    3220+            // Detect whether this is a stalling initial-headers-sync peer
    3221+            if (pindexBestHeader->GetBlockTime() <= GetAdjustedTime() - 24*60*60) {
    3222+                if (nNow > state.nHeadersSyncTimeout && !pto->fWhitelisted && nSyncStarted == 1 && (nPreferredDownload - state.fPreferredDownload >= 1)) {
    3223+                    // Disconnect a (non-whitelisted) peer if it is our only sync peer,
    3224+                    // and we have others we could be using instead.
    3225+                    LogPrintf("Timeout downloading headers from peer=%d, disconnecting\n", pto->id);
    


    MarcoFalke commented at 5:59 pm on May 8, 2017:
    0error: const NodeId CNode::id is private within this context
    

    paveljanik commented at 6:56 pm on May 8, 2017:
    It should be pto->GetId().
  6. sdaftuar force-pushed on May 8, 2017
  7. sdaftuar commented at 6:33 pm on May 8, 2017: member
    Rebased after #10189.
  8. sdaftuar force-pushed on May 8, 2017
  9. in src/net_processing.cpp:3225 in ca4aebd3c5 outdated
    3220+                    LogPrintf("Timeout downloading headers from peer=%d, disconnecting\n", pto->GetId());
    3221+                    pto->fDisconnect = true;
    3222+                    return true;
    3223+                }
    3224+            } else {
    3225+                // Once we've caught up once, reset the timeout so we can't trigger
    


    jonasschnelli commented at 6:16 am on May 9, 2017:
    nit: Once we've caught up once

    sdaftuar commented at 2:21 pm on May 9, 2017:
    Improved the comment in 898fc2ac689aa77e74362aa27d12120256c1942e.
  10. jonasschnelli commented at 6:25 am on May 9, 2017: contributor

    utACK ca4aebd3c50bc424afd25a5611a6ea1e77429413

    For the hybrid SPV mode (#9483), a fast headers sync is crucial and also for normal operations, someone could significant delay your initial sync. If you connect to a malicious or just a very slow peer (It happened to me a couple of times), couldn’t you – in theory – spend a couple of hours syncing headers? Even after this PR, I guess a peer could serve you the 2k headers package every 10minutes (while pong accordingly) leading to a header sync of more then 30h.

    I’m not sure if this is solvable without to much of effort, maybe the only solution would be parallel headers download… or at least a check among a handful of peers which could serve 3x2000 headers the fastest.

  11. sdaftuar commented at 12:28 pm on May 9, 2017: member

    If you connect to a malicious or just a very slow peer (It happened to me a couple of times), couldn’t you – in theory – spend a couple of hours syncing headers? Even after this PR, I guess a peer could serve you the 2k headers package every 10minutes (while pong accordingly) leading to a header sync of more then 30h. @jonasschnelli After this PR, if you have started syncing headers from your initial sync peer and your pindexBestHeader isn’t caught up to within 24 hours of current time, then you’ll disconnect them. So this puts a timeout on the entire headers sync (a bit over 20 minutes, currently – 15 minutes + 1 ms/[expected number of headers]).

    Once our headers chain is within 24 hours of current time, we’ll request headers from our other peers as well, so after that point as long as one of your peers isn’t stalling you, you’ll get the headers chain.

  12. jonasschnelli commented at 12:34 pm on May 9, 2017: contributor
    Thanks @sdaftuar for the explanation. Make sense. I thought the timer resets after each headers message. A 20min delay hurts but is probably unavoidable (the fact that we don’t know the bandwidth limitations and potentials).
  13. in src/net_processing.cpp:3217 in 898fc2ac68 outdated
    3209@@ -3206,6 +3210,24 @@ bool SendMessages(CNode* pto, CConnman& connman, const std::atomic<bool>& interr
    3210                 return true;
    3211             }
    3212         }
    3213+        // Check for headers sync timeouts
    3214+        if (state.fSyncStarted && state.nHeadersSyncTimeout < std::numeric_limits<int64_t>::max()) {
    3215+            // Detect whether this is a stalling initial-headers-sync peer
    3216+            if (pindexBestHeader->GetBlockTime() <= GetAdjustedTime() - 24*60*60) {
    3217+                if (nNow > state.nHeadersSyncTimeout && !pto->fWhitelisted && nSyncStarted == 1 && (nPreferredDownload - state.fPreferredDownload >= 1)) {
    


    TheBlueMatt commented at 5:50 pm on May 9, 2017:
    I believe usually its good practice to print if we would otherwise have disconnected a whitelisted peer.

    sdaftuar commented at 5:05 pm on May 10, 2017:
    Fixed in a27ff02f537a3225372e1e1286f0a93fb9037ba9
  14. in src/net_processing.h:22 in 898fc2ac68 outdated
    16@@ -17,6 +17,10 @@ static const int64_t ORPHAN_TX_EXPIRE_TIME = 20 * 60;
    17 static const int64_t ORPHAN_TX_EXPIRE_INTERVAL = 5 * 60;
    18 /** Default number of orphan+recently-replaced txn to keep around for block reconstruction */
    19 static const unsigned int DEFAULT_BLOCK_RECONSTRUCTION_EXTRA_TXN = 100;
    20+/** Headers download timeout expressed in microseconds
    21+ *  Timeout = base + per_header * (expected number of headers) */
    22+static constexpr int64_t HEADERS_DOWNLOAD_TIMEOUT_BASE = 15 * 60 * 1000000; // 15 minutes
    


    TheBlueMatt commented at 5:54 pm on May 9, 2017:
    If I can take this opportunity to bikeshed your constants - why even bother with a base timeout this high? maybe just a minute (to handle the really-close-to-done-dont-want-a-1ms-timeout case) and then double/tripple the TIMEOUT_PER_HEADER?

    sdaftuar commented at 4:39 pm on May 10, 2017:

    I just wanted to do something clearly safe (it’s easy to reason about booting your sync peer potentially every 15 minutes, but slightly scarier to think there could be a circumstance you’d boot that peer every 1 minute).

    Anyway I’d like to see what other opinions are, I’m fine adjusting them however people would like…

  15. TheBlueMatt commented at 5:54 pm on May 9, 2017: member
    utACK 898fc2ac689aa77e74362aa27d12120256c1942e, few notes, but none matter much.
  16. sdaftuar commented at 5:06 pm on May 10, 2017: member
    Addressed @TheBlueMatt’s nit in a fixup commit, let me know if I can squash.
  17. in src/net_processing.cpp:3217 in a27ff02f53 outdated
    3209@@ -3206,6 +3210,28 @@ bool SendMessages(CNode* pto, CConnman& connman, const std::atomic<bool>& interr
    3210                 return true;
    3211             }
    3212         }
    3213+        // Check for headers sync timeouts
    3214+        if (state.fSyncStarted && state.nHeadersSyncTimeout < std::numeric_limits<int64_t>::max()) {
    3215+            // Detect whether this is a stalling initial-headers-sync peer
    3216+            if (pindexBestHeader->GetBlockTime() <= GetAdjustedTime() - 24*60*60) {
    3217+                if (nNow > state.nHeadersSyncTimeout && nSyncStarted == 1 && (nPreferredDownload - state.fPreferredDownload >= 1)) {
    


    jnewbery commented at 8:03 pm on May 22, 2017:

    nPreferredDownload - state.fPreferredDownload >= 1 was slightly confusing for me. It’s saying we should only timeout a peer if we have another peer which is preferred. This doesn’t exactly match the logic in SendMessages() where we’ll try to start block sync from a non-preferred peer if we don’t have any better options.

    I think this is fine. It’s not obvious that we should time out a peer if we don’t have any other options which are preferred, but perhaps a comment here would make this clearer.


    sdaftuar commented at 8:38 pm on May 22, 2017:
    Added a comment in 595d51c
  18. jnewbery commented at 8:04 pm on May 22, 2017: member
    utACK a27ff02f537a3225372e1e1286f0a93fb9037ba9 with one request for a comment.
  19. in src/net_processing.cpp:3228 in 595d51c2b7 outdated
    3223+                    if (!pto->fWhitelisted) {
    3224+                        LogPrintf("Timeout downloading headers from peer=%d, disconnecting\n", pto->GetId());
    3225+                        pto->fDisconnect = true;
    3226+                        return true;
    3227+                    } else {
    3228+                        LogPrintf("Timeout downleading headers from whitelisted peer=%d, not disconnecting\n", pto->GetId());
    


    TheBlueMatt commented at 3:01 pm on June 1, 2017:
    “downleading”

    TheBlueMatt commented at 4:08 pm on June 1, 2017:
    Additionally, it would be nice to unset this peer as fSyncStarted so that we pick another peer to donwnload from, in case the whitelisted sync peer is broken.

    sdaftuar commented at 5:45 pm on June 1, 2017:
  20. TheBlueMatt commented at 4:09 pm on June 1, 2017: member
    Discussed making the whole download logic a much clearer state machine, but that can come after this.
  21. sdaftuar force-pushed on Jun 1, 2017
  22. Add timeout for headers sync
    At startup, we choose one peer to serve us the headers chain, until
    our best header is close to caught up.  Disconnect this peer if more
    than 15 minutes + 1ms/expected_header passes and our best header
    is still more than 1 day away from current time.
    76f74811c4
  23. sdaftuar force-pushed on Jun 5, 2017
  24. sdaftuar commented at 8:34 pm on June 5, 2017: member
    Squashed 65e300812 -> 76f74811c
  25. laanwj commented at 10:23 am on June 6, 2017: member
    utACK 76f7481
  26. laanwj merged this on Jun 6, 2017
  27. laanwj closed this on Jun 6, 2017

  28. laanwj referenced this in commit 67700b3924 on Jun 6, 2017
  29. laanwj referenced this in commit 815fe62421 on Sep 6, 2017
  30. PastaPastaPasta referenced this in commit 7482e4f67f on Aug 17, 2019
  31. PastaPastaPasta referenced this in commit a0a3c2743a on Sep 20, 2019
  32. PastaPastaPasta referenced this in commit 0a30af964b on Sep 23, 2019
  33. PastaPastaPasta referenced this in commit e360b50a09 on Sep 23, 2019
  34. PastaPastaPasta referenced this in commit b2c0e526ba on Sep 24, 2019
  35. codablock referenced this in commit e14cc023ae on Sep 26, 2019
  36. codablock referenced this in commit 9938dd83d4 on Sep 29, 2019
  37. barrystyle referenced this in commit fb16d0dc2c on Jan 22, 2020
  38. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me