Allow 2 simultaneous block downloads

morcos commented at 3:35 pm on December 30, 2016: member

This is built off of #9375, #9252, and #9400. I’ll properly rebase it when those are merged.

It provides special logic to issue a second getdata request for a cmpctblock if there is only 1 request outstanding and we think the announced block would be our new tip.

It also changes the cmpctblock processing logic to be first come first served (regardless of in flight block requests) and allow two simultaneous compact block reconstructions.

So in particular, it is now possible to:

receive headers -> request cmpctblock from peer 1
receive headers -> request cmpctblock from peer 2
receive cmpctblock -> request blocktxn from peer 3
receive cmpctblock -> request blocktxn from peer 4

Upon receiving a cmpctblock from peers 1 or 2 after this, it will be treated the same as receiving an unsolicited cmpctblock from peer 5 and it will attempt opportunistic reconstruction but not request blocktxn. It will also remove the block in flight for peer 1 or 2.

I believe that the multiple requests is an acceptable increase in bandwidth in order to provide robustness against a single peer stalling us at any point in the logic. The reason to allow a second request for a cmpctblock is to allow LB peers a chance to become HB even if there is a staller who is always announcing first, otherwise, only existing HB peers would have a chance to deliver the block.

morcos force-pushed on Dec 30, 2016

morcos commented at 4:09 pm on December 30, 2016: member

Seems to fail sendheaders.py but not on my local machine… I’ll look into it…

in src/net_processing.cpp: in f4832c33f0 outdated

1925 
1926         if (pindex->nStatus & BLOCK_HAVE_DATA) // Nothing to do here
1927             return true;
1928-
1929+        LogPrintf("ChainWork check at height %d new: %s  tip: %s\n",pindex->nHeight,pindex->nChainWork.GetHex(),chainActive.Tip()->nChainWork.GetHex());
1930         if (pindex->nChainWork <= chainActive.Tip()->nChainWork || // We know something better

rebroad commented at 6:06 am on December 31, 2016:

why not make this < instead of <= ? why do we avoid requesting compact blocks for competing best blocks?

rebroad commented at 10:31 am on January 1, 2017:

@morcos I couldn’t see this added line in any of the 3 PRs you mentioned.

in src/net_processing.cpp: in f4832c33f0 outdated

2021@@ -1893,24 +2022,41 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
2022                     fBlockReconstructed = true;
2023                 }
2024             }
2025+            if (pindex->nHeight == 165) {

rebroad commented at 6:11 am on December 31, 2016:

What is the deal with block height 165?

in src/net_processing.cpp: in f4832c33f0 outdated

2083@@ -1946,17 +2084,27 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
2084         {
2085             LOCK(cs_main);
2086 
2087-            map<uint256, pair<NodeId, list<QueuedBlock>::iterator> >::iterator it = mapBlocksInFlight.find(resp.blockhash);
2088-            if (it == mapBlocksInFlight.end() || !it->second.second->partialBlock ||
2089-                    it->second.first != pfrom->GetId()) {
2090+            bool fExpectedBLOCKTXN = false;

rebroad commented at 6:12 am on December 31, 2016:

Very much approve of the variable namings chosen in various places.

in src/net_processing.cpp: in f4832c33f0 outdated

2303@@ -2142,10 +2304,7 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
2304                             pindexLast->GetBlockHash().ToString(), pindexLast->nHeight);
2305                 }
2306                 if (vGetData.size() > 0) {
2307-                    if (nodestate->fSupportsDesiredCmpctVersion && vGetData.size() == 1 && mapBlocksInFlight.size() == 1 && pindexLast->pprev->IsValid(BLOCK_VALID_CHAIN)) {
2308-                        // We seem to be rather well-synced, so it appears pfrom was the first to provide us
2309-                        // with this block! Let's get them to announce using compact blocks in the future.
2310-                        MaybeSetPeerAsAnnouncingHeaderAndIDs(nodestate, pfrom, connman);
2311+                    if (nodestate->fSupportsDesiredCmpctVersion && vGetData.size() == 1 && mmapBlocksInFlight.size() == mmapBlocksInFlight.count(vGetData[0].hash) && pindexLast->pprev->IsValid(BLOCK_VALID_CHAIN)) {

rebroad commented at 6:15 am on December 31, 2016:

Why not make this <=2 rather than ==1 (for mapBlocksInFlight.size())? this way if two blocks get announced in close proximity we can request compact blocks for both of them (rather than compact block for the oldest, and full block for the most recent).

rebroad commented at 6:17 am on December 31, 2016: contributor

I like the coding style and elegance of this. Will help with testing.

morcos commented at 2:30 pm on December 31, 2016: member

@rebroad As mentioned in the PR comment, this is built off 3 other PR’s. All of your comments either belong on those PR’s or are related to the last commit which was just for debugging the travis failure and will be removed. I have just left it there for now in case anyone else wants to see the error details.

TheBlueMatt commented at 4:19 pm on December 31, 2016: member

@morcos claimed on IRC the test failures might be (in part) due to the issue mentioned at #9375 (comment)

fanquake added the label P2P on Jan 1, 2017

fanquake added the label Validation on Jan 1, 2017

rebroad commented at 10:32 am on January 1, 2017: contributor

@morcos although most of the lines I am commenting on have not been introduced by you, given you are changing the code near to them, I tought it was a good opportunity to suggest these changes as I believe they ought to be changed at some point, so perhaps could be with this PR.

morcos force-pushed on Jan 1, 2017

morcos commented at 6:19 pm on January 1, 2017: member

Rebased with the newest commit from #9375 which fixes the failure.

For clarity only the 5 commits on which I’m the author are meant for review here. The others are contained in the linked PR’s…

MarcoFalke commented at 6:51 pm on January 1, 2017: member

OT: Imo it makes sense to octomerge all pulls which this pull depends on and rebase the commits of this pull on top of the merge commit. Thus, the original commit hashes are preserved and it is clear what was old and should be reviewed in other pulls. Also, it is easier to see the fresh commits.

in src/net_processing.cpp: in 6060f3da8f outdated

2062-                return true;
2063-            } else {
2064+                if (fPeerWasFirstRequest) {
2065+                    // We requested this block, but its far into the future, so our
2066+                    // mempool will probably be useless - request the block normally
2067+                    // Only allow a the first peer to request a full block

instagibbs commented at 3:58 pm on January 4, 2017:

“Only ask for the full block from the first peer we requested from”?

in src/net_processing.cpp: in 2ea1ceca56 outdated

278+void ClearDownloadState(BlockDownloadMap::iterator itInFlight) {
279+    CNodeState *state = State(itInFlight->second.first);
280+    state->nBlocksInFlightValidHeaders -= itInFlight->second.second->fValidatedHeaders;
281+    if (state->nBlocksInFlightValidHeaders == 0 && itInFlight->second.second->fValidatedHeaders) {
282+        // Last validated block on the queue was received.
283+        nPeersWithValidatedDownloads--;

instagibbs commented at 4:07 pm on January 4, 2017:

could just do the -= itInFlight->second.second->fValidatedHeaders as above to match.

in src/net_processing.cpp: in 2ea1ceca56 outdated

368+    while (range.first != range.second) {
369+        found = true;
370+        BlockDownloadMap::iterator itInFlight = range.first;
371+        ClearDownloadState(itInFlight);
372+        range.first++;
373+        mmapBlocksInFlight.erase(itInFlight);

instagibbs commented at 4:30 pm on January 4, 2017:

since C++11 can also just capture the return value and set as range.first instead of worrying about iterator invalidation.

instagibbs commented at 5:39 pm on January 4, 2017: member

pre-rebase utACK. I still think it might make sense to only allow a second if a HB peer is responsible for one of them, but I think that kind of change is desired is complementary on top of this.

in src/net_processing.cpp: in 2ea1ceca56 outdated

104@@ -105,7 +105,8 @@ namespace {
105         bool fValidatedHeaders;                                  //!< Whether this block has validated headers at the time of request.
106         std::unique_ptr<PartiallyDownloadedBlock> partialBlock;  //!< Optional, used for CMPCTBLOCK downloads
107     };
108-    map<uint256, pair<NodeId, list<QueuedBlock>::iterator> > mapBlocksInFlight;
109+    typedef std::multimap<uint256, pair<NodeId, list<QueuedBlock>::iterator>> BlockDownloadMap;

ryanofsky commented at 9:29 pm on January 5, 2017:

I think indexing might by NodeId would be better than using a multimap, because it would ensure that there couldn’t be multiple entries for a block from the same node. I’d suggest:

0typedef map<pair<uint256, NodeId>, list<QueuedBlock>::iterator> BlockDownloadMap;

This also would let you replace some of the while loops added here with direct lookups. I thought some of these (especially the while loop with the break statement setting fExpectedBLOCKTXN) were kind of confusing.

The downside of using a map is that you wouldn’t have an equal_range method to call in the places that do require a loop. But you could replace those equal_range calls with calls to an equivalent helper function:

0pair<BlockDownloadMap::iterator, BlockDownloadMap::iterator>
1GetBlockDownloadRange(BlockDownloadMap& blocks, const uint256& hash) {
2    return {blocks.lower_bound({hash, numeric_limits<NodeId>::min()}),
3            blocks.upper_bound({hash, numeric_limits<NodeId>::max()})};
4}

morcos commented at 8:58 pm on January 6, 2017:

I’m unsure about this change. In particular there are a lot of mmapBlocksInFlight.count(hash) calls that would get a bit less clear… I’ll think about it some more

ryanofsky commented at 9:24 pm on January 6, 2017:

Yeah, I noticed that in the later commits. You could have a count helper function returning std::distance(range.first, range.second). I do think a map is a better way to represent the data, but the c++ map implementation does makes it a little awkward. Anyway, it’s just something to consider.

in src/net_processing.cpp: in 69a8ca6fb0 outdated

2291@@ -2292,6 +2292,18 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
2292                 }
2293                 pindexWalk = pindexWalk->pprev;
2294             }
2295+            // Special case for second cmpctblock request of tip

ryanofsky commented at 9:55 pm on January 5, 2017:

Could you expand this comment a little bit to describe the condition being checked? In particular I don’t understand how the IsWitnessEnabled and fHaveWitness parts relate to making the request.

TheBlueMatt commented at 9:12 pm on February 24, 2017:

Also, maybe consider just pulling the MSG_CMPCT_BLOCK setting a bit later down up to here and just requesting the block in this if statement (possibly before the while loop above).

in src/net_processing.cpp: in 558a807f8e outdated

1946@@ -1947,7 +1947,7 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv,
1947 
1948         if (pindex->nChainWork <= chainActive.Tip()->nChainWork || // We know something better
1949                 pindex->nTx != 0) { // We had this block at some point, but pruned it
1950-            if (fAlreadyInFlight) {
1951+            if (fInFlightFromSamePeer) {

ryanofsky commented at 10:01 pm on January 5, 2017:

Is this change (“Only request full blocks from the peer we thought had the block in-flight”) a change in behavior? Or is it just a cleanup after the previous multimap commit? It seems like this commit should be merged into the preceding or following one, or the commit message should be extended to say what the effect is, what motivates it.

morcos force-pushed on Jan 6, 2017

morcos commented at 9:00 pm on January 6, 2017: member

Rebased and I think done the way @MarcoFalke suggested.

Addressed feedback

morcos force-pushed on Jan 17, 2017

morcos commented at 6:19 pm on January 17, 2017: member

rebased

morcos force-pushed on Jan 20, 2017

da2ce7 commented at 4:29 pm on February 7, 2017: none

#9375, #9252, and #9400 are merged, needs rebase.

Turn mapBlocksInFlight into a multimap d0260aea66

Only mark block as received if BLOCK_VALID_TRANSACTIONS f66254fa2d

Only request full blocks from the peer we thought had the block in-flight

This is a change in behavior so that if for some reason we request a block from a peer, we don't allow an unsolicited CMPCT_BLOCK announcement for that same block to cause a request for a full block from the uninvited peer (as some type of request is already outstanding from the original peer)

8ffd51f8c8

Allow multiple compact block reconstructions simultaneously 3f62b2cc18

Allow second request of cmpctblock 5eb857dccd

morcos force-pushed on Feb 14, 2017

morcos commented at 6:38 pm on February 14, 2017: member

rebased

in src/net_processing.cpp: in d0260aea66 outdated

273@@ -274,6 +274,41 @@ void InitializeNode(CNode *pnode, CConnman& connman) {
274         PushNodeVersion(pnode, connman, GetTime());
275 }
276 
277+// Requires cs_main

TheBlueMatt commented at 5:15 pm on February 24, 2017:

Can you AssertLockHeld?

in src/net_processing.cpp: in d0260aea66 outdated

290+    state->vBlocksInFlight.erase(itInFlight->second.second);
291+    state->nBlocksInFlight--;
292+    state->nStallingSince = 0;
293+}
294+
295+// Requires cs_main.

TheBlueMatt commented at 5:15 pm on February 24, 2017:

Can you AssertLockHeld?

in src/net_processing.cpp: in 5eb857dccd

2397+            // and conditions to do CMPCT_BLOCK announcement below.
2398+            if (vToFetch.size() == 0 && pindexLast->pprev == chainActive.Tip() &&
2399+                mmapBlocksInFlight.size() == mmapBlocksInFlight.count(pindexLast->GetBlockHash()) &&
2400+                mmapBlocksInFlight.count(pindexLast->GetBlockHash()) < MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK &&
2401+                !(pindexLast->nStatus & BLOCK_HAVE_DATA) &&
2402+                (!IsWitnessEnabled(pindexLast->pprev, chainparams.GetConsensus()) || State(pfrom->GetId())->fHaveWitness) &&

TheBlueMatt commented at 9:13 pm on February 24, 2017:

In the strange case that a peer does not set the fHaveWItness service bit, but does announce compact blocks v2, I believe this line would result in a full block request. More generally, because the two if statements always have to be in sync to avoid this, I really prefer we pull the actual request logic into this if statement.

in src/net_processing.cpp: in 3f62b2cc18 outdated

2085@@ -2072,8 +2086,7 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2086         // We want to be a bit conservative just to be extra careful about DoS
2087         // possibilities in compact block processing...
2088         if (pindex->nHeight <= chainActive.Height() + 2) {
2089-            if ((!fAlreadyInFlight && nodestate->nBlocksInFlight < MAX_BLOCKS_IN_TRANSIT_PER_PEER) ||
2090-                fInFlightFromSamePeer) {
2091+            if ((countPartialBlocksStarted < MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK && nodestate->nBlocksInFlight < MAX_BLOCKS_IN_TRANSIT_PER_PEER)) {

TheBlueMatt commented at 9:20 pm on February 24, 2017:

The way I read this, the use of countPartialBlocksStarted, instead of a countBlocksStarted, means that we will request up to two compact blocks at a time, even if we are already requesting the full block from a peer. This seems strange to me, why not just max 2 in-flights at the same time for a given block, with the second never being a full block?

TheBlueMatt commented at 9:30 pm on February 24, 2017:

Also, why drop the fInFlightFromSamePeer option? It looks like we’ll never getblocktxn from two peers simultaneously?

in src/net_processing.cpp: in 5eb857dccd

2399+                mmapBlocksInFlight.size() == mmapBlocksInFlight.count(pindexLast->GetBlockHash()) &&
2400+                mmapBlocksInFlight.count(pindexLast->GetBlockHash()) < MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK &&
2401+                !(pindexLast->nStatus & BLOCK_HAVE_DATA) &&
2402+                (!IsWitnessEnabled(pindexLast->pprev, chainparams.GetConsensus()) || State(pfrom->GetId())->fHaveWitness) &&
2403+                nodestate->fSupportsDesiredCmpctVersion) {
2404+                vToFetch.push_back(pindexLast);

TheBlueMatt commented at 9:26 pm on February 24, 2017:

Further, simply adding the entry to vToFetch here may result in a null pointer dereference, I believe. If a peer announces two headers messages back-to-back, the first time we will MarkBlockAsInFlight to them, and the second time we’ll hit this condition and add the entry to vToFetch again (which should also be fixed). Further down, we’ll MarkBlockAsInFlight to them again, but MarkBlockAsInFlight requires that, if the block is already in-flight to the same peer, pit be something non-NULL as it will be dereferenced, but it is NULL in the call below.

TheBlueMatt commented at 9:27 pm on February 24, 2017:

I believe the above needs fixing in three ways - MarkBlockAsInFlight needs to be more robust against NULL pit, the request needs to move into this if statement and skip the remainder of this block of code, and we shouldn’t double-request from the same peer.

in src/net_processing.cpp: in 3f62b2cc18 outdated

2106@@ -2094,6 +2107,7 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2107                     return true;
2108                 } else if (status == READ_STATUS_FAILED) {
2109                     // Duplicate txindexes, the block is now in-flight, so just request it
2110+                    // NOTE: This is the one place two full block requests can be outstanding

TheBlueMatt commented at 9:31 pm on February 24, 2017:

OK, so why not just check fPeerWasFirstRequest and MarkBlockAsNotInFlight otherwise?

in src/net_processing.h:13 in 5eb857dccd

 8@@ -9,6 +9,8 @@
 9 #include "net.h"
10 #include "validationinterface.h"
11 
12+/** Maximum number of outstanding CMPCTBLOCK requests for the same block. */
13+static const int MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK = 2;

jameshilliard commented at 10:11 pm on June 28, 2017:

Should this be configurable? Might make sense for miners to have this higher.

TheBlueMatt commented at 10:18 pm on June 28, 2017:

Likely not, asking all your peers for a copy of the block poses the same network-DoS risks as connecting to hundreds of nodes, which people like to do because they believe it will help (though it usually actually hurts) them get their blocks out faster.

More importantly, if you’re a miner and have good peers, I think it’d be somewhat rare for you to receive a third compact block announce before the first can respond to your blocktxn request, at least it will be once we get proper multi-threaded ProcessMessages implemented to respond to blocktxn requests for the latest block in the background (see #10652 for the beginnings of the steps to do so).

jameshilliard commented at 10:50 pm on June 28, 2017:

What do you think the upper limit is before it would generally cause a negative impact? Maybe just have an upper limit like we do for max outbound connections.

TheBlueMatt commented at 10:54 pm on June 28, 2017:

Probably around 2 :p. Its really only useful if your peer got stuck doing something and wasn’t able to respond or is being actively malicious. Once we’ve fixed the block-on-block-validation-before-responding-to-blocktxn-requests issue, it should be somewhat rare for this to help more than a very small amount.

jtimon commented at 0:41 am on September 6, 2017: contributor

More concurrency! concept ACK

TheBlueMatt commented at 0:46 am on September 6, 2017: member

This should likely be closed in favor of #10984.

ryanofsky commented at 6:08 pm on October 12, 2017: member

Should this still be closed in favor of #10984?

morcos closed this on Nov 9, 2017

DrahtBot locked this on Sep 8, 2021

Allow 2 simultaneous block downloads #9447