Small step towards demangling cs

TheBlueMatt commented at 7:56 pm on June 22, 2017: member

This is a rather long branch, split up into three PRs, with lots of cleanup and tweaks here and there that evntually gets us CNodeStates which have their own locks, locked before cs_main, which makes nodes not always require cs_main to do simple things like call Misbehaving(). The final result is at https://github.com/TheBlueMatt/bitcoin/tree/2017-06-cnodestateaccessors-3 (and next PR is https://github.com/TheBlueMatt/bitcoin/tree/2017-06-cnodestateaccessors-2).

This PR is primarily some small cleanups and the first few commits of #9447 rebased on #10179 and with small tweaks.

The result is that we take the lock for the CNodeState for the node we’re processing at the top of ProcessMessages/SendMessages, and hold it until the end, passing around an accessor to the CNodeState as we go. This moves towards a few goals: (1) ability to run ProcessMessages() in paralell, (2) move towards a world where ProcessMessage() is a function in CNodeState, which are objects owned by CConnman.

This isn’t a trivial thing to pull off because of lockorder, which introduces two issues:

There are a few places we call State() as a result of callbacks from CValidationInterface. Addressed by extending the CValidationInterface threading introduced in #10179 and used for wallet to a few more functions.
You can’t take two State() accessors at once, because then you’d have undefined lockorder. Luckily with the CValidationInterface threading changes gets most of those, but also the first few commits from #9447 are needed (included here rebased and slightly tweaked) as well as two little tweaks to be able to apply DoS score after unlocking the CNodeState of the node we’re processing on.

By the end DEBUG_LOCKORDER features are added to stricly enforce these potential issues (and I’ve been testing it a bunch in helgrind, etc).

TheBlueMatt commented at 7:56 pm on June 22, 2017: member

Ping @theuni.

fanquake added the label Refactoring on Jun 23, 2017

in src/net.cpp:2832 in f1d25b2d54 outdated

2828+    bool res = found != nullptr && NodeFullyConnected(found) && func(found);
2829+    found->Release();
2830+    return res;
2831 }
2832 
2833+void CConnman::ForEachNode(std::function<void (CNode* pnode)> func)

jonasschnelli commented at 8:24 pm on June 28, 2017:

Wouldn’t this discard the rvalue reference && (move schematic)?

TheBlueMatt commented at 9:07 pm on June 28, 2017:

I mean it does, but I’m not too worried about requiring move for a callable type (places where we std::move a lambda into this will still get the benifit of calling std::function(Callable&&)). The real concern here, of course, is that it introduces a performance penalty, but its not huge (an extra function call and few if’s on top of the callback function call) and we’re already calling this with lambda functions, so I dont think we were ever too worried about the performance here.

TheBlueMatt commented at 9:19 pm on June 28, 2017:

Moved this commit to #10697.

TheBlueMatt force-pushed on Jun 28, 2017

TheBlueMatt commented at 9:28 pm on June 28, 2017: member

Rebased on a rebased version of the updated #10179 and filled in more text for the " MarkBlockAsReceived on NewPoWValidBlock at receive." commit message which helps provide a bit more context for this PR:

“Note that this also helps with future per-CNodeState locks, as those will require no two CNodeState locks to be held at the same time (for lockorder reasons), and this prevents us from accessing another peer’s mmapBlocksInFlight entry during processing as we can move the NewPoWValidBlock callback into the CValidationInterface background thread mechanics.”

jonasschnelli added this to the "Blockers" column in a project

jonasschnelli removed this from the "Blockers" column in a project

in src/validation.cpp:3144 in df3001d00a outdated

3022@@ -3023,9 +3023,7 @@ static bool AcceptBlock(const std::shared_ptr<const CBlock>& pblock, CValidation
3023     }
3024 
3025     // Header is valid/has work, merkle tree and segwit merkle tree are good...RELAY NOW
3026-    // (but if it does not build on our best tip, let the SendMessages loop relay it)
3027-    if (!IsInitialBlockDownload() && chainActive.Tip() == pindex->pprev)
3028-        GetMainSignals().NewPoWValidBlock(pindex, pblock);
3029+    GetMainSignals().NewPoWValidBlock(pindex, pblock, chainActive.Tip() == pindex->pprev);

theuni commented at 9:56 pm on July 6, 2017:

I think this wants fInitialDownload as a param so that it’s always evaluated in its receiving context rather than having subscribers potentially coming to different conclusions.

TheBlueMatt commented at 9:21 pm on July 7, 2017:

Hmm? I’m not sure I caught your point - we never get here if we already have the block on disk, and the new uses of it in net_processing need to know that we got here for any ProcessNewBlock call they make.

theuni commented at 10:20 pm on July 7, 2017:

My point was that IsInitialBlockDownload() may be false by the time it’s checked in a callback on a separate thread (once that stuff comes in), though it was true in AcceptBlock. So it’s potentially racy to subscribers.

TheBlueMatt commented at 11:18 pm on July 7, 2017:

Ahh, ok. Yes, is not an issue in any of my work, I believe (worst-case you have an extra few fast-announcements when you first leave IBD), but I added a comment noting the issue.

in src/net_processing.cpp:2430 in 35bd4b6204 outdated

2423@@ -2431,7 +2424,10 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2424             LOCK(cs_main);
2425             // Also always process if we requested the block explicitly, as we may
2426             // need it even though it is not a candidate for a new best tip.
2427-            forceProcessing |= MarkBlockAsReceived(hash);
2428+            // TODO: Only process if requested from this peer?
2429+            forceProcessing |= mmapBlocksInFlight.count(hash);
2430+            // Block is no longer in flight from this peer
2431+            MarkBlockAsNotInFlight(hash, pfrom->GetId());

theuni commented at 9:58 pm on July 6, 2017:

Probably makes sense for MarkBlockAsNotInFlight() to return whether it removed something for the peer? I think that would take care of the TODO as well.

TheBlueMatt commented at 9:27 pm on July 7, 2017:

Given the TODO has a “?” on it (and I dunno if we want it or not), I’ll leave this for a later PR?

TheBlueMatt force-pushed on Jul 7, 2017

TheBlueMatt commented at 9:28 pm on July 7, 2017: member

Rebased.

TheBlueMatt force-pushed on Jul 7, 2017

in src/net_processing.cpp:312 in c248f0121b outdated

307+    while (range.first != range.second) {
308+        BlockDownloadMap::iterator itInFlight = range.first;
309+        range.first++;
310+        if (itInFlight->second.first == nodeid) {
311+            if (clearState)
312+                ClearDownloadState(itInFlight);

promag commented at 2:08 pm on July 10, 2017:

Missing block or same line as if ().

morcos commented at 7:45 pm on July 10, 2017:

same line or braces

TheBlueMatt commented at 1:35 am on July 11, 2017:

Fixed.

in src/net_processing.cpp:2069 in 0010932a0d outdated

2065@@ -2024,7 +2066,7 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2066         }
2067 
2068         // If we're not close to tip yet, give up and let parallel block fetch work its magic
2069-        if (!fAlreadyInFlight && !CanDirectFetch(chainparams.GetConsensus()))
2070+        if (!fInFlightFromSamePeer && !CanDirectFetch(chainparams.GetConsensus()))

morcos commented at 9:01 pm on July 10, 2017:

I’m a bit hesitant about this change. I can’t really remember exactly why we have this CanDirectFetch logic in compact block processing. But this could make us unnecessarily avoid an optimistic compact block reconstruction if for some reason it took more than 200 mins to receive 2 blocks (right we do reconstructions if within 2 blocks of the announced block?)

morcos commented at 9:07 pm on July 10, 2017:

Actually now that I think about it, that concern exists prior to the changes in this PR. Why do we have the CanDirectFetch guard for compact block processing?

TheBlueMatt commented at 1:38 am on July 11, 2017:

No, I think you’d have to go 200 minutes for one block for it to matter (if you go 200 minutes for two, and missed the first one, you wont compact-download that older one, but thats OK, you were probably offline or something anyway). Still, we dont have a much better way to say “ok, I really, really only want to do this if I’m caught up” (ie cause otherwise mempool will be worthless). We could drop it - its not the end of the world to take the extra RT to use the compact blocks and get a tiny bit of compression, but at that point you may very well be just as well off just doing a regular block fetch and not taking the extra RT.

morcos approved

morcos commented at 9:04 pm on July 10, 2017: member

Looks pretty good to me. utACK 0010932

TheBlueMatt force-pushed on Jul 11, 2017

in src/scheduler.cpp:146 in 7145218d32 outdated

138@@ -139,3 +139,69 @@ size_t CScheduler::getQueueInfo(boost::chrono::system_clock::time_point &first,
139     }
140     return result;
141 }
142+
143+bool CScheduler::AreThreadsServicingQueue() const {
144+    return nThreadsServicingQueue;
145+}
146+

promag commented at 9:53 am on July 11, 2017:

Remove extra line.

in src/scheduler.cpp:143 in 7145218d32 outdated

138@@ -139,3 +139,69 @@ size_t CScheduler::getQueueInfo(boost::chrono::system_clock::time_point &first,
139     }
140     return result;
141 }
142+
143+bool CScheduler::AreThreadsServicingQueue() const {

promag commented at 9:54 am on July 11, 2017:

{ in new line for all functions.

in src/net_processing.cpp:2195 in 7145218d32 outdated

2188@@ -2158,17 +2189,27 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2189         {
2190             LOCK(cs_main);
2191 
2192-            std::map<uint256, std::pair<NodeId, std::list<QueuedBlock>::iterator> >::iterator it = mapBlocksInFlight.find(resp.blockhash);
2193-            if (it == mapBlocksInFlight.end() || !it->second.second->partialBlock ||
2194-                    it->second.first != pfrom->GetId()) {
2195+            bool fExpectedBLOCKTXN = false;
2196+            std::pair<BlockDownloadMap::iterator, BlockDownloadMap::iterator> rangeInFlight = mmapBlocksInFlight.equal_range(resp.blockhash);
2197+            BlockDownloadMap::iterator it = rangeInFlight.first;

promag commented at 10:03 am on July 11, 2017:

Remove it and mutate rangeInFlight.first.

in src/net_processing.cpp:2199 in 7145218d32 outdated

2196+            std::pair<BlockDownloadMap::iterator, BlockDownloadMap::iterator> rangeInFlight = mmapBlocksInFlight.equal_range(resp.blockhash);
2197+            BlockDownloadMap::iterator it = rangeInFlight.first;
2198+            while (it != rangeInFlight.second) {
2199+                if (it->second.first == pfrom->GetId()) {
2200+                    if (it->second.second->partialBlock)
2201+                        fExpectedBLOCKTXN = true;

promag commented at 10:21 am on July 11, 2017:

0if (it->second.first == pfrom->GetId()) {
1    fExpectedBLOCKTXN = it->second.second->partialBlock;
2    break;
3}

in src/net_processing.cpp:2193 in 7145218d32 outdated

2188@@ -2158,17 +2189,27 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2189         {
2190             LOCK(cs_main);
2191 
2192-            std::map<uint256, std::pair<NodeId, std::list<QueuedBlock>::iterator> >::iterator it = mapBlocksInFlight.find(resp.blockhash);
2193-            if (it == mapBlocksInFlight.end() || !it->second.second->partialBlock ||
2194-                    it->second.first != pfrom->GetId()) {
2195+            bool fExpectedBLOCKTXN = false;

promag commented at 10:22 am on July 11, 2017:

expected_BLOCKTXN?

in src/net_processing.cpp:2428 in 7145218d32 outdated

2422@@ -2382,7 +2423,10 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
2423             LOCK(cs_main);
2424             // Also always process if we requested the block explicitly, as we may
2425             // need it even though it is not a candidate for a new best tip.
2426-            forceProcessing |= MarkBlockAsReceived(hash);
2427+            // TODO: Only process if requested from this peer?
2428+            forceProcessing |= mmapBlocksInFlight.count(hash);

promag commented at 10:27 am on July 11, 2017:

Replace these mmapBlocksInFlight.count() with, for instance:

0forceProcessing |= IsBlockInFlight(hash);

TheBlueMatt force-pushed on Jul 11, 2017

TheBlueMatt commented at 3:16 pm on July 11, 2017: member

Rebased to remove the useless old commits, but no changes. Still would be nice to see thtis for 0.15, but not a huge deal if it misses.

laanwj added this to the milestone 0.15.0 on Jul 11, 2017

morcos commented at 5:27 pm on July 11, 2017: member

utACK ca66c4a

in src/net_processing.cpp:108 in ca66c4a7d7 outdated

104@@ -105,7 +105,8 @@ namespace {
105         bool fValidatedHeaders;                                  //!< Whether this block has validated headers at the time of request.
106         std::unique_ptr<PartiallyDownloadedBlock> partialBlock;  //!< Optional, used for CMPCTBLOCK downloads
107     };
108-    std::map<uint256, std::pair<NodeId, std::list<QueuedBlock>::iterator> > mapBlocksInFlight;
109+    typedef std::multimap<uint256, std::pair<NodeId, std::list<QueuedBlock>::iterator>> BlockDownloadMap;

promag commented at 10:52 pm on July 12, 2017:

Suggestion, add NodeId to QueuedBlock structure?

TheBlueMatt commented at 7:07 pm on August 3, 2017:

Would be a bit cleaner, maybe, but would also make it redundant.

TheBlueMatt commented at 9:30 pm on July 18, 2017: member

This should likely be un-tagged 0.15. Its not a critical enough bugfix to merit merging at this late stage.

MarcoFalke removed this from the milestone 0.15.0 on Jul 18, 2017

TheBlueMatt force-pushed on Aug 3, 2017

Turn mapBlocksInFlight into a multimap ff57eb4bbd

Call NewPoWValidBlock callbacks for all new blocks, not just !IBD

This pushes some "is this callback useful" logic down into
net_processing, which is useful for later changes as it allows for
more notifications to be used.

b4592659b9

MarkBlockAsReceived on NewPoWValidBlock at receive.

The received block could be malleated, so this is both simpler, and
supports parallel downloads.

61c9b81df0

Only request full blocks from the peer we thought had the block in-flight

This is a change in behavior so that if for some reason we request a block from a peer, we don't allow an unsolicited CMPCT_BLOCK announcement for that same block to cause a request for a full block from the uninvited peer (as some type of request is already outstanding from the original peer)

8689df48ce

TheBlueMatt force-pushed on Aug 3, 2017

TheBlueMatt commented at 7:05 pm on August 14, 2017: member

Closing as I believe I’m taking a different approach to beter network parallelization, however the commits here are still part of #10984, where they will remain.

TheBlueMatt closed this on Aug 14, 2017

laanwj removed this from the "Blockers" column in a project

laanwj referenced this in commit 723e580657 on Sep 7, 2017

PastaPastaPasta referenced this in commit d9684ed18a on Sep 24, 2019

codablock referenced this in commit 022e80a1c7 on Sep 26, 2019

codablock referenced this in commit 2edc29ee32 on Sep 29, 2019

barrystyle referenced this in commit a2bd5cd8fb on Jan 22, 2020

DrahtBot locked this on Sep 8, 2021

Small step towards demangling cs_main from CNodeState #10652