Call ProcessNewBlock() asynchronously #16175
pull TheBlueMatt wants to merge 10 commits into bitcoin:master from TheBlueMatt:2019-06-add-second-thread changing 14 files +347 −165-
TheBlueMatt commented at 11:51 am on June 9, 2019: memberThis is essentially a rebase of #12934, though its actually greenfield to reduce diff a bit. Based on #16174 as without it the parallelism would be almost entirely useless. Note that this is draft as I haven’t (yet) bothered doing the backgrounding of the block validation, its currently just a return value change and the net_processing changes to handle it.
-
DrahtBot added the label Mining on Jun 9, 2019
-
DrahtBot added the label P2P on Jun 9, 2019
-
DrahtBot added the label RPC/REST/ZMQ on Jun 9, 2019
-
DrahtBot added the label Tests on Jun 9, 2019
-
DrahtBot added the label Validation on Jun 9, 2019
-
DrahtBot commented at 3:05 pm on June 9, 2019: member
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #16202 (Refactor network message deserialization by jonasschnelli)
- #16194 (refactor: share blockmetadata with BlockManager by jamesob)
- #15921 (Tidy up ValidationState interface by jnewbery)
- #15606 ([experimental] UTXO snapshots by jamesob)
- #15545 ([doc] explain why CheckBlock() is called before AcceptBlock by Sjors)
- #15505 ([p2p] Request NOTFOUND transactions immediately from other outbound peers, when possible by sdaftuar)
- #15206 (Immediately disconnect on invalid net message checksum by jonasschnelli)
- #14046 (net: Refactor message parsing (CNetMessage), adds flexibility by jonasschnelli)
- #14032 (Add p2p layer encryption with ECDH/ChaCha20Poly1305 by jonasschnelli)
- #9849 (Qt: Network Watch tool by luke-jr)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
-
TheBlueMatt force-pushed on Jun 9, 2019
-
TheBlueMatt force-pushed on Jun 9, 2019
-
TheBlueMatt commented at 7:32 pm on June 9, 2019: memberOops, it turns out its actually not possible to do #16174 first separately (and without it, this PR is entirely useless). By itself, #16174 ends up hitting the deadly cross-validationinterface-queue deadlock - it takes cs_peerstate first, then (eventually) calls PNB, which may call LimitValidationInterfaceQueue(). Meanwhile, in the validation queue, we may end up waiting on cs_peerstate to update some peer’s state. Of course, once we actually move block validation into its own thread, this is no longer (materially) a concern - we trivially guarantee there are no locks held going into LimitValidationInterfaceQueue().
-
TheBlueMatt force-pushed on Jun 9, 2019
-
TheBlueMatt force-pushed on Jun 9, 2019
-
TheBlueMatt force-pushed on Jun 9, 2019
-
TheBlueMatt force-pushed on Jun 9, 2019
-
TheBlueMatt marked this as ready for review on Jun 9, 2019
-
TheBlueMatt force-pushed on Jun 9, 2019
-
TheBlueMatt commented at 10:04 pm on June 9, 2019: memberNote that there is still a regression causing some functional tests to time out as we may end up waiting 100ms to discover that a block has been processed and we can process the next message from that peer (this shouldn’t be an issue in the case of downloading from multiple peers since we’ll always wake on a new tip block, and seems to not be an issue for one peer, even though its possible it would be due to a wake race). Not 100% sure what the best solution to that is.
-
jamesob commented at 2:38 am on June 11, 2019: member
Oops, it turns out its actually not possible to do #16174 first separately […] Meanwhile, in the validation queue, we may end up waiting on cs_peerstate to update some peer’s state.
For other reviewers: the particular validationinterface callback at fault here looks to be
PeerLogicValidation::BlockChecked
.Big Concept ACK on the lock-splitting at the very least, though I do wish it could be done separately somehow. I’m a Concept-Concept ACK on this (in concept), but want to make sure I fully understand the implications.
-
Remove unnecessary cs_mains in denialofservice_tests
9fdf05d70cac4a62d1aeeb4299e2c3a9a866f8af resolved some lock inversion warnings in denialofservice_tests, but left in a number of cs_main locks that are unnecessary (introducing lock inversion warnings in future changes).
-
Make ProcessNewBlock return a future instead of an immediate bool
This prepares for making ProcessNewBlock processing actually happen in a separate thread from the caller, even though all callsites currently block on the return value immediately. Note that this makes some of the unit tests less restrictive.
-
Add a new peer state tracking class to reduce cs_main contention.
CNodeState was added for validation-state-tracking, and thus, logically, was protected by cs_main. However, as it has grown to include non-validation state (taking state from CNode), and as we've reduced cs_main usage for other unrelated things, CNodeState is left with lots of cs_main locking in net_processing. In order to ease transition to something new, this adds only a dummy CPeerState which is held as a reference for the duration of message processing. Note that moving things is somewhat tricky pre validation-thread as a consistent lockorder must be kept - we can't take a lock on the new cs_peerstate in anything that's called directly from validation.
-
Move net_processing's ProcessNewBlock calls to resolve async.
Essentially, our goal is to not process anything for the given peer until the block finishes processing (emulating the previous behavior) without actually blocking the ProcessMessages loops. Obviously, in most cases, we'll just go on to the next peer and immediately hit a cs_main lock, blocking us anyway, but this we can slowly improve that state over time by moving things from CNodeState to CPeerState.
-
Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken)
Spawn a background thread at startup which validates each block as it comes in from ProcessNewBlock, taking advantage of the new std::future return value to keep tests simple (and the new net_processing handling of such values async already). This makes introducing subtle validationinterface deadlocks much harder as any locks held going into ProcessNewBlock do not interact with (in the form of lockorder restrictions) locks taken in validationinterface callbacks. Note that after this commit, feature_block and feature_assumevalid tests time out due to increased latency between block processing when those blocks do not represent a new best block. This will be resolved in the next commit.
-
Add a callback to indicate a block has been processed
This resolves the performance regression introduced in the previous commit by always waking the message processing thread after each block future resolves. Sadly, this is somewhat awkward - all other validationinterface callbacks represent an actual change to the global validation state, whereas this callback indicates only that a call which one validation "client" made has completed. After going back and forth for some time I didn't see a materially better way to resolve this issue, and luckily its a rather simple change, but its far from ideal. Note that because we absolutely do not want to ever block on a ProcessNewBlock-returned-future, the callback approach is critical.
-
Move BlockChecked to a background thread 3499d988a9
-
Move nDoS counters to CPeerState (and, thus, out of cs_main) 445bb84bd3
-
Move rejects into cs_peerstate.
This removes the cs_main lock which is taken on every ProcessMessages call.
-
Actually parallelize validation somewhat 2d1e8f64cd
-
in src/net_processing.cpp:403 in 54b3431c40 outdated
398@@ -388,7 +399,20 @@ struct CNodeState { 399 // Keeps track of the time (in microseconds) when transactions were requested last time 400 limitedmap<uint256, int64_t> g_already_asked_for GUARDED_BY(cs_main)(MAX_INV_SZ); 401 402+/** Note that this must be locked BEFORE cs_main! */ 403+CCriticalSection cs_peerstate;
ryanofsky commented at 4:30 pm on June 11, 2019:In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)
Do you want this to be a recursive mutex for some reason? If so, it’d helpful to say in an comment why recursive locking is helpful here. Otherwise I’d suggest changing this line to
Mutex g_peerstate_mutex
(also not using the old “critical section” terminology).in src/net_processing.cpp:785 in 54b3431c40 outdated
779@@ -756,12 +780,22 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) { 780 LOCK(cs_main); 781 mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName), pnode->fInbound, pnode->m_manual_connection)); 782 } 783+ { 784+ LOCK(cs_peerstate); 785+ mapPeerState.emplace_hint(mapPeerState.end(), nodeid, CPeerState{});
ryanofsky commented at 4:33 pm on June 11, 2019:In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)
Not necessarily suggesting you should change this here, but is the only reason for adding a new
mapPeerState
map to make the diff smaller? If entries are always added to bothmapNodeState
andmapPeerState
maps and removed from the maps at the same times with both locks held, it would seem more ideal to have just single map where each entry contains separate node state and peer state structs (to support separate lock annotations).in src/net_processing.cpp:210 in 54b3431c40 outdated
201+ * Maintain state about nodes, protected by our own lock. Historically we put all 202+ * peer tracking state in CNodeState, however this results in significant cs_main 203+ * contention. Thus, new state tracking should go here, and we should eventually 204+ * move most (non-validation-specific) state here. 205+ */ 206+struct CPeerState {
ryanofsky commented at 5:26 pm on June 11, 2019:In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)
Probably better to follow current coding convention and call it
PeerState
instead ofCPeerState
in src/net_processing.cpp:408 in 54b3431c40 outdated
403+CCriticalSection cs_peerstate; 404+ 405 /** Map maintaining per-node state. */ 406+static std::map<NodeId, CPeerState> mapPeerState GUARDED_BY(cs_peerstate); 407+ 408+static CPeerState *PeerState(NodeId pnode) EXCLUSIVE_LOCKS_REQUIRED(cs_peerstate) LOCKS_EXCLUDED(cs_main) {
ryanofsky commented at 5:30 pm on June 11, 2019:In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)
Would suggest calling this
LookupPeerState
, to be consistent withLookupBlockIndex
, and because otherwise calls to this function look like they are constructing objects, not looking up existing objectsin src/net_processing.cpp:431 in 54b3431c40 outdated
398@@ -388,7 +399,20 @@ struct CNodeState { 399 // Keeps track of the time (in microseconds) when transactions were requested last time 400 limitedmap<uint256, int64_t> g_already_asked_for GUARDED_BY(cs_main)(MAX_INV_SZ); 401 402+/** Note that this must be locked BEFORE cs_main! */ 403+CCriticalSection cs_peerstate; 404+ 405 /** Map maintaining per-node state. */ 406+static std::map<NodeId, CPeerState> mapPeerState GUARDED_BY(cs_peerstate);
ryanofsky commented at 5:32 pm on June 11, 2019:In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)
g_peer_states
instead ofmapPeerState
would be more in line with current coding conventionin src/net_processing.cpp:3308 in 54b3431c40 outdated
3228@@ -3193,6 +3229,8 @@ bool PeerLogicValidation::SendRejectsAndCheckIfBanned(CNode* pnode, bool enable_ 3229 bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& interruptMsgProc) 3230 { 3231 const CChainParams& chainparams = Params(); 3232+ LOCK(cs_peerstate);
ryanofsky commented at 5:56 pm on June 11, 2019:In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)
Acquiring a global lock in
PeerLogicValidation::ProcessMessages
for a single peer seems like a pessimization. Why acquirecs_peerstate
at such a broad scope here instead of just acquiring it directly where needed, like the code is currently doing withcs_main
? This would potentially be more efficient and make ProcessMessages code easier to parallelize in the future, but more importantly I think it would make the control flow easier to understand because you could see the mutex being locked in actual places the lock is needed, instead of relying on it already being locked somewhere up the call stack.If the broader locking is justified, it would at least be good to have a comment to give a clue about why the mutex is being locked for so long beyond the initial PeerState lookup, and what other things below require the lock.
in src/net_processing.cpp:3608 in 54b3431c40 outdated
3521@@ -3484,6 +3522,9 @@ bool PeerLogicValidation::SendMessages(CNode* pto) 3522 // If we get here, the outgoing message serialization version is set and can't change. 3523 const CNetMsgMaker msgMaker(pto->GetSendVersion()); 3524 3525+ LOCK(cs_peerstate);
ryanofsky commented at 6:24 pm on June 11, 2019:In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)
It’d be good to have a comment about what code here actually requires the lock. It’s unclear if it’s just the map lookup and
IsPendingBlockValidated
code added in a later commit, or if cs_peerstate is replacing cs_main in other parts of this function. If you could reduce the scope of the lock to only the code that needs it, I think that’d also be good, but when you’re holding a lock over a really wide scope and not briefly for a specific reason, it’s good to have some explanation.in src/net_processing.cpp:240 in bcc64438d2 outdated
224+ CPeerState(std::string addrNameIn, bool is_inbound, bool is_manual) : 225+ name(std::move(addrNameIn)), 226+ m_is_inbound(is_inbound), 227+ m_is_manual_connection (is_manual) 228+ { 229+ fShouldBan = false;
ryanofsky commented at 7:17 pm on June 11, 2019:In commit “Move nDoS counters to CPeerState (and, thus, out of cs_main)” (bcc64438d2eebf01866a26a92abf932ed21e0d4f):
Would be clearer to assign members directly above (
bool fShouldBan = false;
)TheBlueMatt force-pushed on Jun 12, 2019in src/validation.h:220 in 2d1e8f64cd
219 * 220 * @param[in] pblock The block we want to process. 221 * @param[in] fForceProcessing Process this block even if unrequested; used for non-network block sources and whitelisted peers. 222- * @param[out] fNewBlock A boolean which is set to indicate if the block was first received via this call 223- * @return True if state.IsValid() 224+ * @return A future which complets with a boolean which is set to indicate if the block was first received via this call
practicalswift commented at 7:52 am on June 13, 2019:Adding an “e” completes the spelling here :-)in src/test/denialofservice_tests.cpp:98 in 9bc8ca61f5 outdated
94@@ -95,11 +95,11 @@ BOOST_AUTO_TEST_CASE(outbound_slow_chain_eviction) 95 96 // Test starts here 97 { 98- LOCK2(cs_main, dummyNode1.cs_sendProcessing);
ryanofsky commented at 7:02 pm on June 13, 2019:In commit “Remove unnecessary cs_mains in denialofservice_tests” (9bc8ca61f5c9ca57ea71b3acdd1021a5cf66512f)
Consider moving this commit into a separate PR. It could probably be merged quickly and would cut down size of this PR.
in src/validation.cpp:3464 in 79e6e446dd outdated
3461+ result_promise.set_value(fNewBlock); 3462+ return result; 3463 } 3464 } 3465 3466+ result_promise.set_value(fNewBlock);
ryanofsky commented at 7:40 pm on June 13, 2019:In commit “Make ProcessNewBlock return a future instead of an immediate bool” (79e6e446dd0c8a3f53759a9792388f28e1053bfb)
This line is added here and them moved below NotifyHeaderTip in a later commit 64e74ce8a917e071b8a250a4ba9cd110448796d6. It would be preferable to just add in the right place below now so it doesn’t need to be moved later.
in src/validation.cpp:3509 in 64e74ce8a9 outdated
3533+ } 3534+ m_cv_block_validation_queue.notify_one(); 3535 return result; 3536 } 3537 3538+std::future<bool> ProcessNewBlock(const CChainParams& chainparams, const std::shared_ptr<const CBlock> pblock, bool fForceProcessing)
ryanofsky commented at 7:48 pm on June 13, 2019:In commit “Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken)” (64e74ce8a917e071b8a250a4ba9cd110448796d6)
Documentation for this function in validation.h becomes out of date with this commit and should be updated. It still says “This only returns after the best known valid block is made active.”
in src/validation.cpp:3475 in 64e74ce8a9 outdated
3493+ // Ensure that CheckBlock() passes before calling AcceptBlock, as 3494+ // belt-and-suspenders. 3495+ bool ret = CheckBlock(*pblock, state, chainparams.GetConsensus()); 3496+ if (ret) { 3497+ // Store to disk 3498+ ret = ::ChainstateActive().AcceptBlock(pblock, state, chainparams, &pindex, fForceProcessing, nullptr, &fNewBlock);
ryanofsky commented at 8:01 pm on June 13, 2019:In commit “Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken)” (64e74ce8a917e071b8a250a4ba9cd110448796d6)
Throughout this function shouldn’t
::ChainstateActive()
be replaced by*this
?in src/rpc/mining.cpp:755 in 79e6e446dd outdated
753+ bool new_block = ProcessNewBlock(Params(), blockptr, /* fForceProcessing */ true).get(); 754 UnregisterValidationInterface(&sc); 755- if (!new_block && accepted) { 756+ LOCK(cs_main); 757+ const CBlockIndex* pindex = LookupBlockIndex(blockptr->GetHash()); 758+ if (!new_block && pindex && pindex->IsValid()) {
ryanofsky commented at 8:04 pm on June 13, 2019:In commit “Make ProcessNewBlock return a future instead of an immediate bool” (79e6e446dd0c8a3f53759a9792388f28e1053bfb)
This seems fine, but just to make sure I understand correctly, there is a change in behavior here? Previously
accepted
would only be true ifCheckBlock
andAcceptBlock
andActivateBestChain
all succeeded but this can be true even they fail?in src/test/blockfilter_index_tests.cpp:178 in 79e6e446dd outdated
174@@ -175,7 +175,7 @@ BOOST_FIXTURE_TEST_CASE(blockfilter_index_initial_sync, TestChain100Setup) 175 uint256 chainA_last_header = last_header; 176 for (size_t i = 0; i < 2; i++) { 177 const auto& block = chainA[i]; 178- BOOST_REQUIRE(ProcessNewBlock(Params(), block, true, nullptr));
ryanofsky commented at 8:28 pm on June 13, 2019:In commit “Make ProcessNewBlock return a future instead of an immediate bool” (79e6e446dd0c8a3f53759a9792388f28e1053bfb)
In the commit message about these tests you wrote “Note that this makes some of the unit tests less restrictive.” But I don’t think there’s a good justification for doing this. If you just declared
struct ProcessNewBlockResult { bool error = false; bool already_have = false; };
and returnedstd::future<ProcessNewBlockResult>
, the tests could behave the same and the code would be more readable and less fragile, without the need for people to guess from context whether the bool indicates success or newness.
ajtowns commented at 1:30 am on June 18, 2019:Think I agree with this suggestion; dropping thefNewBlock
result at the same time as turning the return into afuture
seems unnecessarily complicated.ryanofsky commented at 8:39 pm on June 13, 2019: memberStarted review (will update list below with progress).
- 9bc8ca61f5c9ca57ea71b3acdd1021a5cf66512f Remove unnecessary cs_mains in denialofservice_tests (1/10)
- 79e6e446dd0c8a3f53759a9792388f28e1053bfb Make ProcessNewBlock return a future instead of an immediate bool (2/10)
- e89822964e9ef8dadf88d47bc94f5396a8c61a54 Add a new peer state tracking class to reduce cs_main contention. (3/10)
- 8ddbb126780a8ab6ff547411e2b8d3b46e42c887 Move net_processing’s ProcessNewBlock calls to resolve async. (4/10)
- 64e74ce8a917e071b8a250a4ba9cd110448796d6 Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken) (5/10)
- ba810b9e88bee558c34ac24dc685c68e5d5c83cb Add a callback to indicate a block has been processed (6/10)
- 3499d988a9cef162987d98dda3cf914be24a3fa1 Move BlockChecked to a background thread (7/10)
- 445bb84bd3c38b2e98ccc247aa1a346ca3c5d66c Move nDoS counters to CPeerState (and, thus, out of cs_main) (8/10)
- b6a05524f9b66af1147c9776e3b499bdb94655f8 Move rejects into cs_peerstate. (9/10)
- 2d1e8f64cd527b9c44ba171beb0bb837c64cc0a5 Actually parallelize validation somewhat (10/10)
jamesob commented at 8:58 pm on June 13, 2019: memberWhen IBDing from a single peer, this branch appears to be no slower (or negligibly slower) than master. I’m going to modify the benchmark framework to test against multiple peers, which is where I’m assuming we’d expect to see some speedup.
ryanofsky commented at 2:53 pm on June 14, 2019: memberJust want to note that all my comments above are just suggested cleanups. Feel free to ignore them if they don’t make sense or aren’t worth effort to implement. The only thing I’d really like to see are more comments about locking. When a lock is held in a small scope for specific purpose, I don’t think there’s a need to have a comment, but when a lock is held over wide scope for no obvious reason it’s good to have a comment that suggests why it needs to be acquired there and when it is safe to release.TheBlueMatt commented at 8:17 pm on June 18, 2019: memberWorking on rewriting this to make it simpler but it looks like it may end up growing so dunno about it.TheBlueMatt closed this on Jun 18, 2019
DrahtBot locked this on Dec 16, 2021
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-14 07:12 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me