Call ProcessNewBlock() asynchronously #16175

pull TheBlueMatt wants to merge 10 commits into bitcoin:master from TheBlueMatt:2019-06-add-second-thread changing 14 files +347 −165
  1. TheBlueMatt commented at 11:51 am on June 9, 2019: member
    This is essentially a rebase of #12934, though its actually greenfield to reduce diff a bit. Based on #16174 as without it the parallelism would be almost entirely useless. Note that this is draft as I haven’t (yet) bothered doing the backgrounding of the block validation, its currently just a return value change and the net_processing changes to handle it.
  2. DrahtBot added the label Mining on Jun 9, 2019
  3. DrahtBot added the label P2P on Jun 9, 2019
  4. DrahtBot added the label RPC/REST/ZMQ on Jun 9, 2019
  5. DrahtBot added the label Tests on Jun 9, 2019
  6. DrahtBot added the label Validation on Jun 9, 2019
  7. DrahtBot commented at 3:05 pm on June 9, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #16202 (Refactor network message deserialization by jonasschnelli)
    • #16194 (refactor: share blockmetadata with BlockManager by jamesob)
    • #15921 (Tidy up ValidationState interface by jnewbery)
    • #15606 ([experimental] UTXO snapshots by jamesob)
    • #15545 ([doc] explain why CheckBlock() is called before AcceptBlock by Sjors)
    • #15505 ([p2p] Request NOTFOUND transactions immediately from other outbound peers, when possible by sdaftuar)
    • #15206 (Immediately disconnect on invalid net message checksum by jonasschnelli)
    • #14046 (net: Refactor message parsing (CNetMessage), adds flexibility by jonasschnelli)
    • #14032 (Add p2p layer encryption with ECDH/ChaCha20Poly1305 by jonasschnelli)
    • #9849 (Qt: Network Watch tool by luke-jr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  8. TheBlueMatt force-pushed on Jun 9, 2019
  9. TheBlueMatt force-pushed on Jun 9, 2019
  10. TheBlueMatt commented at 7:32 pm on June 9, 2019: member
    Oops, it turns out its actually not possible to do #16174 first separately (and without it, this PR is entirely useless). By itself, #16174 ends up hitting the deadly cross-validationinterface-queue deadlock - it takes cs_peerstate first, then (eventually) calls PNB, which may call LimitValidationInterfaceQueue(). Meanwhile, in the validation queue, we may end up waiting on cs_peerstate to update some peer’s state. Of course, once we actually move block validation into its own thread, this is no longer (materially) a concern - we trivially guarantee there are no locks held going into LimitValidationInterfaceQueue().
  11. TheBlueMatt force-pushed on Jun 9, 2019
  12. TheBlueMatt force-pushed on Jun 9, 2019
  13. TheBlueMatt force-pushed on Jun 9, 2019
  14. TheBlueMatt force-pushed on Jun 9, 2019
  15. TheBlueMatt marked this as ready for review on Jun 9, 2019
  16. TheBlueMatt force-pushed on Jun 9, 2019
  17. TheBlueMatt commented at 10:04 pm on June 9, 2019: member
    Note that there is still a regression causing some functional tests to time out as we may end up waiting 100ms to discover that a block has been processed and we can process the next message from that peer (this shouldn’t be an issue in the case of downloading from multiple peers since we’ll always wake on a new tip block, and seems to not be an issue for one peer, even though its possible it would be due to a wake race). Not 100% sure what the best solution to that is.
  18. jamesob commented at 2:38 am on June 11, 2019: member

    Oops, it turns out its actually not possible to do #16174 first separately […] Meanwhile, in the validation queue, we may end up waiting on cs_peerstate to update some peer’s state.

    For other reviewers: the particular validationinterface callback at fault here looks to be PeerLogicValidation::BlockChecked.

    Big Concept ACK on the lock-splitting at the very least, though I do wish it could be done separately somehow. I’m a Concept-Concept ACK on this (in concept), but want to make sure I fully understand the implications.

  19. Remove unnecessary cs_mains in denialofservice_tests
    9fdf05d70cac4a62d1aeeb4299e2c3a9a866f8af resolved some lock
    inversion warnings in denialofservice_tests, but left in a number
    of cs_main locks that are unnecessary (introducing lock inversion
    warnings in future changes).
    9bc8ca61f5
  20. Make ProcessNewBlock return a future instead of an immediate bool
    This prepares for making ProcessNewBlock processing actually happen
    in a separate thread from the caller, even though all callsites
    currently block on the return value immediately.
    
    Note that this makes some of the unit tests less restrictive.
    79e6e446dd
  21. Add a new peer state tracking class to reduce cs_main contention.
    CNodeState was added for validation-state-tracking, and thus,
    logically, was protected by cs_main. However, as it has grown to
    include non-validation state (taking state from CNode), and as
    we've reduced cs_main usage for other unrelated things, CNodeState
    is left with lots of cs_main locking in net_processing.
    
    In order to ease transition to something new, this adds only a
    dummy CPeerState which is held as a reference for the duration of
    message processing.
    
    Note that moving things is somewhat tricky pre validation-thread as
    a consistent lockorder must be kept - we can't take a lock on the
    new cs_peerstate in anything that's called directly from
    validation.
    e89822964e
  22. Move net_processing's ProcessNewBlock calls to resolve async.
    Essentially, our goal is to not process anything for the given peer
    until the block finishes processing (emulating the previous behavior)
    without actually blocking the ProcessMessages loops. Obviously, in
    most cases, we'll just go on to the next peer and immediately hit a
    cs_main lock, blocking us anyway, but this we can slowly improve
    that state over time by moving things from CNodeState to CPeerState.
    8ddbb12678
  23. Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken)
    Spawn a background thread at startup which validates each block as
    it comes in from ProcessNewBlock, taking advantage of the new
    std::future return value to keep tests simple (and the new
    net_processing handling of such values async already).
    
    This makes introducing subtle validationinterface deadlocks much
    harder as any locks held going into ProcessNewBlock do not interact
    with (in the form of lockorder restrictions) locks taken in
    validationinterface callbacks.
    
    Note that after this commit, feature_block and feature_assumevalid
    tests time out due to increased latency between block processing
    when those blocks do not represent a new best block. This will be
    resolved in the next commit.
    64e74ce8a9
  24. Add a callback to indicate a block has been processed
    This resolves the performance regression introduced in the previous
    commit by always waking the message processing thread after each
    block future resolves.
    
    Sadly, this is somewhat awkward - all other validationinterface
    callbacks represent an actual change to the global validation state,
    whereas this callback indicates only that a call which one
    validation "client" made has completed. After going back and forth
    for some time I didn't see a materially better way to resolve this
    issue, and luckily its a rather simple change, but its far from
    ideal. Note that because we absolutely do not want to ever block on
    a ProcessNewBlock-returned-future, the callback approach is
    critical.
    ba810b9e88
  25. Move BlockChecked to a background thread 3499d988a9
  26. Move nDoS counters to CPeerState (and, thus, out of cs_main) 445bb84bd3
  27. Move rejects into cs_peerstate.
    This removes the cs_main lock which is taken on every
    ProcessMessages call.
    b6a05524f9
  28. Actually parallelize validation somewhat 2d1e8f64cd
  29. in src/net_processing.cpp:403 in 54b3431c40 outdated
    398@@ -388,7 +399,20 @@ struct CNodeState {
    399 // Keeps track of the time (in microseconds) when transactions were requested last time
    400 limitedmap<uint256, int64_t> g_already_asked_for GUARDED_BY(cs_main)(MAX_INV_SZ);
    401 
    402+/** Note that this must be locked BEFORE cs_main! */
    403+CCriticalSection cs_peerstate;
    


    ryanofsky commented at 4:30 pm on June 11, 2019:

    In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)

    Do you want this to be a recursive mutex for some reason? If so, it’d helpful to say in an comment why recursive locking is helpful here. Otherwise I’d suggest changing this line to Mutex g_peerstate_mutex (also not using the old “critical section” terminology).

  30. in src/net_processing.cpp:785 in 54b3431c40 outdated
    779@@ -756,12 +780,22 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) {
    780         LOCK(cs_main);
    781         mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName), pnode->fInbound, pnode->m_manual_connection));
    782     }
    783+    {
    784+        LOCK(cs_peerstate);
    785+        mapPeerState.emplace_hint(mapPeerState.end(), nodeid, CPeerState{});
    


    ryanofsky commented at 4:33 pm on June 11, 2019:

    In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)

    Not necessarily suggesting you should change this here, but is the only reason for adding a new mapPeerState map to make the diff smaller? If entries are always added to both mapNodeState and mapPeerState maps and removed from the maps at the same times with both locks held, it would seem more ideal to have just single map where each entry contains separate node state and peer state structs (to support separate lock annotations).

  31. in src/net_processing.cpp:210 in 54b3431c40 outdated
    201+ * Maintain state about nodes, protected by our own lock. Historically we put all
    202+ * peer tracking state in CNodeState, however this results in significant cs_main
    203+ * contention. Thus, new state tracking should go here, and we should eventually
    204+ * move most (non-validation-specific) state here.
    205+ */
    206+struct CPeerState {
    


    ryanofsky commented at 5:26 pm on June 11, 2019:

    In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)

    Probably better to follow current coding convention and call it PeerState instead of CPeerState

  32. in src/net_processing.cpp:408 in 54b3431c40 outdated
    403+CCriticalSection cs_peerstate;
    404+
    405 /** Map maintaining per-node state. */
    406+static std::map<NodeId, CPeerState> mapPeerState GUARDED_BY(cs_peerstate);
    407+
    408+static CPeerState *PeerState(NodeId pnode) EXCLUSIVE_LOCKS_REQUIRED(cs_peerstate) LOCKS_EXCLUDED(cs_main) {
    


    ryanofsky commented at 5:30 pm on June 11, 2019:

    In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)

    Would suggest calling this LookupPeerState, to be consistent with LookupBlockIndex, and because otherwise calls to this function look like they are constructing objects, not looking up existing objects

  33. in src/net_processing.cpp:431 in 54b3431c40 outdated
    398@@ -388,7 +399,20 @@ struct CNodeState {
    399 // Keeps track of the time (in microseconds) when transactions were requested last time
    400 limitedmap<uint256, int64_t> g_already_asked_for GUARDED_BY(cs_main)(MAX_INV_SZ);
    401 
    402+/** Note that this must be locked BEFORE cs_main! */
    403+CCriticalSection cs_peerstate;
    404+
    405 /** Map maintaining per-node state. */
    406+static std::map<NodeId, CPeerState> mapPeerState GUARDED_BY(cs_peerstate);
    


    ryanofsky commented at 5:32 pm on June 11, 2019:

    In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)

    g_peer_states instead of mapPeerState would be more in line with current coding convention

  34. in src/net_processing.cpp:3308 in 54b3431c40 outdated
    3228@@ -3193,6 +3229,8 @@ bool PeerLogicValidation::SendRejectsAndCheckIfBanned(CNode* pnode, bool enable_
    3229 bool PeerLogicValidation::ProcessMessages(CNode* pfrom, std::atomic<bool>& interruptMsgProc)
    3230 {
    3231     const CChainParams& chainparams = Params();
    3232+    LOCK(cs_peerstate);
    


    ryanofsky commented at 5:56 pm on June 11, 2019:

    In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)

    Acquiring a global lock in PeerLogicValidation::ProcessMessages for a single peer seems like a pessimization. Why acquire cs_peerstate at such a broad scope here instead of just acquiring it directly where needed, like the code is currently doing with cs_main? This would potentially be more efficient and make ProcessMessages code easier to parallelize in the future, but more importantly I think it would make the control flow easier to understand because you could see the mutex being locked in actual places the lock is needed, instead of relying on it already being locked somewhere up the call stack.

    If the broader locking is justified, it would at least be good to have a comment to give a clue about why the mutex is being locked for so long beyond the initial PeerState lookup, and what other things below require the lock.

  35. in src/net_processing.cpp:3608 in 54b3431c40 outdated
    3521@@ -3484,6 +3522,9 @@ bool PeerLogicValidation::SendMessages(CNode* pto)
    3522         // If we get here, the outgoing message serialization version is set and can't change.
    3523         const CNetMsgMaker msgMaker(pto->GetSendVersion());
    3524 
    3525+        LOCK(cs_peerstate);
    


    ryanofsky commented at 6:24 pm on June 11, 2019:

    In commit “Add a new peer state tracking class to reduce cs_main contention.” (e89822964e9ef8dadf88d47bc94f5396a8c61a54)

    It’d be good to have a comment about what code here actually requires the lock. It’s unclear if it’s just the map lookup and IsPendingBlockValidated code added in a later commit, or if cs_peerstate is replacing cs_main in other parts of this function. If you could reduce the scope of the lock to only the code that needs it, I think that’d also be good, but when you’re holding a lock over a really wide scope and not briefly for a specific reason, it’s good to have some explanation.

  36. in src/net_processing.cpp:240 in bcc64438d2 outdated
    224+    CPeerState(std::string addrNameIn, bool is_inbound, bool is_manual) :
    225+        name(std::move(addrNameIn)),
    226+        m_is_inbound(is_inbound),
    227+        m_is_manual_connection (is_manual)
    228+    {
    229+        fShouldBan = false;
    


    ryanofsky commented at 7:17 pm on June 11, 2019:

    In commit “Move nDoS counters to CPeerState (and, thus, out of cs_main)” (bcc64438d2eebf01866a26a92abf932ed21e0d4f):

    Would be clearer to assign members directly above (bool fShouldBan = false;)

  37. TheBlueMatt force-pushed on Jun 12, 2019
  38. in src/validation.h:220 in 2d1e8f64cd
    219  *
    220  * @param[in]   pblock  The block we want to process.
    221  * @param[in]   fForceProcessing Process this block even if unrequested; used for non-network block sources and whitelisted peers.
    222- * @param[out]  fNewBlock A boolean which is set to indicate if the block was first received via this call
    223- * @return True if state.IsValid()
    224+ * @return      A future which complets with a boolean which is set to indicate if the block was first received via this call
    


    practicalswift commented at 7:52 am on June 13, 2019:
    Adding an “e” completes the spelling here :-)
  39. in src/test/denialofservice_tests.cpp:98 in 9bc8ca61f5 outdated
    94@@ -95,11 +95,11 @@ BOOST_AUTO_TEST_CASE(outbound_slow_chain_eviction)
    95 
    96     // Test starts here
    97     {
    98-        LOCK2(cs_main, dummyNode1.cs_sendProcessing);
    


    ryanofsky commented at 7:02 pm on June 13, 2019:

    In commit “Remove unnecessary cs_mains in denialofservice_tests” (9bc8ca61f5c9ca57ea71b3acdd1021a5cf66512f)

    Consider moving this commit into a separate PR. It could probably be merged quickly and would cut down size of this PR.

  40. in src/validation.cpp:3464 in 79e6e446dd outdated
    3461+            result_promise.set_value(fNewBlock);
    3462+            return result;
    3463         }
    3464     }
    3465 
    3466+    result_promise.set_value(fNewBlock);
    


    ryanofsky commented at 7:40 pm on June 13, 2019:

    In commit “Make ProcessNewBlock return a future instead of an immediate bool” (79e6e446dd0c8a3f53759a9792388f28e1053bfb)

    This line is added here and them moved below NotifyHeaderTip in a later commit 64e74ce8a917e071b8a250a4ba9cd110448796d6. It would be preferable to just add in the right place below now so it doesn’t need to be moved later.

  41. in src/validation.cpp:3509 in 64e74ce8a9 outdated
    3533+    }
    3534+    m_cv_block_validation_queue.notify_one();
    3535     return result;
    3536 }
    3537 
    3538+std::future<bool> ProcessNewBlock(const CChainParams& chainparams, const std::shared_ptr<const CBlock> pblock, bool fForceProcessing)
    


    ryanofsky commented at 7:48 pm on June 13, 2019:

    In commit “Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken)” (64e74ce8a917e071b8a250a4ba9cd110448796d6)

    Documentation for this function in validation.h becomes out of date with this commit and should be updated. It still says “This only returns after the best known valid block is made active.”

  42. in src/validation.cpp:3475 in 64e74ce8a9 outdated
    3493+            // Ensure that CheckBlock() passes before calling AcceptBlock, as
    3494+            // belt-and-suspenders.
    3495+            bool ret = CheckBlock(*pblock, state, chainparams.GetConsensus());
    3496+            if (ret) {
    3497+                // Store to disk
    3498+                ret = ::ChainstateActive().AcceptBlock(pblock, state, chainparams, &pindex, fForceProcessing, nullptr, &fNewBlock);
    


    ryanofsky commented at 8:01 pm on June 13, 2019:

    In commit “Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken)” (64e74ce8a917e071b8a250a4ba9cd110448796d6)

    Throughout this function shouldn’t ::ChainstateActive() be replaced by *this?

  43. in src/rpc/mining.cpp:755 in 79e6e446dd outdated
    753+    bool new_block = ProcessNewBlock(Params(), blockptr, /* fForceProcessing */ true).get();
    754     UnregisterValidationInterface(&sc);
    755-    if (!new_block && accepted) {
    756+    LOCK(cs_main);
    757+    const CBlockIndex* pindex = LookupBlockIndex(blockptr->GetHash());
    758+    if (!new_block && pindex && pindex->IsValid()) {
    


    ryanofsky commented at 8:04 pm on June 13, 2019:

    In commit “Make ProcessNewBlock return a future instead of an immediate bool” (79e6e446dd0c8a3f53759a9792388f28e1053bfb)

    This seems fine, but just to make sure I understand correctly, there is a change in behavior here? Previously accepted would only be true if CheckBlock and AcceptBlock and ActivateBestChain all succeeded but this can be true even they fail?

  44. in src/test/blockfilter_index_tests.cpp:178 in 79e6e446dd outdated
    174@@ -175,7 +175,7 @@ BOOST_FIXTURE_TEST_CASE(blockfilter_index_initial_sync, TestChain100Setup)
    175     uint256 chainA_last_header = last_header;
    176     for (size_t i = 0; i < 2; i++) {
    177         const auto& block = chainA[i];
    178-        BOOST_REQUIRE(ProcessNewBlock(Params(), block, true, nullptr));
    


    ryanofsky commented at 8:28 pm on June 13, 2019:

    In commit “Make ProcessNewBlock return a future instead of an immediate bool” (79e6e446dd0c8a3f53759a9792388f28e1053bfb)

    In the commit message about these tests you wrote “Note that this makes some of the unit tests less restrictive.” But I don’t think there’s a good justification for doing this. If you just declared struct ProcessNewBlockResult { bool error = false; bool already_have = false; }; and returned std::future<ProcessNewBlockResult>, the tests could behave the same and the code would be more readable and less fragile, without the need for people to guess from context whether the bool indicates success or newness.


    ajtowns commented at 1:30 am on June 18, 2019:
    Think I agree with this suggestion; dropping the fNewBlock result at the same time as turning the return into a future seems unnecessarily complicated.
  45. ryanofsky commented at 8:39 pm on June 13, 2019: member

    Started review (will update list below with progress).

    • 9bc8ca61f5c9ca57ea71b3acdd1021a5cf66512f Remove unnecessary cs_mains in denialofservice_tests (1/10)
    • 79e6e446dd0c8a3f53759a9792388f28e1053bfb Make ProcessNewBlock return a future instead of an immediate bool (2/10)
    • e89822964e9ef8dadf88d47bc94f5396a8c61a54 Add a new peer state tracking class to reduce cs_main contention. (3/10)
    • 8ddbb126780a8ab6ff547411e2b8d3b46e42c887 Move net_processing’s ProcessNewBlock calls to resolve async. (4/10)
    • 64e74ce8a917e071b8a250a4ba9cd110448796d6 Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken) (5/10)
    • ba810b9e88bee558c34ac24dc685c68e5d5c83cb Add a callback to indicate a block has been processed (6/10)
    • 3499d988a9cef162987d98dda3cf914be24a3fa1 Move BlockChecked to a background thread (7/10)
    • 445bb84bd3c38b2e98ccc247aa1a346ca3c5d66c Move nDoS counters to CPeerState (and, thus, out of cs_main) (8/10)
    • b6a05524f9b66af1147c9776e3b499bdb94655f8 Move rejects into cs_peerstate. (9/10)
    • 2d1e8f64cd527b9c44ba171beb0bb837c64cc0a5 Actually parallelize validation somewhat (10/10)
  46. jamesob commented at 8:58 pm on June 13, 2019: member

    When IBDing from a single peer, this branch appears to be no slower (or negligibly slower) than master. I’m going to modify the benchmark framework to test against multiple peers, which is where I’m assuming we’d expect to see some speedup.

    ibd local range 500000 505000

  47. ryanofsky commented at 2:53 pm on June 14, 2019: member
    Just want to note that all my comments above are just suggested cleanups. Feel free to ignore them if they don’t make sense or aren’t worth effort to implement. The only thing I’d really like to see are more comments about locking. When a lock is held in a small scope for specific purpose, I don’t think there’s a need to have a comment, but when a lock is held over wide scope for no obvious reason it’s good to have a comment that suggests why it needs to be acquired there and when it is safe to release.
  48. TheBlueMatt commented at 8:17 pm on June 18, 2019: member
    Working on rewriting this to make it simpler but it looks like it may end up growing so dunno about it.
  49. TheBlueMatt closed this on Jun 18, 2019

  50. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-30 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me