[WIP] rebase: Call ProcessNewBlock() asynchronously #18963

pull dongcarl wants to merge 18 commits into bitcoin:master from dongcarl:2020-05-async-pnb changing 15 files +513 −138
  1. dongcarl commented at 8:56 pm on May 12, 2020: contributor

    This is a (currently naive) rebase of #16323, which is a rework of #16175, which is a rework of #12934. Built on top of #17479.

    Currently validationinterface_tests/unregister_all_during_call fails.


    Goals:

    • Add as much documentation as possible to aid with review
    • Split up commits as much as possible to aid with review
  2. DrahtBot added the label Mining on May 12, 2020
  3. DrahtBot added the label P2P on May 12, 2020
  4. DrahtBot added the label RPC/REST/ZMQ on May 12, 2020
  5. DrahtBot added the label Tests on May 12, 2020
  6. DrahtBot added the label Validation on May 12, 2020
  7. dongcarl commented at 6:27 pm on May 13, 2020: contributor
    Since this PR touches critical parts of the codebase, I think it deserves a thorough review. I’m going to go through the original PR (#16323) commit by commit and write down my notes here: https://docs.google.com/document/d/1tduRmqcvhdl3FRkmdnj0f3_fknXuIZ57R8zdCGHNt6k/edit?usp=sharing
  8. [tests] Remove unnecessary cs_mains in denialofservice_tests
    9fdf05d70cac4a62d1aeeb4299e2c3a9a866f8af resolved some lock
    inversion warnings in denialofservice_tests, but left in a number
    of cs_main locks that are unnecessary (introducing lock inversion
    warnings in future changes).
    45bedbd96a
  9. [net processing] Deduplicate post-block-processing code
    Co-authored-by: Carl Dong <contact@carldong.me>
    3446fd2986
  10. [validation] Add BlockValidationState inout param to ProcessNewBlock
    This is a pure refactor commit.
    
    This commit enables the caller of ProcessNewBlock to access the final
    BlockValidationState passed around between CheckBlock(), AcceptBlock(),
    and BlockChecked() inside ProcessNewBlock(). This is useful because in a
    future commit, we will move the BlockChecked() call out of
    ProcessNewBlock(), and BlockChecked() still needs to be able to access
    the BlockValidationState.
    
    Co-authored-by: John Newbery <john@johnnewbery.com>
    Co-authored-by: Carl Dong <contact@carldong.me>
    db0eecc2bd
  11. [net processing] Make BlockChecked a standalone, static function
    This is a pure refactor commit.
    
    Since BlockChecked() doesn't actually depend on all of
    PeerLogicValidation but just PeerLogicValidation's CConnman, we can make
    a standalone, static function that simply has an extra CConnman
    parameter and have the non-static version call the static one.
    
    This also means that, in a future commit, when we move the
    BlockChecked() call out of ProcessNewBlock(), the caller of
    ProcessNewBlock() can call BlockChecked() directly even if they only
    have a CConnman.
    
    Co-authored-by: John Newbery <john@johnnewbery.com>
    Co-authored-by: Carl Dong <contact@carldong.me>
    a85e8b7393
  12. [validation] Return the AcceptBlock BlockValidationState directly in ProcessNewBlock
    Net processing now passes a BlockValidationState object into
    ProcessNewBlock(). If CheckBlock() or AcceptBlock() fails, then PNB
    returns to net processing without calling the (asynchronous)
    BlockChecked Validation Interface method. net processing can use the
    invalid BlockValidationState returned to punish peers.
    
    CheckBlock() and AcceptBlock() represent the DoS checks on a block (ie
    PoW and malleability). Net processing wants to know about those failed
    checks immediately and shouldn't have to wait on a callback. Other
    validation interface clients don't care about net processing submitting
    bogus malleated blocks to validation, so they don't need to be notified
    of BlockChecked.
    
    Furthermore, if PNB returns a valid BlockValidationState, we never need
    to try to process (non-malleated) copies of the block from other peers.
    That makes it much easier to move the best chain activation logic to a
    background thread in future work.
    
    Co-authored-by: John Newbery <john@johnnewbery.com>
    Co-authored-by: Carl Dong <contact@carldong.me>
    8fc192c3ec
  13. [validation] trivial: Rename state to dummy_state for clarity
    This is a pure refactor commit.
    
    Co-authored-by: John Newbery <john@johnnewbery.com>
    Co-authored-by: Carl Dong <contact@carldong.me>
    c94ca98385
  14. [test/rpc] Additional checks for dos_state validity
    Co-authored-by: John Newbery <john@johnnewbery.com>
    Co-authored-by: Carl Dong <contact@carldong.me>
    50048f45a0
  15. dongcarl force-pushed on May 18, 2020
  16. dongcarl commented at 9:49 pm on May 18, 2020: contributor

    Rebased on newer #17479. git-bisecting the validationinterface_tests/unregister_all_during_call says:

    0194935b1a2968b594a42cb880e30701dd2e2bc7c is the first bad commit
    
  17. ryanofsky commented at 10:35 pm on May 18, 2020: contributor

    Rebased on newer #17479. git-bisecting the validationinterface_tests/unregister_all_during_call says:

    0194935b1a2968b594a42cb880e30701dd2e2bc7c is the first bad commit
    

    Test was relying on the fact that BlockChecked as synchronous and that commit made it asynchronous. You can switch it to a different synchronous method:

     0--- a/src/test/validationinterface_tests.cpp
     1+++ b/src/test/validationinterface_tests.cpp
     2@@ -57,15 +57,13 @@ public:
     3     {
     4         if (m_on_destroy) m_on_destroy();
     5     }
     6-    void BlockChecked(const CBlock& block, const BlockValidationState& state) override
     7+    void NewPoWValidBlock(const CBlockIndex *pindex, const std::shared_ptr<const CBlock> &block) override
     8     {
     9         if (m_on_call) m_on_call();
    10     }
    11     static void Call()
    12     {
    13-        std::shared_ptr<const CBlock> block = std::make_shared<CBlock>();
    14-        BlockValidationState state;
    15-        GetMainSignals().BlockChecked(block, state);
    16+        GetMainSignals().NewPoWValidBlock(nullptr, std::make_shared<CBlock>());
    17     }
    18     std::function<void()> m_on_call;
    19     std::function<void()> m_on_destroy;
    
  18. dongcarl commented at 11:50 pm on May 18, 2020: contributor

    Now p2p_unrequested_blocks.py is failing. git-bisect says:

    01d9f66ea37ac6f22f26013cd57ed75fc04a9481f is the first bad commit
    

    As noted in the Google Doc:

    Up to this point, it seems that splitting this commit into:

    1. Changing the call semantics of ProcessNewBlock
    2. Changing the return type to a std::future

    Might make it easier to review.

    So I plan to split it up and track it down.

  19. Make ProcessNewBlock return fNewBlock instead of state validity cb178903b0
  20. Make ProcessNewBlock return a future instead of an immediate bool
    This prepares for making best-chain-activation and disk writes
    happen in a separate thread from the caller, even though all
    callsites currently block on the return value immediately.
    7532cf6c56
  21. Non-obvious parts of "Make ProcessNewBlock return a future instead of an immediate bool" 2dcdb537b5
  22. Add a new peer state tracking class to reduce cs_main contention.
    CNodeState was added for validation-state-tracking, and thus,
    logically, was protected by cs_main. However, as it has grown to
    include non-validation state (taking state from CNode), and as
    we've reduced cs_main usage for other unrelated things, CNodeState
    is left with lots of cs_main locking in net_processing.
    
    In order to ease transition to something new, this adds only a
    dummy CPeerState which is held as a reference for the duration of
    message processing.
    
    Note that moving things is somewhat tricky pre validation-thread as
    a consistent lockorder must be kept - we can't take a lock on the
    new cs_peerstate in anything that's called directly from
    validation.
    3a91e36669
  23. dongcarl force-pushed on May 19, 2020
  24. dongcarl commented at 7:37 pm on May 19, 2020: contributor

    Split up the first commit into:

    1. cb178903b0752de3edfd87f6013c2b4de41b6e13 which changes the call semantics of ProcessNewBlock and deals with the fallout of that
    2. 7532cf6c56d015e8bae646138d71df825d2684ab which changes the return from a bool to a future of a bool and deals with the fallout of that
    3. 2dcdb537b5c1dce60db217238c5e75dae602e755 which contains changes that are non-obvious to me to be correct

    I also botched the git-bisect before (didn’t call make in a bash -c), new output:

     0$ git bisect run bash -c "make -j50 && python3 test/functional/p2p_unrequested_blocks.py"
     1
     2...
     3
     47c77827558715180594f5881b6d02982504c4fad is the first bad commit
     5commit 7c77827558715180594f5881b6d02982504c4fad
     6Author: Matt Corallo <git@bluematt.me>
     7Date:   Mon Jun 17 13:13:36 2019 -0400
     8
     9    Move net_processing's ProcessNewBlock calls to resolve async.
    10
    11    Essentially, our goal is to not process anything for the given peer
    12    until the block finishes processing (emulating the previous behavior)
    13    without actually blocking the ProcessMessages loops. Obviously, in
    14    most cases, we'll just go on to the next peer and immediately hit a
    15    cs_main lock, blocking us anyway, but this we can slowly improve
    16    that state over time by moving things from CNodeState to CPeerState.
    17
    18 src/net_processing.cpp | 79 ++++++++++++++++++++++++++++++++++++++++++--------
    19 1 file changed, 67 insertions(+), 12 deletions(-)
    20bisect run success
    
  25. Move net_processing's ProcessNewBlock calls to resolve async.
    Essentially, our goal is to not process anything for the given peer
    until the block finishes processing (emulating the previous behavior)
    without actually blocking the ProcessMessages loops. Obviously, in
    most cases, we'll just go on to the next peer and immediately hit a
    cs_main lock, blocking us anyway, but this we can slowly improve
    that state over time by moving things from CNodeState to CPeerState.
    97aa9078e5
  26. Run the ActivateBestChain in ProcessNewBlock in a background thread
    Spawn a background thread at startup which validates each block as
    it comes in from ProcessNewBlock, taking advantage of the new
    std::future return value to keep tests simple (and the new
    net_processing handling of such values async already).
    
    This makes introducing subtle validationinterface deadlocks much
    harder as any locks held going into ProcessNewBlock do not interact
    with (in the form of lockorder restrictions) locks taken in
    validationinterface callbacks.
    
    Note that after this commit, feature_block and feature_assumevalid
    tests time out due to increased latency between block processing
    when those blocks do not represent a new best block. This will be
    resolved in the next commit.
    92574e2fda
  27. Add a callback to indicate a block has been processed
    This resolves the performance regression introduced in the previous
    commit by always waking the message processing thread after each
    block future resolves.
    
    Sadly, this is somewhat awkward - all other validationinterface
    callbacks represent an actual change to the global validation state,
    whereas this callback indicates only that a call which one
    validation "client" made has completed. After going back and forth
    for some time I didn't see a materially better way to resolve this
    issue, and luckily its a rather simple change, but its far from
    ideal. Note that because we absolutely do not want to ever block on
    a ProcessNewBlock-returned-future, the callback approach is
    critical.
    9a25c49a50
  28. Split AcceptBlock into three stages to write to disk in background
    To keep the API the same (and for simplicity of clients, ie
    net_processing), this splits AcceptBlock into the do-I-want-this
    stage, the checking stage, and the writing stage.
    
    ProcessNewBlock calls the do-I-want-this and checking (ie
    malleability checking) stuff, and then dumps blocks that pass
    into the background thread. In the background, we re-test the
    do-I-want-this logic but skip the checking stuff, before writing
    the block to disk and activating the best chain.
    bc8c92de5e
  29. Move BlockChecked to a background thread
    As reject messages are required to go out in-order (ie before any
    further messages are processed), this sadly requires that we
    further delay re-enabling a peer after a block has been processed
    by waiting for current validationinterface callbacks to drain.
    
    This commit enables further reduction of cs_main in net_processing
    by allowing us to lock cs_peerstate before cs_main in BlockChecked
    (ie allows us to move things which are accessed in BlockChecked,
    including DoS state and rejects into CPeerState and out of
    CNodeState).
    2ed0b64743
  30. Move mapBlockSource to cs_peerstate from cs_main
    This technically resolves a race where entries are added to
    mapBlockSource before we know that they're non-malleated and then
    removed only after PNB returns, though in practice this wasn't an
    issue since all access to mapBlockSource already held cs_peerstate.
    d3a941e8c5
  31. validationinterface: Use NewPoWValidBlock as sync test instead. d6b1729e64
  32. dongcarl force-pushed on May 21, 2020
  33. dongcarl commented at 10:06 pm on May 21, 2020: contributor
    Got the unit tests and functional tests to pass!
  34. in src/net_processing.cpp:255 in d6b1729e64
    247@@ -237,6 +248,30 @@ namespace {
    248 } // namespace
    249 
    250 namespace {
    251+/**
    252+ * Maintain state about nodes, protected by our own lock. Historically we put all
    253+ * peer tracking state in CNodeState, however this results in significant cs_main
    254+ * contention. Thus, new state tracking should go here, and we should eventually
    255+ * move most (non-validation-specific) state here.
    


    ariard commented at 11:42 pm on May 22, 2020:
    I think you should define more difference between validation-state and non-validation-state. Right now peer tracking state is spread among multiple class like CNode with TxRelay or CNodeState and now CPeerState. We should have a clear idea of what should go where, according to which thread uses it. You should also explain how CPeerState aims to reduce cs_main contention with regards to the new threading model.
  35. in src/net_processing.cpp:265 in d6b1729e64
    260+    //! peer.
    261+    std::future<bool> pending_block_processing;
    262+    //! The hash of the block which is pending download.
    263+    uint256 pending_block_hash;
    264+    //! Once we've finished processing a block from this peer, we must still wait for
    265+    //! any related callbacks to fire (to ensure, specifically, that rejects go out
    


    ariard commented at 11:46 pm on May 22, 2020:
    Is reject here making reference to reject messages ? I think it doesn’t make sense anymore post-#15437 and post-#17004. Also you may lay out the expected callback sequence (as we do for TransactionRemovedFromMempool in src/validationinterface.h).
  36. in src/net_processing.cpp:2051 in d6b1729e64
    2046@@ -1973,6 +2047,23 @@ void static ProcessOrphanTx(CConnman* connman, CTxMemPool& mempool, std::set<uin
    2047     }
    2048 }
    2049 
    2050+/**
    2051+ *  A block has been processed. Handle potential peer punishment and housekeeping.
    


    ariard commented at 11:47 pm on May 22, 2020:
    Please define more “housekeeping”.
  37. in src/net_processing.cpp:3173 in d6b1729e64
    3176-                mapBlockSource.emplace(resp.blockhash, std::make_pair(pfrom->GetId(), false));
    3177             }
    3178         } // Don't hold cs_main when we call into ProcessNewBlock
    3179         if (fBlockRead) {
    3180-            bool fNewBlock = false;
    3181+            // BIP 152 permits peers to relay compact blocks after validating
    


    ariard commented at 11:48 pm on May 22, 2020:
    I’m not sure that BIP152 low-bandwidth authorizes invalid header propagation, maybe precise.
  38. in src/validation.cpp:3732 in d6b1729e64
    3728@@ -3729,23 +3729,10 @@ static FlatFilePos SaveBlockToDisk(const CBlock& block, int nHeight, const CChai
    3729     return blockPos;
    3730 }
    3731 
    3732-/** Store block on disk. If dbp is non-nullptr, the file is known to already reside on disk */
    3733-bool CChainState::AcceptBlock(const std::shared_ptr<const CBlock>& pblock, BlockValidationState& state, const CChainParams& chainparams, CBlockIndex** ppindex, bool fRequested, const FlatFilePos* dbp, bool* fNewBlock)
    3734+bool CChainState::ShouldMaybeWrite(CBlockIndex* pindex, bool fRequested)
    


    ariard commented at 11:49 pm on May 22, 2020:
    Please can you comment on what conditional write is laying on.
  39. in src/validation.cpp:3926 in d6b1729e64
    3922+            }
    3923+        }
    3924+
    3925+        NotifyHeaderTip();
    3926+
    3927+        BlockValidationState state; // Only used to report errors, not invalidity - ignore it
    


    ariard commented at 11:50 pm on May 22, 2020:
    You should point where errors are distinguished from invalidity.
  40. ariard commented at 11:54 pm on May 22, 2020: contributor
    Great to see progress on this, for now just high-level comments calling for better comments . I will review John PR again and then your linked notes to help to make them easier to read. I think that’s the right way to come with a clear display of what is the current validation model, what this changes propose to do and why. There is so much context, that we can’t expect all reviewers to do the PR context history digging by themselves.
  41. dongcarl commented at 3:45 pm on April 20, 2021: contributor
    Not planning to work on this anytime soon unfortunately.
  42. dongcarl closed this on Apr 20, 2021

  43. jnewbery commented at 4:30 pm on April 20, 2021: contributor

    image

    🙁

    Let me know if you pick this up again!

  44. fanquake added the label Up for grabs on Apr 21, 2021
  45. bitcoin locked this on Aug 18, 2022
  46. hebasto commented at 8:07 am on May 17, 2023: member

    From my research it follows that the “Move BlockChecked to a background thread” commit fixes a lock-order-inversion in the MessageHandler thread.

    See #19303 (comment).

  47. bitcoin unlocked this on May 17, 2023
  48. maflcko removed the label Tests on Aug 8, 2023
  49. bitcoin locked this on Aug 7, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me