Yet another change to reduce recursive mempool locking #19917

pull promag wants to merge 5 commits into bitcoin:master from promag:2020-09-removeunbroadcasttx changing 4 files +30 −21
  1. promag commented at 1:57 pm on September 8, 2020: member

    First 2 commits avoid unlock/lock mempool.cs and cs_main interchangeably by turning a loop in two loops - each mutex is locked throughout each corresponding loop.

    Then explicit lock in CTxMemPool::RemoveUnbroadcastTx, CTxMemPool::GetUnbroadcastTxs and CTxMemPool::exists is removed forcing just 3 explicit WITH_LOCK where exists() is called and just 1 where GetUnbroadcastTxs() is called. This can be improved by adding an auxiliary function that locks and calls the original.

  2. net: Batch RelayTransaction in PeerLogicValidation::ReattemptInitialBroadcast 8f30df2002
  3. net: Batch RemoveUnbroadcastTx in PeerLogicValidation::ReattemptInitialBroadcast 746c6d4b8f
  4. fanquake added the label P2P on Sep 8, 2020
  5. fanquake added the label Refactoring on Sep 8, 2020
  6. promag commented at 2:04 pm on September 8, 2020: member
    @hebasto this is an example of stuff that I think we can do before moving locks around. The first and second commits refactor ReattemptInitialBroadcast to drop unlock/lock in each iteration.
  7. promag commented at 2:04 pm on September 8, 2020: member
    @vasild @ryanofsky your comment here would be nice too.
  8. hebasto commented at 3:50 pm on September 8, 2020: member
    Why draft?
  9. promag commented at 3:52 pm on September 8, 2020: member
    Yeah I don’t mind setting it ready for review, the goal was to show an example rather than adding noise to your PR.
  10. promag marked this as ready for review on Sep 8, 2020
  11. in src/net_processing.cpp:895 in 8f4307c975 outdated
    896-        if (m_mempool.exists(elem.first)) {
    897-            LOCK(cs_main);
    898+    std::vector<std::pair<uint256, uint256>> relay_transactions;
    899+    {
    900+        LOCK(m_mempool.cs);
    901+        std::map<uint256, uint256> unbroadcast_txids = m_mempool.GetUnbroadcastTxs();
    


    promag commented at 3:54 pm on September 8, 2020:
    This approach is also a small step towards removing the lock from GetUnbroadcastTxs and exists.

    promag commented at 9:08 pm on September 8, 2020:
    Done in fb2dca415f6ac74d79292296831ccc33c485230b and dde441c11a07b9ff72dbcfda896c61adf33ae228.
  12. hebasto commented at 3:55 pm on September 8, 2020: member

    Yeah I don’t mind setting it ready for review, the goal was to show an example rather than adding noise to your PR.

    I’ll be happy to postpone #19872 until this PR is reviewed and merged :)

  13. in src/net_processing.cpp:909 in 8f30df2002 outdated
    906+    {
    907+        LOCK(cs_main);
    908+        for (const auto& elem : relay_transactions) {
    909+            RelayTransaction(elem.first, elem.second, m_connman);
    910+        }
    911+    }
    


    hebasto commented at 4:25 pm on September 8, 2020:
    8f30df2002f00b7613e602bc576285974d8f2bcf Does this approach decrease concurrency wrt to ::cs_main uninterruptible locking?

    promag commented at 5:43 pm on September 8, 2020:
    RelayTransaction doesn’t hold cs_main that long.
  14. hebasto commented at 5:55 pm on September 8, 2020: member
    Concept ACK.
  15. vasild commented at 7:13 pm on September 8, 2020: member

    The changes in PeerManager::ReattemptInitialBroadcast() will execute CTxMemPool::GetUnbroadcastTxs() and CTxMemPool::exists() under CTxMemPool::cs whereas previously they were not called under this mutex. Both methods acquire the mutex themselves. So this adds more recursive mutex locks.

    What is the purpose of this patch? There is no description and commit messages are a bit scarce, missing answer to “Why are we doing this?”.

  16. promag commented at 7:28 pm on September 8, 2020: member
    @vasild sure I can detail the intention. See #19917 (review).
  17. refactor: CTxMemPool::RemoveUnbroadcastTx requires lock 7c35539156
  18. refactor: CTxMemPool::GetUnbroadcastTxs requires lock fb2dca415f
  19. refactor: CTxMemPool::exists requires lock dde441c11a
  20. promag force-pushed on Sep 8, 2020
  21. promag renamed this:
    RemoveUnbroadcastTx requires mempool lock
    Yet another change to reduce recursive mempool locking
    on Sep 8, 2020
  22. in src/txmempool.cpp:424 in dde441c11a
    420@@ -421,7 +421,7 @@ void CTxMemPool::removeUnchecked(txiter it, MemPoolRemovalReason reason)
    421     for (const CTxIn& txin : it->GetTx().vin)
    422         mapNextTx.erase(txin.prevout);
    423 
    424-    RemoveUnbroadcastTx(hash, true /* add logging because unchecked */ );
    425+    WITH_LOCK(cs, RemoveUnbroadcastTx(hash, true /* add logging because unchecked */ ));
    


    vasild commented at 6:59 am on September 9, 2020:
    missing return?
  23. hebasto commented at 7:19 am on September 9, 2020: member
    @promag To fix TSan errors consider comparing dde441c11a07b9ff72dbcfda896c61adf33ae228 with 049d8c54f4f14996c347d099bbad855c0aa74bb6.
  24. in src/net_processing.cpp:912 in dde441c11a
    915             RelayTransaction(elem.first, elem.second, m_connman);
    916-        } else {
    917-            m_mempool.RemoveUnbroadcastTx(elem.first, true);
    918         }
    919     }
    920 
    


    vasild commented at 8:06 am on September 9, 2020:

    This can be simplified to:

    0void PeerManager::ReattemptInitialBroadcast(CScheduler& scheduler) const
    1{
    2    for (const auto& elem : WITH_LOCK(m_mempool.cs, return m_mempool.GetUnbroadcastTxs())) {
    3        LOCK(cs_main);
    4        RelayTransaction(elem.first, elem.second, m_connman);
    5    } 
    6
    7    // Schedule next run for 10-15 minutes in the future.
    8    ...
    9}
    

    Because m_mempool.exists() will always return true for a tx returned by m_mempool.GetUnbroadcastTxs() if we don’t release m_mempool.cs between the two calls.

    Also, the tx could be removed after we release m_mempool.cs and before we call RelayTransaction() and this is ok and is handled just fine.

    cc @gzhao408


    promag commented at 8:13 am on September 9, 2020:
    But is it really necessary to have cs_main lock in each iteration?

    vasild commented at 8:51 am on September 9, 2020:

    Since it was locked in each iteration before this PR, the question should rather be “If we move LOCK(cs_main) before the loop, why would we do that?”

    The frequent lock/unlock allows other threads to proceed. I don’t see a reason to change it.


    hebasto commented at 8:47 am on September 10, 2020:

    The frequent lock/unlock allows other threads to proceed. I don’t see a reason to change it.

    Agree (https://github.com/bitcoin/bitcoin/pull/19917#discussion_r485047637).


    promag commented at 1:57 pm on September 10, 2020:

    @vasild I don’t agree with that. Other threads can proceed but the current thread will wait unnecessarily in each iteration for the lock and as such other things will be delayed, not mentioning the mutex overhead. See https://stackoverflow.com/a/3652428.

    In this case RelayTransaction is pretty quick, nothing that can potentially cause a big lock on cs_main.

  25. DrahtBot added the label Needs rebase on Sep 16, 2020
  26. DrahtBot commented at 0:14 am on September 16, 2020: member

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  27. promag commented at 11:47 am on November 7, 2020: member
    Rebase hell.
  28. promag closed this on Nov 7, 2020

  29. promag deleted the branch on Nov 7, 2020
  30. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me