locks: introduce mutex for tx download, flush rejection filters once per tip change #30111

pull glozow wants to merge 7 commits into bitcoin:master from glozow:2024-05-txdownload-mutex changing 7 files +128 −124
  1. glozow commented at 2:16 pm on May 15, 2024: member

    See #27463 for full project tracking.

    This contains the first few commits of #30110, which require some thinking about thread safety in review.

    • Introduce a new m_tx_download_mutex which guards the transaction download data structures including m_txrequest, the rolling bloom filters, and m_orphanage. Later this should become the mutex guarding TxDownloadManager.
      • m_txrequest doesn’t need to be guarded using cs_main anymore
      • m_recent_confirmed_transactions doesn’t need its own lock anymore
      • m_orphanage doesn’t need its own lock anymore
    • Adds a new ValidationInterface event, ActiveTipChanged, which is a synchronous callback whenever the tip of the active chainstate changes.
    • Flush m_recent_rejects and m_recent_rejects_reconsiderable on ActiveTipChanged just once instead of checking the tip every time AlreadyHaveTx is called. This should speed up calls to that function (no longer comparing a block hash each time) and removes the need to lock cs_main every time it is called.

    Motivation:

    • These data structures need synchronization. While we are holding m_tx_download_mutex, these should hold:
      • a tx hash in m_txrequest is not also in m_orphanage
      • a tx hash in m_txrequest is not also in m_recent_rejects or m_recent_confirmed_transactions
      • In the future, orphan resolution tracking should also be synchronized. If a tx has an entry in the orphan resolution tracker, it is also in m_orphanage, and not in m_txrequest, etc.
    • Currently, cs_main is used to e.g. sync accesses to m_txrequest. We should not broaden the scope of things it locks.
    • Currently, we need to know the current chainstate every time we call AlreadyHaveTx so we can decide whether we should update it. Every call compares the current tip hash with hashRecentRejectsChainTip. It is more efficient to have a validation interface callback that updates the rejection filters whenever the chain tip changes.
  2. glozow added the label P2P on May 15, 2024
  3. DrahtBot commented at 2:16 pm on May 15, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK instagibbs, dergoegge, theStack, hebasto
    Concept ACK mzumsande
    Stale ACK vasild, achow101

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30413 (fuzz: Version handshake by dergoegge)
    • #30233 (refactor: move m_is_inbound out of CNodeState by sr-gi)
    • #30214 (refactor: Improve assumeutxo state representation by ryanofsky)
    • #27052 (test: rpc: add last block announcement time to getpeerinfo result by LarryRuane)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. in src/net_processing.cpp:3187 in c280fe7813 outdated
    3180@@ -3181,9 +3181,10 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, Peer& peer,
    3181 void PeerManagerImpl::ProcessInvalidTx(NodeId nodeid, const CTransactionRef& ptx, const TxValidationState& state,
    3182                                        bool maybe_add_extra_compact_tx)
    3183 {
    3184+    AssertLockHeld(m_tx_download_mutex);
    3185     AssertLockNotHeld(m_peer_mutex);
    3186     AssertLockHeld(g_msgproc_mutex);
    3187-    AssertLockHeld(cs_main);
    3188+    /* AssertLockHeld(cs_main); */
    


    hebasto commented at 2:24 pm on May 15, 2024:
    This commented line maybe requires more explanation?

    glozow commented at 3:43 pm on May 15, 2024:
    I missed deleting that. Fixed, thanks!
  5. in src/net_processing.cpp:2307 in c280fe7813 outdated
    2312 
    2313-    {
    2314-        LOCK(m_recent_confirmed_transactions_mutex);
    2315-        if (m_recent_confirmed_transactions.contains(hash)) return true;
    2316-    }
    2317+    if (m_recent_confirmed_transactions.contains(hash)) return true;
    


    hebasto commented at 3:19 pm on May 15, 2024:
    It seems reasonable to prepend this function body with AssertLockHeld(m_tx_download_mutex); now, no?

    glozow commented at 3:43 pm on May 15, 2024:
    added
  6. glozow force-pushed on May 15, 2024
  7. in src/net_processing.cpp:1078 in 5ba125d5ce outdated
    1073@@ -1075,7 +1074,7 @@ class PeerManagerImpl final : public PeerManager
    1074     int m_peers_downloading_from GUARDED_BY(cs_main) = 0;
    1075 
    1076     /** Storage for orphan information */
    1077-    TxOrphanage m_orphanage;
    1078+    TxOrphanage m_orphanage GUARDED_BY(m_tx_download_mutex);
    


    ajtowns commented at 3:59 pm on May 15, 2024:

    This is already thread safe due to its internal locks. If its only user is going to protect it anyway, the internal locking should go away, no?

    (not done in this PR) m_orphanage doesn’t need its own lock anymore

    Ah. Perhaps add mutable Mutex m_mutex; // TODO: this lock is obsoleted by m_tx_download_mutex in txorphanage.h


    glozow commented at 8:07 am on May 16, 2024:
    Alternatively I can add a commit that removes the TxOrphanage lock?

    ajtowns commented at 8:23 am on May 16, 2024:
    I expect just removing it would be fine and easy, but no need to bump it up higher in the todo list

    glozow commented at 3:29 pm on May 16, 2024:
    Removed and added a scripted-diff commit to rename EraseTxNoLock to EraseTxInternal
  8. in src/validation.cpp:3473 in 5ba125d5ce outdated
    3407@@ -3407,7 +3408,11 @@ bool Chainstate::ActivateBestChain(BlockValidationState& state, std::shared_ptr<
    3408                     break;
    3409                 }
    3410             }
    3411-        }
    3412+            } // release MempoolMutex
    3413+            if (m_chainman.m_options.signals) {
    3414+                m_chainman.m_options.signals->UpdatedBlockTipSync(pindexNewTip);
    


    hebasto commented at 4:00 pm on May 15, 2024:
    This call invokes logging, i.e. I/O operations, while holding the ::cs_main and m_mempool->cs mutexes. It is considered as a suboptimal approach from the performance point of view. Could it be avoided?

    glozow commented at 4:13 pm on May 15, 2024:
    I don’t think it’s true that m_mempool->cs is still held here (see line above)?

    hebasto commented at 4:15 pm on May 15, 2024:

    I don’t think it’s true that m_mempool->cs is still held here (see line above)?

    You’re right. I was confused by indentation. My apologies.


    glozow commented at 4:33 pm on May 15, 2024:
    As for cs_main, it seems necessary for the chain tip to be what we think it is when the callback happens. IIUC the logging only happens in debug?
  9. in src/net_processing.cpp:786 in 5ba125d5ce outdated
    780@@ -779,7 +781,10 @@ class PeerManagerImpl final : public PeerManager
    781     BanMan* const m_banman;
    782     ChainstateManager& m_chainman;
    783     CTxMemPool& m_mempool;
    784-    TxRequestTracker m_txrequest GUARDED_BY(::cs_main);
    785+
    786+    /** Protects tx download including TxRequest, rejection filter. */
    787+    mutable Mutex m_tx_download_mutex;
    


    hebasto commented at 4:01 pm on May 15, 2024:
    Is the mutable keyword really required?

    glozow commented at 3:28 pm on May 16, 2024:
    Removed now, thanks
  10. in src/net_processing.cpp:3202 in 5ba125d5ce outdated
    3198@@ -3197,9 +3199,9 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, Peer& peer,
    3199 void PeerManagerImpl::ProcessInvalidTx(NodeId nodeid, const CTransactionRef& ptx, const TxValidationState& state,
    3200                                        bool maybe_add_extra_compact_tx)
    3201 {
    3202+    AssertLockHeld(m_tx_download_mutex);
    


    ajtowns commented at 4:07 pm on May 15, 2024:
    Do AssertLockNotHeld before the AssertLockHeld calls?

    glozow commented at 8:32 am on May 17, 2024:
    Done
  11. in src/validationinterface.cpp:186 in fd08394a99 outdated
    182@@ -183,6 +183,12 @@ void ValidationSignals::UpdatedBlockTip(const CBlockIndex *pindexNew, const CBlo
    183                           fInitialDownload);
    184 }
    185 
    186+void ValidationSignals::UpdatedBlockTipSync(const CBlockIndex *pindexNew)
    


    ajtowns commented at 4:32 pm on May 15, 2024:
    Doesn’t this also need to be called from InvalidateBlock(), potentially each time the tip is updated?

  12. ajtowns commented at 4:59 pm on May 15, 2024: contributor
    I’m not sure how to be confident that the UpdateBlockTipSync / hashRecentRejectsChainTip would be correct and didn’t miss an edge case. Rest looks good to me.
  13. glozow force-pushed on May 16, 2024
  14. glozow commented at 11:41 am on May 16, 2024: member

    I’m not sure how to be confident that the UpdateBlockTipSync / hashRecentRejectsChainTip would be correct and didn’t miss an edge case.

    I (locally) split the commit into (1) update on validation interface callback and asserting that hashRecentRejectsChainTip is equal to the chain tip whenever we call AlreadyHaveTx + (2) removing hashRecentRejectsChainTip. I think if we see that everything runs fine with (1) it’s probably correct.

    I’ve (locally) hit a bug though, so will figure out what’s wrong and then push.

  15. ajtowns commented at 2:01 pm on May 16, 2024: contributor

    I’m not sure how to be confident that the UpdateBlockTipSync / hashRecentRejectsChainTip would be correct and didn’t miss an edge case.

    I (locally) split the commit into (1) update on validation interface callback and asserting that hashRecentRejectsChainTip is equal to the chain tip whenever we call AlreadyHaveTx + (2) removing hashRecentRejectsChainTip. I think if we see that everything runs fine with (1) it’s probably correct.

    I’ve (locally) hit a bug though, so will figure out what’s wrong and then push.

    FWIW I tried something similar, and got an assertion failure in one of the mempool functional tests (maybe mempool_reorg, and thus due to an invalidateblock call?), but it wasn’t reliable, it only occurred when I ran many tests in parallel, not when I reran that test on its own. The bug will only trigger if the tip is updated via some other path and AlreadyHave is called before the tip is further updated via the expected path, so (1) doesn’t seem like a terribly reliable check to me.

  16. glozow force-pushed on May 16, 2024
  17. glozow commented at 2:21 pm on May 16, 2024: member

    FWIW I tried something similar, and got an assertion failure in one of the mempool functional tests (maybe mempool_reorg, and thus due to an invalidateblock call?)

    Was it mempool_packages.py maybe? Mine tripped there on invalidateblock when I was adding UpdatedBlockTip to InvalidateBlock, and it was a lock ordering problem. My hierarchy is cs_main -> tx_download_mutex -> mempool.cs.

    Hm. The alternative is to hold cs_main and tell txdownloadman the current chain tip pretty much all the time…

  18. ajtowns commented at 3:02 pm on May 16, 2024: contributor

    FWIW I tried something similar, and got an assertion failure in one of the mempool functional tests (maybe mempool_reorg, and thus due to an invalidateblock call?)

    Was it mempool_packages.py maybe? Mine tripped there on invalidateblock when I was adding UpdatedBlockTip to InvalidateBlock, and it was a lock ordering problem. My hierarchy is cs_main -> tx_download_mutex -> mempool.cs.

    Ah, yes, I think it was.

  19. instagibbs commented at 1:44 pm on May 17, 2024: member
    thanks for splitting this up, next on my review docket
  20. glozow force-pushed on May 20, 2024
  21. glozow commented at 10:07 am on May 20, 2024: member
    Rebased for #29817
  22. in src/net_processing.cpp:783 in b5009a730c outdated
    778@@ -779,7 +779,10 @@ class PeerManagerImpl final : public PeerManager
    779     BanMan* const m_banman;
    780     ChainstateManager& m_chainman;
    781     CTxMemPool& m_mempool;
    782-    TxRequestTracker m_txrequest GUARDED_BY(::cs_main);
    783+
    784+    /** Protects tx download including TxRequest, rejection filter. */
    


    instagibbs commented at 3:38 pm on May 20, 2024:
    nit TxRequest isn’t a thing?

    glozow commented at 12:27 pm on May 23, 2024:
    fixed
  23. in src/net_processing.cpp:496 in b5009a730c outdated
    492@@ -493,9 +493,9 @@ class PeerManagerImpl final : public PeerManager
    493 
    494     /** Overridden from CValidationInterface. */
    495     void BlockConnected(ChainstateRole role, const std::shared_ptr<const CBlock>& pblock, const CBlockIndex* pindexConnected) override
    496-        EXCLUSIVE_LOCKS_REQUIRED(!m_recent_confirmed_transactions_mutex);
    497+        EXCLUSIVE_LOCKS_REQUIRED(!m_recent_confirmed_transactions_mutex, !m_tx_download_mutex);
    


    instagibbs commented at 7:05 pm on May 20, 2024:
    not a locking expert but would LOCKS_EXCLUDED be easier to read?

    glozow commented at 11:20 am on May 23, 2024:
    Oh, I didn’t know you could do both. Happy to change if people want, but generally prefer to follow convention of the existing annotations.

    ajtowns commented at 11:43 am on May 23, 2024:
    EXCLUSIVE_LOCKS_REQUIRED(!foo) is a stronger assertion; it says that if the caller can see the mutex, it also has to have the same assertion. LOCKS_EXCLUDED(foo) just says you can’t LOCK() or have EXCLUSIVE_LOCKS_REQUIRED(foo) – so there’s nothing preventing the caller’s caller from having taken the lock.

    glozow commented at 12:34 pm on May 23, 2024:
    Ah that makes sense. I will leave this as is.

    instagibbs commented at 1:48 pm on May 23, 2024:
    ok, makes sense why I only see it used in cs_main contexts, since it’s a global mutex. thanks
  24. in src/test/orphanage_tests.cpp:24 in 879f5db3ce outdated
    20@@ -21,15 +21,13 @@ BOOST_FIXTURE_TEST_SUITE(orphanage_tests, TestingSetup)
    21 class TxOrphanageTest : public TxOrphanage
    22 {
    23 public:
    24-    inline size_t CountOrphans() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    


    instagibbs commented at 7:15 pm on May 20, 2024:
    in commit message mention what it’s obsoleted by, e.g., external mutex by caller

    glozow commented at 12:27 pm on May 23, 2024:
    done
  25. in src/txorphanage.h:102 in 7c3fb97284 outdated
     98@@ -99,7 +99,7 @@ class TxOrphanage {
     99     std::vector<OrphanMap::iterator> m_orphan_list;
    100 
    101     /** Erase an orphan by wtxid */
    102-    int EraseTxNoLock(const Wtxid& wtxid);
    103+    int EraseTxInternal(const Wtxid& wtxid);
    


    instagibbs commented at 7:21 pm on May 20, 2024:
    what’s the value of having the now bare wrapper EraseTx

    glozow commented at 12:27 pm on May 23, 2024:
    Oh duh! Swapped out the last commit for just a deletion of EraseTxNoLock, now having all functions call EraseTx instead.
  26. instagibbs commented at 2:07 pm on May 22, 2024: member

    looking pretty straightforward but I’m not an expert in the current locking setup

    will give another pass in a bit

  27. glozow force-pushed on May 23, 2024
  28. instagibbs commented at 2:09 pm on May 23, 2024: member
    suggested changes were done, reviewed via git range-diff master 7c3fb97 ef8de26
  29. in src/net_processing.cpp:785 in e71a5ae9e0 outdated
    778@@ -779,7 +779,10 @@ class PeerManagerImpl final : public PeerManager
    779     BanMan* const m_banman;
    780     ChainstateManager& m_chainman;
    781     CTxMemPool& m_mempool;
    782-    TxRequestTracker m_txrequest GUARDED_BY(::cs_main);
    783+
    784+    /** Protects tx download including TxRequestTracker, rejection filters, and TxOrphanage. */
    


    instagibbs commented at 2:31 pm on May 23, 2024:
    non-blocking nit: commit message still referencing TxRequest

    glozow commented at 12:13 pm on June 6, 2024:
    fixed, thanks
  30. instagibbs approved
  31. instagibbs commented at 2:00 pm on May 24, 2024: member
    utACK ef8de26be5478729328ac9d8a9ad6898351552b6
  32. DrahtBot added the label Needs rebase on Jun 3, 2024
  33. glozow force-pushed on Jun 6, 2024
  34. glozow commented at 12:14 pm on June 6, 2024: member
    rebased
  35. DrahtBot removed the label Needs rebase on Jun 6, 2024
  36. instagibbs commented at 6:56 pm on June 6, 2024: member

    reACK https://github.com/bitcoin/bitcoin/pull/30111/commits/a9a6de5a7c8c411519622a32bd4b998df5d7d883

    Only relevant change is suggested commit message fix

    reviewed via git range-diff master ef8de26 a9a6de5

  37. DrahtBot added the label Needs rebase on Jun 10, 2024
  38. glozow force-pushed on Jun 11, 2024
  39. DrahtBot removed the label Needs rebase on Jun 11, 2024
  40. DrahtBot commented at 10:33 am on June 11, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/26067334886

  41. DrahtBot added the label CI failed on Jun 11, 2024
  42. glozow force-pushed on Jun 11, 2024
  43. glozow commented at 12:04 pm on June 11, 2024: member
    Rebased
  44. DrahtBot removed the label CI failed on Jun 11, 2024
  45. instagibbs commented at 2:18 pm on June 13, 2024: member

    reACK https://github.com/bitcoin/bitcoin/pull/30111/commits/0e0c422aedd4009ab34eca127e4904d15e81f5be

    just a rebase

    reviewed via git range-diff master a9a6de5 0e0c422

  46. in src/net_processing.cpp:1684 in 0e0c422aed outdated
    1673@@ -1674,8 +1674,9 @@ void PeerManagerImpl::InitializeNode(CNode& node, ServiceFlags our_services)
    1674 {
    1675     NodeId nodeid = node.GetId();
    1676     {
    1677-        LOCK(cs_main);
    1678+        LOCK(cs_main); // For m_node_states
    1679         m_node_states.emplace_hint(m_node_states.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(node.IsInboundConn()));
    1680+        LOCK(m_tx_download_mutex);
    1681         assert(m_txrequest.Count(nodeid) == 0);
    1682     }
    


    vasild commented at 12:20 pm on June 17, 2024:

    Since now cs_main need not be held when accessing m_txrequest better not hold it. For clarity and performance.

    0    {
    1        LOCK(cs_main); // For m_node_states
    2        m_node_states.emplace_hint(m_node_states.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(node.IsInboundConn()));
    3+   }
    4+   {
    5        LOCK(m_tx_download_mutex);
    6        assert(m_txrequest.Count(nodeid) == 0);
    7    }
    

    glozow commented at 2:33 pm on July 3, 2024:
    added
  47. in src/net_processing.cpp:2095 in 0e0c422aed outdated
    2090+{
    2091+    AssertLockNotHeld(m_mempool.cs);
    2092+    LOCK(m_tx_download_mutex);
    2093+    // If the chain tip has changed, previously rejected transactions might now be invalid, e.g. due
    2094+    // to a timelock. Reset the rejection filters to give those transactions another chance if we
    2095+    // see them again.
    


    vasild commented at 1:10 pm on June 17, 2024:
    Should s/invalid/valid?

    glozow commented at 2:33 pm on July 3, 2024:
    done, thanks
  48. vasild approved
  49. vasild commented at 1:32 pm on June 17, 2024: contributor

    ACK 0e0c422aedd4009ab34eca127e4904d15e81f5be

    Changes like this are inherently difficult to review:

    One has to check all variables protected by cs_main and ensure that none of them are required to be in sync with the ones that are removed from cs_main protection like m_txrequest.

    Also, one has to check that all mutexes locked before the new one (and still locked when the new one is acquired) are never locked while the new one is locked anywhere else in the code and that mutexes locked while the new one is locked are never locked before it anywhere else.

  50. glozow commented at 4:44 pm on June 17, 2024: member
    Thanks @vasild. I will take your suggestions in a followup or if I retouch this.
  51. achow101 commented at 8:14 pm on June 17, 2024: member

    ACK 0e0c422aedd4009ab34eca127e4904d15e81f5be

    Not as familiar with locking in their area, so not super confident on the cs_main related changes, but the rest (removing other mutexes and consolidating under m_tx_download_mutex) looks fine.

  52. achow101 requested review from ajtowns on Jun 17, 2024
  53. achow101 requested review from hebasto on Jun 17, 2024
  54. in src/validation.cpp:3695 in ecf1acc36f outdated
    3690@@ -3684,6 +3691,9 @@ bool Chainstate::InvalidateBlock(BlockValidationState& state, CBlockIndex* pinde
    3691         // distinguish user-initiated invalidateblock changes from other
    3692         // changes.
    3693         (void)m_chainman.GetNotifications().blockTip(GetSynchronizationState(m_chainman.IsInitialBlockDownload(), m_chainman.m_blockman.m_blockfiles_indexed), *to_mark_failed->pprev);
    3694+        if (m_chainman.m_options.signals) {
    3695+            m_chainman.m_options.signals->UpdatedBlockTipSync(m_chain.Tip());
    


    dergoegge commented at 9:46 am on June 19, 2024:
    Why is UpdatedBlockTipSync called but not UpdatedBlockTip?

    glozow commented at 2:35 pm on July 3, 2024:

    I suppose it’s not super accurate to say that UpdatedBlockTipSync is the synchronous version of UpdatedBlockTip, as the point here is to fire whenever the chain tip changes at all, while UpdatedBlockTip skips some things. i.e. InvalidateBlock when our new tip is a step back instead of an advancement.

    I’ve removed comments comparing it to UpdatedBlockTip, slightly changed the calling logic, and renamed it to ActiveTipChange.

  55. in src/net_processing.cpp:2092 in 0e0c422aed outdated
    2085@@ -2081,6 +2086,17 @@ void PeerManagerImpl::StartScheduledTasks(CScheduler& scheduler)
    2086     scheduler.scheduleFromNow([&] { ReattemptInitialBroadcast(scheduler); }, delta);
    2087 }
    2088 
    2089+void PeerManagerImpl::UpdatedBlockTipSync(const CBlockIndex* pindexNew)
    2090+{
    2091+    AssertLockNotHeld(m_mempool.cs);
    2092+    LOCK(m_tx_download_mutex);
    


    dergoegge commented at 10:28 am on June 20, 2024:
    Is clearing the filters necessary in IBD?

    glozow commented at 2:35 pm on July 3, 2024:
    gating on whether ibd now
  56. dergoegge commented at 10:40 am on June 20, 2024: member

    This is a synchronous version of UpdatedBlockTip.

    Given that this is the goal, it’s a little weird that UpdatedBlockTipSync has a different signature and is called in different locations.

    If it wasn’t for the zmq notifier we could probably just make UpdateBlockTip synchronous instead of having two versions.

  57. glozow force-pushed on Jul 3, 2024
  58. glozow commented at 2:39 pm on July 3, 2024: member

    Changes like this are inherently difficult to review

    Yes, needs careful code review. The intermediate Assume in 4673e04cb3 may help as a sanity check. Lmk if anybody has ideas to structure the PR in a clearer way.

  59. glozow force-pushed on Jul 3, 2024
  60. glozow commented at 11:05 am on July 4, 2024: member
    Addressed @dergoegge and @vasild comments. Ready for review again, reACKs would be appreciated!
  61. in src/validationinterface.h:67 in 17d41e7a54 outdated
    60@@ -61,6 +61,10 @@ class CValidationInterface {
    61      * Called on a background thread. Only called for the active chainstate.
    62      */
    63     virtual void UpdatedBlockTip(const CBlockIndex *pindexNew, const CBlockIndex *pindexFork, bool fInitialDownload) {}
    64+    /**
    65+     * Notifies listeners any time the block chain tip changes, synchronously.
    66+     */
    67+    virtual void ActiveTipChange(const CBlockIndex* pindexNew, bool is_ibd) {};
    


    dergoegge commented at 11:17 am on July 4, 2024:
    0    virtual void ActiveTipChange(const CBlockIndex* new_tip, bool is_ibd) {};
    

    glozow commented at 10:52 am on July 5, 2024:
    will do if I retouch

    glozow commented at 9:22 am on July 16, 2024:
    done
  62. dergoegge approved
  63. dergoegge commented at 1:18 pm on July 4, 2024: member
    Code review ACK 17d41e7a547f16f0dd3802dac73a63772ca000b2
  64. DrahtBot requested review from instagibbs on Jul 4, 2024
  65. DrahtBot requested review from achow101 on Jul 4, 2024
  66. DrahtBot requested review from vasild on Jul 4, 2024
  67. instagibbs approved
  68. instagibbs commented at 5:30 pm on July 8, 2024: member

    reACK 17d41e7a547f16f0dd3802dac73a63772ca000b2 (noting my weak ability to review locking)

    biggest change is changing the new function to ActiveTipChange and making ibd a no-op

    reviewed via git range-diff master 0e0c422 17d41e7

  69. in src/validation.cpp:3795 in 7e9f130a12 outdated
    3689@@ -3684,6 +3690,12 @@ bool Chainstate::InvalidateBlock(BlockValidationState& state, CBlockIndex* pinde
    3690         // distinguish user-initiated invalidateblock changes from other
    3691         // changes.
    3692         (void)m_chainman.GetNotifications().blockTip(GetSynchronizationState(m_chainman.IsInitialBlockDownload(), m_chainman.m_blockman.m_blockfiles_indexed), *to_mark_failed->pprev);
    3693+
    3694+        // Fire ActiveTipChange now for the current chain tip to make sure clients are notified.
    3695+        // ActivateBestChain may call this as well, but not necessarily.
    3696+        if (m_chainman.m_options.signals) {
    


    Eunovo commented at 9:03 am on July 11, 2024:
    IIUC ActivateBestChain will fire ActiveTipChange as long as the current instance is the active chain state. InvalidateBlock and ActivateBestChain are both called on active chain state so it doesn’t seem like there’s a situation where ActiveTipChange won’t be fired before this point. Could this comment be misleading?

    glozow commented at 12:48 pm on July 11, 2024:

    That seems incorrect. ActivateBestchain exits before ActiveTipChange if it realizes there is nothing to do, e.g. if invalidateblock only causes the tip to change from block n to block n-1.

    https://github.com/bitcoin/bitcoin/blob/9b480f7a25a737c9c4ebc33401e94d66c2da9ec3/src/validation.cpp#L3505-L3508


    Eunovo commented at 1:24 pm on July 11, 2024:
    @glozow Thanks for the clarification. I see it now. Any reason why we don’t fire the ActiveTipChange here instead?

    glozow commented at 1:38 pm on July 11, 2024:
    What do you mean by “here”?


    glozow commented at 3:07 pm on July 11, 2024:
    Because then it would fire even when it’s not supposed to fire. We’re aiming to add a notification for when the tip changes.
  70. theStack commented at 4:38 pm on July 15, 2024: contributor
    Concept ACK
  71. in src/net_processing.cpp:2075 in 4673e04cb3 outdated
    2093@@ -2092,6 +2094,21 @@ void PeerManagerImpl::StartScheduledTasks(CScheduler& scheduler)
    2094     scheduler.scheduleFromNow([&] { ReattemptInitialBroadcast(scheduler); }, delta);
    2095 }
    2096 
    2097+void PeerManagerImpl::ActiveTipChange(const CBlockIndex* pindexNew, bool is_ibd)
    2098+{
    2099+    AssertLockNotHeld(m_mempool.cs);
    


    mzumsande commented at 8:11 pm on July 15, 2024:
    I think you meant m_tx_download_mutex

    glozow commented at 8:29 am on July 16, 2024:

    No, I think the LOCK(m_tx_download_mutex) already checks that it isn’t already held, since it’s a Mutex.

    I guess this seems a bit random, but I’m trying to establish the lock hierarchy as m_tx_download_mutex, then m_mempool.cs (which is a RecursiveMutex). That’s also why we can only fire ActiveTipChanged after releasing the mempool mutex. Later code added to TxDownloadManager will call CTxMemPool functions that take the mempool lock (see e.g. https://github.com/bitcoin/bitcoin/pull/30110/files#diff-5d7e42fb1e904f54956c0975a9307c58d8d36d77ad9da8bcd8989437356e3229R264 or grep exists within txdownload_impl).


    glozow commented at 9:22 am on July 16, 2024:
    Ah I was wrong about LOCK doing the assert, added that as well. Also wrote a comment on m_tx_download_mutex about the lock order

    mzumsande commented at 2:50 pm on July 16, 2024:
    Makes sense! In net.cpp, the AssertLockNotHeld(x) / LOCK(x) pattern is so common that I didn’t even consider the possibility that this could be on purpose.

    hebasto commented at 11:05 am on July 22, 2024:

    As far as I understand, AssertLockNotHeld(m_mempool.cs); is an attempt to force the mutex hierarchy mentioned in here: https://github.com/bitcoin/bitcoin/blob/c85accecafc20f6a6ae94bdf6cdd3ba9747218fd/src/net_processing.cpp#L784

    As m_mempool.cs is not used in this function, I suggest to add a comment to prevent tempting of removing this line on the “clean up” purpose in the future.

  72. mzumsande commented at 8:15 pm on July 15, 2024: contributor

    Concept ACK - don’t know the txrequest logic very well though.

    The PR description should be updated (UpdatedBlockTipSync was renamed).

  73. glozow renamed this:
    locks: introduce mutex for tx download, flush rejection filters on UpdatedBlockTip
    locks: introduce mutex for tx download, flush rejection filters once per tip change
    on Jul 16, 2024
  74. guard TxRequest and rejection caches with new mutex
    We need to synchronize between various tx download structures.
    TxRequest does not inherently need cs_main for synchronization, and it's
    not appropriate to lock all of the tx download logic under cs_main.
    3eb1307df0
  75. add ValidationInterface::ActiveTipChange
    This is a synchronous callback notifying clients of all tip changes.
    
    It allows clients to respond to a new block immediately after it is
    connected. The synchronicity is important for things like
    m_recent_rejects, in which a transaction's validity can change (rejected
    vs accepted) when this event is processed. For example, the transaction
    might have a timelock condition that has just been met. This is distinct
    from something like m_recent_confirmed_transactions, in which the
    validation outcome is the same (valid vs already-have), so it does not
    need to be reset immediately.
    36f170d879
  76. glozow force-pushed on Jul 16, 2024
  77. glozow commented at 9:02 am on July 16, 2024: member
  78. update recent_rejects filters on ActiveTipChange
    Resetting m_recent_rejects once per block is more efficient than
    comparing hashRecentRejectsChainTip with the chain tip every time we
    call AlreadyHaveTx. We keep hashRecentRejectsChainTip for now to assert
    that updates happen correctly; it is removed in the next commit.
    18a4355250
  79. remove obsoleted hashRecentRejectsChainTip
    This also means AlreadyHaveTx no longer needs cs_main held.
    723ea0f9a5
  80. lock m_recent_confirmed_transactions using m_tx_download_mutex 61745c7451
  81. remove obsoleted TxOrphanage::m_mutex
    The TxOrphanage is now guarded externally by m_tx_download_mutex.
    6ff84069a5
  82. [refactor] delete EraseTxNoLock, just use EraseTx c85accecaf
  83. glozow force-pushed on Jul 16, 2024
  84. instagibbs commented at 1:59 pm on July 16, 2024: member

    reACK c85accecafc20f6a6ae94bdf6cdd3ba9747218fd

    reviewed via git range-diff master 17d41e7 c85acce

  85. DrahtBot requested review from theStack on Jul 16, 2024
  86. DrahtBot requested review from dergoegge on Jul 16, 2024
  87. DrahtBot requested review from mzumsande on Jul 16, 2024
  88. in src/net_processing.cpp:2129 in 61745c7451 outdated
    2136-            m_txrequest.ForgetTxHash(ptx->GetHash());
    2137-            m_txrequest.ForgetTxHash(ptx->GetWitnessHash());
    2138-        }
    2139+    for (const auto& ptx : pblock->vtx) {
    2140+        m_txrequest.ForgetTxHash(ptx->GetHash());
    2141+        m_txrequest.ForgetTxHash(ptx->GetWitnessHash());
    


    theStack commented at 4:44 pm on July 18, 2024:
    nit: as a follow-up, could maybe consolidate those two loops into a single one, as they operate now under the same lock

    glozow commented at 12:40 pm on July 23, 2024:
    good idea, done in #30507
  89. dergoegge approved
  90. dergoegge commented at 1:09 pm on July 19, 2024: member
    Code review ACK c85accecafc20f6a6ae94bdf6cdd3ba9747218fd
  91. in src/net_processing.cpp:510 in c85accecaf
    508+    void InitializeNode(const CNode& node, ServiceFlags our_services) override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_tx_download_mutex);
    509+    void FinalizeNode(const CNode& node) override EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_headers_presync_mutex, !m_tx_download_mutex);
    510     bool HasAllDesirableServiceFlags(ServiceFlags services) const override;
    511     bool ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt) override
    512-        EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_recent_confirmed_transactions_mutex, !m_most_recent_block_mutex, !m_headers_presync_mutex, g_msgproc_mutex);
    513+        EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_most_recent_block_mutex, !m_headers_presync_mutex, g_msgproc_mutex, !m_tx_download_mutex);
    


    hebasto commented at 10:04 am on July 22, 2024:

    The PeerManagerImpl::ProcessMessages implementation has multiple early return statements. In such a case, the Developer Notes suggest to:

    Combine annotations in function declarations with run-time asserts in function definitions

    i.e., AssertLockNotHeld(m_tx_download_mutex);

    UPD. Same for other places, for instance, PeerManagerImpl::SendMessages.


    glozow commented at 12:39 pm on July 23, 2024:
    thanks, added in #30507
  92. in src/net_processing.cpp:6300 in c85accecaf
    6296@@ -6281,6 +6297,7 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
    6297         //
    6298         // Message: getdata (transactions)
    6299         //
    6300+        LOCK(m_tx_download_mutex);
    


    hebasto commented at 10:13 am on July 22, 2024:

    glozow commented at 12:40 pm on July 23, 2024:
    no we don’t, changed in #30507
  93. theStack approved
  94. theStack commented at 11:59 am on July 22, 2024: contributor

    Light code-review ACK c85accecafc20f6a6ae94bdf6cdd3ba9747218fd

    Fwiw, I’ve also spun up a fresh mainnet node instance on a publicly reachable machine running this PR for the last few days (out of IBD since Friday morning) and didn’t notice any problems so far.

  95. hebasto commented at 12:18 pm on July 22, 2024: member
    • Introduce a new m_tx_download_mutex which guards the transaction download data structures including m_txrequest, the rolling bloom filters, and m_orphanage.

      • m_orphanage doesn’t need its own lock anymore

    In the master branch, the design of the TxOrphanage class is thread-safe for all purposes. I don’t think that the fact that m_orphanage does not require additional synchronization justifies the change of TxOrphanage class design. The new m_tx_download_mutex guarantees no contention when locking TxOrphanage::m_mutex, which makes the latter cheap.

    If still insisting on the TxOrphanage class change, then perhaps document that it is no longer thread-safe and requires an external synchronization?

  96. glozow commented at 12:39 pm on July 23, 2024: member

    In the master branch, the design of the TxOrphanage class is thread-safe for all purposes. I don’t think that the fact that m_orphanage does not require additional synchronization justifies the change of TxOrphanage class design.

    See #30111 (review) for background. TxOrphanage is not a multi-purpose container. It has 1 specific purpose, and its user will need to externally synchronize it with other data structures to fulfill that purpose. It seems it would just slow things down to acquire and release a lock whenever we use it.

    perhaps document that it is no longer thread-safe and requires an external synchronization?

    I’ve opened #30507 to add documentation to txorphanage.h, as well as the other suggestions from @hebasto and @theStack (thanks!).

  97. hebasto approved
  98. hebasto commented at 7:14 am on July 24, 2024: member
    ACK c85accecafc20f6a6ae94bdf6cdd3ba9747218fd, I have reviewed the code and it looks OK.
  99. fanquake merged this on Jul 24, 2024
  100. fanquake closed this on Jul 24, 2024

  101. glozow deleted the branch on Jul 25, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-11 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me