Replace global g_cs_orphans lock with local #26295

Pull request: ajtowns wants to merge 9 commits into bitcoin:master from ajtowns:202210-cs_orphans, changing 8 files (+126 −99)
  1. ajtowns commented at 9:09 AM on October 11, 2022: contributor

    Moves extra transactions to be under the m_msgproc_mutex lock rather than g_cs_orphans, and refactors orphan handling so that the lock can be internal to the TxOrphanage class.

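The "internal lock" direction this PR takes can be illustrated with a minimal, hypothetical sketch (names below are illustrative stand-ins, not the real TxOrphanage API): every public method takes the mutex itself, so callers never see or hold the lock, and its scope becomes an implementation detail of the class.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical sketch of the internal-locking pattern: the mutex is a
// private member, and each public method locks it for the duration of
// the call, mirroring how m_mutex ends up private inside TxOrphanage.
class OrphanSetSketch {
public:
    void Add(const std::string& txid)
    {
        std::lock_guard<std::mutex> lock(m_mutex); // lock taken internally
        m_orphans.insert(txid);
    }
    bool Have(const std::string& txid) const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_orphans.count(txid) > 0;
    }
    size_t Size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_orphans.size();
    }

private:
    mutable std::mutex m_mutex;        // guards m_orphans; never exposed
    std::set<std::string> m_orphans;
};
```

Because no caller can hold the lock across calls, the class alone decides its locking discipline, which is what lets g_cs_orphans stop being a global.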
  2. Remove unnecessary includes of txorphange.h ff8d44d196
  3. net_processing: move extra transactions to msgproc mutex
    Previously vExtraTxnForCompact and vExtraTxnForCompactIt were protected
    by g_cs_orphans; protect them by g_msgproc_mutex instead, as they
    are only used during message processing.
    89e2e0da0b
  4. net_processing: Pass a Peer& to ProcessOrphanTx 9910ed755c
  5. net_processing: move ProcessOrphanTx docs to declaration 0027174b39
  6. net_processing: Localise orphan_work_set handling to ProcessOrphanTx 6f8e442ba6
  7. txorphange: move orphan workset to txorphanage 3614819864
  8. txorphanage: make m_peer_work_set private a936f41a5d
  9. ajtowns commented at 9:10 AM on October 11, 2022: contributor

    Resurrects #21527

  10. in src/net_processing.cpp:599 in 5d59755168 outdated
     596 | +     *                     may be added to if accepting an orphan causes its children to be
     597 | +     *                     reconsidered.
     598 | +     * @return             True if there are still orphans in this peer's work set.
     599 | +     */
     600 | +    bool ProcessOrphanTx(Peer& peer) EXCLUSIVE_LOCKS_REQUIRED(cs_main)
     601 | +        EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, g_msgproc_mutex);
    


    maflcko commented at 10:17 AM on October 11, 2022:

    Is it intentional to split the annotations on two different lines?


    ajtowns commented at 1:45 PM on October 11, 2022:

    It wasn't, I just keep not noticing it. Fixed.

  11. dergoegge commented at 11:15 AM on October 11, 2022: member

    Concept ACK

  12. Move all g_cs_orphans locking to txorphanage 733d85f79c
  13. scripted-diff: rename and de-globalise g_cs_orphans
    -BEGIN VERIFY SCRIPT-
    sed -i -e 's/static RecursiveMutex/mutable Mutex/' src/txorphanage.h
    sed -i -e '/RecursiveMutex/d' src/txorphanage.cpp
    sed -i -e 's/g_cs_orphans/m_mutex/g' $(git grep -l g_cs_orphans src/)
    -END VERIFY SCRIPT-
    7082ce3e88
  14. ajtowns force-pushed on Oct 11, 2022
  15. DrahtBot commented at 5:23 PM on October 11, 2022: contributor


    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.


    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK dergoegge, glozow
    Concept ACK hebasto, chinggg


    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26151 (refactor: Guard TxRequestTracker by its own lock instead of cs_main by dergoegge)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  16. hebasto commented at 11:37 AM on October 12, 2022: member

    Concept ACK.

  17. in src/net_processing.cpp:933 in 7082ce3e88
     929 | @@ -924,14 +930,14 @@ class PeerManagerImpl final : public PeerManager
     930 |      /** Storage for orphan information */
     931 |      TxOrphanage m_orphanage;
     932 |  
     933 | -    void AddToCompactExtraTransactions(const CTransactionRef& tx) EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
     934 | +    void AddToCompactExtraTransactions(const CTransactionRef& tx) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
    


    hebasto commented at 12:58 PM on October 12, 2022:

    89e2e0da0bcd0b0757d7b42907e2d2214da9f68b

    Add AssertLockHeld(g_msgproc_mutex); into implementation code?

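The shape of hebasto's suggestion can be sketched as follows. Bitcoin Core's real AssertLockHeld consults per-thread lock-tracking data; this standalone stand-in (all names here are hypothetical) tracks a single flag so the pattern is runnable in isolation.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread lock tracker behind the real
// AssertLockHeld macro.
thread_local bool g_msgproc_held = false;

void AssertMsgprocHeldSketch()
{
    assert(g_msgproc_held); // aborts loudly if the caller forgot the lock
}

// Shape of the suggested change: the first line of the implementation
// re-checks the EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex) contract at
// runtime, not just at compile time via the annotation.
void AddToCompactExtraTransactionsSketch()
{
    AssertMsgprocHeldSketch();
    // ... ring-buffer insertion would follow here ...
}
```

The compile-time annotation and the runtime assert are complementary: the annotation catches misuse that Clang's analysis can see, while the assert catches paths the analysis cannot prove (e.g. calls through unannotated code).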
  18. hebasto commented at 1:18 PM on October 12, 2022: member

    89e2e0da0bcd0b0757d7b42907e2d2214da9f68b

    ... protect them by g_msgproc_mutex instead, as they are only used during message processing.

    It does not look like a justification. Access to the extra txs ring buffer should be synced with its own mutex. It could be locked only within the PeerManagerImpl::AddToCompactExtraTransactions and PeerManagerImpl::ProcessMessage functions. The fact that "they are only used during message processing" would guarantee no performance loss.

    6f8e442ba61378489a6e2e2ab5bcfbccca1a29ec "net_processing: Localise orphan_work_set handling to ProcessOrphanTx" looks behavior-changing. Do we have tests which cover it?

    Tbh, I'd be happy if the first 4 or 5 commits could be split into a separate PR.

  19. glozow commented at 7:59 PM on October 12, 2022: member

    Big Concept ACK

  20. ajtowns commented at 5:19 AM on October 17, 2022: contributor

    It does not look like a justification. An access to the extra txs ring buffer should be synced with its own mutex

    No, it shouldn't have its own mutex: it's always better to have fewer locks than more, as that uses fewer resources and means there are fewer things to keep in your head when following the logic. The only time when more locks are better is when that enables more parallelism, and improves performance -- that is, a justification is needed when introducing new locks.

  21. hebasto commented at 11:55 AM on October 17, 2022: member

    It does not look like a justification. An access to the extra txs ring buffer should be synced with its own mutex

    No, it shouldn't have its own mutex... The only time when more locks are better is when that enables more parallelism, and improves performance -- that is, a justification is needed when introducing new locks.

    This implies that in future design changes the vExtraTxnForCompact cannot be processed by any other thread except for the message processing one. Do we really want such a restriction for an ideal networking code?

    it's always better to have fewer locks than more, as that uses fewer resources and means there are fewer things to keep in your head when following the logic.

    I cannot agree with "it's always better".

    Making it use "fewer resources" is a kind of optimization, and "premature optimization is the root of all evil".

    it's always better to have fewer locks than more, as ... there are fewer things to keep in your head when following the logic.

    Perhaps we are talking about personal biases, but, when following the logic, it's better to keep things structured: to have small classes which implement data structures with internal locking and thread-safe interfaces.

  22. ajtowns commented at 2:47 AM on October 18, 2022: contributor

    This implies that in future design changes the vExtraTxnForCompact cannot be processed by any other thread except for the message processing one.

    What are you talking about? If we can put extra transactions under a different lock now, we certainly can in the future. This isn't an API that we have to support indefinitely.

  23. in src/txorphanage.h:36 in 7082ce3e88
      37 | +     *  the transaction from the work set, and populates its arguments with
      38 | +     *  the originating peer, and whether there are more orphans for this peer
      39 | +     *  to work on after this tx.
      40 |       */
      41 | -    std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
      42 | +    CTransactionRef GetTxToReconsider(NodeId peer, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
    


    dergoegge commented at 4:03 PM on October 18, 2022:
        std::optional<TxToReconsider> GetTxToReconsider(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
    

    With TxToReconsider being:

    struct TxToReconsider {
        const CTransactionRef tx;
        const NodeId originator;
        const bool more;
    };
    

    That would seem a little cleaner to me because originator and more only have meaning if the returned transaction is not nullptr, so bundling the three would make sense imo.

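A runnable sketch of dergoegge's suggested interface, under stated assumptions (TxRef and the hard-coded return values below are illustrative stand-ins, and the members are left non-const so the struct stays assignable): an empty optional means "nothing in this peer's work set", so callers can never read stale out-params.

```cpp
#include <memory>
#include <optional>

using TxRef = std::shared_ptr<int>; // placeholder for CTransactionRef
using NodeId = long;

// Bundles the three previously separate out-params; they only exist
// when a transaction was actually found.
struct TxToReconsider {
    TxRef tx;
    NodeId originator;
    bool more;
};

// Stand-in for TxOrphanage::GetTxToReconsider under this interface:
// returns std::nullopt instead of a null tx plus meaningless fields.
std::optional<TxToReconsider> GetTxToReconsiderSketch(bool have_work)
{
    if (!have_work) return std::nullopt;
    return TxToReconsider{std::make_shared<int>(0), /*originator=*/7, /*more=*/false};
}
```

A caller then branches on the optional (`if (auto work = GetTxToReconsiderSketch(...))`) rather than checking the tx pointer and trusting the in-out arguments.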

    glozow commented at 5:41 PM on November 17, 2022:

    Was going to comment the same thing, this seems like a better interface given that we only want originator/more if there is a tx to work on, and generally avoiding in-out params is cleaner. (non-blocking)


    ajtowns commented at 6:57 AM on November 18, 2022:

    There was some other followup ideas in the previous PR, particularly #21527 (review) -- the idea there is that if Alice sends orphan tx Y which has a missing parent X, instead of immediately processing Y when X arrives (which might have arrived from Bob), just put it in Alice's workset, and delay processing of the orphan until it's Alice's turn. Then you'd have peer == originator and not need it to be returned as a result. At least last time I looked at this (the 202104-whohandlesorphans branch), it ended up simplifying the function to just CTransactionRef GetTxToReconsider(NodeId peer) (using nullptr as the "there's nothing to consider" indicator).


    ajtowns commented at 7:45 PM on November 21, 2022:

    Opened this as a draft at #26551

  24. in src/net_processing.cpp:4778 in 7082ce3e88
    4777 |      }
    4778 |  
    4779 |      if (pfrom->fDisconnect)
    4780 |          return false;
    4781 |  
    4782 | +    if (has_more_orphans) return true;
    


    dergoegge commented at 4:10 PM on October 18, 2022:

    Just wanted to note that you moved this check above the m_getdata_requests.empty() check. Any reason behind that? (Don't see how this changes anything, so feel free to just mark as resolved if there was no reason.)


    ajtowns commented at 5:52 AM on October 19, 2022:

    It avoids taking the m_getdata_requests_mutex without changing the behaviour otherwise.

  25. in src/txorphanage.cpp:152 in 7082ce3e88
     150 | +void TxOrphanage::AddChildrenToWorkSet(const CTransaction& tx, NodeId peer)
     151 |  {
     152 | -    AssertLockHeld(g_cs_orphans);
     153 | +    LOCK(m_mutex);
     154 | +
     155 | +    // Get this peer's work set, emplacing an empty set it didn't exist
    


    dergoegge commented at 4:13 PM on October 18, 2022:
        // Get this peer's work set, emplacing an empty set if it didn't exist
    

    ajtowns commented at 7:50 PM on November 21, 2022:

    Comment updated in #26551

  26. in src/net_processing.cpp:2886 in 9910ed755c outdated
    2883 |   *                                  orphan will be reconsidered on each call of this function. This set
    2884 |   *                                  may be added to if accepting an orphan causes its children to be
    2885 |   *                                  reconsidered.
    2886 |   */
    2887 | -void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set)
    2888 | +void PeerManagerImpl::ProcessOrphanTx(Peer& peer)
    


    dergoegge commented at 4:22 PM on October 18, 2022:

    After the work set is moved to the orphanage, ProcessOrphanTx really only needs the id of the peer not the entire Peer.

    Maybe pass NodeId to ProcessOrphanTx instead of Peer in 9910ed755c3dfd7669707b44d993a20030dd7477 and have it look up the needed Peer.


    ajtowns commented at 7:11 AM on November 15, 2022:

    If you already have a Peer, I don't think there's really any reason not to pass in.

  27. dergoegge commented at 4:32 PM on October 18, 2022: member

    Code review ACK 7082ce3e88d77456d60a5a66bd38807fbd66f311

    All comments are non-blocking (could also be done in a follow up)

  28. chinggg commented at 1:01 PM on October 30, 2022: contributor

    Concept ACK

  29. in src/txorphanage.h:27 in 7082ce3e88
      27 |      /** Check if we already have an orphan transaction (by txid or wtxid) */
      28 | -    bool HaveTx(const GenTxid& gtxid) const LOCKS_EXCLUDED(::g_cs_orphans);
      29 | -
      30 | -    /** Get an orphan transaction and its originating peer
      31 | -     * (Transaction ref will be nullptr if not found)
      32 | +    bool HaveTx(const GenTxid& gtxid) const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
    


    chinggg commented at 2:18 PM on October 30, 2022:

    I used to be confused about the different lock granularity for these methods. Thanks for moving all the locks inside the function. But can we use LOCKS_EXCLUDED(m_mutex) as this line used to be instead of EXCLUSIVE_LOCKS_REQUIRED(!m_mutex) for these methods? see https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#excludes


    ajtowns commented at 2:05 AM on October 31, 2022:

    Using a negative annotation (EXCLUSIVE_LOCKS_REQUIRED(!foo)) is preferable in order to avoid the "false negatives in some cases" that the linked page warns about (i.e., the compiler not warning about code that could result in double locking), and Mutex is set up to require the use of negative annotations. That means making the change you suggest results in the error:

    txorphanage.cpp:167:5: error: calling function 'MaybeCheckNotHeld' requires negative capability '!m_mutex' [-Werror,-Wthread-safety-analysis]
        LOCK(m_mutex);
    
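The negative-annotation pattern can be sketched in a self-contained way. The macro definitions below are simplified stand-ins for the real ones in Bitcoin Core's threadsafety.h (they expand to Clang's thread-safety attributes under clang and to nothing elsewhere), and OrphanageSketch is a hypothetical example class, not the real code.

```cpp
#include <mutex>

#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x) // no-op outside clang
#endif
#define CAPABILITY(x) TSA(capability(x))
#define REQUIRES(...) TSA(requires_capability(__VA_ARGS__))
#define ACQUIRE(...) TSA(acquire_capability(__VA_ARGS__))
#define RELEASE(...) TSA(release_capability(__VA_ARGS__))

// A std::mutex wrapper carrying the annotations the analysis needs.
class CAPABILITY("mutex") AnnotatedMutex {
public:
    void lock() ACQUIRE() { m.lock(); }
    void unlock() RELEASE() { m.unlock(); }

private:
    std::mutex m;
};

class OrphanageSketch {
public:
    // REQUIRES(!m_mutex): the caller must provably NOT hold m_mutex, so
    // Clang (with -Wthread-safety-negative) flags call sites that could
    // double-lock — the gap LOCKS_EXCLUDED can leave open.
    bool HaveTx(int txid) REQUIRES(!m_mutex)
    {
        m_mutex.lock();
        const bool found = (txid == m_only_txid);
        m_mutex.unlock();
        return found;
    }

private:
    AnnotatedMutex m_mutex;
    int m_only_txid = 1;
};
```

With LOCKS_EXCLUDED the analysis only checks callers it can see holding the lock; the negative capability makes "not held" a precondition that must be proven everywhere.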
  30. in src/net_processing.cpp:596 in 0027174b39 outdated
     588 | @@ -589,6 +589,14 @@ class PeerManagerImpl final : public PeerManager
     589 |       */
     590 |      bool MaybeDiscourageAndDisconnect(CNode& pnode, Peer& peer);
     591 |  
     592 | +    /**
     593 | +     * Reconsider orphan transactions after a parent has been accepted to the mempool.
     594 | +     *
     595 | +     * @peer[in]  peer     The peer whose orphan transactions we will reconsider. Generally only one
     596 | +     *                     orphan will be reconsidered on each call of this function. This set
    


    glozow commented at 3:43 PM on November 17, 2022:

    nit in 0027174b396cacbaac5fd52f13be3dca9fcbbfb8:

    It's a bit confusing what "This set" refers to.

         *                     orphan will be reconsidered on each call of this function. The peer's orphan work set
    

    ajtowns commented at 7:50 PM on November 21, 2022:

    Comment updated in #26551

  31. in src/net_processing.cpp:2888 in 733d85f79c outdated
    2884 | @@ -2885,7 +2885,6 @@ bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
    2885 |  {
    2886 |      AssertLockHeld(g_msgproc_mutex);
    2887 |      AssertLockHeld(cs_main);
    2888 | -    AssertLockHeld(g_cs_orphans);
    


    glozow commented at 5:24 PM on November 17, 2022:

    in 733d85f79cde353d8c9b54370f296b1031fa33d9 Note to self, my understanding here is:

    • The two threads that may access orphanage are (1) msg proc and (2) scheduler for calling m_orphanage.EraseForBlock() in BlockConnected().
    • Previously, we held g_cs_orphans through the entirety of ProcessOrphanTx, and now we are grabbing the lock for each call here (AddChildrenToWorkSet(*porphanTx, peer.m_id), EraseTx(orphanHash)).

    And this is fine. The "problem" would be if we called BlockConnected and e.g. evicted an orphan that was just confirmed. It's fine since EraseTx() handles the case where a tx is already gone, and AddChildrenToWorkSet() will wait on g_cs_orphans for EraseForBlock() to be done updating m_outpoint_to_orphan_it.

  32. in src/net_processing.cpp:2893 in 7082ce3e88
    2904 | -        if (porphanTx == nullptr) continue;
    2905 | +    CTransactionRef porphanTx = nullptr;
    2906 | +    NodeId from_peer = -1;
    2907 | +    bool more = false;
    2908 |  
    2909 | +    while (CTransactionRef porphanTx = m_orphanage.GetTxToReconsider(peer.m_id, from_peer, more)) {
    


    glozow commented at 5:54 PM on November 17, 2022:

    Since you're touching this, I think it's quite strange that we do this in a while(!work_set.empty()) loop when we're trying to make sure that we only ever process 1 orphan at a time. Perhaps a remnant of pre-#15644 code? My suggestion is to take this out of the while loop and replace the breaks with returns... but non-blocking of course, can do in another PR.


    ajtowns commented at 6:42 AM on November 18, 2022:

    There's three "loops" I guess:

    • ThreadMessageHandler repeatedly calls ProcessMessages calls ProcessMessage calls ProcessOrphanTx; if ProcessOrphanTx returns true, ProcessMessage exits early so the next run of the ThreadMessageHandler loop for this peer immediately goes back to ProcessOrphanTx
    • while (GetTxToReconsider) here loops until it actually does some work (ProcessTransaction returns VALID or something other than TX_MISSING_INPUTS), returning true if the while loop could have continued
    • GetTxToReconsider loops over the peer's workset, returning the first entry in the workset that still exists as an orphan

    Removing the while (GetTxToReconsider) loop would mean that we'd immediately switch to the next peer anytime we came across an orphan that needed two parents, rather than just one, despite that not taking much time to determine (it's part of MemPoolAccept::PreChecks). I don't think that would be wrong, but the way it is seems better to me.

    Removing the internal loop for GetTxToReconsider would mean you'd need to return a new result: "there's more work to do, but the first bit of work was a no-op", and you'd be re-querying m_peer_work_set.find(peer) repeatedly.

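The middle loop ajtowns describes can be sketched with a toy work set (everything below is a hypothetical stand-in: TxResult replaces validation results, and the deque replaces the peer's work set in TxOrphanage): keep pulling entries until one actually does work, and report whether more remain.

```cpp
#include <deque>

enum class TxResult { VALID, TX_MISSING_INPUTS, INVALID };

struct LoopSketch {
    std::deque<TxResult> work_set; // stand-in for the peer's orphan work set

    // Mirrors the shape of ProcessOrphanTx's middle loop: skip no-op
    // entries (still-missing inputs), stop after the first real piece of
    // work, and return true if the loop could have continued.
    bool ProcessOrphanTxSketch()
    {
        while (!work_set.empty()) {
            const TxResult res = work_set.front();
            work_set.pop_front();
            if (res == TxResult::TX_MISSING_INPUTS) continue; // no-op, keep going
            return !work_set.empty(); // did real work; more may remain
        }
        return false;
    }
};
```

Returning true lets ProcessMessage exit early so ThreadMessageHandler comes straight back to this peer's orphans on the next pass, which is the outer of the three loops.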
  33. glozow commented at 6:52 PM on November 17, 2022: member

    ACK 7082ce3e88d77456d60a5a66bd38807fbd66f311 via code review and some basic testing. I think putting txorphanage in charge of handling peer work sets is the right direction.

  34. glozow merged this on Nov 28, 2022
  35. glozow closed this on Nov 28, 2022

  36. sidhujag referenced this in commit 395a6c70f1 on Dec 1, 2022
  37. glozow referenced this in commit 77a36033b5 on Jan 26, 2023
  38. sidhujag referenced this in commit 09be5d2ccc on Jan 26, 2023
  39. bitcoin locked this on Nov 29, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-26 09:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me