Replace global g_cs_orphans lock with local #26295

ajtowns wants to merge 9 commits into bitcoin:master from ajtowns:202210-cs_orphans, changing 8 files, +126 −99.

ajtowns commented at 9:09 AM on October 11, 2022: contributor
Moves extra transactions to be under the m_msgproc_mutex lock rather than g_cs_orphans, and refactors orphan handling so that the lock can be internal to the TxOrphanage class.
-
Remove unnecessary includes of txorphange.h ff8d44d196
-
net_processing: move extra transactions to msgproc mutex 89e2e0da0b
Previously vExtraTxnForCompact and vExtraTxnForCompactIt were protected by g_cs_orphans; protect them by g_msgproc_mutex instead, as they are only used during message processing.
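The members named in this commit form a small ring buffer of "extra" transactions kept for compact block reconstruction. A minimal sketch of how such a buffer behaves, with the annotation macros stubbed out so it compiles anywhere (hypothetical trimmed-down types; the real members hold CTransactionRef and live in PeerManagerImpl):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Clang thread-safety macros, stubbed so this sketch compiles anywhere;
// in Bitcoin Core they expand to __attribute__((guarded_by(x))) etc.
#define GUARDED_BY(x)
#define EXCLUSIVE_LOCKS_REQUIRED(...)

// Hypothetical trimmed-down model of the extra-transactions ring buffer.
// After this commit the annotation names g_msgproc_mutex, not g_cs_orphans.
struct CompactExtra {
    std::vector<int> vExtraTxnForCompact GUARDED_BY(g_msgproc_mutex);
    size_t vExtraTxnForCompactIt GUARDED_BY(g_msgproc_mutex){0};
    const size_t m_capacity{3}; // Core sizes this from -blockreconstructionextratxn

    void AddToCompactExtraTransactions(int tx) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex)
    {
        if (vExtraTxnForCompact.size() < m_capacity) {
            vExtraTxnForCompact.push_back(tx);
        } else {
            vExtraTxnForCompact[vExtraTxnForCompactIt] = tx; // overwrite the oldest entry
        }
        vExtraTxnForCompactIt = (vExtraTxnForCompactIt + 1) % m_capacity;
    }
};
```

Since both members are only touched from the message-processing thread, annotating them with the mutex that thread already holds avoids introducing a lock order with anything else.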
-
net_processing: Pass a Peer& to ProcessOrphanTx 9910ed755c
-
net_processing: move ProcessOrphanTx docs to declaration 0027174b39
-
net_processing: Localise orphan_work_set handling to ProcessOrphanTx 6f8e442ba6
-
txorphange: move orphan workset to txorphanage 3614819864
-
txorphanage: make m_peer_work_set private a936f41a5d
-
in src/net_processing.cpp:599 in 5d59755168 outdated
596 | +     * may be added to if accepting an orphan causes its children to be
597 | +     * reconsidered.
598 | +     * @return True if there are still orphans in this peer's work set.
599 | +     */
600 | +    bool ProcessOrphanTx(Peer& peer) EXCLUSIVE_LOCKS_REQUIRED(cs_main)
601 | +        EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, g_msgproc_mutex);
maflcko commented at 10:17 AM on October 11, 2022:
Is it intentional to split the annotations on two different lines?
ajtowns commented at 1:45 PM on October 11, 2022:
It wasn't, I just keep not noticing it. Fixed.
dergoegge commented at 11:15 AM on October 11, 2022: member
Concept ACK
Move all g_cs_orphans locking to txorphanage 733d85f79c
scripted-diff: rename and de-globalise g_cs_orphans 7082ce3e88
-BEGIN VERIFY SCRIPT-
sed -i -e 's/static RecursiveMutex/mutable Mutex/' src/txorphanage.h
sed -i -e '/RecursiveMutex/d' src/txorphanage.cpp
sed -i -e 's/g_cs_orphans/m_mutex/g' $(git grep -l g_cs_orphans src/)
-END VERIFY SCRIPT-
ajtowns force-pushed on Oct 11, 2022
DrahtBot commented at 5:23 PM on October 11, 2022: contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviews
See the guideline for information on the review process.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #26151 (refactor: Guard TxRequestTracker by its own lock instead of cs_main by dergoegge)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
hebasto commented at 11:37 AM on October 12, 2022: member
Concept ACK.
in src/net_processing.cpp:933 in 7082ce3e88
929 | @@ -924,14 +930,14 @@ class PeerManagerImpl final : public PeerManager
930 |      /** Storage for orphan information */
931 |      TxOrphanage m_orphanage;
932 |
933 | -    void AddToCompactExtraTransactions(const CTransactionRef& tx) EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
934 | +    void AddToCompactExtraTransactions(const CTransactionRef& tx) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
hebasto commented at 12:58 PM on October 12, 2022:
89e2e0da0bcd0b0757d7b42907e2d2214da9f68b
Add AssertLockHeld(g_msgproc_mutex); into the implementation code?

hebasto commented at 1:18 PM on October 12, 2022: member
89e2e0da0bcd0b0757d7b42907e2d2214da9f68b
... protect them by g_msgproc_mutex instead, as they are only used during message processing.

It does not look like a justification. Access to the extra txs ring buffer should be synced with its own mutex. It could be locked only within the PeerManagerImpl::AddToCompactExtraTransactions and PeerManagerImpl::ProcessMessage functions. The fact that "they are only used during message processing" will guarantee no performance loss.

6f8e442ba61378489a6e2e2ab5bcfbccca1a29ec "net_processing: Localise orphan_work_set handling to ProcessOrphanTx" looks behaviour-changing. Do we have tests which cover it?

Tbh, I'd be happy if the first 4 or 5 commits could be split into a separate PR.
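hebasto's earlier suggestion to add AssertLockHeld(g_msgproc_mutex) to the implementation amounts to backing the compile-time annotation with a runtime check. A toy sketch of the pattern (ToyMutex and the free-standing AssertLockHeld are stand-ins, not Bitcoin Core's instrumented Mutex):

```cpp
#include <cassert>

// Toy lock that tracks whether it is held, standing in for Bitcoin Core's
// instrumented Mutex (which supports AssertLockHeld via lock-stack tracking).
struct ToyMutex {
    bool held{false};
    void lock() { assert(!held); held = true; }
    void unlock() { assert(held); held = false; }
};

ToyMutex g_msgproc_mutex; // stand-in for the real g_msgproc_mutex

void AssertLockHeld(ToyMutex& m) { assert(m.held); }

int g_extra_txn_count{0};

// The suggested pattern: assert at the top of the implementation that the
// caller holds the mutex, so a mis-annotated call site fails fast at runtime
// even in builds where the static thread-safety analysis was not run.
void AddToCompactExtraTransactions(int /*tx*/)
{
    AssertLockHeld(g_msgproc_mutex);
    ++g_extra_txn_count;
}
```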
glozow commented at 7:59 PM on October 12, 2022: member
Big Concept ACK
ajtowns commented at 5:19 AM on October 17, 2022: contributor
It does not look like a justification. An access to the extra txs ring buffer should be synced with its own mutex
No, it shouldn't have its own mutex: it's always better to have fewer locks than more, as that uses fewer resources and means there are fewer things to keep in your head when following the logic. The only time when more locks are better is when that enables more parallelism, and improves performance -- that is, a justification is needed when introducing new locks.
hebasto commented at 11:55 AM on October 17, 2022: member
It does not look like a justification. An access to the extra txs ring buffer should be synced with its own mutex
No, it shouldn't have its own mutex... The only time when more locks are better is when that enables more parallelism, and improves performance -- that is, a justification is needed when introducing new locks.
This implies that in future design changes the vExtraTxnForCompact cannot be processed by any other thread except the message processing one. Do we really want such a restriction for ideal networking code?

it's always better to have fewer locks than more, as that uses fewer resources and means there are fewer things to keep in your head when following the logic.
I cannot agree with "it's always better".
Making it use "fewer resources" is a kind of optimization, and "premature optimization is the root of all evil".
it's always better to have fewer locks than more, as ... there are fewer things to keep in your head when following the logic.
Perhaps, we are talking about personal biases, but, when following the logic, it's better to keep things structured, to have small classes which implement data structures with internal locking and thread-safe interfaces.
ajtowns commented at 2:47 AM on October 18, 2022: contributor
This implies that in future design changes the vExtraTxnForCompact cannot be processed by any other thread except for the message processing one.

What are you talking about? If we can put extra transactions under a different lock now, we certainly can in the future. This isn't an API that we have to support indefinitely.
in src/txorphanage.h:36 in 7082ce3e88
37 | +     * the transaction from the work set, and populates its arguments with
38 | +     * the originating peer, and whether there are more orphans for this peer
39 | +     * to work on after this tx.
40 |       */
41 | -    std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
42 | +    CTransactionRef GetTxToReconsider(NodeId peer, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
dergoegge commented at 4:03 PM on October 18, 2022:
std::optional<TxToReconsider> GetTxToReconsider(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
With TxToReconsider being:
struct TxToReconsider { const CTransactionRef tx; const NodeId originator; const bool more; };
That would seem a little cleaner to me because originator and more only have meaning if the returned transaction is not nullptr, so bundling the three would make sense imo.
glozow commented at 5:41 PM on November 17, 2022:
Was going to comment the same thing, this seems like a better interface given that we only want originator/more if there is a tx to work on, and generally avoiding in-out params is cleaner. (non-blocking)
ajtowns commented at 6:57 AM on November 18, 2022:
There were some other followup ideas in the previous PR, particularly #21527 (review) -- the idea there is that if Alice sends orphan tx Y which has a missing parent X, instead of immediately processing Y when X arrives (which might have arrived from Bob), just put it in Alice's workset, and delay processing of the orphan until it's Alice's turn. Then you'd have peer == originator and not need it to be returned as a result. At least last time I looked at this (the 202104-whohandlesorphans branch), it ended up simplifying the function to just CTransactionRef GetTxToReconsider(NodeId peer) (using nullptr as the "there's nothing to consider" indicator).
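dergoegge's bundled-return suggestion above can be sketched as follows (toy container and transaction types; hypothetical, since the PR as merged kept the out-parameter signature, and the struct members are left non-const here so the optional is movable):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

using NodeId = int64_t;
using CTransactionRef = std::shared_ptr<const std::string>; // stand-in for the real type

// dergoegge's suggested bundled return type.
struct TxToReconsider {
    CTransactionRef tx;
    NodeId originator; // peer the orphan was originally received from
    bool more;         // are there more orphans in this peer's work set?
};

// Toy work-set store: per-peer queue of (orphan, originator) pairs.
std::map<NodeId, std::deque<std::pair<CTransactionRef, NodeId>>> work_sets;

// Returning std::optional means originator/more can never be read unless a
// transaction was actually found, which is the point of the suggestion.
std::optional<TxToReconsider> GetTxToReconsider(NodeId peer)
{
    auto it = work_sets.find(peer);
    if (it == work_sets.end() || it->second.empty()) return std::nullopt;
    auto [tx, originator] = it->second.front();
    it->second.pop_front();
    return TxToReconsider{tx, originator, !it->second.empty()};
}
```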
in src/net_processing.cpp:4778 in 7082ce3e88
4777 |      }
4778 |
4779 |      if (pfrom->fDisconnect)
4780 |          return false;
4781 |
4782 | +    if (has_more_orphans) return true;
dergoegge commented at 4:10 PM on October 18, 2022:
Just wanted to note that you moved this check above the m_getdata_requests.empty() check. Any reason behind that? (Don't see how this changes anything, so feel free to just mark as resolved if there was no reason.)
ajtowns commented at 5:52 AM on October 19, 2022:
It avoids taking the m_getdata_requests_mutex without changing the behaviour otherwise.

in src/txorphanage.cpp:152 in 7082ce3e88
150 | +void TxOrphanage::AddChildrenToWorkSet(const CTransaction& tx, NodeId peer)
151 |  {
152 | -    AssertLockHeld(g_cs_orphans);
153 | +    LOCK(m_mutex);
154 | +
155 | +    // Get this peer's work set, emplacing an empty set it didn't exist
dergoegge commented at 4:13 PM on October 18, 2022:
// Get this peer's work set, emplacing an empty set if it didn't exist
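The "emplace an empty set if it didn't exist" pattern that comment documents can be sketched like this (hypothetical helper name and toy element type; the real sets hold orphan txids):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using NodeId = int64_t;

// Per-peer orphan work sets, keyed by peer id.
std::map<NodeId, std::set<int>> peer_work_set;

// std::map::try_emplace default-constructs an empty set only when the key
// is absent, and returns a valid iterator either way, so a single call both
// looks up the existing set and creates a missing one.
std::set<int>& GetOrCreateWorkSet(NodeId peer)
{
    return peer_work_set.try_emplace(peer).first->second;
}
```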
in src/net_processing.cpp:2886 in 9910ed755c outdated
2883 |   * orphan will be reconsidered on each call of this function. This set
2884 |   * may be added to if accepting an orphan causes its children to be
2885 |   * reconsidered.
2886 |   */
2887 | -void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set)
2888 | +void PeerManagerImpl::ProcessOrphanTx(Peer& peer)
dergoegge commented at 4:22 PM on October 18, 2022:
After the work set is moved to the orphanage, ProcessOrphanTx really only needs the id of the peer, not the entire Peer.

Maybe pass NodeId to ProcessOrphanTx instead of Peer in 9910ed755c3dfd7669707b44d993a20030dd7477 and have it look up the needed Peer.
ajtowns commented at 7:11 AM on November 15, 2022:
If you already have a Peer, I don't think there's really any reason not to pass it in.

dergoegge commented at 4:32 PM on October 18, 2022: member
Code review ACK 7082ce3e88d77456d60a5a66bd38807fbd66f311
All comments are non-blocking (could also be done in a follow up)
chinggg commented at 1:01 PM on October 30, 2022: contributor
Concept ACK
in src/txorphanage.h:27 in 7082ce3e88
27 |      /** Check if we already have an orphan transaction (by txid or wtxid) */
28 | -    bool HaveTx(const GenTxid& gtxid) const LOCKS_EXCLUDED(::g_cs_orphans);
29 | -
30 | -    /** Get an orphan transaction and its originating peer
31 | -     * (Transaction ref will be nullptr if not found)
32 | +    bool HaveTx(const GenTxid& gtxid) const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
chinggg commented at 2:18 PM on October 30, 2022:
I used to be confused about the different lock granularity for these methods. Thanks for moving all the locks inside the function. But can we use LOCKS_EXCLUDED(m_mutex) as this line used to be, instead of EXCLUSIVE_LOCKS_REQUIRED(!m_mutex), for these methods? See https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#excludes
ajtowns commented at 2:05 AM on October 31, 2022:
Using a negative annotation (EXCLUSIVE_LOCKS_REQUIRED(!foo)) is preferable in order to avoid the "false negatives in some cases" that that link warns about (ie, the compiler not giving warnings about code that could result in double locking), and Mutex is set up to require the use of negative annotations. That means making the change you suggest results in the error:

txorphanage.cpp:167:5: error: calling function 'MaybeCheckNotHeld' requires negative capability '!m_mutex' [-Werror,-Wthread-safety-analysis]
    LOCK(m_mutex);

in src/net_processing.cpp:596 in 0027174b39 outdated
588 | @@ -589,6 +589,14 @@ class PeerManagerImpl final : public PeerManager
589 |      */
590 |      bool MaybeDiscourageAndDisconnect(CNode& pnode, Peer& peer);
591 |
592 | +    /**
593 | +     * Reconsider orphan transactions after a parent has been accepted to the mempool.
594 | +     *
595 | +     * @peer[in] peer The peer whose orphan transactions we will reconsider. Generally only one
596 | +     *     orphan will be reconsidered on each call of this function. This set
glozow commented at 3:43 PM on November 17, 2022:
nit in 0027174b396cacbaac5fd52f13be3dca9fcbbfb8: It's a bit confusing what "This set" refers to.
* orphan will be reconsidered on each call of this function. The peer's orphan work set
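Returning to the LOCKS_EXCLUDED vs EXCLUSIVE_LOCKS_REQUIRED(!m_mutex) exchange above: the post-PR shape of TxOrphanage, with the mutex as a private member taken inside each method, can be sketched with the annotation macros stubbed out (heavily trimmed, hypothetical class and member names; with -Wthread-safety the macros expand to attributes the Clang analyser checks):

```cpp
#include <cassert>
#include <mutex>

// Stubbed Clang thread-safety macros so this compiles anywhere.
// EXCLUSIVE_LOCKS_REQUIRED(!m_mutex) is a "negative capability": callers
// must prove they do NOT already hold m_mutex, which catches double-lock
// bugs that LOCKS_EXCLUDED can silently miss.
#define EXCLUSIVE_LOCKS_REQUIRED(...)
#define GUARDED_BY(x)

// Trimmed sketch of the post-PR TxOrphanage shape: the mutex is a private
// member and every public method takes it internally.
class OrphanageSketch {
    mutable std::mutex m_mutex;
    int m_count GUARDED_BY(m_mutex){0};
public:
    void AddTx() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        ++m_count;
    }
    int Size() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        return m_count;
    }
};
```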
in src/net_processing.cpp:2888 in 733d85f79c outdated
2884 | @@ -2885,7 +2885,6 @@ bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
2885 |  {
2886 |      AssertLockHeld(g_msgproc_mutex);
2887 |      AssertLockHeld(cs_main);
2888 | -    AssertLockHeld(g_cs_orphans);
glozow commented at 5:24 PM on November 17, 2022:
in 733d85f79cde353d8c9b54370f296b1031fa33d9 Note to self, my understanding here is:
- The two threads that may access the orphanage are (1) msg proc and (2) the scheduler, which calls m_orphanage.EraseForBlock() in BlockConnected().
- Previously, we held g_cs_orphans through the entirety of ProcessOrphanTx, and now we are grabbing the lock for each call here (AddChildrenToWorkSet(*porphanTx, peer.m_id), EraseTx(orphanHash)).

And this is fine. The "problem" would be if we called BlockConnected and e.g. evicted an orphan that was just confirmed. It's fine since EraseTx() handles if a tx is already gone, and AddChildrenToWorkSet() will wait on the lock for EraseForBlock() to be done updating m_outpoint_to_orphan_it.

in src/net_processing.cpp:2893 in 7082ce3e88
2904 | -        if (porphanTx == nullptr) continue;
2905 | +    CTransactionRef porphanTx = nullptr;
2906 | +    NodeId from_peer = -1;
2907 | +    bool more = false;
2908 |
2909 | +    while (CTransactionRef porphanTx = m_orphanage.GetTxToReconsider(peer.m_id, from_peer, more)) {
glozow commented at 5:54 PM on November 17, 2022:
Since you're touching this, I think it's quite strange that we do this in a while(!work_set.empty()) loop when we're trying to make sure that we only ever process 1 orphan at a time. Perhaps a remnant of pre-#15644 code? My suggestion is to take this out of the while loop and replace the breaks with returns... but non-blocking of course, can do in another PR.
ajtowns commented at 6:42 AM on November 18, 2022:
There's three "loops" I guess:
- ThreadMessageHandler repeatedly calls ProcessMessages, which calls ProcessMessage, which calls ProcessOrphanTx; if ProcessOrphanTx returns true, ProcessMessage exits early so the next run of the ThreadMessageHandler loop for this peer immediately goes back to ProcessOrphanTx
- while (GetTxToReconsider) here loops until it actually does some work (ProcessTransaction returns VALID or something other than TX_MISSING_INPUTS), returning true if the while loop could have continued
- GetTxToReconsider loops over the peer's workset, returning the first entry in the workset that still exists as an orphan

Removing the while (GetTxToReconsider) loop would mean that we'd immediately switch to the next peer anytime we came across an orphan that needed two parents, rather than just one, despite that not taking much time to determine (it's part of MemPoolAccept::PreChecks). I don't think that would be wrong, but the way it is seems better to me.

Removing the internal loop for GetTxToReconsider would mean you'd need to return a new result: "there's more work to do, but the first bit of work was a no-op", and you'd be re-querying m_peer_work_set.find(peer) repeatedly.

glozow commented at 6:52 PM on November 17, 2022: member
ACK 7082ce3e88d77456d60a5a66bd38807fbd66f311 via code review and some basic testing. I think putting txorphanage in charge of handling peer work sets is the right direction.
glozow merged this on Nov 28, 2022
glozow closed this on Nov 28, 2022
sidhujag referenced this in commit 395a6c70f1 on Dec 1, 2022
glozow referenced this in commit 77a36033b5 on Jan 26, 2023
sidhujag referenced this in commit 09be5d2ccc on Jan 26, 2023
bitcoin locked this on Nov 29, 2023
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-26 09:14 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me