cs_sendProcessing is replaced by a private mutex in net_processing; non-orphan-specific state is moved out of g_cs_orphans, and g_cs_orphans is replaced by a private mutex in txorphanage.
net_processing: lock clean up #21527
pull ajtowns wants to merge 18 commits into bitcoin:master from ajtowns:202102-orphanworkset changing 15 files +257 −230
ajtowns commented at 4:16 AM on March 25, 2021: contributor
- fanquake added the label P2P on Mar 25, 2021
in src/net_processing.cpp:3454 in 59cfe0c74b outdated
```diff
@@ -3059,7 +3060,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
     }

     // Recursively process any orphan transactions that depended on this one
-    ProcessOrphanTx(peer->m_orphan_work_set);
+    ProcessOrphanTx(*peer);
```
jnewbery commented at 8:28 AM on March 25, 2021:
What do you think about removing this call? The comment above is incorrect (we only process one tx at most, not recursively), and we'll process the orphans in subsequent calls to ProcessMessages(). It seems strange that in this one case we can process up to two transactions in ProcessMessage() (the one that we just received, and up to one orphan child of it).
ajtowns commented at 11:27 AM on March 25, 2021:
We could process 3 txs, I think -- the last orphan already in the work set (in ProcessMessages), one from a just-received TX message, and an additional orphan whose parent was the contents of that TX message. I don't think it's problematic that way, but limiting to one non-trivial ATMP call per ProcessMessages invocation could be reasonable.
jnewbery commented at 11:49 AM on March 25, 2021:
Very good point. I hadn't considered processing the orphan before processing the net message.
in src/net_processing.cpp:3899 in 59cfe0c74b outdated
```diff
@@ -3871,8 +3871,7 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt
     }

     {
-        LOCK(g_cs_orphans);
-        if (!peer->m_orphan_work_set.empty()) return true;
+        if (more_orphans) return true;
```
jnewbery commented at 8:44 AM on March 25, 2021:
How about moving this check up to immediately after the call to ProcessOrphanTx()? It's not possible to have both more_orphans be true and have m_getdata_requests be non-empty:

- ProcessOrphanTx(*peer) can only return true if there are elements in the peer's orphan work set, which can only be true if the last network message processed was a tx (m_peer_work_set only gets added to when processing a tx or when processing an element of m_peer_work_set, and we won't process another network message until it's empty)
- m_getdata_requests can only be non-empty if the last network message processed was a getdata (m_getdata_requests only gets added to when processing a getdata, and we won't process another message until it's empty)

That'd eliminate the need for a more_orphans variable, and make it immediately obvious that we're not going to process an orphan and a getdata request on the same pass through ProcessMessage().

in src/net_processing.cpp:3885 in 59cfe0c74b outdated
```diff
-    LOCK2(cs_main, g_cs_orphans);
-    if (!peer->m_orphan_work_set.empty()) {
-        ProcessOrphanTx(peer->m_orphan_work_set);
-    }
+    LOCK(cs_main);
+    more_orphans = ProcessOrphanTx(*peer);
```
jnewbery commented at 8:47 AM on March 25, 2021:Can be more concisely written as:
```diff
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 02e2785c24..c90d73cb95 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -3854,11 +3854,7 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt
         }
     }

-    bool more_orphans = false;
-    {
-        LOCK(cs_main);
-        more_orphans = ProcessOrphanTx(*peer);
-    }
+    const bool more_orphans{WITH_LOCK(cs_main, return ProcessOrphanTx(*peer);)};

     if (pfrom->fDisconnect) return false;
```

DrahtBot commented at 8:57 AM on March 25, 2021: contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #25168 (refactor: Avoid passing params where not needed by MarcoFalke)
- #24970 (net processing: Move cleanSubVer, fPreferredDownload and nLocalHostNonce to Peer by jnewbery)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
in src/net_processing.cpp:2392 in 59cfe0c74b outdated
```diff
- * reconsidered.
- */
-void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set)
+bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
 {
     AssertLockHeld(cs_main);
```
jnewbery commented at 8:57 AM on March 25, 2021:
It'd be good to remove the EXCLUSIVE_LOCKS_REQUIRED(cs_main) and only take the cs_main lock inside the while loop (once we know that we have orphans to reprocess). In the vast majority of cases we're taking and releasing cs_main for no reason, because we have no orphans to process.
ajtowns commented at 6:39 AM on April 20, 2021:
I think:
- just call AddChildrenToWorkSet when a new tx arrives, don't also immediately process them (ie don't also call ProcessOrphanTx)
- when ProcessOrphanTx does work, don't continue on to also do a ProcessMessage, give another peer a chance to get some work done first
- have ProcessOrphanTx take cs_main only when necessary
- make the GetTxToReconsider signature simpler
would all fit together nicely and be a good idea, but it also feels like it's adding a bit much to this PR?
ajtowns commented at 1:44 PM on April 20, 2021:
> This PR changes orphan processing from being done for the peer that provided the parent to the peer that provided the orphan. I think that's fine, but perhaps we should split the PR into one that does refactoring only and one that has behaviour changes.
Hmm, might make sense to put the changes above together with the last commit in a separate PR, so this is just refactoring and the behaviour changes (who processes orphans, and how many orphans are processed in a cycle) are collected together.
jnewbery commented at 2:00 PM on April 20, 2021:
Yes, I agree that it's better to split the PR into two: one that refactors and one that changes behaviour.
ajtowns commented at 10:41 PM on April 22, 2021:
This is drafted at https://github.com/ajtowns/bitcoin/commits/202104-whohandlesorphans
in src/txorphanage.cpp:91 in 59cfe0c74b outdated
```diff
@@ -87,24 +86,27 @@ int TxOrphanage::EraseTx(const uint256& txid)

 void TxOrphanage::EraseForPeer(NodeId peer)
 {
-    AssertLockHeld(g_cs_orphans);
+    LOCK(m_mutex);
+
+    const int work_erased = m_peer_work_set.erase(peer);
```
jnewbery commented at 8:58 AM on March 25, 2021:
Well remembered to do this. Perhaps add a TxOrphanage::Empty() function that returns whether there are any elements in any of the containers in the orphanage, and then assert m_orphanage.Empty() in the FinalizeNode() logic when the last peer is deleted.
ajtowns commented at 11:28 AM on March 25, 2021:
It crossed my mind that this might be another case where a multi index could be a win.
in src/txorphanage.cpp:104 in 59cfe0c74b outdated
```diff
-        nErased += EraseTx(maybeErase->second.tx->GetHash());
+        nErased += _EraseTx(maybeErase->second.tx->GetHash());
     }
 }
-if (nErased > 0) LogPrint(BCLog::MEMPOOL, "Erased %d orphan tx from peer=%d\n", nErased, peer);
+if (nErased > 0 || work_erased > 0) LogPrint(BCLog::MEMPOOL, "Erased %d orphan tx and %d from orphan work set for peer=%d\n", nErased, work_erased, peer);
```
jnewbery commented at 9:01 AM on March 25, 2021:
Not sure that it's very interesting to log work_erased. It'll only ever be 0 (nothing in the work set) or 1 (items in the work set).

in src/net.h:235 in 59cfe0c74b outdated
```diff
@@ -232,6 +232,7 @@ struct LocalServiceInfo {
     uint16_t nPort;
 };

+extern Mutex g_mutex_message_handler;
```
jnewbery commented at 9:11 AM on March 25, 2021:
I don't think this is a good idea. Anything that you guard with this mutex will never be accessible on other threads, which would prevent us from e.g. exposing it to RPC methods, or using it in validation interface callbacks. This seems like a way of reinventing thread-local storage, which I don't think we want.
In addition, this mutex is exposed and used in both the net and net_processing layer, which adds to the coupling between those components. That's the opposite direction that I think we want to go in.
ajtowns commented at 11:34 AM on March 25, 2021:
The reasoning isn't that this is particularly "good", it's that it's documenting the safety mechanisms we already have. It would have caught #21528 (review), for instance.
net_processing is fundamentally coupled with net -- it implements the NetEventsInterface defined and called from net. I mean I guess if you preferred, it could be NetEventsInterface::m_mutex_message_handler? (EDIT: actually, whether or not you prefer that, I think I do...)
jnewbery commented at 12:46 PM on March 25, 2021:
And if you're doing that then perhaps you can make it a PeerManagerImpl::m_mutex_message_handler and take it in ProcessMessages() and SendMessages(). Taking that lock once at the top of those functions would be very low cost (since there would never be any contention).
ajtowns commented at 6:04 AM on April 20, 2021:
I've done this now -- the first three commits introduce m_mutex_messages, remove cs_sendProcessing, and move the extra txns to be protected by m_mutex_messages instead of g_cs_orphans. I think it would be good to cherry-pick those commits into #21186 so that the moved addr variables can actually be marked as guarded by something.
in src/net_processing.cpp:2090 in 59cfe0c74b outdated
```diff
+    bool more = false;

-    const auto [porphanTx, from_peer] = m_orphanage.GetTx(orphanHash);
-    if (porphanTx == nullptr) continue;
+    while (m_orphanage.GetTxToReconsider(peer.m_id, porphanTx, from_peer, more)) {
+        if (porphanTx == nullptr) break;
```
jnewbery commented at 9:26 AM on March 25, 2021:
Is this possible? I think the logic in GetTxToReconsider() ensures that if true is returned, then porphanTx will not be null.
ajtowns commented at 11:36 AM on March 25, 2021:
assert(porphanTx != nullptr) seemed overkill, and not having anything seemed like it might be annoying for static analysers.
jnewbery commented at 11:52 AM on March 25, 2021:
How about: if (!Assume(porphanTx)) break;?

in src/txorphanage.h:35 in 59cfe0c74b outdated
```diff
+     * to work on. Otherwise returns true, and populates its arguments with tx,
+     * the originating peer, and whether there are more orphans for this
+     * peer to work on after this tx.
      */
-    std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
+    bool GetTxToReconsider(NodeId peer, CTransactionRef& ref, NodeId& originator, bool& more);
```
jnewbery commented at 9:49 AM on March 25, 2021:
This is a very strange interface. How about returning a std::optional<std::pair<CTransactionRef, NodeId>>, and then adding another public method HasTxToReconsider()? It's a bit less efficient, but we'll only ever call it after processing a transaction, so it seems negligible.

in src/txorphanage.h:14 in 59cfe0c74b outdated
```diff
@@ -10,8 +10,10 @@
 #include <primitives/transaction.h>
 #include <sync.h>

-/** Guards orphan transactions and extra txs for compact blocks */
-extern RecursiveMutex g_cs_orphans;
+#include <map>
+#include <optional>
```
jnewbery commented at 9:49 AM on March 25, 2021:
I guess this means you already tried using an optional in the return value of GetTxToReconsider()?

in src/init.cpp:209 in 59cfe0c74b outdated
```diff
-    // Thus the implicit locking order requirement is: (1) cs_main, (2) g_cs_orphans, (3) cs_vNodes.
+    // Thus the implicit locking order requirement is: (1) cs_main, (2) cs_vNodes.
     if (node.connman) {
         node.connman->StopThreads();
-        LOCK2(::cs_main, ::g_cs_orphans);
+        LOCK(::cs_main);
```
jnewbery commented at 9:53 AM on March 25, 2021:
I don't think you need this lock. It was previously here to enforce that cs_main was taken before g_cs_orphans.
It can also be removed from the process_message and process_messages fuzz tests.
ajtowns commented at 11:48 AM on March 25, 2021:
I updated the comments, which should still indicate why cs_main is needed there -- it needs to be locked before cs_vNodes, but cs_vNodes is locked first if you call StopThreads directly?
jnewbery commented at 12:00 PM on March 25, 2021:
Ah, very good point. And we can't take cs_main inside StopNodes because net isn't aware of cs_main.

It's outside the scope of this PR, but it'd be good to untangle this lock ordering dependency by e.g. making sure that cs_vNodes is not held when calling any PeerManager functions.
jnewbery commented at 3:02 PM on April 25, 2021:
#21563 is merged
in src/txorphanage.h:92 in 59cfe0c74b outdated
```diff

+    /** Map from txid to orphan transaction record. Limited by
+     *  -maxorphantx/DEFAULT_MAX_ORPHAN_TRANSACTIONS */
+    OrphanMap m_orphans GUARDED_BY(m_mutex);
+
+    /** Which peer provided a parent tx of orphans that need to be reconsidered */
```
jnewbery commented at 10:05 AM on March 25, 2021:
This comment is now incorrect. The multimap stores the node that provided the orphan tx.
Also, this data is redundant with the fromPeer in the OrphanTx struct (this could easily be a set of txids to reconsider, with the originating peer then looked up in the OrphanTx struct). We store it in the multimap for efficiency -- the comment could be updated to indicate that.

jnewbery commented at 10:13 AM on March 25, 2021: member
Concept ACK. Lots of minor review comments inline, but a bit of high-level design feedback:
- I'm not at all convinced by the g_mutex_message_handler. It seems like you're trying to reinvent thread_local storage. That makes potential future changes difficult. It also adds unnecessary coupling between net and net_processing.
- This PR changes orphan processing from being done for the peer that provided the parent to the peer that provided the orphan. I think that's fine, but perhaps we should split the PR into one that does refactoring only and one that has behaviour changes.
ajtowns force-pushed on Apr 20, 2021 (four times)
ajtowns commented at 6:53 AM on April 20, 2021: contributor
Rebased, some of the suggestions picked up, reworked the "message handler" mutex.
in src/net_processing.cpp:330 in 9cc2da27d3 outdated
```diff
+     * Anything only accessed by message parsing can be guarded by
+     * this mutex, which is acquired by ProcessMessages and SendMessages
+     */
+    Mutex m_mutex_messages;
+
+    void _ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRecv,
```
jnewbery commented at 10:14 AM on April 20, 2021:
I think it'd be cleaner for the m_mutex_messages lock to be taken by ProcessMessages, rather than ProcessMessage. That would remove the need to define an inner _ProcessMessage to be called when the mutex is already locked. It's also consistent with the comment on the m_mutex_messages member, which says "is acquired by ProcessMessages and SendMessages".
ajtowns commented at 10:40 AM on April 20, 2021:
It is taken by ProcessMessages() (and also by SendMessages()); however, ProcessMessage() is also called from the test suite, so it cannot assume the lock has been taken. That's why the public interface, ProcessMessage(), is implemented as a wrapper that takes the lock, then calls the internal function.
jnewbery commented at 1:02 PM on April 20, 2021:Ah, sorry - misread.
jnewbery commented at 10:15 AM on April 20, 2021: member
I don't understand the second commit log: "SendMessages() is now protected internally by m_mutex_messages; so this additional lock is not useful." SendMessages() is not protected by the new mutex as far as I can see.
I prefer the approach in #21563, which locks the mutex whenever any NetEvents method is called.

jnewbery commented at 1:08 PM on April 20, 2021: member
I've re-reviewed the locking changes and they look reasonable. There are still a few review comments outstanding. I'm happy to review this PR again once those have been addressed.
ajtowns force-pushed on Apr 22, 2021
ajtowns renamed this from "NOMERGE: net_processing: orphan handling changes" to "net_processing: lock clean up" on Apr 22, 2021
ajtowns commented at 10:43 PM on April 22, 2021: contributor
I think this is now cleaned up and ready for review. The behaviour changes that were in the last commit are now deferred to https://github.com/ajtowns/bitcoin/commits/202104-whohandlesorphans .
ajtowns added the label Refactoring on Apr 22, 2021
ajtowns marked this as ready for review on Apr 22, 2021
DrahtBot added the label Needs rebase on Apr 25, 2021
jnewbery commented at 3:38 PM on April 25, 2021: member
When you rebase this, there are a few files that no longer need to include txorphanage.h:
<details> <summary>Diff</summary>
```diff
diff --git a/src/init.cpp b/src/init.cpp
index bb5b144802..e7b5ed60e3 100644
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -52,7 +52,6 @@
 #include <torcontrol.h>
 #include <txdb.h>
 #include <txmempool.h>
-#include <txorphanage.h>
 #include <util/asmap.h>
 #include <util/check.h>
 #include <util/moneystr.h>
diff --git a/src/test/fuzz/process_message.cpp b/src/test/fuzz/process_message.cpp
index 7b99193ad0..5a71b25768 100644
--- a/src/test/fuzz/process_message.cpp
+++ b/src/test/fuzz/process_message.cpp
@@ -18,7 +18,6 @@
 #include <test/util/net.h>
 #include <test/util/setup_common.h>
 #include <test/util/validation.h>
-#include <txorphanage.h>
 #include <validationinterface.h>
 #include <version.h>
diff --git a/src/test/fuzz/process_messages.cpp b/src/test/fuzz/process_messages.cpp
index 11b236c9bd..f8b1c8fc90 100644
--- a/src/test/fuzz/process_messages.cpp
+++ b/src/test/fuzz/process_messages.cpp
@@ -13,7 +13,6 @@
 #include <test/util/net.h>
 #include <test/util/setup_common.h>
 #include <test/util/validation.h>
-#include <txorphanage.h>
 #include <validation.h>
 #include <validationinterface.h>
```

</details>
Those files were only including txorphanage.h to use the g_cs_orphans mutex.
ajtowns force-pushed on Apr 29, 2021
DrahtBot removed the label Needs rebase on Apr 29, 2021
ajtowns force-pushed on Apr 29, 2021
ajtowns commented at 9:21 AM on April 29, 2021: contributor
Rebased (and likewise the whohandlesorphans followup)
in src/txorphanage.h:29 in 584043a020 outdated
```diff
-    bool HaveTx(const GenTxid& gtxid) const LOCKS_EXCLUDED(::g_cs_orphans);
+    bool HaveTx(const GenTxid& gtxid) const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);

-    /** Get an orphan transaction and its originating peer
-     * (Transaction ref will be nullptr if not found)
+    /** Get an orphan transaction for a peer to work on
```
jnewbery commented at 12:20 PM on April 29, 2021:
/** Get an orphan transaction for a peer to work on and erase it from the peer's workset.
ajtowns commented at 2:04 PM on May 7, 2021:
Done-ish
jnewbery commented at 2:39 PM on May 7, 2021:
thank you!
jnewbery commented at 12:40 PM on April 29, 2021: member
utACK 584043a0203332fe645529b1c27c086e4f14a61c
Very nice cleanup.
DrahtBot added the label Needs rebase on May 7, 2021
ajtowns force-pushed on May 7, 2021
ajtowns commented at 12:27 PM on May 7, 2021: contributor
Rebased past #21845. cc @MarcoFalke @promag for reviews :)
jnewbery commented at 1:00 PM on May 7, 2021: member
utACK 8b4f685ebef5eb14f049d04e2c4f4ce5b44878de
Do you mind addressing #21527 (review) while there are no other ACKs on the branch?
DrahtBot removed the label Needs rebase on May 7, 2021
ajtowns force-pushed on May 7, 2021
jnewbery commented at 2:39 PM on May 7, 2021: member
ACK dd754fd73ab0fcbe9461a6a21b5fc2bc22faf968
in src/net_processing.cpp:2284 in fec0ff1662 outdated
```diff
@@ -2093,6 +2095,7 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, const Peer& peer,
 void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set)
 {
     AssertLockHeld(cs_main);
+    AssertLockHeld(m_mutex_message_handling);
```
MarcoFalke commented at 5:21 AM on May 9, 2021:
nit fec0ff16627c1cbf5545c9de8db19b749c90beee: The lock order is m_mutex_message_handling -> cs_main -> g_cs_orphans, so it would be nice to also order the Asserts here in the same way.
in src/net_processing.cpp:3959 in 57900348db outdated
```diff
@@ -3895,9 +3902,7 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt

 {
     LOCK2(cs_main, g_cs_orphans);
-    if (!peer->m_orphan_work_set.empty()) {
-        ProcessOrphanTx(*peer);
-    }
+    if (ProcessOrphanTx(*peer)) return true;
```
MarcoFalke commented at 5:30 AM on May 9, 2021:
57900348db566105351b525ae18cc2830e9665b5: This undocumented one-line if might be a bit too minimal. Previously there was a nice comment as to why the early return is needed here. Now the comment is gone, or at least can't be trivially attached to the return here.
ajtowns commented at 3:45 PM on May 9, 2021:
Not sure what you mean by "the comment is gone"? The "maintains the order of responses and prevents ~vRecvGetData~ m_getdata_requests to grow unbounded" is still there, but at least I thought that sensibly described the ~!pfrom->vRecvGetData.empty()~ !peer->m_getdata_requests.empty() check it remains attached to, and didn't really fit the "more orphans to work on" check. (EDIT: oops, was looking at 0.19 to see if there was any other comment)
ajtowns commented at 11:21 PM on September 2, 2021:
Added a comment.
in src/txorphanage.h:47 in 67618f0246 outdated
```diff
@@ -43,9 +43,11 @@ class TxOrphanage {
     /** Limit the orphanage to the given maximum */
     unsigned int LimitOrphans(unsigned int max_orphans) EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);

-    /** Add any orphans that list a particular tx as a parent into a peer's work set
-     * (ie orphans that may have found their final missing parent, and so should be reconsidered for the mempool) */
-    void AddChildrenToWorkSet(const CTransaction& tx, std::set<uint256>& orphan_work_set) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
+    /** Which peer provided a parent tx of orphans that need to be reconsidered */
+    std::multimap<NodeId, uint256> m_peer_work_set GUARDED_BY(g_cs_orphans);
```
MarcoFalke commented at 6:00 AM on May 9, 2021:
67618f024690bc9b926aa48ead23a0f687f03fe4: I know there is a check to avoid duplicate insertion of the same txid, but is there a reason to pick this data structure, which doesn't disallow duplicate entries like a set does? Also, insertion is trivially more expensive (log over all peers' work sets vs log over this peer's work set). Finally, handling is done in insertion order, not in sorted order.
vasild commented at 1:53 PM on May 13, 2021:
> ... is there a reason to pick this datastructure, which doesn't disallow duplicate entries ...
I think the reason could be to allow (nodeX, tx1), (nodeX, tx2)?
ajtowns commented at 3:49 AM on May 17, 2021:
Note that if you've got N peers each with K items in their work set, then a log search over everything is log(N*K), and a log search for the peer followed by a log search for the work item is log(N) + log(K), which is the same. If there's one peer with K items and every other peer has 0, you instead get log(K) vs log(N) + (1/N)·log(K); so a map of sets would be better than a set of pairs provided N < K, but probably isn't very interesting.

I picked the data structure because I thought it was simpler than a set of pairs, but I don't think there's an adequate check to avoid duplicate insertion of the same txid (and to do that, you'd have to search through all the entries for a peer, which would be annoying).
ajtowns commented at 3:52 AM on May 17, 2021:
Changed to map<NodeId, set<uint256>>

MarcoFalke commented at 6:05 AM on May 9, 2021: member
dd754fd73ab0fcbe9461a6a21b5fc2bc22faf968 🕵
in src/net_processing.cpp:328 in dd754fd73a outdated
```diff
+    /** Message handling mutex.
+     * Message processing is single-threaded, so anything only accessed
+     * by ProcessMessage(s) or SendMessages can be guarded by this mutex,
+     * which guarantees it's only accessed by a single thread.
+     */
+    Mutex m_mutex_message_handling;
```
vasild commented at 3:25 PM on May 13, 2021:
If it is single-threaded, then we don't need to guard anything with a mutex?
It looks to me that PeerManagerImpl::m_mutex_message_handling is useless, because it is only being acquired from a single thread. No two threads are ever going to race to acquire it at the same time:

```
ThreadMessageHandler() // there is just one thread executing this
    ProcessMessages()
        ... lock m_mutex_message_handling ...
    SendMessages()
        ... lock m_mutex_message_handling ...
```

I may be missing something, but to me it looks like m_mutex_message_handling can be removed safely, which will reduce the complexity of the code.
ajtowns commented at 3:56 AM on May 17, 2021:
You don't need the mutex at runtime, but having the GUARDED_BY(m_mutex_message_handling) annotation at compile time makes it easy to verify nobody's accidentally accessing the object from some other thread (either currently, or as a result of some future change, e.g. adding an RPC call that wants to expose some of the information stored in those objects). The cost of also having the mutex at runtime is trivial, since there's never any contention on it.
vasild commented at 4:11 PM on May 17, 2021:
NACK the addition of m_mutex_message_handling.

Following the logic of "GUARDED_BY() makes it easy to verify nobody's accidentally accessing...", should we add a dummy mutex for every variable?

IMO mutexes should be used only where concurrent access is possible.
jnewbery commented at 4:43 PM on May 17, 2021:
@vasild can you take a look at the commits that build on top of this branch in https://github.com/jnewbery/bitcoin/commits/2021-05-use-mutex-message-handling. Currently, a lot of the data in net_processing is guarded by cs_main, which is also used in validation. One way to loosen that coupling between validation and net_processing is to introduce an internal mutex inside net_processing (m_mutex_message_handling) that guards its own members.
vasild commented at 1:26 PM on May 18, 2021:
I think that https://github.com/jnewbery/bitcoin/commits/2021-05-use-mutex-message-handling can be simplified and the common m_mutex_message_handling avoided by changing some of the variables to atomics and using a dedicated mutex for e.g. m_txrequest.

Btw, it does not compile.
<details> <summary>minor fixups</summary>
```diff
diff --git i/src/net_processing.cpp w/src/net_processing.cpp
index c56863e5ef..cc04679683 100644
--- i/src/net_processing.cpp
+++ w/src/net_processing.cpp
@@ -401,13 +401,13 @@ private:
      * Set mapBlockSource[hash].second to false if the node should not be
      * punished if the block is invalid.
      */
     std::map<uint256, std::pair<NodeId, bool>> mapBlockSource GUARDED_BY(cs_main);

     /** Number of peers with wtxid relay. */
-    int m_wtxid_relay_peers GUARDED_BY(m_mutex_message_handing) {0};
+    int m_wtxid_relay_peers GUARDED_BY(m_mutex_message_handling) {0};

     /** Number of outbound peers with m_chain_sync.m_protect. */
     int m_outbound_peers_with_protect_from_disconnect GUARDED_BY(cs_main) = 0;

     bool AlreadyHaveTx(const GenTxid& gtxid) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
@@ -1000,20 +1000,15 @@ void UpdateLastBlockAnnounceTime(NodeId node, int64_t time_in_seconds)
     CNodeState *state = State(node);
     if (state) state->m_last_block_announcement = time_in_seconds;
 }

 void PeerManagerImpl::InitializeNode(CNode *pnode)
 {
-    LOCK(m_mutex_message_handling);
-
     NodeId nodeid = pnode->GetId();
-    {
-        LOCK(cs_main);
-        mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(pnode->IsInboundConn()));
-        assert(m_txrequest.Count(nodeid) == 0);
-    }
+    WITH_LOCK(cs_main, mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(pnode->IsInboundConn())));
+    WITH_LOCK(m_mutex_message_handling, assert(m_txrequest.Count(nodeid) == 0));
     {
         PeerRef peer = std::make_shared<Peer>(nodeid, m_mutex_message_handling);
         LOCK(m_peer_mutex);
         m_peer_map.emplace_hint(m_peer_map.end(), nodeid, std::move(peer));
     }
     if (!pnode->IsInboundConn()) {
```

</details>
<details> <summary>compilation errors</summary>
(clang 13)

```
net_processing.cpp:986:33: error: reading variable 'fPreferredDownload' requires holding mutex 'peer.m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    const bool preferred = peer.fPreferredDownload;
net_processing.cpp:1033:13: error: calling function '_RelayTransaction' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    _RelayTransaction(txid, tx->GetWitnessHash());
net_processing.cpp:1062:41: error: reading variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    nPreferredDownload -= peer->fPreferredDownload;
net_processing.cpp:1063:42: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    m_wtxid_relay_peers -= peer->m_wtxid_relay;
net_processing.cpp:1537:41: error: calling function '_RelayTransaction' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    WITH_LOCK(m_mutex_message_handling, _RelayTransaction(txid, wtxid););
net_processing.cpp:1550:43: error: reading variable 'm_wtxid_relay' requires holding mutex 'it->second->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    node->PushTxInventory(it->second->m_wtxid_relay ? wtxid : txid);
net_processing.cpp:2485:15: error: writing variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' exclusively [-Werror,-Wthread-safety-analysis]
    peer->fPreferredDownload = (!pfrom.IsInboundConn() || pfrom.HasPermission(PF_NOBAN)) && !pfrom.IsAddrFetchConn() && !pfrom.fClient;
net_processing.cpp:2486:37: error: reading variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    nPreferredDownload -= peer->fPreferredDownload;
net_processing.cpp:2653:24: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    if (!peer->m_wtxid_relay) {
net_processing.cpp:2654:23: error: writing variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' exclusively [-Werror,-Wthread-safety-analysis]
    peer->m_wtxid_relay = true;
net_processing.cpp:2777:23: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    if (peer->m_wtxid_relay) {
net_processing.cpp:3049:37: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    const uint256& hash = peer->m_wtxid_relay ? wtxid : txid;
net_processing.cpp:3051:19: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    if (peer->m_wtxid_relay && txid != wtxid) {
net_processing.cpp:3075:13: error: calling function 'AlreadyHaveTx' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    if (AlreadyHaveTx(GenTxid(/* is_wtxid=*/true, wtxid))) {
net_processing.cpp:3084:21: error: calling function '_RelayTransaction' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    _RelayTransaction(tx.GetHash(), tx.GetWitnessHash());
net_processing.cpp:3090:44: error: calling function 'AcceptToMemoryPool' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    const MempoolAcceptResult result = AcceptToMemoryPool(m_chainman.ActiveChainstate(), m_mempool, ptx, false /* bypass_limits */);
net_processing.cpp:3094:23: error: calling function 'check' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    m_mempool.check(m_chainman.ActiveChainstate());
net_processing.cpp:3099:13: error: calling function '_RelayTransaction' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    _RelayTransaction(tx.GetHash(), tx.GetWitnessHash());
net_processing.cpp:3114:13: error: calling function 'ProcessOrphanTx' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    ProcessOrphanTx(*peer);
net_processing.cpp:3131:21: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    if (recentRejects->contains(parent_txid)) {
net_processing.cpp:3147:26: error: calling function 'AlreadyHaveTx' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    if (!AlreadyHaveTx(gtxid)) AddTxAnnouncement(pfrom, *peer, gtxid, current_time);
net_processing.cpp:3172:17: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    recentRejects->insert(tx.GetHash());
net_processing.cpp:3173:17: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    recentRejects->insert(tx.GetWitnessHash());
net_processing.cpp:3192:24: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    assert(recentRejects);
net_processing.cpp:3193:17: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    recentRejects->insert(tx.GetWitnessHash());
net_processing.cpp:3204:21: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    recentRejects->insert(tx.GetHash());
net_processing.cpp:4291:29: error: reading variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    bool fFetch = peer->fPreferredDownload || (nPreferredDownload == 0 && !pto->fClient && !pto->IsAddrFetchConn()); // Download if this is a nice peer, or we have no nice peers and this one might do.
net_processing.cpp:4504:53: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    const uint256& hash = peer->m_wtxid_relay ? txinfo.tx->GetWitnessHash() : txinfo.tx->GetHash();
net_processing.cpp:4505:40: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    CInv inv(peer->m_wtxid_relay ? MSG_WTX : MSG_TX, hash);
net_processing.cpp:4536:85: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    CompareInvMempoolOrder compareInvMempoolOrder(&m_mempool, peer->m_wtxid_relay);
net_processing.cpp:4548:40: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    CInv inv(peer->m_wtxid_relay ? MSG_WTX : MSG_TX, hash);
net_processing.cpp:4636:117: error: reading variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    if (current_time > state.m_headers_sync_timeout && nSyncStarted == 1 && (nPreferredDownload - peer->fPreferredDownload >= 1)) {
32 errors generated.
```

</details>
MSG_WTX : MSG_TX, hash); ^ net_processing.cpp:4536:85: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis] CompareInvMempoolOrder compareInvMempoolOrder(&m_mempool, peer->m_wtxid_relay); ^ net_processing.cpp:4548:40: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis] CInv inv(peer->m_wtxid_relay ? MSG_WTX : MSG_TX, hash); ^ net_processing.cpp:4636:117: error: reading variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis] if (current_time > state.m_headers_sync_timeout && nSyncStarted == 1 && (nPreferredDownload - peer->fPreferredDownload >= 1)) { ^ 32 errors generated.</details>
ajtowns commented at 4:10 PM on May 18, 2021:"Should we add a dummy mutex for every variable" -- we should have a GUARDED_BY for every object that's reachable by multiple threads, yes. The mutex is not a dummy, it's a real mutex.
For this specific case, here's an example of a patch that adds an access from another thread: https://github.com/ajtowns/bitcoin/commits/202105-whyhaveaguard . With the
GUARDED_BY(m_mutex_message_handling) this is caught at compile time; without the guard (removed in the second commit), it compiles fine but introduces a race condition.
vasild commented at 10:41 AM on May 19, 2021:we should have a GUARDED_BY for every object that's reachable by multiple threads
I assume here you mean -- even if the current code accesses it from a single thread.
If yes, then I think that is overkill: it means having a mutex attached to every non-const global or class member variable, including private ones (because in the future somebody may access it from another thread).
For this specific case, here's an example...
Yeah, but one can do such an example for any other variable. IMO annotating everything with
GUARDED_BY(), just in case, would have a net-adverse effect on the code base.

<details> <summary>Similar example for two other, randomly picked variables</summary>
diff --git i/src/net.h w/src/net.h index 6b9a7ef136..18d57a7d8e 100644 --- i/src/net.h +++ w/src/net.h @@ -1256,12 +1256,18 @@ private: * an address and port that are designated for incoming Tor connections. */ std::vector<CService> m_onion_binds; friend struct CConnmanTest; friend struct ConnmanTestMsg; + +public: + size_t ResponseCachesSize() const + { + return m_addr_response_caches.size(); + } }; /** Return a timestamp in the future (in microseconds) for exponentially distributed events. */ std::chrono::microseconds PoissonNextSend(std::chrono::microseconds now, std::chrono::seconds average_interval); /** Dump binary message to file, with timestamp */ diff --git i/src/rpc/net.cpp w/src/rpc/net.cpp index 1f6b6e8d7e..cfab1ebafc 100644 --- i/src/rpc/net.cpp +++ w/src/rpc/net.cpp @@ -631,12 +631,14 @@ static RPCHelpMan getnetworkinfo() NodeContext& node = EnsureAnyNodeContext(request.context); if (node.connman) { ServiceFlags services = node.connman->GetLocalServices(); obj.pushKV("localservices", strprintf("%016x", services)); obj.pushKV("localservicesnames", GetServicesNames(services)); } + obj.pushKV("bug1", node.connman->ResponseCachesSize()); + obj.pushKV("bug2", node.addrman->m_asmap.size()); if (node.peerman) { obj.pushKV("localrelay", !node.peerman->IgnoresIncomingTxs()); } obj.pushKV("timeoffset", GetTimeOffset()); if (node.connman) { obj.pushKV("networkactive", node.connman->GetNetworkActive());</details>
ajtowns commented at 2:37 PM on May 20, 2021:

diff --git i/src/net.h w/src/net.h

Yes, net.cpp/net.h is terrible for having globally accessible variables that aren't annotated for thread safety. Being able to introduce races that the compiler doesn't catch is a bad thing... txmempool's a better example, though it's also missing some guards (cf #22003).
IMO annotating everything with
GUARDED_BY(), just in case, would have a net-adverse effect on the code base.

GUARDED_BY is always a "just in case" thing -- any time the code compiles correctly with GUARDED_BY it will also compile correctly without it -- the benefit is "just in case" someone forgets while modifying the code later. With a guard, the compiler will catch the mistake; without it, you have to hope someone reviewing the code remembers, or wait for some obscure bug to be noticed and reported.

Not consistently guarding variables has an easily observable adverse effect on the code base as a whole: it makes it hard to change code, because you're not sure when things might be safe to reuse in a different context. eg see #21061 (review)
it means having a mutex attached to every non-const global or class member variable, including private ones
Yes and no -- no, in that most classes don't need to manually manage locks, but should rather rely on whatever instantiates them to ensure the entire object is accessed safely; but for the ones that do, yes: every non-const, non-atomic member should be guarded (as should every class's non-const, non-atomic static members, since they're effectively globals themselves).
vasild commented at 1:44 PM on May 25, 2021:@ajtowns, thanks for the elaboration.
I still think that mutexes should be introduced when needed, not beforehand (on the grounds that in the future somebody may access a variable from a different thread).
Looks like neither one of us is convinced by the other's arguments. It's ok, it would have been too boring if everybody agreed on everything all the time.
MarcoFalke commented at 3:29 PM on January 24, 2022:I think it is fine to not have thread safety stuff on things that aren't meant to be called by more than one thread. For example, most code in init.cpp doesn't need them because they are only called once per process. Though, other stuff should communicate to the developer whether it is thread safe. Either via code comments (https://github.com/bitcoin/bitcoin/blob/e3de7cb9039770e0fd5b8bb8a5cba35c87ae8f00/src/random.h#L67, https://github.com/bitcoin/bitcoin/blob/e3de7cb9039770e0fd5b8bb8a5cba35c87ae8f00/src/random.h#L129) or via annotations.
ajtowns force-pushed on May 17, 2021

in src/txorphanage.cpp:147 in d2f5cc9de1 outdated
145 | +void TxOrphanage::AddChildrenToWorkSet(const CTransaction& tx, NodeId peer) 146 | { 147 | - AssertLockHeld(g_cs_orphans); 148 | + LOCK(m_mutex); 149 | + 150 | + // lookup this peer's work set, ensuring it exists
jnewbery commented at 7:31 AM on May 17, 2021: // Look up this peer's work set, ensuring it exists.

in src/net_processing.cpp:2108 in bdd2274936 outdated
2107 | - const uint256 orphanHash = *peer.m_orphan_work_set.begin(); 2108 | - peer.m_orphan_work_set.erase(peer.m_orphan_work_set.begin()); 2109 | + while (!orphan_work_set.empty()) { 2110 | + auto it = orphan_work_set.begin(); 2111 | + const uint256 orphanHash = *it; 2112 | + it = orphan_work_set.erase(it);
jnewbery commented at 7:42 AM on May 17, 2021:In commit bdd227493602437d4b6ee2d366ca0a83189f071c:
The new value for
it is unused. No need to set it here: orphan_work_set.erase(it);
37 | + * the work set, and populates its arguments with tx, the originating 38 | + * peer, and whether there are more orphans for this peer to work on 39 | + * after this tx. 40 | */ 41 | - std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans); 42 | + bool GetTxToReconsider(NodeId peer, CTransactionRef& ref, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
jnewbery commented at 11:22 AM on May 17, 2021:I left a comment about this interface further up that seems to have been lost without being resolved. I think the interface is needlessly complex and doing too much. From my comment above:
How about returning a
std::optional<std::pair<CTransactionRef, NodeId>>, and then adding another public method HasTxToReconsider(). It's a bit less efficient, but we'll only ever call it after processing a transaction, so it seems negligible.
ajtowns commented at 11:40 AM on May 17, 2021:See #21527 (review)
in src/txorphanage.cpp:185 in d2f5cc9de1 outdated
187 | + uint256 txid = *it; 188 | + it = work_set.erase(it); 189 | + 190 | + const auto orphan_it = m_orphans.find(txid); 191 | + if (orphan_it != m_orphans.end()) { 192 | + more = (it != work_set.end());
jnewbery commented at 11:23 AM on May 17, 2021:(if you're keeping
more), the following is equivalent, and maybe more direct/clear: more = !(work_set.empty());

jnewbery commented at 11:26 AM on May 17, 2021: member
utACK d2f5cc9de12196c870b2079bffc38e90076074d4.
Verified range-diff. Only significant difference was replacing
std::multimap<NodeId, uint256> m_peer_work_set with std::map<NodeId, std::set<uint256>> m_peer_work_set.

DrahtBot added the label Needs rebase on May 24, 2021
ajtowns force-pushed on May 25, 2021
DrahtBot removed the label Needs rebase on May 25, 2021
ajtowns force-pushed on May 25, 2021

jnewbery commented at 10:55 AM on May 25, 2021: member
Code review ACK d6dfa5977a7c03d9a81727e1692edb58bfeab62c
in src/net.cpp:2257 in d6dfa5977a outdated
2196 | @@ -2197,10 +2197,7 @@ void CConnman::ThreadMessageHandler() 2197 | if (flagInterruptMsgProc) 2198 | return; 2199 | // Send messages 2200 | - { 2201 | - LOCK(pnode->cs_sendProcessing); 2202 | - m_msgproc->SendMessages(pnode); 2203 | - } 2204 | + m_msgproc->SendMessages(pnode);
vasild commented at 2:18 PM on May 25, 2021:Previously this would have allowed concurrent executions of
SendMessages() for different peers, whereas now the concurrency is reduced by serializing all SendMessages() under the newly introduced single mutex m_mutex_message_handling.

Currently this code is executed by a single thread, so that is irrelevant, but in a possible future where we might want to do concurrent SendMessages() for different peers, the deleted cs_sendProcessing will have to be re-introduced.
ariard commented at 6:28 PM on June 16, 2021: Actually, sharing block headers/compact blocks messages in parallel has been discussed a while back (iirc https://bitcoincore.org/en/meetings/2018/05/03/, see "Call ProcessNewBlock asynchronously"), though we could insert some queued interface between a net_processing thread and multiple net threads?

That said, if it's a direction we agree on, I think the discussion is worth having.
in src/net_processing.cpp:4244 in d6dfa5977a outdated
3961 | - } 3962 | + LOCK(cs_main); 3963 | + if (ProcessOrphanTx(*peer)) return true; 3964 | } 3965 | 3966 | if (pfrom->fDisconnect)
vasild commented at 4:27 PM on May 25, 2021:Before this PR we would have returned
false (no more work) if fDisconnect was set, regardless of the outcome of ProcessOrphanTx() and regardless of whether peer->m_orphan_work_set was empty.

Now fDisconnect may be set, but we may return true if ProcessOrphanTx() returns true.

If this change is not intentional, then maybe swap the order:

if (pfrom->fDisconnect) {
    return false;
}
{
    LOCK(cs_main);
    if (ProcessOrphanTx(*peer)) return true;
}
ajtowns commented at 11:23 PM on September 2, 2021:Changed to
return !pfrom->fDisconnect to preserve the same behaviour.

in src/txorphanage.cpp:148 in d6dfa5977a outdated
146 | { 147 | - AssertLockHeld(g_cs_orphans); 148 | + LOCK(m_mutex); 149 | + 150 | + // Get this peer's work set, ensuring it exists 151 | + std::set<uint256>& orphan_work_set = m_peer_work_set.emplace(peer, std::set<uint256>{}).first->second;
vasild commented at 4:35 PM on May 25, 2021:Can be simplified?
std::set<uint256>& orphan_work_set = m_peer_work_set[peer];
ajtowns commented at 11:24 PM on September 2, 2021:Changed to C++17's
try_emplace which simplifies it slightly. Not really a fan of map::operator[]

in src/txorphanage.h:93 in d6dfa5977a outdated
97 | + /** Map from txid to orphan transaction record. Limited by 98 | + * -maxorphantx/DEFAULT_MAX_ORPHAN_TRANSACTIONS */ 99 | + OrphanMap m_orphans GUARDED_BY(m_mutex); 100 | + 101 | + /** Which peer provided a parent tx of orphans that need to be reconsidered */ 102 | + std::map<NodeId, std::set<uint256>> m_peer_work_set GUARDED_BY(m_mutex);
vasild commented at 4:39 PM on May 25, 2021:s/map/unordered_map/to make lookupO(1)(we don't rely on this being ordered byNodeId). Same for the set of transaction ids.std::unordered_map<NodeId, std::unordered_set<uint256>> m_peer_work_set GUARDED_BY(m_mutex);
ajtowns commented at 11:25 PM on September 2, 2021:Left as map; I don't think performance matters enough to justify using extra space for the hashmap; and longer term, using a multiindex rather than a bunch of different containers referencing each other is probably better.
in src/net_processing.cpp:2399 in d6dfa5977a outdated
2175 | + bool more = false; 2176 | 2177 | - const auto [porphanTx, from_peer] = m_orphanage.GetTx(orphanHash); 2178 | - if (porphanTx == nullptr) continue; 2179 | + while (m_orphanage.GetTxToReconsider(peer.m_id, porphanTx, from_peer, more)) { 2180 | + if (!Assume(porphanTx)) break;
vasild commented at 4:53 PM on May 26, 2021:This
Assume() seems unnecessary because GetTxToReconsider() is documented to populate porphanTx if it returns true.
ajtowns commented at 2:53 AM on May 28, 2021:All Assume's should be unnecessary by definition? See #21527 (review) for prior discussion.
vasild commented at 8:48 AM on May 28, 2021: Assume() makes sense if the value is derived in an obscure way, but not for checking whether a function has set its "out" parameters as documented.

For example, when calling CSHA256::Finalize(result) we don't set result to a dummy value before the call and check that it is not that dummy value after the call with Assume(). Same for any other function that has "out" parameters.

If you insist on checking that porphanTx was set by the function, then I guess you should do the same with from_peer: Assume(from_peer != -1).

in src/txorphanage.h:90 in d6dfa5977a outdated
94 | } 95 | }; 96 | 97 | + /** Map from txid to orphan transaction record. Limited by 98 | + * -maxorphantx/DEFAULT_MAX_ORPHAN_TRANSACTIONS */ 99 | + OrphanMap m_orphans GUARDED_BY(m_mutex);
vasild commented at 9:36 AM on May 27, 2021:Is it possible for
TxOrphanage::m_orphans to contain keys (transaction ids) which are not in TxOrphanage::m_peer_work_set, or the other way around?
ajtowns commented at 3:20 AM on May 28, 2021:m_orphanswill normally contain txids that aren't inm_peer_work_set-- they only get added to the work set when a parent appears and are (hopefully) very quickly removed from the work set after retrying ATMP.It's also possible for something to expire from
m_orphansand still be present inm_peer_work_set: if a tx that's in the work set is removed viaLimitOrphansorEraseForBlockyou'll get that case. It will eventually be removed from the work set whenGetTxToReconsideris called.michaelfolkson commented at 11:42 AM on June 16, 2021: contributorNACK the addition of m_mutex_message_handling.
Looks like neither one of us is convinced by the other's arguments. It's ok, it would have been too boring if everybody agreed on everything all the time. @vasild: Are you in a position where you can ACK this PR overall (with reservations) or would you prefer not to ACK it? Either is fine (obviously), I just suspect other reviewers may be waiting to see if your concerns are addressed or not before spending time reviewing this.
in src/net_processing.cpp:346 in d6dfa5977a outdated
342 | + * Reconsider orphan transactions after a parent has been accepted to the mempool. 343 | + * 344 | + * @param[in,out] peer The peer whose orphan transactions we will reconsider. Generally only one 345 | + * orphan will be reconsidered on each call of this function. This peer's set 346 | + * may be added to if accepting an orphan causes its children to be 347 | + * reconsidered.
ariard commented at 5:30 PM on June 16, 2021: I think the second part of this comment ("This peer's set may be added...") is confusing, and annotating the parameter as an output doesn't make sense anymore.

After this PR, the parameter is no longer a std::map<NodeId, std::set<uint256>>, and the new one can actually be const.
ajtowns commented at 11:28 PM on September 2, 2021:Dropped the
out which doesn't make sense. Knowing that this may increase the size of the set, not simply reduce it, is valuable I think, so have left that text alone.

in src/txorphanage.h:49 in d6dfa5977a outdated
49 | - void EraseForPeer(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans); 50 | + void EraseForPeer(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex); 51 | 52 | /** Erase all orphans included in or invalidated by a new block */ 53 | - void EraseForBlock(const CBlock& block) LOCKS_EXCLUDED(::g_cs_orphans); 54 | + void EraseForBlock(const CBlock& block) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
ariard commented at 5:36 PM on June 16, 2021:What do you think about
EXCLUSIVE_LOCKS_EXCLUDED(!(...))? The logical negation operator is easy to miss for a reviewer.
ajtowns commented at 11:29 PM on September 2, 2021:Seems like something more for clang upstream if anything?
in src/net_processing.cpp:2284 in d6dfa5977a outdated
2161 | - * reconsidered. 2162 | - */ 2163 | -void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set) 2164 | +bool PeerManagerImpl::ProcessOrphanTx(Peer& peer) 2165 | { 2166 | + AssertLockHeld(m_mutex_message_handling);
ariard commented at 6:00 PM on June 16, 2021:At commit f86a525, I don't think this lock is useful to protect
vExtraTxnForCompact, there is already one in AddToCompactExtraTransactions? Though it might be useful at branch tip?
ajtowns commented at 6:49 AM on September 2, 2021:I was just adding
AssertLockHeld to correspond with the EXCLUSIVE_LOCKS_REQUIRED annotation as a matter of course (see the recommendation in developer-notes.md).

in src/net_processing.cpp:4239 in d6dfa5977a outdated
3968 | @@ -3943,11 +3969,6 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt 3969 | if (!peer->m_getdata_requests.empty()) return true; 3970 | } 3971 | 3972 | - { 3973 | - LOCK(g_cs_orphans); 3974 | - if (!peer->m_orphan_work_set.empty()) return true;
ariard commented at 6:18 PM on June 16, 2021: I think this introduces a lightweight behavior change.

Previously, if our peer had fDisconnect=true set from a previous round of message processing and the orphan work set was still non-empty after the ProcessOrphanTx above, we would have returned false.

From now on, if the orphan work set is still non-empty after ProcessOrphanTx, we're going to return true.

Though I don't think it has any impact on allowing our peer to abuse more of our CPU time: when we return from ProcessMessages to ThreadMessageHandler, we go to SendMessages, which calls MaybeDiscourageAndDisconnect as the first thing.

(Actually, why isn't the check on fDisconnect first in ProcessMessages, before any kind of processing work is attempted with ProcessGetData/ProcessOrphanTx?)

ajtowns commented at 6:51 AM on September 2, 2021: fDisconnect is already checked in net.cpp prior to ProcessMessages being called, but there's always a chance that there's a race and fDisconnect is set while in the middle of ProcessMessages for the peer.

ariard commented at 6:19 PM on June 16, 2021: member
Approach ACK, reviewed up to 1e3f227 so far
in src/net_processing.cpp:4581 in 7961d94c8b outdated
4323 | @@ -4303,6 +4324,8 @@ bool PeerManagerImpl::SendMessages(CNode* pto) 4324 | if (!peer) return false; 4325 | const Consensus::Params& consensusParams = m_chainparams.GetConsensus(); 4326 | 4327 | + LOCK(m_mutex_message_handling);
narula commented at 10:55 PM on June 18, 2021:/** Message handling mutex.
- Message processing is single-threaded, so anything only accessed
- by ProcessMessage(s) or SendMessages can be guarded by this mutex,
- which guarantees it's only accessed by a single thread.
I'm confused. This mutex ends up protecting everything accessed in those functions. For example, in
SendMessages, it also protects MaybeDiscourageAndDisconnect, which accesses things like peer.m_misbehavior_mutex that can be accessed by the validation thread. So this mutex is guarding more things than what is only accessed by the ProcessMessage(s) and SendMessages thread.
ajtowns commented at 10:26 PM on September 2, 2021:If you have:
Mutex mutex;
int x GUARDED_BY(mutex);
int y;

static void A() { LOCK(mutex); ++x; ++y; }
static void B() { LOCK(mutex); --x; }

void ThreadA() { for (int i = 0; i < 100; ++i) { A(); } }
void ThreadB() { for (int i = 0; i < 100; ++i) { B(); } }

then I'd say that only x is guarded by the mutex, even though y is only actually accessed while the mutex is held. If you later introduce

static void C() { y *= 2; }

then that would still be safe/correct if ThreadA were to start calling C(), but y would no longer always be protected by the mutex; conversely, if ThreadB were to start calling C(), then y would no longer be thread safe, but because it's not guarded by anything, the compiler would not catch that bug.

I guess I'd look at it more as mutexes protect code from being run simultaneously; it's the GUARDED_BY annotations that protect the variables. If the variables aren't annotated, they don't have any automatic protections, only manual protection by the programmer constantly being careful.
vasild commented at 8:06 AM on September 3, 2021:I guess I'd look at it more as mutexes protect code from being run simultaneously
IMO mutexes protect variables, not code.
From https://en.wikipedia.org/wiki/Lock_(computer_science)
... a mechanism that enforces limits on access to a resource ... ... each thread cooperates by acquiring the lock before accessing the corresponding data ...
ajtowns commented at 2:55 AM on September 7, 2021:Read your second quote carefully -- it's not the mutex/lock that protects the data, it's the threads that protect it by cooperatively acquiring the lock before accessing the data.
It's 99% true that the point of using a mutex is to protect data, but it's not the mutex itself that ensures the data is protected, it's how you use the mutex. It's like looking both ways before crossing the road -- if you don't do that, you'll probably walk into traffic; but looking alone isn't enough, you have to make sure you do it before crossing the road, and depending on what you see, delay crossing the road. The point of annotations here is to ensure at compile time that you're not metaphorically stepping into the road before looking, or literally not accessing variables that are only meant to be accessed by a particular thread from another thread.
(I think I've repeated this point enough now; if there's something not clear, feel free to ask, but I'm not seeing any point to going around in circles another time)
in src/net_processing.cpp:2643 in 7961d94c8b outdated
2406 | @@ -2396,6 +2407,14 @@ void PeerManagerImpl::ProcessBlock(CNode& pfrom, const std::shared_ptr<const CBl 2407 | void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRecv,
amitiuttarwar commented at 11:28 PM on June 18, 2021:do I understand correctly that after these changes
PeerManagerImpl::ProcessMessage is a test-only function? If so, I think the comment in net_processing.h should be updated. Or even better, the function renamed to something like ProcessMessageTest :)

ajtowns commented at 10:35 PM on September 2, 2021: PeerManager::ProcessMessage has always only been exposed for tests, and is already documented that way in the header file:

/** Process a single message from a peer. Public for fuzz testing */
virtual void ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRecv,

PeerManagerImpl::ProcessMessage is thus needed to complete the PeerManager interface.

vasild commented at 2:02 PM on June 21, 2021: contributor
NACK the addition of m_mutex_message_handling.
Looks like neither one of us is convinced by the other's arguments. It's ok, it would have been too boring if everybody agreed on everything all the time.
@vasild: Are you in a position where you can ACK this PR overall (with reservations) or would you prefer not to ACK it? Either is fine (obviously), I just suspect other reviewers may be waiting to see if your concerns are addressed or not before spending time reviewing this.
The rest of this PR looks good overall (modulo some minor other comments I left). But the mutex
m_mutex_message_handling looks really strange to me, with a confusing comment. I would re-review carefully and probably ACK this PR if the mutex is removed. If needed later then it can be added in another PR.

jnewbery referenced this in commit 1b84b99caa on Jul 28, 2021
ajtowns force-pushed on Sep 2, 2021

glozow commented at 1:43 PM on September 24, 2021: member
Concept ACK, I like the idea of txorphanage being internally thread-safe
DrahtBot added the label Needs rebase on Sep 27, 2021
ajtowns force-pushed on Sep 29, 2021
DrahtBot removed the label Needs rebase on Sep 29, 2021
DrahtBot added the label Needs rebase on Oct 25, 2021
ajtowns force-pushed on Oct 26, 2021
DrahtBot removed the label Needs rebase on Oct 26, 2021
DrahtBot added the label Needs rebase on Nov 10, 2021
ajtowns force-pushed on Dec 9, 2021

ajtowns commented at 3:17 PM on December 9, 2021: contributor
Rebased past a few conflicts.

DrahtBot removed the label Needs rebase on Dec 9, 2021
DrahtBot added the label Needs rebase on Jan 24, 2022
ajtowns force-pushed on Mar 4, 2022
ajtowns marked this as a draft on Mar 4, 2022
DrahtBot removed the label Needs rebase on Mar 4, 2022
DrahtBot added the label Needs rebase on Mar 7, 2022
ajtowns force-pushed on Mar 7, 2022
DrahtBot removed the label Needs rebase on Mar 7, 2022
ajtowns force-pushed on Mar 25, 2022
DrahtBot added the label Needs rebase on Apr 26, 2022
ajtowns force-pushed on May 3, 2022
DrahtBot removed the label Needs rebase on May 3, 2022
DrahtBot added the label Needs rebase on May 16, 2022
ajtowns force-pushed on May 16, 2022
ajtowns force-pushed on May 16, 2022
DrahtBot removed the label Needs rebase on May 16, 2022
DrahtBot added the label Needs rebase on May 19, 2022

net/net_processing: add missing thread safety annotations 70a3324d84fca9fadd4b
net: mark TransportSerializer/m_serializer as const
The (V1)TransportSerializer instance CNode::m_serializer is used from multiple threads via PushMessage without protection by a mutex. This is only thread safe because the class does not have any mutable state, so document that by marking the methods and the object as "const".
021a96d41b
net: mark CNode unique_ptr members as const
Dereferencing a unique_ptr is not necessarily thread safe. The reason these are safe is because their values are set at construction and do not change later; so mark them as const and set them via the initializer list to guarantee that.
5df0bd1e8d
net: note CNode members that are treated as const
m_permissionFlags and m_prefer_evict are treated as const -- they're only set immediately after construction before any other thread has access to the object, and not changed again afterwards. As such they don't need to be marked atomic or guarded by a mutex; though it would probably be better to actually mark them as const...
c8c4faac48
net: add NetEventsInterface::g_mutex_msgproc_thread mutex
There are many cases where we assume message processing is single-threaded in order for our accesses to node-related memory to be safe. Add an explicit mutex that we can use to document this, which allows the compiler to catch any cases where we try to access that memory from other threads and break that assumption.
b1af785346
net: drop cs_sendProcessing
SendMessages() is now protected by g_mutex_msgproc_thread; so this additional per-node mutex is redundant.
8e75e74e6a
[move-only] net: move NetEventsInterface before CNode
This allows CNode members to be marked as guarded by the g_mutex_msgproc_thread mutex.
net: add thread safety annotations for CNode/Peer members accessed only via the msgproc thread 49b72d235c
net_processing: add thread safety annotations for Peer members accessed only via the msgproc thread 3ca9617d4549c3b28ab4
net_processing: extra transactions data are accessed only via the msgproc thread
Previously vExtraTxnForCompact and vExtraTxnForCompactIt were protected by g_cs_orphans; protect them by g_mutex_msgproc_thread instead, as they are only used during message processing.
Remove unnecessary includes of txorphanage.h 953e9bc0a8
net_processing: Pass a Peer to ProcessOrphanTx 0dd568f41e
net_processing: Localise orphan_work_set handling to ProcessOrphanTx 983a6c09ca
txorphange/net_processing: move orphan workset to txorphanage d30d1da235
txorphanage: make m_peer_work_set private 5b3f4d4101
Move all g_cs_orphans locking to txorphanage 04a8962ead
txorphanage: move g_cs_orphans to TxOrphanage::m_mutex 9037716fe8
txorphanage: fix missing includes 023909c638

ajtowns force-pushed on May 19, 2022
DrahtBot removed the label Needs rebase on May 19, 2022

DrahtBot commented at 1:41 PM on May 20, 2022: contributor
🐙 This pull request conflicts with the target branch and needs rebase.
DrahtBot added the label Needs rebase on May 20, 2022
ajtowns closed this on Aug 29, 2022
bitcoin locked this on Aug 29, 2023
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-02 15:14 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me