cs_sendProcessing is replaced by a private mutex in net_processing, non-orphan-specific things are moved out from g_cs_orphans, and g_cs_orphans is replaced by a private mutex in txorphanage.
@@ -3059,7 +3060,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
         }

         // Recursively process any orphan transactions that depended on this one
-        ProcessOrphanTx(peer->m_orphan_work_set);
+        ProcessOrphanTx(*peer);
Could this be moved to ProcessMessages()? It seems strange that in this one case we can process up to two transactions in ProcessMessage() (the one that we just received, and up to one orphan child of it).
@@ -3871,8 +3871,7 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt
     }

     {
-        LOCK(g_cs_orphans);
-        if (!peer->m_orphan_work_set.empty()) return true;
+        if (more_orphans) return true;
How about moving this check up to immediately after the call to ProcessOrphanTx()? It's not possible to have both more_orphans be true and have m_getdata_requests be non-empty:

- ProcessOrphanTx(*peer) can only return true if there are elements in peer's orphan work set, which can only be true if the last network message processed was a tx (m_peer_work_set only gets added to when processing a tx or when processing an element of m_peer_work_set, and we won't process another network message until it's empty).
- m_getdata_requests can only be non-empty if the last network message processed was a getdata (m_getdata_requests only gets added to when processing a getdata, and we won't process another message until it's empty).

That'd eliminate the need for a more_orphans variable, and make it immediately obvious that we're not going to process an orphan and a getdata request on the same pass through ProcessMessage().
-    LOCK2(cs_main, g_cs_orphans);
-    if (!peer->m_orphan_work_set.empty()) {
-        ProcessOrphanTx(peer->m_orphan_work_set);
-    }
+    LOCK(cs_main);
+    more_orphans = ProcessOrphanTx(*peer);
Can be more concisely written as:
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 02e2785c24..c90d73cb95 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -3854,11 +3854,7 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt
         }
     }

-    bool more_orphans = false;
-    {
-        LOCK(cs_main);
-        more_orphans = ProcessOrphanTx(*peer);
-    }
+    const bool more_orphans{WITH_LOCK(cs_main, return ProcessOrphanTx(*peer);)};

    if (pfrom->fDisconnect)
        return false;
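For reference, WITH_LOCK is a macro in sync.h that evaluates an expression while holding a scoped lock and yields the expression's result. A minimal stand-alone sketch of the pattern, using std::mutex as a stand-in for the project's annotated Mutex type:

```cpp
#include <cassert>
#include <mutex>

// Sketch of the WITH_LOCK pattern: run an expression under a scoped lock
// and return its value. The real macro in sync.h works the same way but
// uses the project's annotated lock types; std::lock_guard is a stand-in.
#define WITH_LOCK(cs, code) ([&]() -> decltype(auto) { std::lock_guard<std::mutex> lock(cs); code; }())

std::mutex g_mutex;
int g_counter = 0;

int IncrementAndGet()
{
    // Equivalent to: { take g_mutex; return ++g_counter; }
    return WITH_LOCK(g_mutex, return ++g_counter);
}
```

This is why `const bool more_orphans{WITH_LOCK(cs_main, return ProcessOrphanTx(*peer);)};` both scopes the lock to the single call and initializes a const in one expression.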
-     * reconsidered.
-     */
-void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set)
+bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
 {
     AssertLockHeld(cs_main);
Maybe drop the EXCLUSIVE_LOCKS_REQUIRED(cs_main) annotation and only take the cs_main lock inside the while loop (once we know that we have orphans to reprocess). In the vast majority of cases we're taking and releasing cs_main for no reason because we have no orphans to process.
I think:
would all fit together nicely and be a good idea, but it also feels like it’s adding a bit much to this PR?
This PR changes orphan processing from being done for the peer that provided the parent to the peer that provided the orphan. I think that’s fine, but perhaps we should split the PR into one that does refactoring only and one that has behaviour changes.
Hmm, might make sense to put the changes above together with the last commit in a separate PR, so this is just refactoring and the behaviour changes (who processes orphans, and how many orphans are processed in a cycle) are collected together.
@@ -87,24 +86,27 @@ int TxOrphanage::EraseTx(const uint256& txid)

 void TxOrphanage::EraseForPeer(NodeId peer)
 {
-    AssertLockHeld(g_cs_orphans);
+    LOCK(m_mutex);
+
+    const int work_erased = m_peer_work_set.erase(peer);
A TxOrphanage::Empty() function that returns whether there are any elements in any of the containers in the orphanage could be added, and then assert m_orphanage.Empty() in the FinalizeNode() logic when the last peer is deleted.
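A rough sketch of what such an accessor might look like, with simplified stand-in member types (std::string for txid, int64_t for NodeId) since the real OrphanMap isn't needed to show the shape:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical TxOrphanage::Empty() accessor. Member names mirror the PR,
// but key/value types are simplified stand-ins, and the real class guards
// these members with its private m_mutex.
class TxOrphanage
{
public:
    bool Empty() const
    {
        return m_orphans.empty() && m_peer_work_set.empty();
    }

    void AddOrphan(const std::string& txid) { m_orphans.emplace(txid, 0); }
    void AddWork(int64_t peer, const std::string& txid) { m_peer_work_set[peer].insert(txid); }

    void EraseForPeer(int64_t peer)
    {
        m_peer_work_set.erase(peer);
        // (the real EraseForPeer also erases that peer's orphans from m_orphans)
    }
    void EraseOrphan(const std::string& txid) { m_orphans.erase(txid); }

private:
    std::map<std::string, int> m_orphans;                     // txid -> orphan record (simplified)
    std::map<int64_t, std::set<std::string>> m_peer_work_set; // per-peer work sets
};
```

FinalizeNode() could then assert Empty() once the last peer has been removed.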
-            nErased += EraseTx(maybeErase->second.tx->GetHash());
+            nErased += _EraseTx(maybeErase->second.tx->GetHash());
         }
     }
-    if (nErased > 0) LogPrint(BCLog::MEMPOOL, "Erased %d orphan tx from peer=%d\n", nErased, peer);
+    if (nErased > 0 || work_erased > 0) LogPrint(BCLog::MEMPOOL, "Erased %d orphan tx and %d from orphan work set for peer=%d\n", nErased, work_erased, peer);
work_erased will only ever be 0 (nothing in the work set) or 1 (there was a work set for this peer).
@@ -232,6 +232,7 @@ struct LocalServiceInfo {
     uint16_t nPort;
 };

+extern Mutex g_mutex_message_handler;
I don’t think this is a good idea. Anything that you guard with this mutex will never be accessible on other threads, which would prevent us from eg exposing it to RPC methods, or using it in validation interface callbacks. This seems like a way of reinventing thread local storage, which I don’t think we want.
In addition, this mutex is exposed and used in both the net and net_processing layer, which adds to the coupling between those components. That’s the opposite direction that I think we want to go in.
The reasoning isn’t that this is particularly “good”, it’s that it’s documenting the safety mechanisms we already have. It would have caught #21528 (review) for instance.
net_processing is fundamentally coupled with net – it implements the NetEventsInterface defined and called from net. I mean I guess if you preferred, it could be NetEventsInterface::m_mutex_message_handler? (EDIT: actually, whether or not you prefer that, I think I do…)
It could be PeerManagerImpl::m_mutex_message_handler, taken in ProcessMessages() and SendMessages(). Taking that lock once at the top of those functions would be very low cost (since there would never be any contention).
Rename it to m_mutex_messages, remove cs_sendProcessing, and move the extra txns to be protected by m_mutex_messages instead of g_cs_orphans. I think it would be good to cherry-pick those commits into #21186 so that the moved addr variables can actually be marked as guarded by something.
+        bool more = false;

-        const auto [porphanTx, from_peer] = m_orphanage.GetTx(orphanHash);
-        if (porphanTx == nullptr) continue;
+        while (m_orphanage.GetTxToReconsider(peer.m_id, porphanTx, from_peer, more)) {
+            if (porphanTx == nullptr) break;
GetTxToReconsider() ensures that if true is returned, then porphanTx will not be null. An assert(porphanTx != nullptr) seemed overkill, and not having anything seemed like it might be annoying for static analysers. How about `if (!Assume(porphanTx)) break;`?
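For readers unfamiliar with it: Assume() evaluates its condition in every build, only aborts in builds compiled with checks enabled, and passes the value through so it can be tested in an if. A rough stand-alone sketch of the pattern, using a plain assert as a stand-in for the real macro:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for Bitcoin Core's Assume(): evaluate the value, assert on it
// in debug builds, and pass it through unchanged. (The real macro only
// aborts when compiled with the relevant check flag enabled.)
template <typename T>
T&& Assume(T&& val)
{
    assert(val);
    return std::forward<T>(val);
}

// Mirrors the proposed `if (!Assume(porphanTx)) break;` pattern: the null
// check stays visible to static analysers, and debug builds catch the
// "should never happen" case loudly.
int CountValid(const std::vector<std::shared_ptr<int>>& txs)
{
    int count = 0;
    for (const auto& tx : txs) {
        if (!Assume(tx)) break;
        ++count;
    }
    return count;
}
```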
+     * to work on. Otherwise returns true, and populates its arguments with tx,
+     * the originating peer, and whether there are more orphans for this
+     * peer to work on after this tx.
      */
-    std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
+    bool GetTxToReconsider(NodeId peer, CTransactionRef& ref, NodeId& originator, bool& more);
How about returning a std::optional<std::pair<CTransactionRef, NodeId>>, and then adding another public method HasTxToReconsider()? It's a bit less efficient, but we'll only ever call it after processing a transaction, so it seems negligible.
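A sketch of what that alternative interface might look like, with stand-in types (std::string for CTransactionRef, int64_t for NodeId) since only the shape matters here; the class name and methods are hypothetical:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

using NodeId = int64_t;
using TxRef = std::string; // stand-in for CTransactionRef

// Hypothetical shape of the suggested interface: GetTxToReconsider() hands
// back the next (tx, originating peer) pair or nullopt, and a separate
// HasTxToReconsider() answers "anything left for this peer?".
class OrphanWorkSet
{
public:
    void Add(NodeId peer, TxRef tx, NodeId originator)
    {
        m_work[peer].insert({std::move(tx), originator});
    }

    std::optional<std::pair<TxRef, NodeId>> GetTxToReconsider(NodeId peer)
    {
        auto it = m_work.find(peer);
        if (it == m_work.end() || it->second.empty()) return std::nullopt;
        std::pair<TxRef, NodeId> entry = *it->second.begin();
        it->second.erase(it->second.begin());
        if (it->second.empty()) m_work.erase(it);
        return entry;
    }

    bool HasTxToReconsider(NodeId peer) const
    {
        auto it = m_work.find(peer);
        return it != m_work.end() && !it->second.empty();
    }

private:
    std::map<NodeId, std::set<std::pair<TxRef, NodeId>>> m_work;
};
```

The out-parameter `more` disappears: callers ask HasTxToReconsider() separately, at the cost of one extra map lookup per processed transaction.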
@@ -10,8 +10,10 @@
 #include <primitives/transaction.h>
 #include <sync.h>

-/** Guards orphan transactions and extra txs for compact blocks */
-extern RecursiveMutex g_cs_orphans;
+#include <map>
+#include <optional>
For GetTxToReconsider()?
-    // Thus the implicit locking order requirement is: (1) cs_main, (2) g_cs_orphans, (3) cs_vNodes.
+    // Thus the implicit locking order requirement is: (1) cs_main, (2) cs_vNodes.
     if (node.connman) {
         node.connman->StopThreads();
-        LOCK2(::cs_main, ::g_cs_orphans);
+        LOCK(::cs_main);
I don’t think you need this lock. It was previously here to enforce that cs_main was taken before g_cs_orphans.
It can also be removed from the process_message and process_messages fuzz tests.
Ah, very good point. And we can't take cs_main inside StopNodes because net isn't aware of cs_main.
It’s outside the scope of this PR, but it’d be good to untangle this lock ordering dependency by eg making sure that cs_vNodes is not held when calling any PeerManager functions.

+    /** Map from txid to orphan transaction record. Limited by
+     * -maxorphantx/DEFAULT_MAX_ORPHAN_TRANSACTIONS */
+    OrphanMap m_orphans GUARDED_BY(m_mutex);
+
+    /** Which peer provided a parent tx of orphans that need to be reconsidered */
This comment is now incorrect. The multimap stores the node that provided the orphan tx.
Also, this data is redundant with the fromPeer in the OrphanTx struct (this could easily be a set of txids to reconsider, with the originating peer then looked up in the OrphanTx struct). We store it in the multimap for efficiency – the comment could be updated to indicate that.
Concept ACK. Lots of minor review comments inline, but a bit of high-level design feedback:
g_mutex_message_handler: it seems like you're trying to reinvent thread_local storage. That makes potential future changes difficult. It also adds unnecessary coupling between net and net_processing.

+     * Anything only accessed by message parsing can be guarded by
+     * this mutex, which is acquired by ProcessMessages and SendMessages
+     */
+    Mutex m_mutex_messages;
+
+    void _ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRecv,
I'd expect the m_mutex_messages lock to be taken by ProcessMessages, rather than ProcessMessage. That would remove the need to define an inner _ProcessMessage to be called when the mutex is already locked. It's also consistent with the comment on the m_mutex_messages member, which says "is acquired by ProcessMessages and SendMessages".
The lock is taken by ProcessMessages() (and also by SendMessages()); however, ProcessMessage() is also called from the test suite, so it cannot assume the lock has been taken. That's why the public interface, ProcessMessage(), is implemented as a wrapper that takes the lock and then calls the internal function.
I don't understand the second commit log: "SendMessages() is now protected internally by m_mutex_messages; so this additional lock is not useful." SendMessages() is not protected by the new mutex as far as I can see.
I prefer the approach in #21563, which locks the mutex whenever any NetEvents method is called.
When you rebase this, there are a few files that no longer need to include txorphanage.h:
diff --git a/src/init.cpp b/src/init.cpp
index bb5b144802..e7b5ed60e3 100644
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -52,7 +52,6 @@
 #include <torcontrol.h>
 #include <txdb.h>
 #include <txmempool.h>
-#include <txorphanage.h>
 #include <util/asmap.h>
 #include <util/check.h>
 #include <util/moneystr.h>
diff --git a/src/test/fuzz/process_message.cpp b/src/test/fuzz/process_message.cpp
index 7b99193ad0..5a71b25768 100644
--- a/src/test/fuzz/process_message.cpp
+++ b/src/test/fuzz/process_message.cpp
@@ -18,7 +18,6 @@
 #include <test/util/net.h>
 #include <test/util/setup_common.h>
 #include <test/util/validation.h>
-#include <txorphanage.h>
 #include <validationinterface.h>
 #include <version.h>

diff --git a/src/test/fuzz/process_messages.cpp b/src/test/fuzz/process_messages.cpp
index 11b236c9bd..f8b1c8fc90 100644
--- a/src/test/fuzz/process_messages.cpp
+++ b/src/test/fuzz/process_messages.cpp
@@ -13,7 +13,6 @@
 #include <test/util/net.h>
 #include <test/util/setup_common.h>
 #include <test/util/validation.h>
-#include <txorphanage.h>
 #include <validation.h>
 #include <validationinterface.h>
Those files were only including txorphanage.h to use the g_cs_orphans mutex.
-    bool HaveTx(const GenTxid& gtxid) const LOCKS_EXCLUDED(::g_cs_orphans);
+    bool HaveTx(const GenTxid& gtxid) const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);

-    /** Get an orphan transaction and its originating peer
-     * (Transaction ref will be nullptr if not found)
+    /** Get an orphan transaction for a peer to work on
Suggested:
    /** Get an orphan transaction for a peer to work on and erase it from the peer's workset.
utACK 584043a0203332fe645529b1c27c086e4f14a61c
Very nice cleanup.
utACK 8b4f685ebef5eb14f049d04e2c4f4ce5b44878de
Do you mind addressing #21527 (review) while there are no other ACKs on the branch?
@@ -2093,6 +2095,7 @@ void PeerManagerImpl::ProcessHeadersMessage(CNode& pfrom, const Peer& peer,
 void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set)
 {
     AssertLockHeld(cs_main);
+    AssertLockHeld(m_mutex_message_handling);
nit fec0ff16627c1cbf5545c9de8db19b749c90beee:
The lock order is m_mutex_message_handling -> cs_main -> g_cs_orphans, so it would be nice to also order the Asserts here in the same way.
@@ -3895,9 +3902,7 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt

     {
         LOCK2(cs_main, g_cs_orphans);
-        if (!peer->m_orphan_work_set.empty()) {
-            ProcessOrphanTx(*peer);
-        }
+        if (ProcessOrphanTx(*peer)) return true;
57900348db566105351b525ae18cc2830e9665b5:
This undocumented one-line-if might be a bit too minimal. Previously there was a nice comment as to why return early is needed here. Now the comment is gone, or at least can’t be trivially attached to the return here.
The comment referred to the !pfrom->vRecvGetData.empty() (now !peer->m_getdata_requests.empty()) check it remains attached to, and didn't really fit the "more orphans to work on" check. (EDIT: oops, was looking at 0.19 to see if there was any other comment)
@@ -43,9 +43,11 @@ class TxOrphanage {
     /** Limit the orphanage to the given maximum */
     unsigned int LimitOrphans(unsigned int max_orphans) EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);

-    /** Add any orphans that list a particular tx as a parent into a peer's work set
-     * (ie orphans that may have found their final missing parent, and so should be reconsidered for the mempool) */
-    void AddChildrenToWorkSet(const CTransaction& tx, std::set<uint256>& orphan_work_set) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
+    /** Which peer provided a parent tx of orphans that need to be reconsidered */
+    std::multimap<NodeId, uint256> m_peer_work_set GUARDED_BY(g_cs_orphans);
… is there a reason to pick this data structure, which doesn't disallow duplicate entries …
I think the reason could be to allow (nodeX, tx1), (nodeX, tx2)?
Note that if you've got N peers each with K items in their work set, then a log search over everything is log(N*K), and a log search for the peer followed by a log search for the work item is log(N) + log(K), which is the same. If there's one peer with K items and every other peer has 0, you instead get log(K) vs log(N) + (1/N)*log(K); so a map of sets would be better than a set of pairs provided N < K, but probably isn't very interesting.
I picked the data structure because I thought it was simpler than a set of pairs, but I don’t think there’s an adequate check to avoid duplicate insertion of the same txid (and to do that, you’d have to search through all the entries for a peer, which would be annoying).
map<NodeId, set<uint256>> would avoid that.
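The difference is easy to see with stand-in types (int64_t for NodeId, std::string for uint256, both assumptions for illustration): a multimap keeps duplicate (peer, txid) entries, while a map of sets deduplicates for free:

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Stand-in types for illustration only.
using NodeId = int64_t;
using Txid = std::string;

// Insert the same (peer, txid) work item twice and count what survives.
size_t MultimapWorkSetCount()
{
    std::multimap<NodeId, Txid> work;
    work.insert({1, "tx_a"});
    work.insert({1, "tx_a"}); // duplicate entry is kept
    return work.count(1);
}

size_t MapOfSetsWorkSetCount()
{
    std::map<NodeId, std::set<Txid>> work;
    work[1].insert("tx_a");
    work[1].insert("tx_a"); // duplicate is dropped by the set
    return work[1].size();
}
```

The multimap ends up with two entries for the peer, the map of sets with one.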
+    /** Message handling mutex.
+     * Message processing is single-threaded, so anything only accessed
+     * by ProcessMessage(s) or SendMessages can be guarded by this mutex,
+     * which guarantees it's only accessed by a single thread.
+     */
+    Mutex m_mutex_message_handling;
If it is single-threaded, then we don’t need to guard anything with a mutex?
It looks to me that PeerManagerImpl::m_mutex_message_handling is useless because it is only being acquired from a single thread. No two threads are ever going to race to acquire it at the same time:

ThreadMessageHandler() // there is just one thread executing this
    ProcessMessages()
        ...
        lock m_mutex_message_handling
        ...
    ...
    SendMessages()
        ...
        lock m_mutex_message_handling
        ...
I may be missing something, but to me it looks like m_mutex_message_handling can be removed safely, which will reduce the complexity of the code.
Annotating objects with GUARDED_BY(m_mutex_message_handling) makes it easy to verify at compile time that nobody's accidentally accessing the object from some other thread (either currently, or as a result of some future change, eg adding an RPC call that wants to expose some of the information stored in those objects). The cost of also having the mutex at runtime is trivial since there's never any contention on it.
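As a minimal illustration of the mechanism being discussed (not Bitcoin Core's actual macros): Clang's thread-safety attributes turn unguarded access into a compile-time diagnostic under -Wthread-safety, while expanding to nothing on other compilers. Note the real project's Mutex type carries the capability annotation needed for full analysis; plain std::mutex here is a stand-in:

```cpp
#include <mutex>

// Sketch of the GUARDED_BY mechanism. With clang and -Wthread-safety (and a
// capability-annotated mutex type, like Bitcoin Core's Mutex), reading or
// writing m_count without holding m_mutex is flagged at compile time. On
// other compilers the attribute expands to nothing.
#if defined(__clang__)
#define GUARDED_BY(x) __attribute__((guarded_by(x)))
#else
#define GUARDED_BY(x)
#endif

class Counter
{
public:
    int Increment()
    {
        std::lock_guard<std::mutex> lock(m_mutex); // m_mutex held: access is OK
        return ++m_count;
    }

private:
    std::mutex m_mutex;
    int m_count GUARDED_BY(m_mutex) = 0;
};
```

An uncontended lock acquisition like this is cheap at runtime; the value of the annotation is the compile-time check, not the locking itself.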
NACK the addition of m_mutex_message_handling.
Following the logic of "GUARDED_BY() makes it easy to verify nobody's accidentally accessing…", should we add a dummy mutex for every variable? IMO mutexes should be used only where concurrent access is possible.
m_mutex_message_handling
) that guards its own members.
I think that https://github.com/jnewbery/bitcoin/commits/2021-05-use-mutex-message-handling can be simplified and the common m_mutex_message_handling avoided by changing some of the variables to atomics and using a dedicated mutex for e.g. m_txrequest.
Btw, it does not compile.
diff --git i/src/net_processing.cpp w/src/net_processing.cpp
index c56863e5ef..cc04679683 100644
--- i/src/net_processing.cpp
+++ w/src/net_processing.cpp
@@ -401,13 +401,13 @@ private:
      * Set mapBlockSource[hash].second to false if the node should not be
      * punished if the block is invalid.
      */
     std::map<uint256, std::pair<NodeId, bool>> mapBlockSource GUARDED_BY(cs_main);

     /** Number of peers with wtxid relay. */
-    int m_wtxid_relay_peers GUARDED_BY(m_mutex_message_handing) {0};
+    int m_wtxid_relay_peers GUARDED_BY(m_mutex_message_handling) {0};

     /** Number of outbound peers with m_chain_sync.m_protect. */
     int m_outbound_peers_with_protect_from_disconnect GUARDED_BY(cs_main) = 0;

     bool AlreadyHaveTx(const GenTxid& gtxid) EXCLUSIVE_LOCKS_REQUIRED(cs_main);

@@ -1000,20 +1000,15 @@ void UpdateLastBlockAnnounceTime(NodeId node, int64_t time_in_seconds)
     CNodeState *state = State(node);
     if (state) state->m_last_block_announcement = time_in_seconds;
 }

 void PeerManagerImpl::InitializeNode(CNode *pnode)
 {
-    LOCK(m_mutex_message_handling);
-
     NodeId nodeid = pnode->GetId();
-    {
-        LOCK(cs_main);
-        mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(pnode->IsInboundConn()));
-        assert(m_txrequest.Count(nodeid) == 0);
-    }
+    WITH_LOCK(cs_main, mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(pnode->IsInboundConn())));
+    WITH_LOCK(m_mutex_message_handling, assert(m_txrequest.Count(nodeid) == 0));
     {
         PeerRef peer = std::make_shared<Peer>(nodeid, m_mutex_message_handling);
         LOCK(m_peer_mutex);
         m_peer_map.emplace_hint(m_peer_map.end(), nodeid, std::move(peer));
     }
     if (!pnode->IsInboundConn()) {
(clang 13)

net_processing.cpp:986:33: error: reading variable 'fPreferredDownload' requires holding mutex 'peer.m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    const bool preferred = peer.fPreferredDownload;
net_processing.cpp:1033:13: error: calling function '_RelayTransaction' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    _RelayTransaction(txid, tx->GetWitnessHash());
net_processing.cpp:1062:41: error: reading variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    nPreferredDownload -= peer->fPreferredDownload;
net_processing.cpp:1063:42: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    m_wtxid_relay_peers -= peer->m_wtxid_relay;
net_processing.cpp:1537:41: error: calling function '_RelayTransaction' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    WITH_LOCK(m_mutex_message_handling, _RelayTransaction(txid, wtxid););
net_processing.cpp:1550:43: error: reading variable 'm_wtxid_relay' requires holding mutex 'it->second->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    node->PushTxInventory(it->second->m_wtxid_relay ? wtxid : txid);
net_processing.cpp:2485:15: error: writing variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' exclusively [-Werror,-Wthread-safety-analysis]
    peer->fPreferredDownload = (!pfrom.IsInboundConn() || pfrom.HasPermission(PF_NOBAN)) && !pfrom.IsAddrFetchConn() && !pfrom.fClient;
net_processing.cpp:2486:37: error: reading variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    nPreferredDownload -= peer->fPreferredDownload;
net_processing.cpp:2653:24: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    if (!peer->m_wtxid_relay) {
net_processing.cpp:2654:23: error: writing variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' exclusively [-Werror,-Wthread-safety-analysis]
    peer->m_wtxid_relay = true;
net_processing.cpp:2777:23: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    if (peer->m_wtxid_relay) {
net_processing.cpp:3049:37: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    const uint256& hash = peer->m_wtxid_relay ? wtxid : txid;
net_processing.cpp:3051:19: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    if (peer->m_wtxid_relay && txid != wtxid) {
net_processing.cpp:3075:13: error: calling function 'AlreadyHaveTx' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    if (AlreadyHaveTx(GenTxid(/* is_wtxid=*/true, wtxid))) {
net_processing.cpp:3084:21: error: calling function '_RelayTransaction' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    _RelayTransaction(tx.GetHash(), tx.GetWitnessHash());
net_processing.cpp:3090:44: error: calling function 'AcceptToMemoryPool' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    const MempoolAcceptResult result = AcceptToMemoryPool(m_chainman.ActiveChainstate(), m_mempool, ptx, false /* bypass_limits */);
net_processing.cpp:3094:23: error: calling function 'check' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    m_mempool.check(m_chainman.ActiveChainstate());
net_processing.cpp:3099:13: error: calling function '_RelayTransaction' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    _RelayTransaction(tx.GetHash(), tx.GetWitnessHash());
net_processing.cpp:3114:13: error: calling function 'ProcessOrphanTx' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    ProcessOrphanTx(*peer);
net_processing.cpp:3131:21: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    if (recentRejects->contains(parent_txid)) {
net_processing.cpp:3147:26: error: calling function 'AlreadyHaveTx' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
    if (!AlreadyHaveTx(gtxid)) AddTxAnnouncement(pfrom, *peer, gtxid, current_time);
net_processing.cpp:3172:17: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    recentRejects->insert(tx.GetHash());
net_processing.cpp:3173:17: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    recentRejects->insert(tx.GetWitnessHash());
net_processing.cpp:3192:24: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    assert(recentRejects);
net_processing.cpp:3193:17: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    recentRejects->insert(tx.GetWitnessHash());
net_processing.cpp:3204:21: error: reading variable 'recentRejects' requires holding mutex 'cs_main' [-Werror,-Wthread-safety-analysis]
    recentRejects->insert(tx.GetHash());
net_processing.cpp:4291:29: error: reading variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    bool fFetch = peer->fPreferredDownload || (nPreferredDownload == 0 && !pto->fClient && !pto->IsAddrFetchConn()); // Download if this is a nice peer, or we have no nice peers and this one might do.
net_processing.cpp:4504:53: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    const uint256& hash = peer->m_wtxid_relay ? txinfo.tx->GetWitnessHash() : txinfo.tx->GetHash();
net_processing.cpp:4505:40: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    CInv inv(peer->m_wtxid_relay ? MSG_WTX : MSG_TX, hash);
net_processing.cpp:4536:85: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    CompareInvMempoolOrder compareInvMempoolOrder(&m_mempool, peer->m_wtxid_relay);
net_processing.cpp:4548:40: error: reading variable 'm_wtxid_relay' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    CInv inv(peer->m_wtxid_relay ? MSG_WTX : MSG_TX, hash);
net_processing.cpp:4636:117: error: reading variable 'fPreferredDownload' requires holding mutex 'peer->m_mutex_message_handling' [-Werror,-Wthread-safety-analysis]
    if (current_time > state.m_headers_sync_timeout && nSyncStarted == 1 && (nPreferredDownload - peer->fPreferredDownload >= 1)) {
32 errors generated.
“Should we add a dummy mutex for every variable” – we should have a GUARDED_BY for every object that’s reachable by multiple threads, yes. The mutex is not a dummy, it’s a real mutex.
For this specific case, here's an example of a patch that adds an access from another thread: https://github.com/ajtowns/bitcoin/commits/202105-whyhaveaguard . With the GUARDED_BY(m_mutex_message_handling) this is caught at compile time; without the guard (removed in the second commit), it compiles fine but introduces a race condition.
we should have a GUARDED_BY for every object that’s reachable by multiple threads
I assume here you mean – even if the current code accesses it from a single thread.
If yes, then I think that is an overkill - it means having a mutex attached to every non-const global or class member variable, including private ones (coz in the future somebody may access it from another thread).
For this specific case, here’s an example…
Yeah, but one can do such an example for any other variable. IMO annotating everything with GUARDED_BY()
, just in case, would have a net-adverse effect on the code base.
diff --git i/src/net.h w/src/net.h
index 6b9a7ef136..18d57a7d8e 100644
--- i/src/net.h
+++ w/src/net.h
@@ -1256,12 +1256,18 @@ private:
      * an address and port that are designated for incoming Tor connections.
      */
     std::vector<CService> m_onion_binds;

     friend struct CConnmanTest;
     friend struct ConnmanTestMsg;
+
+public:
+    size_t ResponseCachesSize() const
+    {
+        return m_addr_response_caches.size();
+    }
 };

 /** Return a timestamp in the future (in microseconds) for exponentially distributed events. */
 std::chrono::microseconds PoissonNextSend(std::chrono::microseconds now, std::chrono::seconds average_interval);

 /** Dump binary message to file, with timestamp */
diff --git i/src/rpc/net.cpp w/src/rpc/net.cpp
index 1f6b6e8d7e..cfab1ebafc 100644
--- i/src/rpc/net.cpp
+++ w/src/rpc/net.cpp
@@ -631,12 +631,14 @@ static RPCHelpMan getnetworkinfo()
     NodeContext& node = EnsureAnyNodeContext(request.context);
     if (node.connman) {
         ServiceFlags services = node.connman->GetLocalServices();
         obj.pushKV("localservices", strprintf("%016x", services));
         obj.pushKV("localservicesnames", GetServicesNames(services));
     }
+    obj.pushKV("bug1", node.connman->ResponseCachesSize());
+    obj.pushKV("bug2", node.addrman->m_asmap.size());
     if (node.peerman) {
         obj.pushKV("localrelay", !node.peerman->IgnoresIncomingTxs());
     }
     obj.pushKV("timeoffset", GetTimeOffset());
     if (node.connman) {
         obj.pushKV("networkactive", node.connman->GetNetworkActive());
diff --git i/src/net.h w/src/net.h
Yes, net.cpp/net.h is terrible for having globally accessible variables that aren’t annotated for thread safety. Being able to introduce races that the compiler doesn’t catch is a bad thing… txmempool’s a better example, though it’s also missing some guards (cf #22003).
IMO annotating everything with GUARDED_BY(), just in case, would have a net-adverse effect on the code base.
GUARDED_BY is always a "just in case" thing – any time the code compiles correctly with GUARDED_BY it will also compile correctly without it – the benefit is "just in case" someone forgets while modifying the code later. With a guard, the compiler will catch the mistake; without it, you have to hope someone reviewing the code remembers, or wait for some obscure bug to be noticed and reported.
Not consistently guarding variables has an easily observable adverse effect on the code base as a whole: it makes it hard to change code, because you're not sure when things might be safe to reuse in a different context. eg see #21061 (review)
it means having a mutex attached to every non-const global or class member variable, including private ones
Yes and no – no, in that most classes don’t need to manually manage locks, but should rather rely on whatever instantiates them to ensure the entire object is accessed safely; but for the ones that do, yes: every non-const, non-atomic member should be guarded (as should every class’s non-const, non-atomic static members, since they’re effectively globals themselves).
@ajtowns, thanks for the elaboration.
I still think that mutexes should be introduced when needed, not beforehand (with a reason that in the future somebody may access a variable from a different thread).
Looks like neither one of us is convinced by the other’s arguments. It’s ok, it would have been too boring if everybody agreed on everything all the time.
145+void TxOrphanage::AddChildrenToWorkSet(const CTransaction& tx, NodeId peer)
146 {
147- AssertLockHeld(g_cs_orphans);
148+ LOCK(m_mutex);
149+
150+ // lookup this peer's work set, ensuring it exists
0 // Look up this peer's work set, ensuring it exists.
2107- const uint256 orphanHash = *peer.m_orphan_work_set.begin();
2108- peer.m_orphan_work_set.erase(peer.m_orphan_work_set.begin());
2109+ while (!orphan_work_set.empty()) {
2110+ auto it = orphan_work_set.begin();
2111+ const uint256 orphanHash = *it;
2112+ it = orphan_work_set.erase(it);
In commit bdd227493602437d4b6ee2d366ca0a83189f071c:
The new value for it
is unused. No need to set it here:
0 orphan_work_set.erase(it);
37+ * the work set, and populates its arguments with tx, the originating
38+ * peer, and whether there are more orphans for this peer to work on
39+ * after this tx.
40 */
41- std::pair<CTransactionRef, NodeId> GetTx(const uint256& txid) const EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
42+ bool GetTxToReconsider(NodeId peer, CTransactionRef& ref, NodeId& originator, bool& more) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
I left a comment about this interface further up that seems to have been lost without being resolved. I think the interface is needlessly complex and doing too much. From my comment above:
How about returning a
std::optional<std::pair<CTransactionRef, NodeId>>
, and then adding another public methodHasTxToReconsider()
. It’s a bit less efficient, but we’ll only ever call it after processing a transaction, so it seems negligible.
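A hedged sketch of the suggested interface, with toy stand-in types (`std::string` for `CTransactionRef`/`uint256`, and the mutex omitted; `OrphanWorkQueue` and its method names are hypothetical): `GetTxToReconsider()` returns an optional (tx, originating peer) pair, and a separate `HasTxToReconsider()` answers the “more work?” question. Stale work-set entries whose orphan was erased elsewhere are skipped lazily, matching the behaviour discussed further down in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

using NodeId = int64_t;
using Txid = std::string;  // stand-in for uint256
using TxRef = std::string; // stand-in for CTransactionRef

class OrphanWorkQueue
{
    std::map<NodeId, std::set<Txid>> m_peer_work_set;
    std::map<Txid, std::pair<TxRef, NodeId>> m_orphans; // txid -> (tx, originator)

public:
    void Add(NodeId peer, const Txid& txid, const TxRef& tx, NodeId originator)
    {
        m_peer_work_set[peer].insert(txid);
        m_orphans.emplace(txid, std::make_pair(tx, originator));
    }

    void EraseOrphan(const Txid& txid) { m_orphans.erase(txid); }

    // Pop txids from the peer's work set until one still present in the
    // orphan map is found; return nullopt when the work set runs dry.
    std::optional<std::pair<TxRef, NodeId>> GetTxToReconsider(NodeId peer)
    {
        auto ws = m_peer_work_set.find(peer);
        if (ws == m_peer_work_set.end()) return std::nullopt;
        auto& work_set = ws->second;
        while (!work_set.empty()) {
            Txid txid = *work_set.begin();
            work_set.erase(work_set.begin());
            auto it = m_orphans.find(txid);
            if (it != m_orphans.end()) return it->second;
        }
        return std::nullopt;
    }

    bool HasTxToReconsider(NodeId peer) const
    {
        auto ws = m_peer_work_set.find(peer);
        return ws != m_peer_work_set.end() && !ws->second.empty();
    }
};
```

This drops the three out-parameters at the cost of an extra map lookup per `HasTxToReconsider()` call, which only happens after processing a transaction.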
187+ uint256 txid = *it;
188+ it = work_set.erase(it);
189+
190+ const auto orphan_it = m_orphans.find(txid);
191+ if (orphan_it != m_orphans.end()) {
192+ more = (it != work_set.end());
(if you’re keeping more
), the following is equivalent, and maybe more direct/clear:
0 more = !(work_set.empty());
utACK d2f5cc9de12196c870b2079bffc38e90076074d4.
Verified range-diff. Only significant difference was replacing std::multimap<NodeId, uint256> m_peer_work_set
with std::map<NodeId, std::set<uint256>> m_peer_work_set
.
2196@@ -2197,10 +2197,7 @@ void CConnman::ThreadMessageHandler()
2197 if (flagInterruptMsgProc)
2198 return;
2199 // Send messages
2200- {
2201- LOCK(pnode->cs_sendProcessing);
2202- m_msgproc->SendMessages(pnode);
2203- }
2204+ m_msgproc->SendMessages(pnode);
Previously this would have allowed concurrent executions of SendMessages()
for different peers, whereas now the concurrency is reduced by serializing all SendMessages()
under the newly introduced single mutex m_mutex_message_handling
.
Currently this code is executed by a single thread, so that is irrelevant, but in a possible future where we might want to do concurrent SendMessages()
for different peers, the deleted cs_sendProcessing
will have to be re-introduced.
Actually, sharing block headers/compact block messages in parallel has been discussed a while back (iirc https://bitcoincore.org/en/meetings/2018/05/03/, see “Call ProcessNewBlock asynchronously”), though we could insert some queued interface between a net_processing thread and multiple net threads?
That said, if it’s a direction we agree on, I think the discussion is worth having.
3961- }
3962+ LOCK(cs_main);
3963+ if (ProcessOrphanTx(*peer)) return true;
3964 }
3965
3966 if (pfrom->fDisconnect)
Before this PR we would have returned false
(no more work) if fDisconnect
was set regardless of the outcome of ProcessOrphanTx()
and regardless of whether peer->m_orphan_work_set
was empty.
Now fDisconnect
may be set, but we may return true
if ProcessOrphanTx()
returns true
.
If this change is not intentional, then maybe swap the order:
0 if (pfrom->fDisconnect) {
1 return false;
2 }
3
4 {
5 LOCK(cs_main);
6 if (ProcessOrphanTx(*peer)) return true;
7 }
Or return !pfrom->fDisconnect
at the end to preserve the same behaviour.
146 {
147- AssertLockHeld(g_cs_orphans);
148+ LOCK(m_mutex);
149+
150+ // Get this peer's work set, ensuring it exists
151+ std::set<uint256>& orphan_work_set = m_peer_work_set.emplace(peer, std::set<uint256>{}).first->second;
Can be simplified?
0 std::set<uint256>& orphan_work_set = m_peer_work_set[peer];
Used try_emplace
, which simplifies it slightly. Not really a fan of map::operator[]
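For illustration, the three spellings discussed above side by side, with toy types (`std::string` standing in for `uint256`; the free function `GetWorkSet` is hypothetical). `try_emplace` constructs the mapped set only when the key is absent, and avoids the insert-on-read feel of `operator[]`:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

using NodeId = int64_t;
std::map<NodeId, std::set<std::string>> m_peer_work_set;

// Look up this peer's work set, ensuring it exists.
std::set<std::string>& GetWorkSet(NodeId peer)
{
    // Original:   m_peer_work_set.emplace(peer, std::set<std::string>{}).first->second;
    // operator[]: m_peer_work_set[peer];
    return m_peer_work_set.try_emplace(peer).first->second;
}
```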
97+ /** Map from txid to orphan transaction record. Limited by
98+ * -maxorphantx/DEFAULT_MAX_ORPHAN_TRANSACTIONS */
99+ OrphanMap m_orphans GUARDED_BY(m_mutex);
100+
101+ /** Which peer provided a parent tx of orphans that need to be reconsidered */
102+ std::map<NodeId, std::set<uint256>> m_peer_work_set GUARDED_BY(m_mutex);
s/map/unordered_map/
to make lookup O(1)
(we don’t rely on this being ordered by NodeId
). Same for the set of transaction ids.
0 std::unordered_map<NodeId, std::unordered_set<uint256>> m_peer_work_set GUARDED_BY(m_mutex);
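One caveat with the unordered suggestion: `uint256` has no `std::hash` specialization, so a hasher must be supplied explicitly (Bitcoin Core has salted txid hashers for this purpose). A sketch with a hypothetical stand-in key type and hasher, just to show the shape of the declaration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using NodeId = int64_t;

// Hypothetical stand-in for uint256; the real type needs its own hasher too.
struct FakeUint256 {
    uint64_t lo;
    bool operator==(const FakeUint256& other) const { return lo == other.lo; }
};

struct FakeUint256Hasher {
    size_t operator()(const FakeUint256& v) const { return static_cast<size_t>(v.lo); }
};

std::unordered_map<NodeId, std::unordered_set<FakeUint256, FakeUint256Hasher>>
    m_peer_work_set;
```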
2175+ bool more = false;
2176
2177- const auto [porphanTx, from_peer] = m_orphanage.GetTx(orphanHash);
2178- if (porphanTx == nullptr) continue;
2179+ while (m_orphanage.GetTxToReconsider(peer.m_id, porphanTx, from_peer, more)) {
2180+ if (!Assume(porphanTx)) break;
Assume()
seems unnecessary because GetTxToReconsider()
is documented to populate porphanTx
if it returns true
.
Assume()
makes sense if the value is derived in an obscure way. But not for checking whether a function has set its “out” parameters as documented.
For example, when calling CSHA256::Finalize(result)
we don’t set result
to a dummy value before the call and check that it is not that dummy value after the call with Assume()
. Same for any other function that has “out” parameters.
If you insist on checking that porphanTx
was set by the function, then I guess you should do the same with from_peer
: Assume(from_peer != -1)
.
94 }
95 };
96
97+ /** Map from txid to orphan transaction record. Limited by
98+ * -maxorphantx/DEFAULT_MAX_ORPHAN_TRANSACTIONS */
99+ OrphanMap m_orphans GUARDED_BY(m_mutex);
Is it possible for TxOrphanage::m_orphans
to contain keys (transaction ids) which are not in TxOrphanage::m_peer_work_set
, or the other way around?
m_orphans
will normally contain txids that aren’t in m_peer_work_set
– they only get added to the work set when a parent appears and are (hopefully) very quickly removed from the work set after retrying ATMP.
It’s also possible for something to expire from m_orphans
and still be present in m_peer_work_set
: if a tx that’s in the work set is removed via LimitOrphans
or EraseForBlock
you’ll get that case. It will eventually be removed from the work set when GetTxToReconsider
is called.
342+ * Reconsider orphan transactions after a parent has been accepted to the mempool.
343+ *
344+ * @param[in,out] peer The peer whose orphan transactions we will reconsider. Generally only one
345+ * orphan will be reconsidered on each call of this function. This peer's set
346+ * may be added to if accepting an orphan causes its children to be
347+ * reconsidered.
I think the second part of this comment (“This peer’s set may be added…”) is confusing, and annotating the parameter as an output doesn’t make sense anymore.
After this PR, the parameter is no longer a std::map<NodeId, std::set<uint256>>
and the new one can actually be const.
Have dropped the out
, which doesn’t make sense. Knowing that this may increase the size of the set, not simply reduce it, is valuable I think, so have left that text alone.
49- void EraseForPeer(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(g_cs_orphans);
50+ void EraseForPeer(NodeId peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
51
52 /** Erase all orphans included in or invalidated by a new block */
53- void EraseForBlock(const CBlock& block) LOCKS_EXCLUDED(::g_cs_orphans);
54+ void EraseForBlock(const CBlock& block) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex);
Why not LOCKS_EXCLUDED(...) instead of EXCLUSIVE_LOCKS_REQUIRED(!(...))
? The logical negation operator is easy to miss for a reviewer.
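For context, a standalone sketch of the negative annotation under discussion (the `OrphanageSketch` class and inline macro are hypothetical stand-ins): `EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)` documents that the caller must not already hold `m_mutex`, because the method takes it itself. Unlike `LOCKS_EXCLUDED`, clang can check the negative requirement transitively with `-Wthread-safety-negative`.

```cpp
#include <cassert>
#include <mutex>

#if defined(__clang__)
#define EXCLUSIVE_LOCKS_REQUIRED(...) __attribute__((exclusive_locks_required(__VA_ARGS__)))
#else
#define EXCLUSIVE_LOCKS_REQUIRED(...)
#endif

class OrphanageSketch
{
    std::mutex m_mutex;
    int m_erased = 0;

public:
    // Caller must NOT hold m_mutex; locking here would otherwise deadlock.
    void EraseForPeer(int peer) EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        (void)peer;
        ++m_erased;
    }

    int Erased()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_erased;
    }
};
```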
2161- * reconsidered.
2162- */
2163-void PeerManagerImpl::ProcessOrphanTx(std::set<uint256>& orphan_work_set)
2164+bool PeerManagerImpl::ProcessOrphanTx(Peer& peer)
2165 {
2166+ AssertLockHeld(m_mutex_message_handling);
Is this needed for vExtraTxnForCompact
, given there is already one in AddToCompactExtraTransactions
? Though it might be useful at the branch tip?
I add an AssertLockHeld
to correspond with the EXCLUSIVE_LOCKS_REQUIRED
annotation as a matter of course (see the recommendation in developer-notes.md).
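The developer-notes convention referenced above pairs the compile-time annotation with a run-time assertion at the top of the function. Everything in this sketch (`CheckedMutex`, the inline macros, the function names) is a hypothetical stand-in for Bitcoin Core’s real `Mutex`/`AssertLockHeld` machinery:

```cpp
#include <cassert>
#include <mutex>

#if defined(__clang__)
#define EXCLUSIVE_LOCKS_REQUIRED(...) __attribute__((exclusive_locks_required(__VA_ARGS__)))
#else
#define EXCLUSIVE_LOCKS_REQUIRED(...)
#endif

// Toy lock that can answer "is anyone holding me?".
class CheckedMutex
{
    std::mutex m_inner;
    bool m_held = false;

public:
    void lock() { m_inner.lock(); m_held = true; }
    void unlock() { m_held = false; m_inner.unlock(); }
    bool IsHeld() const { return m_held; }
};

#define AssertLockHeld(cs) assert((cs).IsHeld())

CheckedMutex g_msgproc_mutex;
int g_orphans_processed = 0;

// Compile time: the annotation lets clang reject callers that don't hold the lock.
void ProcessOrphanTxSketch() EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex)
{
    // Run time: the matching assertion catches the same mistake in practice.
    AssertLockHeld(g_msgproc_mutex);
    ++g_orphans_processed;
}
```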
3968@@ -3943,11 +3969,6 @@ bool PeerManagerImpl::ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt
3969 if (!peer->m_getdata_requests.empty()) return true;
3970 }
3971
3972- {
3973- LOCK(g_cs_orphans);
3974- if (!peer->m_orphan_work_set.empty()) return true;
I think this introduces a slight behavior change.
Previously, if our peer had fDisconnect=true
set from a previous message’s processing and the orphan work set was still non-empty after the ProcessOrphanTx
above, we would have returned false
.
From now on, if the orphan work set is still non-empty after ProcessOrphanTx
, we’re going to return true
.
Though I don’t think it has any impact on allowing our peer to abuse more of our CPU time; when we return from ProcessMessages
to ThreadMessageHandler
, we go to SendMessages
which calls MaybeDiscourageAndDisconnect
as first thing.
(Actually, why isn’t the check on fDisconnect
first in ProcessMessages
? Shouldn’t it come before any kind of processing work attempted with ProcessGetData
/ProcessOrphanTx
?)
fDisconnect
is already checked in net.cpp prior to ProcessMessages
being called, but there’s always a chance that there’s a race and fDisconnect
is set while in the middle of ProcessMessages
for the peer.
4323@@ -4303,6 +4324,8 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
4324 if (!peer) return false;
4325 const Consensus::Params& consensusParams = m_chainparams.GetConsensus();
4326
4327+ LOCK(m_mutex_message_handling);
/** Message handling mutex.
- Message processing is single-threaded, so anything only accessed
- by ProcessMessage(s) or SendMessages can be guarded by this mutex,
- which guarantees it’s only accessed by a single thread.
I’m confused. This mutex ends up protecting everything accessed in those functions. For example, in SendMessages
, it also protects MaybeDiscourageAndDisconnect
that accesses things like peer.m_misbehavior_mutex
, which can be accessed by the validation thread. So this mutex is guarding more things than what is only accessed by the ProcessMessage(s)
and SendMessages
thread.
I suppose it’s safe that this mutex also ends up guarding things that are accessed by multiple threads, but I find the comment confusing.
If you have:
0 Mutex mutex;
1 int x GUARDED_BY(mutex);
2 int y;
3 static void A() { LOCK(mutex); ++x; ++y; }
4 static void B() { LOCK(mutex); --x; }
5 void ThreadA() { for (int i = 0; i < 100; ++i) { A(); } }
6 void ThreadB() { for (int i = 0; i < 100; ++i) { B(); } }
then I’d say that only x
is guarded by the mutex, even though y
is only actually accessed while the mutex is held. If you later introduce
0 static void C() { y *= 2; }
then that would still be safe/correct if ThreadA
were to start calling C()
, but y
would no longer always be protected by the mutex; conversely if ThreadB
were to start calling C()
, then y
would no longer be thread safe, but because it’s not guarded by anything, the compiler would not catch that bug.
I guess I’d look at it more as mutexes protect code from being run simultaneously; it’s the GUARDED_BY annotations that protect the variables. If the variables aren’t annotated, they don’t have any automatic protections, only manual protection by the programmer constantly being careful.
I guess I’d look at it more as mutexes protect code from being run simultaneously
IMO mutexes protect variables, not code.
From https://en.wikipedia.org/wiki/Lock_(computer_science)
… a mechanism that enforces limits on access to a resource … … each thread cooperates by acquiring the lock before accessing the corresponding data …
Read your second quote carefully – it’s not the mutex/lock that protects the data, it’s the threads that protect it by cooperatively acquiring the lock before accessing the data.
It’s 99% true that the point of using a mutex is to protect data, but it’s not the mutex itself that ensures the data is protected, it’s how you use the mutex. It’s like looking both ways before crossing the road – if you don’t do that, you’ll probably walk into traffic; but looking alone isn’t enough, you have to make sure you do it before crossing the road, and depending on what you see, delay crossing the road. The point of annotations here is to ensure at compile time that you’re not metaphorically stepping into the road before looking, or literally not accessing variables that are only meant to be accessed by a particular thread from another thread.
(I think I’ve repeated this point enough now; if there’s something not clear, feel free to ask, but I’m not seeing any point to going around in circles another time)
2406@@ -2396,6 +2407,14 @@ void PeerManagerImpl::ProcessBlock(CNode& pfrom, const std::shared_ptr<const CBl
2407 void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRecv,
Is PeerManagerImpl::ProcessMessage
a test-only function? If so, I think the comment in net_processing.h
should be updated. Or even better, the function renamed to something like ProcessMessageTest
:)
PeerManager::ProcessMessage
has always only been exposed for tests, and is already documented that way in the header file:
0 /** Process a single message from a peer. Public for fuzz testing */
1 virtual void ProcessMessage(CNode& pfrom, const std::string& msg_type, CDataStream& vRecv,
PeerManagerImpl::ProcessMessage
is thus needed to complete the PeerManager
interface.
NACK the addition of m_mutex_message_handling.
Looks like neither one of us is convinced by the other’s arguments. It’s ok, it would have been too boring if everybody agreed on everything all the time.
@vasild: Are you in a position where you can ACK this PR overall (with reservations) or would you prefer not to ACK it? Either is fine (obviously), I just suspect other reviewers may be waiting to see if your concerns are addressed or not before spending time reviewing this.
The rest of this PR looks good overall (modulo some minor other comments I left). But the mutex m_mutex_message_handling
looks really strange to me, with a confusing comment.
I would re-review carefully and probably ACK this PR if the mutex is removed. If needed later then it can be added in another PR.
The (V1)TransportSerializer instance CNode::m_serializer is used from
multiple threads via PushMessage without protection by a mutex. This
is only thread safe because the class does not have any mutable state,
so document that by marking the methods and the object as "const".
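A hedged sketch of this commit’s reasoning, with simplified stand-in class names (`SerializerSketch`, `NodeSketch` are hypothetical): a serializer with no mutable state can be called from multiple threads without a mutex, and marking both the method and the member object `const` documents and enforces that statelessness.

```cpp
#include <cassert>
#include <string>
#include <vector>

class SerializerSketch
{
public:
    // const method: cannot modify members, so concurrent calls cannot race.
    std::vector<unsigned char> Serialize(const std::string& payload) const
    {
        return {payload.begin(), payload.end()};
    }
};

class NodeSketch
{
public:
    // Set via the initializer list; const thereafter, so there is no
    // mutable state for PushMessage callers on different threads to race on.
    NodeSketch() : m_serializer{} {}
    const SerializerSketch m_serializer;
};
```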
Dereferencing a unique_ptr is not necessarily thread safe. The reason
these are safe is because their values are set at construction and do
not change later; so mark them as const and set them via the initializer
list to guarantee that.
m_permissionFlags and m_prefer_evict are treated as const -- they're
only set immediately after construction before any other thread has
access to the object, and not changed again afterwards. As such they
don't need to be marked atomic or guarded by a mutex; though it would
probably be better to actually mark them as const...
There are many cases where we assume message processing is
single-threaded in order for how we access node-related memory to be
safe. Add an explicit mutex that we can use to document this, which allows
the compiler to catch any cases where we try to access that memory from
other threads and break that assumption.
SendMessages() is now protected by g_mutex_msgproc_thread, so this additional
per-node mutex is redundant.
This allows CNode members to be marked as guarded by the
g_mutex_msgproc_thread mutex.
Previously vExtraTxnForCompact and vExtraTxnForCompactIt were protected
by g_cs_orphans; protect them by g_mutex_msgproc_thread instead, as they
are only used during message processing.