This continues the work of moving application layer data into net_processing, by moving all block inventory state into the new Peer object added in #19607.
For motivation, see #19398.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
```diff
@@ -488,6 +488,9 @@ struct Peer {
     /** Whether this peer should be disconnected and marked as discouraged (unless it has the noban permission). */
     bool m_should_discourage GUARDED_BY(m_misbehavior_mutex){false};
 
+    /** Starting height of this peer */
+    std::atomic<int> nStartingHeight{-1};
```
"Discovered at version message exchange. Used at tip update, to decide if our chain is fresh enough compared to peer's one to be worthy of headers announcement." ?
The comment doesn't need to document everything that the variable is used for. That's what the code does.
921 | @@ -928,9 +922,6 @@ class CNode 922 | // m_tx_relay == nullptr if we're not relaying transactions with this peer 923 | std::unique_ptr<TxRelay> m_tx_relay; 924 | 925 | - // Used for headers announcements - unfiltered blocks to relay
What's the distinction you lose by dropping "unfiltered blocks to relay"?
I guess this is a reference to the sort done at headers announcement, at L4218 in src/net_processing.cpp: https://github.com/bitcoin/bitcoin/blob/1cf73fb8eba3b89a30d5e943deaec5052a8b21b7/src/net_processing.cpp#L4218
Before announcing headers, we check both against the peer's context ("Does this header connect to the peer's known tip?") and our context ("Is this header part of the best chain?"). It wasn't a clear comment, so yes, either drop it or extend it?
I think drop it.
```diff
+    RecursiveMutex m_block_inv_mutex;
+    /** List of block ids we still have announce.
+     * There is no final sorting before sending, as they are always sent
+     * immediately and in the order requested. */
+    std::vector<uint256> vInventoryBlockToSend GUARDED_BY(m_block_inv_mutex);
+    /** List of headers to relay */
```
vBlockHashesToAnnounce is a list of block hashes, not headers? With regard to the renaming follow-up commit, I would lean towards the most verbose option m_blockids_for_inv_relay/m_blockids_for_headers_relay to denote that these vectors contain an identifier, even if the end goal is to relay the identified object (either a block or headers).
Ok, I'll update the comment to "List of block ids to announce headers for".
I would lean towards the most verbose option m_blockids_for_inv_relay/m_blockids_for_headers_relay
Sure. I'll update.
Just calls for better comments.
Rebased
Concept ACK
rebased
Concept ACK.
```diff
@@ -488,6 +488,14 @@ struct Peer {
     /** Whether this peer should be disconnected and marked as discouraged (unless it has the noban permission). */
     bool m_should_discourage GUARDED_BY(m_misbehavior_mutex){false};
 
+    /** Protects block inventory data members */
+    RecursiveMutex m_block_inv_mutex;
```
e04fb9a2df9f4d77c587bd25c608ce806c97ea45
Could using the RecursiveMutex type be avoided?
(Previously it was `Mutex cs_inventory;`.)
Good catch! cs_inventory was a RecursiveMutex when I opened this PR, and I missed the change when I rebased. Fixed.
:flushed:
```diff
+            peer_copies.push_back(peer.second);
+        }
+    }
+
+    for (auto&& peer : peer_copies) {
+        func(peer);
```
575c5f55cc44348f0ebc7740c90013b96bdb8c27
It is good for concurrency to limit the lock scope. But is it safe to call func on potentially stale peer_copies?
Yes. peer_copies holds shared_ptr references, so the objects pointed to can't be destructed while it's in scope.
I mean not language safety rather client behavior safety: if another thread removes a peer from the g_peer_map, can we rely on result of func(peer)?
Yes, because peer still exists in that case.
ACK 06841521613f1e7312f3f51597b43a20c193a363, tested on Linux Mint 20 (x86_64).
rebased
rebased
rebased
```diff
@@ -1542,6 +1582,7 @@ static void RelayAddress(const CAddress& addr, bool fReachable, const CConnman&
 
 void static ProcessGetBlockData(CNode& pfrom, const CChainParams& chainparams, const CInv& inv, CConnman& connman)
 {
+    PeerRef peer = GetPeerRef(pfrom.GetId());
```
In 8d3a068 I suppose this is placed at the top of the function, instead of where it is used 150 lines below, as you plan to add more uses of it to this function.
Correct. Future commits will use peer throughout the function, and there's no downside to holding on to the object for a bit longer.
```diff
@@ -913,6 +948,10 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
     LOCK(cs_main);
     int misbehavior{0};
     {
+        // We remove the Peer object from g_peer_map here, but it's not guaranteed that
+        // we'll destruct Peer here since another thread could potentially still hold
+        // a reference to the object. Be careful not to do any processing here that
+        // assumes Peer won't be changed before it's destructed.
         PeerRef peer = GetPeerRef(nodeid);
         assert(peer != nullptr);
```
In commit 85e7deb, perhaps go a bit further in the added comment to clarify that the refcount is often > 1 here.
In my testing, adding assert(peer.use_count() <= 1); after this line as suggested in #19607 (review) causes bitcoind on mainnet to halt nearly immediately after net processing begins and 1-2 peers start to connect.
Done
```diff
@@ -502,6 +519,24 @@ using PeerRef = std::shared_ptr<Peer>;
 Mutex g_peer_mutex;
 static std::map<NodeId, PeerRef> g_peer_map GUARDED_BY(g_peer_mutex);
 
+/** Get a reference to each Peer, then call a function on each of them */
+template<typename Callable>
+void ForEachPeer(Callable&& func)
```
cff5d286 What are your plans for ForEachPeer() in your roadmap: is it the end-game interface, or more of an intermediary step?
I don't have a strong opinion. I've done it this way here to minimize the difference in logic, but if people want to change the internal implementation later, I'm fine with that.
```diff
@@ -1148,6 +1147,8 @@ void RPCConsole::updateNodeDetail(const CNodeCombinedStats *stats)
         ui->peerCommonHeight->setText(QString("%1").arg(stats->nodeStateStats.nCommonHeight));
     else
         ui->peerCommonHeight->setText(tr("Unknown"));
+
+    ui->peerHeight->setText(QString::number(stats->nodeStateStats.m_starting_height));
```
9b14eb372 verified manually that this displays as before, as there is no GUI test coverage. Unfortunately, it is broken at this commit and returns only the initial value of -1 until 3 commits later at f27a1e221d.
Oops. I think this was a bad rebase. Should be fixed now.
```diff
         // banscore is deprecated in v0.21 for removal in v0.22
         obj.pushKV("banscore", statestats.m_misbehavior_score);
     }
     obj.pushKV("synced_headers", statestats.nSyncHeight);
     obj.pushKV("synced_blocks", statestats.nCommonHeight);
+    obj.pushKV("startingheight", statestats.m_starting_height);
```
9b14eb37 verified this functions as before per ./src/bitcoin-cli getpeerinfo | grep 'startingheight' | sort, as there appear to be no functional tests for getpeerinfo#startingheight. As above, it is broken at this commit and returns only the initial value of -1. Functionality returns 3 commits later at f27a1e221d.
As above, should now be fixed.
```diff
+    std::vector<uint256> m_blocks_for_inv_relay GUARDED_BY(m_block_inv_mutex);
+    /** List of headers to relay */
+    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
+
+    /** Starting height of this peer */
+    std::atomic<int> m_starting_height{-1};
```
9b14eb372 m_starting_height presumably still needs to be std::atomic_int?
If you retouch, perhaps s/height/block height/ for the Doxygen doc.
m_starting_height presumably still needs to be std::atomic_int?
I don't understand. It's a std::atomic<int> before and after this PR.
If you retouch, perhaps s/height/block height/ for the Doxygen doc.
I've updated the Doxygen comment to be clearer.
```diff
@@ -2334,6 +2375,7 @@ void PeerManager::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDat
         return;
     }
 
+    PeerRef peer = GetPeerRef(pfrom.GetId());
```
In general, is there a plan to eventually no longer need to set PeerRef peer = GetPeerRef(...) in many net processing methods?
Yes, potentially we could just pass a Peer& to many of the private functions (just as CNodeState* is passed to some of the functions now).
```diff
@@ -4196,7 +4239,7 @@ bool PeerManager::SendMessages(CNode* pto)
         // Try to find first header that our peer doesn't have, and
         // then send all headers past that one. If we come across any
         // headers that aren't on ::ChainActive(), give up.
-        for (const uint256 &hash : pto->vBlockHashesToAnnounce) {
+        for (const uint256 &hash : peer->m_blocks_for_headers_relay) {
```
f27a1e221 nit, while touching this and lines 4334 and 4366 (3 places), if so inclined:
```cpp
for (const uint256& hash : peer->m_blocks_for_headers_relay) {
```
done
```diff
@@ -940,8 +940,6 @@ class CNode
     mapMsgCmdSize mapRecvBytesPerMsgCmd GUARDED_BY(cs_vRecv);
 
 public:
-    uint256 hashContinue;
-    std::atomic<int> nStartingHeight{-1};
 
```
8d3a068 nit, remove extra line created by this deletion
done
```diff
+    /** List of headers to relay */
+    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
+
+    /** Starting height of this peer */
+    std::atomic<int> m_starting_height{-1};
+    /** The final block hash in a INV message. When the peer requests this
```
8d3a068 pico-nits, s/a INV/an INV/ and s/them/it|the peer/
fixed
```diff
@@ -1399,11 +1439,11 @@ void PeerManager::UpdatedBlockTip(const CBlockIndex *pindexNew, const CBlockInde
         }
     }
     // Relay inventory, but don't relay old inventory during initial block download.
-    m_connman.ForEachNode([nNewHeight, &vHashes](CNode* pnode) {
-        LOCK(pnode->cs_inventory);
-        if (nNewHeight > (pnode->nStartingHeight != -1 ? pnode->nStartingHeight - 2000 : 0)) {
+    ForEachPeer([nNewHeight, &vHashes](const PeerRef& peer) {
+        LOCK(peer->m_block_inv_mutex);
```
In 9b14eb37 is it safe to remove LOCK(pnode->cs_inventory); this early? Another lock doesn't replace it until 3 commits later in f27a1e22.
Oops. Bad rebase. Should be fixed now.
First pass through this follow-up to #19607. A few comments below; feel free to ignore the nits. My main concern is that several commits aren't hygienic as mentioned below and that we're missing a test that exhibits this without manually looking at getpeerinfo or the GUI peer window.
```diff
@@ -971,6 +1011,7 @@ bool GetNodeStateStats(NodeId nodeid, CNodeStateStats &stats) {
     PeerRef peer = GetPeerRef(nodeid);
     if (peer == nullptr) return false;
     stats.m_misbehavior_score = WITH_LOCK(peer->m_misbehavior_mutex, return peer->m_misbehavior_score);
+    stats.m_starting_height = peer->m_starting_height;
```
0a9839058c
```
net_processing.cpp: In function ‘bool GetNodeStateStats(NodeId, CNodeStateStats&)’:
net_processing.cpp:982:11: error: ‘struct CNodeStateStats’ has no member named ‘m_starting_height’; did you mean ‘nStartingHeight’?
  982 |     stats.m_starting_height = peer->m_starting_height;
      |           ^~~~~~~~~~~~~~~~~
      |           nStartingHeight
```
fixed
The new changes (git diff 8d3a068 1a9e191) look good. One rebase fixup in 0a9839058c.
commits f9136f4bf, 7199d81968, 8ff3e9319c11 and 40ea4a42d9
```
net_processing.cpp: In function ‘bool GetNodeStateStats(NodeId, CNodeStateStats&)’:
net_processing.cpp:982/1000/1009: error: ‘struct CNodeStateStats’ has no member named ‘nStartingHeight’; did you mean ‘m_starting_height’?
  982 |     stats.nStartingHeight = peer->nStartingHeight;
      |           ^~~~~~~~~~~~~~~
      |           m_starting_height
```
(we'll get there!)
Oops. Sorry Jon. Should be good now.
Rebased
Rebased
This is a slightly difficult rebase since the commits need to be structured a little differently to all build. Will work on it tomorrow.
Rebased. This is now ready for review.
Running script for: fe391ca326f73337275d726f7de9db384ac35466
```shell
sed -i 's/vBlockHashesToAnnounce/m_blocks_for_headers_relay/g' src/net_processing.cpp
sed -i 's/vInventoryBlockToSend/m_blocks_for_inv_relay/g' src/net_processing.cpp
```
```diff
diff --git a/src/net_processing.h b/src/net_processing.h
index 1ad40982b..4998a1e04 100644
--- a/src/net_processing.h
+++ b/src/net_processing.h
@@ -66,9 +66,9 @@ struct Peer {
     /** List of block ids we still have announce.
      * There is no final sorting before sending, as they are always sent
      * immediately and in the order requested. */
-    std::vector<uint256> m_blocks_for_inv_relay GUARDED_BY(m_block_inv_mutex);
+    std::vector<uint256> vInventoryBlockToSend GUARDED_BY(m_block_inv_mutex);
     /** List of headers to relay */
-    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
+    std::vector<uint256> vBlockHashesToAnnounce GUARDED_BY(m_block_inv_mutex);
     /** This peer's reported block height when we connected */
     std::atomic<int> m_starting_height{-1};
```
Failed
Thanks @MarcoFalke. Should be fixed now.
```diff
-        LOCK(pnode->cs_inventory);
-        for (const uint256& hash : reverse_iterate(vHashes)) {
-            pnode->vBlockHashesToAnnounce.push_back(hash);
+    {
+        LOCK(m_peer_mutex);
+        for (auto& it : m_peer_map) {
```
a50dee50fd2aee93b41bfb3b04b24939d0c2f6c1: ForEachNode performs an additional check, NodeFullyConnected (`pnode->fSuccessfullyConnected && !pnode->fDisconnect`).
Not sure if that matters much here. Could we prematurely send headers before VERACK?
This is an excellent observation. Block hashes could be pushed into m_blocks_for_headers_relay before the version/verack handshake is complete. However, we'll never send any block inventory before we're fully connected, due to the guard clause at the top of SendMessages().
I think this behaviour is ok. The version-verack handshake usually lasts no more than a second or two, and in the worst case times out after a minute, so we're never going to push many block hashes onto this vector.
a50dee50fd2aee93b41bfb3b04b24939d0c2f6c1
Any reason to not have a ForEachPeer member function?
I originally had one but removed it because it didn't seem worth it. We can revisit that decision later if it seems useful.
ForEachNode was useful because it allowed passing a lambda into CConnman which would be executed while the cs_vNodes lock was held. Here, the m_peer_mutex is in PeerMan, so it can be held directly.
tACK b5b27ea modulo NodeFullyConnected clarification
```diff
         // banscore is deprecated in v0.21 for removal in v0.22
         obj.pushKV("banscore", statestats.m_misbehavior_score);
     }
     obj.pushKV("synced_headers", statestats.nSyncHeight);
     obj.pushKV("synced_blocks", statestats.nCommonHeight);
+    obj.pushKV("startingheight", statestats.nStartingHeight);
```
4ac61d85f1624b1cd7af4a85c8bca0bd430801f0
Position of "startingheight" in RPCHelpMan should be changed accordingly.
Will fix.
Fixed
Not sure if it is done in the latest push (58f4c8567ba77fbdbd65a12c74628505aacd0d3c).
```diff
@@ -61,6 +61,15 @@ struct Peer {
     /** Whether this peer should be disconnected and marked as discouraged (unless it has the noban permission). */
     bool m_should_discourage GUARDED_BY(m_misbehavior_mutex){false};
 
+    /** Protects block inventory data members */
+    Mutex m_block_inv_mutex;
+    /** List of block ids we still have announce.
```
a50dee50fd2aee93b41bfb3b04b24939d0c2f6c1
I understand that comment text has been preserved, but maybe
```cpp
/** List of block ids we still have to announce.
```
?
Agree. Will fix.
Fixed
utACK b5b27ea052905ff7f26a962baa961fcc762c5ee0
Is m_hash_continue protected by any locks? A "this is only accessed by the singular net_processing thread" rationale would be sufficient, but if there happens to be a lock that's held in all access locations, perhaps it's easy to add?
ACK b5b27ea052905ff7f26a962baa961fcc762c5ee0, tested on Linux Mint 20 (x86_64), I've verified getpeerinfo RPC output, and detailed peer info in the "Peers" tab of the "Node window".
```diff
-            pnode->vBlockHashesToAnnounce.push_back(hash);
+    {
+        LOCK(m_peer_mutex);
+        for (auto& it : m_peer_map) {
+            Peer& peer = *it.second;
+            LOCK(peer.m_block_inv_mutex);
```
I think this is the first place where we are acquiring a lock while already holding m_peer_mutex. Any thoughts on what is the best way to document this design, so that future code writers know that acquiring m_peer_mutex while holding a peer's m_block_inv_mutex is not allowed?
Or, would it be better to first copy all the Peer objects to a temporary place, release the m_peer_mutex lock, and then grab the per-peer m_block_inv_mutex locks to avoid holding both at the same time? (I don't know or have an opinion; just trying to figure out the locking design here.)
Would a comment next to m_peer_mutex suffice here? This design (take m_peer_mutex first, then lock the mutexes protecting the individual data members) is analogous to what ForEachNode() was doing before (take cs_vNodes first, then lock the mutexes protecting the individual data members in the lambda, here cs_inventory).
Added comments to the first commit. Let me know if you think more is required.
```diff
@@ -72,6 +72,11 @@ struct Peer {
 
     /** This peer's reported block height when we connected */
     std::atomic<int> m_starting_height{-1};
+    /** The final block hash in an INV message. When the peer requests this
+     * block, we send a BLOCK message to trigger the peer to request the next
```
We send an INV message, not a BLOCK message.
Yes, will fix.
Done.
Concept ACK, just one question about locking and a doc nit.
Is m_hash_continue protected by any locks? A "this is only accessed by the singular net_processing thread" rationale would be sufficient, but if there happens to be a lock that's held in all access locations, perhaps it's easy to add? @sipa could we just make it a std::atomic? It doesn't need to be synchronized with any other data, so as long as we're guaranteeing that reads/writes are atomic, I think that's enough?
```diff
@@ -1843,7 +1844,8 @@ void PeerManager::ProcessHeadersMessage(CNode& pfrom, const std::vector<CBlockHe
         // Headers message had its maximum size; the peer may have more headers.
         // TODO: optimize: if pindexLast is an ancestor of ::ChainActive().Tip or pindexBestHeader, continue
         // from there instead.
-        LogPrint(BCLog::NET, "more getheaders (%d) to end to peer=%d (startheight:%d)\n", pindexLast->nHeight, pfrom.GetId(), pfrom.nStartingHeight);
+        PeerRef peer = GetPeerRef(pfrom.GetId());
```
4ac61d85f1624b1cd7af4a85c8bca0bd430801f0:
Can you explain why a nullptr-Assert is not needed here? Also, why isn't the peer passed in from outside, where it already exists?
Yes, passing in from outside is a much better idea. I've done that now.
```diff
+    Mutex m_block_inv_mutex;
+    /** List of block ids we still have announce.
+     * There is no final sorting before sending, as they are always sent
+     * immediately and in the order requested. */
+    std::vector<uint256> vInventoryBlockToSend GUARDED_BY(m_block_inv_mutex);
+    /** List of headers to relay */
```
a50dee50fd2aee93b41bfb3b04b24939d0c2f6c1
The list is unfiltered, as was mentioned in the previous comment. Any reason to remove that?
Ok, I've added the word unfiltered back.
```diff
@@ -66,9 +66,9 @@ struct Peer {
     /** List of block ids we still have announce.
      * There is no final sorting before sending, as they are always sent
      * immediately and in the order requested. */
-    std::vector<uint256> vInventoryBlockToSend GUARDED_BY(m_block_inv_mutex);
+    std::vector<uint256> m_blocks_for_inv_relay GUARDED_BY(m_block_inv_mutex);
     /** List of headers to relay */
-    std::vector<uint256> vBlockHashesToAnnounce GUARDED_BY(m_block_inv_mutex);
+    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
```
22c2b3eff601cbc1aed2e466502bdc5549e11ae2
I don't think this name is correct. Headers relay isn't implied. It might very well fall back to inv relay. Most importantly when the peer asked us to. Though, there are other reasons, too.
We hope to announce these blocks via headers. If we can't, we'll move the hashes into m_blocks_for_inv_relay and then announce via inv.
I've updated the comment. Let me know if you think it's clear enough now.
Thanks. Looks good now.
```diff
@@ -72,6 +72,11 @@ struct Peer {
 
     /** This peer's reported block height when we connected */
     std::atomic<int> m_starting_height{-1};
+    /** The final block hash in an INV message. When the peer requests this
```
b5b27ea052905ff7f26a962baa961fcc762c5ee0
I don't think this is true either. m_hash_continue is a purely internal helper, never sent over the wire directly. Instead, the tip hash is sent to trigger the peer to ask further.
I've updated the comment to be "The final block hash that we sent in an inv message to this peer." Let me know if you have suggestions for better wording.
Thanks. Looks good now.
review ACK b5b27ea052905ff7f26a962baa961fcc762c5ee0. I reviewed the code and it probably won't crash, but I had a few questions 🚟
Thanks for the review @MarcoFalke . There are enough review comments here that it makes sense to address them in this PR instead of a follow-up. It also looks to me like this isn't entirely a clean merge, so I'm going to rebase on master.
Suggested here:
- https://github.com/bitcoin/bitcoin/pull/19607#discussion_r467071878
- https://github.com/bitcoin/bitcoin/pull/19829#discussion_r546116786
Addressed all comments.
Is m_hash_continue protected by any locks? A "this is only accessed by the singular net_processing thread" rationale would be sufficient, but if there happens to be a lock that's held in all access locations, perhaps it's easy to add? @sipa I've added a commit to guard m_hash_continue with m_block_inv_mutex.
```diff
@@ -1764,7 +1775,9 @@ void PeerManager::SendBlockTransactions(CNode& pfrom, const CBlock& block, const
     m_connman.PushMessage(&pfrom, msgMaker.Make(nSendFlags, NetMsgType::BLOCKTXN, resp));
 }
 
-void PeerManager::ProcessHeadersMessage(CNode& pfrom, const std::vector<CBlockHeader>& headers, bool via_compact_block)
+void PeerManager::ProcessHeadersMessage(CNode& pfrom, const Peer& peer,
+                                        const std::vector<CBlockHeader>& headers,
+                                        bool via_compact_block)
```
2f023a8 nit, the diff would be clearer without the line break changes, and I also find the code less readable here than on one line
This is permitted by our clang formatter. I find it easier to read lines that are less than 100 columns, so I think this is a matter of taste.
```diff
@@ -1854,7 +1867,8 @@ void PeerManager::ProcessHeadersMessage(CNode& pfrom, const std::vector<CBlockHe
         // Headers message had its maximum size; the peer may have more headers.
         // TODO: optimize: if pindexLast is an ancestor of ::ChainActive().Tip or pindexBestHeader, continue
         // from there instead.
-        LogPrint(BCLog::NET, "more getheaders (%d) to end to peer=%d (startheight:%d)\n", pindexLast->nHeight, pfrom.GetId(), pfrom.nStartingHeight);
+        LogPrint(BCLog::NET, "more getheaders (%d) to end to peer=%d (startheight:%d)\n",
+                 pindexLast->nHeight, pfrom.GetId(), peer.m_starting_height);
```
2f023a8 nit, as above, easier diff to review and more readable without the added line break
Again, this is personal taste, so I'm going to resolve this.
```diff
@@ -180,7 +202,9 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
 
     void ProcessOrphanTx(std::set<uint256>& orphan_work_set) EXCLUSIVE_LOCKS_REQUIRED(cs_main, g_cs_orphans);
     /** Process a single headers message from a peer. */
-    void ProcessHeadersMessage(CNode& pfrom, const std::vector<CBlockHeader>& headers, bool via_compact_block);
+    void ProcessHeadersMessage(CNode& pfrom, const Peer& peer,
```
2f023a8 nit, as above, easier diff to review and more readable here without the line break changes
as above
```diff
         // banscore is deprecated in v0.21 for removal in v0.22
         obj.pushKV("banscore", statestats.m_misbehavior_score);
     }
     obj.pushKV("synced_headers", statestats.nSyncHeight);
     obj.pushKV("synced_blocks", statestats.nCommonHeight);
+    obj.pushKV("startingheight", statestats.m_starting_height);
```
In 2f023a8c0e445985a20, startingheight should remain before synced_headers and synced_blocks (here and in the help) for consistency with the GUI peers info (or maybe less preferably, the GUI should be updated to follow this change)
done
```diff
+    std::vector<uint256> m_blocks_for_headers_relay GUARDED_BY(m_block_inv_mutex);
+    /** The final block hash that we sent in an `inv` message to this peer.
+     * When the peer requests this block, we send an `inv` message to trigger
+     * the peer to request the next sequence of block hashes.
+     * Most peers use headers-first syncing, which doesn't use this mechanism */
+    uint256 m_hash_continue GUARDED_BY(m_block_inv_mutex) {};
```
25ccfd3b while renaming hashContinue, I wonder if it can be better named (but don't have one to propose)
Changed to m_continuation_block
ACK 9aad3e4f3347990641da23c0d9a3f05e1669cabf modulo the RPC fields/help order (see comment below)
Concept ACK
```diff
@@ -46,6 +46,8 @@ struct CNodeStateStats {
  * Memory is owned by shared pointers and this object is destructed when
  * the refcount drops to zero.
  *
+ * Mutexes inside this struct must not be held when locking m_peer_mutex.
```
717a374e74b64b7b90bc1b2995e8900212bd0bfe
nit to avoid the inversion and give a rationale:
"m_peer_mutex should be locked before locking a mutex inside this struct to avoid lock order violations"
This will also weaken the requirement, because for some mutexes it could make sense to not require m_peer_mutex beforehand.
That isn't precise enough. It's fine to lock m_peer_mutex, take a shared_ptr to one of the Peer objects, release m_peer_mutex and later lock one of the inner mutexes. The only rule is that we don't want to lock m_peer_mutex while already holding a lock on one of the inner mutexes.
This is the same as cs_vNodes and the inner mutexes in CNode.
Ah right. I didn't parse this correctly. Thanks for the explanation.
```diff
@@ -210,7 +212,8 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
      * on extra block-relay-only peers. */
     bool m_initial_sync_finished{false};
 
-    /** Protects m_peer_map */
+    /** Protects m_peer_map. This mutex must not be locked while holding a lock
+     * on any of the mutexes inside a Peer object. */
```
same
as above
```diff
-        // wait for other stuff first.
-        std::vector<CInv> vInv;
-        vInv.push_back(CInv(MSG_BLOCK, ::ChainActive().Tip()->GetBlockHash()));
-        connman.PushMessage(&pfrom, msgMaker.Make(NetMsgType::INV, vInv));
-        peer.m_hash_continue.SetNull();
+        LOCK(peer.m_block_inv_mutex); {
```
What syntax is this? I don't think we use LOCK(); { ... } anywhere in the codebase
Also, doesn't this lock violate the documentation you added in the first commit?
This is my brain not working properly. I've fixed the syntax.
This doesn't violate the documentation. That says that it's not permitted to lock m_peer_mutex while holding a lock on an inner mutex. This is locking an inner mutex.
review ACK 9aad3e4f33 🔱
Changes since previous review:
* Add lock-order docs
* Pass in Peer& to ProcessHeadersMessage
* Shuffle order in RPC doc
* Rebase
* Doc fixups
* New commit with a lock
Not done as a scripted diff to avoid misnaming the local variable in
ProcessMessage().
```shell
-BEGIN VERIFY SCRIPT-
sed -i 's/vBlockHashesToAnnounce/m_blocks_for_headers_relay/g' src/net_processing.*
sed -i 's/vInventoryBlockToSend/m_blocks_for_inv_relay/g' src/net_processing.*
-END VERIFY SCRIPT-
```
Also rename to m_continuation_block to better communicate meaning.
Thanks for the latest reviews @jonatack and @MarcoFalke. I've addressed/responded to all of your comments.
I'd like to stabilize this PR now unless any functional issues are found. We have ACKs on the code change at various times from @MarcoFalke, @jonatack, @hebasto, @sipa and @Sjors. If there are any further style suggestions, I can address them in the next PR in this series.
re-ACK 3002b4af2b4fde63026f8f7c575452a5c989c662 🌓
only changes:
* Move RPC help once more
* Fix LOCK scope {} nit
* Rename a member
re-ACK 3002b4af2b4fde63026f8f7c575452a5c989c662 per git diff 9aad3e4 3002b4a
Code review re-ACK 3002b4af2b4fde63026f8f7c575452a5c989c662