This continues the work of moving application layer data into net_processing, by moving all tx data into the new Peer object added in #19607.
For motivation, see #19398.
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--174a7506f384e20aa4161008e828411d-->
Reviewers, this pull request conflicts with the following ones:
cs_tx_inventory with Mutex and rename it by w0xlt)If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Rebase resulted in silent merge conflict with 9476886353dffb730dcb75799f2bd5e143425795. I'll fix tomorrow.
I've rebased this and moved it out of draft. It's now ready for review.
572 | - /** Minimum fee rate with which to filter inv's to this node */ 573 | - std::atomic<CAmount> minFeeFilter{0}; 574 | - CAmount lastSentFeeFilter{0}; 575 | - std::chrono::microseconds m_next_send_feefilter{0}; 576 | - }; 577 | + /** Whether this peer asked us to relay transactions. This only changes from
/** Whether we should relay transactions to this peer (he asked us to in his version message and this is not a `ConnectionType::BLOCK_RELAY` connection). This only changes from
That's better. I've updated the comment.
564 | @@ -565,12 +565,6 @@ void CNode::copyStats(CNodeStats &stats, const std::vector<bool> &m_asmap)
565 | X(addrBind);
(comment on irrelevant line)
In the commit message of 465e9da51c56beaa97f5ffcdf2132471b591a8d5 [net] Add CNode.m_relays_txs and CNode.m_bloom_filter_loaded:
This is currently redundant information with m_tx_relay->fRelayTxes,
- but m_tx_relay is moved into net_processing, then we'll need these
+ but when m_tx_relay is moved into net_processing, then we'll need these
separate fields in CNode.
Thanks! Fixed.
975 | HasAllDesirableServiceFlags(node->nServices), 976 | - peer_relay_txes, peer_filter_not_null, node->nKeyedNetGroup, 977 | - node->m_prefer_evict, node->addr.IsLocal()}; 978 | + node->m_relays_txs.load(), node->m_bloom_filter_loaded.load(), 979 | + node->nKeyedNetGroup, node->m_prefer_evict, node->addr.IsLocal()}; 980 | vEvictionCandidates.push_back(candidate);
This can be reduced to just:
vEvictionCandidates.push_back(*node);
By using a conversion function from CNode to NodeEvictionCandidate.
Maybe out of the scope of this PR.
Seems reasonable, but I agree it's out of scope for this PR.
1517 | - pnode->PushTxInventory(wtxid); 1518 | - } else { 1519 | - pnode->PushTxInventory(txid); 1520 | + LOCK(m_peer_mutex); 1521 | + for(auto& it : m_peer_map) { 1522 | + Peer& peer = *it.second;
for (auto& [node_id, peer] : m_peer_map) {
That means node_id is unused, which causes warnings depending on your compiler flags.
Which compiler flags produce a warning? I don't get any with either clang 12 or gcc 10:
std::map<int, int> m;
for (const auto& [x, y] : m) { // no warning
std::cout << x;
}
for (const auto& [x, y] : m) { // warning: unused structured binding declaration [-Wunused-variable]
}
Hmmm, if you search for "c++ structured binding unused variable", there do seem to be a lot of people claiming that they've encountered this warning. Like you, I've been unable to reproduce it, either locally or on godbolt.org with various compiler versions.
In any case, I'm going to leave this as it is, since it's only one more line and no less clear to explicitly declare the Peer object. Thank you for pointing this out though!
1022 | @@ -983,6 +1023,7 @@ void PeerManagerImpl::FinalizeNode(const CNode& node, bool& fUpdateConnectionTim 1023 | PeerRef peer = RemovePeer(nodeid); 1024 | assert(peer != nullptr); 1025 | misbehavior = WITH_LOCK(peer->m_misbehavior_mutex, return peer->m_misbehavior_score); 1026 | + m_wtxid_relay_peers -= peer->m_wtxid_relay;
This was moved from below without the accompanying assert, better move the assert with it.
m_wtxid_relay_peers -= peer->m_wtxid_relay;
assert(m_wtxid_relay_peers >= 0);
Done.
Now the move of the decrement happens in commit
[net processing] Move m_wtxid_relay to Peer
while the move of the assert happens in
[net processing] Move tx relay data to Peer.
I think both moves belong to the first commit.
Oops. Now fixed. Thank you!
1046 | @@ -1006,7 +1047,6 @@ void PeerManagerImpl::FinalizeNode(const CNode& node, bool& fUpdateConnectionTim 1047 | assert(nPeersWithValidatedDownloads >= 0); 1048 | m_outbound_peers_with_protect_from_disconnect -= state->m_chain_sync.m_protect; 1049 | assert(m_outbound_peers_with_protect_from_disconnect >= 0); 1050 | - m_wtxid_relay_peers -= state->m_wtxid_relay; 1051 | assert(m_wtxid_relay_peers >= 0);
This assert can/should be moved up, where the decrement was moved to.
Done.
2644 | @@ -2597,9 +2645,8 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type, 2645 | return; 2646 | } 2647 | if (pfrom.GetCommonVersion() >= WTXID_RELAY_VERSION) { 2648 | - LOCK(cs_main); 2649 | - if (!State(pfrom.GetId())->m_wtxid_relay) { 2650 | - State(pfrom.GetId())->m_wtxid_relay = true; 2651 | + if (!peer->m_wtxid_relay) { 2652 | + peer->m_wtxid_relay = true; 2653 | m_wtxid_relay_peers++;
Just a note:
Before this change both CNodeState::m_wtxid_relay and PeerManagerImpl::m_wtxid_relay_peers would have been modified under cs_main. After this change Peer::m_wtxid_relay and PeerManagerImpl::m_wtxid_relay_peers are not anymore modified together under cs_main.
This means that after peer->m_wtxid_relay = true and before m_wtxid_relay_peers++ the system is in an observable state where the sum of all Peer::m_wtxid_relay does not equal PeerManagerImpl::m_wtxid_relay_peers.
I think this is ok - couldn't find any code that would be upset by that.
Yes, that's a good observation. I agree that this is totally fine, and the fields don't need to be atomically updated under cs_main.
249 | @@ -217,7 +250,10 @@ struct Peer { 250 | /** Work queue of items requested by this peer **/ 251 | std::deque<CInv> m_getdata_requests GUARDED_BY(m_getdata_requests_mutex); 252 | 253 | - explicit Peer(NodeId id) : m_id(id) {} 254 | + explicit Peer(NodeId id, bool blocks_only)
explicit can be removed because now we have 2 arguments.
Nice! Removed.
647 | +{ 648 | + if (peer.m_tx_relay != nullptr) { 649 | + LOCK(peer.m_tx_relay->m_tx_inventory_mutex); 650 | + peer.m_tx_relay->m_tx_inventory_known_filter.insert(hash); 651 | + } 652 | +}
Why not a Peer method?
I'm trying to keep Peer as a data structure without any internal logic.
1518 | - } else { 1519 | - pnode->PushTxInventory(txid); 1520 | + LOCK(m_peer_mutex); 1521 | + for(auto& it : m_peer_map) { 1522 | + Peer& peer = *it.second; 1523 | + if (!peer.m_tx_relay) return;
nit: I think the rest of the code around m_tx_relay uses == nullptr or != nullptr.
We don't have a project style on whether to compare to nullptr or use a pointer's bool conversion. Both are fine.
4505 | - pto->m_tx_relay->nNextInvSend = m_connman.PoissonNextSendInbound(current_time, INBOUND_INVENTORY_BROADCAST_INTERVAL); 4506 | + peer->m_tx_relay->m_next_inv_send_time = m_connman.PoissonNextSendInbound(current_time, INBOUND_INVENTORY_BROADCAST_INTERVAL); 4507 | } else { 4508 | - pto->m_tx_relay->nNextInvSend = PoissonNextSend(current_time, OUTBOUND_INVENTORY_BROADCAST_INTERVAL); 4509 | + // Use half the delay for outbound peers, as there is less privacy concern for them. 4510 | + peer->m_tx_relay->m_next_inv_send_time = PoissonNextSend(current_time, OUTBOUND_INVENTORY_BROADCAST_INTERVAL);
This comment // Use half... was removed from master when {INBOUND,OUTBOUND}_INVENTORY_BROADCAST_INTERVAL were introduced. I think it better be removed because it implies OUTBOUND is INBOUND/2.
Oops! Thank you. Removed.
1113 | @@ -1114,7 +1114,6 @@ void RPCConsole::updateDetailWidget() 1114 | peerAddrDetails += "<br />" + tr("via %1").arg(QString::fromStdString(stats->nodeStats.addrLocal)); 1115 | ui->peerHeading->setText(peerAddrDetails); 1116 | ui->peerServices->setText(GUIUtil::formatServicesStr(stats->nodeStats.nServices)); 1117 | - ui->peerRelayTxes->setText(stats->nodeStats.fRelayTxes ? ts.yes : ts.no);
This change would only set the text if stats->fNodeStateStatsAvailable is true and will leave it "uninitialized" otherwise. What would be displayed in that case? Empty? Should something like n/a be displayed?
This looks like a regression in the UI - previously that text would always have been set (it was always known) and now - not anymore.
In reality, I think fNodeStateStatsAvailable will almost always be true, so this isn't a big issue (if it was, then the other stats in nodeStateStats not being available would also be an issue).
192 | @@ -193,7 +193,6 @@ static RPCHelpMan getpeerinfo() 193 | } 194 | obj.pushKV("services", strprintf("%016x", stats.nServices)); 195 | obj.pushKV("servicesnames", GetServicesNames(stats.nServices)); 196 | - obj.pushKV("relaytxes", stats.fRelayTxes);
Similarly to the UI - the fields relaytxes and minfeefilter would have always been present before and now - only if fStateStats is true.
As above for the gui peer console.
It seems sub-optimal that new fields are added - CNode::m_relays_txs and CNode::m_bloom_filter_loaded that are redundant/duplicates of Peer::TxRelay::m_relay_txs and Peer::TxRelay::m_bloom_filter != nullptr. This could be confusing and leaves a chance that in the future they go out of sync. Can the duplicates be avoided somehow?
There are two user-visible changes that may be considered as regressions in the UI and in RPC.
Rebased
Doesn't compile
Thanks Marco. Bad rebase should be fixed now.
It seems sub-optimal that new fields are added - CNode::m_relays_txs and CNode::m_bloom_filter_loaded that are redundant/duplicates of Peer::TxRelay::m_relay_txs and Peer::TxRelay::m_bloom_filter != nullptr. This could be confusing and leaves a chance that in the future they go out of sync. Can the duplicates be avoided somehow?
That does seem a bit of a shame, but note that even though the data seems redundant, it's used for different purposes in net_processing and net. TxRelay in net_processing is data that is used in relaying transactions to that peer. m_relay_txs and m_bloom_filter in net are simply metrics used in the eviction logic. net_processing informs net of any events that might be used in net's eviction logic (e.g. min ping time, last tx received, last block received), and these new metrics (m_relay_txs and m_bloom_filter) are no different from those.
Thanks for the re-review @vasild. I've updated the commits to address #21160 (review).
208 | @@ -209,6 +209,39 @@ struct Peer { 209 | /** Whether a ping has been requested by the user */ 210 | std::atomic<bool> m_ping_queued{false}; 211 | 212 | + /** Whether this peer relays txs via wtxid */ 213 | + bool m_wtxid_relay{false};
I thought this could be const because you'd get the info from the VERSION and it wouldn't ever change, but then I realized the Peer struct is created before then, so it couldn't be const... is that accurate?
Correct. The Peer object is constructed before we receive the wtxidrelay message, so we can't make this const.
Rebased
rebased
Rebased
250 | @@ -218,7 +251,10 @@ struct Peer { 251 | /** Work queue of items requested by this peer **/ 252 | std::deque<CInv> m_getdata_requests GUARDED_BY(m_getdata_requests_mutex); 253 | 254 | - explicit Peer(NodeId id) : m_id(id) {} 255 | + Peer(NodeId id, bool blocks_only)
Why'd you delete the explicit?
No good reason. I've put it back.
explicit prevents implicit conversions for single-argument constructors. After the change above, the constructor has 2 arguments. See https://stackoverflow.com/a/121163
explicit will also prevent implicit conversion for multiple-argument constructors. Is there a reason why {int,bool} should implicitly be converted into Peer?
Was this addressed?
fixed now
1528 | - } else { 1529 | - pnode->PushTxInventory(txid); 1530 | + LOCK(m_peer_mutex); 1531 | + for(auto& it : m_peer_map) { 1532 | + Peer& peer = *it.second; 1533 | + if (!peer.m_tx_relay) return;
Why is this return instead of break? Maybe I'm misunderstanding what this is doing, but I thought that we'd continue iterating through the others to relay to them, but returning seems like we're just skipping over the rest?
Yikes. Good catch. In a previous iteration of this PR, I had a ForEachPeer() helper that took a lambda, and this return was effectively a continue (the lambda would return and then be called for the remaining peers). Now fixed!
Thanks for the review @glozow. I've rebased on master and addressed your review comments.
#21527 moves around the locks in net_processing, which interacts a bit with how this PR moves m_wtxid_relay around. I think it makes sense to focus on merging that PR first, since that could simplify the locking here a bit.
I'd rebased this on top of some of the work in #21527, since they conflicted a little. However, it seems that 21527 has now been abandoned, so I've removed that commit and marked this PR ready for review again. If 21527 does get picked up again and merged, then it should be easy enough to rebase this on top of it.
4766 | // Time to send but the peer has requested we not relay transactions.
4767 | if (fSendTrickle) {
4768 | - LOCK(pto->m_tx_relay->cs_filter);
4769 | - if (!pto->m_tx_relay->fRelayTxes) pto->m_tx_relay->setInventoryTxToSend.clear();
4770 | + LOCK(peer->m_tx_relay->m_bloom_filter_mutex);
4771 | + if (!peer->m_tx_relay->m_relay_txs) peer->m_tx_relay->m_tx_inventory_to_send.clear();
I have a question about this code (not related to this PR though) - why don't we lock m_tx_inventory_mutex here?
Hmmm, actually m_tx_inventory_mutex is held at this point (see line 4747 above):
if (peer->m_tx_relay != nullptr) {
LOCK(peer->m_tx_relay->m_tx_inventory_mutex);
...
In fact, all access of m_tx_inventory_to_send (and setInventoryTxToSend in master) is guarded by m_tx_inventory_mutex (cs_tx_inventory in master). It looks like an oversight in #13123 that a lock annotation wasn't added to setInventoryTxToSend. It could also be added to m_next_inv_send_time (nNextInvSend in master).
We could add those annotations in a follow-up, since this branch is simply moving the data members and not changing any behavior here. I also have a separate follow-up branch that rationalizes these mutexes to a single mutex.
Rebased on master
3204 | @@ -3212,6 +3205,8 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type, 3205 | pfrom.AddKnownTx(txid); 3206 | } 3207 | 3208 | + LOCK2(cs_main, g_cs_orphans); 3209 | +
In 203bfe156b0a25a36c19fba397c8b2577bab1e9b: Noted that the line LOCK2(cs_main, g_cs_orphans) was moved down a few lines, and my reasoning for why this is okay was:
peer->m_wtxid_relay, wouldn't change, and either way wouldn't require cs_main or g_cs_orphanscs_tx_inventory and that's grabbed within AddKnownTxRight. This LOCK(cs_main) was required to access the CNodeState in the line below. Now that m_wtxid_relay is in Peer, we don't need to hold cs_main until a little bit later (when we access m_txrequest).
1194 | @@ -1199,6 +1195,8 @@ void PeerManagerImpl::FinalizeNode(const CNode& node) 1195 | PeerRef peer = RemovePeer(nodeid); 1196 | assert(peer != nullptr); 1197 | misbehavior = WITH_LOCK(peer->m_misbehavior_mutex, return peer->m_misbehavior_score); 1198 | + m_wtxid_relay_peers -= peer->m_wtxid_relay;
In 203bfe156b0a25a36c19fba397c8b2577bab1e9b: not changed in this PR, but it's odd to me that we're subtracting a bool from an int? Why isn't it if (peer->m_wtxid_relay) m_wtxid_relay_peers -= 1; ?
I agree that would be clearer.
234 | // a) it allows us to not relay tx invs before receiving the peer's version message 235 | // b) the peer may tell us in its version message that we should not relay tx invs 236 | // unless it loads a bloom filter. 237 | - bool fRelayTxes GUARDED_BY(cs_filter){false}; 238 | - std::unique_ptr<CBloomFilter> pfilter PT_GUARDED_BY(cs_filter) GUARDED_BY(cs_filter){nullptr}; 239 | + bool m_relay_txs GUARDED_BY(m_bloom_filter_mutex){false};
In 2533ea5161d2cb15b3a43e0fea9b146f525aea97
Noting that we have 3 variables named m_relay_txs: in NodeEvictionCandidate, Peer.TxRelay, and CNodeStateStats. And we have a CNode.m_relays_txs (very similar but not same name) where CNode.m_relays_txs == Peer.m_tx_relay.m_relay_txs always, and they are not the same thing as Peer.m_tx_relay != nullptr.
Yes, this is slightly unfortunate. Please let me know if you have suggestions for improving the naming.
Is there a reason for the bool to exist? What would happen if we delayed initialization of the whole txrelay struct to when the version message is received?
Something similar is done for address relay, IIRC.
@MarcoFalke yes, that's the plan. This PR is intended to be a simple refactor, and delaying initialization of the TxRelay struct requires more focused discussion so should be done in a separate PR.
I've opened PR #22778, which implements delaying constructing m_tx_relay until the version message is received. Note however that m_relay_txs is still required - a peer that we've offered NODE_BLOOM services to may send us a version with fRelay=0, but later request that we start announcing transactions by sending filterload or filterclear, so we need to initialize the TxRelay struct during version handling but use m_relay_txs to gate whether we announce transactions.
I'm going to mark this conversation as resolved, but please let me know in #22778 if you have any other thoughts/feedback.
code review ACK 2533ea5161d2cb15b3a43e0fea9b146f525aea97
1705 | @@ -1666,22 +1706,17 @@ void PeerManagerImpl::SendPings() 1706 | 1707 | void PeerManagerImpl::RelayTransaction(const uint256& txid, const uint256& wtxid) 1708 | { 1709 | - WITH_LOCK(cs_main, _RelayTransaction(txid, wtxid);); 1710 | -} 1711 | - 1712 | -void PeerManagerImpl::_RelayTransaction(const uint256& txid, const uint256& wtxid)
Commit 203bfe156b0a25a36c19fba397c8b2577bab1e9b [net processing] Move m_wtxid_relay to Peer introduces a usage of CConnMan::ForEachNode() with a lambda that returns true and false, however it should have a return type of void.
This is later removed in a subsequent commit.
fixed.
2533ea5161d2cb15b3a43e0fea9b146f525aea97 looks fine on the low level.
It is best if internal refactors like this one do not introduce user visible changes (even less if those changes look like "regressions" -- missing fields whereas fields were present before) or code smells like the same data being stored in two different places (that can go out of sync). See an earlier comment.
While those may look minor, it is difficult for me to judge whether the benefits of this PR outweight the above.
Wrt the missing fields in getpeerinfo: I think that may happen (fStateStats be false) at least in the case where the node is removed after CConnMan::GetNodeStats() is called, maybe in other cases too.
Thanks for the review @vasild. I understand your arguments about the extra fields in CNode and removing the stats from the GUI and RPC when they're not available. I don't have anything to add beyond my responses from the first time you raised those points.
Thanks for the review @vasild. I understand your arguments about the extra fields in
CNodeand removing the stats from the GUI and RPC when they're not available.
(If helpful, mentioning #21160 (review) here, as that thread is closed as resolved.)
Concept ACK. This will simplify -nomempool
re ACK a192c23b2cb45e7c9a06bb00714347ef38df701f via git range-diff 2533ea5...a192c23, no overall changes, diff moved between 2 commits.
rebased
concept ACK
rebased
reACK 34bd2711e7ffa0d5a7b563d15d79f8ae1d8cef43 via git range-diff a192c23...34bd271
Concept ACK.
2592 | @@ -2593,6 +2593,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type, 2593 | if (pfrom.m_tx_relay != nullptr) { 2594 | LOCK(pfrom.m_tx_relay->cs_filter); 2595 | pfrom.m_tx_relay->fRelayTxes = fRelay; // set to true after we get the first filter* message 2596 | + if (fRelay) pfrom.m_relays_txs = true;
nit bc3bc6ceb7265b8ea9f530066e6b79c75881f2e4: Is there a reason to not use the shorter and more clear version pfrom.m_relays_txs = fRelay?
Because of the comment for m_relay_txs in net.h:
/** ... This only changes from false to true. It will never change
* back to false. Used only in inbound eviction logic. */
This doesn't work:
pfrom.m_relays_txs &= fRelay;
since the &= operator isn't defined for std::atomic<bool>. I could use:
pfrom.m_relays_txs = pfrom.m_relay_txs && fRelay;
but I don't think that's an improvement.
pfrom.m_relays_txs = pfrom.m_relay_txs && fRelay; could change it from true to false.
I think the comment is a good justification to keeping the code this way. The only problem I can see is the temptation of future devs to change this.
pfrom.m_relays_txs = pfrom.m_relay_txs && fRelay; could change it from true to false.
:man_facepalming: You're right of course. I meant pfrom.m_relay_txs || fRelay;
Setting these values should be fine outside of the lock, right?
Can you move this out of the lock scope to make that clear?
222 | @@ -223,6 +223,9 @@ struct Peer { 223 | /** Whether a ping has been requested by the user */ 224 | std::atomic<bool> m_ping_queued{false}; 225 | 226 | + /** Whether this peer relays txs via wtxid */ 227 | + bool m_wtxid_relay{false};
c46d1f5eef34ec210a21d4700d6f2d4ca0b6941a: Is there some intuition available why this doesn't need any Mutex or atomic? The other members seem to have one.
No reason. I've updated this to be an atomic.
212 | node.nVersion = version; 213 | node.SetCommonVersion(std::min(version, PROTOCOL_VERSION)); 214 | } 215 | - if (node.m_tx_relay != nullptr) { 216 | - LOCK(node.m_tx_relay->cs_filter); 217 | - node.m_tx_relay->fRelayTxes = filter_txs;
eeb0e84fc5c259dfd00f7673ad4a51582b100282: Would be nice to not break the fuzz test. Maybe a test-only mock method on PeerMan can do this?
Does this break the fuzz test? fRelayTxes (m_relay_txs after this PR) will get set after the node receives and processes the version message. Presumably the fuzzer will be able to set the fRelay field on that message to true or false.
There is only one fuzz test to send more than one message type in one session, so this will break all other fuzz tests that only send one message type.
That seems like a bad approach to me. There's all kinds of logic that happens inside the version and verack handling. Skipping over that by reaching into PeerManager and CConnman internals and updating individual fields means that the CNode, CNodeState and Peer objects are only partially initialized when you send in the fuzzed network messages.
There shouldn't be any fields that are left uninitialized or partially initialized.
If you don't like reaching into internals, the only alternative I see is sending the fields that are set here in p2p messages.
There shouldn't be any fields that are left uninitialized or partially initialized.
Here's one example. CNode.fClient is set here:
depending on the nServices field in the version message. FillNode() in fuzz/util.cpp will never set that field to true, even if (!(nServices & NODE_NETWORK) && !(nServices & NODE_NETWORK_LIMITED)). That means that the CNode object created by the fuzz test setup can be in a state that's impossible for it to be in the running software.
Thanks, fixed in https://github.com/bitcoin/bitcoin/pull/23575
313 | @@ -284,8 +314,9 @@ struct Peer { 314 | /** Work queue of items requested by this peer **/ 315 | std::deque<CInv> m_getdata_requests GUARDED_BY(m_getdata_requests_mutex); 316 | 317 | - explicit Peer(NodeId id) 318 | + Peer(NodeId id, bool blocks_only)
eeb0e84fc5c259dfd00f7673ad4a51582b100282: Would be nice to rename this from blocks_only to initialize_tx_relay (or similar). blocks_only is confusingly similar named to the setting -blocksonly, but it really means block-relay-only.
Moreover, the details why tx relay isn't initialized here don't matter here, so a bool with a more general name seems fine.
Done
Obviously that means you'll have to toggle the code after renaming the symbol ;)
1324 | @@ -1286,6 +1325,11 @@ bool PeerManagerImpl::GetNodeStateStats(NodeId nodeid, CNodeStateStats& stats) c 1325 | ping_wait = GetTime<std::chrono::microseconds>() - peer->m_ping_start.load(); 1326 | } 1327 | 1328 | + if (peer->m_tx_relay != nullptr) { 1329 | + stats.fRelayTxes = WITH_LOCK(peer->m_tx_relay->cs_filter, return peer->m_tx_relay->fRelayTxes); 1330 | + stats.minFeeFilter = peer->m_tx_relay->minFeeFilter.load(); 1331 | + }
eeb0e84fc5c259dfd00f7673ad4a51582b100282: any reason to make this code more fragile by removing the else in the move? The else used to set those to false, and 0, respectively.
I don't think I agree that this is more fragile. The default is that there connection isn't being used to relay transactions, and the feefilter is set to zero. Those stats are only updated if the node is relaying transactions/has set a feefilter.
I've reverted this to use if/else and remove the default initialization, since that's closer to the current code.
28 | @@ -29,6 +29,8 @@ struct CNodeStateStats { 29 | int m_starting_height = -1; 30 | std::chrono::microseconds m_ping_wait; 31 | std::vector<int> vHeightInFlight; 32 | + bool fRelayTxes{false}; 33 | + CAmount minFeeFilter{0};
eeb0e84fc5c259dfd00f7673ad4a51582b100282: Why not -1 (an impossible number), like the height values above? Or maybe leave it uninitialized to make memory sanitizers more useful?
Concept ACK 34bd2711e7ffa0d5a7b563d15d79f8ae1d8cef43 🤺
<details><summary>Show signature and timestamp</summary>
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Concept ACK 34bd2711e7ffa0d5a7b563d15d79f8ae1d8cef43 🤺
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUi+Jwv/Ur+qsT+NGY4mzH2EdkIAAFERJYtimCUi2QIug57wZnFMdbuGmmQ1rLH8
7VdAD3sG1SLDH76CkToopAYidMgxwpX2I07fV9YY+Rf3V3brS+aSSBJP0jLH0c2e
lHfrOA+kwK6L9tEBuODUBlAM3oKglqa8R50SUha9EHZ+dQGnkNufNtH7PA9XEBuS
HSZU4w2+u6j5IHvN94lgIe+HdympyeMK+JRvzOYZrX5IThzhZ3k1Ho5aoJtjv60b
FIPZ/oskBKG1dlnQU02OPZLwyjXOBAvxfiXpNHGEiBrUFQEmgLbldPXJsBGZtQv/
XN5S7YvSSpKSv7dMeGKUxbjesA5WYJH5/f5vzb2LADPhpdNIqbHn6aL/60mEKqpq
2++CjN5J7o9mUoGgVOlH2rHe/x8gu8kaSBRhLnDI7kkIjbxCfLJQ2nR3hjxo5URh
N602XW1Uxmos1hl0JsMRZ6GTzT5Zw9lPyA4q/JDqcirR0+RnUTBZStvAeAQhF4+B
lgygIrVL
=+2rA
-----END PGP SIGNATURE-----
Timestamp of file with hash 98a6c2ba1a3b3b76dacfb7357c8a39198b4cbb57741f9a173f774636f7632c66 -
</details>
Are you still working on this?
Are you still working on this? @MarcoFalke
Yes. Apologies for the delay. I do intend to rebase and address your review comments soon.
Thanks for the review @MarcoFalke. I've addressed your review comments and rebased on master.
556 | @@ -557,6 +557,16 @@ class CNode 557 | // m_tx_relay == nullptr if we're not relaying transactions with this peer 558 | std::unique_ptr<TxRelay> m_tx_relay; 559 | 560 | + /** Whether we should relay transactions to this peer (their version 561 | + * message did not include fRelay=false and this is not a block-relay-only 562 | + * connection). This only changes from false to true. It will never change
This only changes from false to true
Could you mention that this happens only at the beginning of the connection lifetime?
m_relays_txs can be set to true after initial connection, if the peer sends a filterload or filterclear message.
Right, thanks.
ACK ce772d53048a8d0d9934a8310dbedcbdafdc34ec A good change for better readability with no behavior changes.
Rebased on master
rebased
Feedback:
Concept ACK b0dfefec249302cd9d9f121fe081a9a51294d0ab 🌠
<details><summary>Show signature and timestamp</summary>
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Concept ACK b0dfefec249302cd9d9f121fe081a9a51294d0ab 🌠
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYEACgkQzit1aX5p
pUhy2QwAgOZpQ3kI/F3nHCLHVgGEA/5M63KOPO69nBlzM7p/gYJFYjqiZjK75dOQ
KpaENaMtNwzsLyJwca+JUpBwLMjStXbP6/tfK/HnwkDMXEv5n1SMyrRs7lI12gKG
IzttDNqQSAc8VeqBa5qUjPclKlrz7BxLQoaZmmhCxpDmEpaqU9YrJMtez3PTYmag
dpfX+Yk57FFwrgUxXI5fnw0zWo4EwH7Xa0oiU8TeiUpd2xR0u3B4kUIL2UiGmPv5
PptowC4ziKcilYu7jBnt0meUEVMnAAG1WoiL2aGnARXWatfSQRHsLqgOjonHzXuE
5s2kC/1LJsWXzlY39LNO/cF0WvXShlpVuZ8V5MpJLG13HV/RhwROYhRJJVTmJeeE
sXlJmWc46VA+9QXqLdNJ08qFCIbHgakHd2F5bHxV+yEtIfcR2xeHfuiVc6vFwyzW
Zr/WIsfQGnVagth2H5bJNZVfEavUXPnkZW0xSsTBogwsdBYiue7oqCVsQ0ANzxaf
W4vi+jNl
=GzKO
-----END PGP SIGNATURE-----
</details>
314 | @@ -282,8 +315,9 @@ struct Peer { 315 | /** Work queue of items requested by this peer **/ 316 | std::deque<CInv> m_getdata_requests GUARDED_BY(m_getdata_requests_mutex); 317 | 318 | - explicit Peer(NodeId id) 319 | + explicit Peer(NodeId id, bool tx_relay) 320 | : m_id(id) 321 | + , m_tx_relay(tx_relay ? nullptr : std::make_unique<TxRelay>())
I see that this bool was renamed from blocks_only to tx_relay based on Marco's comment #21160 (review). The rename makes sense to me, but it's a little odd that if tx_relay=true you make m_tx_relay null... I would've expected it to be the other way around? :thinking:
Agreed. Can't parse this as-is.
Oops. Fixed properly now.
3910 | @@ -3910,7 +3911,9 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type, 3911 | { 3912 | LOCK(pfrom.m_tx_relay->cs_filter); 3913 | pfrom.m_tx_relay->pfilter.reset(new CBloomFilter(filter)); 3914 | + pfrom.m_bloom_filter_loaded = true;
Same comment about lock scope here (and the hunk below as well)
1054 | NodeEvictionCandidate candidate = {node->GetId(), node->nTimeConnected, node->m_min_ping_time,
1055 | node->nLastBlockTime, node->nLastTXTime,
1056 | HasAllDesirableServiceFlags(node->nServices),
1057 | - peer_relay_txes, peer_filter_not_null, node->nKeyedNetGroup,
1058 | - node->m_prefer_evict, node->addr.IsLocal(),
1059 | + node->m_relays_txs.load(), node->m_bloom_filter_loaded.load(),
This is the only place where these values are read outside of a lock which keeps their writes in sync. I suppose these rarely being out of sync is of no consequence?
Yes, I believe this is fine. The only time both of these values are read is in CompareNodeTXTime() as a tie-break between nodes that have the same m_last_tx_time. In the (extremely rare) window condition where a filterclear or filterload message has caused these to be written, but there's a read between m_bloom_filter_loaded and m_relays_txs being read, then the only difference is that two nodes that have the same m_last_tx_time might be ordered differently.
415 | @@ -385,7 +416,7 @@ class PeerManagerImpl final : public PeerManager 416 | EXCLUSIVE_LOCKS_REQUIRED(::cs_main); 417 | 418 | /** Send a version message to a peer */ 419 | - void PushNodeVersion(CNode& pnode, int64_t nTime); 420 | + void PushNodeVersion(CNode& pnode, Peer& peer, int64_t nTime);
Why not const Peer& peer here (and for MaybeSendFeefilter), since it's used read-only?
833 | @@ -803,6 +834,14 @@ static void PushAddress(Peer& peer, const CAddress& addr, FastRandomContext& ins 834 | } 835 | } 836 | 837 | +static void AddKnownTx(Peer& peer, const uint256& hash)
Why not make this a member function of Peer to avoid having to peek at the internal lock?
I've been trying to keep Peer as a pure data structure and have all the program logic contained in the PeerManagerImpl functions. We've previously had logic split between net_processing functions, CNodeState methods and CNode methods, which has made it difficult to follow the program logic.
1124 | @@ -1086,7 +1125,7 @@ void PeerManagerImpl::FindNextBlocksToDownload(NodeId nodeid, unsigned int count 1125 | 1126 | } // namespace 1127 | 1128 | -void PeerManagerImpl::PushNodeVersion(CNode& pnode, int64_t nTime) 1129 | +void PeerManagerImpl::PushNodeVersion(CNode& pnode, Peer& peer, int64_t nTime)
Afaics, all this needs is a bool for peer.m_tx_relay != nullptr, otherwise Peer is unused here. Maybe just pass the bool instead of requiring all of Peer?
4459 | @@ -4420,12 +4460,12 @@ void PeerManagerImpl::MaybeSendAddr(CNode& node, Peer& peer, std::chrono::micros 4460 | } 4461 | } 4462 | 4463 | -void PeerManagerImpl::MaybeSendFeefilter(CNode& pto, std::chrono::microseconds current_time) 4464 | +void PeerManagerImpl::MaybeSendFeefilter(CNode& pto, Peer& peer, std::chrono::microseconds current_time)
Likewise, this doesn't need all of Peer, only a Peer::TxRelay.
1203 | - m_peer_map.emplace_hint(m_peer_map.end(), nodeid, std::move(peer)); 1204 | + m_peer_map.emplace_hint(m_peer_map.end(), nodeid, peer); 1205 | } 1206 | if (!pnode->IsInboundConn()) { 1207 | - PushNodeVersion(*pnode, GetTime()); 1208 | + PushNodeVersion(*pnode, *peer, GetTime());
See comment above about changing the PushNodeVersion params, which would eliminate the need for the copies here.
Concept ACK.
My only reservation here is that we lose a few sync guarantees now that some things are done outside of cs_main. For example m_wtxid_relay_peers and the actual count may momentarily disagree. Are you confident that nothing is relying on the assumption that they're in sync?
There is also this feedback, which you haven't responded to: #21160 (comment)
This has been needing rebase and stuff addressed for more than a week, so I've removed it from hi-prio for now. (Can be added back later)
This has been needing rebase and stuff addressed for more than a week, so I've removed it from hi-prio for now. (Can be added back later)
Thanks @MarcoFalke. That's fair. I do intend to rebase this. Just have a lot going on right now.
Absolutely. Take your time and maybe see you rebase this next year :heart:
re-ACK 0d025d56c1882ef1d4f3e0d78d337a158791f4d9 🚦
only changes are:
<details><summary>Show signature</summary>
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
re-ACK 0d025d56c1882ef1d4f3e0d78d337a158791f4d9 🚦
only changes are:
* Rebase
* Fix Peer constructor (tx_relay bool)
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUjPdwv/ZLe2SAvo1xnZxY38S+0T2XwINcmvufQZrieVafJfRGQaBarFiOKtW/yN
AfMfoYSKuXHHJFiMx7IZdMFiN1224OnHgxtt7RCtW+NVsb2ZKZjqeIbYceN4Ex4Z
Q2JZT06+X2wO7oChrhMOaV9QujS6A0aEfCeWBOed0NNZnRVh4Xeb4rD5/yNFLdKO
kVchsNjNKDdX166g6k6fH82o8MRaTV84pd2t7Yw80ZZL0hqBt5X9AS7HKVrXQfs6
KI9WqBmTPWAQkJ9pIu5LbcHM26ayQ7MJTx/kh1pVRA97I2uid/5Q7yZLCPxblP6h
wP0hJgJEIGXaU9th0PcSErTJd27JOPkhSkRtU/EONWhJxt0E2foldrNdEYZ+USK2
a6OoBValo+ieXDFVgH/8eBm5eQoEhrmfhbOMakEbkULzD8fRIx6SIOlqeT6yWojj
8L5S6HudaTQsEq+h6w1iv6VyWrsn6VrZSNASNRGPdcsxSEsPc8CoofisO10rXULK
YmWgVqV6
=QnMV
-----END PGP SIGNATURE-----
</details>
237 | @@ -238,10 +238,6 @@ void FillNode(FuzzedDataProvider& fuzzed_data_provider, ConnmanTestMsg& connman, 238 | assert(node.nVersion == version); 239 | assert(node.GetCommonVersion() == std::min(version, PROTOCOL_VERSION)); 240 | assert(node.nServices == remote_services); 241 | - if (node.m_tx_relay != nullptr) { 242 | - LOCK(node.m_tx_relay->cs_filter); 243 | - assert(node.m_tx_relay->fRelayTxes == filter_txs);
nit in b6eae072553bd2811feed0149644a9a906489f35:
Instead of removing, I think you can do assert(peerman.getpeer(id).frelay==filter)
PeerManager doesn't have a public GetPeerRef() method. The Peer objects are intentionally private to PeerManagerImpl.
The following diff compiles for me:
diff --git a/src/test/fuzz/util.cpp b/src/test/fuzz/util.cpp
index d57c0081db..6766fbf2d9 100644
--- a/src/test/fuzz/util.cpp
+++ b/src/test/fuzz/util.cpp
@@ -225,7 +225,7 @@ void FillNode(FuzzedDataProvider& fuzzed_data_provider, ConnmanTestMsg& connman,
const ServiceFlags remote_services = ConsumeWeakEnum(fuzzed_data_provider, ALL_SERVICE_FLAGS);
const NetPermissionFlags permission_flags = ConsumeWeakEnum(fuzzed_data_provider, ALL_NET_PERMISSION_FLAGS);
const int32_t version = fuzzed_data_provider.ConsumeIntegralInRange<int32_t>(MIN_PEER_PROTO_VERSION, std::numeric_limits<int32_t>::max());
- const bool filter_txs = fuzzed_data_provider.ConsumeBool();
+ const bool relay_txs{fuzzed_data_provider.ConsumeBool()};
const CNetMsgMaker mm{0};
@@ -241,7 +241,7 @@ void FillNode(FuzzedDataProvider& fuzzed_data_provider, ConnmanTestMsg& connman,
uint64_t{1}, // dummy nonce
std::string{}, // dummy subver
int32_t{}, // dummy starting_height
- filter_txs),
+ relay_txs),
};
(void)connman.ReceiveMsgFrom(node, msg_version);
@@ -255,6 +255,9 @@ void FillNode(FuzzedDataProvider& fuzzed_data_provider, ConnmanTestMsg& connman,
assert(node.nVersion == version);
assert(node.GetCommonVersion() == std::min(version, PROTOCOL_VERSION));
assert(node.nServices == remote_services);
+ CNodeStateStats statestats;
+ assert(peerman.GetNodeStateStats(node.GetId(), statestats));
+ assert(statestats.m_relay_txs == (relay_txs && !node.IsBlockOnlyConn()));
node.m_permissionFlags = permission_flags;
if (successfully_connected) {
CSerializedNetMsg msg_verack{mm.Make(NetMsgType::VERACK)};
It does two things:
filter_txs symbol to a better namem_relay_txs is correctly set after the "version msg roundtrip".
We'll move the transaction relay data into Peer in subsequent commits,
but the inbound eviction logic needs to know if the peer is relaying
txs and if the peer has loaded a bloom filter.
This is currently redundant information with m_tx_relay->fRelayTxes,
but when m_tx_relay is moved into net_processing, then we'll need these
separate fields in CNode.
Also, remove cs_main guard from m_wtxid_relay_peers and make it atomic.
This should be fine since we don't need m_wtxid_relay_peers to be
synchronized with m_wtxid_relay exactly at all times.
After this change, RelayTransaction no longer requires cs_main.
-BEGIN VERIFY SCRIPT-
ren() { sed -i "s:\<$1\>:$2:g" $(git grep -l "\<$1\>" ./src ./test); }
ren cs_filter m_bloom_filter_mutex
ren fRelayTxes m_relay_txs
ren pfilter m_bloom_filter
ren cs_tx_inventory m_tx_inventory_mutex
ren filterInventoryKnown m_tx_inventory_known_filter
ren setInventoryTxToSend m_tx_inventory_to_send
ren fSendMempool m_send_mempool
ren nNextInvSend m_next_inv_send_time
ren minFeeFilter m_fee_filter_received
ren lastSentFeeFilter m_fee_filter_sent
-END VERIFY SCRIPT-Rebased
424 | @@ -394,7 +425,7 @@ class PeerManagerImpl final : public PeerManager 425 | EXCLUSIVE_LOCKS_REQUIRED(::cs_main); 426 | 427 | /** Send a version message to a peer */ 428 | - void PushNodeVersion(CNode& pnode); 429 | + void PushNodeVersion(CNode& pnode, Peer& peer);
nit:
void PushNodeVersion(CNode& pnode, const Peer& peer);
Or like cory suggested this could also just be a bool.
ACK 1066d10f71e6800c78012d789ff6ae19df0243fe - This is a good layer separation improvement with no behavior changes.
Are you still planning to address Cory's comments?
utACK 1066d10f71e6800c78012d789ff6ae19df0243fe
There are still a few outstanding review comments that could be responded to or addressed in a follow-up. Leaving this here mostly as a reminder to myself:
There are still a few outstanding review comments that could be responded to or addressed in a follow-up. Leaving this here mostly as a reminder to myself:
I believe I've now addressed all outstanding review comments on this PR, either here or in #24692.
See also #24691
1369 | + stats.m_relay_txs = WITH_LOCK(peer->m_tx_relay->m_bloom_filter_mutex, return peer->m_tx_relay->m_relay_txs); 1370 | + stats.m_fee_filter_received = peer->m_tx_relay->m_fee_filter_received.load(); 1371 | + } else { 1372 | + stats.m_relay_txs = false; 1373 | + stats.m_fee_filter_received = 0; 1374 | + }
Moving fRelayTxes/m_relay_txs from CNodeStats to CNodeStateStats broke -netinfo (and perhaps other clients that expect this field to be non-null). Proposing a fix.