net processing: Extract `addr` send functionality into MaybeSendAddr()

jnewbery commented at 1:06 pm on February 19, 2021: member

This continues the work of moving application layer data into net_processing. It refactors addr send functionality into its own function MaybeSendAddr() and flattens/simplifies the code. Isolating and simplifying the addr handling code makes subsequent changes (which will move addr data and logic into net processing) easier to review.

This is a pure refactor. There are no functional changes.

For motivation of the project, see #19398.

DrahtBot added the label P2P on Feb 19, 2021

jnewbery force-pushed on Feb 19, 2021

in src/net_processing.cpp:4403 in 5e9af6f841 outdated

4397+    // We sent an `addr` message to this peer recently. Nothing more to do.
4398+    if (current_time <= pto.m_next_addr_send) return;
4399+    pto.m_next_addr_send = PoissonNextSend(current_time, AVG_ADDRESS_BROADCAST_INTERVAL);
4400+
4401+    assert(pto.m_addr_known);
4402+    assert(pto.vAddrToSend.size() <= MAX_ADDR_TO_SEND);

jnewbery commented at 1:23 pm on February 19, 2021:

It’s currently impossible for vAddrToSend to grow larger than MAX_ADDR_TO_SEND (observe that the only place where elements are added to vAddrToSend check the size first). However, it might be safer to do something like:

0    if(pto.vAddrToSend.size() > MAX_ADDR_TO_SEND) vAddrToSend.resize(MAX_ADDR_TO_SEND);

I’m happy to go with whatever reviewers prefer.

ajtowns commented at 6:54 pm on February 26, 2021:

Could have an Assume instead of an assert, but better to error out than silently cover it up if mistakes are introduced elsewhere I think.

jnewbery commented at 10:59 am on February 28, 2021:

Do you think it’d be useful to add a new macro AssertAndRecover():

in debug builds, this asserts (terminates) if the condition fails
in release builds, this executes some recovery code and continues.

eg here the recovery code would be to clear the vAddrToSend vector and return from MaybeSendAddr(). Better to not relay addr records to a peer and continue operation than crash the node.

ajtowns commented at 2:17 pm on February 28, 2021:

Isn’t AssertAndRecover(condition, recovery) the same as if (!Assume(condition)) { recovery; } ? (Err, at least if Assume was the identity function its comments claim it to be, rather than being cast to (void) in release builds)

Not sure the recovery code is needed here – more than MAX_ADDR_TO_SEND would be a protocol violation, but shouldn’t cause a crash or anything, I think?

jnewbery commented at 8:08 pm on February 28, 2021:

Isn’t AssertAndRecover(condition, recovery) the same as if (!Assume(condition)) { recovery; }

Ah interesting. I didn’t realise that.

Not sure the recovery code is needed here – more than MAX_ADDR_TO_SEND would be a protocol violation, but shouldn’t cause a crash or anything, I think?

I think it’s probably better to skip sending the addr message than violate the p2p protocol.

jnewbery commented at 11:22 am on March 29, 2021:

#21317 has now been merged so I’ve updated this to use the if (!Assume(condition)) { recovery; } pattern.

DrahtBot commented at 2:17 pm on February 19, 2021: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#21528 ([p2p] Reduce addr blackholes by amitiuttarwar)
#21515 (Erlay: bandwidth-efficient transaction relay protocol by naumenkogs)
#21198 (net: Address outstanding review comments from PR20721 by jnewbery)
#21160 (net/net processing: Move tx inventory into net_processing by jnewbery)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

jnewbery force-pushed on Feb 19, 2021

jnewbery force-pushed on Feb 22, 2021

jnewbery commented at 8:46 am on February 22, 2021: member

Rebased on master to pick up fix for interface_zmq.py in #21008.

in src/net.h:550 in 975f0060aa outdated

545@@ -546,8 +546,8 @@ class CNode
546     std::vector<CAddress> vAddrToSend;
547     std::unique_ptr<CRollingBloomFilter> m_addr_known{nullptr};
548     bool fGetAddr{false};
549-    std::chrono::microseconds m_next_addr_send GUARDED_BY(cs_sendProcessing){0};
550-    std::chrono::microseconds m_next_local_addr_send GUARDED_BY(cs_sendProcessing){0};
551+    std::chrono::microseconds m_next_addr_send{0};
552+    std::chrono::microseconds m_next_local_addr_send{0};

ajtowns commented at 5:37 pm on February 26, 2021:

Err, nack. Please add EXCLUSIVE_LOCKS_REQUIRED(pto->cs_sendProcessing) to MaybeSendAddr instead. I could see an argument for having a single mutex representing “this is the message handler thread” (rather than one per-peer), but removing guards entirely is wrong…

jnewbery commented at 10:53 am on February 28, 2021:

These members should never have been guarded by cs_sendProcessing, which really only exists at this point to make sure that multiple threads don’t enter SendMessages() concurrently.

The annotation was over-zealously added in #13123. The author of that PR agrees that these annotations are unnecessary: #13123 (comment).

These variables are only ever read/written by the message handler thread inside SendMessages(), so it’s safe for them to be unguarded. I can make them atomic if you think that would make this more obviously safe.

ajtowns commented at 2:41 pm on February 28, 2021:

“These variables are only ever read/written … inside SendMessages” – is exactly what GUARDED_BY(cs_sendProcessing) documents. The point of having the guard is to catch errors when we change the code in future and violate the assumptions we were relying on – if the code compiles with the GUARDED_BY statements, it will always also compile correctly without them – the trouble comes when people make changes later.

I don’t know what @practicalswift thinks the annotation was a mistake; it’s not. (It’s a mistake the other addr-related member variables aren’t guarded though)

Having things be atomic instead of guarded is only a good idea if they’re actually accessed by multiple threads, and either they don’t need to be synchronized with any other data, or the timing is too important for proper locks.

EDIT: There’s a commit on https://github.com/ajtowns/bitcoin/commits/202103-cs_sendProcessing which turns the mutex into a single global, which can then be used to also guard things in struct Peer that aren’t accessed outside of ProcessMesasges/SendMessages. Not needed for this PR, but might be useful for the followup.

jnewbery commented at 8:10 pm on February 28, 2021:

It’s a mistake the other addr-related member variables aren’t guarded though

Ok. I’ve moved all of the addr fields to be guarded by their own m_addr_mutex mutex. This seems much better than having data/logic that should be in net_processing being locked by a mutex in net.

jnewbery commented at 6:01 pm on March 1, 2021:

That proved more difficult than expected. I’ve reverted that change and have moved just m_next_addr_send and m_next_local_addr_send to be guarded by the new m_addr_mutex. I’ll move the rest of the addr fields under that mutex when they move into the Peer struct.

ajtowns commented at 10:52 am on March 2, 2021:

I don’t follow why you’re changing those guards at all in this PR; you don’t need to. See https://github.com/ajtowns/bitcoin/tree/202003-pr21236

I don’t think it will make sense to lock m_addr_known and vAddrToSend with per-peer locks: when receiving an addr message we’ll obtain the lock for the peer that sends the message, then attempt to obtain the lock for a peer we want to forward the address on to – but there’s no particular reason to expect those peers to always be in a particular order, so holding the sending peer’s lock would create ordering violations, and releasing and reacquiring would be a bit ridiculous given that this is always happening in a single thread.

jnewbery commented at 11:36 am on March 2, 2021:

Thanks! Totally agree that taking and releasing multiple locks would be ridiculous. I’ve taken your branch.

jnewbery commented at 11:23 am on March 29, 2021:

This is still causing confusion for reviewers, so I’ve moved the fields to have their own lock that is only used in net_processing.

in src/net_processing.cpp:340 in cb68827f7d outdated

336@@ -337,7 +337,7 @@ class PeerManagerImpl final : public PeerManager
337     void MaybeSendPing(CNode& node_to, Peer& peer);
338 
339     /** Send `addr` messages on a regular schedule*/
340-    void MaybeSendAddr(CNode* pto);
341+    void MaybeSendAddr(CNode& pto);

ajtowns commented at 6:36 pm on February 26, 2021:

Maybe change this to node or node_to as per MaybeSendPing. It’s no longer a p either way.

jnewbery commented at 11:40 am on February 28, 2021:

Done

in src/net_processing.cpp:4402 in 53353c7536 outdated

4435-        if (!vAddr.empty())
4436-            m_connman.PushMessage(&pto, msgMaker.Make(make_flags, msg_type, vAddr));
4437-        // we only send the big addr message once
4438-        if (pto.vAddrToSend.capacity() > 40)
4439-            pto.vAddrToSend.shrink_to_fit();
4440+    assert(pto.m_addr_known);

ajtowns commented at 6:56 pm on February 26, 2021:

Should move this to directly after the RelayAddrsWithConn check – that’s what makes it true, and you rely on m_addr_known being not null in the !IBD branch above as well.

jnewbery commented at 11:44 am on February 28, 2021:

Done

ajtowns dismissed

ajtowns commented at 6:57 pm on February 26, 2021: member

Haven’t thoroughly reviewed the “Refactor MaybeSendAddr” commit; seems like separating it into three or four commits would be clearer (early exit; invariants; skip vAddr check; one-pass erase)?

jnewbery force-pushed on Feb 28, 2021

jnewbery commented at 11:52 am on February 28, 2021: member

Thanks for the review @ajtowns. I’ve addressed your comments.

jnewbery force-pushed on Feb 28, 2021

jnewbery commented at 10:05 pm on February 28, 2021: member

Oops. I missed taking the new lock in one of the call sites. Will fix up tomorrow.

jnewbery force-pushed on Mar 1, 2021

jnewbery commented at 6:00 pm on March 1, 2021: member

Once all of the addr data has been moved into net_processing, I intend to guard it with a single lock (see the m_addr_relay struct in https://github.com/jnewbery/bitcoin/tree/2021-02-lazy-init-peer). However, it’s proving difficult to get lock orders right while the data is still in net. I’ve decided to punt on doing that until after this PR.

To avoid simply removing the guard from m_next_addr_send and m_next_local_addr_send, I’ve introduced the new m_addr_mutex lock in this PR. Once the data is moved into Peer, I’ll extend that mutex to guard the remaining addr data.

jnewbery force-pushed on Mar 2, 2021

in src/net_processing.cpp:4210 in d587fae754 outdated

4429+    m_connman.PushMessage(&node, CNetMsgMaker(node.GetCommonVersion()).Make(make_flags, msg_type, node.vAddrToSend));
4430+    node.vAddrToSend.clear();
4431+
4432+    // we only send the big addr message once
4433+    if (node.vAddrToSend.capacity() > 40)
4434+        node.vAddrToSend.shrink_to_fit();

fanquake commented at 6:36 am on March 3, 2021:

0    if (node.vAddrToSend.capacity() > 40) {
1        node.vAddrToSend.shrink_to_fit();
2    }

jnewbery commented at 9:02 am on March 3, 2021:

done

in src/net_processing.cpp:4379 in d587fae754 outdated

4374+    // Nothing to do for non-address-relay peers
4375+    if (!node.RelayAddrsWithConn()) return;
4376+
4377+    assert(node.m_addr_known);
4378+
4379+    auto current_time = GetTime<std::chrono::microseconds>();

fanquake commented at 6:42 am on March 3, 2021:

0    const auto current_time = GetTime<std::chrono::microseconds>();

jnewbery commented at 9:02 am on March 3, 2021:

done

fanquake commented at 7:00 am on March 3, 2021: member

Concept ACK - this seems like a fairly good simplification. I like consolidating logic and returning early when possible. If this SendMessages() logic is going to be split up even more, you could consider a call to GetTime<>() at the start, and just pass that into MaybeSendPing, MaybeSendAddr, MaybeSomeOtherFunc etc.

This doesn’t currently compile:

0net_processing.cpp:4408:91: error: no member named 'm_addr_mutex' in 'CNode'
1    auto addr_already_known = [&node](const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(node.m_addr_mutex)
2                                                                                     ~~~~ ^
3./threadsafety.h:31:79: note: expanded from macro 'EXCLUSIVE_LOCKS_REQUIRED'
4#define EXCLUSIVE_LOCKS_REQUIRED(...) __attribute__((exclusive_locks_required(__VA_ARGS__)))
5                                                                              ^~~~~~~~~~~
61 error generated.

jnewbery force-pushed on Mar 3, 2021

jnewbery commented at 9:04 am on March 3, 2021: member

Thanks for the review, @fanquake! The compile error was due to an annotation that I’d left from a previous branch. It’s now fixed.

If this SendMessages() logic is going to be split up even more, you could consider a call to GetTime<>() at the start, and just pass that into MaybeSendPing, MaybeSendAddr, MaybeSomeOtherFunc etc.

Good idea. Done!

jnewbery force-pushed on Mar 3, 2021

in src/net_processing.cpp:339 in 86087474a2 outdated

335@@ -336,6 +336,9 @@ class PeerManagerImpl final : public PeerManager
336      *  mark the peer to be disconnected if a ping has timed out. */
337     void MaybeSendPing(CNode& node_to, Peer& peer, std::chrono::microseconds now);
338 
339+    /** Send `addr` messages on a regular schedule*/

amitiuttarwar commented at 11:39 pm on March 3, 2021:

μnit: schedule. */

jnewbery commented at 10:03 am on March 4, 2021:

Fixed.

in src/net_processing.cpp:4146 in 86087474a2 outdated

4367@@ -4365,6 +4368,71 @@ void PeerManagerImpl::MaybeSendPing(CNode& node_to, Peer& peer, std::chrono::mic
4368     }
4369 }
4370 
4371+void PeerManagerImpl::MaybeSendAddr(CNode* pto, std::chrono::microseconds current_time)
4372+{

amitiuttarwar commented at 11:57 pm on March 3, 2021:

what do you think about adding an AssertLockHeld to accompany the clang safety annotations with a dynamic runtime check? (developer notes)

jnewbery commented at 10:04 am on March 4, 2021:

done

in src/net_processing.cpp:4155 in d506485865 outdated

4376 
4377-    if (fListen && pto->RelayAddrsWithConn() &&
4378-        !::ChainstateActive().IsInitialBlockDownload() &&
4379-        pto->m_next_local_addr_send < current_time) {
4380+    if (fListen && node.RelayAddrsWithConn() &&
4381+        node.m_next_local_addr_send < current_time) {

amitiuttarwar commented at 0:48 am on March 4, 2021:

looks like d506485865 removed the IBD check, but 4adba42f41 brings it back

jnewbery commented at 10:04 am on March 4, 2021:

Oops. Good catch. Fixed!

in src/net_processing.cpp:4176 in 950398a61b outdated

4374@@ -4375,8 +4375,6 @@ void PeerManagerImpl::MaybeSendAddr(CNode& node, std::chrono::microseconds curre
4375 
4376     assert(node.m_addr_known);
4377 
4378-    const CNetMsgMaker msgMaker(node.GetCommonVersion());
4379-
4380     // Periodically advertise our local address to the peer.
4381     if (fListen && !::ChainstateActive().IsInitialBlockDownload() &&

amitiuttarwar commented at 1:02 am on March 4, 2021:

might be worth updating to m_chainman.ActivateChainstate()?

h/t #21327 (review)

jnewbery commented at 10:09 am on March 4, 2021:

This isn’t adding a new ::ChainstateActive() like #21327 was. I think changing this call to be m_chainman.ActiveChainstate() would be adding more noise/distraction for reviewers.

amitiuttarwar commented at 6:49 pm on March 4, 2021:

ok. since this is a refactor PR anyway, as a reviewer I would not mind, but its no worries :)

amitiuttarwar commented at 1:08 am on March 4, 2021: contributor

approach ACK

I have some more things to think about / investigate (namely the lock changes & the vAddrToSend size assertion) , but left a few comments in my first pass.

ajtowns requested review from ajtowns on Mar 4, 2021

in src/net_processing.cpp:324 in 950398a61b outdated

333@@ -334,7 +334,10 @@ class PeerManagerImpl final : public PeerManager
334 
335     /** Send a ping message every PING_INTERVAL or if requested via RPC. May
336      *  mark the peer to be disconnected if a ping has timed out. */
337-    void MaybeSendPing(CNode& node_to, Peer& peer);
338+    void MaybeSendPing(CNode& node_to, Peer& peer, std::chrono::microseconds now);

amitiuttarwar commented at 4:46 am on March 4, 2021:

looks like both MaybeSendPing and MaybeSendAddr can be made into const member functions

jnewbery commented at 10:01 am on March 4, 2021:

I’m not sure that const should be used in cases like this. The compiler will make sure that any function marked const doesn’t mutate the object’s data members, which is the case for both of these functions. However, const is not transitive, so the functions are still able to modify state through pointers and modify global state.

Here, PeerManagerImpl owns the map of Peer objects:

 0class PeerManagerImpl final : public PeerManager
 1{
 2[...]
 3private:
 4    /**
 5     * Map of all Peer objects, keyed by peer id. This map is protected
 6     * by the m_peer_mutex. Once a shared pointer reference is
 7     * taken, the lock may be released. Individual fields are protected by
 8     * their own locks.
 9     */
10    std::map<NodeId, PeerRef> m_peer_map GUARDED_BY(m_peer_mutex);

and the map of CNodeState objects are global (in the file’s unnamed namespace):

0/** Map maintaining per-node state. */
1static std::map<NodeId, CNodeState> mapNodeState GUARDED_BY(cs_main);

So PeerManagerImpl definitely owns the Peer objects, and logically owns the CNodeState objects since it’s responsible for constructing and destructing them (and #20758 actually moves the CNodeState objects into PeerManagerImpl).

These functions don’t mutate PeerManagerImpl, but do mutate data that it owns. I think in such cases there’s an argument that we shouldn’t mark the function const even if the compiler would allow us to. For me, const on a member function is telling the programmer “this function does not mutate the logical state of this object”, which isn’t the case here.

ajtowns commented at 12:17 pm on March 4, 2021:

Since PeerRef is a shared_ptr which does its own destruction once the ref count hits zero, I don’t think it’s accurate to say that PeerManagerImpl owns the Peer object.

jnewbery commented at 12:39 pm on March 4, 2021:

I’m not sure it’s right to say that ownership of Peer is shared because it uses a shared pointer. We use a shared pointer for the Peer for convenience/safety/synchronization. We never take a shared pointer to a Peer outside PeerManagerImpl, and if we really wanted to, could implement it with raw pointers and a manual ref count, and then new and delete the Peer objects (as is done by CConnman for its CNode objects in vNodes)

That’s maybe a bit pedantic. I think the more important question is “what is const for?” For me, it means “this function won’t mutate the logical state of this object”. If we’re using it to say “this function won’t mutate the data members of this object, but it will mutate the state of data structures that are integral to how this object operates, but will do so through a pointer”, then it seems a bit legalistic and not very useful to me.

ajtowns commented at 1:59 pm on March 4, 2021:

I was surprised to discover that if you have a class Y with a member X& x then C++ thinks it’s fine to modify x.foo = 3 from a const member function of Y. So not sure I’d argue against “a bit legalistic and not very useful”… (So in particular, if you have a const PeerManagerImpl, it will have no problems calling non-const functions of its m_connman)

If you consider modifying the contents of a PeerRef to be modifying a PeerManagerImpl, then I don’t think it’s consistent to mark GetPeerRef() as const either (code compiles fine that way though). I dunno; “shared” makes me think things are shared, not owned, so that’s definitely a mental adjustment.

I guess I don’t think it’s a good idea to try to put too much convention on top of what our compilers will actually enforce here though.

jnewbery commented at 2:28 pm on March 4, 2021:

This certainly isn’t a hill I’m prepared to die on (either here or in #21162 (review)). My preference is not to mark functions as const if they’re mutating data that is so closely associated with the object that it can only be modified through the object, but I’ll go along with the majority if people want this to be const.

ajtowns commented at 3:18 pm on March 4, 2021:

Yeah, not my hill either, but I think it’s worth discussing so we’ve got a shared understanding of how this data’s meant to be controlled… I have some other ideas about Peer ownership I want to try out once more of this refactoring is finished that might change this, and worst case can add or remove const markers when we’re reordering the PeerManageImpl declarations and adding more comments…

amitiuttarwar commented at 0:57 am on March 5, 2021:

I agree that the relevant question is “what is const for?” The two main aspects are compiler enforcement & human understanding. @jnewbery I get what you’re saying about making these member functions const is misleading for the humans, because the function often mutates the logical state of the object. However, I also agree with @ajtowns that since the compilers would actually be enforcing something here, it does offer a bit of literal guarding?

I don’t have a strong opinion on which way we go, but it is definitely an interesting consideration & mostly I conclude that const is complex. I’m personally curious to learn more about how a compiler uses the keyword- when its just enforcement vs optimization. (recommending resources welcome!)

sipa commented at 2:21 am on March 5, 2021:

I’m 99% sure that compilers don’t use const markers on functions/pointers/references at all for optimization. All they do is disallow certain operations on them.

It’s different if you have an actual const variable (e.g. a global, local, or member variable that is declared const where it is defined & initialized), because that means “changing this is UB”. Any other use simply means “no non-const operations are allowed on object through this reference/pointer/function; if other code is invoked that happens to hold a non-const reference/pointer, the object can still change (and the compiler can’t assume a cached version in a register won’t change e.g.).

amitiuttarwar commented at 11:48 pm on March 5, 2021:

interesting, ok. so other than with direct variable management, const essentially uses the compiler to ensure anyone introducing new code has to think before they modify the object through this path. hm, thanks.

jnewbery commented at 3:44 pm on March 6, 2021:

@sipa yes, that’s exactly my understanding. Mutating a const-declared variable is UB. Therefore compilers could make optimizations based on the fact that const-declared variables cannot change.

const on a member functions simply means that this function call won’t mutate the state of the object, although of course, other code could mutate the state. Therefore, the compiler can’t make any assumptions about the state of the variable not changing.

You can even mutate the state through a const function by casting away the constness, eg:

 0#include <iostream>
 1
 2class thing
 3{
 4    int x{0};
 5
 6public:
 7    void mutate(int x_in) const
 8    {
 9        thing* mutable_this = const_cast<thing*>(this);
10        mutable_this->x = x_in;
11    }
12
13    int access() const { return x; }
14
15    thing(int x_in) : x(x_in) { }
16};
17 
18int main()
19{
20    thing t(1);
21    std::cout << "t.x = " << t.access() << "\n";
22
23    t.mutate(2);
24    std::cout << "t.x = " << t.access() << "\n";
25
26    return 0;
27}

You can see that the state of t has been mutated in the mutate() function, even though that function is const. This compiles with no warnings. I believe this is perfectly legal (although ill-advised!) c++. The c++ core guidelines advise against casting away const in almost all cases: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es50-dont-cast-away-const. Scott Meyers does use casting away const to avoid code duplication: https://stackoverflow.com/a/123995/933705 (although that book is now quite old, so I’m not sure if the advice is still current).

Going back to the main question about whether these functions should be const, I found the isocpp.org wikipedia page on const correctness to be very useful in clarifying my thoughts. A few extracts:

The trailing const on inspect() member function should be used to mean the method won’t change the object’s abstract (client-visible) state. That is slightly different from saying the method won’t change the “raw bits” of the object’s struct

a const member function must not change (or allow a caller to change) the this object’s logical state (AKA abstract state AKA meaningwise state). Think of what an object means, not how it is internally implemented. A Person’s age and name are logically part of the Person, but the Person’s neighbor and employer are not. An inspector method that returns part of the this object’s logical / abstract / meaningwise state must not return a non-const pointer (or reference) to that part, independent of whether that part is internally implemented as a direct data-member physically embedded within the this object or some other way. (emphasis mine)

If a method changes any part of the object’s logical state, it logically is a mutator; it should not be const even if (as actually happens!) the method doesn’t change any physical bits of the object’s concrete state.

Conversely, a method is logically an inspector and should be const if it never changes any part of the object’s logical state, even if (as actually happens!) the method changes physical bits of the object’s concrete state.

If you are outside the class — you are a normal user, every experiment you could perform (every method or sequence of methods you call) would have the same results (same return values, same exceptions or lack of exceptions) irrespective of whether you first called that lookup method. If the lookup function changed any future behavior of any future method (not just making it faster but changed the outcome, changed the return value, changed the exception), then the lookup method changed the object’s logical state — it is a mutuator.

Based on that, I believe these functions should not be marked const. Calling MaybeSendPing() and MaybeSendAddr() multiple times will result in different behaviour that is observable from outside the class, since the Peer() objects are mutated.

amitiuttarwar commented at 6:45 pm on March 17, 2021:

I’m convinced :)

jnewbery force-pushed on Mar 4, 2021

jnewbery commented at 10:09 am on March 4, 2021: member

Thanks for the review @amitiuttarwar! I’ve addressed all of your comments.

DrahtBot added the label Needs rebase on Mar 4, 2021

jnewbery commented at 2:26 pm on March 4, 2021: member

Rebased

jnewbery force-pushed on Mar 4, 2021

DrahtBot removed the label Needs rebase on Mar 4, 2021

in src/net_processing.cpp:4190 in 151597e19b outdated

4199+
4200+    assert(node.vAddrToSend.size() <= MAX_ADDR_TO_SEND);
4201+
4202+    // Remove addr records that the peer already knows about
4203+    auto addr_already_known = [&node](const CAddress& addr) {return node.m_addr_known->contains(addr.GetKey());};
4204+    node.vAddrToSend.erase(std::remove_if(node.vAddrToSend.begin(), node.vAddrToSend.end(), addr_already_known),

amitiuttarwar commented at 11:25 pm on March 5, 2021:

should we explicitly #include <algorithm>?

jnewbery commented at 3:51 pm on March 6, 2021:

Yes! Thank you :)

jnewbery commented at 4:16 pm on March 6, 2021:

Done!

amitiuttarwar commented at 1:45 am on March 6, 2021: contributor

code review ACK 151597e19b

a couple review notes

I liked the use of the erase-remove idiom, my understanding of the changes: we no longer copy all the elements of vAddrToSend and instead are able to edit in place with linear complexity by using this pattern. I think we now traverse the vector 2x instead of 1x, but that seems fine. I also find the logic flow much easier to understand, esp with removing the unused logic to handle >1000 addrs queued.
the biggest change seems to be breaking this addr functionality out of cs_main. I spent some time checking if this is safe. to do so, I checked that all of the state touched either takes cs_main itself, is protected by a more specific lock, or only accessed from the message handler thread (eg. m_addr_known, vAddrToSend, GetLocalAddrForPeer..)

The only touchpoint I’m unsure about is the call to fRelay here, I’m not very familiar with the qt code so I’m not 100% sure how this interacts with the rest. I think it’s ok, but just wanted to explicitly call out the gap in my understanding.

jnewbery force-pushed on Mar 6, 2021

jnewbery commented at 4:17 pm on March 6, 2021: member

The only touchpoint I’m unsure about is the call to fRelay here

I’m not sure what this is referring to. I don’t see fRelay in that file at all.

jnewbery force-pushed on Mar 6, 2021

jnewbery commented at 4:42 pm on March 6, 2021: member

I think we now traverse the vector 2x instead of 1x, but that seems fine.

I’ve changed this so we both check for the addr in node.m_addr_known and add it to node.m_addr_known in the same pass.

jnewbery deleted a comment on Mar 6, 2021

amitiuttarwar commented at 9:56 pm on March 7, 2021: contributor

The only touchpoint I’m unsure about is the call to fRelay here

I’m not sure what this is referring to. I don’t see fRelay in that file at all.

oops, I’m not sure what I was referring to either 😳 I’ve re-checked all the fRelay touch points and am now comfortable with pulling it out of cs_main.

reACK c3b424d98f

DrahtBot added the label Needs rebase on Mar 11, 2021

jnewbery commented at 1:58 pm on March 11, 2021: member

Rebased

jnewbery force-pushed on Mar 11, 2021

DrahtBot removed the label Needs rebase on Mar 11, 2021

DrahtBot added the label Needs rebase on Mar 17, 2021

jnewbery force-pushed on Mar 17, 2021

jnewbery commented at 2:25 pm on March 17, 2021: member

rebased

DrahtBot removed the label Needs rebase on Mar 17, 2021

amitiuttarwar approved

amitiuttarwar commented at 6:46 pm on March 17, 2021: contributor

reACK e3d6cbca83300e5ccc0fd8a1075ff52654aa76ff, rebased chainman & std::optional changes since last round.

hebasto commented at 1:26 pm on March 26, 2021: member

Concept ACK.

hebasto commented at 3:04 pm on March 26, 2021: member

Approach ACK e3d6cbca83300e5ccc0fd8a1075ff52654aa76ff, ~~need more time to think about the last commit~~.

The message of the commit “[net processing] Only call GetTime() once in SendMessages()” (461dda316aefa4f792350a82ca2fb8c6daf8cca1) states:

We currently call GetTime() 3 times in SendMessages(). Consolidate this to once GetTime() call.

I see two calls of three are consolidated. What about the third one: https://github.com/bitcoin/bitcoin/blob/e3d6cbca83300e5ccc0fd8a1075ff52654aa76ff/src/net_processing.cpp#L4513?

Also I believe that this commit is not a clean refactoring. To be precise, change behavior could be expected as even LOCK(cs_main); could take unpredictable time. But the scale of such changes seems insignificant to P2P, right?

The commit “[net processing] Extract addr send functionality into MaybeSendAddr()” (97e101345e6b838a5dd308a6adafea53eb14d521) allows to localize acquiring of the CNode::cs_sendProcessing mutex:

 0--- a/src/net.cpp
 1+++ b/src/net.cpp
 2@@ -2164,10 +2164,7 @@ void CConnman::ThreadMessageHandler()
 3             if (flagInterruptMsgProc)
 4                 return;
 5             // Send messages
 6-            {
 7-                LOCK(pnode->cs_sendProcessing);
 8-                m_msgproc->SendMessages(pnode);
 9-            }
10+            m_msgproc->SendMessages(pnode);
11 
12             if (flagInterruptMsgProc)
13                 return;
14diff --git a/src/net.h b/src/net.h
15index 48d37084a..300c9fffe 100644
16--- a/src/net.h
17+++ b/src/net.h
18@@ -419,7 +419,7 @@ public:
19     std::list<CNetMessage> vProcessMsg GUARDED_BY(cs_vProcessMsg);
20     size_t nProcessQueueSize{0};
21 
22-    RecursiveMutex cs_sendProcessing;
23+    Mutex cs_sendProcessing;
24 
25     uint64_t nRecvBytes GUARDED_BY(cs_vRecv){0};
26 
27@@ -787,7 +787,7 @@ public:
28     * [@param](/bitcoin-bitcoin/contributor/param/)[in]   pnode           The node which we are sending messages to.
29     * [@return](/bitcoin-bitcoin/contributor/return/)                      True if there is more work to be done
30     */
31-    virtual bool SendMessages(CNode* pnode) EXCLUSIVE_LOCKS_REQUIRED(pnode->cs_sendProcessing) = 0;
32+    virtual bool SendMessages(CNode* pnode) = 0;
33 
34 
35 protected:
36diff --git a/src/net_processing.cpp b/src/net_processing.cpp
37index 8b13a859c..b7c6a08ea 100644
38--- a/src/net_processing.cpp
39+++ b/src/net_processing.cpp
40@@ -241,7 +241,7 @@ public:
41     void InitializeNode(CNode* pnode) override;
42     void FinalizeNode(const CNode& node, bool& fUpdateConnectionTime) override;
43     bool ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt) override;
44-    bool SendMessages(CNode* pto) override EXCLUSIVE_LOCKS_REQUIRED(pto->cs_sendProcessing);
45+    bool SendMessages(CNode* pto) override;
46 
47     /** Implement PeerManager */
48     void CheckForStaleTipAndEvictPeers() override;
49@@ -323,7 +323,7 @@ private:
50     void MaybeSendPing(CNode& node_to, Peer& peer, std::chrono::microseconds now);
51 
52     /** Send `addr` messages on a regular schedule. */
53-    void MaybeSendAddr(CNode& node, std::chrono::microseconds current_time) EXCLUSIVE_LOCKS_REQUIRED(node.cs_sendProcessing);
54+    void MaybeSendAddr(CNode& node, std::chrono::microseconds current_time);
55 
56     const CChainParams& m_chainparams;
57     CConnman& m_connman;
58@@ -4160,7 +4160,7 @@ void PeerManagerImpl::MaybeSendPing(CNode& node_to, Peer& peer, std::chrono::mic
59 
60 void PeerManagerImpl::MaybeSendAddr(CNode& node, std::chrono::microseconds current_time)
61 {
62-    AssertLockHeld(node.cs_sendProcessing);
63+    LOCK(node.cs_sendProcessing);
64 
65     // Nothing to do for non-address-relay peers
66     if (!node.RelayAddrsWithConn()) return;

in src/net_processing.cpp:4194 in e3d6cbca83 outdated

4202+    };
4203+    node.vAddrToSend.erase(std::remove_if(node.vAddrToSend.begin(), node.vAddrToSend.end(), addr_already_known),
4204+                           node.vAddrToSend.end());
4205+
4206+    // No addr messages to send
4207+    if (node.vAddrToSend.empty()) return;

hebasto commented at 4:44 pm on March 26, 2021:

e3d6cbca83300e5ccc0fd8a1075ff52654aa76ff

Early return skips https://github.com/bitcoin/bitcoin/blob/e3d6cbca83300e5ccc0fd8a1075ff52654aa76ff/src/net_processing.cpp#L4221-L4224

That differs from original behavior.

jnewbery commented at 11:24 am on March 29, 2021:

Thanks @hebasto! See #21236 (comment).

[net processing] Only call GetTime() once in SendMessages()

We currently call GetTime() 4 times in SendMessages(). Consolidate this to
once GetTime() call.

c02fa47baa

[net] Change addr send times fields to be guarded by new mutex 4ad4abcf07

[net processing] Extract `addr` send functionality into MaybeSendAddr()

Reviewer hint: review with

 `git diff --color-moved=dimmed-zebra --ignore-all-space`

ad719297f2

[net processing] Change MaybeSendAddr() to take a reference

Change name of CNode parameter to node now that it's no longer a
pointer.

c87423c58b

[net processing] Refactor MaybeSendAddr() - early exits

Add early exit guard clauses if node.RelayAddrsWithConn() is false or if
current_time < node.m_next_addr_send. Add comments.

This commit leaves some lines over-indented. Those will be fixed in a
subsequent whitespace-only commit.

38c0be5da3

[net processing] Fix overindentation in MaybeSendAddr()

Reviewer hint: review with `git diff --ignore-all-space`.

01a79ff924

in src/net_processing.cpp:4254 in 461dda316a outdated

4191@@ -4194,7 +4192,9 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
4192     // If we get here, the outgoing message serialization version is set and can't change.
4193     const CNetMsgMaker msgMaker(pto->GetCommonVersion());
4194 
4195-    MaybeSendPing(*pto, *peer);
4196+    const auto current_time = GetTime<std::chrono::microseconds>();
4197+
4198+    MaybeSendPing(*pto, *peer, current_time);

ccdle12 commented at 2:04 pm on March 28, 2021:

super nit: current_time here is const. Not sure if the intention for now is to be immutable in MaybeSendPing(), but I think now would still be mutable even though the outer scoped variable is const because of the param signature in MaybeSendPing(..., std::chrono::microseconds now)

jnewbery commented at 10:11 am on March 29, 2021:

In general, there’s no benefit to making pass by value parameters const. Passing by reference to const indicates to the caller that the parameter will not be mutated. When passing by value, a copy of the parameter is made, so the original object cannot be mutated.

There’s very good documentation about how to pass parameters in the c++ core guidelines: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f16-for-in-parameters-pass-cheaply-copied-types-by-value-and-others-by-reference-to-const.

ccdle12 commented at 11:52 am on March 29, 2021:

Thanks, apologies, I think I should have been more clear. Even though the std::chrono is passed by value, MaybeSendPing() treats it the copy as mutable even though I don’t think the method mutates the copy. So I was just thinking would it be useful/or maybe unecessary to make current_time in MaybeSendPing() also const.

jnewbery commented at 1:42 pm on March 29, 2021:

I understand. Yes, it is possible to mark the function parameters as const in the definition, which would prevent the copy from being mutated within the function body. You can even do this if the function declaration does not mark the parameter as const.

However, I haven’t ever seen that style used in this project. It may be marginally useful, but it may also be misleading (eg see https://groups.google.com/u/4/g/comp.lang.c++.moderated/c/DEQTho8OCj8).

jnewbery commented at 11:23 am on March 29, 2021: member

Thanks for the review @hebasto! Responding to your points:

I see two calls of three are consolidated. What about the third one […]

I was also counting the call inside MaybeSendPing(), which is called from SendMessages(). I’ve now removed the additional call to GetTime() as suggested, which required a duration cast.

I believe that this commit is not a clean refactoring. To be precise, change behavior could be expected as even LOCK(cs_main); could take unpredictable time. But the scale of such changes seems insignificant to P2P, right?

Good observation! However, I think it’s fine to treat time as constant within a call to SendMessages(), since it’s observationally equivalent to any actors outside net_processing.

The commit “[net processing] Extract addr send functionality into MaybeSendAddr()” (97e1013) allows to localize acquiring of the CNode::cs_sendProcessing mutex

The fact that you’ve raised this independently has made me reconsider the discussion here #21236 (review). It doesn’t make sense for a mutex to be guarding variables/functions across multiple layers, so I’ve moved the addr send time vars under their own mutex.

Early return skips

0 // we only send the big addr message once 
1 if (node.vAddrToSend.capacity() > 40) { 
2     node.vAddrToSend.shrink_to_fit(); 
3 }

That differs from original behavior.

Another great observation! Although again, I don’t think it matters. This logic is for when we’ve sent a response to the peer’s getaddr message. In such cases, we’d expect to send up to 1000 addresses, and for the address filter to not yet be filled, so we wouldn’t exit early.

jnewbery force-pushed on Mar 29, 2021

hebasto commented at 4:45 pm on March 29, 2021: member

RecursiveMutex cs_sendProcessing guards nothing. Let’s burn it :)

hebasto commented at 4:52 pm on March 29, 2021: member

pico-nit: I find myself more comfortable to reason about the code when the same variable is placed on the same position in similar logical operations. Consider https://github.com/bitcoin/bitcoin/blob/b4d7d3def058e17d36cf7702e34c0a2fff45a5cf/src/net_processing.cpp#L4154-L4155 and https://github.com/bitcoin/bitcoin/blob/b4d7d3def058e17d36cf7702e34c0a2fff45a5cf/src/net_processing.cpp#L4173

Could current_time position be consistent?

jnewbery commented at 5:18 pm on March 29, 2021: member

pico-nit: I find myself more comfortable to reason about the code when the same variable is placed on the same position in similar logical operations.

I tend to agree and think that generally the variable should be placed on the left and the thing it’s being compared to on the right. Others disagreed with me on previous PRs and told me to always place the smaller thing on the left. I think it’s just a matter of taste :shrug:

jnewbery renamed this:
~~Net processing: Extract `addr` send functionality into MaybeSendAddr()~~
net processing: Extract `addr` send functionality into MaybeSendAddr()
on Mar 30, 2021

[net processing] Refactor MaybeSendAddr()

Changes to make MaybeSendAddr simpler and easier to maintain/update:

- assert invariant that node.vAddrToSend.size() can never exceed
  MAX_ADDR_TO_SEND
- erase known addresses from vAddrToSend in one pass
- no check for (vAddr.size() >= MAX_ADDR_TO_SEND) during iteration,
  since vAddr can never exceed MAX_ADDR_TO_SEND.

935d488922

jnewbery force-pushed on Mar 31, 2021

hebasto approved

hebasto commented at 8:07 pm on March 31, 2021: member

ACK 935d4889228e7e361c8b0020761fa0e08a55fb48, I have reviewed the code and it looks OK, I agree it can be merged.

sipa commented at 11:18 pm on March 31, 2021: member

utACK 935d4889228e7e361c8b0020761fa0e08a55fb48

MarcoFalke commented at 6:29 am on April 1, 2021: member

review ACK 935d4889228e7e361c8b0020761fa0e08a55fb48 🐑

Signature:

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3review ACK 935d4889228e7e361c8b0020761fa0e08a55fb48 🐑
 4-----BEGIN PGP SIGNATURE-----
 5
 6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
 7pUh7Hgv/dmVk2jwk66OpRVhWiCUITNoKMBCZFBA8+k63KsptFTEUV0oneUoiMVTm
 8Ftw+YVO60MZdNQjBet3CC3IyTUv098/H2+e9cODLgbRpRMJaLewzevxUy0o6RToQ
 9BzHalKxNnrGEUnJB/6zm4HzXWIPJWppY947BlLEuZ+HM7J8HgAmFVPZWa/VubsrN
10YhY6rBbtHuJKGamsdE/LkNfNEMhP4dIOksl1WteXg2xoSeBHQeHL6nFLKFjJohbj
11Y34ZSnxo0jJFpz+bSVlzelJh5EL7/G+fZzvE5vbOa39+5unZ2mFx70ftvGYgRK6V
12qLfhnyeTMABXjs5bwU3HcjVGqa1ZsApIhY+sLs6lwSQmNAG2UjOxPcl7EDcXnZha
13fV//g/jdU5qTyNQiriWrA0TEtwPm7X1lgJFYBgpfSAfnSIxbLk/uZWkPqFVVbt5P
14Mh2PXN7lIqJ/irxKTBscAXqX4RrIbHXHVv5TaUvS3kI2igSLaKjUhj2Nqoea6UZ3
15XBBbfMvT
16=sVrE
17-----END PGP SIGNATURE-----

Timestamp of file with hash 2166a9ff7592f66162db55121e31cb0376f60455376bea865e5631ad552a2fc6 -

MarcoFalke merged this on Apr 1, 2021

MarcoFalke closed this on Apr 1, 2021

jnewbery deleted the branch on Apr 1, 2021

sidhujag referenced this in commit cee757ebf9 on Apr 1, 2021

luke-jr referenced this in commit 8b426c4de2 on Jun 27, 2021

Fabcien referenced this in commit 8c182804b6 on Jan 28, 2022

Fabcien referenced this in commit ff4da06d8d on Jan 28, 2022

Fabcien referenced this in commit 79cc0dca6f on Jan 28, 2022

Fabcien referenced this in commit 4521689ee9 on Jan 28, 2022

Fabcien referenced this in commit 6865c99033 on Jan 28, 2022

DrahtBot locked this on Aug 16, 2022

net processing: Extract addr send functionality into MaybeSendAddr() #21236

Conflicts

net processing: Extract `addr` send functionality into MaybeSendAddr() #21236