net processing: Move peer_map to PeerManager

jnewbery commented at 5:17 pm on September 7, 2020: member

This moves g_peer_map from a global in net_processing.cpp’s unnamed namespace to being a member m_peer_map of PeerManager.

jnewbery commented at 5:18 pm on September 7, 2020: member

Requested by @MarcoFalke @sdaftuar and @theuni . Review very much appreciated :pray:

in src/rpc/net.cpp:160 in e0b0d0c564 outdated

155@@ -156,7 +156,10 @@ static UniValue getpeerinfo(const JSONRPCRequest& request)
156     for (const CNodeStats& stats : vstats) {
157         UniValue obj(UniValue::VOBJ);
158         CNodeStateStats statestats;
159-        bool fStateStats = GetNodeStateStats(stats.nodeid, statestats);
160+        bool fStateStats{false};
161+        if (node.peerman) {

MarcoFalke commented at 5:25 pm on September 7, 2020:

It is currently not possible to not have node.peerman. I think an Ensure..() like EnsureMempool would be good. It doesn’t really make sense to call getpeerinfo without a peerman anyway?

jnewbery commented at 6:08 pm on September 7, 2020:

I’ve done something slightly different and checked existence at the top of this function. Let me know what you think.

in src/net_processing.h:130 in e0b0d0c564 outdated

125@@ -93,7 +126,14 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
126      */
127     void Misbehaving(const NodeId pnode, const int howmuch, const std::string& message);
128 
129+    /** Get statistics from node state */
130+    bool GetNodeStateStats(NodeId nodeid, CNodeStateStats &stats);

MarcoFalke commented at 5:25 pm on September 7, 2020:

0    bool GetNodeStateStats(NodeId nodeid, CNodeStateStats& stats);

nit for new code

jnewbery commented at 6:09 pm on September 7, 2020:

fixed. Thanks!

MarcoFalke commented at 5:31 pm on September 7, 2020: member

Concept ACK

DrahtBot added the label P2P on Sep 7, 2020

DrahtBot added the label RPC/REST/ZMQ on Sep 7, 2020

jnewbery force-pushed on Sep 7, 2020

hebasto commented at 6:34 pm on September 7, 2020: member

Concept ACK.

MarcoFalke removed the label P2P on Sep 7, 2020

MarcoFalke removed the label RPC/REST/ZMQ on Sep 7, 2020

MarcoFalke added the label Refactoring on Sep 7, 2020

practicalswift commented at 7:23 pm on September 7, 2020: contributor

Concept ACK: decoupling is good!

in src/net_processing.h:184 in 322e87b603 outdated

175@@ -146,6 +176,16 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
176     CTxMemPool& m_mempool;
177 
178     int64_t m_stale_tip_check_time; //!< Next time to check for stale tip
179+
180+    /** Protects m_peer_map */
181+    Mutex m_peer_mutex;
182+    /**
183+     * Map of all Peer objects, keyed by peer id. This map is protected
184+     * by the global m_peer_mutex. Once a shared pointer reference is

hebasto commented at 9:36 am on September 8, 2020:

322e87b6039877ed3457baea93cc87d78f3f4d96

0     * by the m_peer_mutex. Once a shared pointer reference is

jnewbery commented at 10:32 am on September 8, 2020:

good catch. Fixed.

in src/net_processing.h:199 in 322e87b603 outdated

175@@ -146,6 +176,16 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
176     CTxMemPool& m_mempool;
177 
178     int64_t m_stale_tip_check_time; //!< Next time to check for stale tip
179+
180+    /** Protects m_peer_map */

hebasto commented at 9:37 am on September 8, 2020:

322e87b6039877ed3457baea93cc87d78f3f4d96 nit: This comment duplicates the code :) Consider to drop it.

jnewbery commented at 10:32 am on September 8, 2020:

Normally I’d agree with removing unnecessary commenting, but I think it’s useful for every member variable to have a doxygen comment. Happy to go either way if others think it should be dropped.

laanwj commented at 11:14 am on September 8, 2020:

I’m for keeping it, it’s better for consistency to have a comment for every one.

hebasto changes_requested

hebasto commented at 9:39 am on September 8, 2020: member

ACK 322e87b6039877ed3457baea93cc87d78f3f4d96, I have reviewed the code and it looks OK.

jnewbery force-pushed on Sep 8, 2020

hebasto approved

hebasto commented at 10:46 am on September 8, 2020: member

re-ACK d350feb2013e8db8a7464ec40670ef609d77e778

in src/interfaces/node.cpp:110 in 6647af9361 outdated

110-                for (auto& node_stats : stats) {
111-                    std::get<1>(node_stats) =
112-                        GetNodeStateStats(std::get<0>(node_stats).nodeid, std::get<2>(node_stats));
113+            if (m_context->peerman) {
114+                TRY_LOCK(::cs_main, lockMain);
115+                if (lockMain) {

ariard commented at 2:50 pm on September 8, 2020:

Does it change something if this check is moved above at the m_context->connman level ? You may save one more control flow.

hebasto commented at 4:10 pm on September 8, 2020:

Does it change something if this check is moved above at the m_context->connman level ? You may save one more control flow.

Why increase code block with locked ::cs_main?

jnewbery commented at 4:27 pm on September 8, 2020:

Does it change something if this check is moved above at the m_context->connman level ? You may save one more control flow.

Yes, it’s a try lock. It’s possible that we’re not able to take the lock, and therefore that we’re not able to get the node state stats, but we’d still get the node stats. If we move the try above the m_context->connman level, then if we couldn’t get the cs_main lock, we wouldn’t be able to get any stats.

Eventually, none of the statistics should require cs_main, so the try lock can be removed entirely.

in src/net_processing.cpp:855 in 6647af9361 outdated

952@@ -953,7 +953,7 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
953     LogPrint(BCLog::NET, "Cleared nodestate for peer=%d\n", nodeid);
954 }
955 
956-bool GetNodeStateStats(NodeId nodeid, CNodeStateStats &stats) {
957+bool PeerManager::GetNodeStateStats(NodeId nodeid, CNodeStateStats &stats) {

ariard commented at 2:52 pm on September 8, 2020:

It’s a bit confusing to have GetNodeStateStats which is actually a member of PeerManager, and still have a CNodeState, even if inside this function you effectively fetch it. Maybe rename GetNodeProcessingStats ?

jnewbery commented at 4:12 pm on September 8, 2020:

I think the name right now is fine. It’s getting stats related to the node’s state.

ariard commented at 2:56 pm on September 8, 2020: member

Concept ACK

promag commented at 8:14 am on September 9, 2020: member

Concept ACK.

DrahtBot commented at 1:46 pm on September 19, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#20295 (rpc: getblockfrompeer by Sjors)
#20217 (net: Remove g_relay_txes by jnewbery)
#19858 (Periodically make block-relay connections and sync headers by sdaftuar)
#19829 (net processing: Move block inventory state to net_processing by jnewbery)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/net_processing.cpp:874 in d350feb201 outdated

870@@ -915,8 +871,8 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
871         PeerRef peer = GetPeerRef(nodeid);
872         assert(peer != nullptr);
873         misbehavior = WITH_LOCK(peer->m_misbehavior_mutex, return peer->m_misbehavior_score);
874-        LOCK(g_peer_mutex);
875-        g_peer_map.erase(nodeid);
876+        LOCK(m_peer_mutex);

theuni commented at 8:19 pm on September 22, 2020:

Using GetPeerRef() with an encapsulated lock for m_peer_map, then manually locking the private mutex 3 lines later looks strange. It also requires two locks… all the drawbacks of mutex encapsulation without exploiting the benefits :)

Since misbehavior changes to the peer will go unobserved after this point anyway, why not just take and remove with a single lock?

Something like (untested):

 0PeerRef PeerManager::RemovePeer(NodeId id)
 1{
 2    PeerRef ret;
 3    {
 4        LOCK(m_peer_mutex);
 5        auto it = m_peer_map.find(id);
 6        if (it != m_peer_map.end()) {
 7            ret = std::move(it->second);
 8            m_peer_map.erase(it);
 9        }
10    }
11    return ret;
12}

jnewbery commented at 9:22 am on September 24, 2020:

Yes, that’s much better. Thank you!

I’ve implemented this within GetPeerRef() to avoid the code duplication, and also added another commit that tidies up FinalizeNode() a bit more. Let me know what you think.

jnewbery commented at 10:50 am on September 24, 2020:

EDIT: The comment below is wrong. I introduced a different bug when implementing this as Cory points out below.

~~oops. There’s a bug in the code above which caused this to fail. You can’t move the shared_ptr out of the map and then erase it. More precisely, I think something like the following is happening:~~

ret = std::move(it->second); means ret is reset to be a shared pointer to Peer. It steals the refcount from it->second, since std::move(it->second) is an rvalue. std::move doesn’t do anything to the map entry.
~~the entry is then erased from the map, decrementing the refcount to zero and destructing the Peer object.~~

~~Removing the std::move fixed this.~~

theuni commented at 5:12 pm on October 5, 2020:

This is really uninteresting minutia, but @jnewbery and I happened to get into a discussion/debate about it today so I’m following up with details. Everyone else can safely ignore this as boring.

I don’t believe the comment above is correct. If removing the std::move helps, I’m wondering if there might be another issue causing problems.

ret = std::move(it->second); means ret is reset to be a shared pointer to Peer. It steals the refcount from it->second, since std::move(it->second) is an rvalue. std::move doesn’t do anything to the map entry.

Indeed std::move doesn’t do anything to the map entry as it’s essentially just a cast, but the operator= does likely (definitely?) reset it. See libc++’s implementation of shared_ptr’s move assignment operator, which calls the move constructor, where the rhs is null’d.

the entry is then erased from the map, decrementing the refcount to zero and destructing the Peer object.

We can write a quick test to see that the refcount never hits zero after removal from the map, that copy vs. move refcounts are different as we would expect them to be, and that a shared_ptr moved from a map’s value is still safe to use:

 0#include <cassert>
 1#include <cstdio>
 2#include <map>
 3#include <memory>
 4#include <string>
 5
 6static std::shared_ptr<std::string> copytest()
 7{
 8    std::map<char, std::shared_ptr<std::string> > foo;
 9    foo.emplace('c', std::make_shared<std::string>("hello"));
10    auto it = foo.find('c');
11    std::shared_ptr<std::string> ret;
12    if (it != foo.end()) {
13        assert(it->second.use_count() == 1);
14        ret = it->second;
15        assert(it->second != nullptr); // <------------------
16        assert(ret.use_count() == 2);  // <------------------
17        foo.erase(it);
18        assert(ret.use_count() == 1);
19    }
20    return ret;
21}
22
23static std::shared_ptr<std::string> movetest()
24{
25    std::map<char, std::shared_ptr<std::string> > foo;
26    foo.emplace('c', std::make_shared<std::string>("world"));
27    auto it = foo.find('c');
28    std::shared_ptr<std::string> ret;
29    if (it != foo.end()) {
30        assert(it->second.use_count() == 1);
31        ret = std::move(it->second);
32        assert(it->second == nullptr); // <------------------
33        assert(ret.use_count() == 1);  // <------------------
34        foo.erase(it);
35        assert(ret.use_count() == 1);
36    }
37    return ret;
38}
39
40int main()
41{
42    auto copyptr = copytest();
43    assert(copyptr.use_count() == 1);
44    auto moveptr = movetest();
45    assert(moveptr.use_count() == 1);
46
47    printf("%s %s!\n", copyptr->c_str(), moveptr->c_str());
48}

Note that this doesn’t actually prove that the map itself is still in working order after the move+removal, but I don’t see why that should break anything.

Lastly, we have cases in our code where we move from a map’s value. One example: https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp#L1012

tl;dr: If moving a shared_ptr out of a mapped value causes a problem that a copy doesn’t, we should make sure to understand why and potentially audit other places where we do it as well.

theuni commented at 5:20 pm on October 5, 2020:

Aha, I think I found the problem. From your previous version of this commit:

0...
1    if (it != m_peer_map.end()) {
2        ret = std::move(it->second);
3        if (erase) m_peer_map.erase(it);
4    }
5    return ret;

Moving only makes sense when erasing.

jnewbery commented at 9:42 am on October 8, 2020:

Indeed, the failure was caused by that. Sorry for the noise (and for casting aspersions on your code!)

jnewbery force-pushed on Sep 24, 2020

in src/net_processing.cpp:867 in 658ca700ff outdated

863@@ -864,23 +864,18 @@ void PeerManager::ReattemptInitialBroadcast(CScheduler& scheduler) const
864 }
865 
866 void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
867-    fUpdateConnectionTime = false;
868+    PeerRef peer = RemovePeer(nodeid);

theuni commented at 7:56 pm on September 24, 2020:

This undoes this assumption: #19607 (review) , which I’m still not convinced is a good idea.

jnewbery commented at 8:01 am on September 25, 2020:

ok, removed that commit

jnewbery force-pushed on Sep 25, 2020

jnewbery force-pushed on Sep 29, 2020

jnewbery added the label P2P on Oct 1, 2020

in src/net_processing.cpp:910 in 2ce2ec188a outdated

906@@ -909,11 +907,16 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
907     LogPrint(BCLog::NET, "Cleared nodestate for peer=%d\n", nodeid);
908 }
909 
910-PeerRef PeerManager::GetPeerRef(NodeId id)
911+PeerRef PeerManager::GetPeerRef(NodeId id, bool erase)

theuni commented at 6:58 pm on October 5, 2020:

This is (mostly) a style preference, but I’m not a fan of adding a boolean option to substantially change the functionality of a function as opposed to just creating two functions. GetPeerRef() is logically a read-only operation.

It’s also easier easy to slip up and make mistakes like this one: #19910 (review) ;)

jnewbery commented at 10:13 am on October 8, 2020:

You got me. I’ve made it a separate function.

GetPeerRef() is logically a read-only operation.

Good point. I’ve made it const (and made the mutex mutable)

jnewbery force-pushed on Oct 8, 2020

jnewbery commented at 10:13 am on October 8, 2020: member

Thanks for the review @theuni. I’ve addressed your comment.

DrahtBot added the label Needs rebase on Oct 19, 2020

jnewbery force-pushed on Oct 19, 2020

jnewbery commented at 8:10 am on October 19, 2020: member

Rebased

DrahtBot removed the label Needs rebase on Oct 19, 2020

jnewbery commented at 11:14 am on November 19, 2020: member

Rebased

jnewbery force-pushed on Nov 19, 2020

laanwj added this to the "Blockers" column in a project

promag commented at 0:46 am on November 30, 2020: member

Code review ACK 3bf59172859b484405321d9778da16e41dc049cc.

This is a simple and nice refactor. Verified moved code.

DrahtBot added the label Needs rebase on Dec 1, 2020

jnewbery force-pushed on Dec 1, 2020

jnewbery commented at 11:54 am on December 1, 2020: member

rebased

DrahtBot removed the label Needs rebase on Dec 1, 2020

DrahtBot added the label Needs rebase on Dec 2, 2020

in src/net_processing.h:147 in aedd418fed outdated

142@@ -143,6 +143,10 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
143      *  May return nullptr if the Peer object can't be found. */
144     PeerRef GetPeerRef(NodeId id) const;
145 
146+    /** Get a shared pointer to the Peer object and remove it from m_peer_map.
147+     *  May return nullptr if the Peer object can't be found. */

hebasto commented at 10:35 am on December 3, 2020:

aedd418fed7abfd00cc731bd57ed5b013259eb5a

To be precise, it may return an empty shared pointer rather nullptr.

jnewbery commented at 11:53 am on December 4, 2020:

fixed

in src/net_processing.cpp:842 in 6bfcef3797 outdated

841+    PeerRef ret;
842     LOCK(m_peer_mutex);
843     auto it = m_peer_map.find(id);
844-    return it != m_peer_map.end() ? it->second : nullptr;
845+    if (it != m_peer_map.end()) ret = it->second;
846+    return ret;

hebasto commented at 10:41 am on December 3, 2020:

6bfcef3797d4d105180e325033929684f84cfc08

Why are these changes required? What are benefits to return an empty shared pointer instead nullptr-constructed one? Btw, in all caller places the return value is checked for nullptr, not being empty.

jnewbery commented at 11:54 am on December 4, 2020:

This makes is clear that NRVO can be used, since there’s only one return statement. I’m not sure if the previous version would have used copy elision.

theuni commented at 2:54 pm on December 4, 2020:

+1 for considering this!

I’d be curious to know if compilers are allowed to rearrange the code before deciding how many return statements there are. I could see this going either way.

hebasto commented at 10:08 am on December 5, 2020:

Considering the before-change code, to be precise, in C++17 in a return statement, when the operand is a prvalue (the result of the ternary conditional expression), elision of copy/move operations is mandatory.

OTOH, NRVO is non-mandatory.

jnewbery commented at 12:02 pm on December 7, 2020:

Thanks @hebasto! Yes, I think you’re right. If I’m reading it correctly, the result of the conditional operator is a prvalue (item 5 in https://en.cppreference.com/w/cpp/language/operator_other#Conditional_operator since 4 doesn’t apply), and copy elision is mandatory for returning a prvalue.

I’ve reverted this.

hebasto commented at 12:05 pm on December 7, 2020:

… and copy elision is mandatory for returning a prvalue.

Good to have C++17 :)

hebasto approved

hebasto commented at 10:41 am on December 3, 2020: member

re-ACK aedd418fed7abfd00cc731bd57ed5b013259eb5a, added two last commits since my previous review.

Still required rebasing due to the conflict with #20530.

dongcarl commented at 4:39 pm on December 3, 2020: member

Code Review ACK aedd418

Private members » Anon-namespaced statics/functions

Especially when the functionality is strongly intertwined with the class of which it’s being made a member.

[net processing] Move GetNodeStateStats into PeerManager a529fd3e3f

jnewbery force-pushed on Dec 4, 2020

jnewbery commented at 11:55 am on December 4, 2020: member

Rebased and addressed outstanding review comments.

in src/net_processing.cpp:793 in 5a1f08ce1f outdated

789@@ -790,11 +790,9 @@ void PeerManager::FinalizeNode(const CNode& node, bool& fUpdateConnectionTime) {
790     LOCK(cs_main);
791     int misbehavior{0};
792     {
793-        PeerRef peer = GetPeerRef(nodeid);
794+        PeerRef peer = RemovePeer(nodeid);

promag commented at 12:00 pm on December 4, 2020:

This also avoids duplicate lookup on m_peer_map :+1:

promag commented at 12:02 pm on December 4, 2020: member

Core review ACK 4fe77ae9aab73d8de6c40d504e079e30eff45f91.

Could squash last commit or fix the comment in 5a1f08ce1f4d49fe85b8571cd5da1b1096339a89.

DrahtBot removed the label Needs rebase on Dec 4, 2020

in src/net_processing.cpp:794 in 4fe77ae9aa outdated

789@@ -842,11 +790,9 @@ void PeerManager::FinalizeNode(const CNode& node, bool& fUpdateConnectionTime) {
790     LOCK(cs_main);
791     int misbehavior{0};
792     {
793-        PeerRef peer = GetPeerRef(nodeid);
794+        PeerRef peer = RemovePeer(nodeid);
795         assert(peer != nullptr);

theuni commented at 2:31 pm on December 4, 2020:

This is a little awkward because of the comment in RemovePeer:

0/** Get a shared pointer to the Peer object and remove it from m_peer_map.
1 *  May return an empty shared_ptr if the Peer object can't be found. */

Might be worth of a comment explaining that at this point the peer is guaranteed to exist, but not afterwards. Then again, that’s kinda what the assert means. Shrug.

jnewbery commented at 12:11 pm on December 7, 2020:

Yeah, I think it’s ok for now. Happy to change this if you decide you feel more strongly about it.

rebroad commented at 8:00 am on September 15, 2021:

I’m getting bitcoind crashing due to this assert. It appears that sometimes FinalizeNode gets called twice (from DeleteNode()). Perhaps nRefCount is being increased between the two occurances. This only seems to happen though when there’s a delay between them due to UpdateTip(). It’s probably a bug I’ve introduced somewhere, but just leaving this comment as a “heads up”.

jnewbery commented at 9:54 am on September 15, 2021:

Thanks. I expect if the bug was present in master we’d have seen it in CI or from a user report.

theuni approved

theuni commented at 2:33 pm on December 4, 2020: member

ACK 4fe77ae9aab73d8de6c40d504e079e30eff45f91 or squashed as @promag suggested.

hebasto approved

hebasto commented at 9:50 am on December 5, 2020: member

re-ACK 4fe77ae9aab73d8de6c40d504e079e30eff45f91, only rebased since my previous review and added a commit to fix comments.

[net_processing] Move peer_map to PeerManager ed7e469cee

[net processing] Make GetPeerRef const a20ab22786

[net processing] Add RemovePeer()

This allows us to avoid repeated locking in FinalizeNode()

3025ca9e77

jnewbery force-pushed on Dec 7, 2020

jnewbery commented at 12:12 pm on December 7, 2020: member

Thanks for the review @hebasto @promag @theuni and @dongcarl. I’ve addressed all comments, so this should now be ready for re-review.

in src/net_processing.cpp:853 in 3025ca9e77

849+    if (it != m_peer_map.end()) {
850+        ret = std::move(it->second);
851+        m_peer_map.erase(it);
852+    }
853+    return ret;
854+}

hebasto commented at 12:15 pm on December 7, 2020:

3025ca9e7743d9b96c22e9c6ed7ef051dcea7e54 I’m curios is it possible to force the mandatory copy/move elision for the return value here as well?

hebasto approved

hebasto commented at 1:26 pm on December 7, 2020: member

re-ACK 3025ca9e7743d9b96c22e9c6ed7ef051dcea7e54, since my previous review only reverted the change that introduced NRVO in PeerManager::GetPeerRef, and comments are fixed in the proper commits.

theuni commented at 7:23 pm on December 8, 2020: member

Re-ACK 3025ca9e7743d9b96c22e9c6ed7ef051dcea7e54.

dongcarl commented at 1:11 pm on December 9, 2020: member

Re-ACK 3025ca9

Disambiguated between nullptr and empty shared_ptr
Reverted refactoring in PeerManager::GetPeerRef
Made Peer constructor explicit

fanquake merged this on Dec 9, 2020

fanquake closed this on Dec 9, 2020

sidhujag referenced this in commit c27878a62e on Dec 9, 2020

laanwj removed this from the "Blockers" column in a project

jnewbery deleted the branch on Sep 15, 2021

fanquake locked this on Sep 15, 2021

net processing: Move peer_map to PeerManager #19910

Conflicts