g_peer_map
from a global in net_processing.cpp’s unnamed namespace to being a member m_peer_map
of PeerManager
.
g_peer_map
from a global in net_processing.cpp’s unnamed namespace to being a member m_peer_map
of PeerManager
.
155@@ -156,7 +156,10 @@ static UniValue getpeerinfo(const JSONRPCRequest& request)
156 for (const CNodeStats& stats : vstats) {
157 UniValue obj(UniValue::VOBJ);
158 CNodeStateStats statestats;
159- bool fStateStats = GetNodeStateStats(stats.nodeid, statestats);
160+ bool fStateStats{false};
161+ if (node.peerman) {
node.peerman
. I think an Ensure..()
like EnsureMempool
would be good. It doesn’t really make sense to call getpeerinfo
without a peerman anyway?
125@@ -93,7 +126,14 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
126 */
127 void Misbehaving(const NodeId pnode, const int howmuch, const std::string& message);
128
129+ /** Get statistics from node state */
130+ bool GetNodeStateStats(NodeId nodeid, CNodeStateStats &stats);
0 bool GetNodeStateStats(NodeId nodeid, CNodeStateStats& stats);
nit for new code
175@@ -146,6 +176,16 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
176 CTxMemPool& m_mempool;
177
178 int64_t m_stale_tip_check_time; //!< Next time to check for stale tip
179+
180+ /** Protects m_peer_map */
181+ Mutex m_peer_mutex;
182+ /**
183+ * Map of all Peer objects, keyed by peer id. This map is protected
184+ * by the global m_peer_mutex. Once a shared pointer reference is
322e87b6039877ed3457baea93cc87d78f3f4d96
0 * by the m_peer_mutex. Once a shared pointer reference is
175@@ -146,6 +176,16 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
176 CTxMemPool& m_mempool;
177
178 int64_t m_stale_tip_check_time; //!< Next time to check for stale tip
179+
180+ /** Protects m_peer_map */
110- for (auto& node_stats : stats) {
111- std::get<1>(node_stats) =
112- GetNodeStateStats(std::get<0>(node_stats).nodeid, std::get<2>(node_stats));
113+ if (m_context->peerman) {
114+ TRY_LOCK(::cs_main, lockMain);
115+ if (lockMain) {
m_context->connman
level ? You may save one more control flow.
Does it change something if this check is moved above at the
m_context->connman
level ? You may save one more control flow.
Why increase code block with locked ::cs_main
?
Does it change something if this check is moved above at the m_context->connman level ? You may save one more control flow.
Yes, it’s a try lock. It’s possible that we’re not able to take the lock, and therefore that we’re not able to get the node state stats, but we’d still get the node stats. If we move the try above the m_context->connman level, then if we couldn’t get the cs_main lock, we wouldn’t be able to get any stats.
Eventually, none of the statistics should require cs_main, so the try lock can be removed entirely.
952@@ -953,7 +953,7 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
953 LogPrint(BCLog::NET, "Cleared nodestate for peer=%d\n", nodeid);
954 }
955
956-bool GetNodeStateStats(NodeId nodeid, CNodeStateStats &stats) {
957+bool PeerManager::GetNodeStateStats(NodeId nodeid, CNodeStateStats &stats) {
GetNodeStateStats
which is actually a member of PeerManager
, and still have a CNodeState
, even if inside this function you effectively fetch it. Maybe rename GetNodeProcessingStats
?
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
870@@ -915,8 +871,8 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
871 PeerRef peer = GetPeerRef(nodeid);
872 assert(peer != nullptr);
873 misbehavior = WITH_LOCK(peer->m_misbehavior_mutex, return peer->m_misbehavior_score);
874- LOCK(g_peer_mutex);
875- g_peer_map.erase(nodeid);
876+ LOCK(m_peer_mutex);
Using GetPeerRef()
with an encapsulated lock for m_peer_map
, then manually locking the private mutex 3 lines later looks strange. It also requires two locks… all the drawbacks of mutex encapsulation without exploiting the benefits :)
Since misbehavior changes to the peer will go unobserved after this point anyway, why not just take and remove with a single lock?
Something like (untested):
0PeerRef PeerManager::RemovePeer(NodeId id)
1{
2 PeerRef ret;
3 {
4 LOCK(m_peer_mutex);
5 auto it = m_peer_map.find(id);
6 if (it != m_peer_map.end()) {
7 ret = std::move(it->second);
8 m_peer_map.erase(it);
9 }
10 }
11 return ret;
12}
Yes, that’s much better. Thank you!
I’ve implemented this within GetPeerRef()
to avoid the code duplication, and also added another commit that tidies up FinalizeNode()
a bit more. Let me know what you think.
EDIT: The comment below is wrong. I introduced a different bug when implementing this as Cory points out below.
oops. There’s a bug in the code above which caused this to fail. You can’t move the shared_ptr out of the map and then erase it. More precisely, I think something like the following is happening:
ret = std::move(it->second);
means ret
is reset to be a shared pointer to Peer
. It steals the refcount from it->second
, since std::move(it->second)
is an rvalue. std::move
doesn’t do anything to the map entry.Peer
object.Removing the std::move
fixed this.
This is really uninteresting minutia, but @jnewbery and I happened to get into a discussion/debate about it today so I’m following up with details. Everyone else can safely ignore this as boring.
I don’t believe the comment above is correct. If removing the std::move
helps, I’m wondering if there might be another issue causing problems.
- ret = std::move(it->second); means ret is reset to be a shared pointer to Peer. It steals the refcount from it->second, since std::move(it->second) is an rvalue. std::move doesn’t do anything to the map entry.
Indeed std::move
doesn’t do anything to the map entry as it’s essentially just a cast, but the operator=
does likely (definitely?) reset it.
See libc++’s implementation of shared_ptr’s move assignment operator, which calls the move constructor, where the rhs is null’d.
- the entry is then erased from the map, decrementing the refcount to zero and destructing the Peer object.
We can write a quick test to see that the refcount never hits zero after removal from the map, that copy vs. move refcounts are different as we would expect them to be, and that a shared_ptr moved from a map’s value is still safe to use:
0#include <cassert>
1#include <cstdio>
2#include <map>
3#include <memory>
4#include <string>
5
6static std::shared_ptr<std::string> copytest()
7{
8 std::map<char, std::shared_ptr<std::string> > foo;
9 foo.emplace('c', std::make_shared<std::string>("hello"));
10 auto it = foo.find('c');
11 std::shared_ptr<std::string> ret;
12 if (it != foo.end()) {
13 assert(it->second.use_count() == 1);
14 ret = it->second;
15 assert(it->second != nullptr); // <------------------
16 assert(ret.use_count() == 2); // <------------------
17 foo.erase(it);
18 assert(ret.use_count() == 1);
19 }
20 return ret;
21}
22
23static std::shared_ptr<std::string> movetest()
24{
25 std::map<char, std::shared_ptr<std::string> > foo;
26 foo.emplace('c', std::make_shared<std::string>("world"));
27 auto it = foo.find('c');
28 std::shared_ptr<std::string> ret;
29 if (it != foo.end()) {
30 assert(it->second.use_count() == 1);
31 ret = std::move(it->second);
32 assert(it->second == nullptr); // <------------------
33 assert(ret.use_count() == 1); // <------------------
34 foo.erase(it);
35 assert(ret.use_count() == 1);
36 }
37 return ret;
38}
39
40int main()
41{
42 auto copyptr = copytest();
43 assert(copyptr.use_count() == 1);
44 auto moveptr = movetest();
45 assert(moveptr.use_count() == 1);
46
47 printf("%s %s!\n", copyptr->c_str(), moveptr->c_str());
48}
Note that this doesn’t actually prove that the map itself is still in working order after the move+removal, but I don’t see why that should break anything.
Lastly, we have cases in our code where we move from a map’s value. One example: https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp#L1012
tl;dr: If moving a shared_ptr
out of a mapped value causes a problem that a copy doesn’t, we should make sure to understand why and potentially audit other places where we do it as well.
Aha, I think I found the problem. From your previous version of this commit:
0...
1 if (it != m_peer_map.end()) {
2 ret = std::move(it->second);
3 if (erase) m_peer_map.erase(it);
4 }
5 return ret;
Moving only makes sense when erasing.
863@@ -864,23 +864,18 @@ void PeerManager::ReattemptInitialBroadcast(CScheduler& scheduler) const
864 }
865
866 void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
867- fUpdateConnectionTime = false;
868+ PeerRef peer = RemovePeer(nodeid);
906@@ -909,11 +907,16 @@ void PeerManager::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
907 LogPrint(BCLog::NET, "Cleared nodestate for peer=%d\n", nodeid);
908 }
909
910-PeerRef PeerManager::GetPeerRef(NodeId id)
911+PeerRef PeerManager::GetPeerRef(NodeId id, bool erase)
This is (mostly) a style preference, but I’m not a fan of adding a boolean option to substantially change the functionality of a function as opposed to just creating two functions. GetPeerRef() is logically a read-only operation.
It’s also easier easy to slip up and make mistakes like this one: #19910 (review) ;)
You got me. I’ve made it a separate function.
GetPeerRef() is logically a read-only operation.
Good point. I’ve made it const
(and made the mutex mutable
)
Code review ACK 3bf59172859b484405321d9778da16e41dc049cc.
This is a simple and nice refactor. Verified moved code.
142@@ -143,6 +143,10 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
143 * May return nullptr if the Peer object can't be found. */
144 PeerRef GetPeerRef(NodeId id) const;
145
146+ /** Get a shared pointer to the Peer object and remove it from m_peer_map.
147+ * May return nullptr if the Peer object can't be found. */
841+ PeerRef ret;
842 LOCK(m_peer_mutex);
843 auto it = m_peer_map.find(id);
844- return it != m_peer_map.end() ? it->second : nullptr;
845+ if (it != m_peer_map.end()) ret = it->second;
846+ return ret;
6bfcef3797d4d105180e325033929684f84cfc08
Why are these changes required? What are benefits to return an empty shared pointer instead nullptr
-constructed one? Btw, in all caller places the return value is checked for nullptr
, not being empty.
+1 for considering this!
I’d be curious to know if compilers are allowed to rearrange the code before deciding how many return statements there are. I could see this going either way.
Thanks @hebasto! Yes, I think you’re right. If I’m reading it correctly, the result of the conditional operator is a prvalue (item 5 in https://en.cppreference.com/w/cpp/language/operator_other#Conditional_operator since 4 doesn’t apply), and copy elision is mandatory for returning a prvalue.
I’ve reverted this.
… and copy elision is mandatory for returning a prvalue.
Good to have C++17 :)
Code Review ACK aedd418
Private members » Anon-namespaced statics/functions
Especially when the functionality is strongly intertwined with the class of which it’s being made a member.
789@@ -790,11 +790,9 @@ void PeerManager::FinalizeNode(const CNode& node, bool& fUpdateConnectionTime) {
790 LOCK(cs_main);
791 int misbehavior{0};
792 {
793- PeerRef peer = GetPeerRef(nodeid);
794+ PeerRef peer = RemovePeer(nodeid);
Core review ACK 4fe77ae9aab73d8de6c40d504e079e30eff45f91.
Could squash last commit or fix the comment in 5a1f08ce1f4d49fe85b8571cd5da1b1096339a89.
789@@ -842,11 +790,9 @@ void PeerManager::FinalizeNode(const CNode& node, bool& fUpdateConnectionTime) {
790 LOCK(cs_main);
791 int misbehavior{0};
792 {
793- PeerRef peer = GetPeerRef(nodeid);
794+ PeerRef peer = RemovePeer(nodeid);
795 assert(peer != nullptr);
This is a little awkward because of the comment in RemovePeer:
0/** Get a shared pointer to the Peer object and remove it from m_peer_map.
1 * May return an empty shared_ptr if the Peer object can't be found. */
Might be worth of a comment explaining that at this point the peer is guaranteed to exist, but not afterwards. Then again, that’s kinda what the assert means. Shrug.
This allows us to avoid repeated locking in FinalizeNode()
849+ if (it != m_peer_map.end()) {
850+ ret = std::move(it->second);
851+ m_peer_map.erase(it);
852+ }
853+ return ret;
854+}
Re-ACK 3025ca9
Peer
constructor explicit