[p2p] Add Peer struct for per-peer data in net processing

jnewbery commented at 9:16 am on July 28, 2020: member

We currently have two structures for per-peer data:

CNode in net, which should just contain connection layer data (eg socket, send/recv buffers, etc), but currently also contains some application layer data (eg tx/block inventory).
CNodeState in net processing, which contains p2p application layer data, but requires cs_main to be locked for access.

This PR adds a third struct Peer, which is for p2p application layer data, and doesn’t require cs_main. Eventually all application layer data from CNode should be moved to Peer, and any data that doesn’t strictly require cs_main should be moved from CNodeState to Peer (probably all of CNodeState eventually).

Peer objects are stored as shared pointers in a net processing global map g_peer_map, which is protected by g_peer_mutex. To use a Peer object, g_peer_mutex is locked, a copy of the shared pointer is taken, and the lock is released. Individual members of Peer are protected by different mutexes that guard related data. The lifetime of the Peer object is managed by the shared_ptr refcount.

This PR adds the Peer object and moves the misbehaving data from CNodeState to Peer. This allows us to immediately remove 15 LOCK(cs_main) instances.

For more motivation see #19398

jnewbery commented at 9:24 am on July 28, 2020: member

Thank you @MarcoFalke @promag @jonatack @dongcarl @ariard @theuni @fjahr @hebasto @troygiorshev for reviewing the prerequisite PRs! This is the first PR in the cs_main_split sequence where we start getting benefit by reducing cs_main scope in net processing. It also introduces the new structure and locking model, so it’s the PR that requires the most conceptual/approach review in the sequence. If this gets merged, then it’s a long procession of PRs to slowly move data into the new Peer object.

fanquake added the label P2P on Jul 28, 2020

jnewbery force-pushed on Jul 28, 2020

in src/net_processing.cpp:483 in d60a16bb3f outdated

478+ * TODO: move most members from CNodeState to this structure.
479+ * TODO: move remaining application-layer data members from CNode to this structure.
480+ */
481+struct Peer {
482+    /** Same id as the CNode object for this peer */
483+    NodeId m_id;

MarcoFalke commented at 3:12 pm on July 28, 2020:

0    const NodeId m_id;

in commit d60a16bb3f58fcb9422c277256b3c3a31cab3b2f

Any reason to not make this const? Having this mutable would violate the assumption that the id is unique per peer.

jnewbery commented at 8:06 am on July 29, 2020:

You’re right. This should be const. Fixed

in src/net_processing.cpp:476 in d60a16bb3f outdated

467@@ -468,6 +468,44 @@ static CNodeState *State(NodeId pnode) EXCLUSIVE_LOCKS_REQUIRED(cs_main) {
468     return &it->second;
469 }
470 
471+/**
472+ * Data stucture for an individual peer. This struct is not protected by
473+ * cs_main since it does not contain validation-critical data.
474+ *
475+ * Memory is owned by shared pointers and this object is destructed when
476+ * the refcount drops to zero (in FinalizeNode()).

MarcoFalke commented at 3:13 pm on July 28, 2020:

nit in commit d60a16bb3f58fcb9422c277256b3c3a31cab3b2f

0 * the refcount drops to zero.

FinalizeNode only decreases the refcount by one. It can still be larger than one (as long as someone is holding a PeerRef, e.g. getpeerinfo or similar)

jnewbery commented at 8:06 am on July 29, 2020:

Fixed

in src/net_processing.cpp:506 in d60a16bb3f outdated

501+static PeerRef GetPeer(NodeId id) {
502+    PeerRef ret;
503+    LOCK(g_peer_mutex);
504+    auto it = g_peer_map.find(id);
505+    if (it != g_peer_map.end()) ret = it->second;
506+    return ret;

MarcoFalke commented at 3:17 pm on July 28, 2020:

nit in same commit:

Can be written with one less line ;)

0    if (it != g_peer_map.end()) return it->second;
1    return {}; // ( or return nullptr; )

troygiorshev commented at 4:00 pm on July 28, 2020:

This may be a rare case where a ternary makes things clearer.

0    return (it != g_peer_map.end()) ? it->second : nullptr;

jnewbery commented at 8:06 am on July 29, 2020:

Done with Troy’s ternary. Thanks!

MarcoFalke commented at 3:47 pm on July 28, 2020: member

Approach ACK 3fa130da3ad7960f9ece4d34ff6e108b64b6f957, only some style-nits

DrahtBot commented at 5:04 pm on July 28, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#19791 ([net processing] Move Misbehaving() to PeerManager by jnewbery)
#19723 (Ignore unknown messages before VERACK by sdaftuar)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/net_processing.cpp:284 in 3fa130da3a outdated

269@@ -270,12 +270,6 @@ struct CNodeState {
270     const CService address;
271     //! Whether we have a fully established connection.
272     bool fCurrentlyConnected;
273-    //! Accumulated misbehaviour score for this peer.
274-    int nMisbehavior;
275-    //! Whether this peer should be disconnected and marked as discouraged (unless it has the noban permission).
276-    bool m_should_discourage;
277-    //! String name of this peer (debugging/logging purposes).
278-    const std::string name;

troygiorshev commented at 8:33 pm on July 28, 2020:

ACK on the removal of the unused CNodeState.name, however it’s kinda hidden here and makes the misbehaving moves in 53a2978b5ea27852f7bfab337ccec83feee4e72e harder to follow. Maybe it should be done in its own commit?

jnewbery commented at 8:10 am on July 29, 2020:

It absolutely should. When I wrote the original branch, I moved name to Peer as well, but since then, I removed all uses of name in #19472 and #19583 but failed to remove it as a member. I’ve now separated its removal into a separate commit.

in src/net_processing.cpp:499 in 3fa130da3a outdated

494+Mutex g_peer_mutex;
495+static std::map<NodeId, PeerRef> g_peer_map GUARDED_BY(g_peer_mutex);
496+
497+/** Get a shared pointer to the Peer object.
498+ *  May return nullptr if the Peer object can't be found. */
499+static PeerRef GetPeer(NodeId id) {

troygiorshev commented at 8:37 pm on July 28, 2020:

nit: GetPeer is already in the codebase as CService::GetPeer. Suggest GetPeerRef instead

jnewbery commented at 8:15 am on July 29, 2020:

ok, changed to GetPeerRef()

in src/net_processing.cpp:463 in 3fa130da3a outdated

458@@ -468,6 +459,51 @@ static CNodeState *State(NodeId pnode) EXCLUSIVE_LOCKS_REQUIRED(cs_main) {
459     return &it->second;
460 }
461 
462+/**
463+ * Data stucture for an individual peer. This struct is not protected by

troygiorshev commented at 8:52 pm on July 28, 2020:

s/stucture/structure/

jnewbery commented at 8:15 am on July 29, 2020:

Fixed

troygiorshev commented at 9:00 pm on July 28, 2020: contributor

tACK 3fa130da3ad7960f9ece4d34ff6e108b64b6f957 with some questions and suggestions.

Is the plan to lock every member of Peer behind its own mutex? Why not give each Peer its own mutex instead?
Misbehaving’s cs_main locks can be removed from denialofservice_tests.cpp: around here. I looked and couldn’t find any further places.
clang-format complains a bit on each commit.

in src/net_processing.cpp:501 in d60a16bb3f outdated

492+ * by the global g_peer_mutex. Once a shared pointer reference is
493+ * taken, the lock may be released. Individual fields are protected by
494+ * their own locks.
495+ */
496+Mutex g_peer_mutex;
497+static std::map<NodeId, PeerRef> g_peer_map GUARDED_BY(g_peer_mutex);

MarcoFalke commented at 4:36 am on July 29, 2020:

in commit d60a16b

Is there any reason to make this a global? This map gets populated in PeerLogicValidation::InitializeNode, which is called by connman. PeerLogicValidation keeps a pointer to this connman, so it would seem natural to me that it also keeps track of the peer map, e.g. in a (private) member.

jnewbery commented at 7:54 am on July 29, 2020:

This was mostly just following the convention of net_processing having its data stored in globals. I’ve had a look at what would be involved in moving g_peer_map and GetPeer() to be members of PeerLogicValidation. The main problem is that in future commits I want to be able to call GetPeer() from functions all across net_processing, and many of those aren’t member functions of PLV. That means I’d need to either pass a reference to PLV down to those functions, or make them members of PLV.

Doing that (and moving more of the state currently stored as globals into PLV) seems like a reasonable thing to do, but feels a bit separate from the immediate goal of moving data into Peer so I’m inclined to leave it for later. What do you think?

sdaftuar commented at 6:33 pm on August 6, 2020:

I think that if you agree it makes sense to put it in PLV, then now is probably the best time to do it, unless there’s something that would break if you did? In which case I’d say deferring until we do a bigger PLV cleanup would make sense too.

sdaftuar commented at 6:39 pm on August 6, 2020:

I wonder if the ~CNetProcessing destructor should be cleaning up this global map, if this isn’t moved into PLV? It looks like we don’t clean up mapNodeState there, but that seems like an oversight – unless there’s some good reason I’m missing?

jnewbery commented at 8:39 pm on August 6, 2020:

If I can get buy-in here from everyone else to move a bunch of the static functions from net_processing into PLV, then I’m very happy to move g_peer_map into PLV and move those functions as needed in future PRs.

theuni commented at 2:05 pm on August 7, 2020:

+1 on this. No preference whether it happens here or in a follow-up, but I agree PLV is a better home.

jnewbery commented at 9:46 am on August 12, 2020:

I started trying to do this, but it’s an enormous change. Misbehaving() fetches the PeerRef, and Misbehaving() is called all over net_processing. Let’s defer this change to its own PR (probably a series of PRs since it touches almost all of the functions in net_processing.

jnewbery commented at 10:06 am on August 12, 2020:

I wonder if the ~CNetProcessing destructor should be cleaning up this global map, if this isn’t moved into PLV? It looks like we don’t clean up mapNodeState there, but that seems like an oversight – unless there’s some good reason I’m missing?

All nodes are cleaned up by CConnman during the Shutdown sequence (Shutdown() in init.cpp calls CConnman.StopNodes(), which calls FinalizeNode() for each node). When we come to destruct the PeerLogicValidation object both maps should already be empty.

I could add a PeerLogicValidation dtor which asserts these maps are empty, but that seems a bit orthogonal to this PR (if we introduce a bug that means the nodes are finalized during shutdown, then they’ll exist in both the CNodeState map and the Peer map.

jnewbery commented at 5:18 pm on August 24, 2020:

@MarcoFalke @theuni @sdaftuar - I’ve made a branch that puts peer_map inside PeerLogicValidation here: https://github.com/jnewbery/bitcoin/tree/2020-08-peer-plv2. It builds on top of (merged) #19704 and (open) #19791. Let me know what you think. If that 19791 gets merged first, I’ll switch this PR to use that branch. If not, I’m happy to do the follow-up work to move g_peer_map into PeerLogicValidation later.

jnewbery commented at 5:18 pm on September 7, 2020:

PR: #19910

jnewbery commented at 8:34 am on July 29, 2020: member

Is the plan to lock every member of Peer behind its own mutex? Why not give each Peer its own mutex instead?

Currently the plan is to have related members behind a single mutex, eg all tx relay fields behind the same mutex. The main reason is that that model maps most closely to what exists already (look at the various locks inside CNode). Once everything has moved to Peer then it should be much easier to change the locking model, but I didn’t want to add a debate about what the ideal locking model is to this project.

Misbehaving’s cs_main locks can be removed from denialofservice_tests.cpp: around here. I looked and couldn’t find any further places.

Fixed. Thanks!

clang-format complains a bit on each commit.

Can you expand a bit. What am I supposed to be looking for here?

jnewbery force-pushed on Jul 29, 2020

jnewbery commented at 1:48 pm on July 29, 2020: member

clang-format complains a bit on each commit.

ok, I’ve taken clang-format’s suggestions, except one that joins an initializer list onto the same line as the function and parameter list (which makes a really long line), and one which would break the scripted-diff linter.

jonatack commented at 6:55 am on July 30, 2020: member

@jnewbery it looks like there is an incomplete rebase-in-progress in the last commit 123bba3b03a6b0f9d791758cb5dfc2e5468bf193; will review on next push.

in src/net_processing.cpp:426 in 730ad2ee40 outdated

421@@ -424,9 +422,8 @@ struct CNodeState {
422     //! Whether this peer relays txs via wtxid
423     bool m_wtxid_relay{false};
424 
425-    CNodeState(CAddress addrIn, std::string addrNameIn, bool is_inbound, bool is_manual) :
426-        address(addrIn), name(std::move(addrNameIn)), m_is_inbound(is_inbound),
427-        m_is_manual_connection (is_manual)
428+    CNodeState(CAddress addrIn, bool is_inbound, bool is_manual) :
429+        address(addrIn), m_is_inbound(is_inbound), m_is_manual_connection (is_manual)

MarcoFalke commented at 7:50 am on July 30, 2020:

dumb style nit in commit 730ad2ee4026ab47daf6c35e8dd4d5a665a5442e:

Colon goes on next line to avoid clang-format complaint:

 0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
 1index e64215c7d5..ca36daabf6 100644
 2--- a/src/net_processing.cpp
 3+++ b/src/net_processing.cpp
 4@@ -422,8 +422,8 @@ struct CNodeState {
 5     //! Whether this peer relays txs via wtxid
 6     bool m_wtxid_relay{false};
 7 
 8-    CNodeState(CAddress addrIn, bool is_inbound, bool is_manual) :
 9-        address(addrIn), m_is_inbound(is_inbound), m_is_manual_connection (is_manual)
10+    CNodeState(CAddress addrIn, bool is_inbound, bool is_manual)
11+        : address(addrIn), m_is_inbound(is_inbound), m_is_manual_connection(is_manual)
12     {
13         fCurrentlyConnected = false;
14         nMisbehavior = 0;

jnewbery commented at 11:30 am on July 30, 2020:

ah! Thank you!

jnewbery force-pushed on Jul 30, 2020

jnewbery commented at 8:23 am on July 30, 2020: member

@jonatack oops. Fixed. Thanks!

jnewbery added this to the "Chasing Concept ACK" column in a project

in src/net_processing.cpp:474 in afe3604f5e outdated

469+ * TODO: move most members from CNodeState to this structure.
470+ * TODO: move remaining application-layer data members from CNode to this structure.
471+ */
472+struct Peer {
473+    /** Same id as the CNode object for this peer */
474+    const NodeId m_id;

jonatack commented at 10:12 am on July 30, 2020:

337528b ISTM this type is an int64_t defined in net.h; should we initialize a default here for safety, like for the other members m_misbehavior_score{0} and m_should_discourage{false} added in 36ac41a (edit: similar to net.h:L415 std::atomic<NodeId> nLastNodeId{0};)

jnewbery commented at 11:15 am on July 30, 2020:

Sure. This gets set in the initializer list of the only ctor, but there’s no harm in adding this.

MarcoFalke commented at 11:29 am on July 30, 2020:

It is impossible to not initialize a const member, so I’d prefer to keep it as is, but not strong opinion

in src/net_processing.cpp:910 in afe3604f5e outdated

906@@ -867,6 +907,13 @@ void PeerLogicValidation::ReattemptInitialBroadcast(CScheduler& scheduler) const
907 }
908 
909 void PeerLogicValidation::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
910+    int misbehavior;

jonatack commented at 10:16 am on July 30, 2020:

36ac41a seems safer to initialize misbehavior with a value, even if set ATM a few lines further down

jnewbery commented at 11:26 am on July 30, 2020:

ok. Done.

in src/net_processing.cpp:913 in afe3604f5e outdated

906@@ -867,6 +907,13 @@ void PeerLogicValidation::ReattemptInitialBroadcast(CScheduler& scheduler) const
907 }
908 
909 void PeerLogicValidation::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
910+    int misbehavior;
911+    {
912+        PeerRef peer = GetPeerRef(nodeid);

jonatack commented at 10:22 am on July 30, 2020:

36ac41a should we add an if (peer == nullptr) return false; check here as well

jnewbery commented at 11:27 am on July 30, 2020:

I’ve added an assert, like for CNodeState. The Peer object should always exist when we try to remove the peer.

in src/net_processing.cpp:3805 in afe3604f5e outdated

3753@@ -3720,15 +3754,15 @@ void ProcessMessage(
3754 bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)
3755 {
3756     const NodeId peer_id{pnode.GetId()};
3757+    PeerRef peer = GetPeerRef(peer_id);

jonatack commented at 10:28 am on July 30, 2020:

36ac41a

perhaps this line inside the inner block (beginning with the following line), like the three cases above
should we add an if (peer == nullptr) return false; check here as well?

MarcoFalke commented at 11:30 am on July 30, 2020:

A reference to the node exists (CNode& pnode) so it should be impossible for the peer ref to be deleted already

jnewbery commented at 11:35 am on July 30, 2020:

perhaps this line inside the block like the three cases above

I don’t think this is required. The block is to limit the scope of the lock. Putting the peer variable in there is fine, but I don’t think it’s consistent with project style.

The block in GetNodeStateStats() was left over from a previous iteration of this branch. I’ve removed it. The block in FinalizeNode() is to force the PeerRef object to go out of scope, which will generally cause the underlying Peer object to be destructed, since that’s usually the last reference.

should we add an if (peer == nullptr) return false; check here as well?

Yes we should. Thank you!

in src/net_processing.cpp:883 in afe3604f5e outdated

879+        mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, pnode->fInbound, pnode->m_manual_connection));
880+    }
881+    {
882+        PeerRef peer = std::make_shared<Peer>(nodeid);
883+        LOCK(g_peer_mutex);
884+        g_peer_map.insert(std::make_pair(nodeid, std::move(peer)));

jonatack commented at 10:30 am on July 30, 2020:

337528b not sure here, s/insert/emplace/ to construct rather than copy?

jnewbery commented at 11:21 am on July 30, 2020:

ok, I’ve changed this to emplace_hint since the new peers are always added to the end of the map (same as for CNodeStates in mapNodeState)

jonatack commented at 12:32 pm on July 30, 2020:

Interesting, TIL thanks, and there were only two emplace_hint statements in the codebase, one of which is in the same function. From https://en.cppreference.com/w/cpp/container/map/emplace_hint it looks like a good choice for lower complexity and better perf in this case.

jonatack commented at 10:40 am on July 30, 2020: member

ACK afe3604f5e1a7eae5c38543147f55900569c9908

Some questions below, feel free to ignore.

jnewbery force-pushed on Jul 30, 2020

jnewbery commented at 11:41 am on July 30, 2020: member

Thanks for the review @jonatack and @MarcoFalke . All your latest review comments are addressed.

troygiorshev commented at 12:11 pm on July 30, 2020: contributor

reACK 951093884be4b6e48d2850dfc667c298f85afe9c

laanwj commented at 2:49 pm on July 30, 2020: member

Code review ACK 8e35bf59062b3a823182588e0bf809b3367c2be0 ~~951093884be4b6e48d2850dfc667c298f85afe9c~~ @theuni As you made the CNode and CNodeState separation originally can you take a look here please.

jonatack commented at 3:23 pm on July 30, 2020: member

re-ACK 951093884be4b6e48d2850dfc667c298f85afe9c per git diff afe3604 9510938

fjahr commented at 7:46 pm on July 30, 2020: member

Code review ACK 951093884be4b6e48d2850dfc667c298f85afe9c

Still testing a bit but everything looks good!

jnewbery commented at 8:04 pm on July 30, 2020: member

Note for maintainers: please don’t merge this yet. It has 4 ACKs, but going down this path is quite an important design decision. I’d like @theuni, @sipa or @sdaftuar to weight in before we move forward.

in src/net_processing.cpp:483 in cdba96b9fc outdated

472@@ -479,6 +473,13 @@ struct Peer {
473     /** Same id as the CNode object for this peer */
474     const NodeId m_id{0};
475 
476+    /** Protects misbehavior data members */
477+    Mutex m_misbehavior_mutex;

ariard commented at 11:38 pm on July 30, 2020:

Have you considered std::atomic ? It has API to read/add atomically and it might there fit as we don’t control flow on union of both values. I guess it’s a bit faster than library mutex, but surely not worthy the complexity, maybe for next moved values.

jnewbery commented at 5:12 pm on August 6, 2020:

I think it’s easier to reason about having both misbehavior fields (m_misbehavior_score and m_should_discourage) under the same mutex since they’re so closely related.

in src/net_processing.cpp:480 in cdba96b9fc outdated

472@@ -479,6 +473,13 @@ struct Peer {
473     /** Same id as the CNode object for this peer */
474     const NodeId m_id{0};
475 
476+    /** Protects misbehavior data members */
477+    Mutex m_misbehavior_mutex;
478+    /** Accumulated misbehavior score for this peer */
479+    int nMisbehavior GUARDED_BY(m_misbehavior_mutex){0};
480+    /** Whether this peer should be disconnected and banned (unless it has the noban permission) */

ariard commented at 11:40 pm on July 30, 2020:

Banned or discouraged ? I thought we banned the former to avoid people being discouraged to understand this logic.

jnewbery commented at 5:14 pm on August 6, 2020:

Thanks. This was a bad rebase. Now fixed.

ariard commented at 0:05 am on July 31, 2020: member

Approach ACK 9510938

I really like the approach to gather members by logical usage and progressively swap their cs_main guard by a new per-structure one. I thought at first, it would add new lock dependencies if we moved everything from CNodeState directly into Peer as a quick read of related issue description made it thought but in fact no.

Perhaps #19398 should laid out more the lock refactoring strategy and compare with any concurrent one (wasn’t this discussed on IRC a while ago ?). It may help to take a decision here.

I glanced back on #18963, it adds also a CPeerState, aiming also to dry-up CNodeState but as branch tip was only used to signal end of validation processing back to source peer and resume its message processing. I think this current approach is somehow concurrent to #18963 but not incompatible with its end-goal of introducing asynchronous headers/block processing. Once this refactoring is done, it would be easier to reason on adding a new async “validation” thread, still challenging but easier.

jnewbery force-pushed on Aug 6, 2020

in src/net_processing.cpp:505 in 28837a0a2d outdated

493+Mutex g_peer_mutex;
494+static std::map<NodeId, PeerRef> g_peer_map GUARDED_BY(g_peer_mutex);
495+
496+/** Get a shared pointer to the Peer object.
497+ *  May return nullptr if the Peer object can't be found. */
498+static PeerRef GetPeerRef(NodeId id)

sdaftuar commented at 6:49 pm on August 6, 2020:

So even apart from not using cs_main, there’s another design change happening here in how we use these objects, when compared with CNodeState – no lock on the map is maintained in order to have a shared_ptr to the peer object.

That means that our code needs to be able to handle having the entry in the map erased out from under it, correct? That might be worth mentioning somewhere as a design consideration for future code.

jnewbery commented at 8:41 pm on August 6, 2020:

Yes, that’s possible. Do you think the comment on the map needs expanding? It currently says “Once a shared pointer reference is taken, the lock may be released. Individual fields are protected by their own locks.”

in src/net_processing.cpp:482 in c45a56d2fb outdated

472@@ -479,6 +473,13 @@ struct Peer {
473     /** Same id as the CNode object for this peer */
474     const NodeId m_id{0};
475 
476+    /** Protects misbehavior data members */

sdaftuar commented at 6:51 pm on August 6, 2020:

ACK spelling change

jnewbery commented at 8:50 pm on August 6, 2020:

🇺🇸

in src/net_processing.cpp:914 in c45a56d2fb outdated

906@@ -906,7 +907,11 @@ void PeerLogicValidation::ReattemptInitialBroadcast(CScheduler& scheduler) const
907 }
908 
909 void PeerLogicValidation::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
910+    int misbehavior{0};
911     {
912+        PeerRef peer = GetPeerRef(nodeid);
913+        assert(peer != nullptr);
914+        misbehavior = WITH_LOCK(peer->m_misbehavior_mutex, return peer->nMisbehavior);

sdaftuar commented at 6:56 pm on August 6, 2020:

This is a mouthful; would it be reasonable to add accessor functions to the Peer class that handle the lock acquisitions themselves?

jnewbery commented at 8:58 pm on August 6, 2020:

For functions that access multiple fields under the same lock, we’ll need to hold the lock externally to the struct anyway (see Misbehaving() for example, which accesses both m_misbehavior_score and m_should_discourage while holding m_misbehavior_score). So I could add setters/getters which handle the locking, but the locks would still need to be public and it wouldn’t help with encapsulating Peer.

I like that Peer is a struct with all public members and no internal logic, and I also think the WITH_LOCK macro is pretty nifty so don’t mind this pattern. If others also have a strong distaste for the WITH_LOCK pattern, I can look at adding getter methods.

troygiorshev commented at 10:08 pm on August 6, 2020:

I’m fond of WITH_LOCK. Especially if we’re going the way of many individual locks, it’s much quicker this way to see what’s locking what.

This does give me flashbacks to Java though… Button button = new Button("button");

theuni commented at 2:17 pm on August 7, 2020:

@sdaftuar This was my high-level feedback as well. I strongly prefer encapsulation as it’s really really hard to screw up encapsulated locking, but it’s a burden when multiple operations could otherwise be handled with a single lock. But as this work/PR is intended to break up locking that has grown unwieldy, I can see why @jnewbery went for this approach.

Also, just in terms of evolution of thinking… I have always preferred encapsulated locks because of the security guarantees. But I’m less against exposed mutexes these days due to our annotations and sanitizers.

jnewbery commented at 9:53 am on August 12, 2020:

It seems like there are two different points here:

“This is a mouthful” (@sdaftuar) seems like a stylistic comment. I actually like the fact that the WITH_LOCK macro is explicit about the lock it’s taking.
“I strongly prefer encapsulation as it’s really really hard to screw up encapsulated locking” (@theuni) is a comment about locking design. Cory, correct me if I’m wrong, but I think your distaste here is informed by how cs_main has been (mis)used in the past, where it’s locked the mutex for huge scopes unnecessarily. With lock annotations and fine-grained mutexes, I don’t see a problem with exposing those mutexes outside the class/struct.

theuni commented at 1:32 pm on August 12, 2020:

My primary concern is that 5 years from now someone changes some code that interacts with a Peer and forgets to grab a lock. With encapsulated locking, that can’t happen.

But I’m now satisfied that even without the encapsulation, we have safeguards to prevent that.

sdaftuar approved

sdaftuar commented at 7:05 pm on August 6, 2020: member

Seems fine. Had a few questions but nothing that should be a blocker if you ignore. I’m not strongly wedded to the design decisions here though, it’s not obvious to me that going down this road of having fine-grain, per-peer locks really adds anything, compared to (say) having just one lock for a whole Peer object (which maybe is even held the whole time you’re using it, for simplicity). But nothing here seems broken, and I don’t see deadlock risks or anything for how this is being used now, so this seems okay to me.

jnewbery commented at 8:49 pm on August 6, 2020: member

Thanks for the review @sdaftuar!

I’m not strongly wedded to the design decisions here though, it’s not obvious to me that going down this road of having fine-grain, per-peer locks really adds anything, compared to (say) having just one lock for a whole Peer object (which maybe is even held the whole time you’re using it, for simplicity).

I’m not strongly wedded to it either. My reason for doing it this way is that it most closely reflects the locking design in CNode, so the future PRs that move the remaining data to Peer should be easier to review. Once they’re all in Peer, moving from fine-grained mutexes to coarser-grained mutexes should also be an easy change if it’s desired, but moving the other way requires a bit more thought. So in summary: this is easier and leaves more flexibility later.

in src/net_processing.cpp:914 in c45a56d2fb outdated

906@@ -906,7 +907,11 @@ void PeerLogicValidation::ReattemptInitialBroadcast(CScheduler& scheduler) const
907 }
908 
909 void PeerLogicValidation::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
910+    int misbehavior{0};
911     {
912+        PeerRef peer = GetPeerRef(nodeid);
913+        assert(peer != nullptr);

theuni commented at 2:21 pm on August 7, 2020:

This is mostly a question for future stuff that gets moved in but… What if the refcount is more than 1 here? Do we continue tearing it down?

As a contrived (bad) example: what if someone bumps the misbehavior score after it’s been evaluated here?

jnewbery commented at 10:21 am on August 12, 2020:

We’ll remove the Peer shared ptr from g_peer_map, but we don’t destruct the object until the refcount drops to 0. Your contrived example is fine. There would have to be another thread currently in Misbehaving() holding a shared ptr to the Peer object. We’d remove it from the map, the other thread would increment the misbehaving score, and then the Peer object would be destructed when that thread left the Misbehaving() function.

I’ve taken a look at the future commits in https://github.com/jnewbery/bitcoin/tree/2020-06-cs-main-split and I can’t see any problems with this style of teardown.

theuni commented at 6:51 pm on August 13, 2020:

My point here was that calls to Misbehaving() might go unobserved, because the score may have already been evaluated before the other thread bumped it. This is not possible in with the current behavior because of cs_main: once we’re in FinalizeNode, we’re guaranteed that no other thread is accessing it (unless they’ve cached the CNodeState*, which would be a bug).

Obviously we don’t have any threads doing that currently, so I’m wondering if we want to continue to enforce that invariant with a:

0// This isn't a guarantee but it's close enough
1assert(peer.use_count() <= 1);

I’m not too bothered if there’s a rare misbehaving bump that gets missed. My concern is that Peer’s destructor may eventually gain some functionality, and we’ll want to know if we need to keep track of which thread deletes it.

Edit: Trying my point one more time. As I see it, this PR removes the guarantee that every call into Misbehaving() will be reflected upon evaluation in FinalizeNode(). That’s not a huge deal, but it should at least be noted in case more important guarantees are removed by future moves into Peer.

jnewbery commented at 3:41 pm on August 28, 2020:

Thanks @theuni. This is a very good point. I’ll add a comment about not relying on this being the final state of Peer in a future commit.

in src/net_processing.cpp:917 in 28837a0a2d outdated

905@@ -864,6 +906,10 @@ void PeerLogicValidation::ReattemptInitialBroadcast(CScheduler& scheduler) const
906 }
907 
908 void PeerLogicValidation::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTime) {
909+    {
910+        LOCK(g_peer_mutex);
911+        g_peer_map.erase(nodeid);

theuni commented at 2:33 pm on August 7, 2020:

Should this and mapNodeState.erase(nodeid) happen at the same time, under the same cs_main lock?

What are the consequences of them being out of sync? (Still reachable by State() but not by GetPeerRef())

jnewbery commented at 10:26 am on August 12, 2020:

No bad consequences, but I’ve moved the Peer cleanup underneath the cs_main lock anyway to make this simpler to think about.

theuni commented at 2:35 pm on August 7, 2020: member

Left a few comments/questions. Conceptually I’m nervous about a shared_ptr’s lifetime outliving its home in g_peer_map.

DrahtBot added the label Needs rebase on Aug 12, 2020

[net processing] Remove CNodeState.name

This has been unused since logging peer IPs was removed from
Misbehaving() in a8865f8b.

aba03359a6

jnewbery force-pushed on Aug 12, 2020

jnewbery commented at 9:54 am on August 12, 2020: member

Rebased (https://github.com/bitcoin/bitcoin/compare/8fe28ceb126dd3206f04f426aef8d067f538ec3f..3c56d2385e195d3de587df95f64ce97fc727b1a3)

[net processing] Add Peer

Peer is a struct for holding per-peer data. This structure is not
protected by cs_main since it does not contain validation-critical data.

7cd4159ac8

[net processing] Move misbehavior tracking state to Peer

Misbehavior tracking state is now contained in Peer instead of
CNode. It is no longer guarded by cs_main, but instead by a
dedicated m_misbehavior_mutex lock.

This allows us to remove 14 cs_main locks from net_processing.

1f96d2e673

scripted-diff: rename misbehavior members

-BEGIN VERIFY SCRIPT-
sed -i 's/nMisbehavior/m_misbehavior_score/g' src/net_processing.cpp src/net_processing.h src/rpc/net.cpp src/qt/rpcconsole.cpp
-END VERIFY SCRIPT-

8e35bf5906

jnewbery force-pushed on Aug 12, 2020

jnewbery commented at 10:27 am on August 12, 2020: member

Thanks everyone for review. I’ve rebased and addressed all comments from @sdaftuar and @theuni . This is ready for re-review.

DrahtBot removed the label Needs rebase on Aug 12, 2020

jnewbery commented at 11:04 am on August 13, 2020: member

@troygiorshev @laanwj @fjahr @jonatack this PR is now rebased and updated to address @sdaftuar and @theuni review comments. Do you mind re-ACKing?

troygiorshev commented at 6:06 pm on August 13, 2020: contributor

reACK 8e35bf59062b3a823182588e0bf809b3367c2be0 via git range-diff master 9510938 8e35bf5

Only minor changes.

And, because I don’t think I’ve seen it yet, ACK on the name of g_peer_mutex. I like the mutex suffix better than the cs prefix, leaving the prefix open to only g, m, or nothing.

promag commented at 2:57 pm on August 15, 2020: member

Concept ACK.

theuni commented at 4:52 pm on August 27, 2020: member

ACK 8e35bf59062b3a823182588e0bf809b3367c2be0.

I think my comment here is important for reviewers to keep in mind here and for future work on Peer, but I don’t see any issues introduced as part of this PR.

jonatack commented at 5:35 pm on August 28, 2020: member

ACK 8e35bf59062b3a823182588e0bf809b3367c2be0 keeping in mind Cory’s comment (https://github.com/bitcoin/bitcoin/pull/19607#discussion_r470173964) for the follow-up

sipa commented at 6:16 pm on August 28, 2020: member

Concept ACK.

laanwj merged this on Aug 28, 2020

laanwj closed this on Aug 28, 2020

jnewbery deleted the branch on Aug 28, 2020

jnewbery removed this from the "Chasing Concept ACK" column in a project

jnewbery referenced this in commit 283995bdfa on Aug 28, 2020

sidhujag referenced this in commit 2633b5906b on Aug 28, 2020

jnewbery referenced this in commit 415890fcce on Sep 2, 2020

jnewbery referenced this in commit 3be8b7eb18 on Sep 3, 2020

MarcoFalke referenced this in commit 147d50d63e on Sep 7, 2020

jnewbery referenced this in commit 597a84fd4c on Sep 7, 2020

jnewbery referenced this in commit 98ce880b0c on Sep 8, 2020

jnewbery referenced this in commit 85e7deb9cd on Sep 28, 2020

jnewbery referenced this in commit 36ddcac2af on Sep 29, 2020

jnewbery referenced this in commit d8a2ad43f4 on Oct 19, 2020

jnewbery referenced this in commit e679b8dea9 on Nov 19, 2020

jnewbery referenced this in commit cd6ae81d1c on Dec 14, 2020

jnewbery referenced this in commit 3be7e5584f on Dec 19, 2020

jnewbery referenced this in commit 717a374e74 on Dec 19, 2020

MarcoFalke referenced this in commit df127ecede on Dec 22, 2020

mjdietzx referenced this in commit 65e3729684 on Dec 26, 2020

Fabcien referenced this in commit b6a215da6d on Jan 6, 2021

nolim1t referenced this in commit 9bbf08bf98 on Feb 16, 2021

fanquake referenced this in commit d3fa42c795 on May 24, 2021

fanquake referenced this in commit 9344697e57 on Mar 25, 2022

DrahtBot locked this on Aug 16, 2022

[p2p] Add Peer struct for per-peer data in net processing #19607

Conflicts