CNodeState encapsulation, lock reduction #8147

kazcw commented at 7:00 AM on June 5, 2016: contributor

I wrote this as a reimplementation for #7942 but forgot to actually push it. The merged version of that PR is, as @gmaxwell brought up, prone to lock inversion. It actually has an inversion of cs_main/cs_filter; the conflicting locks are all taken in the same thread, but it's not good practice.

This uses a new mutex for the two Misbehavior()-related fields of CNodeState, and as an additional lock for mapNodeState. No other locks except addrman's (which is fully encapsulated) are taken while cs_misbehavior is held.

Also, tacked on the fix for the redundant variable from #7942 pointed out by @sipa.

sipa commented at 3:42 PM on June 5, 2016: member

This mixed locking for mapNodeState is not particularly easy to reason about.

What about creating a separate map (alongside mapNodeState) for storing misbehaviour information? Longer term, we'll want to see mapNodeState (and the processing-related data in CNode itself) split up into multiple modules (each with their own state) anyway.

MarcoFalke added the label Refactoring on Jun 5, 2016

kazcw force-pushed on Jun 5, 2016

kazcw commented at 9:19 PM on June 5, 2016: contributor

That does sound better. I didn't like protecting the struct with different locks but I was trying to avoid the overhead of a new map. Reimplemented with a CMisbehaviorTracker that encapsulates its lock. TODO before mergeable: add unit tests for the new class.

kazcw force-pushed on Jun 5, 2016

in src/main.cpp:None in 291644ef7b outdated

 300 | +    typedef std::lock_guard<std::mutex> lock_guard;
 301 | +    struct Info {
 302 | +        const std::string name;
 303 | +        int badness;
 304 | +        bool ban;
 305 | +        Info(std::string&& nameIn, int badnessIn, bool banIn) : name(std::move(nameIn)), badness(badnessIn), ban(banIn) {}

sipa commented at 2:52 PM on June 6, 2016:

Is the std::move necessary here, if nameIn is already an rvalue reference?

kazcw commented at 3:54 PM on June 6, 2016:

The Info ctor accepts an rvalue reference and binds it to nameIn, making it an lvalue.

sipa commented at 4:15 PM on June 6, 2016:

Thanks for clearing that up, I'm still learning all the details related to rvalue reference semantics. I'm starting to understand the necessity of all the logic related to forwarding arguments...

in src/main.cpp:None in 291644ef7b outdated

 344 | +    // returns false if node is not registered
 345 | +    bool Increase(NodeId node, int delta)
 346 | +    {
 347 | +        std::string logentry;
 348 | +        {
 349 | +        lock_guard guard{cs};

sipa commented at 3:15 PM on June 6, 2016:

Indentation?

sipa commented at 3:26 PM on June 6, 2016: member

Nice, utACK with 2 nits, but needs rebase.

kazcw force-pushed on Jun 6, 2016

kazcw commented at 4:01 PM on June 6, 2016: contributor

Rebased and fixed indentation.

sipa commented at 4:21 PM on June 6, 2016: member

utACK ae3c24040cfef500daf5c4b890d1cabe2e191d9a

kazcw commented at 2:56 PM on June 10, 2016: contributor

Some things about this design bug me, but I think I found a solution to my API nits that could fit well with future main state encapsulation improvements too.

(TLDR; minor reimplementation coming soon)

This interface is inconsistent about what happens when an unregistered NodeId is referenced; for some functions it's an internal error, for others it's an expected possibility. It works, because some callers know their NodeId is in a registered state -- they have access to the CNode. I'm thinking of putting a NodeInfo* into CNode, that would be an opaque handle as far as net or main code is concerned. A single-domain node info manager like the misbehavior tracker would store its per-node data in a private object within the NodeInfo. This would clear up not-found handling: all node info functions require a handle (which implies that the CNode is live); access through a NodeId requires first trying to obtain a handle (using a global map with its own lock). It also eliminates the need for map lookups in some common cases and the need for a new map for each encapsulated domain.

It's similar to the existing mapNodeState, but has some important differences: the node info is accessible from a CNode with the assurance that the info object exists; access that doesn't require a lookup from a NodeId doesn't require any synchronization outside what the domain-info manager requires (which could be very little for something like the misbehavior tracker); and of course, the encapsulation.

sipa commented at 3:01 PM on June 10, 2016: member

Long term, I think having the CNodeState pointed to by CNode is wrong: the network layer should not depend on the message handlers that are implemented elsewhere (and especially not on main, where they are now).

Ideally, CNode becomes private inside net, and is never exposed. Message handlers get called with the NodeId passed to them, and sending messages in done through a function exposed from net that takes a NodeId.

kazcw commented at 5:10 PM on June 10, 2016: contributor

Codifying expectations about lifetime with the NodeInfo handle can be done orthogonally to CNode encapsulation: If InitializeNode assigns the NodeIds, they could be the NodeInfo handles (typedef shared_ptr<NodeInfo> NodeId). Getting the NodeInfo handle in the message handler would be the same as getting the NodeId in the message handler (currently with CNode::GetId, eventually as a parameter to the message handler). My previous distinction between having a NodeInfo handle and having a NodeId becomes the distinction between having a shared_ptr<NodeInfo> and a weak_ptr<NodeInfo>.

theuni commented at 5:49 PM on June 10, 2016: member

Many of those problems are already solved by #8085, so I'd like to very much request that we get that in before messing with CNode/CNodeState refactors. @sipa that PR works towards exactly what you describe. CNodes are no longer generally accessible except by proxy through CConnman, which also takes care of messages to keep CNodeState in sync without CNode having to know anything about it. In fact, CConnman doesn't know anything about CNodeState, it simply fires off signals when interesting things happen.

kazcw commented at 6:25 PM on June 10, 2016: contributor

@theuni I'm excited to see that p2p refactor land, it's much needed work. I think I can make these changes for more encapsulated, cs_main-less CNodeState data without any nontrivial conflict with that PR. NodeIds are mostly opaque to the net layer and the net layer doesn't know anything about CNodeStates. There are a couple lines of changes for this that would overlap (moving NodeId assignment out of net), so I'll write my changes to be applied after your PR.

theuni commented at 6:33 PM on June 10, 2016: member

@kazcw That sounds great, thanks! Also, maybe I could convince you to review that PR when you've got some time, since some of this is fresh in your mind :)

kazcw renamed this:
~~separate mutex for Misbehavior()~~
CNodeState re-encapsulation: misbehavior without cs_main
on Jun 10, 2016

kazcw force-pushed on Jun 11, 2016

kazcw renamed this:
~~CNodeState re-encapsulation: misbehavior without cs_main~~
WIP: decoupling CNodeState info from cs_main
on Jun 11, 2016

kazcw force-pushed on Jun 11, 2016

kazcw renamed this:
~~WIP: decoupling CNodeState info from cs_main~~
CNodeState encapsulation, lock reduction
on Jun 15, 2016

kazcw force-pushed on Jun 15, 2016

kazcw force-pushed on Jun 16, 2016

Split and encapsulate CNodeState; reduce locking.

Move the CNodeState properties that aren't currently closely tied to cs_main
data into separately-encapsulated domains of a Peer object; each domain handles
its synchronization internally. Move a few main-only CNode fields to Peer.
Reduce main's usage of CNode fields by determining Peer properties based on the
CNode at the time the Peer is created.

Reduce data guarded by cs_main, cs_filter, cs_feeFilter.

Allow punishing (ban/disconnect) nodes that haven't sent versions yet.

8f5734abe5

kazcw force-pushed on Jun 16, 2016

kazcw commented at 7:19 AM on June 17, 2016: contributor

I'm going to break much smaller independent changes out of this for separate merging (with the eventual actual modularization being pure code movement). It seems like separating misbehavior from cs_main properly depends on the first steps of modularization. This PR is superseded by the coming incremental changes.

kazcw closed this on Jun 17, 2016

bitcoin locked this on Sep 8, 2021