[no merge, meta] refactor: net/net processing split #28252

pull dergoegge wants to merge 65 commits into bitcoin:master from dergoegge:2023-08-conn-interface changing 36 files +3095 −1767
  1. dergoegge commented at 4:30 pm on August 10, 2023: member

    This PR is supposed to provide context for some of the refactoring PRs I’ve been working on (#25268, #26621, etc).

    The aim is to completely decouple CConnman and its internals from PeerManager to allow for isolated testing of our message processing code (isolated from net only as isolating from validation is another can of worms). To get there, this work refactors CConnman’s API to use NodeId as the primary handle for managing connections and defines a new interface ConnectionsInterface that describes the interface between PeerManager and CConnman. The result is that CNode is no longer part of this interface and access to it is no longer needed within net processing.

    Most of the API simply shifts from CConnman::DoX(CNode*) to CConnman::DoX(NodeId) and where possible the PR minimizes the need for new methods on CConnman to avoid the slight overhead of (internally) locking m_nodes_mutex to look up the right CNode by its id.

    Some steps this PR is taking:

    • We can’t test PeerManager in isolation if it is not the owner of its own state. Therefore this PR moves all the remaining application layer data from net to net processing.
    • Some use of CNode in net processing only accesses constant data that is available from the time of connection establishment. This PR introduces ConnectionContext which is meant to hold all this constant connection data and for each connection a copy is stored on the corresponding Peer instance. This reduces the need for accessing CNode in net processing and avoids some For{Each}Node calls (which are a horrible layer violation).
    • Some use of CNode in net processing directly modifies eviction related members of CNode. Instead of wrapping access to these members in CConnman methods, this PR creates a new manager (EvictionManager) that encapsulates the eviction logic as well as the required per-peer data (entirely removing the relevant CNode members).
    • To disconnect a peer , PeerManager directly sets the disconnection flag CNode::fDisconnect to true. This PR wraps access to that flag on CConnman, i.e. CConnman::IsDisconnected & CConnman::DisconnectNode(NodeId).

    Finally, the PR also introduces initial unit tests for PeerManager that make use of a mocked ConnectionsInterface instead of the actual CConnman.

    Looking for conceptual & approach review.


    This is not super polished so I would expect the CI to fail but I will break this up and polish if there is enough conceptual support. I will not keep this rebased as the general architectural direction this is taking should be clear in any case.

  2. [test] Make all denial of service tests use their own connman/peerman e86cd1f977
  3. [eviction] Create EvictionManager stub 981617d338
  4. [eviction] Make EvictionManager keep track of candidates 6de4364e8e
  5. [net] Get candidates from eviction manager
    We temporarily introduce a method to get a specific
    NodeEvictionCandidate from the eviction manager for use in
    `CConnman::AttemptToEvictConnection()`. `EvictionManager::GetCandidate`
    is removed once we are done moving the remaining eviction state from
    `CNode` to the eviction manager.
    bc7e7fc60a
  6. [net] Remove unused CNode state
    Now that the eviction manager is aware of the keyed net group and
    whether or not a peer is prefered for eviction, we can get rid of the
    corresponding CNode members, since they are not used elsewhere.
    7f7d02cdb6
  7. [eviction] Move ping/block/tx times from CNode to EvictionManager 157afa0b00
  8. [eviction] Report services to EvictionManager 5a3ecac7c0
  9. [eviction] Move BIP37 tx relay and bloom filter status to EvictionManager 98d484706e
  10. [eviction] Move SelectNodeToEvict into EvictionManager 3bdaabd27d
  11. [doc] Document relevant initial NodeEvictionCandidate values 6b53d0a2f5
  12. [fuzz] Add EvictionManager target 2977ca3780
  13. [eviction] Report number of blocks in flight to EvictionManager 3a459904ab
  14. [eviction] Report block announce time to EvictionManager c9ebc0e51c
  15. [eviction] Report nodes protected from slow chains to EvictionManager d006913263
  16. [eviction] Report successful connections to EvictionManager 549a56f471
  17. [eviction] Pass max outbound full/block relay connection counts to EvictionManager a583f630e9
  18. [net processing] Move outbound eviction logic to EvictionManager e53cfc985b
  19. [refactor] Cleanup EvictionManager::SelectOutboundNodesToEvict db72115eea
  20. [refactor] Split internals of EvictionManager::SelectOutboundNodesToEvict into two functions 221f5aa429
  21. scripted-diff: Rename SelectNodeToEvict for clarification
    SelectNodeToEvict only suggests inbound nodes to evict.
    
    -BEGIN VERIFY SCRIPT-
    ren() { sed -i "s:\<$1\>:$2:g" $(git grep -l "\<$1\>" ./src ./test); }
    
    ren SelectNodeToEvict SelectInboundNodeToEvict
    
    -END VERIFY SCRIPT-
    314b533ceb
  22. --- eviction stats --- 458fb9ade0
  23. scripted-diff: rename CNodeStateStats
    Most of the stats in CNodeStateStats are now taken from the Peer object.
    Rename the struct to PeerStats.
    
    -BEGIN VERIFY SCRIPT-
    ren() { sed -i "s:\<$1\>:$2:g" $(git grep -l "\<$1\>" ./src ./test); }
    
    ren CNodeStateStats          PeerStats
    ren GetNodeStateStats        GetPeerStats
    ren nodeStateStats           m_peer_stats
    ren statestats               peer_stats
    ren fNodeStateStatsAvailable m_peer_stats_available
    ren fStateStats              peer_stats_available
    
    -END VERIFY SCRIPT-
    a210244ca9
  24. [net processing] Move cleanSubVer to net_processing
    Also rename to m_clean_subversion.
    
    Since the subversion is now contained in the Peer object, it may not be
    available in `getpeerinfo` while the peer connection is being torn down.
    Update the rpc help text to mark the field optional and update the tests
    to tolerate the field being absent.
    5c16c704b0
  25. [net processing] Remove CNode::nLocalHostNonce
    Handle version nonces entirely within net_processing.
    1be89db847
  26. [net processing] Move m_bip152_highbandwidth_{to,from} to net_processing ee92e0ddfa
  27. [net processing] Move nTimeOffset to net_processing 69d9d7511a
  28. [net processing] Move nVersion to net_processing ae5b4a1914
  29. [net processing] Move m_greatest_common_version to net_processing ee06a1d1cb
  30. [net processing] Move CNode::addrLocal to Peer 605e41b303
  31. --- move app-data to net proc --- 02c48eadea
  32. [net] Introduce ConnectionContext cc46840a16
  33. [net] CNode holds ConnectionContext 3c7b8e247b
  34. --- connection context --- 128b9fa406
  35. [net processing] Store copy of ConnectionContext on Peer 2dcead90b3
  36. [net processing] Pass ConnectionContext to AddTxAnnouncement 2d121e273b
  37. [net processing] Pass ConnectionContext to RejectIncomingTxs edca8a4efb
  38. [net processing] Don't pass CNode to SetupAddressRelay 6989fac39b
  39. --- connection context (trivial cases net processing) --- 8f4cc0ada0
  40. [net] Convert m_nodes to std::unordered_map
    For quick lookups by id.
    
    In tests it is now necessary to use ConnmanTestMsg with AddTestNode and
    ClearTestNodes, to make sure map lookups succeed.
    7a2791246d
  41. [net] Use NodeId map lookups instead of looping afd45141fa
  42. --- m_nodes as map --- abd5f1c86f
  43. [net processing] Use CConnman::DisconnectNode over flipping flag 6107379ce4
  44. [net processing] Remove unnecessary fDisconnect checks
    For{Each}Node already checks this internally.
    2cf04c10a6
  45. [net] Introduce IsDisconnected(id) b446a60f19
  46. [net processing] Use CConnman::IsDisconnected over fDisconnect 85527f5335
  47. --- disconnection --- 2a63a42e9d
  48. [net] PushMessage by id d6df523486
  49. [net processing] Pass fSuccessfullyConnected to FinalizeNode e4e5b27302
  50. [net] Wrap CNode::fSuccessfullyConnected on CConnman 323c9f6c71
  51. [net] Wrap CNode::fPauseSend on CConnman eb50739619
  52. [net] Wrap CNode::PollMessage on CConnman a252c05938
  53. [net] Wrap CNode::PongReceived on CConnman c163c6faec
  54. [net] Pass ConnectionContext to CConnman::ShouldRunInactivityCheck ce44b312a1
  55. [net] Pass ConnectionContext to GetLocalAddrForPeer d411c161c1
  56. [net processing] Reduce CNode usage 053b7329b5
  57. [net processing] Don't use For{Each}Node e8e9fa72f4
  58. [net] Delete unused For{Each}Node 2f62ff9435
  59. [net] Remove CNode from NetEventsInterface 3404d46ee8
  60. --- CConnman API -> NodeId --- cde767f5f1
  61. [net] Define ConnectionsInterface 64e40e9f15
  62. [net processing] PeerManager uses ConnectionsInterface d9b7541a25
  63. --- connections interface --- 5f4c73c1fc
  64. [test] Add MockedConnectionsInterface cde425c717
  65. [test] Add initial unit tests (version handshake, ping/pong) for PeerManager d9ca0ecdd3
  66. --- initial unit tests --- f6508f9563
  67. DrahtBot commented at 4:30 pm on August 10, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK theuni

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  68. DrahtBot added the label Needs rebase on Aug 10, 2023
  69. DrahtBot commented at 5:32 pm on August 10, 2023: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  70. ajtowns commented at 2:07 am on August 11, 2023: contributor

    Most of the API simply shifts from CConnman::DoX(CNode*) to CConnman::DoX(NodeId)

    That seems undesirable to me? Where possible, acting directly on the node via a pointer/reference is much better than taking a global lock, doing a lookup, and then acting.

    • We can’t test PeerManager in isolation if it is not the owner of its own state.

    I think it’s a mistake trying to have many different owners of “per-node” state – that increases the lookups, locks and coordination required when adding/removing nodes. I think a simpler approach would be to add a unique_ptr<Peer> (and perhaps likewise for other per-peer data) to CNode, which is initialized via NetEventsInterface::InitializeNode. If you want to test PeerManager without a CConnman, it then becomes the test framework’s job to set up a bunch of CNode objects and initialize them. In that world, the Peer struct is just the processing-related data members of CNode, and largely not something that should be passed around independently (let alone in addition to) the CNode its embedded in.

  71. dergoegge commented at 10:54 am on August 11, 2023: member

    @ajtowns Thanks for the quick feedback! I think we’ll mostly disagree here but I appreciate your opinion.

    Most of the API simply shifts from CConnman::DoX(CNode*) to CConnman::DoX(NodeId)

    That seems undesirable to me? Where possible, acting directly on the node via a pointer/reference is much better than taking a global lock, doing a lookup, and then acting.

    While it is obviously true that directly accessing the CNode pointers is more performant (no locking or look ups), accessing internals of other components like this leads to spaghetti code and the performance win of micro or nano seconds is just not worth it.

    I think a simpler approach would be to add a unique_ptr<Peer> (and perhaps likewise for other per-peer data) to CNode, which is initialized via NetEventsInterface::InitializeNode.

    This sounds like doing a complete 180 on the work that happened over multiple years to decouple net and net processing. I’m not re-inventing the wheel in this PR, this general direction has been what other contributors have had in mind and worked on for multiple years. I’d prefer finishing up previous work and making things consistent instead of going in an entirely new direction.

  72. ajtowns commented at 11:23 am on August 11, 2023: contributor

    While it is obviously true that directly accessing the CNode pointers is more performant (no locking or look ups), accessing internals of other components like this leads to spaghetti code and the performance win of micro or nano seconds is just not worth it.

    If we want to have multiple threads processing different peers, then avoiding (or at least minimising) having to go through shared data structures will be necessary. Keeping our data structures in sync is already challenging, making it more complicated and having more things to keep coordinated is a bad thing.

    I think a simpler approach would be to add a unique_ptr<Peer> (and perhaps likewise for other per-peer data) to CNode, which is initialized via NetEventsInterface::InitializeNode.

    This sounds like doing a complete 180 on the work that happened over multiple years to decouple net and net processing.

    If you’re going in the wrong direction, doing a 180 is a good thing, and going even further in the same direction is a bad thing.

  73. dergoegge commented at 12:09 pm on August 11, 2023: member

    then avoiding (or at least minimising) having to go through shared data structures will be necessary

    I could agree on s/necessary/more performant, but with the shared data structure design it would really only cause contention for the look ups which is really not that bad. i.e.

    0// in CConnman::DoX(NodeId)
    1  {
    2    LOCK(m_nodes_mutex);
    3    // do lookup
    4  }
    5
    6  // do work on the CNode
    

    Keeping our data structures in sync is already challenging,

    It’s challenging right now because we are passing internal pointers around.

    making it more complicated and having more things to keep coordinated is a bad thing.

    It’s really not that complicated if all you have to do is call moduleX.ForgetPeer(id) in FinalizeNode.

    If you’re going in the wrong direction, doing a 180 is a good thing, and going even further in the same direction is a bad thing.

    I think we’re just gonna disagree on what the right direction is :) It mostly seems like a discussion of performance vs. sane interfaces and I don’t think it is worth optimizing for performance in this specific case (at the cost of bad interfaces) as it doesn’t represent a bottleneck either way.

  74. theuni commented at 4:04 pm on August 11, 2023: member

    We’ve debated the CNode* passing/abstracting thing ever since CConnman was merged. This has come up over and over so it’s definitely not new. And the big problem is: everyone has their own take. The question of “should we use CNode or CConnman as the interface for network stuff” could be debated forever. I think the only correct answer is “not both arbitrarily”, which is what we do now.

    I commend @dergoegge for working on this. It’s tough and requires picking an approach and sticking to it.

    Personally, I don’t love any solution, but NodeId as an access handle has always seemed like one of the most reasonable approaches to me. It’s not the most sophisticated, but it’s straightforward. And from my experiments years ago anyway, there were fewer footguns as there are no more external lifetime concerns. So Concept ACK from me on that part.

    What matters to me is that CConnman gets a clean separation and CNode’s internals are no longer exposed. TBH I’m less concerned about the approach taken to get there. I understand that there may be a performance trade-off, but imo it’s worth it for simplicity. I fully expect people to disagree, but if performance is the argument, imo it needs to be demonstrated. @ajtowns’s take is also totally reasonable imo, just a matter of different priorities I think.

  75. ajtowns commented at 2:55 am on August 14, 2023: contributor

    I think we’re just gonna disagree on what the right direction is

    Isn’t the point of opening this to discuss that and at least try to come to an agreement? Being immediately dismissed out of hand feels pretty frustrating.

    It’s challenging right now because we are passing internal pointers around.

    I don’t really think that’s at all true? Passing pointers to objects is a completely normal thing; and whether something is “internal” or not is just a matter of how you define the api – if other things are accessing it, it’s not “internal” by definition, though perhaps it should be.

    I’d contend that the challenging things is mostly that we’re letting nodes interact with other nodes’ state directly.

    It’s really not that complicated if all you have to do is call moduleX.ForgetPeer(id) in FinalizeNode.

    It also means either wrapping *Assert(GetState(nodeid)), doing a whole bunch of normally redundant if (peer) ... boilerplate after peer = GetPeerRef(nodeid), or else leaving potential nullptr dereferences lying about. No, it’s not that complicated, but it’s largely unnecessary, same as using a possibly null pointer in a function param, when you could use an always-valid reference instead.

    It mostly seems like a discussion of performance vs. sane interfaces and I don’t think it is worth optimizing for performance in this specific case (at the cost of bad interfaces) as it doesn’t represent a bottleneck either way.

    I don’t think this approach is a “sane” interface – interacting with many peers is fundamentally something we should be able to do in parallel, and adding many synchronisation points where we prevent ourselves from operating in parallel is a fundamentally broken approach. It’s broken both from the perspective of “it would be nice if we could do better in the future”, in that you’re making those improvements harder, but it’s also broken from the perspective of “it’s easier to avoid bugs when you can just focus on a single thing, and not have to worry about many interactions”.

    Ignoring the performance aspect, and just focussing on the “net_processing deals with a single peer and doesn’t interfere with the state of other peers” would, I think, look like:

    • replace ForNode and ForEachNode with a callback – ie, pass a lambda (with copy captures) back to connman, and have connman forward that lambda back to each peer to deal with as they choose
    • replace the ForEachNode loops on evictions with a cached score – when you run ProcessMessages or SendMessages nodes update their eviction score, and connman tracks the worst scores so that when a node needs to be evicted it knows exactly which node to evict
    • …and I think that’s it? At that point all of PeerManager only needs to know about a single peer at any one time, which makes for simpler code (and makes parallelising net processing much more plausible, though that gets into “performance”) Huh, that seems much simpler than I would have expected.

    Having PeerManager only focus on a single peer also seems a much better design for fuzz (or unit) testing to me – in the above model, you’d test interactions with other peers solely via the callbacks, eg, without ever necessarily having to actually instantiate multiple peers. Am I missing something there?

    (Actually parallelising the msghand thread probably depends on whether we can at least do parallel lookups into the mempool perhaps via a reader/writer lock approach, but something like this would be a necessary step too)

  76. dergoegge commented at 1:34 pm on August 14, 2023: member

    Being immediately dismissed out of hand feels pretty frustrating.

    I apologize, that wasn’t my intention.

    I want to make it clear though that this PR is about continuing previous decoupling work that has seemingly dropped of peoples radar when previous contributors switched focus or left the project. I do not think the previous work went in the wrong direction nor that this breaks potentially multi threading the message processing code in the future.

    If you think this design is somehow fundamentally broken w.r.t. parallelizing, please elaborate why because I don’t see it.

    whether something is “internal” or not is just a matter of how you define the api – if other things are accessing it, it’s not “internal” by definition, though perhaps it should be.

    It seemed well understood to me that we think of our p2p stack to consist of two layers/modules: net (lower level networking, connection management) and message processing (net processing) that each have their own responsibilities. Therefore I would classify pointers into their respective data structures as internals.

    Perhaps a competing PR/design doc of what you are thinking about would help? I’ve thought about the design in this PR so much that I might be a little stuck in my ways now.


    Judging by the rest of your comment, it seems to me like parallelizing message processing is of big interest to you. I agree that that is cool but I also think we should have better testing (ideally implementation agnostic) before switching to that. In any case maybe it makes more sense to discuss that in another issue?

    Actually parallelising the msghand thread probably depends on whether we can at least do parallel lookups into the mempool perhaps via a reader/writer lock approach, but something like this would be a necessary step too

    Using cs_main for so many validation unrelated things is probably another blocker, but perhaps easier to solve.

  77. ajtowns commented at 8:56 am on August 16, 2023: contributor

    I think I might be coming across as disagreeing more than I actually do? I’m all for modularisation and better testing, and getting things tightly focussed so that you don’t need to keep huge amounts of context in your head; but I think the way we’ve been doing that in net_processing in many places is adding complexity and interrelationships rather than reducing them.

    Perhaps a competing PR/design doc of what you are thinking about would help? I’ve thought about the design in this PR so much that I might be a little stuck in my ways now.

    I think it would be better to move towards something more like:

    • CTxMempool, TxRequestTracker, Chainstate, etc: global high-level state accessed by PeerManager
    • PeerManager: deals with updating a single peer’s logical/high-level state
    • CConnman: deals with collecting all our peers together, and prioritising which ones to service/connect/evict
    • CNode: deals with all of the state for a single peer
      • high-level, protocol related state: Peer m_peer; CNodeState m_state (probably via an opaque unique_ptr)
      • low-level, I/O state: Sock m_sock
      • coordination vs other peers: NodeId id; nRefCount; fPauseRecv; fDisconnect; ...

    ie, I guess I don’t think net-vs-net_processing should be looked at as a “high level” vs “low level” split at all; but rather as the overall management of connections, who to keep, who to prioritise, etc (CConnman) vs the details of communicating with an individual connection, which messages to send, how to deal with messages you receive, etc (net_processing).

    (There’s also a bunch of “low-level” things in netCSerializedNetMsg, TransportSerializer, CNetMessage; but those are all more related to message handling in net_processing than CConnman)

    Maybe think of it as net_processing as being like the bartender which gets you your drinks and listens to your sob stories; and CConnman is like the bouncer that lets you in in the first place or kicks you out when the bartender says you’re causing trouble, or a celebrity arrives and they need to make space, or you’re violating fire safety rules, or whatever. In that analogy, the things that go in CNode (rather than Peer) are the bits of info the bouncer cares about, either to read (fDisconnect = “kick this guy out”) or as info to pass onto the bartender, like giving someone an stamp on their hand at an all ages venue, so that the bartender knows they can serve them alcohol without having to interrupt the bouncer to ask if they checked their age or not: it’s simpler and more efficient if you can just look these things up directly, rather than having to ask someone. (The analogy breaks if you push it too far, of course)

    That would imply a few implementation changes compared to master:

    • PeerManager doesn’t have m_peer_map or m_node_states, or their associated locks (but does still have m_txrequest and other global state)
    • PeerManager shouldn’t need a reference back to CConnman at all; when interacting with one peer generates work for other peers, that should be coordinated either via global state (mempool/txrequest/etc), or passing the extra work directly from ProcessMessages/SendMessages back to the CConnman that invoked it, which can then distribute the work to appropriate peers
    • peer I/O should be via CNode not CConnman (ie, move PushMessage into CNode, matching PollMessage)
    • CValidationInterface would have to implemented by CConnman instead of PeerManager (those functions apply across all nodes, so it needs to go through the container of all nodes)
    • similarly, the RPC functionality currently exposed through PeerManager would need to go through CConnman first, for the same reason
    • probably splitting CNode out of net.h, perhaps splitting Peer/CNodeState out of net_processing.h (so that net_node.h has an opaque class Peer;, but net_node.cpp can #include <net_msgstate.h> in order to implement CNode::~CNode and destruct its opaque unique_ptr<Peer> m_peer)

    I think an approach like that would have a few benefits:

    • breaks the circular dependency between PeerManager and CConnman
    • lets you test PeerManager in isolation without even instantiating a dummy CConnman
    • can require PeerManager to be passed into CConnman at construction, rather than at connman->Start(), simplifying its API
    • can have CConnman own its NetEventsInterface instead of and be responsible for destructing it (make the virtual destructor public), removing the risk that peerman gets destructed while connman is still using it
    • potentially lets you simplify the per-peer locks, since all access to nodes’ high-level state goes via CConnman
    • keeps things efficient – you’re still separating things into logical modules, but not adding layers of lookups and indirection

    Using cs_main for so many validation unrelated things is probably another blocker, but perhaps easier to solve.

    mempool.cs can also be a blocker for effective parallelisation – you can’t quite do tx relay to two peers concurrently when even looking up a txid via the mempool to see if an INV is a duplicate requires an exclusive lock, especially if that lock’s held for an extended period (eg while validating a tx or block or reorging).

    If you think this design is somehow fundamentally broken w.r.t. parallelizing, please elaborate why because I don’t see it.

    If you want good performance then you keep things simple – few locks, access things directly. Often that approach is also simple and the easy to understand. Doing things in a complicated, round about way isn’t “fundamentally broken” – you can make it work, and you can always throw enough hardware at it to make it work fast enough. But I think it should be pretty obvious that:

    0bool CNode::IsDisconnected() const { return fDisconnected; }
    1void CNode::Disconnect() { fDisconnected = true; }
    

    is simpler and more performant and better for parallelisation than

     0bool CConnman::IsDisconnected(NodeId id) const
     1{
     2    LOCK(m_nodes_mutex);
     3    auto it = m_nodes.find(id);
     4    return it == m_nodes.end() ||
     5           it->second->fDisconnect;
     6}
     7bool CConnman::DisconnectNode(NodeId id)
     8{
     9    LOCK(m_nodes_mutex);
    10    auto it = m_nodes.find(id);
    11    if (it == m_nodes.end()) return false;
    12    it->second->fDisconnect = true;
    13    return true;
    14}
    

    The more complicated code here isn’t doing anything clever to justify the complexity: it’s still just reading/setting the same atomic bool. Both PeerManager and CConnman care about this bool, so it’s not “internal” to one or the other.

    Now, sure: a quick map lookup while holding a global lock isn’t the end of the world, but refactoring should be making things simpler or more efficient or both, and this is doing the opposite. That’s why I think it’s the wrong direction. If the design is forcing your code to be complicated, ugly and slow, then the design’s bad.

    For parallelisation in particular, you get best performance with the fewest locks (so you don’t have to stop, and potentially have another process claim the cpu) and the least shared data (so you can just process it in cache without having to synchronise with other tasks). So adding locks and lookups over shared data on things that were previously just a single atomic operation is completely backwards.

    Here’s a concrete example: https://github.com/ajtowns/bitcoin/commits/202308-netnode . It first splits out CNode and related things from net.h, then moves PushMessage into CNode (and so it doesn’t require m_connman.m_total_bytes_sent_mutex or m_connman at all for sending messages), then also moves GetAddresses() into PeerManager. That drops the number of m_connman uses in net_processing from 80 to 27 by my count, so seems a decent start on cutting out the circular dep between CConnman and PeerManager. Dropping connman.ForEachNode and for (peer : m_peer_map) in this model is a whole other bunch of work, of course.

    For example, if you look at net vs net_processing as low-vs-high level, moving -maxsendbuffer would be backwards; but if you look at it as “message handling” vs “dealing with the collection of all nodes” its logical: maxsendbuffer affects your communication with an individual node, it’s not something that matters across nodes.

  78. ajtowns commented at 1:41 am on August 17, 2023: contributor

    I guess two other things might be worth thinking about:

    • the approach I describe above maybe can be looked at as moving much of the low-level stuff into CNode (or net_node.h), leaving net_processing for the high-level logic about messages and peer state, and net/CConnman for the high-level logic about dealing with many peers. so it’s still something of a low/high level split, just splitting at a different point. moving as much low-level logic into CNode as possible also helps ensure we’re able to do all the low-level work across each node independently, which then may make it easier to parallelise that work.
    • one of the other things about parallelising message handling, is that if you’re dealing with node A, and it decides it wants to loop over all nodes, then if some other node B is super busy doing a long task, that ends up blocking A as well. having the design be “A just deals with its own stuff and adds anything for B to a queue that B will deal with when it gets around to it” works a fair bit better there, I think; the complexity is pretty much all in implementing the “queue” part.
  79. dergoegge commented at 8:05 pm on September 28, 2023: member
    Closing since there doesn’t seem to be sufficient support/interest for this
  80. dergoegge closed this on Sep 28, 2023


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 07:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me