rpc: skip getpeerinfo for a peer without CNodeStateStats #26515

pull mzumsande wants to merge 1 commits into bitcoin:master from mzumsande:202211_getpeerinfo_allornothing changing 1 files +8 −0
  1. mzumsande commented at 10:14 pm on November 16, 2022: contributor

    The objects CNode, CNodeState and Peer store different info about a peer - InitializeNode() and FinalizeNode() make sure that for the duration of a connection, we should always have one of each for a peer.

    Therefore, there is no situation in which, as part of getpeerinfo RPC, GetNodeStateStats() (which requires a CNodeState and a Peer entry for a NodeId to succeed) could fail for a legitimate reason while the peer is connected - this can only happen if there is a race condition between peer disconnection and the getpeerinfo processing (see also a more detailed description of this in #26457#pullrequestreview-1181641835).

    But in this case I think it’s better to just not include the newly disconnected peer in the response instead of returning just parts of its data.

    An earlier version of this PR also made the affected CNodeStateStats fields non-optional (see https://github.com/mzumsande/bitcoin/commit/5f900e27d0e5ceaa3b800a2eb5a8e8ff6bccad3b). Since this conflicts with #25923 and should be a separate discussion, I removed that commit from this PR.

  2. mzumsande marked this as a draft on Nov 16, 2022
  3. fanquake added the label RPC/REST/ZMQ on Nov 16, 2022
  4. in src/rpc/net.cpp:193 in ec839fe2a4 outdated
    184@@ -185,6 +185,14 @@ static RPCHelpMan getpeerinfo()
    185         UniValue obj(UniValue::VOBJ);
    186         CNodeStateStats statestats;
    187         bool fStateStats = peerman.GetNodeStateStats(stats.nodeid, statestats);
    188+        // GetNodeStateStats() requires the existence of a CNodeState and a Peer object
    189+        // to succeed for this peer. These are created at connection initialisation and
    190+        // exist for the duration of the connection - except if there is a race where the
    191+        // peer got disconnected in between the GetNodeStats() and the GetNodeStateStats()
    192+        // calls. In this case, the peer doesn't need to be reported here.
    193+        if (!fStateStats) {
    


    jonatack commented at 10:57 pm on November 16, 2022:

    Haven’t tested yet if this works, but checking for fStateStats seems a bit of a smell in general and I wanted to remove it entirely, e.g. #25923, though the way you use it here looks like a real improvement that could be orthogonal to much of that pull.

    Note that you haven’t updated the code for the GUI peers details.


    mzumsande commented at 11:06 pm on November 17, 2022:

    Looking into #25923 right now - it seems that it does some what Marco suggested below, making some fields, like txrelay, optional for an intrinsic reason instead of using defaults like -1 or 0 when the info is not known.

    As for the GUI - I didn’t change it on purpose, because this comment scared me off. I think that #25923 is affected by that too, I’ll comment there.


    jnewbery commented at 9:29 pm on November 24, 2022:
    The TRY_LOCK for accessing CNodeStats has been in the GUI since peer information was added in #4225 (see #4225 (comment)). I’m not sure why it was decided that the GUI could block on taking cs_vNodes, but can’t block on taking cs_main.

    mzumsande commented at 7:39 pm on November 28, 2022:
    I’m not sure either, but one reason might be because it slows down IBD a little if the GUI constantly competes with validation for cs_main just to update the peers window. Another reason might be that cs_main is held more often and longer by the node compared to cs_vNodes (current name m_nodes_mutex), so it might become noticeable that the GUI freezes. In both of these cases, skipping some of the requests from the GUI could make sense.
  5. DrahtBot commented at 3:29 am on November 17, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK MarcoFalke
    Concept ACK jonatack
    Approach ACK dergoegge

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26621 (refactor: Continue moving application data from CNode to Peer by dergoegge)
    • #25923 (p2p: always provide CNodeStateStats and getpeerinfo/netinfo/gui updates by jonatack)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  6. jonatack commented at 5:07 am on November 17, 2022: contributor
    Concept ACK, will look further at this change.
  7. maflcko commented at 5:20 pm on November 17, 2022: member
    Do we really want this, given that the fields have been optional forever(?)? I think we should first think if there are fields that should be optional. If the answer to that is no, then this seems fine to do, but without an answer, it seems too early to change.
  8. mzumsande commented at 6:06 pm on November 17, 2022: contributor

    Do we really want this, given that the fields have been optional forever(?)? I think we should first think if there are fields that should be optional. If the answer to that is no, then this seems fine to do, but without an answer, it seems too early to change.

    I think that if we make any fields optional, we should have a reason for this, and communicate to the users within the RPC help in which situations these fields won’t be included. Unless I’m missing something, the honest explanation as to why these fields are optional right now would be: “We can’t keep our internal objects in sync as well as we’d like to, so in case of a race condition some fields may be left out from the response because they are part of Internal Object 1, and others are always present because they belong to Internal Object 2” - this seems really unsatisfactory to me.

    The reason should be something that has to do with the actual meaning of the field, e.g. if there is a good reason why it’s not available under certain conditions (like it’s currently being done for fields like mapped_as or minping).

  9. maflcko commented at 6:21 pm on November 17, 2022: member

    The reason should be something that has to do with the actual meaning of the field

    yes, agree. This is what I was trying to say we should answer first before changing the code.

  10. mzumsande commented at 6:26 pm on November 17, 2022: contributor

    yes, agree. This is what I was trying to say we should answer first before changing the code.

    Ok, I’ll look into this field by field before taking this out of draft. However, note that if we decide to keep certain field optional for an intrinsic reason, it would likely be a behavior change, because we currently include all CNodeStateStats fields as long as we have a Peer and CNodeState object for the peer.

  11. maflcko commented at 6:37 pm on November 17, 2022: member

    if we decide to keep certain field optional for an intrinsic reason, it would likely be a behavior change

    Heh, this might actually be a convincing argument to go ahead with the changes here as-is. (And then add a new field name, if needed, for “truly” optional fields instead of breaking the behaviour?)

  12. maflcko commented at 8:35 am on November 18, 2022: member

    Though,

    If this gets approval, bitcoin-cli -netinfo could probably be simplified as a follow-up because it wouldn’t need to deal with missing entries for some of these fields anymore.

    I don’t think this part is true. bitcoin-cli would still need to deduce if a non-optional field is “true” optional with some heuristics like https://github.com/bitcoin-core/gui/pull/677. Though, this is already true on master and seems unrelated to this pull.

  13. rpc: Require NodeStateStats object in getpeerinfo
    There is no situation in which CNodeStateStats could be
    missing for a legitimate reason - this can only happen if
    there is a race condition between peer disconnection and
    the getpeerinfo call, in which case the disconnected peer
    doesn't need to be included in the response.
    6fefd49527
  14. mzumsande force-pushed on Nov 28, 2022
  15. mzumsande renamed this:
    rpc: skip getpeerinfo for a peer without CNodeStateStats, make its fields non-optional
    rpc: skip getpeerinfo for a peer without CNodeStateStats
    on Nov 28, 2022
  16. mzumsande commented at 7:10 pm on November 28, 2022: contributor

    I removed the part that makes the CNodeStateStats fields non-optional. Since that part conflicts with #25293 and can be a separate discussion (I commented there), this PR now only addresses the situation where we have a CNode but no CNodeStateStats due to a race during shutdown.

    I don’t think this part is true. (…)

    Removed from OP

  17. mzumsande marked this as ready for review on Nov 28, 2022
  18. dergoegge commented at 5:25 pm on December 5, 2022: member

    Approach ACK 6fefd49527fa0ed9535e54f2a3e76fe2599b2350

    Unless I’m missing something, the honest explanation as to why these fields are optional right now would be: “We can’t keep our internal objects in sync as well as we’d like to, so in case of a race condition some fields may be left out from the response because they are part of Internal Object 1, and others are always present because they belong to Internal Object 2” - this seems really unsatisfactory to me.

    I don’t think this is entirely accurate and I summarized my thinking about it below.

    It seems to me that this is all the result of us re-architecting our code base, specifically splitting lower level network and application layer per-peer state into separate data structures managed by different components (CConnman, PeerManager). Obviously an overall good development which we don’t need to rehash the merits of here.

    Ideally these components are loosely coupled (they aren’t really right now but I think we will/should get there eventually), such that the internal state of one is not dependent on the internal state of the other. Even though they are still tightly coupled at the moment, we should not rely on that to be the case going forward. So in my opinion, any user of the CConnman & PeerManager API (in this case the RPC handler querying statistics) should not expect there to be consistency between the internal state of these two modules and instead handle the edge cases (e.g. only one component returns stats) on its own end.

    I like this PR because it is essentially doing what I describe above, that is handling the edge case in which only the CNode stats are available while the application layer stats are not (due to the connection being dropped in between the queries).

  19. maflcko commented at 12:39 pm on December 19, 2022: member

    I wonder if this will introduce races in code (e.g. our tests) that assume a missing entry in the dict implies a fully disconnected node.

    Edit: If there was a race, it would already exist regardless of this pull. m_nodes will have the node always erased before a call to DeleteNode. (This pull isn’t changing that). And a node erased from m_nodes is currently considered to be “fully disconnected”, even if DeleteNode has not yet been called.

  20. maflcko commented at 12:58 pm on December 19, 2022: member

    review ACK 6fefd49527fa0ed9535e54f2a3e76fe2599b2350 👒

    Signature:

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA512
     2
     3review ACK 6fefd49527fa0ed9535e54f2a3e76fe2599b2350 👒
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
     7pUiLywwAiJVX2n3iULxIoVvCpGT5ZbVbdyz3if/aL9PQCFzU0EA+j6IWYnecZkD/
     8L7ofFFXe89RGlHa1pvMKL2lQ9BYalf4uaL6A/SEN9HdSN/KzIMtxBOFc5yz79xLo
     9hKM8ZPK4STu0QFVIIfy71S3ozH46QGG1lLzHz0fj7Vi1pIWKU3S2c25+82DpcYhv
    10vBqjSZZ1jbEoiKGalQNQ35srRMDaiFKXkkuLa7qpvHmrgoWXV7XuGF0ivnhj7Nf+
    11510gsDJnetWtRgbRhFUa7MfCp9vnYyuGHVur2Vr/WcIbBa0ar/tEOcbiJ6jdl9Sb
    12brhZ/efMHJNNWFCROpH2M6xkNDrKObuTYiIRkoCExjrd8SB6xiFoCLNAQsK5n/wc
    13bmptAUkLhgjpFwka4r6NThU5hKKlmLBpjKXkRh1/qZlmYeUXIfNTk5gr9VGJWEHa
    14UWq6eFZrWThxsNJSI58rU1GscXs07L5QWDaHbhMr/5NOiLUc0GbszjTt9bXfCqbD
    15Vrfs9vMG
    16=TjN1
    17-----END PGP SIGNATURE-----
    
  21. maflcko referenced this in commit 3d974960d3 on Dec 19, 2022
  22. fanquake commented at 1:31 pm on December 19, 2022: member
    This has been merged.
  23. fanquake closed this on Dec 19, 2022

  24. fanquake referenced this in commit 1dc0e4bc6f on Dec 19, 2022
  25. sidhujag referenced this in commit bbc8a636d4 on Dec 19, 2022
  26. jonatack referenced this in commit e72313e6b3 on Jan 11, 2023
  27. mzumsande deleted the branch on Jan 16, 2023
  28. maflcko referenced this in commit 500f25d880 on Jan 18, 2023
  29. fanquake referenced this in commit 2b87e90585 on Jan 20, 2023
  30. fanquake commented at 11:34 am on January 20, 2023: member
    Was backported to 24.x in #26457.
  31. bitcoin locked this on Jan 20, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me